The invention relates generally to gene association analyses and more specifically to the use of single nucleotide polymorphisms as a determinant of trait identification, parentage identity and breed determination in companion animals.
The generation of deep coverage, high quality genomic information, and its associated application to gene discovery and polymorphic analysis, will create an unparalleled ability to manage animal health and nutrition through an entire lifetime. Better management of companion animal health throughout the life cycle will affect a number of criteria for such animals. These include, but are not limited to: treatment selection; monitoring the effectiveness of therapy; a greater focus on preventative therapeutics rather than acute treatment; prediction of disease predisposition; earlier disease detection; disease characterization; creation of value at different points in the animal health industry (vets, pharma, pet nutrition, registries ID plus); and lower costs for pet owners. Breed specific markers have been identified in bovine subjects (DeNise et al., 2003. U.S. patent application Ser. No. 10/750,622) and can be similarly applied to canine subjects.
Parentage and identity panels are the first applied technology to use genomic analysis for managing canine animals. For example, panels have been developed utilizing microsatellite marker panels (DeNise et al., 2004. Anim. Genetics. 35(1): 14-17; Halverson et al., 1995. U.S. Pat. No. 5,874,217; Ostrander et al., 1993. Genomics, 16: 207-213; Ostrander et al., 1995. Mammalian Genome, 6: 192-195; Francisco et al., 1996. Mammalian Genome 7:359-362). In particular, parentage and identity panels can be used to:
Classification of individual dogs in a population has often relied on a priori groupings of individual animals on the basis of parentage and registration with a Breed Association, for example, the American Kennel Club. If these criteria are not known or not available, animals can be classified as a member of a breed or combination of breeds based on phenotype or geographic location. For example, a dog with a long hair coat, a pointed nose, white coloring with a black saddle, and an ability to herd animals may be assumed to be of the Border Collie breed. Phenotypes such as size, coat color, coat length, ear length, head shape, body shape, sound of bark, etc. are readily observable by owners and breeders and are frequently used as the basis of breed classification, with varying degrees of success.
There are two possible options for classifying an individual canine animal into a population. The first includes assigning an animal to a population based on known or assumed parentage, physical appearance, disposition or special ability. The second includes obtaining, from a set of predefined populations (such as breeds), sample DNA from a number of members of each population to estimate allele frequencies in each population. Using the allele frequencies, it is possible to compute the likelihood that a given genotype originated in each population, and individuals can be assigned to a population on the basis of these likelihoods (Parker et al., 2004. Science. 304:1160-1164; Pritchard, J. K., et al., Genetics 155: 945-959 (2000)).
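By way of a non-limiting illustration, the second option can be sketched as follows. The sketch assumes Hardy-Weinberg proportions within each population and independence between SNPs, and it floors unobserved allele frequencies at a small value so that a single unseen allele does not drive a likelihood to zero. The function names, the floor value and the data layout are hypothetical and are not taken from the cited references.

```python
import math

def genotype_log_likelihood(genotype, allele_freqs, floor=1e-3):
    """Log-likelihood that a multi-SNP genotype arose in one population.

    genotype:     dict mapping SNP id -> (allele, allele)
    allele_freqs: dict mapping SNP id -> {allele: frequency} for the population
    Assumes Hardy-Weinberg proportions and independent SNPs (illustrative only).
    """
    total = 0.0
    for snp, (a, b) in genotype.items():
        p = max(allele_freqs[snp].get(a, 0.0), floor)
        q = max(allele_freqs[snp].get(b, 0.0), floor)
        # Homozygote: p^2; heterozygote: 2pq.
        prob = p * p if a == b else 2.0 * p * q
        total += math.log(prob)
    return total

def assign_population(genotype, populations):
    """Return the most likely population and the per-population log-likelihoods."""
    scores = {
        name: genotype_log_likelihood(genotype, freqs)
        for name, freqs in populations.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

In practice the per-population scores, rather than only the top-ranked population, would typically be inspected, since closely related breeds can produce similar likelihoods.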
Both strategies (above) rely on defining a set of populations. A classification based on phenotype or geographic locality may not accurately describe the genetic structure of a population if similar phenotypes can arise despite differences in genotype (Rosenberg, N. A., et al., Genetics. 159: 699-713 (2001); Parker, H. G. et al. Science. 1160-1164, (2004)).
To date, the only methods available to qualify animals for these systems are known or assumed parentage or phenotypic appearance. There is an opportunity to improve the accuracy of individual animal qualification by using allele frequencies to compute the likelihood that a given genotype originated in a specific breed population.
It is important to canine owners to know the breed from which a particular animal may arise because animals of the same breed have similar behavioral characteristics and predispositions to disease. For example, knowledge of breed composition is important to verify claims for breed of a canine animal when parentage is in dispute. Verification of claims for breed or breed composition has not been possible because no available technology could classify a canine animal to a particular population or infer the breed composition of an individual animal. Currently, the only canines accepted by breed are those where the records of individual animals are maintained by Breed Associations.
In addition, breed information is important for understanding the disposition and safety of canines prior to purchase. Canines within a breed often have common personality traits that can be utilized for matching animals to proper homes and understanding and providing the proper environment for breed types. In an extreme example of breed type behaviors, Pit Bulls and Wolf crosses are often banned from communities and may affect homeowners and liability insurance.
Further, lost and found searches for canine animals become more accurate when the exact breed is known. It allows animal shelters to screen animals and announce the results of the search to potential owners and to specific breed rescue groups.
Moreover, mixed breed groups are developing registries that will rely on technology to group animals by the percentage of their breeding, which may lead to new breed development. For example, Doodles (Poodle crosses) have become a popular breed type and this technology could lead to certification programs.
Finally, canines of specific breed types may have characteristics important in toxicology and research model studies. Breed verification and identification could ensure that the animals utilized in these studies will fit the experimental protocol.
Accordingly, there remains a need for methods and compositions that provide information regarding canine breed. For animals not registered with a breed association, information concerning breed can be useful to manage the health and nutrition of an animal. For example, Labrador Retrievers are prone to hip dysplasia, and the symptoms can be reduced by adjusting the nutritional regimen of the young canine; owners could use a breed identity tool to determine appropriate preventative measures for ensuring the health of their dog.
The present invention provides methods and systems for managing, selecting and breeding companion animals. These methods for identification and monitoring of key characteristics of individual animals and management of individual animals maximize their individual potential performance and health. The invention methods allow predictive (predisposition) diagnostics, nutritional therapies and veterinary pharmaceutical therapeutics as applied to companion animals. The methods of the invention provide systems to collect, record and store such data by individual animal identification so that it is usable to improve future animals bred by a breeder, for example, and managed by animal owners and breeders. The methods and systems of the present invention utilize information regarding genetic diversity among companion animals, particularly single nucleotide polymorphisms (SNPs), and the effect of nucleotide occurrences of SNPs on important genetic traits and determining the parentage, identity and breed of companion animals.
The present invention is based, in part, on the discovery of canine single nucleotide polymorphism (SNP) markers that can be utilized to identify individual animals; determine or verify parentage of a single dog from any breed if the putative parent(s) are also available for testing; and are associated with, and predictive of, canine breeds including, but not limited to: Afghan Hound, Basenji, Basset Hound, Beagle, Belgian Tervuren, Bernese Mountain Dog, Borzoi, Chihuahua, Chinese Shar-Pei, Cocker Spaniel, Dachshund, Doberman Pinscher, German Shepherd Dog, German Shorthaired Pointer, Golden Retriever, Labrador Retriever, Mastiff, Miniature Schnauzer, Poodle, Pug, Rottweiler, Saluki, Samoyed, Shetland Sheepdog, Siberian Husky, St. Bernard, Whippet, Yorkshire Terrier breeds. Accordingly, the present invention provides methods to discover and use single nucleotide polymorphisms (SNP) for identifying breed, or line and breed, or line composition of a canine subject. The present invention further provides specific nucleic acid sequences, SNPs, and SNP patterns that can be used for parentage, identity, breed identity and identifying breed or breed combinations to manage the health and well being of individual animals based on their breed composition.
Accordingly, in one embodiment the present invention provides a method to match an individual canine from a nucleic acid sample of the canine subject, that includes identifying in the nucleic acid sample, at least one nucleotide occurrence of at least one single nucleotide polymorphism (SNP) in any one of the nucleic acid sequences (SEQ ID NOs:1-101) encompassed by the GenBank Accession numbers provided in Table 1 or sequences listed in Table 8, under parentage and identity marker. The SNP is the last (most 3′) nucleotide listed in any one of SEQ ID NOs:1-101. The sequences containing the canine SNPs provided in Table 1 and Table 8 can be found on the world wide web at ftp://ftp.ncbi.nih.gov/snp/dog/XML. The contents of these files are encoded in XML and contain the following information: SNP Id; Contig Name denoting the location of the SNP; 60 bases of sequence flanking the 5′ end of the SNP; and the alleles comprising the SNP. The position of the SNP in the contig is determined by blasting the 5′ flanking sequence against the contig sequence. The location of the SNP is the base following the last matching base of the 60 bases. Contigs can be found on the world wide web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide&cmd=Search&term=AACN010000001:AACN011089636[PACC].
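The positional rule described above (the SNP is the base immediately following the last matching base of the 5′ flanking sequence) can be illustrated with the following minimal sketch. It substitutes an exact, unique string match for a BLAST alignment, which is a simplifying assumption; a real alignment tolerates mismatches and gaps. The function name is hypothetical.

```python
def locate_snp(contig, flank_5prime):
    """Return the 0-based index of the SNP base within the contig.

    The SNP is taken to be the base immediately after the last base of the
    5' flanking sequence. Exact matching stands in for a BLAST alignment
    (illustrative assumption). Returns None when the flank is absent,
    occurs more than once, or ends at the end of the contig.
    """
    start = contig.find(flank_5prime)
    if start == -1:
        return None
    if contig.find(flank_5prime, start + 1) != -1:
        return None  # ambiguous placement: the flank matches more than once
    position = start + len(flank_5prime)
    return position if position < len(contig) else None
```

For example, with a short hypothetical contig "TTGACCGTAAGCTAGGCT" and flank "CCGTAAGC", the function returns index 12, the base following the flank.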
For example, the SNP can be identical to a genotype that is stored within a database of previously identified animals, or an archived nucleic acid or tissue sample of the subject can be identified with at least one SNP in any one of the markers listed in Table 1 or Table 8, thereby matching the canine subject to the archived sample. A SNP is matched to a canine subject when all nucleotide occurrences of the SNP occur in the archived sample and the canine subject. Therefore, in certain aspects, the methods include matching the identity of a subject using the nucleotide occurrence. The probability of matching can be statistically calculated based on the frequencies of nucleotides of each SNP.
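One way the probability of matching might be calculated from nucleotide frequencies is sketched below. The sketch computes the probability that two unrelated animals drawn at random share the same genotype at every SNP of a panel, assuming Hardy-Weinberg proportions and independent SNPs. These assumptions, and the function names, are illustrative and are not specified by the methods described herein.

```python
from itertools import combinations_with_replacement

def snp_match_probability(freqs):
    """Probability that two random unrelated animals share a genotype at one SNP.

    freqs: {allele: frequency} for the SNP, with frequencies summing to 1.
    Sums the squared frequency of every possible genotype, where genotype
    frequencies follow Hardy-Weinberg proportions (illustrative assumption).
    """
    total = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        genotype_freq = freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]
        total += genotype_freq ** 2
    return total

def panel_match_probability(panel):
    """Multiply per-SNP match probabilities, treating SNPs as independent."""
    probability = 1.0
    for freqs in panel:
        probability *= snp_match_probability(freqs)
    return probability
```

Under these assumptions a biallelic SNP with both alleles at frequency 0.5 gives a per-SNP match probability of 0.375, so the chance of a coincidental match falls rapidly as SNPs are added to the panel.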
Accordingly, in another embodiment the present invention provides a method to assign or verify the parentage of an individual canine from a nucleic acid sample of the canine subject, that includes identifying in the nucleic acid sample, at least one nucleotide occurrence of at least one single nucleotide polymorphism (SNP) in any one of markers listed in Table 1 or Table 8, under parentage and identity markers, wherein the SNP is consistent with the inheritance of the parental nucleotide. Potential parents are excluded from the parent list when the nucleotide occurrences in the nucleic acid sample of the canine subject differ from those of the potential parent in both nucleotides. These nucleotides can be compared through a database of previously identified animals, or an archived nucleic acid or tissue sample of the subject can be identified with at least one SNP in any one of the markers listed in Table 1 or Table 8, thereby matching the potential parents to the canine subject. Parents are verified or identified when all possible parents have been excluded from parentage except a single individual. Therefore, in certain aspects, the methods include matching the canine subject to potential parents using the nucleotide occurrence. The probability of exclusion can be statistically calculated based on the frequencies of nucleotides of each SNP.
In another embodiment, the present invention provides an isolated polynucleotide that includes a fragment of at least 20 contiguous nucleotides of any one of sequences associated with the accession numbers set forth in Table 1 or Table 8 (SEQ ID NOs:1-101), a polynucleotide at least 90% identical to the 20 contiguous nucleotide fragment, or a complement thereof. In certain aspects, the isolated polynucleotide, for example, includes a fragment of at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200, 250, 500, or 600 contiguous nucleotides of any one of the Table 1 or Table 8 sequences. In another aspect, the isolated polynucleotide is at least 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99, or 99.5% identical to the sequences that correlate with the accession numbers set forth in Table 1 or Table 8 (SEQ ID NOs:1-101), for example.
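The exclusion rule described above can be illustrated with the following minimal sketch, in which a candidate parent is excluded when, at any SNP typed in both animals, the candidate shares no nucleotide with the canine subject. The data layout and function names are hypothetical, and the sketch makes no allowance for genotyping error or mutation, which a practical implementation would likely need to tolerate.

```python
def is_excluded(offspring, candidate):
    """Return True if the candidate parent is excluded by at least one SNP.

    offspring, candidate: dicts mapping SNP id -> (allele, allele).
    A candidate is excluded when it shares no allele with the offspring at
    a SNP typed in both animals (for example, A/A versus G/G).
    """
    for snp, offspring_genotype in offspring.items():
        candidate_genotype = candidate.get(snp)
        if candidate_genotype is None:
            continue  # SNP not typed in the candidate; no information
        if not set(offspring_genotype) & set(candidate_genotype):
            return True
    return False

def remaining_candidates(offspring, candidates):
    """Return the names of candidate parents that are not excluded."""
    return [
        name for name, genotype in candidates.items()
        if not is_excluded(offspring, genotype)
    ]
```

Parentage is considered verified under this scheme only when exactly one candidate remains in the returned list.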
In another embodiment the present invention provides a method to infer breed of a canine subject from a nucleic acid sample of the canine subject, that includes identifying in the nucleic acid sample, at least one nucleotide occurrence of at least one single nucleotide polymorphism (SNP) in any one of markers listed in Table 1 or Table 8, breed specific markers, wherein the SNP is associated with a breed, thereby inferring the breed of the canine subject. A SNP is associated with a breed when at least one nucleotide occurrence of the SNP occurs more frequently in subjects of a particular breed than other breeds in a statistically significant manner, for example with greater than 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% confidence. Therefore, in certain aspects, the methods include identifying whether the nucleotide occurrence is a canine SNP allele identified herein as associated with canine breed. In certain aspects, the identified breed includes, but is not limited to, Afghan Hound, Basenji, Basset Hound, Beagle, Belgian Tervuren, Bernese Mountain Dog, Borzoi, Chihuahua, Chinese Shar-Pei, Cocker Spaniel, Dachshund, Doberman Pinscher, German Shepherd Dog, German Shorthaired Pointer, Golden Retriever, Labrador Retriever, Mastiff, Miniature Schnauzer, Poodle, Pug, Rottweiler, Saluki, Samoyed, Shetland Sheepdog, Siberian Husky, St. Bernard, Whippet, Yorkshire Terrier.
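As a non-limiting illustration of how a statistically significant difference in nucleotide occurrence between breeds might be assessed, the following sketch applies a one-sided Fisher's exact test to allele counts. The choice of Fisher's exact test, the treatment of each chromosome as an independent observation, and the function name are assumptions made for illustration; no particular statistical test is prescribed herein. The sketch requires Python 3.8 or later for `math.comb`.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test p-value for a 2x2 allele-count table.

    a: copies of the allele of interest in the target breed
    b: copies of other alleles in the target breed
    c: copies of the allele of interest in all other breeds
    d: copies of other alleles in all other breeds
    Returns the probability of seeing at least `a` copies in the target
    breed if the allele were equally frequent in both groups.
    """
    row1 = a + b          # chromosomes sampled from the target breed
    col1 = a + c          # total copies of the allele of interest
    n = a + b + c + d     # total chromosomes sampled
    denominator = comb(n, col1)
    p_value = 0.0
    for x in range(a, min(row1, col1) + 1):
        p_value += comb(row1, x) * comb(n - row1, col1 - x) / denominator
    return p_value
```

Under this sketch, a p-value below 0.05 would correspond to association at greater than 95% confidence; for example, 18 of 20 chromosomes carrying the allele in the target breed against 5 of 20 in other breeds yields a p-value well below that threshold.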
In another embodiment, the present invention provides a method for determining a nucleotide occurrence of a single nucleotide polymorphism (SNP) in a canine sample, that includes contacting a nucleic acid obtained from the sample with an oligonucleotide that binds to a target region comprising any one of the sequences set forth in the GenBank Accession numbers provided in Table 1. The determination typically includes analyzing binding of the oligonucleotide, or detecting an amplification product generated using the oligonucleotide, thereby determining the nucleotide occurrence of the SNP.
In another embodiment, the present invention provides an isolated polynucleotide that includes a fragment of at least 20 contiguous nucleotides, a polynucleotide at least 90% identical to the fragment of 20 contiguous nucleotides, or a complement thereof, wherein the isolated polynucleotide includes a nucleotide occurrence of a single nucleotide polymorphism (SNP) associated with breed, wherein the SNP corresponds to the last nucleotide provided in any one of SEQ ID NOs:1-101.
As used herein, the term ‘companion animal’ refers to animals commonly domesticated by people and used as companionship pets. This could include, for example, dogs and cats, but otherwise may also include more exotic pets such as various fish, reptiles, birds, horses, rabbits, hamsters, gerbils, mice, rats and the like.
For example, the invention identifies animals that have superior genetic traits, predicted very accurately, that can be used to identify parents of the next generation through selection. These methods can be used to sort companion animals to determine performance for dog shows and breed club shows or for working dogs such as guide dogs, sheep dogs and police dogs. This invention provides a method for determining the optimum male and female parent to maximize the genetic components of dominance and epistasis thus maximizing heterosis and hybrid vigor in the animals.
In one aspect, the invention provides methods to draw an inference of a trait based on genotype of a companion animal subject by determining the nucleotide occurrence of at least one companion animal SNP that is determined using methods disclosed herein, to be associated with the trait. For example, the inference can be drawn regarding a health characteristic, for example, hip dysplasia (bone and joint health); diabetes; hypertension; atherosclerosis; autoimmune disorders; kidney disease and neurological disease. The invention is also useful for assessing complex traits such as energy metabolism; aging and breed-specific traits.
Methods of the present invention that relate to companion animal management, for example management in breeding, typically include managing at least one of food intake, diet composition, administration of feed additives or pharmacological treatments such as vaccines, antibiotics, age and weight at which diet changes or pharmacological treatments are imposed, days fed specific diets, castration, feeding methods and management, imposition of internal or external measurements and environment of the companion animal subject based on the inferred trait.
The inference is used in methods of the present invention for the following aspects of the invention: to improve profits related to selling a companion animal subject; to manage companion animal subjects; to sort companion animal subjects; to improve the genetics of a companion animal population by selecting and breeding of companion animal subjects; to clone a companion animal subject with a specific genetic trait, a combination of genetic traits, or a combination of SNP markers that predict a genetic trait; to track a companion animal subject or offspring; and to diagnose a health condition of a companion animal subject.
In another aspect, the present invention provides a method for identifying a companion animal genetic marker that influences a phenotype of a genetic trait. The method includes analyzing companion animal genetic markers for association with the genetic trait. Preferably, the method involves determining nucleotide occurrences of single nucleotide polymorphisms (SNPs). Preferably, nucleotide occurrences of at least two SNPs are identified that influence the genetic trait or a group of traits.
In another aspect, the present invention provides a high-throughput system for determining the nucleotide occurrences at a series of companion animal single nucleotide polymorphisms (SNPs). The system includes one of the following: solid support to which a series of oligonucleotides can be directly or indirectly attached, homogeneous assays and microfluidic devices. Each of these methods is used to determine the nucleotide occurrence of companion animal SNPs that are associated with a genetic trait.
In another aspect, the present invention provides a computer system that includes a database having records containing information regarding a series of companion animal single nucleotide polymorphisms (SNPs), and a user interface allowing a user to input nucleotide occurrences of the series of companion animal SNPs for a companion animal subject. The user interface can be used to query the database and display results of the query. The database can include records representing some or preferably all of the SNPs of a companion animal SNP map, preferably a high-density companion animal SNP map. The database can also include information regarding haplotypes and haplotype alleles from the SNPs. Furthermore, the database can include information regarding phenotypes and/or genetic traits that are associated with some or all of the SNPs and/or haplotypes. In these embodiments the computer system can be used, for example, for any of the aspects of the invention that infer a trait of a companion animal subject.
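A minimal sketch of such a database and query is given below, using an in-memory SQLite database from the Python standard library. The table name, column names and record layout are hypothetical and chosen only for illustration; the description above does not specify a schema or a database engine.

```python
import sqlite3

def build_snp_database(records):
    """Create an in-memory database of allele frequencies.

    records: iterable of (snp_id, breed, allele, frequency) tuples.
    The schema is an illustrative assumption.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE allele_freq ("
        " snp_id TEXT, breed TEXT, allele TEXT, frequency REAL,"
        " PRIMARY KEY (snp_id, breed, allele))"
    )
    conn.executemany("INSERT INTO allele_freq VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn

def lookup_frequencies(conn, snp_id, allele):
    """Return (breed, frequency) rows for one allele, most frequent first."""
    cursor = conn.execute(
        "SELECT breed, frequency FROM allele_freq"
        " WHERE snp_id = ? AND allele = ?"
        " ORDER BY frequency DESC",
        (snp_id, allele),
    )
    return cursor.fetchall()
```

A user interface of the kind described would accept the nucleotide occurrences observed for a subject, issue queries such as `lookup_frequencies`, and display the returned rows.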
In one embodiment, a method for inferring a phenotype or genetic trait of a canine subject from a target nucleic acid sample of the subject is provided. The method includes identifying, in the nucleic acid sample, at least one nucleotide occurrence of a single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101. In some embodiments the nucleotide occurrence of at least 2 SNPs is determined. In other embodiments, the at least 2 SNPs provide a haplotype, thereby identifying a haplotype allele that is associated with the trait. In additional embodiments, a diploid pair of haplotype alleles is identified.
In another embodiment, a method for identifying a phenotype or genetic trait of a canine test subject is provided. The method includes obtaining a target nucleic acid sample from the test subject by a method that includes identifying in the nucleic acid sample at least one single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101. The identification can optionally be repeated for additional subjects. The method further includes determining the allele frequency corresponding to each SNP identified and comparing the allele frequency of the test subject with each additional subject.
In yet another embodiment, a kit for determining nucleotide occurrences of canine SNPs is provided. The kit includes an oligonucleotide probe, primer, or primer pair, or combinations thereof, for identifying the nucleotide occurrence of at least one canine single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101. The kit can include one or more detectable labels. The detectable label can be a non-extendible nucleotide. The non-extendible nucleotide can be a ddNTP that is fluorescently or chemically labeled, or labeled by biotinylation.
In yet another embodiment, a database including each single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101, is provided. Also provided is a database including allele frequencies generated by analyzing the aforementioned database of SNPs.
In one embodiment, a method for inferring a phenotype or genetic trait of a canine subject from a target nucleic acid sample of the subject is provided. The method includes identifying, in the nucleic acid sample, at least one nucleotide occurrence of a single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of the sequences set forth in SEQ ID NOs:1-101 and associated with the GenBank Accession numbers of Table 1 and Table 8.
In yet another embodiment, a computer-based method for identifying or inferring a trait of a canine test subject is provided. The method includes obtaining a nucleic acid sample from the subject and identifying in the nucleic acid sample at least one nucleotide occurrence of at least one single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101. The method further includes searching a database comprising canine allele frequencies and retrieving the information from the database. The method further includes optionally storing the information in a memory location associated with a user such that the information may be subsequently accessed and viewed by the user.
In one embodiment, a method for identifying or inferring a trait of a canine test subject from a nucleic acid sample obtained from the subject is provided. The method includes contacting the nucleic acid sample with a pair of oligonucleotides that comprise a primer pair, wherein amplified target nucleic acid molecules are produced. The method further includes hybridizing at least one oligonucleotide primer selected from the group consisting of SEQ ID NOS:306-407 to one or more amplified target nucleic acid molecules, wherein each oligonucleotide primer is complementary to a specific and unique region of each target nucleic acid molecule such that the 3′ end of each primer is proximal to a specific and unique target nucleotide of interest. The method also includes extending each oligonucleotide with a template-dependent polymerase and determining the identity of each nucleotide of interest by determining, for each extension primer employed, the identity of the nucleotide proximal to the 3′ end of each primer. A primer pair includes any of the forward and reverse primer pairs listed in Table 7. For example, a first primer of the primer pair can be selected from SEQ ID NOS:102-203 and the second primer of the primer pair can be selected from SEQ ID NOS:204-305.
In another embodiment, an isolated oligonucleotide comprising any one of SEQ ID NOS:306-407, wherein each oligonucleotide further includes one additional nucleotide positioned proximal to the 3′ end of each oligonucleotide, and wherein the oligonucleotide specifically hybridizes to a nucleic acid sequence derived from a canine subject, is provided. Also provided is the complement of the aforementioned oligonucleotide.
In another embodiment, an isolated single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101, is provided. Oligonucleotides including the SNP corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101, are provided. The complements of these oligonucleotides are also provided.
In another embodiment, a panel comprising at least one single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101, is provided.
In another embodiment, a computer-based method for identifying or inferring a trait of a canine test subject is provided. The method includes obtaining a nucleic acid sample from the canine subject and identifying in the nucleic acid sample at least one nucleotide occurrence of at least one single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101. The method further includes searching a database comprising a plurality of single nucleotide polymorphism (SNP) markers selected from at least two of the SNP markers at the 3′ position to any one of SEQ ID NOs:1-101, wherein the database is generated from a nucleic acid sample obtained from a canine non-test subject. The method also includes retrieving the information from the database and optionally storing the information in a memory location associated with a user such that the information may be subsequently accessed and viewed by the user.
In another embodiment, a method for identifying the parentage of a canine test subject is provided. The method includes obtaining a nucleic acid sample from the test subject and identifying in the nucleic acid sample at least one single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101. The method further includes determining the alleles corresponding to each SNP identified and comparing the alleles to those of putative parents of the test subject. Any putative parent not possessing at least one allele in common with the test subject is excluded.
In another embodiment, a method to infer breed or line of a canine test subject from a nucleic acid sample obtained from the subject is provided. The method includes identifying in the nucleic acid sample, at least one nucleotide occurrence of at least one single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101.
In yet another embodiment, a method of generating a genome discovery map is provided. The method includes selecting a plurality of single nucleotide polymorphism (SNP) markers selected from at least two of the SNP markers corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101, wherein each marker in the series will be separated by approximately 150,000 bp and generating the genome discovery map based upon the selected markers. The discovery map can be a whole genome discovery map. The plurality of single nucleotide polymorphism (SNP) markers can include about 10, 100, 1000, 5000 or about 10000 markers. The plurality of single nucleotide polymorphism (SNP) markers, or the number of markers indicated by the amount of linkage disequilibrium in a canine species, can further be selected based upon their dispersion across the entire genome.
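The marker selection step can be illustrated with the following minimal sketch, which walks along each chromosome and keeps a marker whenever it lies at least a chosen distance beyond the previously kept marker. Treating "approximately 150,000 bp" as a minimum spacing in a greedy pass is an interpretive assumption, and the function name and tuple layout are hypothetical.

```python
def select_spaced_markers(markers, spacing=150_000):
    """Select markers at least `spacing` bp apart on each chromosome.

    markers: iterable of (chromosome, position, snp_id) tuples.
    Markers are visited in chromosome and position order; a marker is kept
    when it is the first on its chromosome or lies at least `spacing` bp
    beyond the last marker kept on that chromosome (greedy, illustrative).
    """
    selected = []
    last_position = {}
    for chromosome, position, snp_id in sorted(markers):
        previous = last_position.get(chromosome)
        if previous is None or position - previous >= spacing:
            selected.append((chromosome, position, snp_id))
            last_position[chromosome] = position
    return selected
```

Where linkage disequilibrium extends over longer or shorter distances in a given population, the `spacing` argument could be adjusted accordingly.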
The methods of the invention are particularly well suited for predictive diagnostics, novel therapeutics, nutritional therapies and breeding genetic information of companion animal subjects. The methods make it possible to identify and monitor key characteristics of individual animals and to manage those individual animals to maximize their individual potential health and breeding characteristics. Furthermore, the methods of the invention provide systems to collect, record and store such data by individual animal identification so that it is usable to improve future animals bred. Specific embodiments of the invention are exemplified in Exhibit A, as provided in U.S. Provisional Ser. No. 60/524,180, filed Oct. 24, 2003 and incorporated herein by reference.
Accordingly, a method according to this aspect of the invention includes inferring a trait of the companion animal subject, such as a canine subject, from a nucleic acid sample of the subject. The inference is drawn by a method that includes identifying in the sample, a nucleotide occurrence for at least one single nucleotide polymorphism (SNP), wherein the nucleotide occurrence is associated with the genetic trait; and wherein the genetic trait affects the physical characteristic. Furthermore, the method includes managing at least one of food intake, diet composition, administration of feed additives or pharmacological treatments such as vaccines, antibiotics, age and weight at which diet changes or pharmacological treatments are imposed, days fed specific diets, castration, feeding methods and management, imposition of internal or external measurements and environment of the companion animal subject based on the inferred trait. This management results in a maximization of physical characteristics of a companion animal subject.
The method includes identification of the causative mutation influencing the trait directly or the determination of one or more SNPs that are in linkage disequilibrium with the associated genetic trait.
Preferably, the method includes a determination of the nucleotide occurrence of at least two SNPs. More preferably, the at least two SNPs form all or a portion of a haplotype, wherein the method identifies a haplotype allele that is in linkage disequilibrium with, and thus associated with, the genetic trait. Furthermore, the method can include identifying a diploid pair of haplotype alleles.
A method according to this aspect of the invention can further include using traditional factors affecting the economic value of the companion animal subject in combination with the inference based on nucleotide occurrence data to determine the economic value of the companion animal subject.
As used herein, the term ‘at least one’, when used in reference to a gene, SNP, haplotype, or the like, means 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc., up to and including all of the haplotype alleles, genes, and/or SNPs of the companion animal genome. Reference to ‘at least a second’ gene, SNP, or the like, means two or more, i.e., 2, 3, 4, 5, 6, 7, 8, 9, 10, etc., companion animal genes, SNPs, or the like.
Polymorphisms are allelic variants that occur in a population that can be a single nucleotide difference present at a locus, or can be an insertion or deletion of one, a few or many consecutive nucleotides. As such, a single nucleotide polymorphism (SNP) is characterized by the presence in a population of one or two, three or four nucleotides (i.e., adenosine, cytosine, guanosine or thymidine), typically less than all four nucleotides, at a particular locus in a genome such as the human genome. It will be recognized that, while the methods of the invention are exemplified primarily by the detection of SNPs, the disclosed methods or others known in the art similarly can be used to identify other types of canine polymorphisms, which typically involve more than one nucleotide. A SNP is associated with a breed when at least one nucleotide occurrence of the SNP occurs more frequently in subjects of a particular breed in a statistically significant manner, for example with greater than 80%, 85%, 90%, 95%, or 99% confidence. A canine “SNP allele” is a nucleotide occurrence of a SNP within a population of canine animals.
The term ‘haplotypes’ as used herein refers to groupings of two or more SNPs that are physically present on the same chromosome which tend to be inherited together except when recombination occurs. The haplotype provides information regarding an allele of the gene, regulatory regions or other genetic sequences affecting a genetic trait. The linkage disequilibrium and, thus, association of a SNP or a haplotype allele(s) and a companion animal genetic trait can be strong enough to be detected using simple genetic approaches, or can require more sophisticated statistical approaches to be identified.
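By way of illustration only, the strength of linkage disequilibrium between two SNPs can be summarized from phased haplotype data by the coefficient D and the squared correlation r². The following is a minimal sketch, assuming two biallelic SNPs and a list of already-phased two-SNP haplotypes; the function name and input layout are hypothetical and are not part of the disclosed methods.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic SNPs.

    haplotypes: list of 2-character strings, one allele per SNP,
    e.g. "AG" means allele A at SNP 1 and allele G at SNP 2.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    # Use the alleles of the first haplotype as the reference alleles.
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == a for h in haplotypes) / n      # frequency of a at SNP 1
    p_b = sum(h[1] == b for h in haplotypes) / n      # frequency of b at SNP 2
    p_ab = sum(h == a + b for h in haplotypes) / n    # frequency of haplotype ab
    d = p_ab - p_a * p_b                              # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0          # 0.0 if either SNP is monomorphic
    return d, r2
```

An r² near 1 indicates that the two SNPs are almost always inherited together, while a value near 0 indicates that they segregate independently in the sampled chromosomes.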
Numerous methods for identifying haplotype alleles in nucleic acid samples are known in the art. In general, nucleotide occurrences for the individual SNPs are determined, and then combined to identify haplotype alleles. The Stephens and Donnelly algorithm (Am. J. Hum. Genet. 68:978-989, 2001, which is incorporated herein by reference) can be applied to the data generated regarding individual nucleotide occurrences in SNP markers of the subject, in order to determine alleles for each haplotype in a subject's genotype. Other methods can be used to determine alleles for each haplotype in the subject's genotype, for example Clark's algorithm, and an EM algorithm described by Raymond and Rousset (Raymond et al. 1994. GenePop. Ver 3.0. Institut des Sciences de l'Evolution. Universite de Montpellier, France. 1994).
As used herein, the term ‘infer’ or ‘inferring’, when used in reference to a phenotype of a genetic trait, means drawing a conclusion about a trait or phenotype using a process of analyzing individually or in combination, nucleotide occurrence(s) of one or more SNP(s), which can be part of one or more haplotypes, in a nucleic acid sample of the subject, and comparing the individual or combination of nucleotide occurrence(s) of the SNP(s) to known relationships of nucleotide occurrence(s) of the SNP(s) and the phenotype. As disclosed herein, the nucleotide occurrence(s) can be identified directly by examining nucleic acid molecules, or indirectly by examining a polypeptide encoded by a particular gene where the polymorphism is associated with an amino acid change in the encoded polypeptide.
A ‘trait’ is a characteristic of an organism that manifests itself in a phenotype. Many traits are the result of the expression of a single gene, but some are polygenic (i.e., result from simultaneous expression of more than one gene). A ‘phenotype’ is an outward appearance or other visible characteristic of an organism. As used herein, a phenotype and a trait may be used interchangeably in some instances.
Methods of the present invention can be used to infer more than one trait. For example, a method of the present invention can be used to infer a series of traits. Accordingly, a method of the present invention can infer, for example, coat quality/texture/color; bone/joint health, or predisposition to obesity. This inference can be made using one SNP or a series of SNPs. Thus, a single SNP can be used to infer multiple traits; multiple SNPs can be used to infer multiple traits; or a single SNP can be used to infer a single trait.
Relationships between nucleotide occurrences of one or more SNPs or haplotypes and a breed can be identified using known statistical methods. A statistical analysis result which shows an association of one or more SNPs or haplotypes with a breed with at least 80%, 85%, 90%, 95%, or 99% confidence, or alternatively a probability of insignificance less than 0.05, can be used to identify SNPs and haplotypes. These statistical tools may test for significance related to a null hypothesis that an on-test SNP allele or haplotype allele is not significantly different between groups with different traits. If the significance of this difference is low, it suggests the allele is not related to a breed. Statistical significance can be determined in both Bayesian and Frequentist ways.
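By way of illustration only, one frequentist test of the null hypothesis described above compares allele counts in a breed of interest against allele counts in all other animals. The following is a minimal sketch, assuming a Pearson chi-square test (one degree of freedom, no continuity correction) on a 2x2 table of allele counts; the choice of test, the function name, and the example counts are assumptions made for illustration and are not specified in this disclosure.

```python
import math


def allele_association(a_breed, b_breed, a_other, b_other):
    """Pearson chi-square test on a 2x2 table of allele counts.

    a_breed, b_breed: counts of allele A and allele B in the breed of interest.
    a_other, b_other: counts of the same alleles in all other animals.
    Returns (chi2, p_value) for one degree of freedom.
    """
    table = [[a_breed, b_breed], [a_other, b_other]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("every row and column must contain at least one count")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A seen 90 times in 100 breed chromosomes,
# and 50 times in 100 chromosomes from other animals.
chi2, p = allele_association(90, 10, 50, 50)
associated = p < 0.05
```

A p-value below the chosen cut-off (0.05 in the text above) would flag the SNP as a candidate breed-associated marker; a Bayesian treatment would replace the p-value with a posterior probability.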
As another example, associations between nucleotide occurrences of one or more SNPs or haplotypes and a phenotype (i.e. selection of significant markers) can be identified using a two-part analysis. In the first part, DNA from animals at the extremes of a genetic trait is pooled, and the allele frequency of one or more SNPs or haplotypes for each tail of the distribution is estimated. Alleles of SNPs and/or haplotypes that are apparently associated with extremes of a genetic trait are identified and are used to construct a candidate SNP and/or haplotype set. Statistical cut-offs are set relatively low to assure that significant SNPs and/or haplotypes are not overlooked during the first part of the method.
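By way of illustration only, the first part of such an analysis can be sketched as a comparison of pooled allele frequencies between the two tails of the trait distribution. The sketch below assumes that an allele frequency has already been estimated for each pool, and it uses a simple absolute-difference cut-off as a stand-in for the deliberately lenient statistical cut-off; the function name, the SNP identifiers, and the cut-off value of 0.15 are hypothetical.

```python
def screen_candidates(high_pool, low_pool, min_diff=0.15):
    """Select SNPs whose pooled allele frequency differs between trait tails.

    high_pool, low_pool: dict mapping a SNP identifier to the estimated
    frequency of one allele in the pool from the high (or low) tail.
    min_diff: lenient cut-off, kept low so that true associations are
    not discarded before individual genotyping.
    Returns the sorted list of candidate SNP identifiers.
    """
    candidates = []
    for snp in sorted(high_pool):
        if snp not in low_pool:
            continue  # frequency must be available for both tails
        if abs(high_pool[snp] - low_pool[snp]) >= min_diff:
            candidates.append(snp)
    return candidates


# Hypothetical pooled estimates for three SNPs.
high = {"snp1": 0.80, "snp2": 0.50, "snp3": 0.30}
low = {"snp1": 0.40, "snp2": 0.45, "snp3": 0.55}
candidate_set = screen_candidates(high, low)
```

The candidate set returned by this step is what would then be genotyped in individual animals during the second stage.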
During the second stage, individual animals are genotyped for the candidate SNP and/or haplotype set. The second stage is set up to account for as much of the genetic variation as possible in a specific trait without introducing substantial error. This is a balancing act of the prediction process. Some animals are predicted with high accuracy and others with low accuracy.
In diploid organisms such as canines, somatic cells, which are diploid, include two alleles for each single-locus haplotype. As such, in some cases, the two alleles of a haplotype are referred to herein as a genotype or as a diploid pair, and the analysis of somatic cells typically identifies the alleles for each copy of the haplotype. Methods of the present invention can include identifying a diploid pair of haplotype alleles. These alleles can be identical (homozygous) or can be different (heterozygous). Haplotypes that extend over multiple loci on the same chromosome include up to 2 to the Nth power alleles, where N is the number of loci. It is beneficial to express polymorphisms in terms of multi-locus (i.e. multi-SNP) haplotypes because haplotypes offer enhanced statistical power for genetic association studies. Multi-locus haplotypes can be precisely determined from diploid pairs when the diploid pairs include 0 or 1 heterozygous pairs, and N or N−1 homozygous pairs. When multi-locus haplotypes cannot be precisely determined, they can sometimes be inferred by statistical methods. Methods of the invention can include identifying multi-locus haplotypes, either precisely determined, or inferred.
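By way of illustration only, the rule that a multi-locus haplotype pair is precisely determined when at most one locus is heterozygous can be sketched as follows. The function names and the genotype layout (one pair of alleles per locus, loci listed in chromosomal order) are hypothetical.

```python
def resolve_haplotypes(genotypes):
    """Return the diploid pair of haplotypes if phase is unambiguous.

    genotypes: list of (allele_1, allele_2) tuples, one per SNP locus.
    With 0 or 1 heterozygous loci the two haplotypes are fully determined;
    with 2 or more, phase is ambiguous and None is returned so that a
    statistical method can be applied instead.
    """
    heterozygous = [i for i, (x, y) in enumerate(genotypes) if x != y]
    if len(heterozygous) > 1:
        return None
    hap1 = "".join(x for x, _ in genotypes)
    hap2 = "".join(y for _, y in genotypes)
    return hap1, hap2


def max_haplotype_alleles(n_loci):
    """Upper bound on distinct haplotype alleles for biallelic SNPs: 2**N."""
    return 2 ** n_loci


# One heterozygous locus (C/T): phase is unambiguous.
pair = resolve_haplotypes([("A", "A"), ("C", "T"), ("G", "G")])
# Two heterozygous loci: A-C/G-T and A-T/G-C are both possible.
ambiguous = resolve_haplotypes([("A", "G"), ("C", "T")])
```

Genotypes for which the function returns None are the cases in which statistical inference of the haplotypes, as described above, would be needed.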
A sample useful for practicing a method of the invention can be any biological sample of a subject, for example a canine subject, that contains nucleic acid molecules, including portions of the gene sequences to be examined, or corresponding encoded polypeptides, depending on the particular method. As such, the sample can be a cell, tissue or organ sample, or can be a sample of a biological material such as blood, milk, semen, saliva, hair, tissue, and the like. A nucleic acid sample useful for practicing a method of the invention can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The nucleic acid sample generally is a deoxyribonucleic acid sample, particularly genomic DNA or an amplification product thereof. However, where heterogeneous nuclear ribonucleic acid, which includes unspliced mRNA precursor molecules and non-coding regulatory RNA molecules, is available, a cDNA or amplification product thereof can be used.
Where each of the SNPs of the haplotype is present in a coding region of a gene(s), the nucleic acid sample can be DNA or RNA, or products derived therefrom, for example, amplification products. Furthermore, while the methods of the invention generally are exemplified with respect to a nucleic acid sample, it will be recognized that particular haplotype alleles can be in coding regions of a gene and can result in polypeptides containing different amino acids at the positions corresponding to the SNPs due to non-degenerate codon changes. As such, in another aspect, the methods of the invention can be practiced using a sample containing polypeptides of the subject.
In one embodiment, DNA samples are collected and stored in a retrievable barcode system, either automated or manual, that ties to a database. Collection practices include systems for collecting tissue, hair, mouth cells or blood samples from individual animals at the same time that ear tags, electronic identification or other devices are attached or implanted into the animal. Tissue collection devices can be integrated into the tool used for placing the ear tag. Body fluid samples are collected and can be stored on a membrane bound system. All methods could be automatically uploaded into a primary database.
The sample can then be sent to a laboratory where a high-throughput genotyping system is used to analyze the sample. Genetic traits are predicted in the laboratory and forwarded electronically to a breeder, for example. The breeder then uses this information to sort and manage animals to maximize profitability and marketing potential. The information is also useful to a veterinarian, for example, to diagnose or treat a condition associated with a particular breed of companion animal. An exemplary subject of the present invention can be any canine subject, for example a sire, dam, pup, or any canine embryo or tissue. Nevertheless, the methods described herein are applicable to identify traits or breed of any companion animal subject, such as a dog, cat, horse, rabbit, fish, bird, reptile and the like. Thus, the present invention can also be used to provide information to breeders to make breeding, mating, and/or cloning decisions. This invention can also be combined with traditional genetic evaluation methods to improve selection, mating, or cloning strategies associated with companion animals.
In another aspect, the present invention provides a method for improving profits related to breeding a companion animal subject. The method includes drawing an inference regarding a trait of the companion animal subject from a nucleic acid sample of the companion animal subject. The method is typically performed by a method that includes identifying a nucleotide occurrence for at least one single nucleotide polymorphism (SNP), wherein the nucleotide occurrence is associated with the genetic trait, and wherein the genetic trait affects the value of the animal or its products.
In one example, the present invention provides a system for determining the nucleotide occurrences in a population of canine single nucleotide polymorphisms (SNPs). The system typically includes a hybridization medium and/or substrate that includes at least two oligonucleotides of the present invention, or oligonucleotides used in the methods of the present invention. The hybridization medium and/or substrate are used to determine the nucleotide occurrence of canine SNPs that are associated with breed. Accordingly, the oligonucleotides are used to determine the nucleotide occurrence of canine SNPs that are associated with a breed. The determination can be made by selecting oligonucleotides that bind at or near a genomic location of each SNP of the series of canine SNPs. The system of the present invention typically includes a reagent handling mechanism that can be used to apply a reagent, typically a liquid, to the solid support. The binding of an oligonucleotide of the series of oligonucleotides to a polynucleotide isolated from a genome can be affected by the nucleotide occurrence of the SNP. The system can include a mechanism effective for moving a solid support and a detection mechanism. The detection method detects binding or tagging of the oligonucleotides.
Methods according to this aspect of the present invention can utilize a bioeconomic model, such as a model that estimates the net value of one or more companion animal subjects based on one or more phenotypes. By this method, phenotypes of one, or preferably a series of genetic traits are inferred. The model is typically a computer model. Values for the companion animal subjects can be used to segregate the animals. Furthermore, various parameters that can be controlled during maintenance and growth of the companion animal subjects can be input into the model in order to affect the way the animals are raised in order to obtain maximum health for the companion animal subject.
In another aspect, the present invention provides methods that allow effective measurement and sorting of animals individually, accurate and complete record keeping of genotypes and phenotypes or characteristics for each animal, and production of an economic end point determination for each animal using growth performance data. Accordingly, the present invention provides a method for sorting companion animal subjects. The method includes inferring a phenotype of a genetic trait for both a first companion animal subject and a second companion animal subject from a nucleic acid sample of the first companion animal subject and the second companion animal subject. The inference is made by a method that includes identifying the nucleotide occurrence of at least one single nucleotide polymorphism (SNP), wherein the nucleotide occurrence is associated with the genetic trait. The method further includes sorting the first companion animal subject and the second companion animal subject based on the inferred phenotype.
The method can further include measuring a physical characteristic of the first companion animal subject and the second companion animal subject, and sorting the first companion animal subject and the second companion animal subject based on both the inferred phenotype and the measured physical characteristic. The physical characteristic can be, for example, weight, breed, type or frame size, and can be measured using many methods known in the art, such as by using ultrasound. Sorting companion animals based on predicted phenotype allows selected companion animals to be chosen for programs such as guide dogs, police dogs and for dog and breed club shows.
In another aspect, the present invention provides methods that use analysis of companion animal genetic variation to improve the genetics of the population to produce animals with consistent desirable characteristics. Accordingly, in one aspect the present invention provides a method for selection and breeding of companion animal subjects for a genetic trait. The method includes inferring a phenotype of the genetic trait of a group of companion animal candidates for use in breeding programs from a nucleic acid sample of the companion animal candidates. The inference is made by a method that includes identifying the nucleotide occurrence of at least one single nucleotide polymorphism (SNP), wherein the nucleotide occurrence is associated with the phenotype. Individuals are then selected from the group of candidates with a desired phenotype for the genetic trait for use in breeding programs.
In another aspect the present invention provides a method for cloning a companion animal subject with a specific genetic trait or series of traits. The method includes identifying nucleotide occurrences of at least two SNPs for the companion animal subject, isolating a progenitor cell from the companion animal subject, and generating a cloned companion animal from the progenitor cell. The method can further include before identifying the nucleotide occurrences, identifying the phenotype of the companion animal subject, wherein the companion animal subject has a desired phenotype for a genetic trait and wherein at least two SNPs affect the phenotype. Methods of breeding and cloning companion animals are known in the art and can be used for the present invention.
This invention identifies animals that may have superior genetic traits, predicted very accurately, that can be used to identify parents of the next generation through selection.
In another aspect, the present invention provides a method of tracking a companion animal subject. The method includes identifying nucleotide occurrences for a series of genetic markers of the companion animal subject, identifying the nucleotide occurrences for the series of genetic markers for a sample, and determining whether the nucleotide occurrences of the companion animal subject are the same as the nucleotide occurrences of the sample. In this method identical nucleotide occurrences indicate that the sample is from the companion animal subject. For example, parentage can be confirmed by this method.
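By way of illustration only, the comparison step of the tracking method, and a simple parentage check built on the same data, can be sketched as follows. The sketch assumes unphased diploid genotypes stored per marker as a pair of alleles and assumes error-free genotyping; the function names and marker identifiers are hypothetical. The parentage check relies on the fact that, absent mutation or genotyping error, an offspring shares at least one allele with each true parent at every autosomal marker.

```python
def same_individual(reference, sample):
    """True if every marker typed in both profiles has the same genotype.

    reference, sample: dict mapping marker id -> (allele_1, allele_2).
    Allele order within a genotype is ignored.
    """
    shared = reference.keys() & sample.keys()
    if not shared:
        raise ValueError("no markers in common")
    return all(sorted(reference[m]) == sorted(sample[m]) for m in shared)


def parentage_exclusions(offspring, candidate_parent):
    """Markers at which the candidate parent shares no allele with the offspring.

    An empty list means the candidate is not excluded by these markers.
    """
    shared = offspring.keys() & candidate_parent.keys()
    return sorted(
        m for m in shared if not set(offspring[m]) & set(candidate_parent[m])
    )


# Hypothetical two-marker profiles.
stored = {"m1": ("A", "G"), "m2": ("C", "C")}
found = {"m1": ("G", "A"), "m2": ("C", "C")}
is_match = same_individual(stored, found)
```

In practice a tolerance for genotyping error would usually be applied, for example by allowing a small number of mismatching markers before rejecting a match.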
In certain preferred embodiments the series of genetic markers is a series of single nucleotide polymorphisms (SNPs). The method can further include comparing the results of the above determination with a determination of whether the sample is from the companion animal subject made using another tracking method. In this embodiment, the present invention provides quality control information that improves the accuracy of tracking the source of the sample.
The nucleotide occurrence data for the companion animal subject can be stored in a computer readable form, such as a database. Therefore, in one example, an initial nucleotide occurrence determination can be made for the series of genetic markers for a young companion animal subject and stored in a database along with information identifying the companion animal subject.
A series of markers or a series of SNPs as used herein, can include a series of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 500, 1000, 2000, 2500, 5000, or 6000 markers, for example.
In another aspect, the present invention provides a method for diagnosing a health condition of a companion animal subject. The method includes drawing an inference regarding a phenotype of the companion animal subject for the health condition, from a nucleic acid sample of the subject. The inference is drawn by identifying, in the nucleic acid sample, at least one nucleotide occurrence of a single nucleotide polymorphism (SNP), wherein the nucleotide occurrence is associated with the phenotype.
The nucleotide occurrence of at least 2 SNPs can be determined. In some methods, at least 2 SNPs form a haplotype, wherein the method identifies a haplotype allele that is associated with the genetic trait. Preferably, the method includes identifying a diploid pair of haplotype alleles for one or more haplotypes.
The health condition for this aspect of the invention is resistance to disease or infection, susceptibility to infection, regulation of immune status and response to antigens, previous exposure to infection or parasites, or bone/joint health, coat color/health, body mass, and health of respiratory and digestive tissues, for example.
The present invention in another aspect provides a method for inferring a phenotype of a genetic trait of a companion animal subject from a nucleic acid sample of the subject, that includes identifying, in the nucleic acid sample, at least one nucleotide occurrence of a single nucleotide polymorphism (SNP). The nucleotide occurrence is associated with the phenotype, thereby allowing an inference of the phenotype.
These embodiments of the invention are based, in part, on a determination that single nucleotide polymorphisms (SNPs), including haploid or diploid SNPs, and haplotype alleles, including haploid or diploid haplotype alleles, allow an inference to be drawn as to the phenotype of a subject, particularly a companion animal subject. Accordingly, methods of the invention can involve determining the nucleotide occurrence of at least 2, 3, 4, 5, 10, 20, 30, 40, 50, etc. SNPs. The SNPs can form all or part of a haplotype, wherein the method can identify a haplotype allele that is associated with the genetic trait. Furthermore, the method can include identifying a diploid pair of haplotype alleles.
In another aspect, the present invention provides a method for identifying a companion animal genetic marker that influences a phenotype of a genetic trait. The method includes analyzing companion animal genetic markers for association with the genetic trait. Preferably, as discussed above for other aspects of the invention, the genetic marker is a single nucleotide polymorphism (SNP). Preferably, at least two SNPs are identified that influence the genetic trait. Because the method can identify at least two SNPs, and in some embodiments, many SNPs, the method can identify not only additive genetic components, but non-additive genetic components such as dominance (i.e. dominating phenotype of an allele of one gene over an allele of another gene) and epistasis (i.e. interaction between genes at different loci). Furthermore, the method can uncover pleiotropic effects of SNP alleles (i.e. effects on many different genetic traits), because many genetic traits can be analyzed for their association with many SNPs using methods disclosed herein.
Nucleotide occurrences are determined for essentially all, and most preferably all of the SNPs of a high-density, whole genome SNP map. This approach has the advantage over traditional approaches in that since it encompasses the whole genome, it identifies potential interactions of gene products expressed from genes located anywhere on the genome without requiring preexisting knowledge regarding a possible interaction between the gene products. An example of a high-density, whole genome SNP map is a map of at least about 1 SNP per 10,000 kb, preferably at least 1 SNP per 500 kb or about 10 SNPs per 500 kb, most preferably at least about 25 SNPs or more per 500 kb. Definitions of densities of markers may change across the genome and are determined by the degree of linkage disequilibrium from marker to marker.
The invention includes methods for creating a high density map. The SNP markers and their surrounding sequence are compared to model organisms, for example human and mouse genomes, where the complete genomic sequence is known and syntenic regions identified. The model organism map may serve as a template for ensuring complete coverage of the animal genome. The finished map has markers spaced in such a way to maximize the amount of linkage disequilibrium in a specific genetic region.
This map is used to mark all regions of the chromosomes in a single experiment utilizing thousands of experimental animals in an association study, to correlate genomic regions with complex and simple genetic traits. These associations can be further analyzed to unravel complex interactions among genomic regions that contribute to the targeted genetic trait or other traits, epistatic genetic interactions and pleiotropy. The invention of regional high density maps can also be used to identify targeted regions of chromosomes that influence genetic traits.
Accordingly, in embodiments where SNPs that affect the same phenotype are identified that are located in different genes, the method can further include analyzing expression products of genes near the identified SNPs, to determine whether the expression products interact. As such, the present invention provides methods to detect epistatic genetic interactions. Laboratory methods are well known in the art for determining whether gene products interact.
In another aspect, the present invention provides a method for identifying a companion animal gene associated with a genetic trait. The method includes identifying a companion animal single nucleotide polymorphism (SNP) that influences a phenotype of a genetic trait by analyzing a genome-wide companion animal SNP map for association with the genetic trait, wherein the SNP is found on a target region of a companion animal chromosome. Genes present on the target region are then identified. The presence of a gene on the target region of the companion animal chromosome indicates that the gene is a candidate gene for association with the genetic trait. The candidate gene can then be analyzed using methods known in the art to determine whether it is associated with the genetic trait.
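By way of illustration only, the step of identifying genes present on the target region can be sketched as an interval-overlap query around the position of an associated SNP. The sketch assumes a gene annotation given as chromosome, start and end coordinates, and a symmetric window of 250 kb defining the target region; the window size, the function name, and the gene names and coordinates are hypothetical placeholders and do not describe real canine genes.

```python
def candidate_genes(snp, genes, window=250_000):
    """Return names of genes overlapping a window centered on the SNP.

    snp: (chromosome, position) of the trait-associated SNP.
    genes: list of (name, chromosome, start, end) annotation records.
    window: half-width in base pairs of the target region (an assumption).
    """
    chrom, pos = snp
    region_start, region_end = pos - window, pos + window
    return [
        name
        for name, g_chrom, g_start, g_end in genes
        # Two intervals overlap when each starts before the other ends.
        if g_chrom == chrom and g_start <= region_end and g_end >= region_start
    ]


# Placeholder annotation: names and coordinates are invented for illustration.
annotation = [
    ("GENE_A", "chr5", 1_000_000, 1_050_000),
    ("GENE_B", "chr5", 1_400_000, 1_420_000),
    ("GENE_C", "chr7", 1_000_000, 1_050_000),
]
hits = candidate_genes(("chr5", 1_200_000), annotation)
```

Each gene returned is only a candidate; as stated above, it would still need to be analyzed by methods known in the art to determine whether it is associated with the genetic trait.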
In another aspect, the present invention provides a high-throughput system for determining the nucleotide occurrences at a series of companion animal single nucleotide polymorphisms (SNPs). The system typically includes a hybridization medium comprising a series of oligonucleotides, which is typically one of the following: a solid support to which a series of oligonucleotides can be directly or indirectly attached, a homogeneous assay or a microfluidic device. Each of these hybridization mediums is used to determine the nucleotide occurrence of companion animal SNPs that are associated with a genetic trait.
Accordingly, the oligonucleotides are used to determine the nucleotide occurrence of companion animal SNPs that are associated with a genetic trait. The determination can be made by selecting oligonucleotides that bind at or near a genomic location of each SNP of the series of companion animal SNPs. The high-throughput system of the present invention typically includes a reagent handling mechanism that can be used to apply a reagent, typically a liquid, to the solid support. The binding of an oligonucleotide of the series of oligonucleotides to a polynucleotide isolated from a genome can be affected by the nucleotide occurrence of the SNP. The high-throughput system can include a mechanism effective for moving a solid support and a detection mechanism. The detection method detects binding or tagging of the oligonucleotides.
Medium to high-throughput systems for analyzing SNPs, known in the art such as the SNPStream® UHT Genotyping System (Beckman/Coulter, Fullerton, Calif.) (Boyce-Jacino and Goelet Patents), the Mass Array™ system (Sequenom, San Diego, Calif.) (Storm, N. et al. (2002) Methods in Molecular Biology. 212: 241-262.), the BeadArray™ SNP genotyping system available from Illumina (San Diego, Calif.) (Oliphant, A., et al. (June 2002) (supplement to Biotechniques)), and TaqMan™ (Applied Biosystems, Foster City, Calif.) can be used with the present invention. However, the present invention provides a medium to high-throughput system that is designed to detect nucleotide occurrences of canine SNPs, or a series of canine SNPs that can make up a series of haplotypes. Therefore, as indicated above the system includes a solid support or other method to which a series of oligonucleotides can be associated that are used to determine a nucleotide occurrence of a SNP for a series of canine SNPs that are associated with a trait. The system can further include a detection mechanism for detecting binding of the series of oligonucleotides to the series of SNPs. Such detection mechanisms are known in the art.
In certain preferred embodiments, the high-throughput system is a microfluidics device. Numerous microfluidic devices are known that include solid supports with microchannels (See e.g., U.S. Pat. Nos. 5,304,487, 5,110,745, 5,681,484, and 5,593,838, incorporated herein by reference in their entirety). The high-throughput systems of the present invention are designed to determine nucleotide occurrences of one SNP and preferably a series of SNPs. In certain preferred embodiments, the systems can determine nucleotide occurrences of an entire genome-wide high-density SNP map.
Numerous methods are known in the art for determining the nucleotide occurrence for a particular SNP in a sample. Such methods can utilize one or more oligonucleotide probes or primers, including, for example, an amplification primer pair, that selectively hybridize to a target polynucleotide, which corresponds to one or more companion animal SNP positions. Oligonucleotide probes useful in practicing a method of the invention can include, for example, an oligonucleotide that is complementary to and spans a portion of the target polynucleotide, including the position of the SNP, wherein the presence of a specific nucleotide at the position (i.e., the SNP) is detected by the presence or absence of selective hybridization of the probe. Such a method can further include contacting the target polynucleotide and hybridized oligonucleotide with an endonuclease, and detecting the presence or absence of a cleavage product of the probe, depending on whether the nucleotide occurrence at the SNP site is complementary to the corresponding nucleotide of the probe.
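By way of illustration only, the logic of an allele-specific probe that spans the SNP position can be sketched in software. The sketch makes a strong simplifying assumption: selective hybridization is modeled as perfect Watson-Crick complementarity over the full length of the probe, with no allowance for mismatch tolerance, melting temperature, or reaction conditions. The function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def make_probe(flank_5, allele, flank_3):
    """Build a probe complementary to the target window carrying one allele."""
    return reverse_complement(flank_5 + allele + flank_3)


def call_alleles(target, probes):
    """Return the alleles whose probes are perfectly complementary to the target.

    target: target strand sequence, written 5' to 3'.
    probes: dict mapping allele -> probe sequence, written 5' to 3'.
    Perfect complementarity is used here as a proxy for selective hybridization.
    """
    return sorted(
        allele
        for allele, probe in probes.items()
        if reverse_complement(probe) in target
    )


# Invented target carrying a C at the SNP position, flanked by ACAGG and TTAAC.
target = "GATTACAGGCTTAACG"
probes = {allele: make_probe("ACAGG", allele, "TTAAC") for allele in "CT"}
detected = call_alleles(target, probes)
```

For a heterozygous sample, the same call would be made against both target strands present in the sample, and both alleles would be reported.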
An oligonucleotide ligation assay (Grossman, P. D. et al. (1994) Nucleic Acids Research 22:4527-4534) also can be used to identify a nucleotide occurrence at a polymorphic position, wherein a pair of probes selectively hybridizes upstream and adjacent to, and downstream and adjacent to, the site of the SNP, and wherein one of the probes includes a terminal nucleotide complementary to a nucleotide occurrence of the SNP. Where the terminal nucleotide of the probe is complementary to the nucleotide occurrence, selective hybridization includes the terminal nucleotide such that, in the presence of a ligase, the upstream and downstream oligonucleotides are ligated. As such, the presence or absence of a ligation product is indicative of the nucleotide occurrence at the SNP site. An example of this type of assay is the SNPlex System (Applied Biosystems, Foster City, Calif.).
An oligonucleotide also can be useful as a primer, for example, for a primer extension reaction, wherein the product (or absence of a product) of the extension reaction is indicative of the nucleotide occurrence. In addition, a primer pair useful for amplifying a portion of the target polynucleotide including the SNP site can be useful, wherein the amplification product is examined to determine the nucleotide occurrence at the SNP site. Particularly useful methods include those that are readily adaptable to a high throughput format, to a multiplex format, or to both. The primer extension or amplification product can be detected directly or indirectly and/or can be sequenced using various methods known in the art. Amplification products which span a SNP locus can be sequenced using traditional sequence methodologies (e.g., the ‘dideoxy-mediated chain termination method,’ also known as the ‘Sanger Method’ (Sanger, F., et al., J. Molec. Biol. 94:441 (1975); Prober et al. Science 238:336-340 (1987)) and the ‘chemical degradation method,’ also known as the ‘Maxam-Gilbert method’ (Maxam, A. M., et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 (1977)), both references herein incorporated by reference) to determine the nucleotide occurrence at the SNP locus.
Methods of the invention can identify nucleotide occurrences at SNPs using genome-wide sequencing or “microsequencing” methods. Whole-genome sequencing of individuals identifies all SNP genotypes in a single analysis. Microsequencing methods determine the identity of only a single nucleotide at a “predetermined” site. Such methods have particular utility in determining the presence and identity of polymorphisms in a target polynucleotide. Such microsequencing methods, as well as other methods for determining the nucleotide occurrence at a SNP locus are discussed in Boyce-Jacino, et al., U.S. Pat. No. 6,294,336, incorporated herein by reference, and summarized herein.
Microsequencing methods include the Genetic Bit Analysis method disclosed by Goelet, P. et al. (WO 92/15712, herein incorporated by reference). Additional, primer-guided, nucleotide incorporation procedures for assaying polymorphic sites in DNA have also been described (Komher, J. S. et al., Nucl. Acids. Res. 17:7779-7784 (1989); Sokolov, B. P., Nucl. Acids Res. 18:3671 (1990); Syvanen, A.-C., et al., Genomics 8:684-692 (1990); Kuppuswamy, M. N. et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant, T. R. et al., Hum. Mutat. 1:159-164 (1992); Ugozzoli, L. et al., GATA 9:107-112 (1992); Nyren, P. et al., Anal. Biochem. 208:171-175 (1993); and Wallace, WO89/10414). These methods differ from Genetic Bit™ Analysis in that they all rely on the incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. In such a format, since the signal is proportional to the number of deoxynucleotides incorporated, polymorphisms that occur in runs of the same nucleotide can result in signals that are proportional to the length of the run (Syvanen, A.-C., et al. Amer. J. Hum. Genet. 52:46-59 (1993)).
Alternative microsequencing methods have been provided by Mundy, C. R. (U.S. Pat. No. 4,656,127) and Cohen, D. et al. (French Patent 2,650,840; PCT Appln. No. WO91/02087), which discuss a solution-based method for determining the identity of the nucleotide of a polymorphic site. As in the Mundy method of U.S. Pat. No. 4,656,127, a primer is employed that is complementary to allelic sequences 3′- to a polymorphic site.
In response to the difficulties encountered in employing gel electrophoresis to analyze sequences, alternative methods for microsequencing have been developed. Macevicz (U.S. Pat. No. 5,002,867), for example, describes a method for determining nucleic acid sequence via hybridization with multiple mixtures of oligonucleotide probes. In accordance with such method, the sequence of a target polynucleotide is determined by permitting the target to sequentially hybridize with sets of probes having an invariant nucleotide at one position, and variant nucleotides at other positions. The Macevicz method determines the nucleotide sequence of the target by hybridizing the target with a set of probes, and then determining the number of sites that at least one member of the set is capable of hybridizing to the target (i.e., the number of ‘matches’). This procedure is repeated until each member of a set of probes has been tested.
Boyce-Jacino, et al., U.S. Pat. No. 6,294,336 provides a solid phase sequencing method for determining the sequence of nucleic acid molecules (either DNA or RNA) by utilizing a primer that selectively binds a polynucleotide target at a site wherein the SNP is the most 3′ nucleotide selectively bound to the target.
The occurrence of a SNP can be determined using denaturing HPLC such as described in Nairz K et al (2002) Proc. Natl. Acad. Sci. (U.S.A.) 99: 10575-80, and the Transgenomic WAVE® System (Transgenomic, Inc. Omaha, Nebr.). Oliphant et al. (supplement to Biotechniques, June 2002) report a method that utilizes BeadArray™ Technology that can be used in the methods of the present invention to determine the nucleotide occurrence of a SNP. Additionally, nucleotide occurrences for SNPs can be determined using a DNA MassARRAY system (SEQUENOM, San Diego, Calif.). This system combines proprietary SpectroChips™, microfluidics, nanodispensing, biochemistry, and MALDI-TOF MS (matrix-assisted laser desorption ionization time of flight mass spectrometry).
As another example, the nucleotide occurrences of canine SNPs in a sample can be determined using the SNP-IT™ method (Beckman Coulter, Fullerton, Calif.). In general, SNP-IT™ is a 3-step primer extension reaction. In the first step a target polynucleotide is isolated from a sample by hybridization to a capture primer, which provides a first level of specificity. In a second step the capture primer is extended from a terminating nucleotide triphosphate at the target SNP site, which provides a second level of specificity. In a third step, the extended nucleotide triphosphate can be detected using a variety of known formats, including: direct fluorescence, indirect fluorescence, an indirect colorimetric assay, mass spectrometry, fluorescence polarization, etc. Reactions can be processed in 384 well format in an automated format using a SNPstream™ instrument (Beckman Coulter, Fullerton, Calif.). Reactions can also be analyzed by binding to Luminex biospheres (Luminex Corporation, Austin, Tex.; Cai, H. (2000) Genomics 66(2): 135-43). Other formats for SNP detection include TaqMan™ (Applied Biosystems, Foster City, Calif.), Rolling circle (Hatch et al (1999) Genet. Anal. 15: 35-40, Qi et al (2001) Nucleic Acids Research Vol. 29 e116), fluorescence polarization (Chen, X., et al. (1999) Genome Research 9:492-498), SNaPShot (Applied Biosystems, Foster City, Calif.) (Makridakis, N. M. et al. (2001) Biotechniques 31:1374-80), oligo-ligation assay (Grossman, P. D., et al. (1994) Nucleic Acids Research 22:4527-4534), locked nucleic acids (LNA™, Link Technologies LTD, Lanarkshire, Scotland; EP patent 1013661; U.S. Pat. No. 6,268,490), Invader Assay (Aclara Biosciences, Wilkinson, D. (1999) The Scientist 13:16), padlock probes (Nilsson et al. Science (1994), 265: 2085), Sequence-tagged molecular inversion probes (similar to padlock probes) from ParAllele Bioscience (South San Francisco, Calif.; Hardenbol, P. et al. (2003) Nature Biotechnology 21:673-678), Molecular Beacons (Marras, S. A. et al. (1999) Genet Anal. 14:151-156), the READIT™ SNP Genotyping System from Promega (Madison, Wis.) (Rhodes R. B. et al. (2001) Mol Diagn. 6:55-61), Dynamic Allele-Specific Hybridization (DASH) (Prince, J. A. et al. (2001) Genome Research 11:152-162), the Qbead™ system (quantum dot encoded microspheres conjugated to allele-specific oligonucleotides) (Xu H. et al. (2003) Nucleic Acids Research 31:e43), Scorpion primers (similar to molecular beacons except unimolecular) (Thelwell, N. et al. (2000) Nucleic Acids Research 28:3752-3761), and Magiprobe (a novel fluorescence quenching-based oligonucleotide probe carrying a fluorophore and an intercalator) (Yamane A. (2002) Nucleic Acids Research 30:e97). In addition, Rao, K. V. N. et al. ((2003) Nucleic Acids Research. 31:e66), recently reported a microsphere-based genotyping assay that detects SNPs directly from human genomic DNA. The assay involves a structure-specific cleavage reaction, which generates fluorescent signal on the surface of microspheres, followed by flow cytometry of the microspheres. With a slightly different twist on the Sequenom technology (MALDI), Sauer et al. ((2003) Nucleic Acids Research 31:e63) generate charge-tagged DNA (post PCR and primer extension), using a photocleavable linker.
Accordingly, using the methods described above, the companion animal, such as a canine companion animal, haplotype allele or the nucleotide occurrence of a companion animal SNP can be identified using an amplification reaction, a primer extension reaction, or an immunoassay. The companion animal haplotype allele or companion animal SNP can also be identified by contacting polynucleotides in the sample or polynucleotides derived from the sample, with a specific binding pair member that selectively hybridizes to a polynucleotide region comprising the companion animal SNP, under conditions wherein the binding pair member specifically binds at or near the companion animal SNP. The specific binding pair member can be an antibody or a polynucleotide.
The nucleotide occurrence of a SNP can be identified by other methodologies as well as those discussed above. For example, the identification can use microarray technology, which can be performed with PCR, for example using Affymetrix technologies and GenFlex Tag arrays (See e.g., Fan et al (2000) Genome Res. 10:853-860), or using a canine gene chip containing proprietary SNP oligonucleotides (See e.g., Chee et al (1996), Science 274:610-614; and Kennedy et al. (2003) Nature Biotech 21:1233-1237) or without PCR, or sequencing methods such as mass spectrometry, scanning electron microscopy, or methods in which a polynucleotide flows past a sorting device that can detect the sequence of the polynucleotide. The occurrence of a SNP can be identified using electrochemical detection devices such as the eSensor™ DNA detection system (Motorola, Inc., Yu, C. J. (2001) J. Am Chem. Soc. 123:11155-11161). Other formats include melting curve analysis using fluorescently labeled hybridization probes, or intercalating dyes (Lohmann, S. (2000) Biochemica 4, 23-28, Herrmann, M. (2000) Clinical Chemistry 46: 425).
The SNP detection systems of the present invention typically utilize selective hybridization. As used herein, the term “selective hybridization” or “selectively hybridize,” refers to hybridization under moderately stringent or highly stringent conditions such that a nucleotide sequence preferentially associates with a selected nucleotide sequence over unrelated nucleotide sequences to a large enough extent to be useful in identifying a nucleotide occurrence of a SNP. It will be recognized that some amount of non-specific hybridization is unavoidable, but is acceptable provided that hybridization to a target nucleotide sequence is sufficiently selective such that it can be distinguished over the non-specific cross-hybridization, for example, at least about 2-fold more selective, generally at least about 3-fold more selective, usually at least about 5-fold more selective, and particularly at least about 10-fold more selective, as determined, for example, by an amount of labeled oligonucleotide that binds to target nucleic acid molecule as compared to a nucleic acid molecule other than the target molecule, particularly a substantially similar (i.e., homologous) nucleic acid molecule other than the target nucleic acid molecule. Conditions that allow for selective hybridization can be determined empirically, or can be estimated based, for example, on the relative GC:AT content of the hybridizing oligonucleotide and the sequence to which it is to hybridize, the length of the hybridizing oligonucleotide, and the number, if any, of mismatches between the oligonucleotide and sequence to which it is to hybridize (see, for example, Sambrook et al., “Molecular Cloning: A laboratory manual” (Cold Spring Harbor Laboratory Press 1989)).
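As a rough illustration of estimating hybridization behavior from base composition and length, the following sketch applies the Wallace rule (about 2° C. per A or T and 4° C. per G or C), a common rule of thumb for short oligonucleotides that is not itself part of the present disclosure. The function name is hypothetical, mismatches are not modeled, and the example oligonucleotide is the extension primer of SEQ ID NO:306 given elsewhere in this description.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate a melting temperature in degrees C as 2*(A+T) + 4*(G+C).

    Assumption: the Wallace rule is used here only as an illustrative
    estimate; it is not a method stated in the disclosure.
    """
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty DNA sequence of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Extension primer of SEQ ID NO:306 (whitespace in the printed sequence ignored).
extension_primer = "AGACTTTTAAAGTTTAAATGAATTA"
print(len(extension_primer), wallace_tm(extension_primer))
```

Such an estimate is only a starting point; as stated above, suitable conditions can also be determined empirically.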
An example of progressively higher stringency conditions is as follows: 2×SSC/0.1% SDS at about room temperature (hybridization conditions); 0.2×SSC/0.1% SDS at about room temperature (low stringency conditions); 0.2×SSC/0.1% SDS at about 42° C. (moderate stringency conditions); and 0.1×SSC at about 68° C. (high stringency conditions). Washing can be carried out using only one of these conditions, e.g., high stringency conditions, or each of the conditions can be used, e.g., for 10-15 minutes each, in the order listed above, repeating any or all of the steps listed. However, as mentioned above, optimal conditions will vary, depending on the particular hybridization reaction involved, and can be determined empirically.
The term ‘polynucleotide’ is used broadly herein to mean a sequence of deoxyribonucleotides or ribonucleotides that are linked together by a phosphodiester bond. For convenience, the term ‘oligonucleotide’ is used herein to refer to a polynucleotide that is used as a primer or a probe. Generally, an oligonucleotide useful as a probe or primer that selectively hybridizes to a selected nucleotide sequence is at least about 15 nucleotides in length, usually at least about 18 nucleotides, and particularly about 21 nucleotides or more in length.
A polynucleotide can be RNA or can be DNA, which can be a gene or a portion thereof, a cDNA, a synthetic polydeoxyribonucleic acid sequence, or the like, and can be single stranded or double stranded, as well as a DNA/RNA hybrid. In various embodiments, a polynucleotide, including an oligonucleotide (e.g., a probe or a primer) can contain nucleoside or nucleotide analogs, or a backbone bond other than a phosphodiester bond. In general, the nucleotides comprising a polynucleotide are naturally occurring deoxyribonucleotides, such as adenine, cytosine, guanine or thymine linked to 2′ deoxyribose, or ribonucleotides such as adenine, cytosine, guanine or uracil linked to ribose. However, a polynucleotide or oligonucleotide also can contain nucleotide analogs, including non-naturally occurring synthetic nucleotides or modified naturally occurring nucleotides. Such nucleotide analogs are well known in the art and commercially available, as are polynucleotides containing such nucleotide analogs (Lin et al., Nucleic Acids Research (1994) 22:5220-5234; Jellinek et al., Biochemistry (1995) 34:11363-11372; Pagratis et al., Nature Biotechnol. (1997) 15:68-73, each of which is incorporated herein by reference). Primers and probes can also be comprised of peptide nucleic acids (PNA) (Nielsen P E and Egholm M. (1999) Curr. Issues Mol. Biol. 1:89-104).
The covalent bond linking the nucleotides of a polynucleotide generally is a phosphodiester bond. However, the covalent bond also can be any of numerous other bonds, including a thiodiester bond, a phosphorothioate bond, a peptide-like bond or any other bond known to those in the art as useful for linking nucleotides to produce synthetic polynucleotides (see, for example, Tam et al., Nucl. Acids Res. 22:977-986 (1994); Ecker and Crooke, BioTechnology 13:351-360 (1995), each of which is incorporated herein by reference). The incorporation of non-naturally occurring nucleotide analogs or bonds linking the nucleotides or analogs can be particularly useful where the polynucleotide is to be exposed to an environment that can contain a nucleolytic activity, including, for example, a tissue culture medium or upon administration to a living subject, since the modified polynucleotides can be less susceptible to degradation.
A polynucleotide or oligonucleotide comprising naturally occurring nucleotides and phosphodiester bonds can be chemically synthesized or can be produced using recombinant DNA methods, using an appropriate polynucleotide as a template. In comparison, a polynucleotide or oligonucleotide comprising nucleotide analogs or covalent bonds other than phosphodiester bonds generally is chemically synthesized, although an enzyme such as T7 polymerase can incorporate certain types of nucleotide analogs into a polynucleotide and, therefore, can be used to produce such a polynucleotide recombinantly from an appropriate template (Jellinek et al., supra, 1995). Thus, the term polynucleotide as used herein includes naturally occurring nucleic acid molecules, which can be isolated from a cell, as well as synthetic molecules, which can be prepared, for example, by methods of chemical synthesis or by enzymatic methods such as by the polymerase chain reaction (PCR).
In various embodiments for identifying nucleotide occurrences of SNPs, it can be useful to detectably label a polynucleotide or oligonucleotide. Detectable labeling of a polynucleotide or oligonucleotide is well known in the art. Particular non-limiting examples of detectable labels include chemiluminescent labels, fluorescent labels, radiolabels, enzymes, haptens, or even unique oligonucleotide sequences. Thus, a polynucleotide or an oligonucleotide of the invention can further include a detectable label. For example, the detectable label can be associated with the polynucleotide at a position corresponding to the SNP in the Table 8 sequences. As discussed in more detail herein, the labeled polynucleotide can be generated, for example, during a microsequencing reaction, such as a SNP-IT™ reaction.
A method of identifying a SNP also can be performed using a specific binding pair member. As used herein, the term ‘specific binding pair member’ refers to a molecule that specifically binds or selectively hybridizes to another member of a specific binding pair. Specific binding pair members include, for example, probes, primers, polynucleotides, antibodies, etc. For example, a specific binding pair member includes a primer or a probe that selectively hybridizes to a target polynucleotide that includes a SNP locus, or that hybridizes to an amplification product generated using the target polynucleotide as a template.
As used herein, the term ‘specific interaction,’ or ‘specifically binds’ or the like means that two molecules form a complex that is relatively stable under physiologic conditions. The term is used herein in reference to various interactions, including, for example, the interaction of an antibody that binds a polynucleotide that includes a SNP site; or the interaction of an antibody that binds a polypeptide that includes an amino acid that is encoded by a codon that includes a SNP site. According to methods of the invention, an antibody can selectively bind to a polypeptide that includes a particular amino acid encoded by a codon that includes a SNP site. Alternatively, an antibody may preferentially bind a particular modified nucleotide that is incorporated into a SNP site for only certain nucleotide occurrences at the SNP site, for example using a primer extension assay.
A specific interaction can be characterized by a dissociation constant of at least about 1×10⁻⁶ M, generally at least about 1×10⁻⁷ M, usually at least about 1×10⁻⁸ M, and particularly at least about 1×10⁻⁹ M or 1×10⁻¹⁰ M or greater. A specific interaction generally is stable under physiological conditions, including, for example, conditions that occur in a living individual such as a human or other vertebrate or invertebrate, as well as conditions that occur in a cell culture such as used for maintaining mammalian cells or cells from another vertebrate organism or an invertebrate organism. Methods for determining whether two molecules interact specifically are well known and include, for example, equilibrium dialysis, surface plasmon resonance, and the like.
The invention also relates to kits, which can be used, for example, to perform a method of the invention such as parentage, identity, breed determination and the determination of trait identification. Thus, in one embodiment, the invention provides a kit for identifying nucleotide occurrences or haplotype alleles of canine SNPs. Such a kit can contain, for example, an oligonucleotide probe, primer, or primer pair (see e.g., Table 7, SEQ ID NOs:102-407), or combinations thereof, for identifying the nucleotide occurrence of at least one canine single nucleotide polymorphism (SNP) associated with breed, such as a SNP corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101 (see Table 1 or Table 8), such oligonucleotides being useful, for example, to identify a SNP or haplotype allele as disclosed herein; or can contain one or more polynucleotides corresponding to a portion of a canine gene containing one or more nucleotide occurrences associated with a canine trait, such polynucleotide being useful, for example, as a standard (control) that can be examined in parallel with a test sample. In addition, a kit of the invention can contain, for example, reagents for performing a method of the invention, including, for example, one or more detectable labels, which can be used to label a probe or primer or can be incorporated into a product generated using the probe or primer (e.g., an amplification product); one or more polymerases, which can be useful for a method that includes a primer extension or amplification procedure, or other enzyme or enzymes (e.g., a ligase or an endonuclease), which can be useful for performing an oligonucleotide ligation assay or a mismatch cleavage assay; and/or one or more buffers or other reagents that are necessary to or can facilitate performing a method of the invention.
The primers or probes can be included in a kit in a labeled form, for example with a label such as biotin or an antibody. In one embodiment, a kit of the invention provides a plurality of oligonucleotides of the invention, including one or more oligonucleotide probes or one or more primers, including forward and/or reverse primers, or a combination of such probes and primers or primer pairs. Such a kit also can contain probes and/or primers that conveniently allow a method of the invention to be performed in a multiplex format.
The kit can also include instructions for using the probes or primers to determine a nucleotide occurrence of at least one companion animal SNP, such as a SNP from a canine subject.
In another embodiment, the present invention provides a primer pair that binds to a first target region and a second target region, thereby supporting amplification of a nucleic acid sequence that includes the sequence of a SNP corresponding to any one of the SNPs set forth in SEQ ID NOs:1-101. For example, SEQ ID NO:1 encompasses the nucleic acid sequence TCTATACCTCTAAAGAATCGCTGCTACTTTGTGCAAGACTTTTAAAGTTTAAATGAATTAA/G. Thus, nucleotides A or G correspond to the single nucleotide polymorphism (SNP) of SEQ ID NO:1 because the SNP corresponds to the first nucleotide, or complement thereof, in the most 3′ position of SEQ ID NO:1. Table 8 lists the SNP accession number and the 5′ sequence associated with each SNP (i.e., SEQ ID NOs:1-101). The single nucleotide polymorphism (SNP) corresponds to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101. Primer pairs include the forward (SEQ ID NOs:102-203) and reverse (SEQ ID NOs:204-305) primers provided in Table 7. For example, a primer pair for the SNP having the accession number ss9048431 can include SEQ ID NO:102 (TATTGACTCTATACCTCTAAAGAATCGC) and SEQ ID NO:204 (AGAGTTTCATACTGGGGTAACTTTG). The extension primer for this SNP can include SEQ ID NO:306 (AGACTTTTAAAGTTTAAATGAATTA). In general, the first primer of the primer pair and a second primer of the primer pair are at least 10 nucleotides in length and bind opposite strands of the target region located within about 3000 nucleotides of a position corresponding to the position of the SNP set forth in any one of the sequences set forth in SEQ ID NOs:1-101. In certain aspects, the terminal nucleotide of an oligonucleotide binds to the SNP. In these aspects, the method can include detecting an extension product generated using the oligonucleotide as a primer.
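The relationship among the example sequences above can be checked with a short sketch. It uses only the sequences quoted in this paragraph (with any whitespace inside the printed sequences ignored); the function names are hypothetical. It splits the SEQ ID NO:1 notation into the 5′ flank and the two alleles, confirms that the extension primer of SEQ ID NO:306 ends immediately 5′ of the SNP, and measures how much of the forward primer of SEQ ID NO:102 overlaps the start of the listed flank.

```python
SEQ_ID_1 = "TCTATACCTCTAAAGAATCGCTGCTACTTTGTGCAAGACTTTTAAAGTTTAAATGAATTAA/G"
FORWARD_PRIMER = "TATTGACTCTATACCTCTAAAGAATCGC"   # SEQ ID NO:102
EXTENSION_PRIMER = "AGACTTTTAAAGTTTAAATGAATTA"    # SEQ ID NO:306


def split_snp_notation(seq: str) -> tuple[str, tuple[str, str]]:
    """Split 'FLANK' + 'X/Y' into the 5' flank and the two SNP alleles."""
    left, alternate = seq.split("/")
    return left[:-1], (left[-1], alternate)


def overlap_length(primer: str, flank: str) -> int:
    """Length of the longest primer suffix that equals a prefix of the flank."""
    for k in range(min(len(primer), len(flank)), 0, -1):
        if primer[-k:] == flank[:k]:
            return k
    return 0


flank, alleles = split_snp_notation(SEQ_ID_1)
print("alleles:", alleles)
print("extension primer abuts SNP:", flank.endswith(EXTENSION_PRIMER))
print("forward primer overlap with flank:", overlap_length(FORWARD_PRIMER, flank))
```

The reverse primer of SEQ ID NO:204 is not checked because the sequence downstream of the SNP is not reproduced in this paragraph.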
In another embodiment, provided herein is a primer pair that binds to a first target region and a second target region within about 3000 base pairs of SEQ ID NOs:1-101, wherein a first primer of the primer pair and a second primer of the primer pair are at least 10 nucleotides in length, bind opposite strands of the target region, and prime polynucleotide synthesis from the target region in opposite directions across the SNP identified in any one of SEQ ID NOs:1-101.
In another embodiment, the present invention provides an isolated oligonucleotide that selectively binds to a target polynucleotide that comprises at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 150, 300, 500, or 600 nucleotides of any one of SEQ ID NOs:1-101, or a complement thereof. In another embodiment, the present invention provides an isolated oligonucleotide that includes 10 nucleotides, which selectively binds to a target polynucleotide of any one of the sequences provided in Table 8. The oligonucleotide can be, for example, 10, 15, 20, 25, 50, or 100 nucleotides in length.
In another embodiment, the present invention provides an isolated oligonucleotide pair effective for determining a nucleotide occurrence at a single nucleotide polymorphism (SNP) corresponding to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101 (Table 1 and Table 8). In certain aspects, the specific binding pair member is a substrate for a primer extension reaction.
In another embodiment, the present invention provides an isolated vector that includes a polynucleotide disclosed hereinabove. The term “vector” refers to a plasmid, virus or other vehicle known in the art that has been manipulated by insertion or incorporation of a nucleic acid sequence. Methods that are well known in the art can be used to construct vectors, including in vitro recombinant DNA techniques, synthetic techniques, and in vivo recombination/genetic techniques (See, for example, the techniques described in Maniatis et al. 1989 Molecular Cloning A Laboratory Manual, Cold Spring Harbor Laboratory, N.Y., incorporated herein in its entirety by reference). Further, the present invention provides an isolated cell that includes the vector. The cell can be prokaryotic or eukaryotic. Techniques for incorporating vectors into prokaryotic and eukaryotic cells are well known in the art. In certain aspects, the cells are canine cells. In other aspects, the cells are bacterial cells. In still other aspects, the cells are human cells.
Methods and compositions provided herein are also useful to infer a trait of a canine subject from a nucleic acid sample of the canine subject. An exemplary method includes contacting the nucleic acid sample with a pair of oligonucleotides that comprise a primer pair, wherein amplified target nucleic acid molecules are produced; hybridizing at least one oligonucleotide primer selected from the group consisting of SEQ ID NOS:306-407 (see Table 7) to one or more amplified target nucleic acid molecules, wherein each oligonucleotide primer is complementary to a specific and unique region of each target nucleic acid molecule such that the 3′ end of each primer is proximal to a specific and unique target nucleotide of interest; extending each oligonucleotide with a template-dependent polymerase; and determining the identity of each nucleotide of interest by determining, for each extension primer employed, the identity of the nucleotide proximal to the 3′ end of each primer. The primer pair can be any of the forward and reverse oligonucleotide primer pairs listed in Table 7. For example, a first primer of the primer pair can be selected from SEQ ID NOS:1-101 and the second primer of the primer pair can be selected from SEQ ID NOS:102-203.
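The final determining step of the exemplary method can be sketched in simplified form as follows. The sketch is an in silico illustration only, not a description of the laboratory chemistry: it writes the extension primer in the same sense as the amplicon strand shown, so that the nucleotide of interest is simply the base immediately 3′ of the primer match. The function names and the four downstream bases are hypothetical; the flank and extension primer are those given for SEQ ID NO:1 and SEQ ID NO:306.

```python
FLANK = "TCTATACCTCTAAAGAATCGCTGCTACTTTGTGCAAGACTTTTAAAGTTTAAATGAATTA"
EXTENSION_PRIMER = "AGACTTTTAAAGTTTAAATGAATTA"  # SEQ ID NO:306
DOWNSTREAM = "CTGA"  # hypothetical bases 3' of the SNP, for illustration only


def extended_base(amplicon: str, primer: str) -> str:
    """Return the base immediately 3' of the primer match in the amplicon."""
    start = amplicon.find(primer)
    if start < 0:
        raise ValueError("extension primer does not match the amplicon")
    position = start + len(primer)
    if position >= len(amplicon):
        raise ValueError("no template base 3' of the primer")
    return amplicon[position]


def call_genotype(amplicons: list[str], primer: str) -> tuple[str, ...]:
    """Call one base per amplified chromosome copy and report them sorted."""
    return tuple(sorted(extended_base(a, primer) for a in amplicons))


copy_a = FLANK + "A" + DOWNSTREAM
copy_g = FLANK + "G" + DOWNSTREAM
print(call_genotype([copy_a, copy_g], EXTENSION_PRIMER))  # heterozygous sample
print(call_genotype([copy_g, copy_g], EXTENSION_PRIMER))  # homozygous sample
```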
Population-specific alleles can be used to assign, for example, a canine animal to a particular breed. These population-specific alleles are fixed in the population of interest and absent in comparison populations. The absence of an allele in a sample of individuals from any one population may be because those alleles are truly population-specific or because the frequency of those alleles is low and the sample obtained from any given population was small (Taylor, J. F., Patent: PCT/US01/47521). For admixed populations, population-specific alleles rarely occur; however, the difference in allele frequency between populations may still enable their use to infer assignment of individual canines to a breed. Such alleles are known as population-associated alleles (Kumar, P., Heredity 91: 43-50 (2003)). Both population-specific alleles and population-associated alleles are herein referred to as Breed-Specific Markers.
In the present invention, a marker is breed specific if it has a different allele frequency in one breed relative to one or more other breeds. A similar logic was employed by Kumar, P. (Heredity 91: 43-50 (2003)) to genetically distinguish cattle from European Bos taurus breeds and Indian Bos indicus breeds of cattle (see e.g., DeNise et al., 2003. U.S. patent application Ser. No. 10/750,622; Parker et al., Science 304, 1161-1164 (2004)).
In the present invention there are about 60 parentage and identity markers and about 101 breed-specific SNP markers, not mutually exclusive. One or more of these markers could be used to determine parentage or identity or breed specificity and/or to assign an individual to one or more breeds with an associated probability. These markers could be used alone or in any combination.
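One way to attach a probability to a breed assignment is sketched below, under assumptions that are not stated in the disclosure: loci are treated as independent, genotypes within a breed follow Hardy-Weinberg proportions, and every candidate breed has the same prior probability. The breed names, allele frequencies, and function names are hypothetical.

```python
from math import prod


def genotype_likelihood(copies: int, freq: float) -> float:
    """P(genotype | breed) for 0, 1 or 2 copies of an allele with frequency freq."""
    p, q = freq, 1.0 - freq
    return (q * q, 2.0 * p * q, p * p)[copies]


def breed_posteriors(genotypes: list[int],
                     breed_freqs: dict[str, list[float]]) -> dict[str, float]:
    """Posterior probability of each breed, assuming equal priors."""
    likelihoods = {
        breed: prod(genotype_likelihood(g, f) for g, f in zip(genotypes, freqs))
        for breed, freqs in breed_freqs.items()
    }
    total = sum(likelihoods.values())
    if total == 0.0:
        raise ValueError("genotypes are impossible under every candidate breed")
    return {breed: value / total for breed, value in likelihoods.items()}


# Hypothetical allele frequencies at three breed-specific markers.
frequencies = {"breed_A": [0.9, 0.8, 0.1], "breed_B": [0.2, 0.3, 0.7]}
sample = [2, 1, 0]  # allele copies observed at each marker
for breed, probability in breed_posteriors(sample, frequencies).items():
    print(breed, round(probability, 4))
```

Using more markers generally sharpens the posterior, which is consistent with using the markers alone or in any combination.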
In general, there are two broad classes of clustering methods that are used to assign individuals to populations (Pritchard, J. K., et al., Genetics 155: 945-959 (2000)). These are: 1) Distance-based methods: These calculate a pairwise distance matrix, whose entries give the distance between every pair of individuals. 2) Model-based methods: These proceed by assuming that observations from each cluster are random draws from some parametric model. Inference for the parameters corresponding to each cluster is then done jointly with inference for the cluster membership of each individual, using standard statistical methods. The present disclosure includes the use of all standard statistical methods including maximum likelihood, bootstrapping methodologies, Bayesian methods and any other statistical methodology that can be employed to analyze such genome data. These statistical techniques are well known to those in the art.
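The pairwise distance matrix used by distance-based methods can be illustrated with a minimal sketch. The allele-sharing distance chosen here is one of several possible measures and is an assumption, as are the function names and the genotypes, which are coded as the number of copies (0, 1 or 2) of one allele at each SNP.

```python
def allele_sharing_distance(g1: list[int], g2: list[int]) -> float:
    """Mean per-locus difference in allele copies, scaled to the range 0..1."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def distance_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    """Distance between every pair of individuals, as a square matrix."""
    n = len(genotypes)
    return [[allele_sharing_distance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Three hypothetical dogs typed at four SNPs.
dogs = [[2, 2, 0, 1], [2, 1, 0, 1], [0, 0, 2, 2]]
for row in distance_matrix(dogs):
    print(row)
```

A clustering procedure would then group individuals with small mutual distances; here the first two dogs are much closer to each other than either is to the third.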
Many software programs for molecular population genetics studies have been developed; their advantage lies in their pre-programmed complex mathematical techniques and ability to handle large volumes of data. Popular programs used by those in the field include, but are not limited to: TFPGA, Arlequin, GDA, GENEPOP, GeneStrut, POPGENE (Labate, J. A., Crop Sci. 40: 1521-1528. (2000)) and Structure. The present disclosure incorporates the use of all of the software disclosed above to classify canines into populations based on DNA polymorphisms, as well as other software known in the art. “Structure” has been used to determine population structure and infer assignment of individual animals to populations for livestock species including poultry (Rosenberg, N. A., et al., Genetics. 159: 699-713 (2001)) and cattle from South Asia (Kumar, P., Heredity 91: 43-50 (2003)).
In another aspect, the present invention provides a computer system that includes a database having records containing information regarding a series of companion animal single nucleotide polymorphisms (SNPs), and a user interface allowing a user to input nucleotide occurrences of the series of companion animal SNPs for a companion animal subject. The user interface can be used to query the database and display results of the query. The database can include records representing some or preferably all of the SNPs of a companion animal SNP map, preferably a high-density companion animal SNP map. The database can also include information regarding haplotypes and haplotype alleles from the SNPs. Furthermore, the database can include information regarding phenotypes and/or genetic traits that are associated with some or all of the SNPs and/or haplotypes. In these embodiments the computer system can be used, for example, for any of the aspects of the invention that infer a phenotype of a genetic trait of a companion animal subject.
The computer system of the present invention can be a stand-alone computer, a conventional network system including a client/server environment and one or more database servers, and/or a handheld device. A number of conventional network systems, including a local area network (LAN) or a wide area network (WAN), are known in the art. Additionally, client/server environments, database servers, and networks are well documented in the technical, trade, and patent literature. For example, the database server can run on an operating system such as UNIX, running a relational database management system, a World Wide Web application, and a World Wide Web Server. When the computer system is a handheld device it can be a personal digital assistant (PDA) or another type of handheld device, of which many are known.
Typically, the database of the computer system of the present invention includes information regarding the location and nucleotide occurrences of companion animal SNPs. Information regarding genomic location of the SNP can be provided, for example, by including the sequence of consecutive nucleotides surrounding the SNP, such that only one part of the genome provides a 100% match, or by providing a position number of the SNP with respect to an available sequence entry, such as a Genbank sequence entry, or a sequence entry for a private database, or a commercially licensed database of DNA sequences. The database can also include information regarding nucleotide occurrences of SNPs, since as discussed herein typically nucleotide occurrences of less than all four nucleotides occur for a SNP.
The database can include other information regarding SNPs or haplotypes such as information regarding frequency of occurrence in a companion animal population. Furthermore, the database can be divided into multiple parts, one for storing sequences and the others for storing information regarding the sequences. The database may contain records representing additional information about a SNP, for example information identifying the gene in which a SNP is found, or nucleotide occurrence frequency information, or characteristics of the library or clone which generated the DNA sequence, or the relationship of the sequence surrounding the SNP to similar DNA sequences in other species.
The parts of the database of the present invention can be flat file databases or relational databases or object-oriented databases. The parts of the database can be internal databases, or external databases that are accessible to users. An internal database is a database maintained as a private database, typically maintained behind a firewall, by an enterprise. An external database is located outside an internal database, and is typically maintained by a different entity than an internal database. A number of external public biological sequence databases, particularly SNP databases, are available and can be used with the current invention. For example, the dbSNP database available from the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, can be used with the current invention to provide comparative genomic information to assist in identifying companion animal SNPs.
In another aspect, the current invention provides a population of information regarding companion animal SNPs and haplotypes. The population of information can include an identification of genetic traits associated with the SNPs and haplotypes. The population of information is typically included within a database, and is preferably identified using the methods of the current invention. The population of sequences can be a subpopulation of a larger database that contains only SNPs and haplotypes related to a particular genetic trait. For example, the subpopulation can be identified in a table of a relational database. A population of information can include all of the SNPs and/or haplotypes of a genome-wide SNP map.
In addition to the database discussed above, the computer system of the present invention includes a user interface capable of receiving entry of nucleotide occurrence information regarding at least one, preferably two companion animal SNPs. The interface can be a graphic user interface where entries and selections are made using a series of menus, dialog boxes, and/or selectable buttons, for example. The interface typically takes a user through a series of screens beginning with a main screen. The user interface can include links that a user may select to access additional information relating to a companion animal SNP map.
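By way of illustration only, a relational layout of the kind described above might be sketched as follows. The table names, column names and accession placeholders are hypothetical and are not taken from the disclosure; the sketch merely shows SNP records, per-population allele frequencies, and a trait table from which a trait-specific subpopulation of SNPs can be selected.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
SCHEMA = """
CREATE TABLE snp (
    accession TEXT PRIMARY KEY,  -- placeholder identifier for the SNP
    contig    TEXT NOT NULL,     -- sequence entry the position refers to
    position  INTEGER NOT NULL,  -- position of the SNP within that entry
    alleles   TEXT NOT NULL      -- observed nucleotide occurrences, e.g. 'A/G'
);
CREATE TABLE allele_frequency (
    accession  TEXT REFERENCES snp(accession),
    population TEXT NOT NULL,    -- breed or other population label
    allele     TEXT NOT NULL,
    frequency  REAL NOT NULL
);
CREATE TABLE trait_association (
    accession TEXT REFERENCES snp(accession),
    trait     TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding two made-up SNP records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?, ?, ?)",
        [("snp_0001", "contig_A", 1200, "A/G"),
         ("snp_0002", "contig_B", 87, "A/G")],
    )
    conn.executemany(
        "INSERT INTO trait_association VALUES (?, ?)",
        [("snp_0001", "breed"),
         ("snp_0001", "parentage"),
         ("snp_0002", "parentage")],
    )
    conn.commit()
    return conn


def snps_for_trait(conn, trait):
    """Return the subpopulation of SNP accessions associated with one trait."""
    rows = conn.execute(
        "SELECT s.accession FROM snp AS s "
        "JOIN trait_association AS t ON s.accession = t.accession "
        "WHERE t.trait = ? ORDER BY s.accession",
        (trait,),
    )
    return [row[0] for row in rows]
```

In this sketch, calling `snps_for_trait(build_demo_db(), "parentage")` returns both placeholder records, while `"breed"` returns only the first.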
The function of the computer system of the present invention that carries out the phenotype inference methods typically includes a processing unit that executes a computer program product, itself representing another aspect of the invention, that includes a computer-readable program code embodied on a computer-usable medium and present in a memory function connected to the processing unit. The memory function can be ROM or RAM.
The computer program product, itself another aspect of the invention, is read and executed by the processing unit of the computer system of the present invention, and includes a computer-readable program code embodied on a computer-usable medium. The computer-readable program code relates to a plurality of sequence records stored in a database. The sequence records can contain information regarding the relationship between nucleotide occurrences of a series of companion animal single nucleotide polymorphisms (SNPs) and a phenotype of one or more genetic traits. The computer program product can include computer-readable program code for providing a user interface capable of allowing a user to input nucleotide occurrences of the series of companion animal SNPs for a companion animal subject, locating data corresponding to the entered query information, and displaying the data corresponding to the entered query. Data corresponding to the entered query information is typically located by querying a database as described above.
In another embodiment of the present invention, the computer system and computer program products are used to perform bioeconomic valuations as part of the methods described herein, such as methods for estimating the value that will be obtained from a companion animal subject.
An exemplary canine panel of SNPs for determining, for example, parentage or breed, is provided herein. DNA analysis provides a powerful tool for verifying the parentage and identification of individual animals. Microsatellite marker panels have been developed for canines that are highly polymorphic and amenable to standardization among laboratories performing these tests (DeNise et al., 2004, Anim Genet. 35(1): 14-17). However, microsatellite scoring requires considerable human oversight and microsatellite markers have high mutation rates. Single nucleotide polymorphisms (SNP) are likely to become the standard marker for parentage verification and identity because of the ease of scoring, low cost assay development and high-throughput capability.
The present invention is based in part on the discovery of single nucleotide polymorphisms (SNPs) that can be used to verify parentage or identity of canine subjects or infer breed of a canine subject. For example, SNPs have been used to verify parentage and breed in bovine subjects (see, e.g., U.S. patent application Ser. No. 10/750,622 or U.S. patent application Ser. No. 10/750,623, both of which are incorporated herein in their entirety). Accordingly, provided herein is a method for excluding putative parents of a canine breed and/or verifying identity of a canine; or inferring the breed of a canine subject from a nucleic acid sample of the canine subject, by identifying in the sample, a nucleotide occurrence for at least one single nucleotide polymorphism (SNP), wherein the nucleotide occurrence is associated with the breed.
Teachings for genetic identity and parentage exclusion are well known in the art (DeNise et al., 2004. Anim. Genetics. 35(1): 14-17; Halverson et al., 1995. U.S. Pat. No. 5,874,217; Ostrander et al., 1993. Genomics, 16: 207-213; Ostrander et al., 1995. Mammalian Genome, 6: 192-195; Francisco et al., 1996. Mammalian Genome 7:359-362). Statistical probability of identity is calculated as the probability of having a canine animal with the specific genotype of a canine subject. Parentage verification and identification are statistically characterized by the exclusion probability. Both of these statistical estimates are calculated from nucleotide frequencies within the population.
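The two quantities named above can be illustrated with a minimal sketch. It assumes bi-allelic A/G markers, genotypes written as two-letter strings, Hardy-Weinberg proportions, and independence between markers; these assumptions, the function names and the marker labels are illustrative and are not stated in the disclosure. The first function flags markers at which a putative parent shares no allele with the offspring, and the second multiplies per-marker genotype frequencies to estimate how often a specific multi-marker genotype would occur.

```python
def excluded_markers(offspring, putative_parent):
    """Return markers at which offspring and putative parent share no allele.

    Genotypes are two-letter strings such as "AA", "AG" or "GG".
    Markers not genotyped in the putative parent are skipped.
    """
    excluded = []
    for marker, child_gt in offspring.items():
        parent_gt = putative_parent.get(marker)
        if parent_gt is None:
            continue
        if not set(child_gt) & set(parent_gt):
            excluded.append(marker)
    return excluded


def genotype_probability(genotypes, freq_g):
    """Probability of a multi-marker genotype in a population.

    Assumes Hardy-Weinberg proportions and independent markers;
    freq_g maps each marker to the frequency of its G allele.
    """
    prob = 1.0
    for marker, gt in genotypes.items():
        p = freq_g[marker]
        q = 1.0 - p
        n_g = gt.count("G")  # 0, 1 or 2 copies of the G allele
        prob *= (q * q, 2 * p * q, p * p)[n_g]
    return prob
```

With a single parent available, only opposite homozygotes (for example "AA" in the offspring and "GG" in the putative parent) produce an exclusion under this rule.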
The methods of the present invention for inferring breed of a canine subject can be used to infer the breed of any canine subject. For example, the methods can be used to infer a breed including, but not limited to, Afghan Hound, Basenji, Basset Hound, Beagle, Belgian Tervuren, Bernese Mountain Dog, Borzoi, Chihuahua, Chinese Shar-Pei, Cocker Spaniel, Dachshund, Doberman Pinscher, German Shepherd Dog, German Shorthaired Pointer, Golden Retriever, Labrador Retriever, Mastiff, Miniature Schnauzer, Poodle, Pug, Rottweiler, Saluki, Samoyed, Shetland Sheepdog, Siberian Husky, St. Bernard, Whippet and Yorkshire Terrier.
Furthermore, the methods of the present invention can be used to assign a breed or breeds to an individual animal with a specific probability. Typically, an identified nucleotide occurrence is compared to multiple known SNP alleles associated with multiple breeds, for example the breed associated alleles identified herein in Table 4, to infer a breed for a subject from multiple possible breeds.
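As a rough illustration of assigning a breed with a probability, the sketch below compares a subject's genotypes with per-breed allele frequencies. It is a simplified likelihood calculation under stated assumptions (Hardy-Weinberg proportions, independent bi-allelic A/G markers, equal prior weight for every breed), not the clustering procedure of the Structure software, and the breed labels, marker labels and frequencies used with it are hypothetical.

```python
import math


def genotype_likelihood(genotype, freq_g):
    """Likelihood of one A/G genotype given the G-allele frequency."""
    p, q = freq_g, 1.0 - freq_g
    return {0: q * q, 1: 2 * p * q, 2: p * p}[genotype.count("G")]


def assign_breed(genotypes, breed_freqs, floor=0.01):
    """Return a probability for each candidate breed.

    genotypes   : marker -> genotype string ("AA", "AG" or "GG")
    breed_freqs : breed -> {marker -> G-allele frequency in that breed}
    floor       : keeps frequencies away from 0 and 1 so that one
                  unexpected allele does not zero out a breed entirely
    """
    log_lik = {}
    for breed, freqs in breed_freqs.items():
        total = 0.0
        for marker, gt in genotypes.items():
            p = min(max(freqs[marker], floor), 1.0 - floor)
            total += math.log(genotype_likelihood(gt, p))
        log_lik[breed] = total
    # Normalize in log space to avoid underflow with many markers.
    best = max(log_lik.values())
    weights = {b: math.exp(v - best) for b, v in log_lik.items()}
    norm = sum(weights.values())
    return {b: w / norm for b, w in weights.items()}
```

For example, a subject homozygous for G at two markers where G is common in one hypothetical breed and rare in another is assigned almost entirely to the first breed.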
SNP markers were identified from whole-genome shotgun sequencing of the canine genome (Kirkness, et al., 2003, Science 5641:1898-903). Over 650,000 putative bi-allelic SNP markers, excluding insertion/deletions, were identified from the 974,000 putative SNPs assigned Genbank accession numbers between ss8830321 and ss9805720. The contigs containing these SNPs were syntenically mapped to the sequence of the human genome. The present study evaluated 384 SNP markers for their robust assay development, allele frequency among 30 canine breeds, exclusion probabilities and probability match rate. Out of these markers, about 60 SNPs were selected for a parentage panel that can be used across a number of breeds and systems for parentage verification and animal identity and 101 breed specific markers were identified. Briefly, markers were assayed on the Beckman Coulter GenomeLab™ SNPstream® Genotyping System. Markers were amplified in a 5 μl reaction volume of a 12-marker multiplex in a 384-well format. The PCR is performed as follows: 95° C. for 10 min, followed by 34 cycles of 94° C. for 30 s, 55° C. for 30 s, and 72° C. for 1 min. The DNA products are cleaned using 3 μl of diluted SNP-IT™ Clean-Up (USB), incubated at 37° C. for 30 min with a final inactivation step of 96° C. for 10 min. The extension reaction is performed as described by the manufacturer, with 0.2 μl of the G/A extension mix, 3.762 μl extension mix diluent, 0.021 μl DNA polymerase, 3 μl of extension primer working stock, and 0.018 μl water added to the 8 μl volume in the well after clean-up. This 15 μl extension reaction is then thermal cycled as follows: 96° C. for 3 min, followed by 45 cycles of 94° C. for 20 s and 40° C. for 11 s. Following extension, 8 μl of hybridization cocktail is added and mixed. Ten microliters of this mixture is then transferred to the 384-well SNPStream® Tag Array plate. The plate is then incubated at 42° C. for 2 hr.
Each of the 384 wells in a Tag Array plate contains 16 unique oligonucleotides of a known sequence, or tag. After hybridization, the Tag Array plate is then washed, dried (1 hr), and read on the SNPstream® SNPScope Array Imager. The raw image data is then analyzed and genotype calls generated using the software provided, then reviewed by scientists before data is uploaded into the database.
Three hundred eighty-four SNP markers were selected for study based on their dispersion pattern throughout the canine genome as determined by their human location, and all markers contained a guanine/adenine purine transition for ease of assay development. Twenty-three parent-offspring trios were used to verify Mendelian inheritance. Canine animals, representing 30 breeds, 38 animals per breed, were used to validate and select markers. Allele frequencies within breed were determined using simple counting methods. Sixty markers were identified that can be utilized for parentage and identity and 101 breed specific markers were identified. These markers are not mutually exclusive. Accession numbers for parentage and identity markers are listed in Table 1 and Table 8. The sequences of the parentage and identity markers can be found on the world wide web at ftp://ftp.ncbi.nlm.nih.gov/snp/dog/ss_fasta. The contents of these files are encoded in XML, and contain the following information: SNP Id, Contig Name denoting the location of the SNP, 60 bases of sequence flanking the 5′ end of the SNP, and the alleles comprising the SNP. The position of the SNP in the contig is determined by blasting the 5′ flanking sequence to the contig sequence. The location of the SNP is the base following the last matching base of the 60 bases. Contigs can be found on the world wide web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide&cmd=Search&term=AACN010000001:AACN011089636[PACC]. An example of the information provided for Accession number ss9048431 includes the following information related to reference information and contig analysis:
With regard to the information associated with each accession number, the sequence associated with a particular sequence identifier can be found at the line labeled “<NSEss_flank5_E>” and the SNP can be found at the line labeled “<NSE-ss_observed>.” For example, for SEQ ID NO:1 the line labeled “<NSEss_flank5_E>” has the sequence “TCTATACCTCTAAAGAATCGCTGCTACTTTGTGCAAGACTTTTAAAGTTTAAATGAATTA” associated with it. In addition, the line labeled “<NSE-ss_observed>” has the SNP “A/G” associated with it. Thus, SEQ ID NO:1 encompasses the nucleic acid sequence TCTATACCTCTAAAGAATCGCTGCTACTTTGTGCAAGACTTTTAAAGTTTAAATGAATTAA/G. Thus, nucleotides A or G correspond to the single nucleotide polymorphism (SNP) of SEQ ID NO:1 because the SNP corresponds to the first nucleotide, or complement thereof, in the most 3′ position of SEQ ID NO:1. Similar information for the remaining accession numbers is provided in the aforementioned database. Table 8 lists the SNP accession number and the 5′ sequence associated with each SNP (i.e., SEQ ID NOs:1-101). The single nucleotide polymorphism (SNP) corresponds to the first nucleotide, or complement thereof, in the most 3′ position of any one of SEQ ID NOs:1-101.
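The rule that the SNP is the base following the last matching base of the 60-base 5′ flank can be sketched as below. Exact string matching is used here as a simple stand-in for a BLAST alignment, which is an assumption of this sketch; the flank is the SEQ ID NO:1 sequence quoted above, while the function name and any contig passed to it are hypothetical.

```python
# 5' flanking sequence quoted above for SEQ ID NO:1 (60 bases).
FLANK_5 = "TCTATACCTCTAAAGAATCGCTGCTACTTTGTGCAAGACTTTTAAAGTTTAAATGAATTA"


def snp_position(contig, flank5):
    """Return the 0-based index of the SNP base within the contig.

    The SNP is taken to be the base immediately after the last matching
    base of the 5' flank. Returns None when the flank is absent, occurs
    more than once, or ends at the very end of the contig.
    Exact matching stands in for BLAST in this sketch.
    """
    start = contig.find(flank5)
    if start == -1:
        return None
    if contig.find(flank5, start + 1) != -1:
        return None  # ambiguous: flank matches more than one location
    pos = start + len(flank5)
    return pos if pos < len(contig) else None
```

Requiring a unique match mirrors the earlier statement that the flanking sequence should identify only one part of the genome.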
Table 2 provides the identified parentage and identity markers and allele frequencies within breed. Table 3 summarizes the data as to exclusion probability rate and probability match rate within breed and across all breeds. Exclusion probability at any locus l, (Q_l), is the probability of excluding a random individual from the population as a potential parent of an animal based on the genotype of one parent and offspring. Following Weir (Weir, 1996, Genetic Data Analysis II. Sinauer, Sunderland, Mass.),
$$Q_l = p_l - 2p_l^2 + 2p_l^3 - p_l^4$$
where $p_l$ is the frequency of the guanine allele at locus $l$. The overall probability of exclusion is one minus the probability that none of the loci allows exclusion and is calculated as
$$Q = 1 - \prod_l (1 - Q_l)$$
Match probability ratio (MPR) was calculated, using the ceiling principle, as the square of the most frequent allele frequency to provide the most conservative estimate of match rate within a breed. Overall match probability ratio was estimated as the product of MPR at each SNP marker.
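The calculations described above can be sketched in a few lines. The code assumes bi-allelic A/G markers and genotypes written as two-letter strings; the function names are illustrative and any frequencies passed in are hypothetical inputs rather than values from Tables 2 and 3.

```python
import math
from collections import Counter


def allele_frequency(genotypes, allele="G"):
    """Allele frequency by simple counting over two-letter genotypes."""
    counts = Counter("".join(genotypes))
    return counts[allele] / sum(counts.values())


def exclusion_probability(p):
    """Per-locus exclusion probability: p - 2p^2 + 2p^3 - p^4."""
    return p - 2 * p**2 + 2 * p**3 - p**4


def combined_exclusion(freqs):
    """Overall exclusion probability: 1 minus the product of (1 - Q_l)."""
    return 1.0 - math.prod(1.0 - exclusion_probability(p) for p in freqs)


def match_probability(freqs):
    """Product over markers of the squared most-frequent allele frequency."""
    return math.prod(max(p, 1.0 - p) ** 2 for p in freqs)
```

The per-locus expression equals pq(1 − pq) with q = 1 − p, so it is largest when both alleles are equally frequent, where it reaches 0.1875; this is why markers with intermediate allele frequencies contribute most to a parentage panel.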
Sixty markers with the highest exclusion probability computed across all breeds were selected for the parentage panel. For example, with the 60 marker panel, most or all breeds can reach an exclusion probability of about 0.994 and identity match rate of about 6.42×10⁻⁵. This panel provides a powerful tool that can be used efficiently in parentage and identity programs.
In one example, a nucleic acid sample from a canine subject from the Doberman Pinscher breed can be accurately matched to a previously identified sample 99.9% of the time. Using these same markers for parentage verification and identity, the probability of an individual selected at random from the Doberman Pinscher breed with nucleotide occurrences at the SNP parentage and identity markers consistent with the canine subject is greater than 1 in 1,000,000.
The potential parents of a canine subject can be excluded thereby assuring the direct ancestral line and ensuring the integrity of the registration database. Nucleic acid hypermutable sequences are currently utilized by the American Kennel Club, Professional Kennel Club and the United Kennel Club. As used herein, the term “hypermutable” refers to a nucleic acid sequence that is susceptible to instability, thus resulting in nucleic acid alterations. Such alterations include the deletion and addition of nucleotides. The hypermutable sequences of the invention are most often microsatellite DNA sequences which, by definition, are small tandem repeat DNA sequences. Thus, a combination of SNP analysis and microsatellite analysis may be used to infer a breed of a canine subject. Nucleic acid or tissue samples from an unknown canine subject can be matched to verify the ownership or identity of an individual canine. Because of the reproducibility and standardization of the SNP panel markers, these nucleic acid differences can be stored in a database linking animal id and owner, parents and siblings, with genotype allowing for ease of comparison and reducing the need for additional testing.
A panel generated from the canine SNPs provided herein can be utilized to verify the identity of a cloned animal or frozen or split and/or cloned embryo, or characterize tissues that may undergo intra- or inter-transplantation or propagation to other mammals, or verify the identity of banked and/or frozen semen, or verify cultured cell lines. In addition, an SNP identity and parentage panel can be used to link an animal, animal hair or animal biological samples to a crime scene for forensic analysis.
Examples of the probability of correct breed assignment are presented in Table 4 for 28 breeds evaluated. The probability of assignment ranged from 0.676 for the Chihuahua breed to 0.946 for the Basenji breed. In addition, Table 5 depicts each individual canine tested and the probability of assignment to a specific cluster. As shown in Table 6, all 38 canine subjects in eleven of the 28 breeds presented reached at least 0.7 probability of falling into the correct cluster. Canine subjects in 18 of the breeds evaluated had at least 90% of the canine subjects within breed falling into the correct cluster. The SNP breed identity panel can be used to verify claims for breed of a canine animal when parentage is unknown. Currently, the only canines accepted by breed are those for which the records of individual animals are maintained by Breed Associations; verification of breed by SNP panel could open up new avenues for dog owners. Further, information regarding canine breed could allow canine owners to identify health characteristics associated with specific breed designations. Preventative measures could reduce the trauma to the animal and owner, and provide the owner with insight into the behaviors of the canine subject. The disposition and safety of the canine subject can be broadly determined by breed characteristics. At one extreme, communities have a vested role in safeguarding their citizens against vicious behaviors; and at the other extreme, canine owners may be able to reduce negative impacts from normal behaviors found within specific breeds.
A panel provided herein also aids in placement, lost and found searches, and animal shelter reporting for canine animals, all of which become more accurate when the exact breed is known. Such means of identification allow animal shelters to screen animals and announce the results of the search to potential owners and to specific breed rescue groups. Further, mixed breed groups could determine the percentages of specific breeds of composition and breed development using such panels. These programs could lead to certification programs that can broadly group characteristics of specific crosses of canines.
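A per-breed summary of the kind reported above (the share of subjects reaching a given probability of falling into the correct cluster) could be tabulated along the following lines. The function name, the input layout and the breed labels in the usage note are hypothetical; the 0.7 default simply mirrors the threshold mentioned in the text.

```python
def fraction_correct(assignments, threshold=0.7):
    """Fraction of subjects per breed at or above the probability threshold.

    assignments : iterable of (breed, probability) pairs, where probability
                  is the subject's probability of assignment to the cluster
                  that corresponds to its own breed
    """
    totals = {}
    hits = {}
    for breed, prob in assignments:
        totals[breed] = totals.get(breed, 0) + 1
        if prob >= threshold:
            hits[breed] = hits.get(breed, 0) + 1
    return {breed: hits.get(breed, 0) / n for breed, n in totals.items()}
```

With made-up inputs such as `[("breed_A", 0.95), ("breed_A", 0.72), ("breed_B", 0.65), ("breed_B", 0.90)]`, every breed_A subject and half of the breed_B subjects meet the threshold.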
Methods of the present invention further encompass identifying a nucleotide sequence of a hypermutable sequence in the sample, and inferring breed based on at least one SNP nucleotide occurrence and the nucleotide sequence of the hypermutable sequence. Hypermutable sequences include, for example, microsatellite nucleic acid sequences. The method can include a determination of the nucleotide occurrence of at least 2 SNPs. At least 2 SNPs can form all or a portion of a haplotype, wherein the method identifies a haplotype allele that is associated with a specific breed. Furthermore, the method can include identifying a diploid pair of haplotype alleles.
Also provided are methods for identifying a canine single nucleotide polymorphism (SNP) informative of breed, that include performing whole genome shotgun sequencing of a canine genome, and genotyping at least two canine subjects from at least two breeds, thereby identifying the canine single nucleotide polymorphisms informative of breed. The Example provided herein illustrates the use of this method to identify breed SNPs.
The following tables provide exemplary data generated by the compositions and methods provided herein.
Although the invention has been described with reference to the above example, it will be understood that modifications and variations are encompassed within the spirit and scope of the invention. Accordingly, the invention is limited only by the following claims.
This application claims the benefit of priority under 35 U.S.C. § 119(e) of U.S. Ser. No. 60/524,180, filed Oct. 24, 2003, and U.S. Ser. No. 60/617,383, filed Oct. 8, 2004, the entire contents of which are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
60524180 | Nov 2003 | US | |
60617383 | Oct 2004 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10972767 | Oct 2004 | US |
Child | 12288949 | US |