The present invention relates to methods and products for estimating a breeding value based on genomic information. In one aspect, the invention relates to a method for determining the individual effect of a plurality of genetic markers on a phenotype such as udder health, fertility, or other health, or on a breeding value, of reference bovine subjects. In another aspect of the invention, the individual effects of the genetic markers are employed for estimating a breeding value of a bovine subject based on the genotype of said bovine subject for the plurality of genetic markers.
The identification of genetic markers that are associated with a particular phenotype, such as a quantitative trait or a heritable disease, has been facilitated by the identification of an increasing number of markers, such as microsatellite markers and single nucleotide polymorphisms (SNPs), as a source of polymorphic markers associated with a mutation causing a specific phenotype. Markers associated with the mutation, or the mutation itself, causing a specific phenotype of interest may be localised by genetic analysis in pedigrees. In fact, the application of molecular genetic information is an important issue in animal breeding.
Conventional methods to predict the breeding value of cattle are based on the phenotypic record of the individual and the records of its relatives. Thus, the estimation of a breeding value of a bovine subject requires that phenotypic traits of the bovine subject and/or its relatives have been registered. Other methods of estimating a breeding value combine phenotypic and pedigree data with genomic information to increase accuracy and decrease the generation interval. In contrast to other genetic evaluation methods, which only combine phenotypic data and probabilities that genes are identical by descent from pedigree data, genomic prediction traces the inheritance of individual genes. Thus, marker genotypes for multiple genetic marker loci across the genome can measure genetic similarity with higher precision than single marker methodologies and marker assisted selection based on the inheritance of only a few major genes. Low resolution marker maps can only provide indications of shared long chromosome segments within closely related family members but cannot detect the many minor genetic effects shared by distant relatives. In contrast, by using genomic prediction, even minor genes can now be traced using high density genetic markers located across the entire genome. Current reports on genomic selection are primarily based on simulated data. Recently, a few results based on data from real livestock populations have been published (e.g., Harris et al., 2008; Hayes et al., 2009; Gonzalez-Recio et al., 2009; VanRaden et al., 2009). None of these reports, however, evaluates the accuracy of genomic prediction in the target population, which is necessary in order to apply genomic selection in practical breeding programs.
In a key aspect, the present invention relates to a method for determining the individual effect of a plurality of genetic markers on a breeding value of one or more reference bovine subjects. The invention also relates to estimation of a breeding value in a bovine subject, wherein said breeding value is based on the genotype of the bovine subject for a plurality of genetic markers. The estimated breeding value is then determined on the basis of the individual effect of the plurality of genetic markers on the breeding value of the reference bovine subjects. Moreover, the present invention relates to a method for selective breeding based on the estimation of the genomic breeding value; and the invention also comprises computer program products, computer readable media and computer systems for executing the methods of the present invention. A kit is also provided, which comprises one or more components of the present invention.
Also, it is a main object of the present invention to provide an application method for marker assisted selection of polymorphisms in the bovine genome, wherein the polymorphisms are associated with specific traits and phenotypes, such as defined herein; and/or to provide a kit for detection of genetic marker alleles or a combination of genetic marker alleles for use in such a method, and/or to provide animals selected using the method of the invention.
Thus, in one aspect, the present invention relates to a method for determining the individual effect of a plurality of genetic marker alleles on udder health, fertility and/or other health of at least 100 reference bovine subjects and/or its relatives, said method comprising
a. providing at least 100 reference bovine subjects,
b. obtaining a sample from said one or more bovine subjects comprising genetic material,
c. determining on the basis of said genetic material the genotype of said one or more reference bovine subjects for said plurality of genetic markers,
d. determining the udder health, fertility and/or other health of said reference bovine subject, and
e. determining the individual effect of said plurality of genetic marker alleles on said udder health, fertility and/or other health of said reference bovine subject.
In a preferred embodiment, an estimated breeding value (EBV) is calculated on the basis of said udder health, fertility and/or other health. For example, the effect of each individual genetic marker allele on udder health, fertility, and/or other health is determined by calculating a reference estimated breeding value (EBV) of said one or more reference bovine subjects, wherein said reference EBV is used as response variable for determining the effect of each individual genetic marker allele on udder health, fertility, other health and/or the EBV. Udder health, fertility, other health and/or estimated breeding value is preferably determined by registration of phenotypic traits of said bovine subject and off-spring and/or other relatives of said bovine subject.
Udder health, fertility, other health and/or estimated breeding value is determined according to any method available to those of skill in the art. The phenotypes are preferably evaluated by registration of phenotypic traits of at least 40 offspring or other relatives of said bovine subject; however, the more offspring or relatives scored for a specific phenotype, the more accurate the determination of the phenotype. Thus, in one embodiment, the effect of the plurality of genetic markers on udder health, fertility, other health and/or estimated breeding value has been determined for at least 100 reference bovine subjects, such as at least 1000 bovine subjects, for example between 1000 and 6000 bovine subjects, such as between 2000 and 5000 bovine subjects. In some cases at least 10000, such as for example between 10000 and 50000 bovine subjects (e.g. offspring or relatives) are included in the determination of a phenotype and/or a reference estimated breeding value.
Udder health, fertility, other health and/or estimated breeding value is in one embodiment determined using the Least-squares method, Bayesian estimation, such as BayesA or BayesB or modifications thereof, or Best Linear Unbiased Prediction (BLUP), for example marker-assisted Best Linear Unbiased Prediction (MA-BLUP), preferably Bayesian estimation, such as BayesA or BayesB or modifications thereof. Moreover, the determined udder health, fertility, other health, estimated breeding value and/or the individual effect of each genetic marker allele on said breeding value is stored in a non-volatile memory such as a computer memory and/or a database.
In another aspect, the present invention relates to a method of determining a genomic estimated breeding value (GEBV) of a bovine subject based on the genotype of said bovine subject for a plurality of genetic markers, said method comprising
a. providing a bovine subject,
b. obtaining a sample from said bovine subject comprising genetic material,
c. determining on the basis of said genetic material the genotype of said bovine subject for said plurality of genetic markers,
d. determining said GEBV by correlating said genotype for said plurality of genetic markers with a predetermined effect of each individual genetic marker allele on udder health, fertility, other health and/or estimated breeding value of at least 100 reference bovine subjects, said effect being determined as defined in any one of the preceding claims.
In a preferred embodiment of the aspects of the present invention, udder health comprises resistance to clinical mastitis; for example, udder health is determined by an udder health index weighing together information from resistance to clinical mastitis in first, second and/or third parity, somatic cell count (SCC), dairy form, udder support/fore udder attachment, and/or udder depth. Moreover, fertility is in one embodiment determined by a fertility index comprising Number of inseminations cows (AISC), Number of inseminations heifers (AISH), Fertility treatment 1st lactation (FERT1), Fertility treatment 2nd lactation (FERT2), Fertility treatments 3rd lactation (FERT3), Heat strength cows (HSTC), Heat strength heifers (HSTH), Calving to first insemination (ICF), First to last insemination cows (IFLC), First to last insemination heifers (IFLH), 56 day Non-return rate cows (NRRC), and/or 56 day Non-return rate heifers (NRRH), wherein the specific traits are defined more closely herein below. Also, in one embodiment of the aspects of the present invention, the other health is determined by an other health index comprising reproductive diseases, digestive diseases, and/or feet and leg diseases.
The plurality of genetic markers or genetic marker alleles is selected from any combination of polymorphic genetic markers, but in a preferred embodiment is a plurality of single nucleotide polymorphisms (SNP), microsatellite markers and/or mixtures thereof, most preferred SNP markers.
The determination of a genomic estimated breeding value according to the present invention is based on any combination of markers, but in a preferred embodiment, the plurality of genetic markers comprises a dense set of genetic markers located across the entire genome, for example comprising on average at least 1 genetic marker per cM, such as at least 10 genetic markers per cM, for example between 1 and 100 genetic markers per cM. Thus, in one embodiment, the plurality of genetic markers or genetic marker alleles comprises at least 50, such as at least 100, such as at least 200, such as at least 300, such as at least 400, such as at least 500, such as at least 600, such as at least 700, such as at least 800, such as at least 900, such as at least 1000, such as at least 2000, for example at least 3000, such as at least 4000, for example at least 5000, such as at least 6000, for example at least 7000, such as at least 8000, for example at least 9000, such as at least 10000, for example at least 12000, such as at least 14000, for example at least 16000, such as at least 18000, for example at least 20000, for example at least 22000, such as at least 24000, for example at least 26000, such as at least 28000, for example at least 30000, for example at least 32000, such as at least 34000, for example at least 36000, such as at least 38000, for example at least 40000, for example at least 42000, such as at least 44000, for example at least 46000, such as at least 48000, for example at least 50000, for example at least 52000, such as at least 54000, for example at least 56000, such as at least 58000, for example at least 60000, for example at least 62000, such as at least 64000, for example at least 66000, such as at least 68000, for example at least 70000, for example at least 72000, such as at least 74000, for example at least 76000, such as at least 78000, for example at least 80000, for example at least 82000, such as at least 84000, for example at least 86000, such as at least 88000, for example at least 90000, for example at least 92000, such as at least 94000, for example at least 96000, such as at least 98000, for example at least 100000 genetic markers or genetic marker alleles, or the plurality of genetic markers or genetic marker alleles comprises between 10000 and 100000, such as between 20000 and 80000, for example between 30000 and 60000, for example between 30000 and 50000, such as between 30000 and 40000, for example between 35000 and 40000, for example between 37000 and 39000 genetic markers or genetic marker alleles. Several methods of detecting multiple genetic markers are available to the skilled person. In a preferred embodiment, the genetic marker alleles are detected simultaneously by gene chip technology, for example by using the Bovine SNP50 BeadChip provided by Illumina Inc.
The bovine subjects of the present invention belong to any cattle breed or family. In a preferred embodiment, the bovine subject is a member of the Holstein breed, for example a member of the Danish Holstein cattle population. The sample obtained in the methods of the present invention is any sample comprising genetic material, which may be extracted from the sample and used for genotyping of the bovine subject for the genetic markers of the invention. In a preferred embodiment, the sample is selected from the group consisting of blood, semen (sperm), urine, liver tissue, milk, muscle, skin, hair, follicles, ear, tail, fat, testicular tissue, lung tissue, saliva, spinal cord biopsy, and any other tissue; in a preferred embodiment, the sample is blood and/or milk.
The udder health, fertility, other health and/or genomic estimated breeding value (GEBV) is preferably calculated by simultaneous inclusion of all genetic marker effects regardless of statistical significance, and/or the genetic marker effects are calculated simultaneously. For example, udder health, fertility, other health and/or genomic estimated breeding value is calculated using Least-squares, Bayesian estimation, such as BayesA or BayesB or modifications thereof, or a marker-assisted Best Linear Unbiased Prediction (MA-BLUP), preferably Bayesian estimation, such as BayesA or BayesB or modifications thereof.
In a specific embodiment, the genomic estimated breeding value (GEBV) is combined with an estimated breeding value determined on the basis of an observed phenotype of said bovine subject and/or its offspring or other relatives.
In a third aspect, the present invention relates to a method for selective breeding, comprising determining a genomic estimated breeding value (GEBV) of a bovine subject using a method of the present invention, and using said bovine subject as sire or dam for breeding if the GEBV of said bovine subject is equal to, or differs less than a predetermined amount from, a desired breeding value for the offspring. The specific GEBV depends on the statistical methods employed for determining the phenotypes and breeding values, and is apparent to those of skill in the art. In one embodiment, the bovine subject is used as sire or dam before an udder health, fertility and/or other health phenotype associated with the GEBV becomes manifest, and/or wherein said bovine subject does not have any offspring.
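The selection rule of the third aspect can be illustrated with a minimal sketch. The animal identifiers, the desired breeding value and the tolerance used here are hypothetical values chosen only for illustration; the specification does not prescribe them.

```python
def select_for_breeding(gebv_by_animal, desired, tolerance):
    """Return the animals whose GEBV equals the desired breeding value
    or differs from it by less than the predetermined tolerance."""
    return [
        animal
        for animal, gebv in gebv_by_animal.items()
        if gebv == desired or abs(gebv - desired) < tolerance
    ]


# Hypothetical candidates and thresholds, for illustration only.
candidates = {"bull_A": 22.0, "bull_B": 15.0, "bull_C": 19.5}
print(select_for_breeding(candidates, desired=20.0, tolerance=2.5))
```

Because the rule depends only on the GEBV, it can be applied before the candidate has offspring or a manifest phenotype.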
In a fourth aspect, the present invention relates to a computer program product including program code portions for performing, when run on a programmable apparatus, a method of the invention.
In a fifth aspect, the present invention relates to a computer readable medium comprising data representing a computer program product of the present invention.
In a further aspect, the present invention relates to a computer system and/or a programmable apparatus for performing a method of the present invention, comprising a computer program product and/or computer readable medium of the present invention.
In yet another aspect, the present invention relates to a kit comprising means for detecting a plurality of genetic markers, said kit comprising a computer program and/or a computer readable medium of the present invention.
A further aspect of the present invention relates to use of a computer program product, a computer readable medium, a computer system and/or a programmable apparatus, and/or a kit of the present invention for estimating a breeding value for a specific phenotype of a bovine subject.
The invention also relates to methods of determining a phenotypic trait, based on detection of one or more genetic markers, and methods for selecting a bovine subject for breeding, as well as diagnostic kits for performing those methods.
Thus, another aspect of the present invention relates to a method of determining a phenotypic trait in a bovine subject, comprising detecting in a sample from said bovine subject the presence or absence of at least one genetic marker allele or a specific combination of genetic marker alleles, wherein said genetic marker allele or a specific combination of genetic marker alleles is associated with said phenotypic trait of said bovine subject and/or offspring or other relatives therefrom.
In another aspect, the present invention relates to a method for selecting bovine subjects for breeding purposes, said method comprising detecting in a sample from said bovine subject the presence or absence of at least one genetic marker allele or a specific combination of genetic marker alleles as defined in any of the preceding claims, wherein said at least one genetic marker allele or a specific combination of genetic marker alleles is associated with at least one trait of said bovine subject and/or offspring or other relatives therefrom.
A further aspect of the present invention relates to a diagnostic kit for determining the presence or absence in a bovine subject of at least one genetic marker allele or a specific combination of genetic marker alleles, wherein said genetic marker allele or a specific combination of genetic marker alleles is associated with a phenotypic trait of said bovine subject and/or offspring or other relatives therefrom.
The present invention relates to genetic determinants of phenotypes and/or phenotypic traits in dairy cattle. The phenotypic traits of the present invention are predominantly economically important factors in the dairy industry. Furthermore, bovine subjects with a genetic predisposition for non-desired traits, such as low production traits, are carriers of genetic determinants of such non-desired traits, which can be passed on to their offspring. Therefore, it is of economic interest to identify those bovine subjects that have a genetic predisposition for specific desirable quantitative traits, to aid in the selection of cattle with a genetic profile that positively affects specific phenotypic traits. The invention specifically relates to a method of determining a phenotypic trait in a bovine subject, comprising determining a multiplicity of genetic marker alleles that are associated with a specific phenotypic trait of said bovine subject and/or offspring or other relatives therefrom. Preferably, the present invention relates to prediction of a genomic breeding value, wherein a combination or plurality of genetic markers located throughout the entire bovine genome is determined, and the effect of each individual genetic marker allele is included in the determination of breeding value. Thus, the present invention relates to a method for predicting a genomic estimated breeding value of a bovine subject based on the genotype of said bovine subject for a plurality of genetic markers.
The methods, products and uses of the present invention allow for more efficient selection of genetically superior bovine subjects for breeding, and for generation of cattle with economically important phenotypes, such as cattle less susceptible to disease such as clinical mastitis and/or other diseases, and/or cattle with higher fertility/reproductive rate.
In cattle, it is possible to simultaneously genotype a plurality of genetic markers; for example, a kit for genotyping more than 50000 SNP markers (SNP: single nucleotide polymorphism) is commercially available. This opens an opportunity for effective selection using dense markers throughout the whole genome, such that selection is based on a plurality of markers covering the entire genome; this method is also referred to as genomic selection. Genomic selection is based on breeding values that are directly estimated from genome-wide dense marker panels. Therefore, genetic evaluation can be performed as soon as DNA is obtained, which allows accurate selection in both genders early in life. Genomic selection leads to considerably higher genetic gains than conventional quantitative genetic selection, and using genomic selection in dairy cattle breeding will considerably facilitate the genetic progress while reducing the cost for proving bulls.
BTA is short for Bos taurus autosome.
The term “heritability” is used herein to describe the strength with which traits are inherited and it varies depending on the trait in question. In general traits associated with reproduction and survival have low heritabilities, while milk production and early body size have medium heritabilities, and later growth and carcase traits (i.e. fat and muscle) have relatively high heritabilities.
The term “determining a genotype” as used herein, refers generally to the determination of which specific allele a subject carries in a specific genomic polymorphic locus. In a locus comprising a polymorphic genetic marker, such as a single nucleotide polymorphism (SNP), homozygous bovine subjects are carriers of the same allele of the genetic marker on both chromosomes, while heterozygous bovine subjects are carriers of different genetic marker alleles. Genotyping of the bovine subject includes identifying the specific genetic marker allele that the subject is a carrier of.
The word “comprising” does not exclude the presence of other features or steps than those listed in a claim. The words ‘a’ and ‘an’ shall not be construed as limited to ‘only one’, but instead are used to mean ‘at least one’, and do not exclude a plurality. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage.
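As a minimal illustration of this definition, a genotype at one locus can be represented as the pair of alleles carried on the two chromosomes. The allele labels "A" and "G" and the function names are hypothetical and serve only as an example.

```python
def zygosity(genotype):
    """Classify a genotype given as a pair of alleles at one locus."""
    allele_1, allele_2 = genotype
    return "homozygous" if allele_1 == allele_2 else "heterozygous"


def allele_count(genotype, allele):
    """Number of copies (0, 1 or 2) of one allele carried at the locus."""
    return sum(1 for carried in genotype if carried == allele)


# Hypothetical SNP with alleles "A" and "G".
print(zygosity(("A", "A")), allele_count(("A", "A"), "A"))
print(zygosity(("A", "G")), allele_count(("A", "G"), "A"))
```

Allele counts of this kind (0, 1 or 2 copies) are a common numeric coding for genotypes when marker effects are summed.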
Estimated breeding value (EBV) and genomic estimated breeding value (GEBV). The appearance and performance of a bovine subject is influenced by multiple genetic and environmental factors. The term “estimated breeding value” is abbreviated EBV throughout herein. An estimated breeding value is an estimate of an animal's (herein a bovine subject's) genetic merit for a range of commercially relevant phenotypes or production traits. An estimated breeding value is used as a measure of the genetic potential of an animal, for example as a measure of its genetic capacity for calving, susceptibility to disease etc., which it can pass on to its offspring. Estimated Breeding Values (EBVs) and indexes are normally calculated from animals' individual performance records as well as those of their known relatives, where the environmental effects (feeding, management, disease, climate etc.) are sifted out to leave an estimate of the genetic value for each trait.
EBV are normally calculated using information from several sources, such as measurements from the animal itself, measurements from the animal's herd mates (contemporaries), measurements from the animal's relatives and their contemporaries, the degree to which one trait influences another (correlation), and/or the degree to which each trait is passed on to the next generation (i.e. heritability). A conventional EBV calculation involves solving a set of simultaneous equations where the unknown variables are the genetic value of the animal and the environmental effect on its performance. When carried out many times, using all the information on the animal, the equations are able to quantify the unknown genetic component. Thus, an estimated breeding value is an estimation of how much better than the average an animal's genetics should be, based on the animal's performance, as well as the performance of all its relatives. The more closely related the relative is to the individual, the more it can contribute to the EBV. Therefore siblings, progeny and parents are normally used for calculating EBVs, as they share the most genes with the animal. The accuracy of the EBV increases with the number of relatives included in the phenotypic registration records. The end result of the calculations is an EBV and over time, as more pedigree and performance data is added, the solution to the equations becomes more accurate and the EBV approaches the true (empiric/observed) breeding value of the bovine subject.
Thus, the conventional methods for estimating a breeding value of a subject are based on the phenotypic record of the individual and the records of its relatives. For example, an animal has one record of protein yield of 300 kg, its mother has one record of 280 kg, and no records of other relatives are available in the data. Assume that the population mean is 250 kg, the phenotypic variance is V_p = 900, and the heritability of protein yield is h^2 = 0.3. The breeding value of protein yield for this animal is estimated as
EBV = b_1 (300 − 250) + b_2 (280 − 250).
The weights b_1 and b_2 are obtained by solving the mixed model equations
b_1 V_p + 0.5 b_2 h^2 V_p = h^2 V_p
0.5 b_1 h^2 V_p + b_2 V_p = 0.5 h^2 V_p,
which leads to b_1 = 0.284, b_2 = 0.107, and EBV = 0.284 × 50 + 0.107 × 30 = 17.41.
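The worked example can be checked numerically. The following is a minimal sketch that solves the two equations with Cramer's rule; the function names are hypothetical and not part of the specification. With unrounded weights the EBV evaluates to about 17.42, whereas the value 17.41 in the text results from using the weights rounded to three decimals.

```python
def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x1 = (r1 * a22 - a12 * r2) / det
    x2 = (a11 * r2 - r1 * a21) / det
    return x1, x2


def ebv_own_and_dam(own, dam, mean, vp, h2):
    """Weights and EBV from one own record and one record of the dam."""
    cov = 0.5 * h2 * vp  # off-diagonal term 0.5 * h^2 * V_p of the equations
    b1, b2 = solve_2x2(vp, cov, cov, vp, h2 * vp, 0.5 * h2 * vp)
    ebv = b1 * (own - mean) + b2 * (dam - mean)
    return b1, b2, ebv


b1, b2, ebv = ebv_own_and_dam(own=300, dam=280, mean=250, vp=900, h2=0.3)
print(round(b1, 3), round(b2, 3), round(ebv, 2))
```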
In practice, an animal usually has a number of relatives in the dataset, and the EBV of all animals of interest can be estimated using an appropriate model, such as a BLUP model integrating a genetic relationship matrix constructed from the pedigree of the animals.
In conventional genetic prediction, the breeding value of a candidate bull without progeny records can be estimated from the parent average EBV (i.e., pedigree index). If the EBVs of the sire and the dam or maternal grandsire are available, the conventional EBV of a candidate bull without progeny records is usually estimated as:
EBV=½×sire EBV+¼×maternal grandsire EBV
or
EBV=½×sire EBV+½×dam EBV
When EBV of a bovine subject is obtained from its sire and maternal grandsire EBV, the reliability of the EBV is equal to ¼ reliability of sire EBV+ 1/16 reliability of maternal grandsire EBV.
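The pedigree index formulas and the reliability rule above translate directly into code. This is a minimal sketch; the function names and the numeric inputs are hypothetical.

```python
def pedigree_index_sire_mgs(sire_ebv, mgs_ebv):
    """EBV = 1/2 sire EBV + 1/4 maternal grandsire EBV."""
    return 0.5 * sire_ebv + 0.25 * mgs_ebv


def pedigree_index_sire_dam(sire_ebv, dam_ebv):
    """EBV = 1/2 sire EBV + 1/2 dam EBV."""
    return 0.5 * sire_ebv + 0.5 * dam_ebv


def reliability_sire_mgs(sire_reliability, mgs_reliability):
    """Reliability = 1/4 sire reliability + 1/16 maternal grandsire reliability."""
    return 0.25 * sire_reliability + mgs_reliability / 16


# Hypothetical inputs, for illustration only.
print(pedigree_index_sire_mgs(sire_ebv=10.0, mgs_ebv=8.0))
print(pedigree_index_sire_dam(sire_ebv=10.0, dam_ebv=6.0))
print(reliability_sire_mgs(sire_reliability=0.9, mgs_reliability=0.8))
```

Even with highly reliable ancestors, the reliability of a sire and maternal grandsire pedigree index cannot exceed 1/4 + 1/16 under this rule, which is part of the motivation for genomic prediction.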
The present invention provides a method for predicting a genomic estimated breeding value for a bovine subject based on the genotype of said bovine subject for a plurality of genetic markers. An estimated breeding value (EBV) of the present invention, which is based on a plurality of genetic marker alleles is herein referred to as a genomic estimated breeding value (GEBV), because the EBV is based on genomic information, i.e. the EBV is calculated from the genotype of a bovine subject. Thus, the abbreviation, GEBV, as used herein, refers to a breeding value, which is estimated on the basis of genomic information, such as the genotype of a plurality of genetic markers. Methods of predicting a GEBV are provided herein below.
Prediction of a genomic estimated breeding value is based on a plurality of genetic markers, such as dense markers through the whole genome and marker effects. Marker effects are estimated from reference animals which have both phenotypic record and genotype record. For example, based on one or more reference animals with both phenotypic record and genotype records of 50000 markers, allele effects of each marker are estimated and shown in the following table.
An individual has marker types as presented in the following table
The predicted breeding value of this individual is the sum of allele effects over all markers, GEBV=1*(−5)+1*5+2*3+0*(−3)+2*6+0*(−6)+ . . . +1*(−2)+1*2+1*8+1*(−8)+0*(−4)+2*2
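The summation can be sketched as follows. Only the terms printed in the expression above are used; the markers elided by ". . ." are not reconstructed, so the result is a partial sum over the printed terms and not the full GEBV of the individual. The function name is hypothetical.

```python
def gebv(allele_counts, allele_effects):
    """Sum of (number of copies of an allele) x (estimated allele effect)."""
    if len(allele_counts) != len(allele_effects):
        raise ValueError("counts and effects must have the same length")
    return sum(count * effect for count, effect in zip(allele_counts, allele_effects))


# Terms exactly as printed in the expression above (elided markers omitted).
printed_counts = [1, 1, 2, 0, 2, 0, 1, 1, 1, 1, 0, 2]
printed_effects = [-5, 5, 3, -3, 6, -6, -2, 2, 8, -8, -4, 2]
print(gebv(printed_counts, printed_effects))  # partial sum over printed terms: 22
```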
In this approach, breeding value of an individual can be predicted, no matter whether the individual and its relatives have records or not. Thus, in one aspect, the present invention relates to a method for determining the individual effect of a plurality of genetic marker alleles on udder health, fertility and/or other health of at least 100 reference bovine subjects and/or its relatives, said method comprising
a. providing at least 100 reference bovine subjects,
b. obtaining a sample from said one or more bovine subjects comprising genetic material,
c. determining on the basis of said genetic material the genotype of said one or more reference bovine subjects for said plurality of genetic markers,
d. determining the udder health, fertility and/or other health of said reference bovine subject, and
e. determining the individual effect of said plurality of genetic marker alleles on said udder health, fertility and/or other health of said reference bovine subject.
In a preferred embodiment, an estimated breeding value (EBV) is calculated on the basis of said udder health, fertility and/or other health. The effect of each individual genetic marker allele on udder health, fertility, and/or other health is for example determined by calculating a reference estimated breeding value (EBV) of said one or more reference bovine subjects, wherein said reference EBV is used as response variable for determining the effect of each individual genetic marker allele on udder health, fertility, other health and/or the EBV. Udder health, fertility and other health phenotypes are described in more detail herein below. The phenotype of the bovine subject is preferably determined by registration of phenotypic traits of said bovine subject and/or off-spring/progeny and/or other relatives of said bovine subject. Registration of phenotypic traits is described elsewhere herein. In a preferred embodiment, the phenotype is udder health, fertility, and/or a health index (other health) comprising reproductive diseases, digestive diseases, feet and leg diseases, see herein below; however, an udder health phenotype is particularly preferred, such as an udder health index comprising susceptibility to clinical mastitis as described herein below. The individual effect of a plurality of genetic marker alleles on an udder health, fertility and/or other health phenotype or phenotypic trait is preferably evaluated by registration of phenotypic traits of at least 5, such as at least 10, for example at least 20, such as at least 30, for example at least 40 offspring or other relatives of said bovine subject.
The accuracy of the estimated individual effect of each marker of the plurality of markers increases with the number of reference bovine subjects for which the phenotype, such as udder health, fertility and/or other health, has been correlated with the genotype. Thus, in a preferred embodiment, the individual effect of the plurality of genetic marker alleles on udder health, fertility and/or other health, and/or a corresponding breeding value are determined for a plurality of reference bovine subjects, such as at least 10, at least 20, at least 50, at least 80, for example at least 100 reference bovine subjects, such as at least 1000 bovine subjects, for example at least 2000 reference bovine subjects, such as at least 3000 bovine subjects, for example at least 4000 reference bovine subjects, such as at least 5000 bovine subjects, for example at least 6000 reference bovine subjects, such as at least 7000 bovine subjects, for example at least 8000 reference bovine subjects, such as at least 9000 bovine subjects, for example at least 10000 reference bovine subjects, such as at least 20000 bovine subjects, for example at least 30000 reference bovine subjects, such as at least 40000 bovine subjects, for example at least 50000 reference bovine subjects.
In one embodiment, the individual effects of the plurality of genetic marker alleles on udder health, fertility and/or other health and/or the corresponding breeding value are determined for between 1000 and 6000 bovine subjects, such as between 2000 and 5000 bovine subjects. In a preferred embodiment, the genetic marker effects are determined on the basis of between 1000 and 10000 bovine subjects, such as between 2000 and 6000 bovine subjects, for example between 2500 and 8000 bovine subjects, such as between 2000 and 7000 bovine subjects, such as between 3000 and 5000 bovine subjects, such as between 4000 and 4500 bovine subjects. The specific number of bovine subjects required for the most accurate determination of marker effects depends on the phenotype/phenotypic trait to be correlated with genotype, and the heritability of said phenotype/phenotypic trait. Moreover, the number of reference animals may depend on whether the phenotype is observed on a bull or a cow; for example, fewer bulls than cows are required for accurate estimation of genotype effects on udder health, because a bull's phenotype is an average of the registrations of more daughters and other relatives.
Genetic marker effects may be determined by any method available to those of skill in the art. In a preferred embodiment, the effect of each individual genetic marker allele on a phenotype such as udder health, fertility and/or other health or the estimated breeding value is determined by calculating a reference breeding value of the reference bovine subjects, wherein said reference breeding value is used as response variable for determining the effect of each individual genetic marker allele on udder health, fertility and/or other health and/or the reference estimated breeding value. Any statistical method or model available to the skilled person may be employed. For example, the reference breeding value is determined using Least-squares method, Bayesian estimation, such as BayesA or BayesB or modification thereof, or Best Linear Unbiased Prediction (BLUP), for example marker-assisted Best Linear Unbiased Prediction (MA-BLUP), preferably Bayesian estimation, such as BayesA or BayesB or modifications thereof.
The phenotypic registrations, such as udder health, fertility and/or other health phenotypes, and/or reference estimated breeding values are preferably stored in a database, for use in any method for predicting a breeding value of a bovine subject. In one embodiment, the reference breeding value and/or the individual effect of each genetic marker allele on said breeding value is stored in a non-volatile memory such as a computer memory and/or a database.
The present invention provides a methodology for determining an estimated breeding value based on the genotype of a bovine subject. Such a breeding value based on genomic information of the bovine subject is herein referred to as a genomic estimated breeding value or GEBV.
Thus, determination of a GEBV can be performed as soon as a sample comprising genetic material, preferably DNA and/or RNA, is obtained from the bovine subject, and a plurality of genetic markers have been genotyped. It is not required that the candidate bovine subject has phenotypic records or progeny phenotypic records.
As shown in Table 1, reliabilities of GEBV are much higher than reliabilities of the EBV estimated from parent average EBV, thus showing that genomic prediction of young candidates based on GEBV is more accurate than the conventional approach.
Thus, the present invention provides an improved methodology for determining a genomic estimated breeding value for a bovine subject, in the absence of phenotypic records of said bovine subject or its progeny and/or relatives. In one aspect, the invention relates to a method of determining a genomic estimated breeding value (GEBV) of a bovine subject based on the genotype of said bovine subject for a plurality of genetic markers, said method comprising
a. providing a bovine subject,
b. obtaining a sample from said bovine subject comprising genetic material,
c. determining on the basis of said genetic material the genotype of said bovine subject for said plurality of genetic markers,
d. determining said GEBV by correlating said genotype for said plurality of genetic markers with a predetermined effect of each individual genetic marker allele on udder health, fertility, other health and/or estimated breeding value of at least 100 reference bovine subjects, said effect being determined as defined elsewhere herein. In a preferred embodiment, udder health comprises resistance to clinical mastitis, for example, udder health is determined by an udder health index weighing together information from resistance to clinical mastitis in first, second and/or third parity, somatic cell count (SCC), dairy form, udder support/fore udder attachment, and/or udder depth.
In a preferred embodiment, the genomic estimated breeding value is determined in respect of a phenotype such as udder health, such as susceptibility to subclinical and/or clinical mastitis. In a specific embodiment, udder health is determined by an udder health index weighing together information from resistance to clinical mastitis in first, second and/or third parity, somatic cell count (SCC), dairy form, udder support/fore udder attachment, and/or udder depth. In another embodiment, the phenotype is fertility, or an other health index as disclosed herein.
The plurality of genetic marker alleles is preferably a plurality of single nucleotide polymorphisms (SNP), microsatellite markers and/or mixtures thereof. However, any suitable genetic marker may be employed in the analysis.
Several statistical models and algorithms are suitable for determining a genomic estimated breeding value based on a plurality of genetic markers and/or dense markers of the present invention. For example, Best Linear Unbiased Prediction (BLUP), BayesA and BayesB may all be used for analysis of marker effects, and for determining a genomic estimated breeding value. A linear BLUP approach assumes that the effects of all genetic markers (e.g. SNPs and/or microsatellites) are normally distributed with the same variance. BayesA and BayesB allow each marker to have its own variance of allele effects, and each variance is a sample from a scaled inverse Chi-square distribution. BayesB also models most SNPs as having zero effect, and a few as having moderate to large effects. To simplify the computing algorithm in BayesA and BayesB (especially the Metropolis-Hastings step in BayesB), alternative Bayesian approaches similar to BayesA and BayesB may be employed for prediction of a GEBV according to the present invention; such modifications of BayesA or BayesB being apparent to those of skill in the art. Some approaches model SNP effects as a product of a scaled effect and a scaling factor (which can be understood as the standard deviation of allele effects in a marker). Such approaches may assume that the prior distribution of scaling factors is either a normal distribution or a mixture of two normal distributions.
In one example, a Bayesian method, which captures the features of BayesA and BayesB but simplifies the computing algorithm, is used to estimate marker effects for genomic prediction, in a model such as:
where y is the EBV determined by the observed phenotype (e.g. udder health, fertility and/or other health), μ is the intercept, m is the number of SNP markers, qi is the vector of scaled SNP effects (scaled by standard deviation) of marker i with qi˜N(0,I), vi (vi>0) is a scaling factor (standard deviation) for SNP effects of marker i, and e is the vector of residual. The effects of SNP types of marker i are the products of vi and qi.
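A model form consistent with the symbol definitions above can be sketched as follows. The vector of ones $\mathbf{1}$ and the design matrices $\mathbf{X}_i$, which would link the SNP types of marker $i$ to the animals, are assumptions introduced here for illustration; they are not defined in the text.

```latex
\mathbf{y} = \mathbf{1}\mu + \sum_{i=1}^{m} \mathbf{X}_i \mathbf{q}_i v_i + \mathbf{e},
\qquad \mathbf{q}_i \sim N(\mathbf{0}, \mathbf{I}), \quad v_i > 0
```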
Scaling factors vi are in this approach assumed to have either a common prior distribution or a mixture prior distribution. A common prior distribution across the variances of chromosome segment effects, which leads to a slight or moderate differentiation between small and large effects of markers, is assumed to be a positive half-normal distribution,
$$v_i \sim TN(0, \sigma_v^2), \quad v_i > 0$$
A mixture prior distribution, which leads to a strong differentiation between small and large effects of markers, assumes that a proportion (π0, typically large) of markers have a very small effect, and a proportion (π1, typically small) of markers have a moderate or large effect. This is achieved by assuming that the prior distribution of vi is sampled from either a positive half-normal distribution with a small variance (σv0²) or a positive half-normal distribution with a large variance (σv1²)
$$v_i \sim \pi_0\, TN(0, \sigma_{v0}^2) + \pi_1\, TN(0, \sigma_{v1}^2)$$
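As an illustration of the mixture prior only, the following minimal sketch draws scaling factors from a two-component positive half-normal mixture. The function names, the random seed and the numerical values (mixing proportion and standard deviations) are hypothetical and chosen for demonstration; the sketch does not implement the estimation procedure itself.

```python
import random


def sample_half_normal(sigma, rng):
    """Draw v > 0 from a positive half-normal TN(0, sigma^2)."""
    v = abs(rng.gauss(0.0, sigma))
    while v == 0.0:  # exclude the boundary so that v > 0 strictly
        v = abs(rng.gauss(0.0, sigma))
    return v


def sample_scaling_factor(pi1, sigma_v0, sigma_v1, rng):
    """Draw v_i from pi0*TN(0, sigma_v0^2) + pi1*TN(0, sigma_v1^2), pi0 = 1 - pi1."""
    # With probability pi1 the marker belongs to the large-variance component.
    sigma = sigma_v1 if rng.random() < pi1 else sigma_v0
    return sample_half_normal(sigma, rng)


# Hypothetical settings: 5% of markers with moderate or large effects.
rng = random.Random(1)
draws = [sample_scaling_factor(0.05, 0.01, 1.0, rng) for _ in range(1000)]
```

Setting the mixing proportion of the large-variance component to zero reduces the sketch to a single positive half-normal distribution, i.e. the common prior case.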
The GEBV for individual k is in one embodiment defined as the sum of predicted effects of SNP over all markers
Preferably, the common prior model or the mixture prior model is used to estimate SNP effects for genomic prediction, most preferably the common prior model.
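The definition of the GEBV as a sum of predicted SNP effects over all markers can be sketched as below. This is a minimal illustration under stated assumptions: genotypes are coded as strings per marker, each marker carries one scaled effect per SNP type, and all names and numbers are hypothetical rather than estimates from real data.

```python
def snp_effects(q, v):
    """Effect of each SNP type of marker i is the product of v_i and q_i."""
    return [{snp_type: q_value * v_i for snp_type, q_value in q_i.items()}
            for q_i, v_i in zip(q, v)]


def gebv(genotype, effects):
    """Sum the predicted effect of the observed SNP type over all markers."""
    if len(genotype) != len(effects):
        raise ValueError("genotype and effects must cover the same markers")
    return sum(marker_effects[snp_type]
               for snp_type, marker_effects in zip(genotype, effects))


# Hypothetical scaled effects q_i and scaling factors v_i for two markers.
q = [{"AA": -1.0, "AB": 0.0, "BB": 1.0},
     {"AA": 0.5, "AB": 0.0, "BB": -0.5}]
v = [0.2, 0.4]
effects = snp_effects(q, v)

candidate = ["BB", "AA"]  # genotype of individual k at the two markers
value = gebv(candidate, effects)
```

Because every marker contributes to the sum, markers with small estimated effects are included alongside markers with large effects.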
In general, in the methods of the present invention, the genomic estimated breeding value based on genotyping of a bovine subject is calculated by simultaneous inclusion of both significant and non-significant genetic marker effects, and/or the genetic marker effects are fitted in the model (calculated) simultaneously. Suitable statistical tools or models for determining that genomic estimated breeding value comprise Least-squares method, Bayesian estimation, such as BayesA or BayesB or modification thereof, or Best Linear Unbiased Prediction (BLUP), for example marker-assisted Best Linear Unbiased Prediction (MA-BLUP), preferably Bayesian estimation, such as BayesA or BayesB or modifications thereof.
The present invention also relates to methods of predicting a breeding value, wherein a GEBV is combined with an EBV based on progeny records.
Thus, in one embodiment, the present invention relates to an index, which combines GEBV and parent average EBV and/or EBV based on other phenotypic registrations among progeny, offspring and/or other relatives. By combining the GEBV with an EBV determined on the basis of progeny records or parent average, the accuracy of the predicted breeding value is increased. The present invention incorporates any suitable method available for those of skill in the art for calculating an index of GEBV and a progeny/relative based EBV, or EBV based on parent average EBV. Several approaches are thus available for blending GEBV and conventional EBV.
In one specific embodiment, a simple index can be constructed as
I = b1·GEBV + b2·PA,
where I represents the index combining GEBV and PA EBV, PA is the parent average EBV, and the weights b1 and b2 are solved by correlation analysis.
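One way to obtain weights for such a two-term index is sketched below. The text states only that b1 and b2 are solved by correlation analysis; this sketch swaps in an ordinary least-squares fit without intercept against a hypothetical target value (for example a later progeny-based EBV), which is an assumption and not necessarily the procedure intended. All names and data are made up for illustration.

```python
def index_weights(gebv, pa, target):
    """Least-squares weights (b1, b2) for target ~ b1*GEBV + b2*PA (no intercept)."""
    s11 = sum(g * g for g in gebv)
    s22 = sum(p * p for p in pa)
    s12 = sum(g * p for g, p in zip(gebv, pa))
    t1 = sum(g * t for g, t in zip(gebv, target))
    t2 = sum(p * t for p, t in zip(pa, target))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("GEBV and PA are collinear; weights are not unique")
    # Solve the 2x2 normal equations by Cramer's rule.
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (t2 * s11 - t1 * s12) / det
    return b1, b2


def blended_index(b1, b2, gebv_k, pa_k):
    """I = b1*GEBV + b2*PA for one animal."""
    return b1 * gebv_k + b2 * pa_k


# Made-up data in which the target is an exact blend of the two sources.
gebv_values = [1.0, 2.0, 3.0, 4.0]
pa_values = [2.0, 1.0, 4.0, 3.0]
target_values = [0.7 * g + 0.3 * p for g, p in zip(gebv_values, pa_values)]
b1, b2 = index_weights(gebv_values, pa_values, target_values)
```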
In another embodiment, a bivariate model is used to fit GEBV and progeny based EBV or PA EBV data, where GEBV and progeny based EBV (or PA) of both reference animal and candidates are used as data information, and relationship between animals are integrated into the analysis.
In a further embodiment, the index of GEBV and a progeny based EBV/PA EBV is predicted by a one-step approach, wherein the model is used to estimate GEBV and extra contribution from relatives.
The method of the present invention for estimating a breeding value based on genotyping of a plurality of genetic markers of a bovine subject, thus comprises combining the genomic estimated breeding value (GEBV) with a breeding value estimated on the basis of an observed phenotype of said bovine subject and/or its offspring or other relatives.
Importantly, genomic prediction, i.e. determination of a GEBV, according to the present invention, can be used to increase reliability in determination of breeding values. Reliability is a statistical measure, which describes the extent to which a test is dependable, stable, and consistent. For example, reliability may be calculated as: Reliability = 1 − Se²/Vg, where Se is the posterior standard deviation of GEBV and Vg is the genomic variance estimated from data, see examples herein below.
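The reliability formula above translates directly into a short function. The sketch below is illustrative only; the function name and the input values are hypothetical.

```python
def reliability(se, vg):
    """Reliability = 1 - Se^2 / Vg.

    se: posterior standard deviation of the GEBV
    vg: genomic variance estimated from data (must be positive)
    """
    if vg <= 0:
        raise ValueError("genomic variance Vg must be positive")
    return 1.0 - (se ** 2) / vg


# Hypothetical values: posterior SD of 0.5 and genomic variance of 1.0.
r = reliability(0.5, 1.0)
```

A smaller posterior standard deviation relative to the genomic variance gives a reliability closer to 1.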
The present invention can be used to estimate breeding values with high reliability, and according to the present invention breeding values can be determined with reliabilities of at least 0.5, such as at least 0.6, for example at least 0.7, for example at least 0.8, such as at least 0.9, for example at least 0.99, such as 1. Thus, the present invention comprises genotyping of at least one genetic marker locus, such as multiplicity of genetic marker loci to obtain an estimated breeding value with a reliability according to the present invention of at least 0.5. In fact, the genotyping of multiple genetic marker loci and/or determination of a GEBV according to the present invention significantly increases the reliability of breeding values determined according to the present invention.
Genomic selection or selection of bovine subjects for breeding based on GEBV may for example be used for selection of cows and bulls for breeding purposes, as described below.
The term “bovine subject” refers to cattle of any breed and is meant to include both cows and bulls, whether adult or newborn animals, and sires as well as dams. A bovine subject of the present invention comprises animals with phenotypic registrations as well as animals without phenotypic records, including newborn calves. No particular age of the animals is denoted by this term. One example of a bovine subject is a member of the Holstein breed. In one embodiment, the bovine subject is a member of the Holstein-Friesian cattle population. In another embodiment, the bovine subject is a member of the Danish Holstein cattle population. In yet another embodiment, the bovine subject is a member of the Swedish Holstein cattle population. In another embodiment, the bovine subject is a member of the Holstein Swartbont cattle population. In another embodiment, the bovine subject is a member of the Deutsche Holstein Schwarzbunt cattle population. In another embodiment, the bovine subject is a member of the US Holstein cattle population. In one embodiment, the bovine subject is a member of the Red and White Holstein breed. In one embodiment, the bovine subject is a member of any family, which includes members of the Holstein breed. In one embodiment the bovine subject is a member of the Danish Red population. In another embodiment the bovine subject is a member of the Finnish Ayrshire population. In yet another embodiment the bovine subject is a member of the Swedish Red population. In another embodiment, the bovine subject is a member of the Swedish Red and White population. In yet another embodiment, the bovine subject is a member of the Nordic Red population.
In a preferred embodiment, the bovine subject of the present invention is a member of the Holstein breed, such as a member of the Holstein-Friesian cattle population. In a most preferred embodiment, the bovine subject of the present invention is a member of the Danish Holstein cattle population. However, in another embodiment, the bovine subject is a member of the Swedish Holstein cattle population. Moreover, it is understood that the methods and kits of the present invention are applicable to bovine subjects in general and thus also apply to any bovine subject, which is more or less related to the Danish Holstein cattle population.
In one embodiment of the present invention, the bovine subject is selected from the group consisting of Swedish Red and White, Danish Red, Finnish Ayrshire, Holstein-Friesian, Danish Holstein and Nordic Red. In another embodiment of the present invention, the bovine subject is selected from the group consisting of Finnish Ayrshire and Swedish Red cattle.
In a preferred embodiment, the bovine subject is a member of the Danish Holstein cattle population. In another preferred embodiment, the bovine subject is a member of the Swedish Holstein cattle population.
In one embodiment, the bovine subject is selected from the group of breeds shown in table 1a
In one embodiment, the bovine subject is a member of a breed selected from the group of breeds shown in table 1b
In one embodiment, the bovine subject is a member of a breed selected from the group of breeds shown in table 1c
The methods and products (kits, computer systems etc.) of the present invention relate to the determination of a genomic estimated breeding value based on the genotype of a plurality of genetic markers, wherein the genotype is correlated with a predetermined effect of said genotype on a phenotype (e.g. udder health, fertility and/or other health) and/or the corresponding estimated breeding value. Marker effects and/or an estimated breeding value may be determined for any phenotype or phenotypic trait.
The genetic marker allele, plurality of genetic marker alleles, and/or combination of genetic marker alleles of the present invention may be used to determine a number of phenotypic traits and/or the breeding value for a number of phenotypes or phenotypic traits of a bovine subject. A number of specific traits may be associated with a specific phenotype, and a phenotypic trait may also comprise a number of secondary or subordinate traits. Therefore, a phenotype may also be an overall index, which incorporates one or more of said phenotypic traits and/or subordinate/secondary traits. The phenotypes or phenotypic traits of the present invention may be grouped into reproduction traits, production traits, milk traits, meat traits, health traits and exterior traits. A list of secondary phenotypic traits associated with each of those traits is provided in the tables below.
The methods and products (kits, computer systems etc.) of the present invention relate to the determination of a genomic estimated breeding value based on the genotype of a plurality of genetic markers, wherein the genotype is correlated with a predetermined effect of said genotype on the phenotype or estimated breeding value. A breeding value may be determined for any phenotype or phenotypic trait.
Thus, in one embodiment a genetic marker allele, genotype of a plurality of genetic markers and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value associated with reproduction/fertility, wherein said phenotypic trait is selected from any of the phenotypic traits listed above. I.e. the phenotypic trait is in one embodiment selected from the group consisting of Conception Rate [CONCRATE], Calving Ease [CALEASE], Calves Born Alive, % [BORNLIVE], Pregnancy Rate [PREGRATE], Twinning [TWIN], Ovulation Rate [OVR] (N/A), Nonreturn Rate [NONR] (N/A), Still Birth [SB] (N/A), Heat Intensity [HTINT] (N/A), Age At Puberty [PUBAGE], Structurally Soundness (Legs, Feet, Penis And Prepuce) [SOUND], FSH At Castration [FSHC] (N/A), Paired Testis Weight [PTW] (N/A), Paired Testis Volume [PTV] (N/A), Body Weight @ Castration [BWC] (N/A), Teat Placement [TPL] (N/A), Teat Length [TLGTH] (N/A), Quality Of Udder [UQUAL] (N/A), Udder Depth [UDPTH] (N/A), Foot Angle [FANG] (N/A), Somatic Cell Count [SCC] (N/A), Udder Cleft [UC] (N/A), Body Form Composite Index [BFCI] (N/A), Udder Attachment [UA] (N/A), Stature [STA] (N/A), Strength [STR] (N/A), Bone Quality [BQ] (N/A), Dairy Form [DYF] (N/A), Udder Height [UHT] (N/A), Udder Composite Index [UCI] (N/A), Udder Width [UWDT] (N/A), and Daughter Pregnancy Rate [DPR] (N/A).
In another embodiment, the plurality of genetic marker alleles, genetic marker allele and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value associated with production, wherein said phenotypic trait is selected from the group consisting of Retail Product Yield [YEILD], Pre-Weaning Average Daily Gain [PWADG], Post-Weaning Average Daily Gain [POADG], Mean Body Weight [BWM], Lean To Fat Ratio [L2FRATIO], Yearling Weight [W365], Weaning Weight [WWT], Birth Weight [BW], Average Daily Gain [ADG], Slaughter Weight [SWT] (N/A), Temperament [TMP] (N/A), Withers Height [WHT] (N/A), Hip Height [HIPHT] (N/A), Body Length [BL] (N/A), Chest Width [CHWDT] (N/A), Shoulder Width [SHOWDT] (N/A), Chest Depth [CHDPT] (N/A), Hip Width [HIPWDT] (N/A), Lumbar Width [LUMWDT] (N/A), Thurl Width [THUWDT] (N/A), Pin Bone Width [PINWDT] (N/A), Rump Length [RUMLGT] (N/A), Cannon Circumference [CANCIR] (N/A), Chest Girth [CHEGRT] (N/A), Abdominal Width [ABDWDT] (N/A), Abdominal Growth [ABDGRT] (N/A), Body Depth [BD] (N/A), Rump Angle [RANG] (N/A), Rump Width [RUMWID] (N/A), Heel Depth [Hdpth] (N/A), Veterinary Treatments [VT], Length Of Productive Life [PL], Persistency [Per] (N/A), PTA Type [PTAT], and Behavior [BEH] (N/A).
In another embodiment, the plurality of genetic marker alleles, genetic marker allele and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value associated with milk, wherein said phenotypic trait is selected from the group consisting of Milk Yield [MY], Milking Speed [MSPD] (N/A), Dairy Capacity Composite Index [DCCI] (N/A), Protein Yield [PY], Protein Percentage [PP], Energy Yield [EY] (N/A), Protein Content [PC] (N/A), Fat Percentage [FP], Fat Content [FC], and Fat Yield [FY].
In yet another embodiment, the plurality of genetic marker alleles, genetic marker allele and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value associated with meat quality, wherein said phenotypic trait is selected from the group consisting of Carcase Dressing [DRESSING], % USDA Choice [PERCHOICE], Tenderness Score [TENDER], Ribeye Area [REA], Marbling Score [MARBL], Fat Thickness [FATTH], Longissimus Muscle Area [LMA] (N/A), Adjusted Fat [FATADJ] (N/A), Ether Extractable Fat [EEF] (N/A), Stearic Acid [STA] (N/A), Oleic Acid [OLA] (N/A), % Unsaturated Fatty Acids [PERUFA] (N/A), Percent Unsaturated To Saturated Fatty Acids [RUSFA] (N/A), Fat Trim Yield [FATYD] (N/A), Rib Fat [RIBF] (N/A), Protein % [PP] (N/A), Carcase Weight [CWT], and KPH Fat/CWT Ratio [KPHCWT] (N/A).
In another embodiment, the plurality of genetic marker alleles, genetic marker allele and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value associated with health/fitness/udder health, wherein said phenotypic trait is selected from the group consisting of Somatic Cell Score [SCS], Somatic Cell Count [SCC] and Clinical Mastitis [CM].
In another embodiment, the plurality of genetic marker alleles, genetic marker allele and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value associated with exterior traits, such as Pigmentation and conformation, wherein said phenotypic trait for example is Degree Of Spotting [DS].
The plurality of genetic marker alleles, genetic marker allele and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value of a phenotypic trait, such as a yield trait, a fitness trait, and/or a conformation trait. The phenotypic trait is in one example selected from the group consisting of Birth ease, Body score, Calving ease, Fat, Fat percent, Fertility, Health, Leg, Longevity, Milk, Milk organ, Milk speed, Protein, Prot. percent, Temperament, Udder health, Yield, Average, and Other diseases.
In a preferred embodiment, the plurality of genetic marker alleles, genetic marker allele and/or combination of genetic marker alleles of the present invention may be used for determining in a bovine subject a phenotype, phenotypic trait and/or an estimated breeding value of a phenotypic trait associated with mastitis, fertility/reproduction, calving and/or other diseases.
Genomic prediction of a breeding value according to the present invention is particularly suitable for determination of genetic determinants or estimation of a breeding value for complex genetic traits and phenotypes, and for traits for which the genetic correlations are relatively weak, such as for example fertility and mastitis.
Fertility, or the fertility phenotype, of a bovine subject is affected by a number of traits. The terms “fertility trait”, “phenotypic trait associated with fertility” and “fertility phenotype” as used herein refer to any trait or phenotype which affects fertility in a bovine subject or its offspring or other relatives. In particular, fertility of a bovine subject in the context of the present application may be physically manifested by the fertility of its offspring or other relatives, both female and male. Thus, the fertility of a bull may be measured by a specific fertility trait in its female offspring or other relatives and/or the female offspring or other relatives of its offspring or other relatives. The calving traits may, thus, be assessed both as a “direct” effect (D) of the sire in the calf and as a “maternal” effect (M) of the sire in the mother of the calf.
The breeding value of the present invention is determined for any trait, which affects fertility and/or is associated with fertility. In one embodiment, the breeding value is determined for an index of fertility, which incorporates one or more of the fertility traits described herein below. Specifically, the present invention and/or breeding values determined according to the present invention relates to traits such as those listed below:
Number of inseminations cows (AISC)
Number of inseminations heifers (AISH)
Fertility treatment 1st lactation (FERT1)
Fertility treatment 2nd lactation (FERT2)
Fertility treatments 3rd lactation (FERT3)
Heat strength cows (HSTC)
Heat strength heifers (HSTH)
Calving to first insemination (ICF)
First to last insemination cows (IFLC)
First to last insemination heifers (IFLH)
56 day Non-return rate cows (NRRC)
56 day Non-return rate heifers (NRRH)
AISC, ICF, IFLC, NRRC and HSTC in different parities are taken as the same traits. Fertility treatments in different parities are treated as different traits.
The individual fertility traits are described in more detail below.
The number of inseminations (AIS) is based on how many inseminations a cow or heifer needs in order to get pregnant; it describes the cow's or heifer's ability to get pregnant after a number of inseminations (defined as pregnancy rate) and it also describes heat strength. In order to inseminate a cow at the correct time point it must show heat, and that is why this trait also reflects heat strength.
Cows (C) and heifers (H) are considered separately as AISC or AISH.
All inseminations are recorded by an artificial insemination technician (AI-technician) or a licensed farmer. This is later recorded in the national recording database and in this case recalculated into a breeding value for every sire.
Fertility Treatments 1st, 2nd and 3rd Lactation (FERT1, FERT2, FERT3)
Fertility treatments are divided into three groups. Group 1 represents hormonal reproductive disorders and consists of ovarian cyst treatments. Group 2 represents infective reproductive disorders and consists of recordings of endometritis, metritis and vaginitis treatments. The last group consists of treatments for abortion, uterine prolapse, uterine torsion and other reproductive disorders. A disorder code is 1 if the cow has the corresponding disease and 0 otherwise. The three lactations are considered as different traits. The trait may be recalculated into a breeding value for every sire.
In Denmark fertility treatments are recorded by veterinarians and AI-technicians. Thus the traits FERT1, FERT2 and FERT3 describe fertility treatments for first to third lactation, respectively.
HST measures the ability to show oestrus. The trait is measured subjectively by the individual farmer on a predefined relative scale from 1 to 5.
Cows and heifers are considered separately as HSTC or HSTH.
This may be recalculated into a breeding value for every sire.
Calving to first insemination (ICF) is only described for cows and reflects heat strength and the ability to return to cycling after calving. In order to inseminate a cow it must return to cycling after calving. The recording unit is days. ICF may be recalculated into a breeding value for every sire.
First to last insemination (IFL) is measured as the time from the first insemination to the last insemination. The recording unit is days. IFL describes pregnancy rate and heat strength as defined above. Cows and heifers are considered separately as IFLC or IFLH.
This is later recorded in the national recording database and in this case recalculated into a breeding value for every sire.
NRR is based on whether the cow or heifer had a second insemination within 56 days after the first insemination. All cows and heifers not offered AI within 56 days were considered pregnant. NRR describes the cows' or heifers' ability to become pregnant after insemination, defined as pregnancy rate. The recording unit is days.
It is recorded by an artificial insemination technician (AI-technician) or a licensed farmer.
Cows and heifers are considered separately as NRRC or NRRH, respectively.
This is later recorded in the national recording database and in this case recalculated into a breeding value for every sire.
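The insemination-based sub-traits described above can be derived from dated records, as in the following minimal sketch. The record layout and function name are hypothetical. Coding NRR as a 0/1 indicator (1 meaning no repeat insemination within 56 days of the first) is an assumption made for illustration, and ICF is left undefined for heifers because they have no calving date.

```python
from datetime import date


def fertility_subtraits(inseminations, calving=None, nrr_window_days=56):
    """Derive AIS, ICF, IFL and a 56-day non-return indicator for one animal.

    inseminations: list of datetime.date objects within one service period
    calving: calving date preceding the service period, or None for heifers
    """
    ins = sorted(inseminations)
    if not ins:
        raise ValueError("at least one insemination is required")
    first, last = ins[0], ins[-1]
    # Any later insemination inside the window counts as a return to service.
    returned = any((d - first).days <= nrr_window_days for d in ins[1:])
    return {
        "AIS": len(ins),                       # number of inseminations
        "ICF": (first - calving).days if calving is not None else None,
        "IFL": (last - first).days,            # first to last insemination, days
        "NRR56": 0 if returned else 1,         # 1 = considered pregnant
    }


# Hypothetical cow: calved 1 January 2009, inseminated three times.
result = fertility_subtraits(
    [date(2009, 3, 2), date(2009, 3, 23), date(2009, 5, 10)],
    calving=date(2009, 1, 1),
)
```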
The genetic markers according to the present invention and/or fertility or breeding value determined according to the present invention are associated with at least one trait associated with fertility. In one embodiment, the trait associated with fertility is selected from the group consisting of Number of inseminations cows (AISC), Number of inseminations heifers (AISH), Fertility treatment 1st lactation (FERT1), Fertility treatment 2nd lactation (FERT2), Fertility treatments 3rd lactation (FERT3), Heat strength cows (HSTC), Heat strength heifers (HSTH), Calving to first insemination (ICF), First to last insemination cows (IFLC), First to last insemination heifers (IFLH), 56 day Non-return rate cows (NRRC) and 56 day Non-return rate heifers (NRRH).
In a specific embodiment, the trait associated with fertility is Number of inseminations cows (AISC). In another specific embodiment, the trait associated with fertility is Number of inseminations heifers (AISH). In yet another specific embodiment, the trait associated with fertility is Fertility treatment 1st lactation (FERT1). In a further specific embodiment, the trait associated with fertility is Fertility treatment 2nd lactation (FERT2). In another specific embodiment, the trait associated with fertility is Fertility treatments 3rd lactation (FERT3). In another specific embodiment, the trait associated with fertility is Heat strength cows (HSTC). In yet another specific embodiment, the trait associated with fertility is Heat strength heifers (HSTH). In another specific embodiment, the trait associated with fertility is Calving to first insemination (ICF). In another specific embodiment, the trait associated with fertility is First to last insemination cows (IFLC). In another specific embodiment, the trait associated with fertility is First to last insemination heifers (IFLH). In another specific embodiment, the trait associated with fertility is 56 day Non-return rate cows (NRRC) and/or 56 day Non-return rate heifers (NRRH).
The fertility of a bovine subject as determined by the presence or absence of a genetic marker or genetic marker allele as defined by the present invention is estimated relative to the fertility of a bovine subject in which said genetic marker is absent from or present at the same locus, respectively. Thus, for a bovine subject in which the presence of a genetic marker allele is associated with reduced fertility, the reduction is estimated relative to a bovine subject in which said genetic marker is absent from the same genetic locus. Conversely, for a bovine subject in which the absence of a genetic marker is associated with reduced fertility, the reduction is estimated relative to a bovine subject in which said genetic marker is present at the same genetic locus.
The data of sub-traits of female fertility are preferably collected by A.I. technicians. Fertility treatments are carried out by veterinarians and/or by AI technicians. The data are preferably transferred to a Central Cattle Database, such as a computer readable medium or other physical or non-physical entities as described elsewhere herein.
The heat strength is in one embodiment recorded in Sweden. The judgement is preferably done by the farmer and is based on how clearly the cow shows oestrus signs. The heat strength is recorded as ordinal category codes. The data are reported to the milk recording and AI schemes in Sweden.
In a preferred embodiment, fertility is determined by a fertility index comprising one or more fertility phenotypic traits selected from the group consisting of
In one embodiment, the sub-traits are divided into 3 groups, and breeding values of each trait are estimated using a multi-trait BLUP sire model for each group of traits, respectively. Group 1: NRRh, IFLh, NRRc, ICF, and IFLc. Group 2: AISh, HSTh, AISc, HSTc, and ICF. Group 3: FTR1, FTR2, and FTR3.
The model for group 1 and group 2 is:
Y=Herd_Year+Month of first insemination+Parity_Age at first insemination+Proportion of breed+Proportion of heterozygosity+Sire+Residual,
where Herd_Year, Month of first insemination and Parity_Age at first insemination are fixed effects, Proportion of breed and Proportion of heterozygosity are fixed regressions, and Sire and Residual are random effects. For ICF, Month is the month of calving.
Model for group 3 is:
Y=Herd_5Year+Month of calving+Parity_Age at calving+Proportion of breed+Proportion of heterozygosity+Sire+Residual,
where Herd_5Year, Month of calving and Parity_Age at calving are fixed effects, Proportion of breed and Proportion of heterozygosity are fixed regressions, and Sire and Residual are random effects.
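By way of illustration only, the structure of a sire model of the kind set out above may be sketched in a much simplified, single-trait form, y = Xb + Zs + e, solved through Henderson's mixed model equations. The sketch assumes unrelated sires (an identity covariance instead of a pedigree-based relationship matrix), a known ratio `lam` of residual variance to sire variance, and hypothetical toy data; the function name `solve_sire_model` is likewise hypothetical, and the sketch is not the multi-trait model of the embodiment.

```python
import numpy as np


def solve_sire_model(y, X, Z, lam):
    """Solve Henderson's mixed model equations for y = X b + Z s + e.

    lam is the assumed ratio of residual variance to sire variance.
    Sires are treated as unrelated (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n_fixed = X.shape[1]
    n_sires = Z.shape[1]
    # Left-hand side: the random (sire) block is augmented by lam on its diagonal
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.eye(n_sires)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    return solution[:n_fixed], solution[n_fixed:]


# Hypothetical toy data: four daughter records, one fixed effect (overall mean), two sires
y = [1.0, 1.0, 3.0, 3.0]
X = [[1.0], [1.0], [1.0], [1.0]]
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
fixed_effects, sire_effects = solve_sire_model(y, X, Z, lam=2.0)
```

In this toy case each sire's daughter average deviates by one unit from the overall mean, and the predicted sire effects are shrunk towards zero by the factor n/(n+lam), where n is the number of daughters per sire.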
In a preferred embodiment, a fertility index of the present invention is calculated as the weighted sum of the sub-trait EBVs, weighted by their economic weights.
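The weighted sum described above may, for example, be sketched as follows. The EBV values, the economic weights and the function name `weighted_index` are hypothetical and serve only to illustrate the arithmetic.

```python
def weighted_index(ebv_by_trait, weight_by_trait):
    """Return the sum of sub-trait EBVs, each multiplied by its economic weight."""
    missing = set(ebv_by_trait) - set(weight_by_trait)
    if missing:
        raise KeyError(f"no economic weight for: {sorted(missing)}")
    return sum(ebv * weight_by_trait[trait] for trait, ebv in ebv_by_trait.items())


# Hypothetical EBVs and economic weights, for illustration only
ebvs = {"ICF": 2.0, "IFLc": -1.0, "NRRc": 0.5}
weights = {"ICF": 0.5, "IFLc": 0.25, "NRRc": 1.0}
fertility_index = weighted_index(ebvs, weights)
```

The same form of calculation applies to any index herein that is defined as a weighted sum of sub-trait EBVs or GEBVs.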
Mastitis influences the udder health status of a bovine subject, and is affected by a number of traits. Traits that are associated with mastitis or udder health according to the present invention are for example the occurrence of clinical mastitis, somatic cell count (SCC), somatic cell score (SCS), and udder conformation. Herein the term SCC is identical to the term CELL. Somatic cell score (SCS) is defined as the mean of log10-transformed somatic cell count values (in 10,000/mL) obtained from the milk recording scheme. The mean is taken over the period from 10 to 180 days after calving. Traits associated with mastitis or udder health according to the present invention include traits which affect udder health in the bovine subject or its offspring or other relatives. Thus, udder health and associated traits, such as mastitis traits of a bull, are physically manifested by its female offspring or other female relatives.
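As a minimal sketch of the SCS definition above, the following computes the mean of log10-transformed test-day values inside a window of days after calving. It assumes that the somatic cell count values are already expressed in units of 10,000 cells/mL; the record layout and the function name `somatic_cell_score` are hypothetical.

```python
import math


def somatic_cell_score(test_day_records, first_day=10, last_day=180):
    """Mean of log10(SCC) over test days within [first_day, last_day] after calving.

    test_day_records: iterable of (days_after_calving, scc) pairs, where scc is
    assumed to be expressed in units of 10,000 cells/mL.
    Returns None when no record falls inside the window.
    """
    logs = [math.log10(scc) for day, scc in test_day_records
            if first_day <= day <= last_day and scc > 0]
    if not logs:
        return None
    return sum(logs) / len(logs)


# Hypothetical test-day records; days 5 and 200 fall outside the window
records = [(5, 50.0), (30, 10.0), (90, 100.0), (200, 1000.0)]
scs = somatic_cell_score(records)
```

The mean of the log10-transformed values equals the log10 of the geometric mean of the untransformed test-day counts.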
In one embodiment of the present invention, udder health is reflected by the quantitative traits Mas1, Mas2 (CM1), Mas3 (CM2), Mas4 (CM3), SCC, SCS and/or udder health index. The quantitative traits are for example defined by the following parameters:
Mas1: Treated cases of clinical mastitis in the period −5 to 50 days after 1st calving.
Mas2 (also designated CM1): Treated cases of clinical mastitis in the period −5 to 305 days after 1st calving.
Mas3 (also designated CM2): Treated cases of clinical mastitis in the period −5 to 305 days after 2nd calving.
Mas4 (also designated CM3): Treated cases of clinical mastitis in the period −5 to 305 days after 3rd or later calving.
SCS: Mean SCS in period 5-180 days after 1st calving.
In a preferred embodiment, the genomic estimated breeding value (GEBV) according to the present invention is determined for udder health, wherein udder health is determined by an udder health index. In a preferred embodiment, the udder health index includes the following 4 sub-traits:
1) Mastitis during the period from 10 days before calving to 50 days after calving in first parity.
2) Mastitis during the period from 10 days before calving to 305 days after calving in first parity.
3) Mastitis during the period from 10 days before calving to 100 days after calving in second parity.
4) Mastitis during the period from 10 days before calving to 100 days after calving in third parity.
In addition, further parameters may be included in the udder health index to improve the accuracy or reliability of the udder health assessment or GEBV, for example the following 4 type traits may be used as information-traits (secondary traits) to improve estimated breeding value (EBV) of the udder health index based on the mastitis-traits mentioned above:
1) Somatic cell count during the period from 10 days to 180 days after calving in first parity.
2) Dairy form measured in first parity.
3) Udder support (fore udder attachment) measured in first parity.
4) Udder depth measured in first parity.
Thus, in one embodiment, the udder health index of the present invention is an index weighing together information from Mas1-Mas4, SCC, fore udder attachment, udder depth, and udder band, as defined herein.
In a preferred embodiment, the individual effect of a plurality of genetic marker alleles on udder health and/or estimated breeding value, and/or the estimation of a genomic breeding value by correlation of a genotype with a predetermined effect on said phenotype, is determined with respect to udder health, such as an udder health index. The udder health index preferably comprises clinical mastitis, such as susceptibility to clinical mastitis, or resistance to clinical mastitis. Moreover, the EBV or GEBV of udder health determined according to the present invention is in a preferred embodiment calculated as an index including the four mastitis-traits, e.g. the weighted sum of the EBV or GEBV of the four mastitis-traits, weighted by their economic importance.
Clinical mastitis is preferably diagnosed by veterinarians. The data is preferably binary data, which define presence or absence of clinical mastitis. Registrations are for example transferred from the veterinarians' computer to a Central Cattle Database, and/or a computer readable medium or memory.
Analysis of somatic cell count is carried out at central laboratories. The information is preferably automatically transferred to the Central Cattle Database. In the prediction of breeding value, test-day somatic cell counts are preferably generalized into a geometric mean.
For the type traits (e.g., Dairy form, Udder support, and Udder depth), cows in first lactation are preferably classified according to a linear scale (ordinal categorical scores) by the classifiers of the individual breed. The scores of a daughter group classification are preferably entered into the Central Cattle Database and/or a computer readable medium or memory by means of a portable terminal.
The estimated breeding value (EBV) or GEBV of the sub-traits is for example estimated using a multi-trait best linear unbiased prediction (BLUP) sire model including the 4 mastitis traits and the 4 information-traits. The model is:
Y=Herd_Year_Season+Year_Month+Calving age (only first parity)+Proportion of breed+Proportion of heterozygosity+Sire+Residual,
where Herd_Year_Season, Year_Month and Calving age are fixed effects, Proportion of breed and Proportion of heterozygosity are fixed regressions, and Sire and Residual are random effects.
The EBV or GEBV of udder health is in a preferred embodiment calculated as an index including the four mastitis-traits, i.e., the weighted sum of the EBV or GEBV of the four mastitis-traits, weighted by their economic importance.
In one embodiment of the present invention, a method, product, kit and/or breeding value described herein relates to udder health index. In another embodiment of the present invention, the method, product, kit and/or breeding value described herein relates to clinical mastitis. In another embodiment, the method, product, kit and/or breeding value of the present invention pertains to sub-clinical mastitis, such as detected by somatic cell score. In yet another embodiment, the method, product, kit and/or breeding value of the present invention primarily relates to clinical mastitis in combination with sub-clinical mastitis such as detected by somatic cell counts.
Registrations from daughters of bulls may be examined and used in establishing a relation between the observable incidents of mastitis and potential genetic determinants of udder health in a bovine subject.
Calving in a bovine subject is affected by a number of traits. Traits that are associated with calving according to the present invention are for example the occurrence of stillbirth (SB), calving difficulty (CD) and the size of the calf at birth (CS). The traits are assessed by a direct effect (D) of the sire in the calf. However, the traits are also assessed as a maternal effect (M) of the sire in the mother of the calf. By the term calving characteristics is meant traits which affect calving in the bovine subject or its offspring or other relatives. Thus, calving traits of a bull are physically manifested by its offspring or other relatives—both female and male. In the present invention calving characteristics comprise the traits SB, CD, and CS, which refer to the following characteristics:
SB: Designates stillbirths.
CS: Size of calves.
CD: Calving difficulties, which are based on registrations from the farmers where it is subjectively registered how difficult the calving is. The calving difficulties consist of four categories:
In one embodiment of the present invention, the method, genetic marker allele and/or combination of genetic marker alleles and kit described herein relate to stillbirths, calving difficulties as categorized herein and/or calf size. In one embodiment of the present invention, the method, genetic marker allele and/or combination of genetic marker alleles and kit described herein relate to stillbirths. In another embodiment, the method, genetic marker alleles and/or combinations of genetic marker alleles and kit of the present invention pertain to calving difficulties, such as detected by the calving difficulty categories described above. In yet another embodiment, the method, genetic marker alleles and/or combinations of genetic marker alleles and kit of the present invention relate to calf size. In another embodiment of the present invention, the method, genetic marker alleles and/or combinations of genetic marker alleles and kit described herein relate to any combination of stillbirth, calving difficulties and/or calf size.
In one preferred embodiment, the methods, kits, genetic marker alleles and/or combinations of genetic marker alleles/haplotypes of the present invention relate to determining a phenotype or GEBV for an other-health phenotype.
The other-health phenotype is preferably determined by an “other health index”, which includes 9 sub-traits (3 types×3 parities). These sub-traits are: 1) reproductive diseases, 2) digestive diseases, 3) feet and leg diseases in the period 10 days before calving to 100 days after calving in first, second and third parity.
1) Reproductive diseases include abortion, endometritis, uterine prolapse, uterine torsion, endometritis treatment, follicular cysts, retained placenta, caesarian section, vaginitis and other reproductive diseases.
2) Digestive diseases include diarrhoea, traumatic reticuloperitonitis, indigestion, hypomagnesemia, ketosis, milk fever, abomasal displacement, abomasal indigestion, rumen acidosis, enteritis, bloat and other digestive and metabolic diseases.
3) Feet and leg diseases include heel erosion, interdigital dermatitis, claw trimming by veterinarian, interdigital necrobacillosis, interdigital skin hyperplasia, laminitis, arthritis, sole ulcer, pressure injuries, tenosynovitis of hoofs and other leg diseases.
Mastitis in first parity is in one embodiment used as an information trait to improve the accuracy of the observed phenotype, EBV, and/or GEBV of the above disease-traits.
Digestive diseases and feet and leg diseases are preferably diagnosed by veterinarians. Reproduction disorders are preferably detected by veterinarians and by AI technicians. The data (binary data) are preferably transferred to a Central Cattle Database. In genetic evaluation, all codes for each type of disease within a lactation are pooled. If the sum is greater than or equal to 1, the cow is considered sick (=1). Otherwise she is considered healthy (=0).
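The pooling rule described above may be sketched as follows. The record layout, the disease-type keys and the function name `disease_status` are hypothetical and are used for illustration only.

```python
def disease_status(codes_within_lactation):
    """Return 1 (sick) if the pooled binary codes sum to at least 1, else 0 (healthy)."""
    return 1 if sum(codes_within_lactation) >= 1 else 0


# Hypothetical records: disease type -> binary codes registered within one lactation
lactation_records = {
    "reproductive": [0, 0, 0],
    "digestive": [0, 1, 1],
    "feet_and_legs": [1],
}
status = {disease: disease_status(codes)
          for disease, codes in lactation_records.items()}
```

Each disease type thus yields a single binary observation per lactation, irrespective of how many individual diagnoses were registered.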
Breeding value for each sub-trait is in one embodiment predicted using a multi-trait BLUP sire model including the 9 sub-traits and mastitis in first parity. For example, the model is:
Y=Herd_Year_Season+Year_Month+Calving age (only first parity)+Proportion of breed+Proportion of heterozygosity+Sire+Residual,
where Herd_Year_Season, Year_Month and Calving age are fixed effects, Proportion of breed and Proportion of heterozygosity are fixed regressions, and Sire and Residual are random effects. The index of other health is calculated as the weighted sum of the EBVs of the 9 sub-traits, weighted by their economic importance.
The term “genetic marker” refers to a variable nucleotide sequence (polymorphism) of the DNA on the bovine chromosome. The variable nucleotide sequence can be identified by methods known to a person skilled in the art, as explained elsewhere herein, for example by using specific oligonucleotides in for example amplification methods and/or hybridization techniques and/or observation of a size difference. However, the variable nucleotide sequence may also be detected by sequencing or for example restriction fragment length polymorphism analysis. In a preferred embodiment, the genetic marker allele is detected by gene chip technology or microarrays. The variable nucleotide sequence may be represented by a deletion, an insertion, repeats, and/or a point mutation. Thus, a genetic marker locus comprises a variable number of polymorphic alleles. Thus, the term “genetic marker allele” as used herein, refers to a specific such allele, i.e. a specific nucleic acid sequence.
In one embodiment, the genetic marker of the present invention is a quantitative trait locus. One type of genetic marker is a microsatellite marker that is associated with a quantitative trait locus. Thus, in one embodiment, the genetic marker of the present invention is a microsatellite. Microsatellite markers are short sequences repeated after each other. The short sequences are for example one nucleotide, such as two nucleotides, for example three nucleotides, such as four nucleotides, for example five nucleotides, such as six nucleotides, for example seven nucleotides, such as eight nucleotides, for example nine nucleotides, such as ten nucleotides. However, changes sometimes occur and the number of repeats may increase or decrease. The specific definition and locus of the polymorphic microsatellite markers can be found in the USDA genetic map (Kappes et al. 1997; or by following the link to U.S. Meat Animal Research Center http://www.marc.usda.gov/). A microsatellite locus comprises a variable number of polymorphic alleles. Thus, the term “microsatellite marker allele” as used herein refers to a specific such allele. In one embodiment, the at least one genetic marker of the present invention is detected by identification of a microsatellite marker allele, which is genetically coupled to said genetic marker.
In a preferred embodiment, the genetic marker of the present invention is a single nucleotide polymorphism (SNP). An SNP is a variation in the genetic code at a specific point on the DNA, i.e. a genetic change that is caused by substitution of a single nucleotide (such as an A changed to a G). An SNP locus comprises at least two alleles, and SNP loci comprising two, three, or four alleles are referred to as bi-, tri-, or tetra-allelic polymorphisms, respectively. The bovine genome comprises a large number of SNPs, and SNP markers are therefore highly suitable for use in selection for desirable phenotypic traits, which are genetically linked to the SNPs.
In one embodiment of the present invention, the specific genetic marker alleles are associated with quantitative trait loci affecting a phenotypic trait as defined below, including traits affecting udder health, such as susceptibility to mastitis, fertility, calving, and other diseases, as defined herein.
The term “associated with” as used herein in regards to the genetic marker allele and/or combination of genetic marker alleles and phenotypic traits, is meant to comprise both direct and indirect genetic linkages. Thus, a genetic marker allele and/or combination of genetic marker alleles which are associated with a trait according to the present invention may be coupled to said trait by direct or indirect genetic linkages. Moreover, the term “trait associated with” as used herein in regards to a specific phenotype, relates to any phenotypic traits, which to any extent contribute to said phenotype. For example, the traits somatic cell count (SCC), somatic cell score (SCS), udder conformation (which comprises several quantitative measures, such as fore udder attachment, udder depth, udder texture etc.), and diagnostic variables (such as treated cases of clinical mastitis within a specific timeframe) contribute to the overall mastitis phenotype. Thus, the “traits associated with mastitis”, or “mastitis phenotypic traits” comprise SCC, SCS, udder conformation and diagnostic variables, including the subindexes of any of said phenotypic traits.
The term “genetically coupled” is used herein about two genomic loci, which tend to segregate together. Thus, an SNP marker allele, which is genetically coupled to another genetic marker allele associated with a specific phenotypic trait according to the present invention, is indicative of said genetic marker, and may consequently be detected in a sample as an alternative of detecting said genetic marker associated with said phenotypic traits, for example traits associated with mastitis or fertility.
It is furthermore appreciated that the nucleotide sequences of the genetic marker allele or combination of marker alleles of the present invention are genetically associated with phenotypic traits of the present invention in a bovine subject. Consequently, it is also understood that a number of genetic markers may be comprised in the nucleotide sequence of the DNA region(s) flanked by and including the genetic markers according to the method of the present invention.
The present invention relates to methods for determining a genomic estimated breeding value based on the genotype of a bovine subject for a plurality of genetic markers, such as dense markers located across the entire bovine genome. The plurality of genetic markers is in one preferred embodiment a set of dense markers located across the entire bovine genome, such as located with at least one marker for every 1 cM on average, for example at least one marker per 0.1 cM. Thus, in one embodiment, the plurality of genetic marker alleles of the present invention is a plurality of single nucleotide polymorphisms (SNP), microsatellite markers and/or mixtures thereof, preferably a plurality of SNPs.
In one embodiment, the plurality of genetic markers or genetic marker alleles comprises at least 50, such as at least 100, such as at least 200, such as at least 300, such as at least 400, such as at least 500, such as at least 600, such as at least 700, such as at least 800, such as at least 900, such as at least 1000, such as at least 2000, for example at least 3000, such as at least 4000, for example at least 5000, such as at least 6000, for example at least 7000, such as at least 8000, for example at least 9000, such as at least 10000, for example at least 12000, such as at least 14000, for example at least 16000, such as at least 18000, for example at least 20000, for example at least 22000, such as at least 24000, for example at least 26000, such as at least 28000, for example at least 30000, for example at least 32000, such as at least 34000, for example at least 36000, such as at least 38000, for example at least 40000, for example at least 42000, such as at least 44000, for example at least 46000, such as at least 48000, for example at least 50000, for example at least 52000, such as at least 54000, for example at least 56000, such as at least 58000, for example at least 60000, for example at least 62000, such as at least 64000, for example at least 66000, such as at least 68000, for example at least 70000, for example at least 72000, such as at least 74000, for example at least 76000, such as at least 78000, for example at least 80000, for example at least 82000, such as at least 84000, for example at least 86000, such as at least 88000, for example at least 90000, for example at least 92000, such as at least 94000, for example at least 96000, such as at least 98000, for example at least 100000.
In another embodiment, the plurality of genetic markers or genetic marker alleles comprises between 10000 and 100000, such as between 20000 and 80000, for example between 30000 and 60000, for example between 30000 and 50000, such as between 30000 and 40000, for example between 35000 and 40000, for example between 37000 and 39000. In a specific embodiment, the plurality comprises between 38000 and 38900 genetic markers or genetic marker alleles.
The SNP markers, which are genotyped in the present invention, are selected from any known genetic markers, including genetic markers available on commercially available detection systems, such as commercially available gene chips. In a preferred embodiment, the plurality of genetic markers is selected from the SNP markers of the Illumina Bovine SNP50 BeadChip (see e.g. http://illumina.com/downloads/BovineSNP5O_data_sheet.pdf). The BovineSNP50 BeadChip features more than 54,000 evenly spaced SNP probes that span the bovine genome. The BovineSNP50 BeadChip covers common SNPs validated in economically important beef and dairy cattle breed types and presents an average minor allele frequency (MAF) of 0.25 across all loci. Importantly, this BeadChip offers uniform coverage with an average probe spacing of 51.5 kb to provide more than sufficient SNP density for robust genome-association studies in cattle. The SNPs have been described by the Bovine HapMap Consortium, which, to date, has conducted extensive genotyping of cattle. Consequently, in a preferred embodiment, the genetic markers of the present invention are genotyped, i.e. the genetic marker alleles of the methods and products of the present invention are determined, by gene chip technology/DNA microarrays.
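The minor allele frequency (MAF) referred to above may, for a single bi-allelic SNP, be computed as sketched below. The genotype coding (0, 1 or 2 copies of one allele, with `None` for a missing call) and the function name `minor_allele_frequency` are assumptions made for illustration and are not prescribed by the BeadChip.

```python
def minor_allele_frequency(genotypes):
    """MAF at one bi-allelic SNP from genotypes coded as 0, 1 or 2 copies of an allele.

    Missing genotypes are given as None and ignored.
    Returns None if no genotype is called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called animal contributes two alleles
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


# Hypothetical genotypes for eight called animals and one missing call
snp_genotypes = [0, 0, 1, 1, 2, 2, 2, 2, None]
maf = minor_allele_frequency(snp_genotypes)
```

An average MAF across all loci, such as the value of 0.25 mentioned above, would be obtained by averaging this quantity over the genotyped SNPs.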
One aspect of the present invention relates to a method of determining a phenotypic trait as defined herein in a bovine subject, comprising detecting in a sample from said bovine subject the presence or absence of at least one genetic marker allele, such as a plurality of genetic marker alleles or a specific combination of genetic marker alleles, wherein said genetic marker allele, plurality or specific combination of genetic marker alleles is associated with said phenotypic trait of said bovine subject and/or offspring or other relatives therefrom. The present invention relates to any type of genetic marker. Preferred genetic marker alleles are however single nucleotide polymorphisms (SNP markers) and/or microsatellite markers.
In a preferred embodiment, the present invention relates to a specific combination of genetic markers. In this embodiment, the contribution of each individual genetic marker allele to a phenotypic trait as defined herein is used for the determination of a phenotypic trait and/or breeding value of a bovine subject. Therefore, the present invention also relates to methods and kits which comprise determining multiple genetic marker alleles, and/or a specific combination of genetic marker alleles, in a sample from said bovine subject.
Thus, in one embodiment, the present invention relates to methods, products and kits for determining a phenotypic trait comprising determining a specific combination or a plurality of at least 2 genetic marker alleles, such as at least 3 genetic marker alleles, such as at least 4 genetic marker alleles, such as at least 5 genetic marker alleles, such as at least 10 genetic marker alleles, such as at least 20 genetic marker alleles, such as at least 30 genetic marker alleles, such as at least 40 genetic marker alleles, such as at least 50 genetic marker alleles, such as at least 60 genetic marker alleles, such as at least 70 genetic marker alleles, such as at least 80 genetic marker alleles, such as at least 90 genetic marker alleles, such as at least 100 genetic marker alleles, such as at least 200 genetic marker alleles, such as at least 300 genetic marker alleles, such as at least 400 genetic marker alleles, such as at least 500 genetic marker alleles, such as at least 600 genetic marker alleles, such as at least 700 genetic marker alleles, such as at least 800 genetic marker alleles, such as at least 900 genetic marker alleles, such as at least 1000 genetic marker alleles, such as at least 2000 marker alleles, for example at least 3000 marker alleles, such as at least 4000 genetic marker alleles, for example at least 5000 genetic marker alleles, such as at least 6000 marker alleles, for example at least 7000 marker alleles, such as at least 8000 genetic marker alleles, for example at least 9000 genetic marker alleles, such as at least 10000 marker alleles, for example at least 12000 marker alleles, such as at least 14000 genetic marker alleles, for example at least 16000 genetic marker alleles, such as at least 18000 marker alleles, for example at least 20000 marker alleles, for example at least 22000 marker alleles, such as at least 24000 genetic marker alleles, for example at least 26000 genetic marker alleles, such as at least 28000 marker alleles, for example at least 30000 marker alleles, for example at least 32000 marker alleles, such as at least 34000 genetic marker alleles, for example at least 36000 genetic marker alleles, such as at least 38000 marker alleles, for example at least 40000 marker alleles, for example at least 42000 marker alleles, such as at least 44000 genetic marker alleles, for example at least 46000 genetic marker alleles, such as at least 48000 marker alleles, for example at least 50000 marker alleles, for example at least 52000 marker alleles, such as at least 54000 genetic marker alleles, for example at least 56000 genetic marker alleles, such as at least 58000 marker alleles, for example at least 60000 marker alleles, for example at least 62000 marker alleles, such as at least 64000 genetic marker alleles, for example at least 66000 genetic marker alleles, such as at least 68000 marker alleles, for example at least 70000 marker alleles, for example at least 72000 marker alleles, such as at least 74000 genetic marker alleles, for example at least 76000 genetic marker alleles, such as at least 78000 marker alleles, for example at least 80000 marker alleles, for example at least 82000 marker alleles, such as at least 84000 genetic marker alleles, for example at least 86000 genetic marker alleles, such as at least 88000 marker alleles, for example at least 90000 marker alleles, for example at least 92000 marker alleles, such as at least 94000 genetic marker alleles, for example at least 96000 genetic marker alleles, such as at least 98000 marker alleles, for example at least 100000 marker alleles. In one embodiment, the present invention comprises genotyping between 100 and 100000, such as between 10000 and 60000, for example between 20000 and 40000 genetic markers in one or more bovine subjects.
In one embodiment, the genetic markers of the present invention are selected from the group of genetic markers set out in the BovineSNP50 BeadChip from Illumina Inc. Each of the genetic markers set out in the BovineSNP50 BeadChip from Illumina Inc are also claimed as a single embodiment of the present invention.
It is understood that the genetic marker alleles of the present invention may be genetically coupled to other genetic polymorphisms, which may then serve as alternative genetic markers for determining a bovine subject with a specific phenotype according to the present invention. Such alternative genetic markers, however, cannot be used for selection without also selecting for the genetic marker alleles of the present invention, and therefore, said alternative genetic markers are also within the scope of the present invention.
In a preferred embodiment, the genetic markers or marker alleles for use in the present invention are determined by the BovineSNP50 BeadChip from Illumina Inc, said assay featuring more than 54,000 evenly spaced probes that target SNPs. The BovineSNP50 BeadChip presents an average SNP spacing of 51.5 kb across the entire genome, thus allowing sufficient SNP density for genomic prediction of phenotypic traits of the present invention. The genetic markers of the BovineSNP50 BeadChip from Illumina Inc are available from the supplier's website.
The BovineSNP50 BeadChip targets evenly distributed SNPs that are polymorphic across the breeds tested and provides an average probe spacing of 51.5 kb and a median spacing of 37.3 kb. In general, observed linkage disequilibrium (LD) in multiple breeds of cattle suggests haplotype blocks of approximately 70 kb on average, indicating that the resolution offered by the BovineSNP50 chip is well within the resolution of LD in cattle.
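The comparison between marker spacing and the extent of LD may be illustrated by the following sketch, which computes the average and median spacing between adjacent markers on one chromosome and compares the average with the approximately 70 kb haplotype block size mentioned above. The positions and the function name `probe_spacing_kb` are hypothetical.

```python
from statistics import mean, median


def probe_spacing_kb(positions_bp):
    """Return (mean, median) spacing in kb between adjacent markers on one chromosome."""
    ordered = sorted(positions_bp)
    gaps = [(b - a) / 1000.0 for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("at least two marker positions are required")
    return mean(gaps), median(gaps)


# Hypothetical base-pair positions of five markers on one chromosome
positions = [10_000, 40_000, 90_000, 100_000, 200_000]
mean_kb, median_kb = probe_spacing_kb(positions)

# Compare the average spacing with the approximate haplotype block size (70 kb)
within_ld = mean_kb < 70.0
```

For the hypothetical positions above, the adjacent gaps are 30, 50, 10 and 100 kb, so the median lies below the average, as is also the case for the spacing figures reported for the BeadChip.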
Selective breeding of cattle is based on selection of sires and dams with superior genetic backgrounds to pass on to their offspring. Sires and/or dams are specifically selected on the basis of their genetic merit with respect to economically important phenotypes, such as resistance/susceptibility to disease (e.g. mastitis) and/or yield.
The present invention allows the selection of bovine subjects for breeding based on the genomic estimated breeding value (GEBV) of the sire and/or dam; i.e. the method of the invention for determining a genomic estimated breeding value for a bovine subject (e.g. without a known phenotype) may be used in a method for selective breeding. According to an aspect of the invention, the method for selective breeding, here also referred to as a breeding program, comprises selecting a sire and a dam, e.g. from a plurality of bovine subjects, wherein a bovine subject selected for breeding has a genomic estimated breeding value of a specific order of magnitude. The breeding values are chosen, in a manner known to the skilled person, such that offspring of the sire and dam may have a desired breeding value. After selection of the sire and dam on the basis of the determined genomic estimated breeding value, offspring is produced using the sire and dam. The breeding value of the offspring may be estimated as described hereinabove, e.g. before the phenotype associated with the desired breeding value becomes manifest in the offspring. The breeding program may proceed with the offspring as sire or dam for a next generation of offspring if its genomic estimated breeding value or estimated breeding value, or the GEBV or EBV of its offspring, is larger than, equal to, or differs less than a predetermined amount from, the desired breeding value. The determination of an estimated breeding value before the phenotype is known allows for more efficient breeding of cattle, because an accurate breeding value can be determined on the basis of a genetic test. This allows for more accurate selection of genetically superior sires and dams, and thus facilitates breeding of cattle with improved genetic potential in respect of economically and ethically important phenotypic factors, such as disease resistance, yield etc.
Moreover, the time required for a breeding program is reduced since registration/observation of phenotypic traits is not required for determining the genomic estimated breeding value of an animal by a method of the present invention.
Thus, in one aspect the present invention relates to a method for selective breeding, comprising determining an estimated breeding value of a bovine subject using a method as defined herein for determining a genomic estimated breeding value, using said bovine subject as sire or dam for breeding if the estimated breeding value of said bovine subject is larger than, equal to, or differs less than a predetermined amount from a desired breeding value for the bovine subjects and/or the offspring. The genomic estimated breeding value may be determined without inclusion of phenotypic registrations of the bovine subjects or its relatives, and thus, the method also applies to the use of a bovine subjects as sire or dam before/prior to an udder health, fertility and/or other health phenotype associated with the estimated breeding value becomes manifest. For the same reason, the method also applies to the use of the bovine subject as sire or dam, when that bovine subject does not have any offspring and/or any phenotypic records of its offspring or other relatives.
According to the method of the present invention for selective breeding, the bovine subject is used as sire or dam for breeding if its genomic estimated breeding value is larger than, equal to, or differs less than a predetermined amount from a desired breeding value for the bovine subject or the offspring. The desired breeding value for the bovine subject or the offspring is apparent to those skilled in the art, and the tolerance with respect to the predetermined difference between the breeding values of the parent and offspring is also within the skills of the trained practitioner. The desired breeding value and the tolerance, in terms of the accepted difference from the predetermined value, depend on the phenotype with respect to which the breeding value is determined. In general, the difference between the estimated breeding value and the desired breeding value is less than 20, such as less than 10, such as less than 9, for example less than 8, such as less than 7, for example less than 6, such as less than 5, for example less than 4, such as less than 3, for example less than 2, such as less than 1.
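The selection criterion described above may be illustrated by the following minimal Python sketch. The function name, the candidate identifiers and the numerical values are hypothetical and serve only as an illustration; the desired breeding value and the tolerance would in practice be chosen by the skilled person for the phenotype in question.

```python
def is_selected(gebv: float, desired: float, tolerance: float) -> bool:
    """Return True if an animal qualifies as sire or dam.

    The rule mirrors the text: the GEBV is larger than or equal to the
    desired breeding value, or differs from it by less than `tolerance`.
    """
    return gebv >= desired or abs(gebv - desired) < tolerance


# Hypothetical candidates (identifier -> GEBV on an index scale).
candidates = {"bull_A": 112.0, "bull_B": 104.5, "bull_C": 93.0}

# Hypothetical desired breeding value of 110 and tolerance of 10.
selected = [name for name, gebv in candidates.items()
            if is_selected(gebv, desired=110.0, tolerance=10.0)]
print(selected)
```

In this example bull_A exceeds the desired value, bull_B lies within the tolerance, and bull_C is not used for breeding.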
The breeding value is determined with respect to any phenotype as described herein. In preferred embodiments, the breeding value is determined in respect of udder health, fertility and/or other health, such as a health index comprising reproductive diseases, digestive diseases, feet and leg diseases. In a preferred embodiment, the breeding value is determined in respect of udder health, such as an udder health index comprising resistance to clinical mastitis and/or cell count.
According to the present invention, a phenotypic trait is determined by detecting the absence or presence of a genetic marker allele or a specific combination of genetic marker alleles in a sample of any source comprising genetic material. Thus, detection of a genetic marker may be performed on samples selected from the group consisting of blood, semen (sperm), urine, liver tissue, muscle, skin, hair, follicles, ear, tail, fat, testicular tissue, lung tissue, saliva, spinal cord biopsy and any other tissue.
In preferred embodiments the sample is selected from the group consisting of blood, urine, skin, hair, ear, tail, liver and muscle. In another preferred embodiment the sample is selected from the group consisting of blood, liver tissue and muscle. In particularly preferred embodiments the sample is blood. In another particularly preferred embodiment the sample is liver tissue. In yet another particularly preferred embodiment the sample is muscle. In yet a further preferred embodiment the sample is blood and/or milk.
For genotyping, such as SNP and/or microsatellite genotyping, nucleic acid may be extracted from the samples by a variety of techniques. For example, genomic DNA may be isolated from the sample by treatment with proteinase K followed by extraction with phenol (Sambrook et al. 1989). However, the sample may also be used directly.
The amount of the nucleic acid used for microsatellite genotyping for detection of a genetic marker allele according to the method of the present invention is in the range of nanograms to micrograms. It is appreciated by the person skilled in the art that in practical terms no upper limit for the amount of nucleic acid to be analysed exists. The problem that the skilled person encounters is that the amount of sample to be analysed is limited. Therefore, it is beneficial that the method of the present invention can be performed on a small amount of sample and thus a limited amount of nucleic acid in the sample is required. The amount of the nucleic acid to be analysed is thus at least 1 ng, such as at least 10 ng, for example at least 25 ng, such as at least 50 ng, for example at least 75 ng, such as at least 100 ng, for example at least 125 ng, such as at least 150 ng, for example at least 200 ng, such as at least 225 ng, for example at least 250 ng, such as at least 275 ng, for example at least 300 ng, such as at least 400 ng, for example at least 500 ng, such as at least 600 ng, for example at least 700 ng, such as at least 800 ng, for example at least 900 ng or such as at least 1000 ng.
In one preferred embodiment the amount of nucleic acid as the starting material for the method of the present invention is 20-50 ng. In a specifically preferred embodiment, the starting material for the method of the present invention is 30-40 ng.
The method according to the present invention for determining a genotype for a plurality of genetic markers, a phenotype, and/or a phenotypic trait according to the present invention of a bovine subject comprises detecting in a sample from said bovine subject the presence or absence of at least one genetic marker allele or a plurality of genetic markers of the present invention. The genetic marker allele, plurality of markers or specific combination of genetic marker alleles is associated with said phenotypic trait of said bovine subject and/or offspring or other relatives therefrom. In a preferred embodiment, the genetic markers are selected from the group set out in the BovineSNP50 BeadChip from Illumina Inc. The genetic markers, or a complementary sequence as well as transcriptional (mRNA) and translational products (polypeptides, proteins) therefrom may be identified by any method known to those of skill within the art.
It will be apparent to the person skilled in the art that there are a large number of analytical procedures which may be used to detect the presence or absence of variant nucleotides at one or more of positions mentioned herein in the specified region. Mutations or polymorphisms within or flanking the specified region can be detected by utilizing a number of techniques. Nucleic acid from any nucleated cell can be used as the starting point for such assay techniques, and may be isolated according to standard nucleic acid preparation procedures that are well known to those of skill in the art. In general, the detection of allelic variation requires a mutation discrimination technique, optionally an amplification reaction and a signal generation system.
A number of mutation detection techniques are listed below. Some of the methods listed are based on the polymerase chain reaction (PCR), wherein the method according to the present invention includes a step for amplification of the nucleotide sequence of interest in the presence of primers based on the nucleotide sequence of the variable nucleotide sequence. The methods may be used in combination with a number of signal generation systems, a selection of which is listed further below.
Further amplification techniques are found elsewhere herein. Many current methods for the detection of allelic variation are reviewed by Nollau et al., Clin. Chem. 43, 1114-1120, 1997; and in standard textbooks, for example “Laboratory Protocols for Mutation Detection”, Ed. by U. Landegren, Oxford University Press, 1996 and “PCR”, 2nd Edition by Newton & Graham, BIOS Scientific Publishers Limited, 1997.
The detection of genetic marker alleles and/or combinations of genetic marker alleles can according to one embodiment of the present invention be achieved by a number of techniques known to the skilled person, including typing of microsatellites or short tandem repeats (STR), restriction fragment length polymorphisms (RFLP), detection of deletions or insertions, random amplified polymorphic DNA (RAPDs) or the typing of single nucleotide polymorphisms by methods such as restriction fragment length polymerase chain reaction, allele-specific oligomer hybridisation, oligomer-specific ligation assays, hybridisation with PNA or locked nucleic acids (LNA) probes.
A primer of the present invention is a nucleic acid molecule sufficiently complementary to the sequence on which it is based and of sufficient length to selectively hybridise to the corresponding region of a nucleic acid molecule intended to be amplified. The primer is able to prime the synthesis of the corresponding region of the intended nucleic acid molecule in the methods described above. Similarly, a probe of the present invention is a molecule, for example a nucleic acid molecule, of sufficient length and sufficiently complementary to the nucleic acid sequence of interest, which selectively binds to the nucleic acid sequence of interest under high or low stringency conditions.
The genetic marker associated with a phenotypic trait according to the present invention can be detected by a number of methods known to those of skill within the art. For example, the genetic marker may be identified by genotyping using a method selected from the group consisting of single nucleotide polymorphisms (SNPs), microsatellite markers, restriction fragment length polymorphisms (RFLPs), DNA chips, amplified fragment length polymorphisms (AFLPs), randomly amplified polymorphic sequences (RAPDs), sequence characterised amplified regions (SCARs), cleaved amplified polymorphic sequences (CAPSs), nucleic acid sequencing, and microsatellite genotyping.
In one embodiment, the genetic marker allele or combination of alleles associated with a phenotypic trait according to the present invention is detected by microsatellite genotyping. Microsatellite genotyping may be performed by amplification of the microsatellite marker by sequence specific oligonucleotide primers, and subsequent analysis of the amplification product, in terms of for example length, quantity and/or sequence of the amplification product.
In a preferred embodiment, the genetic markers of the present invention are genotyped, i.e. the genetic marker alleles of the methods and products of the present invention are determined, by gene chip technology/DNA microarrays. In a more preferred embodiment, the genetic marker allele, plurality of genetic marker alleles or combination or plurality of alleles are determined by a DNA array, such as the BovineSNP50 BeadChip, which is a multi-sample genotyping panel powered by Illumina's Infinium® II Assay.
In another preferred embodiment, the genetic marker allele, plurality of genetic marker alleles, or combination or plurality of alleles associated with a phenotype, or phenotypic trait according to the present invention is detected by a DNA array, such as the BovineSNP50 BeadChip, which is a multi-sample genotyping panel powered by Illumina's Infinium® II Assay provided by Illumina Inc.
In one aspect, the present invention relates to a diagnostic kit for detecting the presence or absence in a bovine subject of at least one genetic marker allele as described herein. The kit will provide an easily applicable means of genotyping a bovine subject in respect of genetic marker alleles and/or combinations of genetic marker alleles of the genomic prediction model.
Specifically, the diagnostic kit is suitable for detection of the presence or absence of at least one genetic marker allele and/or combination of marker alleles, which is associated with at least one phenotypic trait of said bovine subject and/or offspring or other relatives therefrom. Examples of specific traits are provided elsewhere herein.
In one aspect, the kit of the present invention comprises means for detecting a plurality of genetic marker alleles, and/or a computer program and/or a computer readable medium as defined elsewhere herein.
Genotyping of a bovine subject in order to establish the genetic determinants of a phenotypic trait according to the present invention for that subject according to the present invention can be based on the analysis of genomic DNA which can be provided using standard DNA extraction methods as described herein. The genomic DNA may be isolated and amplified using standard techniques such as the polymerase chain reaction using oligonucleotide primers corresponding (complementary) to the polymorphic marker regions. Additional steps of purifying the DNA prior to amplification reaction may be included.
In one embodiment of the present invention, the kit comprises components for genotyping a bovine subject. Methods for genotyping are disclosed elsewhere herein. In a preferred embodiment, genotyping is SNP or microsatellite genotyping. Thus, the kit may comprise various components for performing SNP or microsatellite genotyping. For example, the kit may comprise at least one oligonucleotide for genotyping of a bovine subject. In a specific embodiment, the kit comprises at least one oligonucleotide for detecting a genetic marker allele selected from the group of genetic markers set out in the BovineSNP50 BeadChip from Illumina Inc.
Furthermore, the kit according to the present invention may comprise reagents and buffers required for genotyping. The exact composition of buffers and reagents depends on the method used for genotyping. In one embodiment, the kit comprises buffers required for amplification of DNA. In a particular embodiment, the kit comprises components for purification of DNA.
The diagnostic kit according to the present invention may further comprise at least one reference sample. The reference sample serves to verify that the genetic marker is correctly detected in the sample. Thus, the presence or absence of a genetic marker according to the present invention can be detected in parallel in the sample and the reference sample. The reference sample may either be a negative control, which does not comprise genetic material comprising a genetic marker allele or a combination of alleles according to the present invention, or the reference sample may be a positive control, which comprises genetic material comprising a genetic marker allele or combination of alleles according to the present invention. The reference sample thus serves to verify that the kit is used correctly. In one embodiment, the reference sample comprises an oligonucleotide sequence of an SNP marker allele associated with at least one trait as defined elsewhere herein. In a specific embodiment, the reference sample comprises an SNP marker or microsatellite marker oligonucleotide sequence associated with a specific phenotype, as defined elsewhere herein. In another specific embodiment, the reference sample comprises an SNP or microsatellite marker oligonucleotide sequence associated with a specific phenotypic trait, as defined elsewhere herein.
The kit according to the present invention may be provided with instructions for the performance of the detection method of the kit, and for the interpretation of the results.
The individual effect of a plurality of genetic marker alleles on a phenotype, such as udder health, fertility and/or other health, or the corresponding estimated breeding value (e.g. for udder health or another phenotype mentioned herein) of a bovine subject, determined according to the present invention is in one embodiment stored in a suitable media, such as in a non-volatile memory, such as a computer memory and/or a database. The suitable media may be located distantly from the location of the bovine subject for which a genomic estimated breeding value is determined according to the present invention.
The methods of the present invention may be embodied in a computer program product including program code portions for performing, when run on a programmable apparatus, a method according to the present invention, for example a method for determining the individual effect of a plurality of genetic marker alleles on a breeding value of one or more reference bovine subjects, or a method of estimating a breeding value of a bovine subject based on the genotype of said bovine subject for a plurality of genetic markers, said methods as described in more detail elsewhere herein. The present invention also in one aspect relates to a computer readable medium comprising data representing a computer program product as defined above. Thus, data representing the computer program product may be stored on a computer readable product, comprising, but not limited to, storage media such as magnetic storage media (ROMs, RAMs, floppy discs, magnetic tapes, etc.), optically readable storage media (CD-ROMs, DVDs, etc.), and carrier waves (transmission via the internet). Further, the computer program product may be implemented in a distributed fashion, e.g. comprising a first portion, e.g. for performing the method for determining the individual effect of a plurality of genetic marker alleles on a breeding value, and a second portion, e.g. for performing the method of estimating a genomic breeding value of a bovine subject, wherein the first and second portions may be arranged to be run on mutually different programmable apparatus and/or at mutually different (remote) locations.
One aspect of the present invention relates to a computer, a computer system and/or a programmable apparatus for performing a method of the present invention, said computer, computer system and/or programmable apparatus comprising a computer program product and/or computer readable medium as described above.
Any use of a product of the present invention is inherently within the scope of the present invention. In particular, the present invention relates to the use of a computer program product, a computer readable medium, computer system and/or a programmable apparatus, and/or kit of the present invention for estimating a breeding value for a specific phenotype of a bovine subject, or for use in selective breeding.
The granddaughter design includes analysing data from DNA-based markers for grandsires that have been used extensively in breeding and for sons of grandsires where the sons have produced offspring. The phenotypic data that are to be used together with the DNA-marker data are derived from the daughters of the sons. Such phenotypic data could be for example milk production features, features relating to calving, fertility, meat quality, or disease. One group of daughters has inherited one allele from their father whereas a second group of daughters has inherited the other allele from their father. By comparing data from the two groups, information can be gained as to whether a fragment of a particular chromosome is harbouring one or more genes that affect the trait in question. It may thus be concluded whether a quantitative trait locus (QTL) is present within this fragment of the chromosome.
A prerequisite for performing a granddaughter design is the availability of detailed phenotypic data. In the present invention such data have been available (http://www.lr.dk/kvaeg/diverse/principles.pdf).
In contrast, genetic marker alleles and/or combinations of genetic marker alleles can be used directly to provide information of the traits passed on from parents to one or more of their offspring when a number of DNA markers on a chromosome have been determined for one or both parents and their offspring. The markers may be used to calculate the genetic history of the chromosome linked to the DNA markers.
The frequency of recombination is the likelihood that a recombination event will occur between two genes or two markers. The frequency of recombination may be calculated as the genetic distance between the two genes or the two markers. Genetic distance is measured in units of centiMorgan (cM). One centiMorgan is equal to a 1% chance that a marker at one genetic locus will be separated from a marker at a second locus due to crossing over in a single generation. One centiMorgan is equivalent, on average, to one million base pairs.
In order to detect whether the genetic marker is present in the genetic material, standard methods well known to persons skilled in the art may be applied, for example by the use of nucleic acid amplification. In order to determine whether the genetic marker allele is genetically linked to a phenotypic trait according to the present invention, a permutation test can be applied when the regression method is used (Doerge and Churchill, 1996). The principle of the permutation test is well described by Doerge and Churchill (1996). A threshold at the 5% chromosome wide level is considered to be significant evidence for linkage between the genetic marker and a phenotypic trait according to the present invention.
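The principle of such a permutation test may be sketched in Python as follows. This is an illustration only, under the assumptions that a simple single-marker regression statistic is used and that all markers in the genotype matrix lie on one chromosome; the function names and the simulated data are hypothetical and do not reproduce the analysis of Doerge and Churchill (1996) in detail.

```python
import numpy as np


def marker_statistics(genotypes, phenotype):
    """Squared correlation of each marker column with the phenotype.

    For a single-marker linear regression this equals the R^2 of the fit.
    """
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    numerator = (g.T @ y) ** 2
    denominator = (g ** 2).sum(axis=0) * (y ** 2).sum()
    return numerator / denominator


def chromosome_wide_threshold(genotypes, phenotype, n_perm=500, alpha=0.05, seed=0):
    """Empirical threshold from the permutation distribution of the maximum statistic."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_perm)
    for k in range(n_perm):
        # Shuffling phenotypes breaks any marker-trait association
        # while keeping the marker correlation structure intact.
        shuffled = rng.permutation(phenotype)
        maxima[k] = marker_statistics(genotypes, shuffled).max()
    return float(np.quantile(maxima, 1.0 - alpha))


# Simulated (hypothetical) data: 200 animals, 20 markers, marker 5 affects the trait.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(200, 20)).astype(float)
phenotype = genotypes[:, 5] + rng.normal(size=200)

observed = marker_statistics(genotypes, phenotype)
threshold = chromosome_wide_threshold(genotypes, phenotype)
significant = np.flatnonzero(observed > threshold)
print(threshold, significant)
```

Taking the maximum statistic over all markers in each permutation is what makes the resulting 5% threshold chromosome wide rather than marker wise.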
The frequency of recombination is the likelihood that a recombination event will occur between two genes or two markers. The frequency of recombination may be calculated as the genetic distance between the two genes or the two markers. Genetic distance is measured in units of centiMorgan (cM). One centiMorgan is the length of chromosome wherein there is on average 0.01 cross-over per meiosis. If an uneven number of cross-overs occurs between two genetic markers, then a marker at one genetic locus will be separated from a marker at a second locus due to crossing over in a single generation. One centiMorgan is equivalent, on average, to one million base pairs. The relative position of a genetic marker, for example a microsatellite marker, may be designated in Morgan or centiMorgan with reference to its distance from a proximal position in the chromosome located at 0 cM.
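The relation between genetic distance and recombination frequency may be illustrated numerically. The sketch below assumes Haldane's mapping function (crossovers occurring independently, without interference), which is not specified in the present text; under this assumption the recombination frequency is the probability of an uneven number of cross-overs between the two loci.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction for a map distance in centiMorgan.

    Assumes Haldane's mapping function: r = 0.5 * (1 - exp(-2 d)),
    with d the distance in Morgan (1 Morgan = 100 cM).
    """
    d = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


for cm in (1.0, 10.0, 50.0, 200.0):
    print(cm, round(haldane_recombination_fraction(cm), 4))
```

For short distances the recombination frequency is close to the distance itself (about 1% at 1 cM), whereas for long distances it approaches, but never exceeds, 50%.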
It is an object of the present invention to provide genetic marker alleles or a specific combination of such marker alleles that are associated with a trait as defined herein. The specific genetic marker allele can be identified according to the present invention, by detecting a genetic marker, such as an SNP marker, which is genetically coupled to said first genetic marker allele.
2000 Holstein bulls (covering birth years from 1973 to 2002) were chosen to be genotyped for 50 k SNP loci. After editing, the number of SNP loci was reduced to 38055 and the number of typed bulls to 1898, among which 1481 bulls were born during 1993-2002. EBVs (index) from the current genetic evaluation were used as the response variable to estimate SNP effects. In total, 17 single or complex traits were analyzed in this study.
A Bayesian Gibbs sampling approach was applied to estimate SNP effect using a simple model,
$$y_i = \mu + \sum_{j=1}^{m} (q_{ij1} + q_{ij2})\, v_j + e_i$$
where yi is pedigree based EBV of individual i, μ is the intercept, m is the number of SNP loci, qij1 and qij2 are the scaled effects of paternal and maternal SNPs at locus j, vj (vj>0) is the scale factor (standard deviation) for qjk at locus j, and ei is the random residual.
It is specified that the prior distribution of qjk is a standard normal distribution, i.e.,
$$q \sim N(0, I)$$
Scale factors vj's were assumed to have either a common prior distribution
$$v \sim TN(0, I\sigma^2_{SNP}), \quad v > 0$$
or a mixture prior distribution,
$$v_0 \sim TN(0, I_0\sigma^2_{v0}),\ v_0 > 0; \qquad v_1 \sim TN(0, I_1\sigma^2_{v1}),\ v_1 > 0$$
The prior distributions of μ, σ2v and σ2v1 were assumed to be improper uniform distributions, while σ2v0 was set to a small value. In this study, σ2v0 was set to 0.01 for all traits.
The accuracy of GEBV was evaluated by a five-fold cross validation. Thus, five subsets were created from the whole data, each leaving out two years' records (subset 1 without bulls born in 1993 and 1994, subset 2 without bulls born in 1995 and 1996, and so on). Each of the five subsets was used as training data to estimate SNP effects, and the corresponding left-out data were used as test data to predict GEBV (Table 1). Together, the five test datasets comprised 1481 bulls.
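The construction of the five training and test datasets by birth year may be sketched as follows. The function name and the example birth years are hypothetical; the sketch only assumes that each animal has a recorded birth year and that bulls born before 1993 always remain in the training data.

```python
def birth_year_folds(birth_years, first_year=1993, last_year=2002, span=2):
    """Split animal indices into (training, test) pairs by birth-year window.

    Each fold holds out the bulls born in one window of `span` consecutive
    years; all remaining bulls form the training data for that fold.
    """
    folds = []
    for start in range(first_year, last_year + 1, span):
        held_out = set(range(start, start + span))
        test = [i for i, year in enumerate(birth_years) if year in held_out]
        train = [i for i, year in enumerate(birth_years) if year not in held_out]
        folds.append((train, test))
    return folds


# Hypothetical birth years of seven bulls.
birth_years = [1980, 1993, 1994, 1995, 1998, 2001, 2002]
folds = birth_year_folds(birth_years)
for train, test in folds:
    print("train:", train, "test:", test)
```

Holding out complete birth-year groups, rather than random animals, makes each test set resemble the young bulls for which GEBV would be predicted in practice.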
Pilot analysis: First, a pilot analysis on four traits (protein yield, combined yield, udder, and female fertility) was carried out to find appropriate prior distribution of scale factors (v). The analysis included five scenarios: mixture priors with π1=5%, 10%, 20%, 50% and a common prior for all loci.
Final analysis: The most appropriate (best prediction) prior distribution (which was common prior distribution according to the pilot study) was used to analyse all the 17 traits.
Holstein bulls from 258 half-sib families (1-41 bulls each), born during the years from 1986 to 2004, were genotyped using the Illumina Bovine SNP50 BeadChip (Illumina, San Diego, Calif.). The marker data were edited using the following criteria: 1) the locus was deleted if the minor allele frequency was less than 5%, or the proportion of animals called for a genotype at this locus was less than 95%, or the average GenCall score at the locus was less than 0.65; 2) the individual was deleted if the call rate was less than 0.85; 3) a marker type for an individual was deleted if it had a GenCall score less than 0.6. After the editing, there were 3,330 bulls and 38,134 SNP (single nucleotide polymorphism) markers available.
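The editing criteria above may be illustrated by the following Python sketch. It assumes that genotypes are coded as counts of one allele (0, 1 or 2) with missing calls stored as NaN, that a matrix of GenCall scores of the same shape is available, and that the criteria are applied in the order 3), 2), 1); the coding, the order and the function name are assumptions made for illustration and are not stated in the text.

```python
import numpy as np


def edit_marker_data(geno, gencall, maf_min=0.05, locus_call_min=0.95,
                     locus_score_min=0.65, animal_call_min=0.85,
                     genotype_score_min=0.60):
    """Apply the three editing criteria to an animals x loci genotype matrix."""
    geno = geno.astype(float).copy()

    # Criterion 3: single genotypes with a low GenCall score become missing.
    geno[gencall < genotype_score_min] = np.nan
    called = ~np.isnan(geno)

    # Criterion 2: drop animals with a low call rate.
    keep_animals = called.mean(axis=1) >= animal_call_min
    geno, gencall, called = geno[keep_animals], gencall[keep_animals], called[keep_animals]

    # Criterion 1: drop loci with low MAF, low call rate or low mean GenCall score.
    n_called = np.maximum(called.sum(axis=0), 1)  # avoid division by zero
    allele_freq = np.nansum(geno, axis=0) / (2.0 * n_called)
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    keep_loci = ((maf >= maf_min)
                 & (called.mean(axis=0) >= locus_call_min)
                 & (gencall.mean(axis=0) >= locus_score_min))
    return geno[:, keep_loci], keep_animals, keep_loci


# Hypothetical toy data: 5 animals, 4 loci.
geno = np.array([[0, 1, 2, 0],
                 [1, 1, 2, 0],
                 [2, 0, 2, 0],
                 [1, 2, 2, 1],
                 [0, 0, 0, 0]], dtype=float)
gencall = np.full((5, 4), 0.9)
gencall[:4, 1] = 0.62   # locus 1: low average score
gencall[4, :3] = 0.30   # animal 4: mostly unreliable calls

edited, keep_animals, keep_loci = edit_marker_data(geno, gencall)
print(edited.shape, keep_animals, keep_loci)
```

In the toy data the last animal is removed for its low call rate, locus 1 for its low average GenCall score, and locus 2 because it is monomorphic.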
Published conventional EBV were used as response variables to estimate SNP effects. The EBV and their reliability for the genotyped bulls were obtained from official evaluations in April 2009.
Several statistical models and algorithms are suitable for predicting breeding values based on dense markers; for example, BLUP, BayesA and BayesB can be used to analyze simulated data and real data. A linear BLUP approach assumes that the effects of all SNP are normally distributed with the same variance. BayesA and BayesB (Meuwissen et al., 2001) allow each marker to have its own variance of allele effects. BayesB also assumes that most SNP have zero effect, but a few have moderate to large effects.
In this example, a Bayesian method, which captures the features of BayesA and BayesB but simplifies the computing algorithm, was used to estimate marker effects for genomic prediction. The model is:
where y is published conventional EBV, μ is the intercept, m is the number of SNP markers, qi is the vector of scaled SNP effects (scaled by standard deviation) of marker with qi˜N(0,I), vi (vi>0) is a scaling factor (standard deviation) for SNP effects of marker i, and e is the vector of residual. The effects of SNP types of marker i are the products of vi and qi.
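The model equation itself is not reproduced in the text above. A form consistent with the definitions just given, and with the scalar model of the preceding example, would be the following. The vector of ones 1 and the incidence matrix X_i, linking each animal to the SNP alleles it carries at marker i, are notation assumed here for illustration and are not defined in the text.

```latex
\mathbf{y} = \mathbf{1}\mu + \sum_{i=1}^{m} \mathbf{X}_i \mathbf{q}_i v_i + \mathbf{e}
```

Under this assumed form, the product of v_i and q_i gives the effects of the SNP types of marker i, in agreement with the last sentence of the preceding paragraph.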
Scaling factors vi were assumed to have either a common prior distribution or a mixture prior distribution. A common prior distribution across the variances of chromosome segment effects, which leads to a slight or moderate differentiation between small and large effects of markers, was assumed to be a positive half-normal distribution,
$$v_i \sim TN(0, \sigma_v^2), \quad v_i > 0$$
Mixture prior distributions, which lead to a strong differentiation between small and large effects of markers, assume that a proportion (π0, typically large) of markers have a very small effect, and a proportion (π1, typically small) of markers have a moderate or large effect. This was achieved by assuming that the prior distribution of vi was sampled from either a positive half-normal distribution with a small variance (σv02) or a positive half-normal distribution with a large variance (σv12),
$$v_i \sim \pi_0\, TN(0, \sigma_{v0}^2) + \pi_1\, TN(0, \sigma_{v1}^2)$$
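The effect of such a mixture prior may be illustrated by drawing scaling factors from it. The sketch below relies on the fact that a normal distribution with mean zero truncated to positive values is the distribution of the absolute value of a normal variate. The function name, the proportion of 5% and the standard deviations are hypothetical illustration values.

```python
import numpy as np


def sample_scaling_factors(m, pi1, sd_small, sd_large, seed=0):
    """Draw m scaling factors from a mixture of two positive half-normals.

    With probability pi1 a marker belongs to the large-variance component,
    otherwise to the small-variance component.
    """
    rng = np.random.default_rng(seed)
    is_large = rng.random(m) < pi1
    sd = np.where(is_large, sd_large, sd_small)
    # |N(0, sd^2)| is a half-normal, i.e. N(0, sd^2) truncated at zero.
    v = np.abs(rng.normal(0.0, sd))
    return v, is_large


# Hypothetical settings: 5% of markers in the large component.
v, is_large = sample_scaling_factors(m=10000, pi1=0.05, sd_small=0.1, sd_large=1.0)
print(v[is_large].mean(), v[~is_large].mean())
```

Most sampled scaling factors are close to zero, while the small proportion drawn from the large-variance component is clearly separated, which is the strong differentiation referred to above.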
The GEBV for individual k was defined as the sum of predicted effects of SNP over all markers
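The summation may be sketched in Python as follows, assuming biallelic markers with alleles coded 0 and 1 and an estimated effect for each allele at each marker. The array layout, the function name and the effect values are hypothetical; since the two allele effects are simply added, the distinction between paternal and maternal alleles does not influence the result.

```python
import numpy as np


def direct_gebv(paternal, maternal, allele_effects):
    """Sum the estimated effects of both alleles over all markers.

    paternal, maternal: integer arrays (animals x markers) with allele codes 0/1.
    allele_effects: array (markers x 2) with the estimated effect of each allele.
    """
    markers = np.arange(allele_effects.shape[0])
    return (allele_effects[markers, paternal].sum(axis=1)
            + allele_effects[markers, maternal].sum(axis=1))


# Hypothetical estimated effects for three markers (columns: allele 0, allele 1).
allele_effects = np.array([[0.5, -0.5],
                           [1.0, -1.0],
                           [-0.2, 0.2]])
paternal = np.array([[0, 0, 1],
                     [1, 1, 0]])
maternal = np.array([[0, 1, 1],
                     [1, 0, 0]])

gebv = direct_gebv(paternal, maternal, allele_effects)
print(gebv)
```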
A study on model validation shows that the common prior model performed generally better than the mixture prior models, therefore this model was used to estimate SNP effect for genomic prediction.
The accuracy of GEBV was evaluated using a 5-fold cross validation. In each validation dataset, many half-sibs (about 500 individuals) were left out as test data, and the remaining individuals were used as reference data. Two criteria were used to assess the accuracy of genomic predictions. One was the squared correlation between GEBV and published conventional EBV in the test datasets, where both GEBV and EBV were adjusted for the birth-year mean to account for genetic trend, i.e., the within-year squared correlation. The other was the expected genomic reliability, obtained from the prediction error variance (PEV), which was measured as the posterior variance of each GEBV. To avoid strong dependencies between test data and training data, the sires which had sons or grandsons in the training data were excluded from the validation.
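The first criterion, the within-year squared correlation, may be sketched as follows. The function name and the toy values are hypothetical; the sketch only assumes that a birth year is recorded for each test animal.

```python
import numpy as np


def within_year_r2(gebv, ebv, birth_year):
    """Squared correlation after subtracting birth-year means from GEBV and EBV."""
    gebv = np.asarray(gebv, dtype=float).copy()
    ebv = np.asarray(ebv, dtype=float).copy()
    birth_year = np.asarray(birth_year)
    for year in np.unique(birth_year):
        mask = birth_year == year
        gebv[mask] -= gebv[mask].mean()
        ebv[mask] -= ebv[mask].mean()
    return float(np.corrcoef(gebv, ebv)[0, 1] ** 2)


# Hypothetical test animals from two birth years with a strong genetic trend.
gebv = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
ebv = [1.0, 3.0, 2.0, 12.0, 11.0, 13.0]
years = [2000, 2000, 2000, 2001, 2001, 2001]

r2_within = within_year_r2(gebv, ebv, years)
r2_raw = float(np.corrcoef(gebv, ebv)[0, 1] ** 2)
print(r2_within, r2_raw)
```

In this toy example the unadjusted squared correlation is dominated by the difference between birth years, whereas the within-year value reflects only how well animals of the same age are ranked.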
Squared correlation between GEBV and EBV (r2GEBV,EBV) for animals in the test data and expected reliability of GEBV
r2GEBV,EBV were lower than expected reliabilities. The lower r2GEBV,EBV could be due to the facts that EBV contained error and that the animals in the validation were selected from elite parents rather than being a random sample. The reliability in the above table is calculated from the animals in the test data, which were not used to estimate SNP effects. It therefore represents the reliability of GEBV for animals which do not have their own records (no progeny records for a bull).
For young bulls without progeny records, their EBV can be obtained from parent average EBV. The reliability of parent average EBV is about half of the reliability of GEBV in this study, indicating that genomic prediction is considerably better than conventional prediction for young bulls.
Breeding values for 30 sires were calculated for fertility index, udder health index and other health index (health index comprising reproductive diseases, digestive diseases and/or feet and leg diseases) phenotypes. Parent average (PA) EBVs were calculated from the EBVs of the parents, and conventional EBVs were calculated including records of progeny tests. Moreover, GEBV were calculated on the basis of the genotype of the sire for a plurality of genetic SNP markers, without including records of progeny tests. The bulls were genotyped and GEBV calculated as described in example 2. The values are listed in table 2. It is clear that predicted GEBV are more similar to the EBV after progeny test than parent average EBV. Thus, GEBV predicted according to the present invention is a more reliable tool than PA EBV for predicting the genetic merit of a bull in the absence of progeny records.
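For reference, the parent average may be computed as sketched below. The reliability formula shown is the standard approximation for unrelated parents and is an assumption of this illustration, not a statement of how the reliabilities in this study were obtained; the function name and the numerical values are hypothetical.

```python
def parent_average(sire_ebv, dam_ebv, sire_reliability, dam_reliability):
    """Parent average EBV and its approximate reliability.

    The animal receives half of each parent's breeding value; the
    reliability approximation 0.25 * (r2_sire + r2_dam) assumes
    unrelated parents and ignores Mendelian sampling.
    """
    pa = 0.5 * (sire_ebv + dam_ebv)
    reliability = 0.25 * (sire_reliability + dam_reliability)
    return pa, reliability


# Hypothetical parents: a progeny-tested sire and a dam with own records only.
pa, rel = parent_average(sire_ebv=110.0, dam_ebv=100.0,
                         sire_reliability=0.9, dam_reliability=0.4)
print(pa, rel)
```

Because the Mendelian sampling term is not captured, the reliability of a parent average cannot exceed 0.5 under this approximation, which is consistent with the advantage of GEBV noted above.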
This example evaluates the reliability of genomic estimated breeding values (GEBV) in the Danish Holstein population. The data in the analysis included 3,330 bulls with both published conventional EBV and single nucleotide polymorphism (SNP) markers. After data editing, 38,134 SNP markers were available. In the analysis, all SNPs were fitted simultaneously as random effects in a Bayesian variable selection model which allows heterogeneous variances for different SNP markers. The response variables were the official EBV. Direct GEBV were calculated as the sum of individual SNP effects. Initial analyses of 4 index traits were carried out to compare models with different intensities of shrinkage for SNP effects, i.e., mixture prior distributions of scaling factors (standard deviation of SNP effects) assuming 5%, 10%, 20% or 50% of SNP having large effects and the others having very small or no effects, and a single prior distribution common for all SNP. It was found that in general the model with a common prior distribution of scaling factors had better predictive ability than any mixture prior models. Therefore, a common prior model was used to estimate SNP effects and breeding values for all 18 index traits. Reliability of GEBV was assessed by squared correlation between GEBV and conventional EBV, and expected reliability obtained from prediction error variance, using a 5-fold cross validation. Squared correlations between GEBV and published EBV (without any adjustment) ranged from 0.252 to 0.700 with an average of 0.418. Expected reliabilities ranged from 0.494 to 0.733 with an average of 0.546. The results show selection of bovine subjects for breeding based on GEBV can greatly improve the accuracy of selection for young bulls and bull dams, compared with traditional selection based on parent average.
Holstein bulls from 258 half-sib families (1-41 bulls each), born during the years from 1986 to 2004, were genotyped using the Illumina Bovine SNP50 BeadChip (Illumina, San Diego, Calif.). The marker data were edited using the following criteria: 1) the locus was deleted if the minor allele frequency was less than 5%, or the proportion of animals called for a genotype at this locus was less than 95%, or the average GenCall score at the locus was less than 0.65; 2) the individual was deleted if the call rate was less than 0.85; 3) a marker type for an individual was deleted if it had a GenCall score less than 0.6. After the editing, there were 3,330 bulls and 38,134 SNP (single nucleotide polymorphism) markers available. In the analysis of SNP effects and genomic prediction, a missing SNP at a particular marker in some animals was treated as an extra allele. This corresponded to replacing the effect of a missing SNP at a marker with the population mean of this marker.
Published conventional EBV were used as response variables to estimate SNP effects. The EBV and their reliability for the genotyped bulls were obtained from official evaluations in April 2009. In total 18 index traits were analyzed in this example. Except for fat percentage and protein percentage, the traits are the sub-traits in the new Nordic Total merit index (NTM). Detailed descriptions on these index traits and their EBV are given in Danish Cattle Federation (2006).
All individual SNP markers were used as predictors and conventional EBV (PA EBV) were used as response variables weighted by a function of reliability of EBV (see herein below for details). A Bayesian method which captures the features of BayesA and BayesB but simplifies the computing algorithm was used to estimate marker effects for genomic prediction. The method applies the methodology of variable selection presented by George and McCulloch (1993). A detailed description of the method was presented by Villumsen et al. (2009) and Meuwissen and Goddard (2004). The following model was used to fit EBV data:
where y is the vector of published conventional EBV, μ is the intercept, m is the number of SNP markers, qi is the vector of scaled SNP effects (scaled by standard deviation) of marker i with qi ~ N(0, I), vi (vi > 0) is a scaling factor (standard deviation) for SNP effects of marker i, and e is the vector of residuals with e ~ N(0, Iσe²). The effects of SNP alleles of marker i are the products of vi and qi. Scaling factors vi were assumed to have either a common prior distribution or a mixture prior distribution. A common prior distribution across the variances of chromosome segment effects, which leads to a slight or moderate differentiation between small and large effects of markers, was assumed to be a positive half-normal distribution,
$$v_i \sim TN(0, \sigma_v^2), \quad v_i > 0$$
Mixture prior distributions, which lead to strong differentiation between small and large effects of markers, assume that a proportion (π0, typically large) of markers have very small effects, and another proportion (π1=1−π0, typically small) of markers have moderate or large effects. This was achieved by assuming that the prior distribution of vi was sampled from either a positive half-normal distribution with a small variance (σv02) or a positive half-normal distribution with large variance (σv12),
$$v_i \sim \pi_0\, TN(0, \sigma_{v0}^2) + \pi_1\, TN(0, \sigma_{v1}^2)$$
The prior distributions of μ, σv2 and σv12 were assumed to be improper uniform distributions, while σv02 was fixed at a small value. In this study, σv02 was set to 0.0001 for all traits. The genomic estimated breeding value (GEBV) for individual k was defined as the sum of the predicted SNP effects over all markers.
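The model equation and the GEBV formula are not reproduced in this text. One form consistent with the definitions given above is sketched below. The genotype design matrix $X_i$ for marker $i$, its row $x_{ik}'$ for individual $k$, and the hats denoting posterior estimates are symbols assumed here for illustration; they do not appear in the original text.

```latex
\[
  y = \mathbf{1}\mu + \sum_{i=1}^{m} X_i q_i v_i + e,
  \qquad
  \mathrm{GEBV}_k = \sum_{i=1}^{m} x_{ik}'\, \hat{q}_i \hat{v}_i
\]
```

Under this reading, the effect of the alleles of marker $i$ is the product $q_i v_i$, as stated in the definitions, and the GEBV sums these estimated products over all $m$ markers for the alleles carried by individual $k$.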
The effect of shrinkage intensity on accuracy of GEBV was investigated using 5 scenarios: 1) mixture prior of scaling factors with π1=5%, 2) π1=10%, 3) π1=20%, 4) π1=50%, and 5) common prior of scaling factors for all markers (i.e., π1=100%).
An initial evaluation of fertility and udder-health using a common prior model was carried out to investigate the effect of the weighting factor on the accuracy of genomic prediction. It was found that a model using 1/(1−reliability of EBV) as the weighting factor of the response variable performed better than a model using reliability of EBV as the weighting factor, which in turn performed better than a constant weight of 1 for all response variables (results not shown). Therefore, a weighting factor of 1/(1−reliability of EBV) was used in the further analysis. The weights were divided by the average weight to scale them to an average of 1. By this definition, an EBV with reliability close to one would get an extremely high weight. To avoid possible problems due to extreme weights, reliabilities larger than 0.98 were replaced with 0.98 in the calculation of the weights.
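The weighting scheme described above can be written compactly. This is a minimal sketch; the function name `ebv_weights` and its argument names are hypothetical.

```python
def ebv_weights(reliabilities, cap=0.98):
    """Weights 1 / (1 - reliability), with reliability capped, scaled to mean 1."""
    # Replace reliabilities above the cap with the cap to avoid extreme weights.
    raw = [1.0 / (1.0 - min(rel, cap)) for rel in reliabilities]
    # Divide by the average weight so that the weights average to 1.
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]
```

With the cap at 0.98, no raw weight exceeds 50 before scaling, so a single bull with a very reliable EBV cannot dominate the fit.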
The models with different priors for scaling factors and the accuracy of GEBV were evaluated using a 5-fold cross validation. In the cross validation, the 134 half-sib families that had at least one bull born after 1993 were divided into 5 test datasets by the following procedure. First, the 134 half-sib families were assigned to 10 year classes (1994-2003) according to the birth year of the majority of the half-sibs. Then each pair of consecutive year classes formed a test dataset, i.e., 1994-1995 formed Test-set 1, 1996-1997 Test-set 2, and so on. The five test datasets comprised a total of 2,393 bulls. In each fold of the cross validation, one test dataset was excluded from the whole data to form a training dataset, which was used to estimate marker effects and predict genomic breeding values of the “left out” animals. Detailed information on the whole data and the 5 test datasets is shown in Table 11.
Two criteria were used to assess accuracy of genomic prediction. One was squared correlation between GEBV and published conventional EBV (r2GEBV,EBV) in test datasets, where both GEBV and EBV were adjusted for birth-year mean to account for genetic trend, i.e., within-year squared correlation. The other was expected genomic reliability, obtained from prediction error variance (PEV) which was measured as posterior variance of each GEBV. To avoid strong dependencies between test data and training data, the sires which had sons or grandsons in the training data were excluded from the validation.
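The first criterion, the within-year squared correlation, can be sketched as below. The function names are hypothetical, and the sketch covers only the adjustment for birth-year mean and the squared correlation, not the exclusion of sires or the PEV-based expected reliability.

```python
from collections import defaultdict


def center_within_year(values, birth_years):
    """Subtract the birth-year mean from each value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, year in zip(values, birth_years):
        sums[year] += value
        counts[year] += 1
    return [value - sums[year] / counts[year]
            for value, year in zip(values, birth_years)]


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def within_year_r2(gebv, ebv, birth_years):
    """Squared correlation after adjusting both variables for birth-year mean."""
    return squared_correlation(center_within_year(gebv, birth_years),
                               center_within_year(ebv, birth_years))
```

Centering within birth year removes the genetic trend, so the statistic reflects how well GEBV ranks bulls of the same age rather than how well it tracks the change in mean EBV over time.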
Five scenarios of prior distribution for scaling factors (standard deviations, vi) of SNP effects were evaluated by analyzing 4 index traits (protein, fat percentage, udder-health, and female fertility). Model predictive ability was assessed by r2GEBV,EBV in the 5-fold cross validation. The best model (which was a common prior distribution in this study) was used to analyze all the 18 index traits.
The analyses were carried out using the IBAY package (Janss Luc, Faculty of Agricultural Sciences, Aarhus University, Tjele, Denmark; personal communication). The Gibbs sampler was run as a single chain with a length of 50,000 samples. Convergence was monitored by graphical inspection of the variance of scaling factors and of the correlation between GEBV from two separate rounds. The first 20,000 samples were discarded as burn-in. Every 10th sample of the remaining 30,000 was saved to estimate the features of the realized posterior distributions.
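The burn-in and thinning scheme implies that 3,000 samples were retained per chain. A minimal sketch of that bookkeeping follows; it does not reproduce the IBAY package, and the function names and the choice of which sample in each block of ten is kept are assumptions.

```python
def retained_sample_indices(chain_length=50_000, burn_in=20_000, thin=10):
    """0-based indices of samples kept after burn-in, taking every `thin`-th one."""
    return list(range(burn_in + thin - 1, chain_length, thin))


def posterior_mean(chain, burn_in, thin):
    """Mean of the retained samples of one parameter's chain."""
    kept = [chain[i] for i in retained_sample_indices(len(chain), burn_in, thin)]
    return sum(kept) / len(kept)
```

With the default settings, `retained_sample_indices()` contains 3,000 indices.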
The mean, cross-year standard deviation and within-year standard deviation of the published EBV and their reliabilities for the genotyped bulls are shown in Table 2. The published EBV were standardized to a mean of 100 for the cows born 3-5 years (for production and conformation traits, animal model) or for the bulls born 7-9 years (for the remaining traits, sire model) before publication, and standardized to a standard deviation of 10 for bulls born in 1997 and 1998. The cross-year standard deviations for yield-index, protein, milk, fertility and other-disease were higher than 10, reflecting a genetic change over the years for these traits. Within-year standard deviations were close to 10 for all traits except for longevity (8.6) and growth (11.5), indicating that the genotyped bulls represented the genetic variation of bulls in the population. Reliabilities of EBV differed among 18 traits, and were consistent with heritabilities of the traits. There was a large variation in reliabilities within trait for low heritability traits, but small for high heritability traits.
The effect of changing the prior distribution of scaling factors on the predictive ability was investigated on fertility, protein, udder-health and fat percentage. The predictive abilities of the models with different priors for scaling factors were evaluated by within-year r2GEBV,EBV, based on a 5-fold cross validation. Table 3 clearly shows that r2GEBV,EBV increased with increasing prior proportion (π1) of SNP with large effects within each subset. Pooled over 5 subsets, r2GEBV,EBV increased from 0.347 (π1=0.05) to 0.412 (common prior, i.e., π1=1.0) for fertility, from 0.279 to 0.412 for protein, from 0.338 to 0.435 for udder-health, and from 0.670 to 0.700 for fat percentage. r2GEBV,EBV for fertility were similar when using models with π1=0.50 and common prior, and for fat percentage, the values were similar when using models with π1=0.20, π1=0.50, and common prior. It was found that variation of r2GEBV,EBV among the 5 subsets was larger for fertility and udder-health (the traits having low heritability) than for protein and fat percentage (high heritability).
It was found that the prior distribution of scaling factors influenced the estimates of genetic marker effects. Taking fertility as an example, the distribution of marker effects (expressed as absolute value of the difference between two allele effects) followed a Gamma distribution for all scenarios (
It can be observed that the variation of marker effects increased with increasing shrinkage intensity (i.e., decreasing π1) for all 4 traits. The means of marker effects decreased with increasing shrinkage intensity. Model fit was inspected by coefficient of determination (R2 of the model based on the training data). As expected, the coefficient of determination increased with decreasing shrinkage intensity, due to increasing the freedom of explanatory variables to fit data.
Since models with a common prior distribution of scaling factors generally provided better predictive abilities than mixture prior models, this model was chosen to estimate SNP effects and predict breeding values for 18 traits in Danish Holstein population. Table 4 presents within-year r2GEBV,EBV and expected reliability of GEBV calculated from PEV for bulls in the test data. r2GEBV,EBV ranged from 0.252 to 0.700 with an average of 0.418. Expected reliability ranged from 0.494 to 0.733 with an average of 0.546. For all traits, expected reliability was higher than r2GEBV,EBV.
It was observed that the variation in r2GEBV,EBV among the 18 traits was larger than the variation in expected reliabilities, but the patterns of ranks were similar. Product moment correlation and rank correlation between the two parameters were 0.883 and 0.813, respectively. Although the heritabilities for these traits differed considerably, the difference in accuracy of GEBV between low heritability traits and high heritability traits was relatively small, indicating that the reliability of GEBV was not very strongly influenced by heritability. For example, fertility, feet-legs, udder-health, and other-diseases had an expected reliability of GEBV and an r2GEBV,EBV as high as or close to those for production traits.
The reliability of GEBV is a critical criterion in deciding whether GEBV is suitable for practical genetic evaluation. In the present example, 5 scenarios of prior distribution for variance of SNP effects were assessed. The common prior model generally performed better than the mixture prior models; therefore, this model was used to investigate reliability of GEBV for 18 traits in the Danish Holstein population. Cross validation shows that the accuracy of GEBV was significantly greater than that of conventional parent average EBV.
In the present example, reliability of GEBV was evaluated by r2GEBV,EBV and expected reliability (calculated from PEV). The r2GEBV,EBV values were lower than the expected reliabilities. The lower r2GEBV,EBV could be due to the facts that the EBV contained error and that the animals in the validation were selected from elite parents rather than being random samples. On the other hand, it is also possible that the expected reliability may overestimate the reliability of GEBV. An alternative is to measure reliability of GEBV as r2GEBV,EBV divided by reliability of EBV. This assumes that the correlation between GEBV and EBV arises only through their correlations with the true breeding value (i.e., no correlation between prediction errors of GEBV and EBV). Thus, rGEBV,EBV = rGEBV,TBV × rEBV,TBV, and hence r2GEBV,TBV = r2GEBV,EBV / r2EBV,TBV, where TBV is the true breeding value. However, based on the present data, reliability estimated using this approach seemed too high to be acceptable for some low-heritability traits, thus implying that prediction errors of GEBV and EBV were not completely independent. We suggest that the true reliability of GEBV (r2GEBV,TBV) in the present data could be between r2GEBV,EBV and the expected reliability. Thus, averaged over 18 traits, reliability of GEBV could be in the interval between 0.42 and 0.54. These figures are considerably greater than the reliability of the conventional parent average EBV. This indicates that genomic prediction can effectively improve the accuracy of pre-selection for young bulls, compared with traditional selection based on parent average.
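The alternative reliability measure discussed above is a simple ratio. The sketch below uses a hypothetical function name and hypothetical input values (not results from this example) to show how the ratio can exceed 1 when the EBV reliability is low, which is the kind of implausibly high estimate the text describes.

```python
def implied_gebv_reliability(r2_gebv_ebv, rel_ebv):
    """r2(GEBV,TBV) = r2(GEBV,EBV) / r2(EBV,TBV).

    Valid only if prediction errors of GEBV and EBV are uncorrelated.
    """
    if not 0.0 < rel_ebv <= 1.0:
        raise ValueError("rel_ebv must lie in (0, 1]")
    return r2_gebv_ebv / rel_ebv


# Hypothetical inputs for illustration only.
high_rel_trait = implied_gebv_reliability(0.40, 0.80)
low_rel_trait = implied_gebv_reliability(0.40, 0.35)  # exceeds 1: not a valid reliability
```

A ratio above 1 cannot be a reliability, which is consistent with the conclusion that the prediction errors of GEBV and EBV were not completely independent.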
The difference in reliability of GEBV between low heritability traits and high heritability traits was relatively small. In the present example, marker effects were estimated from published EBV. The influence of heritability on GEBV was through its influence on reliability of EBV. However the published EBVs were predicted from a very large dataset, resulting in a relatively high accuracy even for the traits with low heritability. Moreover, in genomic prediction, each individual in reference data has a contribution to marker effects. In other words, the GEBV of a candidate is actually obtained from the information of all individuals in the reference data. The benefit from information of other animals for the traits with low heritability is relatively greater than that for the traits with high heritability. The weak dependency on heritability indicates that genetic evaluation based on GEBV would be relatively more beneficial for the traits with low heritability. Previous studies on marker assisted selection have shown that gain in response rate is larger for traits with lower heritability (Lande and Thompson, 1990; Meuwissen and Goddard, 1996). However, these calculations were conditional on the fact that QTL had been identified, which is much more difficult for low heritability traits due to low statistical power of detection. Using genomic selection, the step of testing for QTL is circumvented. This is a reason that accurate GEBV can be obtained even for low heritability traits. As a consequence of a relatively weak dependency of GEBV on heritability, it becomes easier to improve functional traits and to obtain a balanced genetic progress between functional traits and production traits.
Five scenarios of prior distributions of the variance of SNP effects were investigated in this example. Using a single-marker approach, it was found that the model with a common distribution of scaling factors (standard deviations) had a better predictive ability than models assuming a mixture distribution. Also, the predictive ability of the model using a mixture distribution increased with increasing assumed proportion of SNP having large effects. VanRaden et al. (2009) reported that the predictive ability of a nonlinear BLUP model was considerably better than that of a linear BLUP model for fat percentage and protein percentage, while the predictive abilities were similar for the other 25 traits. However, the linear BLUP model is not equivalent to the common prior model in the present study; the latter allows allele effects of different markers to have different variances. In simulation studies, Meuwissen et al. (2001) reported that the accuracy of GEBV using BayesB (similar to the mixture prior distribution in the present example) was higher than that using BayesA (common prior distribution), and Lund et al. (2009) found that mixture models predicted breeding value better than the models with a common prior distribution of variances or the models with equal variance for all SNP. Both studies were based on data in which QTL effects were simulated from a Gamma distribution with shape parameter 0.4 (L shape).
There are many possible reasons why the model with a mixture prior distribution of scaling factors did not perform better than the model with a common prior distribution in the present dataset. Firstly, the mixture prior distribution of scaling factors is based on the hypothesis that few genes have a large effect and a large number of genes have a small effect, and that the distribution of QTL effects follows an L-shaped Gamma distribution. The hypothesis is supported by the derived distribution of QTL effects reported by Hayes and Goddard (2001). However, the distribution of SNP effects is not necessarily consistent with the distribution of QTL effects. Many SNP could be located in a chromosome segment with a large effect, so that the effect of the chromosome segment is divided over many SNP. On the other hand, the effect of a QTL might not be fully accounted for by a single marker, because of incomplete linkage between marker and QTL. In any case, for markers with small effects, it is difficult to find the optimal proportion and variance of scaling factors.
The accuracy of GEBV was evaluated using a 5-fold cross validation. The advantage of multiple-fold cross validation is that it keeps the training data as large as possible, while keeping the test data as large as required (with the maximal total test data equal to the whole data). There was variation in r2GEBV,EBV between the 5 sets of cross validations (Table 3), indicating that a sufficient number of individuals in the test data is important for validating reliability of GEBV. In the cross validation, each set of training data left many half-sib families out, instead of leaving a random sample out. This strategy greatly reduces the dependency between the training data and the test data, because the individuals in the test data did not have their sibs in the training data.
In this example, marker effects were estimated by fitting a model to published EBV. The advantage of using EBV is that they can be obtained directly from routine genetic evaluations. In addition they contain little random error, which greatly reduces the prediction error variance. This could be important in situations where the number of genotyped animals in the reference data is small. An alternative type of response variable is daughter yield deviation.
The reliabilities of GEBV in the present example indicate that genomic selection is a very promising tool in cattle breeding. Moreover, genomic prediction can be further improved by several approaches. Firstly, reliability of GEBV can increase with increasing data size (the number of individuals with both genotypes and phenotypes) to estimate marker effects. Secondly, the reliability may be improved by using other statistical models. Thirdly, the reliability of GEBV for an index trait may be improved by predicting genomic breeding value for each single trait, and then calculating the GEBV of the index trait, instead of predicting the index trait directly. Finally, higher accuracy of genomic selection can be obtained by a genomic selection index which combines GEBV and other sources of information, such as parent EBV from conventional national genetic evaluation.
Averaged over all 18 index traits, the reliability of GEBV is considerably greater than the reliability of conventional parent average EBV. This clearly shows that genomic selection can greatly improve the accuracy of pre-selection for young bulls, compared with traditional selection based on parent average. Based on the data in this example, the model with a common prior distribution of scaling factors had better predictive ability than the models with a mixture prior distribution.
The following items define specific embodiments of the present invention:
Item 1. A method of determining a phenotypic trait in a bovine subject, comprising detecting in a sample from said bovine subject the presence or absence of at least one genetic marker allele or a specific combination of genetic marker alleles, wherein said genetic marker allele or a specific combination of genetic marker alleles is associated with said phenotypic trait of said bovine subject and/or off-spring therefrom.
Item 2. The method according to Item 1, wherein said at least one genetic marker is a single nucleotide polymorphism (SNP), or a microsatellite marker.
Item 3. The method according to any of the preceding, comprising determining multiple genetic marker alleles in a sample from said bovine subject.
Item 4. The method according to Item 3, comprising determining at least 50 genetic marker alleles, such as at least 100 genetic marker alleles, such as at least 200 genetic marker alleles, such as at least 300 genetic marker alleles, such as at least 400 genetic marker alleles, such as at least 500 genetic marker alleles, such as at least 600 genetic marker alleles, such as at least 700 genetic marker alleles, such as at least 800 genetic marker alleles, such as at least 900 genetic marker alleles, such as at least 1000 genetic marker alleles, such as at least 2000 marker alleles, for example at least 3000 marker alleles, such as at least 4000 genetic marker alleles, for example at least 5000 genetic marker alleles, such as at least 6000 marker alleles, for example at least 7000 marker alleles, such as at least 8000 genetic marker alleles, for example at least 9000 genetic marker alleles, such as at least 10000 marker alleles, for example at least 12000 marker alleles, such as at least 14000 genetic marker alleles, for example at least 16000 genetic marker alleles, such as at least 18000 marker alleles, for example at least 20000 marker alleles, for example at least 22000 marker alleles, such as at least 24000 genetic marker alleles, for example at least 26000 genetic marker alleles, such as at least 28000 marker alleles, for example at least 30000 marker alleles, for example at least 32000 marker alleles, such as at least 34000 genetic marker alleles, for example at least 36000 genetic marker alleles, such as at least 38000 marker alleles, for example at least 40000 marker alleles, for example at least 42000 marker alleles, such as at least 44000 genetic marker alleles, for example at least 46000 genetic marker alleles, such as at least 48000 marker alleles, for example at least 50000 marker alleles, for example at least 52000 marker alleles, such as at least 54000 genetic marker alleles, for example at least 56000 genetic marker alleles, such as at least 58000 marker alleles, for example at least 60000 marker alleles, for example at least 62000 marker alleles, such as at least 64000 genetic marker alleles, for example at least 66000 genetic marker alleles, such as at least 68000 marker alleles, for example at least 70000 marker alleles, for example at least 72000 marker alleles, such as at least 74000 genetic marker alleles, for example at least 76000 genetic marker alleles, such as at least 78000 marker alleles, for example at least 80000 marker alleles, for example at least 82000 marker alleles, such as at least 84000 genetic marker alleles, for example at least 86000 genetic marker alleles, such as at least 88000 marker alleles, for example at least 90000 marker alleles, for example at least 92000 marker alleles, such as at least 94000 genetic marker alleles, for example at least 96000 genetic marker alleles, such as at least 98000 marker alleles, for example at least 100000 marker alleles.
Item 5. The method according to any of the preceding, wherein said phenotypic trait is selected from the group consisting of Birth ease, Body score, Calving ease, Fat, Fat percent, Fertility, Health, Leg, Longevity, Milk, Milk organ, Milk speed, Protein, Prot. percent, Temperament, Udder health, Yield, Average, and other diseases.
Item 6. The method according to any of the preceding, wherein said sample is selected from the group consisting of blood, semen (sperm), urine, liver tissue, muscle, skin, hair, follicles, ear, tail, fat, testicular tissue, lung tissue, saliva, spinal cord biopsy, and any other tissue.
Item 7. The method according to any of the preceding, wherein said sample is blood, muscle tissue or liver tissue.
Item 8. The method according to any of the preceding, wherein said sample is blood.
Item 9. The method according to any of the preceding, wherein said bovine subject is a member of the Holstein breed.
Item 10. The method according to any of the preceding, wherein said bovine subject is a member of the Danish Holstein cattle population.
Item 11. The method according to any of the preceding, wherein at least one of said genetic marker alleles is detected by use of allele specific oligonucleotide primers.
Item 12. A method for selecting bovine subjects for breeding purposes, said method comprising detecting in a sample from said bovine subject the presence or absence of at least one genetic marker allele or a specific combination of genetic marker alleles as defined in any of the preceding, wherein said at least one genetic marker allele or a specific combination of genetic marker alleles is associated with at least one trait according to Item 5 of said bovine subject and/or off-spring therefrom.
Item 13. A diagnostic kit for determining the presence or absence in a bovine subject of at least one genetic marker allele or a specific combination of genetic marker alleles, wherein said genetic marker allele or a specific combination of genetic marker alleles is associated with a phenotypic trait of said bovine subject and/or off-spring therefrom.
Item 14. The diagnostic kit according to Item 13, for determining multiple genetic marker alleles that are associated with a phenotypic trait of said bovine subject and/or off-spring therefrom.
Item 15. The diagnostic kit according to any of Item 13 and Item 14, comprising at least one oligonucleotide for genotyping said bovine subject in respect of a genetic marker allele as defined in Item 1.
Item 16. The kit according to any of Item 13 to Item 15, further comprising reagents and buffers required for genotyping.
Item 17. The diagnostic kit according to Item 16, wherein genotyping is microsatellite genotyping and/or single nucleotide polymorphism genotyping.
Item 18. The diagnostic kit according to any of Item 13 to Item 17 further comprising at least one reference sample.
Item 19. The diagnostic kit according to Item 18, wherein said reference sample comprises at least one oligonucleotide sequence of a genetic marker allele associated with a phenotypic trait as defined in claim.
Item 20. The kit according to any of Item 13 to Item 19, further comprising instructions for the performance of the detection method of the kit, and for the interpretation of the results.
Number | Date | Country | Kind |
---|---|---|---|
PA 2008 01128 | Aug 2008 | DK | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/DK2009/050203 | 8/19/2009 | WO | 00 | 9/26/2011 |