Use of deletion polymorphisms to predict, prevent, and manage histoincompatibility

Information

  • Patent Application
  • 20070172853
  • Publication Number
    20070172853
  • Date Filed
    December 04, 2006
  • Date Published
    July 26, 2007
Abstract
Disclosed herein are methods for predicting the immunocompatibility of two subjects that include determining the presence or absence of one or more deletion variants in the DNA sequence of a gene, where the deletion variant substantially prevents expression of the protein encoded by the gene.
Description
BACKGROUND OF THE INVENTION

Organ and bone marrow transplantation are routinely used for the treatment of patients with end-stage disease such as leukemia, liver failure due to hepatitis C, and kidney failure. While the frequency of organ and tissue transplants has increased dramatically over the past decades, histoincompatibility between the transplant recipient and the donor remains a significant barrier to the success of the transplant.


Histocompatibility, also known as immunocompatibility, refers to the compatibility between two individuals or the actual organs or tissues to be transplanted (also known as “grafts”). Consequences of histoincompatibility include graft rejection, also known as host versus graft disease (HVGD) in organ transplant, and graft versus host disease (GVHD), typically associated with bone marrow transplants. In GVHD, immune cells derived from donor hematopoietic stem cells identify host tissue as foreign and mount an immune response against it. In HVGD, host immune cells identify the graft organ as foreign and mount an immune response against it. Both GVHD and HVGD are debilitating conditions and can require patients to be placed on severe immunosuppressive regimens, with attendant complications. Immunocompatibility largely depends on the genetic similarities between donor and recipient and is generally determined by blood typing and by Major Histocompatibility Complex (MHC) typing, which in humans is also referred to as Human Leukocyte Antigen (HLA) typing. The MHC of humans is a cluster of genes occupying a region located on the sixth chromosome. The strongest antigens of the MHC are separated into two classes: class I and class II. Class I and II MHC molecules are found in nearly every cell in the body and are the major determinants used by the body's immune system for recognition and differentiation of self from non-self. MHC molecules present antigen peptides to the T cells of the immune system; different MHC molecules differ in the efficiency with which they bind antigenic peptide sequences, and some are better than others at presenting antigens to the immune system. The class I MHC molecules are encoded by three loci (HLA A, HLA B, and HLA C) and class II MHC molecules are encoded by three loci (HLA DR, HLA DP, and HLA DQ). While the number of alleles at each locus varies widely, a person can only inherit two alleles for each HLA locus. The large number of possible combinations at each locus makes the genes of the MHC the most polymorphic loci known.


Every person's HLA pattern can be “fingerprinted” through tissue typing. Tissue typing, or HLA matching, is used to measure the pattern of HLA antigens present for a potential transplant donor and recipient and to determine the level of compatibility between them. The more similar the HLA antigen patterns are from the two tissue samples, the less likely it is that the graft will be rejected.


HLA typing has revolutionized the treatment of many end-stage diseases by increasing the success rate of transplantation of bone marrow cells or organs, but graft rejection still occurs with significant frequency even in sibling transplants in which donor and host are perfectly matched for all blood type and HLA antigens. This may be due, at least in part, to the fact that many other histocompatibility antigens have not yet been identified.


Despite the advances in tissue typing and the creation of numerous tissue and organ registries used to screen potential donors and recipients prior to transplantation, the prevalence of life-threatening complications such as graft failure and rejection remains a significant barrier to the overall success of transplantation.


SUMMARY OF THE INVENTION

The compatibility of bodily tissues with the immune system is a central and unpredictable feature of the etiology of numerous medical conditions, including the rejection of allografts, the development of GVHD and HVGD, spontaneous abortions, and the treatment of many hematologic disorders. Improvements to the methods currently available for screening recipients in need of a transplant against potential donors are necessary to reduce the likelihood of graft rejection, GVHD, and HVGD.


Histoincompatibility is generally believed to be due to genetic differences or polymorphisms between individuals. Because the DNA of any two individuals is known to differ at millions of single-nucleotide polymorphisms (SNPs) scattered throughout the human genome, it is often assumed that histoincompatibility results from a large number of small differences between the antigen repertoires of the two individuals. However, we have discovered places in the human genome in which entire segments, ranging from hundreds of base pairs to multiple kilobases, are present in some individuals and missing in others. Many of these individual “deletion polymorphisms” or “deletion variants” remove protein-coding sequences from the human genome, and thus result in large changes to an individual's antigen repertoire relative to the changes associated with individual SNPs.


When a deletion variant appears in all copies of the gene in an individual, the result is generally a lack of expression of the gene product in that individual. If an individual does not have the deletion in all copies of the gene, the gene is present and the gene product is generally expressed. As a result, the immune cells of an individual with a deletion variant in all copies of the gene will not have been exposed to this gene or its product, and will tend to recognize the gene product as foreign when it is presented on tissue from another individual. In the context of transplant, this will result in an immune response when the donor and host are not matched, also known as a “deletion mismatch” for the specific deletion polymorphism. For example, in the context of organ transplant, a person who has a deletion variant in gene X that results in a lack of expression of gene X, and who receives a kidney from a donor who does not have a deletion variant in gene X and is therefore positive for gene X, could mount an immune response against the antigen encoded by gene X and the cells which express it. In the context of bone marrow transplant, immune cells from a donor having a deletion variant in gene X, if transplanted into an individual who is positive for gene X, could mount an immune response against the product of gene X and the cells that express it. In the context of fetal loss, a mother who lacks gene X could miscarry a fetus which is positive for gene X due to an immune response by the mother against the product of gene X.


Several of these common deletion variants are present in genes that are specifically expressed in organs relevant to transplantation and are likely to be determinants used by the body's immune system for recognition and differentiation of self from non-self. If the presence of a deletion resulting in the absence of the antigen is not matched between two subjects for whom immunocompatibility is desired (e.g., a graft donor and a graft recipient), the result is an immune response mounted by the subject having the deletion (i.e., lacking the gene product) against the protein product present in the subject lacking the deletion (i.e., having the expressed protein product). Therefore, these common deletion variants that affect the expression of antigens can be used to screen individuals for immunocompatibility, and used to manage, measure, prevent, and provoke histoincompatibility.


Accordingly, in one aspect, the invention features a method for predicting immunocompatibility of the immune system of a first subject with a cell, tissue, or organ from a second subject that includes the following steps. A biological sample from a first subject and a biological sample from a second subject are obtained and the presence or absence of at least one deletion variant in the DNA sequence of a gene in the first and second biological samples is determined, where the deletion variant substantially prevents expression of an antigen encoded by the gene and where the deletion variant is in a gene selected from the group consisting of UGT2B28, TRY6, LCE3C, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE. The deletion variant can be a common deletion variant and can be anywhere in the gene, including the coding region or a regulatory element of the gene. In one embodiment, the deletion variant is at least 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 bp, or 2 kb, 3 kb, 4 kb, 5 kb, 7 kb, 8 kb, 9 kb, 10 kb, 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 745 kb, 800 kb, 900 kb, or 1000 kb in length. In one embodiment, the deletion variant is between 100 bp and 745 kb in length. In another embodiment, at least two, three, four, five, six, seven, eight, nine, ten or more deletion variants are identified. The presence or absence of the deletion variant can be determined, for example, by polymerase chain reaction, DNA sequencing, sequencing of the whole genome or a subset thereof, Southern blotting, restriction fragment length polymorphism analysis, microelectrophoresis, sequencing by hybridization, single molecule sequencing, or microarray analysis. The presence or absence of the deletion variant can also be determined indirectly by testing polymorphisms (e.g., SNPs) that are in linkage disequilibrium with deletion polymorphisms or by genotyping polymorphisms (e.g., SNPs) that are inside a deleted region to infer the presence of a deletion that removes the site of the SNP. Preferably, the deletion is in a gene that is normally expressed in the biological sample.


The presence or absence of the at least one deletion variant in the DNA sequence of the gene is then compared between the first and second subjects. The immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the comparison results in one of the following: (i) the first subject has at least one intact copy of the gene, where the antigen encoded by the gene is expressed or (ii) the second subject has a deletion variant in all copies of the gene, where the deletion variant substantially prevents expression of the antigen encoded by the gene. Based on this comparison, three possible scenarios would predict immunocompatibility between the immune system of the first subject and the cell, tissue, or organ from the second subject: 1) both the first and second subjects have a deletion variant in all copies of the gene, which substantially prevents expression of the antigen, encoded by the gene, in both subjects; 2) both subjects have at least one intact copy of the gene and the antigen encoded by the gene is expressed; and 3) the second subject has a deletion variant in all copies of the gene that substantially prevents expression of the antigen encoded by the gene and the first subject has at least one intact copy of the gene that does not have a deletion variant, in which case, the antigen is expressed.
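The per-gene comparison described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not part of the disclosed method: the names `GeneStatus` and `compatible_at_gene` are hypothetical, and each subject is assumed to have already been typed as either having at least one intact copy of the gene or having the deletion in all copies.

```python
from enum import Enum


class GeneStatus(Enum):
    """Hypothetical two-state typing result for one gene in one subject."""
    INTACT = "at least one intact copy"   # antigen is expressed
    DELETED = "deletion in all copies"    # antigen is substantially not expressed


def compatible_at_gene(first: GeneStatus, second: GeneStatus) -> bool:
    """Apply the rule from the text for a single gene.

    The first subject's immune system is predicted to be compatible with the
    second subject's cell, tissue, or organ if (i) the first subject has an
    intact copy, or (ii) the second subject has the deletion in all copies.
    """
    return first is GeneStatus.INTACT or second is GeneStatus.DELETED


if __name__ == "__main__":
    # Enumerate all four combinations; only DELETED (first) with INTACT
    # (second) fails the rule, which is the deletion mismatch case.
    for first in GeneStatus:
        for second in GeneStatus:
            print(first.name, second.name, compatible_at_gene(first, second))
```

Three of the four combinations satisfy the rule, which corresponds to the three scenarios listed in the text; the remaining combination is the one in which the first subject's immune system would be newly exposed to the antigen.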


In one embodiment, the method further includes determining the presence or absence of at least one additional deletion variant in the DNA sequence of a gene in the first and second biological sample where the deletion variant substantially prevents expression of an antigen encoded by the gene and where the at least one additional deletion variant is in a gene selected from the group consisting of UGT2B17, GSTT1, GSTM1, and CYP2A6. The presence or absence of the at least one additional deletion variant in the DNA sequence of the gene is then compared between the first and second subjects. The immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the comparison results in one of the following: (i) the first subject has at least one intact copy of the gene, where the antigen encoded by the gene is expressed or (ii) the second subject has a deletion variant in all copies of the gene, where the deletion variant substantially prevents expression of the antigen encoded by the gene. Based on this comparison for the additional deletion variant, any of the three possible scenarios described above would predict immunocompatibility between the immune system of the first subject and the cell, tissue, or organ from the second subject. In one desirable combination, the at least one deletion variant is in the UGT2B28 gene and the at least one additional deletion variant is in the UGT2B17 gene. In another desirable combination, the at least one deletion variant is in the UGT2B28 gene and the at least one additional deletion variant is in the GSTT1 or GSTM1 gene, or both.


In a related aspect, the invention features a method for predicting immunocompatibility of the immune system of a first subject with a cell, tissue, or organ from a second subject that includes the following steps. A biological sample from a first subject and a biological sample from a second subject are obtained and the presence or absence of at least one deletion variant antigen in the first and second biological samples is determined, for example, using immunological methods (e.g., ELISA or western blotting based methods). The at least one deletion variant antigen can be a common deletion variant antigen and is preferably one of the following: UGT2B28, TRY6, LCE3C, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE. The deletion variant antigen is not an antigen encoded by an MHC, HLA, or Rh factor gene. In one embodiment, at least two, three, four, five, six, seven, eight, nine, ten or more deletion variant antigens are compared.


The presence or absence of the deletion variant antigen is then compared between the first and second subjects. The immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the comparison results in one of the following: (i) the first subject expresses the at least one deletion variant antigen or (ii) the second subject does not express the at least one deletion variant antigen. Based on this comparison, three possible scenarios would predict immunocompatibility between the immune system of the first subject and the cell, tissue, or organ from the second subject: 1) both the first and second subjects express the deletion variant antigen; 2) neither the first nor the second subject expresses the deletion variant antigen; and 3) the first subject expresses the deletion variant antigen and the second subject does not express the deletion variant antigen.


In one embodiment, the method further includes determining the presence or absence of at least one additional deletion variant antigen selected from the group consisting of UGT2B17, GSTT1, GSTM1, and CYP2A6. The presence or absence of the at least one additional deletion variant antigen is then compared between the first and second subjects. The immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the comparison results in one of the following: (i) the first subject expresses the at least one additional deletion variant antigen or (ii) the second subject does not express the at least one additional deletion variant antigen. Based on this comparison for the additional deletion variant antigen, any of the three possible scenarios described above would predict immunocompatibility between the immune system of the first subject and the cell, tissue, or organ from the second subject. In one desirable combination, the at least one deletion variant antigen is UGT2B28 and the at least one additional deletion variant antigen is UGT2B17. In another desirable combination, the at least one deletion variant antigen is UGT2B28 and the at least one additional deletion variant antigen is GSTT1 or GSTM1, or both.


In a related aspect, the invention also features a method for predicting the immunocompatibility of the immune system of a first subject with a cell, tissue, or organ from a second subject that includes the following steps. A biological sample is obtained from each of the first and second subjects. The presence or absence of one or more deletion variants in the DNA sequence of at least one gene in the biological samples is determined, where the one or more deletion variants substantially prevent the expression of an antigen encoded by the at least one gene. The deletion variant is not in an MHC, Rh factor, or HLA gene. The deletion variant can be a common deletion variant and can be anywhere in the gene, including the coding region or a regulatory element of the gene. In one embodiment, the deletion variant is at least 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 bp, or 2 kb, 3 kb, 4 kb, 5 kb, 7 kb, 8 kb, 9 kb, 10 kb, 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 745 kb, 800 kb, 900 kb, or 1000 kb in length. In one embodiment, the deletion variant is between 100 bp and 745 kb in length. The presence or absence of the deletion variant can be determined, for example, by polymerase chain reaction, DNA sequencing, Southern blotting, restriction fragment length polymorphism analysis, or microarray analysis. The presence or absence of the deletion variant can also be determined indirectly by testing polymorphisms (e.g., SNPs) that are in linkage disequilibrium with deletion polymorphisms or by genotyping polymorphisms (e.g., SNPs) that are inside a deleted region to infer the presence of a deletion that removes the site of the SNP. Preferably, the deletion is in a gene that is normally expressed in the biological sample. Preferably, the deletion variant is in one of the following genes: UGT2B17, UGT2B28, TRY6, LCE3C, GSTM1, GSTT1, CYP2A6, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE.


The presence or absence of the deletion variants is then used to determine the deletion variant pattern for the first and second subjects. The deletion variant pattern is compared between the first and second subjects and the immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the subjects have a substantially identical deletion variant pattern (e.g., at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% identical) and the subjects are not immunocompatible if they do not have a substantially identical deletion variant pattern (e.g., less than 50%, 40%, 30%, 20%, 10%, 5%, or less than 1% identical). The immune system of the first subject is also immunocompatible with a cell, tissue, or organ from the second subject if the comparison results in one of the following: (i) the first subject has at least one intact copy of at least one gene, where the antigen encoded by the gene is expressed or (ii) the second subject has a deletion variant in all copies of the at least one gene, where the deletion variant substantially prevents expression of the antigen encoded by the gene. Based on this comparison, three possible scenarios would predict immunocompatibility between the immune system of the first subject and the cell, tissue, or organ from the second subject: 1) both the first and second subjects have a deletion variant in all copies of the same gene, which substantially prevents expression of the antigen, encoded by the gene, in both subjects; 2) both subjects have at least one intact copy of the same gene and the antigen encoded by the gene is expressed; and 3) the second subject has a deletion variant in all copies of the same gene that substantially prevents expression of the antigen encoded by the gene and the first subject has at least one intact copy of the same gene that does not have a deletion variant, in which case, the antigen is expressed.
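One way to read the "substantially identical deletion variant pattern" comparison is as a percent identity over the genes typed in both subjects. The following Python sketch makes that assumption explicit. The function names, the dictionary representation (gene name mapped to True when the deletion is present in all copies), the default threshold, and the example typing results are all hypothetical; the text does not prescribe how identity is computed.

```python
def pattern_identity(first: dict[str, bool], second: dict[str, bool]) -> float:
    """Fraction of genes, typed in both subjects, with the same deletion status.

    Each dict maps a gene name to True if the subject has the deletion in all
    copies of that gene and False otherwise (an assumed representation).
    """
    shared = set(first) & set(second)
    if not shared:
        raise ValueError("no gene was typed in both subjects")
    matches = sum(first[gene] == second[gene] for gene in shared)
    return matches / len(shared)


def substantially_identical(first: dict[str, bool],
                            second: dict[str, bool],
                            threshold: float = 0.9) -> bool:
    """Compare identity against a chosen cutoff (e.g., 0.5, 0.9, or 1.0)."""
    return pattern_identity(first, second) >= threshold


if __name__ == "__main__":
    # Made-up typing results, for illustration only.
    subject_1 = {"UGT2B17": True, "UGT2B28": False, "GSTM1": True, "GSTT1": False}
    subject_2 = {"UGT2B17": True, "UGT2B28": False, "GSTM1": False, "GSTT1": False}
    print(pattern_identity(subject_1, subject_2))  # 3 of 4 genes agree
```

Note that pattern identity is symmetric, whereas rule (i)/(ii) in the text is directional, so the two criteria can disagree for a given pair of subjects.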


Optionally, the method can further include determining the presence or absence of the antigen encoded by the at least one gene that is not an MHC gene, where the presence or absence of the antigen is used to determine the deletion variant antigen pattern for the first and second subjects. The deletion variant antigen pattern is compared between the first and second subjects and the immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the subjects have a substantially identical deletion variant antigen pattern (e.g., at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% identical) and the subjects are not immunocompatible if they do not have a substantially identical deletion variant antigen pattern (e.g., less than 50%, 40%, 30%, 20%, 10%, 5%, or less than 1% identical). The immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the comparison results in one of the following: (i) the first subject expresses the deletion variant antigen or (ii) the second subject does not express the deletion variant antigen. Based on this comparison, three possible scenarios would predict immunocompatibility between the immune system of the first subject and the cell, tissue, or organ from the second subject: (1) both the first subject and the second subject express the deletion variant antigen, (2) neither the first subject nor the second subject expresses the deletion variant antigen, or (3) the first subject expresses the deletion variant antigen and the second subject does not express the antigen.


In another aspect, the invention features a method for predicting immunocompatibility of the immune system of a first subject with a cell, tissue, or organ from a second subject that includes the following steps. A biological sample from a first subject and a biological sample from a second subject are obtained and the DNA sequence of the whole genome, or a subset thereof, is determined. The sequences of the whole genome, or subset thereof, from the first sample and the second sample are then compared and the presence or absence of at least one deletion mismatch locus is determined. A deletion mismatch locus includes at least one deletion variant in the DNA sequence of a gene, where the deletion variant substantially prevents expression of an antigen encoded by the gene. In one embodiment, the deletion variant is in the DNA sequence of any one or more of the following genes: UGT2B17, UGT2B28, TRY6, LCE3C, GSTM1, GSTT1, CYP2A6, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE. The deletion variant can be a common deletion variant and can be anywhere in the gene, including the coding region or a regulatory element of the gene. In one embodiment, the deletion variant is at least 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 bp, or 2 kb, 3 kb, 4 kb, 5 kb, 7 kb, 8 kb, 9 kb, 10 kb, 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 745 kb, 800 kb, 900 kb, or 1000 kb in length. In one embodiment, the deletion variant is between 100 bp and 745 kb in length. In another embodiment, at least two, three, four, five, six, seven, eight, nine, ten or more deletion mismatch loci are identified. The whole genome, or a subset thereof, can be sequenced using any technique known in the art including, but not limited to, microelectrophoresis, genomic hybridization, single molecule sequencing, or microarray analysis. Preferably, the deletion mismatch is a deletion variant in a gene that is normally expressed in the biological sample. Alternatively or additionally, the sequence of the genome or subset thereof of the first subject can be compared to a reference genome DNA sequence, where the reference genome sequence can be the DNA sequence from a third subject or from a composite of multiple subjects.


The immune system of the first subject is immunocompatible with a cell, tissue, or organ from the second subject if the comparison results in one of the following: (i) the first subject has at least one intact copy of the gene, where the antigen encoded by the gene is expressed or (ii) the second subject has a deletion variant in all copies of the gene, where the deletion variant substantially prevents expression of the antigen encoded by the gene. Based on this comparison, three possible scenarios would predict immunocompatibility between the immune system of the first subject and the cell, tissue, or organ from the second subject: 1) both the first and second subjects have a deletion variant in all copies of the gene, which substantially prevents expression of the antigen, encoded by the gene, in both subjects; 2) both subjects have at least one intact copy of the gene and the antigen encoded by the gene is expressed; and 3) the second subject has a deletion variant in all copies of the gene that substantially prevents expression of the antigen encoded by the gene and the first subject has at least one intact copy of the gene that does not have a deletion variant, in which case, the antigen is expressed.
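As a rough illustration of how deletion mismatch loci might be listed once both subjects have been typed, the Python sketch below reports the genes at which the rule above fails, that is, genes deleted in all copies in the first subject but intact in the second. The function name, the set-based inputs, and the example data are assumptions made for this sketch; calling deletions from sequence data is outside its scope.

```python
def deletion_mismatch_loci(genes_typed: set[str],
                           first_deleted: set[str],
                           second_deleted: set[str]) -> list[str]:
    """Return typed genes where the first subject lacks the antigen but the
    second subject has it.

    first_deleted and second_deleted hold the genes in which each subject has
    a deletion in all copies (an assumed input format).
    """
    return sorted(
        gene for gene in genes_typed
        if gene in first_deleted and gene not in second_deleted
    )


if __name__ == "__main__":
    # Made-up typing results, for illustration only.
    typed = {"UGT2B17", "GSTM1", "GSTT1"}
    first = {"UGT2B17", "GSTM1"}   # first subject: homozygous deletions
    second = {"GSTM1"}             # second subject: homozygous deletions
    print(deletion_mismatch_loci(typed, first, second))
```

An empty result means every typed gene satisfies condition (i) or (ii); the sketch deliberately ignores the opposite direction (second subject deleted, first intact), which the text treats as compatible.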


Each of the above methods can be used alone or in combination to determine immunocompatibility between an organ, tissue, or cell donor and a recipient or between a woman and a potential father, an embryo, or fetus (collectively referred to as “maternal/fetal compatibility”). For organ transplants and maternal/fetal compatibility, the first subject is the organ or tissue recipient or the woman and the second subject is the organ or tissue donor, the prospective father, or the embryo or fetus. In each of these scenarios, the immune system of the recipient would not be newly exposed to the antigen upon transplantation. For bone marrow or peripheral blood transplantation, the first and second subjects are reversed, that is, the first subject is the bone marrow or peripheral blood donor and the second subject is the bone marrow or peripheral blood recipient.
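The assignment of "first subject" and "second subject" described above depends on the setting, and it is easy to get backwards. A minimal Python sketch of the mapping follows; the function name and the setting labels are hypothetical, and for maternal/fetal compatibility the woman is passed in the recipient position and the prospective father, embryo, or fetus in the donor position.

```python
def assign_roles(setting: str, recipient: str, donor: str) -> tuple[str, str]:
    """Return (first_subject, second_subject) for the comparison.

    The first subject is the one whose immune system is evaluated against the
    second subject's cells, tissue, or organ.
    """
    if setting in {"organ", "tissue", "maternal_fetal"}:
        # The recipient's (or woman's) immune system meets the graft or fetus.
        return recipient, donor
    if setting in {"bone_marrow", "peripheral_blood"}:
        # Donor-derived immune cells meet the recipient's tissue, so the
        # roles are reversed.
        return donor, recipient
    raise ValueError(f"unknown setting: {setting!r}")


if __name__ == "__main__":
    print(assign_roles("organ", "recipient", "donor"))
    print(assign_roles("bone_marrow", "recipient", "donor"))
```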


Each of the above methods can further include determining the blood type or the MHC type for the first or second subject. In various embodiments of the above aspects, the first or second biological sample is an organ, or part thereof, a tissue, or a bodily fluid, such as blood, serum, plasma, bone marrow, cerebrospinal fluid, amniotic fluid, urine, saliva, or semen.


In one example of the above aspects, the second subject is in need of a bone marrow or peripheral blood transplant and the first subject is a potential bone marrow or peripheral blood donor and the method is used to determine if the two subjects are a donor/recipient match. In this example, the deletion variant can be identified, for example, in a UGT2B17, UGT2B28, GSTM1, GSTT1, MGAM, or CYP2A6 gene or in the antigen encoded by any of the genes. In another example, the first subject is an organ or tissue recipient and the second subject is a potential organ or tissue donor and the method is used to determine if the two subjects are a donor/recipient match. For example, the methods can be used to identify a donor/recipient match for a subject in need of a liver transplant where the deletion variant is preferably identified in one or more of the following genes: UGT2B17, UGT2B28, GSTM1, GSTT1, and CYP2A6, or in the antigens encoded by any of the genes. In another example, the methods can be used to identify a donor/recipient match for a subject in need of a kidney transplant where the deletion variant is identified in a UGT2B28, GSTT1 or GSTM1 gene or in the antigens encoded by the genes.


In yet another example of the above aspects, the method is used to predict the immunocompatibility of prospective parents (e.g., where the first subject is a woman and the second subject is a prospective father or a potential sperm donor) or between a woman and an embryo (e.g., an embryo that is conceived by in vitro fertilization) or a pregnant woman and her fetus. Desirably, if the method is used to determine immunocompatibility between a woman and an embryo or fetus, the deletion variant antigen or deletion variant encoding the antigen is normally expressed by the fetal or embryonic cells. For example, the methods can be used to determine compatibility between a woman and an embryo or fetus where the deletion variant is preferably identified in one or more of the following genes: UGT2B28, UGT2B17, or LCE3C, which are expressed in the placenta, or in the antigens encoded by any of the genes.


If, using any of the above methods described herein, the first and second subjects are not immunocompatible, the deletion variant antigen can be administered to the first subject to tolerize the subject to the deletion variant antigen. The deletion variant antigen can be administered by gene therapy or protein therapy.


The methods of the above aspects can also be used to determine histoincompatibility. For example, if the second subject is in need of a bone marrow or peripheral blood transplant and the first subject is a bone marrow or peripheral blood donor, the method can be used to identify the subjects as a donor/recipient match if the first subject is not immunocompatible with the second subject. Such a method can be used, for example, to treat a subject that has a hematologic disorder (e.g., myelodysplastic syndrome, aplastic anemia, sickle cell anemia, metabolic disease, or a blood cell cancer such as Hodgkin's lymphoma, non-Hodgkin's lymphoma, leukemia, and multiple myeloma) and the desired outcome is for the donor's immune cells to attack the diseased cells in the host. For example, if the second subject has a blood cell cancer, the deletion variant is preferably detected in an antigen or in a gene that encodes an antigen that is specifically expressed on the cancer cells in the patient suffering from the blood cell cancer.


The invention also features a kit for deletion variant typing that includes at least one nucleic acid molecule that is complementary to a DNA sequence of at least a portion of a gene selected from the following: UGT2B28, TRY6, LCE3C, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE. The kit also includes instructions for the use of the nucleic acid molecule for deletion variant typing. The kit can further include at least one additional nucleic acid molecule that is complementary to the DNA of any one or more of the following genes: UGT2B17, GSTT1, GSTM1, and CYP2A6. The nucleic acid molecule can be a primer used for a polymerase chain reaction or a probe that hybridizes to the gene at high stringency.


The invention also features a kit for deletion variant antigen typing that includes at least one binding agent (e.g., an antibody or fragment thereof) that specifically binds at least one antigen encoded by a gene selected from the following: UGT2B28, TRY6, LCE3C, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE. The kit can also include at least one binding agent (e.g., an antibody or fragment thereof) that specifically binds at least one antigen encoded by a gene selected from the following: UGT2B17, GSTT1, GSTM1, and CYP2A6. The kit also includes instructions for the use of the binding agent (e.g., antibody or fragment thereof) for deletion variant antigen typing.


By “antigen” is meant a polypeptide chain of two or more amino acids regardless of any post-translational modification (e.g., glycosylation or phosphorylation) that stimulates a cellular or humoral immune response.


By “biological sample” is meant a tissue biopsy, cell, bodily fluid (e.g., blood, serum, plasma, semen, urine, saliva, amniotic fluid, or cerebrospinal fluid), organ, or part thereof, or other specimen obtained from a patient or a test subject. Desirably, the biological sample includes nucleic acid molecules or polypeptides or both.


By “cell, tissue, or organ” is meant any cell, tissue or organ from the body or bodily fluid of a subject. Non-limiting examples of organs include kidney, liver, skin, pancreas, heart, lung, muscle, small bowel, hand, cornea, or any part thereof. Non-limiting examples of tissues include skin, bone, heart valve, blood, bone marrow, semen, an embryo, and a fetus. Non-limiting examples of cells include red blood cells, white blood cells, stem cells, sperm, egg, embryonic cells, and fetal cells.


By “deletion variant” or “deletion polymorphism” is meant a segment of the genome that is present in some individuals of a species and absent in other individuals of that species. Deletion variants can vary in size from 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 bp, or 2 kb, 3 kb, 4 kb, 5 kb, 7 kb, 8 kb, 9 kb, 10 kb, 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 745 kb, 800 kb, 900 kb, or 1000 kb in length. In one embodiment, the deletion variant is between 100 bp and 745 kb in length. By “common deletion variant” is meant a deletion variant that is seen with a frequency of at least 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at least 10% in a given population. Most common deletions appear to result from ancestral mutations that have been inherited by descent; their frequency is strongly related to ancestry, and they are in linkage disequilibrium with nearby SNP variants. Desirably, the deletion variant or common deletion variant is a deletion in all copies of the gene that prevents expression of a gene, or prevents expression of an antigen encoded by a gene. Deletion variants can be found in the exons, introns, or the coding region of the gene or in the sequences that control expression of the gene. Examples of protein-encoding genes identified as having common deletion polymorphisms include UGT2B17, UGT2B28, TRY6, LCE3C, GSTM1, GSTT1, CYP2A6, PRB1, OR51A2, OR4F5, GNB1L, MGAM, and MCEE.


By “a deletion variant in all copies of the gene” or “homozygous deletion” is meant the deletion of all of an individual's potential copies of a DNA locus, which may result from inheritance of a substantially identical deletion variant from both parents; or from the inheritance of different but overlapping deletions from one's parents; or from the combined effect of an inherited deletion and a subsequent, de novo mutation that removes the remaining intact copy of a DNA locus. For an autosomal DNA locus, or for an X-chromosome DNA locus in females, a deletion variant in all copies of the gene means a deletion of the DNA locus on both chromosomes. For a sex-chromosome locus in males, a deletion variant in all copies of the gene means a deletion of the only copy of that locus. For example, in the CYP2A6 gene, there is more than one deletion allele of the same locus present in the population that leads to the complete deletion of the DNA locus.


By “deletion variant antigen” is meant an antigen that is encoded by a gene with a “deletion variant” which, when present, prevents expression of the antigen. Preferably, a deletion variant antigen is not an HLA, MHC antigen, or Rh factor. For example, the antigens encoded by UGT2B17, UGT2B28, TRY6, LCE3C, GSTM1, GSTT1, CYP2A6, PRB1, OR51A2, OR4F5, GNB1L, MGAM, or MCEE are considered deletion variant antigens because, when the deletion variant is present, expression of the antigen is prevented.


By “deletion variant pattern” is meant a compilation of the determination of the presence or absence of deletion variants present in one or more genes in a biological sample. Deletion variant patterns can be determined at the nucleic acid sequence level or at the antigen expression level using any standard method for nucleic acid sequence determination or antigen expression detection known in the art or described herein. The deletion variant pattern can be determined for one gene, two genes, three or more genes, a genomic locus, a chromosome, or an entire genome for a subject sample. The deletion variant pattern can also be determined for one or more deletion variant antigens. A deletion variant pattern identified for one gene, two genes, three or more genes, a genomic locus, a chromosome, an entire genome, or an antigen for one subject sample can be compared to a deletion variant pattern for the same one gene, two genes, three or more genes, genomic locus, specified genomic loci, chromosome, entire genome, or antigen identified for a second subject sample. The two patterns are said to be substantially identical if they are more than 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical over the one gene, two genes, three or more genes, genomic loci, chromosome, entire genome, or antigen compared. Two subjects with a substantially identical deletion variant pattern are said to be immunocompatible. Deletion variant patterns can be compared over an entire region or only for genes or genomic loci that are relevant to the organ or tissue for which immunocompatibility is desired.
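As a non-limiting illustration, the comparison of two deletion variant patterns can be sketched in Python as follows. The data representation (a mapping from gene name to a true/false flag for homozygous deletion), the function names, the example subjects, and the 90% default threshold are assumptions made for illustration only; the definition above permits any of the listed thresholds.

```python
from typing import Dict

# Hypothetical representation: gene name -> True if the gene is homozygously
# deleted (antigen absent), False if at least one intact copy is present.
DeletionPattern = Dict[str, bool]


def pattern_identity(a: DeletionPattern, b: DeletionPattern) -> float:
    """Return the fraction of shared loci at which two patterns agree."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("the two patterns have no loci in common")
    matches = sum(1 for gene in shared if a[gene] == b[gene])
    return matches / len(shared)


def substantially_identical(a: DeletionPattern, b: DeletionPattern,
                            threshold: float = 0.90) -> bool:
    """True if identity exceeds the chosen threshold (assumed 90% by default)."""
    return pattern_identity(a, b) > threshold


# Illustrative, made-up subjects.
donor = {"UGT2B17": True, "UGT2B28": False, "GSTM1": True, "GSTT1": False}
recipient = {"UGT2B17": True, "UGT2B28": False, "GSTM1": False, "GSTT1": False}

print(pattern_identity(donor, recipient))         # 0.75 (3 of 4 loci agree)
print(substantially_identical(donor, recipient))  # False at the 90% threshold
```

Restricting the input mappings to genes expressed in the organ or tissue of interest gives the tissue-specific comparison described above.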


By “deletion variant typing” is meant the process of determining the presence or absence of a deletion variant, preferably a common deletion variant, in a nucleic acid encoding an antigen. Deletion variant typing may or may not be used in combination with HLA typing.


By “deletion variant antigen typing” is meant the process of determining the presence or absence of a deletion variant antigen encoded by a gene having a deletion variant, preferably a common deletion variant. Deletion variant antigen typing may or may not be used in combination with HLA typing.


By “deletion mismatch locus” is meant the absence of a genetic locus from the genome, or subset thereof, of one sample that is not absent (i.e., not homozygously deleted) in the genome, or subset thereof, of another sample. Generally, the absence of the genetic locus is due to the presence of a deletion variant in all copies of that locus (i.e., a homozygous deletion).
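By way of a hedged example, deletion mismatch loci can be listed from per-locus copy numbers as sketched below. The copy-number dictionaries, the function name, and the example values are hypothetical; only the rule itself (homozygously deleted in one sample, not homozygously deleted in the other) is taken from the definition above.

```python
from typing import Dict, List

# Hypothetical input: locus -> number of intact gene copies (0, 1, or 2).
CopyNumbers = Dict[str, int]


def deletion_mismatch_loci(absent_in: CopyNumbers,
                           present_in: CopyNumbers) -> List[str]:
    """Loci homozygously deleted in `absent_in` but not in `present_in`."""
    return sorted(
        locus
        for locus in set(absent_in) & set(present_in)
        if absent_in[locus] == 0 and present_in[locus] > 0
    )


# Illustrative, made-up copy numbers.
recipient = {"UGT2B17": 0, "GSTM1": 0, "GSTT1": 2, "LCE3C": 1}
donor = {"UGT2B17": 2, "GSTM1": 0, "GSTT1": 0, "LCE3C": 1}

# Antigens the recipient lacks but the donor tissue carries.
print(deletion_mismatch_loci(recipient, donor))  # ['UGT2B17']
# Antigens the donor lacks but the recipient carries.
print(deletion_mismatch_loci(donor, recipient))  # ['GSTT1']
```

The comparison is directional. The first call lists loci at which the recipient's immune system could see a donor antigen as foreign; the second lists loci at which donor-derived immune cells could see a host antigen as foreign.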


By “donor” is meant a mammal, preferably a human, from whom an organ or a tissue is removed. The mammal may be alive or dead at the time the organ or tissue is removed. By “potential donor” is meant an individual who is identified as having an organ or tissue suitable for transplant. Generally, a potential donor will be free of disease affecting the organ or tissue to be transplanted. For example, a potential liver donor will generally have a healthy liver and be free of liver cancer, cirrhosis, sepsis, or infection with hepatitis A, B, or C virus or human immunodeficiency virus. A potential bone marrow or peripheral blood donor will generally be free of viral infection, blood cancer, or any type of hematologic disorders. A “preferred donor” is a donor that is matched to a recipient either by standard methods known in the art, such as blood typing, HLA typing, or by the methods described herein, or a combination thereof. Donors can be obtained from a registry of potential donors such as the National Cord Blood Program, United Network for Organ Sharing, National Marrow Donor Program, and any other public or private international, national, state, or local organ procurement organizations or organ donor registries. Information pertaining to potential donors can be entered into a database including name, age, sex, race, blood type, HLA type, and deletion variant typing, deletion variant antigen typing, or deletion variant pattern.


By “donor/recipient match” is meant a donor and a recipient that are identified as having (donor) and needing (recipient) the same organ, tissue, blood, or bone marrow and are immunocompatible. Donor/recipient matches need not be a perfect match but may have sufficiently matched criteria (e.g., blood type, HLA type, antigen type), which can be determined by the skilled artisan or the transplant physician. Preferably, a donor/recipient match will have the same blood type and will be identical for at least 1 deletion variant antigen, preferably 2 or more, 3 or more, 4 or more, 5 or more, and most preferably all of the deletion variant antigens for the biological sample being tested. A donor/recipient match will also preferably have an identical pattern for at least one HLA allele, preferably 2 or more, 3 or more, 4 or more, 5 or more, or all 6 commonly tested HLA alleles (e.g., 2 each for HLA-A, HLA-B, and HLA-DR). Donor/recipient matches can be further screened using additional medical criteria such as size of organ and urgency of need of organ, as well as geographic criteria and other health considerations.
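The matching criteria named above can be tallied for a candidate pair as in the following minimal sketch. The `Typing` record, the allele labels (“a1”, “b2”, and so on), and the example subjects are placeholders invented for illustration; the sketch only counts matches and leaves the decision of what is sufficient to the skilled artisan or transplant physician.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Typing:
    """Hypothetical typing record for one subject."""
    blood_type: str
    hla: Dict[str, Tuple[str, str]]   # locus -> the two inherited alleles
    antigens: Dict[str, bool]         # deletion variant antigen -> expressed?


def hla_allele_matches(donor: Typing, recipient: Typing) -> int:
    """Count matched HLA alleles, treating each locus as an unordered pair."""
    total = 0
    for locus in set(donor.hla) & set(recipient.hla):
        remaining = list(recipient.hla[locus])
        for allele in donor.hla[locus]:
            if allele in remaining:
                remaining.remove(allele)
                total += 1
    return total


def match_summary(donor: Typing, recipient: Typing) -> dict:
    """Tally blood type, deletion variant antigen, and HLA allele matches."""
    shared = set(donor.antigens) & set(recipient.antigens)
    return {
        "same_blood_type": donor.blood_type == recipient.blood_type,
        "antigen_matches": sum(
            donor.antigens[g] == recipient.antigens[g] for g in shared
        ),
        "antigens_tested": len(shared),
        "hla_matches": hla_allele_matches(donor, recipient),
    }


# Illustrative, made-up subjects with placeholder allele labels.
donor = Typing(
    blood_type="A",
    hla={"HLA-A": ("a1", "a3"), "HLA-B": ("b1", "b2"), "HLA-DR": ("dr2", "dr2")},
    antigens={"UGT2B17": True, "GSTM1": False, "GSTT1": True},
)
recipient = Typing(
    blood_type="A",
    hla={"HLA-A": ("a1", "a2"), "HLA-B": ("b1", "b2"), "HLA-DR": ("dr1", "dr2")},
    antigens={"UGT2B17": True, "GSTM1": False, "GSTT1": False},
)

print(match_summary(donor, recipient))
```

For this pair the tally is the same blood type, 2 of 3 deletion variant antigens identical, and 4 of 6 HLA alleles matched.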


By “expression” is meant the production by cells of a gene or polypeptide detectable by standard, art-known methods. For example, polypeptide expression is often detected by immunological methods, DNA expression is often detected by Southern blotting or polymerase chain reaction (PCR), and RNA expression is often detected by northern blotting, PCR, or RNAse protection assays.


By “gene” is meant a nucleic acid (e.g., DNA) sequence that comprises coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., mRNA, rRNA, tRNA), as well as regulatory sequences that promote or restrict the expression of that gene. The term encompasses the coding region and the sequences located adjacent to the coding region on both the 5′ and 3′ ends. Sequences which regulate the expression of a gene's coding sequence are typically located close (e.g., within a distance of about 10 kb) to the coding sequence and are frequently called “promoter elements.” Sequences located 5′ of the coding region and present on the mRNA are referred to as 5′ non-translated sequences. Sequences located 3′ or downstream of the coding region and present on the mRNA are referred to as 3′ non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form contains the coding region (“exons”) interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Exons are the segments of the DNA that encode the polypeptide. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide. As used herein, the term “nucleic acid” means a polynucleotide such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) and encompasses both single-stranded and double-stranded nucleic acid. Total genomic DNA is a particularly useful nucleic acid with which to practice a method of the invention. When detecting a polymorphism in a coding region, mRNA or cDNA are also useful.


By “genome” is meant the complete genetic content of an organism. The genome includes both the genes and the non-coding sequences. By “a subset of the whole genome” is meant a substantial portion of the genome. For example, chromosomal DNA is a preferred subset of the whole genome. In another example, the DNA sequences encoding proteins are a preferred subset of the whole genome. In another example, the DNA sequences encoding proteins that are known to be expressed in a particular organ or tissue type of interest are a preferred subset of the whole genome. In another example, the DNA sequences encoding protein sequences that are known to be presented by the MHC or to elicit antibody responses are a preferred subset of the genome.


By “hematologic disorder” is meant any abnormal condition of any type of blood cell including erythrocytes (red blood cells), platelets, leukocytes, monocytes, granulocytes, and lymphocytes. Examples of diseases of the blood include cancers such as Hodgkin's lymphoma, non-Hodgkin's lymphoma, leukemia, multiple myeloma, and myelodysplastic syndrome. Also included are diseases of the immune system, aplastic anemia (when bone marrow stops producing new blood cells), inherited diseases of the bone marrow such as sickle cell anemia, and some metabolic diseases.


By “hybridize” is meant pair to form a double-stranded molecule between complementary polynucleotide sequences, or portions thereof, under various conditions of stringency. (See, e.g., Wahl and Berger (1987) Methods Enzymol. 152:399; Kimmel, Methods Enzymol. 152:507, 1987.) For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl and 50 mM trisodium citrate, and most preferably less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and most preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In a preferred embodiment, hybridization will occur at 30° C. in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 37° C. in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 μg/ml denatured salmon sperm DNA (ssDNA). In a most preferred embodiment, hybridization will occur at 42° C. in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 μg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.


For most applications, washing steps that follow hybridization will also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature. As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25° C., more preferably of at least about 42° C., and most preferably of at least about 68° C. In a preferred embodiment, wash steps will occur at 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 42° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In a most preferred embodiment, wash steps will occur at 68° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art. Hybridization techniques are well known to those skilled in the art and are described, for example, in Benton and Davis (Science 196:180, 1977); Grunstein and Hogness (Proc. Natl. Acad. Sci., USA 72:3961, 1975); Ausubel et al. (Current Protocols in Molecular Biology, Wiley Interscience, New York, 2001); Berger and Kimmel (Guide to Molecular Cloning Techniques, 1987, Academic Press, New York); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.


By “immunocompatibility,” “immunological compatibility,” or “histocompatibility” is meant a condition in which the cells or tissue of one subject do not elicit an immune response by the immune system of another subject. Generally, immunocompatibility is measured by determining the presence of antigens in the cells or tissue of one subject that are absent in the cells or tissue of another subject and would cause the second subject to elicit an immune response against the antigen(s). Examples of such antigens known in the art include the glycosyltransferase enzyme that modifies the carbohydrate content of the red blood cell antigens and determines the blood type of an individual (e.g., Type A, B, AB, or O), HLA antigens, and the Rh antigen. Immunocompatibility can be absolute or relative to another individual based on the number of antigens tested and found in the subjects tested. For example, if a subject shares blood type, Rh factor, and 3 out of 6 HLA antigens with one individual, and shares blood type, Rh factor, and 5 out of 6 HLA antigens with a second individual, the subject is said to be more immunocompatible with the second individual than with the first.


By “major histocompatibility complex” or “MHC” is meant a complex of genes encoding cell surface molecules that are required for antigen presentation to T cells. The MHC is a large genomic region or gene family found in most vertebrates containing many genes with important immune system roles. In humans, the MHC is also referred to as the Human Leukocyte Antigen (HLA) and spans almost 4 megabases of chromosome 6. The strongest antigens of the MHC are separated into two classes—class I and class II. Class I and II MHC molecules are found in nearly every cell in the body and are the major determinants used by the body's immune system for recognition and differentiation of self from non-self. The class I MHC molecules are encoded by three loci—HLA A, HLA B, and HLA C—and class II MHC molecules are encoded by three loci—HLA DR, HLA DP, and HLA DQ.


By “polymorphism” is meant the occurrence of different forms, stages, or types in individual organisms or in organisms of the same species, independent of sexual variations, for example, the DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. One example of a polymorphism is a single nucleotide polymorphism (SNP).


By “predicting the immunocompatibility” is meant determining or identifying the genetic similarities between two individuals or between an individual and a cell, tissue, or organ to be transplanted into that individual.


By “recipient” is meant a mammal, preferably a human, in need of an organ or a tissue transplant. Recipients can also be entered into a registry or a waiting list of subjects in need of an organ or tissue transplant. Information pertaining to recipients that can be entered into a database includes name, age, sex, race, blood type, HLA tissue type, geographic location, and urgency of the needed organ or tissue donation.


By “subject” is meant a mammal, including, but not limited to, a human or non-human mammal, such as a bovine, equine, canine, ovine, or feline.


By “substantially prevents expression” is meant to cause a reduction in the expression of a gene or antigen by at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% when compared to the expression of the gene or antigen in a sample that does not have a deletion variant in the gene or a deletion variant antigen. The term “substantially prevents expression” also includes a loss or reduction in the expression of a gene or antigen spatially or temporally during development when compared to the expression of the gene or antigen in a sample that does not have a deletion variant in the gene or a deletion variant antigen.
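The quantitative part of this definition can be expressed as a short calculation, sketched below. The function names, the example expression levels, and the choice of 50% as the default cutoff are assumptions for illustration; any of the listed percentages could be used as the cutoff.

```python
def expression_reduction(sample_level: float, reference_level: float) -> float:
    """Fractional reduction in expression relative to a sample lacking the deletion."""
    if reference_level <= 0:
        raise ValueError("reference expression level must be positive")
    return 1.0 - sample_level / reference_level


def substantially_prevents_expression(sample_level: float,
                                      reference_level: float,
                                      min_reduction: float = 0.50) -> bool:
    """True if expression is reduced by at least `min_reduction` (assumed 50%)."""
    return expression_reduction(sample_level, reference_level) >= min_reduction


# Illustrative, made-up expression levels in arbitrary units.
print(substantially_prevents_expression(30.0, 200.0))   # True (about an 85% reduction)
print(substantially_prevents_expression(120.0, 200.0))  # False (about a 40% reduction)
```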


By “tolerize” is meant providing an antigen or nucleic acid sequence encoding an antigen to an individual to reduce or prevent antigen-specific immune responses.


By “transplantation” is meant the transfer of cells, tissues, blood, bone marrow, or organs from one area of the body to another area of the body or from one organism to another. Allogeneic transplantation refers to transplantation between genetically different members of the same species. Nearly all organ and bone marrow transplants are allografts. These may be between brothers and sisters, parents and children, or between donors and recipients who are not related to each other. Autologous transplantation refers to transplantation of an organism's own cell or tissues; autologous transplantation may be used to repair or replace damaged tissue; autologous bone marrow transplantation permits the usage of more severe and toxic cancer therapies by replacing bone marrow damaged by the treatment with marrow that was removed and stored prior to treatment. By xenogenic transplantation is meant transplantation between members of different species; for example, the transplantation of animal organs into humans. Transplantation can refer to the transfer of a healthy organ or tissue such as liver, kidney, heart, pancreas, skin, lungs, and cornea. Transplantation can also refer to the transfer or replacement of blood or bone marrow, for example in a bone marrow transplant (BMT), umbilical cord blood, or peripheral blood stem cell transplant (PBSCT), where diseased blood cells or stem cells can be restored or replaced.


We have discovered a number of common deletion variants in genes that encode for antigens expressed in tissues relevant to immunocompatibility. The conservation of these common deletion variants among multiple individuals, the presence of the antigens encoded by these polymorphic genes in relevant tissues, and the ability of the antigen to elicit an immune response make them ideal candidates for screening methods that determine immunocompatibility between two subjects in any situation where compatibility or histocompatibility is desired.


Other features and advantages of the invention will be apparent from the following Detailed Description, the drawings, and the claims.




BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a schematic diagram showing the use of SNP genotypes to discover segregating deletion variants. Segregating deletions leave a “footprint” in SNP genotype data by causing physically clustered patterns of null genotypes, apparent Mendelian inconsistencies, and apparent Hardy-Weinberg disequilibrium.



FIGS. 2A-2E show the spatially patterned aberrations in SNP genotypes. FIG. 2A is a graph based on pairs of HapMap SNP markers that were typed using different genotyping technologies and showing the more frequent appearance of Mendelian-inconsistent SNP genotypes (“Mendel failures”) at genomic locations close to other Mendel failures when those earlier failures are observed in the same individuals (open circles) but not when they are observed in other individuals (filled circles). FIG. 2B is a graph showing the clustering of population patterns of null genotypes. FIGS. 2C and 2D show the spatially patterned failure of SNP genotype assays at the site of segregating deletions. Tracks show, for each SNP assay (triangles), the pattern of null genotypes across 90 individuals (green track), the pattern of Mendel failure across 60 pairs of relatives (blue track), and the ratio of observed to expected heterozygosity (red track). Physically clustered sets of similarly aberrant genotypes identify a common, 85 kb segregating deletion on chromosome 3q (FIG. 2C) and a common, 10 kb segregating deletion on chromosome 7q (FIG. 2D), both in a sample of 30 trios with European ancestry. FIG. 2E is a graph showing the size distribution of deletion variants identified from regional patterns of aberrant SNP genotypes. A few deletions larger than 100 kb (up to 845 kb) were also observed.



FIGS. 3A-3D show the existence of segregating deletions at the sites of clusters of aberrant genotypes. FIG. 3A is a series of photomicrographs showing fluorescent in situ hybridization (FISH) confirmation of the presence and Mendelian inheritance of an 85-kb deletion at chr4q13.2 at 70.4 MB. FIG. 3B is a graph showing two-color, allele-specific fluorescence intensity measurements for a SNP underneath a common deletion on chr4 at 69.5 MB. The measurements show extra genotype clusters (beyond the 2-3 clusters typically observed for SNPs), corresponding to individuals who were subsequently determined to carry hemizygous and homozygous deletions of the locus. FIG. 3C is a series of photographs of gels showing confirmation by PCR of a predicted population pattern of homozygous deletion of sequence on chr8p23.3 at 2.4 MB. Yellow arrows indicate the individuals predicted (from having multiple null genotypes at the locus) to carry homozygous deletions. FIG. 3D is a graph showing that measurements of copy number obtained by quantitative PCR (shown here for a deletion on chr4 at 70.5 MB) fall into three discrete clusters, allowing accurate inference of the deletion genotype in each individual.



FIG. 4 is a series of graphs showing inter-individual variation in gene expression due to gene copy number variation. Each graph shows the measured expression level of each gene (Monks et al., Am. J. Hum. Genet. 75:1094-1105 (2004)) in lymphoblastoid cell lines from individuals who were determined by quantitative PCR to have 0, 1, and 2 gene copies.



FIGS. 5A-5C show linkage disequilibrium of deletion variants with SNPs. FIG. 5A is a series of graphs showing linkage disequilibrium (r²) of gene deletion polymorphisms with SNPs. For each gene deletion, strong linkage disequilibrium is observed with SNPs to the left and right of the deletion breakpoints (red dotted lines). FIG. 5B is an image generated using the Bifurcator program (Fry in “Computational Information Design,” Doctoral thesis, MIT, Cambridge Mass., 2005) showing the residence of the UGT2B28 deletion allele on the same core haplotype in European (CEU) and Yoruba (YRI) populations. Letters indicate the consensus haplotype in each population. FIG. 5C is a graph showing haplotype homozygosity across flanking SNPs in individuals homozygous for 51 experimentally validated deletions (red); in randomly selected control individuals at the same deletion loci (black); in individuals homozygous for a frequency- and population-matched set of SNP variants (blue); and in randomly selected control individuals at these SNP loci (yellow).



FIGS. 6A-6D show physical clustering of patterns of apparent Mendelian inconsistency and null genotypes in the HapMap data. FIG. 6A shows “Mendel failure profiles.” Binary patterns of apparent Mendelian inconsistency across the 60 relative-pairs in a population are more likely to be observed in the proximity of similar profiles at nearby SNPs. FIG. 6B shows “null genotype profiles.” Binary patterns of null genotypes across the 90 individuals in a population are more likely to be observed in the proximity of similar profiles at nearby SNPs. FIG. 6C shows clustering p-values for Mendel failure profiles that show a generally uniform distribution, with an excess of extremely low p-values from which candidate deletion variants were identified. FIG. 6D shows clustering p-values for null genotype profiles that show a generally uniform distribution, with an excess of extremely low p-values from which candidate variants were identified.



FIGS. 7A-7F show the linkage disequilibrium (r²) between gene deletion polymorphisms and nearby SNPs in three population samples. The predicted locations of the deletion breakpoints are shown by dotted lines. FIG. 7A shows the linkage disequilibrium of TRY6. FIG. 7B shows the linkage disequilibrium of LCE3C. FIG. 7C shows the linkage disequilibrium of UGT2B28. FIG. 7D shows the linkage disequilibrium of UGT2B17. FIG. 7E shows the linkage disequilibrium of GSTM1. FIG. 7F shows the linkage disequilibrium of GSTT1.




DETAILED DESCRIPTION

The genetic sequences of different people are remarkably similar. When the chromosomes of two humans are compared, their DNA sequences can be identical for hundreds of bases. But at about one in every 1,200 bases, on average, the sequences will differ. Differences in individual bases are by far the most common type of genetic variation. These genetic differences are known as single nucleotide polymorphisms, or SNPs. The International HapMap Project is focused on identifying the basis for a large fraction of the genetic diversity in the human species by identifying most of the approximately 10 million SNPs estimated to occur commonly in the human genome.


For geneticists, SNPs act as markers to locate genes in DNA sequences. However, testing all of the 10 million common SNPs in a person's chromosomes would be extremely expensive. The development of the HapMap is a global collaboration designed to enable geneticists to take advantage of how SNPs and other genetic variants are organized on chromosomes. Genetic variants that are near each other tend to be inherited together. For example, all of the people who have an adenine rather than a guanine at a particular location in a chromosome can have identical genetic variants at other SNPs in the chromosomal region surrounding the adenine. These regions of linked variants are known as haplotypes.


In many parts of the human chromosomes, just a handful of haplotypes are found. For example, in a given population, 55% of people may have one version of a haplotype, 30% may have another, 8% may have a third, and the rest may have a variety of less common haplotypes. The International HapMap Project is identifying these common haplotypes in four populations from different parts of the world.


One type of human genetic variation consists of deletion variants—segments of the human genome that are present in some individuals and absent in others. The locations of common deletion variants in the human genome are largely unknown, as is the best way to determine the association of such variants with disease. To address these questions, we developed an approach for using the HapMap to discover, localize, and analyze common deletion variants. We found hundreds of deletion variants, 1 kb-745 kb in size, including common deletion variants that were observed as homozygous deletions in a number of expressed genes that are specifically expressed in organs relevant to transplantation, such as liver, prostate, kidney, intestine, and skin. These common deletion variants prevent the expression of the protein, or antigen, encoded by these genes.


The present invention features methods for identifying immunocompatible subjects by determining the presence or absence of deletion variants, preferably a deletion variant in all copies of the gene, that substantially prevents expression of either the gene or the antigen encoded by a gene. The lack of expression of the gene or of an antigen encoded by the gene, respectively, is used to identify subjects or subject samples that are immunocompatible. Screening subjects for immunocompatibility is used, for example, to identify donor/recipient matches for transplantation, to identify maternal/fetal compatibility issues in prospective parents, and to identify bone marrow donors that are not immunocompatible with a recipient and can be used to provoke an immune response to the tumor cells in a recipient having a blood cancer. Therefore, the present invention provides methods for immunocompatibility typing which can be used alone or together with previously known typing techniques to manage, measure, prevent, and provoke histoincompatibility.


Identification of Common Deletion Variants


We used data from the International HapMap Project, including about 1.3 million SNP assays in 270 individuals of European, Yoruban, and Chinese and Japanese ancestry, to identify clusters of regionally aberrant genotype patterns (see Examples below). We validated the presence of polymorphic deletions by fluorescence in situ hybridization (FISH), fluorescence allelic-intensity measurements, and PCR. Altogether, more than 80 common deletions were validated by one or more of these approaches.
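To illustrate why a segregating deletion produces apparent Mendelian inconsistencies, the following sketch checks a single SNP in a father/mother/child trio. It is a simplified illustration and not the clustering analysis used to identify candidate deletions: genotype calls are assumed to be two-letter strings such as "AA" or "AB", a null genotype is represented as `None`, a hidden deleted copy is written as "-", and all function names are hypothetical.

```python
from itertools import product
from typing import List, Optional


def mendel_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child's two alleles can be drawn one from each parent."""
    return any(sorted(f + m) == sorted(child) for f, m in product(father, mother))


def possible_true_genotypes(call: Optional[str]) -> List[str]:
    """Genotypes that could underlie a call if a deleted copy ('-') is allowed."""
    if call is None:          # null genotype: both copies deleted
        return ["--"]
    if call[0] == call[1]:    # an apparent homozygote may be hemizygous
        return [call, call[0] + "-"]
    return [call]


def is_mendel_failure(father, mother, child) -> bool:
    """Apparent Mendelian inconsistency among three called genotypes."""
    if None in (father, mother, child):
        return False
    return not mendel_consistent(father, mother, child)


def explained_by_deletion(father, mother, child) -> bool:
    """True if some assignment of hidden deleted copies restores consistency."""
    return any(
        mendel_consistent(f, m, c)
        for f, m, c in product(
            possible_true_genotypes(father),
            possible_true_genotypes(mother),
            possible_true_genotypes(child),
        )
    )


# Mother is called "BB" but could be B/-; child is called "AA" but could be A/-.
print(is_mendel_failure("AA", "BB", "AA"))      # True
print(explained_by_deletion("AA", "BB", "AA"))  # True
# No deletion can supply a B allele that neither parent carries.
print(explained_by_deletion("AA", "AA", "BB"))  # False
```

A single explained failure is weak evidence on its own; it is the recurrence of such failures at physically neighboring SNPs in the same individuals that points to a deletion rather than a genotyping error.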


The deletion alleles were linked to the same SNP alleles in each population, suggesting that each deletion derived from an ancestral mutation that occurred before humans migrated from Africa to Europe and Asia. The observed levels of linkage disequilibrium indicate that these common deletion variants are highly conserved among individuals and that SNPs can be used to discover, analyze, and serve as markers for these variants.
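The linkage disequilibrium statistic r² between a deletion and a nearby SNP can be computed from phased haplotypes as in the sketch below, which uses the standard definition r² = D² / (p_del (1 - p_del) p_snp (1 - p_snp)) with D = p_both - p_del p_snp. The input format (one pair of true/false flags per chromosome) and the example haplotype counts are assumptions for illustration and are not HapMap data.

```python
from typing import List, Tuple


def r_squared(haplotypes: List[Tuple[bool, bool]]) -> float:
    """r^2 between a deletion and a SNP allele.

    Each entry is (carries_deletion, carries_snp_allele) for one chromosome.
    """
    n = len(haplotypes)
    p_del = sum(d for d, _ in haplotypes) / n
    p_snp = sum(s for _, s in haplotypes) / n
    p_both = sum(d and s for d, s in haplotypes) / n
    denom = p_del * (1 - p_del) * p_snp * (1 - p_snp)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    d = p_both - p_del * p_snp
    return d * d / denom


# Made-up example: the SNP allele is found on exactly the deletion chromosomes.
perfect_tag = [(True, True)] * 3 + [(False, False)] * 7
# Made-up example: one intact chromosome also carries the SNP allele.
imperfect_tag = [(True, True)] * 3 + [(False, True)] * 1 + [(False, False)] * 6

print(round(r_squared(perfect_tag), 2))    # 1.0
print(round(r_squared(imperfect_tag), 2))  # 0.64
```

An r² near 1 means that genotyping the SNP is nearly equivalent to genotyping the deletion, which is the sense in which a SNP can serve as a marker for the variant.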


Thirteen protein-coding genes were disrupted or entirely removed by common deletions (Table 1). These common deletion variants were found in multiple genes with roles in drug response, olfaction, and sex steroid metabolism. To learn more about these common gene deletion variants, we developed quantitative PCR assays for distinguishing individuals with 0, 1, and 2 gene copies (FIG. 3A), and used these assays to type seven gene deletion variants in all the HapMap individuals. The resulting genotypes showed Mendelian inheritance, Hardy-Weinberg equilibrium, and stable transmission rates, suggesting that they behave as stable, segregating, germline genetic variants. The deletion variants were observed in individuals of European, Yoruban, and Chinese and Japanese ancestry, though the frequency of each deletion haplotype varied from population to population (Table 1).

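TABLE 1. Common Deletion Variants. Frequency columns give the population frequency of the deletion variant; LD columns give the linkage disequilibrium (tagging SNP R²).

| Gene | Function | Expression | Exons deleted | Frequency: European | Frequency: Chinese/Japanese | Frequency: Yoruba | LD: European | LD: Chinese/Japanese | LD: Yoruba |
|---|---|---|---|---|---|---|---|---|---|
| UGT2B17 | Sex steroid hormone metabolism | liver, prostate | all | 30% | 84% | 22% | 1.00 | 0.96 | 0.63 |
| UGT2B28 | Sex steroid hormone metabolism | liver, kidney | all | 13% | 15% | 35% | 1.00 | 1.00 | 0.90 |
| TRY6 | Proteolysis | not known | all | 41% | 74% | 12% | 1.00 | 1.00 | 1.00 |
| LCE3C | Epidermal cornified envelope | internal epithelia | all | 56% | 69% | 30% | 0.93 | 1.00 | 0.92 |
| GSTM1 | Detoxification, drug metabolism | liver | all | 76% | 70% | 48% | 0.66 | 0.95 | 0.87 |
| GSTT1 | Detoxification, drug metabolism | glands, kidney | all | 38% | 63% | 62% | 0.85 | 0.61 | 0.38 |
| CYP2A6 | Detoxification, drug metabolism | liver | all | 25% | 55% | 18% | 0.22 | 0.47 | 0.09 |
| PRB1 | Secreted salivary proteoglycan | salivary gland, trachea | #1-2 of 4 | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. |
| OR51A2 | Olfactory receptor | olfactory epithelium | all | 50% | 19% | 28% | 0.40 | 0.51 | 0.20 |
| OR4F5 | Olfactory receptor | olfactory epithelium | all | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. |
| GNB1L | Guanine nucleotide binding | heart | all | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. |
| MCEE | Methylmalonyl CoA epimerase | various | #3 of 3 | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. |
| MGAM | Maltase glucoamylase | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. | N.D. |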
N.D. = no data
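
The Hardy-Weinberg check mentioned above for the quantitative PCR genotypes can be illustrated with a short calculation. The following is a minimal sketch, not the analysis actually used; the function name and the example counts are hypothetical. It estimates the deletion allele frequency from the numbers of individuals with 2, 1, and 0 gene copies and computes a chi-square statistic (one degree of freedom) against the Hardy-Weinberg expectation. A statistic below about 3.84 is consistent with equilibrium at the 5% level.

```python
def hardy_weinberg_chi2(n_two_copies, n_one_copy, n_zero_copies):
    """Return (deletion allele frequency, chi-square statistic).

    Inputs are counts of individuals carrying 2, 1, and 0 copies of the gene,
    i.e. 0, 1, and 2 copies of the deletion allele.
    """
    n = n_two_copies + n_one_copy + n_zero_copies
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Each 0-copy individual carries two deletion alleles; each 1-copy
    # individual carries one.
    q = (2 * n_zero_copies + n_one_copy) / (2 * n)
    p = 1.0 - q
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_two_copies, n_one_copy, n_zero_copies)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return q, chi2


# Hypothetical counts, for illustration only.
q, chi2 = hardy_weinberg_chi2(49, 42, 9)
print(f"deletion allele frequency = {q:.2f}, chi-square = {chi2:.3f}")
```
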


These common deletion variants affect genes expressed in several tissues, including liver, prostate, kidney, heart, and skin, all of which are important to immunocompatibility and transplantation. The expression products from these genes, also known as antigens, are absent in an individual having the common deletion variant. As a result, the immune system of that individual would not be exposed to the antigen, and, if exposed, would recognize the antigen as foreign and respond by mounting an immune response to the antigen. The conservation of these common deletion variants, particularly among people of a shared ancestry, and the ability of the encoded antigens to elicit an immune response indicate that these common deletion variants and the encoded antigens are an effective tool for screening individuals for immunocompatibility, particularly with respect to the organs or tissues in which the antigen is expressed.


Methods for the use of common deletion variants, for example, those identified in Table 1, to screen for and manage immunocompatibility are described in detail below. It should be understood by the skilled artisan that any deletion variant, particularly a common deletion variant, that affects the expression of any antigen can be used in the methods described herein to screen for and manage immunocompatibility or to provoke histoincompatibility, if desired. In addition, it should be understood that any methods for sequencing all or part of a subject's genome, or determining the deletion variant pattern or deletion variant antigen pattern for a subject can be used to identify additional deletion variants in expressed antigens and to screen for and manage immunocompatibility.


Methods for Deletion Variant Typing


Individual subjects can be typed for the presence or absence of common deletion variants in a biological sample using methods for detection of the deletion in the gene or methods for detection of the antigen encoded by the gene. The biological sample used to detect the gene or protein can be any biological material from the subject (e.g., the graft recipient, potential donor, mother, fetus, father, or prospective parent) that contains the antigen or nucleic acids encoding the antigen. For detection of deletion variant antigens, the biological sample is preferably a sample in which the antigen is normally expressed. Desirably the biological material is a bodily fluid, such as blood, serum, plasma, amniotic fluid, cerebrospinal fluid, saliva, urine, and semen, or a cell or tissue in which the antigen or nucleic acid encoding the antigen is expressed. In the case of an organ transplant, the biological sample is desirably a biopsy of the organ to be transplanted and the antigen or nucleic acid encoding the antigen is expressed in the organ. In the case of a bone marrow or peripheral blood transplant, the biological sample is preferably blood, serum, or plasma in which the antigen or nucleic acid encoding the antigen is expressed.


Methods of detecting deletion variants in a nucleic acid are well known to those skilled in the art. In one example, polymerase chain reaction (PCR) can be used to detect a deletion variant in a nucleic acid. Oligonucleotide PCR primers that flank a known deletion polymorphism can be used to amplify genomic DNA spanning the deletion breakpoints in individuals carrying the deletion allele; alternatively, oligonucleotide primers inside the deleted sequence can be used to amplify genomic DNA selectively in individuals carrying the other (non-deletion) allele. The amplified genomic DNA can then be sequenced, analyzed by fluorescence quantitation, resolved on a gel, or otherwise analyzed, and the presence or absence of a deletion variant can be determined. These PCR-based methods can be combined to identify individuals carrying 0, 1, or 2 copies of the deletion allele. Furthermore, quantitative PCR can be used to compare the abundance of a polymorphically deleted locus to the abundance of a control locus, and thereby infer the copy number and, in turn, the deletion status of an individual. Methods for PCR amplifying and sequencing a nucleic acid molecule are well known to those skilled in the art (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, N.Y. (1989); Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999); Dieffenbach and Dveksler, PCR Primer: A Laboratory Manual, Cold Spring Harbor Press (1995)). The following are examples of PCR primers and quantitative fluorescent probes which we have used to successfully genotype deletion polymorphisms in DNA samples from individuals:

| Locus | Reagent | Sequence or assay | SEQ ID NO |
|---|---|---|---|
| PMP22 (control) | primer1 | CCCTTCTCAGCGGTGTCATC | 1 |
| PMP22 (control) | primer2 | ACAGACCGTCTGGGCGC | 2 |
| PMP22 (control) | probe | VIC-TTCGCGTTTCCGCAAGAT-MGBNFQ | 3 |
| GSTM1 | primer1 | CTGTGTCCACCTGCATTCG | 4 |
| GSTM1 | primer2 | GAGACCGGGCACTCACTGT | 5 |
| GSTM1 | probe | 6FAM-TCAGTCCTGCCATGAGCAGGC-BHQ1 | 6 |
| GSTT1 | primer1 | GGGATGGAAAGTCACGTCCT | 7 |
| GSTT1 | primer2 | AGAGACTGGGACAGCGTCAA | 8 |
| GSTT1 | probe | 6FAM-CAGAATCTCAGCAGCTGGGCCA-BHQ1 | 9 |
| CYP2A6 | primer1 | AGGATGGGGACTTTTCCTTT | 10 |
| CYP2A6 | primer2 | TCCTCATCTTCAGCTGTTGG | 11 |
| CYP2A6 | probe | 6FAM-CATTCAGGATTCTGGGCTTGCTCC-BHQ1 | 12 |
| OR51A2 | primer1 | TGCCAATTGCCTACTGTTTG | 13 |
| OR51A2 | primer2 | AGCAACAGTGGAAGGAGAGAA | 14 |
| OR51A2 | probe | 6FAM-GACAACATAACCAAGTGGGGCTTATTTTC-BHQ1 | 15 |
| PRB1 | primer1 | TGAAGGGACCTCAGTAGTTGG | 16 |
| PRB1 | primer2 | TGACAGGCATGGTTCTTCTG | 17 |
| PRB1 | probe | 6FAM-CTGACTTTCTAGCAAGG-MGBNFQ | 18 |
| UGT2B17 | Applied Biosystems assay | Hs00854486_sH | |
| UGT2B28 | Applied Biosystems assay | Hs00852540_s1 | |
| LCE3C | Applied Biosystems assay | Hs00708773_s1 | |

Note: VIC is the fluorescent label commonly known as “VIC” (available, for example, from Applied Biosystems) and MGBNFQ is a non-fluorescent quencher molecule (available, for example, from Applied Biosystems).

Note: 6FAM is the fluorescent label commonly known as “6FAM” (available, for example, from IDT) and BHQ-1 is a non-fluorescent quencher molecule (available, for example, from IDT).
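
The quantitative PCR comparison between a polymorphically deleted locus and a control locus (PMP22 in the assays listed above) can be illustrated with the widely used comparative Ct calculation. The following is a minimal sketch under stated assumptions, not the analysis actually performed: it assumes roughly 100% amplification efficiency for both assays, a calibrator sample known to carry two copies, and a target Ct reported as None when no amplification is observed. The function names and Ct values are hypothetical.

```python
def estimate_copy_number(ct_target, ct_control, calibrator_delta_ct,
                         calibrator_copies=2):
    """Estimate target copy number by the comparative Ct method.

    ct_target: Ct of the polymorphically deleted locus, or None if the
        target did not amplify (taken here as zero copies).
    ct_control: Ct of the control locus in the same sample.
    calibrator_delta_ct: (Ct target - Ct control) measured in a sample
        assumed to carry `calibrator_copies` copies of the target.
    """
    if ct_target is None:
        return 0.0
    delta_ct = ct_target - ct_control
    delta_delta_ct = delta_ct - calibrator_delta_ct
    # One extra cycle corresponds to half as much starting template.
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


def call_copy_number(estimate):
    """Round a continuous estimate to the nearest of 0, 1, or 2 copies."""
    return min((0, 1, 2), key=lambda copies: abs(copies - estimate))


# Hypothetical Ct values, for illustration only.
for ct_target in (26.0, 27.0, None):
    estimate = estimate_copy_number(ct_target, ct_control=25.0,
                                    calibrator_delta_ct=1.0)
    print(ct_target, estimate, call_copy_number(estimate))
```

In practice replicate wells and an efficiency correction would normally be used before rounding to an integer copy number.
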







Sequence analysis, which is any manual or automated process by which the order of nucleotides in a nucleic acid is determined, also can be useful for determining the presence or absence of a common deletion variant. It is understood that the term sequence analysis encompasses chemical (Maxam-Gilbert) and dideoxy enzymatic (Sanger) sequencing as well as variations thereof. Thus, the term sequence analysis includes capillary array DNA sequencing, which relies on capillary electrophoresis and laser-induced fluorescence detection and can be performed using, for example, the MegaBACE 1000 or ABI 3700. Also encompassed by the term sequence analysis are thermal cycle sequencing (Sears et al., Biotechniques 13:626-633 (1992)); solid-phase sequencing (Zimmerman et al., Methods Mol. Cell Biol. 3:39-42 (1992)); and sequencing with mass spectrometry, such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) (Fu et al., Nature Biotech. 16: 381-384 (1998)). Sequence analysis can be used to determine the sequence of a particular genetic locus known to have a common deletion variant, an entire gene known to contain a common deletion variant, a chromosome, or the entire genome of a subject. The term sequence analysis also includes, for example, sequencing by hybridization (SBH), which relies on an array of all possible short oligonucleotides to identify a segment of sequences present in an unknown DNA (Chee et al., Science 274:610-614 (1996); Drmanac et al., Science 260:1649-1652 (1993); Drmanac et al., Nature Biotech. 16:54-58 (1998), Margulies et al., Nature 437:376-380 (2005) and Bentley, Curr. Opin. Genet. Dev. 16:545-552 (2006)). The whole genome approach to typing individual subjects for the presence or absence of common deletion variants is described in detail below.


Other methods for detecting the presence or absence of a deletion variant include electrophoretic analysis and restriction fragment length polymorphism (RFLP) analysis. Electrophoretic analysis, as used herein in reference to one or more nucleic acid molecules such as amplified fragments, means a process whereby charged molecules are moved through a stationary medium under the influence of an electric field. Electrophoretic migration separates nucleic acid molecules primarily on the basis of their charge, which is in proportion to their size. The term electrophoretic analysis includes analysis using both slab gel electrophoresis, such as agarose or polyacrylamide gel electrophoresis, and capillary electrophoresis. Capillary electrophoretic analysis is generally performed inside a small-diameter (50-100-μm) quartz capillary in the presence of high (kilovolt-level) separating voltages with separation times of a few minutes. Using capillary electrophoretic analysis, nucleic acids are conveniently detected by UV absorption or fluorescent labeling, and single-base resolution can be obtained on fragments up to several hundred base pairs. Such methods of electrophoretic analysis, and variants thereof, are well known in the art, as described, for example, in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. New York (1999).


Restriction fragment length polymorphism (RFLP) analysis also can be useful for determining the presence or absence of a deletion variant (Jarcho et al., in Current Protocols in Human Genetics, Dracopoli et al., eds., pages 2.7.1-2.7.5, John Wiley & Sons, New York (1994); Innis et al., (Ed.), PCR Protocols, San Diego: Academic Press, Inc. (1990)). As used herein, restriction fragment length polymorphism analysis means any method for distinguishing genetic polymorphisms using a restriction enzyme, which is an endonuclease that catalyzes the degradation of nucleic acid and recognizes a specific base sequence, generally a palindrome or inverted repeat. One skilled in the art understands that the use of RFLP analysis depends upon an enzyme that can differentiate two alleles at a polymorphic site. For example, if the restriction enzyme recognizes a specific base sequence that is present in the nucleic acid sequence containing the deletion variant, then a subject having the deletion variant would not have cleavage at that restriction enzyme site and would therefore produce a different enzymatic cleavage pattern than a subject lacking the deletion variant and having the restriction enzyme site.


Other methods for detecting the presence or absence of a deletion variant at a polymorphic site include allele-specific oligonucleotide (ASO) hybridization. Allele-specific oligonucleotide hybridization is based on the use of a labeled oligonucleotide probe having a sequence perfectly complementary, for example, to a known or predicted deletion variant site. A heteroduplex mobility assay (HMA) is another well-known assay that can be used to detect a common deletion variant according to a method of the invention. HMA is useful for detecting the presence of a polymorphic sequence since a DNA duplex carrying a mismatch has reduced mobility in a polyacrylamide gel compared to the mobility of a perfectly base-paired duplex (Delwart et al., Science 262:1257-1261 (1993); White et al., Genomics 12:301-306 (1992)).


The technique of single strand conformational polymorphism (SSCP) can also be used to detect the presence or absence of a deletion variant (see Hayashi, PCR Methods Applic. 1:34-38 (1991)). This technique can be used to detect deletions based on differences in the secondary structure of single-strand DNA that produce an altered electrophoretic mobility upon non-denaturing gel electrophoresis. Polymorphic fragments are detected by comparison of the electrophoretic pattern of the test fragment to corresponding standard fragments containing known alleles.


SNP genotyping can also be used to detect the presence or absence of a deletion variant. We have observed that common deletion polymorphisms are generally in linkage disequilibrium with nearby SNPs, which suggests that specific SNP genotyping assays could be used to indirectly detect a deletion polymorphism. In this technique, a SNP that is known to be in linkage disequilibrium with a deletion polymorphism, such that individuals carrying the deletion almost always carry a particular variant of the SNP, is used as a marker for the presence of the deletion. Individuals can be typed for the SNP as a way of indirectly typing for the deletion. Techniques for deriving SNP genotypes include hybridization to allele-specific complementary sequences on microarrays or beads, as well as allele-specific primer extension.
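
How well a nearby SNP tags a deletion is commonly summarized by the squared correlation r² between the two sites, the quantity reported for the tagging SNPs in Table 1. The following is a minimal sketch, assuming phased haplotypes are available and coding each haplotype as a pair (deletion allele present, SNP tagging allele present) with values 0 or 1. The function name and the example haplotypes are hypothetical.

```python
def r_squared(haplotypes):
    """Squared correlation between a deletion and a SNP allele.

    haplotypes: iterable of (deletion, snp) pairs, each value 0 or 1,
    one pair per phased chromosome.
    """
    haplotypes = list(haplotypes)
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_del = sum(d for d, _ in haplotypes) / n
    p_snp = sum(s for _, s in haplotypes) / n
    p_both = sum(1 for d, s in haplotypes if d == 1 and s == 1) / n
    # D is the excess of the deletion/SNP-allele haplotype over independence.
    d_stat = p_both - p_del * p_snp
    denominator = p_del * (1 - p_del) * p_snp * (1 - p_snp)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d_stat ** 2 / denominator


# Hypothetical sample in which the SNP allele always accompanies the deletion.
perfect_tag = [(1, 1)] * 3 + [(0, 0)] * 7
print(r_squared(perfect_tag))
```

An r² near 1 means that typing the SNP is nearly equivalent to typing the deletion; lower values, such as several of those in Table 1, mean the SNP is a less reliable proxy in that population.
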


We have further observed that genotyping of a SNP that is inside a deleted region can also be used to infer the presence of a deletion that removes the site of the SNP. In particular, the presence of the deletion causes particular SNP genotyping results, including null genotypes, apparent Mendelian inconsistencies, and reductions in intensity measurements. Techniques for deriving SNP genotypes include hybridization to allele-specific complementary sequences on microarrays or beads, as well as allele-specific primer extension.
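
The apparent Mendelian inconsistencies described above arise because an individual carrying one deletion allele appears homozygous for the single SNP allele that remains. The following is a minimal sketch of how a parent-child trio could be screened for this signature; it is an illustration under stated assumptions rather than the method used in the Examples. Genotypes are coded as 'AA', 'AB', or 'BB', a null genotype is None, and '-' denotes a hypothetical deletion allele. A null genotype can also reflect ordinary assay failure, so a positive result is only suggestive.

```python
from itertools import product


def possible_true_genotypes(apparent, allow_null):
    """List true allele pairs compatible with an apparent SNP genotype."""
    if apparent is None:
        # A null genotype is explained here only by a homozygous deletion.
        return [("-", "-")] if allow_null else []
    first, second = apparent
    options = [(first, second)]
    if allow_null and first == second:
        # A hemizygous individual looks homozygous for the remaining allele.
        options.append((first, "-"))
    return options


def trio_consistent(father, mother, child, allow_null=False):
    """True if some assignment of true genotypes obeys Mendelian inheritance."""
    for f, m, c in product(possible_true_genotypes(father, allow_null),
                           possible_true_genotypes(mother, allow_null),
                           possible_true_genotypes(child, allow_null)):
        for from_father, from_mother in product(f, m):
            if sorted((from_father, from_mother)) == sorted(c):
                return True
    return False


def suggests_deletion(father, mother, child):
    """Inconsistent as genotyped, but consistent once a deletion allele is allowed."""
    return (not trio_consistent(father, mother, child)
            and trio_consistent(father, mother, child, allow_null=True))


print(suggests_deletion("AA", "BB", "AA"))
print(suggests_deletion("AA", "BB", "AB"))
```
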


Denaturing gradient gel electrophoresis (DGGE) also can be used to detect a deletion variant. In DGGE, double-stranded DNA is electrophoresed in a gel containing an increasing concentration of denaturant; double-stranded fragments made up of mismatched alleles have segments that melt more rapidly, causing such fragments to migrate differently as compared to perfectly complementary sequences (Sheffield et al., “Identifying DNA Polymorphisms by Denaturing Gradient Gel Electrophoresis” in Innis et al., supra, 1990).


In addition to using DGGE as described above, other methods to detect heteroduplexes include temperature gradient gel electrophoresis (TGGE), constant denaturant gel electrophoresis (CDGE), and base excision sequence scanning (BESS) (Gupta, The Scientist 13:25-28 (1999)). Other methods include oligonucleotide ligation assay (OLA) in which a PCR-amplified target is hybridized to two oligonucleotides, one tagged, for example, with biotin, and the other with a reporter molecule and then ligated with DNA ligase. If the tag and reporter oligonucleotides are ligated, the tagged molecule can be used to isolate the ligated oligonucleotide and the reporter molecule can be detected.


Other well-known approaches for determining the presence or absence of a deletion variant include automated sequencing and RNase mismatch techniques (Winter et al., Proc. Natl. Acad. Sci. 82:7575-7579 (1985)). In view of the above, one skilled in the art realizes that the methods of the invention for determining the presence or absence of a deletion variant in an individual can be practiced using any one of the well known assays described above, or another art-recognized assay for genotyping. Furthermore, one skilled in the art understands that individual alleles can be detected by any combination of molecular methods (see, in general, Birren et al. (Eds.) Genome Analysis: A Laboratory Manual Volume 1 (Analyzing DNA) New York, Cold Spring Harbor Laboratory Press (1997)).


Additional methods for determining the presence of deletion variants include fluorescence in situ hybridization (FISH) and fluorescence allelic-intensity measurements, examples of which are described in the Examples below. FISH is used to visualize the presence or absence of a DNA sequence on chromosomes, via hybridization of a fluorescent probe to the chromosome in situ.


In addition to the above methods for detecting the presence of a known human deletion polymorphism, additional methods, known to those versed in the art, can be used to scan the genome of one individual for deletions of DNA sequences which are present in other individuals. One such method is microarray hybridization, in which DNA from a subject is probed with a microarray of nucleic acids containing human genomic sequences, and the user identifies microarray probes which are not bound by that individual's genomic DNA. Another such method is whole-genome sequencing, in which the DNA from an individual is systematically sequenced. In this application, the practitioner could look for nucleic acid sequences which appear to be absent from that individual's sequence but which are known to be present in other individuals. Another such method is subtractive hybridization, in which two DNA samples are compared by molecular techniques which allow DNA sequences that are present in the first sample to be selectively removed from the second sample, leaving only those DNA sequences that are present in the second sample and not in the first sample. Such an approach could be used to identify genomic loci that were deleted in the individual from whom the first sample was obtained but present in the second individual from which the second sample was obtained.


Methods for detecting the presence or absence of a deletion variant antigen are also well known in the art and include, for example, immunoassays to detect the presence of an antigen in the biological sample of the subject. Polyclonal or monoclonal antibodies specific for each antigen can be used in any standard immunoassay format (e.g., ELISA, sandwich ELISA, Western blot, or RIA; see, e.g., Ausubel et al., supra) to determine the presence of the antigen. Standard methods for enzyme immunoassays can also be used to detect antigens that are present on enzymes, such as GSTM1, GSTT1, UGT2B17, UGT2B28, and CYP2A6. ELISA assays are the preferred method for measuring levels of any one or more of the following antigens: UGT2B17, UGT2B28, TRY6, LCE3C, GSTM1, GSTT1, CYP2A6, PRB1, OR51A2, OR4F5, GNB1L, MGAM, and MCEE. Particularly preferred, for ease and simplicity of detection, and its quantitative nature, is the sandwich or double antibody ELISA of which a number of variations exist, all of which are contemplated by the present invention. For example, in a typical sandwich ELISA, unlabeled antibody that recognizes the antigen is immobilized on a solid phase, e.g., a microtiter plate, and the sample to be tested is added. After a certain period of incubation to allow formation of an antibody-antigen complex, a second antibody, labeled with a reporter molecule capable of inducing a detectable signal, is added and incubation is continued to allow sufficient time for binding with the antigen at a different site, resulting in the formation of a complex of antibody-antigen-labeled antibody. The presence of the antigen is determined by observation of a signal, which may be quantitated by comparison with control samples containing known amounts of antigen.


Immunohistochemical techniques can also be utilized for detection of any of the antigens in a tissue biopsy sample. For example, a tissue sample can be obtained from a subject, sectioned, and stained for the presence of the antigen using an antibody that specifically binds the antigen and any standard detection system (e.g., one that includes a secondary antibody conjugated to an enzyme, such as horseradish peroxidase). General guidance regarding such techniques can be found in, e.g., Bancroft et al., Theory and Practice of Histological Techniques, Churchill Livingstone, 1982, and Ausubel et al., supra.


The methods described herein can be used to detect one or more deletion variants, preferably common deletion variants, in a single gene or in more than one gene. For example, an individual can be typed for the presence of one, two, three, four, five, six or more common deletion variants in nucleic acids encoding one, two, three, four, five, six or more different antigens (e.g., UGT2B17, UGT2B28, TRY6, LCE3C, GSTM1, GSTT1, CYP2A6, PRB1, OR51A2, OR4F5, GNB1L, MGAM, and MCEE). The methods described herein can be used to detect one or more deletion variant antigens. For example, an individual can be typed for the presence or absence of one, two, three, four, five, six or more deletion variant antigens. While it is preferred that two subjects are a perfect match for each and every deletion variant or deletion variant antigen tested, individuals can be ranked for immunocompatibility depending on the number of matches and the relative importance of the antigen. For example, an individual in need of a liver transplant would seek a donor having a common deletion variant type match at the UGT2B17, UGT2B28, and GSTM1 loci, all of which are expressed in the liver, but may not be matched for common deletion variants at the OR51A2 locus, which is expressed in the olfactory epithelium. Two subjects can also be typed for deletion variant patterns or deletion variant antigen patterns in which one or more genes, genomic loci, chromosome, or entire genome is assayed using the methods described herein to determine the presence or absence of deletion variants throughout the one or more genes, genomic loci, chromosome, or entire genome assayed. The information is then compiled into a deletion variant pattern for each subject and can be compared either for overall substantially identical patterns or for substantial identity within a defined set of genes or antigens, e.g., those expressed in an organ or tissue being transplanted.
For example, a subject in need of a liver transplant may show deletion variants in 3 genes expressed in the kidney and 1 gene expressed in the liver, while a potential donor has a deletion variant in 1 of the same genes expressed in the kidney and the same 1 gene expressed in the liver. The potential liver donor is identified as immunocompatible because of the 100% identity of the deletion variant pattern in the relevant tissue (i.e., the liver).
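
The tissue-restricted pattern comparison in the example above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: each subject is represented by the set of genes found to carry a homozygous deletion, the expression annotations are taken from Table 1, and the function name and example subjects are hypothetical.

```python
# Expression annotations taken from Table 1 (subset shown).
GENE_TISSUES = {
    "UGT2B17": {"liver", "prostate"},
    "UGT2B28": {"liver", "kidney"},
    "GSTM1": {"liver"},
    "GSTT1": {"glands", "kidney"},
    "CYP2A6": {"liver"},
}


def tissue_pattern_identity(recipient_deleted, donor_deleted, tissue,
                            gene_tissues=GENE_TISSUES):
    """Fraction of genes expressed in `tissue` with the same deletion status.

    recipient_deleted, donor_deleted: sets of gene names carrying a
    homozygous deletion in each subject.
    """
    relevant = [gene for gene, tissues in gene_tissues.items()
                if tissue in tissues]
    if not relevant:
        raise ValueError(f"no typed genes are annotated for {tissue!r}")
    matches = sum(1 for gene in relevant
                  if (gene in recipient_deleted) == (gene in donor_deleted))
    return matches / len(relevant)


# Hypothetical subjects: they differ only at GSTT1, which is not annotated
# as expressed in liver.
recipient = {"GSTM1", "GSTT1"}
donor = {"GSTM1"}
print(tissue_pattern_identity(recipient, donor, "liver"))
print(tissue_pattern_identity(recipient, donor, "kidney"))
```
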


Methods for Whole Genome Sequence Analysis to Determine Immunocompatibility


As described above, sequence analysis, including any manual or automated process, can be used for determining the presence or absence of a common deletion variant. Such sequence analysis can also be used to analyze the genome, or a subset thereof, of an individual subject and to compare that subject's genome sequence, or subset thereof, to the genome sequence or the same subset thereof, in a second individual or a cell, tissue, or organ from the second individual. This type of whole genome, or subset thereof, sequence analysis can be used to search for or identify a deletion variant that is present in one individual and absent in a second individual. The deletion variant can be, for example, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 bp, or 2 kb, 3 kb, 4 kb, 5 kb, 7 kb, 8 kb, 9 kb, 10 kb, 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 745 kb, 800 kb, 900 kb, or 1000 kb in length. A locus at which a deletion variant is present in one individual and absent in a second individual is called a deletion mismatch locus.


The identification of a deletion mismatch locus between two individuals is predictive of histoincompatibility if:

(i) there is a homozygous deletion variant at a locus in the genome of a first subject (e.g., a candidate bone marrow donor) but not at that locus in the genome of the second subject (e.g., a candidate bone marrow recipient),

(ii) there is a homozygous deletion at a locus in the genome of a first subject (e.g., a candidate organ recipient) but not at that locus in the genome of the second subject (e.g., a candidate organ donor), or

(iii) there is a homozygous deletion at a locus in the genome of a first subject (e.g., a candidate mother) but not at that locus in the genome of the second subject (e.g., the candidate father, embryo, fetus, or miscarriage).
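
The three conditions above differ only in which subject must carry the homozygous deletion. A minimal sketch of that rule follows; the context labels, party names, and function name are hypothetical conveniences, and copy numbers are the number of intact gene copies (0, 1, or 2) at the locus.

```python
# For each setting, the subject whose immune system would respond must be
# the one homozygous for the deletion; the other subject carries the gene.
RESPONDING_PARTY = {
    "bone_marrow": ("donor", "recipient"),        # condition (i)
    "organ": ("recipient", "donor"),              # condition (ii)
    "pregnancy": ("mother", "partner_or_fetus"),  # condition (iii)
}


def is_predictive_mismatch(context, copies):
    """Return True if the locus meets condition (i), (ii), or (iii).

    copies: dict mapping party name to the number of intact copies (0-2).
    """
    deleted_party, other_party = RESPONDING_PARTY[context]
    return copies[deleted_party] == 0 and copies[other_party] > 0


print(is_predictive_mismatch("bone_marrow", {"donor": 0, "recipient": 2}))
print(is_predictive_mismatch("bone_marrow", {"donor": 2, "recipient": 0}))
```

Note that the rule is directional: in the bone marrow setting a deletion in the recipient alone does not meet condition (i).
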


Alternatively or additionally, the sequence of the genome or subset thereof of the first subject can be compared to a reference genome DNA sequence, where the reference genome sequence can be the DNA sequence from a third subject or from a composite of multiple subjects. The identification of a deletion mismatch locus between the first subject and the reference genome DNA sequence is then carried out as described above and used to predict histoincompatibility as described above.


The whole-genome analysis can be performed using any sequencing technique known in the art or described herein. In one example, a whole genome sequencing approach can be used where millions of genome-wide sequence reads are obtained from the patient's DNA. Technologies available for massively parallel sequencing include sequencing by synthesis in arrays such as on fiber optic slides and single-molecule sequencing via nanopores (Margulies et al., Nature 437:376-380 (2005) and Bentley, Curr. Opin. Genet. Dev. 16:545-552 (2006)). Homozygous deletions are identified as loci which are not covered by any sequence reads, despite overall sequencing having been performed at a sufficient depth to have covered all genomic loci present in that individual.
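
The identification of loci not covered by any sequence read can be sketched as a sweep over aligned read intervals. The following is a minimal illustration under stated assumptions: reads are already aligned to a reference and given as half-open (start, end) coordinates on one chromosome, and the 100 bp minimum length simply mirrors the smallest deletion size listed above. The function name and coordinates are hypothetical, and a real analysis would also need to model whether the sequencing depth makes an uncovered interval unlikely by chance.

```python
def uncovered_intervals(read_intervals, region_start, region_end,
                        min_length=100):
    """Return intervals within the region that no aligned read overlaps.

    read_intervals: iterable of half-open (start, end) alignment coordinates.
    """
    gaps = []
    cursor = region_start  # first position not yet known to be covered
    for start, end in sorted(read_intervals):
        if end <= cursor:
            continue  # read lies entirely within already-covered sequence
        if start >= region_end:
            break  # remaining reads start beyond the region
        if start - cursor >= min_length:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < region_end and region_end - cursor >= min_length:
        gaps.append((cursor, region_end))
    return gaps


# Hypothetical alignments, for illustration only.
reads = [(0, 500), (400, 900), (2000, 2600)]
print(uncovered_intervals(reads, 0, 3000))
```
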


Another technique useful for the whole genome sequence analysis is genomic hybridization. For this method, patient DNA is labeled with a suitable marker (typically a fluorescent molecule) with or without amplification, and hybridized to an array consisting of DNA probes. These probes can consist of oligonucleotides, plasmids, fosmids, or other genomic clones. Deletions are identified from probes for which the patient's DNA fails to yield the appreciable hybridization signal that is normally observed in DNA from other individuals or fails to yield hybridization signal beyond what would be expected from cross-hybridization to other genomic sequences.
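
The two criteria just described, signal well below that normally observed in other individuals and signal no greater than expected from cross-hybridization, can be sketched as a simple per-probe rule. This is an illustration only: the background level, the log2 ratio threshold of -2, the probe names, and the intensities are all hypothetical and would need to be calibrated for a given array platform.

```python
import math


def call_deleted_probes(sample_intensity, reference_intensity, background,
                        log2_threshold=-2.0):
    """Return probes whose signal in the sample suggests a homozygous deletion.

    sample_intensity, reference_intensity: dicts mapping probe name to signal.
    background: signal level attributed to cross-hybridization.
    """
    deleted = []
    for probe, reference in reference_intensity.items():
        if reference <= background:
            continue  # probe is uninformative even in reference DNA
        sample = sample_intensity.get(probe, 0.0)
        if sample <= background:
            deleted.append(probe)  # no signal beyond cross-hybridization
        elif math.log2(sample / reference) <= log2_threshold:
            deleted.append(probe)  # far below the normally observed signal
    return sorted(deleted)


# Hypothetical intensities, for illustration only.
reference = {"p1": 1000.0, "p2": 1000.0, "p3": 50.0, "p4": 1000.0}
sample = {"p1": 950.0, "p2": 60.0, "p3": 40.0, "p4": 200.0}
print(call_deleted_probes(sample, reference, background=100.0))
```
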


Additional techniques for whole genome sequence analysis are described in Bentley, supra, (herein incorporated by reference in its entirety) and include microelectrophoresis and single molecule sequencing.


Immunocompatibility between two subjects can be determined by the identification of deletion mismatch loci, where two subjects would be considered not immunocompatible if there is at least one, two, three, four, five, six, seven, eight, nine, ten or more homozygous deletion mismatch loci identified between the two subjects; or when a scoring system, which combines information across multiple deletion mismatch loci, is determined to have an appropriately high mismatch score. Preferably, the one or more deletion mismatch loci would remove the protein-coding sequences and prevent expression of the encoded antigen in the individual homozygous for the deletion. Alternatively or additionally, a scoring system can be used to determine the relevance of each deletion mismatch locus identified between the two subjects. The scoring system would score each of the homozygous deletion mismatch loci for its potential contribution to antigenicity, and produce a composite score which combines information across all deletion mismatch loci, and potentially combines this with additional information relevant to histocompatibility, such as the subjects' sex and the subjects' HLA types. For example, a scoring system could assign points for deletions which remove protein-coding sequences for which the encoded proteins are generally expressed in tissues relevant to the immune response considered in the clinical application. For example, for kidney transplant, deletion variants in genes encoding proteins which are expressed in the kidney are assigned points. Additional points are awarded if those deletions affect protein-coding sequences which (i) encode peptide sequences known or predicted to be presented by that individual's HLA alleles or (ii) contain sequences which are particularly accessible to antibodies, such as sequences encoding extracellular domains of proteins.


In the scoring system described above, donor-recipient pairs with a high “deletion mismatch score” are interpreted to be more likely to have histoincompatibilities; such a diagnosis might recommend the use of a different donor, or the application of a tolerization regimen, or the further investigation of any particular deletion mismatches identified by this analysis. This further investigation could include testing the relevant donor or patient for pre-existing antibodies or pre-existing T-cell responses to the antigen encoded by the genomic region(s) identified as the deletion loci.
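
A point-based version of the scoring system described above can be sketched as follows. This is a minimal illustration, not a validated scoring scheme: the one-point weights, the class and function names, and the example annotations (other than the tissue assignments from Table 1) are hypothetical, and additional information such as sex or HLA type is not included.

```python
from dataclasses import dataclass


@dataclass
class DeletionMismatch:
    """One homozygous deletion mismatch locus between two subjects."""
    gene: str
    removes_coding_sequence: bool
    expressed_in: frozenset
    presented_by_hla: bool = False      # peptide known/predicted to be presented
    extracellular_domain: bool = False  # sequence accessible to antibodies


def deletion_mismatch_score(mismatches, relevant_tissue, weights=None):
    """Sum points over mismatch loci relevant to the clinical application."""
    w = {"tissue": 1.0, "hla": 1.0, "extracellular": 1.0}
    if weights:
        w.update(weights)
    score = 0.0
    for m in mismatches:
        # Only deletions that remove coding sequence of a protein expressed
        # in the relevant tissue earn points in this sketch.
        if not (m.removes_coding_sequence
                and relevant_tissue in m.expressed_in):
            continue
        score += w["tissue"]
        if m.presented_by_hla:
            score += w["hla"]
        if m.extracellular_domain:
            score += w["extracellular"]
    return score


# Hypothetical annotations for a kidney transplant evaluation.
mismatches = [
    DeletionMismatch("UGT2B28", True, frozenset({"liver", "kidney"}),
                     presented_by_hla=True),
    DeletionMismatch("OR51A2", True, frozenset({"olfactory epithelium"})),
]
print(deletion_mismatch_score(mismatches, "kidney"))
```

Passing a different `weights` dictionary is one way to reflect the relative importance of each criterion; how those weights should be chosen is left open here.
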


Statistical analysis or metrics for prioritization or comparison of genomic information are known in the art and can be applied to the methods herein to prioritize and compare the deletion mismatch loci between two subjects and to generate a composite mismatch score reflecting mismatches (including deletion mismatches) at multiple loci. Examples of such analytical methods include naïve Bayesian scoring, decision trees, and boosting; these and similar approaches are routinely applied to genome-scale data sets to derive focused predictions (Jansen et al., Science 302: 449-453, 2003; Calvo et al., Nat Genet 38: 576-582, 2006).


Uses of the Deletion Variants to Measure and Manage Immunocompatibility


We have discovered a number of common deletion variants, particularly among people of a shared ancestry, in genes that encode antigens expressed in tissues relevant to immunocompatibility. The conservation of these common deletion variants among multiple individuals, the presence of the antigens encoded by these polymorphic genes in relevant tissues, and the ability of the antigens to elicit an immune response make them ideal candidates for screening methods that determine immunocompatibility in any situation where immunocompatibility or lack of immunocompatibility is desired.


For example, the methods described herein can be used to detect common deletion variants to determine immunocompatibility between a subject in need of a transplant (a recipient) and a potential donor. These methods can also be used to screen for maternal/fetal incompatibility in cases of spontaneous abortion or among prospective parents having difficulty conceiving. The methods for identifying common deletion variants can also be used to identify a bone marrow donor for a recipient having a blood cancer where the recipient and the donor are not immunocompatible. In this case, the donor's immune system would attack the cancer cells that remain in the recipient's blood system, thereby enabling the transplanted bone marrow not only to replace the host's bone marrow but also to aid in the treatment of the cancer by killing off any remaining cancer cells present in the recipient. All of these uses are described in detail below.


Organ, Bone Marrow, and Blood Transplantation


Despite the increased success of organ and bone marrow transplantation in recent decades, the overall success is limited by the likelihood of graft rejection and the potentially fatal effects of GVHD or HVGD. In GVHD, most commonly seen in bone marrow transplants, the immune cells in the donor's graft recognize the antigens in the recipient as foreign and mount an immune attack against the host cells. In HVGD, most commonly seen in organ transplants, the recipient's immune system recognizes the antigens expressed in the donor organ graft as foreign and mounts an immune attack against the graft. Although in some cases the immune response can be treated using immunosuppressive drugs, the problems that arise from these drugs present additional health-related complications.


Blood typing and tissue typing for HLA antigens are the most common screens used today for determining immunocompatibility between a recipient and a potential donor prior to transplantation. However, these methods, when used alone, are not always effective or sufficient due to the inadequacies of HLA typing methods and the presence of additional antigens that can elicit an immune response.


The deletion variants identified using the methods described herein are useful for screening individuals for immunocompatibility prior to transplantation. In general, a biological sample is obtained from the recipient in need of a transplant and the potential donor. The biological sample can be any bodily fluid (e.g., blood, serum, plasma, amniotic fluid, cerebrospinal fluid, saliva, urine, or semen), tissue, or cell and the sample is tested for the presence or absence of a deletion variant either at the nucleic acid level or the antigen level using the methods described above. For organ transplants, a blood sample or a biopsy sample from the organ to be transplanted or both are preferred. For bone marrow transplants, a blood, serum, or plasma sample is preferred, although the particular involvement of liver, intestine, and skin in typical GVHD suggests that antigens in liver, intestine, and skin are also relevant to histocompatibility.


Deletion variant, preferably common deletion variant, typing information can include a nucleic acid “type” or antigen “type” for a particular antigen identified by the methods described herein as having a common deletion variant or any combination of the antigens described in Table 1. Common deletion variant typing can also include whole genome sequences for an individual where common deletion variants can be identified and matched with potential donors based on genome sequencing and analysis as described herein. Deletion variant typing information can also include deletion variant pattern or deletion variant antigen pattern information for a subject.


An organ recipient and organ donor are said to match when the organ donor does not have any antigens that are deleted in the recipient. For histocompatibility between an organ or tissue donor and an organ or tissue recipient, one of three scenarios can occur: 1) both the recipient and the donor have a deletion variant in all copies of the gene, which prevents expression of the antigen in both the recipient and the donor; 2) both the recipient and the donor do not have a deletion variant and both express the antigen; and 3) the recipient does not have the deletion variant and expresses the antigen and the donor has a deletion variant in all copies of the gene that prevents expression of the antigen. In all of these scenarios, the immune system of the recipient would not be newly exposed to the antigen upon transplantation. For histocompatibility between a bone marrow or peripheral blood donor and a bone marrow or peripheral blood recipient, one of three scenarios can occur: 1) both the recipient and the donor have a deletion variant in all copies of the gene which prevents expression of the antigen in both the recipient and the donor; 2) both the recipient and the donor do not have a deletion variant and both express the antigen; and 3) the donor does not have the deletion variant and expresses the antigen and the recipient has the deletion variant in all copies of the gene that prevents expression of the antigen. In all of these scenarios, the immune system of the bone marrow donor would not be newly exposed to the antigen expressed by the recipient upon transplantation.
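
By way of illustration only, the per-locus matching rule described above can be sketched in Python as follows. The names Graft and locus_matches are hypothetical and are not part of the methods described herein; "expresses" is assumed to mean that at least one copy of the gene is not deleted.

```python
from enum import Enum


class Graft(Enum):
    ORGAN = "organ"    # the recipient's immune system may attack the graft
    MARROW = "marrow"  # donor-derived immune cells may attack the host


def locus_matches(recipient_expresses: bool, donor_expresses: bool, graft: Graft) -> bool:
    """Return True if no immune system is newly exposed to the antigen at this locus.

    Illustrative assumption: each subject is summarized by a single boolean,
    True when the antigen is expressed (gene not deleted in all copies).
    """
    if graft is Graft.ORGAN:
        # Mismatch only when the donor expresses an antigen the recipient lacks.
        return not (donor_expresses and not recipient_expresses)
    # Marrow or peripheral blood: mismatch only when the recipient expresses
    # an antigen that the donor (whose immune cells are transplanted) lacks.
    return not (recipient_expresses and not donor_expresses)


# The same pair of subjects can match in one direction and not the other.
print(locus_matches(recipient_expresses=True, donor_expresses=False, graft=Graft.ORGAN))
print(locus_matches(recipient_expresses=True, donor_expresses=False, graft=Graft.MARROW))
```

The asymmetry between the two branches reflects which immune system (the recipient's or the donor's) would encounter the antigen for the first time.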


The methods described herein can be used to detect a deletion variant in a single gene or in more than one gene. For example, an individual can be typed for the presence of one, two, three, four, five, six or more common deletion variants in expressed antigens. Furthermore, an individual can be screened for deletion variants throughout her genome using whole genome sequencing techniques such as those described above (e.g., genomic hybridization to microarrays, microelectrophoresis, and single molecule sequencing). While it is preferred that two subjects are a perfect match for each and every common deletion variant tested, individuals can be ranked for immunocompatibility depending on the number of matches and the relative importance of the antigen expressed by the gene having the common deletion variant. Priority scoring systems, statistical analysis, and metrics can be used by the skilled artisan to rank the subjects for immunocompatibility. For example, an individual in need of a liver transplant would generally seek a donor having a common deletion variant type match at any, and preferably all, of the UGT2B17, UGT2B28, and GSTM1 loci, all of which are expressed in the liver, but need not be matched for common deletion variants at the OR51A2 locus, which is expressed in the olfactory epithelium. An individual in need of a kidney transplant would generally seek a donor having a common deletion variant type match at any, and preferably all, of the UGT2B28, GSTT1, and GSTM1 loci, all of which are expressed in the kidney. An individual in need of a bone marrow transplant would generally seek a donor having a common deletion variant type match at any, and preferably all, of the UGT2B17, UGT2B28, GSTM1, and CYP2A6 loci.
Combinations of the above with any additional deletion variants either described herein or known in the art, or identified by whole genome sequencing analysis as described herein, can be used to further type the candidate transplant donor and recipients.
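
As a minimal sketch of how candidate donors might be ranked for an organ transplant, the following Python example sums tissue-specific weights over the loci at which the donor expresses an antigen that the recipient lacks. The weights, the donor names, and the function composite_mismatch_score are hypothetical values chosen for illustration; the only information taken from the text is that UGT2B17, UGT2B28, and GSTM1 are expressed in the liver and OR51A2 is not.

```python
def composite_mismatch_score(recipient_deleted, donor_deleted, weights):
    """Sum the weights of loci where the donor expresses an antigen the recipient lacks.

    Each *_deleted dict maps a locus name to True when the gene is deleted
    in all copies (antigen not expressed). Organ-transplant direction only.
    """
    score = 0.0
    for locus, weight in weights.items():
        if recipient_deleted[locus] and not donor_deleted[locus]:
            score += weight
    return score


# Hypothetical weights for a liver graft: liver-expressed loci count fully,
# the olfactory locus is ignored.
LIVER_WEIGHTS = {"UGT2B17": 1.0, "UGT2B28": 1.0, "GSTM1": 1.0, "OR51A2": 0.0}

recipient = {"UGT2B17": True, "UGT2B28": False, "GSTM1": True, "OR51A2": True}
donors = {
    "donor_a": {"UGT2B17": False, "UGT2B28": False, "GSTM1": False, "OR51A2": True},
    "donor_b": {"UGT2B17": True, "UGT2B28": False, "GSTM1": True, "OR51A2": False},
    "donor_c": {"UGT2B17": True, "UGT2B28": True, "GSTM1": False, "OR51A2": True},
}

# Lower score means fewer relevant deletion mismatches.
ranked = sorted(
    donors,
    key=lambda name: composite_mismatch_score(recipient, donors[name], LIVER_WEIGHTS),
)
print(ranked)
```

A simple weighted sum is used here only for clarity; the naive Bayesian, decision tree, or boosting approaches mentioned elsewhere herein could be substituted for it.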


A transplant recipient can be screened or “typed” for deletion variants, preferably common deletion variants, in any one or more of the nucleic acids or antigens listed herein at any time after diagnosis of a disease or a propensity to develop a disease that would require an organ, tissue, blood, or bone marrow transplant. A transplant donor can be screened or “typed” for deletion variants, preferably common deletion variants, in any one or more of the antigens listed herein at any time after which the decision to donate or serve as a potential donor is made or after the donor's organ, tissue, blood or bone marrow become available. Information regarding the common deletion variant typing of the recipient and donor can be used to identify a histocompatibility match with an already identified individual (e.g., a sibling or a relative) or entered into a registry or waiting list for subjects in need of an organ or bone marrow transplant and potential donors along with additional pertinent information such as name, age, sex, race, blood type, HLA tissue type, geographic location, and urgency of the needed organ or tissue donation.


Procedures for matching transplant donors and recipients using transplant registries are known to the skilled artisan. Generally, when organs are donated, the procuring organization accesses the national transplant computer system, UNet℠, through the Internet, or contacts the UNOS Organ Center directly. In either situation, information about the donor is entered into UNet℠ and a donor/recipient match is run for each donated organ. The resulting match list of potential recipients is ranked according to objective medical criteria (i.e., blood type, tissue type, common deletion variant or antigen type, size of the organ, medical urgency of the patient, as well as time already spent on the waiting list and distance between donor and recipient). Each organ has its own specific criteria.


Using the match of potential recipients, the local organ procurement coordinator or an organ placement specialist contacts the transplant center of the highest ranked patient, based on policy criteria, and offers the organ. If the organ is turned down, the next potential recipient's transplant center on the match list is contacted. Calls are made to multiple recipients' transplant centers in succession to expedite the organ placement process until the organ is placed. Once the organ is accepted for a patient, transportation arrangements are made and the transplant surgery is scheduled.


Antigen or nucleic acid typing using the deletion variants identified herein can also be used to determine the need for additional immunosuppressive medications such as purine analogs, corticosteroids, FK506, cyclosporine, rapamycin, mycophenolate mofetil, antithymocyte globulin, and anti-CD3 and anti-IL-2 receptor monoclonal antibodies during and after transplantation. For example, if the donor and recipient were not perfectly matched for the antigens tested, the clinician may decide to use more immunosuppressive medication than if the donor and recipient had been a perfect match.


In addition, using the deletion variants described herein, immune rejection can also be monitored by assaying for the presence of antibodies directed against the common deletion variant antigen. Standard immunoassays using the antigen as a substrate to detect binding to antibodies present in the serum or blood sample from a subject are known in the art. Examples of kits in the art used to detect antibodies to a given antigen in serum include kits to detect Helicobacter pylori, Rubella, and cytomegalovirus.


In this example, a recipient, after transplantation, can be screened regularly for the presence of antibodies, or fragments thereof, that specifically bind any of the deletion variant antigens that are or are not matched for the donor and recipient samples. The increased presence of such antibodies as compared to a sample taken prior to transplantation is indicative of an immune response against the antigen and may suggest imminent graft rejection. In this case, the clinician can use the information to make decisions regarding the use of additional immunosuppressive medications or removal of the graft. The development of therapies for depleting such antibodies from a patient, or for masking or otherwise interfering with their ability to bind to antigen, is also contemplated in this invention.


Graft Versus Tumor Effect


An immune attack by donor-derived immune cells against cancerous host cells is frequently a desired feature of a bone marrow transplant. This “graft-versus-tumor” or “graft-versus-leukemia” effect has been an occasionally successful but highly unpredictable feature of bone marrow transplant. Bone marrow derived from individuals who are deleted for antigens that are generally expressed selectively in leukemic cells might be able to mount a graft-versus-leukemia response without causing a dangerous graft-versus-host risk to other tissues.


In this subset of bone marrow or peripheral blood transplantation, an immune response is actually desired in order to mount an attack against tumor cells present in the blood or bone marrow of the recipient. When a subject has a hematologic disorder, such as blood cell cancer, a bone marrow or peripheral blood transplant is used to introduce new marrow into the recipient's system in order to produce healthy red blood cells, white blood cells, and platelets. Bone marrow transplants are often used, for example, after high doses of chemotherapy or radiation, which kill the cancer cells but also kill the patient's bone marrow.


In this example, common deletion variant typing of the nucleic acids of the invention or the antigens encoded by the invention is done to identify a bone marrow or blood donor that is not compatible with the recipient. Any one or more of the antigens or common deletion variants can be screened but it is most desirable to screen for antigens that are expressed by the cancer cells or progenitor cells. Alternatively or additionally, a whole genome sequence analysis can be performed to identify common deletion variants at deletion mismatch loci. A donor is identified as incompatible with the recipient if the donor has a deletion variant in all copies of the gene that prevents expression of the antigen and the recipient does not have the deletion variant and expresses the antigen. Once a histoincompatible donor is identified for the recipient, the transplant is performed and, desirably, results in an immune attack mounted by the donor's transplanted immune cells against the remaining cancer or disease cells in the host recipient. This desired outcome of transplantation is termed graft versus tumor and not only provides healthy blood cells to the patient but also aids in the treatment of the cancer by killing the remaining cancer cells.


Maternal/Fetal Immunocompatibility


The methods of the present invention are also useful for screening individuals for immunocompatibility to diagnose and understand maternal/fetal incompatibility issues that may contribute to spontaneous abortion or miscarriage. In some cases, fertility issues arise not because of fertility problems but because of immunocompatibility issues between the mother and the prospective father or sperm donor. One common example of such a case occurs when a mother is Rh negative and her partner is Rh positive. Rh factor is a protein present in the red blood cells of most people, capable of inducing intense antigenic reactions. If the mother has an Rh antibody titer after sensitization during a previous pregnancy or due to a previous incompatible transfusion, and the fetus is Rh positive, then the mother's immune system can mount an attack against the fetal cells expressing the Rh factor. Such an attack can result in spontaneous abortion or many lifelong complications for the baby before and after birth. Pregnant women or women interested in conceiving are often tested for the presence of antibodies for Rh as are fetuses in women who are Rh negative.


Despite this understanding of Rh compatibility issues, many spontaneous abortions and fertility problems still occur as a result of incompatibility of antigens that have not yet been identified. Using the methods of the present invention, a woman and a prospective man wanting to conceive can be tested for any one or more of the common deletion variants of the invention. Such typing can occur at the DNA level (either whole genome sequencing or to identify the presence or absence of known deletion variants) or using antigen typing for common deletion variant antigens other than MHC, Rh factor, or blood type. Antigen typing can occur as a preliminary screen or after fertility problems or one or more spontaneous abortions have occurred. Similarly, a woman intending to use a sperm donor can be screened and the sperm can be screened for deletion variants or expression of the deletion variant antigens encoded by the polymorphic genes. In the case of known incompatibility, the fetus can also be tested. Similarly, a woman undergoing in vitro fertilization could have several embryos tested for histocompatibility with her, to ensure that a histocompatible embryo is implanted and thereby maximize the probability of a successful pregnancy. Information gained from antigen or common deletion variant typing can be used to understand fertility issues, to identify problems with potential partners, or to monitor an at risk fetus when incompatibility is known.


For histocompatibility between a woman and a prospective father or sperm donor or an embryo or fetus, one of three scenarios can occur: 1) both the mother and the father, sperm, embryo or fetus have a deletion variant, preferably a deletion variant in all copies of the gene, which prevents expression of the antigen in both; 2) both the mother and the father, sperm, embryo or fetus do not have a deletion variant and both express the antigen; and 3) the mother does not have the deletion variant and expresses the antigen and the father, sperm, embryo or fetus has a deletion variant in all copies of the gene that prevents expression of the antigen. In all of these scenarios, the immune system of the mother would not be newly exposed to the antigen during pregnancy.


In one example, a pregnant woman presents at her OB/GYN office for a prenatal visit. Routine blood work determines that she has one or more common deletion variants resulting in non-expression of the encoded antigen. Examples of particular deletion variants that are useful in this method include UGT2B28, UGT2B17, and LCE3C, all of which are expressed in the placenta. Her partner does not have the common deletion variant and expresses the antigen. The pregnant woman is then further tested to determine if she has a serum antibody titer to the antigen. Fetal DNA or antigen typing using amniotic fluid can also be performed. If the fetus is determined to lack the common deletion variant or express the antigen, further monitoring of the fetus by the clinician or by sonography or amniocentesis can be performed. However, if the fetus is determined to have the common deletion variant, the fetus is judged to be at low risk for immune attacks by the maternal immune system and can be followed by non-invasive procedures such as sonography.


Combination Screening Methods


Although the methods described herein are effective for determining immunocompatibility between individuals, they can also be combined with additional known screens and tissue typing methods for the identification of compatible or incompatible individuals. Such methods are known in the art and include blood type matching, Rh factor typing, and HLA typing. When immunocompatibility is desired, individuals matched for antigens can also be screened (either prior to or after antigen screening) for matching blood types and matching HLA types. When immunoincompatibility is desired, individuals that are identified as having different antigen types can also be screened (either prior to or after antigen screening) for the presence or absence of distinct blood types and HLA types.


For blood typing, an individual with Type A blood is compatible with an individual with Types A or O. An individual with Type B blood is compatible with an individual with Types B or O. An individual with Type O blood is only compatible with an individual with Type O. An individual with Type AB blood is compatible with an individual having any blood type. Blood types can also be measured for compatibility of Rh factor.
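
The blood type rules above can be expressed as a simple lookup table. The following Python sketch assumes that each rule is stated from the point of view of the recipient (i.e., the listed types are the donor types that the individual can accept); the names COMPATIBLE_DONORS and abo_compatible are hypothetical, and Rh factor is not modeled.

```python
# Donor ABO types that each recipient type can accept, read from the rules above.
# Assumption: the first individual in each rule is the recipient.
COMPATIBLE_DONORS = {
    "A": {"A", "O"},
    "B": {"B", "O"},
    "O": {"O"},
    "AB": {"A", "B", "AB", "O"},
}


def abo_compatible(recipient: str, donor: str) -> bool:
    """Return True if a recipient of the given ABO type can accept the donor type."""
    return donor in COMPATIBLE_DONORS[recipient]


print(abo_compatible("A", "O"))
print(abo_compatible("O", "A"))
```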


For HLA typing, the screen can include any number of the proteins encoded by the HLA region and generally includes from one to six of the proteins. The polymorphic proteins encoded by the HLA region have been designated HLA-A, -B, -C, -DR, -DQ, and -DP. HLA-A, -B, and -C consist of a single polymorphic chain. HLA-DR, -DQ, and -DP proteins contain two polymorphic chains, designated alpha and beta. These D-region proteins are encoded by loci designated DRA, DRB1, DRB3, DRB4, DQA1, DQB1, DPA1, and DPB1. (See Schwartz, Ann. Rev. Immunol. 3:237-261, 1985.) The products encoded by the polymorphic HLA loci are most commonly typed by serological or nucleic acid based typing methods. See, for example, U.S. Pat. No. 6,194,147 for a description of methods for HLA typing.


Of the many HLA antigens, the National Marrow Donor Program (NMDP) sets minimum matching levels that must be met before a donor or cord blood unit from the NMDP Registry can be used for a transplant. These minimum requirements are based on research studies of transplant outcomes. The HLA antigens that are looked at for these minimum requirements are called HLA-A, -B and -DRB1. One set of these three antigens is inherited from the mother and another set is inherited from the father. This makes a total of six antigens to match. For cord blood units, the NMDP requires a match of at least four of these six HLA antigens. For adult marrow or peripheral (circulating) blood cell donors, the NMDP requires a match of at least five of these six HLA antigens.
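
For illustration, the six-antigen minimum matching requirement described above can be sketched in Python as follows. The antigen labels (a1, b1, and so on) are placeholders rather than real HLA designations, and the names LOCI, MIN_MATCHES, count_matches, and meets_minimum are hypothetical.

```python
from collections import Counter

LOCI = ("HLA-A", "HLA-B", "HLA-DRB1")

# Minimum number of matching antigens (out of six) stated in the text above.
MIN_MATCHES = {"cord_blood": 4, "adult_donor": 5}


def count_matches(recipient, donor):
    """Count shared antigens across the three loci (two inherited per locus, six total)."""
    total = 0
    for locus in LOCI:
        # Multiset intersection so that a homozygous subject is counted correctly.
        shared = Counter(recipient[locus]) & Counter(donor[locus])
        total += sum(shared.values())
    return total


def meets_minimum(recipient, donor, source):
    """Return True if the pair meets the minimum for the given source type."""
    return count_matches(recipient, donor) >= MIN_MATCHES[source]


recipient = {"HLA-A": ("a1", "a2"), "HLA-B": ("b1", "b2"), "HLA-DRB1": ("d1", "d2")}
donor = {"HLA-A": ("a1", "a2"), "HLA-B": ("b1", "b3"), "HLA-DRB1": ("d1", "d4")}

print(count_matches(recipient, donor))
print(meets_minimum(recipient, donor, "cord_blood"))
print(meets_minimum(recipient, donor, "adult_donor"))
```

A deletion variant match count, computed in the same manner over deletion loci, could be reported alongside this HLA count.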


Potential donors and recipients can also be tested for crossmatching, in which the recipient's blood and the potential donor's blood are placed together in a test tube and examined to see if there is cell death. If all the cells survive without death of the donor's cells, there is a negative crossmatch, which is indicative of immunocompatibility of the individuals. If the cells of the donor begin to die, a positive crossmatch results, which is indicative of immunoincompatibility.


Tolerization


For any of the immunocompatibility testing methods where compatibility is desired, if two subjects are found to be immunoincompatible due to the presence in one subject of a deletion variant in all copies of the gene that is not present in another subject (e.g., an organ donor and a recipient or a potential mother and father trying to conceive), but a transplant or fertilization must still take place between the two subjects, methods for tolerizing the subject having the common deletion variant and therefore lacking the expressed antigen can be used to reduce the risk of organ rejection or spontaneous abortion.


Tolerization regimens are intended to prepare the immune system of an individual (e.g., organ recipient or prospective mother) to accept an antigen that is not expressed in that individual due to a polymorphic deletion. For example, an individual awaiting an organ transplant could be treated to facilitate acceptance of antigens that are not expressed in that individual. In another example, if prospective parents are not compatible because the prospective mother does not express one or more of the antigens encoded by the common deletion variants and the prospective father does, the prospective mother can be treated to tolerize her to the presence of the antigen that may be expressed on the fetus.


Tolerization can be achieved through any gene therapy or protein therapy regimens known in the art for delivery of an antigen or a nucleic acid encoding an antigen to the individual in need of tolerization. The purified protein or nucleic acid encoding the antigen can be delivered directly to a target organ or systemically.


For protein therapy, purified forms of the antigen used for tolerization can be purchased from a commercial source or can be produced by recombinant methods known in the art (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Vols. 1-3, Cold Spring Harbor Laboratory Press, 3rd ed., 2001, or F. Ausubel et al., Current Protocols in Molecular Biology (Greene Publishing and Wiley-Interscience: New York, 1987) and periodic updates).


The desired antigen can also be delivered via a nucleic acid encoding the antigen. The nucleic acid can be any nucleic acid (DNA or RNA) including genomic DNA, cDNA, and mRNA encoding the antigen. Methods for nucleic acid therapy are known in the art and can be found, for example, in Sambrook et al., supra, Ausubel et al., supra, and Watson et al., Recombinant DNA, Chapter 12, 2d edition, Scientific American Books, 1992.


In gene therapy applications, genes are introduced into cells in order to achieve in vivo synthesis of a therapeutically effective genetic product. “Gene therapy” includes both conventional gene therapy where a lasting effect is achieved by a single treatment, and the administration of gene therapeutic agents, which involves the one time or repeated administration of a therapeutically effective DNA or mRNA. Standard gene therapy methods typically allow for transient protein expression at the target site ranging from several hours to several weeks. Re-application of the nucleic acid can be utilized as needed to provide additional periods of tolerization.


An additional method for tolerizing immune cells from one individual to a known antigen is to “immunodeplete” those cells which bind to a particular antigen, or which bind to peptide fragments presented on cell surfaces by the MHC. Methods for immunodepletion are known in the art and are reviewed, for example, in Blazar and Murphy, Philos Trans R Soc Lond B Biol Sci. 360:1747-67 (2005).


EXAMPLES
Example 1
Identification of Aberrant Genotype Patterns Across the Genome

The locations of common deletions in the human genome are largely unknown, as is the best way to determine the association of such variants with disease. To address these questions, we developed an approach for using the HapMap to discover, localize, and analyze common deletion variants. We found hundreds of deletion variants, 1 kb-745 kb in size, including more than 100 common deletions that were observed as homozygous deletions. Ten of these common deletion variants remove the coding regions of expressed genes thought to contribute to drug response, olfaction, and sex steroid hormone metabolism; the gene deletion variants also explained variation in gene expression at these loci. Most common deletions appear to result from ancestral mutations that have been inherited by descent; they are in linkage disequilibrium with nearby single-nucleotide polymorphisms (SNPs), such that their association to disease could be discovered in whole-genome association studies.


SNPs have long been appreciated as common, potentially phenotype-causing genetic variants and as markers for other, undiscovered variants via linkage disequilibrium. Genome-wide SNP discovery efforts, and the construction of a map of human SNP variation (HapMap consortium), allow for the use of whole-genome SNP genotyping to discover common ancestral mutations that affect disease risk.


Recently, it has been recognized that structural variation—including duplications, deletions, and inversions—is common and extensive. (See, for example, Sebat et al., Science 305:525-8 (2004); Iafrate et al., Nat. Genet. 36:949-51 (2004); Tuzun, et al., Nat. Genet. 37:727-32 (2005); and Sharp et al., Am. J. Hum. Genet. 77:78-88 (2005)). Of all forms of structural rearrangement of a locus, the form with the most obvious potential functional relevance is that which removes the DNA sequence altogether. However, little is known about the location of common deletion polymorphisms on the scale of specific exons and regulatory elements; even less is known about which deletion variants may be sufficiently common to appear as homozygous deletions in many individuals.


To identify, catalog, and enable study of deletion variants across the human genome, we set out to develop and validate a method for discovering deletions from SNP genotypes. We hypothesized that a segregating deletion would leave “footprints” in SNP genotypes, including null genotypes, apparent deviations from Mendelian inheritance, and apparent deviations from Hardy-Weinberg equilibrium (FIG. 1). This is complicated by the fact that technical artifacts and genotyping errors also give rise to these three “failure modes” at individual markers. In fact, because of the likelihood that such deviations are errors, genotype assays have long been discarded from medical genetic studies when such failures are observed.


To determine whether a subset of “failed” SNP genotyping assays in the HapMap data might reflect structural variation, we asked whether such failures are physically clustered in a manner that is specific to individuals. Consistent with this hypothesis, the rate of Mendelian-inconsistent genotypes was elevated near other Mendelian-inconsistent genotypes in the same individual (regardless of whether the same genotyping platform was used for both assays), but was unrelated to Mendelian inconsistencies in other individuals (FIG. 2A). A similar relationship was observed for null genotypes (FIG. 2B). Thus, such clustering is a property of individual variation in local sequence, rather than the local sequence per se.


We used data from the International HapMap Project to identify clusters of aberrant genotype patterns across the genome. We used the unfiltered genotypes from release 16 of the HapMap, which we downloaded from http://hapmap.org. These consisted of separate genotype files for four population samples: 90 CEPH individuals (30 trios) of European ancestry; 90 individuals (30 trios) of Yoruban ancestry sampled in Ibadan, Nigeria; 45 unrelated individuals of Han Chinese ancestry sampled in Beijing; and 45 unrelated individuals of Japanese ancestry sampled in Tokyo. The population samples are described in detail in Altshuler et al., Nature 437:1299-1320 (2005). We combined the data from the Chinese and Japanese population samples and thereafter treated the data set as three population samples of 90 individuals each.


A complication is that this data had been generated at ten different genotyping centers, using seven different genotyping technologies, each of which showed distinct rates of each type of “failed assay.” We noted that the background rates of Mendel failure, null genotypes, and Hardy-Weinberg disequilibrium differed greatly from technology to technology, and even for the same technology when used by different centers. Furthermore, there were many sample-by-batch interactions, in which particular samples were associated with elevated rates of null genotypes or Mendel failures in particular experimental batches. To distinguish physically clustered patterns of aberrant genotypes from sporadically appearing patterns, we developed a set of statistical thresholds, tailored to each genotype pattern, genotyping center, and genotyping technology, for identifying significantly clustered patterns.


Because we sought to identify multi-assay patterns in the data from independent genotype assays, we did not combine data from multiple assays that potentially used the same sequence features for amplification, labeling, or restriction digest. Thus, we excluded all the Perlegen assays, because the use of 10-kb amplicons on that platform potentially caused long-range patterns of aberrant genotypes wherever an undiscovered SNP altered either primer-binding site. We also excluded data from any experiments whose batch structure corresponded to physical regions of the genome, because this design potentially allowed batch-specific experimental artifacts to appear as regional patterns in the data.


We looked for clustering of aberrant genotype patterns in each of the populations separately as described below.


Null Genotypes


For each genotype assay and population sample, we defined the “null genotype pattern” of that assay as the binary vector (length 90) of null genotype calls across the 90 individuals in that population sample. For each such pattern that was observed on any genotyping platform, we considered each pattern together with its close neighbors (R²>0.8) in pattern space. (This fuzzy clustering was necessary because genotype assays do not consistently obtain 100% complete calls, even in euploid samples.) We determined the background frequency of that set of patterns on the combination of genotyping technology, genotyping center, and (wherever possible) on the specific experimental batch in question. Using that background frequency, we defined a statistical threshold for clustering by finding numbers x and y such that the binomial probability of observing 2 occurrences in x physically consecutive assays, or 3 occurrences in y physically consecutive assays, was sufficiently small that, after testing (num_patterns×num_assays) hypotheses, we would expect fewer than two chance discoveries per platform. We identified all genomic segments (runs of two or three examples of the pattern) where the clustering of this pattern exceeded the statistical threshold, and clustered any segments that overlapped.
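
The threshold calculation described above can be sketched as follows. This Python example is one possible reading of the procedure, not the exact computation that was performed: it assumes that "k occurrences" means at least k occurrences, that assays are independent, and that the expected number of chance discoveries is the tail probability multiplied by the number of hypotheses tested. The function names and the numeric inputs are hypothetical.

```python
from math import comb


def tail_prob(window, k, p):
    """Binomial probability of at least k occurrences among `window` assays,
    each showing the pattern independently with background frequency p."""
    return sum(
        comb(window, i) * p**i * (1 - p) ** (window - i)
        for i in range(k, window + 1)
    )


def max_window(k, p, n_hypotheses, max_expected=2.0, limit=1000):
    """Largest window w (k <= w <= limit) for which the expected number of
    chance discoveries, n_hypotheses * tail_prob(w, k, p), stays below
    max_expected. Returns None if even w = k is too permissive."""
    best = None
    for w in range(k, limit + 1):
        # tail_prob grows with w, so the search can stop at the first failure.
        if n_hypotheses * tail_prob(w, k, p) >= max_expected:
            break
        best = w
    return best


# Hypothetical inputs: background frequency 0.001, one million hypotheses.
x = max_window(k=2, p=0.001, n_hypotheses=1_000_000)  # window for 2 occurrences
y = max_window(k=3, p=0.001, n_hypotheses=1_000_000)  # window for 3 occurrences
print(x, y)
```

As expected, a rarer background pattern or a larger required number of occurrences permits a wider window of physically consecutive assays.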


Mendel Failures


For each genotype assay and population sample (CEPH and Yoruba samples only), we defined the “Mendel failure pattern” of that assay as the binary vector (length 60) of Mendel failure calls across the 60 parent-offspring pairs in that population sample. For each such pattern that was observed, we considered each pattern together with its close neighbors (R2>0.8) in pattern space. This fuzzy clustering was desirable because the same deletion segregating in a population can give rise to non-identical patterns of Mendel failure at different SNPs: the non-deletion SNP haplotypes segregating in a trio (whose disagreement produces the Mendel conflicts) may not disagree at all SNPs.
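
As an illustration of how such a pattern could be built, the sketch below flags a parent-offspring pair when the two called genotypes share no allele. This pairwise rule, the two-letter genotype encoding, and the function names are assumptions made for the example; the text does not specify how Mendel failures were scored.

```python
def is_mendel_failure(parent, child):
    """True if a parent and child genotype share no allele.

    Genotypes are two-letter strings such as "AA" or "AG"; None means no call.
    A hemizygous carrier (for example A/deletion) is typically called "AA",
    so a child who inherits the deletion and a G from the other parent is
    called "GG" and appears inconsistent with that parent.
    """
    if parent is None or child is None:
        return False  # missing calls cannot be scored
    return not (set(parent) & set(child))


def mendel_failure_pattern(pairs):
    """Binary vector with one entry per (parent, child) genotype pair."""
    return [1 if is_mendel_failure(p, c) else 0 for p, c in pairs]


# Hypothetical genotype calls for three parent-offspring pairs at one SNP.
pattern = mendel_failure_pattern([("AA", "GG"), ("AG", "AA"), (None, "GG")])
```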


Assessment of Clustering of “Failure Profiles”


For both Mendel failure profiles and null genotype profiles, we observed that highly similar (R2>0.8) profiles tended to be physically clustered in the genome. More specifically, we observed that the probability of observing a “match” to any particular profile was a decreasing function of physical distance from that profile, even when we considered only pairs of SNP assays that were typed using different technology platforms (FIGS. 6A and 6B).


The Phase I HapMap data was produced by ten different genotyping centers, with each chromosome arm primarily genotyped by one particular center (HapMap Consortium, Altshuler et al. Nature 437:1299-1320 (2005)). Approximately 120 thousand SNP assays were performed by centers outside of their primary regions, or on genome-wide platforms such as Affymetrix 100K SNP arrays, allowing cross-platform analyses like those in FIG. 2A and FIGS. 6A and 6B. However, because the overwhelming majority of assays in any particular region were performed at a single genotyping center, any effort to identify local multi-marker features in the HapMap data must of necessity compare many SNP assays that were produced by the same center and genotyping technology. It was therefore critical to control for center- and platform-specific patterns in the data. An initial survey of the data suggested that such patterns were potentially abundant; for example, the background rates of Mendel failures varied from center to center (with additional examples of center-by-sample interaction), and particular DNA samples tended to have low genotype conversion rates (high null genotype rates) at particular centers. Thus, it was important that the clustering of each pattern be assessed in a format that controlled for such center- and platform-specific patterns in the data.


We therefore analyzed the data from each genotyping center separately. For each genotyping center, we first ordered all of the SNP assays from that center by genomic position. For each pattern (clustered set of highly similar profiles) that was observed multiple times, we determined that pattern's background frequency at that center, and wherever possible on the specific experimental batch in question. (Batch information was obtained from the International HapMap Consortium.) We then analyzed the physical distribution of all observations of that pattern relative to all of the SNP assays from that center (ordered by genomic position). A list of “candidate clusters” was determined by considering every consecutive pair and consecutive trio of observations of that pattern, together with any other, intervening SNP assays from that center. To assess the tightness of each such candidate cluster, a “clustering p-value” was calculated to assess the probability of observing a cluster at least as tight (in consecutive-assay space) as that cluster, given (i) the background frequency of the pattern, (ii) the number of SNP assays spanned by the cluster, and (iii) the total number of SNP assays performed by that center. The distribution of these p-values is shown in FIGS. 6C and 6D. These figures show a generally uniform distribution of p-values from zero to one, but with an excess of very low p-values. The region of excess low p-values can be thought to identify a set of candidate clusters in which the alternate hypothesis (non-random degree of clustering) is likely to be true; this region is separated by a “knee” from the rest of the distribution, which is organized as a generally uniform distribution (FIGS. 6C and 6D). We chose a significance threshold for promoting potential clusters, based on the goal of capturing as many true discoveries as possible, while maintaining a false discovery rate of no greater than 10% of all discoveries. 
This required selecting a significance threshold somewhat to the left of the “knee” in the distribution, where the height of the distribution was at least ten times greater than the average height of the distribution to the right of the knee. In FIGS. 6C and 6D, this corresponds to the leftmost bars of the histogram (a p-value of 1.8×10⁻³ for Mendel patterns, and 9×10⁻⁴ for null-genotype patterns). The large additional region of excess low p-values, not captured by this threshold, suggests a significant type II error rate (because no gold-standard data set exists, the true type II error rate is not known). As the density of markers in the HapMap increases during subsequent phases of the project, many of the clusters in this region may be either confirmed by additional assays (increasing the clustering and promoting the cluster beyond the threshold) or not confirmed (reducing the level of clustering and increasing the p-value).
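
One way to read the threshold rule is sketched below: if the flat part of the histogram estimates the per-bin count expected by chance, then a bin at least ten times taller should contain no more than roughly 10% chance findings. The sketch is an interpretation, not the procedure actually used; the bin count, the `knee_bin` index (chosen by eye), and the function name are all hypothetical.

```python
def choose_threshold(p_values, n_bins, knee_bin, min_ratio=10.0):
    """Return the upper edge of the leftmost run of strongly enriched bins.

    Bins with index >= knee_bin are assumed to lie to the right of the knee
    and are used to estimate the chance (uniform) height of the histogram.
    Returns None if not even the first bin is enriched enough.
    """
    if not 0 < knee_bin < n_bins:
        raise ValueError("knee_bin must lie strictly inside the histogram")
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    null_bins = counts[knee_bin:]
    baseline = sum(null_bins) / len(null_bins)
    n_selected = 0
    while n_selected < knee_bin and counts[n_selected] >= min_ratio * baseline:
        n_selected += 1
    return n_selected / n_bins if n_selected else None


# Synthetic p-values: a uniform background of 10 per bin plus an excess near zero.
background = [(b + 0.5) / 100 for b in range(2, 100) for _ in range(10)]
p_values = [0.001] * 200 + [0.015] * 30 + background
threshold = choose_threshold(p_values, n_bins=100, knee_bin=5)
```

With these synthetic values only the first bin is at least ten times the background height, so the sketch selects a threshold at the upper edge of that bin.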


We clustered all overlapping genomic segments that were identified by this analysis, into 702 genomic loci.
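
Merging overlapping segments into loci is a standard interval-merge step. A minimal sketch follows, assuming each segment is a hypothetical `(chromosome, start, end)` tuple with inclusive coordinates.

```python
def merge_overlapping(segments):
    """Merge overlapping (chrom, start, end) segments into non-overlapping loci."""
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous locus on the same chromosome: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical segments; the first two overlap and collapse into one locus.
loci = merge_overlapping([
    ("chr4", 150, 300),
    ("chr4", 100, 200),
    ("chr4", 400, 500),
    ("chr7", 120, 180),
])
```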


We were concerned that multiplexed batches of SNP assays that were performed together could also give rise to potential patterns in the data, which (if distributed non-randomly in genomic space with respect to that center's other SNP assays) could give rise to potential batch artifacts. We therefore excluded those clusters that consisted entirely of SNP assays from the same experimental batch. This resulted in a set of 541 predictions.


Hardy-Weinberg Disequilibrium


We observed that a deletion tended to reduce the ratio of observed heterozygosity to expected heterozygosity (hetobs/hetexp) by a uniform amount (FIG. 2C), this amount being determined by the population frequency of the deletion haplotype. We thus looked for genomic regions in which (hetobs/hetexp) consistently fell below a cutoff (we used cutoffs of 0.7 and 0.4). We included only those assays with a minor allele frequency greater than 10%. For each genotyping platform, we determined the background frequency of assays for which (hetobs/hetexp) was less than the cutoff, and used this frequency to determine statistical thresholds for clustering as described above.
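
The heterozygosity ratio can be computed per assay as sketched below. The sketch assumes that expected heterozygosity is the Hardy-Weinberg value 2pq estimated from the called genotypes, and that genotypes are encoded as two-letter strings; both are illustrative choices rather than details given in the text.

```python
from collections import Counter


def het_ratio(genotypes):
    """Return (het_obs / het_exp, minor allele frequency) for one biallelic assay.

    Genotypes are two-letter strings ("AA", "AB", "BB"); None means no call.
    Returns None when the called genotypes are not biallelic.
    """
    called = [g for g in genotypes if g is not None]
    allele_counts = Counter(allele for g in called for allele in g)
    if len(allele_counts) != 2:
        return None
    n = len(called)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    het_exp = 2 * freqs[0] * freqs[1]          # Hardy-Weinberg expectation 2pq
    het_obs = sum(1 for g in called if g[0] != g[1]) / n
    return het_obs / het_exp, min(freqs)


# Hypothetical assay with a deficit of heterozygotes among 90 individuals.
ratio, maf = het_ratio(["AA"] * 45 + ["AB"] * 10 + ["BB"] * 35)
flagged = maf > 0.10 and ratio < 0.4
```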


Wherever the resulting genomic segments overlapped with clusters of Mendel failure or null genotypes as discovered above, we clustered those segments together. (Because heterozygosity can show regional correlations due to haplotype structure, selection, and potentially duplicated sequence, we did not promote loci based on (hetobs/hetexp) alone unless confirmed by one of the other lines of evidence; however, the (hetobs/hetexp) deviations were useful for extending clusters discovered by Mendel failures, because the Mendel failures themselves may not be observed at every marker in the deleted region (FIG. 2C).)


More specifically, we defined as the “failure profile” of an assay its pattern of Mendel failure across the 60 pairs of relatives in a population, its pattern of null genotypes across the 90 individuals in a population, and its deviation from the expected level of heterozygosity in that population. We looked for regions of the genome in which highly similar “failure profiles” appeared at nearby markers (FIGS. 2C and 2D), to an extent not explained by center-, platform-, or batch-specific patterns in the data. To assess the statistical significance of each candidate cluster, we calculated the binomial probability of observing each pattern n times in m markers (based on the empirically observed rate for each platform). We identified a candidate deletion when that probability was smaller than a threshold expected to result in fewer than 20 false discoveries across the genome.
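
The binomial test described above can be sketched as follows. Two points are assumptions made for the example: the probability is taken as the upper tail (n or more occurrences in m markers), and the expected number of false discoveries is approximated as that probability multiplied by the number of hypotheses tested. The function names and the numbers in the example are hypothetical.

```python
from math import comb


def binomial_tail(n_hits, m_markers, rate):
    """P(X >= n_hits) for X ~ Binomial(m_markers, rate)."""
    return sum(
        comb(m_markers, k) * rate ** k * (1 - rate) ** (m_markers - k)
        for k in range(n_hits, m_markers + 1)
    )


def expected_false_discoveries(n_hits, m_markers, rate, n_tests):
    """Chance discoveries expected if the same test is applied n_tests times."""
    return binomial_tail(n_hits, m_markers, rate) * n_tests


# A pattern with background rate 0.001 seen at 3 of 3 consecutive markers,
# evaluated against one million tested hypotheses.
chance_hits = expected_false_discoveries(3, 3, 0.001, 10 ** 6)
is_candidate = chance_hits < 20
```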


Using these methods we identified 541 candidate polymorphic deletions 1-200 kb in size (as shown in Appendix A). 120 of these loci generated null genotypes in multiple individuals, suggesting the existence of common, homozygous deletions. More than 90% of the discovered deletion variants were novel. Half of these loci were 1-7 kb in size and were therefore not detectable by earlier approaches; 98% were 1-30 kb in size and would have had little chance of detection by commonly used hybridization-based approaches.


It was critical to validate the presence of segregating deletions at the predicted sites, given their origin in data that fails typical quality control standards and the statistical nature of the inference. We used five methods: fluorescent in situ hybridization (FISH), two-color fluorescence allele-intensity measurements, PCR amplification, quantitative PCR, and comparison to previous work. These methods are described below in the Materials and Methods section.


First, we performed fluorescent in situ hybridization (FISH) on four candidate deletions that completely contained available FISH probes. The FISH assays confirmed the existence of segregating deletions at each site, and confirmed their Mendelian inheritance wherever suitable cell lines were available (FIG. 3A).


Second, we examined two-color fluorescence data from the assays that had been used to genotype SNPs on chromosomes 4q, 7q, and 18p at the Broad Institute. Specifically, this method associates a quantitative fluorescence signal with each allele at each typed SNP in each individual. At most SNPs, individuals' fluorescence-intensity measurements cluster into two or three discrete groups corresponding to homozygous and heterozygous genotypes. At SNPs under 15 candidate deletion loci, fluorescence intensity data instead clustered into as many as six groups (FIG. 3B). When we compared these measurements to imputed genotypes for individuals with hemizygous and homozygous deletions, these segregated into the observed clusters (FIG. 3B).


Third, we selected 60 loci for which the pattern of genotypes suggested the existence of multiple individuals with homozygous deletions, and confirmed the existence of homozygous deletions at 51 of these loci by PCR assays that failed in the suspected homozygous-null individuals but succeeded in all other individuals tested (FIG. 3C and Tables 2a and 2b).


Fourth, quantitative PCR was performed in all 269 HapMap DNA samples for 11 candidate deletions that overlapped the coding exons of genes (described below) and were discovered in many individuals: at 10/11 loci, three discrete clusters were observed, identifying individuals with 0, 1, and 2 gene copies (FIG. 3D).

TABLE 2a
Validation of candidate deletion variants.
Experimental validation

Technique | Candidates screened | Criterion applied | Variants screened | Variants confirmed
FISH | 5 large candidate variants that cover available FISH probes | Absence of FISH signal in the specific individuals predicted to harbor the deletion variant (but not in control individuals) | 5 | 5
Two-color, allele-specific fluorescence | All 17 candidate common variants on 4q, 7q, 17p that covered at least 3 HapMap SNPs typed at the Broad Institute | Observation of extra genotype classes, well-separated from other clusters at at least one SNP, that contain all individuals predicted to be aneuploid at locus | 17 | 15
PCR | 60 candidate variants predicted to be homozygous null in at least two individuals | Confirmation of predicted pattern of successful and unsuccessful amplification across 12-24 individuals including at least 2 with each predicted result | 60 | 51
Quantitative PCR | 11 candidate common variants that affected the coding exons of genes | Clustering of measurements of DNA copy-number into three discrete groups | 11 | 10









TABLE 2b
Comparison to earlier work.

Earlier approach | Ref. | Variants considered | Criterion applied | Confirmations
ROMA (Sebat et al., 2004) | 1 | 55 “copy number polymorphisms” (potentially deletions or duplications) | Aberrant SNPs spanned at least 20% of the region identified earlier | 4
BAC array CGH (Iafrate et al., 2004) | 2 | 255 “large copy number variants” (potentially deletions or duplications) | Aberrant SNPs spanned at least 20% of the BAC probe identified earlier | 3
BAC array CGH (Sharp et al., 2005) | 3 | 119 “copy number polymorphisms” (potentially deletions or duplications) | Aberrant SNPs spanned at least 20% of the BAC probe identified earlier | 6
Fosmid end pair sequencing (Tuzun et al., 2005) | 4 | 102 deletion variants | Aberrant SNPs fell completely inside the fosmid(s) identified earlier | 28

We also tested an additional 56 loci that were not among our core predictions, but met a more-relaxed set of statistical thresholds; the confirmation rate among these other candidate variants was considerably lower, suggesting that relaxation of the statistical thresholds would be unwarranted.


Finally, we compared the locations of the candidate deletions to results from an earlier study, in which the approximate genomic locations of 102 candidate deletions in a single individual were discovered by the existence of fosmid end pair sequence reads from that individual that map more than 48 kb apart on the reference human genome sequence (Tuzun et al., supra). Twenty-eight of our candidate deletions resided within these fosmids; in each case, the location of the aberrant genotypes further refined the localization of the deletion variant.


In sum, 90 predicted deletion variants (including 68 of 120 predicted common homozygous deletions) were validated by one or more of these approaches. Based on the experimental results, we estimate that 15% of the still-untested candidate deletion loci may be false positives.


We found thirteen genes for which exons were deleted at an appreciable frequency (Table 1). Of these genes, eight were observed as homozygous deletions. These common gene deletion polymorphisms included two genes involved in the metabolism of sex steroid hormones (UGT2B28 and UGT2B17). Common deletions also removed two genes encoding olfactory receptors (OR51A2 and OR4F5) and three genes (CYP2A6, GSTT1, and GSTM1) with roles in detoxification and drug metabolism. (For information on previously identified deletions in some of these genes, see Seidegard et al., Proc. Natl. Acad. Sci. 85:7293-7297 (1988); Nunoya et al., Pharmacogenetics 8:239-249 (1998); and Pemble et al., Biochem. J. 300 Pt1:271-276 (1994).)


To assess the frequencies and inheritance of these gene deletions in different populations, we developed quantitative PCR assays for accurately genotyping individuals as carrying 0, 1, or 2 gene copies, and used these assays to successfully genotype eight of the ten gene deletion variants in all the HapMap individuals (Table 1). The resulting genotypes showed Mendelian inheritance, Hardy-Weinberg equilibrium, and expected transmission rates, suggesting that each behaves as a stable, heritable genetic variant. The gene deletion variants were observed in individuals of European, Yoruba, and Chinese and Japanese ancestry, though the frequency of each deletion varied from population to population (Table 1).


Assessing functional relevance requires testing for association to phenotype. A simple phenotype is the level of expression of each transcript. Based on global profiles of gene expression in a subset of the samples, we found that three commonly deleted genes (Table 1) are expressed at appreciable levels in the lymphoblastoid cell lines used to measure individual variation in gene expression (Monks et al., supra and Morley et al., Nature 430:743-747 (2004)). We compared published expression measurements from these cell lines to deletion genotypes that we obtained experimentally. Variation in gene dosage explained 88%, 26%, and 75%, respectively, of the observed variation in expression of the three genes (FIG. 4). Individuals homozygous for deletion variants showed little or no expression, and individuals with one gene copy showed 30%, 35%, and 38% less expression, respectively, than individuals with two gene copies, suggesting that feedback regulation had not normalized transcript level.
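
The two summary statistics quoted above (variance explained by gene dosage, and the reduction in expression for one-copy individuals) can be computed as in the sketch below. It assumes a simple least-squares regression of expression on copy number, which the text does not specify, and uses invented expression values purely for illustration.

```python
from statistics import mean


def dosage_r_squared(copies, expression):
    """Fraction of expression variance explained by a linear fit on copy number."""
    mean_x = mean(copies)
    mean_y = mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in copies)
    syy = sum((y - mean_y) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("copy number and expression must both vary")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(copies, expression)) / sxx
    ss_res = sum(
        (y - (mean_y + slope * (x - mean_x))) ** 2 for x, y in zip(copies, expression)
    )
    return 1 - ss_res / syy


def percent_reduction_one_copy(copies, expression):
    """Percent by which mean expression at one copy falls below that at two copies."""
    one = [y for x, y in zip(copies, expression) if x == 1]
    two = [y for x, y in zip(copies, expression) if x == 2]
    return 100 * (1 - mean(one) / mean(two))


# Invented values: gene copy number and expression level for six individuals.
copies = [0, 0, 1, 1, 2, 2]
expression = [0.0, 0.2, 1.0, 1.2, 2.0, 2.2]
explained = dosage_r_squared(copies, expression)
reduction = percent_reduction_one_copy(copies, expression)
```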


For medical genetics, a key question is whether one must discover each deletion variant in every patient, using dedicated technology, or can rely on linkage disequilibrium by using nearby SNPs as proxies for common deletions. The answer to this question depends on the linkage disequilibrium properties of common deletion variants: if common deletion of a locus is due to recurrent mutation there, then deletions must be discovered independently in every patient; if common deletion of a locus results from an ancestral mutation that has been inherited by descent, then it will often segregate on an ancestral haplotype and be in linkage disequilibrium with nearby SNPs.


In addition, to the extent that deletions result from unique ancestral mutational events, they will often be in linkage disequilibrium with nearby SNPs, and ancestral SNP haplotypes can serve as proxy in disease studies as well as immunocompatibility assays.


We observed strong LD between SNPs from HapMap and validated deletions. For example, nine of the ten gene deletions (for which we had designed accurate quantitative PCR genotyping assays) showed significant LD with nearby SNPs, and six of the ten had a perfect SNP proxy (r2=1) in one or more populations (see, for example FIG. 5A and FIGS. 7A-7F). In each case the deletion was associated to the same SNP allele(s) in each population (FIG. 5B and Table 3), indicating an ancestral mutation that occurred before humans migrated from Africa to Europe and Asia. In the larger collection of 51 deletion variants validated by PCR, we found elevated homozygosity at SNPs flanking the homozygous deletions (relative to randomly-selected individuals at the same loci), indicating that the deletion alleles travel on specific SNP haplotypes (FIG. 5C). On average, the rate of decay of haplotype homozygosity around deletion alleles was similar to that observed for a frequency- and population-matched set of SNP alleles (FIG. 5C).
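
For reference, the r2 statistic between a deletion and a SNP allele can be computed from phased haplotypes as sketched below. The availability of phased haplotypes and the 0/1 encoding are assumptions made for the example; the text does not describe how r2 was calculated.

```python
def ld_r2(haplotypes):
    """r2 between a deletion and a SNP allele from phased haplotypes.

    Each haplotype is a pair (carries_deletion, carries_snp_allele) of 0/1 values.
    Returns None if either variant is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_del = sum(d for d, _ in haplotypes) / n
    p_snp = sum(s for _, s in haplotypes) / n
    p_both = sum(1 for d, s in haplotypes if d and s) / n
    denom = p_del * (1 - p_del) * p_snp * (1 - p_snp)
    if denom == 0:
        return None
    d_coef = p_both - p_del * p_snp   # the disequilibrium coefficient D
    return d_coef ** 2 / denom


# Hypothetical haplotypes: the SNP allele is present on every deletion haplotype
# and on no other haplotype, so it is a perfect proxy.
perfect_proxy = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
# Three deletion haplotypes, all carrying the allele, plus one extra allele carrier.
partial_proxy = ld_r2([(1, 1)] * 3 + [(0, 1)] + [(0, 0)] * 6)
```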

TABLE 3
SNP alleles that tag common gene deletion alleles, for potential use in medical genetic studies. Tagging SNP alleles (R2) are given for the European, Chinese/Japanese, and Yoruba samples.

Gene deletion | SNP | SNP position | Tagging SNP allele (R2)
UGT2B28 | rs4590108 | 70,430,000 | C(0.78) C(0.81)
UGT2B28 | rs11249532 | 70,432,487 | T(1.0) T(1.0) T(0.90)
UGT2B28 | rs12501393 | 70,562,708 | G(1.0) G(0.69)
UGT2B28 | rs12501953 | 70,572,663 | C(1.0) C(1.0) C(0.69)
UGT2B28 | rs12507041 | 70,577,410 | G(1.0) G(0.91)
UGT2B17 | rs2708666 | 69,370,214 | A(0.74) A(0.55) A(0.40)
UGT2B17 | rs3100645 | 69,806,739 | C(1.0) C(0.96) C(0.63)
LCE3C | rs4112788 | 149,767,858 | C(0.93) C(0.90)
LCE3C | rs1886734 | 149,807,724 | G(0.93) G(0.93) G(1.0)
LCE3C | rs6700158 | 149,809,410 | G(0.93) G(0.92) G(1.0)
LCE3C | rs4845459 | 149,820,424 | A(0.85) A(0.93)
TRY6 | rs13230029 | 141,907,602 | G(1.0) G(0.97)
TRY6 | rs4726581 | 141,912,822 | C(1.0) C(0.94)
TRY6 | rs4726582 | 141,912,983 | T(1.0) T(0.97)
TRY6 | rs4726583 | 141,912,987 | T(1.0) T(0.97)
TRY6 | rs2734212 | 141,936,451 | G(1.0) G(0.97)
TRY6 | rs2734213 | 141,936,908 | C(1.0) C(0.97)
TRY6 | rs2855983 | 141,939,191 | A(0.97) A(0.97) A(1.0)
TRY6 | rs2734218 | 141,946,713 | C(0.97) C(0.94) C(0.92)
GSTM1 | rs2071487 | 109,531,808 | C(1.0)
GSTM1 | rs448934 | 109,534,259 | A(1.0)
GSTM1 | rs1858749 | 109,540,179 | G(0.97)
GSTM1 | rs366631 | 109,551,199 | T(0.76) T(0.85) T(0.91)
GSTT1 | rs5760147 | 22,659,502 | C(0.80) C(0.83)
GSTT1 | rs407257 | 22,671,104 | G(1.0) G(1.0) G(0.48)


Our results indicate that the human genome has hundreds of common, multi-kilobase deletion variants, including some that remove genes, and that SNPs can be used to discover, analyze, and serve as markers for these variants. While we have used this approach on the HapMap, the same approach can be used to search for deletion variants in any set of SNP genotypes, such as data from imminent whole-genome association studies. Discarded, “failed” assays from earlier medical genetics studies could also be re-examined to search for the spatially patterned signature of a segregating deletion. Such an approach could be used together with intensity data from genotyping assays (FIG. 3A and Zhao et al., Cancer Res. 64:3060-3071 (2004)) to routinely identify deletions in genetic studies.


We describe an initial catalog of common deletion variants, but it is just a first draft toward a complete catalog. We have detected only those deletions large enough to affect multiple, independent HapMap SNP assays; most deletions smaller than 5 kb would not be detected at the current HapMap marker density. Phase 2 of the HapMap, with an assay every 1 kb, will considerably increase this resolution. The low density of HapMap assays in very-recently-duplicated regions of the genome has also impeded our discovery of deletions there; thus, our findings are limited to deletions of relatively unique sequences. Other types of structural variants, such as multi-copy duplications, may be more susceptible to recurrent structural mutation and therefore show less linkage disequilibrium. The application of diverse methods for finding structural variants (Sebat et al., supra; Iafrate et al., supra; Tuzun et al., supra; Sharp et al., supra; and Fredman et al., Nat. Genet. 36:861-866 (2004)), together with the development of follow-on genotyping assays, will allow more-complete catalogs of structural variants and their linkage disequilibrium properties.


Most importantly, an integrated view of structural variation and SNP variation is critical to medical genetics. To the extent that common deletion variants are in linkage disequilibrium, their association to disease can be discovered by the kinds of strategies proposed for SNP association studies (HapMap consortium, Altshuler et al. Nature 437:1299-1320 (2005)). In the future, medical genetics will benefit from a full catalog of common variants, since all types of alleles must be considered in an unbiased search for the causes of disease.


Materials and Methods


Fluorescent in situ Hybridization (FISH)


Fosmid clones with end sequences mapped to locations within predicted deletion intervals were obtained from the BAC/PAC resource, and DNA was isolated from each fosmid with the Maxi DNA plasmid kit (Qiagen). Fosmid DNAs were then labeled by nick translation with Spectrum Green-11-dUTP (G248P89259F2 and G248P87989C3 on chromosome 4) or Spectrum Orange-11-dUTP [Vysis, Inc.] (G248P87609A7 on chromosome 8 and G248P81036F4 on chromosome 18). We co-hybridized the test probes with appropriate positive control probes: Spectrum Orange-11-dUTP-labeled BAC clone RP11-363G1 (BAC/PAC; chromosome 4p15.1), and biotin-16-dUTP-labeled chromosome 8 and 18 paint probes (Roche). FISH experiments were performed using standard hybridization conditions on metaphase chromosome preparations derived from lymphoblastoid cell lines obtained from the Coriell Institute for Medical Research. Cy5-labeled streptavidin was used for detection of the biotin labeled chromosome 8 and 18 paint probes. Images were captured on an Olympus AX70 fluorescent microscope equipped with a CCD camera (Photometrics KAF 1400) with appropriate fluorescent filters and analyzed with Applied Imaging's Genus software.


The chromosome 4 fosmids used for FISH validation (G248P89259F2 and G248P87989C3) are mapped to segmental duplication-containing regions (Sebat et al., supra). Sequences with >94% nucleotide similarity are located <1 Mb (on chromosome 4) from each fosmid (http://genome.ucsc.edu). We considered the possibility that these probes could hybridize to a segmental duplication and yield a positive FISH signal, even if the target sequence were deleted. To investigate this, we repeated these experiments six times under various hybridization conditions, including once with an extended hybridization of 48 hours. In four out of these six experiments for a given probe and in a minimum of 25 metaphase spreads examined per individual, we consistently observed zero fluorescent probe signals (e.g., for fosmid probe G248P89259F2: NA19098), one signal (NA19100, NA19200, NA19202), or two fluorescent probe signals (NA19099, NA19201) per individual. Furthermore, in these experiments we included parent-offspring trios and FISH results were consistent with Mendelian inheritance of deletions. In two experiments (including the 48 hour hybridization protocol), those individuals believed to be homozygous for the deletion, heterozygous for the deletion, and homozygous for the non-deletion allele were observed in a minimum of 25 metaphase spreads per individual to have two faint signals (e.g., for fosmid probe G248P89259F2: NA19098), one faint and one strong signal (NA19100, NA19200, NA19202), and two strong signals (NA19099, NA19201), respectively. FIG. 6C shows such a signal intensity difference in an individual heterozygous for the chromosome 4 deletion containing fosmid G248P87989C3.


PCR Validation of Homozygous Deletion Variants


To validate predicted homozygous deletions by PCR, we selected 60 candidate deletion loci for which the pattern of genotypes predicted the existence of at least two individuals with homozygous deletions in at least one population. The criterion for validation was confirmation of a precise predicted pattern of amplification success and amplification failure across at least 12 samples that included at least two predicted examples of each result. Any deviation from that pattern was classified as a confirmation failure. The predictions (about which individuals harbored homozygous deletions) were derived from the SNP genotypes: the individuals in whom multiple null genotypes had given rise to the predicted deletion variant (Appendix A) were predicted to be homozygous null; all other individuals were predicted to have genetic material at that locus. Importantly, we chose PCR amplification sites that were distinct from any of the sequences used in the SNP genotyping assays, so that this would be an independent confirmation of a predicted result. Table 4 includes a list of PCR primers that were used in PCR assays for each deletion variant.
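
The validation criterion stated above translates directly into a small check. The sketch below is illustrative only; the sample identifiers, data structures, and function name are hypothetical.

```python
def pcr_confirms(predicted_null, amplified, min_samples=12, min_each=2):
    """Apply the stated criterion to one candidate deletion.

    predicted_null: set of sample IDs predicted to be homozygous for the deletion.
    amplified: dict mapping each tested sample ID to True (PCR product observed)
               or False (amplification failed).
    """
    n_null = sum(1 for sample in amplified if sample in predicted_null)
    n_present = len(amplified) - n_null
    if len(amplified) < min_samples or n_null < min_each or n_present < min_each:
        return False  # too few samples, or too few examples of either prediction
    # Every predicted-null sample must fail and every other sample must amplify.
    return all(ok != (sample in predicted_null) for sample, ok in amplified.items())


# Twelve hypothetical samples, two of them predicted homozygous null.
samples = [f"s{i}" for i in range(12)]
predicted_null = {"s0", "s1"}
observed = {sample: sample not in predicted_null for sample in samples}
confirmed = pcr_confirms(predicted_null, observed)
```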

TABLE 4
PCR assays for homozygous deletions.

SEQ ID NO | Chr | Left supporting marker | Right supporting marker | PCR primers used (forward primer sequence, then reverse primer sequence)
19 | chr2 | 11,042,732 | 11,043,694 | TAGAGGCAGGGGCCTAAATGCCATATTTCTAAAAGTTCTTTGTATGGAGGGC
20 | chr2 | 89,093,935 | 89,175,498 | TGTGGTTGCAAGGTTCCTCTTCCCGTGCTGATTTCCAACTCTAATTTTC
21 | chr2 | 123,575,969 | 123,577,446 | AGTGGCATGGCAGCCAATCATTTTACACAGGGTTTGTAAACTCAACTACGTC
22 | chr2 | 129,734,701 | 129,735,219 | CACCTTCCAGATGAACAATGCATCACTGGAATGCACACCACCACCC
23 | chr2 | 238,086,689 | 238,093,885 | GCTGAATAATAGCCGCTCCATATCAAAGTAAGGGACAGGCGAGTAAGCAATC
24 | chr3 | 99,731,950 | 99,732,552 | CTCATGGCACCCACACCTATTTGAGGATATAAAAAGCTCCTACAGACCAAACCTACC
25 | chr3 | 133,032,620 | 133,033,926 | AAACACCCTGATTCTTTCTAGAGCTCATCGTGTGCAGGTCTTCTACCAAGGGC
26 | chr3 | 133,312,348 | 133,312,893 | GATTTGAGGAACCTTACACAACAGAGCTCCTTTGAGAGACCTTTCTCCGGCAC
27 | chr3 | 163,538,907 | 163,550,497 | GAGTGATTTGTATAGTTCTTTCTAATACATATAGGCAGCCAATAACCAATAAATCTTCCAAACC
28 | chr3 | 190,685,007 | 190,688,562 | TTACAACTGGTTAAAATATGGGGCCATCTCCAGTAGAAGTGGCATTCGATGTAGGC
29 | chr3 | 192,388,157 | 192,390,897 | AGCAGAGTTAGTTCCACATCCCCCACATAGCAAAATGTCCTACACACATACCC
30 | chr4 | 9,969,524 | 9,980,122 | GGTGATTATACTTAGACCACTTGCCAATCCACCTGCTCATAACTCCCAGC
31 | chr4 | 21,123,929 | 21,126,700 | ACAGGTAGGTGGGTCTGCCCAATAATCTAAATATGAAACTTAGTAGGCATCCAGTAGCTGTC
32 | chr4 | 34,677,422 | 34,724,191 | GTGTGCTCGGAGAAACCATATAACCACAGAGCTCTAATTGTGGGCCAACC
33 | chr4 | 64,701,164 | 64,713,008 | ATGGAAAGCAGCAAAAACACAAAATATGCCAAACTCACAATTATAATACCCCCTCC
34 | chr4 | 115,637,062 | 115,641,110 | CATAAGGGCTGCCTCCCACTGCATCACATGTGCGCCCAG
35 | chr4 | 138,551,715 | 138,556,685 | TTCAACAGAAATGATGCATTCTAGAACTAATTGCTTACATAATGAACATGGGCTGAGG
36 | chr4 | 152,458,392 | 152,461,345 | CTGAGTTATGTATAATATGGGGTGAGGAGCCACTTTGGCCATAGAGGGGTGTGG
37 | chr4 | 189,917,097 | 189,929,184 | AAGGGTTCAATGTGAAGCCCGTCCAGTTCAAAGGCATCCCTGC
38 | chr6 | 19,151,529 | 19,155,942 | GCAAAGGATCTTTTTTATCATAAAAAGACTGAGCCAACCTGTAGGGCAAGTAGATG
39 | chr7 | 89,422,556 | 89,424,327 | CATTGTTCTTAATCAGATGTGACAAAGATTGTTATCAAGGTCTGGAAGGGTCTC
40 | chr7 | 97,008,440 | 97,012,729 | CCTAAATCTTAAATGCCAACAAGCTGATAAGTTTGGTAGGGTCCAAAGACAAAAAGGG
41 | chr7 | 125,601,054 | 125,603,762 | TTGGAATGGCTTAATTTTCTTAAATAACATCTTCTGTGCCTTTAAGCGTGTGTC
42 | chr7 | 141,188,320 | 141,199,669 | GTAAAGGGTCTTTAACACACATGCCCTGTAAATTTGGGTGAAGTGTTTCTCCG
43 | chr7 | 141,456,537 | 141,472,285 | GGCTGGATGACTACACACACACTACATGAGCCTACATCACTTGGATATATACGTTTC
44 | chr7 | 141,921,685 | 141,931,471 | ACAGGTGATAAAAGCCCTCAGATGATTGCCCGAGCCATTGGTGAAGTG
45 | chr8 | 2,242,110 | 2,250,519 | CAACAAGCATGCAAACCAGGGCAAGACAACATCAGTGTCCCGC
46 | chr8 | 39,250,107 | 39,404,547 | AGAAGAATATTTATTTGCAAGTTATGAAACTCCTTATTGTATGCTGAGATGGGCATAC
47 | chr8 | 54,202,942 | 54,211,318 | AAAGTTCCCCCTTTTGATCCTTGGTCTTGGGGAGGTAGTTTAGCCTATTCC
48 | chr8 | 55,414,544 | 55,423,847 | AAAGAAAATCTGTTTTGTGTTGGTTTTTCTGGAGATTCGTTGAGGGCAACATATTG
49 | chr8 | 59,355,794 | 59,368,409 | GAGTGCTTAAGATCACATCCCACCCAGCCTGATATGAGTGGCAGTTCGC
50 | chr8 | 65,165,838 | 65,167,766 | AACCTCTGCCCCATACCTTCTCAGCAAACCCACCGGCCATC
51 | chr8 | 103,010,682 | 103,011,802 | CCTTGGGAAAGGTGCATCCACAAAGTCTTAGGGAGCGTTCCAAACTGTAGG
52 | chr8 | 115,591,865 | 115,599,078 | TGCACTGCAATGCGATAGGATCTGATCCACCTTTGAGAAATGCACTCAGC
53 | chr9 | 4,006,814 | 4,009,224 | TCCCTACCAGCCACAATCCAAGATTCGAGGGGGAAAATGCACG
54 | chr9 | 32,991,449 | 33,014,917 | AATGCCTTTCCCACATGAGCAGCTTCCGTTTCCTGTCCTACCTGCC
55 | chr9 | 124,896,840 | 124,898,353 | AGGAGAGGGTCAATGTTGCGTCACAGACCACCATAAGCGGGGGAG
56 | chr9 | 133,660,972 | 133,664,954 | CAGGCCCTGGAAGCTGGTGCTTTCATTTCCCCGCGGAAAAC
57 | chrX | 3,700,009 | 3,708,183 | GCAAATATCTAATTTTAACTACGATTATTTTCAAGATCCAATGATGGAATAGAGGCCCAAAGAC
58 | chrX | 15,834,905 | 15,839,616 | AAAAACAAATAAAGGTATGAAGTTGTGCAATGGTGCCCAAGTAGTGCCAG
59 | chrX | 83,958,997 | 83,971,986 | CAACCTTCAAACTCAGAGGAGTGTGTGTTTGGGATAGAAGACTGCCCCC
60 | chrX | 107,659,218 | 107,737,812 | GTTCCAAGAGATTAGATTTTAAGGAAGAAACCGATAAGAACACTCCACCCCTACCTG
61 | chrX | 114,373,214 | 114,373,881 | CAGTAGGACAATGATCCTGGCCTTAGGACAAAGTAGGCTTATGAGGACGAGCAG
62 | chrX | 140,165,208 | 140,166,897 | GACCACAGCACATAGCATAGCGACAACTCAACACATCCCTATGCCTTCTG
63 | chr12 | 27,539,977 | 27,545,038 | CCATGGCATTTTATGTTCACCTTCAATTGGCTCCCATCTCTCCACTCTTG
64 | chr12 | 39,104,527 | 39,106,948 | AGAAAACAGCTATTCTTAGTCGGAGTCATGGTGGTTGAACATGGGTATTTGCTATG
65 | chr12 | 117,847,706 | 117,853,165 | GGTTTAAACTATTGCTGTGCATGGCTATCCAGGTGGAAGAGATGACTTGTACTCGGC
66 | chr14 | 33,107,807 | 33,110,015 | ATTCCCACCAGCTAGGTCAGAGAAGTGCCCTTCACCAGACTGAAAAATGTGG
67 | chr14 | 68,010,231 | 68,011,603 | TTTCTCCCTTCCTGGCCTTGGCCATAGCCAAACACGCTCAACAAG
68 | chr18 | 1,907,900 | 1,922,838 | GACCCCAAGGAGAATGTTTTCATAAATTTGCTGACCTAACCCTAACCCAG
69 | chr18 | 36,512,513 | 36,518,187 | CCTGGCTTCCTTAAAAAGGCCTAACTATCAAACTCACTTAGTGAACCCATGTAAACTC
70 | chr18 | 45,353,989 | 45,369,896 | GATCGAGCAGAATACTTCACCCGGATGGCCACCAACCGCC
71 | chr18 | 46,252,952 | 46,257,058 | TCTGTAGAGTGTGAAACCTCTGGCTCACCAGTTAGTACAGGTGTTGGCC
72 | chr18 | 64,895,177 | 64,904,477 | TTCTTCTCCTATAGCTAAATAGTGGCATACCTGCTAATGCAAGACCTCGTCTACTCAGTTCC
73 | chr22 | 47,865,647 | 47,867,222 | TCTCTATGCCAATGGACTCTCCTGATTAAACAAATTGAAATTCAATCCAACCTTAAC


Illumina (Two-Color, Allele-Specific Fluorescence) Validation of Deletion Variants


Seventeen candidate deletion variants covered at least three SNPs that had been assayed on the Illumina platform at the Broad Institute. The Illumina platform generates a quantitative allele-specific intensity measurement for each allele in each individual in a population. The normalized allele-specific intensity measurements are comparable across individuals and generally fall into two or three discrete clusters, corresponding to individuals homozygous for allele 1, individuals homozygous for allele 2, and individuals heterozygous for alleles 1 and 2. For SNPs covered by predicted deletion variants, we observed additional genotype classes corresponding to individuals hemizygous for allele 1, individuals hemizygous for allele 2, and individuals homozygous for the deletion allele. We considered a deletion variant validated if (i) we observed one or more of these additional, well-separated genotype clusters, and (ii) all of the individuals predicted (from multi-marker genotype patterns) to be hemizygous or homozygous deleted in fact fell into the appropriate additional cluster.


Quantitative PCR


Individuals' deletion genotypes cannot be unambiguously inferred from SNP genotype data (see, for example, FIG. 2B). Therefore it was necessary to develop assays for accurately typing the deletion variants. We performed two-color TaqMan assays, using a FAM-labeled probe for the test gene and a HEX-labeled probe for PMP22, a euploid control gene. TaqMan amplification reagents were purchased from Applied Biosystems together with the following assays.

PMP22 (control)
    primer1   CCCTTCTCAGCGGTGTCATC   (SEQ ID NO: 1)
    primer2   ACAGACCGTCTGGGCGC   (SEQ ID NO: 2)
    probe     VIC-TTCGCGTTTCCGCAAGAT-MGBNFQ   (SEQ ID NO: 3)
GSTM1
    primer1   CTGTGTCCACCTGCATTCG   (SEQ ID NO: 4)
    primer2   GAGACCGGGCACTCACTGT   (SEQ ID NO: 5)
    probe     6FAM-TCAGTCCTGCCATGAGCAGGC-BHQ1   (SEQ ID NO: 6)
GSTT1
    primer1   GGGATGGAAAGTCACGTCCT   (SEQ ID NO: 7)
    primer2   AGAGACTGGGACAGCGTCAA   (SEQ ID NO: 8)
    probe     6FAM-CAGAATCTCAGCAGCTGGGCCA-BHQ1   (SEQ ID NO: 9)
CYP2A6
    primer1   AGGATGGGGACTTTTCCTTT   (SEQ ID NO: 10)
    primer2   TCCTCATCTTCAGCTGTTGG   (SEQ ID NO: 11)
    probe     6FAM-CATTCAGGATTCTGGGCTTGCTCC-BHQ1   (SEQ ID NO: 12)
OR51A2
    primer1   TGCCAATTGCCTACTGTTTG   (SEQ ID NO: 13)
    primer2   AGCAACAGTGGAAGGAGAGAA   (SEQ ID NO: 14)
    probe     6FAM-GACAACATAACCAAGTGGGGCTTATTTTC-BHQ1   (SEQ ID NO: 15)
PRB1
    primer1   TGAAGGGACCTCAGTAGTTGG   (SEQ ID NO: 16)
    primer2   TGACAGGCATGGTTCTTCTG   (SEQ ID NO: 17)
    probe     6FAM-CTGACTTTCTAGCAAGG-MGBNFQ   (SEQ ID NO: 18)
UGT2B17   Applied Biosystems   Hs00854486_sH
UGT2B28   Applied Biosystems   Hs00852540_s1
LCE3C     Applied Biosystems   Hs00708773_s1


Small (60-90 nt) amplicons from the test and control loci were simultaneously amplified in the same tube, in 96-well plates (one plate per population, including five replicate samples and one blank sample) on a Bio-Rad iCycler. The threshold cycle (Ct) was calculated for each fluorophore separately, and the difference between the threshold cycles for the two fluorophores (delta_Ct) was used as a measurement of relative copy number that could be compared from sample to sample on the same plate. For each assay, the delta_Ct measurements clustered into three discrete groups (with one group typically showing no amplification of the test locus at all). For some assays, these groups were initially incompletely separated; in these cases, averaging of the delta_Ct measurements across 3-5 replicates resulted in discrete, well-separated clusters of average measurements. For each assay, we treated these three clusters as “+/+,” “+/−,” and “−/−” genotypes. In each case, the resulting genotype calls for replicate samples agreed completely, and the resulting genotypes showed Mendelian inheritance and Hardy-Weinberg equilibrium.
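The replicate-averaging step described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name `mean_delta_ct`, the tuple layout of the input, the use of `None` for a well with no test-locus amplification, and the sign convention (control Ct minus test Ct) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_delta_ct(wells):
    """Average delta_Ct per sample across replicate wells.

    wells: iterable of (sample_id, ct_test, ct_control) tuples (hypothetical layout).
    ct_test is None when the test locus showed no amplification in that well.
    Returns {sample_id: mean delta_Ct}, or None for a sample whose test locus
    never amplified in any replicate.
    """
    deltas = defaultdict(list)
    seen = set()
    for sample_id, ct_test, ct_control in wells:
        seen.add(sample_id)
        if ct_test is not None:
            # Assumed sign convention: control Ct minus test Ct.
            deltas[sample_id].append(ct_control - ct_test)
    return {s: (mean(deltas[s]) if deltas[s] else None) for s in seen}
```

Averaging before clustering reduces well-to-well noise, which is why incompletely separated groups can become discrete once 3-5 replicates are combined.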


Example 2
Use of Common Deletion Variants for Determining Immunocompatibility in Bone Marrow Transplant

The non-MHC factors that determine histocompatibility are generally unknown. As a consequence, allogeneic transplantations carry risk due to unforeseen incompatibilities between donor and host. The human genome has recently been shown to exhibit large-scale deletion polymorphism, including many large common deletion variants that appear as homozygous deletions in a significant fraction of the population. In the following example of the methods of the invention, we investigated whether deletion mismatches for common deletion variants (homozygous deletion in donor but not in host) were associated with graft-versus-host disease (GVHD) following allogeneic hematopoietic stem cell transplantation (aHSCT).


Using the methods described below, we evaluated 500 aHSCT cases involving HLA-identical sibling donor-recipient pairs. We typed donors and patients for the presence of six gene deletions, and assessed whether aGVHD and cGVHD occurrence were associated with mismatch for these gene deletions. We found that mismatch for two common deletion variants, UGT2B28 and UGT2B17, was associated with chronic GVHD, and, for UGT2B17, was also associated with acute GVHD. These results demonstrate that large deletion variants may contribute to histoincompatibilities among individuals, and validate the usefulness of the invention described in this application. GVHD risk might be reduced by prospectively typing donors and patients for deletion variants in UGT2B17 and UGT2B28 genes.


Patients


The main study population consisted of 500 aHSCT recipients and their HLA-identical sibling donors. The inclusion criterion was the use of fully myeloablative aHSCT. All recipients and donors gave written informed consent according to protocols approved by the institutional review boards of Helsinki University Central Hospital and the Dana Farber Cancer Institute (protocol 01-206).


The aGVHD replication study population consisted of 336 aHSCT recipients and their HLA-identical sibling donors, collected as described previously (Nichols et al., Blood. Dec. 15, 1996;88(12):4429-34).


Genotyping of Deletion Variants


We developed a quantitative PCR assay for typing each deletion variant in each donor and patient. In this assay, the locus of interest and a control, two-copy locus (PMP22) are simultaneously amplified in a 20 μl reaction containing TaqMan Master Mix (Applied Biosystems) together with a forward primer, a reverse primer, and a dual-labeled probe for each locus. The probe for the test locus (gene deletion polymorphism) is labeled with FAM and a BHQ-1 quencher (IDT); the probe for the control locus is labeled with VIC and an MGB quencher (Applied Biosystems). The simultaneous amplification of the test and control loci is monitored by real-time PCR and a threshold cycle (Ct) is determined separately for each locus by separation of the FAM and VIC spectra.


A sample was determined to be homozygous deleted for the test locus if the control locus showed robust amplification (Ct<32) while the test locus failed to amplify after 40 cycles. The quantity δCt = Ct(control) − Ct(gene) showed a discrete, bimodal distribution across the remaining, non-homozygous deleted samples; samples from the higher δCt cluster were determined to have two copies of the gene, and samples from the lower δCt cluster were determined to have one copy. As a quality-control check, we verified that both of the following were true: (i) membership in the three genotype classes (corresponding to 0, 1, or 2 copies) showed Hardy-Weinberg equilibrium and (ii) sibling genotypes were correlated across the cohort: regression of patient genotypes against the genotypes of their sibling donors yielded a regression coefficient that was not significantly different from 0.5.
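The calling rule and the Hardy-Weinberg check can be sketched as below. This is an illustrative sketch only: the function names, the `split` parameter (a plate-specific boundary between the two δCt clusters, which the text does not specify), and the choice to return `None` when the control locus does not amplify robustly are assumptions, not details taken from the study.

```python
def call_copy_number(ct_gene, ct_control, split, max_control_ct=32.0):
    """Return 0, 1, or 2 gene copies, or None if the control locus is not robust.

    ct_gene is None when the test locus failed to amplify after 40 cycles.
    split is a hypothetical boundary between the lower and higher dCt clusters.
    """
    if ct_control is None or ct_control >= max_control_ct:
        return None  # assumption: no call without robust control amplification
    if ct_gene is None:
        return 0  # control amplified, test locus did not: homozygous deleted
    delta = ct_control - ct_gene  # dCt = Ct(control) - Ct(gene)
    return 2 if delta > split else 1


def hardy_weinberg_chi2(n0, n1, n2):
    """Chi-square statistic comparing observed counts of 0-, 1-, and 2-copy
    genotypes with Hardy-Weinberg expectations."""
    n = n0 + n1 + n2
    p = (2 * n2 + n1) / (2 * n)  # frequency of the non-deleted allele
    q = 1.0 - p                  # frequency of the deletion allele
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    observed = (n0, n1, n2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

A small statistic from `hardy_weinberg_chi2` is consistent with equilibrium; a large one would suggest miscalled clusters.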


Determination of Mismatches


Transplants were determined to involve a donor-recipient “deletion mismatch” for a deletion variant if the donor was homozygous deleted for that gene, and the recipient had a positive number (1 or 2) of gene copies. Transplants were considered to involve a “sex mismatch” if they involved a female donor and a male recipient.
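The two mismatch definitions translate directly into predicates. The sketch below assumes copy numbers are encoded as integers 0, 1, or 2 and sex as the strings "F" and "M"; both encodings and the function names are illustrative choices rather than part of the described method.

```python
def is_deletion_mismatch(donor_copies, recipient_copies):
    """True if the donor is homozygous deleted (0 copies) and the recipient
    carries 1 or 2 copies of the gene."""
    return donor_copies == 0 and recipient_copies in (1, 2)


def is_sex_mismatch(donor_sex, recipient_sex):
    """True for a female donor and a male recipient (assumed "F"/"M" coding)."""
    return donor_sex == "F" and recipient_sex == "M"
```

Note that the definition is directional: a recipient who is homozygous deleted with a donor carrying the gene is not counted as a deletion mismatch in this example.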


Statistical Analysis


Acute and chronic GVHD were diagnosed and graded according to standard criteria. Acute GVHD cases were those with grades 2-4 aGVHD; aGVHD controls were those with grades 0-1 aGVHD. Chronic GVHD cases were those with “limited” or “extensive” cGVHD; cGVHD controls were those with “no” cGVHD.


The relationship between deletion variant mismatches and GVHD status was first assessed by association analysis of mismatch at each individual locus with aGVHD and cGVHD, using the 360 donor-recipient pairs who had no known mismatch risk factors (no sex mismatch). We performed a one-sided chi-square test.
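A one-sided chi-square test on a 2x2 table of mismatch status against GVHD status might be computed as in the sketch below. The text does not state whether a continuity correction was applied; this example assumes none. With one degree of freedom the statistic is the square of a standard normal deviate, so the one-sided p-value is taken from the signed square root. The function name and argument layout are hypothetical.

```python
import math


def one_sided_chi2(a, b, c, d):
    """One-sided chi-square test on a 2x2 table, no continuity correction.

    a: mismatched pairs with GVHD      b: mismatched pairs without GVHD
    c: matched pairs with GVHD         d: matched pairs without GVHD
    Returns (chi2, p), where p tests whether mismatch increases GVHD risk.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Signed z: positive when GVHD is more frequent among mismatched pairs.
    z = math.copysign(math.sqrt(chi2), a * d - b * c)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return chi2, p
```

When the observed association runs in the opposite direction (GVHD less frequent among mismatched pairs), the returned p-value exceeds 0.5.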


The Michigan aGVHD cohort was used for replication analysis of the single locus showing positive association with aGVHD in the initial analysis. Association of mismatch at that locus with aGVHD was assessed with a one-sided chi-square test.


Two loci (UGT2B17 and UGT2B28) were found to show positive associations in the initial analysis and were then assessed in the full cohort of 836 donor-recipient pairs using a regression model for GVHD risk whose terms included age, transplantation year, sex mismatch, UGT2B17 mismatch, UGT2B28 mismatch, and the interaction terms sex+UGT2B17 mismatch and sex+UGT2B28 mismatch.
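The terms of the regression model can be laid out as one covariate row per donor-recipient pair, as sketched below. The text does not specify the type of regression or how the interaction terms were encoded; this example assumes 0/1 indicator variables and encodes each interaction as the product of the two indicators. All names are illustrative.

```python
def regression_row(age, transplant_year, sex_mismatch,
                   ugt2b17_mismatch, ugt2b28_mismatch):
    """Build the covariates for one donor-recipient pair.

    Mismatch arguments are booleans; interactions are products of the
    0/1 indicators (an assumed encoding).
    """
    s = int(sex_mismatch)
    m17 = int(ugt2b17_mismatch)
    m28 = int(ugt2b28_mismatch)
    return {
        "age": float(age),
        "transplant_year": float(transplant_year),
        "sex_mismatch": s,
        "UGT2B17_mismatch": m17,
        "UGT2B28_mismatch": m28,
        "sex_x_UGT2B17": s * m17,
        "sex_x_UGT2B28": s * m28,
    }
```

Rows built this way could be passed to any regression routine; the choice of routine is left open here because the source does not name one.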


These results demonstrate that deletion variants may contribute to histoincompatibilities among individuals. GVHD risk might be reduced by prospectively typing donors and patients for UGT2B17 and UGT2B28 gene deletions.


Example 3
Use of Deletion Variants for Determining Immunocompatibility in Organ Transplant

As described for Example 2, the non-MHC factors which determine histocompatibility are generally unknown. As a consequence, allogeneic organ transplantations carry risk due to unforeseen incompatibilities between donor and host. This study was designed to investigate whether mismatch for common deletion variants (homozygous deletion in donor but not in host) is associated with host-versus-graft disease (HVGD) following kidney transplantation.


Patients


The first study population consists of 500 renal allograft recipients and their HLA-identical sibling donors. The second study population consists of 700 renal allograft recipients and their unrelated donors. All recipients and sibling donors provided written informed consent according to protocols approved by the institutional review boards of Massachusetts General Hospital, Helsinki University Central Hospital, and Hospital do Rim e Hipertensão, São Paulo.


Genotyping of Deletion Variants


Samples from each of the patient populations have been collected and will be used for the genotyping of deletion variants as described in Example 2. The methods used to analyze the samples once the deletion variants have been genotyped are described below.


Determination of Deletion Variant Mismatches


Transplants are determined to involve a donor-recipient “mismatch” for a deletion variant if the recipient has a deletion variant in all copies of the gene (i.e., homozygous deletion) and the donor has a positive number (1 or 2) of gene copies.


Statistical Analysis


Renal allograft rejection is diagnosed and graded according to standard criteria. The primary diagnostic categories used in this study are “rejection,” “no rejection,” and “days to rejection.”


For the first (sibling-donor) study population, the relationship between gene deletion mismatches and rejection status is assessed by association analysis of mismatch at each individual locus with risk of rejection. A one-sided chi-square test is used to assess whether mismatch is associated with increased risk of rejection.


Any two loci found to show positive associations in the initial analysis are then assessed using a regression model for rejection risk whose terms include age, donor sex, recipient sex, transplantation year, cold ischemia time, and mismatch for each of the deletion variants.


For the second (unrelated-donor) study population, we assess the contribution of gene deletion mismatches using a regression model for rejection risk whose terms include age, donor sex, recipient sex, transplantation year, cold ischemia time, mismatch for each of the gene deletion polymorphisms, and the numbers of HLA-AB and HLA-DR mismatches.


Appendix A: Predicted Deletion Variants and Supporting SNP Evidence

This table lists 541 predicted deletion variants identified from patterns of SNP assay failures in the Phase 1 HapMap, as described in this study. The three leftmost columns show the location of the predicted deletion variant (the genomic coordinates spanned by all SNPs that supported the prediction). The five rightmost columns describe the evidence supporting each prediction: the locations of SNP assays, the population and type of supporting evidence, and the individuals in whose genotypes that evidence was observed.


Key to populations:

    • CEU 90 individuals (30 trios) of European ancestry, sampled in Utah, USA
    • YRI 90 individuals (30 trios) of Yoruba ancestry, sampled in Ibadan, Nigeria
    • JCH 45 unrelated individuals of Han Chinese ancestry, sampled in Beijing, plus 44 unrelated individuals of Japanese ancestry, sampled in Tokyo


All physical coordinates are shown on the hg16 build of the human genome.

Predicted deletionvariantSupporting evidenceLeftmostRightmostLeftmostRightmostsupportingsupportingsupportingsupportingChrmarkermarkermarkermarkerPopType of evidenceIndividualschr155,87168,94155,87168,941CEUMendelian inconsistencies(NA12155, NA10831), (NA12044, NA10857)55,87168,941YRIMendelian inconsistencies(NA18912, NA18914), (NA19207, NA19208)chr110,084,72510,087,96210,084,72510,087,962YRIMendelian inconsistencies(NA18502, NA18500), (NA18912, NA18914),(NA19130, NA19132)chr110,415,63710,427,14310,415,63710,427,143CEUMendelian inconsistencies(NA12146, NA10847)chr112,541,83912,549,32712,541,83912,549,327YRIMendelian inconsistencies(NA19101, NA19103)chr116,392,73616,405,38116,395,82016,405,381CEUMendelian inconsistencies(NA12815, NA12802)16,392,73616,400,201YRIMendelian inconsistencies(NA19138, NA19139)chr116,634,90916,646,44716,634,90916,646,447YRIMendelian inconsistencies(NA19207, NA19208), (NA19238, NA19240)chr134,606,76134,610,71534,606,76134,610,715CEUMendelian inconsistencies(NA12154, NA10830), (NA11839, NA10854),(NA12892, NA12878), (NA12763, NA12753)chr143,330,70343,346,98343,330,70343,346,983YRIMendelian inconsistencies(NA18853, NA18854)chr161,811,70561,813,16361,811,70561,813,163YRIMendelian inconsistencies(NA19203, NA19205), (NA19222, NA19221),(NA19098, NA19100)chr172,137,66872,176,87072,142,47372,161,849CEUMendelian inconsistencies(NA11832, NA10855), (NA12154, NA10830),(NA07000, NA07029), (NA07022, NA07019),(NA12043, NA10857), (NA12873, NA12864),(NA12874, NA12865), (NA06985, NA06991),(NA12751, NA12740)72,139,34572,176,870CEUMendelian inconsistencies(NA12003, NA10838), (NA12006, NA10839),(NA12057, NA10851), (NA11993, NA10860),(NA11994, NA10861), (NA12145, NA10846),(NA12716, NA12707), (NA12874, NA12865),(NA11881, NA10859)72,137,66872,147,489YRIMendelian inconsistencies(NA18501, NA18500), (NA18508, NA18506),(NA18516, NA18515), (NA19200, NA19202),(NA19160, NA19161), (NA19143, 
NA19145)72,139,34572,176,870CEUHardy-Weinbergpopulation72,137,66872,178,467YRIHardy-Weinbergpopulation72,139,34572,178,467JCHHardy-Weinbergpopulationchr185,825,06285,827,08985,825,06285,827,089CEUMendelian inconsistencies(NA12762, NA12753)chr187,059,98787,062,65187,059,98787,062,651CEUMendelian inconsistencies(NA11831, NA10855)chr187,784,37887,792,85787,784,37887,792,857CEUNull genotypesNA10839chr191,257,67791,268,00891,257,67791,268,008CEUMendelian inconsistencies(NA07000, NA07029)chr194,609,88594,625,06394,622,66794,623,588YRIMendelian inconsistencies(NA18505, NA18503), (NA19141, NA19142)94,609,88594,625,063YRIMendelian inconsistencies(NA18505, NA18503)chr1102,291,054102,291,746102,291,054102,291,746CEUMendelian inconsistencies(NA11995, NA10861), (NA07055, NA07048)chr1109,527,309109,534,259109,527,309109,534,259CEUNull genotypesNA12043, NA12264chr1110,677,705110,681,790110,679,447110,681,702CEUNull genotypesNA12264, NA12812, NA12801, NA12814110,677,705110,681,790CEUMendelian inconsistencies(NA11992, NA10860), (NA11882, NA10859),(NA12249, NA10835)chr1114,106,286114,107,358114,106,286114,107,358CEUMendelian inconsistencies(NA12044, NA10857), (NA12760, NA12752)chr1119,469,803119,473,878119,469,803119,473,878YRIMendelian inconsistencies(NA18501, NA18500)chr1142,902,233142,921,305142,902,233142,921,305CEUMendelian inconsistencies(NA12814, NA12802)chr1146,404,349146,405,202146,404,349146,405,202CEUMendelian inconsistencies(NA12044, NA10857)chr1146,563,653146,580,520146,569,781146,580,520CEUMendelian inconsistencies(NA12044, NA10857),(NA07055, NA07048)146,563,653146,572,086YRIMendelian inconsistencies(NA18508, NA18506), (NA18856, NA18857)chr1146,591,613146,605,848146,591,613146,605,848YRIMendelian inconsistencies(NA19099, NA19100)chr1148,535,315148,561,285148,535,315148,561,285YRIMendelian inconsistencies(NA18853, NA18854)chr1149,296,571149,300,621149,296,571149,300,621CEUMendelian inconsistencies(NA12716, 
NA12707)chr1149,771,758149,800,260149,785,060149,797,102YRIMendelian inconsistencies(NA18505, NA18503), (NA18517, NA18515),(NA19128, NA19129), (NA19238, NA19240)149,786,102149,798,424YRIMendelian inconsistencies(NA18505, NA18503), (NA18517, NA18515),(NA18912, NA18914), (NA19200, NA19202),(NA19207, NA19208), (NA19160, NA19161),(NA19143, NA19145), (NA19128, NA19129),(NA19238, NA19240)149,786,102149,800,260CEUMendelian inconsistencies(NA12057, NA10851), (NA11829, NA10856),(NA12155, NA10831), (NA07056, NA07019),(NA12145, NA10846), (NA12717, NA12707),(NA12891, NA12878), (NA12760, NA12752),(NA12761, NA12752), (NA12763, NA12753),(NA07034, NA07048), (NA07055, NA07048),(NA06993, NA06991), (NA06985, NA06991),(NA11882, NA10859)149,771,758149,798,424YRIMendelian inconsistencies(NA18502, NA18500), (NA18505, NA18503),(NA18517, NA18515), (NA19200, NA19202),(NA19207, NA19208), (NA19160, NA19161),(NA19119, NA19120), (NA19143, NA19145),(NA19128, NA19129), (NA19238, NA19240)149,771,758149,798,424YRIHardy-Weinbergpopulation149,774,642149,800,260JCHHardy-Weinbergpopulationchr1149,977,953149,986,389149,977,953149,986,389CEUMendelian inconsistencies(NA12057, NA10851), (NA11829, NA10856),(NA11832, NA10855), (NA07034, NA07048)149,977,953149,982,821YRIMendelian inconsistencies(NA19210, NA19211)149,977,953149,986,389CEUHardy-Weinbergpopulationchr1155,706,737155,707,243155,706,737155,707,243CEUMendelian inconsistencies(NA12249, NA10835)chr1155,737,529155,738,184155,737,529155,738,184YRIMendelian inconsistencies(NA19102, NA19103), (NA19207, NA19208)chr1160,080,024160,080,653160,080,024160,080,653CEUMendelian inconsistencies(NA12751, NA12740)160,080,024160,080,653YRIMendelian inconsistencies(NA18508, NA18506), (NA18871, NA18872),(NA19116, NA19120), (NA19127, NA19129),(NA19131, NA19132), (NA19238, NA19240)chr1172,037,243172,041,015172,037,243172,041,015CEUMendelian inconsistencies(NA12154, NA10830), (NA12144, NA10846),(NA12248, NA10835)chr1187,000,430187,069,828187,022,238187,058,520CEUMendelian 
inconsistencies(NA12892, NA12878)187,000,430187,069,828CEUMendelian inconsistencies(NA12892, NA12878)chr1194,080,727194,081,468194,080,727194,081,468YRIMendelian inconsistencies(NA18502, NA18500), (NA18505, NA18503),(NA18523, NA18521), (NA18852, NA18854),(NA18862, NA18863), (NA19102, NA19103),(NA19159, NA19161), (NA19140, NA19142),(NA19152, NA19154)chr1206,205,376206,209,672206,205,376206,209,672CEUMendelian inconsistencies(NA12056, NA10851), (NA11995, NA10861),(NA12044, NA10857), (NA12264, NA10863),(NA12234, NA10863), (NA06993, NA06991),(NA12751, NA12740)chr1226,783,490226,788,594226,783,490226,788,594CEUMendelian inconsistencies(NA12004, NA10838)chr1240,005,624240,011,152240,005,624240,011,152CEUMendelian inconsistencies(NA11830, NA10856), (NA11993, NA10860),(NA07022, NA07019), (NA12762, NA12753),(NA11881, NA10859)chr23,782,0463,786,6543,782,0463,786,654CEUNull genotypesNA12873, NA12248chr211,042,73211,043,69411,042,73211,043,694CEUNull genotypesNA07000, NA12813chr217,207,64117,216,89117,207,64117,216,891CEUNull genotypesNA12812, NA1280117,210,41817,212,022CEUMendelian inconsistencies(NA12812, NA12801)chr218,156,32518,177,35418,156,32518,177,354YRINull genotypesNA19094, NA19093, NA1920618,160,63318,171,634YRIMendelian inconsistencies(NA19093, NA19094)chr218,374,13718,374,72818,374,13718,374,728CEUMendelian inconsistencies(NA11994, NA10861)chr224,576,97424,580,58724,576,97424,580,587YRIMendelian inconsistencies(NA18507, NA18506), (NA18852, NA18854),(NA19204, NA19205)chr229,386,65829,401,47329,386,65829,401,473JCHNull genotypesNA18579chr229,611,57529,629,48829,611,57529,629,488JCHNull genotypesNA18579chr230,137,68730,155,19230,137,68730,155,192JCHNull genotypesNA18579chr230,230,14730,242,57730,230,14730,242,577JCHNull genotypesNA18579chr230,381,43730,392,82930,381,43730,392,829JCHNull genotypesNA18579chr234,688,26234,696,09434,688,26234,696,094CEUNull genotypesNA12005, NA10855, NA11994, NA12155,NA07019, NA12716, NA12891, NA12812,NA12813, NA12801, NA12874, 
NA12865,NA06993, NA06985, NA06991, NA12751chr235,560,17035,589,40035,560,17035,589,400YRINull genotypesNA18501, NA18506, NA18507, NA18913,NA19221, NA19222, NA19240, NA19239chr241,213,64541,221,38941,213,64541,220,036YRINull genotypesNA18504, NA18862, NA19201, NA1913041,213,64541,221,389JCHNull genotypesNA1861241,216,80441,220,036YRIHardy-Weinbergpopulationchr246,697,75346,700,74446,697,75346,700,744CEUMendelian inconsistencies(NA12146, NA10847)chr252,726,06552,757,08452,726,06552,756,887CEUNull genotypesNA10856, NA11993, NA07056, NA07019,NA12717, NA12891, NA12878, NA12812,NA12815, NA12762, NA12753, NA11881,NA10859, NA1224952,726,06552,757,084JCHNull genotypesNA18524, NA18547, NA18609, NA18608,NA18564, NA18545, NA18542, NA18561,NA18537, NA18579, NA18570, NA18571,NA18620, NA18621, NA18637, NA18526,NA18953, NA18968, NA18964, NA18940,NA18951, NA18947, NA18949, NA18948,NA18952, NA18975, NA18991, NA18994,NA18992, NA19007, NA18990, NA18976,NA18978, NA18995, NA1898152,726,06552,757,084JCHHardy-Weinbergpopulationchr254,667,73854,688,60054,667,73854,688,600YRINull genotypesNA19192chr255,301,62555,307,76955,301,62555,307,769YRINull genotypesNA19142, NA19141chr257,382,18357,389,88657,382,18357,389,886CEUNull genotypesNA12155, NA10831chr259,598,12259,600,15659,598,12259,600,156JCHNull genotypesNA18542, NA18561, NA18621chr271,306,84171,317,12971,306,84171,317,129YRINull genotypesNA18505, NA18857, NA18855, NA18913,NA19139, NA19138chr275,341,47475,346,12575,341,47475,346,125CEUNull genotypesNA12154chr275,819,16475,862,69775,819,16475,862,697JCHNull genotypesNA18555chr277,951,79977,968,75277,951,79977,968,752JCHNull genotypesNA18998chr287,388,75287,418,65687,391,62487,418,656CEUMendelian inconsistencies(NA12146, NA10847), (NA12239, NA10847)87,388,75287,399,350CEUMendelian inconsistencies(NA12239, NA10847)chr287,448,65987,465,27187,448,65987,465,271CEUMendelian inconsistencies(NA12239, NA10847), (NA07055, NA07048)chr289,039,26889,049,26789,039,26889,049,267CEUNull genotypesNA12005, 
NA10839, NA10855, NA11992,NA11993, NA11994, NA12236, NA06994,NA07000, NA11839, NA12044, NA12707,NA12872, NA06993, NA06985, NA12248,NA10835chr289,093,93589,175,49889,093,93589,175,498CEUNull genotypesNA12005, NA10839, NA10856, NA10855,NA12236, NA10830, NA06994, NA11839,NA12872, NA12873, NA12760, NA07048,NA06993, NA12248, NA10835chr289,796,70590,026,10589,866,75089,993,748CEUNull genotypesNA1083189,826,08689,981,417YRINull genotypesNA18500, NA1850589,796,70590,026,105YRINull genotypesNA18500, NA18505, NA1913789,803,19789,848,524YRINull genotypesNA1850589,823,73489,997,623CEUMendelian inconsistencies(NA12814, NA12802), (NA12875, NA12865),(NA06993, NA06991)89,878,48589,992,924CEUMendelian inconsistencies(NA06993, NA06991)chr298,493,43998,507,32598,493,43998,507,325YRIMendelian inconsistencies(NA18523, NA18521)chr2108,165,482108,177,918108,165,482108,177,918YRINull genotypesNA18872chr2108,209,110108,228,500108,209,110108,228,500YRINull genotypesNA18872chr2112,154,976112,156,194112,154,976112,156,194CEUMendelian inconsistencies(NA07345, NA07348)chr2123,575,969123,577,446123,575,969123,577,446JCHNull genotypesNA18545, NA18558, NA18537, NA18540,NA18633, NA18635, NA18577, NA18594,NA19000, NA18971chr2129,734,701129,735,219129,734,701129,735,219JCHNull genotypesNA18564, NA18566, NA18579, NA18570,NA18968, NA18952chr2132,397,741132,437,832132,397,741132,437,832YRINull genotypesNA19140chr2147,075,728147,086,685147,075,728147,086,685CEUNull genotypesNA11829, NA11832, NA12154, NA12155,NA07348, NA12144, NA10846, NA12234,NA12878, NA12875, NA12865, NA12760,NA07034, NA07055, NA07048, NA11882,NA10859147,075,728147,085,187JCHNull genotypesNA18547, NA18609, NA18550, NA18564,NA18542, NA18532, NA18603, NA18540,NA18566, NA18579, NA18582, NA18633,NA18570, NA18612, NA18571, NA18620,NA18621, NA18573, NA18953, NA18969,NA18961, NA18972, NA18964, NA18956,NA18943, NA18948, NA18966, NA18975,NA18994, NA18992, NA18990, NA18987,NA18980, NA18995, NA18971, 
NA18974,NA19003147,075,728147,085,187JCHHardy-Weinbergpopulationchr2150,928,553150,929,911150,928,553150,929,911CEUMendelian inconsistencies(NA07055, NA07048)chr2152,066,616152,072,353152,066,616152,072,353YRIMendelian inconsistencies(NA19203, NA19205)chr2152,524,640152,533,963152,524,640152,533,963YRIMendelian inconsistencies(NA19203, NA19205)chr2185,330,664185,336,780185,330,664185,336,780YRIMendelian inconsistencies(NA19137, NA19139)chr2185,454,384185,455,058185,454,384185,455,058YRIMendelian inconsistencies(NA18501, NA18500), (NA18505, NA18503)chr2196,183,303196,184,993196,183,303196,184,993YRINull genotypesNA18858, NA18854, NA18853, NA18863,NA18914, NA19202, NA19207, NA19140,NA19098, NA19192196,183,303196,184,993JCHNull genotypesNA18542, NA18570chr2203,499,611203,511,609203,499,611203,511,609YRIMendelian inconsistencies(NA18505, NA18503), (NA18507, NA18506),(NA19201, NA19202), (NA19172, NA19173),(NA19203, NA19205), (NA19206, NA19208),(NA19223, NA19221)chr2208,555,606208,560,381208,555,606208,560,381YRIMendelian inconsistencies(NA18852, NA18854)chr2209,437,916209,442,285209,437,916209,442,285YRIMendelian inconsistencies(NA18502, NA18500)chr2233,430,372233,455,176233,430,372233,455,176JCHNull genotypesNA18622chr2238,086,689238,093,885238,086,689238,093,885JCHNull genotypesNA18592, NA18637chr2241,763,283241,775,378241,763,283241,775,378JCHNull genotypesNA18942chr2243,342,997243,363,765243,342,997243,363,765JCHNull genotypesNA18605chr34,063,5764,076,3564,063,5764,076,356CEUMendelian inconsistencies(NA12873, NA12864)chr35,073,7055,078,8255,073,7055,078,825JCHNull genotypesNA18961, NA18981chr36,196,0306,211,0386,196,0306,211,038YRIMendelian inconsistencies(NA18859, NA18860)chr315,366,89815,373,05115,366,89815,373,051JCHNull genotypesNA18582, NA18966, NA18990chr316,246,80416,248,56216,246,80416,248,562CEUMendelian inconsistencies(NA12144, NA10846)chr322,058,20122,060,77422,058,20122,060,774YRIMendelian inconsistencies(NA18505, 
NA18503)chr322,323,10922,338,85122,323,10922,338,851YRIMendelian inconsistencies(NA18912, NA18914)chr330,167,38330,170,75230,167,38330,170,752JCHNull genotypesNA18529, NA18579, NA18570, NA18945,NA18978chr335,935,05935,957,49835,935,05935,957,498YRINull genotypesNA19194chr335,977,05836,016,01435,977,05836,016,014JCHNull genotypesNA18547, NA18943, NA18947, NA18944chr346,758,43246,807,28446,758,43246,807,284YRIMendelian inconsistencies(NA18501, NA18500), (NA18505, NA18503),(NA18859, NA18860), (NA19152, NA19154)46,758,43246,807,284YRIHardy-Weinbergpopulationchr352,991,14452,995,04352,991,14452,995,043CEUMendelian inconsistencies(NA12812, NA12801)chr360,783,76760,860,44960,816,03460,842,631CEUNull genotypesNA11992, NA1083560,783,76760,860,449CEUMendelian inconsistencies(NA11992, NA10860)60,806,08460,844,635CEUMendelian inconsistencies(NA11992, NA10860)60,833,98660,838,247CEUMendelian inconsistencies(NA11992, NA10860)chr362,136,07662,139,99462,136,07662,139,994JCHNull genotypesNA18576, NA18960, NA18980chr365,155,87765,169,61765,155,87765,169,617CEUMendelian inconsistencies(NA12043, NA10857), (NA12248, NA10835)65,156,31965,158,131CEUMendelian inconsistencies(NA07055, NA07048), (NA12248, NA10835)chr368,023,88168,025,90568,023,88168,025,905JCHNull genotypesNA18940, NA18967chr384,621,61384,622,37684,621,61384,622,376JCHNull genotypesNA18576, NA18961, NA18964, NA18948,NA18987, NA19003chr389,323,76389,337,96589,323,76389,337,965JCHNull genotypesNA1900789,328,15789,337,965CEUMendelian inconsistencies(NA12761, NA12752)89,168,78289,401,166JCHHardy-Weinbergpopulationchr389,594,73089,596,77189,594,73089,596,771YRIMendelian inconsistencies(NA19238, NA19240)chr399,731,95099,732,55299,731,95099,732,552CEUNull genotypesNA12006, NA10839, NA11831, NA11995,NA12155, NA11840, NA12239, NA12891,NA12892, NA12878, NA12814, NA12802,NA12875, NA12761, NA12763, NA07034,NA0704899,731,95099,732,552JCHNull genotypesNA18609, NA18564, NA18561, NA18579,NA18635, NA18636, NA18593, NA18621,NA18942, NA18968, 
NA18969, NA18951,NA18945, NA18949, NA18948, NA18952,NA18999, NA19007, NA18990, NA18987,NA18976, NA1897199,731,95099,732,552YRIMendelian inconsistencies(NA19160, NA19161)chr3100,265,427100,268,749100,265,427100,268,749CEUMendelian inconsistencies(NA11831, NA10855)100,264,911100,268,749CEUHardy-Weinbergpopulationchr3105,453,223105,468,789105,453,223105,468,789JCHNull genotypesNA18966chr3105,563,293105,578,251105,563,293105,578,251JCHNull genotypesNA18966chr3105,679,895105,708,632105,679,895105,708,632JCHNull genotypesNA18966chr3105,741,486105,749,182105,741,486105,749,182JCHNull genotypesNA18966chr3105,820,171105,864,826105,820,171105,864,826JCHNull genotypesNA18966chr3105,938,874105,952,504105,938,874105,952,504JCHNull genotypesNA18966chr3106,031,726106,098,859106,031,726106,098,859JCHNull genotypesNA18966chr3115,979,707115,988,797115,979,707115,988,797YRINull genotypesNA18515, NA18517, NA18516, NA18872,NA18871, NA19205, NA19203, NA19161,NA19160, NA19132, NA19130, NA19194,NA19192115,979,707115,981,957YRIMendelian inconsistencies(NA18507, NA18506)chr3127,035,541127,042,413127,035,541127,042,413YRINull genotypesNA19132, NA19131, NA19130chr3131,109,842131,114,005131,109,842131,114,005JCHNull genotypesNA18572, NA18545, NA18635, NA18621,NA18594, NA18622, NA18949chr3131,453,760131,454,529131,453,760131,454,529CEUMendelian inconsistencies(NA12234, NA10863)chr3133,032,620133,033,926133,032,620133,033,926CEUNull genotypesNA12003, NA12154, NA10830, NA07348,NA07000, NA12891, NA12872, NA12874,NA12865, NA12763, NA07034, NA07048chr3133,312,348133,312,893133,312,348133,312,893JCHNull genotypesNA18550, NA18558, NA18633chr3138,351,092138,352,662138,351,092138,352,662YRIMendelian inconsistencies(NA19102, NA19103)chr3143,931,519143,932,566143,931,519143,932,566YRIMendelian inconsistencies(NA18913, NA18914), (NA19238, NA19240)chr3150,285,601150,288,244150,285,601150,288,244CEUMendelian inconsistencies(NA12234, NA10863)chr3153,671,436153,675,234153,671,436153,675,234CEUMendelian 
inconsistencies(NA11992, NA10860)chr3156,501,087156,505,963156,501,087156,505,963YRIMendelian inconsistencies(NA18516, NA18515), (NA18870, NA18872),(NA19137, NA19139), (NA19200, NA19202),(NA19206, NA19208), (NA19160, NA19161)chr3160,095,253160,113,199160,095,253160,113,199CEUNull genotypesNA07348chr3163,450,906163,462,798163,450,906163,462,798CEUMendelian inconsistencies(NA11840, NA10854)chr3163,538,907163,550,497163,538,907163,550,497JCHNull genotypesNA18540, NA18968, NA18969, NA18966,NA18991chr3163,833,596163,943,569163,860,609163,861,908CEUNull genotypesNA12003, NA12005, NA12006, NA10839,NA12056, NA12057, NA10851, NA11830,NA11831, NA10860, NA11994, NA10861,NA10831, NA07357, NA07345, NA07348,NA06994, NA07029, NA07056, NA11839,NA12044, NA12145, NA12264, NA12716,NA12717, NA12707, NA12891, NA12892,NA12878, NA12812, NA12801, NA12872,NA12874, NA12875, NA12865, NA12752,NA12763, NA12750, NA12751, NA12740,NA11882, NA12248, NA12249, NA10835163,882,205163,926,256CEUNull genotypesNA12003, NA12005, NA12057, NA10851,NA07357, NA07056, NA12264, NA12891,NA12801, NA12874, NA12752, NA12750,NA12740163,833,596163,943,569YRINull genotypesNA18505, NA18860, NA18857, NA18855,NA18863, NA18861, NA19103, NA19101,NA19139, NA19138, NA19200, NA19204,NA19211, NA19209, NA19208, NA19207,NA19161, NA19221, NA19222, NA19223,NA19116, NA19154, NA19152, NA19100,NA19098163,857,897163,861,908JCHNull genotypesNA18524, NA18572, NA18547, NA18609,NA18550, NA18552, NA18611, NA18555,NA18542, NA18532, NA18561, NA18603,NA18540, NA18605, NA18566, NA18563,NA18624, NA18579, NA18632, NA18582,NA18633, NA18635, NA18592, NA18636,NA18593, NA18576, NA18570, NA18612,NA18571, NA18620, NA18621, NA18594,NA18622, NA18573, NA18623, NA18637,NA18526, NA18942, NA18953, NA18968,NA18959, NA18960, NA18972, NA18965,NA18956, NA18940, NA18951, NA18943,NA18947, NA18944, NA18945, NA18949,NA18948, NA18966, NA18975, NA18991,NA18992, NA18997, NA18998, NA19005,NA18999, NA19007, NA18990, NA18987,NA18967, NA18976, NA18978,NA18995, NA18981, 
NA18971, NA18974,NA19003163,837,185163,940,699JCHNull genotypesNA18524, NA18572, NA18547, NA18609,NA18550, NA18552, NA18555, NA18542,NA18532, NA18561, NA18603, NA18540,NA18605, NA18566, NA18563, NA18624,NA18579, NA18632, NA18582, NA18633,NA18635, NA18592, NA18636, NA18593,NA18576, NA18570, NA18612, NA18571,NA18620, NA18621, NA18594, NA18622,NA18573, NA18623, NA18637, NA18953,NA18968, NA18959, NA18960, NA18972,NA18965, NA18956, NA18940, NA18951,NA18943, NA18944, NA18945, NA18949,NA18948, NA18991, NA18992, NA18997,NA18998, NA19005, NA19007, NA18990,NA18987, NA18967, NA18976, NA18978,NA18995, NA18981, NA18971, NA18974,NA19003163,835,144163,922,881CEUMendelian inconsistencies(NA07000, NA07029)163,835,144163,891,949YRIMendelian inconsistencies(NA19140, NA19142), (NA19238, NA19240)163,837,185163,917,427CEUMendelian inconsistencies(NA12003, NA10838), (NA07000, NA07029),(NA07056, NA07019), (NA12264, NA10863),(NA12813, NA12801), (NA12760, NA12752),(NA12761, NA12752)163,875,766163,940,699CEUMendelian inconsistencies(NA12005, NA10839), (NA12056, NA10851),(NA07357, NA07348), (NA07056, NA07019),(NA12264, NA10863), (NA12891, NA12878),(NA12874, NA12865), (NA12760, NA12752),(NA12761, NA12752), (NA12751, NA12740)163,840,486163,939,798CEUMendelian inconsistencies(NA12005, NA10839), (NA12056, NA10851),(NA07357, NA07348), (NA07000, NA07029),(NA12891, NA12878), (NA12812, NA12801),(NA12874, NA12865), (NA12751, NA12740)163,860,189163,934,963CEUMendelian inconsistencies(NA12146, NA10847), (NA07034, NA07048),(NA06985, NA06991)163,833,596163,892,785CEUMendelian inconsistencies(NA07000, NA07029)163,837,185163,943,569YRIMendelian inconsistencies(NA19140, NA19142), (NA19238, NA19240)163,888,838163,903,159YRIMendelian inconsistencies(NA19144, NA19145), (NA19238, NA19240)163,889,909163,912,703YRIMendelian inconsistencies(NA19140, NA19142), (NA19144, 
NA19145)163,833,596163,943,569CEUHardy-Weinbergpopulation163,837,185163,943,569YRIHardy-Weinbergpopulation163,826,348163,940,699JCHHardy-Weinbergpopulation163,835,144163,922,881CEUHardy-Weinbergpopulation163,835,144163,922,881JCHHardy-Weinbergpopulationchr3164,983,304164,985,198164,983,304164,985,198CEUMendelian inconsistencies(NA07056, NA07019)chr3166,377,266166,386,073166,377,266166,386,073CEUMendelian inconsistencies(NA11881, NA10859)chr3166,635,288166,641,054166,635,288166,641,054CEUNull genotypesNA07056, NA11840, NA10863, NA12801,NA12740chr3176,402,880176,404,312176,402,880176,404,312JCHNull genotypesNA18622, NA18974chr3177,217,491177,250,587177,217,491177,250,587YRINull genotypesNA18523chr3180,609,192180,611,006180,609,192180,611,006CEUMendelian inconsistencies(NA12057, NA10851)chr3187,710,472187,716,485187,710,472187,716,485CEUMendelian inconsistencies(NA12813, NA12801)187,710,472187,711,540CEUMendelian inconsistencies(NA12813, NA12801)chr3190,685,007190,688,562190,685,007190,688,562YRINull genotypesNA18503, NA18504, NA19211190,685,007190,688,562JCHNull genotypesNA18547, NA18609, NA18570, NA18612,NA18944, NA18949chr3191,058,267191,060,542191,058,267191,060,542YRINull genotypesNA18858, NA18857, NA18914, NNA19205, NA19127, NA19132, NA19240191,058,267191,060,542JCHNull genotypesNA18524, NA18611, NA18537, NA18563,NA18624, NA18579, NA18633, NA18592,NA18620, NA18623, NA18953, NA18969,NA18991, NA19000, NA18976chr3192,388,157192,390,897192,388,157192,390,897YRINull genotypesNA19101192,388,776192,390,897YRIMendelian inconsistencies(NA19159, NA19161), (NA19193, NA19194)192,388,157192,390,897CEUHardy-Weinbergpopulationchr3193,920,249193,930,629193,920,249193,930,629YRINull genotypesNA19210chr3194,196,286194,205,086194,196,286194,201,998YRINull genotypesNA18860, NA18858, NA18859, NA18521,NA18522, NA18854, NA18852, NA18853,NA18857, NA19102, NA19139, NA19137,NA19138, NA19204, NA19211, NA19120,NA19142, NA19140, NA19145, NA19143,NA19128, NA19132, NA19131, 
NA19100,NA19238194,196,286194,201,998JCHNull genotypesNA18609, NA18550, NA18608, NA18552,NA18611, NA18564, NA18545, NA18542,NA18558, NA18562, NA18537, NA18603,NA18540, NA18563, NA18624, NA18579,NA18632, NA18582, NA18633, NA18635,NA18592, NA18636, NA18593, NA18577,NA18570, NA18612, NA18571, NA18620,NA18594, NA18622, NA18623, NA18637,NA18526, NA18942, NA18953, NA18959,NA18969, NA18961, NA18972, NA18965,NA18973, NA18964, NA18940, NA18951,NA18943, NA18947, NA18944, NA18945,NA18948, NA18952, NA18966, NA18975,NA18991, NA18994, NA18992, NA18997,NA18998, NA19000, NA19005, NA19007,NA18990, NA18987, NA18967, NA18976,NA18978, NA18970, NA18980, NA18981,NA18971, NA18974194,200,830194,204,450CEUNull genotypesNA12005, NA12056, NA10851, NA11830,NA11831, NA11832, NA10855, NA11993,NA12155, NA07357, NA06994, NA07000,NA07029, NA07022, NA07019, NA12043,NA10857, NA12144, NA12146, NA12239,NA10847, NA12264, NA12234, NA10863,NA12716, NA12717, NA12707, NA12891,NA12812, NA12801, NA12872, NA12864,NA12875, NA12752, NA12763, NA07034,NA07048, NA12750, NA12248, NA10835194,196,286194,205,086CEUMendelian inconsistencies(NA12004, NA10838), (NA12144, NA10846),(NA12813, NA12801), (NA12875, NA12865),(NA12760, NA12752)194,196,286194,201,998YRIHardy-Weinbergpopulationchr3194,457,389194,459,618194,457,389194,459,618JCHNull genotypesNA18637, NA18944, NA18945, NA18992,NA19000, NA18987chr49,877,8619,879,0299,877,8619,879,029CEUMendelian inconsistencies(NA12154, NA10830)chr49,969,5249,980,1229,969,5249,980,122CEUNull genotypesNA12003, NA12004, NA10838, NA12005,NA10839, NA12056, NA12057, NA10851,NA11829, NA11830, NA10856, NA11992,NA11993, NA10860, NA11994, NA11995,NA10861, NA12156, NA10831, NA07345,NA07022, NA07056, NA07019, NA11839,NA12146, NA12239, NA10847, NA12234,NA12716, NA12717, NA12707, NA12892,NA12812, NA12815, NA12873, NA12761,NA12762, NA12763, NA12753, NA07055,NA06993, NA06985, NA06991, NA12750,NA12248, NA12249, NA108359,969,5249,980,122YRINull genotypesNA18504, NA18515, NA18523, NA18871,NA18852, 
NA18855, NA18861, NA18914,NA18912, NA19092, NA19103, NA19101,NA19205, NA19120, NA19116, NA19119,NA19140, NA19127, NA190989,969,5249,980,122JCHNull genotypesNA18524, NA18545, NA18561, NA18632,NA18636, NA18571, NA18594, NA18637,NA18959, NA18969, NA18961, NA18964,NA18956, NA18947, NA18945, NA18948,NA18966, NA18975, NA18992, NA18998,NA19005, NA18999, NA18990, NA18987,NA18976, NA189959,969,5249,980,122YRIMendelian inconsistencies(NA19152, NA19154), (NA19143, NA19145)9,969,5249,980,122YRIHardy-Weinbergpopulation9,969,5249,998,131JCHHardy-Weinbergpopulationchr410,148,21010,151,03910,148,21010,151,039JCHNull genotypesNA1897110,148,21010,151,039CEUNull genotypesNA11831, NA11832, NA10855, NA12236,NA11840, NA12044, NA10857, NA10846chr412,125,94012,134,83012,125,94012,130,201YRIMendelian inconsistencies(NA18861, NA18863)12,129,93912,134,830YRIMendelian inconsistencies(NA18861, NA18863)chr420,312,96320,313,99120,312,96320,313,991YRINull genotypesNA19161chr421,123,92921,126,70021,123,92921,126,700YRINull genotypesNA18506, NA18508, NA18854, NA18855,NA18913, NA19094, NA19092, NA19103,NA19201, NA19205, NA19210, NA19159,NA19222, NA19119, NA19145, NA19143,NA19144, NA19192, NA1923821,043,09621,126,700YRIHardy-Weinbergpopulationchr432,163,12632,169,58132,163,12632,169,581JCHNull genotypesNA18945, NA18949chr434,677,42234,724,19134,677,42234,722,072CEUNull genotypesNA11832, NA11992, NA10860, NA12156,NA12878, NA07034, NA1085934,677,42234,724,191YRINull genotypesNA19200, NA1909834,677,42234,724,191JCHNull genotypesNA1852934,685,15434,722,072CEUMendelian inconsistencies(NA12815, NA12802)34,685,15434,701,647YRIMendelian inconsistencies(NA18858, NA18860), (NA19131, NA19132)34,716,06034,724,191YRIMendelian inconsistencies(NA18858, NA18860), (NA19206, NA19208)34,686,46734,707,485YRIMendelian inconsistencies(NA19131, NA19132)34,685,15434,762,467CEUHardy-Weinbergpopulationchr454,366,71754,399,11154,366,71754,399,111CEUNull genotypesNA12802chr463,672,95563,678,55863,672,95563,678,558JCHNull 
genotypesNA18609, NA18545, NA18542, NA18532,NA18537, NA18540, NA18635, NA18960,NA18961, NA18972, NA18965, NA18945,NA18949, NA18952, NA18966, NA18975,NA1899863,672,95563,678,558CEUNull genotypesNA07000, NA07029, NA1276163,672,95563,676,155YRIMendelian inconsistencies(NA18502, NA18500)chr464,701,16464,713,00864,701,16464,712,923JCHNull genotypesNA18542, NA1899564,701,16464,712,923CEUNull genotypesNA12154, NA07056, NA07019, NA12234,NA12761, NA12762, NA06993, NA06991,NA1224964,702,55964,713,008CEUNull genotypesNA12154, NA07056, NA07019, NA12234,NA12761, NA12762, NA06993, NA06991,NA1224964,707,38764,713,008JCHNull genotypesNA18542, NA1899564,701,16464,712,923CEUMendelian inconsistencies(NA12239, NA10847), (NA12872, NA12864),(NA12750, NA12740)64,701,16464,712,923CEUHardy-Weinbergpopulationchr469,377,78669,808,23769,432,41769,486,334YRINull genotypesNA19172, NA19161, NA19160, NA1909869,377,78669,808,237JCHNull genotypesNA18572, NA18547, NA18608, NA18552,NA18611, NA18545, NA18558, NA18532,NA18561, NA18562, NA18537, NA18603,NA18540, NA18605, NA18566, NA18563,NA18624, NA18579, NA18632, NA18582,NA18633, NA18635, NA18592, NA18636,NA18593, NA18576, NA18570, NA18571,NA18620, NA18594, NA18622, NA18623,NA18526, NA18942, NA18953, NA18968,NA18959, NA18969, NA18961, NA18972,NA18973, NA18964, NA18940, NA18947,NA18952, NA18966, NA18975, NA18991,NA18994, NA18992, NA18997, NA18998,NA19005, NA18999, NA19007, NA18990,NA18987, NA18967, NA18976, NA18970,NA18980, NA18995, NA18981, NA1897169,378,12369,808,237CEUNull genotypesNA12056, NA12057, NA10851, NA10831,NA12264, NA12716, NA12892, NA12878,NA1281369,460,79069,486,227YRINull genotypesNA19172, NA19161, NA19160, NA1909869,460,79069,486,227YRIMendelian inconsistencies(NA19143, NA19145)69,441,69569,482,361CEUMendelian inconsistencies(NA11995, NA10861), (NA12043, NA10857),(NA12145, NA10846), (NA12873, NA12864),(NA12750, NA12740)69,393,71269,431,602CEUMendelian inconsistencies(NA12145, NA10846), (NA12716, NA12707),(NA12813, 
NA12801)69,450,97269,458,490YRIMendelian inconsistencies(NA18858, NA18860), (NA18855, NA18857),(NA18862, NA18863), (NA19200, NA19202),(NA19172, NA19173)69,459,70169,462,910YRIMendelian inconsistencies(NA18858, NA18860), (NA19200, NA19202),(NA19143, NA19145)69,475,97269,482,770YRIMendelian inconsistencies(NA18502, NA18500)69,482,36169,491,890YRIMendelian inconsistencies(NA19143, NA19145)69,378,12369,482,361CEUHardy-Weinbergpopulation69,431,60269,486,334YRIHardy-Weinbergpopulation69,377,78669,486,334JCHHardy-Weinbergpopulationchr470,447,40970,542,96570,471,69170,542,965YRINull genotypesNA18503, NA18504, NA18508, NA18516,NA18912, NA19094, NA19093, NA19201,NA19160, NA19119, NA19153, NA1912970,471,69170,542,965JCHNull genotypesNA18965, NA1894770,447,40970,542,965CEUMendelian inconsistencies(NA12875, NA12865)70,477,07470,542,965YRIMendelian inconsistencies(NA19207, NA19208), (NA19222, NA19221),(NA19143, NA19145)70,455,62170,471,691YRIMendelian inconsistencies(NA19101, NA19103)70,412,47570,542,965YRIHardy-Weinbergpopulationchr474,165,31774,224,90174,165,31774,224,901CEUMendelian inconsistencies(NA12236, NA10830)74,174,22474,210,693CEUMendelian inconsistencies(NA12236, NA10830)74,194,50674,195,313CEUMendelian inconsistencies(NA12236, NA10830)chr491,630,18491,656,18691,630,18491,656,186CEUNull genotypesNA11832chr492,391,95892,393,09392,391,95892,393,093CEUNull genotypesNA12004, NA12057, NA10856, NA11832,NA11992, NA12154, NA12156, NA07345,NA07348, NA07056, NA12146, NA12264,NA10863, NA12802, NA1276292,391,95892,393,093CEUMendelian inconsistencies(NA12006, NA10839), (NA07000, NA07029),(NA12812, NA12801), (NA12875, NA12865)chr494,994,79894,995,54894,994,79894,995,548CEUMendelian inconsistencies(NA12145, NA10846)chr4104,670,951104,671,800104,670,951104,671,800YRIMendelian inconsistencies(NA18522, NA18521), (NA18870, NA18872)chr4105,212,443105,219,862105,212,443105,219,862CEUMendelian inconsistencies(NA12005, NA10839)chr4108,651,560108,665,451108,651,560108,665,451YRIMendelian 
inconsistencies(NA18505, NA18503), (NA18507, NA18506),(NA18859, NA18860), (NA19201, NA19202),(NA19130, NA19132)chr4115,637,062115,641,110115,637,062115,641,110JCHNull genotypesNA18532, NA18635, NA18593, NA18972,NA18964, NA18945, NA18975, NA18991,NA18987, NA18976115,637,062115,641,110CEUNull genotypesNA10839, NA11831, NA10855, NA07357,NA06994, NA07000, NA07029, NA12146,NA12717, NA12802, NA12760, NA07048,NA12249115,638,909115,641,110CEUMendelian inconsistencies(NA12003, NA10838), (NA12056, NA10851),(NA11830, NA10856), (NA12154, NA10830),(NA12234, NA10863), (NA12874, NA12865)115,637,062115,641,110YRIHardy-Weinbergpopulationchr4116,620,794116,631,153116,620,794116,631,153CEUMendelian inconsistencies(NA12892, NA12878)116,620,794116,631,153YRIMendelian inconsistencies(NA19206, NA19208)chr4119,071,973119,076,909119,071,973119,076,909CEUMendelian inconsistencies(NA07000, NA07029)119,071,973119,076,909YRIMendelian inconsistencies(NA19210, NA19211)chr4121,410,025121,411,772121,410,025121,411,772CEUMendelian inconsistencies(NA11993, NA10860), (NA12155, NA10831)chr4129,148,757129,162,771129,148,757129,162,771YRINull genotypesNA19132, NA19239chr4130,965,878130,977,227130,965,878130,977,227YRIMendelian inconsistencies(NA19239, NA19240)chr4133,046,503133,053,932133,046,503133,053,932CEUNull genotypesNA06994, NA12872, NA12864chr4138,551,715138,556,685138,551,715138,556,685YRINull genotypesNA18863, NA18861138,551,715138,556,685CEUNull genotypesNA12003, NA10838, NA12154, NA10830,NA12043, NA12234, NA10863138,551,715138,556,270CEUMendelian inconsistencies(NA12760, NA12752)138,551,715138,556,270YRIMendelian inconsistencies(NA18871, NA18872), (NA19101, NA19103),(NA19116, NA19120), (NA19098, NA19100)chr4138,642,484138,644,834138,642,484138,644,834CEUMendelian inconsistencies(NA12812, NA12801), (NA11882, NA10859)chr4144,863,310144,873,579144,863,310144,873,579YRIMendelian inconsistencies(NA19201, NA19202)chr4152,458,392152,461,345152,458,392152,461,345YRINull genotypesNA19211, NA19210, 
NA19143, NA19239chr4157,541,319157,545,522157,541,319157,545,522CEUNull genotypesNA07000chr4161,637,501161,649,570161,637,501161,649,570CEUMendelian inconsistencies(NA12891, NA12878)chr4162,456,693162,462,242162,456,693162,462,242JCHNull genotypesNA18562, NA18577, NA18594, NA18960,NA18972, NA18943, NA18975, NA18997,NA18971162,456,693162,462,242CEUNull genotypesNA07019, NA12802, NA12248162,458,627162,462,242YRIMendelian inconsistencies(NA19141, NA19142), (NA19131, NA19132)chr4169,519,486169,532,584169,519,486169,532,584CEUMendelian inconsistencies(NA12056, NA10851)chr4169,661,623169,683,933169,661,623169,683,933CEUMendelian inconsistencies(NA12056, NA10851)chr4170,385,138170,389,969170,385,138170,389,969YRINull genotypesNA18852, NA18861chr4173,685,436173,686,862173,685,436173,686,862JCHNull genotypesNA18558, NA18632, NA18620, NA19005173,685,436173,686,862CEUNull genotypesNA12236, NA12155, NA10857, NA12145,NA12239, NA12264, NA12234, NA10863,NA12892, NA12864, NA12875, NA12760,NA12762, NA12753, NA07034, NA07055,NA07048, NA06993, NA06991, NA12249,NA10835173,669,470173,686,862CEUHardy-Weinbergpopulationchr4179,290,796179,297,048179,290,796179,292,684YRIMendelian inconsistencies(NA19171, NA19173)179,292,684179,297,048YRIMendelian inconsistencies(NA19171, NA19173)chr4189,917,097189,929,184189,917,097189,929,184JCHNull genotypesNA18960, NA18965chr551,876,79651,890,58951,876,79651,890,589YRIMendelian inconsistencies(NA18504, NA18503)chr5109,500,767109,501,608109,500,767109,501,608CEUMendelian inconsistencies(NA12005, NA10839)chr5140,256,473140,258,672140,256,473140,258,672JCHNull genotypesNA18637, NA18990chr5161,906,500161,932,381161,906,500161,932,381CEUNull genotypesNA10854, NA12801chr67,574,5767,578,2767,574,5767,578,276CEUMendelian inconsistencies(NA12875, NA12865)chr610,579,02310,636,78010,579,02310,609,555CEUMendelian inconsistencies(NA12264, NA10863)10,586,42810,636,780CEUMendelian inconsistencies(NA12264, NA10863)chr619,151,52919,155,94219,151,52919,155,942CEUNull 
genotypesNA12872, NA1085919,154,60919,155,942CEUMendelian inconsistencies(NA12815, NA12802)chr627,783,26927,784,77327,783,26927,784,773YRIMendelian inconsistencies(NA18522, NA18521), (NA19160, NA19161)chr629,963,78829,971,72729,963,78829,971,727YRIMendelian inconsistencies(NA18507, NA18506), (NA19093, NA19094),(NA19200, NA19202)29,961,77030,000,151JCHHardy-Weinbergpopulationchr630,032,43330,035,09030,032,43330,035,090YRIMendelian inconsistencies(NA19131, NA19132)chr631,382,09831,387,19031,382,09831,387,190YRIMendelian inconsistencies(NA18502, NA18500), (NA19201, NA19202),(NA19119, NA19120)chr632,578,05732,582,06832,578,05732,582,068YRIMendelian inconsistencies(NA19119, NA19120)32,569,02532,605,711JCHHardy-Weinbergpopulationchr632,714,84032,718,48332,714,84032,718,483YRIMendelian inconsistencies(NA18501, NA18500), (NA18508, NA18506),(NA18523, NA18521), (NA19192, NA19194)chr633,985,15133,989,08333,985,15133,989,083YRIMendelian inconsistencies(NA18502, NA18500), (NA18855, NA18857),(NA19092, NA19094), (NA19138, NA19139),(NA19204, NA19205), (NA19160, NA19161)chr654,698,19254,707,08154,698,19254,707,081CEUMendelian inconsistencies(NA07357, NA07348)54,698,19254,707,081YRIMendelian inconsistencies(NA19209, NA19211), (NA19141, NA19142),(NA19128, NA19129)chr677,015,14477,017,57577,015,14477,017,575YRIMendelian inconsistencies(NA19128, NA19129)chr678,968,79778,978,36478,973,26378,978,364YRIMendelian inconsistencies(NA18913, NA18914)78,968,79778,974,994YRIMendelian inconsistencies(NA18504, NA18503)chr678,995,49479,027,96578,998,67479,022,282CEUMendelian inconsistencies(NA12144, NA10846), (NA12145, NA10846)79,021,96079,027,965CEUMendelian inconsistencies(NA12003, NA10838), (NA12236, NA10830),(NA06994, NA07029), (NA07022, NA07019),(NA12144, NA10846), (NA12145, NA10846),(NA12249, NA10835)78,995,49479,021,960YRIMendelian inconsistencies(NA18862, NA18863), (NA18913, NA18914)chr681,280,10381,280,86981,280,10381,280,869CEUMendelian inconsistencies(NA07034, NA07048), (NA06993, 
NA06991)chr693,571,37993,573,95793,571,37993,573,957YRIMendelian inconsistencies(NA19206, NA19208), (NA19116, NA19120)chr6103,784,319103,807,031103,784,319103,807,031CEUMendelian inconsistencies(NA12057, NA10851), (NA12145, NA10846),(NA12891, NA12878), (NA12762, NA12753)103,787,052103,794,525CEUMendelian inconsistencies(NA12004, NA10838), (NA12006, NA10839),(NA12057, NA10851), (NA11832, NA10855),(NA12156, NA10831), (NA07000, NA07029),(NA12716, NA12707), (NA12717, NA12707),(NA12812, NA12801), (NA12815, NA12802),(NA12875, NA12865), (NA07055, NA07048),(NA12249, NA10835)103,787,052103,807,031YRIMendelian inconsistencies(NA19130, NA19132), (NA19131, NA19132)103,784,468103,799,771YRIMendelian inconsistencies(NA18502, NA18500), (NA18852, NA18854),(NA18862, NA18863), (NA18913, NA18914),(NA19203, NA19205), (NA19119, NA19120),(NA19099, NA19100)103,784,319103,806,709CEUHardy-Weinbergpopulation103,768,785103,807,031YRIHardy-Weinbergpopulation103,768,785103,807,031JCHHardy-Weinbergpopulationchr6147,916,589147,923,491147,916,589147,923,491CEUMendelian inconsistencies(NA12004, NA10838)chr6148,247,323148,252,401148,247,323148,252,401YRIMendelian inconsistencies(NA19092, NA19094)chr7246,846249,228246,846249,228JCHNull genotypesNA18608, NA18953, NA18948, NA18966,NA18975, NA19000, NA18978chr73,136,8643,171,3523,136,8643,171,352YRIMendelian inconsistencies(NA19222, NA19221)chr77,095,0207,099,4527,095,0207,099,452YRIMendelian inconsistencies(NA19099, NA19100)chr712,848,77112,856,66012,848,77112,856,660YRIMendelian inconsistencies(NA18502, NA18500)chr738,094,78038,112,25438,094,78038,112,254CEUMendelian inconsistencies(NA07034, NA07048)chr761,563,67261,571,61661,563,67261,571,616CEUNull genotypesNA07055, NA1085961,563,67261,571,616CEUMendelian inconsistencies(NA12815, NA12802), (NA06985, NA06991)chr775,886,97375,897,20975,886,97375,897,209JCHNull genotypesNA18994chr775,913,42375,923,50975,913,42375,923,509CEUNull genotypesNA12005chr778,439,55778,445,10978,439,55778,445,109YRIMendelian 
inconsistencies(NA19092, NA19094), (NA19130, NA19132)78,418,15178,523,025JCHHardy-Weinbergpopulationchr789,422,55689,424,32789,422,55689,424,327YRINull genotypesNA18503, NA18523, NA18914, NA19139,NA19207, NA19116, NA19129, NA19128,NA1913189,422,55689,424,158CEUNull genotypesNA12056, NA12057, NA10851, NA11830,NA10856, NA10861, NA07345, NA07348,NA07000, NA12043, NA12044, NA10857,NA12145, NA12716, NA12717, NA12707,NA12812, NA12813, NA12801, NA12815,NA12864, NA12874, NA12865, NA12760,NA12753, NA06985, NA12740chr790,987,66791,000,86590,987,66791,000,865JCHNull genotypesNA18964, NA18994chr792,092,86692,109,45692,092,86692,101,842CEUMendelian inconsistencies(NA11840, NA10854)92,096,12292,109,456CEUMendelian inconsistencies(NA11840, NA10854)chr792,939,50892,942,58692,939,50892,942,586CEUNull genotypesNA12056, NA10854, NA12814, NA1286492,939,50892,941,787CEUMendelian inconsistencies(NA12239, NA10847), (NA12717, NA12707)chr797,008,44097,012,72997,008,44097,012,729CEUNull genotypesNA12003, NA1280297,008,44097,012,729CEUMendelian inconsistencies(NA11993, NA10860), (NA12154, NA10830),(NA12155, NA10831), (NA11840, NA10854),(NA12043, NA10857), (NA12872, NA12864)chr7104,193,511104,201,772104,193,511104,201,772YRIMendelian inconsistencies(NA19203, NA19205), (NA19222, NA19221)chr7109,002,325109,011,761109,002,325109,007,346YRINull genotypesNA18515, NA19202109,002,325109,010,654JCHNull genotypesNA18609, NA18608, NA18566, NA18624,NA18620, NA18973, NA18964, NA18952,NA18980, NA18981109,003,350109,007,346CEUNull genotypesNA10851, NA11830, NA11993, NA07357,NA06994, NA07022, NA07056, NA07019,NA10846, NA12234, NA12717, NA12707,NA12801, NA07055, NA10859109,002,968109,011,761JCHNull genotypesNA18609, NA18608, NA18566, NA18624,NA18620, NA18973, NA18964, NA18952,NA18980, NA18981109,002,968109,011,761YRIMendelian inconsistencies(NA18870, NA18872), (NA18913, NA18914),(NA19116, NA19120), (NA19128, NA19129)109,002,325109,005,266YRIMendelian inconsistencies(NA19116, NA19120), (NA19140, 
NA19142)109,003,350109,007,346YRIMendelian inconsistencies(NA18870, NA18872), (NA18913, NA18914),(NA19116, NA19120), (NA19128, NA19129)chr7109,735,375109,745,762109,735,375109,745,762JCHNull genotypesNA18632, NA18636, NA18593, NA18570,NA18956, NA18952, NA18991, NA19000chr7110,398,629110,442,249110,400,767110,439,782CEUMendelian inconsistencies(NA11995, NA10861)110,398,629110,442,249CEUMendelian inconsistencies(NA11995, NA10861)chr7115,492,184115,494,416115,492,184115,494,416CEUMendelian inconsistencies(NA06994, NA07029), (NA12760, NA12752)chr7117,616,907117,627,348117,616,907117,627,348CEUMendelian inconsistencies(NA12813, NA12801)chr7118,709,624118,710,630118,709,624118,710,630YRIMendelian inconsistencies(NA18522, NA18521), (NA19172, NA19173)chr7120,111,123120,114,041120,111,123120,114,041CEUMendelian inconsistencies(NA12760, NA12752)chr7124,212,558124,213,103124,212,558124,213,103YRIMendelian inconsistencies(NA19127, NA19129)chr7125,601,054125,603,762125,601,054125,603,762YRINull genotypesNA18504, NA18506, NA18508, NA18860,NA18859, NA18521, NA18523, NA18522,NA18870, NA18863, NA18861, NA18913,NA19103, NA19101, NA19139, NA19138,NA19205, NA19204, NA19203, NA19208,NA19206, NA19207, NA19160, NA19222,NA19116, NA19140, NA19154, NA19152,NA19145, NA19143, NA19129, NA19127,NA19128, NA19131, NA19194, NA19193,NA19192, NA19238125,601,054125,603,762JCHNull genotypesNA18572, NA18547, NA18609, NA18608,NA18552, NA18611, NA18542, NA18540,NA18579, NA18635, NA18593, NA18622,NA18959, NA18972, NA18951, NA18943,NA18994, NA19007, NA18987, NA18976,NA18981, NA18971125,601,054125,603,762CEUMendelian inconsistencies(NA12005, NA10839), (NA11992, NA10860),(NA12154, NA10830), (NA11840, NA10854),(NA12145, NA10846), (NA12248, NA10835)chr7133,202,026133,212,391133,209,203133,212,391YRINull genotypesNA18504, NA19119133,203,070133,212,391JCHNull genotypesNA18540, NA18624, NA18593, NA18594,NA18961, NA18940, NA18966133,203,070133,212,391CEUNull genotypesNA12006, NA10839, NA10851, NA11831,NA11995, 
NA12155, NA10831, NA07357,NA07348, NA12264, NA12234, NA10863,NA12717, NA12707, NA12814, NA12872,NA12864, NA12763, NA07055, NA12740,NA12248133,209,203133,212,391YRIMendelian inconsistencies(NA19209, NA19211), (NA19160, NA19161)133,202,026133,203,070YRIMendelian inconsistencies(NA19137, NA19139)chr7141,179,905141,200,696141,188,320141,199,669CEUNull genotypesNA07056, NA07019, NA10846, NA06993141,179,905141,200,696CEUNull genotypesNA07056, NA07019, NA10846, NA06993chr7141,456,537141,472,512141,462,154141,472,285YRINull genotypesNA18501, NA18504, NA18506, NA18508,NA18507, NA18860, NA18858, NA18859,NA18515, NA18517, NA18516, NA18521,NA18872, NA18870, NA18871, NA18854,NA18852, NA18857, NA18855, NA18914,NA18912, NA18913, NA19093, NA19102,NA19138, NA19202, NA19200, NA19203,NA19210, NA19208, NA19160, NA19223,NA19116, NA19153, NA19145, NA19143,NA19144, NA19129, NA19128, NA19132,NA19130, NA19100, NA19099, NA19194,NA19192, NA19240, NA19239141,462,154141,472,285JCHNull genotypesNA18524, NA18547, NA18550, NA18608,NA18552, NA18545, NA18558, NA18532,NA18561, NA18537, NA18603, NA18563,NA18579, NA18633, NA18635, NA18593,NA18620, NA18621, NA18594, NA18622,NA18573, NA18637, NA18526, NA18942,NA18953, NA18968, NA18961, NA18965,NA18973, NA18964, NA18956, NA18947,NA18944, NA18945, NA18992, NA18997,NA18998, NA19000, NA18987, NA18967,NA18976, NA18978, NA19003141,462,154141,472,512CEUNull genotypesNA12004, NA10838, NA11832, NA10855,NA11995, NA12154, NA12156, NA07357,NA12044, NA12144, NA12892, NA12815,NA12802, NA12872, NA12864, NA12762,NA07034, NA07055, NA07048141,460,703141,472,285JCHNull genotypesNA18524, NA18547, NA18550, NA18608,NA18552, NA18545, NA18558, NA18532,NA18561, NA18537, NA18603, NA18563,NA18579, NA18633, NA18635, NA18593,NA18620, NA18621, NA18594, NA18622,NA18573, NA18637, NA18526, NA18942,NA18953, NA18968, NA18961, NA18965,NA18973, NA18964, NA18956, NA18947,NA18944, NA18945, NA18992, NA18997,NA18998, NA19000, NA18990, NA18987,NA18967, NA18976, NA18978, 
NA19003141,456,537141,472,512CEUNull genotypesNA12004, NA10838, NA10851, NA11832,NA10855, NA11995, NA12154, NA12156,NA07357, NA12044, NA12144, NA12892,NA12815, NA12802, NA12872, NA12864,NA12761, NA12762, NA07034, NA07055,NA07048141,456,537141,472,285YRINull genotypesNA18500, NA18501, NA18504, NA18506,NA18508, NA18507, NA18860, NA18858,NA18859, NA18515, NA18517, NA18516,NA18521, NA18872, NA18870, NA18871,NA18854, NA18852, NA18857, NA18855,NA18914, NA18912, NA18913, NA19093,NA19102, NA19138, NA19202, NA19200,NA19203, NA19210, NA19208, NA19160,NA19223, NA19116, NA19153, NA19145,NA19143, NA19144, NA19129, NA19128,NA19132, NA19130, NA19100, NA19099,NA19194, NA19192, NA19240, NA19239141,469,162141,470,799CEUMendelian inconsistencies(NA12006, NA10839), (NA12057, NA10851),(NA11992, NA10860)chr7141,657,311141,669,388141,657,311141,669,388CEUMendelian inconsistencies(NA12813, NA12801)chr7141,730,581141,765,123141,730,581141,765,123CEUMendelian inconsistencies(NA12144, NA10846)141,730,581141,765,123CEUMendelian inconsistencies(NA12144, NA10846)chr7141,921,685141,931,471141,922,974141,927,931JCHNull genotypesNA18524, NA18547, NA18609, NA18550,NA18552, NA18611, NA18555, NA18529,NA18532, NA18561, NA18537, NA18603,NA18605, NA18582, NA18635, NA18636,NA18577, NA18571, NA18620, NA18621,NA18622, NA18573, NA18623, NA18637,NA18959, NA18960, NA18961, NA18973,NA18956, NA18940, NA18943, NA18944,NA18945, NA18949, NA18952, NA18966,NA18975, NA18992, NA18998, NA19007,NA18990, NA18978, NA18970, NA18980,NA18995, NA18974, NA19003141,921,685141,931,471CEUNull genotypesNA10851, NA11829, NA11993, NA07345,NA10846, NA12264, NA12716, NA12812,NA12761, NA06991141,922,974141,931,471JCHNull genotypesNA18524, NA18547, NA18609, NA18550,NA18552, NA18611, NA18555, NA18529,NA18532, NA18561, NA18537, NA18605,NA18582, NA18635, NA18636, NA18577,NA18571, NA18620, NA18621, NA18622,NA18573, NA18623, NA18637, NA18959,NA18960, NA18961, NA18940, NA18944,NA18949, NA18952, NA18992, NA18998,NA19007, NA18978, NA18970, 
NA18980,NA18995, NA19003chr7149,625,782149,631,027149,625,782149,631,027CEUMendelian inconsistencies(NA11831, NA10855)chr7157,855,041157,857,130157,855,041157,857,130JCHNull genotypesNA18566, NA18593, NA18947chr8587,487588,391587,487588,391JCHNull genotypesNA18547, NA18609, NA18550, NA18608,NA18611, NA18632, NA18635, NA18593,NA18576, NA18612, NA18594, NA18622,NA18953, NA18960, NA18961, NA18972,NA18956, NA18949, NA18966, NA18975,NA18998, NA19000, NA18987, NA18976,NA18995, NA18974534,755588,391CEUHardy-Weinbergpopulationchr82,066,2222,067,6962,066,2222,067,696CEUMendelian inconsistencies(NA12872, NA12864)2,066,2222,075,642YRIHardy-Weinbergpopulationchr82,242,1102,250,5192,242,1102,250,519YRINull genotypesNA18500, NA18501, NA19103, NA19101,NA19137, NA19203, NA19209, NA19127,NA191302,242,1102,243,578CEUMendelian inconsistencies(NA07000, NA07029)2,242,1102,244,333YRIMendelian inconsistencies(NA18504, NA18503), (NA19172, NA19173)chr83,987,4683,992,4293,987,4683,992,429JCHNull genotypesNA19007chr84,039,8674,040,4874,039,8674,040,487JCHNull genotypesNA19007chr84,150,1214,158,1944,150,1214,158,194JCHNull genotypesNA19007chr84,173,1584,183,5884,173,1584,183,588JCHNull genotypesNA19007chr84,576,5764,586,1364,576,5764,586,136JCHNull genotypesNA19007chr84,619,5134,691,9494,619,5134,691,949YRIMendelian inconsistencies(NA18912, NA18914)chr84,708,9274,717,6624,708,9274,717,662YRIMendelian inconsistencies(NA19160, NA19161)chr85,065,3075,076,1945,065,3075,076,194JCHNull genotypesNA19007chr85,340,6945,350,4475,340,6945,350,447JCHNull genotypesNA19007chr85,587,9995,590,0455,587,9995,590,045CEUMendelian inconsistencies(NA11882, NA10859)chr85,638,4095,641,0125,638,4095,641,012JCHNull genotypesNA19007chr86,052,5756,056,8516,052,5756,056,851YRIMendelian inconsistencies(NA19223, NA19221)chr86,108,3586,147,2626,108,3586,147,262JCHNull genotypesNA18537chr86,810,7056,811,4526,810,7056,811,452CEUMendelian inconsistencies(NA12813, NA12801)chr87,201,3877,206,9537,201,3877,206,953CEUMendelian 
inconsistencies(NA06994, NA07029)7,201,3877,206,953YRIMendelian inconsistencies(NA18871, NA18872), (NA18862, NA18863),(NA19138, NA19139), (NA19140, NA19142),(NA19239, NA19240)chr87,814,6597,824,3637,814,6597,824,363CEUMendelian inconsistencies(NA11839, NA10854), (NA12234, NA10863),(NA12813, NA12801)chr89,537,0159,537,6549,537,0159,537,654JCHNull genotypesNA18558, NA18562, NA18959, NA18944,NA18949, NA18981chr812,242,02512,257,91912,242,02512,257,919CEUMendelian inconsistencies(NA07034, NA07048), (NA06993, NA06991)12,242,02512,257,919CEUMendelian inconsistencies(NA07034, NA07048), (NA06993, NA06991)chr812,556,00412,565,10912,556,00412,565,109JCHNull genotypesNA18561, NA18621chr813,625,50113,659,19613,625,50113,659,196YRINull genotypesNA1920513,637,01613,656,796YRIMendelian inconsistencies(NA18505, NA18503), (NA18522, NA18521),(NA19092, NA19094)13,625,50113,630,545YRIMendelian inconsistencies(NA18505, NA18503)13,625,50113,659,196YRIHardy-Weinbergpopulationchr814,590,53414,596,69214,590,53414,596,692YRINull genotypesNA18507, NA18863, NA18862, NA18914,NA18912, NA18913, NA19102, NA19171,NA19152, NA19131, NA19240, NA19238,NA19239chr814,647,13015,392,54814,650,69115,336,034CEUMendelian inconsistencies(NA12234, NA10863)14,978,42915,392,548CEUMendelian inconsistencies(NA12234, NA10863)14,647,13015,337,510CEUMendelian inconsistencies(NA12234, NA10863)chr815,622,96915,670,92415,643,76315,646,128CEUMendelian inconsistencies(NA12234, NA10863)15,622,96915,670,924CEUMendelian inconsistencies(NA12234, NA10863)15,627,17215,632,885CEUMendelian inconsistencies(NA12234, NA10863)chr816,211,92416,216,72616,212,02716,216,726YRINull genotypesNA18505, NA18508, NA18515, NA18517,NA18516, NA18854, NA18852, NA19201,NA19204, NA19211, NA19209, NA19208,NA19160, NA19153, NA19129, NA19128,NA1913016,211,92416,216,726JCHNull genotypesNA18612, NA18956, NA18975, NA18992,NA18998, NA1899016,211,92416,216,726CEUMendelian inconsistencies(NA12760, NA12752), (NA12248, NA10835)16,211,92416,215,120YRIMendelian 
inconsistencies(NA18522, NA18521), (NA19192, NA19194)16,211,92416,216,201YRIHardy-Weinbergpopulationchr816,277,37516,280,83316,277,37516,280,833YRIMendelian inconsistencies(NA19204, NA19205)chr820,003,37520,033,44420,003,37520,033,444JCHNull genotypesNA18997chr824,996,36925,011,46424,996,36925,011,464YRINull genotypesNA1850624,998,57125,011,464JCHNull genotypesNA1899925,007,85925,011,464CEUMendelian inconsistencies(NA11995, NA10861), (NA12892, NA12878)25,009,26525,011,464CEUMendelian inconsistencies(NA11995, NA10861), (NA07357, NA07348),(NA07345, NA07348), (NA12892, NA12878)25,001,04325,006,243CEUMendelian inconsistencies(NA12004, NA10838), (NA12005, NA10839),(NA12006, NA10839), (NA11995, NA10861),(NA07345, NA07348), (NA06994, NA07029),(NA12144, NA10846), (NA12145, NA10846),(NA12716, NA12707), (NA12812, NA12801)24,996,36925,011,464YRIMendelian inconsistencies(NA19098, NA19100), (NA19239, NA19240)25,001,04325,011,464CEUHardy-Weinbergpopulationchr825,433,95525,436,60025,434,48725,436,600CEUMendelian inconsistencies(NA11994, NA10861), (NA12144, NA10846),(NA06993, NA06991)25,433,95525,434,487YRIMendelian inconsistencies(NA19127, NA19129)chr838,125,50138,127,58338,125,50138,127,583CEUNull genotypesNA10856, NA11882chr839,250,10739,404,54739,250,10739,397,764CEUNull genotypesNA10839, NA11829, NA11992, NA12154,NA12156, NA10831, NA12144, NA12239,NA12264, NA12716, NA12891, NA12813,NA12814, NA12873, NA12864, NA12875,NA12249, NA1083539,250,10739,404,547JCHNull genotypesNA18524, NA18526, NA18942, NA1896739,268,39839,389,812CEUNull genotypesNA10839, NA11829, NA11992, NA12154,NA12156, NA10831, NA12144, NA12239,NA12264, NA12716, NA12891, NA12813,NA12814, NA12873, NA12864, NA12875,NA12249, NA1083539,386,49139,390,862CEUMendelian inconsistencies(NA12004, NA10838), (NA11831, NA10855),(NA12750, NA12740)39,271,74239,390,071CEUMendelian inconsistencies(NA12005, NA10839), (NA12006, NA10839),(NA12057, NA10851), (NA11829, NA10856),(NA11831, NA10855), (NA11992, NA10860),(NA12155, NA10831), 
(NA12144, NA10846),(NA12239, NA10847), (NA12264, NA10863),(NA12813, NA12801), (NA12814, NA12802),(NA12872, NA12864), (NA07055, NA07048),(NA12750, NA12740), (NA12248, NA10835)39,304,59939,320,127CEUMendelian inconsistencies(NA12005, NA10839), (NA12006, NA10839),(NA11829, NA10856), (NA11831, NA10855),(NA12155, NA10831), (NA12813, NA12801),(NA07055, NA07048), (NA12750, NA12740),(NA12248, NA10835)39,292,39439,316,841CEUMendelian inconsistencies(NA11831, NA10855), (NA11839, NA10854),(NA12813, NA12801), (NA12248, NA10835)39,291,52139,299,946CEUMendelian inconsistencies(NA11831, NA10855), (NA11839, NA10854)39,304,81739,319,367CEUMendelian inconsistencies(NA11839, NA10854)39,250,10739,255,036CEUMendelian inconsistencies(NA11831, NA10855)39,334,59339,387,126CEUMendelian inconsistencies(NA06985, NA06991)39,271,74239,390,862YRIMendelian inconsistencies(NA19203, NA19205), (NA19207, NA19208)39,250,10739,326,538YRIMendelian inconsistencies(NA18858, NA18860)39,268,25639,296,584YRIMendelian inconsistencies(NA19203, NA19205)39,387,12639,404,547YRIMendelian inconsistencies(NA19203, NA19205)39,250,10739,411,423CEUHardy-Weinbergpopulation39,255,03639,325,989CEUHardy-Weinbergpopulationchr840,201,78040,206,91740,201,78040,206,917CEUMendelian inconsistencies(NA12156, NA10831), (NA12239, NA10847)chr841,124,71341,125,49041,124,71341,125,490JCHNull genotypesNA18632chr851,082,18551,083,97851,082,18551,083,978YRIMendelian inconsistencies(NA19201, NA19202)chr854,202,94254,211,31854,202,94254,211,318YRINull genotypesNA18860, NA18854, NA19103, NA19132,NA19100chr855,414,54455,423,84755,414,54455,423,847YRINull genotypesNA19172, NA19128chr857,668,66757,679,18357,668,66757,679,183YRINull genotypesNA18500chr858,413,96458,415,08358,413,96458,415,083YRIMendelian inconsistencies(NA19127, NA19129)chr859,355,79459,368,40959,355,79459,368,409JCHNull genotypesNA18940, NA19003chr861,051,46861,055,81161,051,46861,055,811YRINull genotypesNA18516chr864,899,41164,900,64264,899,41164,900,642YRINull 
genotypesNA19132, NA19131chr865,165,83865,167,76665,165,83865,167,766CEUNull genotypesNA11994, NA10861chr868,039,96268,043,38768,039,96268,043,387YRINull genotypesNA18861, NA19211, NA19120, NA19116,NA19119, NA19141, NA19130, NA19100chr870,997,23971,011,39070,997,23971,011,390YRIMendelian inconsistencies(NA18913, NA18914)chr873,673,60473,685,91573,673,60473,685,915CEUNull genotypesNA12873chr882,429,91582,431,08582,429,91582,431,085YRINull genotypesNA19205, NA19204chr885,311,11985,315,95985,311,11985,315,959CEUMendelian inconsistencies(NA12057, NA10851), (NA12872, NA12864)chr890,287,37790,293,53390,287,37790,292,349YRINull genotypesNA19173, NA1917190,287,60790,293,533YRIMendelian inconsistencies(NA19171, NA19173)chr891,140,35791,143,30191,140,35791,143,301JCHNull genotypesNA18971chr893,654,57393,663,14193,654,57393,663,141CEUNull genotypesNA10860, NA07056, NA12892, NA12878chr895,515,27795,528,02695,515,27795,528,026YRINull genotypesNA19116chr8101,615,219101,615,751101,615,219101,615,751CEUNull genotypesNA06985chr8103,010,682103,011,802103,010,682103,011,802YRINull genotypesNA18860, NA18858, NA18522, NA19139,NA19138chr8107,045,009107,047,083107,045,009107,047,083CEUMendelian inconsistencies(NA12875, NA12865)chr8107,817,446107,818,580107,817,446107,818,580YRIMendelian inconsistencies(NA19203, NA19205)chr8115,126,252115,130,784115,126,252115,130,784YRINull genotypesNA19131, NA19130chr8115,591,865115,599,078115,591,865115,599,078JCHNull genotypesNA18636, NA18612, NA18956, NA18947,NA18948115,540,689115,611,467JCHHardy-Weinbergpopulationchr8121,829,317121,829,933121,829,317121,829,933YRIMendelian inconsistencies(NA19127, NA19129)chr8123,027,122123,034,224123,027,122123,034,224YRINull genotypesNA18516chr8137,811,456137,822,815137,811,456137,822,815CEUMendelian inconsistencies(NA12249, NA10835)chr8141,973,953141,975,249141,973,953141,975,249YRINull genotypesNA19116chr9205,917238,749205,917238,749JCHNull genotypesNA18576chr9581,040598,622581,040592,986YRINull 
genotypesNA18862, NA19201582,004598,622YRINull genotypesNA18862chr9665,600685,607665,600685,607YRIMendelian inconsistencies(NA18859, NA18860)chr91,500,2991,516,3831,500,2991,516,383YRIMendelian inconsistencies(NA18870, NA18872)chr94,006,8144,009,2244,006,8144,009,224YRINull genotypesNA19093, NA19092chr95,102,5195,103,5775,102,5195,103,577YRINull genotypesNA18505, NA19093chr99,791,1539,794,3139,791,1539,793,954CEUMendelian inconsistencies(NA11840, NA10854)9,791,1539,794,313CEUMendelian inconsistencies(NA11840, NA10854)chr910,536,81410,572,58610,536,81410,572,586CEUMendelian inconsistencies(NA12760, NA12752)chr911,881,38111,883,04711,881,38111,883,047YRINull genotypesNA18859, NA19221, NA19222chr911,903,28711,978,03611,903,28711,978,036YRIMendelian inconsistencies(NA19159, NA19161)chr921,180,82721,189,68021,180,82721,189,680YRINull genotypesNA19131chr921,275,38921,296,31821,278,01821,285,828YRINull genotypesNA18914, NA18912, NA1920821,275,38921,284,921YRINull genotypesNA18914, NA1891221,283,63121,296,318YRIMendelian inconsistencies(NA18912, NA18914)chr923,759,75423,765,44923,759,75423,765,449CEUMendelian inconsistencies(NA06994, NA07029)23,759,75423,765,449YRIMendelian inconsistencies(NA19144, NA19145)chr924,457,69824,484,30224,457,69824,484,302CEUNull genotypesNA11881chr929,914,68229,976,25029,933,65229,965,253CEUNull genotypesNA0704829,929,17329,948,595CEUMendelian inconsistencies(NA07055, NA07048)29,914,68229,976,250CEUMendelian inconsistencies(NA07055, NA07048)chr932,991,44933,014,91732,991,44933,014,917CEUNull genotypesNA12057, NA07357chr936,872,14136,890,81736,872,14136,890,817YRINull genotypesNA18500, NA19207chr938,509,91238,524,24538,509,91238,524,245CEUNull genotypesNA12872chr943,348,53743,374,21443,348,53743,374,214YRINull genotypesNA18500, NA19207chr962,545,37562,570,38362,545,37562,570,383YRINull genotypesNA18500, NA19207chr967,563,74867,566,19667,563,74867,566,196JCHNull genotypesNA18953chr981,945,70581,999,32581,945,70581,999,325CEUNull 
genotypesNA12872chr989,737,46889,742,16789,737,46889,742,167CEUNull genotypesNA12760chr9100,233,285100,243,634100,233,285100,243,634JCHNull genotypesNA18572chr9102,745,808102,746,309102,745,808102,746,309YRIMendelian inconsistencies(NA19204, NA19205), (NA19152, NA19154)chr9109,440,549109,445,124109,440,549109,445,124YRINull genotypesNA19159chr9124,896,840124,898,353124,896,840124,898,353JCHNull genotypesNA18532, NA18582, NA18952, NA18998,NA18970, NA18995chr9129,658,718129,724,341129,658,718129,724,341JCHNull genotypesNA18967chr9133,660,972133,664,954133,660,972133,664,954JCHNull genotypesNA18526, NA18999chr9136,191,747136,197,190136,191,747136,197,190JCHNull genotypesNA18540, NA18960chr9136,222,560136,230,195136,222,560136,230,195CEUNull genotypesNA12814chrX3,700,0093,708,1833,700,0093,708,183YRINull genotypesNA19205, NA19119chrX5,225,7045,245,3615,225,7045,245,361YRIMendelian inconsistencies(NA18501, NA18500), (NA18504, NA18503)chrX6,811,0256,978,8306,844,8536,950,030CEUNull genotypesNA10854, NA122486,811,0256,978,830CEUNull genotypesNA12248chrX7,583,8127,590,1937,583,8127,590,193YRINull genotypesNA18856chrX11,381,61211,410,07511,381,61211,410,075CEUMendelian inconsistencies(NA12003, NA10838), (NA12005, NA10839)chrX15,834,90515,839,61615,834,90515,839,616YRINull genotypesNA18854, NA19161chrX15,964,77715,971,94815,964,77715,971,948YRINull genotypesNA19103, NA19141, NA19154, NA19128chrX27,404,10427,412,91627,404,10427,412,916JCHNull genotypesNA1897227,395,88327,463,391YRIHardy-WeinbergpopulationchrX31,917,23131,922,83631,917,23131,922,836YRIMendelian inconsistencies(NA18501, NA18500), (NA18504, NA18503)chrX33,482,31333,483,04833,482,31333,483,048YRINull genotypesNA19206, NA19141chrX46,532,54046,534,97546,532,54046,534,975YRIMendelian inconsistencies(NA18502, NA18500), (NA19223, NA19221)chrX46,929,29847,028,43346,929,29847,028,433YRINull genotypesNA18856chrX55,918,25455,922,82755,918,25455,922,827YRINull genotypesNA18504, 
NA19138chrX57,162,07457,167,03757,162,07457,167,037JCHNull genotypesNA18632chrX57,430,26557,450,03857,430,26557,450,038YRINull genotypesNA18501chrX62,874,61262,922,11062,874,61262,922,110JCHNull genotypesNA18995chrX65,105,13665,531,01065,105,13665,531,010YRIMendelian inconsistencies(NA18501, NA18500), (NA18502, NA18500),(NA18504, NA18503)chrX74,996,46875,010,86474,996,46875,010,864CEUNull genotypesNA12864chrx77,858,92077,859,90377,858,92077,859,903CEUNull genotypesNA10856, NA10854chrX80,152,21480,167,84380,152,21480,167,843YRINull genotypesNA18504, NA19103chrX83,958,99783,971,98683,958,99783,971,986JCHNull genotypesNA18532, NA18540chrX84,097,36984,104,17084,097,36984,104,170JCHNull genotypesNA18532, NA18540chrX88,669,19588,680,38888,669,19588,680,388CEUNull genotypesNA12003, NA1276188,670,01188,673,690CEUNull genotypesNA12003chrX89,743,31389,750,18589,743,31389,750,185YRINull genotypesNA18871, NA19138, NA1914589,743,31389,750,185YRIMendelian inconsistencies(NA19093, NA19094)chrX91,086,00591,109,76691,086,00591,109,766YRINull genotypesNA19200, NA19159chrX92,173,43092,175,75692,173,43092,175,756CEUNull genotypesNA1083892,173,43092,175,756YRINull genotypesNA18503, NA18506, NA1885692,173,43092,175,756JCHNull genotypesNA18945chrX95,742,98295,757,25895,742,98295,757,258JCHNull genotypesNA18611chrX107,659,218107,737,812107,662,335107,675,738YRINull genotypesNA19223, NA19153107,659,218107,737,812YRIMendelian inconsistencies(NA18501, NA18500), (NA18504, NA18503)chrX108,399,188108,404,026108,399,188108,404,026YRINull genotypesNA19211chrX108,703,176108,705,130108,703,176108,705,130YRINull genotypesNA18501, NA19160, NA19127chrX110,928,593110,929,387110,928,593110,929,387JCHNull genotypesNA18608, NA18633chrX114,373,214114,373,881114,373,214114,373,881JCHNull genotypesNA18540, NA18943chrX119,102,962119,108,525119,102,962119,108,525JCHNull genotypesNA18637chrX121,803,546121,804,252121,803,546121,804,252JCHNull genotypesNA18540, 
NA18995chrX140,165,208140,166,897140,165,208140,166,897JCHNull genotypesNA18562, NA18563chrX144,847,796144,850,328144,847,796144,850,328CEUNull genotypesNA12872chrX145,514,196145,515,614145,514,196145,515,614YRINull genotypesNA18522, NA19153chrX147,101,408147,102,204147,101,408147,102,204YRINull genotypesNA18500, NA19098chrX147,351,039147,356,073147,351,039147,356,073YRINull genotypesNA18506, NA18522, NA19173, NA19161,NA19144chrX153,206,494153,209,395153,206,494153,209,395CEUNull genotypesNA12003chr106,659,3126,666,1416,659,3126,666,141CEUMendelian inconsistencies(NA12057, NA10851)chr1011,108,87011,112,91111,108,87011,112,911YRIMendelian inconsistencies(NA19200, NA19202)chr1020,308,63120,323,19820,315,24920,321,662CEUMendelian inconsistencies(NA12760, NA12752)20,308,63120,323,198CEUMendelian inconsistencies(NA12760, NA12752)chr1020,855,21420,859,47820,855,21420,859,478CEUMendelian inconsistencies(NA11995, NA10861), (NA12154, NA10830),(NA07000, NA07029), (NA07056, NA07019),(NA12239, NA10847), (NA12234, NA10863),(NA12812, NA12801), (NA12248, NA10835)20,856,79820,859,015YRIMendelian inconsistencies(NA19140, NA19142)20,855,21420,859,478CEUHardy-Weinbergpopulationchr1028,571,81328,573,96528,571,81328,573,965YRIMendelian inconsistencies(NA19093, NA19094)chr1037,821,88837,822,92237,821,88837,822,922YRIMendelian inconsistencies(NA18501, NA18500), (NA18871, NA18872)chr1041,640,54941,649,68241,640,54941,649,682CEUMendelian inconsistencies(NA11829, NA10856), (NA11992, NA10860),(NA12236, NA10830), (NA12043, NA10857),(NA12751, NA12740)chr1046,327,38446,342,62246,327,38446,342,622YRIMendelian inconsistencies(NA19200, NA19202)chr1046,376,36346,378,18646,376,36346,378,186YRIMendelian inconsistencies(NA19093, NA19094)chr1046,993,01147,012,01646,993,01147,012,016CEUMendelian inconsistencies(NA11840, NA10854)chr1054,723,27154,798,75554,723,27154,798,755YRIMendelian inconsistencies(NA19131, NA19132)chr1057,861,52757,869,79057,861,52757,869,790CEUMendelian inconsistencies(NA12815, 
NA12802)chr1058,435,93858,444,28058,435,93858,444,280CEUMendelian inconsistencies(NA07000, NA07029), (NA12717, NA12707),(NA06993, NA06991), (NA12248, NA10835)chr1065,780,80965,782,24365,780,80965,782,243YRIMendelian inconsistencies(NA19099, NA19100)chr1066,655,02166,658,07266,655,02166,658,072YRIMendelian inconsistencies(NA19206, NA19208)66,655,02166,658,072JCHHardy-Weinbergpopulationchr10107,333,070107,344,876107,333,070107,344,876CEUMendelian inconsistencies(NA06985, NA06991)chr10122,059,897122,066,987122,059,897122,066,987YRIMendelian inconsistencies(NA19141, NA19142), (NA19128, NA19129),(NA19099, NA19100)chr114,940,3864,941,0774,940,3864,941,077YRIMendelian inconsistencies(NA18912, NA18914), (NA19102, NA19103),(NA19099, NA19100)chr1125,256,14825,258,53825,256,14825,258,538YRIMendelian inconsistencies(NA19099, NA19100)chr1149,813,49049,819,01649,813,49049,819,016YRINull genotypesNA19099, NA19240chr1155,147,16755,149,06355,147,16755,149,063CEUNull genotypesNA12003, NA11832, NA12864, NA12763chr11108,458,130108,458,972108,458,130108,458,972YRIMendelian inconsistencies(NA19098, NA19100)chr122,117,4402,125,5882,117,4402,125,588CEUNull genotypesNA12872chr123,957,1463,958,8033,957,1463,958,803JCHNull genotypesNA18547, NA18592, NA18576, NA18976chr126,113,8776,116,7076,113,8776,116,707CEUMendelian inconsistencies(NA12056, NA10851), (NA12750, NA12740)chr1211,375,25011,435,55511,398,34111,414,113CEUNull genotypesNA1274011,431,14711,435,555CEUNull genotypesNA1274011,398,34111,431,147YRINull genotypesNA18502, NA18515, NA18522, NA1923811,375,25011,435,555CEUMendelian inconsistencies(NA12156, NA10831), (NA12762, NA12753)11,398,34111,426,356CEUMendelian inconsistencies(NA12239, NA10847)11,402,39911,407,484YRIMendelian inconsistencies(NA18508, NA18506), (NA19119, NA19120)11,400,65511,434,605YRIMendelian inconsistencies(NA19119, NA19120)chr1212,031,05212,037,27112,031,05212,037,271YRIMendelian inconsistencies(NA18862, NA18863), (NA19207, NA19208),(NA19143, 
NA19145)chr1218,456,33018,458,28518,456,33018,458,285YRIMendelian inconsistencies(NA19137, NA19139), (NA19159, NA19161)18,371,12118,456,330CEUHardy-Weinbergpopulationchr1221,034,21521,035,48321,034,21521,035,483JCHNull genotypesNA18594, NA1900720,959,60421,054,866YRIHardy-Weinbergpopulationchr1222,086,46922,099,21122,086,46922,099,211YRIMendelian inconsistencies(NA19141, NA19142)22,095,42522,125,325JCHHardy-Weinbergpopulationchr1222,139,91522,142,63622,139,91522,142,636YRIMendelian inconsistencies(NA19200, NA19202), (NA19171, NA19173),(NA19204, NA19205)chr1227,532,74027,533,80927,532,74027,533,809YRIMendelian inconsistencies(NA19171, NA19173)chr1227,539,97727,545,03827,539,97727,543,917YRINull genotypesNA18515, NA1916027,540,05327,545,038YRIMendelian inconsistencies(NA18855, NA18857), (NA19171, NA19173),(NA19152, NA19154), (NA19143, NA19145),(NA19099, NA19100), (NA19239, NA19240)chr1230,128,61830,132,41030,128,61830,132,410YRINull genotypesNA18856chr1232,414,72232,422,47932,414,72232,422,479YRIMendelian inconsistencies(NA19137, NA19139)chr1233,609,51433,612,68933,609,51433,612,689YRIMendelian inconsistencies(NA18871, NA18872), (NA19127, NA19129)chr1233,617,67033,626,05033,617,67033,626,050YRIMendelian inconsistencies(NA19098, NA19100)chr1236,397,59936,433,84036,397,59936,433,840YRINull genotypesNA18854, NA18853, NA19103, NA19173,NA19208, NA19132, NA19100chr1239,104,52739,106,94839,104,52739,106,948YRINull genotypesNA19142, NA1914039,105,98539,106,948YRIMendelian inconsistencies(NA18855, NA18857), (NA19152, NA19154)chr1239,168,29339,169,67139,168,29339,169,671JCHNull genotypesNA18608, NA18605, NA18594, NA18968,NA18975chr1247,011,04547,020,87647,011,04547,020,876YRIMendelian inconsistencies(NA18508, NA18506)chr1247,345,08147,357,51247,345,08147,357,512YRIMendelian inconsistencies(NA18507, NA18506)chr1250,791,22650,819,08550,791,22650,819,085YRIMendelian inconsistencies(NA18505, NA18503)chr1254,001,14354,011,95754,001,14354,011,957YRIMendelian inconsistencies(NA18508, 
NA18506)chr1255,300,11155,301,40055,300,11155,301,400CEUMendelian inconsistencies(NA12717, NA12707)chr1255,606,07755,610,76355,606,07755,610,763YRINull genotypesNA18503, NA19103chr1257,502,97157,518,06457,502,97157,518,064YRIMendelian inconsistencies(NA18501, NA18500), (NA18507, NA18506),(NA18870, NA18872), (NA19101, NA19103),(NA19119, NA19120)chr1258,222,19358,230,89858,222,19358,225,778JCHNull genotypesNA18965, NA18997, NA1897158,225,77858,230,898CEUMendelian inconsistencies(NA12155, NA10831), (NA12872, NA12864),(NA06985, NA06991), (NA12248, NA10835)chr1262,688,13662,690,20162,688,13662,690,201YRIMendelian inconsistencies(NA19127, NA19129)chr1263,304,11163,323,75063,304,11163,323,750YRIMendelian inconsistencies(NA19143, NA19145)chr1269,160,99369,163,28369,160,99369,163,283CEUNull genotypesNA10856, NA1275169,160,99369,163,283CEUMendelian inconsistencies(NA12056, NA10851), (NA12145, NA10846),(NA12146, NA10847)chr1276,544,92876,569,80976,544,92876,569,809YRIMendelian inconsistencies(NA19203, NA19205)chr1278,939,80178,940,38678,939,80178,940,386YRIMendelian inconsistencies(NA18522, NA18521)chr1281,459,09981,470,76681,459,09981,470,766YRIMendelian inconsistencies(NA19210, NA19211)chr1282,662,98382,669,95182,662,98382,669,951JCHNull genotypesNA18608, NA18960chr1286,264,83286,275,49986,264,83286,275,499CEUMendelian inconsistencies(NA12003, NA10838), (NA11995, NA10861)chr1288,992,72788,993,41288,992,72788,993,412CEUMendelian inconsistencies(NA12006, NA10839), (NA07345, NA07348),(NA06994, NA07029)88,992,72788,993,412YRIMendelian inconsistencies(NA18858, NA18860)chr1292,363,64092,365,45792,363,64092,365,457JCHNull genotypesNA18572, NA18563chr1294,301,82694,311,59494,301,82694,311,594YRIMendelian inconsistencies(NA19209, NA19211)chr1296,379,48196,391,28596,379,48196,391,285YRIMendelian inconsistencies(NA19138, NA19139)chr1296,517,21396,533,44796,517,21396,533,447CEUMendelian inconsistencies(NA12005, NA10839), (NA11831, 
NA10855)chr1297,173,32997,173,98997,173,32997,173,989CEUNull genotypesNA12864, NA12760, NA12761, NA12752,NA12763, NA12753chr1298,297,96098,304,97698,297,96098,304,976CEUNull genotypesNA12144chr12100,026,444100,034,552100,026,444100,034,552CEUMendelian inconsistencies(NA06994, NA07029), (NA07056, NA07019),(NA12249, NA10835)chr12107,297,172107,311,184107,297,172107,311,184YRIMendelian inconsistencies(NA18507, NA18506), (NA18522, NA18521),(NA18912, NA18914), (NA19099, NA19100)chr12109,794,989109,803,752109,794,989109,803,752CEUMendelian inconsistencies(NA12751, NA12740)chr12111,568,376111,585,515111,568,376111,585,515YRIMendelian inconsistencies(NA18508, NA18506)chr12117,847,706117,853,165117,847,706117,853,165JCHNull genotypesNA18959, NA18976chr12124,559,866124,561,057124,559,866124,561,057YRIMendelian inconsistencies(NA19131, NA19132)chr12125,947,213125,951,381125,947,213125,951,381YRIMendelian inconsistencies(NA18501, NA18500)chr12127,151,101127,159,718127,151,101127,159,718YRIMendelian inconsistencies(NA19160, NA19161)chr12129,501,100129,503,209129,501,100129,503,209YRIMendelian inconsistencies(NA19203, NA19205)chr12130,083,416130,128,817130,083,416130,128,817CEUMendelian inconsistencies(NA07022, NA07019)chr12130,253,222130,299,606130,253,222130,299,606YRIMendelian inconsistencies(NA19204, NA19205)chr12130,625,352130,630,204130,625,352130,630,204CEUNull genotypesNA12006, NA11829, NA10856, NA11994,NA12154, NA12763, NA12751130,625,352130,629,179YRIMendelian inconsistencies(NA19201, NA19202), (NA19119, NA19120)chr12131,823,562131,838,245131,823,562131,838,245CEUNull genotypesNA10838, NA11832, NA11995, NA06985chr1318,261,86718,268,07118,261,86718,268,071YRIMendelian inconsistencies(NA18856, NA18857), (NA19210, NA19211),(NA19222, NA19221), (NA19099, NA19100)chr1331,939,63531,941,44731,939,63531,941,447CEUMendelian inconsistencies(NA12043, NA10857)chr1337,666,77337,668,59237,666,77337,668,592CEUMendelian inconsistencies(NA12815, 
NA12802)chr1346,127,77246,128,41946,127,77246,128,419YRIMendelian inconsistencies(NA18504, NA18503)chr1355,553,47555,554,50355,553,47555,554,503CEUMendelian inconsistencies(NA11995, NA10861)55,553,47555,583,831YRIHardy-Weinbergpopulationchr1355,559,47155,565,09455,559,47155,565,094CEUMendelian inconsistencies(NA11995, NA10861), (NA06985, NA06991)55,553,47555,583,831YRIHardy-Weinbergpopulationchr1365,310,96965,318,34565,310,96965,318,345CEUMendelian inconsistencies(NA12043, NA10857)chr1367,050,83967,055,76167,050,83967,055,761CEUMendelian inconsistencies(NA11832, NA10855)chr1378,277,40578,301,11178,277,40578,301,111CEUMendelian inconsistencies(NA11829, NA10856)chr1395,875,94095,876,92695,875,94095,876,926YRIMendelian inconsistencies(NA19201, NA19202)chr13112,751,980112,753,018112,751,980112,753,018YRIMendelian inconsistencies(NA18859, NA18860), (NA18861, NA18863)chr1433,107,80733,110,01533,107,80733,110,015YRINull genotypesNA18503, NA19171chr1436,920,09336,928,31236,920,09336,928,312YRINull genotypesNA18501, NA18858chr1468,010,23168,011,60368,010,23168,011,603CEUNull genotypesNA11832, NA12875, NA06985chr1474,338,28274,350,47474,338,28274,350,474YRINull genotypesNA18502, NA19160, NA19153chr14104,215,047104,275,522104,215,047104,275,522CEUNull genotypesNA06994, NA07019chr14104,485,754104,965,621104,711,227104,741,347JCHNull genotypesNA18594, NA19000104,732,004104,733,584YRIMendelian inconsistencies(NA18870, NA18872), (NA19171, NA19173),(NA19172, NA19173), (NA19159, NA19161)104,485,754104,965,621CEUNull genotypesNA07029104,873,992104,886,848CEUNull genotypesNA06994, NA07029chr1518,840,31718,844,98718,840,31718,844,987CEUNull genotypesNA11829, NA10856, NA11832, NA07345,NA12873, NA12874chr1522,225,10422,261,99322,225,10422,261,993CEUNull genotypesNA12761, NA11882, NA10835chr1524,972,76024,980,43524,972,76024,980,435CEUMendelian inconsistencies(NA12056, NA10851)chr1532,437,86632,525,03732,437,86632,525,037YRINull genotypesNA18500, NA18521, 
NA19144chr1554,508,49254,511,69454,508,49254,511,694JCHNull genotypesNA18561, NA18605chr1715,244,56515,259,09115,244,56515,259,091JCHNull genotypesNA18572, NA18592, NA18991chr1734,186,33634,188,93434,186,33634,188,934YRINull genotypesNA18506, NA18914, NA19094, NA19138,NA19210, NA1916134,186,33634,188,934YRIMendelian inconsistencies(NA18507, NA18506), (NA19130, NA19132),(NA19192, NA19194)chr1739,893,16639,898,34339,893,16639,898,343YRINull genotypesNA18522, NA19209chr181,907,9001,922,8381,907,9001,922,838JCHNull genotypesNA18635, NA18612chr183,815,1143,820,8923,815,1143,820,892CEUNull genotypesNA12145, NA10846chr1822,618,47522,619,96022,618,47522,619,960YRINull genotypesNA19171chr1827,285,12727,286,29127,285,46827,286,291CEUMendelian inconsistencies(NA11832, NA10855)27,285,12727,285,713CEUMendelian inconsistencies(NA11832, NA10855)chr1829,104,10629,114,50429,104,10629,114,504JCHNull genotypesNA18624, NA18968, NA19007chr1832,371,41532,436,18432,371,41532,436,184JCHNull genotypesNA18971chr1832,507,10332,508,65532,507,10332,508,655YRIMendelian inconsistencies(NA18505, NA18503)chr1832,653,14732,677,78632,653,14732,677,786JCHNull genotypesNA18971chr1836,239,51936,247,37136,239,51936,247,371YRINull genotypesNA18506chr1836,512,51336,518,18736,512,51336,518,187CEUNull genotypesNA10839, NA11993, NA10830, NA06994,NA12761chr1836,926,39436,930,79036,926,39436,930,790JCHNull genotypesNA18571chr1844,632,51144,636,39944,632,51144,636,399JCHNull genotypesNA18947, NA18944chr1845,353,98945,369,89645,353,98945,369,896JCHNull genotypesNA18947, NA18944chr1846,252,95246,257,05846,252,95246,257,058JCHNull genotypesNA18947, NA18944chr1849,789,71449,796,76649,789,71449,796,766YRINull genotypesNA19116chr1851,289,23751,294,29851,289,23751,294,298YRINull genotypesNA19116chr1856,070,40256,072,44956,070,40256,072,449YRIMendelian inconsistencies(NA19140, NA19142)chr1861,350,42761,354,67161,350,42761,354,671YRIMendelian inconsistencies(NA19159, 
NA19161)chr1861,878,05661,879,91961,878,05661,879,919YRINull genotypesNA18852, NA19208chr1864,171,64964,270,53264,171,64964,270,532YRINull genotypesNA19100, NA1909864,222,74464,261,149YRIMendelian inconsistencies(NA19098, NA19100)64,193,33464,209,977YRIMendelian inconsistencies(NA19098, NA19100)64,241,18564,256,622YRIMendelian inconsistencies(NA19098, NA19100)chr1864,444,63764,450,08564,444,63764,450,085JCHNull genotypesNA18964chr1864,895,17764,904,47764,895,17764,904,477JCHNull genotypesNA18564, NA18964, NA18976chr1869,100,08369,103,44869,100,08369,103,448CEUMendelian inconsistencies(NA12812, NA12801)chr1874,392,29974,395,52974,392,29974,395,529CEUMendelian inconsistencies(NA12056, NA10851)chr1874,909,00774,923,02274,909,00774,923,022JCHNull genotypesNA18582chr1946,046,37346,065,87346,046,37346,065,873JCHNull genotypesNA18973, NA18952chr20681,314685,325681,314685,325YRIMendelian inconsistencies(NA19153, NA19154)chr201,564,7041,567,3741,564,7041,567,374YRIMendelian inconsistencies(NA18862, NA18863), (NA19160, NA19161),(NA19130, NA19132), (NA19239, NA19240)chr206,417,0016,417,5706,417,0016,417,570YRIMendelian inconsistencies(NA18862, NA18863), (NA19160, NA19161),(NA19192, NA19194)chr2014,789,36114,818,47214,789,36114,818,472CEUMendelian inconsistencies(NA12043, NA10857)14,798,45414,817,227CEUMendelian inconsistencies(NA12043, NA10857)chr2016,562,20216,580,31416,562,20216,580,314YRIMendelian inconsistencies(NA19102, NA19103)chr2041,042,45141,046,28241,042,45141,046,282CEUMendelian inconsistencies(NA12155, NA10831)chr2047,638,10847,664,81147,638,10847,664,811CEUNull genotypesNA07034chr219,979,02910,033,4569,979,02910,016,793CEUNull genotypesNA07357, NA06985, NA12248, NA12249,NA1083510,012,22110,033,456JCHNull genotypesNA185629,979,02910,012,221YRIMendelian inconsistencies(NA19092, NA19094), (NA19172, NA19173),(NA19140, NA19142)chr2113,483,81013,497,62513,483,81013,497,625JCHNull genotypesNA18550chr2113,588,03513,603,21013,588,03513,603,210JCHNull 
genotypesNA18550chr2113,817,28113,833,80613,817,28113,833,806JCHNull genotypesNA18529chr2114,003,72814,020,04114,003,72814,020,041CEUNull genotypesNA1085414,003,72814,015,620YRINull genotypesNA19161chr2123,345,57823,354,46223,345,57823,354,462YRINull genotypesNA1852323,347,30623,352,592YRIMendelian inconsistencies(NA18508, NA18506)chr2124,477,77524,500,65224,477,77524,500,652JCHNull genotypesNA18579chr2127,117,91627,122,32027,117,91627,122,320YRINull genotypesNA19099chr2132,878,23132,878,74432,878,23132,878,744JCHNull genotypesNA18945chr2134,188,73234,190,16334,188,73234,190,163CEUNull genotypesNA10851, NA12763chr2134,400,72734,401,81934,400,72734,401,819YRINull genotypesNA19098chr2216,391,06216,393,15816,391,06216,393,158JCHNull genotypesNA18973, NA18991, NA18994, NA18992,NA19007chr2218,146,73318,170,14018,146,73318,170,140JCHNull genotypesNA18942, NA18973, NA18994, NA18992,NA19007chr2219,783,35819,785,82819,783,35819,785,828YRIMendelian inconsistencies(NA19210, NA19211)chr2220,705,59920,709,56020,705,59920,709,560YRINull genotypesNA19221chr2220,710,98820,718,78120,710,98820,718,781CEUMendelian inconsistencies(NA12716, NA12707)chr2220,751,45520,771,12920,751,45520,771,129CEUNull genotypesNA1287820,751,57720,756,384CEUMendelian inconsistencies(NA12716, NA12707)chr2220,779,13420,808,46620,799,46420,808,466CEUMendelian inconsistencies(NA12005, NA10839), (NA12716, NA12707)20,779,13420,806,052CEUMendelian inconsistencies(NA12005, NA10839)chr2220,814,46220,826,38720,814,46220,826,387CEUNull genotypesNA12878chr2220,829,87620,830,89420,829,87620,830,894CEUMendelian inconsistencies(NA12892, NA12878)chr2220,841,75920,858,13320,841,75920,858,133JCHNull genotypesNA18972chr2220,861,05220,881,21620,861,05220,881,216CEUNull genotypesNA12873chr2221,019,04021,026,84521,019,04021,026,845YRINull genotypesNA18523, NA18871chr2221,026,94421,558,65021,036,34021,046,529CEUNull genotypesNA10839, NA1084621,396,77821,538,141CEUNull genotypesNA12005, NA10831, NA07357, NA06994,NA10846, 
NA12707, NA0698521,118,17521,360,293CEUNull genotypesNA07357, NA06994, NA1084621,116,95421,282,684CEUNull genotypesNA06994, NA1084621,104,18221,129,536CEUNull genotypesNA0699421,261,36921,357,002CEUNull genotypesNA0699421,026,94421,069,666CEUNull genotypesNA1084621,247,43221,250,691YRINull genotypesNA18523, NA18914, NA1915421,037,38521,137,802YRINull genotypesNA18523, NA18871, NA1915421,221,46621,336,872YRINull genotypesNA18523, NA18871, NA1915421,036,34021,548,612JCHNull genotypesNA18526, NA18943, NA1894521,223,78821,249,549JCHNull genotypesNA18972, NA1894521,046,43221,110,493JCHNull genotypesNA18526, NA1897221,162,03121,216,889JCHNull genotypesNA18526, NA1897221,240,37621,280,071JCHNull genotypesNA18526, NA1897221,380,86721,548,896JCHNull genotypesNA18526, NA1897221,087,32521,158,310JCHNull genotypesNA1897221,230,94521,240,941JCHNull genotypesNA1897221,396,77821,496,507JCHNull genotypesNA1897221,514,09821,522,758JCHNull genotypesNA1897221,547,04021,558,650JCHNull genotypesNA1897221,193,05121,194,296CEUMendelian inconsistencies(NA12005, NA10839), (NA11992, NA10860),(NA12716, NA12707), (NA12873, NA12864),(NA07055, NA07048)21,041,99821,046,529CEUMendelian inconsistencies(NA12044, NA10857), (NA12873, NA12864)21,214,21821,221,815CEUMendelian inconsistencies(NA11831, NA10855)21,238,22121,243,149CEUMendelian inconsistencies(NA11993, NA10860)21,155,78921,163,650CEUMendelian inconsistencies(NA12813, NA12801)21,038,13921,040,959CEUMendelian inconsistencies(NA12044, NA10857)21,268,70721,396,778CEUMendelian inconsistencies(NA12044, NA10857)21,214,62021,235,614CEUMendelian inconsistencies(NA07055, NA07048)21,359,78721,388,825YRIMendelian inconsistencies(NA19144, NA19145)21,247,43221,249,549YRIMendelian inconsistencies(NA19140, NA19142)21,408,82521,411,554YRIMendelian inconsistencies(NA19203, NA19205)21,519,75521,538,141YRIMendelian inconsistencies(NA18502, 
NA18500)21,046,52921,144,002CEUHardy-Weinbergpopulation21,214,62021,235,614CEUHardy-Weinbergpopulation21,110,49321,131,227JCHHardy-Weinbergpopulationchr2222,269,92322,271,90622,269,92322,271,906YRINull genotypesNA18503, NA18505chr2222,701,38722,701,48322,701,38722,701,483CEUNull genotypes (BCM) +NA10838, NA10851, NA11829, NA10856,MendelianNA11832, NA10861, NA07357, NA07056,inconsistencies (Sanger)NA12043, NA12044, NA10857, NA12239,NA12874, NA12865, NA12751chr2224,077,33724,086,56424,077,33724,086,564CEUMendelian inconsistencies(NA12716, NA12707)24,075,42724,095,027YRIHardy-Weinbergpopulationchr2224,123,18624,131,50324,123,18624,131,503CEUMendelian inconsistencies(NA11830, NA10856)chr2224,230,22224,233,17024,230,22224,233,170CEUMendelian inconsistencies(NA12716, NA12707)24,231,41424,232,995YRIMendelian inconsistencies(NA19141, NA19142)chr2228,193,40928,209,12128,193,40928,209,121JCHNull genotypesNA1855228,180,63328,210,597JCHHardy-Weinbergpopulationchr2233,621,00933,622,55233,621,00933,622,552CEUMendelian inconsistencies(NA12056, NA10851)chr2237,615,46637,624,86537,615,46637,624,865CEUMendelian inconsistencies(NA12006, NA10839), (NA12873, NA12864)chr2242,897,00142,911,18642,897,00142,911,186JCHNull genotypesNA18572chr2247,865,64747,867,22247,865,64747,867,222JCHNull genotypesNA18529, NA18944


Other Embodiments

The description of the specific embodiments of the invention is presented for the purposes of illustration. It is not intended to be exhaustive or to limit the scope of the invention to the specific forms described herein. Although the invention has been described with reference to several embodiments, it will be understood by one of ordinary skill in the art that various modifications can be made without departing from the spirit and the scope of the invention, as set forth in the claims. All patents, patent applications, and publications referenced herein are hereby incorporated by reference.


Other embodiments are in the claims.

Claims
  • 1. A method for predicting the immunocompatibility of the immune system of a first subject with a cell, tissue, or organ from a second subject comprising:
      (a) obtaining a first biological sample from the first subject and a second biological sample from the second subject;
      (b) determining the presence or absence of at least one deletion variant in the DNA sequence of a gene in the first and second biological samples of step (a), wherein the deletion variant substantially prevents expression of an antigen encoded by the gene, and wherein the at least one deletion variant is in a gene selected from the group consisting of UGT2B28, TRY6, LCE3C, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE; and
      (c) comparing the presence or absence of the at least one deletion variant determined in step (b) from the first biological sample from the first subject and the second biological sample from the second subject;
      wherein the immune system of the first subject is immunocompatible with the cell, tissue, or organ from the second subject if
      (i) the first subject has at least one intact copy of the gene selected from the group consisting of UGT2B28, TRY6, LCE3C, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE, wherein the antigen encoded by the gene is expressed or
      (ii) the second subject has a deletion variant in all copies of the gene selected from the group consisting of UGT2B28, TRY6, LCE3C, PRB1, OR51A2, ORF4F5, GNB1L, MGAM, and MCEE, wherein said deletion variant substantially prevents expression of the antigen encoded by the gene.
  • 2. The method of claim 1, wherein said first or second biological sample is an organ or part thereof.
  • 3. The method of claim 1, wherein said first or second biological sample is a tissue.
  • 4. The method of claim 1, wherein said first or second biological sample is a bodily fluid.
  • 5. The method of claim 4, wherein said bodily fluid is blood, serum, plasma, bone marrow, cerebrospinal fluid, amniotic fluid, urine, saliva, or semen.
  • 6. The method of claim 1, wherein the presence or absence of the at least one deletion variant is determined by polymerase chain reaction, DNA sequencing, whole-genome sequencing, Southern blotting, restriction fragment length polymorphism analysis, microelectrophoresis, sequencing by hybridization, single molecule sequencing, or microarray analysis.
  • 7. The method of claim 1, wherein the presence or absence of the at least one deletion variant is determined indirectly by genotyping one or more polymorphisms that are in linkage disequilibrium with the deletion variant.
  • 8. The method of claim 7, wherein said polymorphism is a single nucleotide polymorphism (SNP).
  • 9. The method of claim 1, wherein the presence or absence of at least one deletion variant is determined by genotyping one or more polymorphisms that are located inside the sequence that is deleted by the deletion variant.
  • 10. The method of claim 1, wherein the at least one deletion variant is a common deletion variant.
  • 11. The method of claim 1, wherein the at least one deletion variant is at least 100 base pairs in length.
  • 12. The method of claim 1, wherein the at least one deletion variant is in the coding region of the gene.
  • 13. The method of claim 1, wherein the at least one deletion variant is in a regulatory element of the gene.
  • 14. The method of claim 1, wherein the at least one deletion variant is in a gene that is normally expressed in the biological sample.
  • 15. The method of claim 1, wherein the at least one deletion variant is in the UGT2B28 gene.
  • 16. The method of claim 1, comprising determining the presence or absence of at least two deletion variants.
  • 17. The method of claim 1, further comprising determining the blood type or the MHC type for the first and second subjects.
  • 18. The method of claim 1, wherein said second subject is in need of a bone marrow or peripheral blood transplant and said first subject is a potential bone marrow or peripheral blood donor and said method is used to determine if said first subject and said second subject are a donor/recipient match.
  • 19. The method of claim 1, wherein said first subject is a subject in need of an organ or tissue and said second subject is a potential organ or tissue donor and said method is used to determine if said first subject and said second subject are a donor/recipient match.
  • 20. The method of claim 1, wherein said first subject is a woman and said second subject is a prospective father and the method is used to determine if the immune system of the woman is immunocompatible with a sperm from the prospective father.
  • 21. The method of claim 20, wherein said prospective father is a potential sperm donor.
  • 22. The method of claim 1, wherein said first subject is a woman and said second subject is an embryo or fetus.
  • 23. The method of claim 22, wherein the embryo is conceived by in vitro fertilization.
  • 24. The method of claim 22, wherein said antigen is normally expressed by fetal or embryonic cells.
  • 25. The method of claim 1, wherein said second subject is a subject that is in need of a bone marrow or peripheral blood transplant and said first subject is a bone marrow or peripheral blood donor, wherein said method is used to identify said first subject and said second subject as a donor/recipient match if the immune system of the first subject is not immunocompatible with the bone marrow or peripheral blood from the second subject.
  • 26. The method of claim 25, wherein said second subject has a blood cell cancer and wherein said gene encodes an antigen that is specifically expressed on the blood cancer cells.
  • 27. The method of claim 1, further comprising (d) determining the presence or absence of at least one additional deletion variant in the DNA sequence of a gene in the first and second biological samples of step (a), wherein the deletion variant substantially prevents expression of an antigen encoded by the gene, and wherein the at least one deletion variant is in a gene selected from the group consisting of UGT2B17, GSTT1, GSTM1, and CYP2A6; and (e) comparing the presence or absence of the at least one additional deletion variant determined in step (d) from the first biological sample from the first subject and the second biological sample from the second subject; wherein the immune system of the first subject is immunocompatible with the cell, tissue, or organ from the second subject if (i) the first subject has at least one intact copy of the gene selected from the group consisting of UGT2B17, GSTT1, GSTM1, and CYP2A6, wherein the antigen encoded by the gene is expressed, or (ii) the second subject has a deletion variant in all copies of the gene selected from the group consisting of UGT2B17, GSTT1, GSTM1, and CYP2A6, wherein said deletion variant substantially prevents expression of the antigen encoded by the gene.
  • 28. The method of claim 27, wherein the at least one deletion variant of step (b) is in the UGT2B28 gene and the at least one additional deletion variant of step (d) is in the UGT2B17 gene.
  • 29. The method of claim 27, comprising determining the presence or absence of at least two additional deletion variants in step (d).
  • 30. The method of claim 29, wherein the at least one deletion variant of step (b) is in the UGT2B28 gene and the at least two additional deletion variants of step (d) are in the GSTM1 and GSTT1 genes.
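The comparison recited in claims 1 and 27 can be illustrated with a minimal sketch. It assumes each subject's genotype has already been reduced to a count of intact (non-deleted) copies per gene; the dictionary layout and the function names `compatible_at_gene` and `incompatible_genes` are hypothetical and are not part of the application. The sketch encodes only the two-branch rule of the claims and is not a clinical tool.

```python
# Hypothetical data model: gene symbol -> number of intact (non-deleted) copies.
# A count of 0 means the deletion variant is present in all copies of the gene.

def expresses(intact_copies: dict, gene: str) -> bool:
    """True if the subject has at least one intact copy of the gene."""
    return intact_copies[gene] >= 1


def compatible_at_gene(first: dict, second: dict, gene: str) -> bool:
    """Two-branch rule of claim 1 for a single gene.

    The first subject's immune system is treated as immunocompatible with
    material from the second subject if
      (i)  the first subject has at least one intact copy, or
      (ii) the second subject has the deletion in all copies.
    """
    return expresses(first, gene) or not expresses(second, gene)


def incompatible_genes(first: dict, second: dict, genes: list) -> list:
    """Genes at which the rule predicts a mismatch, in the order given."""
    return [g for g in genes if not compatible_at_gene(first, second, g)]


# Illustrative (made-up) genotypes for two subjects.
first_subject = {"UGT2B28": 0, "UGT2B17": 2}
second_subject = {"UGT2B28": 1, "UGT2B17": 0}

mismatches = incompatible_genes(
    first_subject, second_subject, ["UGT2B28", "UGT2B17"]
)
print(mismatches)  # ['UGT2B28']
```

Here the first subject lacks every copy of UGT2B28 while the second subject carries one, so the encoded antigen would be foreign to the first subject's immune system. For the use recited in claim 25, the same function applies with the outcome inverted: a predicted mismatch is what identifies the pair as a donor/recipient match.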
CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No. 60/741,638, filed on Dec. 2, 2005, which is herein incorporated by reference.

STATEMENT AS TO FEDERALLY FUNDED RESEARCH

The United States Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of grant number 1U54HG02750 awarded by the National Human Genome Research Institute of the National Institutes of Health.

Provisional Applications (1)
Number        Date        Country
60/741,638    Dec 2005    US