Means and methods for investigating nucleic acid sequences

The invention relates to the fields of biology, molecular biology, biotechnology and medicine.

Nucleic acid sequences are investigated in a wide variety of applications. For instance, for diagnosis of infection with a pathogen, a sample of an individual is often screened for the presence of pathogen nucleic acid. Furthermore, nucleic acid sequence investigation is often performed for the diagnosis of genetic disorders, such as for instance Prader-Willi syndrome, Angelman syndrome and Duchenne muscular dystrophy. Widely used methods for detection of deletions or duplications of chromosomal sequences are quantitative multiplex PCR and quantitative Southern blotting. Drawbacks of these methods are that they are time-consuming and that results are difficult to interpret.

One particularly suitable technique for investigation of nucleic acid sequences is multiplex ligation-dependent probe amplification (MLPA). This technique is based on hybridisation of probes to target nucleic acids, after which ligated probes are amplified. In currently used MLPA assays, each MLPA probe set consists of two half probes. These two half probes each contain a target-specific sequence and a primer binding site sequence to which a nucleic acid amplification primer (preferably a PCR primer) can bind. One half probe is typically shorter in length than the other. The other half probe is longer due to a non-hybridizing stuffer sequence. The stuffer sequence of each probe set is unique in length, resulting in different lengths of amplification products (typically between 130 and 480 base pairs) that can be separated by electrophoresis. In an MLPA assay, typically a plurality of probe sets is used. The two half probes of each probe set are typically added to denatured sample nucleic acid and hybridized immediately adjacent to each other on their target sequence. Subsequently, the resulting nucleic acid is subjected to a ligation reaction. Usually a ligase is used which ligates only half probes that are perfectly matched with their target sequence (such as for instance the thermostable Ligase-65). A mismatch of a half probe at the ligation site prevents ligation and amplification. As a result, no amplification products of that probe will be detected. This allows MLPA to discriminate sequences that only differ in a single nucleotide. Sequences from pseudogenes or related genes can therefore be distinguished. Ligated half probes (which are also referred to as “ligated probes”) are amplified, preferably by PCR, using primers capable of specifically binding the primer binding site sequences of the probes. The amplification products of each ligated probe are separated and analyzed, for instance by electrophoresis.
Preferably, amplification products are represented graphically by separate peaks. Each peak is the product of an amplified MLPA ligated probe and a relative difference in peak intensity (height or surface) between a control sample and a sample of interest indicates copy number variation. FIG. 1A schematically outlines an MLPA reaction.
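The comparison of peak intensities between a control sample and a sample of interest can be sketched as follows. This is a minimal illustration in Python and not part of the described assay: the function name `relative_peak_ratios`, the probe names, the peak values and the choice to normalise each peak against a set of reference probes are all assumptions made for the example.

```python
def relative_peak_ratios(sample_peaks, control_peaks, reference_probes):
    """Compare normalised peak intensities of a sample with those of a control.

    sample_peaks / control_peaks: dict mapping probe name -> peak height or area.
    reference_probes: probe names assumed (hypothetically) to have a normal
    copy number in both samples; used to normalise each run.
    """
    def normalise(peaks):
        ref_total = sum(peaks[name] for name in reference_probes)
        if ref_total <= 0:
            raise ValueError("reference peaks must have a positive total")
        return {name: value / ref_total for name, value in peaks.items()}

    sample = normalise(sample_peaks)
    control = normalise(control_peaks)
    # Ratio of normalised intensities, per probe present in both runs.
    return {
        name: sample[name] / control[name]
        for name in sample
        if name in control and control[name] > 0
    }


# Hypothetical peak values, for illustration only.
control = {"ref1": 1000, "ref2": 1000, "target": 800}
sample = {"ref1": 500, "ref2": 500, "target": 200}
ratios = relative_peak_ratios(sample, control, ["ref1", "ref2"])
```

A ratio near 1 suggests an unchanged copy number relative to the control, whereas a markedly lower or higher ratio points to a deletion or duplication. The cut-off values applied in practice are not specified here.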

MLPA is particularly suitable for detecting nucleic acid (pseudo)gene variants, (pseudo)gene-specific nucleotides and/or copy number variation. MLPA has been employed in several studies, e.g. for the diagnosis of Prader-Willi or Angelman syndromes, for prenatal diagnosis of chromosomal aberrations in fetuses, and for the detection of exon deletions and/or duplications in the Duchenne muscular dystrophy gene. Overall, the conclusion was that MLPA could replace the existing methods used for screening of chromosomal abnormalities due to its relative simplicity, reproducibility and speed.

In an MLPA assay, targeted nucleic acid which is gene-specific or pseudogene-specific is preferably present at the ligation site of the half probes. When a gene-specific or pseudogene-specific nucleotide is present at (or within three nucleotides from) a ligation site, this will ensure that only perfectly matched half probes are ligated to each other. A mismatch of a half probe at the ligation site prevents ligation and amplification, whereas a perfect match of the half probe at the ligation site allows ligation and amplification. As said before, this allows MLPA to discriminate between sequences that only differ in a single nucleotide. Mismatches at four to six nucleotides away from the ligation site have been reported to have little effect on the ligation step.
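The ligation-site rule described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names are hypothetical, the probe is compared with the sequence of a perfectly matched probe of equal length (so no reverse complement is computed), the ligation site is taken to be the 3′ end of the left half probe, and the 3′ terminal nucleotide is counted as distance 1. Hybridization stability is not modelled.

```python
def mismatch_distances_from_3prime(probe, perfect_match):
    """Return the distances (1 = 3' terminal nucleotide) of all mismatches."""
    if len(probe) != len(perfect_match):
        raise ValueError("sequences must have equal length")
    n = len(probe)
    return [
        n - i
        for i, (p, t) in enumerate(zip(probe, perfect_match))
        if p != t
    ]


def ligation_expected(probe, perfect_match, window=3):
    """True if no mismatch lies within `window` nucleotides of the ligation site.

    window=3 follows the "at (or within three nucleotides from)" wording;
    treating every mismatch outside the window as tolerated is an assumption.
    """
    distances = mismatch_distances_from_3prime(probe, perfect_match)
    return all(d > window for d in distances)


perfect = "ACGTACGTAC"
mismatch_at_site = "ACGTACGTAG"   # differs at the 3' terminal nucleotide
mismatch_far_away = "ACGTTCGTAC"  # differs six nucleotides from the 3' end
```

With these hypothetical sequences, `ligation_expected(mismatch_at_site, perfect)` is false and `ligation_expected(mismatch_far_away, perfect)` is true.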

Hence, the half probes are preferably designed such that the half probe whose 3′ end hybridizes at a target sequence (called herein a “left probe” or a “left half probe”) is complementary to a gene-specific sequence or pseudogene-specific sequence of the target sequence. This gene-specific or pseudogene-specific sequence of the target sequence comprises at least one but preferably more nucleotides that make the probe specific for a given gene or pseudogene. Preferably, at least one of the 3′ end nucleotides of said left half probe is complementary to at least one gene-specific nucleotide and/or at least one pseudogene-specific nucleotide of the target sequence, so that the (pseudo)gene-specific nucleotide(s) or a single nucleotide polymorphism within a given (pseudo)gene is present at (or within three nucleotides from) the ligation site of said left half probe. In this case, said left half probe and the probe whose 5′ end hybridizes at a target sequence (called herein a “right probe” or a “right half probe”) are ligated to each other only when the sequence of the left half probe perfectly matches its target sequence.

As used herein, the term “gene-specific nucleotide” or “gene-specific sequence” means a nucleotide or sequence, respectively, which is present in said gene but not present at the corresponding location in at least one other related gene or pseudogene. The term “pseudogene-specific nucleotide” or “pseudogene-specific sequence” means a nucleotide or sequence, respectively, which is present in said pseudogene but not present at the corresponding location in at least one other related gene or pseudogene. Hence, at least one other (pseudo)gene comprises another nucleotide or sequence at that location. The presence of a (pseudo)gene-specific nucleotide or (pseudo)gene-specific sequence in a (pseudo)gene thus distinguishes said (pseudo)gene from at least one other (pseudo)gene, even when the other (pseudo)gene has a high overall homology with said (pseudo)gene.

A pseudogene is defined herein as a nucleic acid sequence which does not encode a wild type, functional protein. The term “pseudogene” encompasses nucleic acid sequences which do not encode protein at all. Additionally, the term “pseudogene” encompasses gene alleles which comprise a modification, for instance an insertion or deletion, so that they encode a protein or a part of a protein with significantly impaired, or lost, function as compared to a wild type protein of the same kind. Such an allele for instance encodes a truncated protein as a result of a frame shift caused by an insertion and/or deletion of at least one nucleotide, or as a result of a premature stop codon.

Since ligases only ligate half probes which are adjacent to each other, half probes need to be designed which are capable of hybridizing immediately adjacent to each other on their target sequence. This is not always convenient, because the hybridization location of a left half probe on a target nucleic acid is often determined by a (pseudo)gene-specific site of the target nucleic acid (as explained above). In such case, the sequence of the corresponding right half probe is determined as well, since the right half probe should be capable of hybridizing to a region of said target nucleic acid which is immediately adjacent to said (pseudo)gene-specific nucleotide. However, such region may comprise sequences which are very commonly present in the nucleic acid sequences of a sample. As a result, a right half probe having a sequence which is complementary to such common sequence will hybridize at many different sites of the nucleic acids present in a sample. In such case, it would be more attractive to design a right half probe with a sequence which is more specific for a given site of interest of a target nucleic acid. However, if the left half probe and the right half probe do not hybridize to adjacent regions of a target nucleic acid, the commonly used ligases will not be capable of performing the ligation reaction. Patent application WO 01/61033 in the name of Schouten discloses a solution to this problem by adding a short third probe to the reaction mixture, which third probe will fill the gap between the left half probe and the right half probe. Such third probe is designed to hybridize to a region of a target nucleic acid which lies between the left and the right half probes. After hybridization of such third probe, the left half probe is connected to the right half probe via the third probe and ligation has become possible. 
The third probe does not need to be perfectly complementary to the region of the target nucleic acid which lies between the left and the right half probes, as long as the third probe connects the left half probe and the right half probe so that a ligase reaction can occur. Moreover, since the third probe is small, it will hybridize more easily to the target nucleic acid as compared to the left and right half probes. Hence, mismatches between the third probe and the target nucleic acid are allowed. This way, one and the same third probe is suitable for connecting left and right half probes of different probe sets.

Instead of using a third probe, WO 01/61033 also discloses an embodiment wherein the 3′ end of a left half probe is extended after hybridization of the half probes to the target sequence, so that the gap between the left half probe and the right half probe is filled. The resulting extended left half probe is adjacent to the right half probe and a ligase reaction has become possible.

In order to be capable of distinguishing between amplificates of different probe sets, currently used MLPA probe sets are designed such that the resulting amplificates have a different length. Differences in ligated probe length are typically realized by using a non-hybridizing stuffer sequence in one of the half probes. The stuffer sequence of the half probes of each probe set is unique in length, resulting in different lengths of amplification products that can be separated by electrophoresis. Typically, in order to be capable of discriminating between the different amplification products, the difference in length between different ligated probes is at least 5 nucleotides. Since a usual MLPA assay involves the use of many different probe sets in order to be capable of detecting a wide variety of (pseudo)gene variants, this means that long probes have to be generated. This is especially the case when complex loci carrying many (pseudo)gene-specific nucleotides are investigated for proper genotyping and/or additional single nucleotide polymorphisms are investigated for detection of subtle genetic variation within a specific genotype, as well as the presence of pseudogenes and single nucleotides in these pseudogenes. Such investigation requires the use of many different probe sets. This is inconvenient if probes are chemically synthesized, because a drawback of synthetic probes is their lower quality in comparison with cloned probes, due to contamination with incompletely synthesized probes. These incompletely synthesized probes lack or gain one nucleotide, which results in stutter peaks and split peaks. A method to remove these contaminants is to purify the synthesized probes, for instance by polyacrylamide gel electrophoresis (PAGE). If short and long probes are chemically synthesized, the longer probes are more likely to be affected by incomplete oligonucleotides, which limits the size of synthetic probes.
The upper limit of synthetic probes is typically about 100 base pairs.

On the other hand, the use of synthetic probes is preferred because they are easy to obtain and cost-effective whereas generating a probe by cloning in bacteriophage vectors is a time-consuming process and more expensive.

Hence, although good results have been obtained with currently used MLPA assays, it is desirable to provide alternatives and improvements, especially if complex (pseudo)gene loci are investigated which involves the use of many probe sets.

It is an object of the present invention to provide alternative and improved MLPA methods and MLPA-like methods.

Accordingly, the present invention provides MLPA assays and MLPA-like assays wherein at least one probe set is used which comprises a first nucleic acid probe (“left probe” or “left probe part”), a second nucleic acid probe (“right probe” or “right probe part”) and a third nucleic acid probe (“third probe” or “middle probe” or “middle probe part”), wherein at least one third probe is complementary to a target nucleic acid region comprising a (pseudo)gene-specific nucleotide or (pseudo)gene-specific sequence.

The present invention provides a different approach as compared to the prior art. MLPA methods and MLPA-like methods are now provided wherein at least one third probe, but preferably a plurality of third probes, is used in order to detect at least one (pseudo)gene-specific nucleotide of a target nucleic acid. Hence, an additional probe is used in at least one of the probe sets, which is specific for a (pseudo)gene-specific target nucleic acid. As used herein, an MLPA-like method is defined as a method comprising the steps of hybridisation of at least two probes to a target nucleic acid and ligation of at least two probes. Preferably, said MLPA-like method comprises amplification of ligated probes as well.

MLPA methods and MLPA-like methods according to the present invention have several advantages as compared to current methods. For instance, if the left probe and the third probe of a probe set are both complementary to target nucleic acid regions comprising (pseudo)gene-specific nucleotides and/or additional single nucleotide polymorphism(s), two different (pseudo)gene-specific target nucleotides or two SNPs or a combination of one (pseudo)gene-specific target nucleotide and one SNP are screened using one probe set. It has become possible to use one probe set in order to screen for at least two (pseudo)gene variations which are located within a region of about 150 nucleotides of a target nucleic acid. By contrast, in a currently used MLPA assay two separate probe sets are needed for screening for two variants in a target nucleic acid. This is illustrated by the following example. If a target (pseudo)gene contains a (pseudo)gene variant at location A and at location B, an individual may comprise the following alleles: a-b, a-B, A-b and A-B. In order to determine whether allele a-B is present in a sample of said individual, a currently used MLPA assay would need a probe set specific for the “a” and/or “A” (pseudo)gene variant and a probe set specific for the “B” and/or “b” (pseudo)gene variant. If both the probe set specific for “a” and the probe set specific for “B” provide a positive result, it is concluded that allele a-B is present in said individual. With an MLPA method according to the present invention, however, only one probe set is needed wherein the left probe is specific for the “a” (pseudo)gene variant and the third probe is specific for the “B” (pseudo)gene variant. If an amplification product is obtained, it is immediately concluded that allele a-B is present in said individual. If allele a-B is not present, said probe set according to the invention will not yield an amplification product.
Hence, it has become possible to more specifically screen for a given allele.

Moreover, a method of the invention provides an additional advantage when two (pseudo)gene variations are located close to each other. If the (pseudo)gene variants at location A and at location B are close to each other, the use of two different probe sets according to conventional MLPA techniques is inconvenient or even not possible at all, because the two probe sets will hinder each other in view of their close proximity. This will result in less efficient hybridization of the two probe sets, resulting in a lower signal as compared to a method according to the invention, wherein two (pseudo)gene variants can be detected using only one probe set. Hence, a method according to the invention is more sensitive when (pseudo)gene variants are located close to each other (in practice, this effect will be most pronounced when the (pseudo)gene variants are located between 20 and 100 nucleotides from each other). Having two probes to detect a variant at the same position (such as in currently used MLPA assays) will result in a change in signal intensity, depending on the presence of the (pseudo)gene variant and the binding of the probe. The use of more than two probes for one position is not advised. FIG. 1B schematically outlines an MLPA reaction according to the invention in which a probe set consisting of three probes is used for detecting two SNPs. FIG. 1C shows a non-limiting example of two specific probe sets according to the invention for detecting two SNPs.

As another example, in case that an individual is heterozygous for the above mentioned (pseudo)gene, the individual for instance contains alleles a-B and A-b. A conventional MLPA assay would use four probe sets (one specific for “a”, one specific for “A”, one specific for “b” and one specific for “B”). Four positive results would be obtained, because all four probe sets would hybridize and result in an amplification product. However, in such case it would still be unknown whether the individual comprises the alleles a-b and A-B, or the alleles a-B and A-b. With a method according to the present invention, however, it has become possible to directly identify the alleles of said individual. For instance, a first probe set of the invention is used comprising a left probe specific for “a” and a third probe specific for “b”, together with a second probe set of the invention comprising a left probe specific for “a” and a third probe specific for “B” and a third probe set of the invention comprising a left probe specific for “A” and a third probe specific for “b” and a fourth probe set of the invention comprising a left probe specific for “A” and a third probe specific for “B”. Two of these probe sets according to the present invention will yield an amplification product, namely the second probe set of the invention comprising a left probe specific for “a” and a third probe specific for “B” and the third probe set of the invention comprising a left probe specific for “A” and a third probe specific for “b”. The first and fourth probe sets according to the present invention will not yield (significant) amplification product. This way, it is immediately apparent which alleles are present in said individual. 
This, too, is an advantage as compared to currently used methods, especially when complex loci with many (pseudo)gene-specific nucleotides and additional single nucleotide polymorphisms within a given (pseudo)gene are investigated, because in such case many different combinations of such (pseudo)gene variants need to be screened for.
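The heterozygous example above can be made concrete with a small Python sketch. It is a logical model only, under stated assumptions: alleles are represented as pairs of variant labels, a probe set is assumed to yield a product whenever a matching allele is present, and the function names are hypothetical.

```python
from itertools import product


def conventional_signals(alleles):
    """Four two-probe sets, each specific for one variant at one location."""
    signals = {}
    for variant in ("a", "A"):
        signals[variant] = any(allele[0] == variant for allele in alleles)
    for variant in ("b", "B"):
        signals[variant] = any(allele[1] == variant for allele in alleles)
    return signals


def three_probe_signals(alleles):
    """Four three-probe sets: left probe specific for location A,
    third probe specific for location B. A product requires both
    variants on the same allele."""
    signals = {}
    for left, third in product(("a", "A"), ("b", "B")):
        signals[left + "-" + third] = any(
            allele == (left, third) for allele in alleles
        )
    return signals


individual_1 = [("a", "B"), ("A", "b")]  # alleles a-B and A-b
individual_2 = [("a", "b"), ("A", "B")]  # alleles a-b and A-B

# Conventional probe sets give four positive results for both individuals,
# so the two genotypes cannot be told apart.
same = conventional_signals(individual_1) == conventional_signals(individual_2)

# Three-probe sets give a different pattern for each individual.
pattern_1 = three_probe_signals(individual_1)
pattern_2 = three_probe_signals(individual_2)
```

In this model, `pattern_1` is positive only for “a-B” and “A-b”, while `pattern_2` is positive only for “a-b” and “A-B”.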

Another advantage of a method according to the present invention is the fact that more variations in length of the ligated probes are obtained. Since at least one probe set of the invention, but preferably a plurality of probe sets of the invention, comprise a third probe it has become possible to design the probe sets such that variations in length of the resulting ligated probes are obtained. This obviates the need of stuffer sequences. As a result, the individual probes of a probe set according to the invention can be kept shorter, which is particularly advantageous when chemically synthesized probes are used because chemical production of long probes is cumbersome, as explained above. Hence, a method according to the invention allows for the use of probe sets with relatively short probes, while the resulting ligated probes are long enough to allow for many size variations. Thus, the present invention allows the use of synthetic probes, which are easy to obtain and cost-effective, even when complex loci are investigated, and offers greater flexibility to adapt the assay in case of cross-reactivity or unclear results.

For instance, if 20 (pseudo)gene variants are investigated, probes with a stuffer sequence with a length varying from 4 to 100 nucleotides would need to be used in a conventional MLPA assay in order to be capable of distinguishing the resulting amplification products by size. Since the probe sequences hybridizing to a target sequence are typically about 30 nucleotides, and since the primer binding sequences of the probes are typically about 15-25 nucleotides, this would mean that probe sets with probes with a length varying from 45-125 nucleotides would need to be synthesized. When the probes are chemically synthesized, it is hardly possible to obtain reliable probe sets with these lengths. With a method according to the invention, however, differences in length between the various amplificates need not be obtained by use of stuffer sequences in the probe sets. Instead, at least one third probe is used, preferably a plurality of third probes is used. By varying combinations of three probes, optionally in combination with probe sets consisting of two probes, the overall length differences of the ligated probes vary considerably whereas probe sets can be used with chemically synthesized probes with convenient lengths. Of course, this does not mean that the use of stuffer sequences is excluded. However, the skilled person no longer has to rely on these stuffer sequences alone for length variations. If stuffer sequences are used in a method according to the invention, it is preferred to keep these sequences as short as possible.
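The length argument above can be checked with a small Python sketch. It is illustrative only: the class `ProbeSet`, the function `design_problems` and all probe lengths are hypothetical, while the defaults of a minimum spacing of 5 nucleotides between ligated probes and an upper limit of about 100 nucleotides per synthetic probe are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSet:
    name: str
    left_len: int        # full length of the left probe, in nucleotides
    right_len: int       # full length of the right probe, in nucleotides
    middle_len: int = 0  # 0 means the probe set has no third probe

    @property
    def ligated_len(self):
        return self.left_len + self.middle_len + self.right_len


def design_problems(probe_sets, min_gap=5, max_oligo=100):
    """List probes that are too long and ligated products that are too close."""
    problems = []
    for ps in probe_sets:
        longest = max(ps.left_len, ps.middle_len, ps.right_len)
        if longest > max_oligo:
            problems.append(f"{ps.name}: probe of {longest} nt exceeds {max_oligo} nt")
    ordered = sorted(probe_sets, key=lambda ps: ps.ligated_len)
    for shorter, longer in zip(ordered, ordered[1:]):
        if longer.ligated_len - shorter.ligated_len < min_gap:
            problems.append(f"{shorter.name}/{longer.name}: ligated lengths too close")
    return problems


# Hypothetical design relying on stuffers only: the right probe grows by
# 5 nt per probe set, so the later sets need probes longer than 100 nt.
stuffer_only = [ProbeSet(f"two-{i}", 50, 50 + 5 * i) for i in range(20)]

# Hypothetical design with third probes: length variation comes from the
# middle probe, and every individual probe stays well below 100 nt.
with_third_probe = [ProbeSet(f"three-{i}", 50, 50, 30 + 5 * i) for i in range(10)]
```

Here `design_problems(with_third_probe)` returns an empty list, whereas `design_problems(stuffer_only)` reports the probes that exceed the assumed synthesis limit.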

Accordingly, the present invention provides a method for screening for the presence of at least one target nucleic acid sequence in a sample, comprising the steps of:

a) adding to said sample at least two different probe sets, each probe set comprising:

- a first nucleic acid probe (“left probe”), said first probe comprising a first nucleic acid sequence complementary to a first region of said target nucleic acid sequence, and
- a second nucleic acid probe (“right probe”), said second probe comprising a second nucleic acid sequence complementary to a second region of said target nucleic acid sequence,
  
  wherein at least one of said probe sets comprises a third nucleic acid probe, said third probe comprising a third nucleic acid sequence complementary to a third region of said target nucleic acid sequence, and
  
  wherein, if said third probe is present in said probe set, said first and said third region of said target nucleic acid are located essentially adjacent to each other and said third and said second region of said target nucleic acid are located essentially adjacent to each other, and
  
  wherein, if said third probe is not present in said probe set, said first and said second region of said target nucleic acid are located essentially adjacent to each other,

b) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said sample,

c) subjecting nucleic acid of said sample to a ligation reaction, and

d) determining whether said at least one target nucleic acid sequence is present in said sample,

wherein at least one third nucleic acid probe is complementary to a target nucleic acid region comprising a (pseudo)gene variation.

The advantage of probe sets comprising at least three probes according to the present invention is that at least two different SNPs can be detected with one probe set. For instance, in a probe set comprising three probes two sites for ligation are present. A left probe and middle probe are ligated, and a middle probe and right probe are ligated. At each ligation site a SNP can be detected. Thus it is possible to design two probes of the same probe set in such a way that they are used to detect two SNPs. In that case, using MLPA and a probe set comprising three probes according to the invention, a product will only be obtained when both SNPs are present in a sample, because only then ligation can occur at both ligation sites.
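The requirement that ligation occurs at both ligation sites can be sketched at sequence level in Python. The model is deliberately strict and simplified: all sequences are hypothetical, they are written in the same sense as the probes (so no reverse complement is computed), and a probe is assumed to hybridize only when it matches perfectly over its whole length, which is stricter than the definition of “complementary” used in this description.

```python
def ligated_product(target, left, right, middle=None):
    """Return the ligated probe sequence, or None if ligation cannot occur.

    The probes must match the target perfectly and lie directly adjacent
    to each other, in the order left, (middle,) right.
    """
    parts = [left, right] if middle is None else [left, middle, right]
    joined = "".join(parts)
    return joined if joined in target else None


# Hypothetical probes; the 3' terminal nucleotide of LEFT and of MIDDLE
# sits on the first and the second SNP, respectively.
LEFT = "ACGTTGCA"
MIDDLE = "TTGGCCAT"
RIGHT = "GGATCCAA"

target_both_snps = "CC" + "ACGTTGCA" + "TTGGCCAT" + "GGATCCAA" + "CC"
target_first_snp_only = "CC" + "ACGTTGCA" + "TTGGCCAC" + "GGATCCAA" + "CC"
target_second_snp_only = "CC" + "ACGTTGCG" + "TTGGCCAT" + "GGATCCAA" + "CC"

product_both = ligated_product(target_both_snps, LEFT, RIGHT, MIDDLE)
product_first = ligated_product(target_first_snp_only, LEFT, RIGHT, MIDDLE)
product_second = ligated_product(target_second_snp_only, LEFT, RIGHT, MIDDLE)
```

Only `product_both` is a ligated sequence in this model. The other two calls return `None`, because one of the two ligation sites carries a mismatch.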

With conventional MLPA probe sets consisting of two probes only one SNP can be detected, because only one site for ligation is present. Additional third probe parts in conventional MLPA, as described in WO 01/61033, are occasionally used to bridge the two half probes. Such an additional third probe part is not SNP-specific. Therefore, the advantages of probe sets comprising at least three probes according to the present invention are not obtained when using such additional third probe part for bridging purposes in conventional MLPA.

Therefore, in a preferred embodiment of the invention a probe set comprises three nucleic acid probes wherein at least two of said nucleic acid probes are each specific for a different (pseudo)gene variation. Preferably, a first (or a second) nucleic acid probe of a probe set according to the invention is complementary to a target nucleic acid region comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or a polymorphism within a given gene or pseudogene, and a third nucleic acid probe of the same probe set is complementary to another target nucleic acid region comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or a polymorphism within a given gene or pseudogene. Said polymorphism preferably comprises an SNP.

Preferably, ligated probes are amplified. Accordingly, the present invention provides a method for screening for the presence of at least one target nucleic acid sequence in a sample, comprising the steps of:

a) adding to said sample at least two different probe sets, each probe set comprising:

- a first nucleic acid probe (“left probe”), said first probe comprising a first nucleic acid sequence complementary to a first region of said target nucleic acid sequence and, located 5′ thereof, a non-complementary nucleic acid sequence comprising a first primer binding site, and
- a second nucleic acid probe (“right probe”), said second probe comprising a second nucleic acid sequence complementary to a second region of said target nucleic acid sequence and, located 3′ thereof, a non-complementary nucleic acid sequence comprising a second primer binding site,

wherein at least one of said probe sets comprises a third nucleic acid probe, said third probe comprising a third nucleic acid sequence complementary to a third region of said target nucleic acid sequence, and

wherein, if said third probe is present in said probe set, said first and said third region of said target nucleic acid are located essentially adjacent to each other and said third and said second region of said target nucleic acid are located essentially adjacent to each other, and

wherein, if said third probe is not present in said probe set, said first and said second region of said target nucleic acid are located essentially adjacent to each other,

b) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said sample,

c) subjecting nucleic acid of said sample to a ligation reaction,

d) subjecting nucleic acid of said sample to a nucleic acid amplification reaction, using at least one primer capable of specifically binding said first primer binding site and at least one primer capable of specifically binding said second primer binding site, and

e) determining whether amplified nucleic acid is present, thereby determining whether said at least one target nucleic acid sequence is present in said sample,

wherein at least one third nucleic acid probe is complementary to a target nucleic acid region comprising a (pseudo)gene variation.

As used herein, the term “(pseudo)gene variation” encompasses a (pseudo)gene-specific nucleotide and/or a (pseudo)gene-specific sequence. In one embodiment, said (pseudo)gene variation comprises an additional polymorphism within a given (pseudo)gene. Said additional polymorphism preferably comprises an SNP.

Hence, the present invention uses probe sets, wherein at least one probe set, but preferably a plurality of probe sets, comprises three probes. The probes comprise sequences which are complementary to a region of a target nucleic acid of interest. As used herein, the term “complementary” means that said probe sequence comprises at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to said region or to the complement of said region. The term “% sequence identity” is defined herein as the percentage of residues in a nucleotide sequence that is identical with the residues in a reference sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. One computer program which may be used or adapted for purposes of determining whether a candidate sequence falls within this definition is Autoassembler 2.0 (ABI Prism, Perkin Elmer).
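The definition of “% sequence identity” given above can be approximated with a short Python sketch. It is not the cited alignment program and rests on stated assumptions: gaps are introduced freely, so the maximum number of identical aligned residues equals the length of the longest common subsequence, and the percentage is expressed relative to the length of the candidate sequence. The function names and the example sequences are hypothetical.

```python
def max_identical_residues(a, b):
    """Length of the longest common subsequence of a and b, i.e. the
    maximum number of identical residues over all gapped alignments."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def percent_identity(candidate, reference):
    """Percentage of candidate residues identical to the reference
    (assumption: the denominator is the candidate length)."""
    if not candidate:
        raise ValueError("candidate sequence must not be empty")
    return 100.0 * max_identical_residues(candidate, reference) / len(candidate)


def is_complementary(candidate, reference, threshold=70.0):
    """Apply the lowest threshold of the definition (at least 70%)."""
    return percent_identity(candidate, reference) >= threshold


region = "ACGTACGTAC"
probe_with_one_substitution = "ACGTTCGTAC"
identity = percent_identity(probe_with_one_substitution, region)
```

Because gaps carry no penalty in this sketch, it tends to report higher identities for short or repetitive sequences than a scored alignment would. A dedicated alignment tool is preferable for real probe design.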

The first and second probes of each probe set also comprise a primer binding site, so that the resulting ligated probes can be amplified. Preferably, the primer binding sites of the first nucleic acid probes of each probe set are designed such that the same primer can bind. This allows the use of the same primer for binding the primer binding sites of the first probes in step d). Likewise, it is preferred that the primer binding sites of the second nucleic acid probes of each probe set are designed such that the same primer can bind. Most preferably, the probe sets are designed such that a first primer is capable of specifically binding the primer binding sites of the first nucleic acid probes of each probe set and a second primer is capable of specifically binding the primer binding sites of the second nucleic acid probes of each probe set. This embodiment allows the use of only one primer pair in step d). This is, however, not necessary: it is also possible to use different primers for different probe sets. The number of different primers is, however, preferably kept as low as possible.

One preferred embodiment therefore provides a method according to the invention, wherein the first primer binding sites of the first nucleic acid probes of each probe set are capable of specifically binding the same primer and/or wherein the second primer binding sites of the second nucleic acid probes of each probe set are capable of specifically binding the same primer. Preferably, the first nucleic acid probes and/or the second nucleic acid probes of each probe set comprise essentially identical primer binding sequences. Further provided is therefore a method according to the invention, wherein the non-complementary nucleic acid sequences of said first nucleic acid probes comprise essentially identical first primer binding sites and/or wherein the non-complementary nucleic acid sequences of said second nucleic acid probes comprise essentially identical second primer binding sites. Using essentially identical primer binding sequences ensures that the same primer can bind different probes. The term “essentially identical primer binding sequences” is defined herein as primer binding sequences which comprise at least 80%, preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to each other.

As already described, a method according to the invention is particularly suitable for investigating a nucleic acid sequence having various (pseudo)gene specific nucleotides and/or (pseudo)gene variants, such as complex loci. It is therefore preferred to use a plurality of third probes, so that many (pseudo)gene variant combinations are investigated. A method according to the invention is therefore preferably provided wherein at least two, preferably at least five, more preferably at least ten different third nucleic acid probes are used. As illustrated in the Examples, a plurality of probe sets comprising different third probes according to the invention allows for screening of complex gene loci such as the KIR locus. Not all third probes need to be specific for a genetic variation of a target nucleic acid. It is also possible to use a combination of variant-specific third probes and third probes which are not specific for a (pseudo)gene variation. Likewise, not all first probes need to be specific for a variant of a target nucleic acid. It is also possible to use a combination of variant-specific first probes and first probes which are not specific for a (pseudo)gene variation. Any of these combinations is for instance used to vary the length of the resulting ligated probes to a larger extent. In one preferred embodiment of the invention, therefore, at least 50%, preferably at least 70%, more preferably at least 80%, most preferably at least 90% of the third nucleic acid probes are complementary to a target nucleic acid region comprising a (pseudo)gene variation. In one embodiment, all third probes are complementary to a target nucleic acid region comprising a (pseudo)gene variant. Preferably, the second probes (“right probes”) are not designed to contain (pseudo)gene variant-specific sequences, although the use of variant-specific right probes in a method according to the invention is not excluded.

Preferably, at least 50%, preferably at least 70%, more preferably at least 80%, most preferably at least 90% of the third nucleic acid probes that are complementary to a target nucleic acid region comprising a (pseudo)gene variation are combined with a first nucleic acid probe or a second nucleic acid probe that is complementary to another target nucleic acid region comprising a (pseudo)gene variation in order to be capable of screening for many variants with one MLPA assay or MLPA-like assay. In one embodiment, all third probes that are combined with a first nucleic acid probe or a second nucleic acid probe that is complementary to a target nucleic acid region comprising a (pseudo)gene variation are complementary to a target nucleic acid region comprising a (pseudo)gene variant. Of course, these probes are preferably specific for different variants.

In one preferred embodiment, a (pseudo)gene variant-specific sequence of a third probe is located at least within the last three nucleotides or the first three nucleotides of the third probe. This means that the last three nucleotides and/or the first three nucleotides comprise at least one nucleotide which is specific for a (pseudo)gene variation of a target nucleic acid. In this embodiment, said (pseudo)gene variation is present at a ligation site of the third probe, so that ligation is only possible when the sequence of the third probe is exactly complementary to said (pseudo)gene variation. This enhances the specificity of the MLPA method, as explained before. Preferably, the last three nucleotides and/or the first three nucleotides of said third probe comprise one nucleotide which is specific for a (pseudo)gene variant of a target nucleic acid.
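As a non-limiting illustration of this positional criterion, the following sketch tests whether a variant-specific nucleotide lies within the first three or last three nucleotides of a third probe. The function name, the 0-based position convention and the example probe length are assumptions introduced for illustration only.

```python
def variant_at_ligation_site(probe_length: int, variant_positions, window: int = 3) -> bool:
    """Return True if any variant-specific position lies in the first or last
    `window` nucleotides of the probe.

    Assumption: positions are 0-based indices counted from the 5' end.
    """
    for position in variant_positions:
        if not 0 <= position < probe_length:
            raise ValueError("variant position lies outside the probe")
    return any(
        position < window or position >= probe_length - window
        for position in variant_positions
    )


# Hypothetical 30-nucleotide third probe.
print(variant_at_ligation_site(30, [29]))  # variant at the very last nucleotide
print(variant_at_ligation_site(30, [15]))  # variant in the middle of the probe
```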

The probe sets according to the present invention preferably have a length between 90 and 300 nucleotides. Cloned probes can be as long as 500 nucleotides. Preferably, however, chemically synthesized probes are used because they are rapidly synthesized, easy to obtain and cost-effective. In order to be capable of synthetically producing the probes according to the present invention, a method according to the invention is preferably provided wherein third nucleic acid probes with a length of between 19 and 110 nucleotides are used. Most preferably, third nucleic acid probes with a length of between 20 and 100 nucleotides are used. Since at least one probe set of the invention, but preferably a plurality of probe sets according to the invention, is used which comprises three nucleic acid probes, sufficient variation in length and specificity of the resulting ligated probes is ensured so that many (pseudo)gene variations can be investigated simultaneously.

These length variations of the resulting ligated probes obviate the need for stuffer sequences, as explained before. It is therefore possible to design the probe sets such that the parts of the first and/or second probe which are not complementary to a target nucleic acid have about the same length. According to this embodiment, the length of the non-complementary sequences of all first probes is about the same in each probe set, and/or the length of the non-complementary sequences of all second probes is about the same in each probe set. These lengths are about the same when they do not differ from each other by more than 10 nucleotides. Preferably, they do not differ from each other by more than 6 nucleotides, most preferably they do not differ from each other by more than 4 nucleotides. This, too, facilitates synthetic production of the probes. Further provided is therefore a method according to the invention, wherein the difference in length of said non-complementary nucleic acid sequences of said first nucleic acid probes of said at least two different probe sets and/or the difference in length of said non-complementary nucleic acid sequences of said second nucleic acid probes of said at least two different probe sets is less than 6, preferably less than 4 nucleotides.
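The "about the same length" criterion can be illustrated with a minimal sketch. The function name and the example lengths are hypothetical; the default limit of 10 nucleotides is the broadest limit given above, and the comparison is taken as "not more than" in line with that definition.

```python
def lengths_about_the_same(lengths, max_difference: int = 10) -> bool:
    """True when the longest and shortest non-complementary sequences differ
    by no more than `max_difference` nucleotides."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one length is required")
    return max(lengths) - min(lengths) <= max_difference


# Hypothetical lengths of the non-complementary parts of the first probes
# of four probe sets.
first_probe_tails = [19, 21, 19, 23]
print(lengths_about_the_same(first_probe_tails))                     # limit of 10
print(lengths_about_the_same(first_probe_tails, max_difference=4))   # most preferred limit
```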

Besides the analysis of (pseudo)gene-specific nucleotides and additional single nucleotide polymorphisms, an MLPA technique or MLPA-like technique is particularly suitable for relative (pseudo)gene copy number determination. If multiple copies of a (pseudo)gene of interest (or any other target nucleic acid of interest) are present in sample nucleic acid molecules, each copy will, in principle, be bound by the specific probes, and this binding is detectable. When the probes are amplified, more amplification product will be present when multiple copies were present in the original sample nucleic acid as compared to a situation wherein only one copy is present. Analysis of the amount of amplification product thus provides information about the copy number of a target nucleic acid of interest. This is often done by graphically representing amplified products by separate peaks. Each peak is the product of an amplified MLPA ligated probe and a relative difference in peak intensity (height or surface) between a control sample and a sample of interest indicates copy number variation. When a complex locus is investigated, multiple copies of a (pseudo)gene of interest can be present in highly polymorphic regions. In such a case, when (pseudo)gene copy number is to be determined, many different combinations of (pseudo)gene variants need to be taken into account. This involves the use of a wide variety of different probe sets, to ensure that each combination of (pseudo)gene variants can be detected. In one embodiment according to the present invention, however, when the relative copy number of a nucleic acid of interest is to be estimated, an improved approach is provided. According to this embodiment, at least one probe is used with degenerate bases at one or more positions. This means that a mixture of probes is used wherein different nucleotides can be present at one or more positions.
Hence a mixture of probes is used, which probes have the same sequence, except for the fact that some probes have a certain nucleotide at a given position X and some probes have another nucleotide at said position X. Such degenerate bases are commonly represented by the IUB nucleotide codes as depicted in FIG. 2. The use of probes with degenerate bases allows for an efficient estimation of copy number of a nucleic acid of interest, even in highly polymorphic regions. Further provided is therefore a method for determining the copy number of a nucleic acid of interest, wherein at least one probe set is used which comprises a probe with (a) degenerate base(s) at one or more positions. Preferably, at most 20 probe positions have such multiple alternatives, in order to retain specificity of the probes for a given target region of interest. A use of at least one probe set for determining the copy number of a nucleic acid of interest, wherein at least one probe set comprises a probe with (a) degenerate base(s) at one or more positions, is also provided herewith. In one preferred embodiment, at least one probe set comprising a probe with (a) degenerate base(s) is used in a MLPA method or MLPA-like method according to the present invention. Further provided is therefore a method according to the invention, wherein at least one probe set is used which comprises a probe with (a) degenerate base(s) at one or more positions.
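The relation between a probe with degenerate bases and the corresponding mixture of probes can be illustrated as follows. This is a sketch under stated assumptions: FIG. 2 is not reproduced here, so the table in the code is the standard IUPAC-IUB ambiguity code set, which is assumed to correspond to the codes of FIG. 2. The function name and the example sequence are hypothetical.

```python
from itertools import product

# Standard IUPAC-IUB ambiguity codes (assumed to match FIG. 2).
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate_probe(probe: str):
    """List every concrete sequence represented by a degenerate probe."""
    try:
        choices = [IUB_CODES[base] for base in probe.upper()]
    except KeyError as error:
        raise ValueError(f"unknown nucleotide code: {error.args[0]}") from None
    return ["".join(combination) for combination in product(*choices)]


# Hypothetical probe fragment with two degenerate positions (R = A/G, Y = C/T).
for sequence in expand_degenerate_probe("ARY"):
    print(sequence)
```

The size of the mixture is the product of the number of alternatives at each degenerate position, which grows quickly and is consistent with limiting the number of degenerate positions in order to retain specificity.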

Alternatively, or additionally, a probe set is used which comprises an alternative base which alternative base is capable of binding at least two bases selected from the group consisting of A, T, G, C and U. Preferably, said alternative base is capable of binding at least three, most preferably at least four, bases selected from the group consisting of A, T, G, C and U. Such alternative base is suitable as an alternative for degenerate bases. It is, of course, also possible to combine such alternative base with degenerate bases. In a particularly preferred embodiment said alternative base is deoxyinosine triphosphate (dITP) or a functional equivalent thereof, which is capable of binding A and T and G and C and U. Further provided is therefore a method for determining the copy number of a nucleic acid of interest, wherein at least one probe set is used which comprises an alternative base which is capable of binding at least two, preferably at least three, more preferably at least four bases selected from the group consisting of A, T, G, C and U. As said before, said alternative base preferably comprises deoxyinosine triphosphate (dITP) or a functional equivalent thereof. A use of at least one probe set for determining the copy number of a nucleic acid of interest, wherein at least one probe set comprises an alternative base which is capable of binding at least two, preferably at least three, more preferably at least four bases selected from the group consisting of A, T, G, C and U, is also provided herewith. In one preferred embodiment, at least one probe set comprising such alternative base(s) is used in a MLPA method or MLPA-like method according to the present invention. 
Further provided is therefore a method according to the invention, wherein at least one probe set is used which comprises an alternative base which is capable of binding at least two, preferably at least three, more preferably at least four bases selected from the group consisting of A, T, G, C and U. As said before, said alternative base preferably comprises deoxyinosine triphosphate (dITP) or a functional equivalent thereof.

The present invention provides alternative and improved methods for screening for the presence of at least one target nucleic acid sequence in a sample, wherein at least one third probe is used which is complementary to a target nucleic acid region comprising a (pseudo)gene variation. A use of a probe set comprising at least three nucleic acid probes, wherein at least one third probe is complementary to a target nucleic acid region comprising a gene variant and/or a pseudogene variant, for screening for the presence of at least one target nucleic acid sequence in a sample is therefore also provided. Preferably, a plurality of probe sets according to the present invention is used. Further provided is therefore a use of a plurality of probe sets for screening for the presence of at least one target nucleic acid sequence in a sample, wherein each of said probe sets comprises:

- a first nucleic acid probe, said first probe comprising
  - a first nucleic acid sequence complementary to a first region of said target nucleic acid sequence and, located 5′ thereof, a non-complementary nucleic acid sequence comprising a first primer binding site, and
- a second nucleic acid probe, said second probe comprising
  - a second nucleic acid sequence complementary to a second region of said target nucleic acid sequence and, located 3′ thereof, a non-complementary nucleic acid sequence comprising a second primer binding site,

wherein, if said third probe is not present in said probe set, said first and said second region of said target nucleic acid are located essentially adjacent to each other, and

wherein at least one third nucleic acid probe is complementary to a target nucleic acid region comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within a given gene or pseudogene, said polymorphism preferably comprising an SNP.

A method according to the present invention is particularly suitable for analysis of (pseudo)gene variation and (pseudo)gene copy number determination in complex loci such as those of the genes encoding complement factors (e.g. Factor H and FH-like genes, C4A and C4B within the HLA-class III region), chemokines and their receptor alleles (e.g. CCL3L1, CCL4L1, CCR5 or CCR5delta32), HLA-class I and II, SIRPs and LILRs.

In one preferred embodiment, a method according to the invention is used in order to investigate the killer cell immunoglobulin-like receptor (KIR) locus. KIRs are expressed by natural killer (NK) cells and a subset of T cells. NK cells are cells of the lymphoid lineage, but display no antigen-specific receptors. Their main function is to monitor host cells for the presence of MHC class I molecules and this is important for e.g. distinguishing healthy cells from virus-infected or tumor cells. Interaction between NK cells and MHC class I molecules is mediated by KIRs. The KIR locus in humans is polygenic and highly polymorphic, so that accurate and efficient characterization of an individual's KIR (pseudo)gene profile is cumbersome. For determining the KIR (pseudo)gene profile and the role of KIR (pseudo)genes in many diseases, an efficient and reliable method for KIR genotyping is, however, important. Until now, KIR genotyping has been based upon the polymerase chain reaction sequence-specific primer (PCR-SSP) method (Sun et al, 2004), multiplex PCR (Vilches et al, 2007) and PCR-sequence specific oligonucleotide probes (PCR-SSOP) (Crum et al, 2000). For PCR-SSP, high-quality genomic DNA is required and multiple reactions are needed to generate a complete KIR profile of an individual. Multiple copies of KIR2DL4 and KIR3DL1/S1 in individuals have been reported with PCR-SSOP (Williams et al, 2003). Detection of the multiple gene copies was possible because the copies of these genes consisted of different alleles. However, multiple gene copies of highly homologous or identical sequences are not distinguishable with this molecular detection system or cloning methods when individuals are homozygous for a gene (Williams et al, 2003).

As shown in the Examples, a method according to the present invention is particularly suitable for investigating the KIR locus of individuals. Even though this locus is highly polymorphic, (pseudo)gene variants and copy number variations are efficiently detected with methods according to the present invention. One preferred embodiment therefore provides a method or use according to the invention, wherein said target nucleic acid sequence is present in a KIR locus. Preferably, copy number variation of at least one KIR gene and/or at least one KIR pseudogene is determined. FIGS. 3A and B provide KIR-specific probes which provide particularly good results. These probes are therefore preferred when a KIR locus is investigated. FIGS. 3C and D provide an extended list of KIR-specific probes which provide even better results than the probes listed in FIGS. 3A and B. Therefore, these probes are even more preferred when a KIR locus is investigated. FIGS. 3F and G also provide preferred probes. Further provided is thus a method and/or a use according to the invention, wherein at least one probe depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G, is used. Preferably, at least one probe depicted in FIG. 3C, 3D, 3F or 3G is used. More preferably, at least two probes depicted in FIG. 3 are used. In another preferred embodiment at least four probes, more preferably at least six probes depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G, preferably depicted in FIG. 3C, 3D, 3F or 3G, are used.

In a particularly preferred embodiment, a probe set of FIG. 3 is used. Said probe set preferably comprises three probes. A probe set of FIG. 3 is formed by two or three individual probes depicted in FIG. 3 which have the same number, followed by the letter A, B, C, D, E, G, K, L, M or N. For instance, probe set 408 is formed by probes 408A, 408B and 408C. Optionally, four different probes with the same number are given for a probe set of FIG. 3. In that case, a left, a middle and a right probe is selected from said four probes. Further provided is therefore a method and/or a use according to the invention, wherein at least one probe set depicted in FIG. 3A selected from the group consisting of probe set 408, probe set 507, probe set 419, probe set 528, probe set 413, probe set 416, probe set 415 and probe set 418 is used. In a particularly preferred embodiment at least one probe set depicted in FIG. 3A selected from the group consisting of probe set 408, probe set 507, probe set 528, probe set 413, probe set 416 and probe set 415 is used. These probe sets contain a third probe which is specific for a (pseudo)gene variant of the KIR locus. Also provided is a method and/or a use according to the invention, wherein at least one probe set depicted in FIG. 3B selected from the group consisting of probe set 409, probe set 506, probe set 507, probe set 538, probe set 417 and probe set 517 is used. In a particularly preferred embodiment at least one probe set depicted in FIG. 3B selected from the group consisting of probe set 409, probe set 506, probe set 507, probe set 538, probe set 417 and probe set 517 is used. These probe sets also contain a third probe which is specific for a (pseudo)gene variant of the KIR locus. Also provided is a method and/or a use according to the invention, wherein at least one probe set depicted in FIG. 
3C selected from the group consisting of probe set 415, probe set 703, probe set 413, probe set 419, probe set 702, probe set 711, probe set 408, probe set 507, probe set 710, probe set 528, probe set 418 and probe set 416 is used. In a particularly preferred embodiment at least one probe set depicted in FIG. 3C selected from the group consisting of probe set 415, probe set 703, probe set 413, probe set 419, probe set 702, probe set 711, probe set 408, probe set 507, probe set 710, probe set 528, probe set 418 and probe set 416 is used. These probe sets also contain a third probe which is specific for a (pseudo)gene variant of the KIR locus. Also provided is a method and/or a use according to the invention, wherein at least one probe set depicted in FIG. 3D selected from the group consisting of probe set 506, probe set 417, probe set 517, probe set 409, probe set 507, probe set 710, probe set 709, probe set 708, probe set 704 and probe set 538 is used. In a particularly preferred embodiment at least one probe set depicted in FIG. 3D selected from the group consisting of probe set 506, probe set 417, probe set 517, probe set 409, probe set 507, probe set 710, probe set 709, probe set 708, probe set 704 and probe set 538 is used. These probe sets also contain a third probe which is specific for a (pseudo)gene variant of the KIR locus. Also provided is a method and/or a use according to the invention, wherein at least one probe set depicted in FIG. 3F selected from the group consisting of probe set 415, probe set 413, probe set 419, probe set 702, probe set 711, probe set 408, probe set 507, probe set 710, probe set 528, probe set 418 and probe set 416 is used. These probe sets also contain a third probe which is specific for a (pseudo)gene variant of the KIR locus. Also provided is a method and/or a use according to the invention, wherein at least one probe set depicted in FIG. 
3G selected from the group consisting of probe set 506, probe set 417, probe set 517, probe set 409, probe set 507, probe set 710, probe set 709, probe set 708, probe set 704 and probe set 538 is used. These probe sets also contain a third probe which is specific for a (pseudo)gene variant of the KIR locus.

It is preferred to use at least two probe sets selected from FIG. 3, so that various KIR (pseudo)gene variants are screened for with good results. More preferably, at least three probe sets selected from FIG. 3 are used. Even more preferably, at least four, more preferably at least five, most preferably at least six probe sets selected from FIG. 3 are used. Said at least two, three, four, five or six probe sets are preferably selected from the group consisting of probe set 506, probe set 417, probe set 517, probe set 409, probe set 507, probe set 710, probe set 709, probe set 708, probe set 704 and probe set 538 of FIG. 3A, probe set 415, probe set 413, probe set 419, probe set 702, probe set 711, probe set 408, probe set 507, probe set 710, probe set 528, probe set 418 and probe set 416 of FIG. 3B, probe set 506, probe set 417, probe set 517, probe set 409, probe set 507, probe set 710, probe set 709, probe set 708, probe set 704 and probe set 538 of FIG. 3C, probe set 415, probe set 703, probe set 413, probe set 419, probe set 702, probe set 711, probe set 408, probe set 507, probe set 710, probe set 528, probe set 418 and probe set 416 of FIG. 3D, probe set 409, probe set 506, probe set 507, probe set 538, probe set 417 and probe set 517 of FIG. 3F and probe set 408, probe set 507, probe set 528, probe set 413, probe set 416 and probe set 415 of FIG. 3G, since these probe sets contain a third probe which is specific for a (pseudo)gene variant of the KIR locus. In one embodiment, all probe sets depicted in FIGS. 3A, and/or 3B, and/or 3C, and/or 3D and/or 3F and/or 3G are used. In a preferred embodiment all probe sets depicted in FIG. 3C and/or FIG. 3D and/or FIG. 3F and/or FIG. 3G are used.

It is of course also possible to modify a sequence of at least one probe depicted in FIG. 3 to some extent. This is for instance done for optimization purposes. Further provided is therefore a method and/or a use according to the invention, wherein at least one probe is used which has at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to a probe depicted in FIG. 3. Preferably, at least two, more preferably at least four, most preferably at least six probes are used which have at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to a probe depicted in FIG. 3. In one embodiment, a method or use according to the invention is provided wherein at least 20 probes are used, said at least 20 probes having at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to the probes depicted in FIG. 3. A minimum of two specific probes per (pseudo)gene is preferred to determine copy number variation (CNV).

The left and right probes of the probe sets of FIG. 3 each contain a primer binding site, which is indicated in bold and/or underlined in this figure. The primer binding site of each left probe of FIG. 3 consists of the sequence GGGTTCCCTAAGGGTTGGA and the primer binding site of each right probe of FIG. 3 consists of the sequence TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC.

Of course, these primer binding sites can be varied at will, as long as complementary primers are used in the amplification reaction. Therefore, the primer binding sites of probes according to the invention need not be at least 70% identical to the above mentioned sequences. Lower sequence identity can be used, provided that the primer binding sites are complementary to the amplification primers. Primers other than those used in the Examples can be developed for use in a method according to the invention wherein probes depicted in FIG. 3 are used. This means that the primer binding sites of probes depicted in FIG. 3 can be amended or even replaced by entirely different primer binding sites. Thus, the sequence identity between a primer binding site of a probe according to the invention and a primer binding site as depicted in FIG. 3 can be less than 70%, such as for instance at most 60%, at most 50%, at most 40%, at most 30%, at most 25%, at most 20%, at most 15%, at most 10% or even lower.

However, the parts of probes according to the invention that are capable of hybridizing to KIR genes preferably have at least 70% sequence identity to the KIR-specific sequences depicted in FIG. 3, i.e. the probe sequences of FIG. 3 without the primer binding sites. Thus, preferably at least one probe is used in a method according to the invention which has at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to the part of the sequence of a probe depicted in FIG. 3 which is capable of hybridizing to a KIR gene, i.e. the part of a sequence of a probe depicted in FIG. 3 which is not the primer binding site.
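The separation of a probe into its primer binding site and its KIR-specific part can be sketched as follows. This is a non-limiting illustration under several assumptions: the primer binding site is assumed to sit at the extreme 5′ end of a left probe and at the extreme 3′ end of a right probe, without further non-complementary nucleotides; the right-hand sequences are those recited for the kit primers; and the function name and the eight-nucleotide target-specific example are hypothetical, since the probes of FIG. 3 are not reproduced here.

```python
# Primer binding site sequences as recited in the text.
LEFT_PRIMER_SITE = "GGGTTCCCTAAGGGTTGGA"
RIGHT_PRIMER_SITES = ("TCTAGATTGGATCTTGCTGGCAC", "TCTAGATTGGATCTTGCTGGCGC")


def kir_specific_part(probe: str) -> str:
    """Return the probe sequence without a terminal primer binding site.

    Assumption: a left probe starts with LEFT_PRIMER_SITE and a right probe
    ends with one of RIGHT_PRIMER_SITES; a middle (third) probe carries no
    primer binding site and is returned unchanged.
    """
    probe = probe.upper()
    if probe.startswith(LEFT_PRIMER_SITE):
        return probe[len(LEFT_PRIMER_SITE):]
    for site in RIGHT_PRIMER_SITES:
        if probe.endswith(site):
            return probe[: -len(site)]
    return probe


# Hypothetical target-specific sequence, used for illustration only.
target_specific = "ACGTACGT"
print(kir_specific_part(LEFT_PRIMER_SITE + target_specific))
print(kir_specific_part(target_specific + RIGHT_PRIMER_SITES[1]))
```

Sequence identity to a probe of FIG. 3 would then be calculated on the returned part only, in line with the passage above.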

Preferably, probe sets are used which are based on the probe sets depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G, preferably based on the probe sets depicted in FIGS. 3C and/or 3D and/or 3F and/or 3G. At least one of said probe sets preferably comprises three probes. One or more of the probes of such probe set may be modified to some extent, as described above. Further provided is therefore a method and/or a use according to the invention, wherein at least one probe set is used which has at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to a probe set as depicted in FIG. 3 without the primer binding sites. This means that the probes of said probe set have at least 70% sequence identity to the KIR-specific parts of corresponding probes of at least one probe set of FIG. 3. Preferably, a probe set is used which has at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to a probe set depicted in FIG. 3 selected from the group consisting of probe set 506, probe set 417, probe set 517, probe set 409, probe set 507, probe set 710, probe set 709, probe set 708, probe set 704 and probe set 538 of FIG. 3A, probe set 415, probe set 413, probe set 419, probe set 702, probe set 711, probe set 408, probe set 507, probe set 710, probe set 528, probe set 418 and probe set 416 of FIG. 3B, probe set 506, probe set 417, probe set 517, probe set 409, probe set 507, probe set 710, probe set 709, probe set 708, probe set 704 and probe set 538 of FIG. 3C, probe set 415, probe set 703, probe set 413, probe set 419, probe set 702, probe set 711, probe set 408, probe set 507, probe set 710, probe set 528, probe set 418 and probe set 416 of FIG. 3D, probe set 409, probe set 506, probe set 507, probe set 538, probe set 417 and probe set 517 of FIG. 
3F and probe set 408, probe set 507, probe set 528, probe set 413, probe set 416 and probe set 415 of FIG. 3G, without the primer binding sequences, since these probe sets contain a third probe specific for a KIR nucleic acid sequence. Preferably at least two, more preferably at least three, more preferably at least four, more preferably at least five, most preferably at least six of such probe sets are used, so that various KIR (pseudo)gene variants are screened for with good results.

Novel probes and probe sets which are particularly suitable for (pseudo)gene variant analysis and (pseudo)gene copy number determination of the KIR locus are also provided. These probes and probe sets are listed in FIGS. 3A, B, C, D, F and G, as described above. Further provided are therefore probes and probe sets as depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G, as well as probes and probe sets which have at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to the KIR-specific part of a probe or probe set depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G, i.e. the sequences depicted in FIG. 3 without the primer binding sites. A mixture of nucleic acids, wherein said nucleic acids comprise at least two probe sets according to the invention is also provided. Preferably, said mixture comprises at least four, more preferably at least six probe sets according to the invention. As said before, such probe sets have at least 70% sequence identity to a probe or probe set depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G without the primer binding sites. One embodiment provides a mixture of nucleic acids comprising at least two, preferably at least four, more preferably at least six probe sets as depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G.

Further provided is a kit for detecting the presence of at least one target nucleic acid sequence in a sample, comprising a probe set or a mixture of nucleic acids according to the invention. Said at least one target nucleic acid sequence preferably comprises a nucleic acid sequence present in a KIR locus. A kit according to the invention preferably further comprises a PCR primer set comprising at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to nucleic acid sequences 5′-GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC-3′ or TCTAGATTGGATCTTGCTGGCGC-3′, or the complements thereof. These primers are particularly suitable for amplifying probe sets depicted in FIG. 3. However, as described above, different primers are also suitable, in which case the primer binding sites need to be altered in order to be complementary to the primers.

The invention further provides calibrators that are particularly suitable for determining copy numbers of nucleic acids of interest. Currently only relative gene copy number can be determined. This is often done by graphically representing amplified products of genes of interest by separate peaks. A relative peak intensity (height or surface) of an amplified product of a gene of interest is compared with the peak intensity of an amplified product of a control sample containing the gene of interest to determine relative copy number. For instance, if an MLPA reaction is used, each peak represents the product of an amplified MLPA ligated probe. However, with such a method it is not possible to quantify the absolute gene copy number because intensity peaks of a control sample do not represent a known copy number. Furthermore, a reference sample does not always contain all genes of interest. This is in particular the case for polygenic and highly polymorphic gene loci such as the KIR locus and the human leukocyte antigen (HLA) locus, whereby the identity and copy number of alleles differ greatly between individuals. Since no individual has all alleles of such polygenic and highly polymorphic gene cluster, a reference sample containing all these alleles is not available. Thus, if a sample of a random individual is compared with such a reference sample in order to determine the haplotype and/or copy number of genes, possibly several alleles of said individual are not detected because they are not present in the reference sample. Thus, with a reference sample currently used in the art, it is not possible to determine the complete haplotype (including copy number variation) of such polygenic and highly polymorphic gene loci such as the KIR locus of an individual. It is of course possible to use multiple reference samples, which will result in a more elaborate method. 
Furthermore, some alleles of a gene cluster are relatively rare and it is difficult to obtain reference samples with all known alleles of a gene cluster.

The invention provides means and methods that enable determination of the complete haplotype of a polygenic and highly polymorphic gene cluster. In addition, determination of copy number variation of genes of such gene cluster of an individual has now become possible. This comprises the use of a nucleic acid molecule comprising at least one control nucleic acid sequence, and for each gene or allele of interest a nucleic acid sequence which is unique for said gene or allele of interest. Said nucleic acid molecule can be used as such. Of course, such nucleic acid molecule can also be present in a vehicle such as for instance a plasmid, which optionally comprises other nucleic acid sequences. Such nucleic acid molecule as such or a vehicle or plasmid comprising such nucleic acid molecule are herein referred to as a calibrator according to the invention. Instead of a single nucleic acid molecule, a calibrator according to the invention can also contain a combination of multiple nucleic acid molecules or multiple vehicles/plasmids according to the invention. For instance, a calibrator according to the invention may contain 2, 3, 4, 5, 6, 7, 8 or more separate nucleic acid molecules. Preferably, however, a calibrator according to the invention consists of one nucleic acid molecule, vehicle or plasmid. In FIG. 17 an exemplary representation of such vehicle is shown which comprises multiple nucleic acid sequences unique for a gene or allele of interest, KIR genes in FIG. 17, to which, for instance, MLPA probes can bind.

A calibrator according to the invention comprises at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to a part of a (pseudo)gene comprising a polymorphism, such as a SNP. A part of a (pseudo)gene is defined as a consecutive stretch of at least 10 nucleotides in said (pseudo)gene. Preferably said part is at least 15, more preferably at least 20, more preferably at least 25, more preferably at least 30 nucleotides, such as 35, 40, 45 or 50 nucleotides. Said nucleic acid sequence of a calibrator according to the invention preferably comprises a sequence which is identical to, or complementary to, a polymorphism of said (pseudo)gene. Preferably, a calibrator according to the invention comprises at least one nucleic acid sequence which is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to part of a gene, which part comprises at least one polymorphism.

In addition, such calibrator comprises at least one nucleic acid sequence which is at least 70% identical to part of a control gene. A part of a control gene is defined as a consecutive stretch of at least 10 nucleotides in said control gene, preferably at least 15, more preferably at least 20, more preferably at least 25, more preferably at least 30 nucleotides, such as 35, 40, 45 or 50 nucleotides. As used herein, control genes are preferably genes which have a constant copy number in the human genome. Preferably a calibrator according to the invention comprises at least one nucleic acid sequence which is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to part of a control gene. Preferably sequences are used of control genes that have no or few polymorphisms so that these sequences will always be present in samples of individuals, avoiding the need to use many different control sequences for one particular control gene.

In one embodiment, a calibrator according to the invention comprises a nucleic acid sequence which is identical to part of a gene, which part comprises at least one polymorphism and a nucleic acid sequence which is identical to part of a control gene. As described before, such part contains at least 10 nucleotides. In one embodiment, a calibrator according to the invention comprises at least one nucleic acid sequence which is at least 70% identical to part of a gene, which part comprises at least two polymorphisms, such as two SNPs.

A calibrator is preferably designed such that all genes of interest and their allelic variants are separately represented once with a unique nucleic acid sequence on the calibrator. Different unique nucleic acid sequences from one gene of interest or gene variant of interest may also be represented on a calibrator as a single sequence to yield the same result: i.e. detection by one single probe in a mixture of probes as one copy of that sequence on the calibrator used.

With the use of a calibrator according to the invention, it is possible to determine in one reaction polymorphisms as well as absolute copy numbers of (pseudo)genes. In a preferred embodiment, nucleic acid sequences representing all possible (pseudo)genes of a gene cluster are present on a calibrator according to the invention. As used herein, “representing all possible (pseudo)genes of a gene cluster” means that for each (pseudo)gene of a gene cluster a nucleic acid sequence which is at least 70% identical to a part of at least 10 nucleotides of said (pseudo)gene is present on the calibrator. Said part of said (pseudo)gene preferably comprises at least one polymorphism, such as a SNP. Such calibrator according to the invention comprising nucleic acid sequences representing all possible (pseudo)genes of a gene cluster allows for the determination of presence or absence, as well as the copy number, of each of said (pseudo)genes in an individual, using a sample of said individual. Thus, such calibrator enables determining the entire haplotype of a polygenic and polymorphic gene cluster of an individual.

Now that a calibrator according to the invention is provided, an improved reference sample for determining (absolute) copy number variation has become available. This is for instance shown in Example 5 for determining the presence and copy number of KIR genes, which form a particularly polygenic and highly polymorphic gene locus. A calibrator according to the invention can be advantageously used with any method known to a person skilled in the art for detecting nucleic acid, such as (real-time) PCR, PCR-SSP, multiplex PCR, PCR-SSOP and MLPA. However, a calibrator according to the invention is particularly suitable for use in an MLPA method according to the invention.

An example of how a calibrator according to the invention can be used in a reference sample in an MLPA reaction is as follows. A test sample with nucleic acid of an individual and a reference sample comprising a calibrator according to the invention are provided with MLPA probes. Following ligation of MLPA probes to nucleic acid of said test sample and to said calibrator of said reference sample, an amplification reaction with both the reference sample with calibrator and the test sample is performed. A calibrator according to the invention and the MLPA probes are designed such that each amplified nucleic acid variant of a (pseudo)gene containing a polymorphism has a different length. It is thus immediately apparent which variant of a (pseudo)gene is present from the presence and length of amplified product. Furthermore, the amount of amplification products derived from (pseudo)genes of interest of the test sample can be correlated to the amount of amplification products of the calibrator of the reference sample. Also, the amount of amplification product derived from a control gene of the test sample, of which the copy number in the genome is known, can be correlated to the amount of amplification product of the corresponding control sequence of the calibrator of the reference sample. Based on the correlations of the amounts of amplified product of a (pseudo)gene of interest between the test sample and the reference sample, and the correlations of the amounts of amplified product of a control gene (with a constant copy number in the human genome) between the test sample and the reference sample, the copy number of the (pseudo)genes of interest in an individual can be determined. A more detailed example of the determination of the copy number of a gene, when used with a calibrator according to the invention in an MLPA method, is described below.

An MLPA method according to the invention comprises the use of a sample with nucleic acids obtained from an individual (“a test sample”) and a reference sample comprising a calibrator according to the invention (“a reference sample”). In one embodiment, both said test sample and said reference sample containing the calibrator are subjected to an MLPA method, preferably an MLPA method according to the invention. This comprises the addition to said test sample and said reference sample of at least one probe set which is complementary to part of a (pseudo)gene of interest (said part preferably comprising a polymorphism), and at least one probe set which is complementary to part of a control gene. The probe sets are allowed to hybridize to the target nucleic acid in said test sample and to the target nucleic acid located on the calibrator in the reference sample. Said target nucleic acid located on the calibrator is at least 70% identical to target nucleic acid in said test sample. Subsequently the probes of the different probe sets hybridized to nucleic acid in said test sample and in said reference sample are subjected to a ligation reaction. As described hereinbefore in detail, ligation will only occur if probes of a specific probe set are hybridized immediately adjacent to each other on their target sequences. Thus, if one specific variant of a polymorphic gene is present in the test sample, only the probes of the probe set specific for this specific variant will ligate, whereas the probes of probe sets specific for other gene variants will not ligate. As described before, a ligated probe set according to the invention is flanked by two primer binding sites. During a subsequent amplification reaction, only the ligated probe sets will be amplified. Hence, the presence or absence of each gene variant in a test sample is directly determined.

The calibrator, however, preferably contains binding sites for each of the probe sets, so that amplification of all probe sequences will occur in the reference sample. This avoids false-negative test results: if a given probe sequence is not amplified in the test sample, it is verified whether the corresponding probe sequence is amplified in the reference sample, i.e. from the calibrator. Only if the probe sequence is indeed amplified in the reference sample is the absence of the probe sequence in the test sample considered a reliable result. If the probe sequence appears not to be amplified in the reference sample either, this indicates a failure of the test procedure and the test results are to be discarded. Hence, false-negative test results are avoided. A binding site for a particular probe set is a nucleic acid sequence that is at least 70% identical to part of a (pseudo)gene to which said probe set is complementary. As used herein, reference to a gene of interest also encompasses a pseudogene of interest.
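By way of illustration only, the verification logic described above can be sketched in Python as follows. The names `ProbeCall` and `interpret_probe` are hypothetical and do not form part of the invention. The text does not address the case in which a probe sequence is amplified in the test sample but not in the reference sample; the sketch assumes that this case is also treated as a failure of the test procedure.

```python
from enum import Enum


class ProbeCall(Enum):
    """Possible interpretations of one probe set (hypothetical labels)."""
    PRESENT = "target present in test sample"
    ABSENT = "target absent from test sample"
    INVALID = "test procedure failed, discard result"


def interpret_probe(test_amplified: bool, reference_amplified: bool) -> ProbeCall:
    """Interpret one probe set using the calibrator as a control.

    The calibrator carries a binding site for every probe set, so a missing
    product in the reference sample points to a failed reaction.
    """
    if not reference_amplified:
        # Assumption: any missing reference product invalidates the result,
        # whether or not the test sample yielded a product.
        return ProbeCall.INVALID
    if test_amplified:
        return ProbeCall.PRESENT
    # Reference amplified, test sample did not: absence is considered reliable.
    return ProbeCall.ABSENT


if __name__ == "__main__":
    for test, ref in [(True, True), (False, True), (False, False)]:
        print(test, ref, interpret_probe(test, ref).value)
```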

Besides probe sets specific for a gene of interest, probe sets specific for control genes are used. Control genes preferably have a constant copy number in the human genome, such as for instance two. This copy number is known. Nucleic acid sequences with a length of at least 10 nucleotides which are at least 70% identical to control gene sequences and nucleic acid sequences with a length of at least 10 nucleotides which are at least 70% identical to gene of interest sequences are present in known amounts on the calibrator. It is therefore possible to correlate amplified products from the calibrator to amplified products of a test sample. This is for instance done as follows. Amplification reactions of the test sample and of the reference sample will result in an intensity peak pattern with peaks for each amplified nucleic acid product. The peaks will have varying intensity (for instance height or surface). The peaks represent amplified nucleic acid sequences indicative for a gene of interest or a control gene. The peak intensity (for instance height or surface) of an amplified control gene product of the test sample and the peak intensity of the amplified product of the same control gene sequence in the reference sample (i.e. of the calibrator) are compared. This is done for each control gene sequence. The peak intensities of both amplified products ought to be the same, representing for instance 2 copies per genome or DNA sample tested if the copy number of the control gene is 2. The control genes are also an internal quality control of both the test and the reference sample, because amplified product is only detected if the MLPA reaction was successful. The proportion between the peak intensities of amplified control gene product in the test sample and in the reference sample can be determined based on relative differences in peak intensity. 
Subsequently, the peak intensity of amplified product of a gene of interest of the test sample and the peak intensity of amplified product of a sequence of the same gene of the reference sample (i.e. of the calibrator) are also compared. This is also done for each gene of interest. The proportion between the peak intensities of amplified product of each gene of interest in the test sample and in the reference sample is determined based on relative differences in peak intensity as well.

If nucleic acid corresponding to a specific nucleic acid sequence in a control gene and nucleic acid corresponding to a specific nucleic acid sequence in a gene of interest are present on the calibrator in the same amount, equal amounts of product will in principle be amplified during the amplification reaction. In that case, the peak intensities of the control gene and the gene of interest can be directly compared. If the control gene and the gene of interest are present in the same amount in the test sample as well, the difference between the peak intensities of control gene in test sample and reference sample will be comparable to the difference between the peak intensities of the gene of interest in the test sample and reference sample. Thus, the proportion between the peak intensities of amplified product of the gene of interest in the test and reference sample, and the proportion between the peak intensities of amplified control gene product in the test and reference sample are equal if the copy number of the gene of interest and the copy number of the control gene are equal. For instance, if the copy number in the human genome of a control gene is 2, and if the peak intensity of amplified control gene product in a reference sample is determined to be 2 and the relative peak intensity of amplified control gene product in a test sample is determined to be 3, the proportion between the peak intensities of amplified product of the control gene in the test sample versus the reference sample is 3/2=1.5. If it is determined that the proportion between the peak intensities of amplified product of a gene of interest in the test sample versus the reference sample is also 1.5, it can be concluded that the copy number of the gene of interest is identical to the copy number of said control gene, that is 2.
This direct comparison of peak intensity proportions only applies if nucleic acid corresponding to part of the control gene and nucleic acid corresponding to part of the gene of interest are present in the same amount on the calibrator. It is, therefore, preferred to use the same number of different nucleic acid sequences on the calibrator. Otherwise, the difference between the number of different nucleic acid sequences on the calibrator needs to be taken into account when calculating the copy number of the corresponding genes in the test sample.
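The comparison of peak intensity proportions can be illustrated with a minimal Python sketch. The function name `copy_number` and its parameter names are hypothetical. The correction factor for unequal numbers of sequences on the calibrator is an assumption about how such a difference could be taken into account; it presumes that peak intensity scales linearly with the number of template sequences.

```python
def copy_number(gene_test: float, gene_ref: float,
                ctrl_test: float, ctrl_ref: float,
                ctrl_copy_number: float = 2.0,
                gene_seqs_on_calibrator: int = 1,
                ctrl_seqs_on_calibrator: int = 1) -> float:
    """Estimate the copy number of a gene of interest from peak intensities.

    gene_test / gene_ref: peak intensity of the gene of interest in the
        test sample and in the reference sample (calibrator).
    ctrl_test / ctrl_ref: peak intensity of the control gene in the
        test sample and in the reference sample (calibrator).
    """
    if min(gene_ref, ctrl_test, ctrl_ref) <= 0:
        raise ValueError("reference and control peak intensities must be positive")

    gene_ratio = gene_test / gene_ref    # test versus reference, gene of interest
    ctrl_ratio = ctrl_test / ctrl_ref    # test versus reference, control gene

    # Assumed linear correction for unequal representation on the calibrator:
    # more gene sequences on the calibrator raise gene_ref and lower gene_ratio.
    correction = gene_seqs_on_calibrator / ctrl_seqs_on_calibrator

    return ctrl_copy_number * (gene_ratio / ctrl_ratio) * correction


if __name__ == "__main__":
    # Control gene: reference peak 2, test peak 3, proportion 1.5.
    # Gene of interest: reference peak 4, test peak 6, proportion 1.5.
    print(copy_number(gene_test=6, gene_ref=4, ctrl_test=3, ctrl_ref=2))  # 2.0
```

With equal proportions for the control gene and the gene of interest, the sketch returns the copy number of the control gene, in line with the example given above.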

When equal numbers of different nucleic acid sequences are present on the calibrator, the proportion between the peak intensities of amplified product of a gene of interest in the test and reference sample, and the proportion between the peak intensities of amplified control gene product in the test and reference sample, are substantially equal if the copy number of the control gene and the copy number of the gene of interest in a test sample are the same, and these proportions are not equal if the copy number of the control gene and the copy number of the gene of interest in a test sample are different. The copy number of the gene of interest in the test sample can then be calculated based on the peak intensity proportions of the control gene of the test sample and the control gene of the reference sample and the peak intensity proportions of the gene of interest of the test sample and the gene of interest of the reference sample, making use of the known copy number in the test sample of the control gene. For instance, if, like in the example above, the copy number of a control gene is 2, and if the peak intensity of amplified control gene product in the reference sample is determined to be 2 and the relative peak intensity of amplified control gene product in the test sample is determined to be 3, the proportion between the peak intensity of amplified product of the control gene in the test sample versus the reference sample is 3/2=1.5. If the proportion between the peak intensities of amplified product of a gene of interest in test and reference sample is determined to be 3, instead of 1.5, it can be concluded that the copy number of the gene of interest is twice (3/1.5) the copy number of the control gene, that is 4. 
It is also possible to determine the proportion of the peak intensity of the control gene and the peak intensity of the gene of interest of the reference sample and to compare this with the proportion of the peak intensities of amplification product of the corresponding genes of the test sample. If this proportion of the reference sample is about one (meaning that the peak intensities are about the same, which is often the case when the number of gene-specific sequences and control-specific sequences on the calibrator are the same), and if this proportion of the test sample is 2, it can be concluded that the copy number of the gene of interest in the individual is twice the (known) copy number of the control gene in the individual.
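The alternative comparison, in which the proportion between the gene of interest and the control gene is first determined within each sample, can be sketched as follows. The function name `copy_number_within_sample` is hypothetical, and the peak intensities in the usage example are illustrative values chosen to match the example above (control gene proportion 3/2=1.5, gene of interest proportion 3).

```python
def copy_number_within_sample(gene_test: float, ctrl_test: float,
                              gene_ref: float, ctrl_ref: float,
                              ctrl_copy_number: float = 2.0) -> float:
    """Estimate copy number from gene/control proportions within each sample."""
    if min(ctrl_test, gene_ref, ctrl_ref) <= 0:
        raise ValueError("control and reference peak intensities must be positive")

    # Proportion of gene of interest to control gene in the reference sample;
    # about one when both are equally represented on the calibrator.
    ref_proportion = gene_ref / ctrl_ref

    # The same proportion in the test sample.
    test_proportion = gene_test / ctrl_test

    return ctrl_copy_number * test_proportion / ref_proportion


if __name__ == "__main__":
    # Reference sample: control peak 2, gene peak 2 (proportion 1).
    # Test sample: control peak 3, gene peak 6 (proportion 2).
    print(copy_number_within_sample(gene_test=6, ctrl_test=3,
                                    gene_ref=2, ctrl_ref=2))  # 4.0
```

Both ways of comparing the peaks are algebraically equivalent; the sketch only rearranges the order in which the proportions are formed.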

Although not necessary, before determining the copy number of a gene of interest in a test sample, the concentration of total nucleic acid in the test sample is preferably measured, for instance using spectrometry. The molecular weight of a calibrator according to the invention is, of course, also known. The concentration of nucleic acid in the test sample and the concentration of nucleic acid (i.e. calibrator) in a reference sample can then be made approximately equal, so that the peak intensities of amplified product of control genes, which have a constant copy number, of the test sample and of the reference sample are approximately equal. In that case, a direct comparison between the peak intensities of amplified product of a gene of interest of a test sample and a reference sample can be made. Then, the differences in nucleic acid concentration in the test and reference samples do not need to be taken into account.

For instance, the concentration of nucleic acid in the test sample and the concentration of the calibrator in the reference sample can both be based on the molecular weight of the human genome. If, for example, the amount of genomic DNA added in a test sample of the assay is 100 nanogram, the concentration of genomic DNA in the assay is then 4.8E-15 mol/liter, based on the fact that diploid human female and male nuclei in G₁ phase of the cell cycle should contain 6.950 and 6.829 pg of DNA, respectively. It is then possible to prepare a reference sample with calibrator in the same concentration as the concentration of nucleic acid in the test sample, because the weight of the calibrator can be calculated if the exact composition of the calibrator is known. In that case, a copy number of for instance 2 for a control gene in the test sample will result in the same amount of amplified product in both the test and the reference sample. It follows that the peak intensities of this control gene product will then be approximately equal for the test sample and the reference sample.
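The arithmetic behind this example can be sketched in Python as follows. Several values are assumptions that are not stated in the text: a sample volume of 5 microliter (with which the stated concentration of 4.8E-15 mol/liter is reproduced), an average mass of about 650 g/mol per base pair of double-stranded DNA, and a hypothetical calibrator plasmid of 5000 base pairs. All function names are illustrative.

```python
AVOGADRO = 6.02214076e23          # molecules per mole

FEMALE_DIPLOID_PG = 6.950         # pg DNA per diploid female nucleus (from the text)
MALE_DIPLOID_PG = 6.829           # pg DNA per diploid male nucleus (from the text)


def diploid_genome_equivalents(dna_ng: float, genome_pg: float) -> float:
    """Number of diploid genomes contained in dna_ng nanogram of genomic DNA."""
    return dna_ng * 1000.0 / genome_pg      # 1 ng = 1000 pg


def molar_concentration(molecules: float, volume_liter: float) -> float:
    """Concentration in mol/liter for a number of molecules in a volume."""
    return molecules / AVOGADRO / volume_liter


def calibrator_mass_ng(molecules: float, length_bp: int,
                       g_per_mol_per_bp: float = 650.0) -> float:
    """Mass in ng of `molecules` copies of a double-stranded calibrator.

    650 g/mol per base pair is an assumed approximate average.
    """
    grams = molecules / AVOGADRO * length_bp * g_per_mol_per_bp
    return grams * 1e9


if __name__ == "__main__":
    mean_pg = (FEMALE_DIPLOID_PG + MALE_DIPLOID_PG) / 2
    genomes = diploid_genome_equivalents(100.0, mean_pg)
    # Assumed sample volume of 5 microliter (not given in the text).
    conc = molar_concentration(genomes, 5e-6)
    print(f"{genomes:.0f} diploid genomes, {conc:.1e} mol/liter")
    # Hypothetical 5000 bp calibrator at the same number of molecules.
    print(f"{calibrator_mass_ng(genomes, 5000):.2e} ng calibrator")
```

Whether the calibrator is matched to the number of diploid genomes or to the number of gene copies is left to the user of the sketch; the number of molecules is therefore passed in as a parameter.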

Preferably, a calibrator according to the invention comprises the same number of copies of each nucleic acid sequence. For instance, 1 copy of each nucleic acid which is at least 70% identical to part of a gene of interest and 1 copy of each nucleic acid which is at least 70% identical to part of a control gene are preferably present on the calibrator.

Accordingly, the invention provides a nucleic acid molecule comprising at least one control nucleic acid sequence and at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to part of a gene or pseudogene of interest, or a complementary sequence thereof, wherein at least 80%, preferably at least 85%, more preferably at least 90%, more preferably at least 95%, of said nucleic acid sequences which are at least 70% identical to part of at least one gene or pseudogene of interest comprise a sequence that is identical to, or complementary to, a gene-specific nucleotide and/or a pseudo-gene specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within said gene or pseudogene, said polymorphism preferably comprising an SNP. Preferably, a nucleic acid molecule according to the invention comprises at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to a part of a gene or pseudogene of interest, or a complementary sequence thereof, said part comprising a (pseudo)gene-specific nucleotide and/or sequence, and/or an additional (pseudo)gene-specific polymorphism preferably an SNP. In one embodiment, a nucleic acid molecule according to the invention comprises at least one nucleic acid sequence with a length of at least 10 nucleotides which is identical to a part of a gene or pseudogene of interest, or a complementary sequence thereof, said part comprising a (pseudo)gene-specific nucleotide and/or sequence, and/or an additional (pseudo)gene-specific polymorphism preferably an SNP. The invention further provides a vehicle or plasmid comprising a nucleic acid molecule according to the invention.

As used herein, a “nucleic acid molecule” or a “nucleic acid sequence” comprises a chain of nucleotides, preferably DNA and/or RNA. A nucleic acid molecule or nucleic acid sequence of the invention may be single stranded or double stranded. In other embodiments a nucleic acid molecule or nucleic acid sequence of the invention comprises other kinds of nucleic acid structures such as for instance a DNA/RNA helix, peptide nucleic acid (PNA), locked nucleic acid (LNA) and/or a ribozyme. Hence, the term “nucleic acid sequence” also encompasses a chain comprising non-natural nucleotides, modified nucleotides and/or non-nucleotide building blocks which exhibit the same function as natural nucleotides.

As used herein, “copy number of a (control) gene or pseudogene” refers to the number of DNA molecules of said gene or pseudogene in the genome of an individual.

The term “complementary” is known in the art. A complementary sequence as used herein refers to a nucleic acid sequence of which the bases can pair non-covalently with the bases of the target sequence.

As used herein, a “vehicle” is defined as any means that can contain a nucleic acid molecule, such as for instance a vector or plasmid. A “plasmid” is defined herein as a circular, double-stranded DNA molecule.

As used herein, the term “% sequence identity to part of a gene” is defined as the percentage of residues in a nucleotide sequence that is identical with the residues in said part of a gene after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. One computer program which may be used or adapted for purposes of determining whether a candidate sequence falls within this definition is Autoassembler 2.0 (ABI Prism, Perkin Elmer).
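Purely as an illustration of this definition, and not as a substitute for an established alignment program, the following Python sketch computes a percentage identity. It rests on two assumptions: gaps are introduced without penalty, so that the maximum number of identical aligned residues equals the length of the longest common subsequence, and the percentage is expressed relative to the length of the candidate sequence. The function names are hypothetical.

```python
def max_identical_residues(a: str, b: str) -> int:
    """Maximum number of identically aligned residues when gaps are free.

    Computed as the length of the longest common subsequence by
    dynamic programming, keeping only two rows of the table.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(candidate: str, gene_part: str) -> float:
    """Percentage of residues in `candidate` identical to `gene_part`.

    Assumption: the denominator is the length of the candidate sequence.
    """
    if not candidate:
        raise ValueError("candidate sequence must not be empty")
    matches = max_identical_residues(candidate.upper(), gene_part.upper())
    return 100.0 * matches / len(candidate)


if __name__ == "__main__":
    # One mismatch in ten residues.
    identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
    print(identity, identity >= 70.0)
```

Alignment programs generally apply gap penalties and may therefore report a lower value than this sketch for sequences that require many gaps.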

As used herein a “control nucleic acid sequence” is a nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to part of a gene other than the gene of interest, or a complementary sequence thereof. Preferably, a control nucleic acid sequence is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to part of a gene other than the gene of interest, or a complementary sequence thereof. Preferably, said gene other than the gene of interest, herein called a control gene, has a constant copy number in the human genome. Most preferably, it is known how many copies of said control gene are present in the genome of each human. Said control gene is thus preferably not subject to copy number variation. In a preferred embodiment, said control gene has two copies in the human genome. The invention therefore also provides a nucleic acid molecule according to the invention, wherein said control nucleic acid sequence is at least 70% identical to, or complementary to, a part of a control gene which has a constant copy number in the human genome, preferably wherein said control gene has a copy number of two in the human genome.

Examples of genes which are normally not subject to copy number variation and which are known to have a copy number of 2 are FGF3, BCAS4, LMNA, PARK2, MSH6, GALT, SPG4, IL-4 and NF2. Therefore, in a preferred embodiment, said at least one control nucleotide sequence is at least 70% identical to, or complementary to, a part of FGF3, BCAS4, LMNA, PARK2, MSH6, GALT, SPG4, IL-4 and/or NF2, said part having a length of at least 10 nucleotides. In FIGS. 3E and 3H, probe sets that are particularly suitable to hybridize to these control genes are depicted. In a preferred embodiment, said at least one control nucleotide sequence therefore has at least 70% sequence identity with a probe of at least one probe set of FIGS. 3E and/or 3H, or complementary sequences of said probes. More preferably said at least one control nucleotide sequence has at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity with a probe of at least one probe set of FIGS. 3E and/or 3H, or complementary sequences of said probes. Preferably, control nucleotide sequences are used which together have at least 70%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity with all probes of at least one probe set of FIGS. 3E and/or 3H, or complementary sequences thereof. In one embodiment, control nucleotide sequences are used that together are identical to the sequence of all probes of at least one probe set of FIGS. 3E and/or 3H, or complementary sequences thereof.

The left and right probes of the probe sets of FIGS. 3E and 3H each contain a primer binding site, which is indicated in bold and underlined in these figures. The primer binding site of each left probe of FIGS. 3E and 3H consists of the sequence GGGTTCCCTAAGGGTTGGA and the primer binding site of each right probe of FIGS. 3E and 3H consists of the sequence TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC. A control nucleotide sequence located on a nucleic acid molecule according to the invention preferably has at least 70% sequence identity with the sequence of a probe of at least one probe set of FIGS. 3E and/or 3H without the primer binding sites. More preferably said at least one control nucleotide sequence has at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity with the sequence of a probe of at least one probe set of FIGS. 3E and/or 3H without the primer binding sites, i.e. without the bold and underlined sequences. Said at least one control nucleic acid sequence is preferably selected from the group of nucleic acid sequences having at least 70% sequence identity to the probe sequences, without the primer binding sites, of the probe sets indicated as Control 1 (IL-4), Control 2 (FGF3), Control 3 (BCAS4), Control 4 (LMNA), Control 8 (GALT), Control 9 (SPG4) and Control 10 (NF2) in FIG. 3H. As demonstrated in Examples 4 and 5 and FIG. 19, these 7 control genes are particularly suitable for use for a calibrator according to the invention.

The use of at least two control nucleic acid sequences which are at least 70% identical to parts of different control genes is preferred because this allows for a more accurate determination of the copy number of a gene of interest. Therefore, a nucleic acid molecule according to the invention preferably comprises at least two control nucleic acid sequences, more preferably at least three, more preferably at least four, more preferably at least five, more preferably at least six, more preferably at least seven control nucleic acid sequences. Said control nucleic acid sequences are preferably selected from the group of nucleic acid sequences having at least 70% sequence identity to the probe sequences, without the primer binding sites, of the probe sets indicated as Control 1 (IL-4), Control 2 (FGF3), Control 3 (BCAS4), Control 4 (LMNA), Control 8 (GALT), Control 9 (SPG4) and Control 10 (NF2) in FIG. 3H, or a combination thereof. In a particularly preferred embodiment, a nucleic acid molecule or vehicle or plasmid according to the invention comprises control nucleic acid sequences which have at least 70% sequence identity to the probes of the probe sets indicated as Control 1 (IL-4), Control 2 (FGF3), Control 3 (BCAS4), Control 4 (LMNA), Control 8 (GALT), Control 9 (SPG4) and Control 10 (NF2) in FIG. 3H.
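How several control sequences might be combined is not prescribed here. One possible approach, given only as a sketch, is to normalise against the median of the test-versus-reference proportions of all control genes, so that a single aberrant control peak has limited influence. The use of the median, the function name and the peak intensities in the usage example are assumptions; the sketch further presumes that every control gene has the same known copy number and that all sequences are equally represented on the calibrator.

```python
from statistics import median
from typing import Dict, Tuple


def copy_number_multi_control(gene_test: float, gene_ref: float,
                              control_peaks: Dict[str, Tuple[float, float]],
                              ctrl_copy_number: float = 2.0) -> float:
    """Estimate copy number using several control genes.

    control_peaks maps a control gene name to a pair
    (peak intensity in test sample, peak intensity in reference sample).
    """
    if not control_peaks:
        raise ValueError("at least one control gene is required")
    if gene_ref <= 0 or any(t <= 0 or r <= 0 for t, r in control_peaks.values()):
        raise ValueError("reference and control peak intensities must be positive")

    # Test-versus-reference proportion for each control gene.
    ctrl_ratios = [test / ref for test, ref in control_peaks.values()]

    # Assumed combination rule: the median of the control proportions.
    normaliser = median(ctrl_ratios)

    return ctrl_copy_number * (gene_test / gene_ref) / normaliser


if __name__ == "__main__":
    controls = {
        "IL-4": (3.0, 2.0),   # illustrative intensities, not measured data
        "FGF3": (3.3, 2.0),
        "NF2": (2.7, 2.0),
    }
    print(copy_number_multi_control(gene_test=6.0, gene_ref=4.0,
                                    control_peaks=controls))
```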

As explained above, a calibrator according to the invention is particularly suitable for determining the copy number of a gene of interest in an individual. Therefore, nucleic acid sequences located on a calibrator according to the invention are preferably at least 70% identical to part of a gene of interest which is subject to copy number variation. The invention thus provides a nucleic acid molecule comprising at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to part of a gene of interest, or a complementary sequence thereof, and at least one control nucleic acid sequence, wherein at least 80% of said nucleic acid sequences which are at least 70% identical to part of a gene of interest comprise a sequence that is identical to, or complementary to, a gene-specific nucleotide and/or a pseudo-gene specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within said gene or pseudogene, said polymorphism preferably comprising an SNP, wherein at least one of said genes of interest is subject to copy number variation in the human genome. Said control nucleic acid is preferably at least 70% identical to, or complementary to, a part of a gene which has a constant copy number in the human genome.

Preferably each (pseudo)gene-specific nucleic acid sequence located on a calibrator according to the invention comprises a sequence that is identical to, or complementary to, a gene-specific nucleotide and/or a pseudo-gene specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within said gene or pseudogene. Such nucleic acid sequence is unique for said (pseudo)gene of interest. Said nucleic acid sequence located on the calibrator can thus be used to distinguish said specific gene variant from other genes, such as other gene variants and/or other genes of a gene cluster. If a nucleic acid molecule according to the invention comprises nucleic acid sequences that are specific for each (pseudo)gene variant of a gene cluster of interest, such nucleic acid molecule can be used to determine the haplotype, including copy number variation, of said gene cluster for an individual. Thus, in one embodiment, a calibrator according to the invention is provided that comprises nucleic acid sequences which together are at least 70% identical to, or complementary to, parts of each gene variant of a gene cluster of interest. Such calibrator can be used to determine in one reaction the presence or absence and copy number of each gene of said gene cluster, for instance of the KIR or HLA gene cluster. This means that with the use of such calibrator, the complete haplotype, including gene copy number, of a gene cluster in a sample of an individual can be determined. Therefore, in a preferred embodiment, a nucleic acid molecule, vehicle or plasmid according to the invention is provided that comprises nucleic acid sequences which together are at least 70% identical to, or complementary to, parts of each gene of a gene cluster of interest, or complementary sequences thereof.

Because a calibrator according to the invention is particularly suitable for use in an MLPA method according to the invention, (pseudo)gene-specific nucleic acid sequences located on the calibrator preferably have the same number of nucleotides as an MLPA probe set according to the invention depicted in FIG. 3. Thus, a nucleic acid sequence located on a calibrator according to the invention, which is at least 70% identical to part of a gene of interest or to part of a control gene, preferably has a length of between 40 and 600 nucleotides. As described herein before, preferably chemically synthesized probes are used with a length of between 20 and 110 nucleotides because such probes can be synthesized easily and cost-effectively. In view of the fact that probe sets according to the invention preferably contain 2 or 3 probes, a nucleic acid sequence located on a calibrator according to the invention, which is at least 70% identical to part of a gene of interest or to part of a control gene, preferably has a length of between 40 and 330 nucleotides. Most preferably, such nucleic acid sequence has a length of between 90 and 300 nucleotides. Preferably said nucleic acid sequence does not contain the primer binding sites of probes according to the invention.
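The range of 40 to 330 nucleotides follows from simple arithmetic on the preferred probe length (20 to 110 nucleotides) and the preferred number of probes per probe set (2 or 3). A minimal sketch in Python is given below; the function name and parameter names are illustrative only and do not form part of the invention.

```python
def calibrator_segment_length_range(min_probe_len=20, max_probe_len=110,
                                    min_probes=2, max_probes=3):
    """Return the (shortest, longest) combined length of a probe set.

    Shortest case: the fewest probes, each of minimal length.
    Longest case: the most probes, each of maximal length.
    """
    shortest = min_probes * min_probe_len
    longest = max_probes * max_probe_len
    return shortest, longest


print(calibrator_segment_length_range())  # (40, 330)
```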

A calibrator according to the invention is also particularly useful as an internal quality control when determining the presence or copy number of a (pseudo)gene of interest in a sample of an individual. Without the use of such a control, if a sample of an individual is subjected to an amplification reaction, for instance as part of an MLPA method, the absence of amplified product of a gene of interest may indicate that said gene of interest is not present in said sample. However, it is also possible that the amplification reaction failed.

As explained before, if a reference sample containing a calibrator according to the invention comprising a nucleic acid sequence specific for the same gene of interest is subjected to the same MLPA method as a sample of an individual, it serves as a control for the success of an amplification reaction. If no amplified product is obtained using said sample of said individual, but amplified product is obtained using said reference sample, the presence of amplified product from said reference sample shows that the amplification reaction was successful. In that case, it can be concluded that said gene of interest is not present in said sample of said individual.

On the other hand, if amplified product is not obtained from said sample of said individual and also not from said reference sample, it can be concluded that the amplification reaction failed. If the amplification reaction succeeded, at least in the reference sample amplified product should be present.

Of course, if amplified product following an amplification reaction is present both when using said sample of said individual and when using said reference sample, it can be concluded that said gene of interest is present in said individual.
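The three situations described above can be summarized as a simple decision rule. The following Python sketch is illustrative only; the function name and the returned strings are assumptions. The fourth combination (product from the sample of the individual but not from the reference sample) is not discussed in the text and is merely flagged here as inconclusive.

```python
def interpret_amplification(sample_has_product: bool,
                            reference_has_product: bool) -> str:
    """Interpret presence/absence of amplified product for one gene of interest."""
    if reference_has_product and sample_has_product:
        return "gene of interest present"
    if reference_has_product and not sample_has_product:
        # The reference sample shows that the amplification itself worked.
        return "gene of interest absent"
    if not reference_has_product and not sample_has_product:
        return "amplification reaction failed"
    # Case not covered by the description; treated as inconclusive (assumption).
    return "inconclusive: no product from reference sample"


print(interpret_amplification(False, True))   # gene of interest absent
print(interpret_amplification(False, False))  # amplification reaction failed
```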

Preferably each nucleic acid sequence located on a calibrator according to the invention is separated from the upstream or downstream nucleic acid sequence by a spacer sequence of at least 5 nucleotides. This allows an efficient hybridization of MLPA probes of a multiple probe set according to the invention to a calibrator according to the invention and it allows an efficient amplification reaction. A “spacer sequence” as used herein is defined as a nucleotide sequence that is not present in the probes used in an MLPA reaction. More preferably, each (pseudo)gene-specific or control gene-specific nucleic acid sequence located on the calibrator is followed by a spacer sequence of at least 10, more preferably at least 15, more preferably at least 20 nucleotides. Preferably said spacer sequence consists of at most 100 nucleotides, more preferably at most 80 nucleotides, to limit the size of a nucleic acid molecule according to the invention. This is not necessary, however: said spacer sequences can be larger than 100 nucleotides. In that case, however, a large calibrator will be generated, which can be disadvantageous, for instance because the larger the calibrator, the more complicated it is to synthesize. Therefore in a preferred embodiment a nucleic acid molecule or vehicle or plasmid according to the invention is provided, wherein each (pseudo)gene-specific nucleic acid sequence or complementary sequence thereof and/or each control nucleotide sequence or complementary sequence thereof is followed from 5′ to 3′ by a spacer sequence of between 5 and 100 nucleotides, preferably of between 20 and 80 nucleotides.
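By way of illustration, a candidate spacer can be screened against the two criteria given above: a length of between 5 and 100 nucleotides, and absence of the spacer sequence from the probes used. The Python sketch below is an assumption-laden example: the function names are hypothetical, the probe sequence is made up apart from the primer binding site, sequences are assumed to be upper-case A/C/G/T, and the additional check of the reverse complement is an extra precaution not required by the definition.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_acceptable_spacer(spacer: str, probes: list[str],
                         min_len: int = 5, max_len: int = 100) -> bool:
    """Check spacer length and that the spacer does not occur in any probe."""
    if not min_len <= len(spacer) <= max_len:
        return False
    rc = reverse_complement(spacer)
    # Reject the spacer if it, or its reverse complement, occurs in a probe.
    return not any(spacer in probe or rc in probe for probe in probes)


# Hypothetical probe: primer binding site followed by a made-up target sequence.
probes = ["GGGTTCCCTAAGGGTTGGAACGTACGTACGT"]
print(is_acceptable_spacer("TTTTTAAAAACCCCC", probes))  # True
print(is_acceptable_spacer("CCCTAAGGG", probes))        # False (occurs in probe)
print(is_acceptable_spacer("ACG", probes))              # False (too short)
```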

As described above, a calibrator according to the invention is particularly suitable for determining the copy number of a (pseudo)gene of interest or for determining a haplotype of a gene cluster of interest in an individual. Also provided is therefore a use of a nucleic acid molecule or vehicle or plasmid according to the invention for determining a copy number of at least one (pseudo)gene of interest in an individual and a use of a nucleic acid molecule or vehicle or plasmid according to the invention for determining a haplotype of a gene cluster of interest of an individual.

Also provided is a method for determining a copy number of at least one (pseudo)gene of interest of an individual comprising:

amplifying a sequence with a length of at least 10 nucleotides of said at least one (pseudo)gene of interest using a sample of said individual and amplifying a sequence with a length of at least 10 nucleotides of said at least one (pseudo)gene of interest using a reference sample, said reference sample comprising a nucleic acid molecule or a vehicle or a plasmid according to the invention, and

amplifying a sequence with a length of at least 10 nucleotides of at least one control gene using said sample of said individual and amplifying a sequence with a length of at least 10 nucleotides of said at least one control gene using said reference sample;

determining a level of amplified product of said sequence of said at least one (pseudo)gene of interest from said sample of said individual and determining a level of amplified product of said sequence of said at least one (pseudo)gene of interest from said reference sample; and

determining a level of amplified product of said sequence of said at least one control gene from said sample of said individual and determining a level of amplified product of said sequence of said at least one control gene in said reference sample; and

comparing said levels of amplified products of said sequences of said at least one (pseudo)gene of interest with each other and with said levels of amplified products of said sequences of said at least one control gene, thereby determining the copy number of said at least one (pseudo)gene of interest. In one embodiment a method for determining a haplotype of a gene cluster of an individual is provided which method comprises determining a copy number of all genes of said gene cluster with a method of the invention.

A calibrator according to the invention is further particularly suitable to determine the presence or absence and the copy number of a (pseudo)gene of interest in an individual using an MLPA or MLPA-like method according to the invention, in which at least one probe set is used which consists of a left probe, a middle probe and a right probe. The invention therefore also provides a method for determining a copy number of at least one nucleic acid of interest in an individual, comprising the steps of:

a) adding to a sample of said individual and to a reference sample comprising a nucleic acid molecule or a vehicle or a plasmid according to the invention at least two different probe sets, each probe set comprising:

a first nucleic acid probe, said first probe comprising a first nucleic acid sequence complementary to a first region of said nucleic acid of interest and, located 5′ thereof, a non-complementary nucleic acid sequence comprising a first primer binding site, and

a second nucleic acid probe, said second probe comprising a second nucleic acid sequence complementary to a second region of said nucleic acid of interest and, located 3′ thereof, a non-complementary nucleic acid sequence comprising a second primer binding site,

wherein at least one of said probe sets comprises a third nucleic acid probe, said third probe comprising a third nucleic acid sequence complementary to a third region of said nucleic acid of interest, and

wherein, if said third probe is present in said probe set, said first and said third region of said nucleic acid of interest are located essentially adjacent to each other and said third and said second region of said nucleic acid of interest are located essentially adjacent to each other, and

wherein, if said third probe is not present in said probe set, said first and said second region of said nucleic acid of interest are located essentially adjacent to each other,

wherein at least one third nucleic acid probe is complementary to a region of said nucleic acid of interest comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within a given gene or pseudogene, said polymorphism preferably comprising an SNP, and

b) adding to said sample of said individual and to said reference sample at least one different probe set, each probe set comprising:

a first nucleic acid probe, said first probe comprising a first nucleic acid sequence complementary to a first region of a control nucleic acid sequence and, located 5′ thereof, a non-complementary nucleic acid sequence comprising a first primer binding site, and

at least a second nucleic acid probe, said second probe comprising a second nucleic acid sequence complementary to a second region of said control nucleic acid and, located 3′ thereof, a non-complementary nucleic acid sequence comprising a second primer binding site, and

c) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said sample of said individual,

d) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said reference sample,

e) subjecting nucleic acid of said sample of said individual, and nucleic acid of said reference sample to a ligation reaction,

f) subjecting nucleic acid of said sample of said individual and nucleic acid of said reference sample to a nucleic acid amplification reaction, using at least one primer capable of specifically binding said first primer binding site and at least one primer capable of specifically binding said second primer binding site, and

g) determining whether amplified nucleic acid is present, thereby determining whether said at least one nucleic acid sequence of interest and/or said control nucleic acid is present in said sample of said individual,

h) determining a level of amplified product of said at least one nucleic acid sequence of interest of said sample of said individual and a level of amplified product of said at least one nucleic acid sequence of interest of said reference sample;

i) determining a level of amplified product of said at least one control nucleic acid sequence of said sample of said individual and a level of amplified product of said at least one control nucleic acid sequence of said reference sample;

j) comparing said levels of amplified product of said at least one nucleic acid of interest with said levels of amplified product of said at least one control nucleic acid, thereby determining the copy number of said at least one nucleic acid of interest.

As described above, the KIR locus in humans is polygenic and highly polymorphic, so that accurate and efficient characterization of an individual's KIR (pseudo)gene profile is cumbersome. A calibrator according to the invention is therefore particularly suitable for determining the presence and/or copy number of a KIR gene.

If the presence or absence of a specific (pseudo)gene is correlated with (predisposition to) disease, it is often sufficient to compare a sample of an individual with a reference sample of which it is known that the specific gene is present. However, the presence or absence of several KIR genes is not directly correlated with disease. An individual may lack one or more KIR genes without this resulting in disease. Importantly, the correlation between one or more specific KIR genes and disease or the predisposition to disease often depends on the copy number of the KIR gene. For instance, a copy number of 1 of a specific KIR gene is not correlated with a disease, but a copy number of 2 or more of this KIR gene results in, or predisposes to, disease. As an example, a higher copy number of KIR2DL2 and/or KIR2DS2 in an individual has been demonstrated to be predisposing for rheumatoid arthritis with extra-articular manifestations and rheumatoid vasculitis. Thus, obtaining information about the presence or absence of a specific KIR gene in an individual may not be sufficient to obtain information about the correlation between the KIR gene profile of an individual and disease. It is also necessary to determine the copy number of KIR genes when information about such correlations is needed.

As described in Example 4, the present inventors constructed a calibrator according to the invention comprising nucleic acid sequences which are identical to parts of each currently known KIR gene. The sequence of this calibrator is depicted in FIG. 19. Such KIR calibrator has been demonstrated to be particularly suitable for determining the copy number of KIR genes and determining the complete KIR haplotype of an individual using a method according to the invention, as is shown in Example 5. Therefore, in a preferred embodiment, a calibrator according to the invention is specifically designed for determining the copy number of at least one KIR gene. Such calibrator is particularly useful in an MLPA method according to the invention using KIR-specific probe sets. Preferably, sequences of the probes of the KIR-specific probe sets depicted in FIG. 3A, 3B, 3C, 3D, 3F or 3G are used. A calibrator according to the invention that is specifically designed for determining the copy number of at least one KIR gene therefore preferably comprises at least one nucleic acid sequence which has at least 70% sequence identity with a KIR probe set according to the invention or with sequences complementary thereto. That way, a probe set according to the invention specific for a particular KIR gene can be used to hybridize not only to that particular KIR gene in a sample of an individual, but also to a nucleic acid sequence located on a calibrator according to the invention corresponding to part of said particular KIR gene.

The left and right probes of the probe sets of FIGS. 3A, 3B, 3C, 3D, 3F and 3G each contain a primer binding site, which is indicated in bold and/or underlined in these figures. The primer binding site of each left probe of FIGS. 3A, 3B, 3C, 3D, 3F and 3G consists of the sequence GGGTTCCCTAAGGGTTGGA and the primer binding site of each right probe of FIGS. 3A, 3B, 3C, 3D, 3F and 3G consists of the sequence TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC. A nucleotide sequence located on a nucleic acid molecule according to the invention preferably has at least 70% sequence identity with a sequence of a probe of at least one probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G without the primer binding sites, i.e. without the bold and/or underlined sequences. More preferably said at least one nucleotide sequence has at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity with a sequence of a probe of at least one probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G without the primer binding sites.
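As an illustration of how a percentage of sequence identity might be checked against the thresholds above, the Python sketch below counts matching positions between two sequences of equal length. This is a deliberate simplification and an assumption: sequence identity is normally determined after alignment, which may introduce gaps, and the description does not prescribe a particular algorithm. The function name and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Position-wise identity (in %) of two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this simplified sketch requires equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Made-up sequences differing at one of ten positions.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity)          # 90.0
print(identity >= 70.0)  # True: meets the 70% threshold
```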

The invention therefore provides a nucleic acid molecule comprising a nucleotide sequence which has at least 70% sequence identity with at least one nucleic acid sequence consisting of:

a) a probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC of said probe sets, or

b) a complementary sequence of said probe set without said primer binding sites,

wherein said nucleic acid sequence of a) or b) either comprises immediately adjacent to each other:

the sequences or complementary sequences of a left probe of said probe set, without primer binding site GGGTTCCCTAAGGGTTGGA, and of a right probe of the same probe set, without primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, if said probe set consists of two probes, or

the sequences or complementary sequences of a left probe of said probe set, without primer binding site GGGTTCCCTAAGGGTTGGA, and of a middle probe and right probe of the same probe set, without primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, if said probe set consists of three probes. Such nucleic acid molecule is herein also defined as “a nucleic acid molecule comprising a nucleic acid sequence of the probes of at least one probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G”. Preferably, a nucleotide sequence according to the invention has at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity with said nucleic acid sequence consisting of a probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, or a complementary sequence of said probe sets without the primer binding sites. In one embodiment, a nucleotide sequence according to the invention consists of the sequence of at least one probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G without the primer binding sites, or a complementary sequence thereof.

Hence, in a preferred embodiment a calibrator according to the invention comprises the sequences of all probes of a given probe set of FIG. 3 adjacent to each other. If, for instance, said at least one probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G consists of a left probe and a right probe, said nucleotide sequence thus consists from 5′ to 3′ of: the nucleotide sequence of said left probe without the primer binding site GGGTTCCCTAAGGGTTGGA, followed by the nucleotide sequence of said right probe without the primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70% identity thereto. If said at least one probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G consists of a left probe, a middle probe and a right probe, said nucleotide sequence consists from 5′ to 3′ of: the nucleotide sequence of said left probe without the primer binding site GGGTTCCCTAAGGGTTGGA, followed by the nucleotide sequence of said middle probe, followed by the nucleotide sequence of said right probe without the primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% identity thereto.
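The assembly of such a nucleotide sequence from a probe set can be sketched as follows. This Python example is illustrative only: the function names are hypothetical, the target-specific parts of the example probes are made up, and it is assumed that the primer binding site forms the 5′ end of the left probe and the 3′ end of the right probe. Only the three primer binding site sequences are taken from the description.

```python
LEFT_PBS = "GGGTTCCCTAAGGGTTGGA"
RIGHT_PBS = ("TCTAGATTGGATCTTGCTGGCAC", "TCTAGATTGGATCTTGCTGGCGC")


def strip_left_pbs(left_probe: str) -> str:
    """Remove the primer binding site from the 5' end of a left probe."""
    if not left_probe.startswith(LEFT_PBS):
        raise ValueError("left probe does not start with the expected primer binding site")
    return left_probe[len(LEFT_PBS):]


def strip_right_pbs(right_probe: str) -> str:
    """Remove either variant of the primer binding site from the 3' end of a right probe."""
    for pbs in RIGHT_PBS:
        if right_probe.endswith(pbs):
            return right_probe[:-len(pbs)]
    raise ValueError("right probe does not end with an expected primer binding site")


def calibrator_segment(left_probe: str, right_probe: str, middle_probe: str = "") -> str:
    """Concatenate, 5' to 3': left probe, optional middle probe, right probe."""
    return strip_left_pbs(left_probe) + middle_probe + strip_right_pbs(right_probe)


# Made-up target-specific sequences, for illustration only.
left = LEFT_PBS + "ACGTACGTAC"
right = "TTGGCCAATT" + RIGHT_PBS[0]
print(calibrator_segment(left, right))          # ACGTACGTACTTGGCCAATT
print(calibrator_segment(left, right, "GGGG"))  # ACGTACGTACGGGGTTGGCCAATT
```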

If the copy number of a KIR gene is to be determined, the level of amplification product of said KIR gene using a sample of an individual and the level of amplification product of a corresponding sequence in a reference sample are preferably compared with a level of amplification product of a control nucleic acid. Also provided is therefore a nucleic acid molecule comprising a nucleotide sequence which has at least 70% sequence identity with at least one nucleic acid sequence consisting of:

a) a probe set of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC of said probe sets, or

b) a complementary sequence of said probe set without said primer binding sites,

wherein said nucleic acid sequence of a) or b) either comprises immediately adjacent to each other:

the sequences or complementary sequences of a left probe of said probe set, without primer binding site GGGTTCCCTAAGGGTTGGA, and of a middle probe and right probe of the same probe set, without primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, if said probe set consists of three probes, said nucleic acid molecule further comprising at least one control nucleic acid sequence or a complementary sequence thereof. Said nucleic acid molecule can be used as such. However, such nucleic acid molecule can also be present in a vehicle such as a plasmid, which optionally comprises other nucleic acid sequences. Also provided is therefore a vehicle or a plasmid comprising a nucleic acid molecule according to the invention. Such nucleic acid molecule as such or vehicle or plasmid comprising such nucleic acid molecule are herein also referred to as “KIR calibrator” according to the invention.

Generally, one will be interested in determining the copy number of more than one KIR gene, for instance for determining the KIR haplotype of an individual, or for determining predisposition to a disorder which is associated with the presence or absence or copy number of more than one KIR gene. A KIR calibrator according to the invention has the advantage that multiple nucleic acid sequences, each of which are at least 70% identical to part of a given KIR gene of interest, are included. Preferably, sequences specific for all known KIR genes are located on a calibrator according to the invention. Thus, a KIR calibrator preferably comprises for each known KIR gene a nucleic acid sequence which is at least 70% identical to a part with a length of at least 10 nucleotides of said KIR gene. In that case, only one KIR calibrator according to the invention needs to be present in a reference sample to determine the copy number of all KIR genes of interest.

A KIR calibrator may consist of a single nucleic acid molecule according to the invention. However, a KIR calibrator may also comprise multiple nucleic acid molecules according to the invention. Most preferably, though not necessarily, a KIR calibrator according to the invention consists of one vehicle or plasmid according to the invention. A nucleic acid molecule according to the invention therefore preferably comprises nucleic acid sequences of the probes without primer binding sites of at least two probe sets of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, more preferably nucleic acid sequences of at least three, more preferably at least four, more preferably at least five, more preferably at least six, more preferably at least 7, more preferably at least 8, more preferably at least 9, more preferably at least 10 probe sets of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G. In a particularly preferred embodiment, a nucleic acid molecule according to the invention comprises nucleic acid sequences of the probes without primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC of all probe sets of FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, or complementary sequences thereof. Preferably a nucleic acid molecule according to the invention comprises nucleic acid sequences that are at least 70% identical to, or complementary to, all probes of FIGS. 3F and/or 3G, most preferably of FIGS. 3F and 3G, without the primer binding sites. Provided therefore is a nucleic acid molecule according to the invention, comprising sequences selected from the group consisting of:

all nucleic acid sequences of FIG. 3A without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70% sequence identity to said nucleic acid sequences, and

all nucleic acid sequences of FIG. 3B without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70% sequence identity to said nucleic acid sequences, and

all nucleic acid sequences of FIG. 3C without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70% sequence identity to said nucleic acid sequences, and

all nucleic acid sequences of FIG. 3D without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70% sequence identity to said nucleic acid sequences, and

all nucleic acid sequences of FIG. 3F without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70% sequence identity to said nucleic acid sequences, and

all nucleic acid sequences of FIG. 3G without the primer binding sites GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC and TCTAGATTGGATCTTGCTGGCGC, or sequences having at least 70% sequence identity to said nucleic acid sequences, and

any combination thereof, and

any complementary sequences thereof. Such preferred KIR calibrator comprises a nucleic acid sequence of part of each KIR gene currently known. A reference sample comprising such KIR calibrator is thus particularly suitable to determine the presence or absence and copy number of each currently known KIR gene and can thus be used to determine the KIR haplotype, including gene copy number, of any individual. This is for instance demonstrated in Example 5, which describes the determination of the KIR haplotype, including copy number variation, of two siblings of two different families using a KIR calibrator according to the invention. Without the use of such KIR calibrator, it would have been only possible to determine presence or absence of each KIR gene and not the absolute copy number, because no reference samples are currently available comprising known quantities of all KIR genes.

The sequences of the probes, without primer binding sites, of all probe sets of FIGS. 3F and 3G, or sequences complementary thereto, are particularly suitable for use in a KIR calibrator according to the invention as demonstrated in Examples 4 and 5 and FIG. 19. Such KIR calibrator has been used by the present inventors to determine the KIR haplotype and KIR copy number in multiple families.

The construction of a non-limiting example of a KIR calibrator according to the invention is described in Example 4. This specific KIR calibrator comprises nucleic acid sequences corresponding to part of each currently known KIR gene, and nucleic acid sequences corresponding to part of seven control genes which are known to have a constant copy number in the human genome of 2. The KIR calibrator described in Example 4 comprises the sequences of the probes of all probe sets of FIGS. 3F and 3G without the primer binding sites (i.e. without the bold and underlined part of the sequences). In this example, for some probe sequences the sequences depicted in FIGS. 3F and 3G are used, and for some probe sequences the sequences that are complementary to the depicted sequences of FIGS. 3F and 3G are used. As the target nucleic acid in the test sample is double stranded DNA, either the sense or the antisense sequences can be used. The calibrator described in Example 4 further comprises the sequences of the probes of the probe sets indicated as Control 1 (IL-4), Control 2 (FGF3), Control 3 (BCAS4), Control 4 (LMNA), Control 8 (GALT), Control 9 (SPG4) and Control 10 (NF2) in FIG. 3H. Again, for some probes the complementary sequences are used. The sequence of this specific KIR calibrator of Example 4 is depicted in FIG. 19. This KIR calibrator has been demonstrated in Example 5 to be particularly suitable for determining the copy number of KIR genes and determining the complete KIR haplotype in individuals using MLPA. In one embodiment, the invention therefore provides a nucleic acid molecule comprising a sequence which has at least 70% sequence identity with the sequence depicted in FIG. 19. Preferably said sequence has at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% sequence identity with the sequence depicted in FIG. 19. In one embodiment, a nucleic acid molecule is provided which comprises a sequence depicted in FIG. 19.
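Because either strand of a probe-derived sequence may be placed on the calibrator, the complementary sequence of a given probe sequence may need to be derived. A minimal Python sketch is given below; the function name is hypothetical and only the standard bases A, C, G and T are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of a DNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
# Applying the function twice returns the original sequence.
print(reverse_complement(reverse_complement("GGGTTCCCTAAGGGTTGGA")))  # GGGTTCCCTAAGGGTTGGA
```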

A KIR calibrator according to the invention may have spacer sequences between nucleic acid sequences or complement thereof which correspond to part of a KIR gene and/or a control gene. Preferably, in such nucleic acid molecule, most variation in sequence is allowed in spacer nucleic acid sequences. Nucleic acid sequences which correspond to part of a KIR gene or to part of a control gene or complement thereof preferably have at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% sequence identity with the corresponding sequence in FIG. 19. The spacer sequences are allowed to have less than 70% sequence identity with the corresponding sequence in FIG. 19. Because these spacer sequences only serve as spacers between gene-specific sequences, their nucleotide sequence has no particular function. No probes need to hybridize to these spacer sequences during an MLPA reaction. In fact, hybridization to these spacer sequences is unwanted. Therefore the exact nucleotide sequence is not important, as long as these spacer sequences are not complementary to the probes and primers that are used in the MLPA reaction. Thus, the sequence identity between a spacer sequence in a KIR calibrator according to the invention and the KIR calibrator of FIG. 19 can be less than 60%, less than 50%, less than 40%, less than 30%, less than 25% or even lower.

The invention also provides a use of a nucleic acid molecule or a vehicle or plasmid according to the invention for determining the copy number of at least one KIR gene in an individual and/or for determining a KIR haplotype of an individual.

The invention further provides a method for determining the copy number of at least one KIR gene of an individual comprising:

amplifying a sequence with a length of at least 10 nucleotides of said at least one KIR gene using a sample of said individual and amplifying a sequence with a length of at least 10 nucleotides of said at least one KIR gene using a reference sample, said reference sample comprising a nucleic acid molecule or a plasmid according to the invention, and

determining a level of amplified product of said sequence of said at least one KIR gene from said sample of said individual and determining a level of amplified product of said sequence of said at least one KIR gene from said reference sample; and

comparing said levels of amplified products of said sequences of said at least one KIR gene with each other and with said levels of amplified products of said sequences of said at least one control gene, thereby determining the copy number of said at least one KIR gene.

Said part of said at least one KIR gene and said part of said at least one control gene preferably comprise at least 10, more preferably at least 15, more preferably at least 18, more preferably at least 19, more preferably at least 20 nucleotides. As described herein before, in an MLPA method according to the invention, preferably chemically synthesized MLPA probes are used with a length of between 20 and 110 nucleotides because such probes can be synthesized easily and cost-effectively. Therefore, if the copy number of at least one KIR gene is determined using MLPA, the KIR gene-specific and control gene-specific sequences located on a calibrator according to the invention preferably have a length of between 40 and 330 nucleotides. Most preferably, such nucleic acid sequences have a length of between 90 and 300 nucleotides.

As described herein before, a KIR calibrator is particularly suitable for determining the KIR haplotype and/or copy number of KIR genes using an MLPA method. Therefore, in a preferred embodiment a method according to the invention further comprises the steps of:

a) adding to said sample of said individual and to said reference sample at least one probe set selected from FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, and

b) optionally, adding to said sample of said individual and to said reference sample at least one probe set selected from FIG. 3E or 3H, and

c) allowing hybridization of said probe set or probe sets to complementary nucleic acid of said sample of said individual, and

d) allowing hybridization of said probe set or probe sets to complementary nucleic acid of said reference sample, and

e) subjecting nucleic acid of said sample of said individual, and nucleic acid of said reference sample to a ligation reaction.

Said method preferably further comprises amplifying ligated nucleic acid and determining levels of amplified products, thereby determining the copy number of at least one KIR gene of said individual. In a preferred embodiment, a method according to the invention comprises the use of at least one of the probe sets selected from FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G that comprise a third nucleic acid probe.

The invention also provides a method for determining a KIR haplotype of an individual comprising determining the copy number of at least 5, preferably at least 10, more preferably at least 15, most preferably all KIR genes of said individual with a method according to the invention.

It is described herein before in detail how the copy number of a gene is determined based on the level of amplified product of part of said gene in a test sample and in a reference sample and the level of amplified product of part of at least one control gene in said test sample and said reference sample. Briefly, the difference between the peak intensities of each amplified control gene product is determined by comparing for each control gene the intensity peak of amplified product in a test sample and the intensity peak of amplified product in a reference sample. The difference between the peak intensity of each amplified KIR gene is also determined by comparing for each KIR gene the intensity peak of amplified product in said test sample and the intensity peak of amplified product in said reference sample. Subsequently, the copy number of the KIR gene is determined based on the proportions of the peak intensities of the KIR gene in test and reference sample and the proportion of the peak intensities of the control gene in test and reference sample.
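The comparison of peak intensities described above can be sketched as a short calculation. This is only an illustrative sketch under stated assumptions, not the definitive procedure of the invention: the function name `estimate_copy_number`, the parameter `ref_copies` (the number of KIR copies that the reference signal is taken to represent) and the averaging of the control gene ratios are assumptions introduced for this example.

```python
from statistics import mean


def estimate_copy_number(kir_test, kir_ref, ctrl_test, ctrl_ref, ref_copies=1.0):
    """Estimate a KIR gene copy number from peak intensities.

    kir_test / kir_ref   : peak intensity of the KIR probe in the test and
                           reference sample.
    ctrl_test / ctrl_ref : lists of control gene peak intensities, in the same
                           probe order, for the test and reference sample.
    ref_copies           : assumed (hypothetical) copy number represented by
                           the KIR peak of the reference sample.
    """
    if len(ctrl_test) != len(ctrl_ref) or not ctrl_test:
        raise ValueError("control peak lists must be non-empty and of equal length")

    # Test/reference proportion for every control gene; averaging them is an
    # assumption made here to obtain a single normalisation factor.
    control_ratio = mean([t / r for t, r in zip(ctrl_test, ctrl_ref)])

    # Test/reference proportion for the KIR gene itself.
    kir_ratio = kir_test / kir_ref

    # Normalise the KIR proportion by the control proportion.
    return ref_copies * kir_ratio / control_ratio
```

For instance, `estimate_copy_number(1000.0, 500.0, [800.0, 400.0], [800.0, 400.0])` normalises a two-fold KIR signal against unchanged control signals. In practice the resulting value would typically be rounded to the nearest whole copy number.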

KIR polymorphisms have been associated with disease. Associations between KIR polymorphisms and subtypes of leukemia were investigated by Zhang et al. (Zhang et al. 2009). The presence of KIR2DS4 was demonstrated to be predisposing to chronic myelogenous leukemia (CML) and the absence of KIR2DS3 was predisposing to acute lymphoblastic leukemia (ALL). KIR2DS4 is present in haplotype A, whereas KIR2DS3 is present in haplotype B. Presence of KIR2DS4 and absence of KIR2DS3 are predisposing to leukemia subtypes. Thus, characteristics of haplotype A are predisposing to leukemia subtypes. The present invention provides probes that are particularly well suited for detecting KIR genes, including KIR2DS4 and KIR2DS3. Thus, with probes according to the present invention selected from FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G the presence and/or absence of KIR2DS4 and KIR2DS3 in a sample is particularly well determined. Preferably probesets 540A/540C, and/or 513B/513D and/or 504A/504B, and/or 708K/708L/708M/708N as depicted in FIGS. 3C, 3D, 3F and/or 3G are used to detect KIR2DS3 and/or KIR2DS4 polymorphisms. With probes selected from FIG. 3 predisposition to leukemia subtypes is thus particularly well determined.

Therefore, in one embodiment the invention provides a method for determining predisposition to leukemia of an individual comprising determining the presence or absence of KIR2DS4 and/or KIR2DS3 in a nucleic acid sample of said individual with at least one probeset listed in FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, wherein the presence of KIR2DS4 is indicative for a predisposition for chronic myelogenous leukemia and the absence of KIR2DS3 is indicative for a predisposition for acute lymphoblastic leukemia. In a preferred embodiment probe set 540A/540C, and/or 513B/513D and/or probe set 504A/504B, and/or 708K/708L/708M/708N as depicted in FIGS. 3C, 3D, 3F and/or 3G are used for determining the presence or absence of KIR polymorphisms. As used herein, the term “nucleic acid sample” means a sample comprising nucleic acid. Said sample may of course further comprise other components, such as for instance proteins. Preferably, nucleic acid is at least partly isolated from said sample before being subjected to a method according to the present invention.

Associations between KIR polymorphisms and inflammatory bowel disease (IBD) and/or Crohn's disease have been established as well (Hollenbach et al 2009). The KIR2DL2/KIR2DL3 heterozygous genotype predisposes or protects from Crohn's disease depending on the presence of their HLA-C ligands. KIR2DL2/KIR2DL3 heterozygosity in combination with C1 predisposes to Crohn's disease whereas KIR2DL2/KIR2DL3 heterozygosity in combination with C2 protects from IBD and/or Crohn's disease. KIR2DL2/KIR2DL3 heterozygosity in combination with C1/C2 heterozygosity has an intermediate effect on predisposition (Hollenbach et al 2009). Non-limiting examples for determining the presence or absence of C1 and/or C2 are detecting nucleic acid sequence(s) encoding C1 and/or C2 protein using for instance a nucleic acid amplification reaction or detecting C1 and/or C2 protein using for instance Western blot analysis.

The present invention provides probes that are particularly suitable for detecting KIR genes, including KIR2DL2 and KIR2DL3. Thus, with probes according to the present invention selected from FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G KIR2DL2/KIR2DL3 heterozygosity in a sample is particularly well determined. Preferably probeset 415B/415C/415D and/or 417A/417B/417C and/or probeset 420A/420B, and/or 706A/706B as depicted in FIGS. 3C, 3D, 3F and/or 3G are used to detect KIR2DL3 and/or KIR2DL2 polymorphisms. With probes selected from FIG. 3 predisposition to Crohn's disease is thus particularly well determined.

Therefore, in one embodiment the invention provides a method for determining predisposition to IBD and/or Crohn's disease of an individual comprising determining the presence or absence of KIR2DL2 and/or KIR2DL3 in a nucleic acid sample of said individual with at least one probeset listed in FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, and determining the presence or absence of HLA C1 and/or C2 ligand in a sample of said individual, wherein KIR2DL2, KIR2DL3 heterozygosity in combination with C1 homozygosity is indicative for a predisposition for Crohn's disease, and KIR2DL2, KIR2DL3 heterozygosity in combination with C2 homozygosity is indicative for protection from Crohn's disease. In a preferred embodiment probe set 415B/415C/415D and/or 417A/417B/417C and/or probe set 420A/420B and/or 706A/706B as depicted in FIGS. 3C, 3D, 3F and/or 3G are used for determining the presence or absence of KIR polymorphisms.
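The combination rule stated above (KIR2DL2/KIR2DL3 heterozygosity read together with the HLA-C ligand groups present) can be written as a small lookup. The sketch below merely restates that rule for illustration; the function name `crohn_indication`, the string labels it returns, and the simplification that heterozygosity is represented by the presence of both KIR2DL2 and KIR2DL3 are assumptions made for this example.

```python
def crohn_indication(has_kir2dl2, has_kir2dl3, hla_c_groups):
    """Return the indication described in the text for Crohn's disease.

    has_kir2dl2 / has_kir2dl3 : booleans, presence of each gene in the sample.
    hla_c_groups              : iterable with the HLA-C ligand groups found,
                                e.g. {"C1"}, {"C2"} or {"C1", "C2"}.
    """
    groups = set(hla_c_groups)
    if not groups or not groups <= {"C1", "C2"}:
        raise ValueError("hla_c_groups must contain 'C1' and/or 'C2'")

    # Assumption: presence of both genes stands in for KIR2DL2/KIR2DL3
    # heterozygosity; the rule in the text only covers that genotype.
    if not (has_kir2dl2 and has_kir2dl3):
        return "not covered by this rule"

    if groups == {"C1"}:
        return "predisposition"   # heterozygosity with C1 homozygosity
    if groups == {"C2"}:
        return "protection"       # heterozygosity with C2 homozygosity
    return "intermediate"         # heterozygosity with C1/C2 heterozygosity
```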

Copy number variation of KIR2DL3, KIR3DL1 and KIR3DS1 is correlated to the course of disease in chronic infection, such as retroviral infection, herpes virus infection, and hepatitis virus infection, more in particular HIV, CMV, EBV, HSV, HBV and HCV (Martin et al 2007 and Khakoo et al 2004). A higher copy number of KIR3DL1 and/or KIR3DS1 in an individual is indicative for an improved course of the disease and/or response to treatment of chronic infection as compared with a low copy number of KIR3DL1 and/or KIR3DS1 in an individual, and a low copy number of KIR2DL3 in an individual is indicative for an improved course of the disease and/or response to treatment of chronic infection as compared with a high copy number of KIR2DL3 in an individual. Thus, a higher copy number of KIR3DL1 and/or KIR3DS1 in an individual is indicative for an increased survival in chronic infection and a lower copy number of KIR2DL3 in an individual is indicative for increased survival in chronic infection.

The present invention provides probes that are particularly well suited for determining copy number variation of KIR genes, including KIR3DL1 and KIR3DS1. Thus, with probes according to the present invention selected from FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G the copy number of KIR3DL1 and KIR3DS1 and KIR2DL3 in a sample is particularly well determined. Preferably probe sets 409A/409B/409C, and/or 711A/711B/711C/711D and/or 418A/418B/418D, and/or 709C/709D/709E/709G and/or probe set 415B/415C/415D and/or 417A/417B/417C as depicted in FIGS. 3C, 3D, 3F and/or 3G are used to estimate the copy number of KIR3DL1 and/or KIR3DS1 and/or KIR2DL3. With probes selected from FIG. 3 susceptibility of an individual to course of disease and/or response to treatment in chronic infection is thus particularly well determined.

Therefore the invention provides a method for determining susceptibility of an individual to course of disease and/or response to treatment in chronic infection, preferably retroviral infection, herpes virus infection, and hepatitis virus infection, comprising determining the copy number of KIR2DL3, KIR3DL1 and/or KIR3DS1 in a nucleic acid sample of said individual with at least one probeset listed in FIG. 3A or 3B or 3C or 3D or 3F or 3G, wherein a high KIR3DL1 and/or KIR3DS1 copy number in an individual is indicative for an improved course of disease and/or response to treatment of chronic infection as compared with a low copy number of KIR3DL1 and/or KIR3DS1 in an individual and a low KIR2DL3 copy number in an individual is indicative for an improved course of disease and/or response to treatment of chronic infection as compared with a high copy number of KIR2DL3 in an individual. Preferably said chronic infection comprises HIV, CMV, EBV, HSV, HBV and HCV. In a preferred embodiment probeset 409A/409B/709D/409C, and/or 711A/711B/711C/711D and/or 418A/418B/418D, and/or 709C/709E/709G and/or probe set 415B/415C/415D and/or 417A/417B/417C as depicted in FIGS. 3C, 3D, 3F and/or 3G are used for determining the copy number of KIR genes.

The presence of KIR2DS4 in a donor is correlated to transplantation-related outcome measures, such as mortality, graft-versus-host, graft-versus-tumor and grafted organ survival in recipients after transplantation. The presence of KIR2DS4 in a donor is indicative for reduced mortality, reduced graft-versus-host, increased graft-versus-tumor and increased grafted organ survival in recipients after transplantation as compared to the absence of KIR2DS4 in a donor. The present invention provides probes that are particularly well suited for determining copy number variation of KIR genes, including KIR3DL1 and KIR3DS1. Thus, with probes according to the present invention selected from FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G the copy number of KIR2DS4 in a sample is particularly well determined. Preferably probe sets 504A/504B, and/or 708K/708L/708M/708N as depicted in FIGS. 3C, 3D, 3F and/or 3G are used to determine the presence or absence of KIR2DS4. With probes selected from FIG. 3 predisposition to transplantation-related outcome measures is thus particularly well determined.

Therefore the invention provides a method for determining predisposition to transplantation-related outcome measures, such as mortality, graft-versus-host, graft-versus-tumor and grafted organ survival of a recipient after transplantation, comprising determining the presence or absence of KIR2DS4 in a nucleic acid sample of a donor for said recipient with at least one probeset listed in FIG. 3A or 3B or 3C or 3D or 3F or 3G, wherein the presence of KIR2DS4 in said donor is indicative for a reduced mortality, a reduced graft-versus-host reaction, an increased graft-versus-tumor reaction and an increased grafted organ survival in said recipient as compared to the mortality, graft-versus-host reaction, graft-versus-tumor reaction and grafted organ survival of a recipient with a donor wherein KIR2DS4 is absent. In a preferred embodiment probeset 504A/504B, and/or 708K/708L/708M/708N as depicted in FIGS. 3C, 3D, 3F and/or 3G are used for determining the presence or absence of KIR polymorphisms.

A correlation has been established between the copy number of KIR2DL2 and KIR2DS2 and rheumatoid arthritis (RA) with extra-articular manifestations and rheumatoid vasculitis. A higher copy number of KIR2DL2 and/or KIR2DS2 in an individual was demonstrated to be predisposing for rheumatoid arthritis with extra-articular manifestations and rheumatoid vasculitis (Majorczyk et al 2007, Yen et al 2001). Additionally, rheumatoid arthritis patients positive for KIR2DL3 and negative for KIR2DS3 had earlier disease diagnosis (Majorczyk et al 2007).

The present invention provides probes that are particularly well suited for determining the presence or absence and copy number variation of KIR genes, including KIR2DL2, KIR2DS2, KIR2DL3 and KIR2DS3. Thus, with probes according to the present invention selected from FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G the presence or absence and copy number of KIR2DL2, KIR2DS2, KIR2DL3 and KIR2DS3 in a sample is particularly well determined. Preferably probe sets 420A/420B, and/or 706A/706B and/or probe set 703A/703B/703C, and/or 544A/544B as depicted in FIGS. 3C, 3D, 3F and/or 3G are used to estimate the copy number of KIR2DL2 and/or KIR2DS2. Preferably probe sets 415B/415C/415D and/or 417A/417B/417C and/or probe set 513B/513D and/or 540A/540C as depicted in FIGS. 3C, 3D, 3F and/or 3G are used to estimate the copy number of KIR2DL3 and/or KIR2DS3. With probes selected from FIG. 3 susceptibility of an individual to rheumatoid arthritis (RA) with extra-articular manifestations and rheumatoid vasculitis is thus particularly well determined.

Therefore in one embodiment the invention provides a method for determining predisposition to rheumatoid arthritis with extra-articular manifestations and rheumatoid vasculitis of an individual comprising determining the copy number of KIR2DS2 and/or KIR2DL2 in a nucleic acid sample of said individual with at least one probeset listed in FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, wherein a high copy number of KIR2DS2 and/or KIR2DL2 in said individual is indicative for a predisposition for rheumatoid arthritis with extra-articular manifestations and rheumatoid vasculitis as compared with a low copy number of KIR2DL2 and/or KIR2DS2 in an individual. In a preferred embodiment probeset 420A/420B, and/or 706A/706B and/or probe set 703A/703B/703C, and/or 544A/544B as depicted in FIGS. 3C, 3D, 3F and/or 3G are used for determining the copy number of KIR genes.

Finally, a correlation has been found between the presence or absence or copy number of KIR genes and predisposition to autoinflammation, such as HLA-B27-related enthesitis-related arthropathy and reactive arthritis, and psoriasis, in individuals. For instance, KIR3DL2 is increased in spondylarthritides and juvenile enthesitis-related arthritis (Chan et al 2005, Brown 2009). The present invention provides probes that are particularly well suited for determining the presence or absence and copy number variation of KIR genes. Thus, with probes selected from FIG. 3 susceptibility of an individual to autoinflammation, such as HLA-B27-related enthesitis-related arthropathy and reactive arthritis, and psoriasis, is particularly well determined.

Therefore, in one embodiment the invention provides a method for determining predisposition to autoinflammation, preferably HLA-B27-related enthesitis-related arthropathy and reactive arthritis, psoriasis, in individuals comprising a) determining the presence or absence and/or copy number of a KIR gene indicative for said disorder in a nucleic acid sample of said individual with at least one probeset listed in FIG. 3A or 3B or 3C or 3D or 3F or 3G, and b) correlating the result obtained in step a) with presence or absence of said predisposition.

In another embodiment the invention provides a method for determining predisposition to spondylarthritides and/or juvenile enthesitis-related arthritis of an individual comprising determining the copy number of KIR3DL2 in a nucleic acid sample of said individual with at least one probeset listed in FIGS. 3A, 3B, 3C, 3D, 3F and/or 3G, wherein a high copy number of KIR3DL2 in said individual is indicative for a predisposition for spondylarthritides and/or juvenile enthesitis-related arthritis as compared with a low copy number of KIR3DL2 in an individual. In a preferred embodiment probeset 404A/404B, and/or 538A/538B/538D as depicted in FIGS. 3C, 3D, 3F and/or 3G are used for determining the copy number of KIR genes.

The invention is further explained in the following examples. These examples do not limit the scope of the invention, but merely serve to clarify the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. A) Schematic outline of a conventional MLPA reaction. The figure is adapted from www.mpla.com. FIG. 1B) illustrates the use of two ligation sites in one probe set, to detect two SNP's at the same time with one probe set (a Tri-Lig probe) on a specific target sequence. If the correct SNP's are present at both ligation sites, the three probe parts will become ligated together to result in one PCR product, as shown at the bottom left. If an incorrect SNP is present on one or both ligation sites, no PCR product will be formed, as shown at the bottom right. FIG. 1C) illustrates the use of two ligation sites in one Tri-Lig probe, to detect one particular gene, KIR3DL1*024N, in the background of all other KIR3DL1 WT alleles at the first ligation site, and all other KIR genes at the second ligation site. The 1a probe detects all WT KIR3DL1 alleles (1a) whereas the 1b probe only detects the KIR3DL1*024N allele (1b), due to a different SNP at the first ligation site. The partial KIR gene sequences 2 to 12 are not detected by the 1a and 1b probes, because these probes are only specific for KIR3DL1 genes at the second ligation site due to a different SNP at the second ligation site.

FIG. 2. IUB nucleotide codes of degenerate bases

FIG. 3. KIR-specific probe sets. A) KIR probe mix 1. Bold and underlined nucleotides represent primer binding sites. B) KIR probe mix 2. Bold and underlined nucleotides represent primer binding sites. C) Extended KIR probe mix 1. Bold and underlined nucleotides represent primer binding sites. KIR genes in which two SNPs are detected using one probe set according to the invention, consisting of three probes, are depicted in FIG. 13. D) Extended KIR probe mix 2. Bold and underlined nucleotides represent primer binding sites. KIR genes in which two SNPs are detected using one probe set of this probe mix, consisting of three probes, are depicted in FIG. 13. E) Control probe mix. Bold and underlined nucleotides represent primer binding sites. F) Improved KIR probe mix 1. Bold and underlined nucleotides represent primer binding sites. G) Improved KIR probe mix 2. Bold and underlined nucleotides represent primer binding sites. H) Improved control probe mix. Bold and underlined nucleotides represent primer binding sites.

FIG. 4. The KIR protein structures. Depicted as large ovals are the extracellular Ig-like domains, as squares the ITIMs and as small light grey circles the charged residues on the cytoplasmic tail (IPD KIRdatabase). Inhibitory KIRs and activating KIRs are indicated by a “+” and “−”, respectively.

FIG. 5. Exon structure of KIR3DL1. Exons are depicted with black boxes and introns with lines and are drawn approximately to scale (Vilches et al, 2002).

FIG. 6. The organization of the KIR locus. a: Framework genes KIR3DL3, KIR2DL4 and KIR3DL2 are in black and are found at the beginning, near the middle and at the end of the locus. The pseudogenes KIR2DP1 and KIR3DP1 (which is also a framework gene) are in white and black, respectively. The regions between the framework genes are variable and these KIR genes are in grey, with activating KIRs in black letters and inhibitory KIRs in white letters. b: One example of haplotype A. c: An example of haplotype B (Parham et al, 2003).

FIG. 7. The pedigrees of 12 families from the KIR reference panel I (the families 1347 and 1349 are depicted in FIGS. 11 and 12, respectively). The four-digit number on top of each pedigree is the CEPH family number and the number in each shape is the individual number; these numbers correspond with the numbers in table 4. The letters below each shape indicate the haplotypes and can be found in the legend next to the pedigree.

FIG. 8. Electropherogram of probe set 1. The peak patterns of the probes on two donors: 8080 (top) and 5911 (bottom). All 17 KIR probe peaks are present on donor 8080 and 10 KIR probe peaks on donor 5911. In all donors the nine control probes (Ctr2-10) and the probes on the four framework genes: KIR3DL3, KIR3DP1, KIR3DL2, and KIR2DL4 (indicated with the black arrows) generated a signal. Electropherograms of probe set 2 were similar for these two probe groups (data not shown).

FIG. 9. Comparison of peak intensities of the probe 2DS2 (black arrows) between a true positive for KIR2DS2 (top) and a false positive (bottom).

FIG. 10. The peak profiles of the probes 2DL5 (left arrows) and 2DL5A (right arrows). Top: a sample that is positive for KIR2DL5, as indicated by the presence of the peak from probe 2DL5; in this case the peak from probe 2DL5A cannot distinguish between the presence of KIR2DL5A and 3DP1*004. Bottom: a sample that is negative for KIR2DL5, as indicated by the absence of the peak from probe 2DL5; here the peak of probe 2DL5A indicates the presence of KIR3DP1*004.

FIG. 11. The pedigree of family 1347.

A) Left: The numbers of the individuals in the top left pedigree correspond with the numbers of the DNA samples in the table. At the bottom the haplotype is denoted in letters and the legend for the haplotype is displayed below (www.ihwg.org). The CNV of some of the genes was quantified differently by each of the two probe sets; the number before ‘/’ is for probe set 1 and the number after it is for probe set 2.

B1) Interpretation based on SSP-PCR data from CEPH-IHWG and the conventional KIR haplotype model (see also http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.fcgi?id=1347&cmd=kirped&locus_group=1).

B2) Novel haplotype model based on SSP-PCR data obtained from CEPH-IHWG (http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.fcgi?id=1347&cmd=kirped&locus_group=1).

B3) Copy number variation of KIR genes, determined using SSP-PCR data obtained from CEPH-IHWG based on the conventional KIR haplotype model (table 1) and the novel KIR haplotype model (table 2) and copy number variation of KIR genes, determined by KIR-MLPA using the extended probe sets 1 and 2 and the novel KIR haplotype model (table 3).

FIG. 12. The pedigree of family 1349.

B1) Interpretation based on SSP-PCR data from CEPH-IHWG and the conventional KIR haplotype model (see also http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.fcgi?id=1347&cmd=kirped&locus_group=1).

B2) Novel haplotype model based on SSP-PCR data obtained from CEPH-IHWG (http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.fcgi?id=1347&cmd=kirped&locus_group=1).

FIG. 13. Detection of KIR alleles and KIR copy number variation.

FIG. 14. Schematic representation of the process to design synthetic genes for the KIR MLPA calibrator. The process includes design of the sequence, design of the oligo's, assembly of the oligo's by means of ligation and amplification of the assembled gene.

FIG. 15. Agarose gels showing the PCR products after oligo assembly (step 1) and gene amplification (step 2). Left: The assembly and amplification of the entire 1.1 kb gene failed. Right: The gene was split into two smaller genes, A and B, with 180 bp overlap. These products were amplified successfully and subsequently combined to form one gene of 1.1 kb.

FIG. 16. pBlueScript is digested so that newly synthesized genes can be combined into one plasmid, resulting in the KIR MLPA calibrator.

FIG. 17. Probes designed to recognize specific KIR genes, based on variation of one or two specific base pairs. The probe binding sites are arranged to prevent interference: either with a spacer (nonsense sequence) between the binding sites, or by alternating the binding sites between the sense and anti-sense strands.

FIG. 18. KIR haplotype of CEPH families 3144 (A) and 3149 (B).

FIG. 19. Sequence of KIR MLPA calibrator. Small letters represent the backbone of the pBlueScript plasmid and capitals represent nucleotide sequences which are inserted into the plasmid as described in Example 4, including KIR (pseudo)gene specific sequences, control gene specific sequences, and spacer sequences.

EXAMPLES
Example 1

This Example presents a new method for KIR genotyping.

KIRs are expressed by natural killer (NK) cells and a subset of T cells. NK cells are cells of the lymphoid lineage, but display no antigen-specific receptors. Their main function is to monitor host cells for the presence of MHC class I molecules, which is important for e.g. distinguishing healthy cells from virus-infected or tumor cells. A low expression of MHC class I molecules on host cells, which may for instance occur during viral infections as a result of virus-mediated down regulation to prevent presentation of viral peptides to CD8 T cells, stimulates NK cells to launch a cytotoxic attack. This phenomenon is also known as the “missing self” theory.

NK cells express a variety of receptors that mediate interactions with MHC class I molecules, including members of the KIR and CD94/NKG receptor multigene families. Interaction between MHC class I molecules and these receptors regulates NK cytotoxicity generally through the generation of inhibitory signals. The composition of the KIR and CD94/NKG families differs considerably between humans and mice, with KIRs showing the most genetic and gene number variation in man.

KIRs were first discovered in their role in fighting virus infections by natural killer cells, but they are also expressed by a subset of T cells. The KIR gene cluster is located at chromosome 19q13.4 within the leukocyte receptor complex (LRC) and spans a region of about 150 kb. Up to 15 genes plus two pseudogenes have been identified to date. Characteristic of the KIR gene cluster are the variable gene content and an extensive degree of allelic variation. The gene content of unrelated individuals can differ considerably in the number of KIR (pseudo)genes present, but also in the numbers of activating and inhibitory (pseudo)genes. Contractions and expansions by non-reciprocal recombination are the major mechanism behind KIR diversification. KIR haplotypes can be divided into two groups, A and B, of which haplotype B has a greater variety in gene content and contains more activating KIR genes. Studies of different ethnic populations show significant differences in the distribution of these two haplotypes. Selective pressures, such as exposure to different pathogens and rapidly evolving MHC class I molecules, appear to be the forces behind such gene diversification. A functional analog is the Ly49 gene family in mice, but KIRs and Ly49 are structurally distinct proteins. KIRs have been identified in different primate species, but they are species-specific and differ in gene content among various species. These findings provide evidence for a rapid evolution and expansion of this gene family.

Another level of relevant variation is the level of expression of KIRs by individual NK cells. Each NK cell expresses only a subset of its KIR gene repertoire and the presence of HLA ligands seems to influence the frequency of NK cells expressing the cognate KIR. A higher frequency of NK cells expressing inhibitory KIRs has been found in individuals in whom the cognate HLA ligand is present. The ligands of some KIRs, in particular those with activating potential, remain to be determined.

Some of these activating KIRs seem to have lower affinity for their cognate HLA class I ligands in comparison with their related inhibitory receptors.

KIRs have been associated with several diseases, but due to the genetic diversity between and within populations and the differences in KIR expression by NK cells, a clear understanding of their role has yet to be established. KIRs have been reported to play a role in allogeneic hematopoietic stem cell transplantation (HSCT), which is used in the treatment of leukemia. It was suggested that an intentional mismatch between donor KIR and recipient HLA ligands would allow for a graft anti-tumor effect. KIR3DS1 and KIR3DL1 have been reported to be associated with slower progression to AIDS and with several other virus infections, such as Hepatitis C virus (HCV) and human cytomegalovirus (CMV) infection. Studies of infection with the protozoan Plasmodium falciparum have also implicated roles for KIRs in malaria. In autoimmune and inflammatory conditions, certain combinations of KIRs and cognate ligands potentially result in higher susceptibility or protection of the host.

The KIR Gene Cluster

The KIR acronym originally stood for killer cell-inhibitory receptor, because the first KIR discovered had an inhibiting effect on NK cells. Today, KIR is an abbreviation for Killer-cell Immunoglobulin-like Receptor, as this family includes both inhibitory and activating receptors. The HUGO Gene Nomenclature Committee (HGNC) is responsible for the naming of KIR genes. Currently the KIR gene family consists of 15 genes and 2 pseudogenes, listed in Table 1 (Marsh et al, 2002). KIR genes are named after the protein structure they encode. The “D” denotes “Domain” and the number 2 or 3 before it indicates the number of extracellular Ig-like domains. “L” indicates a “Long” cytoplasmic tail, “S” indicates a “Short” cytoplasmic tail and “P” indicates a “pseudogene”. The number behind the letter L or S denotes the gene encoding for this structure. Thus KIR2DL1 encodes a structure with two Ig-like domains and a long cytoplasmic tail. KIR2DL5A and KIR2DL5B are exceptions; they were initially identified as one gene, KIR2DL5. However, these two structurally similar variants were discovered to be located in different regions of the KIR gene cluster and can be inherited separately (Gomez-Lozano et al, 2002).
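The naming convention described above can be made concrete with a small parser. This is a minimal sketch for illustration only: the names `KirName` and `parse_kir_name` and the regular expression are assumptions introduced here, and allele designations (such as the suffix in KIR3DL1*024N) are deliberately not handled.

```python
import re
from typing import NamedTuple


class KirName(NamedTuple):
    ig_domains: int   # number of extracellular Ig-like domains (2 or 3)
    kind: str         # meaning of the letter L, S or P
    number: int       # number following the letter
    variant: str      # "A" or "B" for KIR2DL5A/KIR2DL5B, otherwise ""


# KIR + domain count + "D" + L/S/P + number + optional A/B variant letter.
_KIR_PATTERN = re.compile(r"KIR([23])D([LSP])(\d+)([AB]?)")

_KINDS = {
    "L": "long cytoplasmic tail",
    "S": "short cytoplasmic tail",
    "P": "pseudogene",
}


def parse_kir_name(name: str) -> KirName:
    """Split a KIR gene name into the parts described in the text."""
    match = _KIR_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised KIR gene name: {name!r}")
    domains, letter, number, variant = match.groups()
    return KirName(int(domains), _KINDS[letter], int(number), variant)
```

For example, `parse_kir_name("KIR2DL1")` yields two Ig-like domains and a long cytoplasmic tail, matching the description of KIR2DL1 in the text.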

The KIRs that possess long cytoplasmic tails transduce inhibitory signals to the NK cell, owing to the two immunoreceptor tyrosine-based inhibitory motifs (ITIMs) (FIG. 4). Binding of these receptors with HLA class I molecules leads to phosphorylation of the tyrosine residues within the ITIM. Tyrosine phosphatase (SHP-1) is then recruited and activated by the ITIM and prevents or inhibits phosphorylation events which are associated with cellular activation. Inhibition of NK-cell mediated cytotoxicity and of cytokine secretion are the main downstream effects. Short cytoplasmic tails lack the ITIM and possess a basic charged amino acid, such as lysine, in the transmembrane domain. This positively charged amino acid residue allows association with an adaptor molecule, such as DAP12. DAP12 has one immunoreceptor tyrosine-based activation motif (ITAM). When the tyrosine residues in the ITAM are phosphorylated, a docking site for the SH2 domains of the ZAP70 and Syk tyrosine kinases is generated. The action of these kinases triggers a downstream transduction cascade that promotes NK-mediated cytolysis (Middleton et al, 2005). KIR2DL4 is unique among KIRs, as it possesses a long cytoplasmic tail with a charged amino acid arginine in the transmembrane region. KIR2DL4 might therefore be capable of eliciting both activating as well as inhibitory signals.

Exon and Intron Structure

KIR3DL1 and KIR3DL2, with three extracellular Ig-like domains, represent the prototypical KIRs from which all the others can be derived. KIR genes are organized in nine exons, the order of these exons corresponding to the different functional regions of the protein (FIG. 5). The first two exons encode the signal peptide; exons 3, 4 and 5 encode the Ig-like domains D0, D1 and D2, respectively. Exon 6 encodes the stem or linker that connects the D2 domain with the transmembrane region that is encoded by exon 7. Exons 8 and 9 encode the cytoplasmic tail. Type 1 KIRs (KIR2DL1-3 and KIR2DS1-5) have two Ig-like domains, D1 and D2. The protein products of type 1 lack the D0 domain because exon 3 is a pseudo-exon. This exon is spliced out of the RNA transcript, possibly due to a three-base-pair deletion. Type 2 KIRs (KIR2DL4-5) have the D0 and D2 domains; exon 4 is absent in these KIR genes, resulting in a protein without a D1 domain.
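The relation between exons and protein regions described above can be summarised as a lookup table. The following sketch is purely illustrative; the names `EXON_REGIONS`, `MISSING_EXONS` and `ig_domains`, and the labels "3D", "type1" and "type2", are assumptions introduced for this example.

```python
# Protein region encoded by each of the nine exons of a prototypical KIR gene.
EXON_REGIONS = {
    1: "signal peptide",
    2: "signal peptide",
    3: "D0",
    4: "D1",
    5: "D2",
    6: "stem",
    7: "transmembrane region",
    8: "cytoplasmic tail",
    9: "cytoplasmic tail",
}

# Exons that do not contribute to the protein for each structural class:
# exon 3 is a pseudo-exon in type 1 KIRs, exon 4 is absent in type 2 KIRs.
MISSING_EXONS = {
    "3D": set(),
    "type1": {3},
    "type2": {4},
}


def ig_domains(kir_class):
    """Return the Ig-like domains present in the protein of a KIR class."""
    missing = MISSING_EXONS[kir_class]
    return [EXON_REGIONS[exon] for exon in (3, 4, 5) if exon not in missing]
```

Calling `ig_domains("type1")` and `ig_domains("type2")` reproduces the D1/D2 and D0/D2 domain combinations stated in the text.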

In KIR2DP1, exon 3 is a pseudoexon and exon 4 has an early stop codon. If KIR2DP1 were transcribed, this could result in a KIR protein with only a single Ig (D2) domain. In KIR3DP1, exon 2 is missing due to a deletion. The exons encoding the stalk, transmembrane (TM) and cytoplasmic regions are also absent. The three exons coding for the Ig-like domains are intact; however, the leader sequence is missing. No transcripts have been found for KIR2DP1 (Trowsdale et al, 2001) or KIR3DP1. The latter is normally silent, but a recombinant of KIR2DL5A and KIR3DP1 has been found to be transcribed and is predicted to be secreted rather than anchored to the cell membrane (Gomez-Lozano, 2005).

Genotypes

Uhrberg et al. (Uhrberg et al, 1997) found that the KIR locus in humans appeared to be polygenic and polymorphic. Individuals have a variable KIR gene content, resulting from differences in the total number of KIR genes and in the numbers of activating and inhibitory KIR genes. The mechanism behind KIR diversification is non-reciprocal recombination between non-allelic genes, leading to expansions and contractions of the KIR locus. Reciprocal crossing-over events are also postulated to contribute to the diversity. The KIR locus can be separated into two parts: one half bounded by KIR3DL3 at the centromeric end and the central KIR3DP1, and the other half bounded by the central KIR2DL4 and KIR3DL2 at the telomeric end. Within each of these two parts of the KIR locus, the genes are in much stronger linkage disequilibrium, supporting a homologous recombination event (Uhrberg 2005).

Studies worldwide that use genomic DNA to determine the presence or absence of KIR genes in populations have contributed an extensive amount of KIR-genotype profiling data. These studies show differences in the frequency of KIR genes between populations of different ethnic backgrounds; the data can be found at www.allelefrequencies.net. The methods used for KIR genotyping are polymerase chain reaction with sequence-specific primers (PCR-SSP), PCR with sequence-specific oligonucleotide probes (PCR-SSOP), multiplex PCR, automated sequencing and mass spectrometry.

Haplotypes

KIR genes can be divided into the haplotypes A and B (Carrington et al, 2003). Both haplotypes contain the framework genes KIR3DL3, KIR3DP1, KIR2DL4 and KIR3DL2. These genes are conserved and are present in virtually every individual. Haplotype A is uniform in terms of gene content and is composed of five inhibitory genes (KIR3DL3, KIR2DL3, KIR2DL1, KIR2DL4, KIR3DL1 and KIR3DL2) and only one activating gene, KIR2DS4, as shown in FIG. 6. However, the central framework gene KIR2DL4 may have an activating function. On the other hand, there are A haplotypes that possess null variants of both KIR2DS4 and KIR2DL4 that are not expressed on the cell surface; technically, these haplotypes contain virtually no functional activating KIR.

Haplotype B is more variable than haplotype A and is characterized by the presence of one or more of the following genes: KIR2DS2, KIR2DL2, KIR2DL5, KIR2DS3, KIR3DS1, KIR2DL5A, KIR2DS5 and KIR2DS1. Conversely, haplotype A is characterized by the absence of these genes. The frequency of both haplotypes is relatively even among populations of different ethnic backgrounds. Some haplotypes may not fit into these two categories, as the definition of haplotypes varies between authors and hybrids of haplotypes are possible (Vilches et al, 2002). The distinction between A and B haplotypes is useful in biological and medical settings, as haplotype B has more genes that encode activating KIRs than haplotype A. The haplotypes have been constructed by family segregation analysis, genomic sequencing and gene-order analysis (Shilling et al, 2002). FIG. 6 depicts the organization of a KIR locus.

Gene Variation

Adding another level of genetic diversity to the KIR family is the extensive gene variation exhibited by all KIR genes. Allelic diversity is generated by nucleotide substitutions, recombination or gene conversion, and point mutations. Activating and inhibitory KIRs share high sequence homology. Activating KIRs are believed to be derived from inhibitory KIRs by sequence alterations that created a charged residue upstream of a stop codon and eliminated the ITIMs. Because activating KIRs evolved more recently, their allelic diversity is quite limited compared with that of inhibitory KIRs, but the variation of activating receptors across ethnic populations is more extensive.

Currently a total of 335 KIR alleles have been identified; these can be found at the website http://www.ebi.ac.uk/ipd/kir (table 2). KIR allele names consist of the gene name followed by an asterisk and a numerical code. The first three digits distinguish alleles that differ in the encoded protein sequence; the next two digits denote alleles that differ only by synonymous differences within the coding sequence (i.e. not resulting in amino acid substitutions); and the last two digits are used for alleles that differ in noncoding regions, such as introns and promoters. Thus, 3DL1*009 and 3DL1*010 are alleles that encode different protein products, whereas 3DL1*00101 and 3DL1*00102 encode the same protein product but differ by a synonymous DNA substitution within the coding region (Marsh et al, 2002).
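The naming convention described above can be illustrated with a short parsing sketch. This is only an illustration under stated assumptions: the function names `parse_kir_allele` and `same_protein` are hypothetical, and the pattern accepts only the three-, five- and seven-digit forms described in the text, with an optional "KIR" prefix.

```python
import re
from typing import NamedTuple, Optional


class KirAllele(NamedTuple):
    gene: str                  # e.g. "3DL1"
    protein: str               # first three digits: distinct protein product
    synonymous: Optional[str]  # next two digits: synonymous coding differences
    noncoding: Optional[str]   # last two digits: noncoding differences


# Assumed pattern: optional "KIR" prefix, gene name, asterisk, 3/5/7 digits.
_ALLELE = re.compile(r"^(?:KIR)?([0-9A-Z]+)\*(\d{3})(\d{2})?(\d{2})?$")


def parse_kir_allele(name: str) -> KirAllele:
    """Split an allele name such as '3DL1*00101' into its fields."""
    match = _ALLELE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised KIR allele name: {name!r}")
    return KirAllele(*match.groups())


def same_protein(a: str, b: str) -> bool:
    """True when two alleles of one gene share the first three digits."""
    x, y = parse_kir_allele(a), parse_kir_allele(b)
    return x.gene == y.gene and x.protein == y.protein
```

With this sketch, `same_protein("3DL1*00101", "3DL1*00102")` is true, while `same_protein("3DL1*009", "3DL1*010")` is false, matching the two examples given in the text.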

Expression and HLA

The ligands for inhibitory KIRs are MHC class I molecules, which are constitutively expressed by most healthy cells, but can be down-regulated in tumors and infected cells allowing killing by NK cells. Interaction of MHC with inhibitory receptors ensures tolerance of NK cells towards self. MHC class I molecules are encoded by human leukocyte antigen (HLA) genes that are located at chromosome 6p21.3 and are polymorphic and display significant variations. KIR genes and HLA genes segregate independently during meiosis, because they are located on different chromosomes. This can lead to interesting HLA and KIR combinations inherited by one individual, but to obtain a functional interaction between receptor and the cognate ligand, they need to be expressed together. This raises the question whether a correlation exists between the genes encoding KIR and HLA. The ligand specificity for activating KIRs is not well defined. The ligands of some activating KIRs have not been identified yet. The activating receptors of KIR2DS2 and KIR2DS1 were reported to have a lower affinity of binding to HLA-C than those of their closely related inhibitory receptors. It is also possible that non-HLA ligands exist for these activating KIRs. The KIRs with a defined cognate ligand are presented in table 3.

The KIR surface protein repertoire of an individual is mainly determined by the KIR genes. Hence, a lack of expression is more likely caused by the absence of that gene than by down-regulation. KIR genes are expressed by NK cells in a clonal manner: each individual NK cell within a person possesses a different combination of KIRs, with a subset of the total KIR gene repertoire being expressed on each individual cell. KIR2DL4 is one notable exception; this gene is ubiquitously expressed on NK cells. The frequency of each expressed KIR may differ between individuals, but is stable over time. For example, KIR2DL1 may be expressed on 50% of the NK cell population of individual A, while in individual B it is expressed on 14% of the NK cell population. One explanation for this difference could be that particular alleles of a gene are expressed more frequently due to the presence of multiple copies of a gene.

This Example presents a new method for KIR genotyping with multiplex ligation dependent probe amplification (MLPA). This method provides a rapid and convenient way of KIR genotyping and also quantifies the relative number of copies of the KIR genes. Copy number variation (CNV) accounts for a substantial amount of genetic variation, results in significant phenotypic variation, e.g. in transcript levels, and is therefore of functional relevance.

We developed two synthetic MLPA probe sets for the typing of 16 out of the 17 KIR genes: KIR2DL1-5, KIR2DS1-5, KIR3DL1-3, KIR3DS1, KIR3DP1 and KIR2DP1. The probes for the KIR genes were designed for different loci to detect most of the alleles. Probe sets 1 and 2 are listed in FIGS. 3A and 3B. The specificity of the probes was validated by comparing the MLPA results with the KIR genotypes of the same samples obtained with PCR-SSOP and PCR-SSP methods. The ability of the probes to quantify relative gene copy numbers was examined with 12 families, each consisting of two parents and two offspring, which have been genotyped for most KIR alleles.

Materials & Methods
DNA Selection/Isolation

DNA from unrelated, randomly selected Caucasian donors was obtained for this study to test the peak profile of the probes. For the validation of the probes, five PCR-SSP KIR-typed genomic DNA samples and 11 EBV-transformed B cell lines from the 10th International Histocompatibility Workshop were used (Cook et al, 2003): JVM, T7507, OLGA, SAVC, JBUSH, BM16, LBUF, AMALA, BM90, TAB089 and KAS116. The KIR Reference Panel I from the IHWG, containing 48 samples from 12 Centre d'Etude du Polymorphisme Humain (CEPH) families, each including 2 parents and 2 children (table 4: KIR typing of the 48 samples; FIG. 7: the pedigrees), also served this purpose, but its main purpose was to determine the ability of the probes to quantify copy numbers. Genomic DNA and DNA from the cell lines were isolated with a Qiagen blood kit according to the manufacturer's instructions.

Probe Design

Probes were designed according to general instructions (www.mlpa.com/protocols.htm). All the probes were manufactured by Invitrogen (Carlsbad, Calif.). The sizes of the probes after ligation (“ligated probes”) are spaced four to five nucleotides apart so that each amplification product can be separated on sequence-type gels; amplification product sizes ranged from 95 to 223 nucleotides. All MLPA probes contain a PCR primer sequence, which is recognized by a universal primer pair. PCR primer sequences were: forward 5′-GGGTTCCCTAAGGGTTGG-3′ and reverse 5′-TCTAGATTGGATCTTGCTGGCAC-3′.
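The size constraints stated above (a gap of at least four nucleotides between neighbouring ligated probes, and a size window of 95 to 223 nucleotides) can be checked mechanically when a probe set is assembled. The sketch below is only an illustration: the function name `check_probe_spacing` is hypothetical, and the example lengths are not the actual probe lengths of tables 5 and 6.

```python
def check_probe_spacing(sizes, min_gap=4, min_size=95, max_size=223):
    """Return a list of problems found in a set of ligated-probe lengths (nt)."""
    problems = []
    ordered = sorted(sizes)
    # Every ligated probe should fall inside the separable size window.
    for size in ordered:
        if not min_size <= size <= max_size:
            problems.append(f"{size} nt is outside {min_size}-{max_size} nt")
    # Neighbouring products need a minimum gap to be resolved as separate peaks.
    for shorter, longer in zip(ordered, ordered[1:]):
        if longer - shorter < min_gap:
            problems.append(
                f"{shorter} and {longer} nt are closer than {min_gap} nt"
            )
    return problems


# Hypothetical lengths, for illustration only.
print(check_probe_spacing([95, 100, 104, 223]))  # no problems expected
print(check_probe_spacing([96, 98, 230]))        # two problems expected
```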

The KIR probes were designed to identify and discriminate between the 17 KIR genes listed in table 1, with the exception of KIR2DL5B, for which no specific probe could be designed. The probe for KIR2DL5 therefore detects both the KIR2DL5A and KIR2DL5B genes. In addition, probes on alternative sequences and intron sequences were designed using basic local alignment search tool (BLAST) searches and the IPD/KIR Database, http://www.ebi.ac.uk/ipd/kir. The sizes of the KIR probes can be found in tables 5 and 6.

The targets of the nine control probes are located on conserved genes in the human genome: FGF3, BCAS4, LMNA, PARK2, MSH6, GALT, SPG4, IL-4 and NF2. In a previous MLPA study at Sanquin, these target genes showed no considerable variation between donors. Control probes 1 and 10 were initially 88 bp and 130 bp, respectively, but were elongated to 180 bp and 223 bp to distribute the control probes more evenly among the KIR probes. Table 7 lists the genes and the sizes of the control probes.

Competitor probes were designed for probes whose signal was off-scale, i.e. too high to be measured by the capillary electrophoresis apparatus; they are listed in table 8.

MLPA Reaction

All DNA samples were diluted to 20 ng/μl with water and 5 μl was denatured at 98° C. for 5 minutes in 200 μl tubes in a Biometra T-1 Thermoblock with heated lid. MLPA reagents (EK kit 5) were obtained from MRC-Holland (Amsterdam, The Netherlands). SALSA MLPA buffer (2 μl) and 1-10 fmol of each MLPA probe in a probe mixture (1 μl) were added and incubated for 1 minute at 95° C., followed by 16 hours at 60° C. in a total volume of 10 μl. Ligation of the hybridized probes was performed by reducing the temperature to 54° C., adding 32 μl Ligase-65 mix (3 μl ligase buffer A, ligase buffer B, 1 μl Ligase-65 and 25 μl water) and incubating for 15 min. After inactivating the enzyme at 98° C. for 5 min, 10 μl of the ligase mix was diluted with 4 μl PCR Buffer and 26 μl water at 4° C. in 200 μl tubes. For the PCR reaction, 10 μl of polymerase mix (0.5 μl polymerase, 2 μl SALSA enzyme dilution buffer, 2 μl SALSA PCR-primers and 5.5 μl water) was added at 60° C. PCR amplification of the ligated MLPA probes was performed for 36 cycles (30 sec 95° C., 30 sec 60° C., 60 sec 72° C.) followed by an incubation for 20 min at 72° C.

Electrophoresis

1 μl of PCR product was added to new tubes containing 0.4 μl Promega Rox size standard 60-400 bp and 8.6 μl High Definition buffer. The products were separated according to molecular weight by capillary electrophoresis on an Applied Biosystems Genetic Analyzer 3130XL, and the resulting electropherogram shows specific peaks that correspond to each probe.

Analysis

Data were visualized with Genemapper v3.6 and normalized with SoftGenetics Genemarker v1.6, using internal control probe normalization (http://www.softgenetics.com/papers/MLPA). Finally, these data were exported to an Excel file.

Results
Detection of Probe Signal

All the MLPA probes were initially tested on randomly chosen donors. We first examined whether the probes would generate a signal and whether these signals corresponded with the expected size of each probe. The control probe peaks and the probe peaks for the four framework genes, KIR2DL4, KIR3DL2, KIR3DL3 and KIR3DP1, occurred in all samples, as expected. KIR gene content variation between individuals was observed when different samples were compared (FIG. 8). The probe intensity is denoted in arbitrary units (AU) on the y-axis and the probe size is expressed on the x-axis in base pairs (bp). We used the peak height to quantify the data, although others may suggest using the peak area.

Secondly, the intensity of the probe signal was examined. The peak patterns were visualized with Genemapper to observe the peak intensities before normalization. Genemarker was used to normalize the data and to correct for the signal decay of larger probes, but it does not indicate where signals are off-scale. A probe signal between 500 and 6000 AU is preferred in order to obtain a more reliable DQ value. Fluorescent peaks with a signal below 500 AU may not always be detected when more probes are added to the reaction. Fluorescent peaks above 6000 AU can be too high to be measured by the sequencer (off-scale) and decrease the relative signal of other probes. Several approaches to enhance or lower probe intensity have been described; the nucleotide composition next to the PCR primer tag sites and the GC content of a probe are among the factors that can be of influence (www.mlpa.com/protocols.htm). In general, competitors are used to reduce probe signals and a higher probe concentration is used to increase them. Competitors are oligonucleotides that are identical to one part of the MLPA probe but lack the forward or reverse primer sequence, depending on whether the left or right part is chosen. Competitors compete with the MLPA probe for the same target; however, ligated products that contain a competitor are not amplified, since they lack a primer sequence. As a result, less probe amplification product is detected and a lower peak intensity is obtained.
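The preferred signal window can be turned into a simple screening step over raw peak heights. The following is a minimal sketch, assuming peak heights are available per probe in AU before normalization; the function names and the example values are hypothetical, while the 500 AU and 6000 AU limits and the two remedies are taken from the text.

```python
LOW_AU = 500    # below this, a peak may not always be detected
HIGH_AU = 6000  # above this, a peak can be off-scale for the sequencer


def suggest_adjustment(peak_height_au: float) -> str:
    """Map a raw peak height to the remedy described in the text."""
    if peak_height_au < LOW_AU:
        return "increase probe concentration"
    if peak_height_au > HIGH_AU:
        return "add competitor"
    return "ok"


def review_peaks(peaks: dict) -> dict:
    """Apply suggest_adjustment to a mapping of probe name -> peak height (AU)."""
    return {probe: suggest_adjustment(height) for probe, height in peaks.items()}


# Hypothetical peak heights, for illustration only.
example = {"probe_a": 7200, "probe_b": 310, "probe_c": 2500}
print(review_peaks(example))
```

In practice the remedy would still be chosen per probe, since the text notes that competitor length and dosage also influence the effect.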

Competitors were designed for control probes 2, 3, 4, 7 and 9 and initially also for the KIR probes 2DL4, 3DL3 (probe set 1) and 3DL2 (probe set 2). These probes had lengths of 96 bp, 100 bp and 108 bp, respectively. However, we observed a decrease in peak intensity that corresponded more or less with an increase in probe size. Longer synthetic probes are more likely to contain a higher proportion of incomplete oligonucleotides. It therefore seemed an option to lengthen probes with high peak intensities and to shorten probes with low peak intensities. Probe 2DL4 was redesigned to 170 bp and 3DL3 to 154 bp, which resulted in lower peak intensities. The peak generated by probe 3DL3 (100 bp) was not affected by its competitor and was apparently a product of probe 2DS3 (108 bp), because the off-scale signal returned to normal when the latter probe was removed from probe set 1. Furthermore, competitors with a length of 30 bp had less effect than those with a length of 50 bp, so that a higher dosage of the shorter competitors was needed to reduce the probe signal (data not shown).

For probes that failed to generate a signal or whose signal was insufficient, the following measures were taken: a three- to ten-fold higher concentration of these probes was used, and probes with a high sequence overlap were not included in the same probe set. According to the MLPA design protocol, placing two cytosine nucleotides after the forward primer should increase the probe signal and a thymine base should decrease it. However, in our experiments several probes were redesigned to contain two cytosines after the forward primer, and this did not produce the reported effect. Probes that still failed to generate a signal after these measures and after testing on a larger number of donors were replaced by probes on the reverse strand of the target gene or by probes with a different target location on that gene.

The frequency of each KIR gene probe peak in the tested samples was compared with the KIR gene frequencies in the Caucasian population available at www.allelefrequencies.net (table 9). Probes whose observed frequencies contradicted the population frequencies were assumed to give false negative or false positive results and were replaced by new designs. These errors were assumed to be caused by gene variation at the ligation sites of the probes.

The list of the alleles that can be detected by the KIR probes and the coverage of the total KIR alleles by the probes are shown in table 10.

Other Factors Interfering with Peak Intensities

Probe Quality

We observed differences in probe quality between probes that were manufactured by different companies. The nine control probes were initially ordered from Biolegio (www.biolegio.com), which had also supplied them for the C4 MLPA project previously carried out here. All the KIR MLPA probes were ordered from Invitrogen (www.invitrogen.com). The control probe set was separated into two mixes: control probes 1 (IL-4), 2 (FGF3), 3 (BCAS4), 4 (LMNA), 5 (PARK2) and 7 (MSH6) in one, and control probes 8 (GALT), 9 (SPG4) and 10 (NF2) in the other. The concentration needed for each control probe varied, ranging from 0.5 fmol to 6 fmol, and different concentrations of competitors were also needed.

The control probes used for the KIR MLPA were ordered from Invitrogen. Only 1 fmol of each control probe is needed, with the exception of control probe 5 (3 fmol), to obtain the same peak intensity as mentioned above, and the probes do not need to be separated into two mixes. Owing to the better probe quality, time is saved in producing the probe sets.

Template DNA Amount

An MLPA reaction with 50 ng of DNA was performed and compared with a reaction with 100 ng, the amount used throughout this study. MLPA reactions using 20 ng of DNA have been reported by Schouten et al. (Schouten et al, 2002). When the peak profiles were compared, no striking differences between the two reactions were observed. The DQ values of the nine control probes were calculated for each sample, with a sample containing 100 ng of DNA taken as reference. Seven out of eight samples containing 50 ng of DNA showed a DQ value outside [0.8-1.2] for more than three control probes, ranging from [0.3-1.5] within one sample. In contrast, all eight samples containing 100 ng of DNA had DQ values within the acceptable range [0.8-1.2] for all nine control probes, with the exception of one sample in which two control probe DQ values were outside this range. We conclude that MLPA reactions with different amounts of DNA cannot be compared with each other, because the same sample did not yield the same DQ values with the different DNA amounts.
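The control-probe check described above can be sketched as follows. This is an illustration only and not the normalization performed by Genemarker: it assumes that a DQ value is the peak height of a probe, divided by the summed control-probe peak heights of the same sample, relative to the same ratio in the reference sample. The function names and the cut-off of three out-of-range control probes are likewise assumptions chosen to mirror the wording of the text.

```python
def dosage_quotients(sample, reference, controls):
    """Assumed DQ: control-normalized peak of the sample / that of the reference.

    sample, reference: dicts mapping probe name -> peak height (AU).
    controls: names of the control probes used for normalization.
    """
    sample_total = sum(sample[c] for c in controls)
    reference_total = sum(reference[c] for c in controls)
    return {
        probe: (sample[probe] / sample_total)
        / (reference[probe] / reference_total)
        for probe in sample
        if probe in reference
    }


def controls_out_of_range(dq, controls, low=0.8, high=1.2):
    """Control probes whose DQ falls outside the range [0.8-1.2]."""
    return [c for c in controls if not low <= dq[c] <= high]


def sample_is_reliable(dq, controls, max_out_of_range=3):
    """Assumed rule: flag a sample when more than three controls are out of range."""
    return len(controls_out_of_range(dq, controls)) <= max_out_of_range
```

A sample whose peaks are all uniformly higher or lower than the reference still yields DQ values of about 1.0 under this assumption, because the control sum cancels overall intensity differences between reactions.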

Next, the samples containing 50 ng of DNA were compared among themselves, taking a 50 ng sample as reference. Three of the eight samples had more than three control probes with a DQ value outside the range [0.8-1.2]. When the nine control DQ values of one sample were analyzed, values between [0.5-1.7] were found. MLPA reactions carried out with 50 ng of DNA were therefore considered unreliable, as the DQ values of the probes showed great variation between samples and within one sample, which was not observed with the samples that contained 100 ng of DNA. The requirement for higher amounts of DNA in this study could be explained by the fact that we used a completely synthetic probe set, in contrast with the probe sets used by Schouten et al (Schouten et al, 2002). Moreover, most studies that were carried out with small amounts of DNA only analyzed chromosomal abnormalities, such as recombination or mutations, and did not quantify copy numbers.

Reproducibility

Samples from different runs were not always comparable when the DQ values of the control probes were calculated. The explanation is that the experimental conditions may vary between runs, due to operator handling or differences in probe signal reproducibility. Therefore, samples within the same run are preferably normalized and analyzed first, before the data are compared with samples from a different run. Samples with more or less established relative gene copy numbers are preferably included in each experiment to act as reference.

Validation with KIR Typed DNA Samples

The specificity of the KIR probes was verified by testing 11 EBV-transformed cell lines, which were KIR-genotyped by the 10th International Histocompatibility Workshop (IHW) (Cook et al, 2003). The cell lines were KIR-genotyped using PCR-SSP and PCR-SSOP in three separate laboratories. The cell lines were not genotyped for the genes KIR2DL5A, KIR3DL3, KIR2DP1 and KIR3DP1, and the panel contained no negative controls for the genes KIR2DL1, KIR2DL4, KIR3DL1, KIR3DL2 and KIR2DS4.

In addition, DNA samples from 5 individuals were genotyped by PCR-SSP for further verification. These 5 samples were also genotyped for the genes KIR3DL3 and KIR3DP1 and found to contain true negative genotypic results for KIR2DL1 and KIR2DP1. The results of the verification of the two probe sets are shown in tables 11-14.

Probe Set 1

KIR genotyping with probe set 1 was found to be consistent with the 10th IHW on 10 of the cell lines for the probes 2DL1-5, 2DS1, 2DS3-5, 3DL1-2 and 3DS1. All cell lines were typed positive for the genes KIR2DP1, KIR3DP1 and KIR3DL3; the first has a frequency between 94 and 100% (table 9) and the last two are framework genes that are always present. Typing of the 5 individuals yielded the same results as PCR-SSP, except for the probe 2DS2.

Probes for 2DL5A (Same Probe in Probe Set 2)

Most studies on KIR genotyping detect the presence of KIR2DL5 without differentiating between the two genes KIR2DL5A and KIR2DL5B. These two genes differ in nucleotide sequence by only 1%. We were unable to design a probe for KIR2DL5B, because no specific ligation site was found that discriminates KIR2DL5B from KIR2DL5A and the other KIR genes. The probes that were designed for KIR2DL5A also detect the allele KIR3DP1*004 (table 10), because this allele contains no sequence difference within the probe's range; thus the probe sets do not contain probes for the selective detection of KIR2DL5A. In fact, KIR3DP1*004 is not expressed and is a hybrid of the promoter of KIR2DL5A and the coding region of KIR3DP1. When probe 2DL5A generates a signal in the MLPA, this could indicate the presence of both KIR2DL5A and KIR3DP1*004, or of either KIR2DL5A or KIR3DP1*004 alone. However, probe 2DL5 detects the same KIR2DL5A alleles as probe 2DL5A. When probe 2DL5 does not bind and probe 2DL5A does, the absence of KIR2DL5A and the presence of KIR3DP1*004 are demonstrated. This is clearly shown by the cell lines JVM, SAVC, JBUSH, BM16, TAB089 and KAS116 and the individuals 33_—8025 and 33_—8588 (FIG. 10).
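The combined reading of probes 2DL5 and 2DL5A can be written out as a small decision function. This is a sketch only: the function name is hypothetical, the probe signals are reduced to present/absent, and the branch for a 2DL5 signal without a 2DL5A signal is an inference from the statement that probe 2DL5 detects both KIR2DL5A and KIR2DL5B rather than a result reported in the text.

```python
def interpret_2dl5_probes(signal_2dl5: bool, signal_2dl5a: bool) -> str:
    """Interpret the presence/absence of the 2DL5 and 2DL5A probe signals."""
    if signal_2dl5a and not signal_2dl5:
        # Case described in the text: 2DL5A probe binds, 2DL5 probe does not.
        return "KIR2DL5A absent, KIR3DP1*004 present"
    if signal_2dl5a and signal_2dl5:
        # The 2DL5A probe cannot tell KIR2DL5A from KIR3DP1*004.
        return "KIR2DL5A and/or KIR3DP1*004 present (ambiguous)"
    if signal_2dl5:
        # Inference, not stated in the text: KIR2DL5 without a 2DL5A signal.
        return "KIR2DL5 present without 2DL5A signal (inferred: KIR2DL5B)"
    return "neither KIR2DL5 nor KIR3DP1*004 detected"


for combination in [(False, True), (True, True), (True, False), (False, False)]:
    print(combination, "->", interpret_2dl5_probes(*combination))
```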

Probe Set 2

Probe set 2 contains fewer probes. A higher proportion of the probes had overlapping sequences, and seven out of the ten KIR probes needed a 10-fold higher concentration than the others to obtain peak intensities above 500 AU.

Probe 2DS5 and 3DS1

Probes 2DS5 and 3DS1 bound to all samples, including those genotyped negative for KIR2DS5 and KIR3DS1, indicating non-specific ligation of the probes. Probes 2DS5 and 3DS1 were not based on primer sequences used before. The probe search tool of the KIR database and BLAST results showed no match with other KIR genes, and these probes were therefore considered to be specific for KIR2DS5 and KIR3DS1. No explanation could be found for the false positive results given by these probes. These probes were excluded from probe set 2.

Probe 2DS1

Three out of the six cell lines negative for KIR2DS1 were typed positive by this probe, while the two negatives among the PCR-SSP-typed individuals were correctly typed. The target of probe 2DS1 is in an intron, and only little information about intron sequences is available. It cannot be excluded that other KIR genes possess the same sequence at this position, and therefore this probe is not included in the probe set.

Probe 3DP1

Probe 3DP1 in probe set 2 detects a deletion of exon 2. The KIR3DP1 allele carrying this deletion is designated KIR3DP1*003 and has a frequency of 0.72 in the Caucasian population. Sample 33_—8588 of the PCR-SSP typed individuals was typed negative for KIR3DP1 by the MLPA probe and positive by PCR-SSP (table 14). The conflicting typing results between these two methods can be explained by the presence of exon 2 in this sample.

Cell Line LBUF

Both probe sets genotyped this cell line positive for KIR2DL3 and negative for KIR2DL5 and KIR2DS. In addition, probe set 1 typed LBUF negative for KIR2DS1, KIR2DS5 and KIR3DS1 (tables 11 and 13). It is reasonable to assume that the LBUF cell line that was tested was not the same as the one published by the 10th IHW. LBUF had been KIR-genotyped by Hsu et al. (Hsu et al, 2002) and their typing was consistent with ours. Moreover, LBUF and the other cell lines were KIR-genotyped with the standard PCR-SSP method, and these results confirmed our findings with MLPA, including the positive typing results for the genes KIR3DL3, KIR2DP1 and KIR3DP1 on all 11 cell lines.

Quantification of Gene Copy Numbers

For the verification of gene copy number quantification, samples with a well-defined number of copies of KIR genes were needed. Since these are not available, we used the KIR reference panel I for this purpose, comprising 12 families of two parents and two children each. These 48 reference samples have been KIR-genotyped by 15 different laboratory groups utilizing PCR-SSP and PCR-SSOP. The Centre d'Etude du Polymorphisme Humain (CEPH), Fondation Jean Dausset, Paris, France (www.cephb.fr), had prepared lymphoblastoid cell lines (LCLs) of these families. The International Histocompatibility Working Group (IHWG) Cell and DNA Bank has made this panel available for commercial use (www.ihwg.org).

All the samples have been typed for the presence or absence of 16 of the KIR genes, for two variants of KIR3DP1 (KIR3DP1*003 and KIR3DP1v) and for two variants of KIR2DS4 (KIR1D, alias KIR2DS4*003, and KIR2DS4) (table 4). In the KIR reference panel I, KIR3DP1 denotes the allele characterized by the absence of exon 2 and KIR3DP1v denotes the remaining KIR3DP1 alleles. KIR1D contains a 22-bp deletion in Ig-like domain D2, causing a frame shift and an early stop codon, which lead to a truncated protein product (Hsu et al, 2002).

The haplotypes of six of these families were also available, as shown in FIG. 7, which also shows the pedigrees of the 12 families. Because of the information about the haplotypes, we could assume that some samples carry at least two copies of certain KIR genes. The inheritance patterns of these copy numbers were deduced from the pedigree information. The reference panel was at the same time used as an extra verification step for the specificity of the probes.

Specificity in KIR Genotyping

With both probe sets, difficulties were experienced in generating reliable MLPA data from the KIR reference panel. This is presumably caused by the lower quality of these DNA samples, as it did not occur with the genomic DNA samples of the previous experiments. The DQ values of the control probes were more frequently outside the proposed normal range [0.8-1.2]. Therefore, data for a number of samples are missing, and these samples should be tested again in the future.

Probe Set 1

Sixteen probes (2DL1-5A, 2DS1, 2DS3-5, 3DL1-3, 3DS1, 2DP1 and 3DP1) were tested, and the majority of the probes genotyped the KIR reference panel in accordance with what has been reported, except for some differences with probes 2DP1 and 2DL5. These samples were correctly typed by probe set 2.

Probe Set 2

The probes 2DL1-5A, 2DS2, 2DS4, 3DL1-3, 3DS1, 2DP1 and 3DP1, 14 probes in total, were tested on the reference panel. Probe 3DP1 was designed for KIR3DP1*003 (denoted as 3DP1 in table 4) and its specificity for this allele was confirmed with the reference panel. Probe 2DL2 gave approximately 58% false positive results and probe 2DL1 typed three of the four negative samples of the panel as positive; therefore, no further testing has been done with these two probes. Probe 2DS2 typed around 15% of the samples incorrectly as negative, although in a previous run, which was rejected because of the DQ values of the controls, these two samples were typed positive. These samples need to be retested before a conclusion about probe 2DS2 can be drawn. Probe 2DS4 gave one false negative result (sample 1333-8281). Only 80% of the KIR2DS4 alleles can be detected by this probe because of a gene variant that is 4 bases away from the ligation site in 1 out of 9 alleles. The right part of this probe will be redesigned with an IUB code at this position.

Quantification of CNV

Probes that have been demonstrated to be accurate in KIR genotyping in both probe sets have been analyzed for their ability in copy number quantification. Relative quantification of CNV with one probe is simply not reliable because gene variations near the ligation site of the probe may influence the outcome in DQ value. This is especially true for KIR sequences, because they show a high level of gene variation, while demonstrating a homology up to 99%. Certain probes discriminate the different KIR genes only by one nucleotide difference at their ligation site. A gene variant near the ligation site of the target gene may lead to a lower probe signal. Alternatively, a gene variant at one of the other KIR genes might cause a probe to recognize this gene as its target, thus enhancing the probe signal. Therefore only the KIR genes of the families with the reported haplotype and the complete MLPA data of the two probes are analyzed for copy numbers.

The DQ values of the control probes of both probe sets on each sample were compared to check whether the MLPA data are reliable. The nine control probes should generate the same DQ values, as these control probes are the same in both probe sets and are tested on the same sample. Samples with fewer than seven comparable control probe DQ values between the two probe sets were excluded. Next, the DQ values of the KIR probes were evaluated. We interpreted the DQ values as follows: DQ ≤0.3 = 0 copies of that gene, DQ [0.4-0.7] = 1 copy, DQ [0.8-1.2] = 2 copies, DQ [1.3-1.7] = 3 copies, DQ [1.8-2.2] = 4 copies, DQ [2.3-2.7] = 5 copies, etc. Borderline values, such as a DQ of 0.7, are questionable; when the second probe clearly quantified 1 copy of this gene, 0.7 was considered as 1 copy. The same approach was applied to other borderline values.
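The DQ intervals listed above can be expressed as a short conversion routine. The sketch below is an illustration under stated assumptions: the function name `copy_number` is hypothetical, DQ values are rounded to one decimal before the intervals are applied, and the continuation of the series beyond 5 copies follows the same 0.5-wide pattern implied by "etc.". The manual resolution of borderline values against the second probe is not modelled.

```python
def copy_number(dq: float) -> int:
    """Convert a DQ value to a gene copy number using the listed intervals."""
    if dq < 0:
        raise ValueError("DQ values cannot be negative")
    tenths = round(dq * 10)  # work in tenths to avoid floating-point edge cases
    if tenths <= 3:
        return 0             # DQ of 0.3 or less: gene absent
    # 0.4-0.7 -> 1, 0.8-1.2 -> 2, 1.3-1.7 -> 3, 1.8-2.2 -> 4, 2.3-2.7 -> 5, ...
    return (tenths + 2) // 5


for dq in (0.2, 0.5, 0.7, 1.0, 1.5, 2.0, 2.5):
    print(dq, "->", copy_number(dq), "copies")
```

Because a value such as 0.7 sits directly next to the lower limit of the 2-copy interval, a call at such a boundary would still need to be confirmed with the probe from the other probe set, as described in the text.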

FIGS. 11A and 12A show the pedigrees of the families 1347 and 1349, respectively, with the legends for the haplotypes displayed below. The copy numbers of the KIR genes are listed in FIGS. 11A and 12A next to the pedigrees.

A difference in the quantification of the exact copy numbers was observed with the probes for KIR3DP1 in samples 1347-8445, 1347-8436 and 1349-8398. Probe set 1 seems to detect more copies of this gene than probe set 2, which is in agreement with their design: probe 3DP1(1) detects all the KIR3DP1(v) alleles, whereas probe 3DP1(2) detects only KIR3DP1*003 (denoted in the legend as 3DP1), which exhibits the exon 2 deletion. The probes 2DL3 and 2DL4 in probe set 1 detected lower copy numbers than their counterparts in probe set 2. Probe 2DL3 and probe 2DL4 in probe set 1 might have problems with the presence of gene variants at their target sequence, whereas these probes in probe set 2 have no gene variants in the probe target sequence and give a coverage of 100% (table 10). The probes for KIR3DL1 quantified the members of family 1349 differently. The probe in probe set 1 covers different alleles than the probe in probe set 2; the coverage rates are 78% and 41%, respectively, due to gene variants present at their target sequence more than 10 bases away from the ligation site, which might influence the binding efficiency and thereby the peak heights. Here too, adding TUB codes in the probe sequence will overcome the problem of misinterpretation of copy number differences between individuals.

Despite the differences in copy number quantification of a number of probes, the overall inheritance pattern of the gene copies was in agreement with the inheritance of the haplotypes. For example, the four framework genes KIR3DL3, KIR3DP1, KIR2DL4 and KIR3DL2 were present in all samples and at least 2 copies of each of these genes have been found. This indicates that these genes are present in at least one copy at each allele and are inherited from both parents. Examination of family 1347 revealed that the father, haplotype a/b (sample 8440), has three copies of gene KIR2DL5 on one allele (haplotype b) and one on the other (haplotype a), and has passed haplotype b, with the three copies, to one child (sample 8436) and haplotype a, with one copy, to the other child (sample 8412). For family 1349, one copy of KIR2DS4 is believed to reside on one allele (haplotype c) and two on the other (haplotype d) of the mother (sample 8399). Both children, haplotype b/c and haplotype a/c (samples 8393 and 8636, respectively), inherited the allele with two copies from their mother, as they both have haplotype c, and one child (sample 8636) inherited one copy of this gene from its father, haplotype a. Also when the inheritance patterns of the remaining copy numbers of genes were analyzed, no inconsistency with the inheritance patterns of the haplotypes could be found. The rest of the families with fully reported haplotypes should be tested again to obtain complete data of all the members within one family, before the inheritance patterns and copy numbers can be analyzed.

DISCUSSION

Before the present invention, the main problem in designing synthetic MLPA probes for KIR genotyping was to design probes specific enough for the target gene, but still sensitive enough to detect most of the alleles present in the population. KIR genes have a very high level of homology (85-99%) in the sequences of both exons and introns and show an extensive degree of gene variation.

MLPA is a good method, because it can discriminate target sequences that differ by only one nucleotide at the ligation site. The present inventors designed synthetic MLPA probes consisting of three probe parts, which added a second ligation site, so that an extra discrimination point was provided. In addition, these three-part probes made it possible to elongate the ligated probe size; the longest probe tested in this study was 223 bp (Ctr 10). Due to the better quality of the probes and three-part probes, the number of probes in a synthetic MLPA probe set according to the invention is less restricted by the size of the ligated probes.

This study has demonstrated that the MLPA with two synthetic probe sets is reliable in KIR genotyping, as these two probe sets have been well validated by three independent approaches. The two probe sets complement each other in the detection and coverage of the KIR alleles, so that no false negatives remained in any of the samples used for verification. Even after exclusion of the probes that may have generated false positives from the probe sets, all 16 KIR genes can still be consistently detected for their presence or absence. This makes the MLPA methods used in this Example in a qualitative sense comparable to the PCR-SSP and PCR-SSOP methods. However, time and work are saved with the performed Example, as only two reactions are needed to generate a complete KIR-genotype profile.

In summary, probe set 1 contains the probes 2DL1-5, 2DS1, and 2DS3-5, 3DL1-3, 3DS1, 2DP1 and 3DP1, in total 15 probes. Probe set 2 contains the probes 2DL3-5, 2DS2-4, 3DL1-3, 2DP1 and 3DP1, in total 11 probes. Together these two probe sets are accurate for the typing of 16 KIR genes and for quantifying relative copy numbers of at least 9 KIR genes.

Example 2

This Example presents additional probes for KIR genotyping and copy number variation analysis with multiplex ligation dependent probe amplification (MLPA). Here, probes are presented for all 17 KIR genes KIR2DL1-5, KIR2DS1-5, KIR3DL1-3, KIR3DS1, KIR3DP1 and KIR2DP1, including KIR2DL5a and KIR2DL5b, KIR3DP1v and several null alleles. The extended probesets 1 and 2 are listed in FIGS. 3C and 3D, respectively. As in example 1, the specificity of the probes was validated by comparison of the samples for the KIR genotypes obtained with PCR-SSOP and PCR-SSP methods, and the ability of the probes to quantify relative gene copy numbers was examined with 12 families, each consisting of two parents and two offspring, which have been genotyped for most KIR alleles.

Materials & Methods

DNA selection/isolation, probe design, MLPA reaction, electrophoresis and analysis were performed according to the materials & methods of example 1, with the exception that no competitors were used and data were normalized with Softgenetics Genemarker v1.85, using internal control probe normalization (http://www.softgenetics.com/papers/MLPA) and synthetic references.

Results
Extended Probesets

With the extended probesets 1 and 2 all KIR genes and several KIR gene variants were detected.

The extended probe set 1 depicted in FIG. 3C detects the same genes as probe set 1 of example 1 but additional probes are added and therefore additional KIR gene variants are now detected. Additional probes that are added are 2DL5B, 2DL4N (2DL4*007,008,009,011), 3DL1*024N.

The extended probe set 2 as depicted in FIG. 3D detects the same genes as probe set 2 of example 1 but additional probes are added and therefore additional KIR gene variants are now detected. Additional probes that are added are 2DL5B, 3DS1*049N and 2DS4N (2DS4*004, *006, *007,*008 and *009). KIR2DS4N is also called KIR1D.

Probe 3DP1

The probe 3DP1 in extended probe set 2 detects a deletion of exon 2, this allele of KIR3DP1 is designated as KIR3DP1*003, KIR3DP1*005 or KIR3DP1*006.

Probes for 2DL5A and 2DL5B

With the extended probesets 1 and 2, KIR2DL5A and KIR2DL5B are now also detected. The probes that were designed for KIR2DL5A and KIR2DL5B also detect the KIR3DP1 variant alleles (table 10, KIR3DP1v). When probe 2DL5A or 2DL5B generates a signal in the MLPA, this could indicate the presence of both KIR2DL5A and KIR3DP1v or of both KIR2DL5B and KIR3DP1v, respectively. Alternatively, a signal may indicate the presence of either KIR2DL5A or KIR3DP1v alone (with probe 2DL5A) or of either KIR2DL5B or KIR3DP1v alone (with probe 2DL5B). Thus, with probes 2DL5A and 2DL5B more than one KIR gene is detected. Therefore, these probes are not suitable to determine copy number variation (see FIG. 13).

Copy Number Variation (CNV)

For all KIR alleles except KIR3DP1 variants (KIR3DP1v), KIR2DL5A and 2DL5B copy number variation is determined with extended probesets 1 and 2 (FIG. 13).

Quantification of CNV

A difference in the quantification of the exact copy numbers as compared to example 1 was examined further in studies with the extended probesets. Optimization of the probe set initially used in FIG. 11A has now resulted in a 100% match with the validated KIR data in the pedigrees genotyped in example 1. None of the MLPA probes gave a false-positive or false-negative signal in the 10th ICW families tested, as exemplified by the analysis of families 1347 and 1349 (FIGS. 11B & 12B). Thus, both probe set 1 and/or 2 and extended probe sets 1 and/or 2 are suitable for detection of KIR genes and for determination of relative copy number variation, but extended probe sets 1 and/or 2, as depicted in FIGS. 3C and 3D, are preferred.

Specificity and Quantification for KIR Haplotyping

From the MLPA data within pedigrees haplotyping can be inferred. First of all, the framework genes KIR3DL3 and KIR3DP1 for the first block in both haplotypes A and B (FIG. 6) and KIR2DL4 and KIR3DL2 are present in a fixed copy number of 2 genes. However, KIR3DP1 may be present as a so-called KIR3DP1v variant (see also FIG. 7, grey boxes represent the framework KIR genes in both haplotypes A and B). In case of haplotype B the presence of KIR genes may vary widely (FIG. 6), making this haplotype an important contribution to the variation within the KIR gene cluster.

In family 1347, we have deduced, using the extended probesets, from the pedigree a correct and complete KIR haplotype analysis (FIG. 11B).

At the single gene level the MLPA results offer insight into the patterns of inheritance. The sibs inherited different KIR haplotypes from their parents, which, for instance, resulted in the variation in KIR2DL5 gene content. Thus, both sibs have 2 of these genes, comprising 2 KIR2DL5 genes from the father (who carries 4 KIR2DL5 genes in total) and one null-haplotype from the mother. From the data in the literature or the current MLPA data, it cannot yet be distinguished whether the two KIR2DL5 genes that both sibs have inherited are the same alleles, or whether the KIR2DL5 genes are located in the first or second block of the so-called B haplotype (see also FIG. 6).

At the haplotype level, patterns of inheritance are deduced for the remaining non-framework KIR genes in this pedigree, e.g. KIR2DL3, KIR2DS2, KIR2DL2, KIR2DP1, and KIR2DL1 genes in the first block of haplotype B, generally located in between the framework genes KIR3DL3 and KIR3DP1 genes (see also FIG. 6).

In case of the first block of haplotype B, the results are explained by the inheritance of a KIR2DL3-KIR2DP1-KIR2DL1 haplotype from the father and the KIR2DS2-KIR2DL2-KIR2DP1-KIR2DL1 haplotypic block from the mother.

In case of the second block of haplotype B, it is clear that the KIR3DS1-KIR2DS3-KIR2DS1 haplotype has been inherited from the father and the KIR3DL1-KIR2DS4 from the mother. Yet, one sib (8436) must have lost a KIR3DL1 gene according to our MLPA analysis. Sib 8436 has the normal 3DL1 present in our MLPA, though sib 8412 has inherited a 3DL1N variant gene instead of the normal 3DL1 gene. This reflects normal inheritance and is therefore not an exception.

SSP-PCR cannot discriminate between 3DL1 variants (nor between 3DS1 variant genes or 2DL4 variant genes).

At the haplotype level, patterns of inheritance are similarly deduced for the pedigree of family 1349 (FIG. 12B). Apart from the framework KIR genes in this pedigree, the non-framework genes form the haplotype B that are inherited “en bloc”. In case of these two sibs, 1349-8393 and -8636, the KIR variation can be well explained by inheriting different KIR haplotypes from both parents.

With respect to the first block of haplotype B, the results are explained by the inheritance of one of his two similar KIR2DL3-KIR2DP1-KIR2DL1 alleles from the father and one from the mother (while this female also carried a smaller KIR2DL3-KIR2DP1 haplotypic block).

In case of the second block of haplotype B, it is clear that the father carries a KIR3DL1-KIR2DS4 combination on one allele and a separate KIR2DS3-KIR2DS4-KIR2DS1 haplotypic block on the other allele, which were differently inherited by the two sibs, whereas the mother carries two identical KIR3DL1-KIR2DS4 alleles.

In FIGS. 11 and 12 the standard SSP PCR results are compared with our MLPA data with the extended probe sets 1 & 2 for the pedigrees in the CEPH families 1347 and 1349.

Two KIR haplotype models have been described (see for instance: H. Li, PLoS Genetics, 2008, 4, 11:e1000254; M. Uhrberg, Eur. J. Imm. Highlights, 2005, 35:10-15; M. Carrington, The KIR Gene Cluster, 2003; K. Hsu, Imm. Reviews, 2002, 190:40-52). The conventional KIR haplotype model assumes that there are two haplotypes A and B. Both haplotypes A and B contain the framework genes 3DL3, 3DP1, 2DL4, and 3DL2. Then there are the KIR genes 2DP1, 2DL1 and 2DS4 that are common to both haplotypes, but only haplotype A contains 2DL3, 3DL1 and 2DS4. Haplotype B is more variable and can contain the KIR genes 2DS1, 2DS2, 2DS3, 2DS4, 2DS5, 3DS1, 2DL2 and 2DL5 (apart from the aforementioned framework genes). In more than 96% of the global population the A haplotype at the KIR gene cluster contains the KIR genes 3DL3, 2DL3, 2DP1, 2DL1, 3DP1, 2DL4, 3DL1, 2DS4 and 3DL2 (see also: www.allelefrequencies.net).

The novel KIR haplotype model assumes that haplotype A and B are present on the two different chromosomes. Therefore any individual can represent an AA, AB or BB genotype. Based on the genes that are present in the DNA sample of that individual, one can conclude which haplotypes are present and the positive genes from the assay can be divided over both haplotypes according to the rules that certain KIR genes are present only in one of the haplotypes A or B, essentially as was mentioned above.

For the SSP PCR data the two haplotype models are shown to interpret possible CNV results, resp. the conventional KIR haplotype model in FIGS. 11B1 and 12B1 and the novel KIR haplotype model in FIGS. 11B2 and 12B2. FIGS. 11B3 and 12B3 show the results of our MLPA data with the extended probe sets 1 & 2 compared with both the SSP PCR data according to the conventional KIR haplotype model and with the novel KIR haplotype model.

In the conventional KIR haplotype model in FIGS. 11B1 and 12B1 the KIR gene region is described by framework genes (3DL3, 3DP1, 2DL4 and 3DL2), genes that can be present in both A and B haplotypes (2DP1, 2DL1 and 2DS4) and haplotype-specific genes. The KIR genes 2DL3, 3DL1 and 2DS4 are specific for haplotype A, while the KIR genes 2DL5, 2DS1, 2DS2, 2DS3, 2DS5, 3DS1 and 2DL2 are specific for haplotype B. Haplotype A is constant to a high degree. In more than 96% of the global population haplotype A consists of 3DL3, 2DL3, 2DP1, 2DL1, 3DP1, 2DL4, 3DL1, 2DS4 and 3DL2 (www.allelefrequencies.net). Haplotype B is more variable and carries more activating KIR genes.
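The rule that certain KIR genes occur on only one of the haplotypes can be turned into a simple presence/absence classifier. The sketch below is a deliberately simplified illustration: the gene sets are the haplotype-specific genes listed above, while the function name `haplotype_groups` and the decision rule (any A-specific gene implies at least one A haplotype, any B-specific gene implies at least one B haplotype) are assumptions made for this example and ignore copy numbers.

```python
# Haplotype-specific genes as listed in the text.
A_SPECIFIC = {"2DL3", "3DL1", "2DS4"}
B_SPECIFIC = {"2DL5", "2DS1", "2DS2", "2DS3", "2DS5", "3DS1", "2DL2"}


def haplotype_groups(detected_genes):
    """Return 'AA', 'AB' or 'BB' from the set of KIR genes detected in a sample."""
    detected = set(detected_genes)
    has_a = bool(detected & A_SPECIFIC)   # at least one A haplotype assumed
    has_b = bool(detected & B_SPECIFIC)   # at least one B haplotype assumed
    if has_a and has_b:
        return "AB"
    if has_b:
        return "BB"
    if has_a:
        return "AA"
    return "undetermined"                 # only framework/common genes seen


# Gene content of the common haplotype A given in the text.
common_a = ["3DL3", "2DL3", "2DP1", "2DL1", "3DP1", "2DL4", "3DL1", "2DS4", "3DL2"]
print(haplotype_groups(common_a))
print(haplotype_groups(common_a + ["2DS2", "2DL2"]))
```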

FIGS. 11B2 and 12B2 show the interpretation for the respective families based on the novel KIR haplotype model and SSP-PCR data from CEPH-IHWG.

FIGS. 11B3 and 12B3 show the copy number variation for the respective families. In table 3 Copy number variation of KIR genes by MLPA is determined by 2 probes for each gene, except for the N-variant genes (single probe detection by definition), including those genes marked by an asterisk.

For the 3DP1v gene variant a combination of 3 probes has been designed. CNV can be deduced from a comparison between the results for the probes for 2DL5, 2DL5a and 2DL5b.

The 2DS4N KIR probe is designed to detect the KIR-2DS4 deletion-variant genes *003 to *009, while SSP-PCR only detects 2DS4 variant *003 (designated 1D).

In FIG. 12B3 KIR3DP1 variants are detected using MLPA (table 3), whereas KIR3DP1 variants are not detected when SSP-PCR is used. SSP-PCR of KIR3DP1v results in a band of 1672 bp that is obtained from the 3DP1 gene. This is a large fragment, and large fragments are known to be difficult to detect. Therefore, a DNA sample can be positive for KIR3DP1v when MLPA is used but appear to be negative for KIR3DP1v when SSP-PCR is used.

CONCLUSION

Extended probe set 1 contains the probes 2DL1-5, 2DS1-5, 3DL1-3, 3DS1, 2DP1 and 3DP1, in total 20 probes. Extended probe set 2 contains the probes 2DL1-5, 2DS1-5, 3DL1-3, 3DS1, 2DP1 and 3DP1, in total 20 probes. Together these two probe sets are accurate for the typing of all 17 KIR genes and 7 KIR gene variants (i.e. 2DL5a, 2DL5b, 3DP1v, and the null variants 2DL4N, 3DL1N, 3DS1N, and 2DS4N), and for quantifying relative copy numbers of at least all 17 different KIR genes and 4 null variants (2DL4N, 3DL1N, 3DS1N, and 2DS4N) (see FIG. 13).

Example 3

The advantage of probe sets comprising three probe parts according to the present invention is that at least two different SNPs can be detected with one probe set. For instance, in a probeset consisting of three probe parts two sites for ligation are preferably present. A left probe part and middle probe part are ligated and additionally a middle probe part and right probe part are ligated. At each ligation site a SNP can be detected. With conventional MLPA probe sets, consisting of two half probes, only one SNP can be detected per probe set, because only one site for ligation is present.

In this Example detection of the Null allele of KIR3DL1 with a probeset consisting of three probes (one left probe part, one middle probe part and one right probe part) is described. This example is illustrated in FIG. 1C.

Materials & Methods

The null allele, called KIR3DL1*024N, is discriminated from KIR3DL1 using three probes of the invention. Partial probes (probe numbers as depicted in FIG. 3C) used in this example are:

711A - KIR3DL1 WT Left probe part:
5′-PO4-GGTTCCCTAAGGGTTGGACCCCTCACGCCTCGTTGGACA-3′

711D - KIR3DL1*024N Left probe part:
5′-PO4-GGGTTCCCTAAGGGTTGGACAAGGACCCCTCACGCCTCGTTGGAC-3′

711B - KIR3DL1 Middle probe part:
5′-PO4-GATCCATGATGGGGTCTCCAAGGCCAATTTCTCCATCGGTCCCATGATGCT-3′

711C - KIR3DL1 Right probe part:
5′-PO4-GCCCTTGCAGGGACCTACAGATGCTACGGTTCTGGTCTAGATTGGATCTTGCTGGCAC-3′

For DNA selection/isolation, probe design, MLPA reaction, electrophoresis and analysis see materials & methods of example 1.

With these partial probes 2 probe sets can be formed. Those two probe sets consist of different left probe parts, but share the middle and right probe parts.

Results and Discussion

The final base of middle probe part 711B is a thymine. This thymine is specific for KIR3DL1 genes while all other KIR genes have a different base at this position. Therefore, with probe part 711B KIR3DL1 is discriminated from other KIR genes. Ligation between the middle probe part (711B) and right probe part (711C) will only occur when KIR3DL1 genes are present.

The final base of left probe part 711A is an adenine. This base is present in wildtype KIR3DL1 gene but deleted in the KIR3DL1 null allele, KIR3DL1*024N. Thus, probe part 711A containing an adenine at the final base position is specific for the wildtype KIR3DL1 gene and ligation between the 711A left probe part and the middle probe part (711B) will only occur if the KIR3DL1 wildtype gene is present. In left probe part 711D the final adenine is removed. Thus, probe part 711D is specific for null allele KIR3DL1*024N and ligation between the 711D left probe part and the middle probe part (711B) will only occur if KIR3DL1*024N is present.

Thus these two probe sets each detect 2 SNPs, namely those SNPs that are specific for KIR3DL1 wildtype gene and null allele KIR3DL1*024N because both the left probe part and the middle probe part are SNP-specific.
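The double discrimination described above can be summarised in a short sketch. The sequences are the probe parts listed in this Example; the function `full_probe_expected` and its boolean arguments are hypothetical and only encode the stated logic, namely that a full-length, amplifiable probe requires ligation at both junctions.

```python
# Probe part sequences (5' to 3') as listed in Materials & Methods.
PROBE_PARTS = {
    "711A": "GGTTCCCTAAGGGTTGGACCCCTCACGCCTCGTTGGACA",
    "711D": "GGGTTCCCTAAGGGTTGGACAAGGACCCCTCACGCCTCGTTGGAC",
    "711B": "GATCCATGATGGGGTCTCCAAGGCCAATTTCTCCATCGGTCCCATGATGCT",
    "711C": "GCCCTTGCAGGGACCTACAGATGCTACGGTTCTGGTCTAGATTGGATCTTGCTGGCAC",
}


def full_probe_expected(left_part, is_kir3dl1, is_null_allele):
    """True if both ligations (left-middle and middle-right) are expected."""
    if left_part == "711A":      # 3' adenine: wildtype-specific
        left_middle = not is_null_allele
    elif left_part == "711D":    # adenine removed: KIR3DL1*024N-specific
        left_middle = is_null_allele
    else:
        raise ValueError("left_part must be '711A' or '711D'")
    middle_right = is_kir3dl1    # 3' thymine of 711B is KIR3DL1-specific
    return left_middle and middle_right


# The discriminating 3' bases of the left and middle probe parts.
for name in ("711A", "711D", "711B"):
    print(name, PROBE_PARTS[name][-1])

print(full_probe_expected("711A", is_kir3dl1=True, is_null_allele=False))
print(full_probe_expected("711A", is_kir3dl1=True, is_null_allele=True))
```

Each probe set therefore reports on two positions at once: the KIR3DL1-specific base at the middle-right junction and the wildtype or null-allele base at the left-middle junction.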

Example 4
Introduction

To be able to determine copy number variation of the KIR gene family using MLPA, it is necessary to compare the data with a sample for which it is exactly known how many copies of each KIR gene are present. Until now, no such sample was available. That is why a calibrator for the KIR MLPA was designed.

The calibrator is a DNA construct that is designed to contain one binding site for each probe set in the KIR MLPA. If this construct is run in an MLPA just as any other sample, it will result in a peak pattern which resembles the presence of one copy per KIR gene. In theory, if the concentration of this construct is prepared twice as high as compared to the (estimated) concentration of a human genome, the amount of copies that one peak represents is equal to two. The peak pattern of each human genome can be compared to the peak pattern of the calibrator and so the amount of copies of each KIR per genome can be determined.

Materials & Methods

The calibrator is constructed as follows: A total of six genes (FIG. 16) containing the binding sites of all probes were synthesized. These genes were synthesized using a modified version of the method described in an article by Stemmer et al. (Gene, 1995) as schematically shown in FIG. 14. The intended sequences of both the sense and anti-sense strands of each gene were designed first. All genes included unique digestion sites at both ends for cloning purposes. These sequences were broken into pieces of oligo's 60 base pairs long with each 20 base pairs of “non-coding material” in between. The pieces of the anti-sense strand were created so that they would have exactly 20 bp overlap with two pieces of the sense strand.
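One way to read the oligo layout described above is sketched below: sense 60-mers are separated by 20 bp gaps, and each anti-sense 60-mer spans one gap plus 20 bp of the sense oligo on either side (20 + 20 + 20 = 60). This interpretation, the function names and the requirement that the gene length fits the pattern exactly are assumptions made for illustration; unique digestion sites and real sequences are not modelled.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def oligo_layout(gene_length, oligo_len=60, gap=20, overlap=20):
    """Return (sense, antisense) lists of (start, end) positions on the gene."""
    period = oligo_len + gap
    if (gene_length - oligo_len) % period != 0:
        raise ValueError("gene length must equal oligo_len + k * (oligo_len + gap)")
    sense = [(s, s + oligo_len) for s in range(0, gene_length, period)]
    # Each anti-sense oligo bridges the gap between two neighbouring sense oligos.
    antisense = [(end - overlap, nxt + overlap)
                 for (_, end), (nxt, _) in zip(sense, sense[1:])]
    return sense, antisense


def design_oligos(gene):
    """Cut a designed gene sequence into sense and anti-sense oligos."""
    sense, antisense = oligo_layout(len(gene))
    return ([gene[a:b] for a, b in sense],
            [reverse_complement(gene[a:b]) for a, b in antisense])


sense, antisense = oligo_layout(220)
print(sense)
print(antisense)
```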

The 60-mers were obtained from Life Technologies and assembled with the use of a PCR machine. The overlapping parts of the oligo's anneal to each other and the open spaces of 20 bp are filled in with nucleotides. Part of the product is used for a second PCR reaction containing only the outside oligo's, which function as primers in this reaction, so that the assembled gene is amplified.

Technical Aspects

A difference between the construction of the calibrator and the protocol of Stemmer et al. is the length of the starting oligo's. We have decided on the use of 60-mers, while Stemmer et al. uses 40-mers. Also, Stemmer et al. predicts that products can be synthesized of 3-5 kilo bases, while the largest part that we were able to create with this method is 680 bp. For this calibrator, six genes were designed of which gene 1 had a length of 892 bp, gene 2 to 5 had lengths of around 1.1 kb and gene 6 had a length of 2.8 kb. None of these genes could be synthesized in two steps. Therefore it was required to split the genes into smaller parts of maximally 680 bp with a 180 bp overlap. The oligo's that form these smaller parts were assembled together and the products were amplified. Successfully synthesized parts were combined in a second amplification reaction to create the entire gene of 1.1 kb. This process is illustrated by the agarose gel separation of the PCR products of one of these genes in FIG. 15.
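Splitting a gene into parts of at most 680 bp with a 180 bp overlap can be illustrated as follows. The two figures come from the text; tiling from the 5′ end with a fixed step, and the function name `split_into_parts`, are assumptions for this sketch, so the resulting coordinates are not the actual part boundaries used for the calibrator.

```python
def split_into_parts(gene_length, max_len=680, overlap=180):
    """Tile a gene with parts of at most max_len bp that overlap by `overlap` bp."""
    step = max_len - overlap
    parts, start = [], 0
    while True:
        end = min(start + max_len, gene_length)
        parts.append((start, end))
        if end == gene_length:
            return parts
        start += step


# Gene lengths mentioned in the text: 892 bp, about 1.1 kb and 2.8 kb.
for length in (892, 1100, 2800):
    print(length, split_into_parts(length))
```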

Genes 1 to 5 were all synthesized by these 3 steps. Gene 6 is larger and requires multiple steps to assemble. This gene was divided into 6 smaller genes of ~650 bp which were combined to form 5 genes of ~1 kb. These 1 kb products were then combined to form 4 products of ~1.5 kb and so forth. The longer the products, the longer the overlap had to be for the separate products to ligate to each other. Also, for several steps it was required to purify the specific product from gel to remove all side products that had formed.

When a synthetic gene was produced it was cloned into a plasmid, either eGFP-C1 or pBlueScript. Sequencing of the inserts confirmed the presence of the entire gene. Mutations (possibly introduced by incorrectly formed oligo's and multiple PCR rounds) were present at a relatively low rate (approx. 2 mutations per kb). When a mutation was present in the vicinity of a ligation point, a site-directed mutagenesis kit was used to mutate this specific base pair back to the original sequence. Only then were all genes combined into one plasmid: pBlueScript. The plasmid was digested so that these synthetic genes could be inserted. The result was one large construct of 11.5 kb as schematically depicted in FIG. 16. The sequence of the complete KIR MLPA calibrator is depicted in FIG. 19.

Mode of Action

During an MLPA reaction the KIR probes will find only one site on the calibrator to bind. To prevent any interference or competition between different probe sets, the calibrator was designed in such a way that there is the least possible overlap between the probe binding sites. To accomplish this, spacers (non-coding sequences) were introduced between the probe binding sites. This method was applied for gene 1 to 5. A second technique is to alternate the probe binding site between the sense and anti-sense strand. A schematic representation of these two ways of distribution is shown in FIG. 17.

The calibrator contains the probe binding sites of all KIR probe sets of FIGS. 3F and 3G and control probe sets of FIG. 3H. This last aspect facilitates an extra quality control for each MLPA run. If necessary, more probe binding sites can be added to the construct as several single cutting digestion sites are included in the construct.

Example 5
Introduction

The genotyping and CNV detection within CEPH families enabled us to find further proof of principle and application of the calibrator-related assessment of the number of KIR genes within any given individual. As exemplified by two CEPH families 1344 and 1349, the CNV for 3DP1 and 2DL3 has been determined by the current KIR MLPA probe mixes and corresponding calibrator.

Previous SSP-PCR and MLPA data gave the same results for the presence or absence of KIR genes. By the added value of CNV detection within families, it has become possible to construct and trace the haplotypes of father and mother which are inherited by their offspring, as indicated in FIGS. 18A and 18B.

Haplotyping is a means to further validate the MLPA for KIR genes. Each haplotype (i.e. a series of adjacent genes at one locus) on one chromosome of the parent is divided among the male and female germline cells and after conception of the fertilized egg a set of two haplotypes is created again, i.e. one from the father and one from the mother.

The framework genes are present in both the relatively fixed haplotype A and the highly variable haplotype B (see FIGS. 11B and 12B), which means that these genes are always present in each individual as a pair of genes. The presence of the other KIR genes is more variable: they can be completely absent or present as a single copy or two copies of that gene, depending on the haplotype (haplotype A or B).

Family Tree Assembly

A set of previously genotyped families was used to validate the use of the calibrator of Example 4, the sequence of which is shown in FIG. 19, in the KIR MLPA. These families, called Centre d'Etude Polymorphisme Humain (CEPH) families, were typed during the 13th Immuno Histocompatibility Workshop (IHS) using SSP-PCR. In this example it is described how the calibrator is applied during copy number determination in two of these CEPH families.

Materials and Methods

The KIR MLPA was based upon the method as described in Example 1. In short, the MLPA was started with a probe hybridization step, followed by ligation of the different probes. The final step was the amplification of bound probes with a polymerase chain reaction (PCR). The PCR was optimized for this specific MLPA; 10 μl of ligation mixture was added to 40 μl of PCR mixture. This mixture consisted of 5 μl Accuprime Taq Buffer, 0.4 μl Accuprime Taq enzymes (Invitrogen, AccuPrime Taq DNA Polymerase High Fidelity kit), 1.875 mM MgSO4 and 1.5 μl of Salsa PCR primers (MRC Holland). The PCR reaction created products that contained a fluorescent label allowing for fragment analysis by electrophoresis. Fragment analysis was performed with a mixture of HiDi Formamide (9 μl) and Promega Internal Lane Standard 60-600 (1 μl). The probes were divided among three mixes to prevent competition between homologous primer sets. The control probes used in this assay were obtained from MRC Holland.

The PCR fragments that were formed during the KIR MLPA were analysed on a sequencer platform, in our case a 3130xl Genetic Analyzer from Applied Biosystems. The raw data set was further analyzed using Genemarker software (Softgenetics LLC). This software normalized the peak pattern so that each peak gave a representative height and area for the actual number of copies of each specific gene. This was also done for the peak pattern of the calibrator: the software assumed that two copies of each gene were present in every sample. Therefore, it adjusted the peaks of the calibrator (which actually represent one copy of each gene) to represent two copies.

The next step of the analysis was the MLPA analysis itself. The peak pattern of one or more calibrator samples was used as a reference for two copies. The software compared each peak from a tested sample with the height or area of the comparable peak in this reference, resulting in a ratio. A ratio of 0.75-1.25 meant an equal amount of copies, so 2. A ratio of 0.25-0.75 was marked as a deletion, so 1 copy. A ratio of 1.25-1.75 referred to a duplication, so 3 copies.
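The ratio thresholds above translate directly into a small lookup. In this sketch the function name `copies_from_ratio` is hypothetical, the treatment of the shared boundaries (0.75 and 1.25 assigned to the higher bin) is an assumption, and ratios outside the three stated ranges are returned as `None` because the text does not assign them a copy number.

```python
def copies_from_ratio(ratio):
    """Copy number from the sample/calibrator peak ratio (calibrator = 2 copies)."""
    if 0.25 <= ratio < 0.75:
        return 1          # deletion
    if 0.75 <= ratio < 1.25:
        return 2          # equal to the reference
    if 1.25 <= ratio < 1.75:
        return 3          # duplication
    return None           # range not specified in the text


for ratio in (0.5, 1.0, 1.5, 2.0):
    print(ratio, copies_from_ratio(ratio))
```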

For each member of two CEPH families, the copy numbers of the KIR genes were determined as described above. The genes for the null-variants for KIR2DL4 and KIR2DS4 were determined, but KIR2DL5A and KIR2DL5B were not distinguished from each other. As inheritance of these genes is thought to occur in a Mendelian manner on two alleles, each copy in the offspring should have its origin in one of the parents. This allowed us to divide the found copy numbers of the KIR genes in separate alleles which are inherited from the parents.
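The division of copy numbers over parental alleles can be sketched as a small search: each parent's total is split over two alleles, and a split is kept only if every child's total equals one paternal allele plus one maternal allele. The function names and the example numbers are hypothetical and chosen only for illustration; they are not data from the families studied.

```python
def allele_splits(total):
    """All unordered ways to divide `total` gene copies over two alleles."""
    return [(a, total - a) for a in range(total // 2 + 1)]


def consistent_splits(father_total, mother_total, child_totals):
    """Parental allele splits that explain every child's copy number."""
    results = []
    for f in allele_splits(father_total):
        for m in allele_splits(mother_total):
            # A child receives one allele from each parent.
            possible = {fa + ma for fa in f for ma in m}
            if all(c in possible for c in child_totals):
                results.append((f, m))
    return results


# Hypothetical example: father 4 copies, mother 0 copies, children 3 and 1 copies.
print(consistent_splits(4, 0, [3, 1]))
```

In this hypothetical case only one split is compatible with Mendelian inheritance: one copy on one paternal allele and three on the other.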

Results and Discussion

The KIR haplotypes of the two families as determined are shown in FIGS. 18A (family 1344) and 18B (family 1349). The parents are indicated with −01 and −02; the square represents the father and the circle represents the mother, with the determined KIR haplotype beside these symbols. The KIR haplotype shows which KIR genes are present in the parent, and the presence and copy number of the KIR genes on each chromosome are shown, one chromosome left of the vertical line and the other chromosome to the right of the vertical line. Below the KIR haplotypes of the parents, those of two siblings are depicted. The KIR haplotypes indicate which chromosome originates from the father (P) and which chromosome originates from the mother (M).

Before the use of the KIR calibrator, only the presence or absence of a specific KIR gene could be demonstrated. The use of the KIR calibrator enables determination of the copy number of the KIR genes as well. For instance, FIG. 18A shows that both parents of family 1344 carry the KIR2DP1 gene and that both siblings inherited the gene. However, previously it was unknown how many copies of this gene were present. It is now demonstrated that both parents and both siblings in this family carry three copies of the KIR2DP1 gene in their genome. Apart from 2DP1, similar results for CNV of 2DL3 are demonstrated in family 1349.

We further show that the calibrator-based KIR MLPA analysis can be used to not only indicate CNV of these KIR genes (showing 0-1-2-or-more gene copies) but also to help determine the mode of inheritance to siblings within one family. This is illustrated for KIR2DL3 in family 1349 (FIG. 18B). The two siblings tested in this family differ in the number of KIR2DL3 genes inherited from each of the parents who both carry 3 copies in their genome.

Thus, CNV, which can now be determined with a KIR calibrator, can be a helpful way to determine a certain haplotype in a pedigree and use it as a tracer for genetic inheritance patterns within KIR-genotyped families.

TABLE 1

KIR genes and proteins names, adapted from KIR Nomenclature report 2002 (Marsh et al, 2002).

Gene symbol   Protein symbol   Aliases
KIR2DL1       KIR2DL1          cl-42, nkat1, 47.11, p58.1, CD158a
KIR2DL2       KIR2DL2          cl-43, nkat6, CD158b1
KIR2DL3       KIR2DL3          cl-6, nkat2, nkat2a, nkat2b, p58, CD158b2
KIR2DL4       KIR2DL4          103AS, 15.212, CD158d
KIR2DL5A      KIR2DL5A         KIR2DL5.1, CD158f
KIR2DL5B      KIR2DL5B         KIR2DL5.2, KIR2DL5.3, KIR2DL5.4
KIR2DS1       KIR2DS1          EB6ActI, EB6ActII, CD158h
KIR2DS2       KIR2DS2          cl-49, nkat5, 183ActI, CD158j
KIR2DS3       KIR2DS3          nkat7
KIR2DS4       KIR2DS4          cl-39, KKA3, nkat8, CD158i
KIR2DS5       KIR2DS5          nkat9, CD158g
KIR2DP1       KIR2DP1          KIRZ, KIRY, KIR15, KIR2DL6
KIR3DL1       KIR3DL1          cl-2, NKB1, cl-11, nkat3, NKB1B, AMB11, KIR, CD158e1
KIR3DL2       KIR3DL2          cl-5, nkat4, nkat4a, nkat4b, CD158k
KIR3DL3       KIR3DL3          KIRC1, KIR3DL7, KIR44, CD158z
KIR3DS1       KIR3DS1          nkat10, CD158e2
KIR3DP1       KIR3DP1          KIRX, KIR48, KIR2DS6, KIR3DS2P, CD158c

TABLE 2

Number of currently known alleles for each KIR gene and the different protein products they encode (IPD KIR database, http://www.ebi.ac.uk/ipd/kir).

Gene | 2DL1 | 2DL2 | 2DL3 | 2DL4 | 2DL5 | 2DS1 | 2DS2 | 2DS3
Alleles | 25 | 11 | 9 | 25 | 21 | 12 | 12 | 9
Proteins | 28 | 7 | 8 | 12 | 11 | 8 | 6 | 3

Gene | 2DS4 | 2DS5 | 3DL1 | 3DS1 | 3DL2 | 3DL3 | 2DP1 | 3DP1
Alleles | 20 | 12 | 52 | 14 | 45 | 55 | 5 | 8
Proteins | 13 | 9 | 46 | 12 | 40 | 31 | 0 | 0

TABLE 3

KIRs and their cognate ligands (Carrington et al, 2003; Middleton et al, 2005; Du et al, 2007). The ligands of the other KIRs are unknown or uncertain.

Inhibitory KIRs | Ligands | Activating KIRs | Ligands
2DL1 | HLA-C group 2, allotypes Cw1, 4, 5, 6, 17, 18 | 2DS1 | HLA-C group 2 allotypes Cw1, 4, 5, 6, 17, 18
2DL2 and 2DL3 | HLA-C group 1, allotypes Cw1, 3, 7, 8, 13, 14 | 2DS2 | HLA group 1, allotypes Cw1, 3, 7, 8, 13, 14
2DL4 | HLA-G | 2DS4 | HLA-C
3DL1 | HLA-B, Bw4 | 3DS1 | HLA-B, Bw4
3DL2 | HLA-A3 and A11 allotypes | |

TABLE 4

The KIR Reference Panel I from the IHWG (http://www.ihwg.org/cellbank/dna/refpan_nkkir_table.html). 2DS4 indicates all alleles except KIR2DS4*003 and 1D indicates only KIR2DS4*003. 3DP1 indicates KIR3DP1*003 (deletion of exon 2) only and 3DP1v indicates all alleles except KIR3DP1*003.

embedded image

Note:

“1” = presence of KIR gene

“0” = absence of KIR gene

shaded cells (N = 16) represent four informative families selected for the Phase II reference panel

TABLE 5

The 17 KIR probes that have been designed and tested for probe set 1. The size of the complete MLPA probe, the size of the separate probe parts and the concentration used are listed in this table.

Code | Probe | Size [bp] | Probe part | Size [bp] | Concentration (fmol)
420A | 2DL2 | 96 | Left | 48 | 1
420B | | | Right | 48 |
512A | 3DL3 | 100 | Left | 50 | 1
512B | | | Right | 50 |
540A | 2DS3 | 108 | Left | 54 | 10
540B | | | Right | 54 |
404A | 3DL2 | 112 | Left | 56 | 1
404B | | | Right | 56 |
405A | 2DP1 | 121 | Left | 65 | 1
405B | | | Right | 56 |
406A | 3DP1 | 125 | Left | 66 | 1
406B | | | Right | 59 |
504A | 2DS4 | 137 | Left | 61 | 1
504B | | | Right | 76 |
408A | 2DL5 | 142 | Left | 57 | 1
408B | | | Middle | 32 |
408C | | | Right | 53 |
514A | 3DL1 | 149 | Left | 74 | 1
514B | | | Right | 75 |
526A | 2DS2 | 154 | Left | 57 | 1
526B | | | Middle | 34 |
526C | | | Right | 63 |
507A | 2DL5A | 165 | Left | 66 | 1
507B | | | Middle | 32 |
507C | | | Right | 67 |
419A | 2DL4 | 170 | Left | 59 | 1
419B | | | Middle | 54 |
419C | | | Right | 57 |
528A | 2DS5 | 185 | Left | 67 | 1
528B | | | Middle | 47 |
528C | | | Right | 71 |
413A | 2DL1 | 189 | Left | 72 | 1
413B | | | Middle | 64 |
413C | | | Right | 53 |
416A | 2DS1 | 195 | Left | 78 | 10
416B | | | Middle | 67 |
416C | | | Right | 50 |
415A | 2DL3 | 213 | Left | 75 | 10
415B | | | Middle | 69 |
415C | | | Right | 69 |
418A | 3DS1 | 218 | Left | 81 | 10
418B | | | Middle | 64 |
418C | | | Right | 73 |
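Because the amplification products are separated by length, each probe in a set needs a distinct total size, and the total should equal the sum of its parts. The following Python sketch checks both properties for the probe set 1 values listed in Table 5. The function names are illustrative only and are not part of the described method.

```python
# Probe set 1 from Table 5: total probe size [bp] and sizes of the probe parts [bp].
PROBE_SET_1 = {
    "2DL2": (96, (48, 48)),
    "3DL3": (100, (50, 50)),
    "2DS3": (108, (54, 54)),
    "3DL2": (112, (56, 56)),
    "2DP1": (121, (65, 56)),
    "3DP1": (125, (66, 59)),
    "2DS4": (137, (61, 76)),
    "2DL5": (142, (57, 32, 53)),
    "3DL1": (149, (74, 75)),
    "2DS2": (154, (57, 34, 63)),
    "2DL5A": (165, (66, 32, 67)),
    "2DL4": (170, (59, 54, 57)),
    "2DS5": (185, (67, 47, 71)),
    "2DL1": (189, (72, 64, 53)),
    "2DS1": (195, (78, 67, 50)),
    "2DL3": (213, (75, 69, 69)),
    "3DS1": (218, (81, 64, 73)),
}


def inconsistent_probes(probe_set):
    """Return the probes whose part sizes do not add up to the total size."""
    return [
        name
        for name, (total, parts) in probe_set.items()
        if sum(parts) != total
    ]


def smallest_size_gap(probe_set):
    """Return the smallest difference [bp] between two neighbouring probe sizes.
    A gap of 0 would mean two probes cannot be told apart by length."""
    sizes = sorted(total for total, _ in probe_set.values())
    return min(b - a for a, b in zip(sizes, sizes[1:]))


print(inconsistent_probes(PROBE_SET_1))  # [] : every total matches its parts
print(smallest_size_gap(PROBE_SET_1))    # 4
```

The same check could be applied to probe set 2 and to the control probes by adding their values from Tables 6 and 7.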

TABLE 6

The 17 KIR probes that have been designed and tested for probe set 2. The size of the complete MLPA probe, the size of the separate probe parts and the concentration used are listed in this table.

Code | Probe | Size [bp] | Probe part | Size [bp] | Concentration (fmol)
543A | 2DS1 | 96 | Left | 48 | 10
543B | | | Right | 48 |
544A | 2DS2 | 100 | Left | 50 | 1
544B | | | Right | 50 |
537A | 2DL5 | 108 | Left | 54 | 1
537B | | | Right | 54 |
513D | 2DS3 | 112 | Left | 52 | 10
513B | | | Right | 60 |
518A | 3DP1 | 121 | Left | 61 | 1
518B | | | Right | 60 |
542A | 2DP1 | 125 | Left | 60 | 1
542B | | | Right | 65 |
541A | 3DS1 | 134 | Left | 67 | 10
541B | | | Right | 67 |
524A | 2DS4 | 137 | Left | 66 | 10
524B | | | Right | 71 |
545A | 2DS5 | 144 | Left | 68 | 10
545B | | | Right | 76 |
409A | 3DL1 | 149 | Left | 60 | 10
409B | | | Middle | 34 |
409C | | | Right | 55 |
506A | 3DL3 | 154 | Left | 54 | 10
506B | | | Middle | 48 |
506C | | | Right | 52 |
507A | 2DL5A | 165 | Left | 66 | 1
507B | | | Middle | 32 |
507C | | | Right | 67 |
539A | 2DL2 | 170 | Left | 60 | 1
539B | | | Middle | 46 |
539C | | | Right | 64 |
525A | 2DL1 | 190 | Left | 64 | 10
525B | | | Middle | 62 |
525C | | | Right | 64 |
538A | 3DL2 r | 195 | Left | 70 | 1
538B | | | Middle | 60 |
538C | | | Right | 65 |
417A | 2DL3 | 213 | Left | 75 | 10
417B | | | Middle | 69 |
417C | | | Right | 69 |
517A | 2DL4 | 218 | Left | 73 | 10
517B | | | Middle | 68 |
517C | | | Right | 77 |

TABLE 7

The control probes used in the two probe sets. The size of the complete MLPA probe, the size of the separate probe parts and the concentration used for the probe sets are listed in this table.

Code | Probe (Gene) | Size [bp] | Probe part | Size [bp] | Concentration (fmol)
201 | Ctrl 2 (FGF3) | 92 | Left | 45 | 1
 | | | Right | 47 |
202 | Ctrl 3 (BCAS4) | 104 | Left | 52 | 1
 | | | Right | 52 |
203 | Ctrl 4 (LMNA) | 116 | Left | 58 | 1
 | | | Right | 58 |
204 | Ctrl 5 (PARK2) | 130 | Left | 44 | 3
 | | | Middle | 41 |
 | | | Right | 45 |
205 | Ctrl 7 (MSH6) | 160 | Left | 59 | 1
 | | | Middle | 42 |
 | | | Right | 59 |
206 | Ctrl 8 (GALT) | 175 | Left | 58 | 1
 | | | Middle | 59 |
 | | | Right | 58 |
207 | Ctrl 9 (SPG4) | 180 | Left | 60 | 1
 | | | Middle | 60 |
 | | | Right | 60 |
210 | Ctrl 1 (IL-4) | 208 | Left | 73 | 1
 | | | Middle | 69 |
 | | | Right | 66 |
209 | Ctrl 10 (NF2) | 223 | Left | 78 | 1
 | | | Middle | 69 |
 | | | Right | 76 |

TABLE 8

The competitors of the control probes. The size of the competitor, the part of the control probe used and the concentration used for the probe sets are listed in this table.

Code | Gene | Length [bp] | Probe part | Concentration (fmol)
201X | Ctrl 2 (FGF3) | 30 | Left | 10
202X | Ctrl 3 (BCAS4) | 30 | Left | 10
203X | Ctrl 4 (LMNA) | 30 | Left | 3
205X | Ctrl 7 (MSH6) | 50 | Left | 0
207X | Ctrl 9 (SPG4) | 50 | Left | 1

TABLE 9

KIR gene frequencies in the Caucasian population. The frequencies are derived from several studies performed worldwide in the Caucasian population and are available on www.allelfrequencies.net.

KIR2DL1 | KIR2DL2 | KIR2DL3 | KIR2DL4 | KIR2DL5 | KIR2DS1 | KIR2DS2 | KIR2DS3
88-100% | 39-63% | 57-94% | 100% | 36-61% | 27-49% | 25-63% | 19-42%

KIR2DS4 | KIR2DS5 | KIR3DL1 | KIR3DL2 | KIR3DL3 | KIR3DS1 | KIR2DP1 | KIR3DP1
87-98% | 21-46% | 76-98% | 99-100% | 99-100% | 26-50% | 94-100% | 97-100%

TABLE 10

KIR alleles detected by the probes and the coverage of the total KIR alleles, except for 3DP1v, by probe sets 1 and 2, as depicted in FIGS. 3A and 3B. All KIR alleles including 3DP1v are also detected by extended probe sets 1 and 2, as depicted in FIGS. 3C and 3D. Coverages lower than 100% are caused by gene variants that are present in the target sequence to which the probes bind. The alleles shown here that can be detected by the probes were generated with the primer or probe BLAST tool on the IPD KIR database. The percentage of the total KIR alleles that can be covered by the probes is calculated by dividing the number of alleles detected by each probe by the total number of alleles reported on the website. Certain alleles are underlined where the coverage of both probe sets is not 100% due to gene variants present in the target sequence.

Probe set 1
Probe set 2
Probe set 1 + 2

PROBE
ALLELES
COVERAGE
PROBE
ALLELES
COVERAGE
COVERAGE

2DL1
2DL1*001
2DL1*00402
100%
2DL1
2DL1*001
2DL1*00402
100%
100%

2DL1*002
2DL1*005

2DL1*002
2DL1*005

2DL1*00301
2DL1*006

2DL1*00301
2DL1*006

2DL1*0030201
2DL1*007

2DL1*0030201
2DL1*007

2DL1*0030202
2DL1*008

2DL1*0030202
2DL1*008

2DL1*00303
2DL1*009

2DL1*00303
2DL1*009

2DL1*0040101
2DL1*010

2DL1*0040101
2DL1*010

2DL1*0040102

2DL1*0040102

2DL2
2DL2*001
2DL2*004
100%
2DL2
2DL2*001
2DL2*003
80%
100%

2DL2*002
2DL2*005

2DL2*002
2DL2*005

2DL2*003

2DL3
2DL3*001
2DL3*004
86%
2DL3
2DL3*001
2DL3*005
100%
100%

2DL3*002
2DL3*005

2DL3*002
2DL3*006

2DL3*003
2DL3*006

2DL3*003
2DL3*007

2DL4
2DL4*00101
2DL4*00501
54%
2DL4
2DL4*00101
2DL4*00601
100%
100%

2DL4*00102
2DL4*00601

2DL4*00102
2DL4*00602

2DL4*00105
2DL4*00602

2DL4*0010301
2DL4*007

2DL4*00201
2DL4*007

2DL4*0010302
2DL4*0080101

2DL4*00202
2DL4*0080101

2DL4*00104
2DL4*0080102

2DL4*003
2DL4*0080201

2DL4*00105
2DL4*0080103

2DL4*004
2DL4*011

2DL4*00201
2DL4*0080104

2DL4*00202
2DL4*0080201

2DL4*00203
2DL4*0080202

2DL4*003
2DL4*009

2DL4*004
2DL4*010

2DL4*00501
2DL4*011

2DL4*00502
2DL4*012

2DL5
2DL5A*0010101
2DL5B*003
100%
2DL5
2DL5A*0010101
2DL5B*00601
54%
100%

2DL5A*0010102
2DL5B*004

2DL5A*0010102
2DL5B*007

2DL5A*0050101
2DL5B*00601

2DL5B*003
2DL5B*00801

2DL5A*0050102
2DL5B*007

2DL5B*004

2DL5B*0020101
2DL5B*00801

2DL5B*0020102
2DL5B*009

2DL5B*0020103

2DL5A
2DL5A*0010101
2DL5A*0050101
100%
2DL5A
Same probe as in probe set 1.
100%
100%

2DL5A*0010102
2DL5A*0050102

3DP1*004

14%

3DP1v

2DS1
No match found in the KIR

2DS1
No match found in the KIR

database. BLAST result in match

database. Probe designed on

with KIR2DS1v alias KIR2DS1*002

intron 6.

2DS2
2DS2*0010101
2DS2*002
90%
2DS2
No match found in the KIR

90%

2DS2*0010102
2DS2*003

database. Probe designed on

2DS2*0010103
2DS2*004

intron 2 and 3.

2DS2*00102
2DS2*0005

2DS2*00103

2DS3
2DS3*00101
2DS3*002
100%
2DS3
2DS3*00101
2DS3*002
100%
100%

2DS3*00102
2DS3*003N

2DS3*00102
2DS3*003N

2DS3*00103
2DS3*004

2DS3*00103
2DS3*004

2DS3*00104

2DS3*00104

2DS4
2DS4*0010101
2DS4*003
100%
2DS4
2DS4*0010101
2DS4*003
80%
100%

2DS4*0010102
2DS4*004

2DS4*0010102
2DS4*006

2DS4*0010103
2DS4*006

2DS4*0010103
2DS4*007

2DS4*00102
2DS4*007

2DS4*00102
2DS4*009

2DS4*00103
2DS4*009

2DS5
2DS5*001
2DS5*004
100%
2DS5
2DS5*001
2DS5*004
100%
100%

2DS5*0020101
2DS5*005

2DS5*0020101
2DS5*005

2DS5*0020102
2DS5*006

2DS5*0020102
2DS5*006

2DS5*0020103
2DS5*007

2DS5*0020103
2DS5*007

2DS5*003
2DS5*008

2DS5*003
2DS5*008

3DL1
3DL1*00101
3DL1*027
78%
3DL1
3DL1*00101
3DL1*021
41%
88%

3DL1*00102
3DL1*028

3DL1*002
3DL1*022

3DL1*002
3DL1*029

3DL1*00401
3DL1*023

3DL1*00401
3DL1*030

3DL1*00402
3DL1*024N

3DL1*00402
3DL1*031

3DL1*00501
3DL1*025

3DL1*00501
3DL1*032

3DL1*00502
3DL1*026

3DL1*00502
3DL1*033

3DL1*006
3DL1*027

3DL1*007
3DL1*034

3DL1*007
3DL1*028

3DL1*008
3DL1*035

3DL1*008
3DL1*029

3DL1*009
3DL1*036

3DL1*009
3DL1*030

3DL1*01501
3DL1*037

3DL1*01502
3DL1*038

3DL1*016
3DL1*039

3DL1*01701
3DL1*040

3DL1*01702
3DL1*041

3DL1*018
3DL1*042

3DL1*024N
3DL1*043

3DL1*025
3DL1*044

3DL1*026
3DL1*057

3DL2
3DL2*00101
3DL2*00902
47%
3DL2
3DL2*00101
3DL2*010
45%
61%

3DL2*002
3DL2*013

3DL2*002
3DL2*011

3DL2*00301
3DL2*014

3DL2*00301
3DL2*012

3DL2*004
3DL2*016

3DL2*0004
3DL2*013

3DL2*005
3DL2*0017

3DL2*005
3DL2*015

3DL2*0070101
3DL2*018

3DL2*006
3DL2*016

3DL2*0070102
3DL2*019

3DL2*0070101
3DL2*020

3DL2*008
3DL2*020

3DL2*0070102
3DL2*021

3DL2*00901
3DL2*021

3DL2*008

3DL3
3DL3*00101
3DL3*01102
75%
3DL3
3DL3*00101
3DL3*01303
100%
100%

3DL3*00102
3DL3*012

3DL3*00102
3DL3*01304

3DL3*00103
3DL3*01301

3DL3*00103
3DL3*01305

3DL3*00201
3DL3*01303

3DL3*00201
3DL3*01306

3DL3*00203
3DL3*01304

3DL3*00202
3DL3*01307

3DL3*00204
3DL3*01401

3DL3*00203
3DL3*01401

3DL3*00205
3DL3*01403

3DL3*00204
3DL3*01402

3DL3*00207
3DL3*01405

3DL3*00205
3DL3*01403

3DL3*0030101
3DL3*015

3DL3*00206
3DL3*01404

3DL3*0030102
3DL3*016

3DL3*00207
3DL3*01405

3DL3*00401
3DL3*017

3DL3*0030101
3DL3*015

3DL3*00402
3DL3*018

3DL3*0030102
3DL3*016

3DL3*005
3DL3*020

3DL3*00401
3DL3*017

3DL3*00601
3DL3*021

3DL3*00402
3DL3*018

3DL3*00602
3DL3*022

3DL3*005
3DL3*019

3DL3*00801
3DL3*023

3DL3*00601
3DL3*020

3DL3*00802
3DL3*024

3DL3*00602
3DL3*021

3DL3*00901
3DL3*025

3DL3*007
3DL3*022

3DL3*00902
3DL3*026

3DL3*00801
3DL3*023

3DL3*010
3DL3*028

3DL3*00802
3DL3*024

3DL3*01101

3DL3*00901
3DL3*0025

3DL3*00902
3DL3*026

3DL3*010
3DL3*027

3DL3*01101
3DL3*028

3DL3*01102
3DL3*029

3DL3*012
3DL3*030

3DL3*01301
3DL3*031

3DL3*01302

3DS1
3DS1*010
3DS1*046
71%
3DS1
3DS1*010
3DS1*045
71%
86%

3DS1*01301
3DS1*047

3DS1*011
3DS1*046

3DS1*01302
3DS1*048

3DS1*012
3DS1*047

3DS1*014
3DS1*049N

3DS1*01301
3DS1*048

3DS1*045
3DS1*055

3DS1*01302
3DS1*049N

2DP1
2DP1*00101
2DP1*0020102
100%
2DP1
2DP1*00101
2DP1*0020102
100%
100%

2DP1*00102
*2DP1*003

2DP1*00102
2DP1*003

2DP1*0020101

2DP1*0020101

3DP1
3DP1*001
3DP1*004
100%
3DP1
No match found on the KIR

100%

3DP1*002
3DP1*005

database.

3DP1*00301
3DP1*006

Detects deletion of exon 2.

3DP1*00302
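The coverage percentages in Table 10 follow from dividing the number of alleles a probe detects by the total number of reported alleles. A minimal Python sketch of that calculation is given below, using the 2DL2 alleles as listed in Table 10. It assumes, for illustration only, that the five alleles listed for probe set 1 are the complete set of reported 2DL2 alleles; the function name is hypothetical.

```python
def coverage_percent(detected, reported):
    """Percentage of the reported alleles that are detected, rounded to a whole number."""
    reported = set(reported)
    if not reported:
        raise ValueError("no alleles reported for this gene")
    return round(100 * len(set(detected) & reported) / len(reported))


# Assumption: these five alleles are all reported 2DL2 alleles (see lead-in).
reported_2dl2 = {"2DL2*001", "2DL2*002", "2DL2*003", "2DL2*004", "2DL2*005"}
probe_set_1 = set(reported_2dl2)               # all five alleles listed
probe_set_2 = reported_2dl2 - {"2DL2*004"}     # 2DL2*004 is not listed for set 2

print(coverage_percent(probe_set_1, reported_2dl2))                # 100
print(coverage_percent(probe_set_2, reported_2dl2))                # 80
print(coverage_percent(probe_set_1 | probe_set_2, reported_2dl2))  # 100
```

The combined coverage is computed on the union of the alleles detected by either probe set, which is why it can exceed the coverage of each set on its own.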

TABLE 11

Verification of KIR MLPA probe set 1 on 11 cell lines KIR-genotyped by the 10th IHW.

Cell lines KIR-genotyped by the 10th IHW; results of probe set 1.

CODE | NAME | 2DL1 | 2DL2 | 2DL3 | 2DL4 | 2DL5 | 2DL5A | 2DS1 | 2DS2 | 2DS3 | 2DS4 | 2DS5 | 3DL1 | 3DL2 | 3DL3 | 3DS1 | 2DP1 | 3DP1
231 | JVM | 1 | 1 | 1 | 1 | 0 | 4 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 4 | 0 | 4 | 4
240 | T7507 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 4 | 1 | 4 | 4
343 | OLGA | 1 | 0 | 1 | 1 | 1 | 4 | 1 | 2 | 0 | 1 | 1 | 1 | 1 | 4 | 1 | 4 | 4
423 | SAVC | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 2 | 0 | 1 | 0 | 1 | 1 | 4 | 0 | 4 | 4
712 | JBUSH | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 2 | 0 | 1 | 0 | 1 | 1 | 4 | 0 | 4 | 4
723 | BM16 | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 2 | 0 | 1 | 0 | 1 | 1 | 4 | 0 | 4 | 4
773 | LBUF | [embedded image] | 1 | 2 | 1 | 3 | 4 | 3 | 1 | 3 | 1 | 3 | 1 | 1 | 4 | 3 | 4 | 4
931 | AMALA | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 4 | 1 | 4 | 4
1042 | BM90 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 4 | 4
1102 | TAB089 | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 2 | 0 | 1 | 0 | 1 | 1 | 4 | 0 | 4 | 4
122 | KAS116 | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 2 | 0 | 1 | 0 | 1 | 1 | 4 | 0 | 4 | 4

0 = negative by MLPA and 10th IHW

1 = positive by MLPA and 10th IHW

2 = positive by MLPA and negative by 10th IHW

3 = negative by MLPA and positive by 10th IHW

4 = not typed by 10th IHW but positive by MLPA

embedded image

TABLE 12

Verification of KIR MLPA probe set 1 on 5 PCR-SSP KIR typed samples.

PCR-SSP KIR typed DNA; results of probe set 1.

Sample | 2DL1 | 2DL2 | 2DL3 | 2DL4 | 2DL5 | 2DL5A | 2DS1 | 2DS2 | 2DS3 | 2DS4 | 2DS5 | 3DL1 | 3DL2 | 3DL3 | 3DS1 | 2DP1 | 3DP1
33_7536 | 1 | 0 | 1 | 1 | 1 | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
33_8025 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1
33_8037 | 1 | 0 | 1 | 1 | 1 | 3 | 1 | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
33_8588 | 0 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1
33_9097 | 1 | 1 | 0 | 1 | 1 | 3 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1

0 = negative by MLPA and SSP

1 = positive by MLPA and SSP

2 = positive by MLPA and negative by SSP

3 = positive by MLPA, not typed by SSP

TABLE 13

Verification of KIR MLPA probe set 2 on 11 cell lines KIR-genotyped by the 10th IHW.

Cell lines KIR-genotyped by the 10th IHW; results of probe set 2.

CODE | NAME | 2DL1 | 2DL2 | 2DL3 | 2DL4 | 2DL5 | 2DL5A | 2DS1 | 2DS2 | 2DS3 | 2DS4 | 2DS5 | 3DL1 | 3DL2 | 3DL3 | 3DS1 | 2DP1 | 3DP1
231 | JVM | 1 | 1 | 1 | 1 | 0 | 4 | 0 | 1 | 0 | 1 | 2 | 1 | 1 | 4 | 2 | 4 | 4
240 | T7507 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 4 | 1 | 4 | 4
343 | OLGA | 1 | 0 | 1 | 1 | 1 | 4 | 1 | 0 | 0 | 1 | 2 | 1 | 1 | 4 | 1 | 4 | 4
423 | SAVC | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 0 | 0 | 1 | 2 | 1 | 1 | 4 | 2 | 4 | 4
712 | JBUSH | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 0 | 0 | 1 | 2 | 1 | 1 | 4 | 2 | 4 | 4
723 | BM16 | 1 | 0 | 1 | 1 | 0 | 4 | 2 | 0 | 0 | 1 | 2 | 1 | 1 | 4 | 2 | 4 | 4
773 | LBUF | [embedded image] | 1 | 2 | 1 | 3 | 4 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 4 | 1 | 4 | 4
931 | AMALA | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 4 | 1 | 4 | 4
1042 | BM90 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 4 | 4
1102 | TAB089 | 1 | 0 | 1 | 1 | 0 | 4 | 2 | 0 | 0 | 1 | 2 | 1 | 1 | 4 | 2 | 4 | 4
122 | KAS116 | 1 | 0 | 1 | 1 | 0 | 4 | 2 | 0 | 0 | 1 | 2 | 1 | 1 | 4 | 2 | 4 | 4

0 = negative by MLPA and 10th IHW

1 = positive by MLPA and 10th IHW

2 = positive by MLPA and negative by 10th IHW

3 = negative by MLPA and positive by 10th IHW

4 = not typed by 10th IHW but positive by MLPA

embedded image

TABLE 14

Verification of KIR MLPA probe set 2 on 5 PCR-SSP KIR typed samples.

PCR-SSP KIR typed patients; results of probe set 2.

Sample | 2DL1 | 2DL2 | 2DL3 | 2DL4 | 2DL5 | 2DL5A | 2DS1 | 2DS2 | 2DS3 | 2DS4 | 2DS5 | 3DL1 | 3DL2 | 3DL3 | 3DS1 | 2DP1 | 3DP1
33_7536 | 1 | 0 | 1 | 1 | 1 | 4 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
33_8025 | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 0 | 0 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 1
33_8037 | 1 | 0 | 1 | 1 | 1 | 4 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
33_8588 | 2 | 1 | 0 | 1 | 0 | 4 | 0 | 1 | 0 | 1 | 2 | 1 | 1 | 1 | 2 | 0 | 3
33_9097 | 1 | 1 | 0 | 1 | 1 | 4 | 1 | 1 | 1 | 0 | 1 | 2 | 1 | 1 | 1 | 1 | 1

0 = negative by MLPA and SSP

1 = positive by MLPA and SSP

2 = positive by MLPA and negative by SSP

3 = negative by MLPA and positive by SSP

4 = positive by MLPA, not typed by SSP
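The five-code scheme used in the legends of Tables 11, 13 and 14 can be written down as a small Python sketch. The function names are hypothetical. The legends do not define a code for a gene that is negative by MLPA and not typed by the reference method, so the sketch returns None in that case as an explicit assumption. Table 12 uses a shorter legend in which 3 has a different meaning, and is not covered here.

```python
def concordance_code(mlpa_positive, reference_positive):
    """Map one MLPA result and one reference typing result to the legend code.

    reference_positive is True, False, or None when the gene was not typed
    by the reference method (10th IHW or PCR-SSP).
    """
    if reference_positive is None:
        # Assumption: "negative by MLPA, not typed" has no code in the legend.
        return 4 if mlpa_positive else None
    if mlpa_positive:
        return 1 if reference_positive else 2
    return 3 if reference_positive else 0


def concordance_rate(codes):
    """Fraction of typed genes (codes 0-3) on which both methods agree (codes 0 and 1)."""
    typed = [c for c in codes if c in (0, 1, 2, 3)]
    if not typed:
        raise ValueError("no gene was typed by both methods")
    return sum(c in (0, 1) for c in typed) / len(typed)


# Row for cell line JVM in Table 11 (probe set 1).
jvm_probe_set_1 = [1, 1, 1, 1, 0, 4, 0, 1, 0, 1, 0, 1, 1, 4, 0, 4, 4]
print(concordance_rate(jvm_probe_set_1))  # 1.0: all 13 typed genes agree
```

Genes with code 4 are left out of the rate because no reference result exists to compare against.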

REFERENCES

Brown M A. Genetics and the pathogenesis of ankylosing spondylitis. Curr Opin Rheumatol. 2009; 21:318-23.

Carrington M, Norman P. The KIR gene cluster. 2003.

Chan A T, Kollnberger S D, Wedderburn L R, Bowness P. Expansion and enhanced survival of natural killer cells expressing the killer immunoglobulin-like receptor KIR3DL2 in spondylarthritis. Arthritis Rheum. 2005;52:3586-95.

Cook M A, Norman P J, Curran M D, Maxwell L D, Briggs D C, Middleton D, Vaughan R W. A multi-laboratory characterization of the KIR genotypes of the 10th International Histocompatibility Workshop cell lines. Human Immunology 2003: 64, 567-571.

Crum K A, Logue S E, Curran M D, Middleton D. Development of a PCR-SSOP approach capable of defining the natural killer cell inhibitory receptor (KIR) gene sequence repertoire. Tissue Antigens 2000: 56: 313-326.

Du Z, Gjertson D W, Reed E F, Rajalingam R. Receptor-ligand analyses define minimal killer cell Ig-like receptor (KIR) in humans. Immunogenetics 2007:59:1-15

Gomez-Lozano N, Gardiner C M, Parham P, Vilches C. Some human KIR haplotypes contain two KIR2DL5 genes: KIR2DL5A and KIR2DL5B. Immunogenetics 2002: 54 (5): 314-9

Gómez-Lozano N, Estefanía E, Williams F, Halfpenny I, Middleton D, Solís R, Vilches C. The silent KIR3DP1 gene (CD158c) is transcribed and might encode a secreted receptor in a minority of humans, in whom the KIR3DP1, KIR2DL4 and KIR3DL1/KIR3DS1 genes are duplicated. European Journal Immunology 2005: 35(1):16-24

Hollenbach J A, Ladner M B, Saeteurn K, Taylor K D, Mei L, Haritunians T, McGovern D P B, Erlich H A, Rotter J I, Trachtenberg E A. Susceptibility to Crohn's disease is mediated by KIR2DL2/KIR2DL3 heterozygosity and the HLA-C ligand. Immunogenetics 2009: 61(10): 663-671

Hsu K C, Liu X R, Selvakumar A, Mickelson E, O'Reilly R J, Dupont B. Killer Ig-like receptor haplotype analysis by gene content: evidence for genomic diversity with a minimum of six basic framework haplotypes, each with multiple subsets. Journal of Immunology 2002: 1; 169(9):5118-29

Hsu K C, Chida S, Geraghty D E, Dupont B. The killer cell immunoglobulin-like receptor (KIR) genomic region: gene-order, haplotypes and allelic polymorphism. Immunol Rev. 2002 December; 190: 40-52.

Khakoo S I, Thio C L, Martin M P, Brooks C R, Gao X, Astemborski J, et al. HLA and NK cell inhibitory receptor genes in resolving hepatitis C virus infection. Science 2004; 305: 872-4.

Li H, Pascal V, Martin M P, Carrington M, Anderson S K. Genetic control of variegated KIR gene expression: polymorphisms of the bi-directional KIR3DL1 promoter are associated with distinct frequencies of gene expression. PLoS Genet. 2008 November; 4(11):e1000254.

Majorczyk E, Pawlik A, Łuszczek W, Nowak I, Wiśniewski A, Jasek M, Kuśnierczyk P. Associations of killer cell immunoglobulin-like receptor genes with complications of rheumatoid arthritis. Genes Immun. 2007; 8:678-83.

Marsh S, Parham P, Dupont B, Geraghty D, Trowsdale J, Middleton D, Vilches C, Carrington M, Witt C, Guethlein L, Shilling H, Garcia C, Hsu K, Wain H. Killer-cell Immunoglobulin-like Receptor (KIR) Nomenclature Report. Human Immunology 2002: 64, 648-654.

Martin M P, Qi Y, Gao X, Yamada E, Martin J N, Pereyra F, et al. Innate partnership of HLA-B and KIR3DL1 subtypes against HIV-1. Nat Genet 2007; 39:733-40.

Middleton D, Williams F, Halfpenny I A. KIR genes. Transplant Immunology 2005: 14(3-4):135-42

Parham P, McQueen K L. Alloreactive killer cells: hindrance and help for haematopoietic transplants. Nature reviews Immunology 3 2003: doi: 10.1038/nri999

Shilling H G, Guethlein L A, Cheng N W, Gardiner C M, Rodriguez R, Tyan D, Parham P. Allelic polymorphism synergizes with variable gene content to individualize human KIR genotype. Journal of Immunology 2002: 1:168(5):2307-15

Schouten J P, McElgunn C J, Waaijer R, Zwijnenburg D, Diepvens F, Pals G. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Research. 2002: 15:30(12):e57

Stemmer W P, Crameri A, Ha K D, Brennan T M, Heyneker H L. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. 1995 Gene, 164: 49-53.

Sun J Y, Gaidulis L, Miller M M, Goto R M, Rodriguez R, Forman S J, Senitzer D. Development of a multiplex PCR-SSP method for Killer-cell immunoglobulin-like receptor genotyping. Tissue Antigens 2004: 64: 462-468.

Trowsdale J, Barten R, Haude A, Stewart C A, Beck S, Wilson M J. The genomic context of natural killer receptor extended gene families. 2001. Immunological Reviews volume 181: 20-38

Uhrberg M, Valiante N M, Shum B P, Shilling H G, Lienert-Weidenbach K, Corliss B, Tyan D, Lanier L L, Parham P. Immunity 1997: 7, 753-763

Uhrberg M. The KIR gene family: life in the fast lane of evolution. European Journal of Immunology 2005: 35:10-15

Vilches C, Parham P. KIR: diverse, rapidly evolving receptors of innate and adaptive immunity. Annual Reviews Immunology 2002: 20:217-51

Vilches C, Castano J, Gomez-Lozano N, Estefania E. Facilitation of KIR genotyping by a PCR-SSP method that amplifies short DNA fragments. 2007. Tissue Antigens 70, 415-422.

Williams F, Maxwell L D, Halfpenny I A, Meenagh A, Sleator C, Curran M D, Middleton D. Multiple copies of KIR 3DL/S1 and KIR 2DL4 genes identified in a number of individuals. Human Immunology 2003: 64, 729-732.

Yen J H, Moore B E, Nakajima T, Scholl D, Schaid D J, Weyand C M, Goronzy J J. Major histocompatibility complex class I-recognizing receptors are disease risk genes in rheumatoid arthritis. J Exp Med. 2001; 193:1159-67.

Zhang Y, Wang B, Shihui Y, Liu S, Liu M, Shen C, Teng Y, Qi J. Killer cell immunoglobulin-like receptor gene polymorphisms in patients with leukemia: Possible association with susceptibility to the disease. Leuk Res 2009, doi: 10.1016/j.leukres.2009.04.022.

	Number	Date	Country
Parent	12998595	Jul 2011	US
Child	13200991		US
