The invention relates to the fields of biology, molecular biology, biotechnology and medicine.
Nucleic acid sequences are investigated in a wide variety of applications. For instance, for diagnosis of infection with a pathogen, a sample of an individual is often screened for the presence of pathogen nucleic acid. Furthermore, nucleic acid sequence investigation is often performed for the diagnosis of genetic disorders, such as for instance Prader-Willi syndrome, Angelman syndrome and Duchenne muscular dystrophy. Widely used methods for detection of deletions or duplications of chromosomal sequences are quantitative multiplex PCR and quantitative Southern blotting. Drawbacks of these methods are that they are time-consuming and that results are difficult to interpret.
One particularly suitable technique for investigation of nucleic acid sequences is multiplex ligation-dependent probe amplification (MLPA). This technique is based on hybridisation of probes to target nucleic acids, after which the probes are ligated and amplified. In currently used MLPA assays, each MLPA probe set consists of two half probes. These two half probes contain a target-specific sequence and a primer binding site sequence to which a nucleic acid amplification primer (preferably a PCR primer) can bind. One half probe is typically shorter than the other. The other half probe is longer due to a non-hybridizing stuffer sequence. The stuffer sequence of each probe set is unique in length, resulting in different lengths of amplification products (typically between 130 and 480 base pairs) that can be separated by electrophoresis. In an MLPA assay, typically a plurality of probe sets is used. The two half probes of each probe set are typically added to denatured sample nucleic acid and hybridized immediately adjacent to each other on their target sequence. Subsequently, the resulting nucleic acid is subjected to a ligation reaction. Usually a ligase is used which ligates only half probes that are perfectly matched with their target sequence (such as for instance the thermostable Ligase-65). A mismatch of a half probe at the ligation site prevents ligation and amplification, so that no amplification products of the probe will be detected. This allows MLPA to discriminate sequences that differ in only a single nucleotide. Sequences from pseudogenes or related genes can therefore be distinguished. Ligated half probes (which are also referred to as “ligated probes”) are amplified, preferably by PCR, using primers capable of specifically binding the primer binding site sequences of the probes. The amplification products of each ligated probe are separated and analyzed, for instance by electrophoresis.
Preferably, amplification products are represented graphically by separate peaks. Each peak is the product of an amplified MLPA ligated probe and a relative difference in peak intensity (height or surface) between a control sample and a sample of interest indicates copy number variation.
MLPA is particularly suitable for detecting nucleic acid (pseudo)gene variants, (pseudo)gene-specific nucleotides and/or copy number variation. MLPA has been employed in several studies, e.g. for the diagnosis of Prader-Willi or Angelman syndromes, for prenatal diagnosis of chromosomal aberrations in fetuses, and for the detection of exon deletions and/or duplications in the Duchenne muscular dystrophy gene. Overall, the conclusion was that MLPA could replace the existing methods used for screening of chromosomal abnormalities due to its relative simplicity, reproducibility and speed.
In an MLPA assay, targeted nucleic acid which is gene-specific or pseudogene-specific is preferably present at the ligation site of the half probes. When a gene-specific or pseudogene-specific nucleotide is present at (or within three nucleotides from) a ligation site, this will ensure that only perfectly matched half probes are ligated to each other. A mismatch of a half probe at the ligation site prevents ligation and amplification, whereas a perfect match of the half probe at the ligation site allows ligation and amplification. As said before, this allows MLPA to discriminate between sequences that only differ in a single nucleotide. Mismatches at four to six nucleotides away from the ligation site have been reported to have little effect on the ligation step.
Hence, the half probes are preferably designed such that the half probe whose 3′ end hybridizes at a target sequence (called herein a “left probe” or a “left half probe”) is complementary to a gene-specific sequence or pseudogene-specific sequence of the target sequence. This gene-specific or pseudogene-specific sequence of the target sequence comprises at least one but preferably more nucleotides that make the probe specific for a given gene or pseudogene. Preferably, at least one of the 3′ end nucleotides of said left half probe is complementary to at least one gene-specific nucleotide and/or at least one pseudogene-specific nucleotide of the target sequence, so that the (pseudo)gene-specific nucleotide(s) or a single nucleotide polymorphism within a given (pseudo)gene is present at (or within three nucleotides from) the ligation site of said left half probe. In this case, said left half probe and the probe whose 5′ end hybridizes at a target sequence (called herein a “right probe” or a “right half probe”) are ligated to each other only when the sequence of the left half probe perfectly matches its target sequence.
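The ligation-site rule described above can be illustrated with a minimal sketch. It is a simplification, not a model of ligase behaviour: the function names, the convention that the 3′ terminal nucleotide of the left half probe lies at distance 0 from the ligation site, and the hard cut-off of three nucleotides are assumptions derived from the wording of the text.

```python
def mismatch_distances_from_3prime(probe: str, perfect_match: str) -> list[int]:
    """Return, for each mismatch, its distance from the 3' terminal nucleotide.

    Both sequences are written 5'->3' and must have the same length.
    Distance 0 is the nucleotide at the ligation site (an assumed convention).
    """
    if len(probe) != len(perfect_match):
        raise ValueError("sequences must have the same length")
    last = len(probe) - 1
    return [last - i for i, (p, q) in enumerate(zip(probe, perfect_match)) if p != q]


def ligation_expected(probe: str, perfect_match: str, blocking_window: int = 3) -> bool:
    """Assume ligation is blocked by any mismatch at, or within
    `blocking_window` nucleotides from, the ligation site."""
    distances = mismatch_distances_from_3prime(probe, perfect_match)
    return all(d > blocking_window for d in distances)


# Hypothetical 10-nucleotide hybridizing sequence of a left half probe.
perfect = "ACGTACGTAC"
print(ligation_expected("ACGTACGTAA", perfect))  # mismatch at the ligation site
print(ligation_expected("ACGTAGGTAC", perfect))  # mismatch four nucleotides away
```

In this sketch a mismatch four or more nucleotides from the ligation site is treated as tolerated, in line with the report that such mismatches have little effect on the ligation step.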
As used herein the term “gene-specific nucleotide” or “gene-specific sequence” means a nucleotide or sequence, respectively, which is present in said gene but not present at the corresponding location in at least one other related gene or pseudogene. The term “pseudogene-specific nucleotide” or “pseudogene-specific sequence” means a nucleotide or sequence, respectively, which is present in said pseudogene but not present at the corresponding location in at least one other related gene or pseudogene. Hence, at least one other (pseudo)gene comprises another nucleotide or sequence at that location. The presence of a (pseudo)gene-specific nucleotide or (pseudo)gene-specific sequence in a (pseudo)gene thus distinguishes said (pseudo)gene from at least one other (pseudo)gene, even when the other (pseudo)gene has a high overall homology with said (pseudo)gene.
A pseudogene is defined herein as a nucleic acid sequence which does not encode a wild-type, functional protein. The term “pseudogene” encompasses nucleic acid sequences which do not encode protein at all. Additionally, the term “pseudogene” encompasses gene alleles which comprise a modification, for instance an insertion or deletion, so that they encode a protein or a part of a protein with significantly impaired, or lost, function as compared to a wild-type protein of the same kind. Such an allele for instance encodes a truncated protein as a result of a frame shift caused by an insertion and/or deletion of at least one nucleotide, or as a result of a premature stop codon.
Since ligases only ligate half probes which are adjacent to each other, half probes need to be designed which are capable of hybridizing immediately adjacent to each other on their target sequence. This is not always convenient, because the hybridization location of a left half probe on a target nucleic acid is often determined by a (pseudo)gene-specific site of the target nucleic acid (as explained above). In such case, the sequence of the corresponding right half probe is determined as well, since the right half probe should be capable of hybridizing to a region of said target nucleic acid which is immediately adjacent to said (pseudo)gene-specific nucleotide. However, such region may comprise sequences which are very commonly present in the nucleic acid sequences of a sample. As a result, a right half probe having a sequence which is complementary to such common sequence will hybridize at many different sites of the nucleic acids present in a sample. In such case, it would be more attractive to design a right half probe with a sequence which is more specific for a given site of interest of a target nucleic acid. However, if the left half probe and the right half probe do not hybridize to adjacent regions of a target nucleic acid, the commonly used ligases will not be capable of performing the ligation reaction. Patent application WO 01/61033 in the name of Schouten discloses a solution to this problem by adding a short third probe to the reaction mixture, which third probe will fill the gap between the left half probe and the right half probe. Such third probe is designed to hybridize to a region of a target nucleic acid which lies between the left and the right half probes. After hybridization of such third probe, the left half probe is connected to the right half probe via the third probe and ligation has become possible. 
The third probe does not need to be perfectly complementary to the region of the target nucleic acid which lies between the left and the right half probes, as long as the third probe connects the left half probe and the right half probe so that a ligase reaction can occur. Moreover, since the third probe is small, it will hybridize more easily to the target nucleic acid as compared to the left and right half probes. Hence, mismatches between the third probe and the target nucleic acid are allowed. This way, one and the same third probe is suitable for connecting left and right half probes of different probe sets.
Instead of using a third probe, WO 01/61033 also discloses an embodiment wherein the 3′ end of a left half probe is extended after hybridization of the half probes to the target sequence, so that the gap between the left half probe and the right half probe is filled. The resulting extended left half probe is adjacent to the right half probe and a ligase reaction has become possible.
In order to be capable of distinguishing between amplificates of different probe sets, currently used MLPA probe sets are designed such that the resulting amplificates have a different length. Differences in ligated probe length are typically realized by using a non-hybridizing stuffer sequence in one of the half probes. The stuffer sequence of the half probes of each probe set is unique in length, resulting in different lengths of amplification products that can be separated by electrophoresis. Typically, in order to be capable of discriminating between the different amplification products, the difference in length between different ligated probes is at least 5 nucleotides. Since a usual MLPA assay involves the use of many different probe sets in order to be capable of detecting a wide variety of (pseudo)gene variants, this means that long probes have to be generated. This is especially the case when complex loci carrying many (pseudo)gene-specific nucleotides are investigated for proper genotyping and/or additional single nucleotide polymorphisms are investigated for detection of subtle genetic variation within a specific genotype, as well as the presence of pseudogenes and single nucleotides in these pseudogenes. Such investigation requires the use of many different probe sets. This is inconvenient if probes are chemically synthesized, because a drawback of synthetic probes is their lower quality in comparison with cloned probes, due to contamination with incompletely synthesized probes. These incompletely synthesized probes lack or gain one nucleotide, which results in stutter peaks and split peaks. A method to remove these contaminants is to purify the synthesized probes, for instance by polyacrylamide gel electrophoresis (PAGE). When both short and long probes are chemically synthesized, the longer probes are more likely to be contaminated with incomplete oligonucleotides, which limits the practical size of synthetic probes.
The upper limit of synthetic probes is typically about 100 nucleotides.
On the other hand, the use of synthetic probes is preferred because they are easy to obtain and cost-effective whereas generating a probe by cloning in bacteriophage vectors is a time-consuming process and more expensive.
Hence, although good results have been obtained with currently used MLPA assays, it is desirable to provide alternatives and improvements, especially if complex (pseudo)gene loci are investigated which involves the use of many probe sets.
It is an object of the present invention to provide alternative and improved MLPA methods and MLPA-like methods.
Accordingly, the present invention provides MLPA assays and MLPA-like assays wherein at least one probe set is used which comprises a first nucleic acid probe (“left probe” or “left probe part”), a second nucleic acid probe (“right probe” or “right probe part”) and a third nucleic acid probe (“third probe” or “middle probe” or “middle probe part”), wherein at least one third probe is complementary to a target nucleic acid region comprising a (pseudo)gene-specific nucleotide or (pseudo)gene-specific sequence.
The present invention provides a different approach as compared to the prior art. MLPA methods and MLPA-like methods are now provided wherein at least one third probe, but preferably a plurality of third probes, is used in order to detect at least one (pseudo)gene-specific nucleotide of a target nucleic acid. Hence, an additional probe is used in at least one of the probe sets, which is specific for a (pseudo)gene-specific target nucleic acid. As used herein, an MLPA-like method is defined as a method comprising the steps of hybridisation of at least two probes to a target nucleic acid and ligation of at least two probes. Preferably, said MLPA-like method comprises amplification of ligated probes as well.
MLPA methods and MLPA-like methods according to the present invention have several advantages as compared to current methods. For instance, if the left probe and the third probe of a probe set are both complementary to target nucleic acid regions comprising (pseudo)gene-specific nucleotides and/or additional single nucleotide polymorphism(s), two different (pseudo)gene-specific target nucleotides or two SNPs or a combination of one (pseudo)gene-specific target nucleotide and one SNP are screened using one probe set. It has become possible to use one probe set in order to screen for at least two (pseudo)gene variations which are located within a region of about 150 nucleotides of a target nucleic acid. In contrast, in a currently used MLPA assay two separate probe sets are needed for screening for two variants in a target nucleic acid. This is illustrated by the following example. If a target (pseudo)gene contains a (pseudo)gene variant at location A and at location B, an individual may comprise the following alleles: a-b, a-B, A-b and A-B. In order to determine whether allele a-B is present in a sample of said individual, a currently used MLPA assay would need a probe set specific for the “a” and/or “A” (pseudo)gene variant and a probe set specific for the “B” and/or “b” (pseudo)gene variant. If both the probe set specific for “a” and the probe set specific for “B” provide a positive result, it is concluded that allele a-B is present in said individual. With an MLPA method according to the present invention, however, only one probe set is needed wherein the left probe is specific for the “a” (pseudo)gene variant and the third probe is specific for the “B” (pseudo)gene variant. If an amplification product is obtained, it is immediately concluded that allele a-B is present in said individual. If allele a-B is not present, said probe set according to the invention will not yield an amplification product.
Hence, it has become possible to more specifically screen for a given allele.
Moreover, a method of the invention provides an additional advantage when two (pseudo)gene variations are located close to each other. If the (pseudo)gene variants at location A and at location B are close to each other, the use of two different probe sets according to conventional MLPA techniques is inconvenient or even not possible at all, because the two probe sets will hinder each other in view of their close proximity. This will result in less efficient hybridization of the two probe sets, resulting in a lower signal as compared to a method according to the invention, wherein two (pseudo)gene variants can be detected using only one probe set. Hence, a method according to the invention is more sensitive when (pseudo)gene variants are located close to each other (in practice, this effect will be most pronounced when the (pseudo)gene variants are located between 20 and 100 nucleotides from each other). Having two probes to detect a variant at the same position (such as in currently used MLPA assays) will result in a change in signal intensity, depending on the presence of the (pseudo)gene variant and the binding of the probe. The use of more than two probes for one position is not advised.
As another example, in case that an individual is heterozygous for the above mentioned (pseudo)gene, the individual for instance contains alleles a-B and A-b. A conventional MLPA assay would use four probe sets (one specific for “a”, one specific for “A”, one specific for “b” and one specific for “B”). Four positive results would be obtained, because all four probe sets would hybridize and result in an amplification product. However, in such case it would still be unknown whether the individual comprises the alleles a-b and A-B, or the alleles a-B and A-b. With a method according to the present invention, however, it has become possible to directly identify the alleles of said individual. For instance, a first probe set of the invention is used comprising a left probe specific for “a” and a third probe specific for “b”, together with a second probe set of the invention comprising a left probe specific for “a” and a third probe specific for “B” and a third probe set of the invention comprising a left probe specific for “A” and a third probe specific for “b” and a fourth probe set of the invention comprising a left probe specific for “A” and a third probe specific for “B”. Two of these probe sets according to the present invention will yield an amplification product, namely the second probe set of the invention comprising a left probe specific for “a” and a third probe specific for “B” and the third probe set of the invention comprising a left probe specific for “A” and a third probe specific for “b”. The first and fourth probe sets according to the present invention will not yield (significant) amplification product. This way, it is immediately apparent which alleles are present in said individual. 
This, too, is an advantage as compared to currently used methods, especially when complex loci with many (pseudo)gene-specific nucleotides and additional single nucleotide polymorphisms within a given (pseudo)gene are investigated, because in such case many different combinations of such (pseudo)gene variants need to be screened for.
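The heterozygous example above can be sketched in a few lines. This is an idealized illustration only: it assumes that a probe set yields a product exactly when the variants it targets are present (for a three-probe set, on the same allele), and all function names are hypothetical.

```python
from itertools import product


def conventional_signals(alleles):
    """Two-probe sets: each reports only whether a variant occurs on any allele."""
    present = {variant for allele in alleles for variant in allele}
    return {variant: variant in present for variant in ("a", "A", "b", "B")}


def three_probe_signals(alleles):
    """Three-probe sets (left probe, third probe): a product is assumed only
    when one and the same allele carries both targeted variants."""
    return {
        (left, third): (left, third) in alleles
        for left, third in product("aA", "bB")
    }


trans = [("a", "B"), ("A", "b")]  # alleles a-B and A-b
cis = [("a", "b"), ("A", "B")]    # alleles a-b and A-B

# Conventional probe sets give four positive results in both cases.
print(conventional_signals(trans) == conventional_signals(cis))

# Three-probe sets distinguish the two cases.
print(three_probe_signals(trans))
print(three_probe_signals(cis))
```

For the individual carrying a-B and A-b, only the probe sets (a, B) and (A, b) are positive in this sketch, which corresponds to the second and third probe sets of the example.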
Another advantage of a method according to the present invention is the fact that more variations in length of the ligated probes are obtained. Since at least one probe set of the invention, but preferably a plurality of probe sets of the invention, comprise a third probe it has become possible to design the probe sets such that variations in length of the resulting ligated probes are obtained. This obviates the need of stuffer sequences. As a result, the individual probes of a probe set according to the invention can be kept shorter, which is particularly advantageous when chemically synthesized probes are used because chemical production of long probes is cumbersome, as explained above. Hence, a method according to the invention allows for the use of probe sets with relatively short probes, while the resulting ligated probes are long enough to allow for many size variations. Thus, the present invention allows the use of synthetic probes, which are easy to obtain and cost-effective, even when complex loci are investigated, and offers greater flexibility to adapt the assay in case of cross-reactivity or unclear results.
For instance, if 20 (pseudo)gene variants are investigated, probes with a stuffer sequence with a length varying from 4 to 100 nucleotides would need to be used in a conventional MLPA assay in order to be capable of distinguishing the resulting amplification products by size. Since the probe sequences hybridizing to a target sequence are typically about 30 nucleotides, and since the primer binding sequences of the probes are typically about 15-25 nucleotides, this would mean that probe sets with probes with a length varying from 45-125 nucleotides would need to be synthesized. When the probes are chemically synthesized, it is hardly possible to obtain reliable probe sets with these lengths. With a method according to the invention, however, differences in length between the various amplificates need not be obtained by use of stuffer sequences in the probe sets. Instead, at least one third probe is used, preferably a plurality of third probes is used. By varying combinations of three probes, optionally in combination with probe sets consisting of two probes, the overall length differences of the ligated probes vary considerably whereas probe sets can be used with chemically synthesized probes with convenient lengths. Of course, this does not mean that the use of stuffer sequences is excluded. But the skilled person no longer has to rely on these stuffer sequences alone for length variations. If stuffer sequences are used in a method according to the invention, it is preferred to keep these sequences as short as possible.
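The length argument above can be made concrete with a small sketch. All probe lengths below are hypothetical values chosen for illustration; only the minimum spacing of 5 nucleotides between ligated probes is taken from the text. The function names are likewise assumptions.

```python
MIN_GAP = 5  # minimum length difference between ligated probes, per the text


def ligated_length(probe_set):
    """Sum the lengths of the probes in a set; None marks an absent third probe."""
    return sum(length for length in probe_set if length is not None)


def resolvable(lengths, min_gap=MIN_GAP):
    """True if all ligated probes differ in length by at least `min_gap`."""
    ordered = sorted(lengths)
    return all(b - a >= min_gap for a, b in zip(ordered, ordered[1:]))


# Hypothetical probe sets as (left, third, right) lengths in nucleotides.
probe_sets = [
    (45, None, 45),  # conventional two-probe set
    (45, 20, 45),
    (45, 25, 50),
    (50, 30, 50),
]

lengths = [ligated_length(ps) for ps in probe_sets]
longest_probe = max(length for ps in probe_sets for length in ps if length is not None)

print(lengths)              # ligated probe lengths
print(resolvable(lengths))  # spacing check
print(longest_probe)        # longest individual probe to synthesize
```

With these assumed values the ligated probes span 90 to 130 nucleotides while no individual probe exceeds 50 nucleotides, so the size variation comes from the third probes rather than from long stuffer sequences.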
Accordingly, the present invention provides a method for screening for the presence of at least one target nucleic acid sequence in a sample, comprising the steps of:
a) adding to said sample at least two different probe sets, each probe set comprising:
b) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said sample,
c) subjecting nucleic acid of said sample to a ligation reaction, and
d) determining whether said at least one target nucleic acid sequence is present in said sample,
wherein at least one third nucleic acid probe is complementary to a target nucleic acid region comprising a (pseudo)gene variation.
The advantage of probe sets comprising at least three probes according to the present invention is that at least two different SNPs can be detected with one probe set. For instance, in a probe set comprising three probes two sites for ligation are present. A left probe and middle probe are ligated, and a middle probe and right probe are ligated. At each ligation site a SNP can be detected. Thus it is possible to design two probes of the same probe set in such a way that they are used to detect two SNPs. In that case, using MLPA and a probe set comprising three probes according to the invention, a product will only be obtained when both SNPs are present in a sample, because only then ligation can occur at both ligation sites.
With conventional MLPA probesets consisting of two probes only one SNP can be detected, because only one site for ligation is present. Additional third probe parts in conventional MLPA, as described in WO 01/61033, are occasionally used to bridge the two half probes. Such an additional third probe part is not SNP-specific. Therefore, the advantages of probe sets comprising at least three probes according to the present invention are not obtained when using such additional third probe part for bridging purposes in conventional MLPA.
Therefore, in a preferred embodiment of the invention a probe set comprises three nucleic acid probes wherein at least two of the nucleic acid probes are each specific for a different (pseudo)gene variation. Preferably, a first (or a second) nucleic acid probe of a probe set according to the invention is complementary to a target nucleic acid region comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or a polymorphism within a given gene or pseudogene, and a third nucleic acid probe of the same probe set is complementary to another target nucleic acid region comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or a polymorphism within a given gene or pseudogene. Said polymorphism preferably comprises an SNP.
Preferably, ligated probes are amplified. Accordingly, the present invention provides a method for screening for the presence of at least one target nucleic acid sequence in a sample, comprising the steps of:
a) adding to said sample at least two different probe sets, each probe set comprising:
wherein at least one of said probe sets comprises a third nucleic acid probe, said third probe comprising a third nucleic acid sequence complementary to a third region of said target nucleic acid sequence, and
wherein, if said third probe is present in said probe set, said first and said third region of said target nucleic acid are located essentially adjacent to each other and said third and said second region of said target nucleic acid are located essentially adjacent to each other, and
wherein, if said third probe is not present in said probe set, said first and said second region of said target nucleic acid are located essentially adjacent to each other,
b) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said sample,
c) subjecting nucleic acid of said sample to a ligation reaction,
d) subjecting nucleic acid of said sample to a nucleic acid amplification reaction, using at least one primer capable of specifically binding said first primer binding site and at least one primer capable of specifically binding said second primer binding site, and
e) determining whether amplified nucleic acid is present, thereby determining whether said at least one target nucleic acid sequence is present in said sample,
wherein at least one third nucleic acid probe is complementary to a target nucleic acid region comprising a (pseudo)gene variation.
As used herein, the term “(pseudo)gene variation” encompasses a (pseudo)gene-specific nucleotide and/or a (pseudo)gene-specific sequence. In one embodiment, said (pseudo)gene variation comprises an additional polymorphism within a given (pseudo)gene. Said additional polymorphism preferably comprises an SNP.
Hence, the present invention uses probe sets, wherein at least one probe set, but preferably a plurality of probe sets, comprises three probes. The probes comprise sequences which are complementary to a region of a target nucleic acid of interest. As used herein, the term “complementary” means that said probe sequence comprises at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to said region or to the complement of said region. The term “% sequence identity” is defined herein as the percentage of residues in a nucleotide sequence that is identical with the residues in a reference sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. One computer program which may be used or adapted for purposes of determining whether a candidate sequence falls within this definition is Autoassembler 2.0 (ABI Prism, Perkin Elmer).
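The definition of “% sequence identity” given above can be sketched as follows. This is a minimal illustration which assumes that the two sequences have already been aligned by an external program and that gaps are written as “-”; using the number of residues of the candidate sequence as the denominator is one reading of the definition and is an assumption, as are the function names.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str) -> float:
    """Percentage of candidate residues identical to the aligned reference residue.

    Both inputs are pre-aligned, equal-length strings with '-' marking gaps.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    residues = [(c, r) for c, r in zip(aligned_candidate, aligned_reference) if c != "-"]
    if not residues:
        raise ValueError("candidate sequence contains no residues")
    matches = sum(1 for c, r in residues if c == r)
    return 100.0 * matches / len(residues)


def meets_identity_threshold(identity: float, threshold: float = 70.0) -> bool:
    """Check an identity value against a threshold (70% is the lowest tier in the text)."""
    return identity >= threshold


identity = percent_identity("ACGTACGTAA", "ACGTACGTAC")
print(identity)
print(meets_identity_threshold(identity))
print(meets_identity_threshold(identity, threshold=95.0))
```

The stricter tiers named in the text (80%, 85%, 90%, 95%) can be checked by passing the corresponding value as `threshold`.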
The first and second probes of each probe set also comprise a primer binding site, so that the resulting ligated probes can be amplified. Preferably, the primer binding sites of the first nucleic acid probes of each probe set are designed such that the same primer can bind. This allows the use of the same primer for binding the primer binding sites of the first probes in step d). Likewise, it is preferred that the primer binding sites of the second nucleic acid probes of each probe set are designed such that the same primer can bind. Most preferably, the probe sets are designed such that a first primer is capable of specifically binding the primer binding sites of the first nucleic acid probes of each probe set and a second primer is capable of specifically binding the primer binding sites of the second nucleic acid probes of each probe set. This embodiment allows the use of only one primer pair in step d). This is, however, not necessary: it is also possible to use different primers for different probe sets. The number of different primers is, however, preferably kept as low as possible.
One preferred embodiment therefore provides a method according to the invention, wherein the first primer binding sites of the first nucleic acid probes of each probe set are capable of specifically binding the same primer and/or wherein the second primer binding sites of the second nucleic acid probes of each probe set are capable of specifically binding the same primer. Preferably, the first nucleic acid probes and/or the second nucleic acid probes of each probe set comprise essentially identical primer binding sequences. Further provided is therefore a method according to the invention, wherein the non-complementary nucleic acid sequences of said first nucleic acid probes comprise essentially identical first primer binding sites and/or wherein the non-complementary nucleic acid sequences of said second nucleic acid probes comprise essentially identical second primer binding sites. Using essentially identical primer binding sequences ensures that the same primer can bind different probes. The term “essentially identical primer binding sequences” is defined herein as primer binding sequences which comprise at least 80%, preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to each other.
As already described, a method according to the invention is particularly suitable for investigating a nucleic acid sequence having various (pseudo)gene-specific nucleotides and/or (pseudo)gene variants, such as complex loci. It is therefore preferred to use a plurality of third probes, so that many (pseudo)gene variant combinations are investigated. A method according to the invention is therefore preferably provided wherein at least two, preferably at least five, more preferably at least ten different third nucleic acid probes are used. As illustrated in the Examples, a plurality of probe sets comprising different third probes according to the invention allows for screening of complex gene loci such as the KIR locus. Not all third probes need to be specific for a genetic variation of a target nucleic acid. It is also possible to use a combination of variant-specific third probes and third probes which are not specific for a (pseudo)gene variation. Likewise, not all first probes need to be specific for a variant of a target nucleic acid. It is also possible to use a combination of variant-specific first probes and first probes which are not specific for a (pseudo)gene variation. Any of these combinations is for instance used to vary the length of the resulting ligated probes to a larger extent. In one preferred embodiment of the invention, therefore, at least 50%, preferably at least 70%, more preferably at least 80%, most preferably at least 90% of the third nucleic acid probes are complementary to a target nucleic acid region comprising a (pseudo)gene variation. In one embodiment, all third probes are complementary to a target nucleic acid region comprising a (pseudo)gene variant. Preferably, the second probes (“right probes”) are not designed to contain (pseudo)gene variant-specific sequences, although the use of variant-specific right probes in a method according to the invention is not excluded.
Preferably, at least 50%, preferably at least 70%, more preferably at least 80%, most preferably at least 90% of the third nucleic acid probes that are complementary to a target nucleic acid region comprising a (pseudo)gene variation are combined with a first nucleic acid probe or a second nucleic acid probe that is complementary to another target nucleic acid region comprising a (pseudo)gene variation in order to be capable of screening for many variants with one MLPA assay or MLPA-like assay. In one embodiment, all third probes that are combined with a first nucleic acid probe or a second nucleic acid probe that is complementary to a target nucleic acid region comprising a (pseudo)gene variation are complementary to a target nucleic acid region comprising a (pseudo)gene variant. Of course, these probes are preferably specific for different variants.
In one preferred embodiment, a (pseudo)gene variant-specific sequence of a third probe is at least located within the last three nucleotides or the first three nucleotides of the third probe. This means that the last three nucleotides and/or the first three nucleotides comprise at least one nucleotide which is specific for a (pseudo)gene variation of a target nucleic acid. In this embodiment, said (pseudo)gene variation is present at a ligation site of the third probe, so that ligation is only possible when the sequence of the third probe is exactly complementary to said (pseudo)gene variation. This enhances the specificity of the MLPA method, as explained before. Preferably, the last three nucleotides and/or the first three nucleotides of said third probe comprise one nucleotide which is specific for a (pseudo)gene variant of a target nucleotide.
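The positional criterion described above can be checked mechanically during probe design. The following is a minimal sketch only, assuming 0-based positions of the variant-specific nucleotides within the third probe; the function name and its arguments are hypothetical and are not part of the described method.

```python
def variant_at_probe_end(probe_length, variant_positions, window=3):
    """Return True if at least one variant-specific nucleotide lies within
    the first or last `window` nucleotides of the probe.

    Positions are 0-based indices into the probe (an assumption of this sketch).
    """
    if probe_length < 1:
        raise ValueError("probe_length must be positive")
    for pos in variant_positions:
        if not 0 <= pos < probe_length:
            raise ValueError(f"position {pos} is outside the probe")
    # A position counts if it falls in the first or the last `window` nucleotides.
    return any(
        pos < window or pos >= probe_length - window
        for pos in variant_positions
    )
```

For a hypothetical third probe of 30 nucleotides, a variant-specific nucleotide at index 29 satisfies the criterion, whereas one at index 10 does not.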
The probe sets according to the present invention preferably have a length between 90 and 300 nucleotides. Cloned probes can be as long as 500 nucleotides. Preferably, however, chemically synthesized probes are used because they are rapidly synthesized, easy to obtain and cost-effective. In order to be capable of synthetically producing the probes according to the present invention, a method according to the invention is preferably provided wherein third nucleic acid probes with a length of between 20 and 100 nucleotides are used. Most preferably, third nucleic acid probes with a length of between 19 and 110 nucleotides are used. Since at least one probe set of the invention, but preferably a plurality of probe sets according to the invention, is used which comprises three nucleic acid probes, sufficient variation in length and specificity of the resulting ligated probes is ensured so that many (pseudo)gene variations can be investigated simultaneously.
These length variations of the resulting ligated probes obviate the need for stuffer sequences, as explained before. It is therefore possible to design the probe sets such that the parts of the first and/or second probe which are not complementary to a target nucleic acid have about the same length. According to this embodiment, the length of the non-complementary sequences of all first probes is about the same in each probe set, and/or the length of the non-complementary sequences of all second probes is about the same in each probe set. These lengths are about the same when they do not differ from each other by more than 10 nucleotides. Preferably, they do not differ from each other by more than 6 nucleotides, most preferably they do not differ from each other by more than 4 nucleotides. This, too, facilitates synthetic production of the probes. Further provided is therefore a method according to the invention, wherein the difference in length of said non-complementary nucleic acid sequences of said first nucleic acid probes of said at least two different probe sets and/or the difference in length of said non-complementary nucleic acid sequences of said second nucleic acid probes of said at least two different probe sets is less than 6, preferably less than 4 nucleotides.
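The "about the same length" criterion can be expressed as a simple check on the spread of the non-complementary lengths. The following is a minimal sketch, assuming the lengths of the non-complementary parts are already known as integers; the function name and the example lengths are hypothetical.

```python
def lengths_about_same(lengths, max_difference=10):
    """Return True if no two lengths differ by more than `max_difference`
    nucleotides (10 by default; 6 or 4 for the preferred embodiments)."""
    if not lengths:
        raise ValueError("at least one length is required")
    # The largest pairwise difference equals max minus min.
    return max(lengths) - min(lengths) <= max_difference
```

For example, non-complementary parts of 23, 25 and 30 nucleotides pass the 10-nucleotide criterion but not the 4-nucleotide one.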
Besides the analysis of (pseudo)gene-specific nucleotides and additional single nucleotide polymorphisms, an MLPA technique or MLPA-like technique is particularly suitable for relative (pseudo)gene copy number determination. If multiple copies of a (pseudo)gene of interest (or any other target nucleic acid of interest) are present in sample nucleic acid molecules, each copy will, in principle, be bound by the specific probes, and this binding is detectable. When the probes are amplified, more amplification product will be present when multiple copies were present in the original sample nucleic acid than when only one copy was present. Analysis of the amount of amplification product thus provides information about the copy number of a target nucleic acid of interest. This is often done by graphically representing amplified products as separate peaks. Each peak is the product of an amplified MLPA ligated probe, and a relative difference in peak intensity (height or surface) between a control sample and a sample of interest indicates copy number variation. When a complex locus is investigated, multiple copies of a (pseudo)gene of interest can be present in highly polymorphic regions. In such a case, when (pseudo)gene copy number is to be determined, many different combinations of (pseudo)gene variants need to be taken into account. This involves the use of a wide variety of different probe sets, to ensure that each combination of (pseudo)gene variants can be detected. In one embodiment according to the present invention, however, when the relative copy number of a nucleic acid of interest is to be estimated, an improved approach is provided. According to this embodiment, at least one probe is used with degenerate bases at one or more positions. This means that a mixture of probes is used wherein different nucleotides can be present at one or more positions. 
Hence a mixture of probes is used, which probes have the same sequence, except for the fact that some probes have a certain nucleotide at a given position X and some probes have another nucleotide at said position X. Such degenerate bases are commonly represented by the IUB nucleotide codes as depicted in
Alternatively, or additionally, a probe set is used which comprises an alternative base which alternative base is capable of binding at least two bases selected from the group consisting of A, T, G, C and U. Preferably, said alternative base is capable of binding at least three, most preferably at least four, bases selected from the group consisting of A, T, G, C and U. Such alternative base is suitable as an alternative for degenerate bases. It is, of course, also possible to combine such alternative base with degenerate bases. In a particularly preferred embodiment said alternative base is deoxyinosine triphosphate (dITP) or a functional equivalent thereof, which is capable of binding A and T and G and C and U. Further provided is therefore a method for determining the copy number of a nucleic acid of interest, wherein at least one probe set is used which comprises an alternative base which is capable of binding at least two, preferably at least three, more preferably at least four bases selected from the group consisting of A, T, G, C and U. As said before, said alternative base preferably comprises deoxyinosine triphosphate (dITP) or a functional equivalent thereof. A use of at least one probe set for determining the copy number of a nucleic acid of interest, wherein at least one probe set comprises an alternative base which is capable of binding at least two, preferably at least three, more preferably at least four bases selected from the group consisting of A, T, G, C and U, is also provided herewith. In one preferred embodiment, at least one probe set comprising such alternative base(s) is used in a MLPA method or MLPA-like method according to the present invention. 
Further provided is therefore a method according to the invention, wherein at least one probe set is used which comprises an alternative base which is capable of binding at least two, preferably at least three, more preferably at least four bases selected from the group consisting of A, T, G, C and U. As said before, said alternative base preferably comprises deoxyinosine triphosphate (dITP) or a functional equivalent thereof.
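A probe with degenerate bases corresponds to a mixture of concrete probe sequences, as described above. The following is a minimal sketch of that expansion, assuming the standard IUB/IUPAC nucleotide ambiguity codes for DNA; the function name and the example sequence are hypothetical and do not correspond to any probe of the invention.

```python
from itertools import product

# Standard IUB/IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate_probe(probe):
    """Return every concrete sequence represented by a degenerate probe."""
    try:
        choices = [IUPAC_CODES[base] for base in probe.upper()]
    except KeyError as exc:
        raise ValueError(f"unknown nucleotide code: {exc.args[0]}") from None
    # One sequence per combination of allowed bases at each position.
    return ["".join(combination) for combination in product(*choices)]
```

For instance, `expand_degenerate_probe("ACRT")` yields the two sequences `ACAT` and `ACGT`, i.e. a mixture in which position 3 carries either A or G.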
The present invention provides alternative and improved methods for screening for the presence of at least one target nucleic acid sequence in a sample, wherein at least one third probe is used which is complementary to a target nucleic acid region comprising a (pseudo)gene variation. A use of a probe set comprising at least three nucleic acid probes, wherein at least one third probe is complementary to a target nucleic acid region comprising a gene variant and/or a pseudogene variant, for screening for the presence of at least one target nucleic acid sequence in a sample is therefore also provided. Preferably, a plurality of probe sets according to the present invention is used. Further provided is therefore a use of a plurality of probe sets for screening for the presence of at least one target nucleic acid sequence in a sample, wherein each of said probe sets comprises:
wherein at least one of said probe sets comprises a third nucleic acid probe, said third probe comprising a third nucleic acid sequence complementary to a third region of said target nucleic acid sequence, and
wherein, if said third probe is present in said probe set, said first and said third region of said target nucleic acid are located essentially adjacent to each other and said third and said second region of said target nucleic acid are located essentially adjacent to each other, and
wherein, if said third probe is not present in said probe set, said first and said second region of said target nucleic acid are located essentially adjacent to each other, and
wherein at least one third nucleic acid probe is complementary to a target nucleic acid region comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within a given gene or pseudogene, said polymorphism preferably comprising an SNP.
A method according to the present invention is particularly suitable for analysis of (pseudo)gene variation and (pseudo)gene copy number determination in complex loci such as the genes encoding complement factors (e.g. Factor H and FH-like genes, C4A and C4B within the HLA-class III region), chemokines and their receptor alleles (e.g. CCL3L1, CCL4L1, CCR5 or CCR5delta32), HLA-class I and II, SIRPs and LILRs.
In one preferred embodiment, a method according to the invention is used in order to investigate the killer cell immunoglobulin-like receptor (KIR) locus. KIRs are expressed by natural killer (NK) cells and a subset of T cells. NK cells are cells of the lymphoid lineage, but display no antigen-specific receptors. Their main function is to monitor host cells for the presence of MHC class I molecules, which is important for e.g. distinguishing healthy cells from virus-infected or tumor cells. Interaction between NK cells and MHC class I molecules is mediated by KIRs. The KIR locus in humans is polygenic and highly polymorphic, so that accurate and efficient characterization of an individual's KIR (pseudo)gene profile is cumbersome. An efficient and reliable method for KIR genotyping is, however, important for determining KIR (pseudo)gene profiles and their role in many diseases. Until now, KIR genotyping has been based upon polymerase chain reaction with sequence-specific primers (PCR-SSP) (Sun et al, 2004), multiplex PCR (Vilches et al, 2007) and PCR with sequence-specific oligonucleotide probes (PCR-SSOP) (Crum et al, 2000). PCR-SSP requires high-quality genomic DNA, and multiple reactions are needed to generate a complete KIR profile of an individual. Multiple copies of KIR2DL4 and KIR3DL1/S1 in individuals have been reported with PCR-SSOP (Williams et al, 2003). Detection of the multiple gene copies was possible because the copies of these genes consisted of different alleles. However, multiple gene copies of highly homologous or identical sequences are not distinguishable with this molecular detection system or with cloning methods when individuals are homozygous for a gene (Williams et al, 2003).
As shown in the Examples, a method according to the present invention is particularly suitable for investigating the KIR locus of individuals. Even though this locus is highly polymorphic, (pseudo)gene variants and copy number variations are efficiently detected with methods according to the present invention. One preferred embodiment therefore provides a method or use according to the invention, wherein said target nucleic acid sequence is present in a KIR locus. Preferably, copy number variation of at least one KIR gene and/or at least one KIR pseudogene is determined.
In a particularly preferred embodiment, a probe set of
It is preferred to use at least two probe sets selected from
It is of course also possible to modify a sequence of at least one probe depicted in
The left and right probes of the probe sets of
Of course, these primer binding sites can be varied at will, as long as complementary primers are used in the amplification reaction. Therefore, the primer binding sites of probes according to the invention need not be at least 70% identical to the above mentioned sequences. Lower sequence identity can be used, provided the binding sites are complementary to the amplification primers. Other primers than those used in the Examples can be developed for use in a method according to the invention wherein probes depicted in
However, the parts of probes according to the invention that are capable of hybridizing to KIR genes preferably have at least 70% sequence identity to the KIR-specific sequences depicted
Preferably, probe sets are used which are based on the probe sets depicted in
Novel probes and probe sets which are particularly suitable for (pseudo)gene variant analysis and (pseudo)gene copy number determination of the KIR locus are also provided. These probes and probe sets are listed in
Further provided is a kit for detecting the presence of at least one target nucleic acid sequence in a sample, comprising a probe set or a mixture of nucleic acids according to the invention. Said at least one target nucleic acid sequence preferably comprises a nucleic acid sequence present in a KIR locus. A kit according to the invention preferably further comprises a PCR primer set comprising at least 70%, preferably at least 80%, more preferably at least 85%, more preferably at least 90%, most preferably at least 95% sequence identity to nucleic acid sequences 5′-GGGTTCCCTAAGGGTTGGA and TCTAGATTGGATCTTGCTGGCAC-3′ or TCTAGATTGGATCTTGCTGGCGC-3′, or the complements thereof. These primers are particularly suitable for amplifying probe sets depicted in
The invention further provides calibrators that are particularly suitable for determining copy numbers of nucleic acids of interest. Currently only relative gene copy number can be determined. This is often done by graphically representing amplified products of genes of interest by separate peaks. A relative peak intensity (height or surface) of an amplified product of a gene of interest is compared with the peak intensity of an amplified product of a control sample containing the gene of interest to determine relative copy number. For instance, if an MLPA reaction is used, each peak represents the product of an amplified MLPA ligated probe. However, with such a method it is not possible to quantify the absolute gene copy number because intensity peaks of a control sample do not represent a known copy number. Furthermore, a reference sample does not always contain all genes of interest. This is in particular the case for polygenic and highly polymorphic gene loci such as the KIR locus and the human leukocyte antigen (HLA) locus, whereby the identity and copy number of alleles differ greatly between individuals. Since no individual has all alleles of such polygenic and highly polymorphic gene cluster, a reference sample containing all these alleles is not available. Thus, if a sample of a random individual is compared with such a reference sample in order to determine the haplotype and/or copy number of genes, possibly several alleles of said individual are not detected because they are not present in the reference sample. Thus, with a reference sample currently used in the art, it is not possible to determine the complete haplotype (including copy number variation) of such polygenic and highly polymorphic gene loci such as the KIR locus of an individual. It is of course possible to use multiple reference samples, which will result in a more elaborate method. 
Furthermore, some alleles of a gene cluster are relatively rare and it is difficult to obtain reference samples with all known alleles of a gene cluster.
The invention provides means and methods that enable determination of the complete haplotype of a polygenic and highly polymorphic gene cluster. In addition, determination of copy number variation of genes of such gene cluster of an individual has now become possible. This comprises the use of a nucleic acid molecule comprising at least one control nucleic acid sequence, and for each gene or allele of interest a nucleic acid sequence which is unique for said gene or allele of interest. Said nucleic acid molecule can be used as such. Of course, such nucleic acid molecule can also be present in a vehicle such as for instance a plasmid, which optionally comprises other nucleic acid sequences. Such nucleic acid molecule as such or a vehicle or plasmid comprising such nucleic acid molecule are herein referred to as a calibrator according to the invention. Instead of a single nucleic acid molecule, a calibrator according to the invention can also contain a combination of multiple nucleic acid molecules or multiple vehicles/plasmids according to the invention. For instance, a calibrator according to the invention may contain 2, 3, 4, 5, 6, 7, 8 or more separate nucleic acid molecules. Preferably, however, a calibrator according to the invention consists of one nucleic acid molecule, vehicle or plasmid. In
A calibrator according to the invention comprises at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to a part of a (pseudo)gene comprising a polymorphism, such as a SNP. A part of a (pseudo)gene is defined as a consecutive stretch of at least 10 nucleotides in said (pseudo)gene. Preferably said part is at least 15, more preferably at least 20, more preferably at least 25, more preferably at least 30 nucleotides, such as 35, 40, 45 or 50 nucleotides. Said nucleic acid sequence of a calibrator according to the invention preferably comprises a sequence which is identical to, or complementary to, a polymorphism of said (pseudo)gene. Preferably, a calibrator according to the invention comprises at least one nucleic acid sequence which is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to part of a gene, which part comprises at least one polymorphism.
In addition, such calibrator comprises at least one nucleic acid sequence which is at least 70% identical to part of a control gene. A part of a control gene is defined as a consecutive stretch of at least 10 nucleotides in said control gene, preferably at least 15, more preferably at least 20, more preferably at least 25, more preferably at least 30 nucleotides, such as 35, 40, 45 or 50 nucleotides. As used herein, control genes are preferably genes which have a constant copy number in the human genome. Preferably a calibrator according to the invention comprises at least one nucleic acid sequence which is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to part of a control gene. Preferably sequences are used of control genes that have no or few polymorphisms so that these sequences will always be present in samples of individuals, avoiding the need to use many different control sequences for one particular control gene.
In one embodiment, a calibrator according to the invention comprises a nucleic acid sequence which is identical to part of a gene, which part comprises at least one polymorphism and a nucleic acid sequence which is identical to part of a control gene. As described before, such part contains at least 10 nucleotides. In one embodiment, a calibrator according to the invention comprises at least one nucleic acid sequence which is at least 70% identical to part of a gene, which part comprises at least two polymorphisms, such as two SNPs.
A calibrator is preferably designed such that all genes of interest and their allelic variants are separately represented once with a unique nucleic acid sequence on the calibrator. Different unique nucleic acid sequences from one gene of interest or gene variant of interest may also be represented on a calibrator as a single sequence to yield the same result: i.e. detection by one single probe in a mixture of probes as one copy of that sequence on the calibrator used.
With the use of a calibrator according to the invention, it is possible to determine in one reaction polymorphisms as well as absolute copy numbers of (pseudo)genes. In a preferred embodiment, nucleic acid sequences representing all possible (pseudo)genes of a gene cluster are present on a calibrator according to the invention. As used herein, “representing all possible (pseudo)genes of a gene cluster” means that for each (pseudo)gene of a gene cluster a nucleic acid sequence which is at least 70% identical to a part of at least 10 nucleotides of said (pseudo)gene is present on the calibrator. Said part of said (pseudo)gene preferably comprises at least one polymorphism, such as a SNP. Such calibrator according to the invention comprising nucleic acid sequences representing all possible (pseudo)genes of a gene cluster allows for the determination of presence or absence, as well as the copy number, of each of said (pseudo)genes in an individual, using a sample of said individual. Thus, such calibrator enables determining the entire haplotype of a polygenic and polymorphic gene cluster of an individual.
Now that a calibrator according to the invention is provided by the invention, an improved reference sample when determining (absolute) copy number variation has become available. This is for instance shown in Example 5 for determining the presence and copy number of KIR genes, which form a particularly polygenic and highly polymorphic gene locus. A calibrator according to the invention can be advantageously used with any method known to a person skilled in the art for detecting nucleic acid, such as (real-time) PCR, PCR-SSP, multiplex PCR and PCR-SSOP and MLPA. However, a calibrator according to the invention is particularly suitable for use in an MLPA method according to the invention.
An example of how a calibrator according to the invention can be used in a reference sample in an MLPA reaction is as follows. A test sample with nucleic acid of an individual and a reference sample comprising a calibrator according to the invention are provided with MLPA probes. Following ligation of MLPA probes to nucleic acid of said test sample and to said calibrator of said reference sample, an amplification reaction with both the reference sample with calibrator and the test sample is performed. A calibrator according to the invention and the MLPA probes are designed such that each amplified nucleic acid variant of a (pseudo)gene containing a polymorphism has a different length. It is thus immediately apparent which variant of a (pseudo)gene is present from the presence and length of amplified product. Furthermore, the amount of amplification products derived from (pseudo)genes of interest of the test sample can be correlated to the amount of amplification products of the calibrator of the reference sample. Also, the amount of amplification product derived from a control gene of the test sample, of which the copy number in the genome is known, can be correlated to the amount of amplification product of the corresponding control sequence of the calibrator of the reference sample. Based on the correlations of the amounts of amplified product of a (pseudo)gene of interest between the test sample and the reference sample, and the correlations of the amounts of amplified product of a control gene (with a constant copy number in the human genome) between the test sample and the reference sample, the copy number of the (pseudo)genes of interest in an individual can be determined. A more detailed example of the determination of the copy number of a gene, when used with a calibrator according to the invention in an MLPA method, is described below.
An MLPA method according to the invention comprises the use of a sample with nucleic acids obtained from an individual (“a test sample”) and a reference sample comprising a calibrator according to the invention (“a reference sample”). In one embodiment, both said test sample and said reference sample containing the calibrator are subjected to an MLPA method, preferably an MLPA method according to the invention. This comprises the addition to said test sample and said reference sample of at least one probe set which is complementary to part of a (pseudo)gene of interest (said part preferably comprising a polymorphism), and at least one probeset which is complementary to part of a control gene. The probe sets are allowed to hybridize to the target nucleic acid in said test sample and to the target nucleic acid located on the calibrator in the reference sample. Said target nucleic acid located on the calibrator is at least 70% identical to target nucleic acid in said test sample. Subsequently the probes of the different probe sets hybridized to nucleic acid in said test sample and in said reference sample are subjected to a ligation reaction. As herein before described in detail, ligation will only occur if probes of a specific probeset are hybridized immediately adjacent to each other on their target sequences. Thus, if one specific variant of a polymorphic gene is present in the test sample, only the probes of the probe set specific for this specific variant will ligate, whereas the probes of probe sets specific for other gene variants will not ligate. As described before, a ligated probe set according to the invention is flanked by two primer binding sites. During a subsequent amplification reaction, only the ligated probe sets will be amplified. Hence, the presence or absence of each gene variant in a test sample is directly determined.
The calibrator, however, preferably contains binding sites for each of the probe sets, so that amplification of all probe sequences will occur in the reference sample. This avoids false-negative test results; if a given probe sequence is not amplified in the test sample, it is verified whether the corresponding probe sequence is amplified in the reference sample, i.e. from the calibrator. Only if the probe sequence is indeed amplified in the reference sample, the absence of the probe sequence in the test sample is considered a reliable result. If the probe sequence appears not to be amplified in the reference sample either, this indicates a failure of the test procedure and the test results are to be discarded. Hence, false-negative test results are avoided. A binding site for a particular probe set is a nucleic acid sequence that is at least 70% identical to part of a (pseudo)gene to which said probe set is complementary. As used herein, reference to a gene of interest also encompasses a pseudogene of interest.
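The false-negative check described above amounts to a small decision rule per probe set. The following is a minimal sketch, assuming that amplification in each sample has already been reduced to a yes/no call; the function name and the result labels are hypothetical. The case in which a product is detected in the test sample but not in the reference sample is not addressed in the text, and the sketch simply reports it as present.

```python
def interpret_probe_result(amplified_in_test, amplified_in_reference):
    """Classify one probe set from its amplification in the test sample and
    in the reference sample (i.e. from the calibrator)."""
    if amplified_in_test:
        # Assumption of this sketch: a detected product counts as present.
        return "present"
    if amplified_in_reference:
        # The calibrator confirms the probe set worked, so absence is reliable.
        return "absent"
    # No product in either sample: the procedure failed, discard the result.
    return "invalid"
```

A probe set that gives no product in the test sample is thus only called absent when the same probe set was amplified from the calibrator.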
Besides probe sets specific for a gene of interest, probe sets specific for control genes are used. Control genes preferably have a constant copy number in the human genome, such as for instance two. This copy number is known. Nucleic acid sequences with a length of at least 10 nucleotides which are at least 70% identical to control gene sequences and nucleic acid sequences with a length of at least 10 nucleotides which are at least 70% identical to gene of interest sequences are present in known amounts on the calibrator. It is therefore possible to correlate amplified products from the calibrator to amplified products of a test sample. This is for instance done as follows. Amplification reactions of the test sample and of the reference sample will result in an intensity peak pattern with peaks for each amplified nucleic acid product. The peaks will have varying intensity (for instance height or surface). The peaks represent amplified nucleic acid sequences indicative for a gene of interest or a control gene. The peak intensity (for instance height or surface) of an amplified control gene product of the test sample and the peak intensity of the amplified product of the same control gene sequence in the reference sample (i.e. of the calibrator) are compared. This is done for each control gene sequence. The peak intensities of both amplified products ought to be the same, representing for instance 2 copies per genome or DNA sample tested if the copy number of the control gene is 2. The control genes are also an internal quality control of both the test and the reference sample, because amplified product is only detected if the MLPA reaction was successful. The proportion between the peak intensities of amplified control gene product in the test sample and in the reference sample can be determined based on relative differences in peak intensity. 
Subsequently, the peak intensity of amplified product of a gene of interest of the test sample and the peak intensity of amplified product of a sequence of the same gene of the reference sample (i.e. of the calibrator) are also compared. This is also done for each gene of interest. The proportion between the peak intensities of amplified product of each gene of interest in the test sample and in the reference sample is determined based on relative differences in peak intensity as well.
If nucleic acid corresponding to a specific nucleic acid sequence in a control gene and nucleic acid corresponding to a specific nucleic acid sequence in a gene of interest are present on the calibrator in the same amount, equal amounts of product will in principle be amplified during the amplification reaction. In that case, the peak intensities of the control gene and the gene of interest can be directly compared. If the control gene and the gene of interest are present in the same amount in the test sample as well, the difference between the peak intensities of the control gene in the test sample and reference sample will be comparable to the difference between the peak intensities of the gene of interest in the test sample and reference sample. Thus, the proportion between the peak intensities of amplified product of the gene of interest in the test and reference sample, and the proportion between the peak intensities of amplified control gene product in the test and reference sample are equal if the copy number of the gene of interest and the copy number of the control gene are equal. For instance, if the copy number in the human genome of a control gene is 2, and if the peak intensity of amplified control gene product in a reference sample is determined to be 2 and the relative peak intensity of amplified control gene product in a test sample is determined to be 3, the proportion between the peak intensities of amplified product of the control gene in the test sample versus the reference sample is 3/2=1.5. If it is determined that the proportion between the peak intensities of amplified product of a gene of interest in the test sample versus the reference sample is also 1.5, it can be concluded that the copy number of the gene of interest is identical to the copy number of said control gene, that is 2. 
This direct comparison of peak intensity proportions only applies if nucleic acid corresponding to part of the control gene and nucleic acid corresponding to part of the gene of interest are present in the same amount on the calibrator. It is, therefore, preferred to use the same number of different nucleic acid sequences on the calibrator. Otherwise, the difference between the number of different nucleic acid sequences on the calibrator needs to be taken into account when calculating the copy number of the corresponding genes in the test sample.
When equal numbers of different nucleic acid sequences are present on the calibrator, the proportion between the peak intensities of amplified product of a gene of interest in the test and reference sample, and the proportion between the peak intensities of amplified control gene product in the test and reference sample, are substantially equal if the copy number of the control gene and the copy number of the gene of interest in a test sample are the same, and these proportions are not equal if the copy number of the control gene and the copy number of the gene of interest in a test sample are different. The copy number of the gene of interest in the test sample can then be calculated based on the peak intensity proportions of the control gene of the test sample and the control gene of the reference sample and the peak intensity proportions of the gene of interest of the test sample and the gene of interest of the reference sample, making use of the known copy number in the test sample of the control gene. For instance, if, like in the example above, the copy number of a control gene is 2, and if the peak intensity of amplified control gene product in the reference sample is determined to be 2 and the relative peak intensity of amplified control gene product in the test sample is determined to be 3, the proportion between the peak intensity of amplified product of the control gene in the test sample versus the reference sample is 3/2=1.5. If the proportion between the peak intensities of amplified product of a gene of interest in test and reference sample is determined to be 3, instead of 1.5, it can be concluded that the copy number of the gene of interest is twice (3/1.5) the copy number of the control gene, that is 4. 
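The calculation described above can be written out as a short function. The following is a minimal sketch, assuming that peak intensity scales linearly with the amount of template; the function and parameter names are hypothetical. The optional correction for unequal representation of gene and control sequences on the calibrator is an assumption of this sketch about how such a difference could be taken into account, not a formula given in the text.

```python
def estimate_copy_number(gene_test, gene_reference,
                         control_test, control_reference,
                         control_copies=2,
                         gene_sequences_on_calibrator=1,
                         control_sequences_on_calibrator=1):
    """Estimate the copy number of a gene of interest in a test sample.

    Peak intensities (height or surface) are given for the gene of interest
    and for a control gene, each in the test and in the reference sample.
    """
    values = (gene_reference, control_test, control_reference,
              control_copies, gene_sequences_on_calibrator,
              control_sequences_on_calibrator)
    if gene_test < 0 or any(v <= 0 for v in values):
        raise ValueError("intensities and counts must be positive")
    # Proportion test/reference for the control gene and the gene of interest.
    control_ratio = control_test / control_reference
    gene_ratio = gene_test / gene_reference
    # Assumed correction when the calibrator carries unequal numbers of
    # gene-specific and control-specific sequences (1.0 when equal).
    correction = gene_sequences_on_calibrator / control_sequences_on_calibrator
    return control_copies * (gene_ratio / control_ratio) * correction
```

With the figures of the example (control gene: 3 in the test sample and 2 in the reference sample; gene of interest: a test-to-reference proportion of 3, e.g. 6 and 2), `estimate_copy_number(6, 2, 3, 2)` gives 4 copies; a proportion of 1.5 for the gene of interest gives 2 copies.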
It is also possible to determine the proportion between the peak intensity of the gene of interest and the peak intensity of the control gene of the reference sample and to compare this with the proportion between the peak intensities of amplification product of the corresponding genes of the test sample. If this proportion of the reference sample is about one (meaning that the peak intensities are about the same, which is often the case when the number of gene-specific sequences and control-specific sequences on the calibrator are the same), and if this proportion of the test sample is 2, it can be concluded that the copy number of the gene of interest in the individual is twice the (known) copy number of the control gene in the individual.
Although not necessary, before determining the copy number of a gene of interest in a test sample, the concentration of total nucleic acid in the test sample is preferably measured, for instance using spectrometry. The molecular weight of a calibrator according to the invention is, of course, also known. The concentration of nucleic acid in the test sample and the concentration of nucleic acid (i.e. calibrator) in a reference sample can then be made approximately equal, so that the peak intensities of amplified product of control genes, which have a constant copy number, of the test sample and of the reference sample are approximately equal. In that case, a direct comparison between the peak intensities of amplified product of a gene of interest of a test sample and a reference sample can be made. Then, the differences in nucleic acid concentration in the test and reference samples do not need to be taken into account.
For instance, the concentration of nucleic acid in the test sample and the concentration of the calibrator in the reference sample can both be based on the molecular weight of the human genome. If, for example, the amount of genomic DNA added in a test sample of the assay is 100 nanogram, the concentration of genomic DNA in the assay is then 4.8E-15 mol/liter, based on the fact that diploid human female and male nuclei in the G1 phase of the cell cycle should contain 6.950 and 6.829 pg of DNA, respectively. It is then possible to prepare a reference sample with calibrator in the same concentration as the concentration of nucleic acid in the test sample, because the weight of the calibrator can be calculated if the exact composition of the calibrator is known. In that case, a copy number of for instance 2 for a control gene in the test sample will result in the same amount of amplified product in both the test and the reference sample. It follows that the peak intensities of this control gene product will then be approximately equal for the test sample and the reference sample.
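This alternative comparison, in which the proportion is taken within each sample rather than between samples, can be sketched as follows. The sketch assumes that the proportion is expressed as gene of interest over control gene, which is the direction implied by the example; the function and argument names are hypothetical.

```python
def copy_number_from_within_sample_ratio(goi_test, ctrl_test,
                                         goi_ref, ctrl_ref,
                                         ctrl_copy_number=2):
    """Estimate copy number from gene-of-interest/control proportions.

    The proportion in the reference sample is about 1 when the calibrator
    carries equal numbers of gene-specific and control sequences.
    """
    ref_proportion = goi_ref / ctrl_ref     # within the reference sample
    test_proportion = goi_test / ctrl_test  # within the test sample
    return ctrl_copy_number * test_proportion / ref_proportion


# Example from the text: reference proportion about 1, test proportion 2,
# so the gene of interest has twice the control copy number.
print(copy_number_from_within_sample_ratio(goi_test=4.0, ctrl_test=2.0,
                                           goi_ref=1.0, ctrl_ref=1.0))  # 4.0
```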
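The conversion from a mass of genomic DNA to a number of genome equivalents, on which such a concentration is based, can be sketched as follows. The DNA content per nucleus is taken from the text; the assay volume is not given in this passage, so it is left as a parameter and the sketch does not attempt to reproduce the quoted concentration. All function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def diploid_genome_equivalents(dna_mass_ng, pg_per_diploid_nucleus):
    """Number of diploid genomes contained in a given mass of genomic DNA."""
    return dna_mass_ng * 1000.0 / pg_per_diploid_nucleus  # 1 ng = 1000 pg


def molar_concentration(copies, volume_liters):
    """Molar concentration of a given number of molecules in a volume."""
    return copies / AVOGADRO / volume_liters


female = diploid_genome_equivalents(100.0, 6.950)  # roughly 14,400 genomes
male = diploid_genome_equivalents(100.0, 6.829)    # roughly 14,600 genomes
print(round(female), round(male))
```

A calibrator amount giving the same number of molecules per unit volume can then be weighed out once the molecular weight of the calibrator has been calculated from its composition.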
Preferably, a calibrator according to the invention comprises the same number of copies of each nucleic acid sequence. For instance, 1 copy of each nucleic acid which is at least 70% identical to part of a gene of interest and 1 copy of each nucleic acid which is at least 70% identical to part of a control gene are preferably present on the calibrator.
Accordingly, the invention provides a nucleic acid molecule comprising at least one control nucleic acid sequence and at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to part of a gene or pseudogene of interest, or a complementary sequence thereof, wherein at least 80%, preferably at least 85%, more preferably at least 90%, more preferably at least 95%, of said nucleic acid sequences which are at least 70% identical to part of at least one gene or pseudogene of interest comprise a sequence that is identical to, or complementary to, a gene-specific nucleotide and/or a pseudo-gene specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within said gene or pseudogene, said polymorphism preferably comprising an SNP. Preferably, a nucleic acid molecule according to the invention comprises at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to a part of a gene or pseudogene of interest, or a complementary sequence thereof, said part comprising a (pseudo)gene-specific nucleotide and/or sequence, and/or an additional (pseudo)gene-specific polymorphism preferably an SNP. In one embodiment, a nucleic acid molecule according to the invention comprises at least one nucleic acid sequence with a length of at least 10 nucleotides which is identical to a part of a gene or pseudogene of interest, or a complementary sequence thereof, said part comprising a (pseudo)gene-specific nucleotide and/or sequence, and/or an additional (pseudo)gene-specific polymorphism preferably an SNP. The invention further provides a vehicle or plasmid comprising a nucleic acid molecule according to the invention.
As used herein, a “nucleic acid molecule” or a “nucleic acid sequence” comprises a chain of nucleotides, preferably DNA and/or RNA. A nucleic acid molecule or nucleic acid sequence of the invention may be single stranded or double stranded. In other embodiments a nucleic acid molecule or nucleic acid sequence of the invention comprises other kinds of nucleic acid structures such as for instance a DNA/RNA helix, peptide nucleic acid (PNA), locked nucleic acid (LNA) and/or a ribozyme. Hence, the term “nucleic acid sequence” also encompasses a chain comprising non-natural nucleotides, modified nucleotides and/or non-nucleotide building blocks which exhibit the same function as natural nucleotides.
As used herein, “copy number of a (control) gene or pseudogene” refers to the number of DNA molecules of said gene or pseudogene in the genome of an individual.
The term “complementary” is known in the art. A complementary sequence as used herein refers to a nucleic acid sequence whose bases can form non-covalent base pairs with the bases of the target sequence.
As used herein, a “vehicle” is defined as any means that can contain a nucleic acid molecule, such as for instance a vector or plasmid. A “plasmid” is defined herein as a circular, double-stranded DNA molecule.
As used herein, the term “% sequence identity to part of a gene” is defined as the percentage of residues in a nucleotide sequence that is identical with the residues in said part of a gene after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. One computer program which may be used or adapted for purposes of determining whether a candidate sequence falls within this definition is Autoassembler 2.0 (ABI Prism, Perkin Elmer).
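The definition above can be illustrated with a short sketch that computes the percentage identity for two sequences that have already been aligned, with gaps written as "-". It is an assumption of this sketch that the percentage is taken over the residues of the candidate sequence, so that gap positions in the candidate are not counted; the alignment step itself is not shown and the function name is hypothetical.

```python
def percent_identity(aligned_candidate, aligned_gene_part, gap="-"):
    """Percentage of candidate residues identical to the aligned gene part."""
    if len(aligned_candidate) != len(aligned_gene_part):
        raise ValueError("aligned sequences must have equal length")
    residues = 0
    matches = 0
    for a, b in zip(aligned_candidate.upper(), aligned_gene_part.upper()):
        if a == gap:
            continue  # a gap is not a residue of the candidate sequence
        residues += 1
        if a == b:
            matches += 1
    if residues == 0:
        raise ValueError("candidate sequence contains no residues")
    return 100.0 * matches / residues


# Hypothetical 10-nucleotide sequences differing at one position.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```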
As used herein a “control nucleic acid sequence” is a nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to part of a gene other than the gene of interest, or a complementary sequence thereof. Preferably, a control nucleic acid sequence is at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% identical to part of a gene other than the gene of interest, or a complementary sequence thereof. Preferably, said gene other than the gene of interest, herein called a control gene, has a constant copy number in the human genome. Most preferably, it is known how many copies of said control gene are present in the genome of each human. Said control gene is thus preferably not subject to copy number variation. In a preferred embodiment, said control gene has two copies in the human genome. The invention therefore also provides a nucleic acid molecule according to the invention, wherein said control nucleic acid sequence is at least 70% identical to, or complementary to, a part of a control gene which has a constant copy number in the human genome, preferably wherein said control gene has a copy number of two in the human genome.
Examples of genes which are normally not subject to copy number variation and which are known to have a copy number of 2 are FGF3, BCAS4, LMNA, PARK2, MSH6, GALT, SPG4, IL-4 and NF2. Therefore, in a preferred embodiment, said at least one control nucleotide sequence is at least 70% identical to, or complementary to, a part of FGF3, BCAS4, LMNA, PARK2, MSH6, GALT, SPG4, IL-4 and/or NF2, said part having a length of at least 10 nucleotides. In
The left and right probes of the probesets of
The use of at least two control nucleic acid sequences which are at least 70% identical to parts of different control genes is preferred because this allows for a more accurate determination of the copy number of a gene of interest. Therefore, a nucleic acid molecule according to the invention preferably comprises at least two control nucleic acid sequences, more preferably at least three, more preferably at least four, more preferably at least five, more preferably at least six, more preferably at least seven control nucleic acid sequences. Said control nucleic acid sequences are preferably selected from the group of nucleic acid sequences having at least 70% sequence identity to the probe sequences, without the primer binding sites, of the probe sets indicated as Control 1 (IL-4), Control 2 (FGF3), Control 3 (BCAS4), Control 4 (LMNA), Control 8 (GALT), Control 9 (SPG4) and Control 10 (NF2) in
As explained above, a calibrator according to the invention is particularly suitable for determining the copy number of a gene of interest in an individual. Therefore, nucleic acid sequences located on a calibrator according to the invention are preferably at least 70% identical to part of a gene of interest which is subject to copy number variation. The invention thus provides a nucleic acid molecule comprising at least one nucleic acid sequence with a length of at least 10 nucleotides which is at least 70% identical to part of a gene of interest, or a complementary sequence thereof, and at least one control nucleic acid sequence, wherein at least 80% of said nucleic acid sequences which are at least 70% identical to part of a gene of interest comprise a sequence that is identical to, or complementary to, a gene-specific nucleotide and/or a pseudo-gene specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within said gene or pseudogene, said polymorphism preferably comprising an SNP, wherein at least one of said genes of interest is subject to copy number variation in the human genome. Said control nucleic acid is preferably at least 70% identical to, or complementary to, a part of a gene which has a constant copy number in the human genome.
Preferably each (pseudo)gene-specific nucleic acid sequence located on a calibrator according to the invention comprises a sequence that is identical to, or complementary to, a gene-specific nucleotide and/or a pseudo-gene specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within said gene or pseudogene. Such nucleic acid sequence is unique for said (pseudo)gene of interest. Said nucleic acid sequence located on the calibrator can thus be used to distinguish said specific gene variant from other genes, such as other gene variants and/or other genes of a gene cluster. If a nucleic acid molecule according to the invention comprises nucleic acid sequences that are specific for each (pseudo)gene variant of a gene cluster of interest, such nucleic acid molecule can be used to determine the haplotype, including copy number variation, of said gene cluster for an individual. Thus, in one embodiment, a calibrator according to the invention is provided that comprises nucleic acid sequences which together are at least 70% identical to, or complementary to, parts of each gene variant of a gene cluster of interest. Such calibrator can be used to determine in one reaction the presence or absence and copy number of each gene of said gene cluster, for instance of the KIR or HLA gene cluster. This means that with the use of such calibrator, the complete haplotype, including gene copy number, of a gene cluster in a sample of an individual can be determined. Therefore, in a preferred embodiment, a nucleic acid molecule, vehicle or plasmid according to the invention is provided that comprises nucleic acid sequences which together are at least 70% identical to, or complementary to, parts of each gene of a gene cluster of interest, or complementary sequences thereof.
Because a calibrator according to the invention is particularly suitable for use in an MLPA method according to the invention, (pseudo)gene-specific nucleic acid sequences located on the calibrator preferably have the same number of nucleotides as an MLPA probe set according to the invention depicted in
A calibrator according to the invention is also particularly useful as an internal quality control when determining the presence or copy number of a (pseudo)gene of interest in a sample of an individual. Without the use of such a control, if a sample of an individual is subjected to an amplification reaction, for instance as part of an MLPA method, the absence of amplified product of a gene of interest may indicate that said gene of interest is not present in said sample. However, it is also possible that the amplification reaction failed.
As explained before, if a reference sample containing a calibrator according to the invention comprising a nucleic acid sequence specific for the same gene of interest is subjected to the same MLPA method as a sample of an individual, it serves as a control for the success of an amplification reaction. If no amplified product is obtained using said sample of said individual, but amplified product is obtained using said reference sample, it can be determined that the amplification was successful. In that case, it can be concluded that said gene of interest is not present in said sample of said individual. The presence of amplified product from said reference sample proves that the amplification reaction was successful.
On the other hand, if amplified product is not obtained from said sample of said individual and also not from said reference sample, it can be concluded that the amplification reaction failed. If the amplification reaction succeeded, at least in the reference sample amplified product should be present.
Of course, if amplified product following an amplification reaction is present both when using said sample of said individual and when using said reference sample, it can be concluded that said gene of interest is present in said individual.
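The use of the reference sample as an internal quality control, as set out in the preceding paragraphs, amounts to a small decision table. A minimal sketch follows; the function name and the returned labels are hypothetical, and the case in which product is obtained from the sample of the individual but not from the reference sample is not discussed in the text and is therefore reported as undetermined.

```python
def interpret_amplification(product_in_sample, product_in_reference):
    """Interpret presence or absence of amplified product in both samples."""
    if product_in_reference:
        # The reference sample shows that the amplification reaction worked.
        if product_in_sample:
            return "gene of interest present"
        return "gene of interest absent"
    if not product_in_sample:
        # No product anywhere: the amplification reaction failed.
        return "amplification failed"
    # Not covered by the description above.
    return "undetermined"


print(interpret_amplification(False, True))   # gene of interest absent
print(interpret_amplification(False, False))  # amplification failed
```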
Preferably each nucleic acid sequence located on a calibrator according to the invention is separated from the upstream or downstream nucleic acid sequence by a spacer sequence of at least 5 nucleotides. This allows an efficient hybridization of MLPA probes of a multiple probe set according to the invention to a calibrator according to the invention and it allows an efficient amplification reaction. A “spacer sequence” as used herein is defined as a nucleotide sequence that is not present in the probes used in an MLPA reaction. More preferably, each (pseudo)gene-specific or control gene-specific nucleic acid sequence located on the calibrator is followed by a spacer sequence of at least 10, more preferably at least 15, more preferably at least 20 nucleotides. Preferably said spacer sequence consists of at most 100 nucleotides, more preferably at most 80 nucleotides, to limit the size of a nucleic acid molecule according to the invention. This is not necessary, however: said spacer sequences can be larger than 100 nucleotides. In that case, however, a large calibrator will be generated, which can be disadvantageous, for instance because the larger the calibrator, the more complicated it is to synthesize. Therefore in a preferred embodiment a nucleic acid molecule or vehicle or plasmid according to the invention is provided, wherein each (pseudo)gene-specific nucleic acid sequence or complementary sequence thereof and/or each control nucleotide sequence or complementary sequence thereof is followed from 5′ to 3′ by a spacer sequence of between 5 and 100 nucleotides, preferably of between 20 and 80 nucleotides.
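The arrangement of gene-specific and control sequences, each followed from 5′ to 3′ by a spacer, can be sketched as a simple concatenation. The sketch below is illustrative only: the sequences are hypothetical, a single spacer is reused for simplicity, the 5 to 100 nucleotide range is the preferred range rather than a requirement, and only the given strand of each probe is checked for the spacer sequence.

```python
def assemble_calibrator(target_sequences, spacer, probe_sequences=(),
                        min_len=5, max_len=100):
    """Concatenate target sequences, each followed by a spacer sequence."""
    if not min_len <= len(spacer) <= max_len:
        raise ValueError("spacer length outside the preferred range")
    for probe in probe_sequences:
        # A spacer is defined as a sequence not present in the probes.
        if spacer in probe:
            raise ValueError("spacer sequence occurs in a probe")
    return "".join(target + spacer for target in target_sequences)


# Hypothetical, very short sequences for illustration.
print(assemble_calibrator(["AAAA", "CCCC"], "GTGTG"))  # AAAAGTGTGCCCCGTGTG
```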
As described above, a calibrator according to the invention is particularly suitable for determining the copy number of a (pseudo)gene of interest or for determining a haplotype of a gene cluster of interest in an individual. Also provided is therefore a use of a nucleic acid molecule or vehicle or plasmid according to the invention for determining a copy number of at least one (pseudo)gene of interest in an individual and a use of a nucleic acid molecule or vehicle or plasmid according to the invention for determining a haplotype of a gene cluster of interest of an individual.
Also provided is a method for determining a copy number of at least one (pseudo)gene of interest of an individual comprising:
amplifying a sequence with a length of at least 10 nucleotides of said at least one (pseudo)gene of interest using a sample of said individual and amplifying a sequence with a length of at least 10 nucleotides of said at least one (pseudo)gene of interest using a reference sample, said reference sample comprising a nucleic acid molecule or a vehicle or a plasmid according to the invention, and
amplifying a sequence with a length of at least 10 nucleotides of at least one control gene using said sample of said individual and amplifying a sequence with a length of at least 10 nucleotides of said at least one control gene using said reference sample;
determining a level of amplified product of said sequence of said at least one (pseudo)gene of interest from said sample of said individual and determining a level of amplified product of said sequence of said at least one (pseudo)gene of interest from said reference sample; and
determining a level of amplified product of said sequence of said at least one control gene from said sample of said individual and determining a level of amplified product of said sequence of said at least one control gene in said reference sample; and
comparing said levels of amplified products of said sequences of said at least one (pseudo)gene of interest with each other and with said levels of amplified products of said sequences of said at least one control gene, thereby determining the copy number of said at least one (pseudo)gene of interest. In one embodiment a method for determining a haplotype of a gene cluster of an individual is provided which method comprises determining a copy number of all genes of said gene cluster with a method of the invention.
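The comparison step of this method, applied to several genes of a gene cluster and several control genes at once, can be sketched as follows. The text does not prescribe how the results for several control genes are to be combined; taking the median of the control ratios is an illustrative choice made here, and all gene names and peak values are hypothetical.

```python
from statistics import median


def estimate_copy_numbers(sample_peaks, reference_peaks, genes_of_interest,
                          control_genes, control_copy_number=2):
    """Estimate a copy number for each gene of interest.

    sample_peaks, reference_peaks: dicts mapping a gene name to the level
        of amplified product in the sample of the individual and in the
        reference sample, respectively.
    """
    control_ratio = median(sample_peaks[g] / reference_peaks[g]
                           for g in control_genes)
    return {
        g: control_copy_number * (sample_peaks[g] / reference_peaks[g])
        / control_ratio
        for g in genes_of_interest
    }


reference = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0,
             "ctrl1": 100.0, "ctrl2": 100.0, "ctrl3": 100.0}
sample = {"geneA": 100.0, "geneB": 300.0, "geneC": 0.0,
          "ctrl1": 200.0, "ctrl2": 200.0, "ctrl3": 210.0}
result = estimate_copy_numbers(sample, reference,
                               ["geneA", "geneB", "geneC"],
                               ["ctrl1", "ctrl2", "ctrl3"])
print({gene: round(value) for gene, value in result.items()})
# {'geneA': 1, 'geneB': 3, 'geneC': 0}
```

Applied to all genes of a gene cluster, the resulting mapping of gene to copy number corresponds to the copy number information used for determining a haplotype.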
A calibrator according to the invention is further particularly suitable to determine the presence or absence and the copy number of a (pseudo)gene of interest in an individual using an MLPA or MLPA-like method according to the invention, in which at least one probe set is used which consists of a left probe, a middle probe and a right probe. The invention therefore also provides a method for determining a copy number of at least one nucleic acid of interest in an individual, comprising the steps of:
a) adding to a sample of said individual and to a reference sample comprising a nucleic acid molecule or a vehicle or a plasmid according to the invention at least two different probe sets, each probe set comprising:
a first nucleic acid probe, said first probe comprising a first nucleic acid sequence complementary to a first region of said nucleic acid of interest and, located 5′ thereof, a non-complementary nucleic acid sequence comprising a first primer binding site, and
a second nucleic acid probe, said second probe comprising a second nucleic acid sequence complementary to a second region of said nucleic acid of interest and, located 3′ thereof, a non-complementary nucleic acid sequence comprising a second primer binding site,
wherein at least one of said probe sets comprises a third nucleic acid probe, said third probe comprising a third nucleic acid sequence complementary to a third region of said nucleic acid of interest, and
wherein, if said third probe is present in said probe set, said first and said third region of said nucleic acid of interest are located essentially adjacent to each other and said third and said second region of said nucleic acid of interest are located essentially adjacent to each other, and
wherein, if said third probe is not present in said probe set, said first and said second region of said nucleic acid of interest are located essentially adjacent to each other,
wherein at least one third nucleic acid probe is complementary to a region of said nucleic acid of interest comprising a gene-specific nucleotide and/or a pseudogene-specific nucleotide and/or a gene-specific sequence and/or a pseudogene-specific sequence and/or an additional polymorphism within a given gene or pseudogene, said polymorphism preferably comprising an SNP
b) adding to said sample of said individual and to said reference sample at least one different probe set, each probe set comprising:
a first nucleic acid probe, said first probe comprising a first nucleic acid sequence complementary to a first region of a control nucleic acid sequence and, located 5′ thereof, a non-complementary nucleic acid sequence comprising a first primer binding site, and
at least a second nucleic acid probe, said second probe comprising a second nucleic acid sequence complementary to a second region of said control nucleic acid and, located 3′ thereof, a non-complementary nucleic acid sequence comprising a second primer binding site, and
c) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said sample of said individual,
d) allowing hybridization of said at least two different probe sets to complementary nucleic acid of said reference sample,
e) subjecting nucleic acid of said sample of said individual, and nucleic acid of said reference sample to a ligation reaction,
f) subjecting nucleic acid of said sample of said individual and nucleic acid of said reference sample to a nucleic acid amplification reaction, using at least one primer capable of specifically binding said first primer binding site and at least one primer capable of specifically binding said second primer binding site, and
g) determining whether amplified nucleic acid is present, thereby determining whether said at least one nucleic acid sequence of interest and/or said control nucleic acid is present in said sample of said individual,
h) determining a level of amplified product of said at least one nucleic acid sequence of interest of said sample of said individual and a level of amplified product of said at least one nucleic acid sequence of interest of said reference sample;
i) determining a level of amplified product of said at least one control nucleic acid sequence of said sample of said individual and a level of amplified product of said at least one control nucleic acid sequence of said reference sample;
j) comparing said levels of amplified product of said at least one nucleic acid of interest with said levels of amplified product of said at least one control nucleic acid, thereby determining the copy number of said at least one nucleic acid of interest.
As described above, the KIR locus in humans is polygenic and highly polymorphic, so that accurate and efficient characterization of an individual's KIR (pseudo)gene profile is cumbersome. A calibrator according to the invention is therefore particularly suitable for determining the presence and/or copy number of a KIR gene.
If the presence or absence of a specific (pseudo)gene is correlated with (predisposition to) disease, it is often sufficient to compare a sample of an individual with a reference sample of which it is known that the specific gene is present. However, the presence or absence of several KIR genes is not directly correlated with disease. An individual may lack one or more KIR genes without this resulting in disease. Importantly, the correlation between one or more specific KIR genes and disease or the predisposition to disease often depends on the copy number of the KIR gene. For instance, a copy number of 1 of a specific KIR gene is not correlated with a disease, but a copy number of 2 or more of this KIR gene results in, or predisposes to, disease. As an example, a higher copy number of KIR2DL2 and/or KIR2DS2 in an individual has been demonstrated to be predisposing for rheumatoid arthritis with extra-articular manifestations and rheumatoid vasculitis. Thus, obtaining information about the presence or absence of a specific KIR gene in an individual may not be sufficient to obtain information about the correlation between the KIR gene profile of an individual and disease. It is also necessary to determine the copy number of KIR genes when information about such correlations is needed.
As described in Example 4, the present inventors constructed a calibrator according to the invention comprising nucleic acid sequences which are identical to parts of each currently known KIR gene. The sequence of this calibrator is depicted in
The left and right probes of the probe sets of
The invention therefore provides a nucleic acid molecule comprising a nucleotide sequence which has at least 70% sequence identity with at least one nucleic acid sequence consisting of:
a) a probe set of
b) a complementary sequence of said probe set without said primer binding sites,
wherein said nucleic acid sequence of a) or b) either comprises immediately adjacent to each other:
the sequences or complementary sequences of a left probe of said probe set, without primer binding site GGGTTCCCTAAGGGTTGGA, and of a right probe of the same probe set, without primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, if said probe set consists of two probes, or
the sequences or complementary sequences of a left probe of said probe set, without primer binding site GGGTTCCCTAAGGGTTGGA, and of a middle probe and right probe of the same probe set, without primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, if said probe set consists of three probes. Such nucleic acid molecule is herein also defined as “a nucleic acid molecule comprising a nucleic acid sequence of the probes of at least one probe set of
Hence, in a preferred embodiment a calibrator according to the invention comprises the sequences of all probes of a given probe set of
If the copy number of a KIR gene is to be determined, the level of amplification product of said KIR gene using a sample of an individual and the level of amplification product of a corresponding sequence in a reference sample are preferably compared with an expression level of a control nucleic acid. Also provided is therefore a nucleic acid molecule comprising a nucleotide sequence which has at least 70% sequence identity with at least one nucleic acid sequence consisting of:
a) a probe set of
b) a complementary sequence of said probe set without said primer binding sites,
wherein said nucleic acid sequence of a) or b) either comprises immediately adjacent to each other:
the sequences or complementary sequences of a left probe of said probe set, without primer binding site GGGTTCCCTAAGGGTTGGA, and of a right probe of the same probe set, without primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, if said probe set consists of two probes, or
the sequences or complementary sequences of a left probe of said probe set, without primer binding site GGGTTCCCTAAGGGTTGGA, and of a middle probe and right probe of the same probe set, without primer binding site TCTAGATTGGATCTTGCTGGCAC or TCTAGATTGGATCTTGCTGGCGC, if said probe set consists of three probes, said nucleic acid molecule further comprising at least one control nucleic acid sequence or a complementary sequence thereof. Said nucleic acid molecule can be used as such. However, such nucleic acid molecule can also be present in a vehicle such as a plasmid, which optionally comprises other nucleic acid sequences. Also provided is therefore a vehicle or a plasmid comprising a nucleic acid molecule according to the invention. Such nucleic acid molecule as such or vehicle or plasmid comprising such nucleic acid molecule are herein also referred to as “KIR calibrator” according to the invention.
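The way in which the probe sequences of one probe set, without primer binding sites, are joined immediately adjacent to each other can be sketched as follows. The primer binding site sequences are those given above. It is an assumption of this sketch that the primer binding site lies at the extreme 5′ end of the left probe and at the extreme 3′ end of the right probe; the target-specific sequences in the example are hypothetical and the function names are illustrative.

```python
LEFT_PRIMER_SITE = "GGGTTCCCTAAGGGTTGGA"
RIGHT_PRIMER_SITES = ("TCTAGATTGGATCTTGCTGGCAC", "TCTAGATTGGATCTTGCTGGCGC")


def strip_left_primer_site(left_probe):
    """Remove the primer binding site from the 5' end of a left probe."""
    if not left_probe.startswith(LEFT_PRIMER_SITE):
        raise ValueError("left probe lacks the expected primer binding site")
    return left_probe[len(LEFT_PRIMER_SITE):]


def strip_right_primer_site(right_probe):
    """Remove the primer binding site from the 3' end of a right probe."""
    for site in RIGHT_PRIMER_SITES:
        if right_probe.endswith(site):
            return right_probe[:-len(site)]
    raise ValueError("right probe lacks the expected primer binding site")


def joined_probe_sequence(left_probe, right_probe, middle_probe=""):
    """Join left, optional middle and right probe sequences end to end."""
    return (strip_left_primer_site(left_probe) + middle_probe
            + strip_right_primer_site(right_probe))


# Hypothetical target-specific sequences, for illustration only.
left = LEFT_PRIMER_SITE + "ACGTACGTAC"
right = "TTGGCCAATT" + RIGHT_PRIMER_SITES[0]
print(joined_probe_sequence(left, right))          # ACGTACGTACTTGGCCAATT
print(joined_probe_sequence(left, right, "GGGG"))  # ACGTACGTACGGGGTTGGCCAATT
```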
Generally, one will be interested in determining the copy number of more than one KIR gene, for instance for determining the KIR haplotype of an individual, or for determining predisposition to a disorder which is associated with the presence or absence or copy number of more than one KIR gene. A KIR calibrator according to the invention has the advantage that multiple nucleic acid sequences, each of which are at least 70% identical to part of a given KIR gene of interest, are included. Preferably, sequences specific for all known KIR genes are located on a calibrator according to the invention. Thus, a KIR calibrator preferably comprises for each known KIR gene a nucleic acid sequence which is at least 70% identical to a part with a length of at least 10 nucleotides of said KIR gene. In that case, only one KIR calibrator according to the invention needs to be present in a reference sample to determine the copy number of all KIR genes of interest.
A KIR calibrator may consist of a single nucleic acid molecule according to the invention. However, a KIR calibrator may also comprise multiple nucleic acid molecules according to the invention. Most preferably, although not necessarily, a KIR calibrator according to the invention consists of one vehicle or plasmid according to the invention. A nucleic acid molecule according to the invention therefore preferably comprises nucleic acid sequences of the probes without primer binding sites of at least two probe sets of
all nucleic acid sequences of
all nucleic acid sequences of
all nucleic acid sequences of
all nucleic acid sequences of
all nucleic acid sequences of
all nucleic acid sequences of
any combination thereof, and
any complementary sequences thereof. Such preferred KIR calibrator comprises a nucleic acid sequence of part of each KIR gene currently known. A reference sample comprising such KIR calibrator is thus particularly suitable to determine the presence or absence and copy number of each currently known KIR gene and can thus be used to determine the KIR haplotype, including gene copy number, of any individual. This is for instance demonstrated in Example 5, which describes the determination of the KIR haplotype, including copy number variation, of two siblings of two different families using a KIR calibrator according to the invention. Without the use of such KIR calibrator, it would only have been possible to determine the presence or absence of each KIR gene and not the absolute copy number, because no reference samples are currently available comprising known quantities of all KIR genes.
The sequence of the probes, without primer binding sites, of all probe sets of
The construction of a non-limiting example of a KIR calibrator according to the invention is described in Example 4. This specific KIR calibrator comprises nucleic acid sequences corresponding to part of each currently known KIR gene, and nucleic acid sequences corresponding to part of seven control genes which are known to have a constant copy number in the human genome of 2. The KIR calibrator described in Example 4 comprises the sequences of the probes of all probe sets of
A KIR calibrator according to the invention may have spacer sequences between nucleic acid sequences or complement thereof which correspond to part of a KIR gene and/or a control gene. Preferably, in such nucleic acid molecule, most variation in sequence is allowed in spacer nucleic acid sequences. Nucleic acid sequences which correspond to part of a KIR gene or to part of a control gene or complement thereof preferably have at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95% sequence identity with the corresponding sequence in
The invention also provides a use of a nucleic acid molecule or a vehicle or plasmid according to the invention for determining the copy number of at least one KIR gene in an individual and/or for determining a KIR haplotype of an individual.
The invention further provides a method for determining the copy number of at least one KIR gene of an individual comprising:
amplifying a sequence with a length of at least 10 nucleotides of said at least one KIR gene using a sample of said individual and amplifying a sequence with a length of at least 10 nucleotides of said at least one KIR gene using a reference sample, said reference sample comprising a nucleic acid molecule or a plasmid according to the invention, and
amplifying a sequence with a length of at least 10 nucleotides of at least one control gene using said sample of said individual and amplifying a sequence with a length of at least 10 nucleotides of said at least one control gene using said reference sample;
determining a level of amplified product of said sequence of said at least one KIR gene from said sample of said individual and determining a level of amplified product of said sequence of said at least one KIR gene from said reference sample; and
determining a level of amplified product of said sequence of said at least one control gene from said sample of said individual and determining a level of amplified product of said sequence of said at least one control gene in said reference sample; and
comparing said levels of amplified products of said sequences of said at least one KIR gene with each other and with said levels of amplified products of said sequences of said at least one control gene, thereby determining the copy number of said at least one KIR gene.
Said part of said at least one KIR gene and said part of said at least one control gene preferably comprise at least 10, more preferably at least 15, more preferably at least 18, more preferably at least 19, more preferably at least 20 nucleotides. As described herein before, in an MLPA method according to the invention, preferably chemically synthesized MLPA probes are used with a length of between 20 and 110 nucleotides because such probes can be synthesized easily and cost-effectively. Therefore, if the copy number of at least one KIR gene is determined using MLPA, the KIR gene-specific and control gene-specific sequences located on a calibrator according to the invention preferably have a length of between 40 and 330 nucleotides. Most preferably, such nucleic acid sequences have a length of between 90 and 300 nucleotides.
As described herein before, a KIR calibrator is particularly suitable for determining the KIR haplotype and/or copy number of KIR genes using an MLPA method. Therefore, in a preferred embodiment a method according to the invention further comprises the steps of:
a) adding to said sample of said individual and to said reference sample at least one probe set selected from
b) optionally, adding to said sample of said individual and to said reference sample at least one probe set selected from
c) allowing hybridization of said probe set or probe sets to complementary nucleic acid of said sample of said individual, and
d) allowing hybridization of said probe set or probe sets to complementary nucleic acid of said reference sample, and
e) subjecting nucleic acid of said sample of said individual, and nucleic acid of said reference sample to a ligation reaction.
Said method preferably further comprises amplifying ligated nucleic acid and determining levels of amplified products, thereby determining the copy number of at least one KIR gene of said individual. In a preferred embodiment, a method according to the invention comprises the use of at least one of the probe sets selected from
The invention also provides a method for determining a KIR haplotype of an individual comprising determining the copy number of at least 5, preferably at least 10, more preferably at least 15, most preferably all KIR genes of said individual with a method according to the invention.
It is described herein before in detail how the copy number of a gene is determined based on the level of amplified product of part of said gene in a test sample and in a reference sample and the level of amplified product of part of at least one control gene in said test sample and said reference sample. Briefly, the difference between the peak intensities of each amplified control gene product is determined by comparing for each control gene the intensity peak of amplified product in a test sample and the intensity peak of amplified product in a reference sample. The difference between the peak intensity of each amplified KIR gene is also determined by comparing for each KIR gene the intensity peak of amplified product in said test sample and the intensity peak of amplified product in said reference sample. Subsequently, the copy number of the KIR gene is determined based on the proportions of the peak intensities of the KIR gene in test and reference sample and the proportion of the peak intensities of the control gene in test and reference sample.
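The calculation summarized above can be sketched as follows. This is a minimal illustration only, not the procedure used in the Examples. The function name, the averaging over control genes, and the assumption that the reference sample carries the KIR and control sequences in equal amounts (so that the control gene copy number of 2 sets the scale) are assumptions introduced here, and the peak intensities are hypothetical.

```python
from statistics import mean


def estimate_copy_number(kir_test, kir_ref, ctrl_test, ctrl_ref, ctrl_copies=2):
    """Estimate the copy number of one KIR gene from peak intensities.

    kir_test, kir_ref: peak intensity of the KIR product in the test and
        reference sample.
    ctrl_test, ctrl_ref: peak intensities of the control gene products in the
        test and reference sample, listed in the same order.
    ctrl_copies: copy number of the control genes in the test genome
        (assumed to be 2).
    """
    if not ctrl_test or len(ctrl_test) != len(ctrl_ref):
        raise ValueError("control peak lists must be non-empty and of equal length")
    # Proportion test/reference for each control gene, averaged over all controls.
    ctrl_ratio = mean(t / r for t, r in zip(ctrl_test, ctrl_ref))
    # Proportion test/reference for the KIR gene.
    kir_ratio = kir_test / kir_ref
    # The control proportion corresponds to ctrl_copies copies per genome.
    return ctrl_copies * kir_ratio / ctrl_ratio


# Hypothetical peak intensities in arbitrary units, not measured data.
print(estimate_copy_number(1000.0, 1000.0, [2000.0, 4000.0], [1000.0, 2000.0]))
```

In practice the result would be rounded to the nearest whole copy number, and a value far from an integer would suggest that the probe or the sample needs closer inspection.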
KIR polymorphisms have been associated with disease. Associations between KIR polymorphisms and subtypes of leukemia were investigated by Zhang et al. (Zhang et al. 2009). The presence of KIR2DS4 was demonstrated to be predisposing to chronic myelogenous leukemia (CML) and the absence of KIR2DS3 was predisposing to acute lymphoblastic leukemia (ALL). KIR2DS4 is present in haplotype A, whereas KIR2DS3 is present in haplotype B. Presence of KIR2DS4 and absence of KIR2DS3 are predisposing to leukemia subtypes. Thus, characteristics of haplotype A are predisposing to leukemia subtypes. The present invention provides probes that are particularly well suited for detecting KIR genes, including KIR2DS4 and KIR2DS3. Thus, with probes according to the present invention selected from
Therefore, in one embodiment the invention provides a method for determining predisposition to leukemia of an individual comprising determining the presence or absence of KIR2DS4 and/or KIR2DS3 in a nucleic acid sample of said individual with at least one probeset listed in
Associations between KIR polymorphisms and inflammatory bowel disease (IBD) and/or Crohn's disease have been established as well (Hollenbach et al 2009). The KIR2DL2/KIR2DL3 heterozygous genotype predisposes to or protects from Crohn's disease depending on the presence of their HLA-C ligands. KIR2DL2/KIR2DL3 heterozygosity in combination with C1 predisposes to Crohn's disease whereas KIR2DL2/KIR2DL3 heterozygosity in combination with C2 protects from IBD and/or Crohn's disease. KIR2DL2/KIR2DL3 heterozygosity in combination with C1/C2 heterozygosity has an intermediate effect on predisposition (Hollenbach et al 2009). Non-limiting examples for determining the presence or absence of C1 and/or C2 are detecting nucleic acid sequence(s) encoding C1 and/or C2 protein using for instance a nucleic acid amplification reaction or detecting C1 and/or C2 protein using for instance Western blot analysis.
The present invention provides probes that are particularly suitable for detecting KIR genes, including KIR2DL2 and KIR2DL3. Thus, with probes according to the present invention selected from
Therefore, in one embodiment the invention provides a method for determining predisposition to IBD and/or Crohn's disease of an individual comprising determining the presence or absence of KIR2DL2 and/or KIR2DL3 in a nucleic acid sample of said individual with at least one probeset listed in
Copy number variation of KIR2DL3, KIR3DL1 and KIR3DS1 is correlated to the course of disease in chronic infection, such as retroviral infection, herpes virus infection, and hepatitis virus infection, more in particular HIV, CMV, EBV, HSV, HBV and HCV (Martin et al 2007 and Khakoo et al 2004). A higher copy number of KIR3DL1 and/or KIR3DS1 in an individual is indicative for an improved course of the disease and/or response to treatment of chronic infection as compared with a low copy number of KIR3DL1 and/or KIR3DS1 in an individual, and a low copy number of KIR2DL3 in an individual is indicative for an improved course of the disease and/or response to treatment of chronic infection as compared with a high copy number of KIR2DL3 in an individual. Thus, a higher copy number of KIR3DL1 and/or KIR3DS1 in an individual is indicative for an increased survival in chronic infection and a lower copy number of KIR2DL3 in an individual is indicative for increased survival in chronic infection.
The present invention provides probes that are particularly well suited for determining copy number variation of KIR genes, including KIR3DL1 and KIR3DS1. Thus, with probes according to the present invention selected from
Therefore the invention provides a method for determining susceptibility of an individual to course of disease and/or response to treatment in chronic infection, preferably retroviral infection, herpes virus infection, and hepatitis virus infection, comprising determining the copy number of KIR2DL3, KIR3DL1 and/or KIR3DS1 in a nucleic acid sample of said individual with at least one probeset listed in
The presence of KIR2DS4 in a donor is correlated to transplantation-related outcome measures, such as mortality, graft-versus-host, graft-versus-tumor and grafted organ survival in recipients after transplantation. The presence of KIR2DS4 in a donor is indicative for reduced mortality, reduced graft-versus-host, increased graft-versus-tumor and increased grafted organ survival in recipients after transplantation as compared to the absence of KIR2DS4 in a donor. The present invention provides probes that are particularly well suited for determining copy number variation of KIR genes, including KIR3DL1 and KIR3DS1. Thus, with probes according to the present invention selected from
Therefore the invention provides a method for determining predisposition to transplantation-related outcome measures, such as mortality, graft-versus-host, graft-versus-tumor and grafted organ survival of a recipient after transplantation, comprising determining the presence or absence of KIR2DS4 in a nucleic acid sample of a donor for said recipient with at least one probeset listed in
A correlation has been established between the copy number of KIR2DL2 and KIR2DS2 and rheumatoid arthritis (RA) with extra-articular manifestations and rheumatoid vasculitis. A higher copy number of KIR2DL2 and/or KIR2DS2 in an individual was demonstrated to be predisposing for rheumatoid arthritis with extra-articular manifestations and rheumatoid vasculitis (Majorczyk et al 2007, Yen et al 2001). Additionally, rheumatoid arthritis patients positive for KIR2DL3 and negative for KIR2DS3 had earlier disease diagnosis (Majorczyk et al 2007).
The present invention provides probes that are particularly well suited for determining the presence or absence and copy number variation of KIR genes, including KIR2DL2, KIR2DS2, KIR2DL3 and KIR2DS3. Thus, with probes according to the present invention selected from
Therefore in one embodiment the invention provides a method for determining predisposition to rheumatoid arthritis with extra-articular manifestations and rheumatoid vasculitis of an individual comprising determining the copy number of KIR2DS2 and/or KIR2DL2 in a nucleic acid sample of said individual with at least one probeset listed in
Finally, a correlation has been found between the presence or absence or copy number of KIR genes and predisposition to autoinflammation, such as HLA-B27-related enthesitis-related arthropathy, reactive arthritis and psoriasis, in individuals. For instance, KIR3DL2 is increased in spondylarthritides and juvenile enthesitis-related arthritis (Chan et al 2005, Brown 2009). The present invention provides probes that are particularly well suited for determining the presence or absence and copy number variation of KIR genes. Thus with probes selected from
Therefore, in one embodiment the invention provides a method for determining predisposition to autoinflammation, preferably HLA-B27-related enthesitis-related arthropathy and reactive arthritis, psoriasis, in individuals comprising a) determining the presence or absence and/or copy number of a KIR gene indicative for said disorder in a nucleic acid sample of said individual with at least one probeset listed in
In another embodiment the invention provides a method for determining predisposition to spondylarthritides and/or juvenile enthesitis-related arthritis of an individual comprising determining the copy number of KIR3DL2 in a nucleic acid sample of said individual with at least one probeset listed in
The invention is further explained in the following examples. These examples do not limit the scope of the invention, but merely serve to clarify the invention.
A) Left: The numbers of the individuals in the top left pedigree correspond with the numbers of the DNA samples in the table. At the bottom the haplotype is denoted in letters and the legend for the haplotype is displayed below (www.ihwg.org). The CNV of some of the genes was quantified differently by each of the two probe sets; the number before ‘/’ is for probe set 1 and the number after it is for probe set 2.
B1) Interpretation based on SSP-PCR data from CEPH-IHWG and the conventional KIR haplotype model (see also http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.fcgi?id=1347&cmd=kirped&locus_group=1).
B2) Novel haplotype model based on SSP-PCR data obtained from CEPH-IHWG (http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.fcgi?id=1347&cmd=kirped&locus_group=1).
B3) Copy number variation of KIR genes, determined using SSP-PCR data obtained from CEPH-IHWG based on the conventional KIR haplotype model (table 1) and the novel KIR haplotype model (table 2) and copy number variation of KIR genes, determined by KIR-MLPA using the extended probe sets 1 and 2 and the novel KIR haplotype model (table 3).
This Example presents a new method for KIR genotyping.
KIRs are expressed by natural killer (NK) cells and a subset of T cells. NK cells are cells of the lymphoid lineage, but display no antigen-specific receptors. Their main function is to monitor host cells for the presence of MHC class I molecules, which is important for e.g. distinguishing healthy cells from virus-infected or tumor cells. A low expression of MHC class I molecules on host cells, which may for instance occur during viral infections as a result of virus-mediated down-regulation to prevent presentation of viral peptides to CD8 T cells, stimulates NK cells to launch a cytotoxic attack. This phenomenon is also known as the “missing self” theory.
NK cells express a variety of receptors that mediate interactions with MHC class I molecules, including members of the KIR and CD94/NKG receptor multigene families. Interaction between MHC class I molecules and these receptors regulates NK cytotoxicity, generally through the generation of inhibitory signals. The composition of the KIR and CD94/NKG families differs considerably between humans and mice, with KIRs showing the greatest variation in sequence and gene number in man.
KIRs were first discovered through their role in fighting virus infections by natural killer cells, but they are also expressed by a subset of T cells. The KIR gene cluster is located at chromosome 19q13.4 within the leukocyte receptor complex (LRC) and spans a region of about 150 kb. Up to 15 genes plus two pseudogenes have been identified to date. Characteristic of the KIR gene cluster are the variable gene content and an extensive degree of allelic variation. The gene content of unrelated individuals can differ considerably in the number of KIR (pseudo)genes present, but also in the numbers of activating and inhibitory (pseudo)genes. Contractions and expansions by non-reciprocal recombination are the major mechanism behind KIR diversification. KIRs can be divided into two haplotypes, A and B, of which haplotype B has a greater variety in gene content and contains more activating KIR genes. Studies of different ethnic populations show significant differences in the distribution of these two haplotypes. Selective pressures, such as exposure to different pathogens and rapidly evolving MHC class I molecules, appear to be the forces behind this gene diversification. A functional analog is the Ly49 gene family in mice, but KIRs and Ly49 are structurally distinct proteins. KIRs have been identified in different primate species, but they are species-specific and differ in gene content among various species. These findings provide evidence for a rapid evolution and expansion of this gene family.
Another level of relevant variation is the level of expression of KIRs by individual NK cells. Each NK cell expresses only a subset of its KIR gene repertoire and the presence of HLA ligands seems to influence the frequency of NK cells expressing the cognate KIR. A higher frequency of NK cells expressing inhibitory KIRs has been found in individuals in whom the cognate HLA ligand is present. The ligands of some KIRs, in particular those with activating potential, remain to be determined.
Some of these activating KIRs seem to have lower affinity for their cognate HLA class I ligands in comparison with their related inhibitory receptors.
KIRs have been associated with several diseases, but due to the genetic diversity between and within populations and the differences in KIR expression by NK cells, a clear understanding of their role has yet to be reached. KIRs have been reported to play a role in allogeneic hematopoietic stem cell transplantation (HSCT), which is used in the treatment of leukemia. It was suggested that an intentional mismatch between donor KIR and recipient HLA ligands would allow for a graft anti-tumor effect. KIR3DS1 and KIR3DL1 have been reported to be associated with slower progression to AIDS and with several other virus infections, such as hepatitis C virus (HCV) and human cytomegalovirus (CMV) infection. Studies of infection with the protozoan Plasmodium falciparum have also implicated KIRs in malaria. In autoimmune and inflammatory conditions, certain combinations of KIRs and cognate ligands potentially result in higher susceptibility or protection of the host.
The KIR acronym originally stood for killer cell-inhibitory receptor, because the first KIR discovered had an inhibiting effect on NK cells. Nowadays, KIR is an abbreviation for Killer-cell Immunoglobulin-like Receptor, as this family includes both inhibitory and activating receptors. The HUGO Gene Nomenclature Committee (HGNC) is responsible for the naming of KIR genes. Currently the KIR gene family consists of 15 genes and 2 pseudogenes, listed in Table 1 (Marsh et al, 2002). KIR genes are named after the protein structure they encode. The “D” denotes “Domain” and the number 2 or 3 before it indicates the number of extracellular Ig-like domains. “L” indicates a “Long” cytoplasmic tail, “S” indicates a “Short” cytoplasmic tail and “P” indicates a “pseudogene”. The number after the letter L or S denotes the gene encoding this structure. Thus KIR2DL1 encodes a structure with two Ig-like domains and a long cytoplasmic tail. KIR2DL5A and KIR2DL5B are exceptions; they were initially identified as one gene, KIR2DL5. However, these two structurally similar variants were discovered to be located in different regions of the KIR gene cluster and can be inherited separately (Gomez-Lozano et al, 2002).
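The naming convention described above can be illustrated with a short parser. This is a sketch under stated assumptions: the function name, the regular expression and the returned field names are introduced here for illustration and are not part of the HGNC nomenclature itself.

```python
import re

# Matches names such as KIR2DL1, KIR3DS1, KIR2DP1 or KIR2DL5A (illustrative pattern).
KIR_NAME = re.compile(r"^KIR([23])D([LSP])(\d+)([AB]?)$")

# Meaning of the letter that follows the "D".
TAIL = {
    "L": "long cytoplasmic tail",
    "S": "short cytoplasmic tail",
    "P": "pseudogene",
}


def describe_kir_gene(name):
    """Split a KIR gene name into the parts described in the text."""
    match = KIR_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised KIR gene name: {name!r}")
    domains, kind, number, variant = match.groups()
    return {
        "ig_domains": int(domains),   # number of extracellular Ig-like domains
        "type": TAIL[kind],           # L, S or P
        "gene_number": int(number),   # number after L, S or P
        "variant": variant or None,   # A or B, as in KIR2DL5A and KIR2DL5B
    }


print(describe_kir_gene("KIR2DL1"))
```

For KIR2DL1 the parser reports two Ig-like domains and a long cytoplasmic tail, in line with the example given in the text.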
The KIRs that possess long cytoplasmic tails transduce inhibitory signals to the NK cell, owing to the two immunoreceptor tyrosine-based inhibitory motifs (ITIMs) (
The KIR3DL1 and KIR3DL2, with three extracellular Ig-like domains represent the prototypical KIR from which all the others can be derived. KIR genes are organized in nine exons, the order of these exons corresponding to the different functional regions of the protein (
In KIR2DP1 exon 3 is a pseudoexon and exon 4 has an early stop codon. If KIR2DP1 were transcribed, this could result in a KIR protein with only a single Ig (D2) domain. In KIR3DP1 exon 2 is missing due to a deletion. The exons encoding the stalk, TM and cytoplasmic regions are also absent. The three exons coding for the Ig-like domains are intact; however, the leader sequence is missing. No transcripts have been found for KIR2DP1 (Trowsdale et al, 2001) and KIR3DP1. The latter is normally silent, but a recombinant of KIR2DL5A and KIR3DP1 has been found to be transcribed and is predicted to be secreted rather than anchored to the cell membrane (Gomez-Lozano, 2005).
Uhrberg et al. (Uhrberg et al, 1997) identified that the KIR locus in humans appeared to be polygenic and polymorphic. Individuals have a variable KIR gene content, achieved through differences in number of total KIR genes and differences in the amount of activating and inhibitory KIR genes. The mechanism behind the KIR diversification is non-reciprocal recombinations between non-allelic genes leading to expansion and contractions of the KIR locus. Also reciprocal crossing over events are postulated to contribute to the diversity. The KIR locus can be separated into two parts with KIR3DL3 on the centromeric end and the central KIR3DP1 on one half, and KIR2DL4 in the central and KIR3DL2 on the telomeric end on the other half. Inside these two parts of KIR locus, genes are located that are in much stronger linkage disequilibrium, supporting a homologous recombination event (Uhrberg 2005).
Studies worldwide using genomic DNA to determine the presence or absence of KIR genes in populations have contributed an extensive amount of KIR-genotype profiling data. These studies show differences in the frequency of KIR genes between populations of different ethnic backgrounds; the data can be found at www.allelefrequencies.net. The methods used for KIR genotyping are polymerase chain reaction with sequence-specific primers (PCR-SSP), PCR with sequence-specific oligonucleotide probes (PCR-SSOP), multiplex PCR, automated sequencing and mass spectrometry.
KIR genes can be divided in the haplotypes A and B (Carrington et al, 2003). Both haplotypes contain the framework genes KIR3DL3, KIR3DP1, KIR2DL4 and KIR3DL2. These genes are conserved and are present in virtually every individual. Haplotype A is uniform in terms of gene content and is composed of five inhibitory genes (KIR3DL3, KIR2DL3, KIR2DL1, KIR2DL4, KIR3DL1 and KIR3DL2) and only one activating gene, KIR2DS4, as shown in
Haplotype B is more variable than haplotype A and is characterized by one or more of the following genes: KIR2DS2, KIR2DL2, KIR2DL5, KIR2DS3, KIR3DS1, KIR2DL5A, KIR2DS5 and KIR2DS1; conversely, haplotype A is characterized by the absence of these genes. The frequency of both haplotypes is relatively even among populations of different ethnic background. It is possible that some haplotypes cannot be placed in these two categories, as the definition of haplotypes varies between authors and hybrids of haplotypes are possible (Vilches et al, 2002). Distinction between A and B haplotypes is useful in biological and medical settings, as haplotype B has more genes that encode activating KIRs than haplotype A. The haplotypes have been constructed by family segregation analysis, genomic sequencing and gene-order analysis (Shilling et al, 2002).
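The distinction between the two haplotypes can be expressed as a simple presence/absence rule. The sketch below is illustrative only: the function name and the labels "AA" and "Bx" are assumptions introduced here, and the gene set is copied from the list above. Presence/absence data alone cannot tell whether one or both chromosomes carry a B haplotype, which is why the label "Bx" is used.

```python
# Genes that characterise haplotype B, as listed in the text above.
HAPLOTYPE_B_GENES = {
    "KIR2DS2", "KIR2DL2", "KIR2DL5", "KIR2DS3",
    "KIR3DS1", "KIR2DL5A", "KIR2DS5", "KIR2DS1",
}


def genotype_group(genes_present):
    """Classify a KIR gene content profile.

    Returns a tuple (label, b_genes): label is "Bx" if at least one
    haplotype B gene is present and "AA" otherwise; b_genes is the sorted
    list of haplotype B genes that were found.
    """
    found = HAPLOTYPE_B_GENES & set(genes_present)
    return ("Bx" if found else "AA"), sorted(found)


# Gene content of haplotype A as described in the text.
haplotype_a = ["KIR3DL3", "KIR2DL3", "KIR2DL1", "KIR2DL4",
               "KIR3DL1", "KIR2DS4", "KIR3DL2"]
print(genotype_group(haplotype_a))
print(genotype_group(haplotype_a + ["KIR2DS2", "KIR2DL2"]))
```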
Adding another level of genetic diversity to the KIR family is the extensive degree of allelic variation, which is exhibited by all KIR genes. Allelic diversity is generated by substitutions of nucleotides, recombination or gene conversion and point mutations. Activating KIRs and inhibitory KIRs share a high sequence homology. Activating KIRs are believed to be derived from inhibitory KIRs by alterations in sequence, creating a charged residue upstream of a stop codon and eliminating the ITIMs. Due to their more recent evolutionary origin, allelic diversity of activating KIRs is quite limited when compared to inhibitory KIRs, but the variation of activating receptors across ethnic populations is more extensive.
Currently a total of 335 KIR alleles have been identified and can be found at the website: http://www.ebi.ac.uk/ipd/kir (table 2). KIR allele sequences are denoted by an asterisk after the gene name. Differences in the encoded protein sequences are distinguished by the first three digits, the next two digits are used to denote alleles that differ by synonymous differences within the coding sequence (i.e. not resulting in amino acid substitutions) and the last two digits are used for alleles that have differences in the noncoding region, such as introns and promoters. Thus, 3DL1*009 and 3DL1*010 are alleles that encode different protein products and 3DL1*00101 and 3DL1*00102 are alleles that encode the same protein product, but these alleles differ by a synonymous DNA substitution within the coding region (Marsh et al, 2002).
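The digit fields of an allele name can be separated mechanically. The following sketch assumes allele names of the form used in the text (gene name, asterisk, then three, five or seven digits). The function names and the regular expression are introduced here for illustration.

```python
import re

# Gene part (with or without the "KIR" prefix), asterisk, then 3, 5 or 7 digits.
ALLELE = re.compile(r"^(?:KIR)?(\dD[LSP]\d+[AB]?)\*(\d{3})(\d{2})?(\d{2})?$")


def parse_allele(name):
    """Split an allele name into gene, protein, synonymous and noncoding fields."""
    match = ALLELE.match(name)
    if match is None:
        raise ValueError(f"not a recognised KIR allele name: {name!r}")
    gene, protein, synonymous, noncoding = match.groups()
    return {
        "gene": "KIR" + gene,
        "protein": protein,        # first three digits: protein sequence
        "synonymous": synonymous,  # next two digits, or None if absent
        "noncoding": noncoding,    # last two digits, or None if absent
    }


def same_protein(allele_a, allele_b):
    """True if both alleles belong to the same gene and share the protein field."""
    a, b = parse_allele(allele_a), parse_allele(allele_b)
    return a["gene"] == b["gene"] and a["protein"] == b["protein"]


print(same_protein("3DL1*00101", "3DL1*00102"))
print(same_protein("3DL1*009", "3DL1*010"))
```

Applied to the examples in the text, the first comparison finds the same protein field and the second finds different protein fields.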
The ligands for inhibitory KIRs are MHC class I molecules, which are constitutively expressed by most healthy cells, but can be down-regulated in tumors and infected cells allowing killing by NK cells. Interaction of MHC with inhibitory receptors ensures tolerance of NK cells towards self. MHC class I molecules are encoded by human leukocyte antigen (HLA) genes that are located at chromosome 6p21.3 and are polymorphic and display significant variations. KIR genes and HLA genes segregate independently during meiosis, because they are located on different chromosomes. This can lead to interesting HLA and KIR combinations inherited by one individual, but to obtain a functional interaction between receptor and the cognate ligand, they need to be expressed together. This raises the question whether a correlation exists between the genes encoding KIR and HLA. The ligand specificity for activating KIRs is not well defined. The ligands of some activating KIRs have not been identified yet. The activating receptors of KIR2DS2 and KIR2DS1 were reported to have a lower affinity of binding to HLA-C than those of their closely related inhibitory receptors. It is also possible that non-HLA ligands exist for these activating KIRs. The KIRs with a defined cognate ligand are presented in table 3.
The KIR surface protein repertoire in an individual is mainly determined by the KIR genes. Hence, a lack of expression is more likely caused by the lack of that gene than by down-regulation. KIR genes are expressed by NK cells in a clonal manner: each individual NK cell within a person possesses a different combination of KIRs, with a subset of the total KIR gene repertoire being expressed on each individual cell. KIR2DL4 is one notable exception; this gene is ubiquitously expressed on NK cells. The frequency of each expressed KIR may differ between individuals, but is stable over time. For example, the gene KIR2DL1 may be expressed on 50% of the NK cell population of individual A, while in individual B the expression of KIR2DL1 is found on 14% of the NK cell population. One explanation for this difference could be that particular alleles of a gene are expressed more frequently due to the presence of multiple copies of a gene.
This Example presents a new method for KIR genotyping with multiplex ligation dependent probe amplification (MLPA). With this method, KIR genotyping is performed in a rapid and convenient way and the relative number of copies of the KIR genes is also quantified. Copy number variation (CNV) accounts for a substantial amount of genetic variation, resulting in significant phenotypic variations in e.g. transcript levels, and is therefore of functional relevance.
We developed two synthetic MLPA probe sets for the typing of 16 out of the 17 KIR genes KIR2DL1-5, KIR2DS1-5, KIR3DL1-3, KIR3DS1, KIR3DP1 and KIR2DP1. The probes for the KIR genes were designed for different loci to detect most of the alleles. Probesets 1 and 2 are listed in
DNA from unrelated randomly selected Caucasian donors was obtained for this study to test the peak profile of the probes. For the validation of the probes five SSP-PCR KIR typed genomic DNA samples and 11 EBV transformed B cell lines from the 10th International Histocompatibility Workshop were used (Cook et al, 2003): JVM, T7507, OLGA, SAVC, JBUSH, BM16, LBUF, AMALA, BM90, TAB089 and KAS116. The KIR Reference Panel I from the IHWG containing 48 samples from 12 Centre d'Etude du Polymorphisme Humain (CEPH) families, each including 2 parents and 2 children (table 4: KIR typing of the 48 samples and
Probes were designed according to general instructions (www.mlpa.com/protocols.htm). All the probes were manufactured by Invitrogen (Carlsbad, Calif.). The sizes of the probes after ligation (“ligated probes”) are spaced four to five nucleotides apart to separate each amplification product on sequence-type gels; amplification product sizes ranged from 95 to 223 nucleotides. All MLPA probes contain a PCR primer sequence, which is recognized by a universal primer pair. PCR primer sequences were: forward 5′-GGGTTCCCTAAGGGTTGG-3′ and reverse 5′-TCTAGATTGGATCTTGCTGGCAC-3′.
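The size constraints stated above (spacing of four to five nucleotides, product sizes from 95 to 223 nucleotides) lend themselves to an automated check of a probe set design. The sketch below is illustrative: the function name and the example sizes are hypothetical and do not reproduce the sizes listed in the tables.

```python
def check_ligated_sizes(sizes, min_gap=4, max_gap=5, low=95, high=223):
    """Return a list of messages for sizes that violate the design constraints.

    sizes: lengths in nucleotides of the ligated probes of one probe set.
    An empty list means that all sizes are in range and correctly spaced.
    """
    ordered = sorted(sizes)
    problems = []
    # Each product must fall within the stated size range.
    for size in ordered:
        if not low <= size <= high:
            problems.append(f"{size} nt outside {low}-{high} nt")
    # Neighbouring products must be spaced min_gap to max_gap nucleotides apart.
    for shorter, longer in zip(ordered, ordered[1:]):
        gap = longer - shorter
        if not min_gap <= gap <= max_gap:
            problems.append(f"gap of {gap} nt between {shorter} and {longer}")
    return problems


# Hypothetical sizes, for illustration only.
print(check_ligated_sizes([95, 99, 104, 108]))
print(check_ligated_sizes([95, 97, 230]))
```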
The KIR probes were designed to identify and discriminate between the 17 KIR genes listed in table 1, with the exception of KIR2DL5B. No specific probe could be designed for this gene. The probe for KIR2DL5 therefore detects both the KIR2DL5A and KIR2DL5B genes. In addition, probes on alternative sequences and intron sequences were designed, using basic local alignment search tool (BLAST) searches and the IPD/KIR Database, http://www.ebi.ac.uk/ipd/kir. The sizes of the KIR probes can be found in tables 5 and 6.
The targets of the nine control probes are on conserved genes in the human genome: FGF3, BCAS4, LMNA, PARK2, MSH6, GALT, SPG4, IL-4 and NF2. These target genes were shown to exhibit no considerable variation between donors in a previous MLPA study at Sanquin. Control 1 and 10 were initially 88 bp and 130 bp respectively, but have been elongated to 180 bp and 223 bp to distribute the control probes more evenly among the KIR probes. Table 7 shows the list of the genes and the sizes of the control probes.
Competitor probes were designed for probes whose signal was off-scale for the capillary electrophoresis apparatus; they are listed in table 8.
All DNA samples were diluted to 20 ng/μl with water and 5 μl was denatured at 98° C. for 5 minutes in 200 μl tubes in a Biometra T-1 Thermoblock with heated lid. MLPA reagents (EK kit 5) were obtained from MRC-Holland (Amsterdam, The Netherlands). SALSA MLPA buffer (2 μl) and 1-10 fmol of each MLPA probe in a probe mixture (1 μl) were added and incubated for 1 minute at 95° C., followed by 16 hours at 60° C. in a total volume of 10 μl. Ligation of the hybridized probes was performed by reducing the temperature to 54° C., adding 32 μl Ligase-65 mix (3 μl ligase buffer A, 3 μl ligase buffer B, 1 μl Ligase-65 and 25 μl water) and incubating for 15 min. After inactivating the enzyme at 98° C. for 5 min, 10 μl of the ligation reaction mixture was diluted with 4 μl PCR Buffer and 26 μl water at 4° C. in 200 μl tubes. For the PCR reaction, 10 μl of polymerase mix (0.5 μl polymerase, 2 μl SALSA enzyme dilution buffer, 2 μl SALSA PCR-primers and 5.5 μl water) was added at 60° C. PCR amplification of the ligated MLPA probes was performed for 36 cycles (30 sec 95° C., 30 sec 60° C., 60 sec 72° C.) followed by an incubation for 20 min at 72° C.
1 μl PCR product was added to new tubes containing 0.4 μl Promega Rox size standard 60-400 bp and 8.6 μl High Definition buffer. The products were separated according to their molecular weight by capillary electrophoresis on an Applied Biosystems Genetic Analyzer 3130XL, and the resulting electropherogram shows specific peaks that correspond to each probe.
Data were visualized with Genemapper v3.6 and normalized with SoftGenetics Genemarker v1.6, using internal control probe normalization (http://www.softgenetics.com/papers/MLPA). Finally, these data were exported to an Excel file.
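The volumes and cycling conditions of this protocol can be written down as data and checked for internal consistency. The sketch below is illustrative only. The variable and function names are hypothetical, and the volume of ligase buffer B (3 μl) is an assumption inferred from the stated 32 μl total of the Ligase-65 mix.

```python
# Volumes in microlitres, taken from the protocol text.
LIGASE_MIX = {
    "ligase buffer A": 3.0,
    "ligase buffer B": 3.0,   # assumption: inferred from the 32 ul total
    "Ligase-65": 1.0,
    "water": 25.0,
}
PCR_DILUTION = {"ligation product": 10.0, "PCR buffer": 4.0, "water": 26.0}
POLYMERASE_MIX = {
    "polymerase": 0.5,
    "SALSA enzyme dilution buffer": 2.0,
    "SALSA PCR-primers": 2.0,
    "water": 5.5,
}

# PCR programme: (temperature in degrees C, duration in seconds) per cycle step.
PCR_CYCLE = [(95, 30), (60, 30), (72, 60)]
PCR_CYCLES = 36
FINAL_EXTENSION_S = 20 * 60


def total_volume(mix):
    """Sum the component volumes of a mix in microlitres."""
    return sum(mix.values())


def pcr_hold_minutes():
    """Programmed hold time in minutes, ignoring ramping between temperatures."""
    seconds_per_cycle = sum(seconds for _, seconds in PCR_CYCLE)
    return (PCR_CYCLES * seconds_per_cycle + FINAL_EXTENSION_S) / 60


print(total_volume(LIGASE_MIX))        # stated total in the text: 32 ul
print(total_volume(POLYMERASE_MIX))    # stated total in the text: 10 ul
print(total_volume(PCR_DILUTION) + total_volume(POLYMERASE_MIX))
print(pcr_hold_minutes())
```

The amount of input DNA follows in the same way from the stated concentration and volume: 20 ng/μl × 5 μl = 100 ng per reaction.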
All the MLPA probes were initially tested on randomly chosen donors. We first examined whether the probes generated a signal and whether these signals corresponded with the expected size of each probe. The control probe peaks and the probe peaks for the four framework genes, KIR2DL4, KIR3DL2, KIR3DL3 and KIR3DP1, occurred in all samples, as expected. KIR gene content variation between individuals was observed when different samples were compared.
Secondly, the intensity of the probe signal was examined. The peak patterns were visualized with Genemapper to observe the peak intensities before normalization. Genemarker was used to normalize the data and to correct for the signal decay of larger probes, but it does not indicate where signals are off-scale. A probe signal between 500 and 6000 AU is preferred in order to obtain a more reliable DQ value. Fluorescent peaks with a signal below 500 AU may not always be detected when more probes are added to the reaction, whereas peaks above 6000 AU can be off-scale for the sequencer and relatively decrease the signal of other probes. Several suggestions have been described to enhance or lower probe intensity; the nucleotide composition next to the PCR primer tag sites and the GC content of a probe are among the factors that can be of influence (www.mlpa.com/protocols.htm). In general, competitors are used to reduce probe signals and a higher probe concentration is used to increase them. Competitors are oligonucleotides that are identical to one part of the MLPA probe but lack the forward or reverse primer sequence, depending on whether the left or right part is chosen. Competitors compete with the MLPA probe for the same target, but ligated products containing a competitor are not amplified, since they lack a primer sequence. As a result, less probe amplification product is detected and a lower peak intensity is obtained.
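The preferred signal window described above lends itself to a simple check. The Python sketch below is a hypothetical helper, not part of Genemapper or Genemarker; it only encodes the 500-6000 AU window and the two general remedies mentioned in the text.

```python
# Illustrative classification of raw peak heights against the preferred window.
# Function and constant names are hypothetical.

LOW_AU = 500    # below this, a peak may be missed when more probes are added
HIGH_AU = 6000  # above this, a peak can be off-scale for the sequencer


def classify_peak(height_au: float) -> str:
    """Return 'low', 'ok' or 'off-scale' for a raw peak height in AU."""
    if height_au < LOW_AU:
        return "low"
    if height_au > HIGH_AU:
        return "off-scale"
    return "ok"


def suggest_adjustment(height_au: float) -> str:
    """Map a peak class to the general remedy named in the text."""
    remedies = {
        "low": "increase probe concentration",
        "off-scale": "add a competitor",
        "ok": "no adjustment",
    }
    return remedies[classify_peak(height_au)]


if __name__ == "__main__":
    # Made-up peak heights, for demonstration only.
    for height in (320, 2500, 7400):
        print(height, classify_peak(height), "->", suggest_adjustment(height))
```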
Competitors were designed for control probes 2, 3, 4, 7 and 9 and, initially, also for the KIR probes 2DL4 and 3DL3 (probe set 1) and 3DL2 (probe set 2). These probes had a length of 96 bp, 100 bp and 108 bp, respectively. However, we observed a decrease in peak intensity that corresponded more or less with an increase in probe size. Longer synthetic probes are more likely to contain a higher proportion of incomplete oligonucleotides. It therefore seemed an option to lengthen probes with high peak intensities and to shorten probes with low peak intensities. Probe 2DL4 was redesigned to 170 bp and 3DL3 to 154 bp, which resulted in lower peak intensities. The peak attributed to probe 3DL3 (100 bp) was not affected by its competitor and was apparently a product of probe 2DS3 (108 bp), because the off-scale signal was reduced to normal when the latter probe was removed from probe set 1. Furthermore, competitors with a length of 30 bp had less effect than those with a length of 50 bp, so that a higher dosage of the shorter competitors was needed to reduce the probe signal (data not shown).
For probes that failed to generate a signal, or for which the signal was insufficient, the following measures were taken: a three- to ten-fold higher concentration of these probes was used, and probes with a high sequence overlap were not included in the same probe set. According to the MLPA design protocol, placing two cytosine nucleotides after the forward primer should increase the probe's signal and a thymine base should decrease it. However, in our experiments several probes were redesigned to contain two cytosines after the forward primer, and this did not produce the reported effect. Probes that still failed to generate a signal after these measures and after testing on a larger number of donors were replaced by probes on the reverse strand of the target gene or by probes with a different target location on that gene.
The frequency with which each KIR gene probe produced a peak in the tested samples was compared with the KIR gene frequencies in the Caucasian population available on www.allelefrequencies.net (table 9). Probes whose observed frequencies contradicted the population frequencies were assumed to give false negative or false positive results and were replaced by new designs. These errors were assumed to be caused by gene variation at the ligation sites of the probe.
The list of the alleles that can be detected by the KIR probes and the coverage of the total KIR alleles by the probes are shown in table 10.
Other Factors Interfering with Peak Intensities
We experienced differences in probe quality between probes manufactured by different companies. The nine control probes were initially ordered from Biolegio (www.biolegio.com), which had also supplied them for the C4 MLPA project previously carried out here. All the KIR MLPA probes were ordered from Invitrogen (www.invitrogen.com). This control probe set was separated into two mixes: control probes 1 (IL-4), 2 (FGF3), 3 (BCAS4), 4 (LMNA), 5 (PARK2) and 7 (MSH6) in one, and control probes 8 (GALT), 9 (SPG4) and 10 (NF2) in the other. The concentration needed for each control probe ranged from 0.5 fmol to 6 fmol, and different concentrations of competitors were also needed.
The control probes used for the KIR MLPA were ordered from Invitrogen. Only 1 fmol of each control is needed, with the exception of control probe 5 (3 fmol), to obtain the same peak intensity as mentioned above, and the probes do not need to be separated into two mixes. Owing to the better probe quality, time is saved in producing the probe sets.
An MLPA reaction with 50 ng of DNA was performed and compared with the 100 ng that is used throughout this study. MLPA reactions using a DNA amount of 20 ng have been reported by Schouten et al. (Schouten et al, 2002). When the peak profiles were compared, no striking differences between these two reactions were observed. The DQ values of the nine control probes were calculated for each sample, with a sample containing 100 ng DNA taken as reference. Seven out of eight samples containing 50 ng of DNA showed a DQ value outside [0.8-1.2] for more than three control probes, with values ranging from [0.3-1.5] within one sample. In contrast, all eight samples containing 100 ng of DNA had DQ values within the acceptable range [0.8-1.2] for all nine control probes, with the exception of one sample that had two control probe DQ values outside this range. We conclude that MLPA reactions with different amounts of DNA cannot be compared with each other, because the same sample did not yield the same DQ values with the different DNA amounts.
Next, the samples containing 50 ng of DNA were compared among themselves, taking a sample with 50 ng DNA as reference. Three of the eight samples had more than three control probes with a DQ value outside the range [0.8-1.2]. When the nine control DQ values of one sample were analyzed, values between [0.5-1.7] were found. MLPA reactions carried out with 50 ng of DNA were therefore considered unreliable, as the DQ values of the probes showed great variation between samples and within one sample, which was not observed with the samples that contained 100 ng of DNA. The requirement for higher amounts of DNA in this study could be explained by the fact that we used a completely synthetic probe set, in contrast with the probe sets used by Schouten et al (Schouten et al, 2002). Moreover, most studies carried out with small amounts of DNA analyzed only chromosomal abnormalities, such as recombination or mutations, and did not quantify copy numbers.
Samples of different runs were not always comparable when the DQ values of the control probes were calculated. The explanation is that the experimental conditions may vary between runs, owing to operator handling or differences in probe signal reproducibility. Therefore, samples within the same run are preferably normalized and analyzed first, before the data are compared with samples of a different run. Reference samples with more or less established relative gene copy numbers are preferably included in each experiment.
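The comparison of control probe DQ values described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the example DQ values are made up, and the cut-off of "more than three" deviating control probes is used here as a flag only because the text reports samples in those terms.

```python
from typing import Dict, List

# Illustrative check of control probe DQ values against the range [0.8-1.2].
# Function names, the threshold default and the example values are hypothetical.


def outlier_controls(control_dq: Dict[str, float],
                     low: float = 0.8, high: float = 1.2) -> List[str]:
    """Return the names of control probes whose DQ lies outside [low, high]."""
    return [name for name, dq in control_dq.items() if not (low <= dq <= high)]


def sample_is_consistent(control_dq: Dict[str, float],
                         max_outliers: int = 3) -> bool:
    """Flag a sample as consistent if at most `max_outliers` controls deviate."""
    return len(outlier_controls(control_dq)) <= max_outliers


if __name__ == "__main__":
    # Made-up DQ values for the nine control probes of two samples.
    sample_a = {"IL-4": 1.0, "FGF3": 0.9, "BCAS4": 1.1, "LMNA": 1.2, "PARK2": 0.8,
                "MSH6": 1.0, "GALT": 0.7, "SPG4": 1.0, "NF2": 1.1}
    sample_b = {"IL-4": 0.3, "FGF3": 1.5, "BCAS4": 0.5, "LMNA": 1.4, "PARK2": 1.0,
                "MSH6": 1.0, "GALT": 0.9, "SPG4": 1.1, "NF2": 1.0}
    print(outlier_controls(sample_a), sample_is_consistent(sample_a))
    print(outlier_controls(sample_b), sample_is_consistent(sample_b))
```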
Validation with KIR Typed DNA Samples
The specificity of the KIR probes was verified by testing 11 EBV-transformed cell lines that had been KIR-genotyped by the 10th International Histocompatibility Workshop (IHW) (Cook et al, 2003). The KIR genotyping of the cell lines was carried out with PCR-SSP and PCR-SSOP in three separate laboratories. The cell lines were not genotyped for the genes KIR2DL5A, KIR3DL3, KIR2DP1 and KIR3DP1, and the panel contained no negative controls for the genes KIR2DL1, KIR2DL4, KIR3DL1, KIR3DL2 and KIR2DS4.
In addition, DNA samples from 5 individuals were genotyped by PCR-SSP for further verification. These 5 samples were also genotyped for the genes KIR3DL3 and KIR3DP1 and found to contain true negative genotypic results for KIR2DL1 and KIR2DP1. The results of the verification of the two probe sets are shown in tables 11-14.
KIR genotyping with probe set 1 was consistent with the 10th IHW typing for 10 of the cell lines for the probes 2DL1-5, 2DS1, 2DS3-5, 3DL1-2 and 3DS1. All cell lines were typed positive for the genes KIR2DP1, KIR3DP1 and KIR3DL3; the first has a frequency between 94 and 100% (table 9) and the last two are framework genes that are always present. Typing of the 5 individuals yielded the same results as PCR-SSP, except for probe 2DS2.
Most studies on KIR genotyping detect the presence of KIR2DL5 and do not differentiate between the two genes KIR2DL5A and KIR2DL5B. These two genes differ in nucleotide sequence by only 1%. We were unable to design a probe for KIR2DL5B, because no specific ligation site was found that discriminates KIR2DL5B from KIR2DL5A and the other KIR genes. The probes that were designed for KIR2DL5A also detect the allele KIR3DP1*004 (table 10), because this allele contains no sequence difference within the probe's range; thus the probe sets do not contain probes for the selective detection of KIR2DL5A. In fact, KIR3DP1*004 is not expressed and forms a hybrid of the promoter of KIR2DL5A and the coding region of KIR3DP1. A signal from probe 2DL5A in the MLPA could therefore indicate the presence of both KIR2DL5A and KIR3DP1*004, or of either KIR2DL5A or KIR3DP1*004 alone. However, probe 2DL5 detects the same KIR2DL5A alleles as probe 2DL5A. When probe 2DL5 does not bind and probe 2DL5A does, the absence of KIR2DL5A and the presence of KIR3DP1*004 are demonstrated. This is clearly shown by the cell lines JVM, SAVC, JBUSH, BM16, TAB089, KAS116 and the individuals 33-8025 and 33-8588 (
Probe set 2 contains a smaller number of probes. A higher proportion of its probes had overlapping sequences, and seven out of the ten KIR probes needed a 10-fold higher concentration than the others to obtain peak intensities above 500 AU.
Probes 2DS5 and 3DS1 gave a signal in all samples, including those genotyped negative for KIR2DS5 and KIR3DS1, indicating unspecific ligation of the probes. Probes 2DS5 and 3DS1 were not based on primer sequences used before; because the probe search tool of the KIR database and the BLAST results showed no match with other KIR genes, these probes had been considered specific for KIR2DS5 and KIR3DS1. No explanation could be found for the false positive results. These probes were excluded from probe set 2.
Three out of the six cell lines that were negative for KIR2DS1 were typed positive by probe 2DS1, while the two negatives among the PCR-SSP-typed individuals were correctly typed. The target of probe 2DS1 lies in an intron, and only little information about intron sequences is available. It cannot be excluded that other KIR genes possess the same sequence at this position, and therefore this probe is not included in the probe set.
The probe 3DP1 in probe set 2 detects a deletion of exon 2; this allele of KIR3DP1 is designated KIR3DP1*003 and has a frequency of 0.72 in the Caucasian population. Sample 33-8588 of the PCR-SSP-typed individuals was typed negative for KIR3DP1 by the MLPA probe and positive by PCR-SSP (table 14). The conflicting typing results of the two methods can be explained by the presence of exon 2 in this sample.
Both probe sets genotyped the cell line LBUF positive for KIR2DL3 and negative for KIR2DL5 and KIR2DS. In addition, probe set 1 typed LBUF negative for KIR2DS1, KIR2DS5 and KIR3DS1 (tables 11 and 13). It is reasonable to assume that the LBUF cell line that we tested was not the same as the one published by the 10th IHW. LBUF had been KIR-genotyped by Hsu et al. (Hsu et al, 2002), and their typing was consistent with ours. Moreover, LBUF and the other cell lines were KIR-genotyped with the standard PCR-SSP method, and these results confirmed our findings with MLPA, including the positive typing results for the genes KIR3DL3, KIR2DP1 and KIR3DP1 in all 11 cell lines.
For the verification of gene copy number quantification, samples with a well-defined number of copies of KIR genes were needed. Since these are not available, we used the KIR reference panel I for this purpose, comprising 12 families of two parents and two children each. These 48 reference samples have been KIR-genotyped by 15 different laboratory groups utilizing PCR-SSP and PCR-SSOP. The Centre d'Etude du Polymorphisme Humain (CEPH), Fondation Jean Dausset, Paris, France (www.cephb.fr), had prepared lymphoblastoid cell lines (LCLs) of these families. The International Histocompatibility Working Group (IHWG) Cell and DNA Bank has made this panel available for commercial use (www.ihwg.org).
All samples have been typed for the presence or absence of 16 of the KIR genes, for two variants of KIR3DP1 (KIR3DP1*003 and KIR3DP1v) and for two variants of KIR2DS4 (KIR1D, alias KIR2DS4*003, and KIR2DS4) (table 4). In the KIR reference panel I, KIR3DP1 denotes the allele characterized by the absence of exon 2, and KIR3DP1v indicates the remaining KIR3DP1 alleles. KIR1D contains a 22-bp deletion in Ig-like domain D2, causing a frame shift and an early stop codon, which leads to a truncated protein product (Hsu et al, 2002).
The haplotypes of these six families were also available as shown in
With both probe sets, difficulties were experienced in generating reliable MLPA data with the KIR reference panel. This is presumably caused by the lower quality of these DNA samples, as it did not occur with the genomic DNA samples of the previous experiments. The DQ values of the control probes fell outside the proposed normal range [0.8-1.2] more frequently. Therefore, data for a number of samples are missing, and these samples should be tested again in the future.
Sixteen probes (2DL1-5A, 2DS1, 2DS3-5, 3DL1-3, 3DS1, 2DP1 and 3DP1) were tested, and the majority of the probes genotyped the KIR reference panel in accordance with what has been reported, except for some differences with probes 2DP1 and 2DL5. These samples were correctly typed by probe set 2.
The probes 2DL1-5A, 2DS2, 2DS4, 3DL1-3, 3DS1, 2DP1 and 3DP1, 14 probes in total, were tested on the reference panel. Probe 3DP1 was designed for KIR3DP1*003 (denoted as 3DP1 in table 4), and its specificity for this allele was confirmed with the reference panel. Probe 2DL2 gave approximately 58% false positives, and probe 2DL1 typed three of the four negative samples of the panel as positive; therefore, no further testing was done with these two probes. Probe 2DS2 typed around 15% incorrectly as negative, although in a previous run, which was rejected because of the DQ values of the controls, these two samples were typed positive. These samples need to be retested before a conclusion about probe 2DS2 can be drawn. Probe 2DS4 gave one false negative result (sample 1333-8281). Only 80% of the KIR2DS4 alleles can be detected by this probe because of a gene variant that is 4 bases away from the ligation site in 1 out of 9 alleles. The right part of this probe will be redesigned with an IUB code at this position.
Probes that have been demonstrated to be accurate in KIR genotyping in both probe sets have been analyzed for their ability in copy number quantification. Relative quantification of CNV with one probe is simply not reliable because gene variations near the ligation site of the probe may influence the outcome in DQ value. This is especially true for KIR sequences, because they show a high level of gene variation, while demonstrating a homology up to 99%. Certain probes discriminate the different KIR genes only by one nucleotide difference at their ligation site. A gene variant near the ligation site of the target gene may lead to a lower probe signal. Alternatively, a gene variant at one of the other KIR genes might cause a probe to recognize this gene as its target, thus enhancing the probe signal. Therefore only the KIR genes of the families with the reported haplotype and the complete MLPA data of the two probes are analyzed for copy numbers.
The DQ values of the control probes of both probe sets on each sample were compared to check whether the MLPA data are reliable. The nine control probes should generate the same DQ values in both probe sets, as the control probes are identical and are tested on the same sample. Samples with fewer than seven comparable control probe DQ values between the two probe sets were excluded. Next, the DQ values of the KIR probes were evaluated. We interpreted the values as follows: DQ of 0.3 or lower = 0 copies of that gene, DQ [0.4-0.7] = 1 copy, DQ [0.8-1.2] = 2 copies, DQ [1.3-1.7] = 3 copies, DQ [1.8-2.2] = 4 copies, DQ [2.3-2.7] = 5 copies, etc. Borderline values, such as a DQ of 0.7, are questionable; when the second probe clearly quantified 1 copy of the gene, 0.7 was considered to be 1 copy. The same approach was applied to other borderline values.
A difference in the quantification of the exact copy numbers was observed with the probes for KIR3DP1 in samples 1347-8445, 1347-8436 and 1349-8398. Probe set 1 seems to detect more copies of this gene than probe set 2, which is in agreement with the probe design: probe 3DP1(1) detects all the KIR3DP1(v) alleles, whereas probe 3DP1(2) detects only KIR3DP1*003 (denoted in the legend as 3DP1), which exhibits the exon 2 deletion. The probes 2DL3 and 2DL4 in probe set 1 detected fewer copies than their counterparts in probe set 2. These probes in probe set 1 might be affected by gene variants in their target sequence, whereas the corresponding probes in probe set 2 have no gene variants in the probe target sequence and give a coverage of 100% (table 10). The probes for KIR3DL1 quantified the members of family 1349 differently. The probe in probe set 1 covers different alleles than the probe in probe set 2; the coverage rates are 78% and 41%, respectively, owing to gene variants in their target sequence more than 10 bases away from the ligation site, which might influence the binding efficiency and thereby the peak heights. Here too, adding IUB codes to the probe sequence will overcome the problem of misinterpreting copy number differences between individuals.
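The DQ interpretation above amounts to a binning rule, which the following Python sketch illustrates. It is a minimal sketch under stated assumptions: DQ values are taken to one decimal place, "etc." is read as continuing the same five-tenths-wide bins, and a value is treated as borderline when it sits on the edge of its bin. The function names and the tie-breaking rule in `consensus_copies` are hypothetical renderings of the two-probe approach described in the text.

```python
from typing import Optional

# Illustrative mapping from a dosage quotient (DQ) to a relative copy number.
# Assumes DQ values reported to one decimal place; names are hypothetical.


def dq_to_copies(dq: float) -> int:
    """Map a DQ value to a copy number using the bins given in the text."""
    tenths = round(dq * 10)  # work in tenths to avoid float edge effects
    if tenths <= 3:
        return 0             # DQ of 0.3 or lower
    if tenths <= 7:
        return 1             # DQ 0.4-0.7
    # From 2 copies upward each bin spans five tenths: 8-12, 13-17, 18-22, ...
    return 2 + (tenths - 8) // 5


def is_borderline(dq: float) -> bool:
    """Assumption: borderline means a shift of 0.1 either way changes the call."""
    tenths = round(dq * 10)
    return dq_to_copies((tenths - 1) / 10) != dq_to_copies((tenths + 1) / 10)


def consensus_copies(dq_probe1: float, dq_probe2: float) -> Optional[int]:
    """Combine two probes for one gene; return None if they clearly disagree."""
    c1, c2 = dq_to_copies(dq_probe1), dq_to_copies(dq_probe2)
    if c1 == c2:
        return c1
    # Let a clear call from one probe decide a borderline call from the other.
    if is_borderline(dq_probe1) and not is_borderline(dq_probe2):
        return c2
    if is_borderline(dq_probe2) and not is_borderline(dq_probe1):
        return c1
    return None


if __name__ == "__main__":
    for dq in (0.2, 0.5, 0.7, 1.0, 1.5, 2.0, 2.5):
        print(dq, dq_to_copies(dq), is_borderline(dq))
    print(consensus_copies(0.8, 0.5))  # borderline 0.8 resolved by the second probe
```

Returning `None` for a clear disagreement keeps such genes out of the copy number analysis instead of forcing a call, which mirrors the restriction to samples with complete and consistent data.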
Despite the differences in copy number quantification for a number of probes, the overall inheritance pattern of the gene copies was in agreement with the inheritance of the haplotypes. For example, the four framework genes KIR3DL3, KIR3DP1, KIR2DL4 and KIR3DL2 were present in all samples, and at least 2 copies of each of these genes were found. This indicates that these genes are present in at least one copy on each allele and are inherited from both parents. Examination of family 1347 revealed that the father, haplotype a/b (sample 8440), has three copies of gene KIR2DL5 on one allele (haplotype b) and one copy on the other (haplotype a). He has passed haplotype b, with the three copies, to one child (sample 8436) and haplotype a, with one copy, to the other child (sample 8412). For family 1349, one copy of KIR2DS4 is believed to reside on one allele (haplotype c) and two on the other (haplotype d) of the mother (sample 8399). Both children, haplotype b/c and haplotype a/c (samples 8393 and 8636, respectively), inherited the allele with two copies from their mother, as they both have haplotype c, and one child (sample 8636) inherited one copy of this gene from its father, haplotype a. When the inheritance patterns of the remaining gene copy numbers were analyzed, no inconsistency with the inheritance patterns of the haplotypes was found either. The rest of the families with fully reported haplotypes should be tested again to obtain complete data for all members within one family, before the inheritance patterns and copy numbers can be analyzed.
Before the present invention, the main problem in designing synthetic MLPA probes for KIR genotyping was to design probes specific enough for the target gene, yet sensitive enough to detect most of the alleles present in the population. KIR genes have a very high level of homology (85-99%) in the sequences of both exons and introns and show an extensive degree of gene variation.
MLPA is a suitable method, because it can discriminate target sequences that differ by only one nucleotide at the ligation site. The present inventors designed synthetic MLPA probes consisting of three probe parts, which adds a second ligation site and thus provides an extra discrimination point. In addition, these three-part probes made it possible to increase the ligated probe size; the longest probe tested in this study was 223 bp (Ctr 10). Owing to the better quality of the probes and the use of three-part probes, the number of probes in a synthetic MLPA probe set according to the invention is less restricted by the size of the ligated probes.
This study has demonstrated that MLPA with two synthetic probe sets is reliable for KIR genotyping, as the two probe sets have been validated by three independent approaches. The two probe sets complement each other in the detection and coverage of the KIR alleles, so that no false negatives remained in any of the samples used for verification. Even after exclusion of the probes that may have generated false positives, all 16 KIR genes can still be consistently detected for their presence or absence. This makes the MLPA methods used in this Example qualitatively comparable to the PCR-SSP and PCR-SSOP methods. However, time and work are saved with the method of this Example, as only two reactions are needed to generate a complete KIR genotype profile.
In summary, probe set 1 contains the probes 2DL1-5, 2DS1, and 2DS3-5, 3DL1-3, 3DS1, 2DP1 and 3DP1, in total 15 probes. Probe set 2 contains the probes 2DL3-5, 2DS2-4, 3DL1-3, 2DP1 and 3DP1, in total 11 probes. Together these two probe sets are accurate for the typing of 16 KIR genes and for quantifying relative copy numbers of at least 9 KIR genes.
This Example presents additional probes for KIR genotyping and copy number variation analysis with multiplex ligation dependent probe amplification (MLPA). Here, probes are presented for all 17 KIR genes KIR2DL1-5, KIR2DS1-5, KIR3DL1-3, KIR3DS1, KIR3DP1 and KIR2DP1, including KIR2DL5a and KIR2DL5b, KIR3DP1v and several null alleles. The extended probesets 1 and 2 are listed in
DNA selection/isolation, probe design, MLPA reaction, electrophoresis and analysis were performed according to the materials & methods of example 1, with the exception that no competitors were used and that data were normalized with SoftGenetics Genemarker v1.85, using internal control probe normalization (http://www.softgenetics.com/papers/MLPA) and synthetic references.
With the extended probesets 1 and 2 all KIR genes and several KIR gene variants were detected.
The extended probe set 1 depicted in
The extended probe set 2 as depicted in
The probe 3DP1 in extended probe set 2 detects a deletion of exon 2; a KIR3DP1 allele carrying this deletion is designated KIR3DP1*003, KIR3DP1*005 or KIR3DP1*006.
With the extended probesets 1 and 2, KIR2DL5A and KIR2DL5B are now also detected. The probes that were designed for KIR2DL5A and KIR2DL5B also detect the KIR3DP1 variant alleles (table 10, KIR3DP1v). A signal from probe 2DL5A or 2DL5B in the MLPA could therefore indicate the presence of both KIR2DL5A and KIR3DP1v, or of both KIR2DL5B and KIR3DP1v, respectively. Alternatively, such a signal could indicate the presence of either KIR2DL5A or KIR3DP1v alone (with probe 2DL5A), or of either KIR2DL5B or KIR3DP1v alone (with probe 2DL5B). Thus, each of the probes 2DL5A and 2DL5B detects more than one KIR gene. Therefore, these probes are not suitable to determine copy number variation (see
For all KIR alleles except KIR3DP1 variants (KIR3DP1v), KIR2DL5A and 2DL5B copy number variation is determined with extended probesets 1 and 2 (
A difference in the quantification of the exact copy numbers as compared to example 1 was elaborated by studies with the extended probesets. Optimization of the probe set initially used in
From the MLPA data within pedigrees, haplotypes can be inferred. First of all, the framework genes KIR3DL3 and KIR3DP1 for the first block in both haplotypes A and B (
In family 1347, we have deduced, using the extended probesets, from the pedigree a correct and complete KIR haplotype analysis (
At the single gene level, the MLPA results offer insight into the patterns of inheritance. The sibs inherited different KIR haplotypes from their parents, which, for instance, resulted in the variation in KIR2DL5 gene content. Thus, both sibs have 2 of these genes, comprising 2 KIR2DL5 genes from the father (who carries 4 KIR2DL5 genes in total) and one null-haplotype from the mother. From the data in the literature or the current MLPA data, it cannot yet be determined whether the two KIR2DL5 genes that both sibs have inherited are the same alleles, or whether the KIR2DL5 genes are located in the first or second block of the so-called B haplotype (see also
At the haplotype level, patterns of inheritance are deduced for the remaining non-framework KIR genes in this pedigree, e.g. KIR2DL3, KIR2DS2, KIR2DL2, KIR2DP1, and KIR2DL1 genes in the first block of haplotype B, generally located in between the framework genes KIR3DL3 and KIR3DP1 genes (see also
In case of the first block of haplotype B, the results are explained by the inheritance of a KIR2DL3-KIR2DP1-KIR2DL1 haplotype from the father and the KIR2DS2-KIR2DL2-KIR2DP1-KIR2DL1 haplotypic block from the mother.
In case of the second block of haplotype B, it is clear that the KIR3DS1-KIR2DS3-KIR2DS1 haplotype has been inherited from the father and the KIR3DL1-KIR2DS4 haplotype from the mother. Yet one sib (8436) must have lost a KIR3DL1 gene according to our MLPA analysis. Sib 8436 has the normal 3DL1 present in our MLPA, whereas sib 8412 has inherited a 3DL1N variant gene instead of the normal 3DL1 gene. This reflects normal inheritance and is not an exception.
SSP-PCR cannot discriminate between 3DL1 variants, nor between 3DS1 variant genes or 2DL4 variant genes.
At the haplotype level, patterns of inheritance are similarly deduced for the pedigree of family 1349 (
With respect to the first block of haplotype B, the results are explained by the inheritance of one of his two similar KIR2DL3-KIR2DP1-KIR2DL1 alleles from the father and one from the mother (while this female also carried a smaller KIR2DL3-KIR2DP1 haplotypic block).
In case of the second block of haplotype B, it is clear that the father carries a KIR3DL1-KIR2DS4 combination on one allele and a separate KIR2DS3-KIR2DS4-KIR2DS1 haplotypic block on the other allele, which were inherited differently by the two sibs, whereas the mother carries two identical KIR3DL1-KIR2DS4 alleles.
In
Two KIR haplotype models have been described (see for instance: H. Li, PLoS Genetics, 2008, 4, 11:e1000254; M. Uhrberg, Eur. J. Imm. Highlights, 2005, 35:10-15; M. Carrington, The KIR Gene Cluster, 2003; K. Hsu, Imm. Reviews, 2002, 190:40-52). The conventional KIR haplotype model assumes that there are two haplotypes, A and B. Both haplotypes contain the framework genes 3DL3, 3DP1, 2DL4 and 3DL2. In addition, the KIR genes 2DP1, 2DL1 and 2DS4 are common to both haplotypes, but only haplotype A contains 2DL3, 3DL1 and 2DS4. Haplotype B is more variable and can contain the KIR genes 2DS1, 2DS2, 2DS3, 2DS4, 2DS5, 3DS1, 2DL2 and 2DL5 (apart from the aforementioned framework genes). In more than 96% of the global population the A haplotype at the KIR gene cluster contains the KIR genes 3DL3, 2DL3, 2DP1, 2DL1, 3DP1, 2DL4, 3DL1, 2DS4 and 3DL2 (see also: www.allelefrequencies.net).
The novel KIR haplotype model assumes that haplotype A and B are present on the two different chromosomes. Therefore any individual can represent an AA, AB or BB genotype. Based on the genes that are present in the DNA sample of that individual, one can conclude which haplotypes are present and the positive genes from the assay can be divided over both haplotypes according to the rules that certain KIR genes are present only in one of the haplotypes A or B, essentially as was mentioned above.
For the SSP PCR data the two haplotype models are shown to interpret possible CNV results, resp. the conventional KIR haplotype model in FIGS. 11B1 and 12B1 and the novel KIR haplotype model in FIGS. 11B2 and 12B2. FIGS. 11B3 and 12B3 show the results of our MLPA data with the extended probe sets 1 & 2 compared with both the SSP PCR data according to the conventional KIR haplotype model and with the novel KIR haplotype model.
In the conventional KIR haplotype model in FIGS. 11B1 and 12B1, the KIR gene region is described by framework genes (3DL3, 3DP1, 2DL4 and 3DL2), genes that can be present in both A and B haplotypes (2DP1, 2DL1 and 2DS4) and haplotype-specific genes. The KIR genes 2DL3, 3DL1 and 2DS4 are specific for haplotype A, while the KIR genes 2DL5, 2DS1, 2DS2, 2DS3, 2DS5, 3DS1 and 2DL2 are specific for haplotype B. Haplotype A is constant to a high degree. In more than 96% of the global population haplotype A consists of 3DL3, 2DL3, 2DP1, 2DL1, 3DP1, 2DL4, 3DL1, 2DS4 and 3DL2 (www.allelefrequencies.net). Haplotype B is more variable and carries more activating KIR genes.
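The rule that certain KIR genes occur on only one of the two haplotypes can be illustrated with a deliberately simplified Python sketch. It uses only presence or absence of the haplotype-specific genes listed above; the function name is hypothetical, and the sketch ignores copy numbers and pedigree information, which the analysis in this Example additionally relies on.

```python
from typing import Iterable, Optional

# Simplified presence/absence reading of the haplotype-specific gene lists.
# The gene sets follow the lists given in the text; the function is hypothetical.

A_SPECIFIC = {"2DL3", "3DL1", "2DS4"}
B_SPECIFIC = {"2DL5", "2DS1", "2DS2", "2DS3", "2DS5", "3DS1", "2DL2"}


def haplotype_groups(present_genes: Iterable[str]) -> Optional[str]:
    """Return 'AA', 'AB' or 'BB' from the genes typed positive, else None."""
    present = set(present_genes)
    has_a = bool(present & A_SPECIFIC)
    has_b = bool(present & B_SPECIFIC)
    if has_a and has_b:
        return "AB"
    if has_a:
        return "AA"
    if has_b:
        return "BB"
    return None  # no haplotype-specific gene detected


if __name__ == "__main__":
    # Gene content of the common A haplotype named in the text.
    common_a = ["3DL3", "2DL3", "2DP1", "2DL1", "3DP1", "2DL4", "3DL1", "2DS4", "3DL2"]
    print(haplotype_groups(common_a))
    print(haplotype_groups(common_a + ["2DS2", "2DL2"]))
```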
FIGS. 11B2 and 12B2 show the interpretation for the respective families based on the novel KIR haplotype model and SSP-PCR data from CEPH-IHWG.
FIGS. 11B3 and 12B3 show the copy number variation for the respective families. In table 3, copy number variation of KIR genes by MLPA is determined by 2 probes for each gene, except for the N-variant genes (single probe detection by definition), including those genes marked by an asterisk.
For the 3DP1v gene variant a combination of 3 probes has been designed. CNV can be deduced from a comparison between the results for the probes for 2DL5, 2DL5a and 2DL5b.
The 2DS4N KIR probe is designed to detect the KIR-2DS4 deletion-variant genes *003 to *009, while SSP-PCR only detects 2DS4 variant *003 (designated 1D).
In FIG. 12B3 KIR3DP1 variants are detected using MLPA (table 3), whereas KIR3DP1 variants are not detected when SSP-PCR is used. SSP-PCR of KIR3DP1v results in a band of 1672 bp that is obtained from the 3DP1 gene. Such large fragments are known to be difficult to detect. Therefore, a DNA sample can be positive for KIR3DP1v when MLPA is used but appear to be negative for KIR3DP1v when SSP-PCR is used.
Extended probe set 1 contains the probes 2DL1-5, 2DS1-5, 3DL1-3, 3DS1, 2DP1 and 3DP1, in total 20 probes. Extended probe set 2 contains the probes 2DL1-5, 2DS1-5, 3DL1-3, 3DS1, 2DP1 and 3DP1, in total 20 probes. Together these two probe sets are accurate for the typing of all 17 KIR genes and 7 KIR gene variants (i.e. 2DL5a, 2DL5b, 3DP1v, and the null variants 2DL4N, 3DL1N, 3DS1N and 2DS4N), and for quantifying relative copy numbers of at least all 17 different KIR genes and 4 null variants (2DL4N, 3DL1N, 3DS1N and 2DS4N) (see
The advantage of probe sets comprising three probe parts according to the present invention is that at least two different SNPs can be detected with one probe set. For instance, in a probe set consisting of three probe parts two sites for ligation are preferably present. A left probe part and a middle probe part are ligated, and additionally the middle probe part and a right probe part are ligated. At each ligation site a SNP can be detected. With conventional MLPA probe sets, consisting of two half probes, only one SNP can be detected per probe set, because only one site for ligation is present.
In this Example detection of the null allele of KIR3DL1 with a probe set consisting of three probes (one left probe part, one middle probe part and one right probe part) is described. This example is illustrated in
Materials & Methods
The null allele, called KIR3DL1*024N, is discriminated from KIR3DL1 using three probes of the invention. Partial probes (probe numbers as depicted in
For DNA selection/isolation, probe design, MLPA reaction, electrophoresis and analysis see materials & methods of example 1.
With these partial probes 2 probe sets can be formed. Those two probe sets consist of different left probe parts, but share the middle and right probe parts.
The final base of middle probe part 711B is a thymine. This thymine is specific for KIR3DL1 genes while all other KIR genes have a different base at this position. Therefore, with probe part 711B KIR3DL1 is discriminated from other KIR genes. Ligation between the middle probe part (711B) and right probe part (711C) will only occur when KIR3DL1 genes are present.
The final base of left probe part 711A is an adenine. This base is present in wildtype KIR3DL1 gene but deleted in the KIR3DL1 null allele, KIR3DL1*024N. Thus, probe part 711A containing an adenine at the final base position is specific for the wildtype KIR3DL1 gene and ligation between the 711A left probe part and the middle probe part (711B) will only occur if the KIR3DL1 wildtype gene is present. In left probe part 711D the final adenine is removed. Thus, probe part 711D is specific for null allele KIR3DL1*024N and ligation between the 711D left probe part and the middle probe part (711B) will only occur if KIR3DL1*024N is present.
Thus, these two probe sets each detect 2 SNPs, namely those that are specific for the KIR3DL1 wildtype gene and for the null allele KIR3DL1*024N, because both the left probe part and the middle probe part are SNP-specific.
To be able to determine copy number variation of the KIR gene family using MLPA, it is necessary to compare the data with a sample for which it is exactly known how many copies of each KIR gene are present. Until now, no such sample was available. That is why a calibrator for the KIR MLPA was designed.
The calibrator is a DNA construct that is designed to contain one binding site for each probe set in the KIR MLPA. If this construct is run in an MLPA just like any other sample, it results in a peak pattern that corresponds to the presence of one copy per KIR gene. In theory, if the construct is prepared at twice the (estimated) concentration of a human genome, each peak represents two copies. The peak pattern of each human genome can then be compared to the peak pattern of the calibrator, and so the number of copies of each KIR gene per genome can be determined.
The calibrator is constructed as follows: A total of six genes (
The 60-mers were obtained from Life Technologies and assembled with the use of a PCR machine. The overlapping parts of the oligos ligate with each other and the open spaces of 20 bp are filled up with nucleotides. Part of the product is used for a second PCR reaction containing only the outside oligos, which function as primers in this reaction. The assembled gene is thereby amplified.
A difference between the construction of the calibrator and the protocol of Stemmer et al. is the length of the starting oligos. We have decided on the use of 60-mers, while Stemmer et al. use 40-mers. Also, Stemmer et al. predict that products of 3-5 kilobases can be synthesized, while the largest part that we were able to create with this method is 680 bp. For this calibrator, six genes were designed, of which gene 1 had a length of 892 bp, genes 2 to 5 had lengths of around 1.1 kb and gene 6 had a length of 2.8 kb. None of these genes could be synthesized in two steps. Therefore it was required to split the genes into smaller parts of maximally 680 bp with a 180 bp overlap. The oligos that form these smaller parts were assembled together and the products were amplified. Successfully synthesized parts were combined in a second amplification reaction to create the entire gene of 1.1 kb. This process is illustrated by the agarose gel separation of the PCR products of one of these genes in
Genes 1 to 5 were all synthesized by these 3 steps. Gene 6 is larger and requires multiple steps to assemble. This gene was divided into 6 smaller genes of ˜650 bp which were combined to form 5 genes of ˜1 kb. These 1 kb products were then combined to form 4 products of ˜1.5 kb and so forth. The longer the products, the longer the overlap had to be for the separate products to ligate to each other. Also, for several steps it was required to purify the specific product from gel to remove all side products that had formed.
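By way of illustration only, the splitting of a gene into overlapping parts can be sketched as follows. The function names `count_parts` and `tile` are hypothetical, and the sketch assumes evenly sized parts and a fixed 180 bp overlap, whereas the text notes that longer products required longer overlaps. Under these assumptions a 2.8 kb gene requires six parts of at most 680 bp, and genes of 892 bp and 1.1 kb require two parts each.

```python
import math

def count_parts(gene_len, max_part=680, overlap=180):
    """Minimum number of overlapping parts needed to cover gene_len bp."""
    if gene_len <= max_part:
        return 1
    step = max_part - overlap  # new sequence contributed by each extra part
    return 1 + math.ceil((gene_len - max_part) / step)

def tile(gene_len, max_part=680, overlap=180):
    """Return (start, end) coordinates (0-based, end exclusive) of the parts.

    Assumption: parts are made as equal in length as possible, and
    neighbouring parts share exactly `overlap` bp.
    """
    n = count_parts(gene_len, max_part, overlap)
    if n == 1:
        return [(0, gene_len)]
    # Part length such that n parts with the given overlap span the gene.
    part_len = math.ceil((gene_len + (n - 1) * overlap) / n)
    step = part_len - overlap
    return [(i * step, min(i * step + part_len, gene_len)) for i in range(n)]

if __name__ == "__main__":
    for length in (892, 1100, 2800):
        print(length, count_parts(length), tile(length))
```

The coordinates returned by `tile` only indicate where the parts begin and end; the choice of the individual 60-mers within each part is not modelled.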
When a synthetic gene was produced it was cloned into a plasmid, either eGFP-C1 or pBlueScript. Sequencing of the inserts confirmed the presence of the entire gene. Mutations (possibly introduced by incorrectly formed oligos and multiple PCR rounds) were present at a relatively low rate (approx. 2 mutations per kb). When a mutation was present in the vicinity of a ligation point, a site-directed mutation kit was used to mutate this specific base pair back to the original sequence. Only then were all genes combined into one plasmid: pBlueScript. The plasmid was digested so that these synthetic genes could be inserted. The result was one large construct of 11.5 kb as schematically depicted in
During an MLPA reaction each KIR probe will find only one site on the calibrator to bind. To prevent any interference or competition between different probe sets, the calibrator was designed in such a way that there is the least possible overlap between the probe binding sites. To accomplish this, spacers (non-coding sequences) were introduced between the probe binding sites. This method was applied for genes 1 to 5. A second technique is to alternate the probe binding sites between the sense and anti-sense strand. A schematic representation of these two ways of distribution is shown in
The calibrator contains the probe binding sites of all KIR probe sets of
The genotyping and CNV detection within CEPH families enabled us to find further proof of principle and application of the calibrator-related assessment of the number of KIR genes within any given individual. As exemplified by two CEPH families 1344 and 1349, the CNV for 3DP1 and 2DL3 has been determined by the current KIR MLPA probe mixes and corresponding calibrator.
Previous SSP-PCR and MLPA data gave the same results for the presence or absence of KIR genes. By the added value of CNV detection within families, it has become possible to construct and trace the haplotypes of father and mother which are inherited by their offspring, as indicated in
Haplotyping is a means to further validate the MLPA for KIR genes. Each haplotype (i.e. a series of adjacent genes at one locus) on one chromosome of the parent is divided among the male and female germline cells and after conception of the fertilized egg a set of two haplotypes is created again, i.e. one from the father and one from the mother.
The framework genes are present in both the relatively fixed haplotype A and the highly variable haplotype B (see
A set of previously genotyped families was used to validate the use of the calibrator of Example 4, the sequence of which is shown in
The KIR MLPA was based upon the method as described in Example 1. In short, the MLPA was started with a probe hybridization step, followed by ligation of the different probes. The final step was the amplification of bound probes with a polymerase chain reaction (PCR). The PCR was optimized for this specific MLPA; 10 μl of ligation mixture was added to 40 μl of PCR mixture. This mixture consisted of 5 μl Accuprime Taq Buffer, 0.4 μl Accuprime Taq enzymes (Invitrogen, AccuPrime Taq DNA Polymerase High Fidelity kit), 1.875 mM MgSO4 and 1.5 μl of Salsa PCR primers (MRC Holland). The PCR reaction created products that contained a fluorescent label allowing for fragment analysis by electrophoresis. Fragment analysis was performed with a mixture of HiDi Formamide (9 μl) and Promega Internal Lane Standard 60-600 (1 μl). The probes were divided among three mixes to prevent competition between homologous primer sets. The control probes used in this assay were obtained from MRC Holland.
The PCR fragments that were formed during the KIR MLPA were analysed on a sequencer platform, in our case a 3130xl Genetic Analyzer from Applied Biosystems. The raw data set was further analyzed using GeneMarker software (SoftGenetics LLC). This software normalized the peak pattern so that the height and area of each peak were representative of the actual number of copies of each specific gene. This was also done for the peak pattern of the calibrator: the software assumed that two copies of each gene were present in every sample. Therefore, it adjusted the peaks of the calibrator (which actually represent one copy of each gene) to represent two copies.
The next step of the analysis was the MLPA analysis itself. The peak pattern of one or more calibrator samples was used as a reference for two copies. The software compared the height or area of each peak from a tested sample with that of the corresponding peak in this reference, resulting in a ratio. A ratio of 0.75-1.25 meant an equal number of copies, so 2. A ratio of 0.25-0.75 was marked as a deletion, so 1 copy. A ratio of 1.25-1.75 referred to a duplication, so 3 copies.
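By way of illustration only, the assignment of a copy number from a peak ratio can be sketched as follows. This is not the GeneMarker implementation. The function names are hypothetical. The text does not state how the shared boundaries (0.75 and 1.25), ratios below 0.25 or ratios of 1.75 and above are handled, so the handling shown here is an assumption.

```python
def peak_ratio(sample_peak, calibrator_peak):
    """Ratio of a normalized sample peak to the corresponding calibrator peak."""
    if calibrator_peak <= 0:
        raise ValueError("calibrator peak must be positive")
    return sample_peak / calibrator_peak

def copies_from_ratio(ratio):
    """Map a sample/calibrator ratio to a copy number.

    Bins from the text: 0.25-0.75 -> 1, 0.75-1.25 -> 2, 1.25-1.75 -> 3.
    Assumptions (not in the text): a shared boundary goes to the higher bin,
    ratios below 0.25 are reported as 0 copies, and ratios of 1.75 or more
    return None (outside the described bins).
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    if ratio < 0.25:
        return 0
    if ratio < 0.75:
        return 1
    if ratio < 1.25:
        return 2
    if ratio < 1.75:
        return 3
    return None

if __name__ == "__main__":
    # Made-up peak heights, for illustration only.
    print(copies_from_ratio(peak_ratio(520.0, 1000.0)))
```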
For each member of two CEPH families, the copy numbers of the KIR genes were determined as described above. The genes for the null-variants for KIR2DL4 and KIR2DS4 were determined, but KIR2DL5A and KIR2DL5B were not distinguished from each other. As inheritance of these genes is thought to occur in a Mendelian manner on two alleles, each copy in the offspring should have its origin in one of the parents. This allowed us to divide the copy numbers found for the KIR genes into separate alleles which are inherited from the parents.
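By way of illustration only, the division of copy numbers over parental haplotypes can be sketched for a single gene as follows. The function names and the copy numbers in the example are hypothetical and are not taken from the CEPH families. The sketch treats one gene in isolation and assumes that each child receives exactly one haplotype from each parent. Linkage between neighbouring KIR genes, which further restricts the possibilities, is not modelled.

```python
from itertools import product

def haplotype_splits(total):
    """All unordered ways to divide a copy number over two haplotypes."""
    return [(a, total - a) for a in range(total // 2 + 1)]

def consistent_splits(father_cn, mother_cn, child_cns):
    """Return (father_split, mother_split) pairs that explain every child.

    A split explains a child if the child's copy number equals the sum of
    one paternal and one maternal haplotype.
    """
    results = []
    for f, m in product(haplotype_splits(father_cn), haplotype_splits(mother_cn)):
        possible = {fh + mh for fh in f for mh in m}
        if all(c in possible for c in child_cns):
            results.append((f, m))
    return results

if __name__ == "__main__":
    # Made-up example: father has 2 copies, mother 1, one child has 3.
    # Only a father carrying both copies on one haplotype explains the child.
    print(consistent_splits(2, 1, [3]))
```

An empty result would indicate that the measured copy numbers cannot be reconciled with Mendelian inheritance under these assumptions.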
The KIR haplotypes of the two families as determined are shown in
Before the use of the KIR calibrator, only the presence or absence of a specific KIR gene could be demonstrated. The use of the KIR calibrator enables determination of the copy number of the KIR genes as well. For instance,
We further show that the calibrator-based KIR MLPA analysis can be used to not only indicate CNV of these KIR genes (showing 0-1-2-or-more gene copies) but also to help determine the mode of inheritance to siblings within one family. This is illustrated for KIR2DL3 in family 1349 (
Thus, CNV, which can now be determined with a KIR calibrator, can be a helpful way to determine a certain haplotype in a pedigree and use it as a tracer for genetic inheritance patterns within KIR-genotyped families.
Number | Date | Country | Kind |
---|---|---|---|
PCT/NL2008/050698 | Nov 2008 | NL | national |
Number | Date | Country | |
---|---|---|---|
Parent | 12998595 | Jul 2011 | US |
Child | 13200991 | US |