The present invention relates to a PCR primer set for classical HLA genes, and a highly efficient and highly uniform sequencing method using the same.
Human Leukocyte Antigen (HLA) which is a major histocompatibility complex (MHC) of human is an important protein relating to immune responses and deeply involved in diseases related to immunity. The gene region encoding HLA is located at human chromosome 6 short arm part 6p21.3, and important genes of HLA include HLA-A, HLA-B, HLA-C belonging to class I molecule expressed in almost all cells, and HLA-DRB1, HLA-DQB1, HLA-DPB1 belonging to class II molecule mainly expressed in the cells in the immune system. These HLA genes are deeply involved in tissue compatibility in organ transplantation and accurate determination (typing) of HLA allele is clinically extremely important. However, the above-mentioned 6 kinds of HLA genes are regions abundantly polymorphic among human genomes, with more than 10,000 kinds of alleles, and a typing method has not been established yet.
Hosomichi et al. produced primers that specifically anneal to the upstream region and the downstream region of each of the above-mentioned 6 kinds of HLA genes and reported a method for determining the base sequence of HLA gene by a long PCR method using the primers and a next-generation sequencer (NGS) (patent document 1, non-patent document 1). However, in a preliminary experiment performed by the present inventors with the subject being Japanese people, a problem was found that uniform amplification cannot be achieved because long PCR using the above-mentioned primers showed high amplification efficiency for HLA-C and extremely low amplification efficiency for HLA-DRB1. Particularly, in HLA-DRB1, a certain kind of allele was hardly amplified and genes other than the target such as HLA-DRB3 and the like were amplified. A similar problem has been reported not only when the subject is Japanese but also Asian, Caucasian or Black (non-patent document 2).
It has been reported that, to solve the problem of the above-mentioned method, Ehrenberg et al. performed PCR by adding one kind of primer (in the cases of HLA-A and HLA-C) or 3 kinds of primers (in the case of HLA-DRB1) to the above-mentioned primers (non-patent document 2). However, uniform amplification is influenced by the different number of primers for each gene in the aforementioned PCR. Furthermore, as regards HLA-DRB1, the primer sequences are completely the same for genes other than the target genes and thus the unintended genes may be amplified.
As another approach to HLA gene typing, a method has been reported in which, for the above-mentioned 6 kinds of HLA genes, 120-base probes corresponding to various sequences of HLA alleles are designed based on the database, and the genome fragments obtained by hybridizing cDNA and the probes are read by NGS (non-patent document 3). In this method, however, as many as 10,000 kinds of probes have been designed to cover as many sequences as possible of the HLA alleles registered in the database, which in turn causes a problem of decreased amplification uniformity and collection efficiency. In addition, it is difficult to detect alleles other than the covered HLA alleles, and genome fragments of genes other than the target genes may be mixed in.
The present invention aims to provide a primer set enabling uniform amplification of HLA genes and capable of increasing the number of samples that can be processed at one time. In addition, the present invention aims to provide a highly efficient and highly uniform sequencing method using the primer set.
Designing PCR primers for amplification of HLA genes involves the following two conflicting requirements.
A. Sequences of primers that can encompass as many kinds of HLA alleles as possible are selected.
B. For uniform amplification, a single primer is prepared for each amplification target region. When the regions overlap, the overlapping region should be as short as possible. Genes other than the target genes should not be amplified.
Therefore, the development of a primer set that solves the above-mentioned two problems (namely, one fulfilling both of the above-mentioned conditions A and B) has been desired. To better satisfy condition A, the number of primers needs to be increased or a sequence with high homology needs to be selected, in which case condition B becomes difficult to achieve. The present inventors had an idea that a primer set fulfilling both condition A and condition B may be developed by performing the following steps 1-4.
1. Assembling sequence analysis data obtained by next-generation sequencing of 6 genes in the samples of 768 Japanese people while eliminating reads derived from other genes.
2. Aligning the assembled results of all samples for each gene.
3. Detecting regions having sequences common to all samples from the results of 2 and noting them as primer candidates.
4. Selecting, from among the candidates of 3, sequences that are not common to other HLA genes and are suitable as primers.
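Step 3 above can be pictured with a minimal Python sketch. It assumes the assembled sequences of one gene have already been aligned to equal length with "-" as the gap character. The function name `conserved_runs`, the minimum length and the toy sequences are hypothetical and are not taken from the present specification.

```python
def conserved_runs(aligned, min_len):
    """Return half-open (start, end) spans in which every aligned
    sequence carries the same non-gap base (primer candidates)."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    runs, start = [], None
    for i in range(length + 1):
        # A column is conserved when all samples share one non-gap base.
        same = (
            i < length
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment (hypothetical data): position 4 differs in one sample.
samples = ["ACGTTGCA", "ACGTAGCA", "ACGTTGCA"]
print(conserved_runs(samples, 3))
```

A real primer search would use a much larger minimum length and would additionally check each candidate against the other HLA genes, as in step 4.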
The present inventors have made further studies based on the conception of the above-mentioned idea and completed the present invention.
Accordingly, the present invention provides the following.
[1] A primer set for HLA-A gene amplification comprising a primer consisting of the base sequence shown in SEQ ID NO: 1 and a primer consisting of the base sequence shown in SEQ ID NO: 2.
[2] A primer set for HLA-B gene amplification comprising a primer consisting of the base sequence shown in SEQ ID NO: 3 and a primer consisting of the base sequence shown in SEQ ID NO: 4.
[3] A primer set for HLA-C gene amplification comprising a primer consisting of the base sequence shown in SEQ ID NO: 5 and a primer consisting of the base sequence shown in SEQ ID NO: 6.
[4] A primer set for HLA-DRB1 gene amplification comprising a primer consisting of the base sequence shown in SEQ ID NO: 7, a primer consisting of the base sequence shown in SEQ ID NO: 8, a primer consisting of the base sequence shown in SEQ ID NO: 9, and a primer consisting of the base sequence shown in SEQ ID NO: 10.
[5] A primer set for HLA-DQB1 gene amplification comprising a primer consisting of the base sequence shown in SEQ ID NO: 11 and a primer consisting of the base sequence shown in SEQ ID NO: 12.
[6] A primer set for HLA-DPB1 gene amplification comprising a primer consisting of the base sequence shown in SEQ ID NO: 13, a primer consisting of the base sequence shown in SEQ ID NO: 14, a primer consisting of the base sequence shown in SEQ ID NO: 15, and a primer consisting of the base sequence shown in SEQ ID NO: 16.
[7] A primer set for HLA gene amplification comprising not less than two selected from the group consisting of the primer set of [1], the primer set of [2], the primer set of [3], the primer set of [4], the primer set of [5], and the primer set of [6].
[8] A primer set for HLA gene amplification comprising all primer sets of [1] to [6].
[9] A method of sequencing an HLA gene comprising using the primer set of any of [1] to [8].
[10] A method of typing an HLA gene comprising using base sequence information obtained by the method of [9].
[11] A kit for HLA gene amplification comprising the primer set of any of [1] to [8].
The primer set of the present invention is obtained by assembling and aligning the results of large-scale sequencing and designing primers based on the common sequences. Thus, using the primer set of the present invention, many HLA alleles that were difficult to amplify by the conventional method can be amplified without omission. In the primer set of the present invention, primers having no homology with other genes are selected from a huge number of sequence candidates by a computational approach. Thus, using the primer set of the present invention, non-specific amplification can be prevented. Therefore, misdetermination of alleles due to contamination by unintended genes can be reduced, which contributes to allele determination with high accuracy. Furthermore, since a single primer is used for each target gene region, uniform PCR amplification is possible, the number of samples that can be processed at one time can be increased, and HLA typing cost can be reduced.
The present invention provides a primer set capable of uniformly amplifying gene regions of HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1 or HLA-DPB1 of HLA gene (including translational region, intron region and a part of untranslated region on 5′-side and 3′-side) by PCR (hereinafter to be abbreviated as “the primer set of the present invention”). Also, the present invention provides a typing method of HLA gene using the primer set of the present invention.
In the present invention, a “primer set” means a combination of not less than two PCR primers capable of amplifying a given region of each HLA gene. A primer set consisting of two primers of a forward primer and a reverse primer corresponding to each region is sometimes referred to as a primer pair.
To be specific, the present invention provides at least one primer pair selected from the primer sets recited in the following Table 1.
A schematic diagram showing the assembly results of large-scale sequencing in the HLA-A gene in the present invention and the positions of primers for amplification on the gene and an outline of the primers are shown in
A schematic diagram showing the assembly results of large-scale sequencing in the HLA-B gene in the present invention and the positions of primers for amplification on the gene and an outline of the primers are shown in
A schematic diagram showing the assembly results of large-scale sequencing in the HLA-C gene in the present invention and the positions of primers for amplification on the gene and an outline of the primers are shown in
A schematic diagram showing the assembly results of large-scale sequencing in the HLA-DRB1 gene in the present invention and the positions of primers for amplification on the gene and an outline of the primers are shown in
A schematic diagram showing the assembly results of large-scale sequencing in the HLA-DQB1 gene in the present invention and the positions of primers for amplification on the gene and an outline of the primers are shown in
A schematic diagram showing the assembly results of large-scale sequencing in the HLA-DPB1 gene in the present invention and the positions of primers for amplification on the gene and an outline of the primers are shown in
The primer constituting the primer set of the present invention (hereinafter to be abbreviated as “the primer of the present invention”) includes not only a nucleic acid consisting of the base sequence shown in each of the aforementioned SEQ ID NOs, but also a variant thereof having a base sequence that hybridizes to a base sequence complementary to the base sequence shown in each SEQ ID NO under stringent conditions and similarly having a function as a primer.
The “variant thereof having a base sequence that hybridizes to a base sequence complementary to the base sequence shown in each SEQ ID NO under stringent conditions and similarly having a function as a primer” is an oligonucleotide that hybridizes to an oligonucleotide consisting of a base sequence complementary to the base sequence shown in each SEQ ID NO under stringent conditions. For example, a “variant thereof having a base sequence that hybridizes to a base sequence complementary to the base sequence shown in SEQ ID NO:1 under stringent conditions and similarly having a function as a primer” has a function as a primer similar to that of an oligonucleotide consisting of the base sequence shown in SEQ ID NO:1, namely, a function capable of PCR amplification of a part of the HLA-A gene by combining with a primer consisting of the base sequence shown in SEQ ID NO: 2. Similarly, a “variant thereof having a base sequence that hybridizes to a base sequence complementary to the base sequence shown in SEQ ID NO:2 under stringent conditions and similarly having a function as a primer” is an oligonucleotide that hybridizes to an oligonucleotide consisting of a base sequence complementary to the base sequence shown in SEQ ID NO:2 under stringent conditions and has a function as a primer similar to that of an oligonucleotide consisting of the base sequence shown in SEQ ID NO:2, namely, a function capable of PCR amplification of a part of the HLA-A gene by combining with a primer consisting of the base sequence shown in SEQ ID NO: 1.
In such a variant, one or more nucleotides are substituted, deleted, inserted or added in the original oligonucleotide as long as the function as a primer is maintained.
The stringency during hybridization is known to be a function of temperature, salt concentration, primer strand length, GC content of the nucleotide sequence of the primer and the concentration of the chaotropic agent in the hybridization buffer. As the stringent conditions, the conditions described in Sambrook, J. et al. (1998) Molecular Cloning: A Laboratory Manual (2nd ed.), Cold Spring Harbor Laboratory Press, New York and the like can be used. The stringent temperature condition is not less than about 30° C., more preferably not less than about 37° C., most preferably not less than about 42° C. Other conditions include hybridization time, concentration of washing agents (e.g., SDS), presence or absence of carrier DNA and the like, and various stringencies can be determined by combining these conditions.
The primer of the present invention consists of a nucleic acid, and the nucleic acid is preferably a single-stranded nucleic acid. In the present invention, the nucleic acid means a molecule in which a nucleotide and a molecule having an equivalent function to that of the nucleotide are polymerized. For example, DNA, RNA and a polymer of RNA and DNA can be mentioned. When RNA is contained, “T(thymine)” in the DNA sequence is referred to as “U(uracil)” in the base sequence. The primer of the present invention can be produced by any chemical synthesis method well known to those of ordinary skill in the art. For example, it can be produced using an enzyme such as nuclease and the like, or can also be produced using a commercially available DNA/RNA automatic synthesizer (Applied Biosystems, Beckman Instruments etc.). In these primers, the constituting nucleic acid may be further modified freely. For example, in the primer of the present invention, the 5′-terminal or 3′-terminal may contain a labeling substance (e.g., fluorescent molecule, dye molecule, radioisotope, organic compound such as digoxigenin or biotin, and the like) and/or an addition sequence (loop primer part used in LAMP method and the like) to facilitate detection or amplification of the primer. The primer of the present invention may be phosphorylated or aminated at the 5′-terminal. The primer of the present invention may contain only natural bases or may contain a modified base. Examples of the modified base include, but are not limited to, deoxyinosine, deoxyuracil, phosphorothioated base and the like. Furthermore, the primer of the present invention may contain any oligonucleotide derivative containing a phosphorothioate bond, a phosphoroamidate bond and the like, or may contain peptide-nucleic acid (PNA) containing a peptide nucleic acid bond.
The primer of the present invention is preferably isolated or purified. Being “isolated or purified” means that an operation to eliminate components other than the object component from a natural or synthesized state is applied. The purity of the isolated or purified primer (percentage of target primer contained in total nucleic acid) in (w/w) % is generally not less than 50%, preferably not less than 70%, more preferably not less than 90%, most preferably not less than 95% (e.g., 100%). The purity of the primer may be appropriately changed according to the solvent and the state of solid or liquid. The unit of the purity may be (w/v) % or (v/v) %, and the desirable purity can be calculated as appropriate, taking into consideration the above-mentioned definition of purity in (w/w) %.
These primers can be provided as a solid in a dry state or in the state of alcohol precipitation, or can also be provided by being dissolved in water or a suitable buffer (e.g., TE buffer and the like).
A plurality of HLA genes can be amplified together by combining a plurality of the primer sets of the present invention. Therefore, the present invention provides primer sets including not less than two primer sets selected from the primer sets of the present invention recited in Table 1. Furthermore, using all primers recited in Table 1, multiplex PCR capable of amplifying a plurality of the above-mentioned important 6 kinds of HLA genes (i.e., HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1 and HLA-DPB1) can be performed, whereby highly efficient, highly uniform and economical typing of HLA gene can be realized. Since HLA-DRB1 gene amplification primer set 1 and HLA-DRB1 gene amplification primer set 2 have overlapping regions, multiplex PCR is preferably performed separately. Similarly, multiplex PCR is preferably performed separately for HLA-DPB1 gene amplification primer set 1 and HLA-DPB1 gene amplification primer set 2 containing primers shown in SEQ ID NO: 15 or 16. Therefore, in a preferred embodiment, the present invention provides a primer set containing all the primers of the present invention recited in Table 1.
When multiplex PCR is performed using the primer set of the present invention, PCR is performed by placing plural primer sets for amplifying the object HLA gene in the same tube (container). Since HLA-DRB1 gene amplification primer set 1 and HLA-DRB1 gene amplification primer set 2 have overlapping regions, they are preferably placed in separate containers. Similarly, HLA-DPB1 gene amplification primer set 1 and HLA-DPB1 gene amplification primer set 2 are also preferably placed in separate containers. In a preferred embodiment, the present invention provides a method for performing PCR of 2 containers together in which one container contains a mixture of HLA-A gene amplification primer set, HLA-C gene amplification primer set, HLA-DRB1 gene amplification primer set 1 and HLA-DPB1 gene amplification primer set 1, and the other container contains a mixture of HLA-B gene amplification primer set, HLA-DRB1 gene amplification primer set 2, HLA-DQB1 gene amplification primer set, and HLA-DPB1 gene amplification primer set 2.
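The two-container layout described above can be written down as data and checked mechanically. The following is a minimal sketch; the names `TUBES`, `OVERLAPPING` and `conflicts` are hypothetical labels introduced for illustration, while the assignment of primer sets to containers follows the preferred embodiment in the text.

```python
# Preferred two-container layout as described in the text.
TUBES = {
    "container_1": ["HLA-A", "HLA-C", "HLA-DRB1 set 1", "HLA-DPB1 set 1"],
    "container_2": ["HLA-B", "HLA-DRB1 set 2", "HLA-DQB1", "HLA-DPB1 set 2"],
}

# Primer sets that are preferably kept in separate containers.
OVERLAPPING = [
    ("HLA-DRB1 set 1", "HLA-DRB1 set 2"),
    ("HLA-DPB1 set 1", "HLA-DPB1 set 2"),
]


def conflicts(tubes, overlapping):
    """List (container, set_a, set_b) for every pair of primer sets
    that should be separated but share one container."""
    found = []
    for name, sets in tubes.items():
        for a, b in overlapping:
            if a in sets and b in sets:
                found.append((name, a, b))
    return found


print(conflicts(TUBES, OVERLAPPING))  # an empty list means the layout is acceptable
```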
In the present invention, DNA to be a template for PCR can be prepared from a test sample. The test sample is not particularly limited and includes, for example, body fluid samples such as blood, urine and the like, cells such as mouth cavity mucosa and the like, and body hair such as hair and the like.
DNA can be prepared using known methods such as a proteinase K/phenol extraction method, a phenol/chloroform extraction method, an alkali dissolution method, a boiling method and the like. It is possible to prepare a highly pure DNA rapidly and conveniently from a trace sample by using a commercially available DNA/RNA extraction kit.
The reaction conditions of PCR for HLA by using the primer sets of the present invention may be, for example, the following conditions.
heat denaturation step (e.g., 92-98° C.)
annealing step (e.g., 55-72° C.)
elongation step (e.g., 65-80° C.)
In the above-mentioned PCR, the annealing step and the elongation step may be performed in one step (shuttle method), or a method including setting the annealing temperature higher and gradually lowering the temperature in each cycle may be adopted (touchdown method). Alternatively, they may be combined. When the shuttle method is performed, the temperature of the annealing step and the elongation step is typically 65-72° C.
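As an illustration of the touchdown method, the following minimal sketch generates a per-cycle annealing temperature that starts high and is lowered each cycle until a floor is reached. The function name and the concrete temperatures, step size and cycle count are assumptions chosen only to fall inside the ranges given above; they are not conditions taken from the Examples.

```python
def touchdown_schedule(start_c, final_c, step_c, total_cycles):
    """Return the annealing temperature (deg C) for each cycle,
    decreasing by step_c per cycle and never dropping below final_c."""
    temps = []
    t = start_c
    for _ in range(total_cycles):
        temps.append(round(t, 2))
        t = max(final_c, t - step_c)
    return temps


# Hypothetical schedule: 70 deg C lowered by 1 deg C per cycle down to 65 deg C.
print(touchdown_schedule(70.0, 65.0, 1.0, 8))
```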
Then, the base sequence of the obtained amplification product can be determined by a general method. It is preferable to determine the base sequence by using a next-generation sequencer (NGS). As for the next-generation sequencing, for example, Experiment Medicine (SUPPLEMENTAL) “Standard Protocols on Next Generation Sequencing”, 2014 (YODOSHA CO., LTD.) and the like can be referred to. Examples of NGS used for such purpose include, but are not limited to, apparatuses manufactured by Illumina (e.g., MiSeq, HiSeq2500), apparatuses manufactured by Life Technologies (e.g., Ion Proton, Ion PGM), apparatuses manufactured by Roche Diagnostics (e.g., GS FLX+, GS Junior) and the like.
A method for sequencing using NGS varies according to the kind of NGS and can be performed, for example, according to each company's manual (e.g., Nextera® XT DNA Library Prep Reference Guide). Paired-end analysis is preferably used for sequencing the obtained sample. In the following, a summary of procedure of sequencing using MiSeq of Illumina, Inc. is described. Even when other apparatus (Ion Proton manufactured by Life Technologies, GS FLX+ of Roche Diagnostics K.K. and the like) is used, a base sequence can be similarly determined by a method suitable for the apparatus.
1. The amplification product of each sample is subjected to Tagmentation using a kit (e.g., Nextera XT DNA Sample Preparation Kit (Illumina, Inc.)).
2. Using a kit (e.g., Nextera XT v2 Index Kit Set A, B, C, D (Illumina, Inc.)), Index sequences, which differ between samples, are added to both ends of the obtained fragment sequences and PCR is performed.
3. Amplification product is purified.
4. Using the amplification product from each sample, library size is confirmed.
5. The concentration is adjusted between samples.
6. The amplification products of the samples are pooled, and quantitative PCR is performed for the pooled amplification products.
7. The pooled amplification products are sequenced using MiSeq.
In this way, the genotype of the HLA gene can be determined (typing) using a database based on the base sequence information (hereinafter “read”) of the amplification product obtained by NGS. Examples of the database include IPD-IMGT/HLA database in which more than 17,000 kinds of alleles are registered, HLA dictionary newly created by the present inventors, and the like. The HLA dictionary is created by extracting the entire base sequences of exons and introns of all HLA alleles registered in IPD-IMGT/HLA database, grouping them by base sequence patterns, and recording these patterns and information of alleles belonging thereto. Examples of the typing method based on the base sequence information of the amplification product obtained by NGS include, but are not limited to, a method including mapping the read to alleles of multiple HLA genes, applying the results thereof to a linear programming problem and detecting a best fit allele pair (OptiType) (Szolek, A. et al., (2014) OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics, 30, 3310-3316.), a method including de novo assembling the read mapping results and determining alleles from the results (HLA reporter) (Huang, Y. et al., (2015) HLA reporter: a tool for HLA typing from next generation sequencing data. Genome Med, 7, 25.), a method including comparing allele pairs from the exons of the antigen-presenting site and sequentially expanding the search range to all exons, thus enabling typing of all alleles of IPD-IMGT/HLA (HLA-HD) (Kawaguchi, et al., (2017) HLA-HD: An accurate HLA typing algorithm for next-generation sequencing data. Human Mutation, 38, 788-797) and the like.
Alternatively, HLA typing may be performed without using NGS. Examples of such method include, but are not limited to, analysis methods using amplification products by long PCR such as array-based PCR-SSOP method, Sanger method-based PCR-SBT method, PCR-PHFA method, PCR-SSCP method and the like.
The present invention also provides a kit usable in PCR using the primer set of the present invention. The kit of the present invention may contain other reagents necessary for PCR in addition to the above-mentioned primer set of the present invention. When the aforementioned reagent does not adversely affect the reaction after preservation in coexistence with the primer set of the present invention, it can be mixed with the primer set and contained in the kit. Alternatively, the aforementioned reagent and the primer set of the present invention may be provided separately without being mixed. Examples of the aforementioned reagent include DNA extraction reagent, DNA polymerase enzyme, dNTP, reaction buffer, DNA molecule containing target sequence to be the positive control in PCR, instructions and the like. The aforementioned DNA polymerase enzyme may be a commercially available product. Examples of the DNA polymerase enzyme include, but are not limited to, PrimeSTAR GXL DNA Polymerase, Tks Gflex DNA Polymerase, TaKaRa LA Taq manufactured by TaKaRa, and Long PCR Enzyme Mix manufactured by Thermo Scientific.
While the present invention is explained in further detail by referring to Examples, the present invention is not limited by the Examples.
The design procedure of primers for performing multiplex PCR of 6 kinds of Classical HLA genes (HLA-A, -B, -C, -DRB1, -DQB1, -DPB1) is described. Bowtie2 (ver. 4.1.2) was used for mapping all reads obtained by the next-generation sequencer.
I. Assembly of Full-Length Gene from Sequencing Results of HLA Gene
Using an existing primer set [Hosomichi, K. et al. (2013) Phase-defined complete sequencing of the HLA genes by next-generation sequencing. BMC Genomics, 14:355.], multiplex PCR was performed in 2 sets of 3 genes each for 768 samples (unreleased samples) (see Table 2). For DRB1, a primer with a modified sequence was added. For experiment conditions, refer to Table 3. The obtained amplification product was sequenced by MiSeq.
The sequencing result of MiSeq was mapped using bowtie2 to the sequence information of exons and introns of all HLA alleles obtained from IPD-IMGT/HLA database. The weight of each read in each gene was calculated based on the mapping result. The calculation method of the weight is briefly explained below.
The mapping results are read and the correspondence of reads that match each exon is checked. It is examined whether 50% or more of the read overlaps with the base sequence of any exon contained in the HLA dictionary and both base sequences completely match in the overlapping range, or whether 50% or more of the read overlaps with the base sequence of any exon contained in the HLA dictionary and both base sequences match within 2 bases mismatch in the overlapping range. The matched read for each gene is weighted according to the number of alleles corresponding to the matched sequence in each exon and intron. The weight of the read for each gene is obtained by calculating the following formula 1
otherwise, $F_r^g = 0$
wherein
$r$: read
$G$: gene
$w_r^G$: weight given to gene $G$ relating to read $r$
$\Lambda$: set of all HLA genes
$g$: gene (one element of $\Lambda$)
$X$: exon or intron
$m_X(r)$: total number of alleles that matched read $r$ on $X$
$K_g^r$: set of exons and introns in which $m_X(r) > 0$ in gene $g$
$K_g$: set of all exons and introns in gene $g$
$N_X$: total number of alleles containing exon $X$
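separately for the paired-end read and summarizing by the following formula 2

$$w_{r_{f,r}}^{G} = \min\left(w_{r_f}^{G},\, w_{r_r}^{G}\right)$$

wherein
$r_{f,r}$: pair of paired-end reads
$r_f$: forward read
$r_r$: reverse read
$w_{r_{f,r}}^{G}$: weight given to gene $G$ relating to the read pair $r_{f,r}$
$w_{r_f}^{G}$: weight given to gene $G$ relating to the forward read $r_f$
$w_{r_r}^{G}$: weight given to gene $G$ relating to the reverse read $r_r$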
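The summarizing step of formula 2 can be sketched as follows, on the reading that the weight of a read pair for a gene is the smaller of the weights of its forward and reverse reads. The function name `pair_weights`, the use of dictionaries keyed by gene, and the treatment of a missing gene as weight 0 are assumptions made for this illustration; the per-read weights themselves (formula 1) are taken as given inputs.

```python
def pair_weights(forward, reverse):
    """Combine per-gene weights of a forward and a reverse read into
    per-gene weights of the pair by taking the minimum (formula 2).
    A gene missing from one read is treated as weight 0 (assumption)."""
    genes = sorted(set(forward) | set(reverse))
    return {g: min(forward.get(g, 0.0), reverse.get(g, 0.0)) for g in genes}


# Hypothetical weights for one read pair.
w_f = {"HLA-A": 0.8, "HLA-B": 0.2}
w_r = {"HLA-A": 0.5}
print(pair_weights(w_f, w_r))
```

Taking the minimum means a pair only counts strongly toward a gene when both of its reads support that gene.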
The following procedure was performed separately for each HLA gene (gene G). The weight calculated above was added to the count of coverage.
The read group that obtained positive (>0) weight in gene G was mapped to all alleles (hereinafter complete allele) of gene G whose full length sequences including introns are registered in IPD-IMGT/HLA. The reads mapped within 5 bases mismatch to the complete allele were collected, and the coverage of each base at the mapped position was calculated. A sequence obtained by selecting the base showing the maximum calculated coverage at each position was created (hereinafter to be abbreviated as “consensus sequence”). When, at a certain position, the coverage of a base different from the consensus sequence was 10% or more of the coverage of the base corresponding to the consensus sequence, the position was taken as a different position and the base thereof was taken as a different sequence. When multiple bases had a coverage of 10% or more, the one with the largest coverage was selected. Two classes with the consensus and difference as IDs were prepared. For each read mapped to a location containing different positions, it was determined whether the read contained more bases of the consensus sequence or of the different sequence, and the read was taken as belonging to the class contained more. A read that contained the same number of each, or that did not cover even a single different position, was regarded as belonging to both classes.
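The construction of the consensus sequence and the detection of different positions can be sketched as below. The sketch assumes that the weighted coverage has already been tallied per position as a non-empty mapping from base to coverage; the function name and the toy numbers are hypothetical. Ties between bases are broken alphabetically here, which is an arbitrary choice not stated in the text.

```python
def consensus_and_differences(columns, threshold=0.10):
    """columns: list of dicts mapping base -> weighted coverage.
    Returns (consensus string, {position: different base})."""
    consensus, differences = [], {}
    for pos, cov in enumerate(columns):
        # Rank bases by coverage, highest first (alphabetical tie-break).
        ranked = sorted(cov.items(), key=lambda kv: (-kv[1], kv[0]))
        top_base, top_cov = ranked[0]
        consensus.append(top_base)
        if len(ranked) > 1:
            # Only the strongest alternative base is considered.
            alt_base, alt_cov = ranked[1]
            if top_cov > 0 and alt_cov >= threshold * top_cov:
                differences[pos] = alt_base
    return "".join(consensus), differences


# Hypothetical coverage: position 1 has a well-supported second base.
cols = [{"A": 30}, {"C": 20, "T": 12, "G": 3}, {"G": 25, "A": 1}]
print(consensus_and_differences(cols))
```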
The coverage was recalculated using only the set of reads belonging to the class of consensus. When the coverage of the consensus sequence at a different position was lower than the coverage of the different sequence, the bases corresponding to the consensus sequence and the different sequence were swapped (inverted) at the different position. Thereafter, the coverage was calculated using only the set of reads belonging to the class on the difference side. When the coverage of the consensus sequence at a different position was higher than the coverage of the different sequence, the different position was eliminated. The reads of step 3 were classified again relative to the entire reads based on the inverted and eliminated information, and inversion of the consensus sequence and different sequence at the different position and elimination of the different position were performed again. This operation was repeated until inversion and elimination of different positions no longer occurred.
Steps 3, 4 were performed on all complete alleles, and classified sets of reads were obtained. Any two complete alleles were taken as A, B, sets of reads belonging to the class of consensus of A, B were respectively taken as C(A), C(B), and sets of reads belonging to the class of difference of A, B were respectively taken as D(A), D(B). In each union of C(A)∪C(B), C(A)∪D(B), D(A)∪C(B), D(A)∪D(B), the total number of reads after applying the weight of formula 2 was calculated. The number of reads in all unions of all combinations of complete allele pairs was compared, and the allele pair and the kind of union showing the highest read number were selected. The pair search included a combination of A and B being the same complete allele.
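The pair search of this step can be sketched as an exhaustive comparison over allele pairs and the four kinds of union. The data layout (a dictionary from allele name to its consensus class "C" and difference class "D", each a set of read identifiers), the function name `best_pair` and the toy values are assumptions introduced for illustration. When two candidates tie, the first one encountered is kept, which is an arbitrary choice.

```python
from itertools import combinations_with_replacement


def best_pair(classes, weights):
    """classes: allele -> {"C": set of read ids, "D": set of read ids}
    weights: read id -> pair weight (formula 2).
    Returns (score, allele_a, class_a, allele_b, class_b) with the
    highest weighted read count over all unions."""
    best = None
    # combinations_with_replacement also yields pairs where A and B coincide.
    for a, b in combinations_with_replacement(sorted(classes), 2):
        for ka in ("C", "D"):
            for kb in ("C", "D"):
                reads = classes[a][ka] | classes[b][kb]
                score = sum(weights[r] for r in reads)
                if best is None or score > best[0]:
                    best = (score, a, ka, b, kb)
    return best


# Hypothetical classification of four reads against two complete alleles.
classes = {
    "allele_1": {"C": {1, 2}, "D": {3}},
    "allele_2": {"C": {3, 4}, "D": {1}},
}
weights = {1: 1.0, 2: 1.0, 3: 0.5, 4: 1.0}
print(best_pair(classes, weights))
```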
When the pair selected in step 5 contained different complete alleles, the correspondence between different positions of the two complete alleles is unknown. Thus, the different positions were matched using the relationship between the mapped reads. First, a set of mapped reads including this position was detected at each different position of the complete allele. The set of reads includes those belonging to both classes of consensus and difference. The number of common reads was calculated for all combinations of sets of reads at different positions between the two complete alleles, and, in descending order of the number of common reads, the different positions corresponding to each combination of sets were matched as the same different position between the complete alleles. When a corresponding different position did not exist on one complete allele side, the different position was eliminated.
With respect to the allele pair determined in step 5 and the class corresponding thereto, and different positional relationship between them, each read was determined by the method of step 3 as to which complete allele side it belongs to. Consensus sequences were created from the set of reads of each complete allele obtained by the determination results.
1) All reads were remapped to the two consensus sequences obtained in step 7. 2) The number of mismatches in the two mapping results was compared and each read was assigned to the sequence showing the smaller number. When the number of mismatches was the same, the read was assigned to both. Using the set of reads belonging to each, a consensus sequence was created again. A position where the coverage of the consensus did not reach 5 reads was set to “N”. 3) Steps 1) and 2) were performed again on the two sequences obtained. This was repeated until there was no update in either sequence.
The sets of reads of the two consensus sequences finally obtained in step 8 were compared. When the number of reads belonging to only one allele was more than 10 times the number of reads belonging only to the other, the allele with fewer reads was eliminated and the consensus sequence of the remaining allele was output as a homozygous allele. Otherwise, the consensus sequences of both alleles were output. The output sequence is taken as the assembly sequence.
Previously, for each of 768 DNA samples stored in the Center for Genomic Medicine affiliated with Graduate School of Medicine of Kyoto University (hereinafter to be abbreviated as “Center for Genomic Medicine”), assembly sequences of the 6 kinds of HLA genes were created using the method of I.
Assembly sequences whose number of bases did not reach the following lengths were removed from the obtained assembly sequences. In this calculation, the “N” part was subtracted from the number of bases.
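The decision between outputting one allele or two can be expressed compactly, as in the sketch below. The function name `call_alleles` and its return labels are hypothetical; only the 10-fold criterion is taken from the text.

```python
def call_alleles(reads_a, reads_b, ratio=10):
    """Decide whether to output one consensus (homozygous) or both.

    reads_a / reads_b: sets of read ids assigned to consensus A and consensus B.
    Reads assigned to both sequences are excluded from the comparison.
    """
    only_a = len(reads_a - reads_b)
    only_b = len(reads_b - reads_a)
    if only_a > ratio * only_b:
        return ("homozygous", "A")  # B is eliminated
    if only_b > ratio * only_a:
        return ("homozygous", "B")  # A is eliminated
    return ("heterozygous", "both")
```

Note that exactly 10 times as many reads does not trigger elimination, because the text requires "more than 10 times".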
HLA-A: 2,800 bp
HLA-B: 2,800 bp
HLA-C: 2,800 bp
HLA-DRB1: 10,000 bp
HLA-DQB1: 6,500 bp
HLA-DPB1: 9,500 bp
When “N” was contained in 5% or more of the entire sequence, the assembly sequence was removed regardless of the length.
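The two filters above (minimum length after subtracting “N”, and “N” content of 5% or more) can be sketched as follows. The thresholds are those listed in the text; the names `MIN_LENGTH` and `keep_assembly` are hypothetical.

```python
# Minimum number of non-"N" bases per gene, as listed in the text.
MIN_LENGTH = {
    "HLA-A": 2800,
    "HLA-B": 2800,
    "HLA-C": 2800,
    "HLA-DRB1": 10000,
    "HLA-DQB1": 6500,
    "HLA-DPB1": 9500,
}


def keep_assembly(gene, sequence):
    """Return True if the assembly sequence passes both filters."""
    if not sequence:
        return False
    n_count = sequence.upper().count("N")
    # "N" in 5% or more of the entire sequence: removed regardless of length.
    # Integer arithmetic avoids floating-point rounding at exactly 5%.
    if n_count * 100 >= 5 * len(sequence):
        return False
    # The "N" part is subtracted before comparing with the minimum length.
    return len(sequence) - n_count >= MIN_LENGTH[gene]
```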
Multiple alignment was performed using MUSCLE (ver 3.8.31) [Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32:1792-1797.] on the set of assembly sequences of each gene remaining in step 1.
When one assembly sequence caused an insertion of 20 or more bases in a certain section of the consensus sequence obtained from the alignment results (that is, a section with coverage of 1 continuing over 20 or more bases), the section was eliminated and the alignment positions on either side of it were joined.
Positions where the coverage of the consensus sequence obtained in Step 3 was not more than 95% of the number of assembly sequences were masked. In addition, the positions corresponding to exons on the consensus sequence were searched from the sequence data of known HLA genes by BLAST, a software for searching similar sequences, and the matched exon sections were masked.
Using Primer3 [Untergasser, A. et al. (2012) Primer3—new capabilities and interfaces. Nucleic Acids Res., 40:e115.], primer pair candidates were determined. The primers satisfied the following criteria.
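Removal of long single-sequence insertions from a multiple alignment can be sketched as below. The sketch assumes the alignment is given as equal-length strings with "-" as the gap character, and it additionally requires that the run of coverage-1 columns belongs to one and the same sequence; `drop_long_insertions` is a hypothetical name.

```python
from itertools import groupby

MIN_INSERTION = 20  # runs of this many coverage-1 columns are removed


def drop_long_insertions(rows):
    """Delete alignment columns forming an insertion of >= 20 bases by one sequence.

    rows: list of aligned sequences of equal length, "-" marking gaps.
    """
    # For each column, record which single row is present (None if 0 or >= 2 rows).
    owners = []
    for column in zip(*rows):
        present = [i for i, base in enumerate(column) if base != "-"]
        owners.append(present[0] if len(present) == 1 else None)

    keep = []
    col = 0
    for owner, run in groupby(owners):
        length = len(list(run))
        if owner is None or length < MIN_INSERTION:
            keep.extend(range(col, col + length))
        col += length
    return ["".join(row[c] for c in keep) for row in rows]
```

After the columns are deleted, the flanking alignment positions become adjacent, which corresponds to closing the gap left by the eliminated section.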
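The coverage-based part of the masking can be sketched as follows, again assuming an alignment of equal-length strings with "-" for gaps. The function name `low_coverage_mask` is hypothetical, and the exon masking by BLAST is not reproduced here.

```python
def low_coverage_mask(rows, max_percent=95):
    """Return one boolean per alignment column; True means the column is masked.

    A column is masked when the number of sequences covering it is not more
    than max_percent of the number of assembly sequences.
    """
    n_sequences = len(rows)
    mask = []
    for column in zip(*rows):
        coverage = sum(1 for base in column if base != "-")
        mask.append(coverage * 100 <= max_percent * n_sequences)
    return mask
```

With 20 aligned sequences, a column covered by 19 of them (exactly 95%) is masked, since the criterion is "not more than 95%".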
Not containing the region masked in Step 4.
For A, B, C and DQB1, one pair includes the entire translated region.
For DRB1 and DPB1, two pairs were prepared; both pairs contain exon 2, and the two pairs (front and rear) together include the translated region.
The primer pair of each gene is specific to that gene relative to every other HLA gene. The specific position is located as far inward as possible (on the 3′ side for the 5′-side primer and on the 5′ side for the 3′-side primer).
Options of Primer3 are as follows.
PRIMER_OPT_SIZE=30
PRIMER_MIN_SIZE=20
PRIMER_MAX_SIZE=35
PRIMER_MIN_TM=40
PRIMER_MAX_TM=90
PRIMER_MIN_GC=10
PRIMER_MAX_GC=90
PRIMER_MAX_NS_ACCEPTED=1
P3_FILE_FLAG=1
PRIMER_MAX_SELF_END=3
PRIMER_PAIR_MAX_COMPL_END=3
PRIMER_MAX_HAIRPIN_TH=60
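For reference, the options above can be written out as a Primer3 input record in Boulder-IO format ("TAG=VALUE" lines terminated by a line containing only "="). The sketch below only assembles such a record as text and does not run Primer3. The helper name `boulder_record` and the example identifier and template are hypothetical; `SEQUENCE_ID` and `SEQUENCE_TEMPLATE` are standard Primer3 input tags, but how the inventors supplied the consensus sequence and masked regions is not stated.

```python
# Primer3 options exactly as listed in the text.
PRIMER3_OPTIONS = {
    "PRIMER_OPT_SIZE": 30,
    "PRIMER_MIN_SIZE": 20,
    "PRIMER_MAX_SIZE": 35,
    "PRIMER_MIN_TM": 40,
    "PRIMER_MAX_TM": 90,
    "PRIMER_MIN_GC": 10,
    "PRIMER_MAX_GC": 90,
    "PRIMER_MAX_NS_ACCEPTED": 1,
    "P3_FILE_FLAG": 1,
    "PRIMER_MAX_SELF_END": 3,
    "PRIMER_PAIR_MAX_COMPL_END": 3,
    "PRIMER_MAX_HAIRPIN_TH": 60,
}


def boulder_record(sequence_id, template, options=PRIMER3_OPTIONS):
    """Build one Boulder-IO record: TAG=VALUE lines followed by a lone '='."""
    lines = [f"SEQUENCE_ID={sequence_id}", f"SEQUENCE_TEMPLATE={template}"]
    lines += [f"{tag}={value}" for tag, value in options.items()]
    lines.append("=")  # record terminator
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical identifier and a toy template, for illustration only.
    print(boulder_record("HLA-A_consensus", "ACGTACGTACGT"), end="")
```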
When plural candidates satisfied the above criteria, an optimal one was selected in consideration of the sequence specificity and the Primer3 score. This selection was performed manually. Information on the obtained primer sets is shown in Table 4.
1. Multiplex long PCR targeting 6 genes of HLA-A, -B, -C, -DRB1, -DQB1, -DPB1 was performed on each of 384 DNA samples in total. Amplification was performed in the respective two sets below.
Primer sequences and final concentrations are as shown in Table 3. The following were prepared for PCR.
DNA 25 ng
0.25 μL of Tks Gflex™ DNA Polymerase
6.25 μL 2×Gflex™ PCR Buffer
After preparation to a final volume of 12.5 μL, the Cool Start™ method (TaKaRa Inc.) was applied, and the PCR reaction was performed under the temperature conditions and cycle numbers shown in Table 5. As the PCR apparatus, GeneAmp® PCR System 9700 (Thermo Fisher Scientific Inc., Waltham, Mass., USA) was used.
2. Amplification product was purified by Agencourt® AMPure® XP Beads (Beckman Coulter, Brea, Calif., USA).
3. Each sample (1 μL) was electrophoresed on agarose gel, and the presence or absence of the target band was confirmed. The expected band size is as indicated in Table 3.
4. Equal moles of the two sets were mixed and each sample was prepared to have the final concentration of 0.2 ng/μL (2.5 μL) of the amplification product.
Protocols not specifically described below were in accordance with the manual of Illumina, Inc. (Nextera XT DNA Library Prep Reference Guide (15031942 Rev. D)).
1. The amplification product (0.5 ng) of each sample was subjected to Tagmentation by applying Nextera XT DNA Sample Preparation Kit (Illumina) (300 bp×2 paired-end read).
2. Using Nextera XT v2 Index Kit Set A, B, C, D (2.5 μL for each row, column), a different index sequence was added to both ends of each of the fragment sequences obtained from the 384 samples, and PCR was performed.
3. The amplification product was purified by Agencourt® AMPure® XP Beads.
4. Using amplification product (1 μL) from each sample, the library size was confirmed by Agilent 2100 Bioanalyzer and High Sensitivity DNA chip (Agilent Technology).
5. The concentration between samples was adjusted with Library Normalization Beads in the kit.
6. Equal amounts of the amplification products of 384 samples were pooled.
7. Quantitative PCR using KAPA Library Quantification Kit was performed on the pooled amplification products (for details, see the Technical Data Sheet of KAPA Library Quantification Kit, Illumina® platforms).
8. Using MiSeq (Illumina), 6 HLA genes were sequenced in 384 samples.
9. It was confirmed that a data size of about 15 GB (25 M paired-end reads) was output.
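The expected output can be checked by simple arithmetic from the run parameters given above (25 M read pairs, 300 bp×2, 384 samples). The sketch below assumes that the stated data size refers to the number of sequenced bases and that reads are distributed evenly over samples; the function names are hypothetical.

```python
READ_PAIRS = 25_000_000  # 25 M paired-end reads
READ_LENGTH = 300        # 300 bp x 2 paired-end
SAMPLES = 384


def expected_bases(read_pairs=READ_PAIRS, read_length=READ_LENGTH):
    """Total sequenced bases: two reads of read_length per pair."""
    return read_pairs * 2 * read_length


def bases_per_sample(samples=SAMPLES):
    """Average bases per sample, assuming an even pool (an idealisation)."""
    return expected_bases() / samples


if __name__ == "__main__":
    print(expected_bases())    # 15000000000 bases, i.e. about 15 billion
    print(bases_per_sample())  # 39062500.0 bases per sample
```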
A preliminary experiment was performed using 384 samples stored in the Center for Genomic Medicine, the designed primer sets for HLA genes and the above-mentioned experiment method. For comparison, 96 samples were simultaneously sequenced using the primer sets and experiment method of a previous report [Hosomichi, K. et al. (2013) Phase-defined complete sequencing of the HLA genes by next-generation sequencing. BMC Genomics, 14:355.]. Note that the comparison samples do not overlap with the samples of the preliminary experiment. The sequencing results of these samples by MiSeq were mapped to all HLA alleles with known full-length sequences in the IMGT/HLA database. The reads that matched any allele were collected and the average coverage for each gene was calculated. The gene length serving as the denominator of the average coverage differs depending on the HLA allele. Thus, the average length of each gene in the allele set was used.
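The average coverage described here can be sketched as total mapped bases divided by the mean allele length of the gene. This definition of the numerator is an assumption (the text specifies only the denominator), and `average_coverage` is a hypothetical name.

```python
def average_coverage(read_lengths, allele_lengths):
    """Average coverage of one gene in one sample.

    read_lengths: lengths (bp) of the reads that matched any allele of the gene.
    allele_lengths: lengths (bp) of the full-length alleles of that gene in the
        reference set; their mean is used as the gene length.
    """
    mean_gene_length = sum(allele_lengths) / len(allele_lengths)
    return sum(read_lengths) / mean_gene_length
```

For instance, 100 reads of 300 bp mapped to a gene whose alleles average 3,000 bp give an average coverage of 10.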
This application is based on a patent application No. 2017-024397 filed in Japan (filing date: Feb. 13, 2017), the contents of which are incorporated in full herein.
The primer set of the present invention enables uniform PCR amplification, and can increase the number of samples to be used for one operation of a next-generation sequencer. Since the primers are based on the common sequence of large-scale samples of Japanese people obtained at the Center for Genomic Medicine, the frequency of occurrence of an unidentified allele that does not match the primers is considered to be extremely low for Japanese people. Thus, the primer set is suitable for automation of HLA typing since the rate of repeated testing is reduced. It is applicable to organ transplantation, highly safe compatibility tests in treatments using iPS cells in the future, and securing many donors. In addition, since the number of primers can be reduced, the primer set is easily prepared and easily commercialized.
Number | Date | Country | Kind |
---|---|---|---|
2017-024397 | Feb 2017 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2018/004703 | 2/9/2018 | WO | 00 |