The present disclosure claims priority to Chinese Patent Application No. 202111204073.9, entitled “HLA gene amplifying primer, kit, sequencing library construction method, and sequencing method”, filed on Oct. 15, 2021, the content of which is incorporated herein by reference in its entirety.
The present disclosure relates to the field of nucleic acid sequencing technology, specifically to HLA gene amplifying primers, a kit, a sequencing library construction method, and a sequencing method.
Human leukocyte antigen (HLA) molecules present antigens to T lymphocytes for recognition and play an extremely important role in the adaptive immune response. The HLA gene complex, a gene cluster encoding the human major histocompatibility complex (MHC) molecules, is a highly polymorphic complex composed of a series of closely linked loci located on the short arm of chromosome 6. It is the most complex polymorphic system currently known in the human body. HLA is a highly polymorphic alloantigen; chemically, it is a class of glycoproteins composed of a glycosylated α-heavy chain non-covalently associated with a β-light chain. Its peptide chain has an amino-terminal portion extending outside the cell (accounting for about ¾ of the entire molecule), a carboxyl-terminal portion extending into the cytoplasm, and a hydrophobic middle portion embedded in the cell membrane. HLA molecules are divided into class I and class II molecules according to their distribution and function. HLA class I molecules present endogenous antigens, and HLA class II molecules present exogenous antigens. HLA class I molecules are encoded by the HLA-A, -B, and -C genes, and their specificity depends on the α-heavy chain; their β-light chain is β2-microglobulin, whose encoding gene is located on chromosome 15. HLA class II molecules are controlled by the HLA-D region, which includes 5 subregions; the A gene and B gene of each subregion encode the α-heavy chain and β-light chain, respectively, and the antigen polymorphism depends on the β-light chain. All of the above genes are polymorphic (have multiple alleles) and are co-dominant. If the MHC is considered as a whole, its polymorphism is even more prominent: it is conservatively estimated that there are at least 1,300 different haplotypes, corresponding to approximately 17×10⁷ genotypes.
HLA genes are also an important genetic structure in humans and are widely used in fields such as forensic identification and organ transplantation matching. High polymorphism is a well-known and very important feature of HLA genes. Previous studies on HLA gene polymorphism mainly focused on the exons encoding the antigen peptide-binding region (exons 2-4 of the HLA-A and HLA-B genes, and exon 2 of the HLA-DQB1 and HLA-DRB1 genes). Extensive work has been done on typing technologies, detection of polymorphic sites, disease association, and evolutionary analysis of these exons. However, a growing number of studies show that polymorphic sites in the non-coding regions of HLA genes, and their functions, also cannot be ignored.
In the early years, HLA genotyping was performed serologically. Serological typing has several disadvantages: false-negative or false-positive results may occur, reducing accuracy; highly specific HLA antisera are in limited supply, making the cross-reactivity of monoclonal antisera difficult to overcome; 7-10 mL of whole blood is required to prepare viable lymphocytes, or even viable B cells, for typing; and cells are difficult to collect and preserve. With technological progress, DNA-based HLA typing methods emerged. Common HLA genotyping technologies currently on the market include sequence-specific oligonucleotide probes (SSO), sequence-specific primers (SSP), sequencing-based typing (SBT), and next-generation sequencing (NGS). SSO offers simple experiments and short experimental cycles; however, it cannot detect new alleles, can only distinguish known typing sequences, and requires a large number of probes for effective typing. SSP likewise offers simple experiments and short experimental cycles, but it also cannot detect new alleles or distinguish pseudogenes. SBT can detect new alleles, but it cannot phase the sample DNA and may produce ambiguous results for heterozygous samples, making accurate typing impossible. NGS can also detect new alleles, and its typing results can reach 2-field resolution; however, because of its short read length, assembling the original sequence from the reads depends heavily on computation, which introduces ambiguous results.
Based on this, it is necessary to provide HLA gene amplifying primers, a kit, a sequencing library construction method, and a sequencing method that can improve the accuracy of the results and simplify the process.
According to a first aspect, the present disclosure provides HLA gene amplifying primers, including primers in any one or more of the following nine primer sets:
According to a second aspect, the present disclosure provides a kit for HLA gene sequencing, which includes said HLA gene amplifying primers.
According to a third aspect, the present disclosure provides a method of constructing an HLA gene sequencing library, including the following steps:
According to a fourth aspect, the present disclosure provides an HLA gene sequencing method, including constructing an HLA gene sequencing library using the method of constructing an HLA gene sequencing library, and then sequencing the library.
The present disclosure proposes nine primer sets, each targeting a different target sequence of the HLA gene. The nine primer sets may be used individually for PCR amplification to obtain amplified genes, or in combination for single-tube multiplex PCR amplification. Using the nine primer sets, 11 HLA gene sequences (DPB1, DQA1, DQB1, DPA1, DRB1/3/4/5, HLA-A, HLA-B, and HLA-C) can be obtained by single-tube multiplex PCR amplification. Current next-generation sequencing approaches require at least three tubes of separate amplification to amplify these 11 gene sequences; by comparison, the present disclosure offers a more convenient and simplified experimental process.
Libraries prepared from the amplification products of the primer sets of the present disclosure can be sequenced with the third-generation single-molecule real-time sequencing (SMRT sequencing) technology of Pacific Biosciences (PacBio), which can detect new alleles. This technology offers long read length and high accuracy, solves the problem of phasing, and can generally reach 3-field resolution without ambiguous results. Taking advantage of the long read length of third-generation single-molecule sequencing, the primers of the present disclosure are designed to enlarge the amplified region and improve its coverage. Traditional primers are designed for exon regions only, whereas amplification with the primer sets of the present disclosure yields not only information about exon regions but also more comprehensive information about introns. Combining PacBio sequencing technology with the primer sets designed in the present disclosure can effectively provide complete HLA gene information, which provides a complete solution for future research on the function of HLA genes and improves the sensitivity and accuracy of typing.
The details of one or more embodiments of the present disclosure are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the present disclosure will become apparent from the description, the accompanying drawings, and the claims.
In order to more clearly explain the implementations of the present disclosure, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below are only schematic diagrams of embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
To facilitate understanding of the present disclosure, the present disclosure is described more fully hereinafter with reference to the related accompanying drawings, in which preferred embodiments of the present disclosure are presented. However, the present disclosure may be embodied in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the content of the present disclosure will be understood more thoroughly.
All technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure applies, unless otherwise defined. The terms used in the specification of the present disclosure herein are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure. The term “and/or” used herein includes any and all combinations of one or more of the associated listed items.
In the present disclosure, gDNA refers to genomic DNA, that is, the whole DNA of an organism in its haploid state.
In the present disclosure, the terms “target nucleic acid sequence”, “target nucleic acid” and “target sequence” refer to the objective nucleic acid sequence to be detected. In the present disclosure, the terms “target nucleic acid sequence”, “target nucleic acid” and “target sequence” have the same meaning and are used interchangeably. In the present disclosure, the terms “targeted sequence” and “target-specific sequence” refer to a sequence that is capable of selectively/specifically hybridizing to a target nucleic acid sequence under a condition that allows hybridization, annealing, or amplification of the nucleic acid, which includes a sequence complementary to the target nucleic acid sequence. In the present disclosure, the terms “targeted sequence” and “target-specific sequence” have the same meaning and are used interchangeably. It will be easily understood that a targeted sequence or target-specific sequence is specific for a target nucleic acid sequence.
In other words, under a condition that allows hybridization, annealing, or amplification of the nucleic acid, the targeted sequence or target-specific sequence hybridizes or anneals only to the specific target nucleic acid sequence and not to other nucleic acid sequences.
In the present disclosure, “F” and “R”, the initial letters of “forward” and “reverse”, denote the forward primer and the reverse primer, respectively. The terms “forward primer” and “reverse primer” have the same meanings as “upstream primer” and “downstream primer”, respectively, and are used interchangeably.
In the present disclosure, the term “upstream” is used to describe the relative positional relationship of two nucleic acid sequences (or two nucleic acid molecules) and has the meaning generally understood by those skilled in the art. For example, the expression “a nucleic acid sequence is located upstream of another nucleic acid sequence” means that, when aligned in the 5′ to 3′ direction, the former is located at a more forward position (i.e., closer to the 5′ end) than the latter. In the present disclosure, the term “downstream” has the opposite meaning to “upstream”.
In the present disclosure, the term “PCR” is the abbreviation of polymerase chain reaction and has the meaning generally understood by those skilled in the art, which refers to a reaction that amplifies a target nucleic acid using a nucleic acid polymerase and primers.
In the present disclosure, the term “multiplex PCR” refers to a PCR process that simultaneously amplifies multiple nucleic acid fragment products using two or more sets of primers in a single PCR system.
In the present disclosure, the terms “hybridization” and “annealing” mean a process by which complementary single-stranded nucleic acid molecules form a double-stranded nucleic acid. In the present disclosure, “hybridization” and “annealing” have the same meaning and are used interchangeably. Typically, two nucleic acid sequences that are completely complementary or substantially complementary may be subjected to hybridization or annealing. The complementarity required for hybridization or annealing of two nucleic acid sequences depends on the hybridization conditions used, particularly the temperature.
In the first aspect, the embodiments of the present disclosure provide HLA gene amplifying primers, including primers in any one or more of the following nine primer sets:
In some embodiments, the HLA gene amplifying primers include the primers in any two, three, four, five, six, seven, eight, or nine of the nine primer sets.
The present disclosure proposes nine primer sets, each targeting a different target sequence of the HLA gene. The nine primer sets may be used individually for PCR amplification to obtain the amplification product of a particular target sequence, or in combination for single-tube multiplex PCR amplification to obtain mixed amplification products of multiple target sequences. Using the nine primer sets, 11 HLA gene sequences (DPB1, DQA1, DQB1, DPA1, DRB1/3/4/5, HLA-A, HLA-B, and HLA-C) can be amplified simultaneously by single-tube multiplex PCR. Current next-generation sequencing approaches require at least three tubes of separate amplification to cover these 11 gene sequences; by comparison, the present disclosure offers a more convenient and simplified experimental process.
In multiplex PCR, multiple target sequences targeted by multiple primer sets are amplified simultaneously, and the amplified sequences differ in length, so the amplification of the individual gene fragments inevitably interferes with one another. Therefore, adjusting the ratio among the primer sets and the ratio of the primers within each primer set is of great significance for achieving high-quality multiplex amplification.
In some embodiments, a molar ratio of the primers in each primer set is as follows:
In some embodiments, the ratio of the total molar amounts of the primers in the primer sets is as follows: the first primer set:the second primer set:the third primer set:the fourth primer set:the fifth primer set:the sixth primer set:the seventh primer set:the eighth primer set:the ninth primer set=(8.9~9.1):(5.4~5.6):(21~23):(1.4~1.6):(1.8~2):(0.9~1.1):(2~2.2):(0.9~1.1):(2.4~2.6).
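Because only a ratio is specified, a set of nine total molar amounts conforms whenever some common scale factor k brings every amount into its range, i.e. lo_i ≤ a_i/k ≤ hi_i, or equivalently a_i/hi_i ≤ k ≤ a_i/lo_i for every set. The following Python sketch checks this condition; the function names and the example amounts (in pmol) are hypothetical and are not taken from the disclosure.

```python
# Molar-ratio ranges (low, high) for primer sets 1-9, as stated above.
RATIO_RANGES = [
    (8.9, 9.1), (5.4, 5.6), (21.0, 23.0), (1.4, 1.6), (1.8, 2.0),
    (0.9, 1.1), (2.0, 2.2), (0.9, 1.1), (2.4, 2.6),
]


def scale_window(amounts, ranges=RATIO_RANGES):
    """Return (k_min, k_max), the scale factors k with lo_i <= a_i / k <= hi_i
    for every primer set, or None if no such k exists."""
    if len(amounts) != len(ranges):
        raise ValueError("expected one total molar amount per primer set")
    if any(a <= 0 for a in amounts):
        raise ValueError("molar amounts must be positive")
    # a/k <= hi  <=>  k >= a/hi ;  a/k >= lo  <=>  k <= a/lo
    k_min = max(a / hi for a, (lo, hi) in zip(amounts, ranges))
    k_max = min(a / lo for a, (lo, hi) in zip(amounts, ranges))
    return (k_min, k_max) if k_min <= k_max else None


def satisfies_ratio(amounts, ranges=RATIO_RANGES):
    """True if the amounts match the ratio ranges up to one common scale factor."""
    return scale_window(amounts, ranges) is not None
```

For example, the midpoint of each range multiplied by 10, `[90, 55, 220, 15, 19, 10, 21, 10, 25]`, passes the check, whereas doubling only the third set does not. Multiplying every amount by the same factor never changes the outcome, which matches the statement that all sets may be scaled by a common multiple.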
In some embodiments, the total molar amount of the primers in each primer set is as follows:
In some embodiments, provided that the primers in each primer set satisfy the above molar ratio, the total molar amounts of the primer sets may all be scaled by the same factor relative to the above amounts.
For the PacBio third-generation sequencing platform, a simple and fast method for constructing and sequencing HLA gene libraries has been investigated. By exploiting the advantages of third-generation sequencing together with specially designed amplification primers, the method achieves multiplex amplification in a single reaction while overcoming the shortcomings of traditional sequencing methods.
In the second aspect, the embodiments of the present disclosure provide a kit for HLA gene sequencing, which includes the HLA gene amplifying primers described in any of the above embodiments. This kit is suitable for library construction and sequencing.
Optionally, the kit further includes any one or more of a PCR amplification reagent, a sequencing library construction reagent, and a gene purification reagent.
In some embodiments, the sequencing library construction reagent includes any one or more of an adaptor, an adaptor ligation reaction reagent, and an exonuclease.
The adaptor is a hairpin adaptor. By ligating the hairpin adaptors at both ends of a gene amplification fragment, the gene amplification fragment forms a circular closed structure.
In some embodiments, for a 3.5 μL system, the adaptor ligation reaction reagent is a mixture of the following reagents: 1 nmol of dNTP, 10 nmol of dATP, 0.75 U of T4 DNA polymerase, 2.5 U of T4 polynucleotide kinase, 120 U of T4 DNA ligase, T4 DNA polymerase buffer, and water. 1 U of T4 DNA polymerase is defined as the amount of enzyme that incorporates 10 nmol of dNTP into acid-insoluble material in 30 minutes at 37° C.; 1 U of T4 polynucleotide kinase is defined as the amount of enzyme that incorporates 1 nmol of 32P from [γ-32P]ATP into acid-insoluble material in 30 minutes at 37° C.; and 1 U of T4 DNA ligase is defined as the amount of enzyme required to ligate more than 50% of the DNA fragments when 6 μg of HindIII-digested λ DNA is reacted for 30 minutes at 16° C. in a 20 μL ligation system. By rationally designing the composition and proportions of the adaptor ligation reaction reagents, this embodiment achieves one-step adaptor ligation, which improves process efficiency compared with traditional multi-step ligation.
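Since the reagent amounts above are given per 3.5 μL reaction, a master mix for several samples is obtained by simple scaling. The Python sketch below assumes a 10% pipetting overage, which the text does not specify, and omits buffer and water because their volumes are not given; the names are hypothetical.

```python
# Per-reaction amounts in the 3.5 uL adaptor ligation reagent, as listed above.
# Buffer and water volumes are not specified, so they are not included.
PER_REACTION = {
    "dNTP (nmol)": 1.0,
    "dATP (nmol)": 10.0,
    "T4 DNA polymerase (U)": 0.75,
    "T4 polynucleotide kinase (U)": 2.5,
    "T4 DNA ligase (U)": 120.0,
}
REACTION_VOLUME_UL = 3.5


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale per-reaction amounts to a master mix for n_reactions.

    `overage` is an assumed pipetting excess (10% by default), not a value
    taken from the disclosure.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be >= 1")
    if overage < 0:
        raise ValueError("overage must be non-negative")
    factor = n_reactions * (1 + overage)
    mix = {name: amount * factor for name, amount in PER_REACTION.items()}
    mix["total volume (uL)"] = REACTION_VOLUME_UL * factor
    return mix
```

For 8 reactions with the default overage, the scale factor is 8.8, giving about 1056 U of T4 DNA ligase in about 30.8 μL of total mix.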
In the third aspect, the embodiments of the present disclosure provide a method of constructing an HLA gene sequencing library, including the steps of:
In some embodiments, the amplification conditions are as follows: an initial hold at 93° C. to 95° C. for 2 min to 3 min; 30 cycles, each consisting of 10 seconds at 98° C. followed by extension at 68° C., where the extension lasts 11 min to 13 min in cycles 1 to 10 and increases by 30 seconds per cycle starting from cycle 11; and a final hold at 68° C. for 10 min.
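Because the extension step lengthens in later cycles, the total run time of this program is not obvious at a glance. The Python sketch below computes it under two stated assumptions: the 30-second increment accumulates (cycle 11 adds 30 s, cycle 12 adds 60 s, and so on), and temperature ramp times are ignored. The function names and the chosen hold and extension times are hypothetical values within the stated ranges.

```python
def extension_seconds(cycle: int, base_s: int, step_s: int = 30, ramp_start: int = 11) -> int:
    """68 C extension time for a 1-based cycle number.

    Cycles before `ramp_start` use base_s; from `ramp_start` on, the time grows
    by step_s per cycle (cycle 11 -> base + 30 s, cycle 12 -> base + 60 s, ...).
    """
    if cycle < 1:
        raise ValueError("cycles are numbered from 1")
    return base_s + max(0, cycle - ramp_start + 1) * step_s


def program_seconds(initial_hold_s: int, base_ext_s: int, cycles: int = 30,
                    denature_s: int = 10, final_hold_s: int = 600) -> int:
    """Total hold time of the program in seconds, excluding temperature ramps."""
    cycling = sum(denature_s + extension_seconds(c, base_ext_s)
                  for c in range(1, cycles + 1))
    return initial_hold_s + cycling + final_hold_s
```

With a 3 min initial hold and a 12 min base extension, `program_seconds(180, 720)` gives 28,980 s (483 min, roughly 8 h) before any ramp time is added; the last cycle's extension is 22 min.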
Because multiple primer pairs may need to be used for amplification simultaneously and the amplified sequences differ in length, the primers are designed for the regions to be amplified, and multiplex amplification in a single tube is achieved by adjusting the primer ratios and the reaction conditions.
In some embodiments, the ligation reaction conditions are as follows: a hold at 37° C. for 20 min to 30 min; a hold at 16° C. to 25° C. for 20 min to 30 min; and a hold at 65° C. for 10 min.
The library is purified to remove the various reaction reagents and impurities introduced during library construction.
In the fourth aspect, the embodiments of the present disclosure provide an HLA gene sequencing method, including constructing an HLA gene sequencing library using the method of constructing an HLA gene sequencing library described in any of the above embodiments and then sequencing the library.
In some embodiments, the sequencing is performed based on the PacBio sequencing platform.
PacBio's third-generation sequencing is based on the principle of sequencing by synthesis and performs sequencing reactions using a single-molecule real-time (SMRT) sequencing chip as the carrier. The principle is as follows: a polymerase captures a library DNA molecule and is anchored at the bottom of a zero-mode waveguide; four differently fluorescently labeled dNTPs randomly enter the bottom of the zero-mode waveguide, where they are illuminated by a laser so that their fluorescence is emitted and detected; the fluorescent dNTP complementary to the template base is incorporated by the enzyme, which both extends the chain and releases the fluorescent group from the dNTP. Sequencing therefore proceeds simultaneously with polymerization.
In PacBio's third-generation sequencing, DNA molecules are ligated to hairpin adaptors, so each constructed library molecule is circular, which allows it to be replicated repeatedly. Repeated sequencing of the same fragment improves accuracy. By contrast, in Illumina sequencing, phasing and prephasing, which arise because many copies of a template are sequenced at the same time, introduce noise and limit read length.
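The accuracy benefit of reading the same circular molecule several times can be illustrated with a simplified model. The Python sketch below assumes that each pass errs independently at a hypothetical per-base rate p and that the consensus is wrong only when a majority of passes are wrong; it illustrates the principle only and is not the consensus algorithm PacBio actually uses.

```python
import math


def majority_error(p: float, passes: int) -> float:
    """Probability that a majority vote over `passes` reads of one base is wrong.

    Simplified model (an assumption): each pass errs independently with
    probability p, and the consensus fails when more than half of the passes
    are wrong. Odd pass counts avoid ties.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd, positive number of passes")
    need = passes // 2 + 1  # wrong reads needed to outvote the correct base
    return sum(math.comb(passes, k) * p**k * (1 - p) ** (passes - k)
               for k in range(need, passes + 1))


def phred(error: float) -> float:
    """Convert an error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error)
```

With p = 0.1, a single pass has an error of 0.1 (Q10), three passes give 0.028 (about Q15.5), and the error keeps falling as passes are added.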
With the Pacific Biosciences sequencing platform, complete information can be obtained without assembling the raw data. This avoids problems of first- and second-generation sequencing technologies, such as the inability to distinguish information from the two homologous chromosomes and possible errors introduced by assembly, taking HLA genotyping to a new level.
In some embodiments, test samples suitable for the sequencing method include purified DNA samples and other sample types, such as whole blood, plasma, and the like.
The following are specific examples.
First, primers are designed for the regions to be amplified. Because multiple primer pairs may need to be used for amplification simultaneously and the amplified sequences differ in length, multiplex amplification in a single tube is achieved by adjusting the primer ratios, selecting suitable reagents, and optimizing the reaction conditions. Then, taking advantage of the long read length of third-generation single-molecule real-time sequencing, a Pacific Biosciences sequencing library is constructed directly from the amplified products, and HLA genotyping information for the amplified loci is obtained on the Pacific Biosciences sequencing platform. The entire process consists of five parts, in order: amplification, library construction, purification, library pooling, and on-instrument sequencing.
Amplification was performed using 88 human gDNA standard samples purchased from Coriell. The results from this large number of samples demonstrate the general applicability and sensitivity of the methods of the present disclosure.
Specific amplifying primers were designed for different HLA loci and 5′-phosphorylated primers were synthesized. Sequences of the primers are shown in the following Table 1:
SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 14, and 16 were forward primers, and SEQ ID NOS: 2, 4, 6, 8, 10, 12, 15, 17, and 18 were reverse primers. The primers were dissolved and diluted in Elution Buffer. The forward and reverse primers were first diluted separately and then mixed. A primer MIX containing all 18 primers was then prepared according to the primer volumes in Table 2 below.
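Table 2 is not reproduced here, but volumes for a primer MIX of this kind follow from the dilution relation C1V1 = C2V2. The Python sketch below illustrates only that arithmetic; the primer names, stock and target concentrations, and final volume are hypothetical inputs, not values from the disclosure.

```python
def primer_mix_volumes(stock_uM: dict, target_uM: dict, final_volume_uL: float) -> dict:
    """Volume of each primer stock, plus Elution Buffer, for a primer MIX.

    Uses C1 * V1 = C2 * V2 for each primer. stock_uM and target_uM are keyed by
    primer name; all concentrations and the final volume are hypothetical inputs.
    """
    volumes = {}
    for name, c_final in target_uM.items():
        c_stock = stock_uM[name]
        if c_final > c_stock:
            raise ValueError(f"{name}: target exceeds stock concentration")
        volumes[name] = c_final * final_volume_uL / c_stock
    buffer_uL = final_volume_uL - sum(volumes.values())
    if buffer_uL < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    volumes["Elution Buffer"] = buffer_uL
    return volumes
```

For instance, two 100 μM stocks targeted at 2 μM and 1 μM in a 100 μL mix require 2.0 μL and 1.0 μL, leaving 97.0 μL of Elution Buffer.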
After the primer MIX was prepared, the reagents were added in sequence to a new PCR tube according to the PCR amplification system in Table 3 below, and the tube lid was then closed. The amplification system was mixed well and centrifuged.
Amplification was performed with the PCR program set according to the parameters in Table 4 below (hot lid: 105° C.; ramp rate: 2.0° C./s).
The amplified products were analyzed by 1% agarose gel electrophoresis to determine whether the target genes were amplified. The electrophoresis conditions were as follows: voltage, 150 V; time, 50 min; sample volume, 5 μL of amplification product + 5 μL of 6× loading buffer. The amplification results of gDNA purchased from Coriell are shown in
After amplification, the amplified product was vortexed, briefly centrifuged, and then allowed to stand on a level surface. Adaptor ligation was then performed as follows.
The barcode adaptors were annealed according to the standard PacBio adaptor annealing program and set aside for later use.
The Ligase Mix was formulated according to Table 5 below.
A new PCR tube was used to formulate a ligation system according to Table 6 below.
Once the formulation was complete, the tube lid was closed. The tube was briefly centrifuged, vortexed to mix well, briefly centrifuged again, and then placed in a PCR instrument preset according to Table 7 for the ligation reaction, yielding the ligation product.
Reaction program: hot lid, 75° C.; ramp rate, 2.5° C./s.
After the ligation reaction, the exonuclease Mix was prepared according to Table 8 below.
Tubes containing the ligation products were placed on ice, and 1 μL of exonuclease Mix was added to each tube.
The tube lid was closed. The tube was briefly centrifuged, vortexed to mix well, briefly centrifuged again, and then placed in a PCR instrument preset according to Table 7 for the digestion reaction.
Reaction program: hot lid, 45° C.; ramp rate, 2.5° C./s.
AMPure® magnetic beads (AMPure® PB beads) were removed from the refrigerator half an hour in advance, vortexed to resuspend, and then placed on a vertical mixer to equilibrate at room temperature for 30 min.
On-instrument sequencing was performed according to the amplicon sequencing workflow recommended by PacBio for the Sequel II. The results are shown in Tables 10-12 below.
Note: some data in Tables 10-12 were missing or illegible when filed.
Note: data in bold indicate HLA genes whose typing accuracy was improved by the present product compared with the reference typing.
The results of HLA genotyping by third-generation sequencing are summarized in Table 13.
The technical features of the above embodiments can be combined arbitrarily. For conciseness, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The aforementioned embodiments only illustrate several embodiments of the present disclosure, which facilitate a specific and detailed understanding of the technical solutions of the present disclosure, but they cannot be understood to limit the protection scope of the present disclosure. It should be noted that a plurality of variations and modifications may be made by those skilled in the art without departing from the conception of the present disclosure, which are all within the scope of protection of the present disclosure. Accordingly, the scope of protection of the present disclosure shall be based on the appended claims, and the description may be used to interpret the content of the claims.
Number | Date | Country | Kind |
---|---|---|---|
202111204073.9 | Oct 2021 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2022/089017 | 4/25/2022 | WO |