The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jun. 14, 2019, is named 104126-201_SL.txt and is 2,011 bytes in size.
Disclosed herein are compositions and methods useful for performing single step polymerase chain reactions (PCR) of complex target gene pools, for example, microbiome samples, using specific and nonspecific primers to amplify target sequences while maintaining the original ratios of gene variants and reducing or eliminating primer concentration-dependent PCR amplification bias.
Investigating the microbiome involves cataloging the types of microbes present and how many of each type there are. Targeted sequencing of 16S, 18S, 23S, Internally Transcribed Spacer (ITS) or other combinations of conserved and variable genetic regions can be used to identify microbes in a complex mixture. Most targeted regions contain conserved sequences, shared across many domains of life, that can serve as PCR primer sites; these conserved sites flank less conserved variable regions whose sequence is specific to a particular organism and therefore provides a positive identification for that organism. PCR assays have the advantage that the amount of sequencing required for microbiome profiling can be orders of magnitude less than that needed for full microbiome sequencing. PCR assays using 16S, 23S, or other targeted DNA sequencing techniques typically involve designing primers that anneal to conserved regions of the target gene. Although the primer sites may be conserved, there is enough variation between organisms that a single primer set is typically unable to capture all gene variants. Organisms whose variant gene anneals poorly to the primer sequence will be either underrepresented or missed. The problem of missing microbial variants can be addressed by including additional primers that anneal to the variants (Apprill, A., et al., Aquat Microb Ecol, 2015 and Caporaso, J. G. et al., PNAS, 2011), but this approach can result in differential amplification depending on the relative abundance of targets and PCR primers (Castelino, Madhura et al. “Optimisation of Methods for Bacterial Skin Microbiome Investigation: Primer Selection and Comparison of the 454 versus MiSeq Platform.” BMC Microbiology 17 (2017): 23. PMC. Web. 11 Feb. 2018), introducing PCR bias.
To explain, within a PCR reaction containing multiple targets, a high abundance target will deplete its corresponding primer more rapidly than a low abundance target, and the depleted primer will make PCR of that target less efficient as the reaction progresses. Low abundance targets do not deplete their primers at the same rate, so their amplification remains efficient for longer. As a result of primer depletion during PCR, the original population ratios of target genes (representing the organisms) become distorted, artificially inflating the representation of low abundance organisms and/or repressing that of high abundance organisms in the microbial mix, which can greatly complicate interpretation of microbiome data.
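The depletion effect described above can be illustrated with a deliberately simplified, hypothetical model: each target copy consumes one unit of its own specific primer when it is copied, and a target stops amplifying once its primer is exhausted. The function name, starting copy numbers and primer amounts are illustrative assumptions, not values from this disclosure.

```python
def simulate_specific_only(start_copies, primer_pool, cycles):
    """Toy primer-limited PCR: every copy of a target consumes one unit of that
    target's own specific primer when it is copied (illustrative assumption)."""
    copies = dict(start_copies)
    primers = dict(primer_pool)
    for _ in range(cycles):
        for target in copies:
            # A target can be copied at most as many times as it has primer left.
            new = min(copies[target], primers[target])
            primers[target] -= new
            copies[target] += new
    return copies


start = {"high": 1000, "low": 10}          # hypothetical starting copies (100:1)
pool = {"high": 100_000, "low": 100_000}   # equal, hypothetical primer amounts
final = simulate_specific_only(start, pool, cycles=25)

print(start["high"] / start["low"])             # 100.0
print(round(final["high"] / final["low"], 2))   # 1.01
```

In this toy model the 100:1 starting ratio collapses to roughly 1:1, because the abundant target exhausts its primer in cycle 7 while the rare target keeps doubling until cycle 14.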
The problem of missing microbial variants can be solved by including PCR primer variants, but this introduces a new problem of primer concentration-dependent PCR amplification bias.
The compositions and methods disclosed herein utilize specific and nonspecific primers in the same PCR reaction to amplify target sequences in ratios that are proportional to the levels of the target sequences present in the starting sample, thus reducing or eliminating primer concentration-dependent PCR amplification bias while still capturing variants in samples containing a complex target gene pool, such as the microbiome.
Disclosed herein are compositions and methods for single-step polymerase chain reaction (PCR) of a sample containing a complex target gene pool (for example, 16S rRNA amplification of a microbiome sample) that can simultaneously amplify a wide variety of variant target gene sequences common to the sample while maintaining the original ratios of gene variants. The compositions and methods described herein utilize (1) a gene-specific primer pool containing multiple primer variants, which are required to amplify the variants present in a sample containing a complex mixture of target sequences (for example, microbial gene sequences) but which may introduce amplification bias, together with (2) a non-specific PCR primer that is designed to target multiple gene-specific primer variants and eliminate amplification bias. The non-specific primers are designed so that they cannot anneal to a target until the third round of PCR, after two rounds of specific PCR have occurred. A primary advantage of this method over previous methods is that the two types of primers (specific and non-specific) can be combined in the same PCR reaction, so that if one or more target gene specific primers are depleted during PCR of a complex sample mixture, the non-specific primers can take over amplification in later rounds. As a result, bias due to the depletion of specific primers during PCR is reduced or eliminated, maintaining accurate proportional representation of the corresponding target genes.
The invention includes specific and non-specific PCR primers designed to amplify variants of a target gene across a variety of organisms while reducing the amplification bias that results from competition between primers during PCR. The specific PCR primer pool contains sequence variants that anneal to target genes from a variety of organisms, and each specific primer also carries a common primer ‘tail’ that is not present in the genome of any target organism. The nonspecific primer consists of a single sequence, identical to the common primer ‘tail’ sequence on the specific primers. Since the nonspecific primer sequence has no genomic target, it will not anneal in PCR cycles 1 and 2.
The combination of multiple gene-specific primers with non-specific artificial sequence primers enables coverage of gene variants while reducing the potential for PCR bias. Although the specific and non-specific primers are included in one reaction, the non-specific primers cannot participate before the third round of amplification.
In the first round of PCR, the specific portion of the organism specific PCR primers will bind, if their genomic target is present. The nonspecific 5′ portion of the specific primer (shown in
Round 1 PCR synthesizes the complementary gDNA strand, but no complement is synthesized for the nonspecific sequence on the tail of the specific primer. As a result, during the second round of PCR, again only the organism-specific annealing targets are available. During the second round of PCR, the first round products are copied back through the PCR primer sequence, including the full nonspecific portion of the primer, so that the complement of the nonspecific sequence is synthesized for the first time. From this point onward, each PCR round can use either an organism-specific primer (if available) or the non-specific primer (present in excess for all templates).
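The round-by-round logic can be checked with a symbolic sketch, assuming a simplified annealing rule in which a primer binds wherever the complement of its 3′-terminal segment occurs and then extends to the 5′ end of the template. The segment labels `tF`/`tR` (tails), `sF`/`sR` (specific portions), `ins` (insert) and `gL`/`gR` (genomic flanks) are hypothetical; a trailing `'` marks the complementary segment.

```python
def comp(segment):
    """Complement of a segment label: 'x' <-> "x'"."""
    return segment[:-1] if segment.endswith("'") else segment + "'"


def revcomp(strand):
    """Reverse complement of a strand written 5'->3' as a tuple of segments."""
    return tuple(comp(s) for s in reversed(strand))


def extend(primer, template):
    """Anneal the primer's 3'-terminal segment and copy to the template's 5' end.

    Returns the new strand, or None if the template has no binding site.
    """
    site = comp(primer[-1])
    if site not in template:
        return None
    pos = template.index(site)
    return primer + revcomp(template[:pos])


PRIMERS = {
    "Fs": ("tF", "sF"),  # forward specific: nonspecific tail + specific portion
    "Rs": ("tR", "sR"),  # reverse specific
    "NF": ("tF",),       # forward nonspecific: tail sequence only
    "NR": ("tR",),       # reverse nonspecific
}


def run(rounds):
    """Return the first round in which each primer anneals, and the final strand pool."""
    plus = ("gL", "sF", "ins", "sR'", "gR")  # genomic top strand
    pool = {plus, revcomp(plus)}
    first_round = {}
    for rnd in range(1, rounds + 1):
        products = set()
        for template in pool:
            for name, primer in PRIMERS.items():
                product = extend(primer, template)
                if product is not None:
                    products.add(product)
                    first_round.setdefault(name, rnd)
        pool |= products
    return first_round, pool


first_round, pool = run(3)
print(first_round["NF"], first_round["NR"])  # 3 3
```

Under these assumptions, the first strands carrying a tail complement (for example `("tF", "sF", "ins", "sR'", "tR'")`) appear in round 2, so the nonspecific primers find a binding site only from round 3 onward.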
Accordingly, disclosed herein is a multi-primer assay for determining ratios of target DNA sequences in a sample by (a) contacting the sample with a plurality of oligonucleotide primers in a single vessel, wherein the plurality of oligonucleotide primers includes (i) one or more sets of forward and reverse specific primers having a nonspecific nucleotide sequence, designed not to anneal to a DNA sequence in the sample, linked to specific nucleotide sequences complementary to specific consecutive base sequences of target DNA sequences; and (ii) one or more sets of forward and reverse nonspecific primers having a nucleotide sequence identical to the nonspecific nucleotide sequence in (i); (b) performing a minimum of three rounds of a multi-primer amplification reaction in the vessel, wherein the sets of nonspecific primers do not participate in the amplification reaction until round three of the amplification reaction; and (c) detecting the presence of amplification products corresponding to the target DNA sequences, wherein the ratios of the amplification products reflect the ratios of the target DNA sequences in the sample. A nonspecific primer is designed to have a nucleotide sequence that does not anneal to any DNA sequence in the sample, and will not produce an amplicon until the third round of the amplification reaction. Primers can be designed using methods known in the art.
In some embodiments, the sets of forward and reverse specific primers include a linker sequence between the nonspecific nucleotide sequence and the specific nucleotide sequence. In some embodiments, the linker sequence has a length of from about 5 to about 25 nucleotides. In some embodiments, the linker sequence comprises a unique barcode sequence used to identify individual samples. Barcoding individual samples with specific DNA tags allows many samples to be combined for sequencing, which simplifies the workflow, decreases time to result, and reduces costs. DNA barcodes can be selected to be long enough to generate the desired number of barcodes, with sufficient sequence differences between them to tolerate common sequencing errors; they generally range in size from about 2 to about 20 bases, but may be longer or shorter.
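The barcode selection scheme is not specified here. One common approach, shown as an assumption, is to keep barcodes separated by a minimum Hamming distance: with a minimum distance of 3, a read barcode carrying a single substitution error is within one mismatch of exactly one barcode and can be assigned unambiguously. The greedy generator and the `assign` helper are hypothetical, and Hamming distance handles substitutions only, not insertions or deletions.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatched positions between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(length, min_distance):
    """Greedily pick barcodes whose pairwise Hamming distance is >= min_distance."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
    return chosen


def assign(observed, barcodes, max_errors=1):
    """Return the single barcode within max_errors substitutions, else None."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_errors]
    return hits[0] if len(hits) == 1 else None


codes = greedy_barcodes(length=6, min_distance=3)
print(4 ** 6, len(codes))  # all possible 6-mers vs. the error-tolerant subset
print(assign("C" + codes[0][1:], codes) == codes[0])  # True: one substitution is corrected
```

The trade-off the paragraph describes is visible here: longer barcodes give more candidates, and the distance constraint spends part of that capacity on error tolerance.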
In some embodiments, the target sequence comprises a sequence of a gene selected from the group consisting of: 16S rRNA, 23S rRNA, 18S rRNA, ITS1, ITS2, HSP65, rpoB, recA, Internally Transcribed Spacer (ITS), human HLA, microbial toxin producing genes, microbial pathogenicity genes, microbial plasmid genes, human immune system genes, immune system components, ribosomal RNA genes, and other variable genetic regions of non-human organisms. In some embodiments, the 16S rRNA target sequence comprises a region selected from the group consisting of: V1V3, V3V4, V4-V5 and V1V9. In some embodiments, the target sequence comprises a region V1V9-EXT that includes part or all of the V1V9 region of a 16S rRNA gene and part or all of the adjacent (i) Internally Transcribed Spacer gene, and (ii) 23S gene.
In some embodiments, the forward specific primers have a nucleotide sequence of AATGATACGGCGACCACCGAGATCTACACATATGCGCACACTCTTTCCCTACACGACGCTCTTCCGATCTAGRGTTYGATYCTGGCTYAG (SEQ ID NO: 1) wherein Y is C or T, and R is A or G. In some embodiments, the reverse specific primers have a nucleotide sequence of CAAGCAGAAGACGGCATACGAGATCTGATCGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTYACCGCRRCTGCTGGCAC (SEQ ID NO: 2) wherein Y is C or T, and R is A or G. In some embodiments, the forward nonspecific primer has a nucleotide sequence of AATGATACGGCGACCACCGAGATCTACACATATGCGCACACTCTTTCCCTACACGA (SEQ ID NO: 3). In some embodiments, the reverse nonspecific primer has a nucleotide sequence of CAAGCAGAAGACGGCATACGAGATCTGATCGTGTGACTGGAGTTCAGACGTGTGCTCTTC (SEQ ID NO: 4).
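Because of their degenerate positions, SEQ ID NOs: 1 and 2 each describe a pool of primer variants. The sketch below, assuming the IUPAC meanings stated above (Y = C/T, R = A/G) and reading the printed sequences without line-wrap spaces, expands each pool and checks that every variant begins with the corresponding nonspecific primer (SEQ ID NOs: 3 and 4). The helper name `expand` is illustrative.

```python
from itertools import product

# Only the IUPAC codes that occur in SEQ ID NOs: 1-4 are included.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG"}

SEQ1 = "AATGATACGGCGACCACCGAGATCTACACATATGCGCACACTCTTTCCCTACACGACGCTCTTCCGATCTAGRGTTYGATYCTGGCTYAG"
SEQ2 = "CAAGCAGAAGACGGCATACGAGATCTGATCGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTYACCGCRRCTGCTGGCAC"
SEQ3 = "AATGATACGGCGACCACCGAGATCTACACATATGCGCACACTCTTTCCCTACACGA"
SEQ4 = "CAAGCAGAAGACGGCATACGAGATCTGATCGTGTGACTGGAGTTCAGACGTGTGCTCTTC"


def expand(degenerate):
    """List every concrete sequence encoded by a degenerate (IUPAC) sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate))]


forward_variants = expand(SEQ1)
reverse_variants = expand(SEQ2)
print(len(forward_variants), len(reverse_variants))  # 16 8

# Every specific-primer variant begins with the matching nonspecific primer sequence.
print(all(v.startswith(SEQ3) for v in forward_variants))  # True
print(all(v.startswith(SEQ4) for v in reverse_variants))  # True
```

SEQ ID NO: 1 has four degenerate positions (16 variants) and SEQ ID NO: 2 has three (8 variants); all of them share one 5′ sequence, which is why a single nonspecific primer per direction covers the whole pool.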
In some embodiments, the target DNA in the sample originates from a source selected from the group consisting of: feces, cell lysate, tissue, blood, tumor, tongue, tooth, buccal swab, phlegm, mucus, wound swab, skin swab, vaginal swab, and any other biological material, tissue or fluid originally obtained from a human, animal, plant, or environmental sample (such as soil, water or air), including raw samples, complex samples, mixtures, and microbiome samples.
In some embodiments, the target DNA originates from an organism selected from the group consisting of: spores, biofilms, multicellular organisms, unicellular organisms, prokaryotes, eukaryotes, microbes, bacteria, archaea, protozoa, algae, fungi and viruses.
Also disclosed herein are nucleic acid amplification primers represented by the following general formulas: 5′-A-B-C-3′ or 5′-A-C-3′ or 5′-B-A-C-3′ or 5′-A-C-B-3′ or 5′-C-B-A-3′ or 5′-C-A-3′ or 5′-B-C-A-3′ or 5′-C-A-B-3′, wherein ‘A’ represents a nonspecific nucleotide sequence having a length of from about 15 to about 100 nucleotides that does not anneal to a target nucleic acid, ‘B’ represents a linker nucleotide sequence having a length of from about 5 to about 30 nucleotides, and ‘C’ represents a nucleotide sequence complementary to a specific consecutive base sequence of a template nucleic acid, having a length of from about 10 to about 30 nucleotides. In a preferred embodiment, the nucleic acid amplification primers are represented by the general formula 5′-A-B-C-3′.
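A minimal sketch of assembling a primer from the segments of the general formulas and checking the stated length ranges (A: about 15 to 100, B: about 5 to 30, C: about 10 to 30 nucleotides). Treating the "about" ranges as hard limits is a simplifying assumption, and the segment sequences in the example are arbitrary placeholders rather than sequences from this disclosure.

```python
# Approximate length ranges (nucleotides) stated for segments A, B and C.
LENGTH_RANGES = {"A": (15, 100), "B": (5, 30), "C": (10, 30)}


def build_primer(layout, segments):
    """Join segments 5'->3' in the order given by layout (e.g. "ABC", "AC", "CBA").

    Raises ValueError if a segment's length falls outside its stated range.
    """
    for name in layout:
        low, high = LENGTH_RANGES[name]
        length = len(segments[name])
        if not low <= length <= high:
            raise ValueError(f"segment {name}: {length} nt, expected {low}-{high}")
    return "".join(segments[name] for name in layout)


# Placeholder sequences, chosen only to satisfy the length ranges.
segments = {"A": "ACGT" * 5, "B": "TTGCA", "C": "GATC" * 5}
preferred = build_primer("ABC", segments)       # 5'-A-B-C-3'
without_linker = build_primer("AC", segments)   # 5'-A-C-3'
print(len(preferred), len(without_linker))      # 45 40
```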
In some embodiments, the target DNA sequence is selected from a human, microbial, animal, plant or viral gene sequence. In some embodiments, the target DNA sequence comprises a gene selected from the group consisting of: 16S rRNA, 23S rRNA, 18S rRNA, ITS1, ITS2, HSP65, rpoB, recA, Internally Transcribed Spacer (ITS), human HLA, microbial toxin producing genes, microbial pathogenicity genes, microbial plasmid genes, immune system genes, immune system components, olfactory receptor genes, ribosomal RNA genes, and other variant but related genes of prokaryotes and eukaryotes.
In some embodiments, the specific primers are complementary to a region of a 16S rRNA sequence selected from the group consisting of: V1V3, V3V4, V4-V5, and V1V9. In some embodiments, the specific primers are complementary to the V1V9-EXT region comprising the entire V1V9 region of the 16S rRNA gene and part or all of its adjacent ITS and 23S rRNA genes.
The present invention also includes kits that utilize any of the compositions and methods described herein.
The benefits of the approach disclosed herein include that multiple specific primers, with different concentrations, sequences, and affinities for their targets, can be combined in a single PCR assay. The potential for PCR primer competition between these multiple primers is greatly reduced by the presence of the non-specific primer, which has the same concentration, sequence and affinity for all target organisms. This enables the use of multiple PCR primers that cover a wide variety of genomic targets while giving more uniform amplification across all targets. Since the non-specific primers can take over amplification after the second round (cycle), the overall primer pool for all amplicons remains uniform during PCR. Because the non-specific sequence can be selected to include sequences required for post-PCR library creation for sequencing, many adapter and library processing steps that have been shown to result in sample crosstalk and index hopping can be removed (MacConaill, L. E., Burns, R. T., Nag A., Coleman, H. A., Slevin, M. K., Giorda, K., Light, M., Lai, K., Jarosz, M., McNeill, M. S., Ducar, M. D., Meyerson, M., Thorner, A. R., Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing. BMC Genomics. 2018 Jan. 8; 19(1):30).
The Examples described herein apply the compositions and methods of the invention to amplification of DNA sequences from a microbiome sample. DNA was generated by lysing the cells in the target microbiome, after which the resulting DNA was used as template in PCR amplification targeting the 16S rRNA gene, present in all bacteria and archaea. Microbes can be identified using their 16S rRNA gene sequence, which differs slightly between most, if not all, species of bacteria and archaea. The variation in 16S gene sequence means that individual species of bacteria and archaea have characteristic DNA variations in the 16S rRNA gene that serve as identifiers, or fingerprints, for that species. Kits, protocols and software available from Shoreline Biome (Farmington, Conn.) enable comprehensive fingerprinting of the microbes in a sample and simultaneous 16S rRNA profiling of many samples at once, at high resolution, using amplicons designed in both the 16S rRNA and 23S rRNA genes. Known microbes can be identified after sequencing by mapping the DNA sequence of the 16S gene to a database of known 16S sequences. Unknown microbes will contain 16S DNA sequences that differ from those of every microbe in the database, but they can be tracked using their unique 16S sequence. In addition, the number of reads obtained for each microbe in a sample can reveal the relative abundance of each microbe in that sample. The relative abundance can be an important indicator of the state of each individual microbiome. Lysis or PCR techniques that change relative abundances of microbes, or leave out certain microbes altogether, can lead to sequencing results that incorrectly characterize the state of the microbiomes being studied. The invention as described is an improved method for achieving the correct relative abundances of a wider variety of microbes from a sample.
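Relative abundance, as described above, is the fraction of a sample's reads assigned to each microbe. A minimal sketch, with hypothetical taxon names and read counts:

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts into fractions of the total reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned")
    return {taxon: count / total for taxon, count in read_counts.items()}


counts = {"taxon_1": 600, "taxon_2": 300, "taxon_3": 100}  # hypothetical read counts
print(relative_abundance(counts))  # {'taxon_1': 0.6, 'taxon_2': 0.3, 'taxon_3': 0.1}
```

Any step that changes the read counts unevenly between taxa, such as primer-depletion bias during PCR, changes these fractions directly.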
Detailed Steps:
Step 1. Shoreline Biome DNA Preparation Kit:
Contents:
CCTACACGACGCTCTTCCGATCTAGRGTTYGATYCTGGCTYAG-3′
ACGTGTGCTCTTCCGATCTTYACCGCRRCTGCTGGCAC-3′
CCTACACGA-3′
ACGTGTGCTCTTC-3′
Step 2. Steps to the PCR Protocol:
The Example described herein applies the compositions and methods of the invention to amplification of DNA sequences. 16S primers are useful for microbiome research because they act as universal primers that can detect most bacteria in a complex sample. This attribute also makes it difficult to detect specific amplification biases due to primer depletion while maintaining the complexity of a sample. To overcome this difficulty, a model was developed as a representative example of a microbiome sample, to show that amplification can occur with very limited specific primer if non-specific primers are added to the PCR reaction as described in this Example. An 8-organism mock microbiome community plus an additional organism (Bifidobacterium adolescentis) were used to represent a microbiome sample. B. adolescentis was chosen specifically because it is not amplified by the V1V3 primers, and it can therefore be used to detect amplification when the amount of its specific primer is limited.
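The logic of this model experiment can be illustrated with a toy primer-limited model: each target copy consumes one unit of its specific primer when copied, and from round 3 onward copies not served by a specific primer may draw on a shared nonspecific primer pool. The mock community is lumped into a single target, and all copy numbers and primer amounts are hypothetical assumptions; the sketch does not reproduce data from this Example.

```python
def simulate(start, specific, nonspecific, cycles, nonspecific_from=3):
    """Toy PCR: each copy uses one unit of its specific primer if any is left;
    from round `nonspecific_from` on, remaining copies may use the shared
    nonspecific primer pool (simplifying assumption)."""
    copies = dict(start)
    specific = dict(specific)
    for rnd in range(1, cycles + 1):
        for target in copies:
            n = copies[target]
            via_specific = min(n, specific[target])
            specific[target] -= via_specific
            via_nonspecific = 0
            if rnd >= nonspecific_from:
                via_nonspecific = min(n - via_specific, nonspecific)
                nonspecific -= via_nonspecific
            copies[target] = n + via_specific + via_nonspecific
    return copies


def ratio(c):
    return c["mock_community"] / c["B_adolescentis"]


start = {"mock_community": 1000, "B_adolescentis": 100}        # hypothetical copies
specific = {"mock_community": 10**9, "B_adolescentis": 500}    # limited specific primer
without_ns = simulate(start, specific, nonspecific=0, cycles=15)
with_ns = simulate(start, specific, nonspecific=10**9, cycles=15)

print(ratio(start), round(ratio(without_ns), 1), ratio(with_ns))  # 10.0 54613.3 10.0
```

Without nonspecific primer, the limited B. adolescentis primer runs out in round 3 and the 10:1 starting ratio is grossly distorted; with nonspecific primer in excess, this idealized model preserves the starting ratio exactly.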
Detailed Steps:
Step 1. Shoreline Biome DNA Preparation Kit:
Contents:
CCTACACGACGCTCTTCCGATCTAGRGTTYGATYCTGGCTYAG-3′
ACGTGTGCTCTTCCGATCTTYACCGCRRCTGCTGGCAC-3′
CCTACACGA-3′
ACGTGTGCTCTTC-3′
Step 2. Steps to the PCR Protocol:
One or more embodiments of the present invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the claims that follow.
Number | Date | Country
---|---|---
62666854 | May 2018 | US