This relates to a method for generating truncated and barcoded nucleic acid molecules from at least two target polynucleotide sequences, each from distinct biological particles.
Many methods have been developed to attach a barcode sequence to a target nucleic acid molecule. For example, inDrop™ (“indexing droplets,” Klein et al., Cell 161:1187-1201 (2015)), the 10X platform from 10X Genomics (Zheng et al., Nat Commun 8: 14049 (2017)), and DropSeq™ (Macosko et al., Cell 161:1202-1214 (2015)) (collectively referred to as “DropSeq-like methods”) can each attach a cell barcode and unique molecular index (also called “unique molecular identifier” or “UMI”) to cDNA. When combined with massively parallel sequencing (e.g., “NextGen Sequencing” or “NGS”), such barcoding methods can be immensely powerful in analyzing large numbers of biological samples (e.g., tens of thousands of individual cells). However, due to inherent limitations in some NGS technologies, often only sequence information for the portion of the nucleic acid in proximity to the barcode can be obtained using existing methods. For example, with Illumina, Inc.'s sequencers, library molecules with a long (e.g., >1,000 bp (base pairs)) insert tend to generate clusters with poor quality during bridge PCR. Thus, the DNA molecules to be sequenced are usually shortened to approximately 500 bp or less to accommodate this limitation. As a result, only sequences close to the barcode (e.g., within approximately 500 bp or less) can be obtained using these methods. For this reason, DropSeq-like methods are considered 3′ sequencing techniques: the barcode is attached to the 3′ end of the nucleic acid, and the sequencing can only provide information on the region within approximately 500 bp or less of that 3′ end.
However, sequence distant from the barcode may be of interest. For example, in DropSeq-like methods the barcode is attached to the 3′ end of the mRNA molecule (or 5′ end of the first strand cDNA molecule); whereas one may be interested in learning about a splicing junction, a possible point mutation, or a hypervariable region several kilobases upstream in the mRNA molecule. Unfortunately, it is difficult to obtain such information using DropSeq-like methods.
We describe a series of circularization-based methods that generate sequencing libraries where sequence distant from the barcode can be brought to proximity with the barcode in linear DNA. These methods are collectively referred to as circularization-based DNA reorientation, or TeleLink™. The resultant DNA molecules can then be analyzed with NGS (e.g., using Illumina platforms) where both the barcode and the distant sequence can be read.
In accordance with the description, in one embodiment a method for generating truncated and barcoded nucleic acid molecules from at least two target polynucleotide sequences each from distinct biological particles comprises:
In some embodiments, the method further comprises amplifying the truncated barcoded nucleic acid molecules to obtain a barcoded amplified product comprising the barcode and the portion of the target polynucleotide sequence.
In some embodiments, the truncated nucleic acid molecules are amplified using primers capable of binding to the primer-binding sites.
In some embodiments, the barcoded amplified product comprises a length of equal to or less than 500 base pairs.
In some embodiments, the barcoded nucleic acid molecules further comprise at least one primer binding site.
In some embodiments, the method further comprises introducing at least one primer-binding site to the truncated and barcoded nucleic acid molecules.
In some embodiments, the method further comprises truncating the target polynucleotide sequence before circularizing the barcoded nucleic acid molecules.
In some embodiments, the method further comprises ligating at least one additional domain to the truncated end of the barcoded nucleic acid molecule before circularizing the barcoded nucleic acid molecules.
In some embodiments, the method further comprises ligating at least one additional domain to barcoded nucleic acid molecules before circularizing the barcoded nucleic acid molecules.
In some embodiments, the barcoded nucleic acid molecule is DNA, RNA, or bisulfite-treated DNA.
In some embodiments, the target nucleic acid molecule is DNA.
In some embodiments, the target polynucleotide sequence is at least part of an engineered molecule that is used to engineer or probe the biological particle.
In some embodiments, the length of circular barcoded nucleic acid molecules is greater than 1 kb, 1.5 kb, 2 kb, 3 kb, 5 kb, or 10 kb.
In some embodiments, the distinct biological particles comprise cells, nuclei, or a cell cluster. In some embodiments, the biological particles are cells. In some embodiments, at least some of the cells are prokaryotic cells.
In some embodiments, at least some of the cells are eukaryotic cells.
In some embodiments, at least some of the cells are engineered with DNA, RNA or viral vectors that encode one or more biological agents that cause RNA-mediated gene knockdown, genome editing, transcriptional alteration, or epigenetic alteration.
In some embodiments, the one or more biological agents comprise one or more of siRNA, shRNA, miRNA, zinc finger domains, transcription activator-like effector (TALE), Cas9, or RNA with CRISPR origin.
In some embodiments, the cell cluster comprises a T cell and an antigen presenting cell.
In some embodiments, the cell cluster comprises a cell that expresses an antigen-recognizing agent and a cell that expresses an antigen.
In some embodiments, the antigen-recognizing agent comprises an antigen-recognizing protein or an antigen-recognizing polynucleotide.
In some embodiments, the antigen-recognizing protein comprises an antibody, a functional antibody fragment, or a T cell receptor.
In some embodiments, the antigen is complexed with a major histocompatibility complex (MHC) molecule.
In some embodiments, the target polynucleotide sequence comprises a partial or complete T cell receptor sequence, or a partial or complete B cell receptor sequence.
In some embodiments, the target polynucleotide sequence comprises a mutation.
In some embodiments, the target polynucleotide sequence comprises a transcription start site.
In some embodiments, the target polynucleotide sequence comprises a splicing junction.
In some embodiments, a method for sequencing a target nucleic acid molecule comprises sequencing the barcoded amplified products.
Additional objects and advantages will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice. The objects and advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments and together with the description, serve to explain the principles described herein.
Table 1 discloses what each domain name in
Domain level description. In this document, the polynucleotide sequence is sometimes described at the domain level. Each domain name corresponds to a specific polynucleotide sequence and/or a specific function. For example, domain ‘A’ may have a sequence of 5′-TATTCCC-3′, domain ‘B’ may have a sequence of 5′-AGGGAC-3′, and domain ‘C’ may have a sequence of 5′-GGGAAGA-3′. In this case, the polynucleotide having a sequence that is the concatenation of domains A, B, and C can be written as [A|B|C}. The symbol ‘[’ denotes the 5′ end, the symbol ‘}’ denotes the 3′ end, and the symbol ‘|’ separates domain names. An asterisk denotes sequence complementarity. For example, domain ‘B*’ is the reverse complement of domain ‘B’.
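The domain-level notation can be illustrated with a short Python sketch. The sequences for domains A, B, and C are those given in the example above; the function names (`reverse_complement`, `domain_sequence`, `expand`) are hypothetical and are introduced here only for illustration.

```python
# Domain sequences taken from the example in the text (5' to 3').
DOMAINS = {"A": "TATTCCC", "B": "AGGGAC", "C": "GGGAAGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def domain_sequence(name: str) -> str:
    """Resolve a domain name; a trailing '*' means the reverse complement."""
    if name.endswith("*"):
        return reverse_complement(DOMAINS[name[:-1]])
    return DOMAINS[name]


def expand(notation: str) -> str:
    """Expand a domain-level string such as '[A|B|C}' into a 5'-to-3' sequence."""
    if not (notation.startswith("[") and notation.endswith("}")):
        raise ValueError("expected '[' at the 5' end and '}' at the 3' end")
    return "".join(domain_sequence(n) for n in notation[1:-1].split("|"))


print(expand("[A|B|C}"))  # TATTCCCAGGGACGGGAAGA
print(expand("[B*}"))     # GTCCCT
```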
Functional description vs sequence description. In some figures and descriptions in this document (e.g.
Table 5 provides a listing of certain sequences referenced herein.
Biological particles: “Biological particles” are individually separable and dispersible particles of biological origin, such as cells (prokaryotic or eukaryotic), nuclei, cell clusters, organelles (such as mitochondria), and viruses. Other than viruses, biological particles are usually composed of at least 50 molecules and are usually large enough that they cannot pass through a 0.22-micron filter. In some embodiments, the biological particles are prepared from biological samples. For example, the biological particles can be cells prepared from fresh tissue (such as dense cell matter from tumor or neural tissues). In some embodiments, the biological particles are whole cells or nuclei prepared from frozen tissue. See Krishnaswami et al., Nat. Protoc. 11:499-524 (2016). In some situations, the analysis of nuclei (rather than cells) may be advantageous or necessary. For example, when the cells are abnormally shaped (e.g., neurons) or when freezing conditions have ruptured the outer cell membrane, intact cells can be difficult to prepare, whereas intact nuclei can be prepared more readily.
In some embodiments, at least some of the cells can be engineered with DNA, RNA, or viral vectors that encode one or more biological agents that cause RNA-mediated gene knockdown, genome editing, transcriptional alteration, or epigenetic alteration. The one or more biological agents may include, for example, one or more of siRNA, shRNA, miRNA, zinc finger domains, transcription activator-like effector (TALE), Cas9, or RNA with CRISPR origin.
Cell Clusters: As used herein, “cell clusters” refer to a grouping of cells. In some embodiments, the cell clusters comprise cells that express an antigen-recognizing agent and cells that express an antigen. Antigen-recognizing agents include, for example, an antigen-recognizing protein, such as an antibody, functional antibody fragment, or a T-cell receptor (TCR), or an antigen-recognizing polynucleotide. In some embodiments, the cell cluster comprises T cells and antigen presenting cells (APCs). The antigen may be complexed, for example, with a major histocompatibility complex (MHC) molecule.
Barcode: As used herein, a “barcode” or “BC” refers to a sequence barcode or barcodes responsible for deciphering the original location, count, or identity of the nucleic acid molecule. In some embodiments, the barcode comprises a compartment barcode (CB) and/or a unique molecular identification (UMI) sequence. To accomplish the barcoding, it is only necessary to bind a single barcode to the nucleic acid molecule. The length of a barcode may be from 3 to 20 nucleotides, 4 to 10 nucleotides, or 6 to 8 nucleotides in length, or 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 nucleotides in length.
Compartment barcode: A “compartment barcode” or “CB” is a nucleic acid sequence that is carried by primers and that denotes the identity of the compartment a target nucleic acid was associated with. The compartment barcode usually varies between compartments (i.e., different compartments have different compartment barcodes). At the same time, all compartment barcode sequences on all primers in one compartment usually are, or are intended to be, the same. The length of a compartment barcode may be from 3 to 20 nucleotides, 4 to 10 nucleotides, or 6 to 8 nucleotides in length, or 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 nucleotides in length.
The compartment barcode is often created by clonal expansion of single template nucleic acid molecules (e.g., Church and Vigneault, US20130274117) or by split-and-pool synthesis (e.g., in inDrop™ and DropSeq™ technologies, see Klein et al. above and Macosko et al., Cell 161:1202-1214 (2015), respectively).
In some embodiments, a compartment barcode is a cell barcode. See, e.g., Klein et al. above. For example, in single cell RNA-Seq techniques, such as Drop-Seq™ and inDrop™, compartment barcodes are used as cell barcodes, such that all RNA transcripts from the same cell are reverse-transcribed off primers sharing the same compartment barcode.
Unique molecular identification (UMI) sequence: As used herein, a “unique molecular identification” or “UMI” sequence refers to a short oligonucleotide sequence added to each molecule in some NGS protocols prior to amplification. The UMI may include random nucleotides (e.g., NNNNNNN), partially degenerate nucleotides (e.g., NNNRNYN), or defined nucleotides (e.g., when template molecules are limited). Amplification may be necessary to obtain enough molecules for detection, and the use of UMIs can reduce the quantitative bias it introduces, because duplicate molecules can be identified by their shared UMI. In some embodiments, the length of a UMI is from 3 to 10 or 4 to 8 bp in length, or 3, 4, 5, 6, 7, 8, 9, or 10 bp in length.
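As a minimal sketch of how UMIs allow duplicate molecules to be identified, the following Python example collapses reads that share a cell barcode, gene, and UMI, and checks a UMI against the partially degenerate pattern NNNRNYN. The read tuples, the gene and cell labels, and the function names are hypothetical; the sketch assumes error-free UMIs and uses the standard IUPAC meanings N (any base), R (A or G), and Y (C or T).

```python
from collections import defaultdict

# Standard IUPAC codes needed for the example pattern in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "Y": "CT"}


def matches_pattern(umi: str, pattern: str) -> bool:
    """True if every base of `umi` is allowed by the IUPAC code at that position."""
    return len(umi) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(umi, pattern)
    )


def count_molecules(reads):
    """Count unique UMIs per (cell barcode, gene); reads are (cell, umi, gene)."""
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: the second read is an amplification duplicate of the first.
reads = [
    ("CELL1", "ACGAATG", "GENE_X"),
    ("CELL1", "ACGAATG", "GENE_X"),
    ("CELL1", "TTGGACA", "GENE_X"),
    ("CELL2", "ACGAATG", "GENE_X"),
]

print(count_molecules(reads))
print(matches_pattern("ACGAATG", "NNNRNYN"))
```

In this toy input, four reads collapse to two molecules for CELL1 and one for CELL2, which is the sense in which UMIs reduce amplification bias.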
Primer: Primers are oligonucleotides that, during an experiment or a series of experiments, become part of a molecule or a molecular complex comprising: (a) the primer; and (b) a nucleic acid moiety that is either a target nucleic acid or a nucleic acid moiety whose formation is dependent on the presence or sequence of the target nucleic acid. As used herein, “primer” includes a single primer or a panel of different primers. In some embodiments, one or more of the primers may have an extendable 3′ end, may hybridize to a template nucleic acid (DNA or RNA), and/or may be extended by polymerases to copy the template nucleic acid (such as the target nucleotide sequence). In some embodiments, one or more of the primers may be a substrate for ligation. In some embodiments, one or more of the primers may participate in a hybridization or crosslinking reaction.
One or more of the primers may be engineered or chosen based on the features of the target nucleotide sequence. The primers usually have at least 4, 5, or 6 consecutive nucleotides that are complementary to at least a portion of the target nucleotide sequence. One or more of the primers may comprise a non-specific sequence (e.g., oligo/poly (d)T/U) or gene-specific sequence. As an example, if the target nucleic acid is polyadenylated RNA, an oligo dT primer can be used as the primer. The oligo dT primer anneals to the polyA tail of the RNA. In other embodiments, a gene-specific primer can be used. Gene-specific primers are designed based on known sequences of the target RNA. Gene-specific primers are commonly used in one-step RT-PCR applications.
The length of one or more of the primers may be from 4 to 200, 80 to 160, or 120 to 140 nucleotides in length, or 4, 5, 6, 8, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides in length. In some embodiments, the primer is also associated with a unique molecular identification (UMI) sequence and/or a barcode (BC) sequence. Methods to design primers to known sequence are well known to a person of ordinary skill in the art.
In some embodiments, one or more of the primers may contain randomly synthesized sequence, alone or in combination with an oligo dT primer. Random synthesis gives a range of sequences with potential to anneal at random points on a nucleic acid sequence and act as a primer to start first strand cDNA synthesis in various PCR applications. In some embodiments, the randomly synthesized sequence is from 2 to 20, 3 to 15, or 4 to 10 nucleotides in length, or 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or 20 nucleotides in length. For example, random hexamers (random hexanucleotides) are commonly used when the sequence of the target nucleotide sequence is unknown or diverse. See, e.g., Hansen et al., Nucleic Acids Res. 38:e131 (2010).
Primer delivery particle: As used herein, “primer delivery particle” refers to a particle that can host primers within, on the surface, or throughout the material comprising the particle. In some embodiments, the primer delivery particle also hosts a unique molecular identification (UMI) sequence and/or a barcode (BC) sequence and these sequences can be directly linked to the primer sequence. The primers may be attached to the primer delivery particle by methods known to those of skill in the art, such as by amine-thiol crosslinking, maleimide crosslinking, or crosslinking using N-hydroxysuccinimide or N-hydroxysulfosuccinimide. In some embodiments, biotin may be used to attach the primer to one or more beads coated with streptavidin.
In some embodiments, the diameter of a primer delivery particle can be from about 1 micron to about 1 millimeter, or greater than or equal to 1, 5, 10, 30, 50, 100, 500, or 750 microns. The primer delivery particle can be of uniform or heterogeneous volume. The average volume of a batch of primer delivery particles used in one experiment may be from 0.5 femtoLiter to 0.5 microLiter, from 1.0 femtoLiter to 0.25 microLiter, from 10 femtoLiter to 0.125 microLiter, or from 1 picoLiter to 5 nanoLiter.
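The diameter and volume ranges above are consistent with one another if the particle is approximated as a sphere, which is an assumption made here only for illustration. Under that assumption the volume is V = (π/6)·d³, and because 1 cubic micron equals 1 femtoliter, the conversion can be sketched as follows (the function name `sphere_volume_fl` is hypothetical):

```python
import math


def sphere_volume_fl(diameter_um: float) -> float:
    """Volume in femtoliters of a sphere of the given diameter in microns.

    Uses V = (pi / 6) * d**3 and the identity 1 um^3 = 1 fL.
    """
    return math.pi / 6.0 * diameter_um ** 3


for d in (1, 10, 100, 1000):
    v_fl = sphere_volume_fl(d)
    print(f"{d:>5} um diameter -> {v_fl:.3g} fL ({v_fl / 1e9:.3g} uL)")
```

A 1 micron sphere holds roughly 0.52 femtoliter and a 1 millimeter sphere roughly 0.52 microliter, which corresponds to the broadest volume range stated above.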
In some embodiments, the primer delivery particle may be a droplet or fluid, such as a water in oil droplet or lipid microsphere that contains the primers internally in an aqueous solution. A primer delivery particle may also be a “solid,” such as a bead, or a soft, compressible, yet non-fluidic material, such as a hydrogel (e.g., agarose gel, polyacrylamide gel, and polydimethylsiloxane (PDMS) gel, such as polyethylene glycol (PEG)/PDMS hydrogel).
A bead may encompass any type of solid or hollow sphere, ball, bearing, cylinder, or other similar configuration composed of plastic, ceramic, metal, or polymeric material onto which a nucleic acid may be immobilized (e.g., covalently or non-covalently). A bead may comprise nylon string or strings. A bead may be spherical or non-spherical in shape. Beads may be unpolished or, if polished, the polished bead may be roughened before treating (e.g., with an alkylating agent). A bead may comprise a discrete particle that may be spherical (e.g., microspheres) or have an irregular shape. The diameter of the beads may be about 5 μm, 10 μm, 20 μm, 25 μm, 30 μm, 35 μm, 40 μm, 45 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, or 100 μm. A bead may refer to any three-dimensional structure that may provide an increased surface area for immobilization of biological particles and macromolecules, such as DNA and RNA. Beads may comprise a variety of materials including, but not limited to, paramagnetic materials, ceramic, plastic, glass, polystyrene, methylstyrene, acrylic polymers, titanium, latex, sepharose, cellulose, nylon, agarose, polyacrylamide, and the like. Examples of beads include the gel bead GEMs in Zheng et al., Nat. Commun. 8:14049 (2017) and the gel beads in Klein et al.
The terms “hydrogel”, “gel,” and the like, are used interchangeably herein and may refer to a material which is not a readily flowable liquid and not a solid but a gel of from 0.25% to 50%, 0.5% to 40%, 1% to 30%, or 5% to 25%, or 0.5%, 1%, 5%, 10%, 20%, 30%, 40%, or 50%, by weight of gel forming solute material, and from 45% to 98%, 55% to 95%, 60% to 90%, or 65% to 85% by weight of water. The gels may be formed, for example, using a solute, synthetic or natural (e.g., for forming gelatin) to form interconnected cells which bind, entrap, absorb and/or otherwise hold water to create a gel, which may include bound and unbound water. The gel may be a polymer gel.
Primer binding site: As used herein, a primer binding site is a region of a nucleotide sequence where an RNA or DNA single-stranded primer binds to start replication.
Target polynucleotide sequence: A target polynucleotide sequence is the polynucleotide sequence selected for analysis, wherein the analysis can be any procedure that produces a human- or computer-observable signal. The analysis may comprise polymerase chain reaction (PCR), quantitative PCR (qPCR), Sanger sequencing, or NextGen sequencing (NGS, using platforms such as Illumina MiSeq™, Illumina HiSeq™, Illumina NextSeq™, Illumina NovaSeq™, Ion Torrent, SOLiD™, Roche 454, and the like), and the like. The analysis may yield information about the sequence or quantity of the target polynucleotide sequence. A target polynucleotide sequence can be DNA, RNA, or modified nucleic acid, such as bisulfite-treated DNA. In some embodiments, the target polynucleotide sequence is at least part of an engineered molecule that is used to engineer or probe the biological particle. The target polynucleotide sequence may also be the entirety or a subset of the genome or the transcriptome. The target polynucleotide sequence may be endogenous to the biological particle it resides in (i.e., it is in the biological particle without human intervention), or be exogenous to the biological particle it resides in (i.e., it is in the biological particle due entirely or partly to human intervention). The target polynucleotide sequence may be exogenously expressed mRNA, shRNA, non-coding RNA, or guide RNA (for the CRISPR/Cas9-based system). The target polynucleotide sequence may contain a barcode sequence. In some embodiments, the target polynucleotide sequence comprises one or more of a partial or complete T cell or B cell receptor sequence, a mutation, a transcription start site, or a splicing junction.
The target polynucleotide sequence may be a synthetic nucleic acid molecule that is conjugated to a detection probe, such as a monoclonal antibody. Sometimes the original target nucleic acid one intends to analyze is converted to another molecular species or molecular complex, such as a hybridization product, a primer-extension product (where the original target nucleic acid acts as the template or primer), a PCR product (where the original target nucleic acid acts as the template), or a ligation product (where the original target nucleic acid acts as the splint, the 5′ ligation substrate, or the 3′ ligation substrate). The newly created molecular species or molecular complexes can also be considered target polynucleotide sequences.
Template-Switching Oligonucleotide: As used herein, a “template-switching oligonucleotide” (TS oligo or TSO) refers to a DNA oligo sequence primer that carries additional consecutive bases at the 3′ end (e.g., 3 riboguanosines (rGrGrG)). The complementarity between these consecutive bases and the 3′ extension of the cDNA molecule empowers the subsequent template switching. Turchinovich et al., RNA Biol. 11(7):817-828 (2014). The sequence of the TSO (other than the consecutive Gs at the 3′ end) is largely arbitrary. The length of a TSO is equal to or greater than 3, 4, 5, 10, 20, or 30 nucleotides in length. In some embodiments the TSO is from 15 to 30 nucleotides in length.
A TSO may be used, for example, in methods such as template-switching polymerase chain reaction (TS-PCR) to produce cDNA from RNA. Petalidis et al., Nucleic Acids Res. 31(22):e142 (2003). TS-PCR is a method of reverse transcription and polymerase chain reaction (PCR) amplification that relies on a natural PCR primer sequence at the polyadenylation site and adds a second primer through the activity of murine leukemia virus (MLV) reverse transcriptase. Examples of TS-PCR include the SMART™ (switching mechanism at the 5′ end of the RNA transcript) or SMARTer™ methods of Clontech Laboratories, and the CATS™ (capture and amplification by tailing and switching) of Diagenode Inc.
In one example, upon reaching the 5′ end of the RNA template during first-strand synthesis, the terminal transferase activity of the MLV (e.g., Moloney murine leukemia virus or MMLV) reverse transcriptase adds a few additional nucleotides (mostly deoxycytidine) to the 3′ end of the newly synthesized cDNA strand. These bases function as a TSO-anchoring site. Upon base pairing between the TSO and the appended deoxycytidine stretch, the reverse transcriptase “switches” template strands, from cellular RNA to the TSO, and continues replication to the 5′ end of the TSO. The resulting cDNA contains the complete 5′ end of the transcript, and universal sequences of choice can be added to the reverse transcription product. Along with tagging of the cDNA 3′ end by oligo dT primers, one may amplify the entire full-length transcript pool in a sequence-independent manner. Shapiro et al., Nat. Rev. Genet. 14(9):618-630 (2013).
Circularizing: As used herein, “circularizing” refers to the conversion of a linear nucleic acid molecule into a circular form. Circularization may be obtained by, for example, homologous recombination of the ends or by association of complementary single stranded ends (sticky ends). Circularization may also be obtained by ligating the two ends of the linear nucleic acids. The ligation can be blunt-end ligation or sticky-end ligation. In some embodiments, the length of circular barcoded nucleic acid molecules is equal to or greater than 1 kb, 1.5 kb, 2 kb, 3 kb, 5 kb, or 10 kb.
Linearizing: As used herein, linearizing refers to the conversion of circular nucleic acid molecules to a linear form by fragmentation. Linearization may be accomplished by physical (e.g., acoustic, sonication, hydrodynamic), enzymatic (e.g., transposase, DNase I, restriction endonuclease, or non-specific nuclease), and/or chemical (e.g., heat and divalent metal cation, such as magnesium or zinc) methods. In some embodiments, linearization is by enzymatic means, such as through use of a transposase.
Tagmentation: As used herein, tagmentation refers to fragmentation and tagging of double-stranded DNA using a transposase, such as Tn5 transposase (e.g., Nextera™ methods by Illumina).
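The effect of circularizing a linear molecule and then linearizing it at a different position can be sketched with a toy string model in Python. The molecule layout below (a barcode at one end, a 3,000-base filler, and a distal region of interest at the other end) is a hypothetical simplification chosen for illustration; it omits the primer-binding sites and any additional domain at the junction, and the sequences and function names are not taken from the specification.

```python
def circularize_and_cut(linear: str, cut: int) -> str:
    """Join the two ends of `linear` into a circle, then reopen it at index `cut`."""
    cut %= len(linear)
    return linear[cut:] + linear[:cut]


def gap_between(seq: str, a: str, b: str) -> int:
    """Number of bases separating the first occurrences of `a` and `b` in `seq`."""
    ia, ib = seq.index(a), seq.index(b)
    if ia < ib:
        return ib - (ia + len(a))
    return ia - (ib + len(b))


# Hypothetical toy molecule: barcode at one end, region of interest at the other.
barcode = "ACGTACGT"
distal = "TTGACCA"
filler = "N" * 3000
linear = barcode + filler + distal

# Reopen the circle in the middle of the filler, far from the new junction.
relinearized = circularize_and_cut(linear, len(barcode) + 1500)

print(gap_between(linear, barcode, distal))        # 3000
print(gap_between(relinearized, barcode, distal))  # 0
```

In the toy model, the distal region starts 3,000 bases from the barcode in the original linear molecule and lies directly next to it after circularization and re-linearization, so a fragment of a few hundred bases spanning the junction would contain both.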
A typical barcoded nucleic acid molecule has the structure shown in
Step 0. Ensure there is a functional primer-binding site between the BC and the sequence of interest. An additional primer binding site P3 between BC and the sequence of interest can be strategically added, for example, during primer synthesis (e.g., by including P3 sequence in the primer extension template during the split-and-pool primer synthesis for inDrop™ technology). Poly A and poly T sequence may also serve as P3. As a result, the barcoded long DNA molecule has the structure shown in 101 of
Step 1 (optional). Create a truncated molecule that optionally includes an additional domain X.
Step 2. Circularize the truncated molecule (103) to join the free end of P2 and the other end of the truncated molecule (optionally with domain X in between) to form a circular DNA (104) of
If the truncated molecule is in dsDNA form, the ligation can be made between blunt ends or sticky ends. The sticky end can be created by multiple mechanisms, such as: (a) cleavage with a restriction enzyme; (b) embedding a deoxyuridine base followed by cleavage with USER™ enzyme mix (New England BioLabs, see, e.g., Geu-Flores et al., Nucleic Acids Res. 35(7):e55 (2007)); (c) using a 5′-to-3′ exonuclease activity as in the Gibson Assembly (Gibson et al., Nat. Methods 6:343-345 (2009)); or (d) using 3′-to-5′ exonuclease activity as in ligation-independent cloning (LIC) (Aslanidis et al., Nucleic Acids Res. 18:6069-74 (1990)) or sequence and ligation-independent cloning (SLIC) (Li et al., Nat. Methods 4:251-256 (2007)).
Promotion of intra-molecular circularization and minimization of inter-molecular ligation may be achieved by: (a) compartmentalizing the molecules in a large number (e.g., millions or more) of small compartments (e.g., droplets); (b) adding reagents that reduce diffusion (e.g., glycerol); or (c) immobilizing the DNA on a surface or to a polymer in a hydrogel to restrict free diffusion. If the substrate is ssDNA, an oligo complementary to a constant region on the substrate (e.g., P3) can be used to immobilize the substrate DNA molecule on a solid surface or to a polymer. If the substrate is dsDNA, a dsDNA-binding protein, such as a catalytically inactive form of a restriction enzyme, a zinc-finger protein, a TALE protein, or a dCas9/gRNA complex, can be used to immobilize the substrate DNA on the solid surface or to a polymer. Immobilization can also be achieved, for example, by attaching a biotin moiety to the DNA and attaching the DNA to a surface or a polymer modified with streptavidin, or by covalently attaching DNA to a surface or a polymer. Optionally, linear (i.e., non-circularized) DNA can be removed by exonuclease treatment.
In this example, the additional domains (202) and (203) are designed such that the 3′ end of each strand contains a stretch of sequence containing strictly A and T (e.g., 5′-TAT-3′ on the top strand and 5′-AAT-3′ on the bottom strand), followed by a stretch of sequence containing strictly G and C (e.g., 5′-GGCGGGCGCG-3′ on the top strand and 5′-CGCGCCCGCC-3′ on the bottom strand). The dsDNA can be treated, for example, with a DNA polymerase with 3′-to-5′ exonuclease activity and/or proof-reading activity (e.g., KOD (Thermococcus kodakarensis) and Pfu (Pyrococcus furiosus) DNA polymerases) in the presence of dATP (deoxyadenosine triphosphate) and dTTP (deoxythymidine triphosphate), but not dCTP (deoxycytidine triphosphate) or dGTP (deoxyguanosine triphosphate). In this way, the DNA polymerase will keep degrading the G and C nucleotides at the 3′ end of the DNA until it meets the A or T on the template, where it will go back and forth between degrading the nucleotide and filling it back in, likely favoring the latter. Other DNA polymerases that may be used include, but are not limited to, T7 DNA polymerase, DNA polymerase I, and Taq DNA polymerase.
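The chew-back described above can be sketched in Python as an idealized model: each strand loses G and C bases from its 3′ end until the first A or T is reached, and the bases left unpaired at each 5′ end form the sticky ends. The A/T and G/C stretches are those given in the example; the insert sequence, the variable names, and the assumption that the polymerase stops exactly at the first A or T are hypothetical simplifications.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def chew_back_3prime(strand: str) -> str:
    """Idealized exonuclease step: trim G/C from the 3' end up to the first A or T."""
    return strand.rstrip("GC")


GC_STRETCH = "GGCGGGCGCG"      # from the text (3' end of the top strand)
INSERT = "ACGTTGCAAGCTTGCA"    # hypothetical placeholder for the rest of the molecule

# Top strand 5'->3'. Its 5' end is chosen so that the bottom strand ends in
# 5'-AAT-3' followed by 5'-CGCGCCCGCC-3', as in the example.
top = GC_STRETCH + "ATT" + INSERT + "TAT" + GC_STRETCH
bottom = reverse_complement(top)

top_trimmed = chew_back_3prime(top)
bottom_trimmed = chew_back_3prime(bottom)

# Bases removed from one strand's 3' end expose the other strand's 5' end.
left = top[: len(bottom) - len(bottom_trimmed)]      # 5' overhang on the top strand
right = bottom[: len(top) - len(top_trimmed)]        # 5' overhang on the bottom strand

print(left, right)
print(reverse_complement(left) == right)  # complementary ends can anneal
```

Under these assumptions the two 5′ overhangs are reverse complements of each other, which is what allows the two ends of the same molecule to hybridize during circularization.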
After creating the 5′ sticky end, the dsDNA can be immobilized on a solid surface. In some embodiments, the solid surface may be modified with streptavidin (206), such as streptavidin-coated magnetic beads, at low enough density that two dsDNA molecules are unlikely to reach each other. The condition used to immobilize the DNA on the surface should be such that hybridization of sticky ends is unfavorable. These conditions help to reduce or prevent inter-molecular ligation. In some embodiments, the order of Step 2.1 and Step 2.2 of
Next, in Step 2.3 of
Step 3. Truncate the circularized molecule to form truncated linear molecule (106), while introducing a new primer-binding site P4 in proximity to P3 (e.g., within 1000 bp, 900 bp, 800 bp, 700 bp, 600 bp, 500 bp, 400 bp, 300 bp, 200 bp, or 100 bp). Position 105 of
1B.
Step 4. Amplify the resulting truncated barcoded DNA segment using primers capable of binding to the primer binding sites (e.g., that recognize P3 and P4 of
In some embodiments, the creation of the truncated molecule described in Step 1 can be omitted. This method can be used, for example, to study the sequence immediately adjacent to P1 (such as transcription start site). This method is illustrated in
In some embodiments, the barcoded amplification product is sequenced by methods known to a person of ordinary skill in the art. For example, the barcoded amplification product may be sequenced by methods that include, but are not limited to, polymerase chain reaction (PCR), quantitative PCR (qPCR), Sanger sequencing, NextGen sequencing (NGS, using platforms such as Illumina MiSeq™, Illumina HiSeq™, Illumina NextSeq™, Illumina NovaSeq™, Ion Torrent, SOLiD™, Roche 454, and the like), and the like.
The methods described herein can be used to achieve several functionalities in the context of scRNA-seq (single cell RNA sequencing), such as: (1) pairing a T-cell receptor (TCR) sequence with a 3′ expression profile of single cells; (2) pairing a point mutation distal to the 3′ end with a 3′ expression profile of single cells; and (3) quasi full-length scRNA-seq. Rationale and methods for these applications are given in the Examples.
scRNA-seq measures the distribution of expression levels for each gene across a population of cells. scRNA-seq may be accomplished using methods known to those of skill in the art and variations thereof, such as SMART-seq™, Smart-seq2™, SMARTer™, CEL-seq™, CEL-seq2™, inDrop-seq™, Drop-seq™, MARS-seq™, SCRB-seq™, Seq-well™, STRT-seq™, etc. In some embodiments, scRNA-seq uses the SMARTer™ (Switching Mechanism At 5′ End of RNA Transcript) method.
The “T-cell receptor” or “TCR” as used herein is a molecule found on the surface of T cells, or T lymphocytes, that is responsible for recognizing fragments of antigen as peptides bound to major histocompatibility complex (MHC) molecules. The binding between TCR and antigen peptides is of relatively low affinity and is degenerate: that is, many TCRs recognize the same antigen peptide and many antigen peptides are recognized by the same TCR. Sewell, A. K., Nat. Rev. Imm. 12(9): 669-677 (2012). When the TCR engages with antigenic peptide and MHC (peptide/MHC), the T lymphocyte is activated through signal transduction, that is, a series of biochemical events mediated by associated enzymes, co-receptors, specialized adaptor molecules, and activated or released transcription factors.
The TCR is a disulfide-linked membrane-anchored heterodimeric protein generally consisting of highly variable alpha (α) and beta (β) chains. Janeway et al., Immunobiology: The Immune System in Health and Disease. 5th ed. Glossary: Garland Science (2001). Each chain is composed of two extracellular domains: a variable (V) region and a constant (C) region. The C region is proximal to the cell membrane, followed by a transmembrane region and a short cytoplasmic tail, while the V region binds to the peptide/MHC complex.
The V domains of the TCR α-chain and β-chain each have three hypervariable or complementarity determining regions (CDRs). There is also an additional area of hypervariability on the β-chain (HV4) that does not normally contact antigen and, therefore, is not considered a CDR.
CDR3 is the main CDR responsible for recognizing processed antigen, although CDR1 of the α-chain has also been shown to interact with the N-terminal part of the antigenic peptide, whereas CDR1 of the β-chain interacts with the C-terminal part of the peptide. CDR2 is thought to recognize the MHC. HV4 of the β-chain (sometimes called CDR4) is not thought to participate in antigen recognition, but has been shown to interact with superantigens.
The C domain of the TCR consists of short connecting sequences in which a cysteine residue forms disulfide bonds, which form a link between the two chains.
The “B-cell receptor” or “BCR” is a transmembrane receptor protein located on the outer surface of B cells. The BCR comprises a membrane-bound immunoglobulin (antibody) molecule of one isotype (IgD, IgM, IgA, IgG, or IgE) and a signal transduction moiety comprising a heterodimer Ig-α/Ig-β, bound together by disulfide bridges. Analogous to the α-chain and β-chain of the TCR, the V domains of the BCR heavy chain and light chain each have three hypervariable regions or CDRs, which form the antigen-binding site.
When analyzing a B cell or T cell, it is often important to understand both its gene expression profile on a transcriptomic scale and the BCR or TCR sequence that confers its antigen specificity. Each task alone can be accomplished using existing methods (e.g., a single-cell gene expression profile can be readily obtained using DropSeq-like methods that feature an oligo-dT-based reverse transcription primer, and BCR/TCR sequencing can be achieved by replacing the oligo-dT with a sequence complementary to the constant (C) region of the BCR/TCR). However, it is non-trivial to obtain both the 3′ expression profile and the BCR/TCR sequence from the same cell. This example shows how to solve this problem using circularization-based DNA reorientation. T cell analysis is used as an example, but the same principle can be applied to B cell analysis. Some steps of this process are depicted in
The mRNAs from a large number of T cells (e.g., greater than 100, 200, 500, 1,000, 5,000, 10,000, or 20,000 cells) can be barcoded using a DropSeq-like approach. A modified inDrop™ method can be used as the exemplary method. In this modified method, one can create a large number of water-in-oil droplets (e.g., greater than 1,000, 2,000, 5,000, 10,000, or 20,000 droplets), each containing only one T cell and one hydrogel bead, where the hydrogel bead embeds RT primers that carry the same cell barcode. The RT primer (401 of
After reverse transcription is completed, the reverse transcriptase can be heat-inactivated and the emulsion can be broken to pool all RT product. The reverse transcriptase may add a few C bases at the 3′ end of the first-strand cDNA. A template-switching oligo (TSO) which has a few G bases at the 3′ end can be added. The C bases at the 3′ end of the first-strand cDNA may pair with the G bases on the template-switching oligo and get extended using the template-switching oligo as a template (
Next, a primer comprising the TS sequence and a primer comprising the DA sequence can be used to amplify the first-strand cDNA (
Next, a new pair of primers (403 and 404 of
Next, additional PCR steps can be performed to attach additional domains to the ends of the dsDNA (
A primer essentially having the sequence of DA can be used as a sequencing primer to read the sequences of CB and UMI, and a primer essentially having the sequence of DB* can be used as a sequencing primer to read the sequences of domains J, D, and V. In some cases, the DA and DB* domains may essentially have the sequences of Rd2 and Rd1, respectively (Read2 and Read1, respectively, in the Illumina platform). The step of reading the sequences of CB and UMI can be essentially the same as the step of reading the i7 index (i.e., index read 1) in a common Illumina sequencing run, except that more cycles may be used.
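As a minimal sketch of how the two reads described above might be combined after sequencing, the following Python groups receptor-side reads by cell barcode and collapses PCR duplicates by UMI. The barcode and UMI lengths (`CB_LEN`, `UMI_LEN`), the assumption that CB precedes UMI in the read primed from DA, and the toy reads are all hypothetical; none of them is specified in the text.

```python
from collections import defaultdict

CB_LEN = 8   # hypothetical cell-barcode length (not given in the text)
UMI_LEN = 6  # hypothetical UMI length (not given in the text)


def split_barcode_read(barcode_read):
    """Split the read primed from DA into (cell barcode, UMI).

    Assumes the cell barcode is read first and the UMI follows it.
    """
    if len(barcode_read) < CB_LEN + UMI_LEN:
        raise ValueError("barcode read is shorter than CB + UMI")
    cb = barcode_read[:CB_LEN]
    umi = barcode_read[CB_LEN:CB_LEN + UMI_LEN]
    return cb, umi


def group_receptor_reads(read_pairs):
    """Map each cell barcode to {UMI: set of receptor-side reads}."""
    cells = defaultdict(lambda: defaultdict(set))
    for barcode_read, receptor_read in read_pairs:
        cb, umi = split_barcode_read(barcode_read)
        cells[cb][umi].add(receptor_read)
    return cells


# Toy read pairs: (read primed from DA, read primed from DB*).
pairs = [
    ("AAAACCCC" + "GGGTTT", "TGTGCCAGCAGC"),
    ("AAAACCCC" + "GGGTTT", "TGTGCCAGCAGC"),  # PCR duplicate: same CB and UMI
    ("AAAACCCC" + "ACGTAC", "TGTGCCAGCAGC"),
    ("TTTTGGGG" + "CATCAT", "TGTGCTGTGAGA"),
]
grouped = group_receptor_reads(pairs)

# Number of distinct UMIs (i.e., distinct original molecules) per cell barcode.
umi_counts = {cb: len(umis) for cb, umis in grouped.items()}
```

Because the same cell barcode also tags the 3′ expression reads, a table keyed by cell barcode in this way could be joined to an expression matrix keyed by the same barcodes.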
To sequence paired TCRs or BCRs along with transcriptome, an alternative to using TSO is to use a panel of V gene primers for second strand synthesis.
The design and production of primer 801 as well as Step 8.1 (reverse transcription in indexed droplets) can follow Klein et al., above. After breaking the emulsion, an aliquot (hereafter called the ‘TCR Aliquot’) representing ~20% of the total volume of the aqueous phase can be used for V gene primer-based second strand synthesis (SSS) and PCR (Step 8.2). Each primer for SSS (named SSS Primer) has a sequence of [$zRd2|$V_Panel}, where $V_Panel is a variable sequence having many variants, each variant corresponding to a V gene of the TCR alpha or beta chain.
To perform SSS of Step 8.2, the TCR Aliquot can be mixed with all the SSS Primers so that the final concentration of each SSS Primer is ~5 nM, in the presence of ~100 mM Na+ and ~5 mM Mg++. The mixture can be heated to ~60° C. for 5 hours to allow hybridization. Next, a thermostable DNA polymerase (e.g., Taq) along with dNTPs can be added to the mixture, which allows the SSS Primers to extend on the template. This primer extension product can be SPRI-purified and named ‘SSS Product’.
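The per-primer concentration target above implies a simple dilution calculation when the SSS Primers are handled as an equimolar pool. The sketch below applies C1·V1 = C2·V2 per primer. The panel size, stock concentration, and reaction volume are hypothetical illustration values that do not come from the text; only the ~5 nM per-primer target does.

```python
def pooled_primer_plan(n_primers, stock_nM_each, final_nM_each, final_volume_uL):
    """Return (pool volume to add in uL, total primer concentration in nM).

    Assumes an equimolar pool in which every primer is present at
    stock_nM_each, and applies C1 * V1 = C2 * V2 to a single primer.
    """
    if final_nM_each > stock_nM_each:
        raise ValueError("stock is more dilute than the target concentration")
    pool_volume_uL = final_nM_each * final_volume_uL / stock_nM_each
    total_final_nM = n_primers * final_nM_each
    return pool_volume_uL, total_final_nM


# Hypothetical numbers: 80 primers pooled at 500 nM each, 50 uL reaction,
# with the ~5 nM per-primer target taken from the text.
volume_uL, total_nM = pooled_primer_plan(
    n_primers=80, stock_nM_each=500.0, final_nM_each=5.0, final_volume_uL=50.0
)
```

Under these assumed values, 0.5 µL of pool would be added and the combined primer concentration would be 400 nM, which illustrates that the total oligonucleotide load scales with the size of the V gene panel.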
The SSS Product can be PCR-amplified by primers having the sequences of $zRd1Δ and $zRd2Δ (see Table 4 for sequences). The sequences of these primers may also be truncated by 12 to 14 nt at the 3′ end to ensure specific amplification. This PCR amplification completes Step 8.2. Next, one may use primers having the sequences [zX|zRd1Δ} and [$X*|$Idx*|$zRd2Δ} to perform PCR while introducing a sample index (Step 8.3). Domain $Idx can be an arbitrary 6- to 8-nt sequence that serves as the sample index. Domain $X may have the sequence shown in Table 5 and serves as the circularization domain. This PCR product can then be circularized (Step 8.4) using the method described in
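To illustrate how a domain-level primer description such as [$X*|$Idx*|$zRd2Δ} might be expanded into a concrete oligonucleotide, the following sketch concatenates domains 5′ to 3′. It assumes that domains are listed 5′ to 3′ and that an asterisk denotes the reverse complement of the named domain. The domain sequences used here are short placeholders, not the sequences of Tables 4 and 5.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_index_primer(x_domain, sample_index, rd2_domain):
    """Assemble a primer of the form [X*|Idx*|Rd2}, written 5' to 3'.

    Assumes '*' means reverse complement; the sample index must be
    6 to 8 nt long, as stated in the text.
    """
    if not 6 <= len(sample_index) <= 8:
        raise ValueError("sample index must be 6 to 8 nt long")
    if set(sample_index) - set("ACGT"):
        raise ValueError("sample index must contain only A, C, G, T")
    return (
        reverse_complement(x_domain)
        + reverse_complement(sample_index)
        + rd2_domain
    )


# Placeholder domain sequences (hypothetical, for illustration only).
X_DOMAIN = "GGGCC"       # stands in for a GC-only circularization domain
RD2_DOMAIN = "ATATAT"    # stands in for the $zRd2 (truncated) domain
primer = build_index_primer(X_DOMAIN, "ACGTTG", RD2_DOMAIN)
```

Generating one such primer per sample, each with a different index, would allow pooled libraries to be separated by the index after sequencing.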
The circularized DNA can be amplified using primers 804 and 805 (Step 8.5), which essentially linearizes and truncates the DNA. Primer 804 has the sequence [$P5|$C5*}. Primer 805 has the sequence [$zP7|$C3}. This PCR product is suitable for standard HiSeq X or NovaSeq sequencing.
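The effect of circularization followed by amplification with outward-facing primers can be pictured with a toy string model. The sketch below is not the primer design of primers 804 and 805; the site names, sequences, and the 1,000-nt spacer are hypothetical. It only shows the general principle: a region far from the barcode on the linear molecule ends up adjacent to the barcode, and the long intervening stretch is left out of the product.

```python
def inverse_pcr(linear, fwd_site, rev_site):
    """Circularize `linear` (3' end joined to 5' end) and return the
    top-strand product of outward-facing primers.

    fwd_site: top-strand sequence where the product starts.
    rev_site: top-strand sequence where the product ends.
    On the linear molecule rev_site must lie upstream of fwd_site, so the
    product runs from fwd_site, across the junction, to rev_site.
    """
    start = linear.find(fwd_site)
    end = linear.find(rev_site)
    if start == -1 or end == -1:
        raise ValueError("primer site not found in template")
    end += len(rev_site)
    if end > start:
        raise ValueError("rev_site must lie upstream of fwd_site")
    return linear[start:] + linear[:end]


# Toy molecule (all sequences hypothetical): the region of interest is
# more than 1,000 nt away from the barcode on the linear molecule.
V_REGION = "ACGTTGCA"   # distal region of interest
REV_SITE = "GGATCC"     # site just downstream of the region of interest
SPACER = "AT" * 500     # long intervening sequence to be removed
FWD_SITE = "CTGCAG"     # site just upstream of the barcode
BARCODE = "CCATGG"

linear = V_REGION + REV_SITE + SPACER + FWD_SITE + BARCODE
product = inverse_pcr(linear, FWD_SITE, REV_SITE)
```

In this model the product reads FWD_SITE, BARCODE, V_REGION, REV_SITE in order, so the barcode and the region of interest are now contiguous and the spacer is absent.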
Sequencing Point Mutations Distant from the 3′ End
In some situations, it may be desired to analyze the transcriptome profile and mutation status of a cell simultaneously. For example, in a tumor microenvironment there may be both tumor cells that carry a particular mutation and normal cells that do not carry such a mutation. It may be desired to study the difference in transcriptome profiles between tumor cells and normal cells.
As an example, if tumor cells, but not normal cells, have the K27M mutation in the H3F3A gene, one may process the sample using the strategy shown in
To analyze H3F3A K27 mutation status, the cDNA can be PCR-amplified (
The second primer (507) contains a domain DC at the 5′ end and a domain MD3 at its 3′ end. The domain MD3 is designed to prime close to the 3′ end of the mRNA (excluding the polyA tail). The PCR amplification (
Most DropSeq-like ultra-high throughput scRNA-Seq methods only allow sequencing of the 3′ ends of the mRNAs. The methods described in Examples 1 and 2 show how one may obtain sequence upstream on the mRNA if one knows the sequence context in the region of interest on the mRNA (e.g., the sequence in the C domain of the TCR and the sequence around the potential point mutation site). However, in some embodiments one may wish to survey the full-length mRNA sequence in an exploratory or hypothesis-free fashion, without necessarily knowing the sequence context a priori. This example describes how one may achieve that with TeleLink™.
Synthesis of barcoded first-strand cDNA, where the barcode comprises both a cell barcode and a UMI domain, can be accomplished by insertion of an additional domain (domain DB) between the UMI and the poly-T region (domain PolyT) (see (601) of
After the emulsion is broken, instead of using the CEL-Seq2 method for second-strand synthesis and amplification (as in the standard inDrop™ method), one may use the SMARTer (Switching Mechanism At 5′ End of RNA Transcript) method (e.g., using the SMARTer kit from Clontech Laboratories), which requires a template switching oligonucleotide (TSO). Such cDNA can be further amplified so that each initial mRNA molecule may be represented by multiple copies (shown as (601) of
The DNA molecules that have undergone the second tagmentation reaction can be PCR-amplified using primers essentially having sequences DB* and DD (see the arrow in
Overall, as schematically shown by (657) of
We use TCR sequences as model sequences to demonstrate the DNA circularization protocol. We prepared 2 dsDNA templates with the sequences called $TRA and $TRB, respectively, using Jurkat cell (Clone E6-1) cDNA and standard molecular biology methods. The sequences of $TRA and $TRB are listed in Table 7. We appended the GC-only domains (serving the purpose of the GC-only regions of 202 and 203 of FIG. 2, and the domains $X and $X* in
Next, the 3′-end GC-only regions on the PCR products were chewed off using Q5® High-Fidelity DNA Polymerase in the presence of dATP and dTTP only. We call these PCR products ‘digested’ TRA and TRB gene segments.
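The outcome of this chew-back step can be modeled in a few lines, under the idealized assumption that the proofreading exonuclease removes 3′-terminal bases until it reaches a base that the polymerase can restore from the supplied dNTPs (here A or T), at which point net digestion stalls. The sequences below are hypothetical placeholders, not the sequences of $TRA, $TRB, or the GC-only domains.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def chew_back_3prime(strand, supplied="AT"):
    """Trim the 3' end of `strand` (written 5' to 3') until the terminal
    base is one of the supplied dNTPs.

    Idealized model: a removed A or T can be re-added from the supplied
    dNTPs, but a removed G or C cannot, so the 3'-terminal run of G/C
    is lost and digestion stops at the first A or T.
    """
    end = len(strand)
    while end > 0 and strand[end - 1] not in supplied:
        end -= 1
    return strand[:end]


# Toy duplex: GC-only domains flanking an insert (hypothetical sequences).
top = "GGCGCC" + "ATTGCAT" + "CCGGCG"
bottom = reverse_complement(top)

top_digested = chew_back_3prime(top)        # loses its 3' GC-only run
bottom_digested = chew_back_3prime(bottom)  # loses its 3' GC-only run
```

In this model each strand keeps its 5′ GC-only region while losing the 3′ one, so the duplex is left with single-stranded GC-only 5′ overhangs at both ends that are available for annealing.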
Next, we mixed digested TRA and TRB gene segments at the molar ratio of 1:100 at a series of concentrations (listed in
To test the intra-molecular circularization efficiency versus inter-molecular ligation events, we designed primers having sequences $P03, $P04, $P05, and $P06 as shown in Table 6, targeting the 5′- and 3′-ends of the TRA and TRB gene segments as shown in
Comparing Ct values of total TRA (row “P07+P08” in
The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the embodiments. The foregoing description and Examples detail certain embodiments and describe the best mode contemplated by the inventors. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the embodiments may be practiced in many ways and should be construed in accordance with the appended claims and any equivalents thereof.
As used herein, the term “about” refers to a numeric value, including, for example, whole numbers, fractions, and percentages, whether or not explicitly indicated. The term “about” generally refers to a range of numerical values (e.g., +/−5-10% of the recited range) that one of ordinary skill in the art would consider equivalent to the recited value (e.g., having the same function or result). When terms such as “at least” and “about” precede a list of numerical values or ranges, the terms modify all of the values or ranges provided in the list. In some instances, the term “about” may include numerical values that are rounded to the nearest significant figure.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/045893 | 8/9/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62543612 | Aug 2017 | US |