The present invention relates generally to the field of molecular biology. More particularly, it concerns amplification of paired protein-coding mRNA sequences using a modified DNA polymerase having reverse transcriptase activity.
There is a need to identify the expression of two or more transcripts from individual cells at high throughput. In particular, for numerous biotechnology and medical applications it is important to identify and sequence the gene pairs encoding the two chains comprising adaptive immune receptors from individual cells at a very high throughput in order to accurately determine the complete repertoires of immune receptors expressed in patients or in laboratory animals. Immune receptors expressed by B and T lymphocytes are encoded respectively by the VH and VL antibody genes and by TCR α/β or γ/δ chain genes. Humans have many tens of thousands or millions of distinct B and T lymphocytes classified into different subsets based on the expression of surface markers (CD proteins) and transcription factors (e.g., FoxP3 in the Treg T lymphocyte subset). High-throughput DNA sequencing technologies have been used to determine the repertoires of VH or VL chains or, alternatively, of TCR α and β in lymphocyte subsets of relevance to particular disease states or, more generally, to study the function of the adaptive immune system (Wu et al., 2011). Immunology researchers have an especially great need for high throughput analysis of multiple transcripts at once.
Currently available methods for immune repertoire sequencing involve mRNA isolation from a cell population of interest, e.g., memory B-cells or plasma cells from bone marrow, followed by RT-PCR in bulk to synthesize cDNA for high-throughput DNA sequencing (Reddy et al., 2010; Krause et al., 2011). However, heavy and light antibody chains (or α and β T-cell receptors) are encoded on separate mRNA strands and must be sequenced separately. Thus, these available methods have the potential to unveil the entire heavy and light chain immune repertoires individually, but cannot yet resolve heavy and light chain pairings at high throughput. Without multiple-transcript analysis at the single-cell level to collect heavy and light chain pairing data, the full adaptive immune receptor, which includes both chains, cannot be sequenced or reconstructed and expressed for further study.
In one embodiment, compositions isolated in a compartment are provided, said compositions comprising (i) polymerase that comprises one or more genetically engineered mutations compared to a wild-type Archaeal Family-B polymerase, the polymerase having an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to SEQ ID NO: 1 and in which one or more amino acid residues at a position selected from the group consisting of positions Y493, Y384, V389, I521, E664 and G711 in the amino acid sequence shown in SEQ ID NO: 1 or at a position corresponding to any of these positions, are substituted with another amino acid residue; and (ii) a DNA molecule comprising linked cDNAs corresponding to two distinct mRNA transcripts from a single cell. In some aspects, the compartment is an emulsion macrovesicle. In certain aspects, the two distinct mRNA transcripts encode paired antibody VH and VL domains. In other aspects, the two distinct mRNA transcripts encode paired T-cell receptor sequences.
In one embodiment, methods are provided, said methods comprising: a) sequestering single cells into individual compartments; b) lysing the cells to generate a lysate comprising mRNA transcripts; c) performing reverse transcription and a first PCR amplification of the mRNA transcripts using a single polymerase to generate distinct cDNA products corresponding to at least two distinct mRNAs from a single cell; and d) sequencing the distinct cDNA products amplified from at least one single cell. In some aspects, the single polymerase has proofreading activity. In certain aspects, the method is further defined as a method for obtaining a plurality of natively paired mRNA transcript sequences.
In some aspects, the cells are B cells. In certain aspects, the at least two distinct mRNAs encode paired antibody VH and VL sequences. As such, the method may be further defined as a method for obtaining paired antibody VH and VL sequences for an antibody that binds to an antigen of interest.
In some aspects, the cells are T cells. In certain aspects, the at least two distinct mRNAs encode paired T-cell receptor sequences. As such, the method may be further defined as a method for obtaining paired T-cell receptor sequences for a T-cell receptor that binds to an epitope of interest.
In certain aspects, the mRNA transcripts are not captured. In certain aspects, the mRNA transcripts are bound to a solid support prior to step (c). As such, the method may further comprise binding the mRNA transcripts to a solid support prior to step (c). In some aspects, the solid support is a bead. In certain aspects, the solid support comprises oligonucleotides that hybridize to the mRNA transcripts, such as, for example, oligonucleotides comprising poly-T sequences.
In some aspects, the individual compartments are wells in a gel or microtiter plate. In certain aspects, the individual compartments have a volume of greater than 5 nL. In further aspects, the wells are sealed with a permeable membrane prior to step (c). In some aspects, the individual compartments are microvesicles in an emulsion.
In some aspects, steps (a) and (b) are performed concurrently. In certain aspects, steps (a) and (b) comprise isolating single cells into individual microvesicles in an emulsion and in the presence of a cell lysis solution.
In some aspects, the individual compartments in step (a) further comprise oligonucleotides for priming of reverse transcription. In certain aspects, step (b) further comprises allowing the mRNA transcripts to associate with the oligonucleotides. In certain aspects, the method comprises obtaining sequences from at least 10,000 individual cells. In certain aspects, the method comprises obtaining at least 5,000 individual paired antibody VH and VL sequences.
In some aspects, step (c) comprises linking cDNA by performing overlap extension reverse transcriptase polymerase chain reaction to link at least two transcripts into a single DNA molecule. In some aspects, step (c) does not comprise the use of overlap extension reverse transcriptase polymerase chain reaction. In some aspects, step (c) comprises linking VH and VL cDNAs by performing overlap extension reverse transcriptase polymerase chain reaction to link VH and VL cDNAs in single molecules. In certain aspects, step (c) does not comprise the use of overlap extension reverse transcriptase polymerase chain reaction and the VH and VL cDNAs are separate molecules. In certain aspects, the VH and VL sequences are obtained by sequencing of distinct molecules. As such, the method may further comprise identifying the paired antibody VH and VL sequences by performing a probability analysis of the sequences. In some aspects, the probability analysis is based on the CDR-H3 or CDR-L3 sequences. In some aspects, identifying the paired antibody VH and VL sequences comprises comparing raw sequencing read counts.
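By way of non-limiting illustration, one simple way to compare raw sequencing read counts may be sketched as follows. The sketch assumes that each read has already been reduced to one CDR-H3 and one CDR-L3 identifier (for example, from a linked amplicon). The function name, the `min_reads` threshold, and the short toy CDR strings are hypothetical and are not taken from the present disclosure.

```python
from collections import Counter, defaultdict


def pair_by_read_counts(linked_reads, min_reads=2):
    """Assign each CDR-H3 the CDR-L3 it is most often observed with.

    linked_reads: iterable of (cdr_h3, cdr_l3) tuples, one per sequencing read.
    min_reads: hypothetical threshold; a top pairing seen fewer times is dropped.
    Returns {cdr_h3: (cdr_l3, top_count, fraction_of_reads_for_that_cdr_h3)}.
    """
    counts = Counter(linked_reads)

    # Group the raw read counts by heavy-chain identifier.
    by_heavy = defaultdict(Counter)
    for (h3, l3), n in counts.items():
        by_heavy[h3][l3] += n

    pairs = {}
    for h3, light_counts in by_heavy.items():
        l3, top = light_counts.most_common(1)[0]
        if top < min_reads:
            continue
        pairs[h3] = (l3, top, top / sum(light_counts.values()))
    return pairs


# Toy data only: identifiers and counts are invented for illustration.
reads = (
    [("ARDYW", "QQSYT")] * 8    # dominant pairing for the first heavy chain
    + [("ARDYW", "QQYNS")] * 1  # rare competing pairing
    + [("ARGGF", "QQYNS")] * 5
)
result = pair_by_read_counts(reads)
```

In this sketch the fraction returned for each CDR-H3 serves as a crude confidence measure; a fuller probability analysis could replace it without changing the overall structure.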
In some aspects, step (c) comprises linking cDNA by performing recombination. In some aspects, the methods further comprise performing a second PCR amplification after step (c) and before step (d).
In some aspects, the cells are mammalian cells. In certain aspects, the cells are B cells, T cells, NKT cells, or cancer cells.
In some aspects, sequestering the single cells comprises introducing the cells to a device comprising a plurality of microwells so that the majority of cells are captured as single cells. In some aspects, the methods further comprise identifying multiple mRNA transcripts for a plurality of single cells based on the sequencing step (d). In some aspects, the methods further comprise isolating the mRNA transcripts prior to step (c). In some aspects, the methods further comprise determining natively paired transcripts using probability analysis. In certain aspects, identifying the natively paired transcripts comprises comparing raw sequencing read counts.
In various aspects of the present embodiments, the single polymerase is a recombinant Archaeal Family-B polymerase that transcribes a template that is RNA and has one or more mutations compared to a wild-type Archaeal Family-B polymerase. The polymerase may have one or more mutations compared to wild-type KOD polymerase. The one or more mutations are in a region of the polymerase that induces stalling at uracil residues; one or more mutations are in a region that recognizes the 2′ hydroxyl of template RNAs; one or more mutations are in a region that directly acts with a template strand; one or more mutations are in a region for secondary shell interactions; one or more mutations are in a template recognition interface region; one or more mutations are in a region for recognizing an incoming template; one or more mutations are in an active site region; and/or one or more mutations are in a post-polymerization region, in specific embodiments. In some cases, a mutation is in a region or position in which the polymerase recognizes the 2′ hydroxyl of a template RNA. At least one mutation may be an amino acid substitution, in at least some cases.
In some aspects, the polymerase has one or more genetically engineered mutations compared to a wild-type Archaeal Family-B polymerase, the polymerase having an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to SEQ ID NO: 1 and in which one or more amino acid residues at a position selected from the group consisting of positions Y493, Y384, V389, I521, E664 and G711 in the amino acid sequence shown in SEQ ID NO:1 or at a position corresponding to any of these positions, are substituted with another amino acid residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position Y493 to a leucine residue or a cysteine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position Y493 to a leucine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position Y384 to a phenylalanine residue, a leucine residue, an alanine residue, a cysteine residue, a serine residue, a histidine residue, an isoleucine residue, a methionine residue, an asparagine residue, or a glutamine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position Y384 to a histidine residue or an isoleucine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position V389 to a methionine residue, a phenylalanine residue, a threonine residue, a tyrosine residue, a glutamine residue, an asparagine residue, or a histidine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position V389 to an isoleucine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position I521 to a leucine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position E664 to a lysine residue.
In some cases, the polymerase comprises an amino acid substitution corresponding to position G711 to a leucine residue, a cysteine residue, a threonine residue, an arginine residue, a histidine residue, a glutamine residue, a lysine residue, or a methionine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position G711 to a valine residue. In some cases, the polymerase comprises an amino acid substitution at a position R97 in the amino acid sequence shown in SEQ ID NO:1 with another amino acid residue. In some cases, the polymerase comprises one or more amino acid residues at a position selected from the group consisting of positions A490, F587, M137, K118, T514, R381, F38, K466, E734 and N735 in the amino acid sequence shown in SEQ ID NO:1 or at a position corresponding to any of these positions, which is substituted with another amino acid residue. In some cases, the polymerase has proofreading activity. In some cases, the polymerase lacks proofreading activity. In some cases, the polymerase has thermophilic activity. In some cases, the polymerase is capable of transcribing at least 10 nucleotides from an RNA template. In some cases, the polymerase is capable of transcribing a template that is 2′-OMethyl DNA. In some cases, the polymerase is capable of transcribing at least 5 or at least 10 nucleotides from a 2′-OMethyl DNA template.
In some aspects, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and an amino acid substitution corresponding to an amino acid at positions 493, 384, 389, 97, 521, 711, 735, or a combination thereof. In some cases, the polymerase further comprises an amino acid substitution corresponding to an amino acid at position 664. In some cases, the polymerase further comprises an amino acid substitution corresponding to position 493 to a leucine residue, a cysteine residue, or a phenylalanine residue. In some cases, the polymerase further comprises an amino acid substitution corresponding to position 493 to a leucine residue. In some cases, the polymerase further comprises an amino acid substitution corresponding to position 493 to an isoleucine residue, a valine residue, an alanine residue, a histidine residue, a threonine residue, or a serine residue. In some cases, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and an amino acid substitution corresponding to an amino acid at positions 493, 384, 389, 521, 711 or a combination thereof. In some cases, the polymerase comprises an amino acid substitution that corresponds to an amino acid at position 490, 587, 137, 118, 514, 381, 38, 466, 734, or a combination thereof. In some cases, the polymerase comprises an amino acid substitution corresponding to position 384 to a histidine residue or an isoleucine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position 384 to a phenylalanine residue, a leucine residue, an alanine residue, a cysteine residue, a serine residue, a histidine residue, an isoleucine residue, a methionine residue, an asparagine residue, or a glutamine residue.
In some cases, the polymerase comprises an amino acid substitution corresponding to position 389 to an isoleucine residue or a leucine residue. In some cases, the polymerase comprises an amino acid substitution corresponding to position 389 to a methionine residue, a phenylalanine residue, a threonine residue, a tyrosine residue, a glutamine residue, an asparagine residue, or a histidine residue. In some cases, the amino acid substitution corresponding to position 664 is to a lysine residue or a glutamine residue. In some cases, the amino acid substitution corresponding to position 97 is to any amino acid residue other than arginine. In some cases, the amino acid substitution corresponding to position 521 is to a leucine residue. In some cases, the amino acid substitution corresponding to position 521 is to a phenylalanine residue, a valine residue, a methionine residue, or a threonine residue. In some cases, the amino acid substitution corresponding to position 711 is to a valine residue, a serine residue, or an arginine residue. In some cases, the amino acid substitution corresponding to position 711 is to a leucine residue, a cysteine residue, a threonine residue, an arginine residue, a histidine residue, a glutamine residue, a lysine residue, or a methionine residue. In some cases, the amino acid substitution corresponding to position 735 is to a lysine residue. In some cases, the amino acid substitution corresponding to position 735 is to an arginine residue, a glutamine residue, a tyrosine residue, or a histidine residue. In some cases, the amino acid substitution corresponding to position 490 is to a threonine residue. In some cases, the amino acid substitution corresponding to position 490 is to a valine residue, a serine residue, or a cysteine residue. In some cases, the amino acid substitution corresponding to position 587 is to a leucine residue or an isoleucine residue.
In some cases, the amino acid substitution corresponding to position 587 is to an alanine residue, a threonine residue, or a valine residue. In some cases, the amino acid substitution corresponding to position 137 is to a leucine residue or an isoleucine residue. In some cases, the amino acid substitution corresponding to position 137 is to an alanine residue, a threonine residue, or a valine residue. In some cases, the amino acid substitution corresponding to position 118 is to an isoleucine residue. In some cases, the amino acid substitution corresponding to position 118 is to a methionine residue, a valine residue, or a leucine residue. In some cases, the amino acid substitution corresponding to position 514 is to an isoleucine residue. In some cases, the amino acid substitution corresponding to position 514 is to a valine residue, a leucine residue, or a methionine residue. In some cases, the amino acid substitution corresponding to position 381 is to a histidine residue. In some cases, the amino acid substitution corresponding to position 381 is to a serine residue, a glutamine residue, or a lysine residue. In some cases, the amino acid substitution corresponding to position 38 is to a leucine residue or an isoleucine residue. In some cases, the amino acid substitution corresponding to position 38 is to a valine residue, a methionine residue, or a serine residue. In some cases, the amino acid substitution corresponding to position 466 is to an arginine residue. In some cases, the amino acid substitution corresponding to position 466 is to a glutamate residue, an aspartate residue, or a glutamine residue. In some cases, the amino acid substitution corresponding to position 734 is to a lysine residue. In some cases, the amino acid substitution corresponding to position 734 is to an arginine residue, a glutamine residue, or an asparagine residue.
In certain aspects, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and wherein the polymerase has an amino acid substitution at one or more of the following positions corresponding to SEQ ID NO:1: R97; Y384; V389; Y493; F587; E664; G711; and W768. In some cases, the polymerase has one or more of the following amino acid substitutions corresponding to SEQ ID NO:1: R97M; Y384H; V389I; Y493L; F587L; E664K; G711V; and W768R.
In certain aspects, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and wherein the polymerase has an amino acid substitution at one or more of the following positions corresponding to SEQ ID NO:1: F38; R97; K118; R381; Y384; V389; Y493; T514; F587; E664; G711; and W768. In some cases, the polymerase has one or more of the following amino acid substitutions corresponding to SEQ ID NO:1: F38L; R97M; K118I; R381H; Y384H; V389I; Y493L; T514I; F587L; E664K; G711V; and W768R.
In certain aspects, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and wherein the polymerase has an amino acid substitution at one or more of the following positions corresponding to SEQ ID NO:1: F38; R97; K118; M137; R381; Y384; V389; K466; Y493; T514; F587; E664; G711; and W768. In some cases, the polymerase has one or more of the following amino acid substitutions corresponding to SEQ ID NO:1: F38L; R97M; K118I; M137L; R381H; Y384H; V389I; K466R; Y493L; T514I; F587L; E664K; G711V; and W768R.
In certain aspects, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and wherein the polymerase has an amino acid substitution at one or more of the following positions corresponding to SEQ ID NO:1: F38; R97; K118; M137; R381; Y384; V389; K466; Y493; T514; I521; F587; E664; G711; N735; and W768. In some cases, the polymerase has one or more of the following amino acid substitutions corresponding to SEQ ID NO:1: F38L; R97M; K118I; M137L; R381H; Y384H; V389I; K466R; Y493L; T514I; I521L; F587L; E664K; G711V; N735K; and W768R.
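By way of non-limiting illustration, the substitution notation used above (wild-type residue, 1-based position, replacement residue, e.g., Y493L) and the percent-identity thresholds may be sketched in code as follows. The sketch assumes equal-length, already aligned sequences. The ten-residue `toy_wild_type` string is a placeholder invented for the example and is not SEQ ID NO: 1, and both function names are hypothetical.

```python
import re


def apply_substitutions(seq, subs):
    """Apply substitutions such as 'Y493L' (1-based positions) to a protein sequence.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the stated wild-type residue is not found at that position.
    """
    residues = list(seq)
    for sub in subs:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if m is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[pos - 1] != wild:
            raise ValueError(f"expected {wild} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


def percent_identity(a, b):
    """Percent identity of two equal-length, already aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Placeholder sequence for illustration only; NOT SEQ ID NO: 1.
toy_wild_type = "MKYLVRGEAF"
variant = apply_substitutions(toy_wild_type, ["Y3H", "V5I"])
identity = percent_identity(toy_wild_type, variant)
```

Checking the stated wild-type residue against the reference sequence before substituting guards against off-by-one position errors and against notation in which a letter has been misread as a digit.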
In certain cases, polymerases further comprise an additional domain, such as one that does not itself take part in polymerization but has polymerization enhancing activity. In a specific embodiment, the additional domain comprises part or all of DNA-binding protein 7d (Sso7d), Proliferating cell nuclear antigen (PCNA), helicase, single stranded binding proteins, bovine serum albumin (BSA), one or more affinity tags, a label, and a combination thereof.
In certain aspects, the polymerase lacks 3′ to 5′ exonuclease activity. In some cases, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and wherein the polymerase has an amino acid substitution corresponding to N210. In some cases, the polymerase has an amino acid substitution corresponding to N210D. In some cases, the polymerase has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 and wherein the polymerase has an amino acid substitution corresponding to D141 and E143. In some cases, the polymerase has an amino acid substitution corresponding to D141A and E143A.
In certain aspects, the polymerase comprises an amino acid sequence 98% identical to the amino acid sequence of SEQ ID NO: 3. In certain aspects, the polymerase comprises an amino acid sequence 99% identical to the amino acid sequence of SEQ ID NO: 3. In one aspect, the polymerase comprises an amino acid sequence identical to the amino acid sequence of SEQ ID NO: 3.
As used herein, “essentially free,” in terms of a specified component, means that none of the specified component has been purposefully formulated into a composition and/or that the component is present only as a contaminant or in trace amounts. The total amount of the specified component resulting from any unintended contamination of a composition is therefore well below 0.05%, preferably below 0.01%. Most preferred is a composition in which no amount of the specified component can be detected with standard analytical methods.
As used herein in the specification, “a” or “an” may mean one or more. As used herein in the claim(s), when used in conjunction with the word “comprising,” the words “a” or “an” may mean one or more than one.
The term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.” As used herein “another” may mean at least a second or more.
Throughout this application, the term “about” is used to indicate that a value includes the inherent variation of error for the device, the method being employed to determine the value, or the variation that exists among the study subjects.
Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
The present disclosure generally relates to sequencing two or more genes expressed in a single cell in a high-throughput manner. More particularly, the present disclosure provides a method for high-throughput sequencing of pairs of transcripts co-expressed in single cells to determine pairs of polypeptide chains that comprise immune receptors (e.g., antibody VH and VL sequences).
The methods of the present disclosure allow for the repertoire of immune receptors and antibodies in an individual organism or population of cells to be determined. Particularly, the methods of the present disclosure may aid in determining pairs of polypeptide chains that make up immune receptors. B cells and T cells each express immune receptors; B cells express immunoglobulins, and T cells express T cell receptors (TCRs). Both types of immune receptors consist of two polypeptide chains. Immunoglobulins consist of variable heavy (VH) and variable light (VL) chains. TCRs are of two types: one consisting of an α and a β chain, and one consisting of a γ and a δ chain. Each of the polypeptides in an immune receptor has a constant region and a variable region. Variable regions result from recombination and end joint rearrangement of gene fragments on the chromosome of a B or T cell. In B cells additional diversification of variable regions occurs by somatic hypermutation. Thus, the immune system has a large repertoire of receptors, and any given receptor pair expressed by a lymphocyte is encoded by a pair of separate, unique transcripts. Only by knowing the sequence of both transcripts in the pair can the receptor as a whole be studied. Knowing the sequences of pairs of immune receptor chains expressed in a single cell is also essential to ascertaining the immune repertoire of a given individual or population of cells.
Currently available methods to analyze multiple transcripts in single cells, such as the two transcripts that comprise adaptive immune receptors, are limited by low throughput, very high instrumentation and reagent costs, and the need to capture the transcripts on a substrate. See U.S. Pat. No. 9,708,654, which is incorporated herein by reference in its entirety. No technology currently exists for rapidly analyzing how many cells express a set of transcripts of interest or, more specifically, for sequencing native lymphocyte receptor chain pairs at very high throughput (greater than 10,000 cells per run) without a capture step. The present disclosure aims to correct these deficiencies by providing a new technique for sequencing multiple transcripts simultaneously at the single-cell level with a throughput two to three orders of magnitude greater than the current state of the art.
One advantage of the methods of the present disclosure is a throughput several orders of magnitude higher than the current state of the art. In addition, the present disclosure allows two transcripts to be linked for large cell populations in a high-throughput manner, faster and at a much lower cost than competing technologies.
In certain embodiments, the present disclosure provides methods comprising separating single cells in a compartment with oligonucleotides; lysing the cells; allowing mRNA transcripts released from the cells to hybridize with the oligonucleotides; performing overlap extension reverse transcriptase polymerase chain reaction to covalently link DNA from at least two transcripts derived from a single cell; and sequencing the linked DNA. In certain embodiments, the cells may be mammalian cells. In certain embodiments, the cells may be B cells, T cells, NKT cells, or cancer cells.
In other embodiments, the present disclosure provides methods comprising separating single cells in a compartment with oligonucleotides; lysing the cells; allowing mRNA transcripts released from the cells to hybridize with the oligonucleotides; performing reverse transcriptase polymerase chain reaction to form at least two cDNAs from at least two transcripts derived from a single cell; and sequencing the cDNA.
In other embodiments, the present disclosure provides a system comprising an aqueous fluid phase exit disposed within an annular flowing oil phase, wherein the aqueous phase fluid comprises a suspension of cells and is dispersed within the flowing oil phase, resulting in emulsified droplets with low size dispersity comprising an aqueous suspension of cells.
In other embodiments, the present disclosure provides a composition comprising an oligonucleotide capable of binding mRNA, and two or more primers specific for a transcript of interest.
In certain embodiments, the present disclosure also provides for a device comprising ordered arrays of microwells, each with dimensions designed to accommodate a single lymphocyte cell. In one embodiment, the microwells may be circular wells 56 μm in diameter and 50 μm deep, for a total volume of approximately 125 pL. Such microwells would normally range in volume from 20-3,000 pL, though a wide variety of well sizes, shapes and dimensions may be used for single cell accommodation. In certain embodiments, the microwell may be a nanowell. In certain embodiments, the device may be a chip. The device of the present disclosure allows the direct entrapment of tens of thousands of single cells, with each cell in its own microwell, in a single chip. In certain embodiments, the chip may be the size of a microscope slide. In one embodiment, a microwell chip may be used to capture single cells in their own individual microwells. The microwell chip can be made from polydimethylsiloxane (PDMS); however, other suitable materials known in the art such as polyacrylamide, silicon and etched glass may also be used to create the microwell chip.
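By way of non-limiting illustration, the stated microwell volume may be checked with a short calculation, assuming an ideal cylindrical well. The function name is hypothetical. For a well 56 μm in diameter and 50 μm deep, the calculation gives roughly 123 pL, consistent with the stated volume of about 125 pL.

```python
import math


def cylinder_volume_pl(diameter_um, depth_um):
    """Volume of an ideal cylindrical well in picoliters.

    1 pL = 1e-12 L = 1000 cubic micrometers.
    """
    radius_um = diameter_um / 2.0
    volume_um3 = math.pi * radius_um ** 2 * depth_um
    return volume_um3 / 1000.0


volume = cylinder_volume_pl(56, 50)
print(f"{volume:.0f} pL")
```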
In certain embodiments, the oligonucleotides may be a poly(T), a sequence specific for heavy chain amplification, and/or a sequence specific for light chain amplification. A dialysis membrane covers the microwells, keeping the cells in the microwells while lysis reagents are dialyzed into the microwells. The lysis reagents cause the release of the cells' mRNA transcripts into the microwell. In embodiments where the oligonucleotide is poly(T), the poly(A) mRNA tails are captured by the poly(T) oligonucleotides. In another embodiment, the oligonucleotide may be a primer specific to a transcript of interest. The mRNA are then incubated in solution with reagents for overlap extension (OE) reverse transcriptase polymerase chain reaction (RT-PCR). This reaction mix includes primers designed to create a single PCR product comprising cDNA of two transcripts of interest covalently linked together. Before thermocycling, the reagent solution is emulsified in oil phase to create droplets. The linked cDNA products of OE RT-PCR are recovered and used as a template for nested PCR, which amplifies the linked transcripts of interest. The purified products of nested PCR are then sequenced and pairing information is analyzed. In other embodiments, restriction and ligation may be used to link cDNA of multiple transcripts of interest. In other embodiments, recombination may be used to link cDNA of multiple transcripts of interest.
The present disclosure also provides a method to trap mRNA from single cells, perform cDNA synthesis, link the sequences of two or more desired cDNAs from single cells to create a single molecule, and finally reveal the sequence of the linked transcripts by High Throughput (Next-gen) sequencing. According to the present disclosure, one way to increase throughput in biological assays is to use an emulsion that generates a high number of 3-dimensional parallelized microreactors. Emulsion protocols in molecular biology often yield 10^9 to 10^11 droplets per mL (sub-pL volume). Emulsion-based methods for single-cell polymerase chain reaction (PCR) have found a wide acceptance, and emulsion PCR is a robust and reliable procedure found in many next-generation sequencing protocols. However, very high throughput RT-PCR in emulsion droplets has not yet been implemented because cell lysates within the droplet inhibit the reverse transcriptase reaction. Cell lysate inhibition of RT-PCR can be mitigated by dilution to a suitable volume.
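By way of non-limiting illustration, the relationship between droplet count and droplet volume may be sketched as follows. The sketch assumes, as an upper bound, that the entire milliliter is dispersed aqueous phase. The function name is hypothetical. Under that assumption, droplet counts on the order of 10^9 to 10^11 per mL correspond to mean droplet volumes of at most about 1 pL down to about 0.01 pL.

```python
ML_IN_PL = 1e9  # 1 mL = 1e-3 L and 1 pL = 1e-12 L, so 1 mL = 1e9 pL


def max_mean_droplet_volume_pl(droplets_per_ml):
    """Upper bound on mean droplet volume in pL.

    Assumes the full milliliter is dispersed phase; in a real emulsion the
    oil phase occupies part of the volume, so droplets are smaller still.
    """
    return ML_IN_PL / droplets_per_ml


low_count_volume = max_mean_droplet_volume_pl(1e9)
high_count_volume = max_mean_droplet_volume_pl(1e11)
```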
An aqueous solution with a suspension of cells is emulsified into oil phase by injecting an aqueous cell/bead suspension into a fast-moving stream of oil phase. The shear forces generated by the moving oil phase create droplets as the aqueous suspension is injected into the stream, creating an emulsion with a low dispersity of droplet sizes. Each cell is in its own droplet. The uniformity of droplet size helps to ensure that individual droplets do not contain more than one cell. Cells are then thermally lysed, and the mixture is cooled. The mRNA is incubated in a solution for emulsion OE RT-PCR to link the cDNAs of transcripts of interest together. Nested PCR and sequencing of the linked transcripts is performed according to the present disclosure. In certain embodiments, the aqueous suspension of cells comprises reverse transcription reagents. In certain other embodiments, the aqueous suspension of cells comprises at least one of polymerase chain reaction and reverse transcriptase polymerase chain reaction reagents, including a single enzyme that is capable of catalyzing both the PCR and the RT reactions. In other embodiments, restriction and ligation may be used to link cDNA of multiple transcripts of interest. In other embodiments, recombination may be used to link cDNA of multiple transcripts of interest.
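As a rough illustration of why single-cell occupancy depends on the ratio of cells to droplets, the following sketch assumes that cells distribute among droplets according to a Poisson distribution. That model is a common simplification and is not stated in the present disclosure; the function name and the example droplet count are hypothetical.

```python
import math

def multi_cell_fraction(cells: float, droplets: float) -> float:
    """Fraction of cell-containing droplets that hold more than one cell,
    assuming Poisson loading (a modeling assumption, not a measured value)."""
    lam = cells / droplets      # mean number of cells per droplet
    p0 = math.exp(-lam)         # probability a droplet is empty
    p1 = lam * p0               # probability a droplet holds exactly one cell
    occupied = 1.0 - p0         # probability a droplet holds at least one cell
    return (occupied - p1) / occupied

# Hypothetical example: 25,000 cells spread over 1,000,000 droplets.
frac = multi_cell_fraction(25_000, 1_000_000)
print(f"{frac:.2%} of occupied droplets hold more than one cell")
```

Under this assumption, the fraction of occupied droplets with more than one cell is roughly half the mean occupancy when that mean is small, so diluting the cell suspension relative to the droplet count reduces multi-cell droplets.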
In another embodiment, emulsion droplets which contain individual cells and RT-PCR reagents are formed by injection into a fast-moving oil phase. Thermal cycling is then performed on these droplets directly. In certain embodiments, an overlap extension reverse transcription polymerase chain reaction may be used to link cDNA of multiple transcripts of interest.
Primer design for OE RT-PCR determines which transcripts of interest expressed by a given cell are linked together. For example, in certain embodiments, primers can be designed that cause the respective cDNAs from the VH and VL chain transcripts to be covalently linked together. Sequencing of the linked cDNAs reveals the VH and VL sequence pairs expressed by single cells. In other embodiments, primer sets can also be designed so that sequences of TCR pairs expressed in individual cells can be ascertained or so that it can be determined whether a population of cells co-expresses any two genes of interest.
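The linking step can be pictured with a toy string model. The sketch below is only an illustration under simplifying assumptions: both amplicons are written as top-strand sequences, the shared overlap stands in for the complementary tails contributed by the overlap-extension primers, and all sequences and names are hypothetical rather than taken from Table 1.

```python
def overlap_join(left: str, right: str, min_overlap: int = 8) -> str:
    """Join two top-strand sequences whose ends share an overlap.

    Finds the longest suffix of `left` that equals a prefix of `right`
    and merges the two sequences through it, loosely mimicking how
    overlap extension fuses two amplicons into one product.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("no overlap of the required length found")

# Hypothetical toy sequences; LINKER stands in for the overlap region.
LINKER = "GGCGGATCCGGT"
vh_amplicon = "ATGCAGGTGCAGCTG" + LINKER
vl_amplicon = LINKER + "GACATCCAGATGACC"
linked = overlap_join(vh_amplicon, vl_amplicon)
print(linked)
```

Because the overlap is defined by the primers, changing the primer set changes which pair of transcripts ends up in the same linked product.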
Bias can be a significant issue in PCR reactions that use multiple amplification primers because small differences in primer efficiency generate large product disparities due to the exponential nature of PCR. One way to alleviate primer bias is by amplifying multiple genes with the same primer, which is normally not possible with a multiplex primer set. By including a common amplification region to the 5′ end of multiple unique primers of interest, the common amplification region is thereby added to the 5′ end of all PCR products during the first duplication event. Following the initial duplication event, amplification is achieved by priming only at the common region to reduce primer bias and allow the final PCR product distribution to remain representative of the original template distribution.
Such a common region can be exploited in various ways. One clear application is to add the common amplification primer at higher concentration and the unique primers (with 5′ common region) at a low concentration, such that the majority of nucleic acid amplification occurs via the common sequence for reduced amplification bias.
Accordingly, in certain embodiments, the present disclosure provides methods comprising adding a common sequence to the 5′ region of two or more oligonucleotides that are specific to a set of gene targets; and performing nucleic acid amplification of the set of gene targets by priming the common sequence.
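The effect of small efficiency differences can be made concrete with a short calculation. The sketch below assumes an idealized model in which each product grows by a factor of (1 + efficiency) per cycle; the efficiencies are hypothetical values chosen for illustration only.

```python
def fold_disparity(eff_a: float, eff_b: float, cycles: int) -> float:
    """Ratio of product A to product B after `cycles` of PCR, starting
    from equal template amounts. Each efficiency is the fraction of
    molecules copied per cycle (1.0 would be perfect doubling)."""
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles

# Two unique primers with hypothetical efficiencies of 95% and 90%.
for n in (10, 25, 35):
    print(n, round(fold_disparity(0.95, 0.90, n), 2))

# With a shared (common) primer, both products amplify with the same
# efficiency and the ratio stays at 1.
print(fold_disparity(0.90, 0.90, 25))
```

In this model a five-point difference in efficiency produces a roughly 1.9-fold disparity after 25 cycles, whereas priming from a common region keeps the ratio at 1.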
The methods of the present disclosure allow for information regarding multiple transcripts expressed from a single cell to be obtained. In certain embodiments, probabilistic analyses may be used to identify native pairs with read counts or frequencies above non-native pair read counts or frequencies. The information may be used, for example, in studying gene co-expression patterns in different populations of cancer cells. In certain embodiments, therapies may be tailored based on the expression information obtained using the methods of the present disclosure. Other embodiments may focus on discovery of new lymphocyte receptors.
In some embodiments, enzymes having the ability to generate DNA from a template that comprises RNA bases, either in part or in its entirety, are used. In certain embodiments, the enzymes are as described in PCT/US2017/014082, which is incorporated herein by reference in its entirety. In specific embodiments, the enzymes are recombinant enzymes. In some embodiments, the enzymes have the ability to use RNA as a template when their parent enzyme from which they were derived (by mutation) lacked such ability. In specific cases, the enzymes that acquire reverse transcriptase activity are able to recognize alternative bases or sugars in a template strand (compared to an enzyme that can only recognize DNA as a template), such as by allowing recognition of a template having uracil instead of thymine and having variability at the 2′ position in the ribose ring.
The enzymes of the present disclosure make it easier to melt RNA structure and generate cDNA copies, in specific embodiments. Although there are other commercially available reverse transcriptases with modest thermostability, the enzymes of the present disclosure have much higher thermostability (e.g., thermostability at temperatures above 50° C., 51° C., 52° C., 53° C., 54° C., 55° C., 56° C., 57° C., 58° C., 59° C., 60° C., 61° C., 62° C., 63° C., 64° C., 65° C., 66° C., 67° C., 68° C., 69° C., 70° C., or more) and have proofreading activity. In specific embodiments, the enzymes of the present disclosure are more processive and/or more primer-dependent, resulting in less promiscuity in generating an accurate cDNA imprint of an mRNA population, for example. Because of their proofreading domain, the enzymes of the present disclosure generate fewer mutations than other enzymes and provide a more accurate representation of the RNAs present in a given population (including, for example, a sample from one or more individuals, environments, and so forth).
At least some enzymes of the disclosure encompass proofreading activity, which may be defined herein as the ability of the enzyme to recognize an incorrect base pair, reverse its direction and excise the mismatched base, followed by insertion of the correct base. Enzymes of the disclosure may be referred to as comprising 3′-5′ exonuclease activity. Although testing a particular enzyme for proofreading activity may be achieved in a variety of ways, in specific embodiments the enzyme is tested by dideoxy-mismatch PCR that necessitates removal of a 3′ deoxy mismatch primer prior to polymerization or primer extension reactions with 3′ terminal deoxy mismatches.
Although certain enzymes of the disclosure may be characterized as reverse transcriptases, in particular aspects the enzymes can utilize DNA, RNA, modified DNA, and/or modified RNA as a template. Modified DNA and RNA may be referred to as information nucleotide-comprising polymers that can be replicated enzymatically that contain altered chemical modifications to the backbone, sugar or base. In specific cases, the modified DNA or RNA is modified at the 2′ position of a sugar of a component of the template. Particular embodiments encompass recombinant Archaeal Family-B polymerases that transcribe a template that is DNA, RNA, modified DNA, or modified RNA.
The enzymes of the disclosure may be generated using a starting polymerase that lacks reverse transcriptase activity, and in specific embodiments, that starting polymerase is an Archaeal Family-B polymerase, such as KOD polymerase. Any number of mutations may be generated from the starting polymerase and tested for using methods of the disclosure. In specific embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more mutations are incorporated into a polymerase that lacks reverse transcriptase activity such that the entirety of mutations (or a sub-combination thereof) are responsible for imparting reverse transcriptase activity to the polymerase that originally lacked it. The mutations may be of any kind, including amino acid substitution(s), deletion(s), insertion(s), inversion(s), and so forth. In specific embodiments, the mutation is a single amino acid change, and the change may or may not be conservative. Although in some cases the amino acid substitution mutation must be to a certain amino acid, in other cases the mutation may be to any amino acid. Embodiments within the scope herein are not limited by the means of generating/designing the various enzymes. While some enzymes are designed via mutations to a starting polymerase, embodiments herein are not limited to any particular mechanism of action and an understanding of the mechanism of action is not necessary to practice such embodiments.
In certain embodiments, an enzyme of the disclosure has a specific amino acid sequence identity compared to a given enzyme, for example a wild-type Archaeal Family-B polymerase, such as KOD polymerase (including, for example, SEQ ID NO:1). In specific embodiments, the enzyme has an amino acid sequence that is at least 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% identical to the amino acid sequence of SEQ ID NO:1. An enzyme of the disclosure may be of a certain length, including at least or no more than 600, 625, 650, 675, 700, 725, 750, 755, 760, 765, 770, 775, 780, 781, 782, 783, or 784 amino acids in length, for example. The enzyme may or may not be labeled. The enzyme may be further modified, such as comprising new functional groups such as phosphate, acetate, amide groups, or methyl groups, for example. The enzymes may be phosphorylated, glycosylated, lipidated, carbonylated, myristoylated, palmitoylated, isoprenylated, farnesylated, alkylated, hydroxylated, carboxylated, ubiquitinated, deamidated, contain unnatural amino acids by altered genetic codes, contain unnatural amino acids incorporated by engineered synthetase/tRNA pairs, and so forth. The skilled artisan recognizes that post-translational modification of the enzymes may be detected by one or more of a variety of techniques, including at least mass spectrometry, Eastern blotting, Western blotting, or a combination thereof, for example.
Specific examples of enzymes of the disclosure include at least the following:
B11 reverse transcriptase (an example of a derivative of KOD polymerase that is a hyperthermophilic reverse transcriptase):
CORE3 reverse transcriptase (an example of a derivative of KOD polymerase that is a hyperthermophilic proofreading reverse transcriptase):
In particular aspects, the enzymes of the disclosure have one or more mutations in at least one of the following regions of a particular polymerase (here, as it corresponds to SEQ ID NO:1): residues (1-130 and 338-372 is N-terminal domain); (131-338 is exonuclease domain); (448-499 is finger domain); (591-774 is thumb domain); (374-447 and 500-590 is palm domain).
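The domain boundaries above can be encoded as a small lookup to see where a given position falls. This is a minimal sketch; the ranges are copied as written (numbering per SEQ ID NO:1), and the function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

# Residue ranges exactly as listed above (inclusive, numbering per SEQ ID NO:1).
DOMAINS: Dict[str, List[Tuple[int, int]]] = {
    "N-terminal": [(1, 130), (338, 372)],
    "exonuclease": [(131, 338)],
    "palm": [(374, 447), (500, 590)],
    "finger": [(448, 499)],
    "thumb": [(591, 774)],
}

def domains_for(position: int) -> List[str]:
    """Return every listed domain whose range contains `position`."""
    return [name for name, spans in DOMAINS.items()
            if any(lo <= position <= hi for lo, hi in spans)]

for pos in (97, 137, 384, 490, 711):
    print(pos, domains_for(pos))
```

Returning a list rather than a single name is deliberate: with the ranges as written, position 338 falls in two domains and position 373 in none, and the sketch reports that rather than resolving it.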
In certain embodiments, the enzymes of the disclosure have mutations at particular amino acids (the position of which corresponds to SEQ ID NO:1, in certain examples) and, in some cases particular residues are the substituted amino acid at that position. Table A provides an example of a list of certain mutations that may be present in the disclosure, and in specific embodiments a combination of mutations is utilized in the enzyme.
In at least some cases, the enzymes have a mutation at R97 as it corresponds to SEQ ID NO:1. In some cases, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, or sixteen or more mutations from this table are present in an enzyme of the disclosure. In specific embodiments, the following combinations are included alone or with one or more other mutations listed above or not listed above:
Y384 and V389; Y384 and E664; Y384 and Y493; Y384 and R97; Y384 and I521; Y384 and G711; Y384 and N735; Y384 and A490; V389 and E664; V389 and Y493; V389 and R97; V389 and I521; V389 and G711; V389 and N735; V389 and A490; E664 and Y493; E664 and R97; E664 and I521; E664 and G711; E664 and N735; E664 and A490; Y493 and R97; Y493 and I521; Y493 and G711; Y493 and N735; Y493 and A490; R97 and I521; R97 and G711; R97 and N735; R97 and A490; I521 and G711; I521 and N735; I521 and A490; G711 and N735; or G711 and A490. In at least some cases, one or more other mutations are combined with these specific combinations.
In specific embodiments, the polymerase has an amino acid substitution at one or more of the following positions corresponding to SEQ ID NO:1:
Any of the combinations in a), b), c), or d) may include A490, F587, M137, K118, T514, R381, F38, K466, and/or E734. In particular embodiments, the polymerase has one or more of the following specific amino acid substitutions corresponding to SEQ ID NO:1:
Any of the combinations in a), b), c), or d) may include A490, F587, M137, K118, T514, R381, F38, K466, and/or E734.
All or some of the essential materials and reagents required for carrying out methods of the disclosure may be provided in a kit. The kit may comprise one or more of RNA base-comprising primers, DNA base-comprising primers, vectors, polymerase-encoding nucleic acids, buffers, ribonucleotides, deoxyribonucleotides, salts, and so forth corresponding to at least some embodiments of the provided methods. Embodiments of kits may comprise reagents for the detection and/or use of a control nucleic acid or enzyme, for example. Kits may provide instructions, controls, reagents, containers, and/or other materials for performing various assays or other methods (e.g., those described herein) using the enzymes of the disclosure.
The kits generally may comprise, in suitable means, distinct containers for each individual reagent, primer, and/or enzyme. In specific embodiments, the kit further comprises instructions for producing, testing, and/or using enzymes of the disclosure.
The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.
The flow-joint apparatus comprises a barbed Y connector (PVDF, 1/16″, #3063342, Cole-Parmer) that facilitates the merger of two input streams from separate 5 mL syringes into a 27-gauge needle (#Z192384-100EA, Sigma Aldrich). The syringes are connected to 1/16 inch Tygon tubing (#80-10002-03, Cytek Biosciences) via female Luer lock to barb connectors (#11532, Qosina).
To physically link the antibody heavy and light chain transcripts from a single cell, cell lysate isolated from single cells is co-emulsified with a RT-PCR solution composed of 0.5×RTX buffer, 1.6 U/μL SUPERase.In RNase Inhibitor (Invitrogen), 0.4 mM dNTP, 2 M Betaine (Sigma-Aldrich), RTX 8 μg/mL, 0.1 wt % BSA (Invitrogen Ultrapure BSA, 50 mg/mL) and primer sets designed for overlap extension RT-PCR (Table 1). The oil phase consists of mineral oil (Sigma Aldrich Corp.) supplemented with 0.05% Triton X-100 (Sigma Aldrich Corp.) and 2% ABIL EM 90 (Degussa). The emulsions are distributed into a 96-well PCR plate and subjected to overlap-extension RT-PCR under the following conditions: 30 min at 68° C., 2 min at 94° C., followed by 25 cycles of 94° C. for 30 s, 60° C. for 30 s, and 68° C. for 2 min. Final reaction products are extended at 68° C. for 7 min.
Whether RTX and commercially available RT-PCR kits retain their polymerase activity in an emulsion containing cell lysate was investigated. Blood was drawn from a healthy female volunteer after informed consent had been obtained. PBMCs were isolated from the blood, resuspended in RPMI-1640 containing 10% DMSO and 10% FBS, and then frozen for cryopreservation. Total B cells were isolated from thawed PBMCs using the reagents of a Memory B Cell Isolation Kit (Miltenyi Biotec). Total B cells were washed twice with cold 80 mM Tris-HCl (pH 7.5) and concentrated to 6.6×10⁸ cells/mL. One million total B cells were lysed with 100 μL of the following RT-PCR reagents containing surfactant. RT-PCR reagent using RTX: 1×RTX buffer (60 mM Tris-HCl (pH 8.4), 25 mM (NH4)2SO4, 10 mM KCl, 1 mM MgSO4), 0.8 U/μL SUPERase⋅In RNase Inhibitor (Invitrogen), 0.2 mM dNTPs, 1 M Betaine (Sigma-Aldrich), 0.4 μg RTX, 0.05 wt % BSA (Invitrogen Ultrapure BSA, 50 mg/mL), 0.5% Tween 20 (Sigma-Aldrich), and primer sets designed for overlap extension RT-PCR (Table 1). Three different commercially available RT-PCR reagents were used for this experiment (QIAGEN® OneStep RT-PCR Kit (QIAGEN), qScript One-Step Fast qRT-PCR Kit, ROX (Quanta Biosciences), and SuperScript™ III One-Step RT-PCR System with Platinum™ Taq DNA Polymerase (Thermo Fisher Scientific)). The RT-PCR reagents were prepared according to the manufacturer's protocol and supplemented with BSA, primers, and Tween 20 as described above. These RT-PCR reagents containing cell lysate were each injected independently into 5.5 mL oil (molecular biology grade mineral oil (Sigma Aldrich Corp.) supplemented with 0.05% Triton X-100 (Sigma Aldrich Corp.) and 2% ABIL EM 90 (Degussa)) and stirred with an IKA dispersing tube (DT-20, VWR) on the IKA ULTRA TURRAX Tube drive at 615 RPM for 5 min.
The resulting emulsions were distributed into 96-well plates and RT-PCR was performed as follows: RT-PCR using RTX: 30 min at 68° C., 2 min at 94° C., followed by 25 cycles of 94° C. for 30 s, 60° C. for 30 s, 68° C. for 2 min. The final product was extended at 68° C. for 7 min. QIAGEN RT-PCR kit: 30 min at 55° C., 3 min at 94° C., followed by 35 cycles of 94° C. for 30 s, 60° C. for 30 s, 72° C. for 2 min. The final product was extended at 72° C. for 7 min. Quanta Biosciences RT-PCR kit: 30 min at 55° C., 2 min at 94° C., followed by 25 cycles of 94° C. for 30 s, 60° C. for 30 s, 72° C. for 2 min. The final product was extended at 72° C. for 7 min. Thermo Fisher Scientific RT-PCR kit: 30 min at 60° C., 2 min at 94° C., followed by 35 cycles of 94° C. for 30 s, 60° C. for 30 s, 68° C. for 2 min. The final product was extended at 68° C. for 7 min. As positive controls, 30 ng total B cell RNAs were mixed with RT-PCR reagents and regular RT-PCR without emulsion was performed.
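The four thermocycling programs above differ mainly in reverse transcription temperature and cycle number. As a bookkeeping aid, the sketch below encodes their step durations and totals the programmed hold time; temperatures and ramping between them are ignored, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Program:
    rt_min: float                    # reverse transcription hold (min)
    denature_min: float              # initial denaturation (min)
    cycles: int                      # number of PCR cycles
    cycle_steps_s: Tuple[int, ...]   # seconds per step within one cycle
    final_ext_min: float             # final extension (min)

    def hold_minutes(self) -> float:
        """Programmed hold time only; ramp time is not included."""
        cycling = self.cycles * sum(self.cycle_steps_s) / 60
        return self.rt_min + self.denature_min + cycling + self.final_ext_min

# Durations taken from the conditions listed above.
PROGRAMS = {
    "RTX": Program(30, 2, 25, (30, 30, 120), 7),
    "QIAGEN": Program(30, 3, 35, (30, 30, 120), 7),
    "Quanta Biosciences": Program(30, 2, 25, (30, 30, 120), 7),
    "Thermo Fisher Scientific": Program(30, 2, 35, (30, 30, 120), 7),
}

for name, program in PROGRAMS.items():
    print(f"{name}: {program.hold_minutes():.0f} min")
```

The totals make the difference in cycle number easy to see: the 35-cycle programs run about half an hour longer than the 25-cycle programs.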
Following RT-PCR, the emulsions were collected in Eppendorf tubes and centrifuged at 17,000 g for 10 min. The mineral oil phase was decanted, and the DNA amplicons were recovered via three serial extractions using (in order) diethyl ether, water-saturated ethyl acetate, and diethyl ether. Residual ether was removed using a SpeedVac (30 minutes at RT) and the DNA was concentrated using a PCR purification kit (Zymo Research Corp.) as per the manufacturer's instructions and eluted with 40 μL water. Nested PCR was performed in a total volume of 50 μL using 2 μL of the cDNA, nested primers (Table 2), and DreamTaq™ Hot Start DNA Polymerase (Thermo Fisher Scientific) according to the manufacturer's protocol and the following conditions: 95° C. for 3 min, followed by 40 cycles of 95° C. for 30 s, 62° C. for 30 s, 72° C. for 1 min. Finally, DNA was extended at 72° C. for 7 min. DNA was run on a 1% agarose gel and detected.
Blood was drawn from a healthy 36-year-old female volunteer after informed consent had been obtained. PBMCs were isolated from the blood, resuspended in RPMI-1640 containing 10% DMSO and 10% FBS, and then frozen for cryopreservation. Memory B cells were isolated from thawed PBMCs using the Memory B Cell Isolation Kit (Miltenyi Biotec). Approximately 564,000 memory B cells were obtained and cultured in RPMI-1640 medium containing 10% FBS, 2 mM L-glutamine, 1× non-essential amino acids, 1× sodium pyruvate, and 1× penicillin/streptomycin (Life Technologies) and expanded for four days in the presence of 10 μg/mL anti-CD40 antibody (5C3, BioLegend), 1 μg/mL CpG ODN 2006 (Invivogen, San Diego, Calif., USA), 100 units/mL IL-4, 100 units/mL IL-10, and 50 ng/mL IL-21 (PeproTech, Rocky Hill, N.J., USA). Expanded B cells were washed with 15 mL 2×RTX buffer (1×RTX buffer: 60 mM Tris-HCl (pH 8.4), 25 mM (NH4)2SO4, 10 mM KCl, 1 mM MgSO4), and cell number was determined.
Two technical replicates were performed, each utilizing approximately 25,000 expanded memory B cells spiked with 300 ARH-77 cells. The cells were reconstituted in 1.4 mL 2×RTX buffer and loaded into a 5 mL syringe. Another syringe contained 1.4 mL RT-PCR solution, composed of 0.5×RTX buffer, 1.6 U/μL SUPERase⋅In RNase Inhibitor (Invitrogen), 0.4 mM dNTPs, 2 M Betaine (Sigma-Aldrich), RTX 8 μg/mL, 0.1 wt % BSA (Invitrogen Ultrapure BSA, 50 mg/mL), 0.5% (v/v) Tween 20 (Sigma-Aldrich), and primer sets designed for overlap extension RT-PCR (Table 1). Both syringes were simultaneously compressed by a syringe pump (KD Scientific Legato 200, Holliston, Mass., USA) at a speed of 1.3 mL/min, and the resulting stream was directly injected into 9 mL of chilled oil (molecular biology grade mineral oil (Sigma Aldrich Corp.) supplemented with 0.05% Triton X-100 (Sigma Aldrich Corp.) and 2% ABIL EM 90 (Degussa)) stirred by an IKA dispersing tube (DT-20, VWR) on the IKA ULTRA TURRAX Tube drive at 615 RPM.
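Because the two syringes hold equal volumes and are driven by the same pump, the concentrations inside the merged aqueous stream can be estimated by simple dilution arithmetic. The sketch below assumes equal flow from both syringes, additive volumes, and a negligible volume contribution from the cells themselves; none of these assumptions is stated in the text, and the names are hypothetical.

```python
def merged_concentration(c_a: float, v_a: float, c_b: float, v_b: float) -> float:
    """Concentration after combining volume v_a at c_a with volume v_b at c_b,
    assuming the volumes are additive."""
    return (c_a * v_a + c_b * v_b) / (v_a + v_b)

V = 1.4  # mL in each syringe, per the text

# (value in cell stream, value in RT-PCR stream), in the units given in the text
components = {
    "RTX buffer (x)": (2.0, 0.5),
    "SUPERase.In (U/uL)": (0.0, 1.6),
    "dNTPs (mM)": (0.0, 0.4),
    "betaine (M)": (0.0, 2.0),
    "RTX (ug/mL)": (0.0, 8.0),
    "BSA (wt %)": (0.0, 0.1),
    "Tween 20 (% v/v)": (0.0, 0.5),
}

final = {name: merged_concentration(a, V, b, V)
         for name, (a, b) in components.items()}
for name, value in final.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions every component supplied by only one syringe is halved in the droplets, which is why the RT-PCR solution is prepared at twice the intended working concentration.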
Following RT-PCR, the emulsions were collected in Eppendorf tubes and centrifuged at 17,000 g for 10 min. The mineral oil phase was decanted, and the DNA amplicons were recovered via three serial extractions using (in order) diethyl ether, water-saturated ethyl acetate, and diethyl ether. Residual ether was removed using a SpeedVac (30 minutes at RT) and the DNA was concentrated using a PCR purification kit (Zymo Research Corp.) as per the manufacturer's instructions. Nested PCR was performed in a total volume of 250 μL using 100 ng cDNA, nested primers (Table 2), and Platinum™ Taq DNA Polymerase (Thermo Fisher Scientific) according to the manufacturer's protocol and the following conditions: 94° C. for 3 min, followed by 25 cycles of 94° C. for 30 s, 62° C. for 30 s, 72° C. for 30 s. Finally, DNA was extended at 72° C. for 7 min. The 850 bp PCR product was isolated from a 1% agarose gel using a gel purification kit (Zymo Research Corp.) according to the manufacturer's protocol.
A two-step procedure was performed to append Illumina adaptor sequences to the amplicon. First, 50 ng of DNA was amplified using NEBNext® High-Fidelity 2×PCR Master Mix (New England BioLabs Inc) in combination with the primers in Table 3 under the following conditions: 98° C. for 30 s, followed by 8 cycles of 98° C. for 10 s, 62° C. for 30 s, 72° C. for 30 s, and finally a 7 min extension at 72° C. The PCR product was concentrated using a PCR purification kit and quantified by Nanodrop. In the second reaction, 50 ng of DNA was amplified by NEBNext® High-Fidelity 2×PCR Master Mix in combination with the primers in Table 4 under the following conditions: 98° C. for 30 s, followed by 8 cycles of 98° C. for 10 s, 62° C. for 30 s, 72° C. for 30 s, and finally a 7 min extension at 72° C. The 1100 bp PCR product was isolated from a 1% agarose gel using a gel isolation kit and submitted for Illumina MiSeq 2×300 sequencing.
Raw 2×300 Illumina reads were trimmed and filtered to remove low quality sequences using Trimmomatic and submitted to MiXCR for CDR3 identification and gene annotation. Sequences with >2 reads were grouped into lineages based on 90% CDRH3 nucleotide identity using Usearch (version 7.0). Rarefaction analysis was performed by subsampling the raw Illumina reads to measure the sample diversity independent from the number of sequencing reads.
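The read-count filter and identity-based grouping can be illustrated with a toy greedy clustering. This sketch is not the Usearch algorithm: it compares only sequences of equal length position by position, seeds clusters from the most abundant sequences, and uses hypothetical sequences and names.

```python
from typing import Dict, List, Tuple

def identity(a: str, b: str) -> float:
    """Fraction of matching positions. Sequences of unequal length score 0,
    a simplification; a real tool would align them first."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_cdrh3(read_counts: Dict[str, int], min_reads: int = 3,
                  threshold: float = 0.90) -> List[Tuple[str, List[str]]]:
    """Greedy clustering: keep sequences with more than 2 reads, then let
    the most abundant sequences seed clusters first."""
    kept = [s for s, n in read_counts.items() if n >= min_reads]
    kept.sort(key=lambda s: (-read_counts[s], s))
    clusters: List[Tuple[str, List[str]]] = []
    for seq in kept:
        for seed, members in clusters:
            if identity(seq, seed) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters

# Hypothetical CDR-H3 nucleotide sequences with read counts.
reads = {
    "GCGAGAGATTACGGTGACTAC": 40,  # becomes a seed
    "GCGAGAGATTACGGTGACTAT": 5,   # 1 mismatch in 21 nt, joins the seed
    "GCGAAAGGGTTCGATCCCTGG": 12,  # unrelated, forms its own cluster
    "GCGAGAGATTACGGTGACTAA": 2,   # dropped: only 2 reads
}
clusters = cluster_cdrh3(reads)
for seed, members in clusters:
    print(seed, len(members))
```

Filtering before clustering means that sequences seen only once or twice, which are more likely to be sequencing or PCR errors, never seed a lineage.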
HEK293 cells were gently dissociated from the culturing plate by pipetting and centrifuged at 300×g. The culture medium was removed, and the cells were resuspended in 1 mL cold 80 mM Tris-HCl (pH 7.5) and then centrifuged at 900×g for 5 min. The supernatant was removed and this washing step was repeated. The cells were resuspended in cold 80 mM Tris-HCl (pH 7.5) at a concentration of 100,000 cells/μL, and then 0.2 μL of the cell suspension was mixed with 50 μL of each RT-PCR reagent (RTX, Titan One Tube RT-PCR System (#11855476001, Sigma), QIAGEN® OneStep RT-PCR Kit (#210210, QIAGEN), SuperScript® III One-Step RT-PCR System (#12574-026, ThermoFisher Scientific), qScript One-Step Fast qRT-PCR Kit, ROX (#95080-500, Quanta Biosciences)) containing 0.5% Tween 20. The RT-PCR reagent recipes are described in Table 5. 300 ng total RNA from HEK293 cells was used as a positive control. The PGK1 primer sequences are described in Table 6. RT-PCR to detect PGK1 mRNA was performed as follows: RT-PCR using RTX: 30 min at 68° C., 2 min at 94° C., followed by 25 cycles of 94° C. for 30 s, 60° C. for 30 s, 68° C. for 1 min. The final product was extended at 68° C. for 7 min. Titan One Tube RT-PCR System: 30 min at 50° C., 2 min at 94° C., followed by 35 cycles of 94° C. for 30 s, 60° C. for 30 s, 68° C. for 1 min. The final product was extended at 72° C. for 7 min. QIAGEN RT-PCR kit: 30 min at 50° C., 5 min at 95° C., followed by 35 cycles of 94° C. for 30 s, 60° C. for 30 s, 72° C. for 1 min. The final product was extended at 72° C. for 7 min. Quanta Biosciences RT-PCR kit: 30 min at 55° C., 2 min at 94° C., followed by 35 cycles of 94° C. for 30 s, 60° C. for 30 s, 72° C. for 1 min. The final product was extended at 72° C. for 7 min. Thermo Fisher Scientific RT-PCR kit: 30 min at 60° C., 2 min at 94° C., followed by 35 cycles of 94° C. for 30 s, 60° C. for 30 s, 68° C. for 1 min. The final product was extended at 68° C. for 7 min.
The resulting DNAs were run on a 1% agarose gel and detected.
VH-VL pairing accuracy and throughput were examined using expanded human B cells. Frozen PBMCs from a healthy 36-year-old female volunteer (Table 7, Donor A, same donor as in Example 4) were thawed and CD27+ memory B cells were isolated by a Memory B Cell Isolation Kit (Miltenyi Biotec) and expanded for four days as described in Example 4. The expanded memory B cells were divided into two replicates. Each replicate contained 30,000 expanded B cells, and 500 ARH-77 B cells were added as a spike-in control (60:1 ratio). Single-cell emulsion RT-PCR was performed as described in Example 4 and with the volumes described in Table 7. The resulting VH-VL amplicons were purified as described in Example 4. Nested PCR was performed in a total volume of 250 μL using 30% volume of the cDNA, nested primers (Table 2), and DreamTaq™ Hot Start DNA Polymerase (Thermo Fisher Scientific) according to the manufacturer's protocol and the following conditions: 95° C. for 3 min, followed by 28 cycles of 95° C. for 30 s, 62° C. for 30 s, 72° C. for 1 min. Finally, DNA was extended at 72° C. for 7 min. DNA was run on a 1% agarose gel and detected. The 850 bp PCR product was isolated from a 1% agarose gel using a gel purification kit (Zymo Research Corp.) according to the manufacturer's protocol. The Illumina adaptor sequences were added as described in Example 4 and with the MiSeqFw primer in Table 4 and MiSeqRev3 (IgGA, sample A), MiSeqRev4 (IgM, sample A), MiSeqRev5 (IgGA, sample A′), or MiSeqRev6 (IgM, sample A′) in Table 8.
DNA was sequenced using Illumina MiSeq 2×300. 5,761 VH-VL clusters in sample A and 5,260 VH-VL clusters in sample A′ (Table 7) were detected. Across both replicates, 3,166 identical CDR-H3 amino acid sequences were observed, which must have originated from identical B cell progenitors. Of these identical CDR-H3 sequences, 2,786 paired with identical CDR-L3 in both replicates. This results in 93.8% pairing precision (Table 7, see the formula below for the pairing precision calculation). In the MiXCR-annotated sequences before clustering, ARH-77 VH and VL were correctly paired and detected as 15 reads and 11 reads in sample A and sample A′, respectively. ARH-77 VH paired with incorrect VL was detected only as single reads and thus was filtered out through the bioinformatic pipeline (DeKosky et al., In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nature Medicine. (2015)). During the CD27+ memory B cell isolation step with the kit, CD27− B cells, which mostly represent naïve B cells, were also isolated. CD27− B cells were expanded using the same protocol. 1.83×10⁵ expanded B cells were mixed with 500 ARH-77 cells (366:1 ratio) and single-cell emulsion RT-PCR was performed. A technical replicate experiment was performed without SUPERase⋅In™ RNase inhibitor. The resulting VH-VL amplicons were analyzed as described in Example 4. For sequencing, the MiSeqFw primer in Table 4 and MiSeqRev7 (IgGA, sample B), MiSeqRev8 (IgM, sample B), MiSeqRev9 (IgGA, sample B′), or MiSeqRev10 (IgM, sample B′) in Table 8 were used for adding Illumina adaptor sequences. 21,801 VH-VL clusters in sample B and 17,223 VH-VL clusters in sample B′ (Table 7) were detected. Across both replicates, 4,976 identical CDR-H3 amino acid sequences were observed, which must have originated from identical B cell progenitors. Of these identical CDR-H3 sequences, 4,642 paired with identical CDR-L3 in both replicates.
This results in 96.5% pairing precision.
In the MiXCR-annotated sequences before clustering, the correct ARH-77 VH-VL pair was detected as 118 reads in sample B and 435 reads in sample B′. In sample B, the most abundant ARH-77 VH paired with an incorrect VL was detected only as single reads and thus was filtered out through our bioinformatic pipeline. In sample B′, the most abundant ARH-77 VH paired with an incorrect VL was detected as two reads. Thus, the signal-to-noise ratio in this experiment was 217.5:1 (DeKosky et al., In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nature Medicine. (2015)).
The pairing precision was calculated with the following formula as described before (DeKosky et al., 2015; McDaniel et al., 2016).
TP1 and 2 is the number of VH sequences paired with identical VL sequences in both replicates. FP1 or 2 is the number of VH sequences paired with different VL sequences across the replicates. P is the VH-VL pairing precision. To estimate the TCR pairing precision, VH was replaced with TCRβ and VL was replaced with TCRα.
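The formula itself is not reproduced in this text. The sketch below therefore assumes one plausible form built from the definitions above: if each replicate pairs correctly with probability P, agreement between two replicates occurs with probability P squared, so P is the square root of the agreeing fraction. This form is an assumption, chosen because it reproduces the 93.8% figure reported for the CD27+ replicates; the function name is hypothetical.

```python
import math

def pairing_precision(tp_both: int, fp_either: int) -> float:
    """Per-replicate precision P under the assumed relation
    tp_both / (tp_both + fp_either) = P * P."""
    return math.sqrt(tp_both / (tp_both + fp_either))

# CD27+ replicates (samples A and A'): 3,166 shared CDR-H3, 2,786 agreeing.
p_a = pairing_precision(2786, 3166 - 2786)
print(f"samples A/A': {p_a:.1%}")

# CD27- replicates (samples B and B'): 4,976 shared CDR-H3, 4,642 agreeing.
p_b = pairing_precision(4642, 4976 - 4642)
print(f"samples B/B': {p_b:.2%}")
```

With the counts reported above, this assumed form gives 93.8% for samples A and A′ and approximately 96.6% for samples B and B′, close to the stated 96.5%; the small difference may reflect rounding or a detail of the original formula that is not captured here.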
Next, it was tested whether the methods could be used to analyze paired TCRαβ at the single-cell level by the single-cell emulsion RT-PCR. Blood was drawn from a healthy 59-year-old female volunteer (Donor B, Table 7) after informed consent had been obtained. PBMCs were isolated from the blood, resuspended in the RPMI-1640 containing 10% DMSO and 10% FBS, and then were frozen for cryopreservation. The frozen PBMCs were thawed and total T cells were isolated with Pan T cell isolation kit (#130-096-535, Miltenyi Biotec). The T cells were cultured in RPMI-1640 medium containing 10% FBS, 2 mM L-glutamine, 1× non-essential amino acids, 1× sodium pyruvate, and 1× penicillin/streptomycin (Life Technologies) and expanded in the presence of CD3/CD28 dynabeads (#11161D, Thermo Fisher Scientific) and 30 units/mL IL-2 (PeproTech) for a week. The medium was exchanged every three days and fresh beads and IL-2 were added. 2.9×105 expanded T cells were divided into two replicates. Single-cell emulsion RT-PCR was performed for each replicate as described in Example 4 but using the primers described in Table 9 to pair TCRαβ. In this experiment, Span80 based oil (mineral oil containing 4.5% Span-80(#56760, Sigma Aldrich), 0.4% Tween 80(#P9416, Sigma Aldrich), 0.05% Triton X-100, v/v%) was used. The volumes of reagents were described in the Table 7. The TCRα and TCRβ primers are the modification of the following reference. (Han et al., 2014).
Following RT-PCR, the emulsions were collected in Eppendorf tubes and centrifuged at 17,000×g for 10 min. The mineral oil phase was decanted, and the DNA amplicons were recovered by two serial extractions with water-saturated diethyl ether. Residual ether was removed using a SpeedVac (30 minutes at RT) and the DNA was concentrated using a PCR purification kit (Zymo Research Corp.) as per the manufacturer's instructions. For TCR analysis, eluted cDNA and AMPure XP beads (#A63880, Beckman Coulter) were mixed at a ratio of 2:1 to remove small unlinked cDNAs. After a 5 min incubation, the supernatant was removed using a magnetic rack, and the beads were washed twice with 200 μL of 80% EtOH without resuspension. After 10 min of drying, the beads were reconstituted with 50 μL ultrapure water and the supernatant was recovered using the magnetic rack. Nested PCR was performed in a total volume of 250 μL, using 10% of the cDNA volume, nested primers (Table 10), and DreamTaq™ Hot Start DNA Polymerase (Thermo Fisher Scientific) according to the manufacturer's protocol under the following conditions: 95° C. for 3 min, followed by 30 cycles of 95° C. for 30 s, 62° C. for 30 s, and 72° C. for 1 min, with a final extension at 72° C. for 7 min. The DNA was run on a 1% agarose gel and detected. The ~550 bp PCR product was isolated from the gel using a gel purification kit (Zymo Research Corp.) according to the manufacturer's protocol.
A one-step procedure was performed to append Illumina adaptor sequences to the amplicon. Specifically, 50 ng of DNA was amplified using NEBNext® High-Fidelity 2× PCR Master Mix (New England BioLabs Inc.) in combination with the MiSeqFw primer (Table 4) and either MiSeqRev10 (sample C) or MiSeqRev11 (sample C′) (Table 8) under the following conditions: 98° C. for 30 s, followed by 6 cycles of 98° C. for 10 s, 62° C. for 30 s, and 72° C. for 30 s, and finally a 7 min extension at 72° C. The ~600 bp PCR product was isolated from a 1% agarose gel using a gel isolation kit and submitted for Illumina MiSeq 2×300 sequencing.
The TCR sequences were quality filtered and annotated using the MiXCR software. Because somatic hypermutation does not occur in TCR genes, the sequences were clustered at 97% CDR-β3 nucleotide similarity using Usearch (DeKosky et al., 2016), and TCR clusters observed in two or more reads were extracted. A total of 6,186 TCRαβ clusters were observed in sample C, and 7,023 TCRαβ clusters in sample C′. Across the two replicates, 3,102 identical CDR-β3 amino acid sequences were observed, which must have originated from identical T cell progenitors. Of these shared CDR-β3 sequences, 2,706 were paired with an identical CDR-α3 in both replicates. This corresponds to a TCRαβ pairing precision of 93.4% (Table 7).
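The replicate comparison described above can be sketched as follows. The sketch assumes that each replicate has already been reduced to a mapping from a CDR-β3 amino acid sequence to the CDR-α3 it was paired with. The function names, the input layout, and the toy sequences are hypothetical, and the square-root form of the precision is an assumption chosen because it reproduces the reported 93.4%.

```python
from math import sqrt

def compare_replicates(rep1, rep2):
    """Count consistent and inconsistent pairings among shared clonotypes.

    rep1, rep2: dict mapping a CDR-beta3 amino acid sequence to the
    CDR-alpha3 sequence paired with it in that replicate (assumed layout).
    Returns (tp, fp): shared CDR-beta3 sequences paired with the same
    CDR-alpha3 in both replicates, and those paired with different ones.
    """
    shared = rep1.keys() & rep2.keys()
    tp = sum(1 for cdr_b3 in shared if rep1[cdr_b3] == rep2[cdr_b3])
    return tp, len(shared) - tp

def pairing_precision(tp, fp):
    # Assumed form: a pairing agrees across replicates only if it is correct
    # in both, so the agreeing fraction is treated as the precision squared.
    if tp + fp == 0:
        raise ValueError("no shared clonotypes to compare")
    return sqrt(tp / (tp + fp))

# Toy replicates, invented purely for illustration.
rep_c = {
    "CASSLGQGYEQYF": "CAVRDSNYQLIW",
    "CASSPTGGNTEAFF": "CAASGGSYIPTF",
    "CSARDRGNTIYF": "CALSEGNNARLMF",
}
rep_c_prime = {
    "CASSLGQGYEQYF": "CAVRDSNYQLIW",   # same alpha chain: consistent
    "CASSPTGGNTEAFF": "CAVNTGNQFYF",   # different alpha chain: inconsistent
    "CASSQDRGYGYTF": "CAMREGDSNYQLIW", # absent from rep_c: not compared
}
print(compare_replicates(rep_c, rep_c_prime))  # (1, 1)

# Counts reported for samples C and C': 3,102 shared CDR-beta3 sequences,
# of which 2,706 paired with an identical CDR-alpha3 in both replicates.
print(round(100 * pairing_precision(2706, 3102 - 2706), 1))  # 93.4
```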
Next, it was tested whether cell concentration affects the pairing precision of TCRαβ. Frozen PBMCs from a healthy donor (Donor A) were thawed and total T cells were isolated with the Pan T Cell Isolation Kit. The T cells were expanded for a week as described above and used for single-cell emulsion RT-PCR at a concentration of 2.0×10⁵ cells/mL in a syringe. The reagent volumes are listed in Table 7. The resulting TCRαβ cDNAs were amplified as described above. The MiSeqFw primer (Table 4) and either MiSeqRev5 (sample D) or MiSeqRev6 (sample D′) (Table 8) were used to add the Illumina adaptor sequences. The DNA was sequenced with Illumina MiSeq 2×300. An average of 13,273.5 TCRαβ clusters was detected per replicate. Across the two replicates, 8,746 identical CDR-β3 amino acid sequences were observed. Of these shared CDR-β3 sequences, 7,562 were paired with an identical CDR-α3 in both replicates. This corresponds to a TCRαβ pairing precision of 92.9% (Table 7). Thus, more concentrated cells did not disrupt the throughput or pairing precision of single-cell emulsion RT-PCR. Much more concentrated cells could likely be used for single-cell emulsion RT-PCR.
Single-cell emulsion RT-PCR to analyze immune receptors elicited by influenza vaccination. A healthy 25-year-old donor (Donor C) was vaccinated with Fluzone® Quadrivalent inactivated influenza vaccine (after informed consent had been obtained), and PBMCs were isolated seven days after the vaccination. One million PBMCs were used directly for single-cell emulsion RT-PCR to generate VH-VL fusion amplicons in the volume described in Table 7. In parallel, 650,000 PBMCs were stimulated with 100 ng/mL PMA (#P8139, Sigma Aldrich) and 100 ng/mL ionomycin (#I9657, Sigma Aldrich) for four hours and then subjected to single-cell emulsion RT-PCR to generate TCRαβ fusion amplicons. A technical replicate experiment for TCR sequencing was also performed without SUPERase⋅In™ RNase inhibitor. In this experiment, 1,000 Jurkat T cells were mixed with 650,000 PMA/ionomycin-stimulated PBMCs and then subjected to single-cell emulsion RT-PCR. For these experiments, DT-50 tubes (#0003699600, IKA) were used for the emulsification. The emulsion was collected and the aqueous phase was extracted using diethyl ether/ethyl acetate as described above. The aqueous phase was then mixed with 2.5 volumes of 100% EtOH and 0.04 volume of 3 M sodium acetate and centrifuged at 17,000×g for 30 min at 4° C. After removing the supernatant, 1 mL of 70% EtOH was added and the sample was centrifuged at 17,000×g for 5 min. After removing the supernatant, the pellet was dissolved in 400 μL ultrapure water and column concentration was performed according to the manufacturer's protocol (#C1003-50, #D4004-1-L, #D4003-2-48, Zymo Research Corp.). cDNA was eluted with 50 μL ultrapure water. For TCR analysis, eluted cDNA and AMPure XP beads (#A63880, Beckman Coulter) were mixed at a ratio of 2:1, and small unlinked cDNAs were removed as described above.
Nested PCR was performed with DreamTaq™ Hot Start DNA Polymerase (#EP1702, Thermo Fisher Scientific), the primers described in Table 2 for BCR or in Table 10 for TCR, and 30% of the cDNA for BCR or 10% of the cDNA for TCR, under the following conditions: initial denaturation at 94° C. for 3 min, followed by 30 cycles of 94° C. for 30 s, 62° C. for 30 s, and 72° C. for 1 min, with a final extension at 72° C. for 7 min. The amplicon was gel purified and Illumina adaptor sequences were added as described above. MiSeqRev12 (IgM, sample E), MiSeqRev2 (IgG, sample E), MiSeqRev2 (sample F), MiSeqRev7 (sample F′), and the MiSeqFw primer were used (Table 4 and Table 8). VH-VL and TCRαβ sequences were obtained using Illumina MiSeq 2×300 sequencing. A total of 3,276 VH-VL clusters (Table 7, sample E), 7,064 TCRαβ clusters (Table 7, sample F), and 7,325 TCRαβ clusters (Table 7, sample F′) were detected. The TCRαβ pairing precision calculated between F and F′ was 90.2%. The correct Jurkat-encoded TCRαβ pair was detected with 821 reads, whereas the most abundant pairing of the Jurkat TCRβ with an incorrect TCRα was detected with 3 reads. Thus, the signal-to-noise ratio in this experiment was 273.6:1.
To determine antigen-specific antibody sequences, VH sequences of plasmablasts and memory B cells from the Fluzone-vaccinated donor were analyzed. The PBMCs freshly drawn from the Fluzone® vaccinee were stained at 4° C. for 15 min in PBS/0.2% BSA with anti-human CD19-v450 (HIB19, BD Biosciences, San Jose, Calif.), CD27-APC (M-T271, BD Biosciences), CD38-PE (HIT2, BioLegend, San Diego, Calif.), CD20-FITC (2H7, BioLegend), and CD3-PerCP/Cy5.5 (HIT3a, BioLegend). Cells were washed and filtered. Forward (FSC) and side (SSC) light scatter were used to gate broadly on mononucleated cells, and then low SSC-W and low FSC-W gates were drawn to discriminate singlet cell events to collect CD3−CD19lo/−CD20+CD27+ memory B cells and CD3−CD19lo/−CD20−CD27++CD38++ plasmablasts, which were sorted directly into 1 mL TRIzol reagent (Thermo Fisher Scientific) using a FACSAria Fusion cell sorter (BD Biosciences).
The resulting PCR product was isolated from a 1% agarose gel using a gel purification kit (Zymo Research Corp.) and then sequenced with Illumina MiSeq 2×300. To identify VH-VL sequences of plasmablasts or memory B cells, VH sequences from the plasmablasts and memory B cells were clustered with the VH-VL sequences of sample E at 90% CDR-H3 nucleotide similarity. To obtain the complete light chain sequences of the identified clonotypes, 50 ng of the VH-VL nested PCR product was amplified with hIgK_MiSeqRev, hIgL_MiSeqRev (Table 2), and a primer from Table 13 using NEBNext® High-Fidelity 2× PCR Master Mix (New England BioLabs Inc.) under the following conditions: 98° C. for 30 s, followed by 12 cycles of 98° C. for 10 s, 62° C. for 30 s, and 72° C. for 30 s, and finally a 7 min extension at 72° C. The product was column purified and eluted with 30 μL ultrapure water. Illumina adaptor sequences were then added to the product as described above using the MiSeqRev3 (Table 8) and MiSeqFw (Table 4) primers. The product was sequenced with Illumina MiSeq 2×300.
Selected VH:VL sequences from plasmablasts/memory B cells (Table 14) were synthesized as gBlocks (Integrated DNA Technologies) and cloned into an IgG expression vector (pcDNA3.4, Invitrogen). The heavy chain plasmid and light chain plasmid were transfected into Expi293 cells at a 1:3 ratio and the cells were incubated at 37° C. with 8% CO2 for a week. The supernatant was recovered, mixed with 0.04 volume of 25× PBS, and centrifuged at 500×g for 10 min at RT. The supernatant was passed three times over a column containing 1 mL Protein G agarose resin (Thermo Scientific). The column was washed with 20 mL of PBS, and the antibodies were then eluted with 5 mL of 100 mM glycine-HCl (pH 2.7) and immediately neutralized with 1 mL of 1 M Tris-HCl (pH 8.0). Antibodies were buffer-exchanged into PBS using Amicon Ultra-30 centrifugal spin columns (Millipore) and used for enzyme-linked immunosorbent assay (ELISA).
ELISA was performed with the following influenza hemagglutinin (HA) antigens: Hemagglutinin Protein from Influenza Virus, B/Phuket/3073/2013; H3 Hemagglutinin Protein from Influenza Virus, A/Wisconsin/67/2005 (H3N2), Recombinant from Baculovirus (#NR-15171, BEI Resources); H3 Hemagglutinin Protein from Influenza Virus, A/New York/55/2004 (H3N2), Recombinant from Baculovirus (#NR-19241, BEI Resources); and H3 Hemagglutinin Protein with C-Terminal Histidine Tag from Influenza Virus, A/Perth/16/2009 (H3N2), Recombinant from Baculovirus (#NR-42974, BEI Resources). The 50% effective concentration (EC50) values based on ELISA were used to determine the apparent binding affinities of the recombinant monoclonal antibodies. First, Costar 96-well ELISA plates (Corning) were coated overnight at 4° C. with 4 μg/ml recombinant HAs, washed, and blocked with 2% milk in PBS for two hours at RT. After blocking, serially diluted recombinant antibodies were allowed to bind to the plates for one hour, followed by 1:5000 diluted goat anti-human IgG Fc HRP-conjugated secondary antibodies (Jackson ImmunoResearch; 109-035-008) for one hour. For detection, 50 μl TMB-ultra substrate (Thermo Scientific) was added before quenching with 50 μl M H2SO4. Absorbance was measured at 450 nm using a Tecan M200 plate reader. Data were analyzed and fitted for EC50 using a 4-parameter logistic nonlinear regression model in the GraphPad Prism software. All ELISA assays were performed in triplicate. As a result, three antibodies bound HA antigens with high affinity.
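For orientation, the shape of a 4-parameter logistic model of the kind used for EC50 fitting can be sketched as below. The fitting itself was done in GraphPad Prism; this sketch only evaluates one common parameterization of the curve, and the function name and every parameter value are hypothetical illustrations, not values from these experiments.

```python
def four_pl(conc, bottom, top, ec50, hill):
    """Response predicted by a 4-parameter logistic curve at concentration conc.

    bottom, top: lower and upper absorbance plateaus
    ec50: concentration giving a response halfway between the plateaus
    hill: slope factor controlling the steepness of the transition
    """
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)

# Hypothetical parameters for one antibody (absorbance at 450 nm, conc in ug/mL).
params = dict(bottom=0.05, top=2.0, ec50=0.5, hill=1.2)

# At conc == ec50 the response sits midway between the two plateaus.
midpoint = four_pl(0.5, **params)
print(round(midpoint, 3))  # 1.025

# A serial dilution evaluated on the same curve; a lower EC50 would shift the
# whole curve toward lower concentrations, i.e. higher apparent affinity.
for conc in (10.0, 1.0, 0.1, 0.01):
    print(conc, round(four_pl(conc, **params), 3))
```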
All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
European Patent No. EP 1 317 539 B
This application claims the benefit of U.S. Provisional Patent Application No. 62/537,686, filed Jul. 27, 2017, the entirety of which is incorporated herein by reference.
This invention was made with government support under Grant No. HDTRA1-12-C-0105 awarded by the Department of Defense/Department of Threat Reduction. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US18/44171 | 7/27/2018 | WO | 00 |
Number | Date | Country |
---|---|---|
62537686 | Jul 2017 | US |