The invention generally relates to methods for analyzing cellular nucleic acids.
Sequencing-by-synthesis involves template-dependent addition of nucleotides to a template/primer duplex. Nucleotide addition is mediated by a polymerase enzyme, and added nucleotides may be labeled to facilitate their detection. Single molecule sequencing has been used to obtain high-throughput sequence information on individual DNA or RNA molecules. See, Braslavsky et al., Proc. Natl. Acad. Sci. USA 100: 3960-64 (2003). All four Watson-Crick nucleotides may be added simultaneously, each with a different detectable label, or nucleotides may be added one at a time in a step-and-repeat manner for imaging incorporations.
Although in most applications of this technology the amount of template nucleic acid is not limiting, a number of applications start from small quantities of nucleic acid. For example, when bacteria that cannot be cultured (Rappe et al., Annu. Rev. Microbiol. 57:369-394, 2003) or when cDNA libraries from a small number of cells (Schutze et al., Nat. Biotechnol. 16:737-742, 1998) are sequenced, template nucleic acid amounts limit the number of sequences that may be determined.
In order to increase the amount of template nucleic acid in a sample, an amplification reaction, e.g., PCR, typically is conducted. However, due to the stochastic nature of the amplification reaction, a population of molecules that is present in a small amount in the sample often is overlooked. In fact, if a rare nucleic acid is not amplified in the first few rounds of amplification, it becomes increasingly unlikely that it will ever be detected. Thus, the resulting post-amplification nucleic acid population is biased and does not represent the true condition of the sample from which it was obtained.
There is a need for methods that allow for analysis of samples that include only a small quantity of nucleic acid.
Methods of the invention allow for amplification-free analysis of very small quantities (e.g., nanogram, picogram, or femtogram amounts) of nucleic acids from a cell by direct sequencing methodologies. In this manner, methods of the invention allow for sequencing of biologically important rare cells without potentially biasing nucleic acid manipulation steps such as amplification. In particular embodiments, methods of the invention analyze nucleic acids obtained from only a single cell, such as a cancer cell.
Methods of the invention involve capturing RNA from a lysed cell onto a substrate, producing a cDNA/RNA duplex, removing the RNA from the cDNA/RNA duplex, priming the cDNA to produce a primer/cDNA duplex, exposing the primer/cDNA duplex to at least one detectably labeled nucleotide in the presence of a polymerase capable of catalyzing addition of the nucleotide to the primer/cDNA duplex, detecting incorporation of the nucleotide into the primer portion, and repeating the exposing and detecting steps at least once. In certain embodiments, methods of the invention further involve, prior to the capturing step, lysing a cell to release RNA from that cell. In certain embodiments, methods of the invention further involve removing a complementary strand of the cDNA, and resequencing the cDNA at least once.
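The exposing, detecting and repeating steps can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not an implementation of the described method: one nucleotide species is flowed per cycle in a hypothetical fixed order (`flow_order`), at most one base is incorporated per cycle, and detection is error-free. The function name `simulate_sbs` is illustrative only.

```python
# Toy model of step-and-repeat sequencing-by-synthesis (illustrative only).
# Assumptions: one nucleotide species per cycle in a fixed, hypothetical
# flow order; at most one incorporation per cycle; perfect detection.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_sbs(template: str, flow_order: str = "CTAG", max_cycles: int = 100) -> str:
    """Return the bases added to the primer, in order of incorporation.

    `template` lists the unpaired template bases downstream of the primer,
    in the order the polymerase encounters them.
    """
    added = []
    position = 0
    for cycle in range(max_cycles):
        if position == len(template):
            break  # primer has been extended to the end of the template
        flowed = flow_order[cycle % len(flow_order)]  # exposing step
        # Detecting step: a signal is recorded only if the flowed nucleotide
        # pairs with the next unpaired template base.
        if COMPLEMENT[template[position]] == flowed:
            added.append(flowed)
            position += 1
    return "".join(added)


print(simulate_sbs("GACT"))  # CTGA
```

Because each cycle offers only one nucleotide, the number of cycles grows with read length; under the one-incorporation assumption a homopolymer such as `AAA` needs one full round of the flow order per base.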
Any surface capture method known in the art may be used to capture the RNA onto a substrate. In certain embodiments, the RNA is captured by adding a poly(A) tail to the RNA, and hybridizing the poly(A) tailed RNA to poly(T) primers that are attached to the substrate.
In certain embodiments, producing the cDNA/RNA duplex includes performing a sequencing reaction on the poly(A) tailed RNA that is hybridized to the poly(T) primer, thereby producing the cDNA/RNA duplex. Any sequencing method known in the art may be used. In particular embodiments, the sequencing reaction is a sequencing-by-synthesis reaction.
Either unique primers or universal primers, i.e., primers that all have the same sequence, may be used with methods of the invention. In certain embodiments, priming includes determining the sequence of the cDNA, synthesizing a primer that corresponds to a portion of the cDNA, and hybridizing the primer to the corresponding portion of the cDNA. In other embodiments, priming involves adding a poly(G) tail to a 3′ end of the cDNA, and hybridizing a poly(C) primer to the poly(G) tail. In certain embodiments, prior to the exposing step, the method further involves adding dCTP to the primer/cDNA duplex to fill remaining unpaired poly(G) nucleotides on the poly(G) tail of the cDNA, thereby ensuring alignment of the poly(C) primer with the poly(G) tail. Methods of the invention may also include exposing the primer/cDNA duplex to locked nucleic acids. Methods of the invention may also include, prior to the exposing step, adding dATP in order to fill poly(T) primers on the surface of the substrate that did not pair with an RNA. In certain embodiments, at least a portion of the primer/cDNA duplexes are individually optically resolvable.
Panel B is a graph showing the reproducibility of methods of the invention across independent runs. The Pearson correlation coefficient between two experiments was 0.991. A thousand 490 cells per channel in the AmpliGrid lysis system were used.
Panel C is a graph showing the correlation of gene counts obtained with methods of the invention with those from an established digital gene expression assay performed using standard approaches (e.g., Lipson et al., Nat Biotechnol 27:652-658, 2009). The expression profile of a thousand 490 cells obtained with methods of the invention was compared to a standard digital gene expression approach performed with 10 micrograms of 490 RNA isolated with Trizol from five million cells (Lipson et al., Nat Biotechnol 27:652-658, 2009). The two datasets are in high agreement (r=0.901). In panels B and C, the X and Y axes indicate log10 counts obtained per gene, each axis normalized to one million total reads.
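The comparison in panels B and C (per-gene counts normalized to one million total reads, plotted on log10 axes, summarized by a Pearson coefficient) can be sketched as follows. This is a hedged illustration, not the pipeline used to produce the figures: the pseudocount of 1 added before taking log10 is an assumption to handle genes with zero counts, and the gene names and counts are hypothetical.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw per-gene read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log10_cpm_correlation(run_a: dict[str, int], run_b: dict[str, int],
                          pseudocount: float = 1.0) -> float:
    """Correlate log10(CPM + pseudocount) over the union of genes in both runs."""
    genes = sorted(set(run_a) | set(run_b))
    cpm_a = counts_per_million({g: run_a.get(g, 0) for g in genes})
    cpm_b = counts_per_million({g: run_b.get(g, 0) for g in genes})
    xs = [math.log10(cpm_a[g] + pseudocount) for g in genes]
    ys = [math.log10(cpm_b[g] + pseudocount) for g in genes]
    return pearson_r(xs, ys)


# Hypothetical runs: run_b has twice the sequencing depth of run_a.
run_a = {"GENE1": 10, "GENE2": 100, "GENE3": 1000}
run_b = {"GENE1": 20, "GENE2": 200, "GENE3": 2000}
print(round(log10_cpm_correlation(run_a, run_b), 6))  # 1.0
```

Normalizing to one million reads removes differences in sequencing depth, so two runs that differ only in depth correlate perfectly.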
Panel D is a graph showing the cumulative read length distribution of raw (diamond), filtered (square) and aligned (triangle) reads in a single channel. The mean aligned read length is 36.5 nt.
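A cumulative read length distribution such as the one in panel D can be computed from a list of read lengths. The sketch below assumes the cumulative fraction is taken as the fraction of reads whose length is at most each threshold; the direction of accumulation used in the figure is not stated, and the lengths shown are hypothetical, not the data behind panel D.

```python
from bisect import bisect_right
from statistics import mean


def cumulative_length_distribution(lengths: list[int], thresholds: list[int]) -> dict[int, float]:
    """Fraction of reads whose length is <= each threshold."""
    ordered = sorted(lengths)
    n = len(ordered)
    return {t: bisect_right(ordered, t) / n for t in thresholds}


aligned_lengths = [25, 30, 36, 40, 50]  # hypothetical aligned read lengths (nt)
print(cumulative_length_distribution(aligned_lengths, [24, 30, 50]))
# {24: 0.0, 30: 0.4, 50: 1.0}
print(mean(aligned_lengths))  # 36.2
```

Computing the same distribution separately for raw, filtered and aligned reads gives one curve per read class.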
The invention generally relates to methods for analyzing cellular nucleic acids. Methods of the invention involve capturing RNA from a lysed cell onto a substrate, producing a cDNA/RNA duplex, removing the RNA from the cDNA/RNA duplex, priming the cDNA to produce a primer/cDNA duplex, exposing the primer/cDNA duplex to at least one detectably labeled nucleotide in the presence of a polymerase capable of catalyzing addition of the nucleotide to the primer/cDNA duplex, detecting incorporation of the nucleotide into the primer portion, and repeating the exposing and detecting steps at least once.
Methods of the invention are advantageous because they do not require lossy RNA isolation or poly(A) tailed RNA selection steps. Methods of the invention take advantage of the higher-affinity RNA-DNA hybridization kinetics relative to DNA-DNA hybridization kinetics, allowing a higher fraction of poly(A) tailed templates to be captured on flow cell surfaces. Methods of the invention do not require RNA or cDNA fragmentation or size selection steps, thus minimizing bias, particularly against short transcripts. Since each transcript captured on the surface can give rise to only one read, expression levels are independent of transcript length and thus do not require normalization for transcript length or other factors. Because biasing sample manipulation steps such as ligation, restriction digestion or amplification are avoided, artifacts are minimized.
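The length-independence argument can be made concrete with an idealized counting model. In this sketch, each captured molecule yields exactly one read in the direct scheme, while a fragment-based library yields roughly one read per fragment. Capture, sequencing and alignment losses are ignored. The 200 nt fragment size, the function name and the gene names are hypothetical and are used only for comparison.

```python
def expected_read_counts(captured: dict[str, tuple[int, int]], fragment_nt: int = 200):
    """Compare idealized read counts under two counting schemes.

    `captured` maps gene -> (molecules captured, transcript length in nt).
    """
    # One read per captured molecule: counts track molecule numbers only.
    one_read_per_molecule = {g: copies for g, (copies, _length) in captured.items()}
    # Fragment-based library: each molecule yields about length // fragment_nt reads.
    fragment_based = {
        g: copies * max(1, length // fragment_nt) for g, (copies, length) in captured.items()
    }
    return one_read_per_molecule, fragment_based


direct, fragmented = expected_read_counts({"short": (100, 500), "long": (100, 5000)})
print(direct)      # {'short': 100, 'long': 100}
print(fragmented)  # {'short': 200, 'long': 2500}
```

With equal numbers of molecules, the direct counts are equal, whereas the fragment-based counts differ more than tenfold and would need length normalization.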
Ribonucleic acid (RNA) molecules are derived from naturally occurring sources, e.g., cells. In one embodiment, RNA molecules are isolated from a biological sample containing a variety of other components, such as proteins, lipids and non-template nucleic acids. RNA molecules can be obtained from any cellular material, e.g., from an animal, plant, bacterium, fungus, or any other cellular organism. In certain embodiments, the RNA templates are obtained from a single cell. Biological samples for use in the present invention include viral particles or preparations. RNA molecules can be obtained directly from an organism or from a biological sample obtained from an organism, e.g., from blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool and tissue. Any tissue or body fluid specimen may be used as a source of RNA for use in the invention. RNA molecules can also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which RNA molecules are obtained can be infected with a virus or other intracellular pathogen. A sample can also be total RNA extracted from a biological specimen, a cDNA library, or viral or genomic DNA.
RNA obtained from biological samples typically is fragmented to produce suitable fragments for analysis. In one embodiment, RNA from a biological sample is fragmented by sonication. RNA molecules can be obtained as described in U.S. Patent Application Publication Number US2002/0190663 A1, published Oct. 9, 2003. Generally, RNA can be extracted from a biological sample by a variety of techniques such as those described by Maniatis, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., pp. 280-281 (1982). Generally, individual RNA template molecules can be from about 5 bases to about 20 kb. RNA molecules may be single-stranded, double-stranded, or double-stranded with single-stranded regions (for example, stem- and loop-structures).
A biological sample as described herein may be homogenized or fractionated in the presence of a detergent or surfactant. The concentration of the detergent in the buffer may be about 0.05% to about 10.0%. The concentration of the detergent can be up to an amount where the detergent remains soluble in the solution. In a preferred embodiment, the concentration of the detergent is between about 0.1% and about 2%. The detergent, particularly a mild one that is nondenaturing, can act to solubilize the sample. Detergents may be ionic or nonionic. Examples of nonionic detergents include Triton detergents, such as the Triton® X series (Triton® X-100, t-Oct-C6H4-(OCH2-CH2)xOH, x=9-10; Triton® X-100R; Triton® X-114, x=7-8), octyl glucoside, polyoxyethylene(9)dodecyl ether, digitonin, IGEPAL® CA630 octylphenyl polyethylene glycol, n-octyl-beta-D-glucopyranoside (betaOG), n-dodecyl-beta, Tween® 20 polyethylene glycol sorbitan monolaurate, Tween® 80 polyethylene glycol sorbitan monooleate, polidocanol, n-dodecyl beta-D-maltoside (DDM), NP-40 nonylphenyl polyethylene glycol, C12E8 (octaethylene glycol n-dodecyl monoether), hexaethyleneglycol mono-n-tetradecyl ether (C14EO6), octyl-beta-thioglucopyranoside (octyl thioglucoside, OTG), Emulgen, and polyoxyethylene 10 lauryl ether (C12E10). Examples of ionic detergents (anionic or cationic) include deoxycholate, sodium dodecyl sulfate (SDS), N-lauroylsarcosine, and cetyltrimethylammonium bromide (CTAB). A zwitterionic reagent may also be used in the purification schemes of the present invention, such as CHAPS, zwitterion 3-14, and 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate. It is also contemplated that urea may be added with or without another detergent or surfactant.
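Preparing a lysis buffer at a target detergent concentration is a C1 × V1 = C2 × V2 dilution. The sketch below assumes that stock and final concentrations are expressed in the same units (e.g., both % v/v). The 10% stock and 1 mL final volume in the example are hypothetical values, and the helper names are illustrative.

```python
def stock_volume_ul(stock_percent: float, final_percent: float, final_volume_ul: float) -> float:
    """Volume of detergent stock needed, from C1 * V1 = C2 * V2."""
    if not 0 < final_percent <= stock_percent:
        raise ValueError("final concentration must be positive and not exceed the stock")
    return final_percent * final_volume_ul / stock_percent


def in_stated_range(final_percent: float, low: float = 0.05, high: float = 10.0) -> bool:
    """Check a final concentration against a range (default: about 0.05% to 10.0%)."""
    return low <= final_percent <= high


print(stock_volume_ul(10.0, 0.5, 1000.0))  # 50.0
print(in_stated_range(0.5, 0.1, 2.0))      # True (within the preferred 0.1% to 2% range)
```

The remaining volume, here 950 µL, would be made up with the other buffer components.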
Lysis or homogenization solutions may further contain other agents, such as reducing agents. Examples of such reducing agents include dithiothreitol (DTT), β-mercaptoethanol, DTE, GSH, cysteine, cysteamine, tris(2-carboxyethyl)phosphine (TCEP), or salts of sulfurous acid.
The RNA capture step prior to producing a cDNA/RNA duplex may use any suitable hybrid capture method. For example, capture can occur in solution, on beads (e.g., polystyrene beads), in a column (such as a chromatography column), in a gel (such as a polyacrylamide gel), or directly on the surface to be used for producing the cDNA/RNA duplex. An array of support-bound capture oligos can be used to hybridize specifically to a target sequence. Additionally, chromatography-based capture techniques are useful. For example, ion exchange chromatography, HPLC, gas chromatography, and gel-based chromatography all are useful. In one embodiment, gel-based capture is used in order to achieve sequence-specific capture. Using this method, multiple different sequences are captured simultaneously using immobilized probes in the gel. The target sequences are isolated by removing portions of the gel containing them and eluting target from the gel portions for sequencing.
In an alternative embodiment, the target RNA molecule either includes, or is modified to include, an adaptor sequence (such as a polyadenylation region) that is complementary to a portion of a capture probe in order to aid in the capture of the RNA. A preferred embodiment involves an immobilized capture probe having a sequence that hybridizes with (e.g., is complementary to) the adaptor sequence. Methods of the invention are conducted by contacting capture probes with a sample including RNA molecules from a lysed cell under conditions suitable for specific hybridization between the target RNA molecule and immobilized capture probe, thereby forming target/capture probe duplexes. A wash step removes debris and unhybridized nucleic acid in the sample. In one embodiment, the cDNA/RNA duplex is produced using the capture probe as a primer. In another embodiment, the target/capture probe duplex is melted to release the target RNA. The resulting purified target population is analyzed as described below.
If target nucleic acid is melted off the capture probe, the targets are either attached to a surface for production of cDNA/RNA duplexes or hybridized to primers that have been attached to the surface. Surface attachment of oligonucleotides for sequencing can be direct or indirect. For example, nucleic acids are attached to an epoxide surface via a direct amine linkage as described below. Alternatively, the surface is prepared with one member of a binding pair, and the other member is attached to the RNA. For example, the surface can be streptavidinated and biotinylated nucleic acids can be used to form an attachment at the surface. Other binding pairs (e.g., antibody/antigen, such as digoxigenin/anti-digoxigenin and dinitrophenol/anti-dinitrophenol) can also be used.
Further description of hybrid capture is provided in Lapidus (U.S. patent application number 2007/0048744), the content of which is incorporated by reference herein in its entirety.
In a preferred embodiment, RNA molecules are attached to a substrate (also referred to herein as a surface). In certain embodiments, RNA molecules are attached to the surface such that the molecules are individually optically resolvable. Substrates for use in the invention can be two- or three-dimensional and can comprise a planar surface (e.g., a glass slide) or can be shaped. A substrate can include glass (e.g., controlled pore glass (CPG)), quartz, plastic (such as polystyrene (low cross-linked and high cross-linked polystyrene), polycarbonate, polypropylene and poly(methyl methacrylate)), acrylic copolymer, polyamide, silicon, metal (e.g., alkanethiolate-derivatized gold), cellulose, nylon, latex, dextran, gel matrix (e.g., silica gel), polyacrolein, or composites.
Suitable three-dimensional substrates include, for example, spheres, microparticles, beads, membranes, slides, plates, micromachined chips, tubes (e.g., capillary tubes), microwells, microfluidic devices, channels, filters, or any other structure suitable for anchoring a nucleic acid. Substrates can include planar arrays or matrices capable of having regions that include populations of template nucleic acids or primers. Examples include nucleoside-derivatized CPG and polystyrene slides; derivatized magnetic slides; polystyrene grafted with polyethylene glycol, and the like.
Substrates are preferably coated to allow optimum optical processing and nucleic acid attachment. Substrates for use in the invention can also be treated to reduce background. Exemplary coatings include epoxides, and derivatized epoxides (e.g., with a binding molecule, such as an oligonucleotide or streptavidin).
Various methods can be used to anchor or immobilize the nucleic acid molecule to the surface of the substrate. The immobilization can be achieved through direct or indirect bonding to the surface. The bonding can be by covalent linkage. See, Joos et al., Analytical Biochemistry 247:96-101, 1997; Oroskar et al., Clin. Chem. 42:1547-1555, 1996; and Khandjian, Mol. Bio. Rep. 11:107-115, 1986. A preferred attachment is direct amine bonding of a terminal nucleotide of the template or the 5′ end of the primer to an epoxide integrated on the surface. The bonding also can be through non-covalent linkage. For example, biotin-streptavidin (Taylor et al., J. Phys. D. Appl. Phys. 24:1443, 1991) and digoxigenin with anti-digoxigenin (Smith et al., Science 253:1122, 1992) are common tools for anchoring nucleic acids to surfaces. Alternatively, the attachment can be achieved by anchoring a hydrophobic chain into a lipid monolayer or bilayer. Other methods known in the art for attaching nucleic acid molecules to substrates also can be used.
In certain embodiments, a tail is attached to the RNA molecules. RNA tailing is described, for example, in Steinman et al. (International patent application number PCT/US09/64001), the content of which is incorporated by reference herein in its entirety. The RNA tails act as primer binding sites. The primer binding site may be used to hybridize the template RNA molecule to a sequencing primer (e.g., a poly(T) sequence), which may optionally be anchored to a substrate. The primer binding sequence may be a unique sequence of at least 2 bases, but it is likely to contain a unique order of all 4 bases and is generally 20-50 bases in length. One example of a specific sequence binding primer is: 5′-CAG GGC AGA GGA TGG ATG CAA GGA TAA GTG GA-3′ (SEQ ID NO: 1). In a particular embodiment, the primer binding sequence is a homopolymer of a single base, e.g., poly(T), generally 20-200 bases in length.
The RNA tail also may include a blocker, e.g., a chain-terminating nucleotide, on the 3′-end. The blocker prevents unintended sequence information from being obtained by inadvertently using the 3′-end of the primer binding site as a second sequencing primer, particularly when homopolymeric primer sequences are used. The blocker may be any moiety that prevents a polymerase from adding bases during incubation with dNTPs. An exemplary blocker is a nucleotide terminator that lacks a 3′-OH, i.e., a dideoxynucleotide (ddNTP). Common nucleotide terminators are 2′,3′-dideoxynucleotides, 3′-aminonucleotides, 3′-deoxynucleotides, 3′-azidonucleotides, acyclonucleotides, etc. The blocker may have a detectable label attached, e.g., a fluorophore. The label may be attached via a labile linkage, e.g., a disulfide, so that following hybridization of the template RNA to the surface, the locations of the template nucleic acids may be identified by imaging. Generally, the detectable label is removed before sequencing commences. Depending upon the linkage, the cleaved product may or may not require further chemical modification to prevent undesirable side reactions; for example, following cleavage of a disulfide by TCEP, the resulting reactive thiol is blocked with iodoacetamide.
Methods of the invention involve attaching the oligonucleotide tail to the template RNA molecules. In certain embodiments, the oligonucleotide tail is attached to the template RNA molecule with an enzyme, such as terminal transferase. The enzyme may be a ligase or a polymerase. The ligase may be any enzyme capable of ligating an oligonucleotide to the template RNA molecule. Suitable ligases include T4 RNA ligase, which is commercially available from New England Biolabs. Methods for using ligases are well known in the art. The polymerase may be any enzyme capable of adding nucleotides to the 3′ terminus of template RNA molecules, for example, yeast poly(A) polymerase, which is commercially available from USB and is used according to the manufacturer's instructions. In a particular embodiment, the enzyme is a terminal transferase, which is commercially available from New England Biolabs and is used according to the manufacturer's instructions.
The ligation may be blunt-ended or may use complementary overhanging ends. In certain embodiments, the ends of the template RNA are repaired, trimmed (e.g., using an exonuclease), or filled (e.g., using a polymerase and dNTPs) to form blunt ends. Once blunt ends are generated, they may be treated with a polymerase and dATP to form a template-independent addition to the 3′-end of the template RNA, thus producing a single overhanging A. This single A is used to guide ligation of fragments with a single T overhanging from the 5′-end, in a method referred to as T-A cloning.
Alternatively, because the possible combinations of overhangs left by restriction enzymes are known after a restriction digestion, the ends may be left as is, i.e., as ragged ends. In certain embodiments, double-stranded oligonucleotides with complementary overhanging ends are used. In a particular example, the A:T single-base overhang method is used (see Steinman et al., International patent application number PCT/US09/64001).
In a particular embodiment, a substrate has anchored a reverse complement to the primer binding sequence of the oligonucleotide, for example 5′-TC CAC TTA TCC TTG CAT CCA TCC TCT GCC CTG (SEQ ID NO: 2) or a poly(T)(50). When homopolymeric sequences are used for the primer, it may be advantageous to perform a procedure known in the art as a “fill and lock”. When poly(A) (20-70) on the template RNA and poly(T)(50) on the surface hybridize there is a high likelihood that there will not be perfect alignment, so the hybrid is filled in by incubating the sample with polymerase and TTP. Following the fill step, the sample is washed and the polymerase is incubated with one or two dNTPs complementary to the base(s) used in the lock sequence. The fill and lock can also be performed in a single step process in which polymerase, TTP and one or two reversible terminators (complements of the lock bases) are mixed together and incubated. The reversible terminators stop addition during this stage and can be made functional again (reversal of inhibitory mechanism) by treatments specific to the analogs used. Some reversible terminators have functional blocks on the 3′-OH which need to be removed while others, for example Helicos BioSciences Virtual Terminators have inhibitors attached to the base via a disulfide which can be removed by treatment with TCEP.
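The surface-anchored oligonucleotide is the reverse complement of the primer binding sequence, and this relationship can be checked directly. The sketch below assumes both sequences are written 5′ to 3′ and handles only the unambiguous bases A, C, G and T. The helper name `reverse_complement` is illustrative rather than taken from any particular library.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (5'->3' in, 5'->3' out); whitespace is ignored."""
    return "".join(seq.split()).upper().translate(_COMPLEMENT)[::-1]


SEQ_ID_NO_1 = "CAG GGC AGA GGA TGG ATG CAA GGA TAA GTG GA"  # primer binding sequence
SEQ_ID_NO_2 = "TC CAC TTA TCC TTG CAT CCA TCC TCT GCC CTG"  # surface-anchored oligo

print(reverse_complement(SEQ_ID_NO_1) == "".join(SEQ_ID_NO_2.split()))  # True
print(reverse_complement("A" * 50) == "T" * 50)  # True: poly(A) pairs with poly(T)(50)
```

The same check applies to the homopolymeric case: a poly(A) tail is fully complementary to a poly(T) surface primer, although, as noted above, the two may hybridize out of register.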
After tailing, the tailed RNA are introduced to primers and RNA template/primer duplexes are formed. A cDNA/RNA duplex is then produced as described below. Further description is provided in Kahvejian (U.S. patent application number 2008/0081330), the content of which is incorporated by reference herein in its entirety.
cDNA/RNA Duplexes
Single molecule sequencing methodologies, which are discussed in further detail below, may be used to produce the cDNA/RNA duplexes. Single molecule sequencing is described, for example, in Lapidus et al. (U.S. Pat. No. 7,169,560), Lapidus et al. (U.S. patent application number 2009/0191565), Quake et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No. 7,282,337), Quake et al. (U.S. patent application number 2002/0164629), and Braslavsky et al., PNAS (USA), 100: 3960-3964 (2003), the contents of each of which are incorporated by reference herein in their entirety.
Briefly, the tailed RNA are attached to a surface of a flow cell. The RNA may be covalently attached to the surface, or various non-covalent attachments known to those of ordinary skill in the art may be employed. Moreover, the attachment may be indirect, e.g., via hybridization to a primer already attached to the surface or via a polymerase directly or indirectly attached to the surface. The surface may be planar or otherwise, and/or may be porous or non-porous, or any other type of surface known to those of ordinary skill to be suitable for attachment. A strand of cDNA is then produced by polymerase-mediated incorporation of nucleotides into a growing strand extended from the surface primer. In certain embodiments, the nucleotides used in the sequencing reaction are not chain-terminating nucleotides.
In certain embodiments, it may be beneficial to know the sequence of the cDNA strand, and thus fluorescently-labeled nucleotides are used in the reaction. The cDNA sequence is obtained by imaging, at single molecule resolution, the polymerase-mediated incorporation of fluorescently-labeled nucleotides into the growing strand extended from the surface oligonucleotide. Knowing the sequence of the cDNA allows for the design of primers that are capable of hybridizing to the 3′ end of the cDNA molecule for subsequent sequencing reactions.
Alternatively, certain embodiments do not require knowledge of the sequence of the cDNA strand for subsequent sequencing. In these embodiments, the cDNA strand may be produced using unlabeled nucleotides, known as dark filling. The reaction occurs as described above, however, unlabeled nucleotides are used to produce the cDNA strand.
Once the cDNA/RNA duplex is produced, the RNA strand is removed and the cDNA strand is primed with a primer. The template can be removed by any suitable means, for example, by raising the temperature of the surface of the flow cell such that the duplex is melted, by changing the buffer conditions to destabilize the duplex, or by a combination thereof. Methods for melting nucleic acid duplexes are well known in the art and are described, for example, in chapter 10 of Molecular Cloning, a Laboratory Manual, 3rd Edition, J. Sambrook and D. W. Russell, Cold Spring Harbor Press (2001), Lander et al. (U.S. patent application number 2009/0305248), Quake (U.S. patent application number 2006/0019267), and Harris (U.S. patent application number 2009/0053705), the content of each of which is incorporated herein by reference. In certain embodiments, the RNA strand of the cDNA/RNA duplex is removed by passing hot water over the duplex. Once dissociated, the RNA may then be removed from the surface, for example, by rinsing the surface with a suitable rinsing solution. The complementary cDNA generated during the copy step remains on the surface because it is extended from the covalently attached poly(T) oligonucleotide.
Once the RNA template has been removed, leaving only a cDNA template attached to the solid support, the cDNA template is primed for subsequent sequencing. There are numerous methods for priming the cDNA template. In certain embodiments, a tail, such as a poly(G) tail, is added to the 3′ end of the cDNA strand. The added tail acts as a primer binding site on the cDNA molecule for a subsequent sequencing reaction. In certain embodiments, an oligonucleotide of poly(G) is added to the 3′ end of the cDNA. Adding the poly(G) oligonucleotide tail may be accomplished by methods described herein, such as using a terminal transferase enzyme. The poly(G) tail is then blocked using ddGTP, as described herein. A poly(C) primer is then hybridized to the poly(G) tail. Further description is provided in Harris (U.S. patent application number 2009/0053705), Harris (U.S. Pat. No. 7,282,337), and Harris (U.S. patent application number 2009/0197257), the content of each of which is incorporated by reference herein in its entirety.
In certain embodiments, the poly(C) primer will not perfectly align with the poly(G) tail on the cDNA molecules, so dCTP is added to the primer/cDNA duplex in a stepwise manner to fill remaining unpaired poly(G) nucleotides on the poly(G) tail of the cDNA, thereby ensuring alignment of the poly(C) primer. Once primed, a sequencing reaction is performed. The sequencing reaction is conducted as described herein.
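The number of dCTP incorporations needed in the fill step depends on where the poly(C) primer lands on the poly(G) tail. The minimal sketch below assumes the primer hybridizes entirely within the tail and that each unpaired G between the primer's 3′ end and the start of the cDNA body receives one C. The tail and primer lengths in the example, and the function name, are hypothetical.

```python
def fill_bases_needed(tail_length: int, primer_length: int, offset_from_3prime: int) -> int:
    """Unpaired tail bases between the primer's 3' end and the cDNA body.

    Tail positions are counted from the tail's 3' terminus (0) toward the
    cDNA body; the primer covers positions offset .. offset + primer_length - 1,
    and its 3' end faces the cDNA body.
    """
    end = offset_from_3prime + primer_length
    if offset_from_3prime < 0 or end > tail_length:
        raise ValueError("primer must lie entirely within the tail")
    return tail_length - end


print(fill_bases_needed(tail_length=30, primer_length=20, offset_from_3prime=3))   # 7
print(fill_bases_needed(tail_length=30, primer_length=20, offset_from_3prime=10))  # 0
```

After the fill, every primer ends at the same place (the junction between the tail and the cDNA body) regardless of where it first annealed, so the first sequencing cycle reads cDNA rather than tail. The poly(T)/TTP fill described for poly(A) capture follows the same logic.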
In certain embodiments, after priming of the cDNA molecules, dATP is added in order to fill poly(T) primers on the substrate that did not pair with an RNA but could still have been G-tailed by terminal transferase. This step prevents undesirable sequencing of G-tailed poly(dT) primers, which would otherwise produce an abundance of poly(A) reads.
In other embodiments, the poly(G) tail is not necessary, because generation of the cDNA strand can provide the sequence of the terminal portion of the cDNA, which information may be used to design a primer that will hybridize to the terminal portion of the cDNA for the second sequencing reaction.
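Designing such a primer amounts to taking the reverse complement of the 3′-terminal bases of the cDNA. This sketch assumes that the sequence obtained during cDNA synthesis is given 5′ to 3′ and actually reaches the cDNA's 3′ end. The default length of 20 nt is a hypothetical choice, and the function does not check primer specificity, melting temperature or secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def design_terminal_primer(cdna_5to3: str, primer_length: int = 20) -> str:
    """Primer (5'->3') complementary to the 3'-terminal bases of the cDNA."""
    if not 0 < primer_length <= len(cdna_5to3):
        raise ValueError("primer_length must be between 1 and the cDNA length")
    terminal = cdna_5to3[-primer_length:].upper()  # 3'-terminal portion of the cDNA
    return terminal.translate(_COMPLEMENT)[::-1]   # reverse complement, written 5'->3'


print(design_terminal_primer("AAAACCCGGT", primer_length=4))  # ACCG
```

The primer's 3′ end then sits opposite the interior of the cDNA, so that extension copies the cDNA back toward its surface-attached 5′ end.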
Methods of synthesizing primers are known in the art. See, e.g., Sambrook et al. (DNA microarray: A Molecular Cloning Manual, Cold Spring Harbor, N.Y., 2003) or Maniatis, et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., 1982), the contents of each of which are incorporated by reference herein in their entirety. Suitable methods for synthesizing oligonucleotide primers are also described in Caruthers (Science 230:281-285, 1985), the contents of which are incorporated by reference. The primer includes a nucleotide sequence with substantial complementarity to the cDNA strand, so that the primer hybridizes with the cDNA strand. Complementarity between the primer and the cDNA strand need only be sufficient to specifically bind the primer to the cDNA sequence.
Primers suitable for use in the present invention include those formed from nucleic acids, nucleic acid analogs, locked nucleic acids, modified nucleic acids, and chimeric primers of a mixed class including a nucleic acid with another organic component, such as peptide nucleic acids. Exemplary nucleotide analogs include phosphate esters of deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, adenosine, cytidine, guanosine, and uridine. Other examples of non-natural nucleotides include a xanthine or hypoxanthine; 5-bromouracil, 2-aminopurine, deoxyinosine, or methylated cytosine, such as 5-methylcytosine, and N4-methoxydeoxycytosine. Also included are bases of polynucleotide mimetics, such as methylated nucleic acids, e.g., 2′-O-methyl RNA, peptide nucleic acids, modified peptide nucleic acids, and any other structural moiety that can act substantially like a nucleotide or base, for example, by exhibiting base-complementarity with one or more bases that occur in DNA or RNA.
The length of the primer is not critical, as long as the primer is capable of hybridizing to the cDNA; primers may be of any length. For example, primers may be as short as 5 nucleotides or as long as 100 nucleotides. Exemplary primers are 5-mers, 10-mers, 15-mers, 20-mers, 25-mers, 50-mers, or 100-mers. Methods for determining an optimal primer length are known in the art. See, e.g., Shuber (U.S. Pat. No. 5,888,778).
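One common rule of thumb for how primer length and base composition affect hybridization is the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This is a swapped-in illustration and not the method of Shuber cited above. It is only a rough estimate for short, unmodified oligonucleotides and ignores salt concentration, oligonucleotide concentration, nearest-neighbor effects, and modified bases such as locked nucleic acids. For long primers it overestimates Tm.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ACGT"))    # 12
print(wallace_tm("C" * 15))  # 60
```

The example shows why very short primers (e.g., 5-mers) bind weakly at typical reaction temperatures, while GC-rich or longer primers such as a 15-nt poly(C) bind considerably more stably.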
Single molecule sequencing is described, for example, in Lapidus et al. (U.S. Pat. No. 7,169,560), Lapidus et al. (U.S. patent application number 2009/0191565), Quake et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No. 7,282,337), Quake et al. (U.S. patent application number 2002/0164629), Braslavsky et al., PNAS (USA), 100: 3960-3964 (2003), Harris et al. (Science 320:106-109, 2008), Lipson et al. (Nat Biotechnol 27:652-658, 2009), and Pushkarev et al. (Nat Biotechnol 2009), the contents of each of which are incorporated by reference herein in their entirety. The following sections discuss general considerations for nucleic acid sequencing, for example, polymerases useful in sequencing-by-synthesis, reaction conditions, signal detection and analysis.
Nucleotides
Nucleotides useful in the invention include any nucleotide or nucleotide analog, whether naturally-occurring or synthetic. For example, preferred nucleotides include phosphate esters of deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, adenosine, cytidine, guanosine, and uridine. Other nucleotides useful in the invention comprise an adenine, cytosine, guanine, or thymine base, a xanthine or hypoxanthine; 5-bromouracil, 2-aminopurine, deoxyinosine, or methylated cytosine, such as 5-methylcytosine, and N4-methoxydeoxycytosine. Also included are bases of polynucleotide mimetics, such as methylated nucleic acids, e.g., 2′-O-methyl RNA, peptide nucleic acids, modified peptide nucleic acids, locked nucleic acids and any other structural moiety that can act substantially like a nucleotide or base, for example, by exhibiting base-complementarity with one or more bases that occur in DNA or RNA and/or being capable of base-complementary incorporation, and includes chain-terminating analogs. A nucleotide corresponds to a specific nucleotide species if they share base-complementarity with respect to at least one base.
Nucleotides for nucleic acid sequencing according to the invention preferably include a label that is directly or indirectly detectable. Preferred labels include optically detectable labels, such as fluorescent labels. Examples of fluorescent labels include, but are not limited to, 4-acetamido-4′-isothiocyanatostilbene-2,2′-disulfonic acid; acridine and derivatives: acridine, acridine isothiocyanate; 5-(2′-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS); 4-amino-N-[3-(vinylsulfonyl)phenyl]naphthalimide-3,5-disulfonate; N-(4-anilino-1-naphthyl)maleimide; anthranilamide; BODIPY; Brilliant Yellow; coumarin and derivatives: coumarin, 7-amino-4-methylcoumarin (AMC, Coumarin 120), 7-amino-4-trifluoromethylcoumarin (Coumarin 151); cyanine dyes; cyanosine; 4′,6-diamidino-2-phenylindole (DAPI); 5′,5″-dibromopyrogallol-sulfonaphthalein (Bromopyrogallol Red); 7-diethylamino-3-(4′-isothiocyanatophenyl)-4-methylcoumarin; diethylenetriamine pentaacetate; 4,4′-diisothiocyanatodihydrostilbene-2,2′-disulfonic acid; 4,4′-diisothiocyanatostilbene-2,2′-disulfonic acid; 5-[dimethylamino]naphthalene-1-sulfonyl chloride (DNS, dansyl chloride); 4-dimethylaminophenylazophenyl-4′-isothiocyanate (DABITC); eosin and derivatives: eosin, eosin isothiocyanate; erythrosin and derivatives: erythrosin B, erythrosin isothiocyanate; ethidium; fluorescein and derivatives: 5-carboxyfluorescein (FAM), 5-(4,6-dichlorotriazin-2-yl)aminofluorescein (DTAF), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxyfluorescein, fluorescein, fluorescein isothiocyanate, QFITC (XRITC); fluorescamine; IR144; IR1446; Malachite Green isothiocyanate; 4-methylumbelliferone; ortho-cresolphthalein; nitrotyrosine; pararosaniline; Phenol Red; B-phycoerythrin; o-phthaldialdehyde; pyrene and derivatives: pyrene, pyrene butyrate, succinimidyl 1-pyrene butyrate; quantum dots; Reactive Red 4 (Cibacron™ Brilliant Red 3B-A); rhodamine and derivatives: 6-carboxy-X-rhodamine (ROX), 6-carboxyrhodamine (R6G), lissamine rhodamine B sulfonyl chloride, rhodamine (Rhod), rhodamine B, rhodamine 123, rhodamine X isothiocyanate, sulforhodamine B, sulforhodamine 101, sulfonyl chloride derivative of sulforhodamine 101 (Texas Red); N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA); tetramethyl rhodamine; tetramethyl rhodamine isothiocyanate (TRITC); riboflavin; rosolic acid; terbium chelate derivatives; Cy3; Cy5; Cy5.5; Cy7; IRD 700; IRD 800; La Jolla Blue; phthalocyanine; and naphthalocyanine. Preferred fluorescent labels are cyanine-3 and cyanine-5. Labels other than fluorescent labels are contemplated by the invention, including other optically detectable labels.
Polymerases
Nucleic acid polymerases generally useful in the invention include DNA polymerases, RNA polymerases, reverse transcriptases, and mutant or altered forms of any of the foregoing. DNA polymerases and their properties are described in detail in, among other places, DNA Replication, 2nd edition, Kornberg and Baker, W. H. Freeman, New York, N.Y. (1991). Known conventional DNA polymerases useful in the invention include, but are not limited to, Pyrococcus furiosus (Pfu) DNA polymerase (Lundberg et al., 1991, Gene, 108:1, Stratagene), Pyrococcus woesei (Pwo) DNA polymerase (Hinnisdaels et al., 1996, Biotechniques, 20:186-8, Boehringer Mannheim), Thermus thermophilus (Tth) DNA polymerase (Myers and Gelfand, 1991, Biochemistry 30:7661), Bacillus stearothermophilus DNA polymerase (Stenesh and McGowan, 1977, Biochim Biophys Acta 475:32), Thermococcus litoralis (Tli) DNA polymerase (also referred to as Vent™ DNA polymerase; Cariello et al., 1991, Nucleic Acids Res, 19:4193, New England Biolabs), 9°Nm™ DNA polymerase (New England Biolabs), Stoffel fragment, ThermoSequenase® (Amersham Pharmacia Biotech UK), Therminator™ (New England Biolabs), Thermotoga maritima (Tma) DNA polymerase (Diaz and Sabino, 1998, Braz J. Med. Biol. Res, 31:1239), Thermus aquaticus (Taq) DNA polymerase (Chien et al., 1976, J. Bacteriol, 127:1550), Pyrococcus kodakaraensis KOD DNA polymerase (Takagi et al., 1997, Appl. Environ. Microbiol. 63:4504), JDF-3 DNA polymerase (from Thermococcus sp. JDF-3; patent application WO 0132887), Pyrococcus GB-D (PGB-D) DNA polymerase (also referred to as Deep Vent™ DNA polymerase; Juncosa-Ginesta et al., 1994, Biotechniques, 16:820, New England Biolabs), UlTma DNA polymerase (from the thermophile Thermotoga maritima; Diaz and Sabino, 1998, Braz J. Med. Biol. Res, 31:1239; PE Applied Biosystems), Tgo DNA polymerase (from Thermococcus gorgonarius, Roche Molecular Biochemicals), E. coli DNA polymerase I (Lecomte and Doubleday, 1983, Nucleic Acids Res. 11:7505), T7 DNA polymerase (Nordstrom et al., 1981, J. Biol. Chem. 256:3112), and archaeal DP1/DP2 DNA polymerase II (Cann et al., 1998, Proc. Natl. Acad. Sci. USA 95:14250).
Both mesophilic and thermophilic polymerases are contemplated. Thermophilic DNA polymerases include, but are not limited to, ThermoSequenase®, 9°Nm™, Therminator™, Taq, Tne, Tma, Pfu, Tfl, Tth, Tli, Stoffel fragment, Vent™ and Deep Vent™ DNA polymerase, KOD DNA polymerase, Tgo, JDF-3, and mutants, variants, and derivatives thereof. A highly preferred form of any polymerase is a 3′ exonuclease-deficient mutant.
Reverse transcriptases useful in the invention include, but are not limited to, reverse transcriptases from HIV, HTLV-I, HTLV-II, FeLV, FIV, SIV, AMV, MMTV, MoMuLV, and other retroviruses (see Levin, Cell 88:5-8 (1997); Verma, Biochim Biophys Acta. 473:1-38 (1977); Wu et al., CRC Crit. Rev Biochem. 3:289-347 (1975)).
Detection
Any detection method suitable for the type of label employed can be used. Exemplary detection methods include radioactive detection, optical absorbance detection (e.g., UV-visible absorbance detection), and optical emission detection (e.g., fluorescence or chemiluminescence). For example, extended primers can be detected on a substrate by scanning all or portions of each substrate simultaneously or serially, depending on the scanning method used. For fluorescence labeling, selected regions on a substrate may be serially scanned one-by-one or row-by-row using a fluorescence microscope apparatus, such as described in Fodor (U.S. Pat. No. 5,445,934) and Mathies et al. (U.S. Pat. No. 5,091,652). Devices capable of sensing fluorescence from a single molecule include the scanning tunneling microscope (STM) and the atomic force microscope (AFM). Hybridization patterns may also be scanned using a CCD camera (e.g., Model TE/CCD512SF, Princeton Instruments, Trenton, N.J.) with suitable optics (Ploem, in Fluorescent and Luminescent Probes for Biological Activity, Mason, T. G., Ed., Academic Press, London, pp. 1-11 (1993)), such as described in Yershov et al., Proc. Natl. Acad. Sci. 93:4913 (1996), or may be imaged by TV monitoring. For radioactive signals, a phosphorimager device can be used (Johnston et al., Electrophoresis, 13:566, 1990; Drmanac et al., Electrophoresis, 13:566, 1992; 1993). Other commercial suppliers of imaging instruments include General Scanning Inc. (Watertown, Mass.; on the World Wide Web at genscan.com), Genix Technologies (Waterloo, Ontario, Canada; on the World Wide Web at confocal.com), and Applied Precision Inc. Such detection methods are particularly useful for simultaneously scanning multiple attached template nucleic acids.
A number of approaches can be used to detect incorporation of fluorescently-labeled nucleotides into a single nucleic acid molecule. Optical setups include near-field scanning microscopy, far-field confocal microscopy, wide-field epi-illumination, light scattering, dark field microscopy, photoconversion, single and/or multiphoton excitation, spectral wavelength discrimination, fluorophore identification, evanescent wave illumination, and total internal reflection fluorescence (TIRF) microscopy. In general, certain methods involve detection of laser-activated fluorescence using a microscope equipped with a camera. Suitable photon detection systems include, but are not limited to, photodiodes and intensified CCD cameras. For example, an intensified charge-coupled device (ICCD) camera can be used. Using an ICCD camera to image individual fluorescent dye molecules in a fluid near a surface provides numerous advantages; for example, an ICCD optical setup makes it possible to acquire a sequence of images (movies) of fluorophores.
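Once a camera frame has been acquired, individual fluorophores typically appear as isolated bright spots. As an illustration only, the sketch below locates such spots with a generic local-maximum search; it is not the software of any instrument named here. The function name `find_spots`, the square search window, and the intensity threshold (for example, background mean plus several standard deviations) are all assumptions.

```python
import numpy as np


def find_spots(frame, threshold, radius=1):
    """Return (row, col) pixels that are local maxima brighter than `threshold`.

    A pixel counts as a spot if no pixel inside the (2*radius+1)-square
    window centred on it is brighter. Equal-valued neighbours (plateaus)
    are all reported.
    """
    img = np.asarray(frame, dtype=float)
    h, w = img.shape
    # Pad with -inf so border pixels are compared only with real neighbours.
    padded = np.pad(img, radius, mode="constant", constant_values=-np.inf)
    neighborhood_max = np.full_like(img, -np.inf)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            view = padded[radius + dr: radius + dr + h, radius + dc: radius + dc + w]
            neighborhood_max = np.maximum(neighborhood_max, view)
    mask = (img == neighborhood_max) & (img > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(mask)]


frame = np.zeros((10, 10))
frame[2, 3], frame[2, 4], frame[7, 7] = 100.0, 60.0, 80.0
print(find_spots(frame, threshold=50.0))  # the dimmer pixel next to (2, 3) is suppressed
```

Applied frame by frame to an image sequence, spot positions that stay fixed while new signal appears are what single-molecule incorporation detection relies on.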
Some embodiments of the present invention use TIRF microscopy for imaging. TIRF microscopy uses totally internally reflected excitation light and is well known in the art. See, e.g., the World Wide Web at nikon-instruments.jp/eng/page/products/tirf.aspx. In certain embodiments, detection is carried out using evanescent wave illumination and total internal reflection fluorescence microscopy. An evanescent light field can be set up at the surface, for example, to image fluorescently-labeled nucleic acid molecules. When a laser beam is totally reflected at the interface between a liquid and a solid substrate (e.g., glass), the excitation light penetrates only a short distance into the liquid. The optical field does not end abruptly at the reflective interface; instead, its intensity falls off exponentially with distance from the interface. This surface electromagnetic field, called the “evanescent wave”, can selectively excite fluorescent molecules in the liquid near the interface. The thin evanescent optical field at the interface provides low background and facilitates the detection of single molecules with a high signal-to-noise ratio at visible wavelengths.
The evanescent field can also be used to image fluorescently-labeled nucleotides upon their incorporation into the attached template/primer complex in the presence of a polymerase. Total internal reflection fluorescence microscopy is then used to visualize the attached template/primer duplex and/or the incorporated nucleotides with single molecule resolution.
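The exponential fall-off of the evanescent field can be made concrete with the standard optics expression for the intensity penetration depth, d = λ₀ / (4π · √(n₁² sin²θ − n₂²)), where λ₀ is the vacuum wavelength, n₁ and n₂ are the refractive indices of the substrate and the liquid, and θ is the angle of incidence; the intensity then follows I(z) = I₀ · exp(−z/d). The numbers below (glass n₁ ≈ 1.518, water n₂ ≈ 1.33, a 532 nm laser, 70° incidence) are illustrative assumptions, not values taken from this document.

```python
import math


def critical_angle_deg(n1: float, n2: float) -> float:
    """Smallest incidence angle (degrees) giving total internal reflection."""
    return math.degrees(math.asin(n2 / n1))


def penetration_depth_nm(wavelength_nm: float, n1: float, n2: float, theta_deg: float) -> float:
    """1/e intensity penetration depth of the evanescent wave, in nm."""
    s = (n1 * math.sin(math.radians(theta_deg))) ** 2 - n2 ** 2
    if s <= 0:
        raise ValueError("angle is at or below the critical angle: no total internal reflection")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))


def evanescent_intensity(z_nm: float, i0: float, depth_nm: float) -> float:
    """Excitation intensity at distance z from the interface."""
    return i0 * math.exp(-z_nm / depth_nm)


d = penetration_depth_nm(532, 1.518, 1.33, 70)
print(f"critical angle {critical_angle_deg(1.518, 1.33):.1f} deg, depth {d:.0f} nm")
```

With these assumed values the depth comes out at roughly 80 nm. This is why only fluorophores on or very near the surface-attached templates are excited, while labeled nucleotides diffusing deeper in the solution contribute little background.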
Some embodiments of the invention use non-optical detection methods, for example, detection using nanopores (e.g., protein or solid state) through which molecules are individually passed so that they can be identified from characteristic changes in properties or effects such as capacitance or blockade of current flow (see, for example, Stoddart et al., Proc. Natl. Acad. Sci., 106:7702, 2009; Purnell and Schmidt, ACS Nano, 3:2533, 2009; Branton et al., Nature Biotechnology, 26:1146, 2008; Polonsky et al., U.S. Application 2008/0187915; Mitchell & Howorka, Angew. Chem. Int. Ed. 47:5565, 2008; Borsenberger et al., J. Am. Chem. Soc., 131:7530, 2009), or other suitable non-optical detection methods.
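As a rough illustration of reading a nanopore signal, the sketch below finds blockade events in an ionic-current trace. It defines an event as a run of samples that drops below a fixed fraction of the open-pore current for a minimum duration, and reports each event's residual current, the kind of feature a molecule could be identified from. The function name, the 70% cutoff and the 3-sample minimum are assumptions for illustration; they are not taken from the works cited above.

```python
def find_blockades(current, open_pore, fraction=0.7, min_samples=3):
    """Find blockade events in an ionic-current trace.

    An event is a run of at least `min_samples` consecutive samples below
    `fraction * open_pore`. Each event is returned as (start, end, residual):
    samples start..end-1 belong to the event, and `residual` is the mean
    event current divided by the open-pore current.
    """
    trace = list(current)
    cutoff = fraction * open_pore
    events = []
    start = None
    # A trailing +inf sentinel closes an event that runs to the end of the trace.
    for i, value in enumerate(trace + [float("inf")]):
        if value < cutoff:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                block = trace[start:i]
                events.append((start, i, sum(block) / len(block) / open_pore))
            start = None
    return events


trace = [100] * 5 + [40] * 4 + [100] * 3 + [30] * 2 + [100] * 2
print(find_blockades(trace, open_pore=100.0))  # the 2-sample dip is too short to count
```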
Analysis
Alignment and/or compilation of sequence results obtained from the image stacks produced as generally described above uses look-up tables that take into account possible sequence changes (due, e.g., to errors, mutations, etc.). Essentially, sequencing results obtained as described herein are compared to a look-up table that contains all possible reference sequences plus variants of them carrying 1 or 2 base errors.
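The look-up table idea can be sketched as follows. The sketch assumes short reference sequences held in memory, and it treats substitutions, single-base deletions and single-base insertions as the possible "base errors". The names (`build_lookup`, `look_up`, the gene labels) are hypothetical, and this is not the IndexDP algorithm referenced later in this document.

```python
BASES = "ACGT"


def one_edit_neighbors(seq):
    """Yield every sequence one substitution, deletion or insertion away from seq."""
    for i in range(len(seq)):
        for b in BASES:
            if b != seq[i]:
                yield seq[:i] + b + seq[i + 1:]  # substitution at position i
        yield seq[:i] + seq[i + 1:]              # deletion of position i
    for i in range(len(seq) + 1):
        for b in BASES:
            yield seq[:i] + b + seq[i:]          # insertion before position i


def build_lookup(references, max_errors=2):
    """Map every sequence within `max_errors` edits of a reference to {ref_name: edits}."""
    table = {}
    for name, ref in references.items():
        # Breadth-first expansion: the first time a variant is reached
        # records its minimum number of edits from this reference.
        seen = {ref: 0}
        frontier = [ref]
        for k in range(1, max_errors + 1):
            next_frontier = []
            for seq in frontier:
                for variant in one_edit_neighbors(seq):
                    if variant not in seen:
                        seen[variant] = k
                        next_frontier.append(variant)
            frontier = next_frontier
        for variant, k in seen.items():
            table.setdefault(variant, {})[name] = k
    return table


def look_up(read, table):
    """Return [(ref_name, edits)] for the best-matching references, or [] if none."""
    hits = table.get(read)
    if not hits:
        return []
    best = min(hits.values())
    return sorted((name, k) for name, k in hits.items() if k == best)


refs = {"geneA": "ACGTACGTAC", "geneB": "TTTTGGGGCC"}
table = build_lookup(refs, max_errors=1)
print(look_up("ACGTCGTAC", table))  # one base deleted from geneA
```

The table grows rapidly with sequence length and with the number of tolerated errors. That cost is why precomputing "reference plus 1 or 2 base errors" is practical only for short sequences, while lookups themselves reduce to a single dictionary access.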
In certain embodiments, the cDNA is resequenced by denaturing the extended complementary strand of the cDNA, removing the newly-synthesized complementary strand, annealing a new primer, and then repeating the experiment with fresh reagents to sequentially analyze the sequence of the same cDNA molecule. This approach is very sensitive because only a single copy of the cDNA molecule is needed to obtain sequence information. Further, releasing the extension product from the cDNA template, e.g., by denaturing, and annealing the cDNA template with a different primer, provides the opportunity to re-read the same cDNA molecule with different sets of nucleotides (e.g., different combinations of two types of labeled nucleotides and two types of unlabeled nucleotides).
In some embodiments, nucleotides lacking any labeling moiety are provided for a period of time to allow unlabeled nucleotides to fill in regions, for example, regions of already known sequence, until the complementary strand extends to reach unknown regions further downstream. At this point, nucleotides bearing a labeling moiety can be added and analysis begun or continued.
Further description of resequencing is provided in Lander et al. (U.S. Patent Application Publication No. 2009/0305248) and Quake (U.S. Patent Application Publication No. 2006/0019267), the content of each of which is incorporated by reference herein in its entirety.
References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made throughout this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes.
Various modifications of the invention and many further embodiments thereof, in addition to those shown and described herein, will become apparent to those skilled in the art from the full contents of this document, including references to the scientific and patent literature cited herein. The subject matter herein contains important information, exemplification and guidance that can be adapted to the practice of this invention in its various embodiments and equivalents thereof.
Methods of the invention were used to profile a mouse pancreatic ductal cell line (SM25) and a mouse pancreatic adenocarcinoma cell line (490). SM25 was derived from a mouse with pancreas-targeted expression of KRAS G12D, which induces a precursor pancreatic cancer lesion called pancreatic intraepithelial neoplasia (PanIN) (Aguirre et al., Genes Dev 17:3112-3126, 2003). 490 was derived from a pancreatic adenocarcinoma that developed in a genetically engineered mouse model incorporating the activating mutation KRAS G12D and the conditional loss of p53, targeted specifically to the pancreas (Bardeesy et al. Proc Natl Acad Sci USA 103:5947-5952, 2006).
SM25 and 490 cells, in the quantities indicated in the figures, were lysed in the lysis buffers of the PicoPure RNA Isolation (Molecular Devices), AmpliGrid Cell Extraction (Advalytix), CellsDirect Two-Step qRT-PCR (Invitrogen), or FastLane Cell cDNA (Qiagen) kits as instructed by the manufacturers. The lysates were hybridized in a 10 μl volume to Helicos poly(dT)-coated sequencing flow cell channels in 1×SSC, 0.05% SDS at 37° C. for 30 minutes. First-strand cDNA was synthesized with the SuperScript III first-strand cDNA synthesis kit (Invitrogen) following the manufacturer's recommendations, except that no additional primers were added and the incubation steps were modified as follows: 37° C. for 15 minutes, then 55° C. for 45 minutes. After cDNA synthesis, hot water was passed through the channels to degrade and melt away the RNA strands. Guanine tailing was performed with terminal transferase, adding 500 μM guanine nucleotide in a 20 μl volume of 1×TdT buffer with 2.5 mM CoCl2 and 20 units of terminal transferase per channel. The reaction took place at 37° C. for 30 minutes and was followed by 3′ blocking with 100 μM ddGTP and ddATP under the same reaction conditions. The poly-C primers were hybridized at 50 nM in 1×SSC, 0.05% SDS at 55° C. for 30 minutes, followed by step-wise “fill” steps with 500 μM cytosine and adenine nucleotides and 5 units of Klenow fragment (NEB) in 1×NEB2 buffer, in a 20 μl reaction volume per channel. The “lock” step was then performed using virtual terminator guanine and thymidine nucleotide analogs. Sequencing by synthesis was then initiated as described herein. For the experiments with HIV reverse transcriptase (HIV RT), only the cDNA synthesis step was altered: it was performed in 1×HIV RT buffer with 150 μM dNTPs and 10 units of enzyme in a 20 μl reaction volume, at 42° C. for 30 minutes followed by 55° C. for 30 minutes.
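The fill and lock steps rely on a simple base-pairing constraint. With only unlabeled C and A available, the polymerase can extend the primer only across template G and T bases. At the first template C or A, one virtual-terminator G or T is incorporated and extension halts. This positions every primer at the junction before sequencing begins. The sketch below models only that logic. It assumes the template is written 3′→5′, starting at the first unpaired base opposite the primer's 3′ end, and assumes perfect, complete incorporation; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fill_and_lock(template, fill=frozenset("CA"), lock=frozenset("GT")):
    """Model primer extension during fill and lock steps.

    `template` lists template bases 3'->5', starting opposite the primer's
    3' end. Returns (filled_bases, lock_base_or_None, index_of_next_template_base).
    """
    filled = []
    for i, base in enumerate(template.upper()):
        incoming = COMPLEMENT[base]
        if incoming in fill:
            filled.append(incoming)  # unlabeled fill nucleotide pairs; extension continues
        elif incoming in lock:
            # A single terminator analog is added and blocks further extension.
            return "".join(filled), incoming, i + 1
        else:
            break  # no supplied nucleotide can pair here; extension stalls
    return "".join(filled), None, len(filled)


# Remaining G-tail, then T, T, G, then a template C where the lock G goes in.
print(fill_and_lock("GGGTTGCAT"))
```

With the complementary fill and lock sets used in this protocol, every primer ends at a defined position, so the first labeled-nucleotide cycle reads the first base downstream of the lock base.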
For RNA extraction, cells were detached from cell culture plates by standard trypsinization and pelleted by centrifugation. The supernatant was removed, and the cells were flash frozen in liquid nitrogen. Frozen pellets were then thawed, and the cells were homogenized with the QIAshredder Kit (Qiagen). Total RNA was then extracted with the RNeasy Mini Kit (Qiagen) and treated with on-column DNase I (Qiagen) as instructed. Total RNA quantification and quality assessment were done with Nanodrop OD260 and OD260/280 measurements. RNA was then reverse transcribed into cDNA using the Invitrogen SuperScript III First-Strand Synthesis System for RT-PCR kit per protocol. Oligo(dT)20 primers from the Invitrogen kit were used for reverse transcription, and RNase H was used to remove RNA after cDNA synthesis. The cDNA was then aliquoted for quantitative PCR at 10 ng per reaction. Primers for each gene were obtained from a pre-validated source, PrimerBank (http://pga.mgh.harvard.edu/primerbank/). The primers were used at a final concentration of 0.4 μM with Power SYBR Green Master Mix per protocol, and each condition was run in triplicate. qPCR reactions were run and analyzed using the ABI 7500 Real-Time PCR system (Applied Biosystems). Beta-actin (Actb) was used as the endogenous control for each cell line.
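The text does not say how relative expression was calculated from the triplicate Ct values. One widely used approach with an endogenous control such as Actb is the comparative Ct (2^−ΔΔCt) method, sketched below under its usual assumption of near-100%, equal amplification efficiency for the target and reference assays. The function name is hypothetical, and the Ct values in the example are invented for illustration.

```python
from statistics import mean


def ddct_fold_change(target_test, ref_test, target_calib, ref_calib):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a list of replicate Ct values (e.g., triplicates).
    The reference gene (assumed here to be Actb) normalizes for input
    amount; the calibrator sample is defined as fold change 1.
    """
    d_ct_test = mean(target_test) - mean(ref_test)     # target vs. reference in test sample
    d_ct_calib = mean(target_calib) - mean(ref_calib)  # same difference in calibrator
    dd_ct = d_ct_test - d_ct_calib
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: a gene in 490 cells (test) vs. SM25 cells (calibrator).
print(ddct_fold_change([22, 22, 22], [18, 18, 18], [24, 24, 24], [18, 18, 18]))
```

A fold change computed this way gives a qPCR-based value per gene that can be set against the corresponding ratio from sequencing-based expression profiles.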
Read filtering, alignment (using the IndexDP algorithm), and transcript counting were done as described (Lipson et al. Nat Biotechnol 27:652-658, 2009). The mouse reference used was the mm9 assembly downloaded from the UCSC Genome Browser. For whole-genome alignment of reads, the IndexDP alignment threshold used was 4.3 (Lipson et al. Nat Biotechnol 27:652-658, 2009).
Guanine was chosen over the other nucleotides for on-surface TdT-mediated tailing for several reasons. First, guanine tailing with TdT was generally limited to about 25-30 nts, whereas TdT-mediated tailing with the other nucleotides produced variable lengths and was not as controllable. Second, adenine tailing would have caused the resulting poly(A) tail at the 3′ end of the cDNA molecules to interact with the poly(dT) capture primers on the surface. Third, thymidine tailing would have required a poly(A) primer, rather than a poly(C) primer, to initiate sequencing, which would have been problematic because the poly(A) primers would hybridize not only to the TdT-generated poly(T) tails but also to the poly(dT) capture primers on the sequencing surfaces.
SSIII (SuperScript III) was chosen as the reverse transcriptase for producing the cDNA/RNA duplex because it is a commonly used reverse transcriptase reported to give satisfactory performance in on-surface applications (Taniguchi et al. Nat Methods 6:503-506, 2009). Profiles obtained with this reverse transcriptase were compared with those obtained with HIV reverse transcriptase to determine any enzyme-specific differences (
To determine the effect of the cell quantity used per channel on the number of usable reads obtained, a titration was performed, the results of which are shown in
To determine how well methods of the invention correlate with standard digital gene expression methods, a profile of 10 micrograms of 490 RNA (from about 4 million cells) was compared with a profile of one thousand 490 cells obtained with methods of the invention. High correlation was observed between the two datasets (
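The correlation metric used for this comparison is not stated here. A common choice for digital expression profiles is the Pearson correlation of log-transformed per-gene counts over the genes present in both datasets, sketched below. The log10 transform, the pseudocount of 1 (so genes with zero counts stay defined) and the function name are assumptions.

```python
import math


def log_pearson(counts_a, counts_b, pseudocount=1.0):
    """Pearson correlation of log10(count + pseudocount) over genes shared by both profiles."""
    genes = sorted(set(counts_a) & set(counts_b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [math.log10(counts_a[g] + pseudocount) for g in genes]
    y = [math.log10(counts_b[g] + pseudocount) for g in genes]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile has no variation across shared genes")
    return sxy / math.sqrt(sxx * syy)


bulk = {"g1": 10, "g2": 100, "g3": 1000}
cells = {"g1": 20, "g2": 200, "g3": 2000}
print(round(log_pearson(bulk, cells), 4))
```

Working on a log scale keeps a handful of very highly expressed transcripts from dominating the correlation. That matters when the two profiles differ greatly in input amount, as 10 micrograms of RNA versus one thousand cells do.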
Over 97% of the reads were 24-60 nucleotides (nts) in length, with a median length of 36 nts (
Table 1 shows filtered and aligned read yields, read lengths, and the percentage of reads aligning in the antisense direction relative to the known direction of gene transcription. Each row shows data from a single channel of a 50-channel HELISCOPE system (single molecule sequencing-by-synthesis system, Helicos Biosciences Corporation, Cambridge, Mass.). Sample names contain the cell lysis method, the cell line, and the reverse transcriptase used. The last row indicates the yields from 10 micrograms of 490 RNA profiled with an established digital gene expression method.
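The antisense percentage reported in Table 1 can be illustrated with a small calculation over aligned reads. This sketch assumes that each alignment is summarized as a (read strand, gene strand) pair, and that a read counts as antisense when its aligned strand is opposite the annotated gene strand. Whether that matches the table's convention depends on the strand orientation of the reads produced by this protocol, which is an assumption here. The function name is hypothetical.

```python
def percent_antisense(alignments):
    """Percentage of reads whose aligned strand is opposite the gene's strand.

    `alignments` is an iterable of (read_strand, gene_strand) pairs, each "+" or "-".
    """
    pairs = list(alignments)
    if not pairs:
        return 0.0
    antisense = sum(1 for read_strand, gene_strand in pairs if read_strand != gene_strand)
    return 100.0 * antisense / len(pairs)


print(percent_antisense([("+", "+"), ("-", "+"), ("+", "-"), ("-", "-")]))  # 50.0
```

A low value indicates that reads preserve the strand of the original transcript, a useful check that the on-surface cDNA synthesis and tailing steps behaved as intended.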
To further investigate the quantification ability of methods of the invention and their use in identifying differentially expressed genes, profiles of the 490 and SM25 cells produced by methods of the invention were compared (
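A minimal way to compare two expression profiles for differentially expressed genes is to normalize each to reads per million (RPM), so that channels sequenced to different depths are comparable, and then to take a log2 fold change per gene. The sketch below does exactly that. The pseudocount of 1 and the function name are assumptions, the gene names in the example are placeholders, and no statistical significance test is included, which a full analysis would add.

```python
import math


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """log2((RPM_a + p) / (RPM_b + p)) for every gene seen in either profile.

    RPM = reads per million aligned reads, which corrects for different
    sequencing depth between the two profiles.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each profile needs at least one aligned read")
    result = {}
    for gene in sorted(set(counts_a) | set(counts_b)):
        rpm_a = counts_a.get(gene, 0) * 1e6 / total_a
        rpm_b = counts_b.get(gene, 0) * 1e6 / total_b
        result[gene] = math.log2((rpm_a + pseudocount) / (rpm_b + pseudocount))
    return result


profile_490 = {"geneX": 500_000, "geneY": 500_000}
profile_sm25 = {"geneX": 250_000, "geneY": 750_000}
print(log2_fold_changes(profile_490, profile_sm25))  # geneX about +1, geneY about -0.58
```

Because counts are first converted to RPM, doubling the sequencing depth of one profile leaves the fold changes unchanged.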
The present invention is a continuation-in-part of PCT international patent application number PCT/US10/24547, filed Feb. 18, 2010, which claims the benefit of and priority to U.S. provisional patent application Ser. No. 61/153,548, filed Feb. 18, 2009. The present invention also claims the benefit of and priority to U.S. provisional patent application Ser. No. 61/251,966, filed Oct. 15, 2009. The content of each of these applications is incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
61251966 | Oct 2009 | US
61153548 | Feb 2009 | US

Relationship | Number | Date | Country
---|---|---|---
Parent | 12904683 | Oct 2010 | US
Child | 14158618 | | US

Relationship | Number | Date | Country
---|---|---|---
Parent | PCT/US10/24547 | Feb 2010 | US
Child | 12904683 | | US