The present invention relates to molecular profiling of cells and tissue. In particular, the invention relates to molecular profiling using proximity ligation-in situ hybridization (PLISH).
In parallel with the development of single-cell RNA sequencing (scRNA-seq), there have been rapid advances in single-molecule in situ hybridization (smISH) techniques that localize RNAs of interest directly in fixed cells (Shah et al. (2016) Neuron 92:342-357; Huss et al. (2015) Cold Spring Harbor Protocols 2015:259-268; Chen et al. (2015) Science 348:aaa6090; Wang et al. (2012) The Journal of Molecular Diagnostics 14:22-29; Larsson et al. (2010) Nature Methods 7:395-397; Raj et al. (2008) Nature Methods 5:877-879; Femino et al. (1998) Science 280:585-590). These smISH techniques involve hybridization of fluorescently-labeled oligonucleotide probes, typically 24-96 per gene, to mark individual RNA molecules with a discrete, diffraction-limited punctum that can be quantitatively analyzed by fluorescence microscopy. smISH has been used in cultured cells to study the subcellular distribution of RNAs (reviewed in Buxbaum et al., (2015) Nature Reviews Molecular Cell Biology 16:95-109), the consequences of stochastic noise on gene expression (Raj et al. (2010) Nature 463:913-918; Raj et al. (2006) PLoS Biology 4:e309), and the impact of cell shape and environment on expression programs (Moffitt et al. (2016) eLife 5:e13065; Battich et al. (2015) Cell 163:1596-1610). Frei and colleagues used a different approach to detect multiple transcripts in cells (Frei et al. (2016) Nature Methods 13:269-275). They adopted a strategy whereby two oligonucleotide probes must independently bind in proximity on the target transcript in order to form a scaffold on which subsequent amplification can take place, which integrates high specificity with high signal generation (Wang et al. (2012) Journal of Molecular Diagnostics 14:22-29; Gross-Thebing et al. (2014) BMC Biology 12:55). Their technique utilized classical proximity ligation (Fredriksson et al. (2002) Nature Biotechnology 20:473-477; Soderberg et al. (2006) Nature Methods 3:995-1000) for specificity and Rolling Circle Amplification (RCA) of padlock probes (Larsson et al. (2010) Nature Methods 7:395-397; Ke et al. (2013) Nature Methods 10:857-860) for signal amplification. This approach was suitable for co-detection of multiple transcripts with proteins in single cells by flow or mass cytometry.
An increasingly important application for smISH is the simultaneous localization of customized panels of transcripts in tissue, which is used to validate putative cell subtypes identified by scRNA-seq studies (Grun and van Oudenaarden, (2015) Cell 163:799-810). Performing smISH in intact tissue can also reveal the spatial relationship between the cells expressing secreted signaling factors and the cells expressing the corresponding receptors, information that current scRNA-seq approaches cannot resolve because they require tissue dissociation with irretrievable loss of spatial context. Finally, when applied on a genome-wide scale in tissues, smISH has the potential to entirely bypass scRNA-seq as an upfront discovery tool.
The development of multiplexed smISH for use in tissue has been challenging due to autofluorescent background and light scattering (Shah et al. (2016) Neuron 92:342-357; Sylwestrak et al. (2016) Cell 164:792-804; Moffitt et al. (2016) PNAS 113:14456-14461; Chen et al. (2016) Nature Methods 13:679-684; Choi et al. (2014) ACS Nano 8:4284-4294; Lyubimova et al. (2013) Nature Protocols 8:1743-1758). One strategy for addressing this problem is to amplify probe signals by the hybridization chain reaction (HCR, reviewed in (Choi et al., (2016) Development 143:3632-3637); see also (Wang et al. (2012) The Journal of Molecular Diagnostics 14:22-29) for branched-DNA amplification), which provides up to five orthogonal detection channels. Higher levels of multiplexing can be achieved by repeated cycles of RNA in situ hybridization followed by a re-amplification step (Shah et al. (2016) Neuron 92:342-357), but because a single round of probe hybridization in tissue sections takes hours, multiplexing with HCR is laborious. Unamplified smISH techniques have the practical advantage that hundreds of endogenous RNA species can be barcoded in a single reaction, and then read out with rapid label-image-erase cycles (Moffitt et al. (2016) PNAS 113:14456-14461; Moffitt et al. (2016) PNAS 113:11046-11051), but these do not provide adequate signal in tissues.
Ideally, a technique for high-throughput profiling in tissue would combine all of the RNA probe hybridization and signal amplification steps into a single reaction. Previously, Nilsson and colleagues presented an elegant enzymatic solution to this problem (Larsson et al. (2010) Nature Methods 7:395-397; Ke et al. (2013) Nature Methods 10:857-860). They used barcoded padlock probes to label cDNA molecules in cells and tissues, and rolling-circle amplification (RCA) to transform the circularized probes into long tandem repeats. The approach worked in tissues and handled an unbounded number of orthogonal amplification channels. The only limitations were that the RNA-detection efficiency was capped at about 15% (each transcript could only be probed at a single site because the 3′ end of the cDNA served as the replication primer), and that the approach required an in situ reverse transcription step with specialized and costly locked nucleic-acid primers.
Thus, better methods are needed for molecular profiling of cells and tissue.
The invention relates to reagents and methods for detecting nucleic acids using proximity ligation-in situ hybridization (PLISH). PLISH utilizes probes that bind along the length of each target nucleic acid, together with rolling circle amplification (RCA) to increase the signal for detection. A key feature endowing PLISH with ultrasensitive transcript detection is the oligonucleotide probe design that results in formation of Holliday-like junctions. Specificity is achieved by incorporating proximity ligation, wherein production of a detectable signal depends on binding of at least two probes sufficiently close together on a nucleic acid to allow ligation to produce circular DNA for amplification. Random and even sequence-specific off-target binding of a single probe does not produce a signal. PLISH is compatible with automated image analysis for multiplex expression profiling of large numbers of single cells.
In one aspect, the invention includes a method of detecting one or more target nucleic acids in a sample, the method comprising: a) providing at least one probe set for each target nucleic acid, wherein each probe set comprises: i) a first probe comprising a 5′ overhang region and a region that hybridizes to the target nucleic acid at a first target site; ii) a second probe comprising a 3′ overhang region and a region that hybridizes to the target nucleic acid at a second target site; b) contacting the sample with the probe sets; c) adding at least one bridge oligonucleotide to the sample for each probe set, wherein the bridge oligonucleotide comprises i) a first portion that hybridizes to a complementary portion in the 5′ overhang region of the first probe of the probe set, and ii) a second portion that hybridizes to a complementary portion in the 3′ overhang region of the second probe of the probe set, wherein the first probe and the second probe, when bound to one of the target nucleic acids, are in sufficient proximity to each other to simultaneously hybridize to the bridge oligonucleotide; d) adding at least one circle oligonucleotide to the sample for each probe set, wherein the circle oligonucleotide comprises a first portion that hybridizes to a complementary region at the 5′ end of the 5′ overhang region of the first probe of the probe set, and a second portion that hybridizes to a complementary region at the 3′ end of the 3′ overhang region of the second probe of the probe set; e) forming circular DNA where any two probes of a probe set bind sufficiently close to each other on one of the target nucleic acids to allow ligation of the bridge oligonucleotide and circle oligonucleotide that are hybridized to the two probes to generate a closed circle; f) performing rolling circle amplification, wherein each circular DNA molecule formed serves as a template to produce a concatemer comprising multiple copies of the circular DNA nucleotide sequence; g) contacting each concatemer with one or more imager oligonucleotides, wherein each imager oligonucleotide comprises a detectable label and a nucleotide sequence complementary to one or more sites in the circular DNA sequence, wherein the imager oligonucleotide binds to said sites in the multiple copies of the circular DNA sequence of the concatemer; and h) detecting the bound imager oligonucleotides.
In certain embodiments, the first target site is located either 5′ of the second target site or 3′ of the second target site on the target nucleic acid. In certain embodiments, the first and second target sites are adjacent to each other on the target nucleic acid, or the first and second target sites are contiguous on the target nucleic acid.
In another embodiment, a plurality of probe sets comprising probes capable of hybridizing at a plurality of target sites on a single target nucleic acid are used.
In another embodiment, a plurality of probe sets comprising probes capable of hybridizing at a plurality of target sites on multiple target nucleic acids are used for multiplexed detection of a plurality of target nucleic acids. The method may further comprise using a plurality of circle oligonucleotides, wherein each circle oligonucleotide binds to a different probe set; and a plurality of imager oligonucleotides, wherein each imager oligonucleotide comprises a different detectable label. For example, each circle oligonucleotide may comprise one or more binding sites for a different imager oligonucleotide, such that different circle oligonucleotides are bound by different imager oligonucleotides comprising different detectable labels to allow different target nucleic acids to be detectably distinguished from one another.
Exemplary detectable labels include fluorescent labels, bioluminescent labels, chemiluminescent labels, isotopic labels, nanoparticles, and metals.
In certain embodiments, each probe has a similar melting temperature (Tm) for binding to its cognate target site. For example, the Tm may range from about 45° C. to about 65° C., including any Tm within this range such as 45° C., 46° C., 47° C., 48° C., 49° C., 50° C., 51° C., 52° C., 53° C., 54° C., 55° C., 56° C., 57° C., 58° C., 59° C., 60° C., 61° C., 62° C., 63° C., 64° C., or 65° C.
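As a rough illustration of how candidate probes might be screened against such a Tm window, the following sketch estimates Tm from GC content. The approximation used (Tm ≈ 64.9 + 41 × (G + C − 16.4) / N), the function names, and the example sequence are assumptions made for illustration and are not taken from this disclosure; a nearest-neighbor thermodynamic model would ordinarily give better estimates.

```python
def estimate_tm(seq: str) -> float:
    """Approximate melting temperature in deg C from GC content.

    Uses the simple GC-content rule of thumb; this is an illustrative
    assumption, not the method used to design the probes described herein.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def in_tm_window(seq: str, low: float = 45.0, high: float = 65.0) -> bool:
    """Return True if the estimated Tm lies within [low, high] deg C."""
    return low <= estimate_tm(seq) <= high


# Hypothetical 20-nucleotide hybridizing region with 50% GC content.
example = "ATGCATGCATGCATGCATGC"
print(round(estimate_tm(example), 2), in_tm_window(example))
```

Probes whose estimates fall outside the window would be redesigned, for example by lengthening or shifting the hybridizing region.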
In certain embodiments, the target nucleic acids are RNA or DNA. For example, a target nucleic acid may be an RNA selected from the group consisting of a messenger RNA, a ribosomal RNA, a transfer RNA, a non-coding RNA, and a regulatory RNA.
In certain embodiments, a bridge oligonucleotide or circle oligonucleotide comprises at least one binding site for an imager oligonucleotide. In other embodiments, the bridge and circle oligonucleotides both comprise at least one binding site for an imager oligonucleotide. In another embodiment, a circle oligonucleotide comprises multiple binding sites for an imager oligonucleotide.
In certain embodiments, the target nucleic acids are in a cell. The cell may be a eukaryotic cell (e.g., an animal cell, a plant cell, a fungal cell, or a protist cell), a prokaryotic cell, an archaeon cell, or an artificial cell. In another embodiment, the cell is a human cell. The cell may be a fixed cell or a live cell. In another embodiment, the method further comprises lysing or permeabilizing the cell.
In certain embodiments, the target nucleic acids are in a population of cells, a tissue, an organ, or an organism. For example, methods of the invention may be performed on a sample comprising a plurality of cell types, such as a biopsy or blood sample potentially including immune cells, progenitor or stem cells, or cancer cells. In certain embodiments, the method further comprises mapping an anatomical location for at least one target nucleic acid in a tissue or organ.
In certain embodiments, a cell or tissue is exposed to a test condition prior to said contacting the sample with one or more probe sets. For example, the test condition may comprise exposing a cell or tissue to a drug, a ligand for a receptor, a hormone, a second messenger, a pathogen, a genetic modification, a change in temperature, growth media, membrane potential, or osmotic pressure.
In certain embodiments, a subset of the target nucleic acids is detected simultaneously.
In certain embodiments, the detectable labels on the imager oligonucleotides are fluorescent labels. Such labels can be detected, for example, by performing fluorescence imaging. In some embodiments, multiple cycles of fluorescence imaging are performed to allow detection of subsets of the target nucleic acids sequentially.
In another embodiment, subsets of the target nucleic acids are detected sequentially by a method comprising: a) contacting the sample with a subset of the imager oligonucleotides; b) performing a cycle of fluorescence imaging; c) removing the imager oligonucleotides from the sample; d) contacting the sample with another subset of the imager oligonucleotides; e) performing another cycle of fluorescence imaging; and f) removing the imager oligonucleotides from the sample. The method may further comprise repeating steps (a)-(f) until all of the imager oligonucleotides have been used for detection of the plurality of target nucleic acids.
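The sequential detection scheme can be sketched as a simple scheduling problem: the imager oligonucleotides are divided into subsets no larger than the number of distinguishable fluorescence channels, and each subset is applied, imaged, and removed in turn. The function name, the imager names, and the channel names below are hypothetical and serve only to illustrate the idea.

```python
def plan_imaging_cycles(imagers, channels):
    """Assign imager oligonucleotides to label-image-erase cycles.

    Each cycle uses at most one imager per fluorescence channel.
    Returns a list of dicts mapping channel -> imager for each cycle.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    n = len(channels)
    cycles = []
    for start in range(0, len(imagers), n):
        subset = imagers[start:start + n]
        # zip stops at the shorter input, so the last cycle may be partial.
        cycles.append(dict(zip(channels, subset)))
    return cycles


# Hypothetical panel of five imagers read out with two channels.
imagers = ["imager_1", "imager_2", "imager_3", "imager_4", "imager_5"]
for i, cycle in enumerate(plan_imaging_cycles(imagers, ["Cy3", "Cy5"]), 1):
    # For each cycle: contact the sample, image, then remove the imagers.
    print(f"cycle {i}: {cycle}")
```

With five imagers and two channels, this plan requires three cycles, the last of which uses a single channel.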
In another embodiment, the method further comprises sequencing at least one target nucleic acid.
In another embodiment, the method further comprises detecting at least one protein in the sample. For example, the method may further comprise performing immunohistochemistry on the sample.
In certain embodiments, a plurality of cell types is present in the sample. In another embodiment, the method further comprises identifying at least one cell type based on detection of one or more target nucleic acids. In some embodiments, the identification of cell types is automated by using an algorithm for cell classification, such as a clustering algorithm (e.g., K-means clustering) or a machine learning algorithm (e.g., t-distributed stochastic neighbor embedding).
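By way of illustration only, automated cell classification by K-means clustering might look like the following minimal sketch, in which each cell is represented by a vector of per-transcript signal counts. The data values, the deterministic initialization from the first k cells, and the function name are assumptions chosen for clarity rather than a description of the analysis actually used.

```python
def kmeans(points, k, iterations=100):
    """Minimal K-means (Lloyd's algorithm) on a list of numeric tuples.

    Centroids are initialized from the first k points (an assumption made
    here so that the example is deterministic).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical per-cell counts for two transcripts (transcript A, transcript B).
cells = [(10, 0), (9, 1), (0, 10), (1, 9)]
labels, centroids = kmeans(cells, k=2)
print(labels, centroids)
```

Cells that share a label would then be treated as one putative cell type, to be checked against known marker expression.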
In another aspect, the invention includes a composition for detecting one or more target nucleic acids in a sample comprising: a) at least one probe set for each target nucleic acid, wherein each probe set comprises: i) a first probe comprising a 5′ overhang region and a region capable of hybridizing to the target nucleic acid at a first target site; ii) a second probe comprising a 3′ overhang region and a region capable of hybridizing to the target nucleic acid at a second target site; b) at least one bridge oligonucleotide for each probe set, wherein the bridge oligonucleotide comprises i) a first portion capable of hybridizing to a complementary portion in the 5′ overhang region of the first probe of the probe set, and ii) a second portion capable of hybridizing to a complementary portion in the 3′ overhang region of the second probe of the probe set, wherein the first probe and the second probe, when bound to one of the target nucleic acids, are in sufficient proximity to each other to simultaneously hybridize to the bridge oligonucleotide; and c) at least one circle oligonucleotide for each probe set, wherein the circle oligonucleotide comprises a first portion capable of hybridizing to a complementary region at the 5′ end of the 5′ overhang region of the first probe of the probe set, and a second portion capable of hybridizing to a complementary region at the 3′ end of the 3′ overhang region of the second probe of the probe set.
In another aspect, the invention includes a kit comprising any of the compositions described herein and instructions for detecting target nucleic acids. The kit may further comprise other reagents for detecting target nucleic acids, as described herein, such as a ligase and/or reagents for performing rolling circle amplification (e.g., a polymerase, deoxyribonucleotides).
In another aspect, the invention includes an oligonucleotide comprising a nucleotide sequence selected from the group consisting of SEQ ID NOS:1-464, or sequences displaying at least about 80-100% sequence identity thereto, including any percent identity within this range, such as 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% sequence identity thereto.
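For illustration, percent sequence identity between two oligonucleotides of equal length can be computed position by position, as in the sketch below. The sketch assumes that the sequences are already aligned without gaps; the function names and example sequences are hypothetical, and sequences of differing length would first require an alignment step that is not shown here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity(a: str, b: str, threshold: float = 80.0) -> bool:
    """Return True if identity is at least the given percentage."""
    return percent_identity(a, b) >= threshold


# Hypothetical 10-mers differing at one position.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```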
In another aspect, the invention includes a system for performing multiplex detection of target nucleic acids comprising a hybridization chamber sealed to a solid support, such as a coverslip or slide supporting a cell or tissue sample. Multiplex assays are performed by stepwise application of the oligonucleotide reagents, including the probes, circle oligonucleotides, bridge oligonucleotides, and imager oligonucleotides through an inlet port to the hybridization chamber. Oligonucleotide reagents travel through an outlet port of the hybridization chamber to contact cells or tissue on the solid support.
The methods of the invention may be combined with any other method for measuring cellular parameters, including but not limited to immunostaining, immunohistochemistry, mass cytometry, or fluorescence activated cell sorting (FACS), or any other method that can be used to characterize a cell subpopulation of interest (e.g., by detection of cellular markers such as protein markers that differentiate different cell types of interest). Quantification of detection probes may be used to determine the abundances of target nucleic acids and may be used to identify cells expressing the target nucleic acids at different levels.
These and other embodiments of the subject invention will readily occur to those of skill in the art in view of the disclosure herein.
The practice of the present invention will employ, unless otherwise indicated, conventional methods of chemistry, cell biology, biochemistry, molecular biology and recombinant DNA techniques, and immunology within the skill of the art. Such techniques are explained fully in the literature. See, e.g., RNA: Methods and Protocols (Methods in Molecular Biology, edited by H. Nielsen, Humana Press, 1st edition, 2010); Rio et al. RNA: A Laboratory Manual (Cold Spring Harbor Laboratory Press; 1st edition, 2010); Farrell RNA Methodologies: Laboratory Guide for Isolation and Characterization (Academic Press; 4th edition, 2009); PCR Technology: Current Innovations (T. Nolan and S. A. Bustin eds., CRC Press, 3rd edition, 2013); Antibodies: A Laboratory Manual (E. A. Greenfield ed., Cold Spring Harbor Laboratory Press, 2nd Lab edition, 2013); A. L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.).
All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entireties.
In describing the present invention, the following terms will be employed, and are intended to be defined as indicated below.
It must be noted that, as used in this specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a probe” includes two or more probes, and the like.
The term “about,” particularly in reference to a given quantity, is meant to encompass deviations of plus or minus five percent.
As used herein, a “cell” refers to any type of cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants, and animals, including cells from tissues, organs, and biopsies, as well as recombinant cells, cells from cell lines cultured in vitro, and cellular fragments, cell components, or organelles comprising nucleic acids. The term also encompasses artificial cells, such as nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids. A cell may include a fixed cell, permeabilized cell, or a live cell. The methods described herein can be performed, for example, on a sample comprising a single cell, a population of cells, or a tissue or organ.
A “live cell,” as used herein, refers to an intact cell, naturally occurring or modified. The live cell may be isolated from other cells, mixed with other cells in a culture, or within a tissue (partial or intact), or an organism.
The terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule” are used herein to include a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, the term includes triple-, double- and single-stranded DNA, as well as triple-, double- and single-stranded RNA. It also includes modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide. More particularly, the terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule” include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and other polymers containing nonnucleotidic backbones, for example, polyamide (e.g., peptide nucleic acids (PNAs)) and polymorpholino (commercially available from the Anti-Virals, Inc., Corvallis, Oreg., as Neugene) polymers, and other synthetic sequence-specific nucleic acid polymers providing that the polymers contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA. There is no intended distinction in length between the terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule,” and these terms will be used interchangeably. 
Thus, these terms include, for example, 3′-deoxy-2′,5′-DNA, oligodeoxyribonucleotide N3′ P5′ phosphoramidates, 2′-O-alkyl-substituted RNA, double- and single-stranded DNA, as well as double- and single-stranded RNA, DNA:RNA hybrids, and hybrids between PNAs and DNA or RNA, and also include known types of modifications, for example, labels which are known in the art, methylation, “caps,” substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), and with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters), those containing pendant moieties, such as, for example, proteins (including nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide or oligonucleotide.
“Recombinant” as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation is not associated with all or a portion of the polynucleotide with which it is associated in nature. The term “recombinant” as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide. In general, the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.
As used herein, a “solid support” refers to a solid surface such as a magnetic bead, latex bead, microtiter plate well, glass plate, nylon, agarose, acrylamide, and the like.
“Substantially purified” generally refers to isolation of a substance (e.g., compound, nucleic acid, oligonucleotide, protein, or peptide composition) such that the substance comprises the majority percent of the sample in which it resides. Typically, in a sample, a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.
By “isolated” is meant, when referring to a protein, polypeptide or peptide, that the indicated molecule is separate and discrete from the whole organism with which the molecule is found in nature or is present in the substantial absence of other biological macro molecules of the same type. The term “isolated” with respect to a nucleic acid is a nucleic acid molecule devoid, in whole or part, of sequences normally associated with it in nature; or a sequence, as it exists in nature, but having heterologous sequences in association therewith; or a molecule disassociated from the chromosome.
As used herein, the term “target nucleic acid region” or “target nucleic acid” denotes a nucleic acid molecule with a “target sequence” to be detected or amplified. The target nucleic acid may be either single-stranded or double-stranded and may include other sequences besides the target sequence. The term “target sequence” or “target site” refers to the particular nucleotide sequence of the target nucleic acid which is detected by binding of a probe. The target sequence may include a probe-hybridizing region contained within the target molecule with which a probe will form a stable hybrid under desired conditions. The “target sequence” may also include the sequences to which oligonucleotide primers complex and are extended using the target sequence as a template. Where the target nucleic acid is originally single-stranded, the term “target sequence” also refers to the sequence complementary to the “target sequence” as present in the target nucleic acid. If the “target nucleic acid” is originally double-stranded, the term “target sequence” refers to both the plus (+) and minus (−) strands (or sense and anti-sense strands).
The term “adjacent” or “substantially adjacent” as used herein refers to the positioning of two regions or target sites on the target nucleic acid. The two adjacent regions or target sites (e.g., where a pair of probes bind) may be separated by 0 up to 150 nucleotides, including any number of nucleotides in this range such as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 120, 130, 140, or 150 nucleotides. A zero nucleotide gap means that the two regions or target sites directly abut one another. In other words, the two regions bound by a pair of probes may be contiguous, i.e., there is no gap between the two target sites. Alternatively, the two regions hybridized by the oligonucleotides may be separated by 1 to about 150 nucleotides.
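The adjacency criterion can be expressed as a simple gap calculation, sketched below. The representation of a target site as a 0-based, half-open (start, end) coordinate pair on the target nucleic acid, and the function names, are assumptions made for illustration.

```python
def gap_between_sites(site_a, site_b):
    """Number of nucleotides separating two non-overlapping target sites.

    Each site is a (start, end) pair in 0-based, half-open coordinates.
    A gap of 0 means the two sites are contiguous.
    """
    first, second = sorted([site_a, site_b])
    gap = second[0] - first[1]
    if gap < 0:
        raise ValueError("target sites overlap")
    return gap


def is_adjacent(site_a, site_b, max_gap=150):
    """Return True if the sites are separated by 0 to max_gap nucleotides."""
    return gap_between_sites(site_a, site_b) <= max_gap


# Hypothetical 20-nucleotide target sites for a probe pair.
print(gap_between_sites((100, 120), (120, 140)))  # contiguous sites
print(is_adjacent((100, 120), (130, 150)))
```

Because the sites are sorted before the gap is computed, the result is the same whether the first target site lies 5′ or 3′ of the second.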
The term “primer” or “oligonucleotide primer” as used herein, refers to an oligonucleotide that hybridizes to the template strand of a nucleic acid and initiates synthesis of a nucleic acid strand complementary to the template strand when placed under conditions in which synthesis of a primer extension product is induced, i.e., in the presence of nucleotides and a polymerization-inducing agent such as a DNA or RNA polymerase and at suitable temperature, pH, metal concentration, and salt concentration. The primer is preferably single-stranded for maximum efficiency in amplification, but may alternatively be double-stranded. If double-stranded, the primer can first be treated to separate its strands before being used to prepare extension products. This denaturation step is typically effected by heat, but may alternatively be carried out using alkali, followed by neutralization. Thus, a “primer” is complementary to a template, and complexes by hydrogen bonding or hybridization with the template to give a primer/template complex for initiation of synthesis by a polymerase, which is extended by the addition of covalently bonded bases linked at its 3′ end complementary to the template in the process of DNA or RNA synthesis. Typically, nucleic acids are amplified using at least one set of oligonucleotide primers comprising at least one forward primer and at least one reverse primer capable of hybridizing to regions of a nucleic acid flanking the portion of the nucleic acid to be amplified.
The term “amplicon” refers to the amplified nucleic acid product of a polymerase chain reaction (PCR), rolling circle amplification (RCA), or other nucleic acid amplification process.
As used herein, the term “probe” or “oligonucleotide probe” refers to a polynucleotide, as defined above, that contains a nucleic acid sequence complementary to a nucleic acid sequence present in the target nucleic acid analyte. The polynucleotide regions of probes may be composed of DNA, and/or RNA, and/or synthetic nucleotide analogs. Probes may be labeled in order to detect the target sequence. Such a label may be present at the 5′ end, at the 3′ end, at both the 5′ and 3′ ends, and/or internally.
The terms “hybridize” and “hybridization” refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form complexes via Watson-Crick base pairing. Where a primer “hybridizes” with target (template), such complexes (or hybrids) are sufficiently stable to serve the priming function required by, e.g., the DNA polymerase to initiate DNA synthesis.
It will be appreciated that the hybridizing sequences need not have perfect complementarity to provide stable hybrids. In many situations, stable hybrids will form where fewer than about 10% of the bases are mismatches, ignoring loops of four or more nucleotides. Accordingly, as used herein the term “complementary” refers to an oligonucleotide that forms a stable duplex with its “complement” under assay conditions, generally where there is about 90% or greater homology.
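As a simple illustration of this tolerance, the sketch below compares a DNA probe with the perfect reverse complement of its target site and reports the fraction of mismatched bases. It assumes an ungapped comparison of equal-length sequences and does not model loops; the function names and sequences are hypothetical.

```python
# Complement table for DNA or RNA input; output is written as DNA.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_fraction(probe: str, target_site: str) -> float:
    """Fraction of probe bases differing from the perfect complement."""
    expected = reverse_complement(target_site)
    if not probe or len(probe) != len(expected):
        raise ValueError("probe and target site must be equal, non-zero length")
    mismatches = sum(p != e for p, e in zip(probe.upper(), expected))
    return mismatches / len(probe)


def is_complementary(probe: str, target_site: str,
                     max_fraction: float = 0.10) -> bool:
    """True if fewer than max_fraction of the bases are mismatches."""
    return mismatch_fraction(probe, target_site) < max_fraction


# Hypothetical 20-nucleotide RNA target site and a probe with one mismatch.
target = "ACGU" * 5
probe = "T" + reverse_complement(target)[1:]
print(mismatch_fraction(probe, target), is_complementary(probe, target))
```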
The terms “selectively detects” or “selectively detecting” refer to the detection of a nucleic acid (e.g., DNA or RNA transcript) using oligonucleotides (e.g., probes, circle oligonucleotides, and bridge oligonucleotides) that are capable of detecting a particular target sequence, for example, by amplifying and/or binding to a target sequence of a particular nucleic acid, or ligation product or extension product thereof, but do not amplify and/or bind to other nucleic acid sequences under appropriate hybridization conditions.
As used herein, the term “detectable label” refers to a molecule or substance capable of detection, including, but not limited to, fluorescers, chemiluminescers, chromophores, bioluminescent proteins, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, isotopic labels, semiconductor nanoparticles, dyes, metal ions, metal sols, ligands (e.g., biotin, streptavidin or haptens) and the like. The term “fluorescer” refers to a substance or a portion thereof which is capable of exhibiting fluorescence in the detectable range. Particular examples of labels which may be used in the practice of the invention include, but are not limited to, SYBR green, SYBR gold, a CAL Fluor dye such as CAL Fluor Gold 540, CAL Fluor Orange 560, CAL Fluor Red 590, CAL Fluor Red 610, and CAL Fluor Red 635, a Quasar dye such as Quasar 570, Quasar 670, and Quasar 705, an Alexa Fluor such as Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, and Alexa Fluor 784, a cyanine dye such as Cy3, Cy3.5, Cy5, Cy5.5, and Cy7, fluorescein, 2′,4′,5′,7′-tetrachloro-4-7-dichlorofluorescein (TET), carboxyfluorescein (FAM), 6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein (JOE), hexachlorofluorescein (HEX), rhodamine, carboxy-X-rhodamine (ROX), tetramethyl rhodamine (TAMRA), FITC, dansyl, umbelliferone, dimethyl acridinium ester (DMAE), Texas red, luminol, and quantum dots, enzymes such as alkaline phosphatase (AP), beta-lactamase, chloramphenicol acetyltransferase (CAT), adenosine deaminase (ADA), aminoglycoside phosphotransferase (neo^r, G418^r), dihydrofolate reductase (DHFR), hygromycin-B-phosphotransferase (HPH), thymidine kinase (TK), β-galactosidase (lacZ), and xanthine guanine phosphoribosyltransferase (XGPRT), beta-glucuronidase (gus), placental alkaline phosphatase (PLAP), and secreted embryonic alkaline phosphatase (SEAP). Enzyme tags are used with their cognate substrate.
The term also includes chemiluminescent labels such as luminol, isoluminol, acridinium esters, and peroxyoxalate and bioluminescent proteins such as firefly luciferase, bacterial luciferase, Renilla luciferase, and aequorin. The term also includes isotopic labels, including radioactive and non-radioactive isotopes, such as 3H, 2H, 120I, 123I, 124I, 125I, 131I, 35S, 11C, 13C, 14C, 32P, 15N, 13N, 110In, 111In, 177Lu, 18F, 52Fe, 62Cu, 64Cu, 67Cu, 67Ga, 68Ga, 86Y, 90Y, 89Zr, 94mTc, 94Tc, 99mTc, 154Gd, 155Gd, 156Gd, 157Gd, 158Gd, 15O, 186Re, 188Re, 51Mn, 52Mn, 55Co, 72As, 75Br, 76Br, 82mRb, and 83Sr. The term also includes color-coded microspheres of known fluorescent light intensities (see e.g., microspheres with xMAP technology produced by Luminex (Austin, Tex.)); microspheres containing quantum dot nanocrystals, for example, containing different ratios and combinations of quantum dot colors (e.g., Qdot nanocrystals produced by Life Technologies (Carlsbad, Calif.)); glass coated metal nanoparticles (see e.g., SERS nanotags produced by Nanoplex Technologies, Inc. (Mountain View, Calif.)); barcode materials (see e.g., sub-micron sized striped metallic rods such as Nanobarcodes produced by Nanoplex Technologies, Inc.), encoded microparticles with colored bar codes (see e.g., CellCard produced by Vitra Bioscience, vitrabio.com), glass microparticles with digital holographic code images (see e.g., CyVera microbeads produced by Illumina (San Diego, Calif.)), near infrared (NIR) probes, and nanoshells. The term also includes contrast agents such as ultrasound contrast agents (e.g., SonoVue microbubbles comprising sulfur hexafluoride, Optison microbubbles comprising an albumin shell and octafluoropropane gas core, Levovist microbubbles comprising a lipid/galactose shell and an air core, Perflexane lipid microspheres comprising perfluorocarbon microbubbles, and Perflutren lipid microspheres comprising octafluoropropane encapsulated in an outer lipid shell), magnetic resonance imaging (MRI) contrast agents (e.g., gadodiamide, gadobenic acid, gadopentetic acid, gadoteridol, gadofosveset, gadoversetamide, gadoxetic acid), and radiocontrast agents, such as for computed tomography (CT), radiography, or fluoroscopy (e.g., diatrizoic acid, metrizoic acid, iodamide, iotalamic acid, ioxitalamic acid, ioglicic acid, acetrizoic acid, iocarmic acid, methiodal, diodone, metrizamide, iohexol, ioxaglic acid, iopamidol, iopromide, iotrolan, ioversol, iopentol, iodixanol, iomeprol, iobitridol, ioxilan, iodoxamic acid, iotroxic acid, ioglycamic acid, adipiodone, iobenzamic acid, iopanoic acid, iocetamic acid, sodium iopodate, tyropanoic acid, and calcium iopodate).
The term “subject” or “host subject” includes bacteria, archaea, fungi, protists, plants, and animals (both vertebrates and invertebrates), including, without limitation, plants such as flowering plants (e.g., Arabidopsis thaliana), conifers and other gymnosperms, ferns, clubmosses, hornworts, liverworts, mosses (e.g., Physcomitrella patens), and green algae (e.g., Chlamydomonas reinhardtii); fungi such as molds and yeasts (e.g., Saccharomyces cerevisiae, Schizosaccharomyces pombe), protists such as amoebae, flagellates, and ciliates (e.g., Tetrahymena thermophila); worms (e.g., Caenorhabditis elegans), insects such as beetles, ants, bees, moths, butterflies, and flies (e.g., Drosophila melanogaster), amphibians such as frogs (e.g., Xenopus tropicalis, Xenopus laevis) and salamanders (e.g., axolotls); fish (e.g., Danio rerio, Fundulus heteroclitus, Nothobranchius furzeri); reptiles; mammals, including human and non-human mammals such as non-human primates, including chimpanzees and other apes and monkey species; laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, and chinchillas; domestic animals such as dogs and cats; farm animals such as sheep, goats, pigs, horses and cows; and birds such as domestic, wild and game birds, including chickens, turkeys and other gallinaceous birds, ducks, and geese. In some cases, the methods of the invention find use in experimental animals, in veterinary application, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; primates, and transgenic animals.
As used herein, a “biological sample” refers to a sample of cells, tissue, or fluid isolated from a subject, including but not limited to, for example, blood, plasma, serum, fecal matter, urine, bone marrow, bile, spinal fluid, lymph fluid, samples of the skin, external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, cells, muscles, joints, organs, biopsies, and also samples of in vitro cell culture constituents including but not limited to conditioned media resulting from the growth of cells and tissues in culture medium, e.g., recombinant cells, and cell components.
Before describing the present invention in detail, it is to be understood that this invention is not limited to particular formulations or process parameters as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only, and is not intended to be limiting.
Although a number of methods and materials similar or equivalent to those described herein can be used in the practice of the present invention, the preferred materials and methods are described herein.
The invention relates to the discovery of a novel approach for multiplexed detection of nucleic acids by in situ hybridization. The method combines the specificity of proximity ligation, the sensitivity of using multiple probes to target a transcript, and the high signal produced by rolling circle amplification. The engineered probes are designed to capitalize on the formation of Holliday-like junctions for optimal signal amplification. PLISH provides single molecule resolution and allows for quantitation of a virtually unlimited number of transcripts within individual cells.
In order to further an understanding of the invention, a more detailed discussion is provided below regarding molecular profiling of cells and tissue with PLISH.
A. Detecting Nucleic Acids with PLISH
The PLISH method is typically performed as follows: A tissue or cell sample is incubated with one or more pairs of probes (i.e., probe set). The two probes (referred to as right H probe and left H probe) in each probe set hybridize at adjacent sites on a target nucleic acid. The sample is then washed to remove excess unbound probes. Bridge and circle oligonucleotides, chemically or enzymatically phosphorylated at their 5′ ends, are hybridized to the bound pairs of adjacent probes. The sample is then washed to remove excess unbound bridge and circle oligonucleotides. The sample is treated with a ligase resulting in probe-templated ligation of the bridge and circle oligonucleotides to create a closed single-stranded DNA (ssDNA) circle. The sample is optionally washed to remove excess ligase. Rolling circle amplification is performed on the closed ssDNA circle, primed by the 3′ end of the right H probe. The sample is optionally washed to remove excess polymerase. Detectably labeled imager oligonucleotides are added to the sample, which hybridize to the rolling-circle amplicons, either directly or indirectly through adapter oligonucleotides. The sample is optionally washed to remove excess imager oligonucleotides. The target nucleic acids are detected by measuring a signal from the bound imager oligonucleotides. The sample can be imaged to reveal the location of the detectably labeled imager oligonucleotides complexed with the target nucleic acids.
A target nucleic acid may be any nucleic acid of interest (e.g., RNA or DNA, or a modified nucleic acid). In some embodiments, the target nucleic acid is a coding RNA (e.g., messenger RNA (mRNA)) or a non-coding RNA (e.g., transfer RNA (tRNA), ribosomal RNA (rRNA), microRNA (miRNA), mature miRNA, immature miRNA, small nuclear RNA (snRNA), or long noncoding RNA (lncRNA)). In some embodiments, the target nucleic acid is a splice variant of an RNA molecule (e.g., mRNA, pre-mRNA). The target nucleic acid may be an unspliced RNA (e.g., pre-mRNA, mRNA), a partially spliced RNA, or a fully spliced RNA.
Target nucleic acids of interest may differ in abundance within a cell population or exhibit differential expression in association with a disease or condition. The methods of the invention can be used for molecular profiling of cells to measure expression levels of nucleic acids, including without limitation RNA transcripts in individual cells.
In some embodiments, the target nucleic acid is DNA (e.g., denatured genomic, viral, or plasmid DNA). For example, the methods can be used to detect copy number variants or rare genetic variants and determine their abundances in a cell population.
The methods of the invention may be applied to cell samples comprising a single cell or a population of cells of interest and can be performed on any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeal organism, including bacteria, archaea, fungi, protists, plants, and animals. Cells from tissues, organs, and biopsies, as well as recombinant cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be used in the practice of the invention. The methods of the invention are also applicable for detecting nucleic acids in cellular fragments, cell components, or organelles comprising nucleic acids.
In some embodiments, PLISH is performed on an intact cell, naturally occurring or modified. The cell may be isolated from other cells, mixed with other cells in a culture, or within a tissue (partial or intact), or an organism. In some embodiments, the cell is lysed or permeabilized. PLISH is well suited for use with fixed cells and tissues, such as fixed cells and tissues obtained from a subject, e.g., in a clinical setting. For example, PLISH can be used on conventional formalin-fixed tissues that have been cryo- or paraffin-embedded and can be performed concurrently with immunostaining.
In some instances, the methods described herein will find use in detection, quantification, and/or mapping of RNA transcripts in a cell or tissue sample from a subject. Cell or tissue samples may be collected from any animal, including humans, livestock, pets, laboratory animals, bioproduction animals (e.g., animals used to generate a bioproduct), and the like. Mammals of interest from which such samples may be derived include but are not limited to e.g., humans, ungulates (e.g., any species or subspecies of porcine (pig), bovine (cattle), ovine (sheep) and caprine (goats), equine (horses), camelids (camels) or, generally, hooved domestic or farm animals, etc.), rodents (e.g., mice, rats, gerbils, hamsters, guinea pigs, and the like), rabbits, cats, dogs, primates, and the like.
In some instances, samples may be derived from non-human animals including but not limited to non-human mammals. Non-human mammals from which samples may be derived include but are not limited to those listed above. Non-human animals from which samples may be derived include but are not limited to those listed above and, in addition, e.g., avians (i.e., birds, such as, e.g., chicken, duck, etc.), amphibians (e.g., frogs), fish, etc.
The methods of the invention may be performed, for example, on cells, tissue, or organs of the nervous system, muscular system, respiratory system, cardiovascular system, skeletal system, reproductive system, integumentary system, lymphatic system, excretory system, endocrine system (e.g. endocrine and exocrine), or digestive system. Any type of cell can potentially be used, as described herein, including, but not limited to, epithelial cells (e.g., squamous, cuboidal, columnar, and pseudostratified epithelial cells), endothelial cells (e.g., vein, artery, and lymphatic vessel endothelial cells), and cells of connective tissue, muscles, and the nervous system. Such cells may include, but are not limited to, epidermal cells, fibroblasts, chondrocytes, skeletal muscle cells, satellite cells, heart muscle cells, smooth muscle cells, keratinocytes, basal cells, ameloblasts, exocrine secretory cells, myoepithelial cells, osteoblasts, osteoclasts, neurons (e.g., sensory neurons, motor neurons, and interneurons), glial cells (e.g., oligodendrocytes, astrocytes, ependymal cells, microglia, Schwann cells, and satellite cells), pillar cells, adipocytes, pericytes, stellate cells, pneumocytes, blood and immune system cells (e.g., erythrocytes, monocytes, dendritic cells, macrophages, neutrophils, eosinophils, mast cells, T cells, B cells, natural killer cells), hormone-secreting cells, germ cells, interstitial cells, lens cells, photoreceptor cells, taste receptor cells, and olfactory cells; as well as cells and/or tissue from the kidney, liver, pancreas, stomach, spleen, gall bladder, intestines, bladder, lungs, prostate, breasts, urogenital tract, pituitary cells, oral cavity, esophagus, skin, hair, nail, thyroid, parathyroid, adrenal gland, eyes, nose, or brain.
At least one probe set is provided for each target nucleic acid to be detected, wherein each probe set comprises: i) a first probe comprising a 5′ overhang region and a region that hybridizes to the target nucleic acid at a first target site; ii) a second probe comprising a 3′ overhang region and a region that hybridizes to the target nucleic acid at a second target site.
A target site is a complementary region of the target nucleic acid to which a probe binds. A pair of probes in a probe set binds to a pair of different target sites that are sufficiently close together to allow simultaneous hybridization to a bridge oligonucleotide. The probes will usually hybridize to two adjacent regions (i.e., target sites) on the target nucleic acid, which may be separated by 0 to 150 nucleotides, including any number of nucleotides in this range such as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 120, 130, 140, or 150 nucleotides. A zero nucleotide gap means that the two regions or target sites directly abut one another. In other words, the two regions bound by a pair of probes may be contiguous, i.e., there is no gap between the two target sites. Alternatively, the two regions hybridized by the probe oligonucleotides may be separated by 1 to about 150 nucleotides. Target sites are typically present on the same strand of the target nucleic acid in the same orientation. Target sites are usually selected to provide a unique binding site not present in other nucleic acids in the sample. Each target site is generally from about 18 to about 30 nucleotides in length, or any length within this range such as 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length.
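By way of illustration only, the spacing and length constraints described above can be expressed as a short Python sketch that checks whether two candidate target sites on the same strand could serve as a probe pair. The `TargetSite` type, the function names, and the use of hard cut-offs (18 to 30 nucleotides per site, a gap of 0 to 150 nucleotides) are assumptions made for this example. The ranges given above are approximate, and the sketch does not test whether a site is unique within the sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSite:
    """Hypothetical record: half-open interval [start, end) on the target strand."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def gap_between(upstream: TargetSite, downstream: TargetSite) -> int:
    """Target nucleotides lying between the two sites (0 means they abut)."""
    return downstream.start - upstream.end


def is_valid_pair(a: TargetSite, b: TargetSite,
                  min_len: int = 18, max_len: int = 30,
                  max_gap: int = 150) -> bool:
    """True if both sites have an allowed length and the gap is 0..max_gap."""
    first, second = sorted((a, b), key=lambda s: s.start)
    lengths_ok = all(min_len <= s.length <= max_len for s in (first, second))
    gap = gap_between(first, second)  # negative if the sites overlap
    return lengths_ok and 0 <= gap <= max_gap
```

For example, two 20-nucleotide sites at positions 100 to 120 and 120 to 140 abut (gap of 0) and pass the check, whereas sites separated by 180 nucleotides, or sites that overlap, do not.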
In some embodiments, the probes in a probe set have a similar melting temperature for binding to their cognate target sites. For example, the Tm may range from about 45° C. to about 65° C., including any Tm within this range such as 45° C., 46° C., 47° C., 48° C., 49° C., 50° C., 51° C., 52° C., 53° C., 54° C., 55° C., 56° C., 57° C., 58° C., 59° C., 60° C., 61° C., 62° C., 63° C., 64° C., or 65° C.
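As a rough illustration of matching melting temperatures within a probe set, the following Python sketch estimates Tm from GC content and tests whether two target-binding sequences fall within the 45° C. to 65° C. window. The formula used is a widely cited GC-content approximation for DNA oligonucleotides longer than 13 nucleotides and is swapped in here for simplicity; it is not a method described in this disclosure, it ignores salt and probe concentration, and it does not model DNA:RNA hybrids. The function names and the 5° C. tolerance (`max_diff_c`) are hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Rough Tm in deg C from GC content (approximation for oligos > 13 nt)."""
    seq = seq.upper()
    if len(seq) < 14:
        raise ValueError("approximation is intended for oligos longer than 13 nt")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def tm_matched(seq_a: str, seq_b: str,
               low_c: float = 45.0, high_c: float = 65.0,
               max_diff_c: float = 5.0) -> bool:
    """True if both estimates lie in [low_c, high_c] and differ by <= max_diff_c."""
    tm_a, tm_b = basic_tm(seq_a), basic_tm(seq_b)
    in_range = all(low_c <= t <= high_c for t in (tm_a, tm_b))
    return in_range and abs(tm_a - tm_b) <= max_diff_c
```

In practice, a nearest-neighbor thermodynamic model would normally be preferred over this approximation when designing probes.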
A bridge oligonucleotide hybridizes to a pair of probes to form a complex on a target nucleic acid. The bridge oligonucleotide comprises i) a first portion that hybridizes to a complementary region in the 5′ overhang region of one probe of the pair, and ii) a second portion that hybridizes to a complementary region in the 3′ overhang region of the second probe of the pair. The first probe and the second probe, when bound to a target nucleic acid, are in sufficient proximity to each other to simultaneously hybridize to the bridge oligonucleotide to allow formation of the complex with the bridge oligonucleotide on the target nucleic acid. A signal is only generated when two probes hybridize sufficiently close to each other on a target nucleic acid to allow hybridization of the circle oligonucleotide in this manner.
A circle oligonucleotide comprises a first portion that hybridizes to a complementary region at the 5′ end of the 5′ overhang region of the first probe of a probe set, and a second portion that hybridizes to a complementary region at the 3′ end of the 3′ overhang region of the second probe of the probe set. Circular DNA forms where any two probes of a probe set bind sufficiently close to each other on one of the target nucleic acids to allow ligation of a bridge oligonucleotide and circle oligonucleotide that are hybridized to the two probes to generate a closed circle.
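The complementarity relationships between the probe overhangs and the bridge and circle oligonucleotides can be sketched in Python as follows. The sketch assumes that, on each overhang, the circle oligonucleotide pairs with the distal end (the 5′ end of the 5′ overhang, the 3′ end of the 3′ overhang) and the bridge oligonucleotide pairs with the remainder of the overhang; the split lengths, function names, and toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def complementary_portions(overhang_5p: str, overhang_3p: str,
                           circle_len_5p: int, circle_len_3p: int) -> dict:
    """Portions of the bridge and circle oligos that pair with each overhang.

    All sequences are written 5' to 3'. The circle is assumed to pair with the
    first circle_len_5p nt of the 5' overhang and the last circle_len_3p nt of
    the 3' overhang; the bridge is assumed to pair with what remains.
    """
    if not 0 < circle_len_5p < len(overhang_5p):
        raise ValueError("circle_len_5p must split the 5' overhang in two")
    if not 0 < circle_len_3p < len(overhang_3p):
        raise ValueError("circle_len_3p must split the 3' overhang in two")
    return {
        "circle_on_5p_overhang": reverse_complement(overhang_5p[:circle_len_5p]),
        "bridge_on_5p_overhang": reverse_complement(overhang_5p[circle_len_5p:]),
        "bridge_on_3p_overhang": reverse_complement(overhang_3p[:-circle_len_3p]),
        "circle_on_3p_overhang": reverse_complement(overhang_3p[-circle_len_3p:]),
    }
```

Under these assumptions, a bridge end and a circle end abut on each overhang, which is the arrangement that probe-templated ligation requires.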
Rolling circle amplification (RCA) is performed with each circular DNA molecule formed serving as a template to produce a concatemer comprising multiple copies of the circular DNA nucleotide sequence. RCA is an isothermal nucleic acid amplification technique that uses a polymerase to extend a primer annealed to a circular template to produce a long ssDNA concatemer that contains tens to hundreds of tandem repeats of a sequence complementary to the circular template. A strand-displacing polymerase, such as Phi29, Bst, or Vent (exo-) DNA polymerase, can be used for rolling circle amplification. For a description of RCA, see, e.g., Ali et al. (2014) Chemical Society Reviews 43(10):3324-3341; Demidov (2002) Expert Rev. Mol. Diagn. 2(6):542-548; herein incorporated by reference.
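The relationship between the circular template and the RCA product can be illustrated with a minimal Python sketch. It models the concatemer as tandem repeats of the reverse complement of the circle sequence and counts how many sites a given imager oligonucleotide could bind. The function names and the toy sequences are hypothetical; the sketch ignores the offset introduced by the priming position and assumes that an imager binds only by perfect complementarity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rca_product(circle_seq: str, n_repeats: int) -> str:
    """Idealized concatemer: n tandem copies of the circle's reverse complement."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return reverse_complement(circle_seq) * n_repeats


def count_imager_sites(amplicon: str, imager: str) -> int:
    """Non-overlapping perfect-match binding sites for the imager on the amplicon."""
    return amplicon.count(reverse_complement(imager))
```

Because the amplicon is complementary to the circle, an imager that hybridizes to the amplicon has the same sequence as a segment of the circle, and each repeat of the concatemer contributes one binding site for it.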
The length of the oligonucleotide reagents (e.g., probes, circle oligonucleotides, bridge oligonucleotides, and imager oligonucleotides) will vary and may be 10 or more nucleotides and range from 10 to 100 or more nucleotides, including e.g., 10 to 100 nucleotides, 20 to 90 nucleotides, 30 to 80 nucleotides, 40 to 60 nucleotides, 10 to 50 nucleotides, 12 to 50 nucleotides, 14 to 50 nucleotides, 16 to 50 nucleotides, 18 to 50 nucleotides, 20 to 50 nucleotides, 22 to 50 nucleotides, 24 to 50 nucleotides, 26 to 50 nucleotides, 28 to 50 nucleotides, 30 to 50 nucleotides, 10 to 40 nucleotides, 12 to 40 nucleotides, 14 to 40 nucleotides, 16 to 40 nucleotides, 18 to 40 nucleotides, 20 to 40 nucleotides, 22 to 40 nucleotides, 24 to 40 nucleotides, 26 to 40 nucleotides, 28 to 40 nucleotides, 30 to 40 nucleotides, 10 to 30 nucleotides, 12 to 30 nucleotides, 14 to 30 nucleotides, 16 to 30 nucleotides, 18 to 30 nucleotides, 20 to 30 nucleotides, 12 or more nucleotides, 13 or more nucleotides, 14 or more nucleotides, 15 or more nucleotides, 16 or more nucleotides, 17 or more nucleotides, 18 or more nucleotides, 19 or more nucleotides, 20 or more nucleotides, 30 or more nucleotides, 40 or more nucleotides, 50 or more nucleotides, 60 or more nucleotides, 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides, 16 nucleotides, 17 nucleotides, 18 nucleotides, 19 nucleotides, 20 nucleotides, 21 nucleotides, 22 nucleotides, 23 nucleotides, 24 nucleotides, 25 nucleotides, 26 nucleotides, 27 nucleotides, 28 nucleotides, 29 nucleotides, 30 nucleotides, 35 nucleotides, 40 nucleotides, 45 nucleotides, 50 nucleotides, 55 nucleotides, 60 nucleotides etc. Exemplary oligonucleotide sequences for probes, circle oligonucleotides, bridge oligonucleotides, and imager oligonucleotides are shown in Example 1 and SEQ ID NOS:1-464 of the Sequence Listing.
In some instances, the oligonucleotides of the subject disclosure may include one or more nucleoside analogs. For example, in some instances, imager oligonucleotides of the instant disclosure may include one or more deoxyribouracil (i.e., deoxyribose uracil, 2′-deoxyuridine, etc.) nucleosides/nucleotides. In certain instances, an oligonucleotide may include 2 or more nucleoside analogs including but not limited to e.g., 3 or more, 4 or more, 5 or more, 6 or more, etc. In some instances, the number of nucleoside analogs as a percentage of the total bases of an oligonucleotide is 1% or more, including but not limited to e.g., 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 11% or more, 12% or more, 13% or more, 14% or more, 15% or more, 16% or more, 17% or more, 18% or more, 19% or more, 20% or more, 21% or more, 22% or more, 23% or more, 24% or more, 25% or more, 26% or more, 27% or more, 28% or more, 29% or more, 30% or more, etc.
Probes, circle oligonucleotides, bridge oligonucleotides, and imager oligonucleotides for use in the assays described herein are readily synthesized by standard techniques, e.g., solid phase synthesis via phosphoramidite chemistry, as disclosed in U.S. Pat. Nos. 4,458,066 and 4,415,732, incorporated herein by reference; Beaucage et al., Tetrahedron (1992) 48:2223-2311; and Applied Biosystems User Bulletin No. 13 (1 Apr. 1987). Other chemical synthesis methods include, for example, the phosphotriester method described by Narang et al., Meth. Enzymol. (1979) 68:90 and the phosphodiester method disclosed by Brown et al., Meth. Enzymol. (1979) 68:109. Poly(A) or poly(C), or other non-complementary nucleotide extensions may be incorporated into oligonucleotides using these same methods. Hexaethylene oxide extensions may be coupled to the oligonucleotides by methods known in the art (see, e.g., Cload et al., J. Am. Chem. Soc. (1991) 113:6324-6326; U.S. Pat. No. 4,914,210 to Levenson et al.; Durand et al., Nucleic Acids Res. (1990) 18:6353-6359; and Horn et al., Tet. Lett. (1986) 27:4705-4708).
B. Multiplexing
The methods described herein can be readily used to screen a sample for the presence of target nucleic acids. The methods are suitable for detection of a single target nucleic acid as well as multiplex analyses in which two or more different target nucleic acids are detected in a sample. In some instances, multiple nucleic acids (e.g., RNA transcripts) may be screened in a single sample, and the presence or quantities of each target nucleic acid may be assessed. The detection methods described herein may be utilized in parallel for the detection and measurement of large numbers of target nucleic acids in a cell or tissue sample. The methods of the invention are capable of highly sensitive and highly multiplexed assessment of many different target nucleic acids in a single sample.
In some embodiments, a plurality of different target nucleic acids are detected in a sample, such as up to 2, up to 3, up to 4, up to 5, up to 6, up to 7, up to 8, up to 9, up to 10, up to 12, up to 15, up to 18, up to 20, up to 25, up to 30, up to 40, up to 50, up to 60, up to 70, up to 80, up to 90, up to 100, up to 500, up to 1000, or more distinct target nucleic acids.
A multiplexed assay may make use of various different probes, circle oligonucleotides, bridge oligonucleotides, and uniquely labeled imager oligonucleotides for detection of particular target nucleic acids. For multiplex assays, the number of different probe sets, circle oligonucleotides, bridge oligonucleotides, and imager oligonucleotides that may be employed typically ranges from about 2 to about 20 or higher, e.g., up to 100 or higher, 1000 or higher, etc., including but not limited to e.g., 2 to 50, 2 to 100, 10 to 100, 50 to 100, 50 to 200, 50 to 300, 50 to 400, 50 to 500, etc.
Multiplexed assays are generally performed using a plurality of probe sets. The number of probe sets will vary depending on the number and/or type of target nucleic acids to be screened. Accordingly, in some instances, probe set libraries may be used for screening large numbers of target nucleic acids. Libraries may be categorized by the type of RNA transcripts targeted by probes contained in the library, including e.g., libraries which contain various probes for detection of mRNAs in particular cell types, tissues, or organs, or associated with particular disease states, developmental stages, or physiological conditions.
The number of different probe sets will vary and may range from 10 or less to 1000 or more, including but not limited to e.g., 10 to 1000, 20 to 1000, 30 to 1000, 40 to 1000, 50 to 1000, 60 to 1000, 70 to 1000, 80 to 1000, 90 to 1000, 100 to 1000, 100 to 900, 100 to 800, 100 to 700, 100 to 600, 100 to 500, 100 to 400, 100 to 300, 100 to 200, 10 to 900, 10 to 800, 10 to 700, 10 to 600, 10 to 500, 10 to 400, 10 to 300, 10 to 200, 10 to 100, 20 to 100, 30 to 100, 40 to 100, 50 to 100, 60 to 100, 70 to 100, 80 to 100, 90 to 100, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 250, 500, 1000, etc. The different probes of a library may be physically separated, e.g., in separate containers or separate wells of a multi-well plate, or may not be physically separated, i.e., may be pooled, in a single solution, in a single container, etc.
In some instances, a library of probe sets may include a corresponding library of circle oligonucleotides, bridge oligonucleotides, or imager oligonucleotides for multiplexed detection of the target nucleic acids. Libraries of the present disclosure may also include one or more additional reagents for performing all or part of a method as described herein, including e.g., additional reagents for ligation, rolling circle amplification, detection, etc. In some instances, additional reagents may be included in a pooled library. For example, in some instances, reagents for ligation (e.g., a ligase) or rolling circle amplification (polymerase and deoxyribonucleotides) may be included within a pooled library of probe sets. In some instances, additional reagents may be included in the individual wells of a multi-well plate. For example, in some instances, reagents for ligation or rolling circle amplification (e.g., a polymerase, dNTPs, etc.) may be included within the wells of a multi-well plate probe set library. Appropriate buffers, salts, etc. may or may not be included in the libraries as described. In some instances, libraries and/or components thereof, e.g., a probe set library, may be provided in a lyophilized form and may be rehydrated upon use.
C. Detection
The presence of target nucleic acids is determined by using detectably labeled imager oligonucleotides that bind to sites in the circular DNA sequence that is amplified by rolling circle amplification. Imager oligonucleotides may be detectably labeled with any molecule or substance capable of detection, including, but not limited to, fluorescers, chemiluminescers, chromophores, bioluminescent proteins, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, isotopic labels, semiconductor nanoparticles, dyes, metal ions, metal sols, ligands (e.g., biotin, streptavidin or haptens) and the like. Representative examples of detectable labels, which may be used in the practice of the invention, include, but are not limited to, SYBR green, SYBR gold, a CAL Fluor dye such as CAL Fluor Gold 540, CAL Fluor Orange 560, CAL Fluor Red 590, CAL Fluor Red 610, and CAL Fluor Red 635, a Quasar dye such as Quasar 570, Quasar 670, and Quasar 705, an Alexa Fluor such as Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, and Alexa Fluor 784, a cyanine dye such as Cy3, Cy3.5, Cy5, Cy5.5, and Cy7, fluorescein, 2′,4′,5′,7′-tetrachloro-4-7-dichlorofluorescein (TET), carboxyfluorescein (FAM), 6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein (JOE), hexachlorofluorescein (HEX), rhodamine, carboxy-X-rhodamine (ROX), tetramethyl rhodamine (TAMRA), FITC, dansyl, umbelliferone, dimethyl acridinium ester (DMAE), Texas red, luminol, and quantum dots; and enzymes such as alkaline phosphatase (AP), beta-lactamase, chloramphenicol acetyltransferase (CAT), adenosine deaminase (ADA), aminoglycoside phosphotransferase (neoʳ, G418ʳ), dihydrofolate reductase (DHFR), hygromycin-B-phosphotransferase (HPH), thymidine kinase (TK), β-galactosidase (lacZ), xanthine guanine phosphoribosyltransferase (XGPRT), beta-glucuronidase (gus), placental alkaline phosphatase (PLAP), and secreted embryonic alkaline phosphatase (SEAP).
Enzyme tags are used with their cognate substrate. Detectable labels also include chemiluminescent labels such as luminol, isoluminol, acridinium esters, and peroxyoxalate and bioluminescent proteins such as firefly luciferase, bacterial luciferase, Renilla luciferase, and aequorin. Detectable labels also include isotopic labels, including radioactive and non-radioactive isotopes, such as 3H, 2H, 120I, 123I, 124I, 125I, 131I, 35S, 11C, 13C, 14C, 32P, 15N, 13N, 110In, 111In, 177Lu, 18F, 52Fe, 62Cu, 64Cu, 67Cu, 67Ga, 68Ga, 86Y, 90Y, 89Zr, 94mTc, 94Tc, 99mTc, 154Gd, 155Gd, 156Gd, 157Gd, 158Gd, 15O, 186Re, 188Re, 51Mn, 52mMn, 55Co, 72As, 75Br, 76Br, 82mRb, and 83Sr. Detectable labels also include color-coded microspheres of known fluorescent light intensities (see e.g., microspheres with xMAP technology produced by Luminex (Austin, Tex.)); microspheres containing quantum dot nanocrystals, for example, containing different ratios and combinations of quantum dot colors (e.g., Qdot nanocrystals produced by Life Technologies (Carlsbad, Calif.)); glass coated metal nanoparticles (see e.g., SERS nanotags produced by Nanoplex Technologies, Inc. (Mountain View, Calif.)); barcode materials (see e.g., sub-micron sized striped metallic rods such as Nanobarcodes produced by Nanoplex Technologies, Inc.), encoded microparticles with colored bar codes (see e.g., CellCard produced by Vitra Bioscience, vitrabio.com), glass microparticles with digital holographic code images (see e.g., CyVera microbeads produced by Illumina (San Diego, Calif.)), near infrared (NIR) probes, and nanoshells. Detectable labels also include contrast agents such as ultrasound contrast agents (e.g., SonoVue microbubbles comprising sulfur hexafluoride, Optison microbubbles comprising an albumin shell and octafluoropropane gas core, Levovist microbubbles comprising a lipid/galactose shell and an air core, Perflexane lipid microspheres comprising perfluorocarbon microbubbles, and Perflutren lipid microspheres comprising octafluoropropane encapsulated in an outer lipid shell), magnetic resonance imaging (MRI) contrast agents (e.g., gadodiamide, gadobenic acid, gadopentetic acid, gadoteridol, gadofosveset, gadoversetamide, gadoxetic acid), and radiocontrast agents, such as for computed tomography (CT), radiography, or fluoroscopy (e.g., diatrizoic acid, metrizoic acid, iodamide, iotalamic acid, ioxitalamic acid, ioglicic acid, acetrizoic acid, iocarmic acid, methiodal, diodone, metrizamide, iohexol, ioxaglic acid, iopamidol, iopromide, iotrolan, ioversol, iopentol, iodixanol, iomeprol, iobitridol, ioxilan, iodoxamic acid, iotroxic acid, ioglycamic acid, adipiodone, iobenzamic acid, iopanoic acid, iocetamic acid, sodium iopodate, tyropanoic acid, and calcium iopodate).
The label may be a directly detectable label, which can be directly detected without the use of additional reagents, or an indirectly detectable label, which is detectable by employing one or more additional reagents (e.g., where the label is a member of a signal producing system made up of two or more components). In some embodiments, the imager oligonucleotides comprise directly detectable labels such as, but not limited to, fluorescent labels, radioisotopic labels, chemiluminescent labels, chelated metals, and the like.
In some embodiments, the label is a fluorescent label, wherein detection of a target nucleic acid involves detection of a fluorescent signal from bound imager oligonucleotides. A concatemer comprising a repeating circular DNA sequence is produced by rolling circle amplification, and the amplification product is detected by hybridization of one or more fluorescently labeled imager oligonucleotides to the amplification product. Any convenient means for detecting fluorescence may be used for detecting the bound imager oligonucleotides, including but not limited to, e.g., fluorescence microscopy, flow cytometry, imaging flow cytometry, etc.
For multiplex assays, each RNA species can be detectably labeled in a unique color by using imager oligonucleotides with spectrally-distinct fluorophores. Fluorescence micrographs can be interpreted by direct visual inspection and also allow a determination of RNA abundance. Typically, up to five distinct channels can be simultaneously detected and imaged by conventional fluorescence microscopy.
Multiple cycles of fluorescence imaging may be performed to allow detection of larger numbers of transcripts. Subsets of the target nucleic acids may be imaged sequentially. For example, a sample may be contacted with a subset of the imager oligonucleotides designed for detection of specific target nucleic acids, followed by performing a cycle of fluorescence imaging. Before performing another round of fluorescence imaging, the imager oligonucleotides are removed from the sample, for example, by using a wash step. Then, additional imager oligonucleotides are added to the sample to detect additional target nucleic acids.
Highly multiplexed measurement of different RNA species may require a large number of iterated data collection cycles. Ideally, the cycles should be fast, and removal of the bound imager oligonucleotides between cycles should not cause any mechanical or chemical damage to the sample. Short imager oligonucleotides (e.g., up to 11 nucleotides in length), which equilibrate rapidly on and off of the RCA amplicons, can be removed with a simple buffer exchange (see Example 1). Alternatively, uracil-containing imager oligonucleotides can be used, which can be readily removed by a brief enzymatic digestion (e.g., see Example 1 for a description of removal of uracil-containing imager oligonucleotides with uracil-specific excision reagent (USER) enzyme).
In certain embodiments, RNA species are imaged in sets of 5, with differently colored fluorophores associated with different targets (most fluorescence microscopes can only accommodate 5 color channels). In order to overcome the limit of 5 color channels on a typical fluorescence microscope, iterative rounds of staining, imaging and erasing can be used to colocalize large numbers of distinct RNA species in sequential images.
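By way of illustration only, the batching of targets into successive rounds of staining, imaging and erasing may be sketched as follows. The function name `plan_imaging_rounds`, the placeholder gene names (GeneF, GeneG, GeneH) and the assignment of fluorophores to channels are assumptions made for this example and are not part of the described method.

```python
from typing import Dict, List, Sequence


def plan_imaging_rounds(
    targets: Sequence[str],
    channels: Sequence[str],
) -> List[Dict[str, str]]:
    """Split targets into label-image-erase rounds, one target per channel."""
    if not channels:
        raise ValueError("at least one color channel is required")
    rounds = []
    for start in range(0, len(targets), len(channels)):
        batch = targets[start:start + len(channels)]
        # zip stops at the shorter sequence, so a partial final round is fine.
        rounds.append(dict(zip(channels, batch)))
    return rounds


# Hypothetical example: eight targets on a microscope with five channels.
genes = ["Sftpc", "Scgb1a1", "Lyz2", "Foxj1", "Axin2", "GeneF", "GeneG", "GeneH"]
fluors = ["Pacific Blue", "A488", "Cy3", "Texas Red", "Cy5"]
schedule = plan_imaging_rounds(genes, fluors)
for number, assignment in enumerate(schedule, start=1):
    print(f"round {number}: {assignment}")
```

In this sketch eight targets require two rounds. In practice one channel may be reserved for a nuclear stain, which would reduce the number of targets imaged per round.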
D. Applications
The methods and compositions described herein have particular utility in the detection, quantification, and/or mapping of target nucleic acids present in a sample. Such detection may find various applications in a variety of technological fields including but not limited to e.g., basic scientific research (e.g., biomedical research, biochemistry research, immunological research, molecular biology research, microbiological research, cellular biology research, genetics, and the like), medical and/or pharmaceutical research (e.g., drug discovery research, drug design research, drug development research, pharmacology, toxicology, medicinal chemistry, preclinical research, clinical research, personalized medicine, and the like), medicine, epidemiology, public health, biotechnology, veterinary science, veterinary medicine, agriculture, material science, molecular detection, molecular diagnostics, and the like.
Multiplexed assays can be used in molecular profiling to identify distinct cell-types and cell populations. The methods of the invention can be used to map all or some of the molecularly distinct cell types that make up a complex tissue based on their expression of target nucleic acids. Multiplexed assays can be used, for example, in molecular profiling to identify distinct cell populations within a tissue to determine the organization of cells in various systems including solid tumors and developing organs.
In particular, the methods of the invention should have many applications, for example, in the discovery and localization of novel cell types, the mapping of signaling centers, analysis of development, or molecular profiling of cell-types associated with disease. The methods of the invention can be used in analysis of formalin-fixed and paraffin-embedded samples, cryo-preserved samples and legacy tissue bank samples. In particular, the methods are applicable to clinical pathology labs. Additionally, the methods can be used in medical diagnostics based on multiplexed expression profiling in primary patient samples, with no prior purification or isolation of cells. Examples include: (a) direct liquid biopsy, such as for detection of circulating cancer cells or fetal cells by profiling patient blood products on a microscope slide, (b) quality control of patient stem cells by monitoring the gene expression of stem cells that are being differentiated ex vivo for therapeutic purposes, and (c) discovery and use of context-dependent biomarkers, i.e., biomarkers that provide a definitive diagnosis when observed in a specific tissue context.
E. Automated Image Analysis and Cell Classification
In some embodiments, image analysis and identification of cell types in a tissue based on the detected target nucleic acids present is automated by use of an algorithm or classifier. Automated analysis will be particularly useful for multiplex assays involving detection of large numbers of RNA transcripts. Cell types can be identified and classified using techniques known in the art. For example, a machine learning algorithm or clustering algorithm may be used.
The machine learning algorithm may comprise a supervised learning algorithm. Examples of supervised learning algorithms may include Average One-Dependence Estimators (AODE), Artificial neural network (e.g., Backpropagation), Bayesian statistics (e.g., Naive Bayes classifier, Bayesian network, Bayesian knowledge base), Case-based reasoning, Decision trees, Inductive logic programming, Gaussian process regression, Group method of data handling (GMDH), Learning Automata, Learning Vector Quantization, Minimum message length (decision trees, decision graphs, etc.), Lazy learning, Instance-based learning, Nearest Neighbor Algorithm, Analogical modeling, Probably approximately correct (PAC) learning, Ripple down rules (a knowledge acquisition methodology), Symbolic machine learning algorithms, Subsymbolic machine learning algorithms, Support vector machines, Random Forests, Ensembles of classifiers, Bootstrap aggregating (bagging), and Boosting. Supervised learning may comprise ordinal classification such as regression analysis and Information fuzzy networks (IFN). Alternatively, supervised learning methods may comprise statistical classification, such as AODE, Linear classifiers (e.g., Fisher's linear discriminant, Logistic regression, Naive Bayes classifier, Perceptron, and Support vector machine), quadratic classifiers, k-nearest neighbor, Boosting, Decision trees (e.g., C4.5, Random forests), Bayesian networks, and Hidden Markov models.
The machine learning algorithms may also comprise an unsupervised learning algorithm. Examples of unsupervised learning algorithms may include a t-distributed stochastic neighbor embedding algorithm, artificial neural network, data clustering, expectation-maximization algorithm, self-organizing map, radial basis function network, vector quantization, generative topographic map, information bottleneck method, and IBSEAD. Unsupervised learning may also comprise association rule learning algorithms such as Apriori algorithm, Eclat algorithm and FP-growth algorithm. Hierarchical clustering, such as Single-linkage clustering and Conceptual clustering, may also be used. Alternatively, unsupervised learning may comprise partitional clustering such as K-means algorithm and Fuzzy clustering.
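As a non-limiting illustration of partitional clustering applied to single-cell expression profiles, a minimal K-means (Lloyd's algorithm) sketch is given below. It uses only the Python standard library; the function names, the two-gene toy profiles and the fixed random seed are assumptions made for the example, and any established clustering implementation could be substituted.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles: List[Vector], k: int, iterations: int = 50,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: returns (cluster label per cell, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct cells chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each cell joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical per-cell signal for two genes: (gene A, gene B).
cells = [(9.0, 0.5), (8.5, 1.0), (9.5, 0.0), (0.5, 7.0), (1.0, 8.0), (0.0, 7.5)]
labels, centroids = k_means(cells, k=2)
print(labels)
```

The numeric cluster labels are arbitrary; only the grouping of cells is meaningful, so a downstream step would map each cluster to a cell type from its centroid.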
In some instances, machine learning algorithms comprise a reinforcement learning algorithm. Examples of reinforcement learning algorithms include, but are not limited to, temporal difference learning, Q-learning and Learning Automata. Alternatively, the machine learning algorithm may comprise Data Pre-processing.
F. Kits
The above-described assay reagents, including probes, circle oligonucleotides, bridge oligonucleotides, and imager oligonucleotides, and optionally reagents for performing ligation and rolling circle amplification can be provided in kits, with suitable instructions and other necessary reagents, in order to conduct the assays for detecting target nucleic acids (e.g., DNA or RNA transcripts) as described above. The kit will normally contain in separate containers the probes, circle oligonucleotides, bridge oligonucleotides, and imager oligonucleotides, and other reagents that the assay format requires. Instructions (e.g., written, CD-ROM, DVD, Blu-ray, flash drive, digital download, etc.) for carrying out the assays usually will be included in the kit. The kit can also contain, depending on the particular assay used, other packaged reagents and materials (e.g., wash buffers and the like). Assays for detecting nucleic acids, as described herein, can be conducted using these kits.
In certain embodiments, the kit comprises one or more oligonucleotide reagents (e.g., probe, circle, bridge, and imager oligonucleotides) comprising a nucleotide sequence selected from the group consisting of SEQ ID NOS:1-464, or sequences displaying at least about 80-100% sequence identity thereto, including any percent identity within this range, such as 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% sequence identity thereto.
Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way.
Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.
Introduction
Here, we report an in situ hybridization technique with performance characteristics that enable rapid and scalable single-cell expression profiling in tissue. Our approach is a simplified variant of the padlock/RCA technique which replaces padlock probes with RNA-templated proximity ligation (Soderberg et al. (2006) Nature Methods 3:995-1000; Frei et al. (2016) Nature Methods 13:269-275) at Holliday junctions (Labib et al. (2013) Analytical Chemistry 85:9422-9427); hence, we term it proximity ligation in situ hybridization (PLISH).
As demonstrated below, PLISH generates data of exceptionally high signal-to-noise. Multiplexed hybridization and signal amplification of all target RNA species is carried out in a single parallel reaction, and the RNAs are then localized with rapid label-image-erase cycles. PLISH exhibits high detection efficiency because it probes multiple sites in each target RNA, and high specificity because of the proximity ligation mechanism. PLISH utilizes only commodity reagents, so it can be scaled up inexpensively to cover many genes. It works well on conventional formalin-fixed tissues that have been cryo- or paraffin-embedded, and can be performed concurrently with immunostaining, making it extremely versatile.
Using the murine lung as a characterized model tissue, we show that multiplexed PLISH can rediscover and spatially map the distinct cell types of a tissue in an automated and unsupervised fashion. An unexpected discovery from this experiment is that murine Club cells separate into two populations that differ molecularly and segregate anatomically. PLISH constitutes a novel, single cell spatial-profiling technology that combines high performance, versatility and low cost. Because of its technical simplicity, it will be accessible to a broad scientific community.
Results
Proximity Ligation In Situ Hybridization (PLISH)
Proximity ligation at Holliday junctions offers a simple mechanism for the amplified detection of RNA (Labib et al., supra). First, a transcript is targeted with a pair of oligonucleotide ‘H’ probes designed to hybridize at adjacent positions along its sequence (
To implement PLISH, we adapted protocols for antibody-based proximity ligation (Soderberg et al., supra). The technique utilizes conventional oligonucleotides, two commercially available enzymes, and procedures familiar to molecular biologists. The ligase and polymerase enzymes are less than half the size of an immunoglobulin G, and they diffuse at least as rapidly as the 60 mer DNA hairpins used for HCR amplification (Choi et al. (2014) ACS Nano 8:4284-4294; Joubert et al. (2003) Journal of Biological Chemistry 278:25341-25347; Lapham et al. (1997) Journal of Biomolecular NMR 10:255-262; Modrich et al. (1973) The Journal of Biological Chemistry 248:7495-7501). Our initial studies produced bright puncta that were absent if any of the oligonucleotide or enzyme reagents was withheld. The signal from the individual RCA amplicons exceeded cellular and tissue fluorescence background by more than 30-fold, rendering autofluorescence inconsequential (
Highly Specific and Sensitive Detection of RNA Transcripts
The requirement for coincident hybridization of two probes at adjacent sites in an RNA transcript should make PLISH highly specific. To evaluate this, we performed several experiments. First, we used PLISH to detect the transcription factor SRY-box 4 (SOX4) in cultured HCT116 cells. A pool of ten H probe sets exhibited much higher RNA detection efficiency than a single H probe set, as expected (
Second, we tested the sequence-specificity of the PLISH signal in tissue by pre-incubating samples with antisense ‘blocking’ oligonucleotides complementary to the target RNA at the H probe hybridization sites. For these experiments, we stained mouse lung sections for secretoglobin 1a1 (Scgb1a1), a marker of airway Club cells. Antisense oligonucleotides drastically attenuated the number of PLISH puncta, whereas scrambled blocking oligonucleotides of the same length had no apparent effect (
Third, we analyzed murine lung sections for the co-localization of the mRNA transcript and protein product of surfactant protein C (Sftpc), which is expressed in alveolar epithelial type II (AT2) cells. Of the cells that were positive for PLISH signal, 98.5% were also positive for antibody staining (n=184,
To quantify the sensitivity and accuracy of RNA detection, we benchmarked PLISH measurements against a reference-standard dataset of single-cell, quantitative reverse transcription polymerase chain reaction (qPCR) and RNA-seq measurements on HCT116 cells (Wu et al. (2014) Nature Methods 11:41-46). For genes with fragment-per-kilobase-per-million-read (FPKM) values greater than one, the single-cell qPCR technique detected mRNA in >90% of the cells (
To quantify RNA-detection efficiency in tissue, we marked a set of axin 2 (Axin2) transcripts in mouse lung sections using an HCR-amplified smISH procedure (Choi et al. (2014) ACS Nano 8:4284-4294) and then determined the fraction of the marked transcripts that could be identified by PLISH. We chose the Axin2 gene because of its low expression level in the lung. HCR detected a sparse population of cells with one to two puncta each (the HCR detection efficiency was low because we used a single HCR probe rather than 24). PLISH puncta generated with a pool of four H probe pairs co-localized with 32% of the HCR puncta (
Visualization of Molecular and Histological Features in Tissue
We next characterized the performance of PLISH for low-plex RNA localization in tissues. This experimental format uses a disposable hybridization chamber that is sealed to a coverslip or slide surrounding a tissue section (
First, we analyzed murine lung sections for RNA expression of the ciliated-cell marker Forkhead box J1 (Foxj1), and the Club-cell marker Scgb1a1. Foxj1 is a low-abundance transcript with an FPKM value of 10 in ciliated cells, as measured by scRNA-seq (Treutlein et al. (2014) Nature 509:371-375). We observed single cells with multiple discrete Foxj1 puncta in the terminal bronchiolar epithelium, surrounded by numerous strongly Scgb1a1 positive cells (
Second, we analyzed human lung FFPE sections for RNA expression of SCGB1A1, and for protein expression of the basal cell marker, Keratin 5 (KRT5). To do this, we appended two antibody incubation steps to the standard PLISH protocol. Strongly SCGB1A1 positive cells were localized to the lumen of the airways, overlying KRT5 positive cells (
Third, we analyzed murine lung sections for RNA expression of three genes: the AT2 cell marker Sftpc, the macrophage-enriched marker Lysozyme 2 (Lyz2), and Scgb1a1. Overlays of the three channels provided a striking visual depiction of the different cell types. Macrophages were bright in the Lyz2 channel, but absent in the other channels (
We also evaluated how PLISH performs in primary samples of diseased human tissue, to assess whether it will be useful for molecular analysis of the many human diseases that cannot be accurately modeled in animals. One example is idiopathic pulmonary fibrosis (IPF), a fatal lung disease of unknown pathogenesis (Travis et al. (2013) American Journal of Respiratory and Critical Care Medicine 188:733-748). The diagnosis of IPF is based on the presence of specific histological features, including clusters of spindle-shaped fibroblasts, stereotyped ‘honeycomb’ cysts, and epithelial cell hyperplasia. In this regard, single-cell profiling approaches that operate on dissociated tissue (Xu et al. (2016) JCI Insight 1:e90558) are intrinsically limited because they cannot correlate molecular data with cytologic and spatial features. As a preliminary test, we used PLISH to analyze RNA expression of the AT2 cell marker SFTPC in resected lung tissue from control and IPF patients. In contrast to the uniformly cuboidal SFTPC-expressing AT2 cells distributed throughout alveoli of non-IPF lungs (
Multiplexed and Iterative PLISH in Tissues
Highly multiplexed measurement of different RNA species requires iterated data collection cycles, since conventional fluorescence microscopy only provides up to five channels (
To demonstrate and validate the multiplexing capacity of PLISH, we co-localized the mRNA of eight selected genes in 2900 single cells from an adult mouse lung (
To quantify the expression of all eight genes on a per-cell basis, we created a PLISH-specific pipeline in CellProfiler, an open-source software package (Kamentsky et al. (2011) Bioinformatics 27:1179-1180). The pipeline first identified nuclei in the DAPI channel, which were used as anchor points for expansion to full-cell assignments. Fortuitously, the bulk of the detected mRNAs in AT1 cells, which have an extremely flat and broad morphology, were clustered around the nuclei. We summed the PLISH signal for each gene in the nuclear and peri-nuclear regions of each cell, and saved the results as single-cell expression profiles indexed on anatomical location. We also created a utility to pseudocolor cells in a transmitted light micrograph according to their inferred cell type (see below), so that we could visualize the relationship between cellular gene expression and anatomical localization.
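The per-cell summation step can be illustrated with a short sketch. This is not the CellProfiler pipeline itself; it assumes that a labeled cell mask (0 for background, a positive integer cell ID for each nuclear and peri-nuclear region) and one intensity image per gene are already available as nested lists, and the function name and toy arrays are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Image = List[List[float]]


def per_cell_expression(cell_mask: List[List[int]],
                        channels: Dict[str, Image]) -> Dict[int, Dict[str, float]]:
    """Sum each gene's signal over the pixels assigned to each cell.

    cell_mask holds 0 for background and a positive cell ID elsewhere.
    """
    profiles: Dict[int, Dict[str, float]] = defaultdict(
        lambda: {gene: 0.0 for gene in channels}
    )
    for r, row in enumerate(cell_mask):
        for c, cell_id in enumerate(row):
            if cell_id == 0:
                continue  # background pixel, not assigned to any cell
            for gene, image in channels.items():
                profiles[cell_id][gene] += image[r][c]
    return dict(profiles)


# Toy 3x4 field of view containing two cells (IDs 1 and 2).
mask = [[1, 1, 0, 2],
        [1, 0, 0, 2],
        [0, 0, 2, 2]]
images = {
    "Sftpc":   [[5, 3, 9, 0], [2, 0, 0, 0], [0, 0, 1, 0]],
    "Scgb1a1": [[0, 0, 0, 4], [0, 0, 7, 6], [0, 0, 2, 1]],
}
profiles = per_cell_expression(mask, images)
print(profiles)
```

Signal that falls on background pixels is ignored in this sketch, which mirrors the restriction of the measurement to the nuclear and peri-nuclear regions.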
Automated Cell Classification and Insights into Lung Biology
An important scientific challenge is to identify and map all of the molecularly distinct cell types that make up complex tissues, and in situ single-cell profiling should be a powerful tool for working towards this goal. As a proof-of-concept for this, we asked whether known lung cell types could be rediscovered by an automated and unsupervised analysis of our multiplexed PLISH data set. We used two standard data analysis tools, K-means clustering (
For a higher-resolution analysis of cellular gene expression, we examined the expression pattern of individual genes in re-colored t-SNE plots (
To validate the PLISH results, we pseudocolored the cells in transmitted-light images according to their class (
Discussion
PLISH represents a practical technology for multiplexed expression profiling in tissues. It combines high performance in four key areas: specificity, detection efficiency, signal-to-noise, and speed. The specificity derives from coincidence detection, which requires two probes to hybridize next to one another for signal generation. Efficient detection of low-abundance transcripts is accomplished by targeting multiple sites along the RNA sequence. Enzymatic amplification produces extremely bright puncta and allows many different RNA transcripts to be marked with unique barcodes in one step. The different RNA transcripts can then be iteratively detected to rapidly generate high dimensional data.
While low-plex PLISH on a handful of different genes can be valuable, the PLISH technology is also scalable, without requiring specialized microscopes (or other equipment), software, or computational expertise. The oligonucleotides and enzymes are inexpensive and commercially available from multiple vendors. The H probes are the cost-limiting reagent, but can be synthesized in pools (Murgha et al. (2014) PLoS One 9:e94752; Beliveau et al. (2012) PNAS 109:21301-21306). Assuming five pairs of H probes for each target RNA species, and 20 cents for a 40 mer oligonucleotide, the cost of PLISH reagents amounts to $3 per gene. It should therefore be practical to simultaneously interrogate entire molecular systems, such as signaling pathways or super-families of adhesion receptors. The high specificity and signal-to-noise of PLISH will be advantageous for deep profiling, where non-specific background increases with increasingly complex mixtures of hybridization probes (Moffitt et al. (2016) PNAS 113:11046-11051).
Our initial studies demonstrate PLISH's capacity for rapid, automated and unbiased cell-type classification, and illustrate how it can complement single-cell RNA sequencing (scRNA-seq). scRNA-seq offers greater gene depth than in situ hybridization approaches, but it is less sensitive, fails to capture spatial information, and induces artefactual changes in gene expression during tissue dissociation (van den Brink et al. (2017) Nature Methods 14:935-936; Lee et al. (2015) Nature Protocols 10:442-458). PLISH provides the missing cytological and spatial information, and it is applied to intact tissues. Going forward, sequencing can be used to nominate putative cell types and molecular states based on the coordinate expression of ‘signature genes’, and multiplexed PLISH can be used to distinguish true biological variation from technical noise and experimentally-induced perturbations. Importantly, multiplexed PLISH provides the tissue context of distinct cell populations, which is essential for understanding the higher-order organization of intact systems like solid tumors and developing organs. In diseases like IPF where morphology and gene expression are severely deranged (Xu et al. (2016) JCI Insight 1:e90558), histological, cytological and spatial features may even be essential for making biological sense of sequencing data.
Currently, efforts are underway to more deeply characterize cellular states by integrating diverse types of molecular information. We have already demonstrated the combined application of PLISH with conventional immunostaining. Going one step further, oligonucleotide-antibody conjugates make it possible to mix and match protein and RNA targets in a multiplexed format (Weibrecht et al. (2013) Nature Protocols 8:355-372). The generation of comprehensive, multidimensional molecular maps of intact tissues, in both healthy and diseased states, will have a fundamental impact on basic science and medicine.
Materials and Methods
Materials
Unless otherwise specified, all reagents were from Thermo-Fisher and Sigma-Aldrich. Oligonucleotides were purchased from Integrated DNA Technologies. T4 polynucleotide kinase, T4 ligase, USER enzyme and their respective buffers were purchased from New England Biolabs. Nxgen phi29 polymerase and its buffer were purchased from Lucigen.
Abbreviations: BSA, bovine serum albumin; DAPI, 4′,6-diamidino-2-phenylindole; DEPC, diethyl pyrocarbonate; EDTA, ethylenediaminetetraacetic acid; min, minutes; PBS, phosphate buffered saline; PFA, paraformaldehyde; RCA, rolling circle amplification; RT, room temperature. All oligonucleotide sequences are listed in Table 1.
Sample Preparation
HCT116 cells (ATCC; CCL-247) were authenticated by HLA typing and confirmed negative for Mycoplasma contamination using PCR. Cells were grown on poly-lysine coated #1.5 coverslips (Fisherbrand 12-544 G) using standard cell culture protocols until they reached the desired confluency. The cells were rinsed in 1× PBS and fixed in 3.7% formaldehyde with 0.1% DEPC at RT for 20 minutes. The fixed cells were treated with 10 mM citrate buffer (pH 6.0) at 70° C. for 30 minutes, dehydrated in an ethanol series, then enclosed by application of a seal chamber (Grace Biolabs 621505) to the coverslip.
Lungs were collected from adult B6 mice (Jackson Labs) and fixed by immersion in 4% PFA as previously described (Desai et al. (2014) Development 143:3632-3637). Non-IPF human lung tissue was obtained from a surgical resection, and IPF tissue from an explant. All mouse and human research was approved by the Institutional Animal Care and Use Committee and Institutional Review Board, respectively, at Stanford University. The tissues were fixed by immersion in 10% neutral buffered formalin in PBS at 4° C. overnight under gentle rocking, cryoprotected in 30% sucrose at 4° C. overnight, submerged in OCT (Tissue Tek) in an embedding mold, frozen on dry ice, and stored at −80° C. Sections (20 μm) were cut on a cryostat (Leica CM 3050S) and collected on either poly-lysine coated #1.5 coverslips or glass slides (Fisherbrand Superfrost), air dried for 10 minutes, and post-fixed with 4% PFA at RT for 20 minutes. The human lung tissue in
PLISH Probe Design and Preparation
Target RNAs were probed at ˜40 nucleotide detection sites, with 1 to 10 sites per RNA species depending on expression level. NCBI BLAST searches were used to eliminate detection sites that shared 10 or more contiguous nucleotides with a non-target RNA. The detection sites were also selected to minimize self-complementarity as indicated by the IDT OligoAnalyzer. Each detection site was targeted with a pair of H probes designated HL (left H probe) and HR (right H probe). The HL and HR probes included ˜20 nucleotide binding sequences that were complementary respectively to the 5′ and 3′ halves of the detection site. The binding sequences were chosen so that the 5′ end of the HL binding sequence and the 3′ end of the HR binding sequence would abut at a 5′-AG-3′ or a 5′-TA-3′ dinucleotide in the target RNA. The lengths of the binding sequences were adjusted so that the melting temperature of the corresponding DNA duplex would fall between 45 and 65° C. as computed by the IDT OligoAnalyzer using default settings of 0.25 μM oligo concentration and 50 mM salt concentration. To generate H probes, suitable HL and HR binding sequences were concatenated at their respective 5′ and 3′ ends with overhang sequences taken from one of eight modular design templates (Table 1). The left and right overhang sequences in each design template were complementary to a specific bridge (B) and circle (C) oligonucleotide, which directed a desired fluorescent readout. The design templates reported here utilized a common 31 base oligonucleotide for the bridge. Following previous work (Soderberg et al., supra), the circle oligonucleotides were ˜60 bases long with 11 base regions of complementarity to cognate H probes on either end. The circle sequences were chosen to minimize self-complementarity. Each imager oligonucleotide was complementary to a barcode embedded in one of the C oligonucleotides, allowing unique detection of the corresponding RCA amplicon.
The H-probe oligonucleotides were ordered on a 25 nanomole scale with standard desalting. The B and C oligonucleotides were ordered on a 100 nanomole scale with HPLC purification and phosphorylated with T4 polynucleotide kinase according to the manufacturer's recommendations. Imager oligonucleotides were purchased either as HPLC-purified fluorophore conjugates (A488, Texas Red, Cy3, Cy5), or as amine-modified oligonucleotides that were subsequently coupled to Pacific Blue-NHS ester according to the manufacturer's recommendations.
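The junction rule for the HL and HR binding sequences can be illustrated with a short sketch. It scans a target sequence for 5′-AG-3′ or 5′-TA-3′ dinucleotides and returns the two binding sequences that would abut at each one. It is a simplification resting on stated assumptions: the arms have a fixed length rather than being adjusted for melting temperature, no BLAST or self-complementarity filtering is performed, overhang sequences are not appended, and the names `ProbePair` and `candidate_probe_pairs` and the toy target are hypothetical.

```python
from typing import Iterator, NamedTuple

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")
JUNCTIONS = ("AG", "TA")


class ProbePair(NamedTuple):
    junction_index: int  # 0-based index of the first base of the 3' half
    hl_binding: str      # DNA, 5'->3', pairs with the 5' half of the site
    hr_binding: str      # DNA, 5'->3', pairs with the 3' half of the site


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_probe_pairs(target: str, arm: int = 20) -> Iterator[ProbePair]:
    """Yield binding-sequence pairs whose arms abut at an AG or TA junction."""
    target = target.upper().replace("U", "T")
    for i in range(arm, len(target) - arm + 1):
        if target[i - 1:i + 1] in JUNCTIONS:
            left_half = target[i - arm:i]    # 5' half of the detection site
            right_half = target[i:i + arm]   # 3' half of the detection site
            # Probes are antiparallel to the target, so the 5' end of the HL
            # arm and the 3' end of the HR arm both sit at the junction.
            yield ProbePair(i, reverse_complement(left_half),
                            reverse_complement(right_half))


# Toy 15-nucleotide target with short 5-nucleotide arms, for readability.
pairs = list(candidate_probe_pairs("GCGCCAAGGTTGCGC", arm=5))
print(pairs)
```

With the toy target, the only qualifying junction is the AG at positions 6-7 (0-based), so a single candidate pair is returned.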
PLISH Barcoding Procedure
Six buffers were used for PLISH barcoding: H-probe buffer (1M sodium trichloroacetate, 50 mM Tris pH 7.4, 5 mM EDTA, 0.2 mg/mL Heparin), bridge-circle buffer (2% BSA, 0.2 mg/mL heparin, 0.05% Tween-20, 1× T4 ligase buffer in RNAse-free water), PBST (PBS+0.1% Tween-20), ligation buffer (10 CEU/μl T4 DNA ligase, 2% BSA, 1× T4 ligase buffer, 1% RNaseOUT and 0.05% Tween-20 in RNAse-free water), labeling buffer (2×SSC/20% formamide in RNAse-free water), and RCA buffer (1 U/μl Nxgen phi29 polymerase, 1× Nxgen phi29 polymerase buffer, 2% BSA, 5% glycerol, 10 mM dNTPs, 1% RNaseOUT in RNAse-free water).
An H cocktail was prepared by mixing H probes in H-probe buffer at a final concentration of 100 nM each. If an RNA was targeted with more than five probe sets, the concentrations of the H probes for that RNA were pro-rated so that their sum did not exceed 1000 nM. A BC cocktail was also prepared by mixing B and C oligonucleotides in bridge-circle buffer at a final concentration of 6 μM each.
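The pro-rating rule for the H cocktail can be expressed as a short calculation. The sketch below assumes that each probe set consists of two probes (one HL and one HR) and that the 1000 nM ceiling is shared equally among all probes for a given RNA; the function name and parameter names are illustrative only.

```python
def h_probe_concentration_nm(probe_sets: int,
                             default_nm: float = 100.0,
                             cap_nm: float = 1000.0) -> float:
    """Per-probe concentration for one RNA targeted by `probe_sets` HL/HR pairs."""
    if probe_sets < 1:
        raise ValueError("need at least one probe set")
    n_probes = 2 * probe_sets  # each set is one HL and one HR probe
    # Use the default unless the summed concentration would exceed the cap.
    return min(default_nm, cap_nm / n_probes)


for sets in (1, 5, 8, 10):
    print(sets, h_probe_concentration_nm(sets))
```

Under these assumptions, up to five probe sets are used at 100 nM per probe, and ten probe sets are used at 50 nM per probe.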
Single-step barcoding was performed in sealed chambers. The workflow consisted of three steps: (i) The sample was incubated in the H cocktail at 37° C. for 2 hours. The sample was then washed 4×5 minutes with H-probe buffer at RT, and incubated in the BC cocktail at 37° C. for 1 hour. (ii) Following a 5-minute wash with PBST at RT, the sample was incubated in ligation buffer at 37° C. for 1 hour. (iii) The sample was washed 2×5 minutes with labeling buffer at RT, and washed with 1× Nxgen phi29 polymerase buffer at RT for 5 minutes. The sample was then incubated in RCA buffer at 37° C. for 2 hours (typical for cultured cells) to overnight (typical for tissue). Finally, the sample was washed 2×5 minutes with labeling buffer.
Imaging
Barcoded PLISH samples were fluorescently labeled by two different procedures, designated ‘washout’ and ‘fast’. In the washout procedure, the sample was incubated with imager oligonucleotides in imager buffer (labeling buffer with 0.2 mg/mL heparin) at a final concentration of 100 nM each for 30 minutes, and then washed 2×5 minutes with PBST at RT. In the fast procedure, the sample was incubated for 5 minutes with imager oligonucleotides in imager buffer at a final concentration of 3 nM each, and then imaged immediately. Samples that did not require label-image-erase cycles were stained with DAPI (stock 1 mg/ml; diluted 1:1000 in PBS) for 5 minutes and mounted in H-1000 Vectashield mounting medium (Vector).
Data were collected by confocal microscopy (Leica Sp8 and Zeiss LSM 800) using a 40× oil immersion or a 25× water immersion objective lens. 20 μm z-stacks were scanned, and maximum projection images were saved for analysis. For 5-color experiments, DAPI was added after the Pacific Blue channel had been imaged, and the Texas Red and Cy3 channels were linearly unmixed using Zeiss software. Transmitted light images were acquired on a Leica Sp8 confocal microscope using the 488 nm Argon laser and the appropriate PMT-TL detector. Images from serial rounds of data collection were aligned using the nuclear stain from each round as a fiducial marker. Unless otherwise stated, imaging data of cells and mouse lung tissue are representative of three independent experiments with ˜4 fields of view each. Imaging data of human lung tissue are representative of two independent experiments with ˜4 fields of view each.
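For clarity, a maximum projection reduces a z-stack to a single image by keeping, at each pixel position, the highest intensity found in any optical section. The sketch below shows this operation on nested lists; the function name and the toy stack are assumptions for illustration, and microscope software or array libraries would normally perform this step.

```python
from typing import List

Plane = List[List[float]]


def max_projection(stack: List[Plane]) -> Plane:
    """Collapse a z-stack of equally sized 2D planes pixel by pixel."""
    if not stack:
        raise ValueError("z-stack is empty")
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [max(plane[r][c] for plane in stack) for c in range(cols)]
        for r in range(rows)
    ]


# Toy stack: three optical sections of a 2x3 field of view.
z_stack = [
    [[0, 5, 1], [2, 0, 0]],
    [[3, 1, 1], [0, 9, 0]],
    [[0, 0, 4], [1, 2, 0]],
]
projected = max_projection(z_stack)
print(projected)
```

A punctum that is in focus in only one section therefore remains visible in the projected image.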
PLISH and HCR Co-Localization
HCR was performed following a published protocol (Choi et al., supra) with probes that targeted two sites covering nucleotides 621-670 and 1159-1208 in the mouse Axin2 transcript, and Alexa-Fluor 488-/AlexaFluor 647-labeled amplifier oligonucleotides. The samples were then processed for PLISH with H probes targeting four sites covering nucleotides 347-386, 1878-1917, 2412-2451 and 2956-2995 in the Axin2 transcript, and imaged using a Cy3-labeled imager oligonucleotide.
PLISH with Concurrent Immunohistochemistry
PLISH barcoding was performed as described above. Subsequently, the sample was washed 3×5 minutes with PBST at RT, and incubated in blocking solution (50 μl/ml [5%] normal goat serum, 1 μl/ml [0.1%] Triton X-100, 5 mM EDTA and 0.03 g/ml [3%] BSA in PBS) at RT for 1 hour. The sample was then incubated with primary antibody (Rabbit anti-pro-Sftpc, Millipore, 1:500 or Rabbit anti-Cytokeratin 5, Abcam Ab193895, 1:400) in blocking solution at 37° C. for 2 hours under gentle rocking, washed 4×5 minutes with PBST at RT, and incubated with secondary antibody (Goat anti-Rabbit-Cy5, Jackson Lab, 1:250) and DAPI (1:1000) in blocking solution at RT for 1 hour. The sample was washed 3×5 minutes in PBST at RT and mounted in H-1000 Vectashield.
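By way of example only, the component amounts for a batch of the blocking solution may be computed from the bracketed percentages given above. The EDTA stock concentration (500 mM) and the function name are assumptions introduced for illustration, and the volume displaced by the dissolved BSA is ignored.

```python
def blocking_solution(total_ml, edta_stock_mM=500.0):
    """Return component amounts for `total_ml` of blocking solution.

    edta_stock_mM is an assumed stock concentration, not stated in the text.
    """
    serum_ml = 0.05 * total_ml                  # 5% (v/v) normal goat serum
    triton_ml = 0.001 * total_ml                # 0.1% (v/v) Triton X-100
    bsa_g = 0.03 * total_ml                     # 3% (w/v) BSA, i.e. 0.03 g/ml
    edta_ml = total_ml * 5.0 / edta_stock_mM    # C1*V1 = C2*V2 for 5 mM final
    # PBS makes up the remaining volume (BSA volume is neglected).
    pbs_ml = total_ml - serum_ml - triton_ml - edta_ml
    return {
        "normal_goat_serum_ml": serum_ml,
        "triton_x100_ml": triton_ml,
        "bsa_g": bsa_g,
        "edta_stock_ml": edta_ml,
        "pbs_ml": pbs_ml,
    }


# Amounts for a 10 ml batch.
recipe = blocking_solution(10.0)
```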
Antisense Blocking Oligonucleotide
Mouse lung tissue cryosections were collected on slides, post-fixed and processed as described above. The samples were incubated with a 60-base oligonucleotide complementary to nucleotides 219-278 in the Scgb1a1 mRNA, or with a scrambled 60-base oligonucleotide, at 100 nM final concentration in H-probe buffer at 37° C. for 2 hours. The samples were then washed 2×5 minutes with H-probe buffer at RT, and processed for PLISH using H probes that targeted nucleotides 229-268 in the Scgb1a1 transcript.
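For illustration, the design of an antisense blocking oligonucleotide from 1-based transcript coordinates, and the check that the H-probe target site lies within the blocked region, may be sketched as below. The helper names and the toy sequence are hypothetical; the actual Scgb1a1 sequence is not reproduced here.

```python
# Watson-Crick complement, accepting RNA (U) or DNA (T) input, DNA output.
COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def antisense_dna(transcript, start, end):
    """Return the DNA oligo (5'->3') complementary to transcript[start..end].

    Coordinates are 1-based and inclusive, as in the text.
    """
    target = transcript.upper()[start - 1:end]
    return target.translate(COMPLEMENT)[::-1]


def site_length(site):
    """Length in nucleotides of an inclusive (start, end) site."""
    return site[1] - site[0] + 1


def is_covered(probe_site, blocker_site):
    """True if the probe target site lies entirely inside the blocked region."""
    return blocker_site[0] <= probe_site[0] and probe_site[1] <= blocker_site[1]


blocker_site = (219, 278)   # region bound by the 60-base blocking oligonucleotide
probe_site = (229, 268)     # region targeted by the H probes

toy_oligo = antisense_dna("AUGGCC", 2, 5)   # toy sequence, for illustration only
covered = is_covered(probe_site, blocker_site)
```

With the coordinates given above, the blocked region spans 60 nucleotides and fully contains the 40-nucleotide H-probe target site, leaving 10 nucleotides of flanking coverage on each side.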
Signal Erasure for Iterative Cycles of PLISH
To perform enzymatic erasure, 15-20 base imager oligonucleotides were ordered with the dT nucleotides replaced by dU nucleotides. Following imaging, the signal was erased by incubating the sample with 0.1 U/mL USER enzyme in 1× USER enzyme buffer at 37° C. for 20 minutes, followed by washing 2×3 minutes with PBST at RT. To perform rapid erasure, short 10-11 base oligonucleotides were ordered. Following imaging, the signal was erased by incubating the sample with PBST at 37° C. for 15 minutes.
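As a non-limiting sketch, the dT-to-dU substitution used for enzymatic erasure may be expressed as a simple sequence transformation. The imager sequence shown is hypothetical. The fragment calculation assumes that the USER enzyme excises every dU position, and is included only to illustrate how short the remaining fragments become.

```python
def to_uracil_imager(sequence):
    """Replace every dT in an imager oligonucleotide with dU."""
    return sequence.upper().replace("T", "U")


def fragments_after_excision(du_sequence):
    """Split the oligo at each dU, assuming every dU is excised."""
    return [frag for frag in du_sequence.split("U") if frag]


imager = "ACTGATCCTAGTCATGCA"            # hypothetical 18-base imager
du_imager = to_uracil_imager(imager)
fragments = fragments_after_excision(du_imager)
longest = max(len(frag) for frag in fragments)
```

In this hypothetical case the longest remaining fragment is 3 bases, far shorter than the 10-11 base oligonucleotides that are already removable by a PBST wash in the rapid erasure procedure.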
Correlative Immunostaining
Lungs collected from B6 and Lyz2+/EGFP mouse strains (Faust et al., (2000) Blood 96:719-726) were fixed and immunostained as whole mounts as previously described (Desai et al., supra). Primary antibodies were chicken anti-GFP (Abcam ab13970), rat anti-Ecad/Cdh1 (Invitrogen ECCD-2), goat anti-Scgb1a1 (gift from Barry Stripp), rabbit anti-pro-Sftpc (Chemicon AB3786), and rat anti-Ager (R&D Systems MAB1179). Fluorophore-conjugated secondary antibodies raised in Goat (Invitrogen) or Donkey (Jackson Labs) were used at 1:250 and DAPI at 1:1000.
Data Analysis
FIJI was used to pseudocolor unprocessed micrographs for display as three-color overlays. A custom CellProfiler (Kamentsky et al. (2011) Bioinformatics 27:1179-1180) pipeline was created to measure RNA signal intensities at the single-cell level. Briefly, the centers of cell nuclei were first identified as maxima in a filtered DAPI image, and each was associated with a numerical index. Nuclear boundaries were assigned by a propagation algorithm, and then expanded by ~1 μm to define sampling areas. The following data were then recorded: (i) average pixel intensities for each data channel over each sampling area; (ii) the coordinates of the sampling areas; (iii) shape metrics for the corresponding nuclei; and (iv) an image with the boundary pixels of each nucleus set equal to the associated index value. For each RNA species, the PLISH data were first normalized onto a 0:10 scale by dividing by the largest value observed in any cell over all of the fields of view, and then multiplying by ten. The data were then log-transformed onto a −1:1 scale by the operation: transformed_data=log(0.1+normalized_data). Custom Matlab scripts were used to perform hierarchical clustering of the log-transformed single-cell expression profiles, to generate heatmaps, and to create images with the boundary pixels of each nucleus colored according to a cluster assignment. Custom R scripts were used for k-means clustering and to make t-SNE projection plots.
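The normalization and log transformation described above may be sketched as follows, by way of example only. The sketch assumes NumPy and assumes a base-10 logarithm, which is consistent with the stated −1:1 scale because log10(0.1) = −1 and log10(10.1) ≈ 1.004; the function name is hypothetical.

```python
import numpy as np


def normalize_and_log(intensities):
    """Normalize one RNA species to a 0:10 scale, then log-transform.

    `intensities` holds the per-cell average intensities for a single RNA
    species, pooled over all fields of view. Base 10 is an assumption.
    """
    values = np.asarray(intensities, dtype=float)
    normalized = 10.0 * values / values.max()   # largest cell maps to 10
    return np.log10(0.1 + normalized)           # 0 maps to -1, 10 to ~1


# Hypothetical per-cell intensities for one RNA species.
transformed = normalize_and_log([0.0, 50.0, 100.0])
```

The 0.1 offset keeps cells with no detectable signal at a finite value of −1 rather than at negative infinity.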
While the preferred embodiments of the invention have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US18/23846 | 3/22/2018 | WO | 00 |

Number | Date | Country |
---|---|---|
62475090 | Mar 2017 | US |