This specification relates generally to automated systems and methods for associating single cell imaging with whole genome RNA transcription profiling.
Recent advances in microfluidics and cDNA barcoding have led to a dramatic increase in the throughput of single-cell RNA-Seq (scRNA-seq)[1-5]. However, unlike earlier or less scalable techniques[6-8], these new tools do not offer a straightforward way to directly link phenotypic information obtained from individual, live cells to their expression profiles. Nonetheless, microwell-based implementations of scRNA-seq are compatible with a wide variety of phenotypic measurements including live cell imaging, immunofluorescence, and protein secretion assays[3, 9-12]. These methods involve co-encapsulation of individual cells and barcoded RNA capture beads in arrays of microfabricated chambers. Because the barcoded beads are randomly distributed into microwells, one cannot directly link phenotypes measured in the microwells to their corresponding expression profiles.
The present disclosure provides automated systems and methods for associating single cell imaging data with whole genome RNA transcription profiling.
This specification describes methods and systems for automated single cell imaging and sample preparation that enable association of single cell imaging data with RNA transcriptomics. An example system includes an instrument assembly comprising a fluidics subsystem, a thermal subsystem, and an imaging subsystem including a motorized stage configured for holding and scanning a microwell array. The system includes a control subsystem coupled to the instrument assembly, and the control subsystem is configured for performing operations. The operations include flowing, using the fluidics subsystem, a plurality of cells onto the microwell array, wherein a subset of the cells reside as single cells in the microwells, and obtaining, for each position in the microwell array, one or more first images at the position using the imaging subsystem. The control subsystem is configured for flowing, using the fluidics subsystem, microbeads having a cell identifying optical barcode sequence and an RNA binding sequence onto the microwell array, wherein a subset of the beads reside as a single cell-bead pair in the microwells. The control subsystem is configured for flowing, using the fluidics subsystem, a cell lysis buffer and one or more reagents for RNA library preparation onto the microwell array. The control subsystem is configured for flowing, using the fluidics subsystem, a first of N pools of a plurality of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence attached thereto.
The control subsystem is configured for obtaining, for each position, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes. The control subsystem is configured for repeating the flowing and hybridizing step and the step of obtaining the one or more second images for each of the N pools of probes. The control subsystem is configured for determining, for each position, the cell identifying optical barcode for the position using the second images, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, and for storing a data association between the cell identifying optical barcode for the position and the first image at the position.
An example automated method is provided for associating single cell imaging data with RNA transcriptomics. The method includes flowing, using a fluidics subsystem, a plurality of cells onto a microwell array, wherein a subset of the cells reside as single cells in the microwells; obtaining, for each position of a plurality of positions in a microwell array, one or more first images at the position using an imaging subsystem; flowing, using the fluidics subsystem, a plurality of microbeads having a cell identifying optical barcode sequence and an RNA binding sequence onto the microwell array, wherein a subset of the beads reside as a single cell-bead pair in the microwells; flowing, using the fluidics subsystem, a cell lysis buffer and one or more reagents for RNA library preparation onto the microwell array; and flowing, using the fluidics subsystem, a first of N pools of a plurality of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence attached thereto.
The method further includes obtaining, for each position of the plurality of positions, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes; repeating the flowing and hybridizing step and the step of obtaining the one or more second images for each of the N pools of probes; determining, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, for each position of the plurality of positions, the cell identifying optical barcode for the position using the second images, and storing a data association between the cell identifying optical barcode for the position and the first image at the position; and storing, for each position of the plurality of positions, after receiving nucleic acid sequencing data for each cell identifying optical barcode, a data association between the nucleic acid sequencing data, the cell identifying optical barcode, and the first image associated with the cell identifying optical barcode.
The computer systems described in this specification may be implemented in hardware, software, firmware, or any combination thereof. In some examples, the computer systems may be implemented using a computer readable medium having stored thereon computer executable instructions that when executed by the processor of a computer control the computer to perform steps. Examples of suitable computer readable media include non-transitory computer readable media, such as disk memory devices, chip memory devices, programmable logic devices, and application specific integrated circuits. In addition, a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
An example method is provided for identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone. The method includes: initializing a system, the system comprising: an instrument assembly comprising a fluidics subsystem, a thermal subsystem, and an imaging subsystem, wherein the imaging subsystem comprises a stage configured for holding a microwell array; a control subsystem coupled to the instrument assembly, the control subsystem comprising at least one processor and memory; and using the control subsystem for performing operations. The operations include flowing, using the fluidics subsystem, a plurality of cells onto the microwell array, wherein a subset of the cells reside as single cells in the microwells; obtaining, for each position of a plurality of positions in the microwell array, one or more first images at the position using the imaging subsystem and measuring one or more cell optical phenotypic features; flowing, using the fluidics subsystem, a plurality of microbeads having a cell identifying optical barcode sequence and an RNA binding sequence onto the microwell array, wherein a subset of the beads reside as a single cell-bead pair in the microwells; flowing, using the fluidics subsystem, a cell lysis buffer and one or more reagents for RNA library preparation onto the microwell array; flowing, using the fluidics subsystem, a first of N pools of a plurality of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence; obtaining, for each position of the plurality of positions, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes; repeating the flowing and hybridizing step and the step of obtaining the one or more second images for each of the N pools of probes; determining, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, for each position of the plurality of positions, the cell identifying optical barcode for the position, and storing a data association between the cell identifying optical barcode for the position and the first image at the position; and storing, for each position of the plurality of positions, after receiving nucleic acid sequencing data for each cell identifying optical barcode, a data association between the nucleic acid sequencing data, the cell identifying optical barcode, and the first image associated with the cell identifying optical barcode. The method includes generating a representation of the relationship between the one or more cell optical phenotypic features and the nucleic acid sequencing data associated with each of the first images, wherein a correlation between the single cell phenotypic features and the associated sequencing data identifies a correspondence between single cell optical phenotypes and cell type, lineage, or clone based on the transcriptomics of that single cell.
The automated system and methods of the present disclosure can be used for preparation of nucleic acid sequencing libraries in addition to preparation of RNA libraries. For example, a plurality of microbeads having a cell identifying optical barcode sequence and a primer sequence for capture of cellular nucleic acid can be flowed onto the microwell array. The primer sequence can be an oligo(dT) to capture RNA, mRNA, and non-coding RNA; a random sequence to capture any DNA or RNA; or a specific sequence targeted to a DNA locus or an RNA transcript. In this manner, an automated system is provided for associating single cell imaging with unique optical barcode readout and preparation of nucleic acid libraries. Similarly, an automated method is provided for associating single cell imaging data with nucleic acid sequencing data. In addition, a method for identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone is provided, where a correlation between the single cell phenotypic features and the associated sequencing data identifies a correspondence between single cell optical phenotypes and cell type, lineage, or clone based on the nucleic acid sequence of that single cell.
None of the commercially available systems for single cell isolation and next generation sequencing (NGS) sample preparation is capable of both associating a single cell image with a unique optical barcode readout and preparing single cell RNA libraries, which together enable association of single cell phenotypic data with RNA transcriptomics. This specification describes methods and systems that allow high-quality multi-channel fluorescent imaging combined with automated single cell, whole transcriptome RNA library preparation, e.g., of several thousand single cells per 4-5 hour run. The system can establish single cell whole transcriptome sequencing (‘RNA-Seq’) data quality metrics. In operation, the system automates the capture of single cell images, the association of each single cell image with a corresponding unique optical barcode readout (based on a unique cell identifying optical barcode sequence), and NGS sample preparation; this method is referred to as Single Cell Optical Phenotyping and Expression Sequencing, or SCOPESeq.
In the automated cell imaging and RNA library sample preparation system of the present disclosure, single cells are isolated into individual reaction chambers of a microwell array along with a microbead having a plurality of oligonucleotides conjugated to its surface. Each oligonucleotide includes a cell identifying optical barcode sequence that is unique to that bead as well as an RNA binding sequence for RNA capture after cell lysis. The ‘cell identifying optical barcode sequence’ is also referred to herein interchangeably as a ‘cell identifying optical barcode’. The microbeads having the cell identifying optical barcode and RNA binding sequence are also referred to herein interchangeably as ‘mRNA capture beads’, ‘RNA capture beads’, ‘microbeads’, or in some instances ‘beads’. The oligonucleotides on the microbeads can include an adapter sequence for sequencing (e.g., for sequencing on Illumina platforms), otherwise referred to as a ‘PCR handle’. The microbeads having the cell identifying optical barcode and the complementary optical hybridization probes of the present disclosure are described in International Patent Application No. PCT/US2016/034270, filed on May 26, 2016, and published as WO 2016/191533, and International Patent Application No. PCT/US2018/62650, filed on Nov. 27, 2018, and published as WO 2019/104337, which are hereby incorporated by reference in their entireties. The system is configured for flowing onto the microwell array optical hybridization probes that are complementary to the cell identifying optical barcodes and labeled with an optical label, such as a fluorophore, and for obtaining images of the microwells in response to the probes. Together, the system, the unique cell identifying optical barcodes, and the complementary optical hybridization probes link phenotypic imaging of cells resident on the microwell array with single cell whole transcriptome sequencing.
Typically, the user 108 loads the microwell array 112 into the optional adapter plate and places it into the system 100. The system 100 flows cells from an input reservoir into the microwell array 112 and allows the cells to settle into individual microwells. The system 100 provides scanning, image analysis, and an RNA library sample preparation protocol. Sample preparation can include controlling the fluidics and thermal subsystems.
The controller 124 is programmed for identifying microwells that each contain a single cell. The controller 124 can be programmed for identifying other relevant features in images of the cells within the microwells.
The controller 124 is programmed for causing the system 100 to automate the SCOPESeq process as described below with reference to
The instrument assembly 104 can include a digital camera 140 or other appropriate imaging device, a communications hub (e.g., USB Hub 142), a fluorescence light emitting diode (LED) engine 144, and a light guide 146. The light guide 146 delivers the fluorescence excitation light from the LED engine 144 to the microscope. Alternative configurations include a fiber optic bundle or even direct coupling of the LED engine to the microscope optical train.
The fluorescence LED engine 144 can include multiple narrow-band LEDs configured to illuminate the microwell array 112 by way of the light guide 146.
The instrument assembly 104 includes a microscope subsystem (e.g., an internal inverted microscope) including a motorized XY stage 148 and an autofocus motor 150 configured for translating a microscope objective 152. Typically, the camera 140, the fluorescence LED engine 144, and the microscope subsystem are arranged in an epi-fluorescence configuration. The instrument assembly 104 includes a bright-field LED 158 for illuminating the microwell array 112 during imaging.
The instrument assembly 104 includes a microfluidic subsystem and a thermal subsystem 152. The thermal subsystem 152 can include, for example, a stage heater on the XY stage 148 and a thermal control system for controlling the stage heater. The microfluidic subsystem includes a pump, a pressure controller, and a fluidic manifold. The microfluidic subsystem also includes various appropriate valves, for example, a 6-way valve and a 24-reagent valve for applying reagents from a reagent cartridge. The controller 124 is programmed to control the microfluidic subsystem and the thermal subsystem to automate the SCOPESeq process as described further below with reference to
In some examples, the microfluidic subsystem is configured for microfluidic flow control of, e.g., eighteen different reagents to carry out the biochemical reactions of the SCOPESeq process. In addition, flow rates ranging from, e.g., 10 µL/min to 200 µL/min can be used, each controlled to within 5 µL/min of the set point.
The microfluidic subsystem can include a flow rate unit that provides simple, accurate flow rate measurement and is compatible with a variety of reagents, ranging from organic to aqueous reagents to fluorinated oil. The unit can feed its measurements back to the flow rate controller to provide accurate flow rate control throughout the microfluidic subsystem.
The microfluidic subsystem can include a flow control unit configured for pulse-free flow, which moves fluid without subjecting the cells to shear stress. This unit can have a millisecond response time when switching reagents and can provide bubble-free fluidic flow.
The microfluidic subsystem can include valving units, e.g., two distinct sets of valving units. First, a multi-way bidirectional valve, which can be multiplexed with a second multi-way valve, can be used to switch between different reagents flowing into the microchip. These switching units have a millisecond response time so that they rapidly adjust to a new reagent flow, which provides appropriate flow responses for sealing the microwells with fluorinated oil. Second, multi-way valves may be used to direct reagents from the output port of the microchip to sample collection or waste reservoirs. The multi-way valving units also eliminate any hydrostatic flow, providing the pressurized flow cell that is necessary for imaging and heating.
The microfluidic subsystem can include pressurized reagent reservoirs. For instance, reagent cartridges can be used that ensure appropriate sealing of the reagents and maintain sufficient pressure for fluid flow into the microfluidic subsystem.
The thermal subsystem can include one or more Peltier units that can heat and cool throughout a workflow to provide constant temperature control when necessary, maintaining appropriate conditions for various biochemical assays. In some examples, the thermal subsystem includes a proportional, integral, derivative (PID) thermal control unit, e.g., with an accuracy of 1° C, that provides PID feedback to the Peltier units to set and control appropriate assay temperatures. In some examples, the thermal subsystem includes a stage heater integrated with the XY stage, e.g., as shown in
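The PID feedback described above can be sketched as a discrete-time controller. This is a minimal illustration, not the instrument's control firmware: the gains, the drive range of -1 to +1 standing in for full cooling and full heating, and the first-order thermal model in the hypothetical `simulate_stage` function are all assumptions chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIDController:
    """Discrete-time PID loop driving a Peltier/heater (illustrative gains only)."""

    kp: float
    ki: float
    kd: float
    setpoint: float          # target temperature, °C
    dt: float = 1.0          # sampling interval, s
    out_min: float = -1.0    # full cooling drive (hypothetical normalization)
    out_max: float = 1.0     # full heating drive
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def update(self, measured: float) -> float:
        """Return the clamped drive signal for one temperature reading."""
        error = self.setpoint - measured
        self._integral += error * self.dt
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / self.dt
        self._prev_error = error
        output = self.kp * error + self.ki * self._integral + self.kd * derivative
        if not self.out_min <= output <= self.out_max:
            # Anti-windup: discard this step's integration while the actuator is saturated.
            self._integral -= error * self.dt
            output = min(self.out_max, max(self.out_min, output))
        return output


def simulate_stage(setpoint: float = 42.0, ambient: float = 25.0, steps: int = 600) -> float:
    """Run the loop against a toy first-order thermal model (hypothetical constants)."""
    pid = PIDController(kp=0.2, ki=0.01, kd=0.0, setpoint=setpoint)
    temp = ambient
    for _ in range(steps):
        drive = pid.update(temp)
        # Heating proportional to drive, minus Newtonian heat loss to ambient.
        temp += pid.dt * (2.0 * drive - 0.05 * (temp - ambient))
    return temp
```

With these illustrative constants the simulated stage saturates briefly, then settles well within 1° C of the set point; real gains would be tuned against the actual Peltier and stage hardware.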
Cells are first flowed onto the microwell array to provide a random distribution with a relatively large fraction of cells residing singly in a given microwell. Cells can be imaged on the microwell array at this time to collect phenotypic data as well as to determine those microwells containing a single cell. Cells can be stained in any manner as would be understood by those of ordinary skill in the art to facilitate collection of phenotypic information. Microbeads are then flowed into the chamber. The size of the wells and size of the beads are harmonized to ensure only one bead can reside in a given microwell, and a concentration of beads is used such that greater than, e.g., 75%, 80%, 85%, or 95% of wells contain a single bead.
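Random loading of cells is commonly modelled with Poisson statistics; that model is an assumption here, since the text states only that the distribution is random. Under it, with an average of λ cells per well, a fraction λe^(-λ) of wells hold exactly one cell. The sketch applies to cell loading only; bead loading is governed by size exclusion as described above.

```python
import math


def poisson_occupancy(mean_cells_per_well: float) -> dict:
    """Fractions of wells with 0, 1, or >1 cells under a Poisson loading model."""
    if mean_cells_per_well < 0:
        raise ValueError("mean occupancy must be non-negative")
    lam = mean_cells_per_well
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multiple = 1.0 - p_empty - p_single
    occupied = 1.0 - p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Share of occupied wells that hold exactly one cell
        "single_among_occupied": p_single / occupied if occupied > 0 else 0.0,
    }
```

At λ = 0.1, for example, about 95% of occupied wells hold a single cell but about 90% of wells are empty, which illustrates the trade-off between multiplet rate and throughput.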
Lysis buffer can then be flowed onto the microwell array, immediately followed by perfluorinated oil. The oil effectively “seals” each microwell from aqueous cross-contamination. After lysis, RNA is captured by the beads, and reverse transcription mix can then be flowed onto the microwell array. Once the RNA captured on the beads has been reverse transcribed to cDNA, the complementary optical hybridization probes can be flowed in and imaged to determine bead-cell linkage. The system stores the data association between the cell identifying optical barcode for each microwell position and the first image at that position, and this association is used to link the cell images taken prior to library preparation to the genomic (or transcriptomic) data generated during sequencing.
Consider, for example, the method for optical demultiplexing described in Example 5. In this example method, 96 out of the 256 possible 8-bit binary codes are used (see
To decode the cell barcode sequences from imaging, a ‘cycle-by-cycle’ method can be used, which calls the binary code for each bead based on the bimodal distribution of intensity values across all beads in each hybridization cycle. This method works well when the bead fluorescence intensity values of the ‘one’ state population are well separated from that of the ‘zero’ state population. However, because the beads exhibit auto-fluorescence at shorter wavelengths, the two populations are not clearly separated in the Cy3 emission channel.
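A cycle-by-cycle call can be sketched as choosing one threshold per hybridization cycle from the pooled intensities of all beads. The sketch assumes Otsu's method (maximizing the between-class variance of the two groups) as the rule for splitting the bimodal distribution; the text does not say which rule Example 5 uses, and the function names and toy intensities are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def otsu_threshold(values: Sequence[float]) -> float:
    """Threshold that maximizes the between-class variance of a 1-D sample."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    best_split, best_score = 1, -1.0
    for i in range(1, n):
        w_low, w_high = i / n, (n - i) / n
        score = w_low * w_high * (fmean(xs[i:]) - fmean(xs[:i])) ** 2
        if score > best_score:
            best_split, best_score = i, score
    # Place the threshold midway between the two groups
    return (xs[best_split - 1] + xs[best_split]) / 2.0


def call_cycle_by_cycle(intensities: Sequence[Sequence[float]]) -> List[str]:
    """intensities[b][c] is bead b's intensity in hybridization cycle c."""
    n_cycles = len(intensities[0])
    thresholds = [otsu_threshold([bead[c] for bead in intensities]) for c in range(n_cycles)]
    return [
        "".join("1" if bead[c] > thresholds[c] else "0" for c in range(n_cycles))
        for bead in intensities
    ]
```

Because one threshold serves every bead in a cycle, a bead whose ‘zero’ state is raised by autofluorescence can land above it, which is the failure mode described for the Cy3 channel.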
To accurately decode the cell barcode sequences from imaging, the system can utilize a modified ‘bead-by-bead’ fluorescence intensity analysis strategy. The cell barcode sequences of each bead are determined by sorting the eight intensity values in ascending order, calculating the relative intensity change between each pair of adjacent values, establishing a threshold based on the largest relative intensity change to assign a binary code, and mapping the binary code to the actual cell barcode sequence (see
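The bead-by-bead strategy can be sketched as follows. It assumes that ‘relative intensity change’ means (next - current) / current between adjacent sorted values and that every valid code contains at least one ‘one’ and one ‘zero’. The codebook passed to the hypothetical `decode_bead` function stands in for the table that maps the codes in use to cell barcode sequences.

```python
from typing import Dict, Optional, Sequence


def bead_by_bead_code(intensities: Sequence[float], eps: float = 1e-9) -> str:
    """Call one bead's binary code from its per-cycle intensities (cycle order kept)."""
    if len(intensities) < 2:
        raise ValueError("need intensities from at least two cycles")
    # Cycle indices ordered by ascending intensity
    order = sorted(range(len(intensities)), key=lambda c: intensities[c])
    ranked = [intensities[c] for c in order]
    # Relative change between each pair of adjacent sorted values
    rel_change = [(ranked[i + 1] - ranked[i]) / max(ranked[i], eps) for i in range(len(ranked) - 1)]
    # The largest jump separates the "zero" cycles (below) from the "one" cycles (above)
    split = max(range(len(rel_change)), key=rel_change.__getitem__)
    on_cycles = set(order[split + 1:])
    return "".join("1" if c in on_cycles else "0" for c in range(len(intensities)))


def decode_bead(intensities: Sequence[float], codebook: Dict[str, str]) -> Optional[str]:
    """Map the called code to a cell barcode; None if the code is not in the codebook."""
    return codebook.get(bead_by_bead_code(intensities))
```

Because each bead is thresholded against its own values, the call is unchanged when all of a bead's intensities are scaled by the same factor, so beads of differing overall brightness are handled consistently.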
Example 5 describes a comparison of the cycle-by-cycle and bead-by-bead methods. In datasets PJ070 and PJ069, 46% and 57% of scRNA-seq profiles, respectively, are linked with cell images using the ‘bead-by-bead’ method, compared with only 24% and 37% using the ‘cycle-by-cycle’ method. In both datasets, the fraction of linked cells increases by at least 20 percentage points with the ‘bead-by-bead’ method (
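The linked fractions reported for Example 5 can be checked directly. The gains are absolute differences in the linked fraction (percentage points); expressed as ratios, the bead-by-bead method links roughly 1.9 and 1.5 times as many profiles:

```latex
\begin{aligned}
\text{PJ070:}\quad & 46\% - 24\% = 22 \text{ percentage points}, && 46/24 \approx 1.92 \\
\text{PJ069:}\quad & 57\% - 37\% = 20 \text{ percentage points}, && 57/37 \approx 1.54
\end{aligned}
```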
When the oil is washed out after lysis, the lysate is completely removed from the microwells, which appear dark when imaged. This QC step confirms that the microwell array has been washed successfully and that the reverse transcription (RT) mix can contact every bead (the RNA is attached to the beads at this point and therefore cannot be washed out or cause cross-contamination). After the system 100 completes its operations, the beads are removed and can be pooled for further cDNA library preparation, including DNA amplification followed by nucleic acid sequencing. An electropherogram in
The method 800 includes flowing cells onto the microwell array of the system 100 (802) and obtaining, for each position in the microwell array, one or more first images at the position using an imaging subsystem (804). The first images can depict, e.g., cells loaded into the microwells of the array and information about the phenotype of the cells. Each image is associated with a corresponding position of the microwell in the array. The position can be specified, e.g., as an X-Y coordinate on the microwell array. In some examples, the method 800 includes determining, for each position, a number of cells depicted in a microwell corresponding to the position using the first image of the position. This allows for downstream elimination of data for microwells containing more than one cell.
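The downstream elimination of multi-cell microwells can be sketched as a classification of positions by cell count. The counts are assumed to come from a segmentation of the first images that is not shown; the `Position` tuple and the category names are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

Position = Tuple[int, int]  # (x, y) microwell index; hypothetical convention


def classify_wells(cell_counts: Mapping[Position, int]) -> Dict[str, List[Position]]:
    """Group microwell positions by the number of cells seen in their first image."""
    classes: Dict[str, List[Position]] = {"empty": [], "single": [], "multiplet": []}
    for pos in sorted(cell_counts):
        n = cell_counts[pos]
        if n < 0:
            raise ValueError(f"negative cell count at {pos}")
        key = "empty" if n == 0 else "single" if n == 1 else "multiplet"
        classes[key].append(pos)
    return classes
```

Only positions in the `"single"` group would be carried forward when linking images to expression profiles.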
The method 800 includes flowing, using a fluidics subsystem, RNA capture beads having attached cell identifying optical barcode sequences onto the microwell array (806). The method 800 includes flowing, using the fluidics subsystem, a lysis buffer onto the microwell array, imaging the microwell array using the imaging subsystem, and performing image analysis to monitor the microwells for completion of lysis (808). The method 800 includes flowing, using the fluidics subsystem, reverse transcription mix onto the microwell array after the image analysis indicates that lysis is complete (810).
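The image-analysis criterion for lysis completion is not specified here. One hypothetical rule, assuming cells carry a fluorescent stain (for example, calcein AM as in the GBM experiment) and that a per-microwell measure of cell-associated signal, such as peak intensity over the segmented cell, falls as the cell's contents disperse, is to compare each well against its pre-lysis value and declare completion once most wells have lost most of their signal. Both thresholds below are illustrative placeholders, not values from this disclosure.

```python
from typing import Mapping, Tuple

Position = Tuple[int, int]  # (x, y) microwell index; hypothetical convention


def lysis_complete(
    before: Mapping[Position, float],
    after: Mapping[Position, float],
    residual_fraction: float = 0.2,            # hypothetical: "lysed" if <= 20% of signal remains
    required_fraction_of_wells: float = 0.95,  # hypothetical: share of wells that must qualify
) -> bool:
    """Hypothetical lysis-completion rule based on loss of cell-associated signal."""
    wells = [pos for pos, value in before.items() if value > 0 and pos in after]
    if not wells:
        return False
    lysed = sum(1 for pos in wells if after[pos] <= residual_fraction * before[pos])
    return lysed / len(wells) >= required_fraction_of_wells
```

A control loop could re-image at intervals and flow the reverse transcription mix only once such a check passes.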
The method 800 includes flowing, using the fluidics subsystem, a first of N pools of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence attached thereto (812). The method 800 includes obtaining, for each position of the plurality of positions, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes (814). A match can be identified when an image of a microwell containing a microbead, taken after the optical hybridization probes are flowed, shows a sufficient light intensity.
The method 800 includes repeating the flowing and hybridizing step and the step of obtaining the one or more second images for each of the N pools of probes (816).
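The loop over the N probe pools can be sketched with the instrument actions passed in as callables, so that the control logic can be exercised without hardware. All names are hypothetical, and the step that removes one pool's probes before the next pool is flowed is an assumption, since this section does not describe it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Position = Tuple[int, int]  # (x, y) microwell index; hypothetical convention


def run_hybridization_cycles(
    pools: Sequence[str],
    positions: Sequence[Position],
    flow_and_hybridize: Callable[[str], None],
    measure_intensity: Callable[[Position], float],
    remove_probes: Callable[[], None],
) -> Dict[Position, List[float]]:
    """Collect one fluorescence intensity per position for each of the N probe pools."""
    intensities: Dict[Position, List[float]] = {pos: [] for pos in positions}
    for index, pool in enumerate(pools):
        flow_and_hybridize(pool)                  # flow one pool and hybridize (812)
        for pos in positions:                     # second images at every position (814)
            intensities[pos].append(measure_intensity(pos))
        if index < len(pools) - 1:
            remove_probes()                       # hypothetical wash/strip before the next pool
    return intensities                            # one entry per pool, i.e. repeated N times (816)
```

Each position ends up with N intensity values in pool order, which is the input the barcode-calling step needs.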
The method 800 includes determining, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, for each position of the plurality of positions, the cell identifying optical barcode for the position and storing a data association between the cell identifying optical barcode for the position and the first image at the position (818). For example, determining the cell identifying optical barcode can comprise forming a digital value in which each bit position corresponds to a match or a lack of a match between an optical hybridization probe, or a pool of optical hybridization probes, and a cell identifying optical barcode.
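The digital value described above can be sketched as an integer bitfield, assuming bit i holds the match call for pool i (pool 0 in the least significant bit); the bit ordering and the codebook mapping values to cell barcodes are hypothetical. With eight pools there are 2^8 = 256 possible values.

```python
from typing import Dict, Optional, Sequence


def pack_code(matches: Sequence[bool]) -> int:
    """Bit i of the result is 1 when pool i matched (pool 0 = least significant bit)."""
    value = 0
    for i, matched in enumerate(matches):
        if matched:
            value |= 1 << i
    return value


def lookup_barcode(matches: Sequence[bool], codebook: Dict[int, str]) -> Optional[str]:
    """Return the cell barcode for a packed code, or None for an unused code."""
    return codebook.get(pack_code(matches))
```

Storing the call as an integer keeps the lookup a single dictionary access and makes unused codes easy to detect.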
In the method 800, microbeads are removed from the microwell array for sequencing. The method 800 includes storing, for each position of the plurality of positions, after receiving nucleic acid sequencing data for each cell identifying optical barcode, a data association between the nucleic acid sequencing data, the cell identifying optical barcode, and the first image associated with the cell identifying optical barcode (820).
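The stored association can be sketched as a join over three lookups: position to first image, position to decoded barcode, and barcode to sequencing profile. The record type, the gene-to-count profile format, and the choice to skip barcodes decoded at more than one position (which cannot be tied to a single image) are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Mapping, Tuple

Position = Tuple[int, int]  # (x, y) microwell index; hypothetical convention


@dataclass
class LinkedCell:
    """One cell's image, optical barcode, and expression profile (hypothetical record)."""
    position: Position
    barcode: str
    image_id: str
    expression: Mapping[str, int]  # gene -> count; hypothetical format


def link_cells(
    first_images: Mapping[Position, str],
    barcode_at: Mapping[Position, str],
    sequencing: Mapping[str, Mapping[str, int]],
) -> List[LinkedCell]:
    """Join position->image, position->barcode, and barcode->profile lookups."""
    occurrences = Counter(barcode_at.values())
    linked = []
    for pos in sorted(barcode_at):
        bc = barcode_at[pos]
        # Skip barcodes decoded at several positions: they cannot be tied to one image.
        if occurrences[bc] == 1 and pos in first_images and bc in sequencing:
            linked.append(LinkedCell(pos, bc, first_images[pos], sequencing[bc]))
    return linked
```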
The method 800 can include displaying a graphical user interface (GUI) for controlling various aspects of the process. For example, the GUI can provide controls for starting and stopping a run. The GUI can provide images of specified cells at various stages of a run. The GUI can present status reports during a run.
In some examples, the method 800 includes recovering the microbeads. For example, recovering the microbeads can include inverting the chip to allow the beads to settle by gravity into the flow channel. Recovering the microbeads can include flowing in a high-density fluid that will “float” the beads up into the flow channel. Recovering the microbeads can include pulsing the flow to agitate the beads out of their wells into the flow channel. Recovering the microbeads can include sonicating the beads to agitate the beads out of their wells into the flow channel. Recovering the microbeads can include chemically or optically cleaving the cDNA from the beads to allow it to be collected while the beads themselves are left behind.
To link cellular imaging with scRNA-seq from the same cell, the cell identifying optical barcode sequence on each bead is identified in the microwell array by sequential fluorescent probe hybridization. Each cell barcode (i.e. “S” and “Q” in
The accuracy of the sequencing data that can be obtained from cDNA library preparation using the automated instrument is illustrated in
Imaging of the optical hybridization probes on the automated system is described in Example 4.
The automated system and methods of the present disclosure can be used for identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone. For example, identification of relationships between imaging features and lineage identities of malignantly transformed glioblastoma (GBM) cells is described in Example 7. To demonstrate collection of paired optical and transcriptional phenotypes from human tissue samples using the cell identifying optical barcodes described herein, an experiment is performed on cells dissociated from a human GBM surgical sample and labeled with calcein AM, a fluorogenic dye that reports esterase activity. 1,954 scRNA-seq profiles are obtained, of which 1,110 are linked to live cell images. Cell multiplets are removed based on image analysis. Based on gene expression, a large population of cells is identified with amplification of chromosome 7 and loss of chromosome 10, two commonly co-occurring aneuploidies that are pervasive in GBM. Key gene signatures that define the population are identified by computational analysis. All of the major cell types previously reported from scRNA-seq of GBM are recovered, including myeloid cells, endothelial cells, pericytes, malignantly transformed astrocyte-like cells, mesenchymal-like cells, oligodendrocyte-progenitor-like/neuroblast-progenitor-like cells (OPC/NPC), and cycling cells (
Malignant cells in GBM can resemble multiple neural lineages and exhibit a mesenchymal phenotype. Because malignant GBM cells are known to be highly plastic and undergo differentiation and de-differentiation, a diffusion map is used to visualize their lineage relationships. Malignant cells are selected based on aneuploidy as described above, the dimensionality of malignant cell gene expression is reduced, and the factorized data are visualized with a diffusion map, which reveals two major branches. One branch consists of astrocyte-like cells and terminates with mesenchymal-like cells, while the other branch consists of OPC/NPC cells and cycling cells. This is consistent with previously published studies showing that astrocyte-like and mesenchymal glioma cells are significantly more quiescent than OPC-like glioma cells.
To explore how imaging features of malignant cells relate to the two major cellular lineages, it is tested whether unsupervised clustering of cellular imaging features corresponds to the two major lineages observed in scRNA-seq. Malignant cells are clustered by the three imaging meta-features described above using hierarchical clustering, and two major cellular imaging clusters are identified. Plotting the two imaging clusters on the diffusion map embedding of the malignant cells shows that cells with round shape, low intensity, and small size (imaging cluster 0) are enriched in the OPC/NPC-cycling branch, and cells with rough shape, high intensity, and large size (imaging cluster 1) are enriched in the astrocyte-mesenchymal branch (
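Hierarchical clustering of per-cell imaging features into two clusters can be sketched as below. The linkage criterion and distance are not stated in the text; average linkage with Euclidean distance is an assumption, and the toy feature vectors are hypothetical. On real data one would typically standardize the features first and use `scipy.cluster.hierarchy.linkage(..., method="average")` with `fcluster(..., criterion="maxclust")`; the pure-Python version only makes the merging rule explicit.

```python
import math
from itertools import combinations
from typing import List, Sequence


def average_linkage_clusters(points: Sequence[Sequence[float]], k: int = 2) -> List[int]:
    """Agglomerative clustering (average linkage, Euclidean) cut at k clusters."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves cluster i in place
        del clusters[j]

    labels = [0] * len(points)
    # Number clusters by their lowest member index so labels are deterministic
    for label, members in enumerate(sorted(clusters, key=min)):
        for m in members:
            labels[m] = label
    return labels
```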
An example method is provided for identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone. The method includes: initializing a system, the system comprising: an instrument assembly comprising a fluidics subsystem, a thermal subsystem, and an imaging subsystem, wherein the imaging subsystem comprises a stage configured for holding a microwell array; a control subsystem coupled to the instrument assembly, the control subsystem comprising at least one processor and memory; and using the control subsystem for performing operations. The operations include flowing, using the fluidics subsystem, a plurality of cells onto the microwell array, wherein a subset of the cells reside as single cells in the microwells; obtaining, for each position of a plurality of positions in the microwell array, one or more first images at the position using the imaging subsystem and measuring one or more cell optical phenotypic features; flowing, using the fluidics subsystem, a plurality of microbeads having a cell identifying optical barcode sequence and an RNA binding sequence onto the microwell array, wherein a subset of the beads reside as a single cell-bead pair in the microwells; flowing, using the fluidics subsystem, a cell lysis buffer and one or more reagents for RNA library preparation onto the microwell array; flowing, using the fluidics subsystem, a first of N pools of a plurality of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence; obtaining, for each position of the plurality of positions, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes; repeating the flowing and hybridizing step and the obtaining of the one or more second images step for each of the N pools of probes; determining, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, for each position of the plurality of positions, the cell identifying optical barcode for the position, and storing a data association between the cell identifying optical barcode for the position and the first image at the position; and storing, for each position of the plurality of positions, after receiving nucleic acid sequencing data for each cell identifying optical barcode, a data association between the nucleic acid sequencing data, the cell identifying optical barcode, and the first image associated with the cell identifying optical barcode. The method includes generating a representation of the relationship between the one or more cell optical phenotypic features and the nucleic acid sequencing data associated with each of the first images, wherein a correlation between the single cell phenotypic features and the associated sequencing data identifies a correspondence between single cell optical phenotypes and cell type, lineage, or clone based on the transcriptomics of that single cell.
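The two storing steps above amount to a join keyed on the decoded cell identifying optical barcode. The following is a minimal Python sketch under stated assumptions: the inputs (a map from a hypothetical `(row, column)` microwell position to its decoded barcode, or `None` when decoding fails; a map from position to first-image path; and a map from barcode to gene counts) and the names `LinkedCell` and `link_images_to_sequencing` are illustrative only. Skipping barcodes decoded at more than one position is also an assumption, made because such sequencing data cannot be assigned to a single image.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]  # hypothetical (row, column) index of a microwell


@dataclass
class LinkedCell:
    position: Position
    barcode: str
    image_path: str          # first (cell) image at this position
    counts: Dict[str, int]   # gene -> molecule count for this barcode


def link_images_to_sequencing(
    decoded: Dict[Position, Optional[str]],
    first_images: Dict[Position, str],
    sequencing: Dict[str, Dict[str, int]],
) -> List[LinkedCell]:
    """Associate each position's first image with the sequencing data of its decoded barcode."""
    # Count how often each barcode was decoded so ambiguous barcodes can be skipped.
    occurrences = Counter(b for b in decoded.values() if b is not None)
    linked = []
    for pos, barcode in sorted(decoded.items()):
        if barcode is None or occurrences[barcode] > 1:
            continue  # undecoded bead, or barcode seen at several positions (assumption)
        if pos not in first_images or barcode not in sequencing:
            continue  # no cell image, or no expression profile recovered for this barcode
        linked.append(LinkedCell(pos, barcode, first_images[pos], sequencing[barcode]))
    return linked
```

The resulting records pair each image with its expression profile, which is the input needed to correlate optical phenotypic features with sequencing-derived cell type, lineage, or clone.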
In one example, the cell optical phenotypic feature is one or more of area, mean intensity, standard deviation of intensity, minimum intensity, maximum intensity, median intensity, perimeter, width, height, major axis, minor axis, circularity, Feret’s diameter, minimum Feret’s diameter, roundness, or solidity; however, the method is not limited to these cell optical phenotypic features. One advantage of this method is that a broad repertoire of cell optical phenotypic features can be measured, including intracellular as well as surface features. This contrasts with FACS, in which only changes expressed on the surface of cells can be identified.
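Several of the listed features can be computed directly from a segmented cell outline and its pixel values. The following is a minimal Python sketch, assuming the outline is available as an ordered polygon of `(x, y)` vertices and using the common ImageJ-style definitions (circularity = 4π·area/perimeter², solidity = area/convex hull area, Feret’s diameter = largest distance between two outline points). The function names are hypothetical, and pixel-based perimeters in imaging software will differ somewhat from the polygon perimeter used here.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(pts: Sequence[Point]) -> float:
    """Shoelace formula for a closed polygon given as ordered vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, list(pts[1:]) + [pts[0]]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(pts: Sequence[Point]) -> float:
    return sum(math.dist(a, b) for a, b in zip(pts, list(pts[1:]) + [pts[0]]))


def convex_hull(pts: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    p = sorted(set(pts))
    if len(p) <= 2:
        return p

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for q in p:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    upper: List[Point] = []
    for q in reversed(p):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return lower[:-1] + upper[:-1]


def shape_features(contour: Sequence[Point]) -> Dict[str, float]:
    area = polygon_area(contour)
    perimeter = polygon_perimeter(contour)
    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": 4 * math.pi * area / perimeter ** 2,
        # Feret's diameter: largest distance between any two outline vertices.
        "feret": max(math.dist(a, b) for i, a in enumerate(contour) for b in contour[i + 1:]),
        "solidity": area / polygon_area(convex_hull(contour)),
    }


def intensity_features(pixels: Sequence[float]) -> Dict[str, float]:
    return {
        "mean": statistics.mean(pixels),
        "std": statistics.stdev(pixels),
        "min": min(pixels),
        "max": max(pixels),
        "median": statistics.median(pixels),
    }
```

For a 2 × 2 square outline this gives area 4, perimeter 8, circularity π/4, and solidity 1; a concave outline has solidity below 1.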
The cell optical phenotypic feature can be derived from bright-field, dark-field, fluorescence, luminescence, Raman, or scattering microscopy, or from other microscopies, as is understood by those of skill in the art.
In the method of identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone, the cells can be obtained from a tissue, a tumor, a cell culture, or any type of bodily fluid, including, but not limited to, a blood sample, a urine sample, or a saliva sample.
In the method, the cells can be human, mammalian, or other animal cells. In one example, the cells are immune cells, T cells, B cells, stromal cells, stem cells, neural cells, or tumor cells.
In one example of the method of identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone, the cells are immune cells and the optical phenotypic features measured include immunophenotyping features, such as those known to those of skill in the art to characterize the immune phenotype of an immune cell type.
In another example of the method of identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone, the cells used in the method are cells that have been subjected to genetic modification. By measuring one or more cell optical phenotypic features of the gene-edited cells, the method aims to identify a correspondence between the optical phenotypic features and the cell clones that either have or do not have the genetic modification. Once this correspondence is identified, the desired cell clones, either positive or negative for the genetic modification, can be identified by optical methods rather than by more expensive gene sequencing. This has applications for cells used in immunotherapy, among others. In one example, the cells that have been subjected to genetic modification are stem cells, immune cells, T cells, or B cells.
In one example of an automated system of the present disclosure, the system is used for associating single cell imaging with unique optical barcode readout and for preparing sequencing libraries other than RNA libraries. For example, the system comprises: an instrument assembly comprising a fluidics subsystem, a thermal subsystem, and an imaging subsystem, wherein the imaging subsystem comprises a stage configured for holding a microwell array; and a control subsystem coupled to the instrument assembly, the control subsystem comprising at least one processor and memory, the control subsystem configured for performing operations comprising: flowing, using the fluidics subsystem, a plurality of cells onto the microwell array, wherein a subset of the cells reside as single cells in the microwells; obtaining, for each position of a plurality of positions in the microwell array, one or more first images of the cell at the position using the imaging subsystem; flowing, using the fluidics subsystem, a plurality of microbeads having a cell identifying optical barcode sequence and a primer sequence to capture cellular nucleic acid onto the microwell array, wherein a subset of the beads reside as a single cell-bead pair in the microwells; flowing, using the fluidics subsystem, a cell lysis buffer and one or more reagents for sequencing library preparation onto the microwell array; flowing, using the fluidics subsystem, a first of N pools of a plurality of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence; obtaining, for each position of the plurality of positions, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes; repeating the flowing and hybridizing step and the obtaining of the one or more second images step for each of the N pools of probes; and determining, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, for each position of the plurality of positions, the cell identifying optical barcode for the position, and storing a data association between the cell identifying optical barcode for the position and the first image at the position.
In this example of the automated system, the primer sequence designed to capture cellular nucleic acid can be an oligo(dT) to capture RNA, mRNA, and non-coding RNA; a random sequence to capture any DNA or RNA; or a specific sequence targeted to a DNA locus or an RNA transcript.
In one example, the automated system of the present disclosure can be used in a method for associating single cell imaging data with nucleic acid sequencing data, rather than only with RNA transcriptomics. For example, the method comprises: initializing a system, the system comprising: an instrument assembly comprising a fluidics subsystem, a thermal subsystem, and an imaging subsystem, wherein the imaging subsystem comprises a stage configured for holding a microwell array; a control subsystem coupled to the instrument assembly, the control subsystem comprising at least one processor and memory; and using the control subsystem for performing operations comprising: flowing, using the fluidics subsystem, a plurality of cells onto the microwell array, wherein a subset of the cells reside as single cells in the microwells; obtaining, for each position of a plurality of positions in the microwell array, one or more first images at the position using the imaging subsystem; flowing, using the fluidics subsystem, a plurality of microbeads having a cell identifying optical barcode sequence and a primer sequence to capture cellular nucleic acid onto the microwell array, wherein a subset of the beads reside as a single cell-bead pair in the microwells; flowing, using the fluidics subsystem, a cell lysis buffer and one or more reagents for sequencing library preparation onto the microwell array; flowing, using the fluidics subsystem, a first of N pools of a plurality of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence; obtaining, for each position of the plurality of positions, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes; repeating the flowing and hybridizing step and the obtaining of the one or more second images step for each of the N pools of probes; determining, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, for each position of the plurality of positions, the cell identifying optical barcode for the position, and storing a data association between the cell identifying optical barcode for the position and the first image at the position; and storing, for each position of the plurality of positions, after receiving nucleic acid sequencing data for each cell identifying optical barcode, a data association between the nucleic acid sequencing data, the cell identifying optical barcode, and the first image associated with the cell identifying optical barcode, whereby the single cell imaging data is associated with the nucleic acid sequence for that cell.
In this example of the automated method, the primer sequence can be an oligo(dT) to capture RNA, mRNA, and non-coding RNA; a random sequence to capture any DNA or RNA; or a specific sequence targeted to a DNA locus or an RNA transcript.
In one example, the automated system of the present disclosure can be used in a method for identifying a correspondence between single cell optical phenotypes and cell type, lineage, or clone, comprising: initializing a system, the system comprising: an instrument assembly comprising a fluidics subsystem, a thermal subsystem, and an imaging subsystem, wherein the imaging subsystem comprises a stage configured for holding a microwell array; a control subsystem coupled to the instrument assembly, the control subsystem comprising at least one processor and memory; and using the control subsystem for performing operations comprising: flowing, using the fluidics subsystem, a plurality of cells onto the microwell array, wherein a subset of the cells reside as single cells in the microwells; obtaining, for each position of a plurality of positions in the microwell array, one or more first images at the position using the imaging subsystem and measuring one or more cell optical phenotypic features; flowing, using the fluidics subsystem, a plurality of microbeads having a cell identifying optical barcode sequence and a primer sequence to bind cellular nucleic acid onto the microwell array, wherein a subset of the beads reside as a single cell-bead pair in the microwells; flowing, using the fluidics subsystem, a cell lysis buffer and one or more reagents for sequencing library preparation onto the microwell array; flowing, using the fluidics subsystem, a first of N pools of a plurality of optical hybridization probes onto the microwell array and hybridizing the probes to the beads located therein having a complementary nucleotide sequence in the cell identifying optical barcode sequence; obtaining, for each position of the plurality of positions, one or more second images to quantify a fluorescent intensity at the position using the imaging subsystem, each of the one or more second images used to create a binary code depicting a match or a lack of a match between at least one of the optical hybridization probes and the cell identifying optical barcodes; repeating the flowing and hybridizing step and the obtaining of the one or more second images step for each of the N pools of probes; determining, by mapping the binary code for each of the N pools of probes to the cell identifying barcode sequence, for each position of the plurality of positions, the cell identifying optical barcode for the position, and storing a data association between the cell identifying optical barcode for the position and the first image at the position; and storing, for each position of the plurality of positions, after receiving nucleic acid sequencing data for each cell identifying optical barcode, a data association between the nucleic acid sequencing data, the cell identifying optical barcode, and the first image associated with the cell identifying optical barcode. The method includes generating a representation of the relationship between the one or more cell optical phenotypic features and the nucleic acid sequencing data associated with each of the first images, wherein a correlation between the single cell phenotypic features and the associated sequencing data identifies a correspondence between single cell optical phenotypes and cell type, lineage, or clone based on the nucleic acid sequence of that single cell.
In the example method, the primer sequence can be an oligo(dT) to capture RNA, mRNA, and non-coding RNA; a random sequence to capture any DNA or RNA; or a specific sequence targeted to a DNA locus or an RNA transcript.
Accordingly, while the methods and systems have been described in reference to specific embodiments, features, and illustrative embodiments, it will be appreciated that the utility of the subject matter is not thus limited, but rather extends to and encompasses numerous other variations, modifications and alternative embodiments, as will suggest themselves to those of ordinary skill in the field of the present subject matter, based on the disclosure herein.
Various combinations and sub-combinations of the structures and features described herein are contemplated and will be apparent to a skilled person having knowledge of this disclosure. Any of the various features and elements as disclosed herein may be combined with one or more other disclosed features and elements unless indicated to the contrary herein. Correspondingly, the subject matter as hereinafter claimed is intended to be broadly construed and interpreted, as including all such variations, modifications and alternative embodiments, within its scope and including equivalents of the claims.
Device preparation. A microwell array device was fabricated from polydimethylsiloxane (PDMS), a commonly used elastomeric polymer, and stored in a humid chamber in wash buffer (20 mM Tris-HCl pH 7.9, 50 mM NaCl, 0.1% Tween-20) one day before use.
Cell preparation. Five experiments were performed: four with mixed mouse (3T3)/human (U87) cells and one with human U87 cells alone. Cells were dissociated into single cell suspensions using 0.25% Trypsin-EDTA (Life Technologies, cat# 25200-072); human U87 cells were stained with Calcein AM (ThermoFisher Scientific, cat# C3100MP) and mouse 3T3 cells were stained with Calcein red-orange (ThermoFisher Scientific, cat# C34851) in 1X TBS at 37° C. for 15 minutes. The U87 and 3T3 cells were mixed at a 1:1 ratio to a final total cell concentration of 1,000 cells/µL.
Initialize system. The microwell array device was inserted into the instrument assembly and the automated system was configured for automated cell and bead loading followed by single cell RNA sequencing library preparation. The single cell suspension was loaded into the cell loading reservoir. The beads (Chemgenes Drop-SEQ beads) were added to the bead loading reservoir. Single cell RNA-Seq library preparation reagents were loaded into the reagent reservoirs and the reagent reservoirs were attached to the instrument assembly.
The following steps were performed on the automated system:
Cell loading. After flowing Tris-buffered saline (TBS) through the device, single cells were loaded into individual microwells of the device at a density of approximately 10% (see
Cellular imaging. The cell-loaded microwell device was scanned under the bright-field and fluorescence channels (
Imaging-based multiplet identification. Two-color live staining fluorescence images were merged, with the Calcein AM signal shown in green and the Calcein red-orange signal in magenta. Each well was automatically examined within its smallest bounding square. Wells with mixed-species cells were identified as those having at least one green object and one magenta object; wells with a single cell were identified as those having exactly one object, either green or magenta.
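The well-classification rule above can be written as a small decision function. This is a sketch assuming per-well counts of green (Calcein AM, human U87) and magenta (Calcein red-orange, mouse 3T3) objects have already been obtained from segmentation; the function names are hypothetical, and the label for wells with two or more objects of a single color is an assumption, since the text does not define that case.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def classify_well(n_green: int, n_magenta: int) -> str:
    """Classify one well from its counts of green (human) and magenta (mouse) objects."""
    if n_green >= 1 and n_magenta >= 1:
        return "mixed-species"
    if n_green + n_magenta == 1:
        return "single"
    if n_green + n_magenta == 0:
        return "empty"
    # Two or more objects of one color: not defined in the text; labeled here as an assumption.
    return "same-species multiplet"


def summarize(wells: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Tally well classes over (n_green, n_magenta) pairs, one pair per well."""
    return dict(Counter(classify_well(g, m) for g, m in wells))
```

For example, `summarize([(1, 0), (0, 1), (1, 2), (0, 0), (0, 0)])` counts two single-cell wells, one mixed-species well, and two empty wells.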
Bead loading and imaging. After washing the microwell device with TBS, beads were loaded into individual microwells of the device to an approximate density of 80% as confirmed by imaging (
Cell lysis and imaging. After washing the microwell array device with TBS, lysis buffer (1% 2-Mercaptoethanol (Fisher Scientific, cat# BP176-100), 99% Buffer TCL (Qiagen, cat# 1031576)) followed by perfluorinated oil (Sigma-Aldrich, cat# F3556-25ML) was flowed into the device and incubated at 50° C. for 20 minutes to promote cell lysis. The device was imaged as a quality control step to assess the extent of cell lysis (
Image analysis. Lysis was confirmed by analyzing images with ImageJ. To identify microwells, the difference between the background and the bright-field image was taken, and a threshold was calculated using Otsu’s method (https://doi.org/10.1109/TSMC.1979.4310076). The threshold was used to generate a binary image, which was then dilated, and its holes were filled. The binary objects were identified to create a mask of the wells, which was used to measure cell loading and lysis efficiency. After cell loading, the average fluorescence intensity of each microwell in the live staining images was measured. Average intensity values followed a bimodal distribution, with the higher-intensity population corresponding to microwells that contain cells. After cell lysis, the fluorescence intensity of the microwell device was measured, and the lysis efficiency was calculated for wells that originally contained a cell.
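Otsu’s method picks the threshold that maximizes the between-class variance of a two-class split of a histogram, which suits the bimodal per-well intensities described above. The following is a pure-Python sketch on a 1-D list of values, not ImageJ’s implementation. The `lysis_efficiency` rule (a cell-containing well counts as lysed when its post-lysis intensity falls below the pre-lysis Otsu threshold) is a hypothetical criterion, since the text does not state the exact rule.

```python
from typing import Sequence


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """Return the histogram threshold that maximizes between-class variance (Otsu, 1979)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centers))
    w0, sum0 = 0, 0.0
    best_var, best_t = -1.0, lo
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # between-class variance, up to a constant factor
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width  # upper edge of bin i
    return best_t


def lysis_efficiency(pre: Sequence[float], post: Sequence[float]) -> float:
    """Fraction of cell-containing wells whose fluorescence drops below the pre-lysis threshold."""
    t = otsu_threshold(pre)
    with_cell = [i for i, v in enumerate(pre) if v > t]
    if not with_cell:
        return float("nan")
    lysed = sum(1 for i in with_cell if post[i] < t)
    return lysed / len(with_cell)
```

With pre-lysis well intensities `[1, 1, 1, 2, 2, 10, 10, 11, 11]`, the threshold falls between 2 and 10, so the last four wells are treated as containing cells.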
Reverse transcription. Reverse transcription mixture (1X Maxima RT buffer, 1 mM dNTPs, 1 U/µL SUPERase•In, 2.5 µM template switch oligo, 10 U/µL Maxima H Minus reverse transcriptase (Thermo Fisher Scientific, cat# EP0752), 0.1% Tween-20) was flowed into the device followed by an incubation at 25° C. for 30 minutes and then at 42° C. for 90 minutes. Wash buffer supplemented with RNase inhibitor was flushed through the device.
The microwell device was removed from the instrument assembly and Exonuclease I reaction mixture (1X Exo-I buffer, 1 U/µL Exo-I (New England Biolabs, cat# M0293L)) was flowed through the device followed by an incubation at 37° C. for 45 minutes. TE/TW buffer (10 mM Tris pH 8.0, 1 mM EDTA, 0.01% Tween-20) was flushed through the device. The beads were collected and pooled for sequencing.
The pooled beads were washed sequentially with TE/SDS buffer (10 mM Tris-HCl, 1 mM EDTA, 0.5% SDS), TE/TW buffer, and nuclease-free water. cDNA amplification was performed in 50 µL PCR solution (1X Hifi Hot Start Ready mix (Kapa Biosystems, cat# KK2601), 1 µM SMRTpcr primer (Table EV5)), with 14 amplification cycles (95° C. 3 min, 4 cycles of (98° C. 20 s, 65° C. 45 s, 72° C. 3 min), 10 cycles of (98° C. 20 s, 67° C. 20 s, 72° C. 3 min), 72° C. 5 min) on a thermocycler. PCR product was purified using AMPure paramagnetic beads (Beckman, cat# A63881) with a bead-to-sample volume ratio of 0.6:1. Purified cDNA was then tagmented and amplified using the Nextera kit for in vitro transposition (Illumina, FC-131-1024). 0.8 ng cDNA was used as input per reaction. A unique i7 index primer was used to barcode the library. The i5 index primer was replaced by a universal P5 primer for the selective amplification of the 5′ end of cDNA (corresponding to the 3′ end of RNA). Two rounds of SPRI paramagnetic bead-based purification with bead-to-sample volume ratios of 0.6:1 and 1:1 were performed sequentially on the Nextera PCR product to obtain a sequencing-ready library. 20% PhiX library (Illumina, FC-131-1024) was spiked in before sequencing on an Illumina NextSeq 500 with a 26-cycle read 1, 58-cycle read 2, and 8-cycle index read. A custom sequencing primer was used for read 1.
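The cDNA amplification program can be represented as data for the thermal subsystem, which also makes the cycle count and run time easy to check. This is a sketch with hypothetical names (`Stage`, `CDNA_PCR`); it counts only hold times, because ramp rates depend on the thermocycler and are not given in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

Step = Tuple[float, int]  # (temperature in °C, hold time in seconds)


@dataclass
class Stage:
    steps: List[Step]
    repeats: int = 1
    cycling: bool = False  # True for stages counted as PCR amplification cycles


# cDNA amplification program described above.
CDNA_PCR = [
    Stage([(95, 180)]),
    Stage([(98, 20), (65, 45), (72, 180)], repeats=4, cycling=True),
    Stage([(98, 20), (67, 20), (72, 180)], repeats=10, cycling=True),
    Stage([(72, 300)]),
]


def amplification_cycles(program: List[Stage]) -> int:
    return sum(s.repeats for s in program if s.cycling)


def total_hold_seconds(program: List[Stage]) -> int:
    """Sum of hold times only; ramp times are ignored."""
    return sum(s.repeats * sum(t for _, t in s.steps) for s in program)


def expand(program: List[Stage]) -> List[Step]:
    """Flatten the program into the ordered list of holds a thermal controller would execute."""
    return [step for s in program for _ in range(s.repeats) for step in s.steps]
```

The worked totals are 4 + 10 = 14 amplification cycles and 180 + 4·245 + 10·220 + 300 = 3,660 s (61 minutes) of hold time, before ramping.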
The sequencing data resulting from the five experiments described above are shown in Table 1. The data show that the automated system can produce high-purity cDNA libraries from multiple cell types.
Sub-sampling analysis. To analyze the saturation behavior and sensitivity of the scRNA-seq data, the aligned reads were randomly sub-sampled and re-processed through the scRNA-seq analysis. Two statistics, molecules per cell and genes per cell, are then calculated for the cells identified from the total reads.
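The sub-sampling analysis can be sketched as keeping each aligned read with a fixed probability and recounting, for a fixed set of cells identified from the full data, the unique molecules (distinct gene and UMI pairs) and genes per cell. This Python sketch assumes reads are available as hypothetical `(cell barcode, gene, UMI)` tuples; it is not the pipeline used to produce the reported data.

```python
import random
from collections import defaultdict
from typing import Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cell barcode, gene, UMI) for one aligned read


def subsample_stats(reads: Iterable[Read], cells: Set[str], fraction: float,
                    seed: int = 0) -> Tuple[float, float]:
    """Return (mean molecules per cell, mean genes per cell) after keeping each read with probability `fraction`."""
    rng = random.Random(seed)
    molecules = defaultdict(set)  # cell -> unique (gene, UMI) pairs
    genes = defaultdict(set)      # cell -> unique genes
    for cell, gene, umi in reads:
        if cell not in cells or rng.random() >= fraction:
            continue  # read from an unselected barcode, or dropped by sub-sampling
        molecules[cell].add((gene, umi))
        genes[cell].add(gene)
    n = len(cells)
    mean_molecules = sum(len(molecules[c]) for c in cells) / n
    mean_genes = sum(len(genes[c]) for c in cells) / n
    return mean_molecules, mean_genes
```

Evaluating this over a range of fractions (for example 0.1 to 1.0) traces the saturation curve; the curve flattening as the fraction approaches 1.0 indicates sequencing depth near saturation.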
Validation Data. Additional data validating the sequencing results from the mixed species experiments on the automated system are shown in
8-nt cell barcode sequences were designed using the R package ‘DNABarcodes’ with the following criteria: each pair of sequences differed by a Levenshtein distance of at least 3; and sequences containing homopolymers longer than 2 nucleotides, sequences with GC content <40% or >60%, and perfectly self-complementary sequences were removed. Sequences were further selected for minimal secondary structure formation.
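The design criteria above can be illustrated with a simple greedy filter. This Python sketch is not the algorithm implemented in the DNABarcodes package. It assumes that “perfectly self-complementary” means a sequence equal to its own reverse complement, it does not model secondary structure, and its function names are hypothetical.

```python
from itertools import product
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def longest_homopolymer(seq: str) -> int:
    best = run = 1
    for prev_base, base in zip(seq, seq[1:]):
        run = run + 1 if base == prev_base else 1
        best = max(best, run)
    return best


def passes_filters(seq: str) -> bool:
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    self_complementary = seq == seq.translate(_COMPLEMENT)[::-1]
    return longest_homopolymer(seq) <= 2 and 0.4 <= gc <= 0.6 and not self_complementary


def greedy_barcodes(length: int = 8, min_dist: int = 3, limit: int = 96) -> List[str]:
    """Greedily keep filtered sequences that are >= min_dist (Levenshtein) from all kept ones."""
    chosen: List[str] = []
    for letters in product("ACGT", repeat=length):
        seq = "".join(letters)
        if passes_filters(seq) and all(levenshtein(seq, c) >= min_dist for c in chosen):
            chosen.append(seq)
            if len(chosen) >= limit:
                break
    return chosen
```

A greedy scan in lexicographic order is easy to follow but slow in pure Python for all 65,536 8-mers, and it does not guarantee the largest possible barcode set.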
The bead design is illustrated in
192 oligonucleotides that are complementary to the 8-nt cell barcodes with 3′-amino modifications were synthesized and purified (Sigma-Aldrich), then resuspended in water at 200 µM. To generate probe mixtures corresponding to each bit in the binary code, oligonucleotides labeled with ‘1’ were taken (see
The automated system steps shown in
In the present experiment, 96 out of 256 possible binary codes are used (see
To compare the ‘bead-by-bead’ optical decoding method with the ‘cycle-by-cycle’ method, the two methods are tested on two datasets.
To decode the cell identifying optical barcode sequences from imaging, a ‘cycle-by-cycle’ method is used, which calls the binary code for each bead based on the bimodal distribution of intensity values across all beads in each hybridization cycle. This method works well when the bead fluorescence intensity values of the ‘one’ state population are well separated from that of the ‘zero’ state population. However, because the beads exhibit auto-fluorescence at shorter wavelengths, the two populations are not clearly separated in the Cy3 emission channel.
To accurately decode the cell barcode sequences from imaging, a modified ‘bead-by-bead’ fluorescence intensity analysis strategy is utilized. The cell barcode sequences of each bead are determined by sorting the eight intensity values in ascending order, calculating the relative intensity change between each pair of adjacent values, establishing a threshold based on the largest relative intensity change to assign a binary code, and mapping the binary code to the actual cell barcode sequence (
In datasets PJ070 and PJ069, 46% and 57% of scRNA-seq profiles, respectively, are linked with cell images using the ‘bead-by-bead’ method, compared with only 24% and 37% using the ‘cycle-by-cycle’ method. In both datasets, the fraction of linked cells increases by at least 20 percentage points with the ‘bead-by-bead’ method (
The following experiment is performed to compare optical decoding methods:
Preparation. A microwell array device is filled with wash buffer (20 mM Tris-HCl pH 7.9, 50 mM NaCl, 0.1% Tween-20) and stored in a humid chamber one day before use. Cell culture or tissue samples are dissociated into a single cell suspension and stained with the desired fluorescent dyes.
Cell loading. The pre-filled microwell array device is flushed with Tris-buffered saline (TBS). The single cell suspension is pipetted into the microwell array device. After 3 minutes, untrapped cells are flushed out with TBS.
Cellular imaging. The cell-loaded microwell device is scanned using an automated fluorescence microscope (Nikon, Eclipse Ti2) under the bright-field and fluorescence channels. Bright-field images are taken using an RGB light source (Lumencor, Lida) and a wide-field 10x 0.3 NA objective (Nikon, cat# MRH00101). Fluorescence images are taken using an LED light source (Lumencor, SPECTRA X), a quad-band filter set (Chroma, cat# 89402), and a wide-field 10x 0.3 NA objective (Nikon, cat# MRH00101), with 470 nm (GFP channel) and 555 nm (TRITC channel) excitation for Calcein AM and Calcein red-orange, respectively.
scRNA-seq (steps performed on microwell device). Beads (Chemgenes) are pipetted into the microwell device, and untrapped beads are flushed out with 1x TBS. The microwell device containing the cells and the beads is connected to the computer-controlled reagent and temperature delivery system as previously described. Lysis buffer (1% 2-Mercaptoethanol (Fisher Scientific, cat# BP176-100), 99% Buffer TCL (Qiagen, cat# 1031576)) and perfluorinated oil (Sigma-Aldrich, cat# F3556-25ML) are flowed into the device, followed by an incubation at 50° C. for 20 minutes to promote cell lysis and then at 25° C. for 90 minutes for RNA capture. Wash buffer supplemented with RNase inhibitor (0.02 U/µL SUPERase•In (Thermo Fisher Scientific, cat# AM2696) in wash buffer) is flushed through the device to unseal the microwells and remove any uncaptured RNA molecules. Reverse transcription mixture (1X Maxima RT buffer, 1 mM dNTPs, 1 U/µL SUPERase•In, 2.5 µM template switch oligo, 10 U/µL Maxima H Minus reverse transcriptase (Thermo Fisher Scientific, cat# EP0752), 0.1% Tween-20) is flowed into the device, followed by an incubation at 25° C. for 30 minutes and then at 42° C. for 90 minutes. Wash buffer supplemented with RNase inhibitor is flushed through the device. The device is disconnected from the automated reagent delivery system. Exonuclease I reaction mixture (1X Exo-I buffer, 1 U/µL Exo-I (New England Biolabs, cat# M0293L)) is pipetted into the device, followed by an incubation at 37° C. for 45 minutes. TE/TW buffer (10 mM Tris pH 8.0, 1 mM EDTA, 0.01% Tween-20) is flushed through the device.
Optical demultiplexing methods. The microwell device containing the beads with cDNAs is connected to a computer-controlled reagent delivery and scanning system. Melting buffer (150 mM NaOH) is infused into the device and incubated for 10 minutes. The device is then washed with imaging buffer (2xSSC, 0.1% Tween-20). An automated imaging program scans the device in the bright-field, Cy3, and Cy5 emission channels. Fluorescence images are acquired using an LED light source (Lumencor, SPECTRA X), a quad-band filter set (Chroma, cat# 89402), a wide-field 10x objective (Nikon, cat# MRH00101), and 555 nm and 649 nm excitation for Cy3 and Cy5, respectively. Hybridization solution (imaging buffer supplemented with probe pool A, described below) is infused into the device and incubated for 10 minutes. The device is then washed with imaging buffer, and the automated imaging program again scans the device in the bright-field, Cy3, and Cy5 emission channels. These hybridization, wash, and imaging steps are repeated seven more times, using probe pools B to H in turn. Melting buffer is then infused into the device and incubated for 10 minutes. Finally, the device is washed with imaging buffer and disconnected from the automated reagent delivery system.
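Each hybridization cycle uses one probe pool, and the pool for a given bit contains the probes for every cell barcode whose binary code has a ‘1’ at that bit. The following Python sketch assembles pools A to H from a code table under stated assumptions: the code-to-barcode pairs in any example are hypothetical, each probe is taken to be the reverse complement of its 8-nt barcode, and the assignment of probes to Cy3 or Cy5 labels is not modeled.

```python
from typing import Dict, List

POOLS = "ABCDEFGH"  # one probe pool per hybridization cycle
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def build_probe_pools(code_table: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each pool name to the probes whose barcode has a '1' at that pool's bit.

    code_table maps an 8-bit binary code (e.g. "10110000") to its cell barcode sequence.
    """
    pools: Dict[str, List[str]] = {name: [] for name in POOLS}
    for code, barcode in code_table.items():
        if len(code) != len(POOLS) or set(code) - {"0", "1"}:
            raise ValueError(f"invalid binary code: {code!r}")
        probe = reverse_complement(barcode)  # assumption: probe anneals antiparallel to the barcode
        for bit, name in zip(code, POOLS):
            if bit == "1":
                pools[name].append(probe)
    return pools
```

A bead whose barcode has code `10000001` therefore lights up only in cycles A and H, which is the pattern the decoding step reads back.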
Creation of Optical Probe Pools. To link cellular imaging with scRNA-seq from the same cell, the cell identifying optical barcode sequence on each bead in the microwell array is identified by sequential fluorescent probe hybridization. A temporal barcoding strategy is used in which each cell identifying optical barcode sequence corresponds to a unique, pre-defined 8-bit binary code (See
scRNA-seq Steps Performed off Microwell Device. Perfluorinated oil is pipetted into the device containing cells and beads to seal the microwells. The device is then cut into 10 regions. Beads from each region are extracted separately by soaking each small piece of bead-containing PDMS in 100% ethanol, followed by vortexing, water bath sonication, and centrifugation in a 1.7 mL microcentrifuge tube. The PDMS is then removed with tweezers. Beads extracted from each region are processed in separate reactions for downstream library construction. Beads are washed sequentially with TE/SDS buffer (10 mM Tris-HCl, 1 mM EDTA, 0.5% SDS), TE/TW buffer, and nuclease-free water. cDNA amplification is performed in 50 µL PCR solution (1X Hifi Hot Start Ready mix (Kapa Biosystems, cat# KK2601), 1 µM SMRTpcr primer) with 14 amplification cycles (95° C. 3 min, 4 cycles of (98° C. 20 s, 65° C. 45 s, 72° C. 3 min), 10 cycles of (98° C. 20 s, 67° C. 20 s, 72° C. 3 min), 72° C. 5 min) on a thermocycler. PCR product from each piece is pooled and purified using SPRI paramagnetic beads (Beckman, cat# A63881) with a bead-to-sample volume ratio of 0.6:1. Purified cDNAs are then tagmented and amplified using the Nextera kit for in vitro transposition (Illumina, FC-131-1024). 0.8 ng cDNA is used as input per reaction. A unique i7 index primer is used to barcode the libraries obtained from each piece of the device. The i5 index primer is replaced by a universal P5 primer for the selective amplification of the 5′ end of cDNA (corresponding to the 3′ end of RNA). Two rounds of SPRI paramagnetic bead-based purification with bead-to-sample volume ratios of 0.6:1 and 1:1 are performed sequentially on the Nextera PCR product to obtain sequencing-ready libraries. The resulting single-cell RNA-Seq libraries are pooled, and 20% PhiX library (Illumina, FC-131-1024) is spiked in before sequencing on an Illumina NextSeq 500 with a 26-cycle read 1, 58-cycle read 2, and 8-cycle index read. A custom sequencing primer is used for read 1.
Automated reagent delivery system. An automated reagent delivery and scanning system is designed for automated optical decoding. In this system, fixed positive pressure (~1 psi) stabilized by a pressure regulator (SMC Pneumatics, cat# AW20-N02-Z-A) is used to drive fluid flow. The microwell device is constantly pressurized during incubation steps, which prevents evaporation and bubble formation. Two 10-channel rotary selector valves (IDEX Health & Science, cat# MLP778-605) are connected in parallel to toggle between 14 reagent channels. A three-way solenoid valve (Cole-Parmer, cat# EW-01540-11), located downstream of the microwell device, is used as an on/off switch for reagent flow. The multi-channel selector valves are controlled by a USB digital I/O device (National Instruments, cat# SCB-68A). The three-way solenoid valve is controlled by the same USB digital I/O device, but through a homemade transistor-switch circuit. The system is controlled by imaging software (Nikon, NIS-Elements).
Bead optical decoding analysis. Eight cycles of probe hybridization (A to H) are used for cell barcode optical decoding. For each cycle, the device is imaged in the bright-field, Cy3, and Cy5 emission channels. Beads are first identified in the bright-field image by the ImageJ Particle Analyzer plugin, and the positions of the beads in the bright-field image are recorded. Then the average fluorescence intensity of each bead in the Cy3 and Cy5 images is measured. Beads identified in cycles B to H are mapped to the nearest bead in cycle A. Thus, an n × 16 probe hybridization matrix is obtained (n beads by 16 intensity values: 8 for Cy3 and 8 for Cy5). To call cell barcodes from the imaging data, two methods are tested:
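The nearest-bead mapping and matrix assembly can be sketched in Python as follows. The input layout is an assumption: one dict per cycle holding bead centroids (`xy`) and mean `cy3` and `cy5` intensities in the same bead order, with cycle A first. When several beads from a later cycle map to the same cycle-A bead, keeping only the closest one is also an assumption. Brute-force search is used for clarity; a spatial index would be faster for large arrays.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def build_hybridization_matrix(cycles: Sequence[Dict[str, list]]) -> List[List[Optional[float]]]:
    """Build an n x (2 * n_cycles) matrix: Cy3 values for each cycle, then Cy5 values.

    Each cycle is a dict with 'xy' (bead centroids), 'cy3' and 'cy5' (mean intensities),
    all in the same bead order. Cycle 0 (A) defines the reference beads.
    """
    ref: List[Point] = cycles[0]["xy"]
    n_cycles = len(cycles)
    matrix: List[List[Optional[float]]] = [[None] * (2 * n_cycles) for _ in ref]
    for k, cyc in enumerate(cycles):
        best = [math.inf] * len(ref)  # distance of the bead currently assigned to each reference
        for j, p in enumerate(cyc["xy"]):
            # Map this bead to the nearest bead found in cycle A.
            i = min(range(len(ref)), key=lambda r: math.dist(p, ref[r]))
            d = math.dist(p, ref[i])
            if d < best[i]:  # keep only the closest bead if several map to the same reference
                best[i] = d
                matrix[i][k] = cyc["cy3"][j]
                matrix[i][n_cycles + k] = cyc["cy5"][j]
    return matrix
```

With all eight cycles, each row holds the 16 intensity values for one cycle-A bead; entries left as `None` flag beads with no match in some cycle.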
Cycle-by-cycle. In the cycle-by-cycle method, the following steps are performed for each cycle and each fluorescence channel: (1) obtain the N log-transformed average intensity values; (2) compute an intensity histogram using 50 bins; (3) determine the median intensity value M, and identify the most populated bin with intensity values smaller than M as B1 and the most populated bin with intensity values greater than M as B2; (4) identify the least populated bin B3 with intensity values between B1 and B2; (5) take the middle intensity value I of bin B3, then assign 0 to intensity values smaller than I and 1 to intensity values greater than I. The assigned code is then looked up in the binary code table; if the code is in the table, the corresponding cell identifying optical barcode sequence is returned.
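The cycle-by-cycle steps can be sketched in Python for a single channel. Interpretive assumptions, marked in the comments: bins are assigned to “below M” or “above M” by their centers, I is taken as the center of bin B3, a value exactly equal to I is called 0, and the midpoint of B1 and B2 is used when those bins are adjacent. Intensities must be positive for the log transform.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence


def cycle_threshold(intensities: Sequence[float], n_bins: int = 50) -> float:
    """Threshold I (linear scale) for one cycle and one channel, following the steps above."""
    logs = [math.log(v) for v in intensities]  # intensities must be positive
    lo, hi = min(logs), max(logs)
    if hi == lo:
        raise ValueError("intensities are not bimodal")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in logs:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    m = statistics.median(logs)
    below = [i for i in range(n_bins) if centers[i] < m]  # assumption: bins split by center
    above = [i for i in range(n_bins) if centers[i] > m]
    if not below or not above:
        raise ValueError("cannot split histogram at the median")
    b1 = max(below, key=lambda i: counts[i])  # most populated bin below M
    b2 = max(above, key=lambda i: counts[i])  # most populated bin above M
    between = range(b1 + 1, b2)
    if not between:
        return math.exp((centers[b1] + centers[b2]) / 2)  # assumption: adjacent peaks
    b3 = min(between, key=lambda i: counts[i])  # least populated bin (the valley)
    return math.exp(centers[b3])  # assumption: bin center used as I


def decode_cycle_by_cycle(rows: List[List[float]], code_table: Dict[str, str]) -> List[Optional[str]]:
    """rows: one list of per-cycle intensities per bead (single channel)."""
    n_cycles = len(rows[0])
    thresholds = [cycle_threshold([r[k] for r in rows]) for k in range(n_cycles)]
    calls = []
    for r in rows:
        code = "".join("1" if v > t else "0" for v, t in zip(r, thresholds))
        calls.append(code_table.get(code))  # None if the code is not in the table
    return calls
```

Because one threshold is shared by all beads in a cycle, a bright autofluorescent background in one channel can shift every call in that cycle, which is the weakness noted above for the Cy3 channel.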
Bead-by-bead. In the bead-by-bead method, the following steps are applied to each bead and each fluorescence channel: obtain the eight average fluorescence intensity values $x_1, x_2, \ldots, x_8$; let $y_1 \le y_2 \le \cdots \le y_8$ be these values sorted in ascending order; let $f_n = (y_{n+1} - y_n)/y_n$, $n = 1, 2, \ldots, 7$, be the relative fold change between neighboring sorted values; find the largest fold change, $N = \arg\max_n f_n$, then assign 0 to $y_1, \ldots, y_N$ and 1 to $y_{N+1}, \ldots, y_8$. The resulting binary code is looked up in the binary code table; if the code is in the table, the corresponding cell barcode sequence is returned. Otherwise, $f_N$ is removed from the list $\{f_n\}$ and the process is repeated with the next largest fold change until a corresponding cell barcode sequence is returned or the list $\{f_n\}$ is empty.
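Both calling methods can be sketched in a few lines of NumPy. This is a minimal illustration, not the original analysis code: the function names are hypothetical, and the binary code table is assumed to be a Python dict mapping a tuple of eight 0/1 bits (cycles A to H, one channel) to a barcode sequence. The cycle-by-cycle sketch returns one bit per bead for a single cycle and channel; stacking the eight cycles gives each bead's code for table lookup.

```python
import numpy as np


def call_cycle_by_cycle(intensities, n_bins=50):
    """Binarize one cycle/channel across all beads at the histogram valley.

    intensities: positive average bead intensities, one per bead.
    Returns an int array of 0/1 calls, one per bead.
    """
    x = np.log(np.asarray(intensities, dtype=float))
    counts, edges = np.histogram(x, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    m = np.median(x)
    below = np.flatnonzero(centers < m)
    above = np.flatnonzero(centers > m)
    b1 = below[np.argmax(counts[below])]   # most populated bin below the median
    b2 = above[np.argmax(counts[above])]   # most populated bin above the median
    span = np.arange(b1, b2 + 1)
    b3 = span[np.argmin(counts[span])]     # least populated bin between the two peaks
    threshold = centers[b3]
    return (x > threshold).astype(int)


def call_bead_by_bead(values, code_table):
    """Decode one bead/channel from its eight per-cycle intensities (cycles A..H).

    code_table: dict mapping a tuple of eight 0/1 bits to a barcode sequence
    (a hypothetical stand-in for the binary code table).
    Returns the barcode, or None if no split yields a valid code.
    """
    x = np.asarray(values, dtype=float)
    order = np.argsort(x)                  # indices of the values in ascending order
    y = x[order]
    fold = (y[1:] - y[:-1]) / y[:-1]       # f_n for n = 1..7
    for i in np.argsort(fold)[::-1]:       # try splits from the largest f_n downward
        bits = np.zeros(len(x), dtype=int)
        bits[order[i + 1:]] = 1            # values above the split are called 1
        code = tuple(int(b) for b in bits)
        if code in code_table:
            return code_table[code]
    return None
```

Iterating over the fold changes in descending order is equivalent to repeatedly removing $f_N$ and retrying, so the loop ends either at the first valid code or when every split has been tried.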
An experiment is performed to demonstrate the use of RNA capture beads containing cell identifying optical barcodes to link single-cell phenotypic images with nucleic acid sequence data, and to assess throughput, molecular capture efficiency, and accuracy of linking imaging and sequencing data.
This experiment is performed with a mixture of human (U87) and mouse (3T3) cells labeled with two differently colored live-cell dyes. The mixed cells are loaded into the microwells at a relatively high density, and 9,061 transcriptional profiles are obtained from a single experiment. At saturating sequencing depth, an average of 10,245 RNA transcripts from 3,548 genes is detected per cell.
Cell culture. Human U87 and mouse 3T3 cells are cultured in Dulbecco’s Modified Eagle Medium (DMEM, Life Technologies, cat# 11965118) supplemented with 10% fetal bovine serum (FBS, Life Technologies, cat# 16000044) at 37° C. and 5% carbon dioxide.
Human and mouse cells mixed experiment. Human U87 cells are stained with Calcein AM (ThermoFisher Scientific, cat# C3100MP) and mouse 3T3 cells are stained with Calcein red-orange (ThermoFisher Scientific, cat# C34851) in culture medium at 37° C. for 10 minutes. The stained cells are then dissociated into a single-cell suspension with 0.25% Trypsin-EDTA (Life Technologies, cat# 25200-072) and re-suspended in TBS buffer. The U87 and 3T3 cells are mixed at a 1:1 ratio at a final total cell concentration of 1,000 cells/µl. The mixed cell suspension is processed and sequenced, and the images and sequencing data are processed as described above in Example 5.
Imaging-based multiplet identification. Two-color live-staining fluorescence images are merged, with the Calcein AM signal shown in green and the Calcein red-orange signal in magenta. Each well is examined manually within its smallest bounding square. Wells containing mixed-species cells are identified as those with at least one green object and one magenta object; wells containing a single cell are identified as those with only one green object or only one magenta object.
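The well-classification rule above can be written as a small decision function. This is an illustrative sketch only: in the described workflow the wells are examined manually, so the object counts passed in here are assumed to come from some prior segmentation step, and the "empty" and "same-species multiple" categories are assumptions added to cover cases the text does not address.

```python
def classify_well(n_green: int, n_magenta: int) -> str:
    """Classify one microwell from the number of green (Calcein AM, human) and
    magenta (Calcein red-orange, mouse) objects found inside it."""
    if n_green >= 1 and n_magenta >= 1:
        return "mixed-species"            # at least one object of each color
    if n_green + n_magenta == 1:
        return "single cell"              # exactly one object of one color
    if n_green + n_magenta == 0:
        return "empty"                    # assumption: not covered by the text
    return "same-species multiple"        # assumption: >1 object of a single color
```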
Sub-sampling analysis. To analyze the saturation behavior and sensitivity of scRNA-seq data (
Accuracy of linking imaging and scRNA-seq data. Linking accuracy is defined as the concordance between scRNA-seq-based and imaging-based species calls for cell barcodes associated with a single species. In the scRNA-seq data, a cell barcode with >90% of its reads aligning uniquely to one species is considered to correspond to a single species. In the imaging data, the species is called from the live-staining color: cells with Calcein AM intensity > 724 are called human, and cells with Calcein red-orange intensity > 2,048 are called mouse. Each intensity threshold is set at the intensity of the shortest (least populated) histogram bin between the two means of a two-component Gaussian fit to the bimodal distribution of intensity values.
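A minimal sketch of the threshold rule and the accuracy calculation follows, assuming NumPy only. The two-component Gaussian fit is done here with a plain expectation-maximization loop, and the bin count (50) is an assumption; the text does not say which fitting routine or binning was used. The text also does not state the intensity scale, although the reported cutoffs (724 ≈ 2^9.5 and 2,048 = 2^11) are consistent with, but do not prove, a log2 scale. In `linking_accuracy`, a cell whose imaging call is ambiguous (both or neither dye above its cutoff) is counted as discordant, which is one reading of "concordance ... for cell barcodes associated with a single species".

```python
import numpy as np


def fit_two_gaussians(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM; return the two means, sorted."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75])          # initialize the means at the quartiles
    sd = np.full(2, x.std())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = w / (sd * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2)
        r = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: update mixture weights, means, and standard deviations
        nk = np.maximum(r.sum(axis=0), 1e-12)
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-6)
    return np.sort(mu)


def valley_threshold(x, n_bins=50):
    """Center of the least populated histogram bin between the two fitted means."""
    x = np.asarray(x, dtype=float)
    lo, hi = fit_two_gaussians(x)
    counts, edges = np.histogram(x, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    between = np.flatnonzero((centers > lo) & (centers < hi))
    return centers[between[np.argmin(counts[between])]]


def species_from_reads(human_reads, mouse_reads, purity=0.9):
    """Sequencing call: a species only if >90% of uniquely aligned reads support it."""
    total = human_reads + mouse_reads
    if total == 0:
        return None
    if human_reads / total > purity:
        return "human"
    if mouse_reads / total > purity:
        return "mouse"
    return None


def species_from_image(am, red_orange, am_cut=724.0, ro_cut=2048.0):
    """Imaging call from the two live-stain intensities; None if ambiguous."""
    human, mouse = am > am_cut, red_orange > ro_cut
    if human and not mouse:
        return "human"
    if mouse and not human:
        return "mouse"
    return None


def linking_accuracy(cells):
    """cells: iterable of (human_reads, mouse_reads, am_intensity, ro_intensity)."""
    matched = total = 0
    for h, m, am, ro in cells:
        seq_call = species_from_reads(h, m)
        if seq_call is None:
            continue                         # only single-species barcodes are scored
        total += 1
        matched += seq_call == species_from_image(am, ro)
    return matched / total if total else float("nan")
```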
To demonstrate the collection of paired optical and transcriptional phenotypes from human tissue samples using the cell identifying optical barcodes described herein, an experiment is performed on cells dissociated from a human glioblastoma (GBM) surgical sample and labeled with calcein AM, a fluorogenic dye that reports esterase activity. A total of 1,954 scRNA-seq profiles are obtained, 1,110 of which are linked to live cell images. Cell multiplets are removed based on imaging analysis. Because calcein AM is commonly used as a live stain, outlier cells with low fluorescence intensity are also removed. Malignantly transformed GBM cells often resemble non-neoplastic neural cell types in the adult brain, so simple marker-based analysis is insufficient to confirm malignant status. To address this, a large population of cells with amplification of chromosome 7 and loss of chromosome 10, two commonly co-occurring aneuploidies that are pervasive in GBM, is identified from gene expression. A low-dimensional representation of the data is then computed using single-cell hierarchical Poisson factorization (scHPF) to identify key gene signatures that define the population, and the distributions of these signatures across cells are visualized using Uniform Manifold Approximation and Projection (UMAP). All of the major cell types previously reported from scRNA-seq of GBM are recovered, including myeloid cells, endothelial cells, pericytes, malignantly transformed astrocyte-like cells, mesenchymal-like cells, oligodendrocyte-progenitor-like/neuroblast-progenitor-like cells (OPC/NPC), and cycling cells.
Identification of Relationships between Imaging Features and Lineage Identities of Malignantly Transformed GBM Cells. Malignant cells in GBM can resemble multiple neural lineages and exhibit a mesenchymal phenotype. Because malignant GBM cells are known to be highly plastic and undergo differentiation and de-differentiation, a diffusion map is used to visualize their lineage relationships. Malignant cells are selected based on aneuploidy as described above, the dimensionality of malignant cell gene expression is reduced by scHPF, and the factorized data are visualized with a diffusion map, which reveals two major branches. One branch consists of astrocyte-like cells and terminates with mesenchymal-like cells, while the other branch consists of OPC/NPC cells and cycling cells. This is consistent with previously published studies showing that astrocyte-like and mesenchymal glioma cells are significantly more quiescent than OPC-like glioma cells.
To explore how the imaging features of malignant cells relate to the two major cellular lineages, it is examined whether unsupervised clustering of cellular imaging features corresponds to the two major lineages observed by scRNA-seq. Malignant cells are clustered by the three imaging meta-features described above using hierarchical clustering, and two major cellular imaging clusters are identified. Plotting the two imaging clusters on the diffusion map embedding of the malignant cells shows that cells with round shape, low intensity, and small size (imaging cluster 0) are enriched in the OPC/NPC-cycling branch, and cells with rough shape, high intensity, and large size (imaging cluster 1) are enriched in the astrocyte-mesenchymal branch.
GBM tissue processing. A single-cell suspension is obtained from excess material collected during surgical resection of a WHO Grade IV GBM. The patient is anonymous and the specimen is de-identified. The tissue is mechanically dissociated following a 30-minute incubation with papain at 37° C. in Hank’s balanced salt solution. Cells are re-suspended in TBS after centrifugation at 100 × g, followed by selective lysis of red blood cells with ammonium chloride for 15 minutes at room temperature. Finally, cells are washed with TBS and counted using a Countess cell counter (ThermoFisher). Cells are stained with Calcein AM (ThermoFisher Scientific, cat# C3100MP). The GBM cell suspension is processed and sequenced using RNA capture beads containing the cell identifying optical barcodes, and the imaging and sequencing data are processed as described herein in Examples 5-7. Multiplets are removed by manually examining each well within the smallest bounding square of the Calcein AM fluorescence image. Dead cells are identified from their Calcein AM fluorescence intensity: a Gaussian distribution is fitted to the fluorescence intensity histogram, a threshold is set at the lower 5th percentile, and cells with intensity below the threshold are removed.
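The dead-cell filter reduces to fitting one Gaussian and reading off its 5th percentile. The sketch below is hypothetical and assumes SciPy: it fits the Gaussian to the raw per-cell intensities by maximum likelihood (`scipy.stats.norm.fit`) instead of to a binned histogram, and it applies no transform to the intensities; both choices are assumptions, since the text does not specify them.

```python
import numpy as np
from scipy.stats import norm


def live_cell_mask(intensity, percentile=5.0):
    """Return (mask of cells kept, cutoff) using the lower tail of a fitted Gaussian."""
    x = np.asarray(intensity, dtype=float)
    mu, sigma = norm.fit(x)                        # maximum-likelihood Gaussian fit
    cutoff = norm.ppf(percentile / 100.0, loc=mu, scale=sigma)
    return x >= cutoff, cutoff                     # cells below the cutoff are removed
```

For a well-behaved unimodal intensity distribution, roughly 5% of cells fall below the cutoff; a heavy low-intensity tail of dead cells would push that fraction higher.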
Live cell imaging analysis. Images are analyzed using ImageJ software. To identify microwells containing cells, microwell outlines are identified as objects in the bright-field image using a local threshold, and the average fluorescence intensity of each microwell in the live-staining images is then measured. These average intensities follow a bimodal distribution, with the higher-intensity population corresponding to microwells that contain cells. To extract cell optical phenotypes, only microwells with cells are selected, and each cell is analyzed individually within the smallest bounding square of its microwell. The cell is identified in the live-staining fluorescence image using ImageJ's auto threshold and particle analyzer. Microwells in which the software identifies multiple cells are excluded. Sixteen imaging features are measured for each cell in the fluorescence image: area, mean intensity, standard deviation of intensity, minimum intensity, maximum intensity, median intensity, perimeter, width, height, major axis, minor axis, circularity, Feret’s diameter, minimum Feret’s diameter, roundness, and solidity.
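Three of the sixteen features (circularity, roundness, and solidity) are derived from more basic particle measurements. The sketch below writes out those derivations using the definitions documented for ImageJ's shape descriptors; the function name is hypothetical, and it assumes area, perimeter, fitted-ellipse major axis, and convex hull area have already been measured.

```python
import math


def shape_descriptors(area, perimeter, major_axis, convex_area):
    """Shape descriptors from basic particle measurements (ImageJ definitions)."""
    return {
        "circularity": 4 * math.pi * area / perimeter ** 2,   # 1.0 for a perfect circle
        "roundness": 4 * area / (math.pi * major_axis ** 2),   # uses the fitted ellipse's major axis
        "solidity": area / convex_area,                        # 1.0 for a convex object
    }
```

A perfect circle scores 1.0 on all three, while irregular cell outlines lower circularity and solidity, which is why these features help separate round from rough cell shapes.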
Analysis of scRNA-seq with optically barcoded beads. To analyze scRNA-seq data collected with beads containing cell identifying optical barcode sequences, the cell identifying optical barcode and the UMI are first extracted from Read 1 based on the designed oligonucleotide sequence, NN(8-nt Cell Barcode S)NN(8-nt Cell Barcode Q)NNNN. Every pair among the 192 8-nt cell barcode sequences has a Hamming distance of at least three, so a single substitution error in a cell barcode can be corrected unambiguously. Only reads with a complete cell barcode are retained. Next, Read 2 sequences are aligned to a merged human/mouse genome (GRCh38 for human and GRCm38 for mouse) with merged GENCODE transcriptome annotations (GENCODE v.24 for both species) using the STAR aligner (v.2.7.0), after trimming 3′ poly(A) tails (identified as tracts of more than seven A’s) and discarding fragments shorter than 24 nucleotides after trimming. Only reads that map uniquely to exons on the annotated strand are included in downstream analysis. Reads with the same cell barcode, UMI (after correction of one substitution error), and gene mapping are considered to originate from the same cDNA molecule and are collapsed. Finally, this information is used to generate a molecular count matrix.
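The pre- and post-alignment bookkeeping steps can be sketched in plain Python. This is a hypothetical illustration, not the original pipeline. Assumptions: Read 1 positions 0-1, 10-11, and 20-23 (the N positions) are concatenated into the UMI; one whitelist is used for both barcode halves; poly(A) trimming cuts at the first run of eight or more A's; and UMI error correction is omitted (identical cell/UMI/gene triples are simply collapsed). Alignment with STAR is not reproduced here.

```python
import re
from collections import defaultdict

POLY_A = re.compile(r"A{8,}")  # a tract of more than seven A's


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def parse_read1(seq: str):
    """Split Read 1 laid out as NN + S(8) + NN + Q(8) + NNNN (24 nt)."""
    if len(seq) < 24:
        return None                                  # incomplete cell barcode
    bc_s, bc_q = seq[2:10], seq[12:20]
    umi = seq[0:2] + seq[10:12] + seq[20:24]         # assumption: N positions form the UMI
    return bc_s, bc_q, umi


def correct_barcode(obs: str, whitelist: set):
    """Return the whitelist barcode within one substitution of obs, else None.

    With a minimum pairwise Hamming distance of 3, at most one entry can match.
    """
    if obs in whitelist:
        return obs
    hits = [w for w in whitelist if hamming(obs, w) == 1]
    return hits[0] if len(hits) == 1 else None


def trim_read2(seq: str, min_len: int = 24):
    """Trim from the first poly(A) tract; drop reads shorter than min_len afterwards."""
    m = POLY_A.search(seq)
    trimmed = seq[:m.start()] if m else seq
    return trimmed if len(trimmed) >= min_len else None


def count_matrix(records):
    """records: iterable of (cell_barcode, umi, gene); duplicates collapse to one molecule."""
    counts = defaultdict(int)
    for cell, _umi, gene in set(records):
        counts[(cell, gene)] += 1
    return dict(counts)
```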
Optically barcoded beads for linking cell imaging and sequencing data. To link the cell identifying optical barcodes decoded from imaging to cell imaging phenotypes, bright-field images of the microwell device obtained during optical decoding are mapped onto the live cell images using the upper-left and bottom-right microwells as reference points. Each cell is then registered to the nearest mapped bead within one microwell radius. To link cell imaging phenotypes to expression profiles, only cell barcodes with registered cells are considered, and the barcodes for which the optically decoded and sequenced cell identifying optical barcodes map to each other exactly and uniquely are identified.
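The registration and linking steps can be sketched with NumPy and SciPy's `cKDTree`. All function names are hypothetical. Two point correspondences (the upper-left and bottom-right microwells) exactly determine a similarity transform (scale, rotation, translation), which is modeled here with complex arithmetic; whether the original mapping also allowed rotation is an assumption. "Unique" is interpreted as a barcode decoded on exactly one bead in the device.

```python
from collections import Counter

import numpy as np
from scipy.spatial import cKDTree


def two_point_transform(src_pts, dst_pts):
    """Similarity transform fixed by two point pairs, returned as a callable."""
    z1, z2 = (complex(*p) for p in src_pts)
    w1, w2 = (complex(*p) for p in dst_pts)
    a = (w2 - w1) / (z2 - z1)            # combined scale and rotation
    b = w1 - a * z1                      # translation

    def apply(xy):
        z = np.asarray(xy, dtype=float) @ np.array([1, 1j])
        w = a * z + b
        return np.column_stack([w.real, w.imag])

    return apply


def register_cells(cell_xy, bead_xy, radius):
    """Index of the nearest bead within radius for each cell, or -1 if none."""
    tree = cKDTree(bead_xy)
    dist, idx = tree.query(cell_xy, k=1, distance_upper_bound=radius)
    return np.where(np.isfinite(dist), idx, -1)


def link(cell_to_bead, bead_barcodes, seq_barcodes):
    """Map cell index -> barcode for barcodes decoded on exactly one bead and
    found exactly in the sequencing data."""
    freq = Counter(bc for bc in bead_barcodes if bc is not None)
    linked = {}
    for cell, bead in enumerate(cell_to_bead):
        if bead < 0:
            continue                     # no bead within one microwell radius
        bc = bead_barcodes[bead]
        if bc is not None and freq[bc] == 1 and bc in seq_barcodes:
            linked[cell] = bc
    return linked
```

In use, bead positions from the decoding images would first be passed through `two_point_transform` into live-cell image coordinates, and only then matched to cells.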
Single cell hierarchical Poisson factorization (scHPF) analysis. To reduce the dimensionality of the scRNA-seq results, the gene count matrix is factorized using scHPF with default parameters and K = 13. One of the factors contains several heat shock genes with high gene scores (among the top 50 genes), likely indicating dissociation artifacts in certain cells. This factor is removed from all downstream analyses.
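Flagging the heat shock factor can be expressed as a check on each factor's top-ranked genes. The sketch assumes a genes × K score matrix has already been exported from the scHPF model; it does not call scHPF itself. The heat shock gene list and the "several" cutoff (`min_hits=3`) are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative list of heat shock genes (an assumption, not from the source).
HEAT_SHOCK_GENES = {"HSPA1A", "HSPA1B", "HSPA6", "DNAJB1", "HSPH1", "HSP90AA1", "HSPB1"}


def heat_shock_factors(gene_scores, gene_names, hs_genes, top_n=50, min_hits=3):
    """Indices of factors whose top_n genes include at least min_hits heat shock genes.

    gene_scores: genes x K array of per-factor gene scores.
    """
    names = np.asarray(gene_names)
    flagged = []
    for k in range(gene_scores.shape[1]):
        top = names[np.argsort(gene_scores[:, k])[::-1][:top_n]]
        if len(set(top) & set(hs_genes)) >= min_hits:
            flagged.append(k)
    return flagged
```

Downstream analyses would then keep only the remaining columns of the cell score matrix, for example `[k for k in range(K) if k not in flagged]`.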
Malignant cell identification. Cell aneuploidy analysis is performed based on the scHPF model as described previously. To compute the scHPF-imputed expression matrix, the gene and cell weight matrices of the scHPF model (the expectation matrices of the variables θ and β) are multiplied, and the resulting matrix is log-transformed as $\log_2(\text{expected counts}/10000 + 1)$. The average gene expression on each somatic chromosome is calculated from the scHPF-imputed count matrix as previously described. A malignancy score is defined as the difference between the average expression of Chr. 7 genes and that of Chr. 10 genes, $\langle \log_2(\text{Chr. 7 expression}) \rangle - \langle \log_2(\text{Chr. 10 expression}) \rangle$. A two-component Gaussian distribution is fitted to the malignancy scores, and the score of the shortest histogram bin between the two means is used as the threshold that separates the malignant and non-malignant cell populations. The difference in chromosome average expression between malignant and non-malignant cells is computed by subtracting the average expression of non-malignant cells from each cell's expression.
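The imputation and scoring steps are short matrix operations. In this sketch (function names hypothetical), `theta` is assumed to be the cells × K matrix of expected cell weights and `beta` the genes × K matrix of expected gene weights, following the usual scHPF convention; the log transform is taken literally from the text, and chromosome labels are assumed to be strings such as "7" and "10". Fitting the two-component Gaussian to the resulting scores is not repeated here.

```python
import numpy as np


def imputed_log_expression(theta, beta):
    """scHPF-imputed expression, log2(expected counts / 10000 + 1); cells x genes."""
    expected = theta @ beta.T            # expected counts for every cell/gene pair
    return np.log2(expected / 10000 + 1)


def chromosome_means(log_expr, gene_chrom, chroms):
    """Per-cell average log expression over the genes on each chromosome."""
    gene_chrom = np.asarray(gene_chrom)
    return {c: log_expr[:, gene_chrom == c].mean(axis=1) for c in chroms}


def malignancy_score(log_expr, gene_chrom):
    """<log2 Chr. 7 expression> - <log2 Chr. 10 expression>, one value per cell."""
    means = chromosome_means(log_expr, gene_chrom, ["7", "10"])
    return means["7"] - means["10"]


def relative_to_normal(chrom_means, is_malignant):
    """Subtract the non-malignant average from every cell's chromosome average."""
    return {c: v - v[~is_malignant].mean() for c, v in chrom_means.items()}
```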
scRNA-seq clustering and visualization. To visualize the scHPF model (
Cell optical phenotypes clustering. To reduce the dimensionality of the cellular imaging features, the 16 cell imaging features are z-normalized and hierarchically clustered using the ‘linkage’ function in the Python module ‘SciPy’ with correlation distance. The dendrogram in
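The feature clustering, and the subsequent two-way clustering of cells described earlier, can be sketched with `scipy.cluster.hierarchy`. Several details are assumptions not stated in the text: the linkage criterion for features ('average'), cutting the feature dendrogram into three groups with `fcluster`, scoring each meta-feature as the mean of its member z-scores, and using Ward linkage to split cells into two clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore


def meta_features(features, n_meta=3):
    """features: cells x 16 matrix. Returns (cells x n_meta scores, feature labels)."""
    z = zscore(features, axis=0)                       # z-normalize each feature
    # Cluster the features (columns) using correlation distance between them.
    tree = linkage(z.T, method="average", metric="correlation")
    labels = fcluster(tree, t=n_meta, criterion="maxclust")
    # Assumption: a meta-feature is the mean z-score of its member features.
    scores = np.column_stack([z[:, labels == k].mean(axis=1) for k in np.unique(labels)])
    return scores, labels


def cluster_cells(scores, n_clusters=2):
    """Hierarchically cluster cells on their meta-feature scores (Ward linkage assumed)."""
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```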
Diffusion map embedding of malignantly transformed GBM cells. The molecular count matrix for malignantly transformed GBM cells (identified by aneuploidy analysis as described above) is factorized using scHPF with default parameters and K = 15. Prior to further analysis, one of the 15 factors, which exhibits high scores for heat shock response genes, is removed because it likely represents a dissociation artifact in a subset of cells. Diffusion components are then computed with the DMAPS Python library, using as input a Pearson correlation distance matrix computed from the scHPF cell score matrix and a kernel bandwidth of 0.5. The first two diffusion components are plotted in
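For readers without DMAPS, the core computation can be approximated directly in NumPy. This sketch does not reproduce DMAPS's API or its exact kernel; it assumes a Gaussian kernel $\exp(-d^2/\varepsilon)$ with $\varepsilon = 0.5$ on the Pearson correlation distance, builds the Markov matrix through symmetric normalization so a real eigensolver can be used, and drops the trivial constant component.

```python
import numpy as np


def diffusion_map(cell_scores, bandwidth=0.5, n_components=2):
    """Diffusion components from a cells x factors score matrix."""
    d = 1.0 - np.corrcoef(cell_scores)         # Pearson correlation distance between cells
    k = np.exp(-d ** 2 / bandwidth)            # Gaussian kernel (assumed form)
    deg = k.sum(axis=1)
    a = k / np.sqrt(np.outer(deg, deg))        # symmetric version of the Markov matrix
    evals, evecs = np.linalg.eigh(a)
    order = np.argsort(evals)[::-1]            # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(deg)[:, None]        # right eigenvectors of D^-1 K
    # Skip the first (constant, eigenvalue 1) component.
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1]
```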
scRNA-seq differential expression. The Mann-Whitney U-test is used for differential expression analysis. For a pairwise comparison of two groups of cells, the larger group is randomly sub-sampled to the same number of cells as the smaller group. Next, the detected molecules in the group with the higher average number of molecules per cell are randomly sub-sampled so that the two groups have the same average number of molecules detected per cell. The resulting sub-sampled matrices are then normalized using the random pooling method implemented in the scran R package. Finally, the normalized matrices are tested gene by gene with the Mann-Whitney U-test, using the ‘mannwhitneyu’ function in the Python package SciPy, and the resulting p-values are corrected with the Benjamini-Hochberg method as implemented in the ‘multipletests’ function in the Python package statsmodels.
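A self-contained sketch of the comparison is shown below, with two stand-ins that depart from the described method and are labeled as such. scran's pooling normalization (an R package) is replaced by scaling each cell to 10,000 molecules, and statsmodels' `multipletests(..., method='fdr_bh')` is replaced by an equivalent inline Benjamini-Hochberg adjustment. Molecule sub-sampling is done by binomial thinning, which is an assumption about how "randomly sub-sampled" molecules were drawn.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (the procedure behind 'fdr_bh')."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    adj = np.minimum.accumulate(ranked[::-1])[::-1]    # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out


def downsample(counts_a, counts_b, rng):
    """Equalize cell numbers, then thin molecules so mean molecules per cell match."""
    n = min(len(counts_a), len(counts_b))
    a = counts_a[rng.choice(len(counts_a), n, replace=False)]
    b = counts_b[rng.choice(len(counts_b), n, replace=False)]
    ma, mb = a.sum(axis=1).mean(), b.sum(axis=1).mean()
    if ma > mb:
        a = rng.binomial(a, mb / ma)                   # keep each molecule with prob mb/ma
    else:
        b = rng.binomial(b, ma / mb)
    return a, b


def differential_expression(counts_a, counts_b, seed=0):
    """counts_*: cells x genes integer matrices. Returns (p-values, BH-adjusted p-values)."""
    rng = np.random.default_rng(seed)
    a, b = downsample(counts_a, counts_b, rng)

    def normalize(m):                                  # stand-in for scran pooling
        lib = m.sum(axis=1, keepdims=True)
        return m / np.maximum(lib, 1) * 1e4

    a, b = normalize(a), normalize(b)
    pvals = np.ones(a.shape[1])
    for g in range(a.shape[1]):
        x, y = a[:, g], b[:, g]
        both = np.concatenate([x, y])
        if both.max() > both.min():                    # skip genes with no variation at all
            pvals[g] = mannwhitneyu(x, y, alternative="two-sided").pvalue
    return pvals, bh_adjust(pvals)
```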
This application is a 35 U.S.C. Section 371 national phase application of PCT International Patent Application No. PCT/US2020/039943, filed Jun. 26, 2020, incorporated herein by reference in its entirety and which claims the benefit of U.S. Provisional Pat. Application Serial No. 62/867,830, filed Jun. 27, 2019, the disclosure of which is incorporated herein by reference in its entirety.
This invention was made with government support under Grant Nos. 9R44HG010003-02A1, 5R44HG010003-03, 75N91019C00029, CA202827, and HG010003 awarded by the National Institutes of Health. The government has certain rights in the invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2020/039943 | 6/26/2020 | WO | |

| Number | Date | Country |
|---|---|---|
| 62867830 | Jun 2019 | US |