The sequence listing file under the file name “Sequence_Listing_034689-000048.xml” submitted in ST.26 XML file format with a file size of 93 KB created on Sep. 13, 2022 and filed on Sep. 14, 2022 is incorporated herein by reference.
The present invention relates to a system for correlated spatial analysis of multiple post-transcriptional regulations in single cells at a tissue-wide scale, in particular, a post-transcriptional regulation-specific molecular fishing platform in combination with a digital spectrum fluorescent in-situ hybridization (Spectrum-FISH) barcoding system to realize high-throughput, large-scale profiling of post-transcriptional regulations with subcellular spatial resolution across an acute tissue biopsy.
Spatial heterogeneity in gene expression is closely related to human physiology in normal or diseased conditions. Different techniques have evolved for capturing spatial information with cellular resolution across tissues to study critical genetic or epigenetic regulations among large-scale cellular populations. Among those techniques, fluorescence in-situ hybridization (FISH) has been widely used to assess the spatial distribution of mRNAs, and different encoding strategies have been incorporated to enable highly multiplexed analysis of mRNAs. MERFISH (multiplexed error-robust FISH) is one such technique, using a combinatorial labeling strategy with an error-robust algorithm for simultaneous imaging of over 100 RNA species in a single cell. Another advanced FISH technique is sequential fluorescence in-situ hybridization (SeqFISH), which analyzes different RNAs through multiple rounds of hybridization to increase the assay throughput. However, conventional FISH-based profiling techniques cannot meet the need for profiling heterogeneous molecular targets across a tissue-wide sample, because they mostly focus on subcellular localization of genetic targets and are limited to individual cells. Some of the conventional techniques even require super-resolution imaging aids, which limits their applications.
A large-scale, tissue-wide spatial transcriptomics method, which uses an array of poly(T) tails to capture mRNAs released from a pre-treated tissue sample on a grid of spatially-indexed coordinates, has been developed recently in an attempt to meet such a demand. The mRNAs captured by such a spatial transcriptomic method are subsequently identified by sequencing. However, such a method involves a complex pre-treatment of the tissue, including frozen sectioning, fixation, membrane penetration, etc., to release intracellular mRNAs. This pre-treatment is laborious and time-consuming, and easily leads to cross-contamination among adjacent regions and RNA degradation within a certain spatial extent, in turn reducing the spatial resolution of the associated transcriptomic analysis.
In addition, the majority of the conventional tissue-wide spatial profiling techniques are limited to transcriptional (mRNA) targets, proteomic targets or chromatin modifications, and are seldom used for analyzing tissue-wide post-transcriptional regulations. In terms of post-transcriptional regulation, microRNAs (miRNAs) and RNA methylation are two prevalent mechanisms closely related to complex gene expression topology for accommodating substantial regulatory flexibility, diversity, and robustness. miRNAs are non-coding single-stranded small RNA molecules of about 21-23 nucleotides that modulate gene expression by inhibiting mRNA translation, while RNA methylation is the most commonly found modification on mRNAs for regulating their metabolism. For example, N6-methyladenosine (m6A) modification is highly enriched in the mammalian brain, and distinct m6A methylation patterns vary across different brain regions with a dynamic involvement in neural development. Some recent studies show that miRNAs and RNA methylation in concert play an important role in post-transcriptional genetic regulation.
However, there is a lack of practical techniques that can analyze their correlation in post-transcriptional regulation with sufficient throughput and spatial information at a tissue-wide scale. The use of long priming probes in conventional FISH-based methods does not favor the profiling of small miRNAs. Although the nucleotide length of mRNAs with m6A methylation is sufficiently long for probe priming, additional biochemical analysis such as liquid chromatography is required to differentiate methylation levels within a pool of mRNAs. If more than one type of molecular target from a population of individual cells is to be analyzed in-situ with sufficient spatial resolution, conventional techniques usually require mapping each type of molecular target one-by-one, instead of profiling them together at a single cell level.
Current analytical methods based on sensing and quantifying fluorescent signals from different probes by conventional fluorescent microscopy are limited by the number of fluorescent channels and by interference and crosstalk between different fluorescent channels, thereby limiting the number of applicable fluorophores and the multiplexing throughput. Previously, FISH-based methods and NANOSTRING barcoding (a comparison between different conventional methods will be described hereinafter and summarized in Table 3) have been reported. FISH-based fluorescence encoding requires super-resolution microscopy and involves serial rounds of hybridization, imaging and probe stripping, making it expensive, time-consuming and technically difficult for wider adoption. For NANOSTRING barcoding, the barcodes are prepared using a string of RNA particles coupled with fluorophores, which requires intricate sequence design and synthesis processes. In an assay, the reporter barcodes also need to be stretched by an electric field before imaging, further limiting its usage for in situ analysis. Though NANOSTRING has already been used in spatial mRNA profiling, the implementation has relatively low spatial resolution and slow profiling speed due to the sequential, region-by-region cleaving process of the reporters.
Therefore, a need exists for a single cell multi-omic profiling approach including post-transcriptional regulation to capture spatial information of multiple molecular targets with subcellular resolution among a larger scale of cellular populations from a tissue-wide biopsy, and a new coding system for translating multi-spectral information from different fluorophores with overlapping emission wavelengths into corresponding codes containing specific spectral features, which at least diminishes, eliminates or overcomes the disadvantages, problems or challenges in the conventional techniques.
The present disclosure proposes a platform configured to massively capture intracellular molecular targets from a large population of cells in acute tissue slices and incorporating a robust molecular fishing system for targeting a wide variety of molecular targets at once. Captured or extracted intracellular molecular targets by the platform are analysed in-situ through an initial spatial registration and employing a subsequent barcoding strategy to enable a high-throughput multiplexing and quantification of molecular targets in different individual cells with respect to the spatial registration of extracted intracellular molecular targets by the platform. The proposed platform and strategy are sequencing-free, and capable of profiling multiple post-transcriptional molecular targets involved in genetic or epigenetic regulations in a single process run. Multi-spectral information obtained by different fluorescent channels from a specially-designed mixture of barcodes (or different ratios of fluorophores conjugated with different reporter probes) for hybridizing different intracellular molecules on the platform extracted from the tissue slice is digitalized and output to a network implementing one or more machine learning algorithms to analyse and extract the corresponding feature vector representing a specific type of molecular targets in order to quantify different types of molecular targets present in a population of cells and correlate the quantitative result to their spatial distribution across the tissue, in order to obtain a post-transcriptional profile of multiple molecular targets in a target tissue.
Accordingly, in a first aspect of the present invention, there is provided a molecular fishing system comprising an array of nanoprobes (or a plurality of vertically-aligned nanoneedles), where each of the nanoprobes (or nanoneedles) is functionalized with one or more molecular target fishing molecules for extracting molecular targets via intracellular biopsy, i.e., interfacing with a superficial layer of cells in a freshly prepared tissue slice.
In certain embodiments, the array of nanoprobes is made of silicon.
In other embodiments, the array of nanoprobes can be made of a material with sufficient mechanical strength to enable puncture of the nanoprobes into the tissue sample to a subcellular level.
In certain embodiments, the one or more molecular target fishing molecules include nucleic acids, proteins, antibodies, or any combination thereof.
In certain embodiments, the nucleic acids being the one or more molecular target fishing molecules are DNA or RNA molecules, or a combination thereof.
In certain embodiments, the DNA or RNA molecules being the molecular target fishing molecules include oligo (dT) primers and antisense oligonucleotides.
In certain embodiments, the molecular target fishing molecules are selected from p19 siRNA binding proteins for targeting microRNAs (miRNAs).
In certain embodiments, the molecular target fishing molecules are selected from anti-N6-methyladenosine (m6A) antibody for targeting N6-methyladenosine messenger RNAs (m6A mRNAs).
In certain embodiments, other antibodies selected as the molecular target fishing molecules are for targeting proteins and other methylated DNAs or mRNAs.
In certain embodiments, the nanoprobes are initially amino-functionalized prior to cross-linking with the one or more molecular target fishing molecules.
In certain embodiments, each of the nanoprobes is configured to have a high height-to-base width aspect ratio.
In certain embodiments, each of the nanoprobes has substantially identical height, base width, and spacing as the other nanoprobes on the base of the array.
In certain embodiments, each of the nanoprobes has an average base width from 200 nm to 500 μm.
In certain embodiments, each of the nanoprobes has an average height from 2 μm to 200 mm.
In certain embodiments, between each pair of the nanoprobes there is an average spacing distance from 5 μm to 500 μm.
In an exemplary embodiment, the amino-functionalized nanoprobes are subsequently biotinylated followed by labeling with streptavidin conjugated fluorescent dye, prior to cross-linking with the one or more molecular target fishing molecules.
In a second aspect of the present invention, there is provided a method of mapping spatial distribution and expression of multiple molecular targets with individual cells within a two-dimensional tissue boundary. The method includes:
In certain embodiments, the fluorescent dyes can be selected from any fluorescent dyes capable of emitting light signals within a detectable range of the applicable microscopy but outside the spectrum of the imprint irradiation.
In certain embodiments, the fluorescent dyes have excitation and emission wavelengths from about 579 to 603 nm.
In certain embodiments, a crosslinker used to crosslink between the fluorescent dyes and the nanoprobes is UV cleavable.
In certain embodiments, a pixelated fluorescent pattern corresponding to the presence of the streptavidin conjugated fluorescent dyes on the nanoprobes covered by the tissue sample is obtained during the exposure to the irradiation.
In certain embodiments, after contacting the array of nanoprobes with the surface of the tissue sample at the interface where the one or more molecular target fishing molecules and the crosslinked fluorescent dyes on the nanoprobes will be in contact with a superficial layer of the tissue, a pressure is applied to the nanoprobes towards the tissue sample in order to puncture the nanoprobes into the superficial layer of the tissue for the subsequent extraction.
In certain embodiments, the pressure applied to the nanoprobes towards the tissue sample is by centrifugation of both the array of nanoprobes and the tissue sample held in a container.
In certain embodiments, the imprint irradiation for imprinting the outline of the tissue on the array is UV irradiation.
In certain embodiments, an image of the tissue and relative position of the nanoprobes is captured as a spatial registration of certain cell types in the tissue associated with the nanoprobes prior to the removal of the tissue sample from the array.
In certain embodiments, certain cell types in the tissue slice are labelled by immunostaining with specific markers.
In certain embodiments, the subsequent barcoding is performed by subjecting the array of nanoprobes to a plurality of reporter sequences complementary to the molecular targets extracted by the nanoprobes, where the corresponding reporter sequence is associated with a pre-determined mix ratio of multiple labelling agents for spectral analysis.
In a third aspect of the present invention, there is provided a spectrum barcoding system for profiling different molecular targets associated with post-transcriptional regulation mechanisms in individual cells of a tissue sample, in which the system includes a plurality of different sets of in-situ hybridization particles, and each set of in-situ hybridization particles is associated with a guiding probe and a plurality of labelling agents at a pre-determined mix ratio corresponding to a specific molecular target.
In certain embodiments, the in-situ hybridization particles are modified with a specific functional group to crosslink with the labelling agent.
In certain embodiments, the specific functional group is selected from amino group, hydroxyl group, carboxyl group, N-hydroxyl succinimide group, or sulfhydryl group.
In other embodiments, the in-situ hybridization particles can be modified by electrostatic adhesion or selected from porous particles for association with or absorption of different labelling agents.
In certain embodiments, the in-situ hybridization particles are in different shapes including spherical, nanorod, nanowire, and star.
In certain embodiments, each of the in-situ hybridization particles has an average size of about 1 nm to about 1 cm.
In certain embodiments, the in-situ hybridization particles are beads in nano scale (or nanobeads).
In certain embodiments, the in-situ hybridization particles are made of one or more of magnetic material, inorganic material and a polymer, which include, but not limited to, Fe2O3, Fe3O4, silicon, silicon oxide, gold, silver, AlOOH, polystyrene, polyvinyl chloride, or any combination thereof.
In certain embodiments, the in-situ hybridization particles are magnetic beads.
In certain embodiments, the labelling agents have excitation and emission wavelengths from 300 nm to 800 nm.
In certain embodiments, the labelling agents are fluorophores.
Other than fluorophores, the labelling agents can be one or more of quantum dot, upconversion materials, fluorescent molecules and proteins conjugated with the in-situ hybridization particles according to other embodiments.
In certain embodiments, each combination of in-situ hybridization particles includes from 1 to 50 different types of labelling agents.
In certain embodiments, the pre-determined mix ratio of each type of labelling agent to the other type(s) in the same combination of in-situ hybridization particles is 1:1-99.
In other words, the mix ratio of each labelling agent is from 1/2 to 1/100 in the same combination of in-situ hybridization particles.
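The two ways of stating the mix ratio above can be related by a short calculation. The following is a minimal sketch, assuming that a ratio of 1:n means one part of a given labelling agent to n parts of the other labelling agent(s), so that the agent makes up 1/(1+n) of the mixture; the function name `fraction_of_mixture` is hypothetical and used only for illustration.

```python
from fractions import Fraction


def fraction_of_mixture(others: int) -> Fraction:
    """Fraction of the mixture taken by one agent mixed 1:others with the rest.

    Assumption: "1:n" means 1 part of this agent to n parts of the other agent(s).
    """
    if not 1 <= others <= 99:
        raise ValueError("the ratio 1:others is stated for others in 1..99")
    return Fraction(1, 1 + others)


print(fraction_of_mixture(1))   # 1/2
print(fraction_of_mixture(99))  # 1/100
```

Under this reading, the end points 1:1 and 1:99 correspond to the fractions 1/2 and 1/100 recited above.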
In certain embodiments, the guiding probe includes a nucleotide sequence or amino acid sequence complementary to a specific sequence of the molecular target, or antibodies, or any combination thereof.
In certain embodiments, the nucleotide sequence of the guiding probe is an antisense oligonucleotide to a DNA or RNA sequence of the molecular target.
In certain embodiments, the guiding probe is functionalized with one of biotin, amino group, hydroxyl group, carboxyl group, N-hydroxy succinimide group and sulfhydryl group.
In certain embodiments, the functionalized guiding probe is associated with the corresponding in-situ hybridization particle.
In some other embodiments, the guiding probe is a protein-RNA complex that is capable of recognizing a single base of the molecular target.
In certain embodiments, the in-situ hybridization particles can be further functionalized with one or more functional elements including plasmid, siRNA, drug, or a complex of sgRNA associated with any of Cas9, Cas12, and Cas13 proteins.
In certain embodiments, after a first round of in-situ hybridization particles contacts the molecular targets on the nanoprobes, an emission pattern/spectrum from the corresponding labelling agent of the first round of in-situ hybridization particles associated with the array of nanoprobes is captured by all applicable channels of a multi-channel microscope, followed by cleavage of the guiding probe associated with the first round of the in-situ hybridization particles. The nanoprobes are then exposed to a second or subsequent round of in-situ hybridization particles, and a second or subsequent emission pattern/spectrum from their respective labelling agent is captured by the same applicable channels of the multi-channel microscope before the respective guiding probe is cleaved.
In certain embodiments, the number of applicable channels of the multi-channel microscope is at least 4.
In certain embodiments, the multi-channel microscope includes a confocal microscope or another fluorescence detection device.
In a fourth aspect of the present invention, a spectral digitization method is provided to encode at least two conditions/statuses of each emission pattern/spectrum captured by each of the applicable channels of the multi-channel microscopy in the spectrum barcoding system of the present invention, where the method includes encoding the at least two conditions of the emission spectrum detected by each applicable channel with at least two numbers.
In certain embodiments, the two numbers (binary strategy) employed to encode two different conditions are “1” and “0” to represent the presence and absence of an emission spectrum in an individual cell, respectively.
In certain embodiments, the total number of spectrum barcodes under a binary mixing strategy is determined by $N \times (2^{C} - 1)$, where $C$ denotes the number of applicable channels for encoding and $N$ denotes the number of hybridization/visualization rounds, and where the barcode “0000” is not used when the number of applicable channels is 4 and only one round of in-situ hybridization is performed.
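The binary count can be illustrated by listing the codes directly. The following is a minimal sketch, reading the expression above as N × (2^C − 1), which is the reading consistent with excluding the single all-zero code; the function names `binary_barcodes` and `total_barcodes` are hypothetical.

```python
from itertools import product


def binary_barcodes(channels: int) -> list[str]:
    """All on/off channel patterns except the all-zero (no signal) pattern."""
    codes = ["".join(bits) for bits in product("01", repeat=channels)]
    return [code for code in codes if "1" in code]


def total_barcodes(channels: int, rounds: int) -> int:
    """Usable barcodes over several hybridization/visualization rounds."""
    return rounds * len(binary_barcodes(channels))


codes = binary_barcodes(4)
print(len(codes))            # 15
print(codes[:3])             # ['0001', '0010', '0011']
print(total_barcodes(4, 1))  # 15
```

With four applicable channels and one round, the 16 possible patterns reduce to 15 usable barcodes once “0000” is excluded.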
In other embodiments, a trinary mixing strategy is employed to encode three different conditions in cases where the throughput is further enhanced by increasing a mix ratio of multiple labelling agents, wherein different brightness levels (or intensity levels) of an emission spectrum of a labelling agent detectable by the applicable channel of the microscopy are encoded as “0”, “1”, and “2”, respectively.
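The binary and trinary encodings can both be viewed as quantizing each channel's intensity into levels. The following is a minimal sketch, assuming intensities normalized to the range 0 to 1; the threshold values and the function name `digitize_channels` are hypothetical and are not taken from the present disclosure.

```python
def digitize_channels(intensities, thresholds):
    """Encode each channel as the number of thresholds its intensity reaches.

    One threshold gives a binary code ("0"/"1"); two thresholds give a
    trinary code ("0"/"1"/"2"). Threshold values are illustrative only.
    """
    return "".join(
        str(sum(value >= t for t in thresholds)) for value in intensities
    )


measured = [0.05, 0.35, 0.90, 0.10]  # hypothetical normalized intensities

print(digitize_channels(measured, (0.2,)))      # 0110
print(digitize_channels(measured, (0.2, 0.6)))  # 0120
```

The same measurement thus yields a different code depending on how many intensity levels are distinguished per channel.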
In the embodiments that the mix ratio of multiple labelling agents is increased, the throughput of the spectrum barcoding system in each single process run can be determined by the following equation:
$$N \times (R_{\text{step}}^{C} - 1)$$
where $R_{\text{step}}$ denotes the ratio step number; $C$ denotes the number of applicable channels, and $N$ denotes the number of hybridization/visualization rounds.
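The throughput expression can be evaluated for different settings as follows. This is a minimal sketch that reads the expression as N × (R_step^C − 1), i.e., R_step levels per channel across C channels, minus the all-zero code, repeated over N rounds; the function name `barcode_throughput` is hypothetical.

```python
def barcode_throughput(ratio_steps: int, channels: int, rounds: int) -> int:
    """Number of usable barcodes: rounds * (ratio_steps ** channels - 1)."""
    if ratio_steps < 2 or channels < 1 or rounds < 1:
        raise ValueError("need ratio_steps >= 2, channels >= 1, rounds >= 1")
    return rounds * (ratio_steps ** channels - 1)


print(barcode_throughput(2, 4, 1))  # 15 (binary, 4 channels, 1 round)
print(barcode_throughput(3, 4, 1))  # 80 (trinary, 4 channels, 1 round)
```

Setting the ratio step number to 2 reproduces the binary case, so the binary expression is a special case of this more general one.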
In certain embodiments, emission spectra captured by the respective applicable channel of the multi-channel microscope after said encoding are further processed by extracting features from regions of interest (ROIs) followed by optimization before barcode feature vectors are generated.
In certain embodiments, the tissue sample is stained with a cell-specific labelling agent and fluorescent images thereof are captured for image segmentation by adaptive binarization before feature extraction from the ROIs.
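The segmentation step can be sketched as follows. The present disclosure does not specify which adaptive binarization method is used; the sketch below assumes a simple local-mean threshold, in which a pixel is foreground when it is brighter than the mean of its neighborhood. The function name `adaptive_binarize` and the parameters `radius` and `offset` are hypothetical.

```python
def adaptive_binarize(image, radius=1, offset=0.0):
    """Binarize a 2D intensity image against a local-mean threshold.

    Assumption: local-mean thresholding stands in for the unspecified
    adaptive binarization; `image` is a list of equal-length rows.
    """
    rows, cols = len(image), len(image[0])
    mask = []
    for r in range(rows):
        row_mask = []
        for c in range(cols):
            # Collect the neighborhood, clipped at the image border.
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row_mask.append(1 if image[r][c] > local_mean + offset else 0)
        mask.append(row_mask)
    return mask


image = [
    [10, 10, 10],
    [10, 90, 10],
    [10, 10, 10],
]
print(adaptive_binarize(image))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Because the threshold follows the local background, such a scheme tolerates uneven illumination better than a single global threshold, which is one plausible reason for choosing an adaptive method before extracting features from the ROIs.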
In certain embodiments, the generated barcode feature vectors are decoded by machine learning based models or algorithms, including but not limited to, linear regression, logistic regression, decision tree, SVM, Bayes and KNN, or advanced deep-learning algorithms, such as Convolutional Neural Networks (CNNs), Long Short Term Memory Networks (LSTMs), Recurrent Neural Networks (RNNs).
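As an illustration of the decoding step, the following is a minimal sketch of a nearest-neighbor classifier (KNN with k = 1) written without external libraries. The reference feature vectors, the target labels and the function name `decode_barcode` are hypothetical; in practice the references would come from calibration measurements of each barcode.

```python
import math


def decode_barcode(feature, references):
    """Return the label whose reference vector is closest to `feature`.

    This is 1-nearest-neighbor decoding with Euclidean distance.
    """
    return min(
        references,
        key=lambda label: math.dist(feature, references[label]),
    )


# Hypothetical per-channel reference vectors for three molecular targets.
references = {
    "target_A": (1.0, 0.0, 0.0, 1.0),
    "target_B": (0.0, 1.0, 1.0, 0.0),
    "target_C": (1.0, 1.0, 0.0, 0.0),
}

measured = (0.9, 0.1, 0.2, 0.8)  # hypothetical noisy measurement
print(decode_barcode(measured, references))  # target_A
```

A distance-based decoder of this kind tolerates moderate channel crosstalk, since a measurement only needs to lie closer to its own reference than to any other.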
Other aspects of the present invention include a kit for multiplex detection of molecular targets including the in-situ hybridization particles described herein with a pre-determined mix ratio of labelling agents having different or similar (overlapping) excitation and emission spectra associated with one or more guiding probes for different genotypes or species of molecular targets, together with an implementation of the digitalized spectrum barcoding strategy described herein such that microscopy with a limited number of channels/limited resolution or labelling agents having high crosstalk with each other can still achieve a significantly high throughput due to the encoding and decoding mechanisms employed in the present invention.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Other aspects of the present invention are disclosed as illustrated by the embodiments hereinafter.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The appended drawings, where like reference numerals refer to identical or functionally similar elements, contain figures of certain embodiments to further illustrate and clarify the above and other aspects, advantages and features of the present invention. It will be appreciated that these drawings depict embodiments of the invention and are not intended to limit its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been depicted to scale.
It will be apparent to those skilled in the art that modifications, including additions and/or substitutions, may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
References in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described can include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
The terms “a” and “an” are used to include one or more than one and the term “or” is used to refer to a nonexclusive “or” unless otherwise indicated. In addition, it is to be understood that the phraseology or terminology employed herein, and not otherwise defined, is for the purpose of description only and not of limitation. Furthermore, all publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
Values in a range format should be interpreted in a flexible manner to include not only the numerical values explicitly recited as the limits of the range, but also to include all the individual numerical values or sub-ranges encompassed within that range as if each numerical value and sub-range is explicitly recited. For example, a concentration range of “about 0.1% to about 5%” should be interpreted to include not only the explicitly recited concentration of about 0.1 wt. % to about 5 wt. %, but also the individual concentrations (e.g., 1%, 2%, 3%, and 4%) and the sub-ranges (e.g., 0.1% to 0.5%, 1.1% to 2.2%, and 3.3% to 4.4%) within the indicated range.
In the methods of preparation or using the system, device, apparatus, or alike described herein, the steps can be carried out in any order without departing from the principles of the invention, except when a temporal or operational sequence is explicitly recited. Recitation in a claim to the effect that first a step is performed, and then several other steps are subsequently performed, shall be taken to mean that the first step is performed before any of the other steps, but the other steps can be performed in any suitable sequence, unless a sequence is further recited within the other steps. For example, claim elements that recite “Step A, Step B, Step C, Step D, and Step E” shall be construed to mean step A is carried out first, step E is carried out last, and steps B, C, and D can be carried out in any sequence between steps A and E, and that the sequence still falls within the literal scope of the claimed process. A given step or sub-set of steps can also be repeated.
The present invention provides a platform for extracting molecular targets at a tissue-wide scale and a high-throughput method to locate and map their expression pattern to different sections of the tissue sample. A proposed digitized spectrum barcoding strategy enables multiplexing in a limited number of visualization channels by using sets of particles labelled with fluorophores at different mix ratios and taking their signature spectral patterns to quantify copies of a specific molecular target with machine learning based algorithms. A spatial post-transcriptome analysis of both miRNAs and methylated mRNAs with single cell resolution across a millimeter to centimeter brain tissue slice is enabled. By the present platform, the molecular information is sampled and preserved by individual nanoprobes, which are registered to thousands of cells in an acute tissue slice after the initial cellular contact for intracellular molecular fishing. To achieve such spatial mapping, a UV imprinting scheme is proposed to acquire a pixelated tissue morphology feature on the array of nanoprobes with a larger footprint than the tissue. Under the proposed UV imprinting scheme, as all the nanoprobes are labeled with a photocleavable fluorescent dye, a brief UV exposure will be sufficient to cleave off the dyes on nanoprobes that are not covered (unmasked) by the tissue sample, thus generating a contrasted fluorescent pattern of the tissue outline on the biochip, which provides a reference framework for spatial registration to subsequent tissue images to be acquired by immunostaining and optical microscopy. Unlike the conventional spatial indexing methods that typically involve molecular sampling with tremendous efforts in spatial control, the proposed UV imprinting scheme facilitates spatial encoding which is easy to implement and does not require any special equipment, such as a robotic sampler or microfluidic chamber, making it extremely cost-effective.
Together with a custom-made imaging processing algorithm, each nanoprobe can be traced back to individual cells at subcellular resolution to reveal the post-transcriptional profiles across a whole centimeter tissue sample.
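The pixelated pattern produced by UV imprinting can be pictured with a small simulation. The following is a minimal sketch, assuming a square grid of nanoprobes and a tissue slice idealized as a disc; the grid size, pitch, tissue position and the function name `imprint_pattern` are hypothetical and serve only to illustrate that nanoprobes masked by the tissue keep their dye while exposed nanoprobes lose it.

```python
def imprint_pattern(n_rows, n_cols, pitch_um, tissue_center_um, tissue_radius_um):
    """Return 1 where a nanoprobe keeps its dye (masked by tissue), else 0.

    Assumption: the tissue is modelled as a disc; real outlines are irregular.
    """
    cx, cy = tissue_center_um
    pattern = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            x, y = c * pitch_um, r * pitch_um  # nanoprobe position
            covered = (x - cx) ** 2 + (y - cy) ** 2 <= tissue_radius_um ** 2
            row.append(1 if covered else 0)
        pattern.append(row)
    return pattern


# 5 x 5 nanoprobes at 10 um pitch, tissue disc of 12 um radius at the center.
for row in imprint_pattern(5, 5, 10.0, (20.0, 20.0), 12.0):
    print("".join(str(v) for v in row))
# 00000
# 00100
# 01110
# 00100
# 00000
```

The retained-dye pattern reproduces the tissue outline at the resolution of the nanoprobe pitch, which is what allows it to serve as a registration reference for the later tissue images.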
In certain embodiments, the platform incorporates a biochip with an array of vertically aligned nanoprobes to effectively extract intracellular molecules (miRNAs, m6A-mRNAs) for downstream analysis in the coordinates of a large population of cells within a tissue slice.
The present disclosure also proposes a digitalized spectrum barcoding approach to achieve multiplexing throughput with a relatively limited number of fluorescent channels and without super-resolution microscopy. The digitalized spectrum barcoding approach relies on a “rainbow” fluorescence composition (i.e., a mixture of different fluorophores or luminescent labelling agents) and a machine learning based spectrum decoding method for functional differentiation of multiple molecular targets in a single cell. The simplest spectrum digitalization is based on a binary mixing strategy, e.g., to assign the presence and absence of fluorescence signal at each channel from the nanoprobes after each round of in-situ hybridization with the corresponding combination of nanobeads with a pre-determined mix ratio of multiple fluorophores as “1” and “0”, respectively. To enhance the multiplexing, a multi-round visualization strategy using DNase-assisted removal of spectrum barcodes at each round of in-situ hybridization is proposed. Theoretically, the encoding pool can be further expanded if the mix ratio of the rainbow fluorescence dyes changes from the current binary format to a step-wise ratio for different fluorophores, i.e., the number of available codes will be $N \times (R_{\text{step}}^{C} - 1)$, where $R_{\text{step}}$ indicates the ratio step number, $C$ indicates the number of fluorophore channels, and $N$ indicates the number of visualization rounds. For example, a trinary mixing strategy, with ‘0’, ‘1’ and ‘2’ for three different conditions in each channel, yields more codes per round than the binary strategy for the same number of channels. Under the trinary mixing strategy with 7 different fluorophores, the multiplexing throughput can therefore be increased to over 10,000 with 5 rounds of visualization.
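The figure of over 10,000 can be checked by a short worked calculation, assuming that each of the 7 fluorophores occupies its own channel (C = 7), with three ratio steps (R_step = 3) and five rounds (N = 5); the binary case with the same channels and rounds is given for comparison.

```latex
\begin{aligned}
\text{trinary:}\quad N \times (R_{\text{step}}^{C} - 1) &= 5 \times (3^{7} - 1) = 5 \times 2186 = 10930 \\
\text{binary:}\quad N \times (2^{C} - 1) &= 5 \times (2^{7} - 1) = 5 \times 127 = 635
\end{aligned}
```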
Turning to
Depending on the targets to be extracted/isolated, different “bait” molecules (or molecular target fishing molecules) are functionalized on the nanoprobes for molecular extraction/isolation (also called “molecular fishing”). For example, for miRNA extraction, p19 protein is used; for extracting RNAs with m6A modifications, specific antibodies are used. Other possible examples of “bait” molecules include poly(T) sequences for extracting mRNAs, different antibodies for extracting signaling proteins or methylated RNAs, RNA-binding proteins (RBPs) for extracting interactive translational regulation factors, etc. It should be understood that more than one “bait” molecule can be functionalized on each of the nanoprobes to enable extraction/isolation of multiple molecular targets from each individual cell of the tissue sample simultaneously.
In the context of using the present nanoprobe array in epi-transcriptome analysis at single cell level, it is important to map the nanoprobes in the coordinates associated with a large number of cells with microscale resolution. In certain embodiments, a UV imprinting strategy is employed by labelling the nanoprobes with a fluorescence dye, e.g., ALEXA FLUOR 568, or AF-568, via a photo-cleavable crosslinker. When a piece of tissue slice (smaller than the array) is interfaced with the nanoprobes for “molecular fishing”, a brief UV irradiation (e.g., at 365 nm, ~5 mW/cm²) is applied, at an energy level just sufficient to cleave the photo-cleavable crosslinker and thereby remove the fluorescence labels from the nanoprobes under direct UV exposure, while the part covered by the tissue (masked region) remains unaffected (
The upper panel of
When the molecular target is miRNAs, the bait molecules are selected from p19 siRNA binding proteins, whereas when the molecular target is m6A mRNAs, the bait molecules are selected from anti-N6-methyladenosine (m6A) antibodies.
After the molecular fishing and UV imprinting, the extracted molecular targets on the nanoprobes are further subject to spectrum barcoding in the presence of nanobeads 500 associated with a complementary guiding probe 400 for in-situ hybridization with a specific molecular target 300. Depending on the throughput and the number of applicable fluorescence channels of the spectral system, a certain number of fluorophores at a pre-determined mix ratio is prepared and conjugated with the nanoprobe-molecular target-nanobead complex such that each species of molecular targets has a unique spectrum barcode (or spectrum profile).
In certain embodiments, the guiding probe is a DNA sequence antisense to a specific nucleotide sequence on the molecular target, and is associated with a certain number of fluorophores at a pre-determined mix ratio.
When four applicable channels are used, a mixture of fluorophores used to conjugate with the nanobeads through a functional group such as amino group before hybridization with the specific nucleotide sequence on the molecular target is prepared by mixing four different fluorophores having different or overlapping excitation and/or emission spectra with each other.
In some working examples, AF-488, AF-514, AF-555, and AF-647 were mixed together at a pre-determined mix ratio. A mix ratio of each fluorophore to the rest of the fluorophores in the preparation may be 1/2, 1/3, 1/4, . . . up to 1/100. In most cases, the working excitation and emission spectra of the fluorophores are within a range of wavelengths from 300 nm to 800 nm. An example of using four different fluorophores to prepare the spectrum barcoding system for in-situ hybridization with the molecular targets on the nanoprobes is depicted in
Referring to the lower panel in
Theoretically, using a multi-channel spectral system for analysis renders a total number of codes given by the equation N×(2^C−1), where C is the number of applicable fluorescent channels and N is the number of rounds of hybridization analysis; the barcode “0000” (representing blank or no fluorescence) is not used. In the example using a four-channel spectral system for capturing the fluorescence signals, the “rainbow” fluorescent beads can be encoded by 15 different spectral combinations. Different nanobead preparations having different mix ratios of fluorophores provide different visualization effects (an example prepared in vials with different colors is shown in
To overcome the interference by the variation of fluorescence intensities and a low level of background signal, especially in the “0” coded spectral window, a machine learning based algorithm is used to differentiate different digital spectral features from the 15 barcoded spectral vectors.
In some uncertain cases, that is, for nanobeads with an ambiguous signal that does not carry a typical spectral feature or is simply noise resulting from microscopic imaging, a ‘filtering’ step (as illustrated in
After the spectrum decoding based on the proposed machine learning based model, for each target, the number of nanobeads on individual nanoprobes is counted to indicate the copy number of the molecular targets (e.g., miRNA or mRNA), in order to quantify the corresponding post-transcriptional expression of different molecular targets in individual cells across the tissue sample.
To correlate the resulting post-transcriptional miRNA profiles with the spatial distribution of individual cells, an acute OB tissue is sectioned into two opposing slices: one is immunostained with one or more cell-type specific markers (e.g., GFAP for astrocytes and NeuN for neurons) and the other is examined by the spectrum barcoding method for miRNA species. The results from the two opposing slices are then registered to give a full picture of the heterogeneous post-transcriptional miRNA regulation in the coordinates of all identified cells (
To fully unleash the power of the present invention for tissue-wide post-transcriptional profiling, the spatial expression of the 24 targeted miRNAs as shown in Table 1 across a whole coronal OB slice is analyzed. An OB slice is first labelled by six anatomical regions based on the Allen Brain Atlas (
To identify the dominant miRNA signature out of the 24 targets in Table 1, for each OB region, the expression of a particular miRNA (average from all associated nanoprobes) is statistically compared across the six OB regions (
To further explore an intrinsic spatial miRNA patterning, unsupervised BayesSpace clustering analysis is performed without prior OB structural labelling. As shown in
In addition to miRNAs, other post-transcriptional targets such as RNA methylations are also analyzed by the present invention so that the spatial cooperative involvement of different post-transcriptional mechanisms can be verified. To apply the present system in determining spatial profiling of RNA methylation, 9 mRNAs with m6A methylation are targeted in an acute coronal OB slice (Table 2). The “bait” protein (p19 for miRNAs) associated with the nanoprobes is replaced by m6A-specific antibodies, and the nanoprobe-associated operations described herein remain unchanged. The m6A-specific antibodies can extract all m6A-methylated RNAs, which are later decoded by an on-chip (in-situ) analysis and quantification (
AAACTACGATGGCAATTG (SEQ ID NO: 49)
AAACTACGATGGAGATCT (SEQ ID NO: 51)
AAACTACGATGGCAGCTG (SEQ ID NO: 53)
AAACTACGATGGCCATGG (SEQ ID NO: 55)
AAACTACGATGGGCATGC (SEQ ID NO: 57)
AAACTACGATGGAGGCCT (SEQ ID NO: 59)
AAACTACGATGGCGCGCG (SEQ ID NO: 61)
AAACTACGATGGAGGCCT (SEQ ID NO: 63)
AAACTACGATGGTTTAAA (SEQ ID NO: 65)
The expression of the 9 m6A-mRNAs in different OB sub-regions is first examined, showing a unique pattern in SEZ in comparison to other OB regions (OPL, ML, IPL and GR), with significant upregulation of m6A-Gpr161 and downregulation of m6A-Epha7 in SEZ. This observation is further confirmed by a spatial mapping of these m6A-mRNAs (
Some studies found that m6A methylation of mRNAs can be regulated by miRNAs via a sequence pairing mechanism that modulates the binding between methyltransferase and mRNAs. The versatility of the spectrum barcoding system and the related decoding method provides an extra dimension to demonstrate the cooperative involvement of the two post-transcriptional regulatory mechanisms across a whole tissue sample. A spatial distribution vector (SDV) for each of the miRNA clusters and m6A-mRNA clusters is generated (
(A) Nanoprobe Functionalization
As illustrated in
As illustrated in
The amino-functionalized chips were activated using glutaraldehyde (15%, v/v) for 2 hours and then crosslinked with p19 siRNA binding protein (1 μg/ml in depc-PBS; New England Biolabs) or anti-N6-methyladenosine (m6A) antibody (1 μg/ml in depc-PBS; Abcam) for 2 hours. A mixture of BSA (1%, m/m, in depc-PBS) and Triton X-100 (0.1%, v/v, in depc-PBS) was used to block the unreacted groups of the chips for 5 hours. The chips were further reacted with PC biotin-PEG3-NHS ester (0.2 mg/ml, Sigma-Aldrich) for 1 hour and then labeled with streptavidin conjugated with Alexa Fluor™ 568 (0.04 mg/ml in depc-PBS, ThermoFisher) for 1 hour, followed by treatment with unlabeled biotin (0.1 mg/ml) for another 2 hours to block the unreacted streptavidin sites. All reactions were performed at room temperature unless specifically mentioned.
(B) Spectrum Codes Fabrication
50 μL of amino magnetic beads (J&K, 300-400 nm diameter, in PBS) were used for the spectrum codes preparation and were first reacted with 1 μL of biotin-NHS (Sigma, 2 mg/mL) for 2 hours at room temperature. The beads were then washed three times using depc-PBS, and 5 μL of different types of streptavidin conjugated with specific fluorophores (i.e., Streptavidin, ALEXA FLUOR™ 488 conjugate; Streptavidin, ALEXA FLUOR™ 514 conjugate; Streptavidin, ALEXA FLUOR™ 555 conjugate; Streptavidin, ALEXA FLUOR™ 647 conjugate; ThermoFisher, 2 mg/mL) were mixed with the biotin-functionalized beads for 2 hours at room temperature. Depc-PBS was used to wash the beads three times, and 4 μL of the corresponding 5′-biotin DNA probe (BGI, 100 μM) was mixed with the beads for 2 hours, followed by treatment with unlabeled biotin (0.1 mg/ml) for another 2 hours to block the unreacted streptavidin. After washing three times with depc-PBS, the beads were dispersed in 50 μL of 1% BSA and 0.1% Triton X-100 solution and stored at 4° C. for further applications.
Optionally, the binding between particles and fluorophores can be achieved by several widely used methods, e.g.: a) Electrostatic adhesion. In this case, the particle surface carries a charge opposite to that of the fluorophores. For example, if the fluorophores have a negative surface charge in solution and the particles have a positive charge, they will tend to bind with each other by electrostatic adhesion. b) Absorption. If the particles are porous, the fluorophores can be absorbed within the pores to achieve binding between particles and fluorophores. c) Other crosslinking mechanisms. Other than the above-mentioned biotin-streptavidin pair, the fluorophores and particles can both carry amino groups and be crosslinked via glutaraldehyde or NHS-PEG-NHS, or they can be crosslinked via an NHS-NH2 reaction, for example, AF-488-NHS with amino-Fe3O4 particles.
(C) Spatial Probing of miRNAs Via Intracellular Biopsy
All the complementary miRNA probes with overhang were mixed at a concentration of 10^−8 M in hybridization buffer (1%, m/m, BSA; 0.01%, v/v, tween 20; 5×SSC), followed by adding miR-cel 39 (10^−10 M) as an internal reference and hybridizing with the complementary probe for 15 min in a 37° C. water bath.
For preparing the tissue sample, C57BL/6 mice (4-6 weeks) were sacrificed by cervical dislocation. The brains were harvested and maintained in ice-cold depc-PBS buffer, mounted on a vibratome with glue, and then were cut into coronal slices each of 500 μm thick.
An acute brain slice was rinsed with depc-PBS several times and then transferred under the needle chip immersed with the prepared probes in a four-well plate. The plate was centrifuged at 500 rpm (35.5 g of RCF, same below) for 5 mins to initiate a membrane puncture. The slices (or cells) with the nanoprobe patch were incubated at 37° C. for 15 mins before miRNA target extraction. The sample again underwent centrifugation at 500 rpm for 10 mins to fish targeted miRNAs from the brain slices (or cells) for further analysis.
After intracellular fishing, the needle chip with the tissue on top was transferred into depc-PBS with 0.1% tween 20 and exposed to 5 mW/cm² UV for 20 mins to imprint the tissue outline. A digital photograph was also captured to record the relative spatial location of the tissue and the needles as an auxiliary registration method complementary to the UV imprint. Then the tissue was removed from the needle chip. Barcodes with different reporter probes were used to label the targets on the needle chip for 2 hours. A Leica SP8 with a 63× oil-immersion objective was used for imaging.
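The stated equivalence between 500 rpm and 35.5 g can be checked with the standard relation RCF = 1.118×10^−5 × r × rpm^2, with r in centimetres. The following is a minimal sketch; the rotor radius is not given in the text, so the value obtained here (about 12.7 cm) is only what the stated numbers imply, and the function names are illustrative.

```python
# Standard conversion between rotor speed and relative centrifugal force (RCF).
# RCF (in multiples of g) = 1.118e-5 * radius_cm * rpm**2

RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rcf, rpm):
    """Rotor radius (cm) implied by a stated RCF at a stated speed."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Radius implied by "500 rpm (35.5 g of RCF)"; an inference, not a stated value.
implied_radius_cm = radius_from_rcf(35.5, 500)
print(round(implied_radius_cm, 1))
```

With a different rotor, the speed would need to be adjusted so that the same RCF is reached.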
(D) Spatial Probing of m6A mRNAs Via Intracellular Biopsy
An acute brain slice was rinsed with depc-PBS several times and then transferred under the needle chip immersed with the prepared probes in a four-well plate. The plate was centrifuged at 500 rpm (35.5 g of RCF, same below) for 15 mins to fish targeted m6A mRNAs from the brain slices (or cells) for further analysis.
After intracellular fishing, the needle chip with the tissue on top was transferred into depc-PBS with 0.1% tween 20 and exposed to 5 mW/cm² UV for 20 mins to imprint the tissue outline. A digital photograph was also captured to record the relative spatial location of the tissue and the needles as an auxiliary registration method complementary to the UV imprint. Then the tissue was removed from the needle chip. Barcodes with different reporter probes were used to label the targets on the needle chip for 2 hours. A Leica SP8 with a 63× oil-immersion objective was used for imaging.
(E) Strip and Re-Hybridization
After imaging of one round was finished, DNase I (ThermoFisher) was used to digest the reporter probes and strip the barcodes at 37° C. for 30 mins. The needle chip was sonicated in Triton X-100 (0.25%, v/v, in depc-DI) for 30 s and washed with depc-PBS three times. Another round of barcodes with reporter probes was then used to label the targets on the needle chip. Besides DNase I, other stripping agents such as NaOH or formamide may be used.
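Because the barcodes are stripped between rounds, the same spectral codes can be reused in every round, and each target is identified by the pair (round, code). The sketch below shows one possible bookkeeping scheme under that assumption; the sequential assignment order, the function name plan_rounds and the placeholder target names are hypothetical and are not taken from the text.

```python
from itertools import product


def plan_rounds(targets, channels=4):
    """Assign each target a (round, binary code) pair, reusing codes across rounds.

    Hypothetical scheme: targets are filled into rounds in list order, with
    2**channels - 1 usable codes per round (the all-dark code is excluded).
    """
    codes = ["".join(bits) for bits in product("01", repeat=channels) if "1" in bits]
    plan = {}
    for index, target in enumerate(targets):
        round_index, code_index = divmod(index, len(codes))
        plan[target] = (round_index + 1, codes[code_index])
    return plan


# 24 placeholder targets need two rounds when 15 codes are available per round.
targets = [f"target_{i:02d}" for i in range(1, 25)]
plan = plan_rounds(targets)
print(plan["target_15"], plan["target_16"])
```

In practice the assignment could also be chosen to keep spectrally similar codes away from co-expressed targets; the sequential order here is only for illustration.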
(F) Image Processing and Analysis
A confocal microscope (Leica) equipped with a 63× oil-immersion objective was used for imaging. Registration of the tissue and the needle array was performed in Photoshop and the Matlab image processing toolbox using the UV imprint outline, the digital photo and the tissue immunofluorescence staining image. Labelling of the brain regions in the staining image was performed in Photoshop based on the Allen Brain Atlas. For the single cell analysis, watershed segmentation in Fiji was used to process the NeuN-stained image. The segmentation of the GFAP-stained image was performed by adaptive binarization in Matlab. For decoding the barcodes, a threshold was optimized for the fluorescence image binarization. ROIs were extracted to generate barcode feature vectors for further decoding using the ML model.
(G) Immunofluorescent Staining Based Single Cell Subtype Clustering and Mapping
The miRNA or m6A mRNA barcode pattern was registered with the binarized astrocyte or neuron mask. The connected domains were analyzed with ‘bwlabel’ and ‘regionprops’ in Matlab. The copy numbers of the miRNAs or m6A mRNAs were calculated and normalized by dividing by the corresponding cell area. The single cell expression matrix was extracted, and cluster analysis for the single cell subtype clusters was performed in R using the ConsensusClusterPlus package. The optimized cluster number was selected based on the CDF delta area plot using the usual elbow method.
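The connected-domain analysis and area normalization described above can be sketched in a few lines. This is a minimal plain-Python stand-in for the Matlab ‘bwlabel’/‘regionprops’ step, assuming 8-connectivity, a binary mask given as nested lists, and decoded bead positions given as (row, column, target) tuples; the function names, the toy mask and the target name are illustrative only.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions; return (label image, {label: area})."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            area = 0
            while queue:  # breadth-first flood fill of one cell region
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            areas[next_label] = area
    return labels, areas


def normalized_expression(labels, areas, bead_hits):
    """Copy number per cell and target, divided by the cell area in pixels."""
    counts = {}
    for row, col, target in bead_hits:
        cell = labels[row][col]
        if cell == 0:  # bead lies outside every segmented cell
            continue
        per_cell = counts.setdefault(cell, {})
        per_cell[target] = per_cell.get(target, 0) + 1
    return {cell: {t: n / areas[cell] for t, n in per_cell.items()}
            for cell, per_cell in counts.items()}


mask = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 1, 1]]
hits = [(0, 0, "miR-a"), (1, 1, "miR-a"), (2, 4, "miR-a"), (0, 3, "miR-a")]
labels, areas = label_components(mask)
print(normalized_expression(labels, areas, hits))
```

The resulting per-cell dictionary corresponds to one row of the single cell expression matrix that is passed on to clustering.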
(H) Conjoint Analysis of miRNA-mRNA Spatial Correlation and Target Relationship
Similar unsupervised cluster analysis was performed for both miRNAs and m6A mRNAs using the BayesSpace R package. The enrichment of the clusters in related brain regions was calculated with a hypergeometric test. The spatial distribution vector (SDV) of each miRNA or m6A mRNA cluster was calculated based on the distribution ratio of the cluster over the six brain regions. The correlation map of miRNAs and m6A mRNAs was then calculated from the generated SDVs for each pair of clusters. Cluster pairs with high correlation were selected for further analysis.
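The quantities named in this paragraph can be illustrated with a short sketch. It assumes that each spot of a cluster carries a region index from 0 to 5, that the SDV is the fraction of the cluster's spots in each region, that the correlation between two SDVs is a Pearson correlation, and that enrichment is the upper tail of a hypergeometric distribution; the text does not state which correlation measure was used, so Pearson is an assumption, and all function names are illustrative.

```python
from math import comb, sqrt


def spatial_distribution_vector(spot_regions, n_regions=6):
    """Fraction of a cluster's spots that fall in each brain region."""
    counts = [0] * n_regions
    for region in spot_regions:
        counts[region] += 1
    total = sum(counts)
    return [count / total for count in counts]


def pearson(u, v):
    """Pearson correlation between two equally long vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def enrichment_p_value(total, in_region, in_cluster, overlap):
    """P(X >= overlap) for a cluster of in_cluster spots drawn from total spots,
    of which in_region lie in the region of interest (hypergeometric tail)."""
    upper = min(in_region, in_cluster)
    tail = sum(comb(in_region, k) * comb(total - in_region, in_cluster - k)
               for k in range(overlap, upper + 1))
    return tail / comb(total, in_cluster)


mirna_sdv = spatial_distribution_vector([0, 0, 1, 5])
m6a_sdv = spatial_distribution_vector([0, 0, 0, 1, 5, 5])
print(mirna_sdv, round(pearson(mirna_sdv, m6a_sdv), 3))
```

Computing the correlation for every miRNA cluster against every m6A-mRNA cluster yields the correlation map from which highly correlated pairs are selected.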
(I) Selection and Design of m6A mRNA Probes
Target mRNAs were designated using the database from the Allen Brain Atlas. mRNAs with the top differential fold changes in the main olfactory bulb and with high-confidence m6A sites were selected for further profiling. m6A-Atlas was used to identify the high-confidence m6A sites of specific mRNAs. Databases from PA-m6A-seq, miCLIP, DART-seq, m6A-CLIP-seq, m6A-REF-seq, MAZTER-seq and m6A-seq with improved protocol were used for reference. A 41 bp reference sequence was acquired, based on which the related reporter probes for specific m6A mRNAs were designed.
(J) Spectral Digitalization Barcoding and Encoding Strategy
Fluorophores with different excitation wavelengths or emission wavelengths were selected for barcoding. 300-400 nm beads were used as coding media, which can be easily imaged with a common confocal microscope. Machine learning (ML) algorithms were used for decoding with each fluorophore channel as a feature vector input. A simple way was initially considered to judge the existence of a fluorophore directly by setting up a series of thresholds. However, the interference and crosstalk between adjacent channels make its implementation impossible, because even in the absence of a given fluorophore, a peak influenced by other channels could still be detected. Although using low-crosstalk fluorophores may solve this issue, it would lead to a small number of applicable fluorophore candidates. Employment of machine learning (ML) is thereby a solution to the crosstalk problem and makes the application of adjacent fluorophore channels with strong crosstalk possible. In the present invention, ML is used to decode the fluorophore combinations by pattern recognition instead of isolated single thresholds. Each barcode has its specific spectrum pattern and can be decoded by corresponding machine learning algorithms.
In this example, four fluorophores were utilized with a total throughput of N×(2^C−1), where C is the number of fluorescent channels and N is the number of visualization rounds. With more fluorophores used, the throughput grows exponentially. For example, with 7 fluorophore channels in a single round, the throughput will be 127. Fluorophores with similar emission spectra excited by different lasers, or with similar excitation lasers but different emission spectra, can be used simultaneously in different imaging sequences, which increases the number of potential candidates. In this example, seven fluorophore candidates with acceptable brightness available had been used for a higher throughput probing.
Besides adding more fluorophore channels, the mix ratio of the fluorophores might also be adjusted. In this example, a binary mixing strategy was employed where each channel only has two conditions, namely, ‘0’ or ‘1’. With a larger number of mix ratio steps, for example, using a trinary mixing strategy with ‘0’, ‘1’ and ‘2’ for three different conditions in each channel, the throughput is higher. The equation will be N×(Rstep^C−1), where Rstep indicates the number of ratio steps, C indicates the number of fluorophore channels, and N indicates the number of visualization rounds. Under the trinary mixing strategy with 7 different fluorophores, the multiplexing throughput can be increased to over 10,000 by 5 rounds of visualization (5×(3^7−1)=10,930).
For example, when three types of fluorophores, namely fluorophore A, fluorophore B and fluorophore C, are used, the presence of each fluorophore can be indicated by two statuses, namely ‘1’ and ‘0’, where ‘1’ means that the related fluorophore is present and ‘0’ means that it is absent. Therefore, if a single particle simultaneously carries fluorophores A, B and C, its fluorophore digital code will be ‘1 1 1’, where the first ‘1’ represents the presence of fluorophore A, the second ‘1’ that of fluorophore B and the third ‘1’ that of fluorophore C. Similarly, if another single particle only has fluorophores A and B, then its fluorophore digital code is ‘1 1 0’. Based on this coding strategy, with three types of fluorophores (N=3), a total of 7 fluorophore digital codes can be generated, namely ‘1 1 1’, ‘1 1 0’, ‘1 0 1’, ‘0 1 1’, ‘1 0 0’, ‘0 1 0’ and ‘0 0 1’. The code ‘0 0 0’ is not used since it shows no fluorescence. This binary coding strategy can be applied to more fluorophores with the equation 2^N−1, where N is the number of fluorophores used for coding.
Other than the presence of the fluorophore, more than two statuses, such as brightness grades, can be encoded by introducing additional statuses related to ‘fluorescence intensity’. For example, three fluorescence statuses, namely ‘0’, ‘1’ and ‘2’ in digital code, representing ‘dark’, ‘half-bright’ and ‘bright’ fluorescent statuses, respectively, can be introduced. Therefore, with three fluorophores (N=3) and three fluorescence statuses (I=3), a total of 26 fluorescence codes results. The equation is I^N−1, where I indicates the number of fluorescent statuses distinguished by fluorescence intensity and N indicates the number of fluorophores used for coding.
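The throughput figures quoted in this section follow directly from the code-count formula, i.e., the number of ratio steps raised to the power of the channel number, minus one for the all-dark code, multiplied by the number of rounds. The short sketch below evaluates it for the cases mentioned in the text; the function name code_capacity is illustrative.

```python
def code_capacity(ratio_steps, channels, rounds=1):
    """Number of usable codes: rounds x (ratio_steps ** channels - 1).

    One code per round is subtracted because the all-dark code cannot be read.
    """
    return rounds * (ratio_steps ** channels - 1)


print(code_capacity(2, 4))      # binary, four channels, one round: 15
print(code_capacity(2, 7))      # binary, seven channels, one round: 127
print(code_capacity(3, 7, 5))   # trinary, seven channels, five rounds: 10930
```

The last value is the basis of the "over 10,000" figure for the trinary strategy with 7 fluorophores and 5 rounds.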
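The digital codes for both the presence/absence scheme and the intensity-graded scheme can be enumerated mechanically, which also confirms the counts of 7 and 26 given above. The following is a minimal sketch; the function name fluorophore_codes and the space-separated code format are chosen here to mirror the notation in the text.

```python
from itertools import product


def fluorophore_codes(n_fluorophores, n_statuses=2):
    """List all digital codes except the all-'0' (dark) code.

    n_statuses=2 gives the presence/absence scheme; n_statuses=3 adds a
    'half-bright' level between 'dark' and 'bright'.
    """
    levels = [str(level) for level in range(n_statuses)]
    all_codes = (" ".join(code) for code in product(levels, repeat=n_fluorophores))
    return [code for code in all_codes if set(code.split()) != {"0"}]


print(fluorophore_codes(3))
# ['0 0 1', '0 1 0', '0 1 1', '1 0 0', '1 0 1', '1 1 0', '1 1 1']
print(len(fluorophore_codes(3, n_statuses=3)))  # 26
```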
(K) Machine Learning (ML) Based Decoding Algorithms
For decoding (readout and identification of the fluorescence codes), it is difficult to identify the codes directly by visual observation or by thresholding because the emission spectra of the fluorophores overlap. This cross-channel interference leads to weak fluorescence in some fluorescent channels where the corresponding fluorophore is not actually present. Besides, the differentiation of various fluorescence intensities makes this identification process even more difficult. In this regard, some common machine learning schemes were employed in this example to assist the readout and identification process. The existing machine learning models may differ in accuracy, and some may be better suited for decoding these fluorescent codes than others. For example, traditional machine learning algorithms such as linear regression, logistic regression, decision tree, SVM, Bayes and KNN, or advanced deep-learning algorithms such as Convolutional Neural Networks (CNNs), Long Short Term Memory Networks (LSTMs), Recurrent Neural Networks (RNNs), etc., can be used.
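The difference between per-channel thresholding and pattern-based decoding can be illustrated on synthetic data. The sketch below is not the model used in this example: it assumes a hypothetical crosstalk in which each fluorophore leaks 40% of its signal into the neighbouring channels, adds Gaussian noise, and uses a nearest-centroid classifier as a simple stand-in for the listed ML algorithms. All numerical values and function names are assumptions made for illustration.

```python
import random
from itertools import product

CHANNELS = 4
CROSSTALK = 0.4  # hypothetical leak of each fluorophore into adjacent channels


def ideal_spectrum(code):
    """Noise-free channel intensities for a binary code such as '1010'."""
    signal = [0.0] * CHANNELS
    for i, bit in enumerate(code):
        if bit == "1":
            signal[i] += 1.0
            if i > 0:
                signal[i - 1] += CROSSTALK
            if i < CHANNELS - 1:
                signal[i + 1] += CROSSTALK
    return signal


def noisy_spectrum(code, rng, sigma=0.03):
    """Simulated measurement: ideal spectrum plus Gaussian noise per channel."""
    return [value + rng.gauss(0.0, sigma) for value in ideal_spectrum(code)]


def threshold_decode(spectrum, threshold=0.5):
    """Baseline: call each channel '1' or '0' independently."""
    return "".join("1" if value > threshold else "0" for value in spectrum)


def train_centroids(codes, rng, samples_per_code=20):
    """Training group: average the simulated spectra of each known code."""
    centroids = {}
    for code in codes:
        samples = [noisy_spectrum(code, rng) for _ in range(samples_per_code)]
        centroids[code] = [sum(col) / samples_per_code for col in zip(*samples)]
    return centroids


def centroid_decode(spectrum, centroids):
    """Assign the code whose learned spectral pattern is closest."""
    def squared_distance(code):
        return sum((a - b) ** 2 for a, b in zip(spectrum, centroids[code]))
    return min(centroids, key=squared_distance)


rng = random.Random(0)
codes = ["".join(bits) for bits in product("01", repeat=CHANNELS) if "1" in bits]
centroids = train_centroids(codes, rng)
test_set = [(code, noisy_spectrum(code, rng)) for code in codes for _ in range(10)]
threshold_hits = sum(threshold_decode(s) == code for code, s in test_set)
centroid_hits = sum(centroid_decode(s, centroids) == code for code, s in test_set)
print(threshold_hits, centroid_hits, len(test_set))
```

Under these assumptions, the code ‘1010’ produces an intensity of about 0.8 in the second channel purely through crosstalk, so a fixed threshold reads it as ‘1110’, whereas matching the whole four-channel pattern recovers the correct code.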
There were mainly two groups in the ML decoding process, namely the training group and the test group as in
Filter_upper=
where
Filter_lower=C0×
where C0 was a coefficient less than 1 and determined by the spectrum intensity range.
The following table (Table 3) summarizes some key differences between different FISH methods/sequencing techniques and the present invention (“Spectrum-FISH”).
Although the invention has been described in terms of certain embodiments, other embodiments apparent to those of ordinary skill in the art are also within the scope of this invention. Accordingly, the scope of the invention is intended to be defined only by the claims which follow.