In one aspect, methods and compositions are provided for simultaneously profiling genome-wide chromatin protein binding or histone modification marks and RNA expression in the same cell.
Gene expression exhibits remarkable cellular heterogeneity, which may be influenced by multiple factors including different aspects of chromatin modifications (Corces, M. R. et al.
(2016) Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat Genet 48, 1193-1203, doi: 10.1038/ng.3646; Cheung, P. et al. (2018) Single-Cell Chromatin Modification Profiling Reveals Increased Epigenetic Variations with Aging. Cell 173, 1385-1397 e1314, doi: 10.1016/j.cell.2018.03.079). In the past few years, several assays measuring different aspects of chromatin states at a single-cell resolution have been developed. These include Droplet-based single cell ChIP-seq15, Tn5-based chromatin accessibility assays (ATAC-seq) (Buenrostro, J. D. et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490, doi:10.1038/nature14590. Cusanovich, D. A. et al. (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910-914, doi: 10.1126/science.aab1601). DNase I hypersensitivity assay(DNase-seq) (Jin, W. et al. (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528, 142-146, doi: 10.1038/nature15740), MNase-based nucleosome position and chromatin accessibility assay (scMNase-seq) (Lai, B. et al. (2018) Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature 562, 281-285, doi: 10.1038/s41586-018-0567-3), immunocleavage-based histone modification assays (Cut&Run, scChIC-seq) (Ku, W. L. et al. Single-cell chromatin immunocleavage sequencing (scChIC-seq) to profile histone modification. Nat Methods 16, 323-325, doi: 10.1038/s41592-019-0361-7 (2019). Skene, P. J. & Henikoff, S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife 6, doi: 10.7554/eLife.21856 (2017). Hainer, S. J., Boskovic, A., McCannell, K. N., Rando, O. J. & Fazzio, T. G. (2019) Profiling of Pluripotency Factors in Single Cells and Early Embryos. Cell 177, 1319-1329 e1311, doi: 10.1016/j.cell.2019.03.014), antibody-guided Tn5 chromatin tagging assays (ACT-seq, Cut&Tag, CoBATCH) (Carter, B. et al. Mapping histone modifications in low cell number and single cells using antibody-guided chromatin tagmentation (ACT-seq). Nature communications 10, 1-5 (2019). Wang, Q. et al. CoBATCH for High-Throughput Single-Cell Epigenomic Profiling. Mol Cell 76, 206-216 e207, doi: 10.1016/j.molcel.2019.07.015 (2019). Kaya-Okur, H. S. et al. (2019) CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 10, 1930, doi: 10.1038/s41467-019-09982-5), and NOMe-seq assay (Pott, S. (2017) Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. Elife 6, doi: 10.7554/eLife.23203). These assays measure one or more aspects of chromatin states and provide data on cellular heterogeneity in chromatin but do not directly measure simultaneously both RNA and chromatin or transcription factor binding in the same single cell.
In one aspect, we now provide new compositions and methods for directly measuring simultaneously both RNA and chromatin or transcription factor binding in the same single cell.
More particularly, in one preferred aspect, methods are provided for diagnosing or prognosing an illness, the methods comprising:
In a further aspect, methods are provided methods for diagnosing or prognosing an illness, the methods comprising:
Suitably, excess primers are digested with an exonuclease prior to contacting cells with a barcode adapter.
Such methods are particularly useful to diagnosing cancer in a subject and may include treating a subject's biological sample according to a present method.
Additionally, the present methods are useful to identify biomarkers diagnostic or therapeutic of a cancer and may include treating a subject's biological sample in accordance with a method as disclosed herein, and thereafter administering to the subject a cancer therapeutic agent based on the identified biomarkers.
The present methods are also useful to determine cellular heterogeneity of solid tumor samples to treat cancer, any may include treating a subject's tumor sample in accordance with a method as disclose herein; determining the cellular heterogeneity of the tumor sample and, treating the subject with one or tumor specific therapeutic and/or chemotherapeutic agents. Preferably, the determination of the cellular heterogeneity of the tumor can accurately diagnose stages and nature of the tumor.
Still further, the present methods are also useful to evaluate cells, any may include the cells to a present method, thereby evaluating the cells. The cells may comprise, for example, tumor cells, stem cells, modified cells, infected cells, CAR-T cells, CAR-NK cells, transformed cells, cell lines or combinations thereof. The cells may be evaluated for epigenetic variations, transcriptomic variations, gene expression, protein expression, biomarkers or combinations thereof, among others.
Additional methods are provided are provided methods for diagnosing or prognosing an illness, including to identify and profile histone modifications in individual cells, the methods suitably comprising:
In certain aspects, the amplified DNA fragments from the first amplification assay are mapped to a human reference genome (UCSC hg18). In certain aspects, the mapped DNA fragments from the first amplification assay are separated into individual sets based on each barcode.
In certain aspects, the above method may be used to determine cellular heterogeneity and cellular differentiation in a subject, and include obtaining a sample from the subject and assaying the sample according to the above method. In certain aspects, the subject may be suffering from a genetic disorder, disease, neurological disease or disorders, cancer, autoimmune disease or combinations thereof.
In a further aspect, methods are provided for detecting and identifying nuclease hypersensitive sites in individual cells, and may comprise:
In such method, the nuclease suitably may comprise: endonucleases, exonucleases, DNases, MNase or combinations thereof. Preferred barcode adaptors may comprise a nucleotide sequence having a 50% sequence identity to: acactgacgacatggttctacannnnnnnnagateggaagagcacacgtctgaactccagtcac (SEQ ID NO: 2), tgtagaaccatgtcgtcagtgtcccccccc/3ddC (SEQ ID NO: 3), gatcggaagagcgtcgtgtagggaaagagtg (SEQ ID NO: 4) or tctttccctacacgacgctcttccgatct (SEQ ID NO: 5).
In a yet further aspect, methods are provided for determining cellular heterogeneity and cellular differentiation occurring during development, a genetic condition or disease state, the methods suitably comprising:
In a still further aspect, methods are provided for detecting and identifying DNase I nuclease hypersensitive sites in individual cells, comprising:
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice for testing of the present invention, the preferred materials and methods are described herein. In describing and claiming the present invention, the following terminology will be used. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element. Thus, recitation of “a cell”, for example, includes a plurality of the cells of the same type. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising.”
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value or range. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude within 5-fold, and also within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
The terms “amplify”, “amplification”, “amplification reaction”, or “amplifying” refer to any in vitro process for multiplying the copies of a target nucleic acid. Amplification sometimes refers to an “exponential” increase in target nucleic acid. However, “amplifying” may also refer to linear increases in the numbers of a target nucleic acid, but is different than a one-time, single primer extension step. In some embodiments a limited amplification reaction, also known as pre-amplification, can be performed. Pre-amplification is a method in which a limited amount of amplification occurs due to a small number of cycles, for example 10 cycles, being performed. Pre-amplification can allow some amplification, but stops amplification prior to the exponential phase, and typically produces about 500 copies of the desired nucleotide sequence(s). Use of pre-amplification may limit inaccuracies associated with depleted reactants in certain amplification reactions, and also may reduce amplification biases due to nucleotide sequence or species abundance of the target. In some embodiments a one-time primer extension may be performed as a prelude to linear or exponential amplification.
In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” In addition, use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.
As used herein, the terms “comprising,” “comprise” or “comprised,” and variations thereof, in reference to defined or described elements of an item, composition, apparatus, method, process, system, etc. are meant to be inclusive or open ended, permitting additional elements, thereby indicating that the defined or described item, composition, apparatus, method, process, system, etc. includes those specified elements—or, as appropriate, equivalents thereof—and that other elements can be included and still fall within the scope/definition of the defined item, composition, apparatus, method, process, system, etc.
As used herein, the term “illness” refers to any disease or condition afflicting a mammal such as a human, including for example, cancers, immune dysregulations, infections, neurological conditions, and genetic disorders.
The term “sample” in the present specification and claims is used in its broadest sense and can be, by non-limiting example, includes specimens or cultures (e.g., microbiological cultures), biological as well as non-biological specimens. Biological samples may comprise animal-derived materials, including fluid (e.g., blood, saliva, urine, lymph, etc.), solid (e.g. stool) or tissue (e.g., buccal, organ-specific, skin, etc.), as well as liquid and solid food and feed products and ingredients such as dairy items, vegetables, meat and meat by-products, and waste.
Biological samples may be obtained from, e.g., humans, any domestic or wild animals, plants, bacteria or other microorganisms, etc. These examples are not to be construed as limiting the sample types applicable to the present disclosure. Those of skill in the art would appreciate and understand the particular type of sample required for the detection of particular target sequences (Pawliszyn, J., Sampling and Sample Preparation for Field and Laboratory, (2002). Venkatesh Iyengar. G., et al., Element Analysis of Biological Samples: Principles and Practices (1998). Drielak .S., Hot Zone Forensics: Chemical, Biological. and Radiological Evidence Collection (2004); and Nielsen. D. M., Practical Handbook of Environmental Site Characterization and Ground-Water Monitoring (2005)).
As referred to herein, a “subpopulation” of cells refers to a particular subset of cells of a particular cell type which can be distinguished or are uniquely identifiable and set apart from other cells of this cell type. The cell subpopulation may be phenotypically characterized, and is preferably characterized by methods embodied herein. A cell (sub)population as referred to herein may constitute of a (sub)population of cells of a particular cell type characterized by a specific cell state.
Ranges provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50. Concentrations, amounts, cell counts, percentages and other numerical values may be presented herein in a range format. It is to be understood that such range format is used merely for convenience and brevity and should be interpreted flexibly to include not only the numerical values explicitly recited as the limits of the range but also to include all the individual numerical values or sub-ranges encompassed within that range as if each numerical value and sub-range is explicitly recited.
Any compositions or methods provided herein can be combined with one or more of any of the other compositions and methods provided herein.
All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
RNAPII binding data from the RNA-RNAPII scPCOR-seq assay. The data were treated similarly as described in
The disclosure provides a novel technique, termed scPCORseq herein (single-cell Profiling of Chromatin Occupancy and RNAs Sequencing), for simultaneously profiling genome-wide chromatin protein binding or histone modification marks and RNA expression in the same cell. It was demonstrated, as described in detail in the examples section which follows, that scPCOR-seq is able to profile either histone H3 lysine 4 trimethylation (H3K4me3) or RNA Polymerase II (RNAPII) and RNAs in a mixture of human H1, GM12878 and 293T cells at a single-cell resolution and either H3K4me3, RNAPII, or RNA profile can correctly separate the cells. It was observed that the cell-to-cell variation in RNAPII binding is dependent on its genomic location and is correlated with the cell-to-cell variation in gene expression. It was demonstrated that not only does RNAPII binding to the transcription start site (TSS) regions, but also its binding to the transcription end sites (TES) regions, contributes to the cellular heterogeneity in gene expression. The data revealed thousands of CRE-gene interaction pairs from the single-cell RNAPII binding and RNA co-profiling data, which may contribute to the cell-to- cell variation in expression. Overall, the composition and methods embodied herein, provides a powerful and novel approach to understand the relationships among different omics layers.
Accordingly, in certain embodiments, a method for simultaneous profiling of chromatin occupancy and RNA in a single cell comprises isolating and culturing cells of interest from a sample; contacting the cells with a fixative agent; performing guided chromatin cleavage; subjecting the cells to reverse transcription; subjecting the cells to terminal deoxynucleotidyl transferase (TdT)-mediated oligonucleotides to both cDNA and chromatin cleaved ends in the presence of an oligonucleotide adaptor; pooling the cells from each reaction well and sorting the pooled cells, followed by one or more amplification steps; and, subjecting the sorted cells to a library sequencing; thereby, simultaneously profiling of chromatin occupancy and RNA in a single cell.
The basic idea of the chromatin immunocleavage (ChIC) method is to indirectly tether a nuclease, whose activity can be controlled, to antibodies that are specifically bound to a chromatin protein of interest. Subsequent activation of the tethered nuclease should result in DNA cleavage in the vicinity of the chromatin bound protein. Mapping of such DNA cleavage sites provides information about the genomic interaction sites of the protein of interest. In certain embodiments,
Micrococcal nuclease (MNase) is the enzyme of choice since its robust enzymatic activity stringently depends on Ca2+ions of millimolar (optimal at 10 mM) concentrations. This enzyme introduces DNA double-strand breaks in chromatin at nucleosomal linker regions and at nuclease hypersensitive (HS) sites.
To tether MNase to antibodies, a fusion protein consisting of two immunoglobulin binding domains of staphylococcal protein A that are N-terminally fused with MN are prepared. The protein (called pA-MNase) has a molecular weight of 34 kDa. In a general sense, the ChIC method is akin to the antibody-staining techniques for immunofluorescence studies, where the last step involves the addition of pA-MN. ChIC differs also from the staining techniques in that it is carried out in solution, where excess antibodies and pA-MN are removed by centrifugation in a microfuge.
The present disclosure also provides methods for labeling and identifying nucleic acid sequences using adaptors. An adaptor is an oligonucleotide composed of natural nucleotides, modified nucleotides, and/or synthetic (e.g., non-natural) nucleotides. An adaptor may be composed of DNA nucleotides, RNA nucleotides, RNA and DNA nucleotides (forming a RNA/DNA hybrid), synthetic nucleotides, modified nucleotides, and combinations of two or more of these. An adaptor may be in any conformation known in the art for oligonucleotides. Non-limiting examples of adaptor conformations include single-stranded, double-stranded, a mixture of single-stranded and double stranded, or hairpin-forming. The adaptor may be 15-100 nucleotides in length. In some embodiments, the adaptor is 15-45 nucleotides in length.
In some embodiments, an adaptor comprises a single-cell barcode (hereinafter referred to as “single-cell barcode-adaptors” or “barcode-adaptors”). A single-cell barcode is a sequence of nucleotides, typically up to 20 nucleotides but which can be longer, and is unique to each single cell. A single-cell barcode may be composed of DNA nucleotides, RNA nucleotides, RNA and DNA nucleotides (forming a RNA/DNA hybrid), synthetic nucleotides, modified nucleotides, and combinations of two or more of these. A single-cell barcode may be incorporated into the 5′ end of the adaptor. A single-cell barcode may be incorporated into the 3′ end of the adaptor. A single-cell barcode may be incorporated into the middle (e.g., not at the 5′ end or the 3′ end) of the adaptor.
In some embodiments, a single-cell barcode-adaptor oligonucleotide is “bead-bound,” i.e., is immobilized on a bead, or other solid object, that is modified to bind nucleotides. In some embodiments, a bead is a microsphere that binds single-cell barcode-adaptors. Beads can be individually assayed or isolated based on the physical characteristics of the bead. Beads for binding single-cell barcode-adaptors may be polystyrene beads, magnetic beads, hydrogel, or silica beads. In some embodiments, the 5′ end of the single-cell barcode-adaptor is bound to a bead and the 3′ end is not bound to a bead. In some embodiments, the 3′ end of the single-cell barcode-adaptor is bound to a bead and the 5′ end is not bound to a bead.
In other embodiments, a single-cell barcode-adaptor is not immobilized on a bead (i.e., neither end is bound to a bead), which is also referred to herein as being “free,” e.g., a “free single-cell barcode-adaptor.”
The single-cell barcode-adaptors may be single-stranded or double-stranded. In some embodiments, the single-cell barcode-adaptors are single-stranded.
In some embodiments, the adaptors contain a unique molecule identifier (UMI) sequence. In some embodiments, the single-cell barcode-adaptors contain a UMI. A UMI is a molecular tag of nucleotides that is used to detect and quantify unique RNA transcripts from a population as opposed to artifacts from PCR amplification. In some embodiments, the UMI sequence is random. A UMI sequence may be 4-30 nucleotides in length. In some embodiments, the UMI is 5-20 nucleotides in length. In some embodiments, the UMI is 6-12 nucleotides in length. In some embodiments, the UMI is 15-30 nucleotides in length.
In some embodiments, a plurality of single-cell barcode-adaptors molecules (e.g., bead-bound, free) are utilized. A plurality may include 2 or more single-cell barcode-adaptors molecules, 10 or more single-cell barcode-adaptors molecules, 100 or more single-cell barcode-adaptors molecules, 1,000 or more single-cell barcode-adaptors molecules, 10,000 or more single-cell barcode-adaptors molecules, 100,000 or more single-cell barcode-adaptors molecules, 1,000,000 or more single-cell barcode-adaptors molecules, or 10,000,000 or more single-cell barcode-adaptors molecules. In some embodiments, the plurality of single-cell barcode-adaptors molecules are utilized to sequence the RNA from a single cell. In some embodiments, the plurality of single-cell barcode-adaptors molecules are utilized to sequence the RNA from a plurality of cells.
In some embodiments, single-cell barcode-adaptors molecules (e.g., bead-bound, free) are blocked at or near the 3′ end of the adaptor. In some embodiments, single-cell barcode-adaptors molecules (e.g., bead-bound, free) are blocked at or near the 3′ end of the adaptor.
In certain embodiments, a plurality of single-cell barcode-adaptors molecules (e.g., bead-bound, free) may comprise the same nucleotide sequence or different nucleotide sequences. In some embodiments, the plurality of single-cell barcode-adaptors molecules comprise the same nucleotide sequence. In some embodiments, the plurality of single-cell barcode-adaptors molecules do not comprise the same nucleotide sequence. In some embodiments, the single-cell barcode-adaptors molecules comprise at least 2 different nucleotide sequences, at least 10 different nucleotide sequences, at least 100 different nucleotide sequences, at least 1,000 different nucleotide sequences, at least 10,000 different nucleotide sequences, at least 100,000 different nucleotide sequences, or any number of different nucleotide sequences between 2-100,000 different nucleotide sequences.
Histone modifications, which are typically measured by chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing (Barski A., et al. 2007. High-resolution profiling of histone methylations in the human genome. Cell 129: 823-837; Johnson D S., et al. 2007. Genome-wide mapping of in vivo protein-DNA interactions. Science 316: 1497-1502; Mikkelsen T. S., et al. 2007. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448: 553-560; Robertson G., et al. 2007. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods 4: 651-657) at the bulk-cell level, are associated with transcriptional regulation. Chromatin regions enriched in 113K4 methylation and H3K27 acetylation are potentially active promoters or enhancers that activate the transcription of target genes; on the other hand, genes enriched in H3K27me3 signals are usually repressed (Kim T. H., et al. 2005. A high-resolution map of active promoters in the human genome. Nature 436: 876-880.2005; Barski A., et al. 2007; Mikkelsen T. S., et al .; Wei G. et al. 2009. Global mapping of H3K4me3 and H3K27me3 reveals specificity and plasticity in lineage fate determination of differentiating CD4+ T cells. Immunity 30: 155-167; Creyghton M. P., et al. 2010. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U SA 107: 21931-21936). While the genomic profiles of various histone modifications have been extensively characterized at the bulk cell level, several single-cell epigenomic techniques for detecting histone modification marks are reported recently (Rotem A., et al. 2015. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat Biotechnol 33: 1165-1172; Ai S. et al. 2019. Profiling chromatin states using single-cell itChIP-seq. Nat Cell Biol 21: 1164-1172; Carter B. et al. 2019. Mapping histone modifications in low cell number and single cells using antibody-guided chromatin tagmentation (ACT-seq). Nat Commun 10: 3747; Grosselin K., et al. 2019. High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer. Nat Genet 51: 1060-1066; Hainer S. J., et al. 2019. Profiling of Pluripotency Factors in Single Cells and Early Embryos. Cell doi:10.1016/j.ce11.2019.03.014; Harada A., et al. 2019. A chromatin integration labelling method enables epigenomic profiling with lower input. Nat Cell Biol 21: 287-296; Kaya-Okur H. S., et al. 2019. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications 10; Ku W. L., et al. 2019. Single-cell chromatin immunocleavage sequencing (scChIC-seq) to profile histone modification. Nat Methods 16: 323-325; Wang Q. et al. 2019. CoBATCH for High-Throughput Single-Cell Epigenomic Profiling. Mol Cell 76: 206-216 e207).
Although single cell assays including scChIL-seq (Harada et al. 2019), scChlC-seq (Ku et al. 2019), uliCUT&RUN (Hainer et al. 2019), scCUT&Tag (Kaya-Okur et al. 2019), iACT-seq (Carter et al. 2019), CoBATCH (Wang et al. 2019), itChIP-seq (Ai et al. 2019) and scChlP-seq (Rotem et al. 2015; Grosselin et al. 2019) were developed recently for measuring histone marks, they have specific limitations. While scChIP-seq combined the droplet barcoding approach with ChIP-seq (Barski et al. 2007; Rotem et al. 2015; Grosselin et al. 2019), all other methods except for itChIP-seq replaced the traditional immunoprecipitation with antibody guided digestion of chromatin either via antibody-directed, transposase-mediated integration of a DNA tag and fragmentation (for scChIL-seq (Harada et al. 2019) and scCUT&Tag (Kaya-Okur et al. 2019), iACT-seq (Carter et al. 2019), CoBATCH (Wang et al. 2019)), or via DNA cleavage specifically around nucleosomes containing the target modification (Schmid M., et al. 2004. ChIC and ChEC; genomic mapping of chromatin proteins. Mol Cell 16: 147-157) (for uliCUT&RUN (Hainer et al., 2019) and scChlC-seq (Ku et al., 2019)). scChlP-seq (Rotem et al. 2015; Grosselin et al., 2019), with a relatively complicated workflow, could detect about 2000-4000 cells in one experiment with an average of 4000 reads per cell. Although iACT-seq, scCUT&Tag, uliCUT&RUN, itChIP-seq and scChIC-seq have simpler workflows and more cost-effective, iACT-seq and scCUT&Tag could detect an average of 2000-6000 reads per cells and the cell throughput of uliCUT&RUN, itChlP-seq and scChIC-seq is low. While scChIL-seq and CoBATCH worked well for detecting active marks, they were not optimal for detecting repressive marks in fixed samples considering the attenuated activity of Tn5 in non-accessible chromatin regions and its intrinsic bias towards open regions (Harada et al. 2019). Therefore, there is a need to develop a single cell technique for profiling histone marks with higher cell throughput, more widely applications and detection of more reads per cell.
Accordingly, a method of identifying and profiling histone modifications in individual cells comprises crosslinking cells with a cross-linking fixative agent; contacting the fixed cells with a chromatin specific guided nuclease for cleaving the chromatin; repairing of the nuclease cleaved ends by a polynucleotide kinase and adding of 5′-phosphates for poly nucleotide tailing and ligation; and, barcoding of the nuclease cleaved sites with a barcode adaptor and pooling of the cells; splitting of the cells and incubating the cells with a reverse cross-linking buffer; capturing of barcoded cellular DNA fragments and index labeling of the barcoded DNA fragments by a first amplification assay to produce DNA libraries; pooling and purifying the DNA libraries and poly A tailing the purified DNA libraries; ligating the poly A tailed to an adaptor and purifying the ligated DNA; performing a second amplification assay, isolating, purifying and sequencing the amplified fragments; thereby, identifying and profiling histone modifications in individual cells.
Cells, nucleic acids and the like utilized in methods described herein may be obtained from any suitable biological specimen or sample, and often is isolated from a sample obtained from a subject. A subject can be any living or non-living organism, including but not limited to a human, a non-human animal, a plant, a bacterium, a fungus, a virus, or a protist. Any human or non-human animal can be selected, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, fish, dolphin, whale and shark. A subject may be a male or female, and a subject may be any age (e.g., an embryo, a fetus, infant, child, adult).
A sample or test sample can be any specimen that is isolated or obtained from a subject or part thereof. Non-limiting examples of specimens include fluid or tissue from a subject, including, without limitation, blood or a blood product (e.g., serum, plasma, or the like), umbilical cord blood, bone marrow, chorionic villi, amniotic fluid, cerebrospinal fluid, spinal fluid, lavage fluid (e.g., bronchoalveolar, gastric, peritoneal, ductal, ear, arthroscopic), biopsy sample, celocentesis sample, cells (e.g., blood cells) or parts thereof (e.g., mitochondrial, nucleus, extracts, or the like), washings of female reproductive tract, urine, feces, sputum, saliva, nasal mucous, prostate fluid, lavage, semen, lymphatic fluid, bile, tears, sweat, breast milk, breast fluid, hard tissues (e.g., liver, spleen, kidney, lung, or ovary), the like or combinations thereof. The term blood encompasses whole blood, blood product or any fraction of blood, such as serum, plasma, buffy coat, or the like as conventionally defined. Blood plasma refers to the fraction of whole blood resulting from centrifugation of blood treated with anticoagulants. Blood serum refers to the watery portion of fluid remaining after a blood sample has coagulated. Fluid or tissue sample soften are collected in accordance with standard protocols hospitals or clinics generally follow. For blood, an appropriate amount of peripheral blood (e.g., between 3-40 milliliters) often is collected and can be stored according to standard procedures prior to or after preparation.
A sample or test sample can include samples containing spores, viruses, cells, nucleic acid from prokaryotes or eukaryotes, or any free nucleic acid. For example, a method described herein may be used for detecting nucleic acid on the outside of spores (e.g., without the need for lysis). A sample may be isolated from any material suspected of containing a target sequence, such as from a subject described above. In certain instances, a target sequence may be present in air, plant, soil, or other materials suspected of containing biological organisms.
Nucleic acid may be derived (e.g., isolated, extracted, purified) from one or more sources by methods known in the art. Any suitable method can be used for isolating, extracting and/or purifying nucleic acid from a biological sample, non-limiting examples of which include methods of DNA preparation in the art, and various commercially available reagents or kits, such as Qiagen's QIAamp Circulating Nucleic Acid Kit, QiaAmp DNAMini Kit or QiaAmp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GENOMICPREP™, Blood DNA Isolation Kit (Promega, Madison, WI.), GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.), and the like or combinations thereof.
In some embodiments, a cell lysis procedure is performed. Cell lysis may be performed prior to initiation of an amplification reaction described herein (e.g., to release DNA and/or RNA from cells for amplification). Cell lysis procedures and reagents are known in the art and may generally be performed by chemical (e.g., detergent, hypotonic solutions, enzymatic procedures, and the like, or combination thereof), physical (e.g., French press, sonication, and the like), or electrolytic lysis methods. Any suitable lysis procedure can be utilized. For example, chemical methods generally employ lysing agents to disrupt cells and extract nucleic acids from the cells, followed by treatment with chaotropic salts. In some embodiments, cell lysis comprises use of detergents (e.g., ionic, nonionic, anionic, zwitterionic). In some embodiments, cell lysis comprises use of ionic detergents (e.g., sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), deoxycholate, cholate, sarkosyl) Physical methods such as freeze/thaw followed by grinding, the use of cell presses and the like also may be useful. High salt lysis procedures also may be used. For example, an alkaline lysis procedure may be utilized. The latter procedure traditionally incorporates the use of phenol-chloroform solutions, and an alternative phenol-chloroform-free procedure involving three solutions may be utilized. In the latter procedures, one solution can contain 15 mM Tris, pH 8.0; 10 mM EDTA and 100 μg/ml RNAse A; a second solution can contain 0.2N NaOH and 1% SDS; and a third solution can contain 3 M KOAc, pH 5.5, for example. In some embodiments, a cell lysis buffer is used in conjunction with the methods and components described herein.
Nucleic acid may be provided for conducting the methods embodied herein without processing of the sample(s) containing the nucleic acid. For example, in some embodiments, nucleic acid is provided for conducting amplification methods described herein without prior nucleic acid purification. In some embodiments, a target sequence is amplified directly from a sample (e.g., without performing any nucleic acid extraction, isolation, purification and/or partial purification steps). In some embodiments, nucleic acid is provided for conducting methods described herein after processing of the sample(s) containing the nucleic acid. For example, a nucleic acid can be extracted, isolated, purified, or partially purified from the sample(s). The term “isolated” generally refers to nucleic acid removed from its original environment(e.g., the natural environment if it is naturally occurring, or a host cell if expressed exogenously), and thus is altered by human intervention (e.g., “by the hand of man”) from its original environment. The term “isolated nucleic acid” can refer to a nucleic acid removed from a subject (e.g., a human subject). An isolated nucleic acid can be provided with fewer non-nucleic acid components (e.g., protein, lipid, carbohydrate) than the amount of components present in a source sample. A composition comprising isolated nucleic acid can be about 50% to greater than 99% free of non-nucleic acid components. A composition comprising isolated nucleic acid can be about 90%, 91%, 92%, 93%, 94%, 95%, 96%,97%, 98%, 99% or greater than 99% free of non-nucleic acid components. The term “purified” generally refers to a nucleic acid provided that contains fewer non-nucleic acid components (e.g., protein, lipid, carbohydrate) than the amount of non-nucleic acid components present prior to subjecting the nucleic acid to a purification procedure. A composition comprising purified nucleic acid may be about 80%, 81%, 82%,83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%,97%, 98%, 99% or greater than 99% free of other non-nucleic acid components.
An amplification process herein may be conducted over a certain length of time. In some embodiments, an amplification process is conducted until a detectable nucleic acid amplification product is generated. A nucleic acid amplification product may be detected by any suitable detection process and/or a detection process described herein. In some embodiments, an amplification process is conducted over a length of time within about 20 minutes or less. For example, an amplification process may be conducted within about 1 minute, about 2 minutes, about 3 minutes, about 4 minutes, about 5 minutes, about 6 minutes, about 7 minutes, about 8 minutes, about 9 minutes, about 10 minutes, about 11 minutes, about 12 minutes, about 13 minutes, about 14 minutes, about 15 minutes, about 16 minutes, about 17 minutes, about 18 minutes, about 19 minutes, or about 20 minutes. In some embodiments, an amplification process is conducted over a length of time within about 10 minutes or less.
Any suitable RNA or DNA amplification technique may be used. In certain embodiments, the RNA or DNA amplification is an isothermal amplification. In certain embodiments, the isothermal amplification comprises nucleic-acid sequence-based amplification (NASBA), recombinase polymerase amplification (RPA), loop-mediated isothermal amplification (LAMP), real-time loop-mediated isothermal amplification (RT-LAMP), strand displacement amplification (SDA), helicase-dependent amplification (HDA), or nicking enzyme amplification reaction (NEAR). In certain embodiments, non-isothermal amplification methods may be used which include, but are not limited to, PCR, multiple displacement amplification (MDA), rolling circle amplification (RCA), ligase chain reaction (LCR), ramification amplification method (RAM) cross-priming amplification (CPA) or smart amplification (SMAP).
The methods and components described herein may be used for multiplex amplification. Multiplex amplification generally refers to the amplification of more than one nucleic acid of interest (e.g., amplification or more than one target sequence). For example, multiplex amplification can refer to amplification of multiple sequences from the same sample or amplification of one of several sequences in a sample. Multiplex amplification also may refer to amplification of one or more sequences present in multiple samples either simultaneously or instep-wise fashion. For example, a multiplex amplification may be used for amplifying least two target sequences that are capable of being amplified (e.g., the amplification reaction comprises the appropriate primers and enzymes to amplify at least two target sequences). In some instances, an amplification reaction may be prepared to detect at least two target sequences, but only one of the target sequences may be present in the sample being tested, such that both sequences are capable of being amplified, but only one sequence is amplified. In some instances, where two target sequences are present, an amplification reaction may result in the amplification of both target sequences. A multiplex amplification reaction may result in the amplification of one, some, or all of the target sequences for which it comprises the appropriate primers and enzymes. In some instances, an amplification reaction may be prepared to detect two sequences with one pair of primers, where one sequence is a target sequence and one sequence is a control sequence (e.g., a synthetic sequence capable of being amplified by the same primers as the target sequence and having a different spacer base or sequence than the target). In some instances, an amplification reaction may be prepared to detect multiple sets of sequences with corresponding primer pairs, where each set includes a target sequence and a control sequence.
Accordingly, in certain embodiments the methods disclosed herein include amplification reagents. Polymerases are proteins capable of catalyzing the specific incorporation of nucleotides to extend a 3′ hydroxyl terminus of a primer molecule, such as, for example, an amplification primer, against a nucleic acid target sequence (e.g., to which a primer is annealed). Polymerases may include, for example, thermophilic or hyperthermophilic polymerases that can have activity at an elevated reaction temperature (e.g., above 55° C., above 60° C., above 65° C., above 70° C., above 75° C., above 80° C., above 85° C., above 90° C., above 95° C., above 100° C.). A hyperthermophilic polymerase may be referred to as a hyperthermophile polymerase. A polymerase having hyperthermophilic polymerase activity may be referred to as having hyperthermophile polymerase activity. A polymerase may or may not have strand displacement capabilities. In some embodiments, a polymerase can incorporate about 1 to about 50 nucleotides in a single synthesis. For example, a polymerase may incorporate about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides in a single synthesis. In some embodiments, a polymerase, can incorporate 20 to 40 nucleotides in a single synthesis. In some embodiments, a polymerase, can incorporate up to 50 nucleotides in a single synthesis. In some embodiments, a polymerase, can incorporate up to 40 nucleotides in a single synthesis. In some embodiments, a polymerase, can incorporate up to 30 nucleotides in a single synthesis. In some embodiments, a polymerase, can incorporate up to 20 nucleotides in a single synthesis.
In some embodiments, amplification reaction components comprise one or more DNA polymerases. In some embodiments, amplification reaction components comprise one or more DNA polymerases comprising: 9° N DNA polymerase; 9° Nm™ DNA polymerase; THERMINATOR™ DNA Polymerase; THERMINATOR™ II DNA Polymerase; THERMINATOR™ III DNA Polymerase; THERMINATOR™ gamma. DNA Polymerase; Bst DNA polymerase; Bst DNA polymerase (large fragment); Phi29 DNA polymerase, DNA polymerase I (E. coli), DNA polymerase I, large (Klenow) fragment; Klenow fragment (3′-5′ exo-); T4 DNA polymerase; T7 DNA polymerase; DEEP VENTR™ (exo-) DNA Polymerase; D DEEP VENTR™ DNA Polymerase; DYNAZYME™ EXT DNA; DyNAzyme™ II Hot Start DNA Polymerase; PHUSION™ High-Fidelity DNA Polymerase; VENTR® DNA Polymerase; VENTR® (exo-) DNA Polymerase; REPLIPHI™ Phi29 DNA polymerase; EquiPhi29 DNA polymerase; rBst DNA Polymerase, large fragment (ISOTHERM™ DNA polymerase); MASTERAMP™ AMPLITHERM™ DNA Polymerase; Tag DNA polymerase; Tth DNA polymerase; Tfl DNA polymerase; Tgo DNA polymerase; SP6 DNA polymerase; Tbr DNA polymerase; DNA polymerase Beta; and ThermoPhi DNA polymerase.
In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases. Generally, hyperthermophile DNA polymerases are thermostable at high temperatures. For example, a hyperthermophile DNA polymerase may have a half-life of about 5 to 10hours at 95 degrees Celsius and a half-life of about 1 to 3 hours at 100 degrees Celsius. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Archaea. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Thermococcus. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Thermococcaceaen archaean. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Pyrococcus. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Methanococcaceae. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Methanococcus. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Thermus. In some embodiments, amplification reaction components comprise one or more hyperthermophile DNA polymerases from Thermus thermophiles.
Epigenetic modification of chromatin critically contributes to cancer development and subsequently epigenetic modification variations have been established as biomarkers for various cancers. During the past few years, accompanying the technical breakthroughs in single-cell RNA-seq technologies, scRNA-seq has been applied to multiple cancer samples, which discovered a broad range of cellular heterogeneity in cancer samples. Further studies have found that the cellular heterogeneity within the cancer samples critically impact the pathology of cancer and therapeutic decisions. Thus, the cellular heterogeneity information found within various cancers can serve as valuable biomarkers for diagnosis and treatment of cancers. Similar to the application of scRNA-seq technology to cancer samples, the scPCOR-seq technique can be applied to various cancers to discover both gene expression and epigenetic biomarkers of disease.
Other methods of use include virus infections, e.g. SARS-COV-2, such as pandemic COVID 19. COVID-19 is known to be lethal to some individuals but not to others and the lethality may be associated with uncontrolled over immune reaction of the individuals to the viral infection. High levels of interferon gamma gene activation is a critical component of the immune reaction. Gene regulation (activation and repression) is prepared by its epigenetic modification. Thus scPCOR-seq can be applied to individuals to screen for epigenetic variations in interferon gamma and other chemokine and cytokines genes, which may predict uncontrolled reaction upon COVID-19 development. This will serve as important biomarkers for therapeutic decisions. Other examples, include profiling blood samples of leukemia patients: diagnosis and therapeutic biomarkers; examining cellular heterogeneity of various solid tumor samples to accurately diagnose the stage and nature and disease; valuation of the heterogeneity and quality of CAR-T cells before infusion to the patient. This assay profiles both the transcriptome and epigenome of CAR-T cells and thus can provide comprehensive information on the cells. Blood stem cell therapy: provide profiles of white blood cells on both transcriptomes and epigenomes
In some aspects, the present disclosure provides methods of diagnosing a disease or disorder. Control samples may be from a known healthy subject or group of subjects (e.g., not having a disease or disorder), from a subject or group of subjects known to have a disease or disorder, or from a reference sequence, wherein the reference sequence is known to be associated with a disease or disorder. Non-limiting of diseases or disorders that may be diagnosed using methods of the present disclosure include cancer (e.g., brain cancers, lymphomas, leukemias, lung cancer, pancreatic cancer, breast cancer, renal cancer, prostate cancer, hepatic cancer, gastric cancer, bone cancer), autoimmune disorders (e.g., rheumatoid arthritis, lupus, Celiac disease, Sjögren's syndrome), and diabetes.
In some aspects, the methods embodied herein are used to identify different cell types. Non-limiting examples of cell types that may be identified with methods of the instant disclosure include tumors (e.g., solid tumors, serous tumors, brain tumors, spinal cord tumors, meninges tumors, lymphomas, pancreatic tumors, hepatic tumors, breast tumors, renal tumors, lung tumors, gastric tumors, colon tumors, bone tumors, leukemias), T cells (e.g., CD4.sup.+, CD8.sup.+, regulatory, helper), B cells (e.g., plasma cells, lymphoplasmacytoid cells, memory B cells, B-2 cells, B-1 cells), natural killer cells, stem cells (e.g., hematopoietic).
In some aspects, the methods embodied herein are used to identify the differentiation state of cells. Non-limiting examples of differentiation states that may be identified with methods of the instant disclosure include pluripotent (e.g., embryonic stem cells, induced stem cells), partially differentiated (e.g., hematopoietic stem cells), or terminally differentiated (e.g., neurons, myocytes, osteoblasts, glial cells, epithelial cells).
In some aspects, the methods embodied herein are used for a systematic analysis of genomic interactions between cells.
In some aspects, the methods embodied herein are used for combinatorial probing of cellular circuits, for dissecting cellular circuitry, for delineating molecular pathways, and/or for identifying relevant targets for therapeutics development.
In some aspects, the methods embodied herein are used to analyzing genetic signatures of cells (e.g. the composition of a solid tumor), such as molecular profiling at the single cell or cell (sub)population level.
In further related aspects, the disclosure relates to diagnostic (including monitoring the status of a subject), prognostic (including monitoring treatment efficacy), prophylactic, or therapeutic methods. Diagnostic or prognostic methods may comprise detecting the gene signatures, protein signature, and/or other genetic or epigenetic signature as discussed herein. Therapeutic or prophylactic methods according to the invention in particular may comprise modulating the responder phenotype, and may include modulating the gene signature, protein signature, and/or other genetic or epigenetic signature of cells or cell (sub)populations. Such methods include both in vitro as well as in vivo modulation.
As used herein, the term “gene signature” may be used interchangeably with the term “signature gene”. These terms relate to one or more gene (or one or more particular splice variants thereof), the (increased) expression or activity of which or alternatively the decreased or absence of expression or activity of which is characteristic for a particular (multi)cellular phenotype, i.e. the occurrence of such particular (multi)cellular phenotype may be identified based on the presence or absence of such gene signature. The signature may thus be characteristic of a particular phenotype, but may also be characteristic of a particular immune cell subpopulation within a particular phenotype. Similarly, an “epigenetic signature” relates to one or more epigenetic element (or modification), the (increased) occurrence of which or alternatively the absence of which is characteristic for a particular (multi)cellular phenotype, i.e. the occurrence of such particular (multi)cellular phenotype may be identified based on the presence or absence of such epigenetic signature. As used herein a signature encompasses any gene or genes or epigenetic element(s) whose expression profile or whose occurrence is associated with a specific cell type, subtype, or cell state of a specific cell type or subtype within a population of cells. Increased or decreased expression or activity or prevalence may be compared between different phenotypes in order to characterize or identify specific phenotypes. A gene signature as used herein, may thus refer to any set of up- and down-regulated genes between two (multi)cellular states or phenotypes derived from a gene-expression profile. For example, a gene signature may comprise a list of genes differentially expressed in a distinction of interest; (e.g., high responders versus low responders; diseased state versus normal state; etc.). Similarly, an epigenetic signature as used herein, may thus refer to any set of induced or repressed epigenetic elements between two (multi)cellular states or phenotypes derived from an epigenetic profile. For example, an epigenetic signature may comprise a list of epigenetic elements differentially present in a distinction of interest; (e.g., high responders versus low responders; diseased state versus normal state; etc.). It is to be understood that also when referring to proteins (e.g. differentially expressed proteins), such may fall within the definition of “gene” signature, and may on certain occasions be referred to as “protein signature”.
Kits are also provided herein. The kit can include primers, adaptors, terminal deoxynucleotidyl transferases (TdT), amplification reagents and other components suitable for use in the methods, e.g. ligases, polynucleotide kinases, fixative agents and the like.
Methods for simultaneous profiling of chromatin occupancy and RNA in the same single cell are not available currently. Here, a technique, termed scPCOR-seq (single-cell Profiling of Chromatin Occupancy and RNAs Sequencing), is reported for simultaneously profiling genome-wide chromatin protein binding or histone modification marks and RNA expression in the same cell.
Reagents. Histone H3 trimethyl Lys4 antibody was purchased from Millipore (catalog no. 07473), RNAPII antibody was purchased from Abcam (catalog no. ab817). Methanol-free formaldehyde solution was purchased from Thermo Fisher Scientific (catalog no. 28906). Terminal Transferase was purchased from New England BioLabs (catalog no. M0315L). The human embryonic stem cell line H1 (WA01- lot WB35186 p30) was provided by WiCell Research Institute. PA-MNase was purified after transformation of PET15b-PA-MNase plasmid (Addgene#124883) into BL21 Gold (DE3) following standard protocol.
Cell culture and fixation. HEK293T cells and GM12878 were maintained in DMEM (Invitrogen, catalog no. 10566-016) supplemented with 10% FBS (Sigma-Aldrich, catalog no. F4135-500ML) following standard procedure. The HI human embryonic stem cell line was maintained in feeder-free mTeSR™1 medium (Stem Cell Technologies, catalog no.85850) and passaged with ReLeSR™ (Stem Cell Technologies, catalog no.05872) following the manufacturer's instruction. Cells were harvested, washed with 1× PBS twice, and resuspended in DMEM containing 10% FBS and 1% formaldehyde. After 5 min incubation in room temperature, the reaction was stopped by adding 1.25 M glycine, followed by two rounds of washes with PBS. The cells were aliquoted into 1×106 cells per tube, frozen on dry ice, and stored at −80° C.
Antibody-guided MNase digestion and end repair. The fixed cells were thawed on ice. To prepare PA-MNase and antibody complex, 1 μl antibody and 3 μl PA-MNase were pre-incubated on ice in 4 μl antibody binding buffer (10 mM Tris-Cl (pH 7.5), 1 mM EDTA, 150 mM sodium chloride, 0.1% Triton X-100) for 30 min. Meanwhile, H1 fixed cells (1 million) and HEK 293T fixed cells (1 million) were resuspended in 100 μl antibody binding buffer. Then, cell suspension was added to the PA-MNase and antibody complex, incubated on ice for 1 hour. Cells were washed three times with high salt buffer (10 mM Tris-Cl (pH 7.5), 1 mM EDTA, 400 mM sodium chloride and 1% (v/v) Triton X-100), followed by washing once with rinsing buffer (10 mM Tris pH7.5, 10 mM sodium chloride and 0.1% (v/v) Triton X-100). Then the cells were resuspended in 40 μl reaction solution buffer (10 mM Tris-Cl (pH 7.4), 10 mM sodium chloride, 0.1% (v/v) Triton X-100, 2 mM CaCl2), incubated at 37° C. for 3 min in water bath. The reaction was stopped by adding 4.4 μl 100 mM EGTA. After washing twice with rinsing buffer, the cells were end-repaired by T4 Polynucleotide Kinase (PNK) in 150 μl reaction buffer (1× PNK buffer, 1 mM ATP, 150 unites PNK) at 37° C. for 30 min, followed by washing twice with rinsing buffer to stop the reaction.
In-situ reverse transcription. The cells were resuspended in 25 μl reverse transcription buffer (5 μl 10× Maxima H Minus reverse transcription buffer, 1.25 μl 10% NP40, 16.75 μl H2O, 1 μl 100 um not-so-random primers mixture ((Armour, C. D. et al. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat Methods 6, 647-649, doi: 10.1038/nmeth.1360 (2009)), 1 μl 10 ng/μl Oligo dT22 primer (NNNNNNGAGCGTTTTTTTTTTTTTTTTTTTTTTVN)). After incubating at 65° C. for 1 min, the reaction was immediately put on ice, while the enzyme mix is prepared (8.75 μl H2O, 5 μl 10×Maxima H Minus reverse transcription buffer, 8 μl 10 mM dNTPs, 2 μl Maxima H Minus reverse transcriptase, 0.625 μl SUPERase·In™ RNase Inhibitor, 0.625 μl RNaseOUT™ Recombinant Ribonuclease Inhibitor) and added into the reaction. The reverse transcription was performed as described (Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat Struct Mol Biol 26, 1063-1070, doi: 10.1038/s41594-019-0323-x (2019). (50° C.×10 min; 3 cycles for the following: 8° C.×12 s, 15° C.×45 s, 20° C.×45 s, 30° C.×30 s, 42° C.×2 min, 50° C.×5 min; 50° C.×10 min and hold at 4° C.
Exonuclease I (Exo I) digestion. The cells were washed twice with rinsing buffer, resuspended in 50 μl reaction buffer (5 μl 10×Exo I buffer, 1 μl Exo I, 44 μl H2O) and incubated at 37° C. for 20 min. This is to remove the excess primers left after reverse transcription. After digestion, the cells were washed twice with rinsing buffer to stop the reaction.
Library construction. 96 barcode-P7 adaptors (10 μM) stored in a 96 well plate were thawed at 4° C., then 1 μl of each was added to the corresponding well in a new 96 well plate with multichannel pipette. Downstream library construction was performed as described previously for iscChIC-seq (Ku, Pan and Zhao et al., manuscript in revision). Briefly, the cells were suspended with nuclei suspension buffer and mixed with enzyme dilution buffer, followed by aliquoted into 10 μl in 96 wells, mixing with the added barcode-P7 adaptors. The plate was sealed completely and incubated at 37° C. for 60 min. After incubation, the cells were pooled together in a solution trough containing 500 μl stop buffer, resuspended with 800 μl 1× PBS and send to flow cytometry core. 30 cells were sorted in each well of a new 96 well plate which contain 13 μl buffer mixture per well (3 μl reverse-crosslink buffer, 10 μl PBS containing 0.1% NP40). The plate was sealed completely and incubated at 65° C. for 6 hours and 80° C. for 10 min.
After reverse crosslinking, indexed PCR1 was performed by adding 13 μl 2× PHUSION® High-Fidelity PCR Master Mix with HF Buffer (New England BioLabs) and 1 μl 2 μM index primer with the following condition: 98° C. 3 min, 12 cycles of 65° C. 30 s, 72° C. 30 s, followed by 72° C. 5 min. Then the libraries were pooled together, digested with Exo I and purified by MINELUTE® Reaction Cleanup Kit (Qiagen). Downstream A-tailing and P5 adaptor ligation were performed as described previously. PCR2 amplification with i5 index primer and P7-cs2 primer was set in the following condition: 98° C. 3 min, 57° C. 3 min, 72° C. 1 min, 7 cycles of 98° C. 10 s, 65° C. 15 s, 72° C. 30 s, followed by 72° C. 5 min. The PCR products were run on the 2% E-Gel® EX Agarose Gel (Invitrogen). The fragments between 250-600 base pair (bp) were isolated and purified by the MinElute Gel Extraction Kit (Qiagen). The concentration of the library was measured by Qubit dsDNA HS kit (Thermo Fisher Scientific). The paired-end sequencing was performed on Illumina Hiseq 2500 and Novaseq.
Pre-processing of scPCOR-seq and Reads mapping. Pairs of reads were considered to be valid if read 2 contained the exact linker sequences “AGAACCATGTCGTCAGTGT”. The valid pairs of read are further separated into either RNA part or chromatin occupancy part. If the linker sequences “GAGCG” for not-so-random primers or the linker sequences “CCTGCAGG” for oligodT were found in the location within 7-11th and 7-14th base of read 1, the pair of reads belonged to RNA. The remaining valid pairs belonged to chromatin occupancy. Using the information of the cell barcodes located at 5′ of read 2, both pairs of reads belonging to RNA and chromatin occupancy were separated into 96 sets of FASTQ files, respectively. Reads were mapped to the human reference genome hg19 using Bowtie2 Duplicates using different trimming parameters. Finally, the mapping results were combined, and Duplicated reads were removed based on mapping position and UMI for the reads belonging to chromatin occupancy.
Filtering for single cells and genes. For both scRNA-scRNAPII and scRNA-scH3K4me3 measurements. Genes and Peak regions were excluded if less than 6 cells or more than 300 cells have reads in these regions. If the cell-to-cell variation quantified by coefficient for the genes or peak regions are less than two, they were excluded, respectively. Single cells that have both at least 1000 RNA reads, and 1000 DNA reads were first considered. Also, if single cells have reads in less than 50 peak regions or 50 gene regions, they were excluded. Finally, the outlier cells, genes, peak regions were excluded, in which an outlier is a value that is more than three scaled median absolute deviations (Kaya-Okur, H. S. et al. CUT & Tag for efficient epigenomic profiling of small samples and single cells. (2019) Nat Commun 10, 1930, doi: 10.1038/s41467-019-09982-5) away from the median.
Cell Clustering. For either scRNA-scRNAPII or scRNA-scH3K4me3 measurements, the read count matrix for RNA was denoted as R′ while the read count matrix for DNA was denoted as D′. The columns of R′ correspond to cells and its rows correspond to the genes. Similarly, the columns of D′ correspond to cells and its rows correspond to the peak regions. Both of the read count matrices were normalized by the library sizes and were transformed by based two logarithm transformations. The final matrices are denoted as R and D for R′ and D′, respectively. For both RNA and DNA parts, the similarity between any two cells were computed using Pearson Correlation, resulting in two correlation matrices denoted as CR and CD, respectively. The Laplacian transformation was applied to the correlation matrices. The Laplacian matrix L is defined by L=1−T−1/2AT−1/2, where I is the identity matrix. A is a similarity matrix where A=e−(2−C)/max(2−C), C=CR or CD. Note that T is the Tis the degree matrix of A, a diagonal matrix that contains the row-sums of A on the diagonal (Dii=ΣiAij). The eigenvectors of the Laplacian matrix were computed and formed a matrix V where each column represents an eigenvector. The columns of V from left to right are sorted in ascending order based on their corresponding eigenvalues. For either RNA or DNA, a binary matrix E was considered in which its rows and columns correspond to single cells. The K-mean method was applied to the matrix Wt to cluster the single cells with k=2, where Wt is a submatrix V containing the first t columns. If cells i and j belong to the same cluster, Eij=Eji=1; otherwise 0. We consider t is between 2 to 15 and two consensus matrices ER and ED, correspond to RNA and DNA respectively, were calculated by averaging all binary matrices from each individual clustering. Finally, K-mean clustering was applied to the sum of ER and ED with k=2. Two set of cells determined by the clusters were obtained and denoted as KS1 and KS2.
PCA: For both RNA and DNA parts, principal component analysis (PCA) was applied to the two matrices to obtain the first 100 components. UMAP was further applied to the obtained principal component matrix. Cells were clustered for the scPCOR-seq cell line data. First, two cell-to-cell correlation matrices corresponding to RNA and DNA parts were computed using the obtained principal components. The z-score transformation was applied to these matrices (Faith, J. J., et al., Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. Plos Biology, 2007. 5(1): p. 54-66). The edges between two genes/regions with z-score values smaller than 3.2 were filtered out, resulting in two networks for RNA and DNA. The multiplex network clustering method MolTi (Didier, G., et al. Peerj, 2015. 3) was applied to both RNA and DNA networks.
Purity of clusters. The dimension reduction method t-SNE was applied to the two matrices R and D, with parameters ‘NumPCAComponents’ equal to 5 and 3, respectively. Also, the Perplexity for both reductions was equal to 100. For both RNA and DNA, two t-SNE component vectors were obtained as output. The K-mean clustering method was again separately applied to ER and ED with k=2. Two set of cells determined by each clustering were obtained. For RNA, they were denoted as KR1 and KR2. For DNA they were denoted as KD1 and KD2. The purity of KS1, i=1,2 is equal to
CRE-gene correlation. Human Cis-regulatory elements were downloaded from ENCODE. The CRE regions that have reads in any cells in either H1 or 293 T cells were excluded. For each cell, the count of RNAPII binding in the CRE regions (+500) was computed and normalized by the library sizes. The Pearson correlation between the RNAPII density in each CRE region and the gene expression of each gene was calculated for both H1 and 293 T cells. Thus, for both H1 and 293 T cells, two correlation matrices with dimensions of number of CRE regions and number of genes were obtained. The negative elements were set to be equal to zero. A value is obtained for each CRE region by summing over all genes for the matrix subtracting between the two correlation matrices. Thus, CRE regions specific to H1 cells and 293T cells were obtained based on the values calculated.
Comparison between TrAC-looping data and CRE-gene interactions. First, the functional CRE-gene candidates were identified by requiring that both elements are on the same chromosome and the distance between CRE region and gene region is less than 100 kbp. A CRE-gene pair was H1 specific if its correlation between the RNAPII density and mRNA level is higher in H1 cells compared to 293T cells, and vice versa. Number of PETs from TrAC-looping data that connected the CRE region and gene region from each cell type specific CRE-gene interaction were counted. Note that a window size of 5 kb were used for the CRE regions and gene regions when comparing with the TrAC-looping data. The number of PETs were normalized by the total number of PETS in the library.
An indexing single-cell ChIC-seq (iscChIC-seq) protocol was developed to profile histone modifications, in which Terminal Transferase (TdT) was used to mediate dG tailing on MNase digestion sites, while oligo-dC protruding barcode adaptors were ligated to these sites by T4 Ligase. In order to capture both histone modification or protein occupancy on chromatin and RNA in the same cell, a strategy to detect RNA profiles simultaneously (
H3K4me3 and RNAs were profiled by applying scPCOR-seq to human 293T cells and mouse NIH 3T3 cells to estimate the detection of doublets. After identifying the barcodes that refer to cells in either RNA or H3K4me3 data, a collision rate of 0.08 was observed in the RNA data and a collision rate of 0.118 in the H3K4me3 data (
Next, H3K4me3 and RNAs were first profiled by applying scPCOR-seq to a mixture of human H1 ESCs, 293T cells, and GM12878 cells. After sequencing the libraries, the RNAs were distinguished from chromatin targets by a unique barcode embedded in the primers used for reverse transcription. 3,713 single cells were identified from the sequencing data (about 2,000 mRNA reads per cell and 45,000 H3K4me3 unique reads per cell). The H3K4me3 and RNA signals from the pooled single cells were compared with ENCODE H3K4me3 ChIP-seq data (
To test whether scPCOR-seq is able to detect chromatin binding proteins and RNAs in the same single-cell, it was applied to profile both RNA Polymerase II (RNAPII) binding and RNAs in a mixture of H1 ESCs and 293T cells. 2,347 single cells were identified from the sequencing data (about 3,000 mRNA reads per cell and 7,000 RNAPII unique reads per cell). The RNAPII binding and RNA signals from the pooled single cells were compared with ENCODE bulk cell RNAPII ChIP-seq data (
Next, to further validate the scPCOR-seq data, it was tested whether the single-cell RNA data or chromatin occupancy data from the assays can separate cells to different clusters. First, the dimension reduction t-SNE method was directly applied to the scPCOR-seq RNA and H3K4me3 data separately. The K-mean clustering method was applied to the reduced dimensions for clustering scRNA and scH3K4me3, separately. On the other hand, a consensus clustering approach was applied to both scRNA and scH3K4me3 data, from the RNA-H3K4me3 measurement. Single cells were separated into three clusters (Cluster 1 in blue, Cluster 2 in red, and Cluster 3 in orange) (
The scPCOR-seq data was further validated by testing whether the single-cell RNA data or the H3K4me3 data from the assays can separate cells to different clusters. First, the PCA was directly applied to the scPCOR-seq RNA and H3K4me3 data separately. UMAP was applied to the reduced dimensions for scRNA and scH3K4me3, separately. Finally, the software MolTi (Didier, G., et al. Identifying communities from multiplex biological networks. Peerj, 2015. 3.) (multiplex-modularity with the adapted Louvain algorithm to cluster single cells using both RNA and
H3K4me3 data. Single cells were separated into three clusters (Cluster 1 in blue, Cluster 2 in red, and Cluster 3 in orange) from each dataset (
To test whether scPCOR-seq can be used to analyze more complex systems, it was applied to examining the in vitro differentiation of CD36+ erythrocyte precursor cells from human CD34+ hematopoietic stem/progenitor cells (Cui, K. R., et al., Cell Stem Cell, 2009. 4(1): p. 80-93). During the differentiation, the cell surface marker CD36 was significantly upregulated from day 5 and reaches peak expression by day 11, which is accompanied decreased expression of CD34. Libraries were constructed for both H3K4me3 and RNA for CD34+ cells and the cells differentiated for 2, 5, 8 and 11 days. The H3K4me3 and RNA signals from the pooled single cells (CD36+11 days differentiation) were compared with the published bulk cell H3K4me3 ChIP-seq data (
As shown in
Finally, the accessibility bias was examined in the H3K4me3 data by comparing the H3K4me3 with H3K4me3 ChIP-seq data and ATAC-seq data in CD36+ cells. The H3K4me3 data from scPCOR-seq data is highly consistent with H3K4me3 ChIP-seq data instead of the ATAC-seq data.
Since RNAPII is directly responsible for producing RNAs and RNAPII binding from pooled single-cells in H1 and 293T cells indicate a positive correlation between RNAPII binding and RNA levels (
In addition to promoters and transcribed regions, RNAPII is associated with cis regulatory elements (CREs) such as enhancers of active genes (De Santa, F. et al. (2010) A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLOS Biol 8, e1000384, doi:10.1371/journal.pbio.1000384). Thus, co-binding to CREs and genes may provide evidence of a functional interaction relationship. To this end, the candidate CREs were downloaded from the ENCODE database (Roadmap Epigenomics, C. et al. (2015) Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330, doi:10.1038/nature14248). By considering a window of 1000 bp for each element, the RNAPII density at the CREs and the correlation between the RNAPII density at CRE and gene expression level for both H1 and 293T cells was computed. A pair of CRE and gene is considered to be functionally interacting if the correlation between RNAPII density and gene expression level is higher than a cutoff. Therefore, H1 and 293T cells can have different interactions between CRE regions and genes (
Enhancers regulate their target gene expression by direct physical interaction with target promoters. Thus, the functional interaction between the CRE-gene pairs discovered above could be facilitated by direct physical interaction. To further test this hypothesis, the physical chromatin interaction between the CRE-gene pairs was examined using TrAC-looping data, which specifically detects chromatin interactions among accessible chromatin regions (Lai, B. et al. (2018) Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature 562, 281-285, doi: 10.1038/s41586-018-0567-3). Since most enhancer-promoter interactions occur within a range of 100 kb (van Arensbergen, J., van Steensel, B. & Bussemaker, H. J. (2014) In search of the determinants of enhancer-promoter interaction specificity. Trends Cell Biol 24, 695-702, doi: 10.1016/j.tcb.2014.07.004), this category of functional CRE-gene interactions was focused on by selecting the CRE-TSS pairs that have a distance shorter than 100 kb. Next, the H1-specific and 293T-specific CRE-gene interaction pairs were identified and their respective physical interaction strength was examined using the H1 cell TrAC-looping data. The results showed that the normalized TrAC-looping PETs from H1 cells were significantly higher at the H1-specific CRE-gene pairs than the 293T-specific pairs (
Elucidating cellular heterogeneity was shown to be important for understanding different biological processes, including cell differentiation and tumor progression etc. However, few studies addressed the question of origins and mechanisms of cellular heterogeneity in gene expression. A number of studies indicated variations in chromatin status may contribute to variations in gene expression, suggesting that both cis regulatory elements and trans acting chromatin binding factors play important roles in the cellular heterogeneity of gene expression. In this study, scPCOR-seq was developed, a method for simultaneously measuring RNA expression levels and chromatin occupancy of chromatin binding proteins or histone modifications in the same single cell and demonstrated its application to human H1 ESCs, GM12878, and 293T cells. Analysis of the data revealed that a differential correlation between the location of RNAPII binding and the cell-to-cell variation in gene expression and many CREs co-bound by RNAPII. Overall, it was concluded that scPCOR-seq will serve as a new powerful tool to study the relationship between different omics-layers and the mechanisms behind cellular heterogeneity.
In this study, an assay, termed herein “iscChlC-seq” was developed to profile histone modification marks in single cells. This technique employs the highly efficient TdT enzyme combined with T4 DNA ligase to add a unique barcode to the DNA ends generated by antibody-guided MNase cleavage in each cell. Using iscChIC-seq, the active histone modification mark H3K4me3 and repressive histone mark H3K27me3 were profiled in more than 10,000 single human white blood cells for each modification with detection of about 11,000 and 45,000 reads per cell, respectively, the largest cell number and read number compared to other current high-cell throughput methods. The data allowed successful clustering of different immune cells including T, B, NK, and monocytes from human WBCs. It was found that cell-to-cell variations in H3K4me3 and H3K27me3 in bivalent domains are positively correlated. The cell types annotated from H3K4me3 single cell data are specifically correlated with the cell types annotated from H3K27me3 single cell data. Overall, it was concluded that iscChlC-seq is a reliable method for studying histone modifications at the single cell level, which provide important information for the differentiation status of cells.
Histone H3 trimethyl Lys4 antibody were purchased from Millipore (catalog no. 07-473), histone H3 trimethyl Lys27 antibody were purchased from Diagenode (catalog no. pAb-069-050). Methanol-free formaldehyde solution and DSG (disuccinimidyl glutarate) were purchased from Thermo Fisher Scientific (catalog no. 28906, 20593). Terminal Transferase was purchased from New England BioLabs (catalog no. M0315L). The human embryonic stem cell line H1 (WA01—lot WB35186 p30) was provided by WiCell Research Institute. PA-MNase was purified after transformation of PET15b-PA-MNase plasmid (Addgene#124883) into BL21 Gold (DE3) following standard protocol.
HEK293T cells and GM12878 were maintained in DMEM (Invitrogen, catalog no. 10566-016) supplemented with 10% FBS (Sigma-Aldrich, catalog no. F4135-500ML) following standard procedure. The H1 human embryonic stem cell line was maintained in feeder-free mTeSR™1 medium (Stem Cell Technologies, catalog no.85850) and passaged with ReLeSR™ (Stem Cell Technologies, catalog no.05872) following the manufacturer's instruction. Cells were harvested, washed with 1× PBS twice, and resuspended in DMEM containing 10% FBS and 1% formaldehyde. After 5 min incubation in room temperature, the reaction was stopped by adding 1.25 M glycine, followed by two rounds of washes with PBS. The cells were aliquoted into 1×106 cells per tube, frozen on dry ice, and stored at −80° C.
PET15b-PA-MNase plasmid (Addgene#124883) was transformed into BL21 Gold (DE3) following standard protocol and grow in 40 ml LB medium (containing Ampicillin) overnight. Culture was diluted (1:50) into prewarmed LB medium (containing Ampicillin) and shake for 2 hours at 37° C. till OD600 reached ˜0.6. Fresh IPTG was added to the culture to final 1 mM and shake for another 2.5 hours. For PA-MNase purification, cells pellet was collected, resuspended in 30 ml lysis buffer (50 mM NaH2PO4, 300 mM NaCl, 10 mM Imidazole, 1× EDTA-free protease inhibitor cocktails, 0.5 mM PMSF) supplemented with 30 mg Lysozyme (Thermo Fisher Scientific) and incubated on ice for 30 min. Cell lysate was sonicated for 10 cycles (10 sec on, 10 sec off) and centrifuged at 10,000g for 20 min. In the meantime, 2 ml 50% bead slurry were washed with lysis buffer. Then the supernatant was collected, mixed with beads slurry and rotated at 4° C. for 1 h. After spinning down, the beads were washed 4 times with 8 ml wash buffer (50 mM NaH2PO4, 300 mM NaCl, 20 mM Imidazole, 1× EDTA-free protease inhibitor cocktails, 0.5 mM PMSF), followed by three times elution with elution buffer(50 mM NaH2PO4, 300 mM NaCl, 250 mM Imidazole, 1× EDTA-free protease inhibitor cocktails, 0.5 mM PMSF). The purified fraction was mixed with glycerol, finally aliquoted into small tubes and stored in −80° C.
Human blood samples were obtained from healthy donors from the NIH Blood Bank. The WBCs were isolated as described (Ku W. L. et al. 2019. Single-cell chromatin immunocleavage sequencing (scChIC-seq) to profile histone modification. Nat Methods 16: 323-325). Two-step fixation was modified from(Tian et al. 2012) and performed at room temperature. First, 50 M cells were suspended in 50 ml PBS/MgC12 containing 2 mM DSG and rotated for 45 min. After washing with PBS, the cells were resuspended in 45 ml culture medium DMEM containing 10% FBS. 3 ml 16% formaldehyde was added to 1% final concentration and rotated for 5 min, then the reaction was stopped by adding glycine, followed by two times washes with PBS. The cells were aliquoted into 2×106 cells per tube, frozen on dry ice, and stored at −80° C. until use.
To prepare ProteinA-MNase and antibody complex, 10 μl antibody and 25 μl PA-MNase were pre-incubated on ice in 40 μl antibody binding buffer (10 mM Tris-Cl (pH 7.5), 1 mM EDTA, 150 mM sodium chloride, 0.1% Triton X-100) for 30 min. Meanwhile, the fixed cells (0.25 million) were thawed on ice and resuspended in 200 μl antibody binding buffer. For H3K27me3 analysis, chromatin need to be firstly decondensed by suspending the fixed cells in 0.5 ml RIPA buffer (10 mM Tris-Cl (pH 7.5), 1 mM EDTA, 150 mM sodium chloride, 0.2% SDS, 0.1% sodium deoxycholate, 1% Triton X-100) and incubated at room temperature for 10 min followed by a one time wash in 0.5 ml antibody binding buffer. Then the cells were mixed with PA-MNase and antibody complex, incubated on ice for 60 min, followed by three washes with 500 μl high salt buffer (10 mM Tris-Cl (pH 7.5), 1 mM EDTA, 400 mM sodium chloride and 1% (v/v) Triton X-100). After washing in 200 μl rinsing buffer (10 mM Tris pH7.5, 10 mM sodium chloride and 0.1% (v/v) Triton X-100), the 336 cells were resuspended in 40 μl reaction solution buffer (10 mM Tris-Cl (pH 7.4), 10 mM sodium chloride, 0.1% (v/v) Triton X-100, 2 mM CaCl2) to activate MNase digestion and incubated at 37° C. for 3 min in water bath. The reaction was stopped by adding 4.4 μl 100 mM EGTA. The cells were pelleted by centrifugation at 500 g for 5 min.
The MNase cleavage sites were end-repaired by T4 Polynucleotide Kinase (PNK) for removal of 3′-phosphoryl groups and addition of 5′-phosphates to allow subsequent polyG tailing and ligation. After digestion, the cells were washed twice with 1 ml 1× T4 ligase buffer containing 0.1% NP40, then suspended in 300 μl mixed T4 PNK buffer (1× T4 PNK buffer, 1 mM ATP, 30 μl T4 PNK enzyme) and incubated at 37° C. for 30 min. Meanwhile, 96 barcode-P7 adaptors were thawed, 2.5 μl 10 μM barcode-P7 adaptors were added to a new 96 well PCR plate with multichannel pipette (1 barcode per well). After incubation, the cells were washed once with 1 ml rinsing buffer, suspended with 516 μl nuclei re-suspension buffer (1.27× T4 ligase buffer, 2.5 mM dGTP, 0.05% NP40), and mixed with 526 μl enzyme dilution buffer (1.25× T4 ligase buffer, 52.5 μl Terminal Transferase, 78 μl T4 ligase). Then 10 μl cell suspension was aliquoted, mixed with the 2.5 μl barcode-P7 adaptor in each well. Finally, the 12.5 μl reaction mixture (1× T4 ligase buffer, 1 mM dGTP, 0.02% NP40, 0.5 μl Terminal Transferase, 0.75 μl T4 ligase) in the 96 well PCR plate was sealed completely and incubated at 37° C. for 60 min.
After barcoding the MNase cleavage sites, the reaction system in the 96 wells were pooled together in a solution trough containing 500 μl stop buffer (10 mM Tris-HCl (pH 8.0), 150 mM NaCl, 10 mM EDTA, 0.1%(v/v) Triton X-100), the cells were pelleted, resuspended in 800 μl PBS and send to flow cytometry core. 30 cells were sorted in each well of a new 96 well plate using a BD FACSAria III cell sorter (BD Biosciences) and collected in 10 μl PBS containing 0.1% NP40. Totally 5 plates were collected. After adding 3 μl reverse-crosslink buffer (50 mM Tris-HCl (pH 8.0), 25 ng/ml Proteinase K and 0.1% NP40) into each well by multichannel pipette, the plates were sealed completely, incubated in PCR machine for 65° C. overnight and 80° C. 10 min to inactivate the Proteinase K.
After reverse-crosslink, the DNA fragments with barcode adaptors were captured and labeled with second library indexes through 12 cycles of annealing and extension with 96 PCR1 index primers. The reaction was carried out by adding 15 μl 2× PHUSION® High-Fidelity PCR Master Mix with HF Buffer (New England BioLabs) and 2.5 μl 2 μM index primer (1 index per well) into the reverse-crosslinked solution in 96 wells. Then all the libraries were pooled together as described above, digested 370 with 96 μl Exonuclease I (Thermo Fisher Scientific) at 37° C. for 30 min to degrade the excess index primers. The DNAs were purified by MINELUTE® Reaction Cleanup Kit (Qiagen) and eluted with 64 μl EB buffer (Qiagen). The A tailing was performed in 1× NEBuffer 2 (New England BioLabs) by adding the Klenow fragment (3′→5′ exo-) (New England Biolabs) and 1 mM deoxyATP (New England Biolabs). After incubation at 37° C. for 30 min, the DNAs were purified and eluted by 23 μl EB buffer. Then the Illumine P5 adaptor was ligated to the A-tailing fragments using the T4 DNA ligase (New England BioLabs) by incubation at 16° C. overnight. The DNAs were purified again and eluted by 15 μl EB buffer. PCR2 amplification was performed by adding the PHUSION® High-Fidelity PCR Master Mix with HF Buffer, i5 index primer and P7-cs2 primer in the following condition: 98° C. 3 min, 57° C. 3 min, 72° C. 1 min, 15 cycles of 98° C. 10 s, 65° C. 15 s, 72° C. 30 s, followed by 72° C. 5 min. Then the PCR products were run on the 2% E-Gel® EX Agarose Gel (Invitrogen), the 250-600 base pair (bp) fragments were isolated and purified using the MINELUTE Gel Extraction Kit (Qiagen). The concentration of the library was measured by Qubit dsDNA HS kit (Thermo Fisher Scientific). The paired-end sequencing was performed on Illumina HiSeq 3000.
The scripts for de-multiplexing and genome-wide mapping are available at github.com/wailimku/testing123. For profiling each type of histone marks, 30 single cells were sorted into each of the 480 wells by FACS and sent to sequencing after the library's preparation steps. All sequencing data was paired-end. The R2 reads contained the information of cell barcodes, in which the cell barcode sequences followed the common sequence
For each well, R1 reads were mapped to the human reference genome (UCSC hg18) using Bowtie2 (Langmead and Salzberg 2012). Using the cell barcode information from R2 reads, the mapped R1 reads were separated into 96 sets corresponding to the 96 cell barcodes. Reads with mapping quality less than 10 were removed and duplicated reads were removed. For each well, in order to determine the sets of mapped reads among the 96 sets were from single cells, the 96 sets of mapped reads were ranked based on the total number of mapped reads in the sets. A set of reads were considered to be from single cells if they satisfied: 1) They were one of the top 25 ranked sets. 2) The total number of mapped reads in the set was greater than 1000. Note that, using the calculation of collision rate from a previous study(Cusanovich et 404 al. 2015), 25 sets of reads were considered from single cells if 30 single cells were sorted into a well. Thus, the top 25 ranked sets were considered in criterion 1 above. As a result, combining all single cell data from the 480 wells, about 10,000 single cells were identified for both H3K4me3 and H3K27me3.
Visualization in Genome Browser. For H3K4me3 and H3K27me3, 2,000 single cells were randomly selected and pooled together as the pseudo-bulk cell data. This pseudo-bulk cell data was visualized using the WashU genome browser(Zhou X. et al. 2011. The Human Epigenome Browser at Washington University. Nat Methods 8: 989-990) (
Peaks calling. To examine the quality of the single cell data, the pooled single cell data were compared to the bulk cell ChIP-seq data downloaded from ENCODE (Kazachenka A. et al. 2018. Identification, Characterization, and Heritability of Murine Metastable Epialleles: Implications for Non-genetic Inheritance. Cell 175: 1717). Peaks of this ENCODE data were called using SICER (Zang C. et al. 2009. A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics 25: 1952-1958; Xu S. et al. 2014. Spatial clustering for identification of ChIP-enriched regions (SICER) to map regions of histone methylation patterns in embryonic stem cells. Methods Mol Biol 1150: 97-111). A final set of peaks for each histone marks was obtained by combining the peaks from different immune cell types. Totally, the final combined sets of peaks obtained from ENCODE data contained 52,798 and 79,100 peaks for H3K4me3 and H3K27me3, respectively. Peaks from the pooled single cells were identified using SICER and their widths were fixed to be 3,000 and 10,000 for H3K4me3 and H3K27me3, respectively. The overlap between peaks from the pooled single cells and the bulk-cell data were computed using the function “findOverlaps” in the R packages “GenomicRange”(Lawrence M. et al. 2013. PLOS Comput Biol 9: e1003118.).
Scatter plots. The human genome was equally divided into bins (bin size=5 kb for H3K4me3; bin size =50 kb for H3K27me3). For both bulk cell and pooled single cell libraries, the read density (counts per million, CPM) at each bin was calculated. The correlation between the logarithm of the read densities of two libraries was quantified using the Pearson correlation coefficient (
TSS profile plots. For H3K4me3, the software Homer(Heinz et al. 2010) was used to calculate the TSS density profile (annotatePeaks.pl tss mm9 -size 3000 -hist 20 -len 1) for each single cells. In particular, a region of 3 kb around each TSS was considered. This region was then divided into 150 bins. The density profile was generated using the number of reads mapped onto the bin divided by the total number of mapped reads, and averaged over all promoters.
Expression matrix. Single cells with reads more than 3000 (4000) were first selected. This resulted in 7798 and 9207 single cells for H3K4me3 and H3K27me3, respectively. Second, it was required that the fraction of reads in peaks higher than 0.15 (0.15) were selected for clustering analysis for H3K4me3 (H3K27me3) single cell data. This resulted in 6,021 and 7,038 single cells for H3K4me3 and H3K27me3, respectively. For each cell in H3K4me3 (H3K27me3), reads located within the 52,978 (79,100) combined H3K4me3 (H3K227me3) were counted. A consensus clustering approach was applied, that is similar to SC3 (Kiselev et al. 2017), to the iscChIC-seq data. First, a read count matrix R was computed, in which the columns correspond to cells and rows correspond to the peaks. Rij indicates the number of reads at the ith peak from the jth cell. Each column in the read count matrix was divided by the library size and multiplied by a factor of 106. The resulting matrix denoted as M. The log 2 transformation was further applied resulting M′ where M′=log2(M +1). For filtering the non-informative bins, a binary matrix Mb was obtained from M′ and defined as,
The ith row (peak) in the matrix M′ would be selected if
value equals to 100 for both H3K4me3 and H3K27me3, respectively. The filtering of these bins is based on the assumption that reads at a bin should be found in more single cells if the bin is more informative. The expression matrix was denoted after the deletion of rows (peaks) as M″.
Calculation of the Laplacian matrix. Consider mj to be a vector equal to the jth column (cells) of M″. First, the similarity between cells was computed using the Pearson correlation, and resulting a correlation matrix C. In particular, Cij is the Pearson correlation value between the vectors mj and mi. Thus, the rows and columns of the matrix C correspond to single cells. The Laplacian matrix L is defined by L=I−D−1/2AD−1/2, where I is the identity matrix. A is a similarity matrix where A=e−(2−C)/max(2−C). Note that D is the degree matrix of A, a diagonal matrix that contains the row-sums of A on the diagonal (Dii=ΣiAij). The eigenvectors of the Laplacian matrix were computed and formed a matrix V where each column represents an eigenvector. The columns of V from left to right are sorted in ascending order based on their corresponding eigenvalues.
Optimal number of clusters. The silhouette analysis was applied to determine the optimal number of clusters. First, a matrix WS1 was created, which is a submatrix of V and WijS1=Vij. Note that i is from 1 to the total number of bins and j=1, . . . s1. s1 is fixed to be 12 for both H3K4me3 and H3K27me3. The K-mean method was applied to the matrix WS1 el for clustering single cells into k clusters and computed the silhouette coefficient for the clusters. By varying the number of clusters k from 4 to 12, the optimal k value was determined by selecting the case of k having the largest silhouette coefficient value. The optimal k is equal to six for both H3K4me3 and H3K27me3.
Clustering. A binary matrix E was considered in which its rows and columns correspond to single cells. The K-mean method was applied to the matrix Wt to cluster the single cells with k=6. If cells i and j belong to the same cluster, Eij=Eji=1; otherwise 0. We consider t is between 2 to 15 and for each t, the clustering analysis was repeated for 10 times and thus obtaining 10 different—Es. A final matrix Ec is calculated by averaging all binary matrices from each individual clustering.
t-SNE visualization. The dimension reduction method t-SNE was applied to the matrix Ec. The position of single cells is visualized in the two-dimensional t-SNE representative space.
Cluster annotations. After clustering single cells from the single cell H3K4me3 or H3K27me3 data, the clusters were annotated to cell types using the bulk cell ENCODE data. First, the H3K4me3 and H3K27me3 ENCODE data was downloaded for B cells, monocytes, T cells, and NK cells. There were at least two replicates for each histone marks and each cell type. For both H3K4me3 and H3K27me3, the density matrices with log 2 transformation (VB, Vmono, VT, VNK), which was similar to M″, were computed for the four cell types, respectively. The number of rows was equal to the number of peaks while the number of columns was equal to the number of replicates. Note that peaks that were deleted in the single cell analysis were also deleted for the bulk cell density vectors. The student t-test was used to compute the cell-type specific peaks from the four density matrices (VB, Vmono, VT, VNK). The ith row vector of the matrix Vz (Z=B, mono, T, or NK) was denoted as vlz. The ith peak (row) was specific to a cell type Z if Viz is significantly higher than all vl′Y with a p-value of 0.05 and mean(vi′z)−mean(viY)>a cutoff (0.4 for H3K27me3, and 0.2 for H3K4me3), where Y=B, mono, T, NK and Y≠Z. For the purpose of cluster annotation, the sets of cell-type-specific peaks (specific to cell type Z) were denoted as S4,an,z and S27,an,z for the H3K4me3 and H3K27me3 bulk cell data, respectively.
For each histone mark, pseudo-bulk log 2 density matrices (W1, W2, W3, W4, W5, W6) were computed for cluster 1, 2, 3, 4, 5, and 6, respectively. In each of these matrices, the number of columns was equal to the number of peaks while the number of rows was equal to the number of pseudo-bulk replicates. To generate Wi (i=1, 2, 3, 4, 5, 6), six sub-samples of cells were randomly selected from the cells belonging to cluster i, in which the size of each subsample was equal to one-third of the number of cells belonging to cluster i. By pooling the cells in each sub-sample, the log 2 density for each peak was calculated for obtaining Wi. The jth row of Wi was denoted as Wji. The jth peak was specific to a cluster i if Wji was significantly higher than all Wjk where k=1,2,3,4,5,6 and k≠i. Note that p-value computed by student-t test was required to be smaller than 0.05 and mean (Wji)-mean (Wjk) was higher than a cutoff (0.1 for both H3K4me3 and H3K27me3). The sets of cluster-specific peaks (specific to cluster i) for the use of cluster annotation were denoted as X4,an,i and (X27,an,i for the H3K4me3 and H3K27me3 bulk cell data, respectively.
The set of cluster-specific peaks and cell-type-specific peaks were compared. For H3K4me3 data, the p-value for the intersect between a cell type Z and a cluster i (X4,an,i∩S4,an,z) was computed by the hypergeometric test. A cluster i was considered to be annotated validly to a cell type Z if the p-value for (X4,an,i∩S4,an,z) is smaller than 11e -05 and the p-value for other comparisons (X4,an,i∩S4,an,z) Y=B, mono, T, NK but ≠Z) is greater than 1-05.
Reproducibility of cluster annotations. To check how reproducible the cluster annotations is, the computations were for 100 times and the cluster density matrices were re-generated each time via the same sub-sampling procedures. The mean and the standard deviation of the p-value in the comparisons were computed and shown in
Matching the clusters between H3K4me3 and H3K27me3 marks. For either single cell H3K4me3 or H3K27me3 data, six clusters were found where four of them were annotated as monocytes s T cells, B cells, and NK cells, respectively. If a cluster obtained from single cell H3K4me3 data annotated with a cell type, this cluster was expected to correlate with the cluster obtained from single cell H3K27me3 data annotated with the same cell type.
Bivalent domains were defined as regions where H3K4me3 and H3K27me3 peaks obtained from ENCODE data that were overlapped (command: bedtools intersect-a ‘113K27me3 peak file’ -b ‘113K4me3 peak file’). 25,951 bivalent domains were obtained, in which 7,989 bivalent domains were overlapped with the TSS regions. For both single cell H3K4me3 and H3K27me3 data, we computed the pseudo-bulk log 2 density (WB,4, Wmono,4 WT,4, WNK,4 and WB,27, Wmono,27, WT,27, WNK,27) for clusters annotated to B cells, Monocytes, T cells and NK cells, respectively. To generate Wz,4 or Wz,27, six sub-samples of cells were randomly selected from the cells belonging to cluster annotated to cell type Z, in which the size of each subsample was equal to two-third of the number of cells belonging to that cluster. By pooling the cells in each sub-sample, the log 2 density for each peak was calculated for obtaining Wz,4 or Wz,27. The jth row of Wz,4 was denoted as WjZ,4 while the jth row of Wz,27 was denoted as W2,27. A peak was specific to a H3K4me3 cluster annotated to cell type Z if WjZ,4 was significantly higher than all WjY,4 where Y=B, mono, T, NK but Y≠Z. Note that FDR of the p-value (computed by student-t test) was required to be smaller than 0.05 and mean(WjZ,27)-mean (WjY,4 ) was larger than 0.3. A peak was specific to a H3K27me3 cluster annotated to cell type Z if WjZ,27 was significantly lower than all WjY,27 where Y=B, mono, T, NK but Y≠Z. Note that FDR for the p-value was required to be smaller than 0.05 and mean (WjZ,27)-mean(WjY,27) was smaller than 0.3. The sets of cluster-specific peaks (specific to cluster annotated to cell type Z) for the use of matching H3K4me3 and H3k27me3 clusters were denoted as X4,mat,z and X27,mat,z for the H3K4me3 and H3K27me3 clusters, respectively. The p-value for the intersection X4,mat,z∩X27,mat,z was computed by hypergeometric test, where Z, Y=B, mono, T, NK.
Relationship between cell-to-cell variation in H3K4me3 and H3K27me3. Different from the procedures of matching the H3K4me3 and H3K27me3 clusters, all bivalent domains were considered. Also, instead of calculating the pseudo-bulk log 2 density matrices, the vectors of coefficients of variation (CVB,4, CVmono,4, CVT,4, CVNK,4 and CVB,27, CVmono,27, CVT,27, CVNK,27) were calculated for the H3K4me3 and H3K27eme3 clusters annotated to B cells, Monocytes, T cells and NK cells, respectively. Similar to the single cell log 2 density matrices M″, the log 2 density matrices for single cells in H3K4me3 and H3K27me3 clusters were denoted as (MB,4, Mmono,4, MT,4, MNK,4 and MB,27, Mmono,27, MT,27, MNK,27) referring to H3K4me3 and H3K27me3 clusters annotated to B cells, Monocytes, T cells and NK cells, respectively. Each of these density matrices has the dimensions of the number of bivalent domains multiplied by the number of single cells in the clusters. The vectors of coefficients of variation were computed using these density matrices over the single cells. For the purpose of finding the relationship between cell-to-cell variation in H3K4me3 and H3K27me3, the jth bivalent domain was specific to a H3K4me3 cluster annotated to cell type Z if log2cvjZ,4 is larger than all log2cvJY,4 than a cutoff (0.2) where Y=B, mono, T, NK and Y≠Z, and the number of non-zero elements in jth row of MZ,4 MB,4 is larger than 5% of the mean of the number of non-zero elements overall all rows in MZ,4. The second requirement is to only include those relatively more confident CV value for each cluster. The same calculation was applied to obtain the bivalent domains that were specific to a H3K27me3 cluster annotated to cell type Z. The sets of cluster-specific peaks (specific to cluster annotated to cell type Z) for the use of finding the relationship between cell-to-cell variation in H3K4me3 and H3k27me3 were denoted as X4,cv,z and X27,cv,z for the H3K4me3 and H3K27me3 clusters, respectively. By considering the bivalent domains in the set of X4,ev,z∩X27,cv, the spearman correlation between CVZ,4 and CVY,27 for and Y, Z=B, mono, T, and NK.
The simultaneous addition of several dG nucleotides to DNA ends by TdT enzyme and ligation of oligo-dC barcode adaptors by T4 DNA ligase is an efficient strategy to barcode chromatin regions following DNase digestion. This barcoding strategy was adapted to label the DNA ends generated by antibody-guided MNase cleavage in ChIC-seq assays to profile histone modifications in more than tens of thousands of single cells in one experiment through three levels of barcoding and indexing strategy (
The iscChIC-seq was first applied to white blood cells isolated from human blood for profiling the H3K4me3 modification, which is an active histone modification mark, at a single cell resolution. Using a cutoff to filter cells with less than 1,000 reads, 10,000 single cells and about 9,000 reads per cell on average were detected in one single experiment. Using a more stringent filtering criteria (a cell has at least 3,000 reads), this resulted in ˜7,800 single cells each having about 11,000 reads on average. The cell number and unique reads number per cell detected by iscChlC-seq were significantly improved as compared with the previous published single-cell methods. The genomic profiles of the sequencing read from pooled single cells displayed specific peaks around transcription start site (TSS) and were highly consistent with that of the bulk cell H3K4me3 ChIP-seq data from ENCODE (
Next, it was examined if different cell types of the human WBCs, which contain T cells, NK cells, monocytes, and B cells, could be identified from the iscChlC-seq data. For this purpose, a combined reference set of H3K4me3 peaks for human WBCs were first computed using the ENCODE bulk cell H3K4me3 ChIP-seq data (Methods). By applying the silhouette analysis(Rousseeuw P. J. 1987. Silhouettes—a Graphical Aid to the Interpretation and Validation of Cluster-Analysis. J Comput Appl Math 20: 53-65), a number of six were found to be the optimal number of clusters (
Next, the genomic profiles of the annotated pooled single cell data (from cluster T, B, NK, and monocyte) were compared with the genome profiles of ENCODE bulk cell ChIP-seq data for the corresponding cell types. The analysis revealed that the annotated cluster of single cells showed a genomic profile highly similar to that of the corresponding bulk cells at the cell-type specific gene loci including PAXS, CD19, CD14, CD93, CD3D, CDS, TBX21 and NCR1 (
At the single cell level, the majority of cells annotated as T cells, B cells, NK cells, monocytes exhibited high H3K4me3 density in regions associated with CD3D+CD3E+CD3G (T cell-specific), PAX5 (B cell-specific), TBX21 (NK and T cell-specific), CD14+CD93 (monocyte-specific), respectively (
To test if iscChlC-seq worked for detecting repressive histone marks, it was applied to profiling H3K27me3 in WBCs. Using a filtering approach similar to that used for H3K4me3 iscChIC-seq libraries, 10,000 single cells each having about 40,000 unique reads on average were detected. Using a more stringent filtering criteria such that a cell has at least 4,000 unique reads, it resulted in ˜9,000 single cells each having about 45,000 reads on average. The genomic profiles of the pooled single cells were highly consistent with the profiles of the bulk cell H3K27me3 ChIP-seq data from ENCODE (
Different from ChlP-seq, ChIC-seq depends on antibody-guided cleavage of chromatin by MNase and thus may have bias toward open chromatin regions. To address this question, all the DHSs were identified from the ENCODE DNase-seq datasets from T, B, NK and monocyte cells and the fraction of the ENCODE bulk cell H3K4me3 ChIP-seq reads that overlapped with DHSs in each cell type were analyzed. The analysis revealed that about 60% to 67% of H3K4me3 CHIP-seq reads from the ENCODE bulk cell H3K4me3 ChIP-seq libraries fell into the DHS regions. In contrast, about 52% to 56% of the H3K4me3 reads from the pooled single cells fell into the DHS regions, providing evidence that the specificity of the H3K4me3 reads from the iscChIC-seq libraries is slightly lower than that of the bulk cell ChIP-seq libraries, which may be caused by differences in washing conditions and/or differences in cell numbers used for the experiments. The H3K27me3 data was also similarly analyzed. These results indicate that while about 38% to 53% of H3K27me3 reads from the ENCODE bulk cell H3K27me3 ChIP-seq libraries fell into the DHS regions, about 33% to 41% of the H3K27me3 reads from the pooled single cells fell into the DHS regions. Thus the percentage of the H3K27me3 reads from the iscChIC-seq libraries in DHS regions is slightly lower than that from the bulk cell libraries, indicating that the H3K27me3 reads detected by iscChlC-seq are not substantially biased toward open chromatin regions. To further estimate the true positive and false positive rates of the iscChlC-seq reads, it was assumed that the peaks from pooled single cells that overlap with those from ENCODE data are true positives while the peaks not overlapping with the ENCODE peaks are false positives. The analysis revealed that while the false positive rate ranges from 1.6 to 2.7%, the true positive rate is about 22% to 32% for H3K4me3 and H3K27me3, respectively.
Since the same WBC populations were used in profiling single cell H3K4me3 and single cell H3K27me3, it would be important to examine if a cluster annotated with a cell type from H3K4me3 iscChlC-seq data is specifically correlated with the cluster annotated with the same cell type from H3K27me3 iscChIC-seq data. H3K4me3, an active modification, and H3K27me3, a repressive modification, are co-localized at some key regulatory genomic regions due to either bivalent modifications or cellular heterogeneity (Bernstein B. E. et al. 2006. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125: 315-326; Roh T. Y. et al. 2006. The genomic landscape of histone modifications in human T cells. Proc Natl Acad Sci USA 103: 15782-15787; Wang Q. et al. 2019. CoBATCH for High-Throughput Single-Cell Epigenomic Profiling. Mol Cell 76: 206-216 e207; Wei G. et al. 2009. Global mapping of H3K4me3 and H3K27me3 reveals specificity and plasticity in lineage fate determination of differentiating CD4+ T cells. Immunity 30: 155-167). The relative levels of these two modifications at these regions are related to each other and influence the expression of underlying genes (Roh et al. 2006). To test this possibility, 7,873 TSS regions (+/−2.5 kb) were first identified which exhibited overlapping H3K4me3 and H3K27me3 peaks from the bulk cell H3K4me3 and H3K27me3 ChIP-seq data in monocytes, T cells, B cells, and NK cells. Next, cluster-specific H3K4me3 peaks among the 7,873 bivalent genes from the H3K4me3 iscChIC-seq data were identified, which are peaks that have higher H3K4me3 methylation level in one cell cluster compared to all other clusters. To relate the H3K4me3 modification with H3K27me3 modification in the iscChlC-seq datasets, it was reasoned that when H3K4me3 level becomes higher, the H3K27me3 level should become lower. Thus, from the four cell clusters based on the H3K27me3 iscChlC-seq data, the cluster-specific peaks among the 7,873 bivalent genes were identified, which are peaks that have lower H3K27me3 methylation level in one cluster compared to all other clusters. Comparison between these two kinds of cluster-specific peaks revealed that the specific peaks of a H3K4me3 cluster is significantly overlapped with the specific peaks of the H3K27me3 cluster if they are annotated as the same cell type (
H3K4me3 is usually associated with gene activation, while H3K27me3 is associated with gene repression. The previous single-cell H3K4me3 data indicated that the cell-to-cell variation in H3K4me3 is correlated with the cell-to-cell variation in gene expression (Ku W. L. et al. 2019. Single-cell chromatin immunocleavage sequencing (scChlC-seq) to profile histone modification. Nat Methods 16: 323-325), suggesting that single-cell histone modification data is useful in understanding the cellular heterogeneity in gene expression. However, due to the relatively small number of single-cells (scChIC-seq assay) or relatively sparse unique reads (iACT-seq and scCUT&Tag), the application of these techniques are limited. In this study, the TdT+T4 DNA ligase-mediated barcoding strategy with the scChIC-seq protocol for iscChlC-seq, which enabled the analysis of either active or repressive histone modification profiles in more than 10,000 single cells in one experiment. The assay captured 11,000 unique reads for H3K4me3 or 45,000 reads for H3K27me3 per single cell, which are better than other high throughput techniques for histone modifications. Different from PA-TN5-based techniques, iscChlC-seq works well for both active and repressive marks. Comparison with the bulk cell ChIP-seq data indicated that iscChIC-seq does not have substantial bias toward open chromatin regions for either active or repressive histone modification marks. In addition, iscChlC-seq does not require expensive equipment or special reagents and thus easily accessible to most laboratories with molecular biology capabilities.
The analysis in this study indicated that both the active H3K4me3 and repressive H3K27me3 iscChlC-seq data were effective in clustering the complex WBCs and sorting out different cell types. H3K4me3 and H3K27me3 are colocalized to a subset of genomic regions, which are termed “bivalent domains”. Bivalent modifications are usually associated with key differentiation regulator genes and thus show substantial changes during cell development or differentiation and the expression of a bivalent gene is correlated with the relative level of H3K4me3 and H3K27me3 signals at the gene locus. Although the overlap of H3K4me3 and H3K27me3 peaks at these genomic regions may be caused by different mechanisms including true bivalent modifications and cellular heterogeneity, the dynamic equilibrium of the two opposing modifications at these regions result from the competition of the corresponding enzymes to these regions. Hence, the two functionally opposite modifications may be co-regulated but demonstrate opposite directions. Indeed, the data herein showed that the increased H3K4me3 levels in bivalent genes in one type of cell cluster are positively correlated with the decreased H3K27me3 levels in the same bivalent genes in the same type of cell cluster. The cell-to-cell variations in H3K4me3 and H3K27me3 are positively correlated and exhibit the highest correlation when the cell cluster annotated from the H3K4me3 iscChlC-seq data matches with the same type of cell cluster annotated from the H3K27me3 iscChlC-seq data. Thus, these properties of bivalent modifications can be used to specifically correlate the cell clusters annotated from different single cell H3K4me3 and H3K27me3 data.
Overall, the data herein, show that iscChlC-seq is a reliable single-cell technique for measuring histone modifications and potentially for chromatin binding proteins, which may find broad applications in studying cellular heterogeneity and differentiation status in complex developmental and disease systems.
Cellular heterogeneity in gene expression, has been extensively studied through single-cell sequencing methods. For example, single-cell RNA sequencing (scRNA-seq) has revealed significant heterogeneity in primary glioblastomas (Patel, A. P., et al. (2014) Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science, 344, 1396-1401). Also, increased levels of heterogeneity in these tumors are inversely correlated with survival, indicating that intratumor heterogeneity should be an essential clinical factor. Successful identification of regulators of this heterogeneity is critical to the development of new therapeutic drugs.
DNase I hypersensitivity of chromatin informs the chromatin states of cis-regulatory elements that govern the expression of target genes including master regulators (Lai, B., et al. (2018) Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature, 562, 281-285. Mezger, A., et al. (2018) High-throughput chromatin accessibility profiling at single-cell resolution. Nat Commun, 9, 3647. Chen, X., et al. (2018) A rapid and robust method for single cell chromatin accessibility profiling. Nat Commun, 9, 5345. Cusanovich, D. A., et al. (2018) A Single-Cell Atlas of In Vivo Mammalian Chromatin Accessibility. Cell, 174, 1309-1324 e1318). Cellular heterogeneity in gene expression has been linked to variation in chromatin accessibility (Jin, W., et al. (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature, 528, 142-146), nucleosome organization and long distance enhancer-promoter interactions (Jin, W., et al. (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature, 528, 142-146); thus, measuring chromatin states at the single-cell level is of the utmost importance for understanding the molecular mechanisms of gene expression heterogeneity. Several single cell techniques were developed to measure chromatin accessibility, including scATAC-seq (Buenrostro, J. D., et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature, 523, 486-490. Satpathy, A. T., et al. (2018) Transcript-indexed ATAC-seq for precision immune profiling. Nat Med, 24, 580-590. Lareau, C. A., et al (2019) Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol, 37, 916-924) by Tn5 chromatin tagmentation, scDNase-seq by DNase I digestion for chromatin fragmentation, and scMNase-seq by MNase detection of chromatin accessibility and nucleosome positions. The standard throughput of many of these methods is in the thousands of cells, and of these methods scATAC-seq has the highest cell throughputs; however, it is also known that DNA tagmentation bias exists in the use of Tn5 (Li, Z., et al. (2019) Identification of transcription factor binding sites using ATAC-seq. Genome Biol, 20, 45), which may affect the accuracy of the regulator prediction and cell-to-cell variation in accessibility, limiting its potential applications.
DNase I enzymes have different properties compared to Tn5 (Karabacak Calviello, A., et al. (2019) Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling. Genome Biol, 20, 42). However, due to a lack of development in combinational indexing strategies for scDNase-seq, its cell throughput is very low and thus its application in single-cell studies is limited. To address this limitation, the study described herein provided a novel indexing strategy, which avoids the use of expensive equipment for automation or microfluidics, to enable the analysis of more than 15,000 cells in a single experiment. This new strategy, termed indexing scDNase-seq (iscDNase-seq), involves barcoding the DNA ends with a combination of TdT terminal transferase and T4 DNA ligase. We applied it to assay single-cell DHSs from human white blood cells (WBC). Computational analysis of the assay results recovered expected cell types from the WBCs and inferred their underlying regulatory mechanisms in accessibility variation. By comparing the iscDNase-seq data obtained herein with publicly available dscATAC-seq data for B cells, T cells, NK cells, and monocytes, it was found that iscDNase-seq detects DHSs missed by scATAC-seq that have high sequence conservation and are associated with significant gene expression. Importantly, iscDNase-seq data can better predict the cellular heterogeneity in gene expression compared to scATAC-seq data. Thus, iscDNase-seq is an attractive alternative method for measuring single-cell chromatin accessibility.
In the iscDNase-seq protocol (
Human healthy donor bloods were collected and defibrinated or heparinized in a EDTA sodium-treated tubes or bags for anticoagulant of blood by the NIH blood bank. The peripheral blood mononuclear cells (PBMC) were purified by the density centrifugation using Lymphocyte Separation Medium (Corning, catalog no. 45000-726).
Two-Step Crosslinking of Cells
The isolated 50 M of PBMC suspended in 50 ml PBS/MgCl2 were first fixed by adding 400 μl freshly prepared 0.25 M Disuccinimidyl glutarate (DSG, ThermoFisher Scientific, catalog no.20593) and incubating at room temperature for 45 min with rotation (Tian, B., et al. (2012) Two-Step Cross-linking for Analysis of Protein-Chromatin Interactions. Methods of Molecular Biology, 809, 105-120). After three washes with PBS, the cells were suspended in culture medium DMEM supplemented with 10% FBS and further fixed by adding 1:15 volume of 16% (w/v) methanol-free formaldehyde solution (Thermo Fisher Scientific) and incubating at room temperature for 10 min (Kidder, B. L., et al. (2011) ChIP-Seq: technical considerations for obtaining high-quality data. Nature Immunology, 12, 918-922). The reaction was terminated by adding a 1:10 volume of 1.25 M glycine and incubating at room temperature for 5 min. The fixed cells were collected by centrifugation at 1320 rpm for 7 min and washed with PBS. The fixed cells were stored in aliquots (1×106 cells per tube) at −80° C. until use.
The two-step fixed cells (1×106) were suspended in 0.5 ml of RSB buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Triton X-100) and incubated for 10 min on ice. 50 units of DNase I were added to the cells, followed by incubation in 37° C. water bath for 5 minutes to digest the chromatin (Pilot DNase I titration is needed (Cooper, J., et al. (2017) Genome-wide mapping of DNase I hypersensitive sites in rare cell populations using single-cell DNase sequencing. Nature Protocols, 12, 2342-2354)). The reaction was quenched by adding 10 μl 0.5 M EDTA to a final concentration of 10 mM. The cells were centrifuged at 1320 rpm for 5 mins at 4° C. The supernatants were carefully removed by pipetting without disturbing the cell pellets. The pellets were washed three times using 1 ml 1× T4 ligase buffer (final 0.1% NP40) to remove the DNase I completely.
The DNase I-digested cells were resuspended in nuclei resuspension buffer (328 μl H2O; 132 μl 10 mM dGTP; 66 μl 10×T4 ligase buffer; 5.3μl 10% NP40) and equally distributed to 96 wells of a 96-well plate. To add several Gs at the 3′ end of DNA and allow adaptor ligation, 2.5 μl of 10 μM barcode P7 adaptor were added into each well, followed by adding 5 μl of the enzyme dilution buffer (66 μl 10× T4 ligase buffer; 330 μl H2O; 40 μl TdT enzyme; 13 μl T4 PNK; 78.75 μl T4 ligase) with gentle mixing (pipette up and down 5-7 times). TdT and T4 ligation is performed on the PCR machine for 30 min at 37° C. with lid heating.
After TdT and T4 ligation, nuclei were pooled and re-suspended in 1 ml PBS containing 0.1% NP40 and 3 μM DAPI (Invitrogen) for nuclei staining. After 5 min incubation at room temperature, the nuclei were counted under the DAPI fluorescent microscope and 30 nuclei were distributed, using a flow cytometry sorter, into each well of a 96-well plate containing 3 μl reverse-crosslink buffer (50 mM Tris-HCl pH 8.0, 25 ng/ml Proteinase K, 0.1% NP40) mixed with 10 μl PBS containing 0.1% NP40. Up to 6 plates of cells were collected. The plates were sealed completely and incubated at 65° C. overnight on PCR machine with lid heating. After reverse-crosslinking, add 2.5 μl of 2 μM well index primer and 15 μl of 2×PHUSION® master mix (New England BioLabs, catalog no.M0531S) into each well for PCR1 amplification without DNA purification. The PCR1 was done under the following condition: 98° C., 3 min; followed by 12 cycles of 65° C., 30 s and 72° C., 30 s; one cycle of 72° C., 5 min. After PCR1, for each 96-well plate, all of the products were pooled and incubated with 96 μl of Exonuclease I (ThermoFisher Scientific, catalog no. EN0582) at 37° C. for 30 mins to degrade the excessive of well index primers. DNA was then purified by the MINELUTE® Reaction Cleanup Kit (Qiagen, catalog no. 28206).
A-tailing and P5 adaptor ligation were performed as described previously (Ku, W. L., et al. (2019) Single-cell chromatin immunocleavage sequencing (scChIC-seq) to profile histone modification. Nature Methods, 16, 323). After P5 adaptor ligation, library DNA is purified by the MINELUTE® Reaction Cleanup Kit. PCR2 was performed by adding 15 μL DNA; 0.4 μl of 10 μM i5 primer; 0.4 μl of 10 μM p7-cs2 primer; 15.8 μl 2×PHUSION® Master Mix with the following condition: 98° C., 3 min; 57° C., 3 min; 72° C., 1 min; followed by 15 cycles of 98° C., 10 s; 65° C., 15 s and 72° C., 30 s; one cycle of 72° C., 5 min. The 220-600 base pair (bp) fragments were isolated using the 2% E-GEL® EX Agarose Gels (Invitrogen, cat #G401002) and purified using the Q1Aquick Gel Extraction kit (Qiagen). The concentration of the purified DNA was measured using Qubit dsDNA HS kit (Thermo Fisher Scientific). The paired-end 50-6-8-50 sequencing was performed using the Illumina MiSeq and HiSeq 3000.
The scripts for de-multiplexing and genome-wide mapping are available at github.com/wailimku/testing456. 30 single cells were sorted into each of the 480 wells by FACS and sent to sequencing after the library's preparation steps. All sequencing data was paired-end. The R2 reads contained the information of cell barcodes. For each well, R1 reads were mapped to the human reference genome (UCSC hg18) using Bowtie2 (Langmead, B. and Salzberg, S. L. (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods, 9, 357-359). Using the cell barcode information from R2 reads, we separated the mapped R1 reads into 96 sets corresponding to the 96 cell barcodes. Reads with mapping quality less than 10 were removed and duplicated reads were removed. For each well, in order to determine the sets of mapped reads among the 96 sets were from single cells, we ranked the 96 sets of mapped reads based on the total number of mapped reads in the sets. A set of reads were considered to be from single cells if they satisfied:
For further filtering the single cells, the merged peaks identified by bulk-cell DNase-seq data were downloaded from ENCODE. Totally, bulk cell DNase-seq libraries were downloaded from ENCODE. For each of the bulk-cell DNase-seq library, peaks were called using MACS2 (Zhang, Y., et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol, 9, R137), and peaks from all libraries were merged if they overlapped by at least 1 bp. Finally, 218,595 were identified for the bulk-cell DNase-seq data for human WBC. The width of peaks was fixed to be 1,000. A further filtering step was applied to the selected single cells by requiring that reads in single cell need to be more than 4000 and FRiP (fraction of reads in peaks defined by the bulk-cell DNase-seq data) of single cell need to be greater than 0.15.
All reads from single cells were pooled together and visualized via the WashU genome browser (Zhou, X., et al. (2011) The Human Epigenome Browser at Washington University. Nat Methods, 8, 989-990) together with the bulk-cell DNase-seq data. Peaks from the pooled single cells were identified using MACS (Zhang, Y., et al. 2008 Genome Biol, 9, R137) and their widths were fixed to be 5,00. The overlap between peaks from the pooled single cells and the bulk-cell data were computed using the function ‘FindOverlap’ in the R package called GenomicRanges (Lawrence, M., et al. (2013) Software for computing and annotating genomic ranges PLOS Comput Biol, 9, e1003118). The read density of pooled single cell and pooled bulk-cell data from the 18 bulk-cell libraries were calculated over the bulk-cell peaks. In particular, peaks with read density equal to 0 from either pooled single cell or bulk cells were removed in the calculation. The correlation between the read densities of pooled single cell and bulk cell was quantified by the Pearson Correlation.
Clustering Analysis for the iscDNase-seq Data
Expression matrix. First, a read count matrix R, was computed in which the columns correspond to cell and rows correspond to DHSs that were identified using pooled single cells. Rij indicates the number reads at the DHS site i from the jth cell. For filtering the non-information DHSs, DHSs with total number of reads over all single cells less than 150 were filtered out.
A Latent Semantic Indexing (LSI) analysis. Similar to the previous studies, latent semantic indexing (LSI) was applied to the read count matrix to reduce the dimensions. To perform the LSI analysis, the read count matrix was normalized by term frequency inverse document frequency (TF-IDF) and then a Singular-Value Decomposition (SVD) was performed on the normalized count matrix (Chen, X., et al. (2018) A rapid and robust method for single cell chromatin accessibility profiling. Nat Commun, 9, 5345; Cusanovich, D. A., et al. (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science, 348, 910-914). By removing the first dimension component after SVD transformation, the inverse SVD transformation was applied, resulting in a normalized read count matrix E′ in which rows correspond to DHSs and columns correspond to cells.
t-SNE visualization and clustering. A t-SNE was applied to the normalized read count matrix E′. The position of single cells was visualized in the two-dimensional t-SNE representative space. Single cells are labeled in two different ways. First, single cells were labeled according to the clusters they were from. Second, single cells were labeled according the annotation of cell types. DB SCAN was applied to the two-dimensional t-SNE representative space for clustering.
Generating Heatmap for the Cluster Specific Reads of iscDNase-seq Data
Identifying cluster specific peaks. The normalized read count matrix E′ was transformed to another normalized matrix G in which rows correspond to DHSs and columns corresponds to clusters. In particular, Gij=mean (E′ ik) for all cell k belonging to cluster j. Further, the fold-change of DHSs in each cluster was computed where fold change at peak i for cluster
for all j=1, . . . , 4 and ≠k. For each cluster, DHSs was selected with fold-change greater than 1.5. Finally, the heatmap of E′ at the specific peaks were plotted.
TF motif analysis. For each cluster, AME was applied to the specific peaks for identifying significant motifs, and the top 40 significant motifs were selected first by also requiring p-value <0.01 (McLeay, R. C. and Bailey, T. L. (2010) Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data. BMC Bioinformatics, 11, 165). Then of that set, only motifs exclusive to one cluster were kept.
Comparing iscDNase-seq Against dscATAC-seq
Peak calling. Peaks were identified using MACS calls (parameters:—format bed—nomodel—call-summits—nolambda—keep-dup) on each assay-cell type. Unique peak sets are equivalent to A∩B′ where A is the assay of interest and B is the other assay with both sets belonging to the same cell type of either single cell or bulk assays. Unique intersecting peak sets are equivalent to taking the intersection between two unique peak sets where one belongs to single cells and the other belongs to bulk cells. These set operations are used to yield a refined set of peaks specific to a single cell assay that are also found in the bulk assay with the same digestion enzyme but not in other assays that use different enzymes.
Conservation scores. Unique intersecting peak sets were compared by constructing average conservation score profiles for them. For each peak in a peak set, the average phastCons score was plotted at single bp resolution.
Enrichment analysis. Unique intersecting peak sets were compared by finding the expression of their peaks' nearest genes within 2.5 kbp. Expression data was gathered from GEO and the reads per kilobase per million mapped reads was calculated using rpkmforgenes.py24. Peaks were then annotated using ChIPseeker with the gene expression data from rpkmforgenes.py (Yu, G., et al. (2015) ChlPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics, 31, 2382-2383).
Coefficient of variation scores were calculated for peak accessibility and gene expression, where the gene expression data came from 10× Genomics. For annotating peaks with TSS, ChlPseeker ((Yu, G., et al. (2015)) was used with a 20 kbp range, and genes and peaks with no mapped reads were filtered out.
The iscDNase-seq procedure is illustrated in
iscDNase-seq was first applied to WBCs purified from human blood to detect open chromatin regions at single cell resolution. Using a cutoff to filter cells with less than 1,000 reads and a fraction of reads in peaks (FRiP) smaller than 15%, d approximately 15,000 single cells and 10,000 reads per cell on average were detected in a single experiment. Using a more stringent filtering criterion where a cell must have at least 4,000 reads resulted in approximately 10,000 single cells and 12,000 reads on average (
Human WBCs contain T cells, NKcells, monocytes, and B cells. To benchmark cell cluster annotations, iscDNase-seq was applied to human CD4 T cells, B cells, NK cells, and monocytes that were purified by flow cytometry sorting. Using the same filtering strategy as the human WBCs, 699 B cells, 3,590 monocytes, 1,421 T cells, and 1,923 NK cells were obtained. To cluster the single cells from each specific cell type, read counts were first calculated in the DHSs identified from the pooled single cell data for each of the sorted cell types and whole WBCs. Next, the Latent Semantic Indexing method was applied to normalize the data. Finally, the dimensionality reduction t-SNE was directly applied to the normalized read count matrix. Finally, the cluster results were visualized along with annotations of the known cell types and clusters (
Next, it was examined whether any clusters were results of cell doublet formation. The reads per cell were visualized in the tSNE plots (
Next, it was examined whether cell type specific regulatory regions could be identified using the iscDNase-seq data. To do this, the marker peaks that can distinguish each cluster from the other clusters were detected. As shown in
scATAC-seq and iscDNase-seq use different enzymes (Tn5 or DNase I) to probe chromatin accessibility, and thus iscDNase-seq may reveal information that is not recognized by scATAC-seq. To test this idea, the recent single cell ATAC-seq data (dscATAC-seq) for B cells, monocytes, T cells, and NK cells was downloaded (Lareau, C. A., et al (2019) Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol, 37, 916-924). For both dscATAC-seq and iscDNase data, the cell-type specific peaks were identified using MACS with a peak width setting of 500 bp. By comparing the cell-type specific peaks from iscDNase-seq with those from dscATAC-seq, it was found that peaks from iscDNase-seq were highly overlapped with the peaks from dscATAC-seq only when they were from the same cell type (
To examine the functional significance of unique sites detected by iscDNase-seq versus dscATAC-seq, the gene ontology terms associated with the unique sites were first analyzed. It was found that the enriched GO terms for the unique sites detected by iscDNase-seq and dscATAC-seq were very different (
Next, the nucleotide compositions of unique sites detected by iscDNase-seq and dscATAC-seq were compared. It was observed that the unique iscDNase-seq sites were more likely to be AT-rich while the unique dscATAC-seq peaks were more likely to be CG-rich (
To test this hypothesis, the level of sequence conservation as sequence conservation is often an indicator of functional element was compared. By retrieving the average phastCons conservation scores (31) of the unique iscDNase-seq and dscATAC-seq sites, we observed that the unique DNase-seq sites were more likely to have a conserved region around the center of the sites, while the unique dscATAC-seq peaks have a lower conserved region away from the center of the sites (
One major goal of performing single-cell experiments is to examine the cellular heterogeneity. Elucidating the relationship between cell-to-cell variation in different omics layers is critical for identifying the origins of cellular heterogeneity and understanding how different omics layers interact. Previous studies reported that cell-to-cell variation in accessibility is positively correlated with that in gene expression. However, it is not clear whether the degree of difference in detecting accessibility could affect this correlation. To address this question, the correlation between iscDNase-seq or dscATAC-seq with scRNA-seq was computed as described in
The strategy of calculating the correlation between iscDNase-seq or dscATAC-seq with scRNA-seq is described below (
It is possible that either of the assays detects the more precise accessibility of the open chromatin regions at different distances away from TSSs. Therefore, genome regions that are 20 kb downstream and upstream of TSSs are divided into bins with equal bin size of 500 bp. For each assay, multiple correlation coefficients were computed between the variation in accessibility and gene expression, using different annotations of DHSs to TSS based on the consideration of different bins. In each calculation, only bins that have the same distance away from TSSs were considered. Finally, a set of correlation coefficients were obtained which refer to bins that are located away from TSSs with different distances (
It was previously demonstrated scDNase-seq is a sensitive method for detecting genome-wide DHSs in very small number of cells or single-cells (Jin, W., et al. (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature, 528, 142-146). Furthermore, cell-to-cell variation in chromatin accessibility calculated using single-cell DHS data generated by scDNase-seq was highly correlated with that of gene expression based on scRNA-seq data. In this study, a new strategy was designed, iscDNase-seq, to dramatically improve the throughput of single-cells that can be analyzed in one experiment. iscDNase-seq is capable of analyzing tens of thousands of single-cells in one experiment, 100-fold improvement compared with the current scDNase-seq method, without the need of expensive and sophisticated equipment and accessible to most molecular biology laboratories.
Although both ATAC-seq and DNase-seq provide information on chromatin accessibility, recent studies found that DNase-seq and ATAC-seq can detect different chromatin open regions and DNase-seq is more likely to detect enhancer regions compared to ATAC-seq, providing evidence that iscDNase-seq and single cell ATAC-seq assays may detect different properties of chromatin. Indeed, the results from comparing the iscDNase-seq data and single cell ATAC-seq data indicated that the DHS regions uniquely detected by iscDNase-seq showed higher sequence conservation scores than those uniquely detected by scATAC-seq. Furthermore, it was demonstrated that the genes located near DHSs uniquely detected by iscDNase-seq exhibited higher expression levels than the genes located near DHSs uniquely detected by single cell ATAC-seq assays. These results indicated that iscDNase-seq is more likely to detect functional elements required for cell-specific gene expression than the single cell ATAC-seq assays do. Consistent with this, it was found that the correlation between the cell-to-cell variations in gene expression and DHSs detected by iscDNase-seq is also significantly higher than that between the cell-to-cell variations in gene expression and DHSs detected by single cell ATAC-seq assays. All these results together provide evidence that iscDNase-seq is an attractive alternative single cell method for single-cell epigenomics studies.
From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adopt it to various usages and conditions. Such embodiments are also within the scope of the following claims.
All citations to sequences, patents and publications in this specification are herein incorporated by reference to the same extent as if each independent patent and publication was specifically and individually indicated to be incorporated by reference.
This Application claims the benefit of U.S. Provisional Application 63/111,951 filed on Nov. 10, 2020. The entire contents of this application is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/058809 | 11/10/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63111951 | Nov 2020 | US |