A Sequence Listing in the form of an XML file (entitled “CUW-00901_SL.xml”, created on Jan. 10, 2025, and having a size of 25,411 bytes) is hereby incorporated by reference in its entirety.
Pervasive somatic mutations have recently been identified across healthy tissues. Advances in deep nucleotide sequencing and single-cell technologies have revealed that somatic mutations, even those found in cancer-associated genes, are common throughout all human tissues, including blood, skin, lung, and esophagus. Indeed, accumulation of somatic mutations is widely recognized as a critical feature driving the onset of cancer, responsible for aberrant phenotypes leading to malignant clonal expansions. In addition to driving carcinogenesis, somatic mutations have also been established as causative in a wide variety of human diseases, including ulcerative colitis, fatty liver disease, severe adult-onset inflammatory syndrome (VEXAS), and focal cortical dysplasia, amongst many others.
Despite the ubiquitous occurrence of mutations throughout healthy tissues and their wide implications in human disease, the presence of a pathologically associated mutation does not always result in progression to disease. This suggests that additional biological features, such as epigenetic regulation, shape the cellular context within which somatic mutations may drive disease. Moreover, clonal complexity in normal and malignant tissue has largely been studied exclusively at the genetic level, with limited ability to connect genotypes to cellular phenotypes in primary human samples. However, two main challenges arise when studying clonal expansions both in healthy and malignant tissues. Firstly, patient samples with clonal expansions are inherently comprised of an admixture of normal and mutant cells, and mutations often do not result in distinct cell surface markers that would allow for the physical isolation and further study of mutant cells. Secondly, complex tissues can display a high level of cellular heterogeneity across cell types and states, and therefore require high-throughput technologies allowing the capture of thousands of single cells to resolve such complexity. Thus, the ability to interrogate the impact of somatic mutations directly in human samples with high-throughput single-cell technologies is a major goal of the biomedical field.
The present disclosure is generally directed to methods for genotyping a locus. In one aspect, disclosed herein are methods of generating a plurality of biological particles. In another aspect, disclosed herein are methods of generating a plurality of barcoded locus nucleic acid fragments. Such methods comprise generating a plurality of biological particles, each biological particle comprising a polynucleotide. The methods may further comprise generating a plurality of partitions, each partition comprising one of the plurality of biological particles, a primer pair, and a barcoded bead. The primer pair may comprise: a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of the locus within the polynucleotide, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the locus.
In some embodiments, the polynucleotide is a genomic DNA or fragment thereof. In some embodiments, the polynucleotide is a DNA molecule, RNA molecule, or DNA/RNA hybrid molecule. In some embodiments, the methods further comprise amplifying the locus using the primer pair to generate a locus nucleic acid fragment. In some embodiments, the locus nucleic acid fragment is generated inside the partition. In some embodiments, the 5′ end of the 5′ primer-adapter nucleic acid molecule further comprises an adapter sequence. In some embodiments, the adapter sequence is a Read 1 Nextera sequence (R1N). In some embodiments, the 5′ end of the 3′ primer-adapter nucleic acid molecule further comprises a second adapter sequence. In some embodiments, the second adapter sequence is a Read 1 Nextera sequence (R1N). In some embodiments, the locus nucleic acid fragment further comprises a sequence complementary to the Read 1 Nextera sequence.
In some embodiments, the barcoded bead is attached to a nucleic acid. In some embodiments, the nucleic acid attached to the barcoded bead comprises a common barcode sequence. In some embodiments, the barcode is 10-20 nucleotides. In some embodiments, the nucleic acid attached to the barcoded bead further comprises a sequence complementary to at least a portion of the adapter sequence. In some embodiments, the adapter sequence is the Read 1 N sequence. In some embodiments, the nucleic acid attached to the barcoded bead further comprises a functional sequence configured to attach to a flow cell of a sequencer. In some embodiments, the functional sequence for attachment to a sequencing flow cell is a P5 sequence or a P7 sequence. In some embodiments, the methods further comprise amplifying the locus nucleic acid fragment, or a derivative thereof, using a second primer pair comprising: a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the locus, and a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to the functional sequence, thereby generating a plurality of barcoded locus nucleic acid fragments, each fragment of the plurality of barcoded locus nucleic acid fragments comprising the locus.
In some embodiments, the methods further comprise amplifying the locus nucleic acid fragment, or a derivative thereof, using a third primer pair comprising: a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of the locus, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to the functional sequence, thereby generating a plurality of barcoded locus nucleic acid fragments, each fragment of the plurality of barcoded locus nucleic acid fragments comprising the locus. In some embodiments, the plurality of barcoded locus nucleic acid fragments comprising the locus are generated inside the partition. In some embodiments, each fragment of the plurality of barcoded locus nucleic acid fragments further comprises the Read 1 N sequence. In some embodiments, each fragment of the plurality of barcoded locus nucleic acid fragments further comprises the common barcode sequence and/or the functional sequence configured to attach to a flow cell of a sequencer. In some embodiments, the plurality of barcoded locus nucleic acid fragments are amplified exponentially.
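By way of non-limiting illustration, the element order of a barcoded locus nucleic acid fragment described above (functional sequence, common barcode sequence, adapter sequence, locus) can be sketched in Python. This is a minimal sketch under stated assumptions: the sequences are toy placeholders rather than actual P5, R1N, or locus sequences, the 5′-to-3′ ordering on a single strand is assumed for illustration, and the function names `build_barcoded_fragment` and `split_barcoded_fragment` are hypothetical.

```python
# Toy placeholder sequences (hypothetical; NOT the real P5 or R1N sequences).
FUNCTIONAL_SEQ = "AAGGCCTT"        # stands in for a flow-cell attachment sequence
ADAPTER_SEQ = "GATTACAGATTACA"     # stands in for the Read 1 adapter sequence


def _check_dna(sequence, name):
    """Raise ValueError unless the sequence is non-empty and uses only A, C, G, T."""
    if not sequence or set(sequence) - set("ACGT"):
        raise ValueError(f"{name} must be a non-empty string over A, C, G, T")


def build_barcoded_fragment(barcode, locus_amplicon,
                            functional=FUNCTIONAL_SEQ, adapter=ADAPTER_SEQ):
    """Concatenate functional sequence, barcode, adapter, and locus amplicon.

    The 10-20 nucleotide barcode length follows the range recited in the text.
    """
    _check_dna(barcode, "barcode")
    _check_dna(locus_amplicon, "locus_amplicon")
    if not 10 <= len(barcode) <= 20:
        raise ValueError("barcode must be 10-20 nucleotides long")
    return functional + barcode + adapter + locus_amplicon


def split_barcoded_fragment(fragment, barcode_length,
                            functional=FUNCTIONAL_SEQ, adapter=ADAPTER_SEQ):
    """Recover (barcode, locus_amplicon) from a fragment built as above."""
    if not fragment.startswith(functional):
        raise ValueError("fragment does not start with the functional sequence")
    start = len(functional)
    barcode = fragment[start:start + barcode_length]
    remainder = fragment[start + barcode_length:]
    if not remainder.startswith(adapter):
        raise ValueError("adapter sequence not found after the barcode")
    return barcode, remainder[len(adapter):]
```

In this sketch, every fragment generated in one partition would share the same barcode, so that the locus sequence recovered by `split_barcoded_fragment` can be assigned to its biological particle of origin.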
In some embodiments, each biological particle further comprises a plurality of genomic DNA fragments. In some embodiments, the polynucleotide comprises a plurality of genomic DNA fragments. In some embodiments, the biological particle comprises genomic DNA and chromatin. In some embodiments, the genomic DNA is fragmented by a transposase to generate the plurality of genomic DNA fragments. In some embodiments, the transposase is fused to a set of secondary nanobodies (nb). In some embodiments, a primary antibody is bound to a target of interest on the chromatin. In some embodiments, the transposase-nb fusion protein specifically binds to the primary antibody. In some embodiments, the transposase is a Tn5 transposase. In some embodiments, the genomic DNA is fragmented by a transposase-nucleic acid complex to generate a plurality of tagged genomic DNA fragments. In some embodiments, the 5′ end of each of the tagged genomic DNA fragments comprises an adapter sequence. In some embodiments, the 3′ end of each of the tagged genomic DNA fragments comprises a second adapter sequence. In some embodiments, the adapter sequence is a Read 1 sequence (R1N).
In some embodiments, the second adapter sequence is a Read 2 sequence (R2N). In some embodiments, the methods further comprise amplifying the plurality of genomic DNA fragments, or derivatives thereof, by nucleic acid amplification, generating a plurality of barcoded genomic DNA fragments. In some embodiments, the plurality of barcoded genomic DNA fragments are generated inside the partition. In some embodiments, each fragment of the plurality of barcoded genomic DNA fragments comprises the tagged genomic DNA. In some embodiments, each fragment of the plurality of barcoded genomic DNA fragments further comprises the common barcode sequence and/or the functional sequence configured to attach to a flow cell of a sequencer. In some embodiments, the plurality of barcoded genomic DNA fragments are amplified linearly. In some embodiments, the plurality of tagged genomic DNA fragments, or derivatives thereof, are amplified linearly. In some embodiments, the methods further comprise amplifying the barcoded locus nucleic acid fragments, or derivatives thereof, by nucleic acid amplification. In some embodiments, the methods further comprise amplifying the plurality of barcoded genomic DNA fragments, or derivatives thereof, by nucleic acid amplification. In some embodiments, the methods further comprise sequencing said barcoded locus nucleic acid fragments or derivatives thereof. In some embodiments, the methods further comprise sequencing said barcoded genomic DNA fragments or derivatives thereof. In some embodiments, the methods further comprise sequencing said barcoded locus nucleic acid fragments and said barcoded genomic DNA fragments simultaneously.
In some embodiments, the method provides chromatin accessibility and/or genotyping information. In some embodiments, the method provides genotype-phenotype mapping in clonal mosaicism. In some embodiments, said barcoded bead is a gel bead. In some embodiments, said gel bead is a degradable gel bead. In some embodiments, said partition comprises a chemical agent configured to degrade said degradable gel bead. In some embodiments, said nucleic acid barcode molecule is releasably attached to said barcoded bead. In some embodiments, the methods comprise releasing said nucleic acid barcode molecule from said barcoded bead. In some embodiments, said plurality of partitions is a plurality of droplets. In some embodiments, said plurality of partitions is a plurality of wells. In some embodiments, said barcoded nucleic acid fragment, or derivative thereof, is indicative of an accessible region of said chromatin. In some embodiments, said barcoded nucleic acid fragment, or derivative thereof, is indicative of a genotype. In some embodiments, said biological particle is a cell, a cell nucleus, or a cell bead. In some embodiments, the cell is a fixed cell. In some embodiments, said barcoded nucleic acid fragment is generated by a nucleic acid extension reaction. In some embodiments, said barcoded nucleic acid fragment is generated by a ligation reaction. In some embodiments, the method provides single-cell genotyping and chromatin accessibility mapping. In some embodiments, the method provides single-cell genotyping and chromatin accessibility mapping in clonal mosaicism.
In another aspect, disclosed herein is a partition comprising: a biological particle comprising a polynucleotide; a primer pair; and a barcoded bead; wherein the primer pair comprises: a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of the locus within the polynucleotide, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the locus. In some embodiments, the polynucleotide is a genomic DNA or fragment thereof. In some embodiments, the polynucleotide is a DNA molecule, RNA molecule, or DNA/RNA hybrid molecule. In some embodiments, the 5′ end of the 5′ primer-adapter nucleic acid molecule further comprises an adapter sequence. In some embodiments, the adapter sequence is a Read 1 Nextera sequence (R1N). In some embodiments, the 5′ end of the 3′ primer-adapter nucleic acid molecule further comprises a second adapter sequence. In some embodiments, the second adapter sequence is a Read 1 Nextera sequence (R1N).
In some embodiments, the barcoded bead is attached to a nucleic acid. In some embodiments, the nucleic acid attached to the barcoded bead comprises a common barcode sequence. In some embodiments, the barcode is 10-20 nucleotides. In some embodiments, the nucleic acid attached to the barcoded bead further comprises a sequence complementary to at least a portion of the adapter sequence. In some embodiments, the adapter sequence is the Read 1 N sequence. In some embodiments, the nucleic acid attached to the barcoded bead further comprises a functional sequence configured to attach to a flow cell of a sequencer. In some embodiments, the functional sequence for attachment to a sequencing flow cell is a P5 sequence or a P7 sequence. In some embodiments, each biological particle further comprises a plurality of genomic DNA fragments. In some embodiments, the polynucleotide comprises a plurality of genomic DNA fragments. In some embodiments, the biological particle comprises genomic DNA and chromatin. In some embodiments, the genomic DNA is fragmented by a transposase to generate the plurality of genomic DNA fragments. In some embodiments, the transposase is fused to a set of secondary nanobodies (nb). In some embodiments, a primary antibody is bound to a target of interest on the chromatin. In some embodiments, the transposase-nb fusion protein specifically binds to the primary antibody. In some embodiments, the transposase is a Tn5 transposase.
In some embodiments, the genomic DNA is fragmented by a transposase-nucleic acid complex to generate a plurality of tagged genomic DNA fragments. In some embodiments, the 5′ end of each of the tagged genomic DNA fragments comprises an adapter sequence. In some embodiments, the 3′ end of each of the tagged genomic DNA fragments comprises a second adapter sequence. In some embodiments, the adapter sequence is a Read 1 sequence (R1N). In some embodiments, the second adapter sequence is a Read 2 sequence (R2N). In some embodiments, said barcoded bead is a gel bead. In some embodiments, said gel bead is a degradable gel bead. In some embodiments, said partition comprises a chemical agent configured to degrade said degradable gel bead. In some embodiments, said nucleic acid barcode molecule is releasably attached to said barcoded bead. In some embodiments, said plurality of partitions is a plurality of droplets. In some embodiments, said plurality of partitions is a plurality of wells. In some embodiments, said barcoded nucleic acid fragment, or derivative thereof, is indicative of an accessible region of said chromatin. In some embodiments, said barcoded nucleic acid fragment, or derivative thereof, is indicative of a genotype. In some embodiments, said biological particle is a cell, a cell nucleus, or a cell bead. In some embodiments, the cell is a fixed cell.
In some embodiments, said barcoded nucleic acid fragment is generated by a nucleic acid extension reaction. In some embodiments, said barcoded nucleic acid fragment is generated by a ligation reaction.
In another aspect, disclosed herein is a kit comprising partitioning fluid, a barcoded bead, and a primer pair comprising: a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of a specific locus within a polynucleotide, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the specific locus. In some embodiments, the kit further comprises a reagent for disrupting a cell. In some embodiments, the kit further comprises a reagent for nucleic acid amplification. In some embodiments, the partitioning fluid comprises an aqueous buffer, a non-aqueous partitioning fluid, or oils. In some embodiments, the kit further comprises instructions for genotyping a locus or generating a barcoded nucleic acid fragment. In some embodiments, the barcoded bead is attached to a nucleic acid comprising a common barcode sequence. In some embodiments, the 5′ end of the 5′ primer-adapter nucleic acid molecule further comprises an adapter sequence. In some embodiments, the adapter sequence is Read 1 sequence (R1rc). In some embodiments, the 5′ end of the 3′ primer-adapter nucleic acid molecule further comprises an adapter sequence. In some embodiments, the adapter sequence is a Read 1 sequence (R1N). In some embodiments, the nucleic acid attached to the barcoded bead further comprises a sequence complementary to at least a portion of the adapter sequence. In some embodiments, said barcoded bead is a gel bead. In some embodiments, the gel bead is a degradable gel bead.
The present disclosure is generally directed to methods for genotyping a locus. Such a method comprises generating a plurality of biological particles, each biological particle comprising a plurality of nucleic acid fragments. The method further comprises generating a plurality of partitions, each partition comprising one of the plurality of biological particles comprising a plurality of nucleic acid fragments, each partition further comprising a primer pair and a barcoded bead. The primer pair may comprise a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of a specific locus within the nucleic acid fragment, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the specific locus.
Currently, there are a small number of single-cell targeted genotyping strategies, which can be broadly split into two categories according to the source material from which genotyping information is captured. The first group of methods identifies wild type or mutant mRNA transcripts from the gene of interest to genotype individual cells. This approach pairs well with single-cell RNA sequencing (scRNA-seq) assays, as the material for genotyping is already captured and appended with unique cell barcodes with the standard protocol prior to any modifications made for targeted genotyping. These methods allow for direct assessment of the biological impact of somatic mutations in primary human patient samples, critically using a patient's own wild type cells for intra-sample comparisons and therefore accounting for potential technical or biological confounders. Given the low rate of transcript capture associated with droplet-based high-throughput methods, the associated genotyping methods are efficient only for targets that are highly expressed. In addition, current poly-A capture methods rely on reverse-transcription steps, and therefore capture only ˜1.5 kb from the 3′ end of the transcript. Therefore, for mutations in genes that are lowly expressed, or whose mutation loci are located far from the ends of the molecule, the efficiency of these methods drops drastically.
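By way of non-limiting illustration, the positional constraint described above can be expressed as a simple check of whether a mutation locus falls within the approximately 1.5 kb captured from the 3′ end of a transcript. This is a minimal sketch: the function name `within_capture_window`, the 1-based coordinate convention, and the hard cutoff at exactly 1,500 bases are assumptions made for illustration, whereas capture efficiency in practice declines gradually rather than at a fixed boundary.

```python
def within_capture_window(transcript_length, mutation_position, window=1500):
    """Return True if the mutation lies within `window` bases of the 3' end.

    `mutation_position` is 1-based and counted from the 5' end of the
    transcript, so the last base has position `transcript_length`.
    """
    if not 1 <= mutation_position <= transcript_length:
        raise ValueError("mutation_position must lie within the transcript")
    distance_from_3prime_end = transcript_length - mutation_position
    return distance_from_3prime_end < window
```

For a hypothetical 5,000-base transcript, a mutation at position 4,000 would fall inside the window, while a mutation at position 1,000 would not and would therefore be a poor candidate for transcript-based genotyping.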
The second group of methods utilizes genomic DNA as the source of genotyping information, and thus does not require mRNA transcripts to be captured (or even to be present) for successful genotyping. Notably, TARGET-seq, a plate-based targeted genotyping approach paired with single-cell RNA-seq, utilizes both mRNA and genomic DNA to great effect. As TARGET-seq is plate-based, gDNA that would normally not be barcoded in droplet-based scRNA-seq techniques is maintained in a separate physical well from other cells, allowing for greater flexibility for downstream targeted genotyping. However, this approach is applied at the expense of throughput, since increasing the scale of the production for capturing thousands of cells is labor-intensive and often not viable.
To surmount the challenges of resolving the admixture of wild type and mutant cells while simultaneously interrogating chromatin accessibility of thousands of single cells at a time, we developed Genotyping of Targeted loci with single cell Chromatin Accessibility (GoT-ChA). Due to the clear benefits of targeted genotyping using genomic DNA, we developed GoT-ChA to capture genotyping information directly from the genome rather than using mRNA transcripts. GoT-ChA fundamentally relies upon an in-droplet PCR reaction using custom primers to amplify a genomic region of interest, with nascent transcripts sequentially amplified with barcoded oligonucleotides prior to emulsion breakage. As such, GoT-ChA is adaptable to any system that utilizes in-droplet PCR to append cell barcodes onto nucleic acids of interest. For example, GoT-ChA is compatible with methods such as single-cell Cut&Tag, providing DNA-bound protein information or the genomic distribution of specific histone marks, transcription factors, or other aspects of chromatin that can be probed using an antibody approach (e.g., DNA breaks, R-Loops) along with single-cell genotyping. GoT-ChA is also compatible with various multi-omic methods, such as ASAP-seq, that are able to simultaneously obtain chromatin accessibility information, mitochondrial genome coverage, and both intra- and extracellular protein expression measurements via oligonucleotide-tagged antibodies. Critically, the additional multitude of mitochondrial variants captured with such methods may enable further genotyping imputation, provided adequate coverage of mitochondrial variants that are in-phase with GoT-ChA genotyping. Therefore, GoT-ChA can synergize with multiple single cell multi-omic methods to enable a multitude of differential analyses interrogating the effects of somatic mutations on various modalities of biological information, directly in human patient samples.
At the current time, GoT-ChA is the only high-throughput droplet-based single cell method that obtains targeted genotyping from gDNA along with chromatin accessibility, histone modification profiles, or DNA-binding protein profiles. Critically, GoT-ChA captures the genotype information directly from genomic DNA, and thus obviates limiting dependencies in current methods, such as target localization and expression level, allowing high-throughput somatic genotype-to-phenotype mapping. GoT-ChA provides critical insights into the epigenetic features that shape cellular fates and how that regulation may be disrupted via somatic mutations, ultimately advancing our knowledge of how such mutations impact human health from benign maladies to overt malignancies.
In hematopoiesis, changes in chromatin accessibility define priming and commitment of hematopoietic precursors towards cellular fates. In turn, somatic mutations in hematopoietic stem and progenitor cells (HSPCs) drive the onset and progression of myeloid disorders, such as myeloproliferative neoplasms (MPNs), and reshape differentiation topologies.
To chart how somatic mutations disrupt the epigenetic landscape in human clonal outgrowths, we developed Genotyping of Targeted loci with Chromatin Accessibility (GoT-ChA), linking genotypes to chromatin accessibility across thousands of single cells. Crucially, GoT-ChA captures genotypes directly from genomic DNA (
Next, we applied GoT-ChA to CD34+ cells from JAK2V617F-mutant myelofibrosis (MF) samples. We clustered cells based on chromatin profiles, revealing the expected cell populations in hematopoiesis, and then projected genotyping status onto the differentiation map. In further validation of genotyping accuracy, copy number inference showed a sample that contained a partial deletion of chromosome 20 concordant with our genotyping. Furthermore, GoT-ChA can be integrated with recent protocols to allow for high mitochondrial genome coverage (Lareau et al, Nature Biotechnology, 2020). We observed mitochondrial mutations that were highly concordant with JAK2V617F, allowing genotyping of >85% of cells.
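By way of non-limiting illustration, assigning a genotype to each cell from barcoded locus reads can be sketched as follows. This is a minimal sketch and not the genotyping procedure used in the work described above: the input format (pairs of cell barcode and the base observed at the targeted position), the function name `call_genotypes`, and the `min_reads` threshold are all hypothetical. Because a heterozygous mutant cell yields both wild type and mutant reads, the sketch assumes that any cell with sufficient mutant reads is called mutant.

```python
from collections import Counter, defaultdict


def call_genotypes(reads, wt_base, mut_base, min_reads=1):
    """Assign 'MUT', 'WT', or 'NA' to each cell barcode.

    `reads` is an iterable of (cell_barcode, base_at_locus) pairs.
    A cell is 'MUT' if it has at least `min_reads` mutant reads,
    otherwise 'WT' if it has at least `min_reads` wild type reads,
    otherwise 'NA' (not genotyped).
    """
    counts = defaultdict(Counter)
    for barcode, base in reads:
        counts[barcode][base] += 1

    calls = {}
    for barcode, base_counts in counts.items():
        if base_counts[mut_base] >= min_reads:
            calls[barcode] = "MUT"
        elif base_counts[wt_base] >= min_reads:
            calls[barcode] = "WT"
        else:
            calls[barcode] = "NA"
    return calls
```

A call of "WT" under this scheme is only as reliable as the read depth supporting it, since a mutant allele may simply have gone unsampled; a higher `min_reads` value trades the number of genotyped cells for confidence in each call.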
Within MPN samples, wildtype (WT) and mutated (MUT) cells were intermingled across the differentiation topology. Nonetheless, we observed an increase in the mutant fraction within erythroid progenitors (EP). Moreover, pseudo-temporal ordering of chromatin accessibility revealed that the mutant cell fraction increased along erythroid or megakaryocyte differentiation in untreated MPN, in line with clinical phenotypes.
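By way of non-limiting illustration, the mutant cell fraction within each cell population can be computed from per-cell cluster labels and genotype calls. This is a minimal sketch under stated assumptions: the function name `mutant_fraction_by_cluster`, the label strings, and the input format are hypothetical, and cells without a genotype call are assumed to be excluded from the denominator.

```python
from collections import Counter


def mutant_fraction_by_cluster(cells):
    """Return {cluster: fraction of genotyped cells that are mutant}.

    `cells` is an iterable of (cluster_label, genotype) pairs, where
    genotype is 'WT' or 'MUT'; any other value is treated as not
    genotyped and ignored.
    """
    genotyped = Counter()
    mutant = Counter()
    for cluster, genotype in cells:
        if genotype not in ("WT", "MUT"):
            continue
        genotyped[cluster] += 1
        if genotype == "MUT":
            mutant[cluster] += 1
    return {cluster: mutant[cluster] / genotyped[cluster] for cluster in genotyped}
```

Comparing these fractions between clusters, for example between an uncommitted progenitor cluster and an erythroid progenitor cluster, is one simple way to ask whether mutant cells are enriched in a given population.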
Chromatin accessibility profiles can provide clues to the underlying regulatory network through transcription factor (TF) motif accessibility. Uniquely, GoT-ChA enables de novo differential motif accessibility, directly comparing WT and MUT cells co-existing within the same bone marrow. Mutant HSPCs showed increased motif accessibility (FDR<0.05) for TFs associated with erythropoiesis, suggestive of increased erythroid priming. Within EP clusters, we observed increased motif accessibility of STAT5A and STAT5B, downstream targets of JAK2. These data demonstrated a cell-type specific effect of the JAK2V617F mutation.
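By way of non-limiting illustration, a false discovery rate threshold such as FDR<0.05 can be applied to per-motif p-values using the Benjamini-Hochberg procedure. This is a minimal sketch: the text above does not state which multiple-testing correction was used, so the choice of Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The adjusted value for the p-value of rank r (1-based, ascending) is
    the minimum of p * m / rank over all ranks >= r, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the running
    # minimum enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def significant_motifs(motif_pvalues, fdr=0.05):
    """Return the motif names whose adjusted p-value is below `fdr`."""
    names = list(motif_pvalues)
    adjusted = benjamini_hochberg([motif_pvalues[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q < fdr]
```

Here `motif_pvalues` would map each transcription factor motif to the p-value from a comparison of motif accessibility between WT and MUT cells; the statistical test that produces those p-values is outside the scope of this sketch.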
Ruxolitinib is a frontline JAK1/2 inhibitor for MF. Despite improvements in quality of life, ruxolitinib does not clearly target the MPN clone or prevent progression of disease. In ruxolitinib-treated patients the MUT cell fraction was uniformly distributed along the differentiation, demonstrating an abrogation of the fitness advantage of JAK2V617F in committed progenitors, but not in HSPCs. Consistently, STAT5A motif accessibility remained increased in MUT cells at intermediate stages of erythroid maturation but decreased to similar levels as WT cells at later stages.
Overall, GoT-ChA radically expands the single-cell multi-omics toolkit and obviates limiting dependencies on target gene transcription, allowing high throughput somatic genotype-to-phenotype mapping. Applied to JAK2V617F-mutated MPN, GoT-ChA uncovered a cell-type specific fitness advantage with erythroid commitment that was reversed upon JAK2 inhibitor treatment. The reshaping of the differentiation topography traced back to differential transcription factor activity driving uncommitted vs. committed JAK2V617F progenitors. Thus, single-cell multi-omics with GoT-ChA enables charting the epigenetic underpinnings of hematopoietic clonal outgrowth.
As will be appreciated by those of skill in the art, GoT-ChA is generally compatible with any sequencing or genotyping approach that uses partitions (such as droplets or beads) and barcodes to sequence multiple genomes in parallel, such as single-cell sequencing techniques. In certain embodiments, GoT-ChA may be used in combination with Cut&Tag (Nat Commun. 2019 Apr. 29; 10 (1): 1930), ISSAAC-seq (world wide web at doi.org/10.1101/2022.01.16.476488), scifi-RNAseq (Nat Methods. 2021 June; 18 (6): 635-642.), Single cell CRISPR screening, and Single cell lineage tracing.
Clonal driver mutations fuel clonal outgrowths in normal tissue mosaicism. Somatic evolution has long been appreciated to be a central feature of malignancy, enabling adaptation to therapeutic pressures. Recently, exciting data showed that somatic evolution is ubiquitously found in normal human tissues, with massive clonal expansions, harboring somatic mutations in known cancer drivers (e.g., DNMT3A and SF3B1 in clonal hematopoiesis (CH); TP53 and NOTCH1 in esophageal mosaicism). Thus, somatic driver mutations provide fitness advantage allowing clonal outgrowth, not only in cancer, but also in normal tissue mosaicism. We therefore posit that the ability to link clonal genotypes to cellular phenotypes is a central challenge in clonal mosaicism (CM), critical to the understanding of how driver mutations provide fitness advantage leading to selection and clonal outgrowth.
Linking clonal genotypes to fitness-enhancing phenotypes in human samples is curtailed by current technological limitations. While approaches such as cell culture and murine models have enabled mechanistic elucidation of how somatic mutations confer growth advantage, models are lacking in many tissue types and may not reflect the evolutionary processes that occur in humans over decades. These limitations have inspired the research community to pursue large-scale profiling of primary human tissues with high-throughput platforms such as exome, RNA and epigenomics sequencing. However, these approaches are challenged in addressing the question of clonal fitness advantage, as clonal admixtures are not captured by bulk sequencing methods. Thus, to date, efforts to chart clonal outgrowths in normal human tissues have been limited to genotyping, and therefore we have little information directly from human samples about how mutations drive clonal growth. This is due in part to the fact that clones in normal tissues often affect only a small fraction of cells, and lack distinguishing surface markers amenable to enrichment by flow sorting. While recent single-cell RNA-seq (scRNA-seq) technologies have partially overcome this limitation by mapping differentiation processes at single-cell resolution, these methods cannot concurrently capture the genotype to allow isolation of specific mutants for analysis. Thus, there is an urgent unmet need in the field of clonal mosaicism to develop methods for multi-omics profiling in single cells to overlay somatic mutational genotyping with downstream epigenome, transcriptome or protein information.
Single-cell multi-omics technology innovation links somatic mutations to downstream epigenetic and transcriptional phenotypes. To address this challenge, we have developed plate-based methods for multi-omic single-cell technologies capable of capturing multiple layers of information from the same single cells. To address the specific challenge of genotyping in scRNA-seq at the high throughput needed for the study of CM, we developed genotyping of transcriptomes. This method genotypes transcripts (cDNA-based) containing somatic mutations together with scRNAseq, providing the ability to study the transcriptional impact of somatic mutations at single-cell resolution. Importantly, it turns the admixture of mutant and wildtype cells from a limitation to an advantage, enabling the direct comparison of mutant and wildtype cells within the same individual, overcoming individual-specific confounders in human studies. Applied to CALR mutation-driven clonal outgrowths in the human bone marrow, our studies revealed the differential fitness impact of the CALR mutations as a function of cell identity, discoveries that could only have been made due to this unique technology. We have also applied this platform to clonal mosaicism samples in hematopoiesis to define the impact of DNMT3A and SF3B1 mutations on human hematopoietic differentiation topography. Collectively, joint capture of somatic genotypes and phenotypes enables charting the effect of mutations on fitness as a function of cell identity.
Expanding the Genotyping of Targeted loci (GoT) toolkit to study clonal heterogeneity in CM in human samples. The GoT toolkit is expanded to enable comprehensive study of the phenotypic consequences of somatic clonal genotypes in human normal tissues. First, given the high frequency of epigenetic and transcription factor mutations in CM, deciphering the mechanisms underlying clonal outgrowth requires genotype-aware high-throughput single-cell chromatin accessibility studies. We therefore developed a prototype for targeted single-cell genotyping (gDNA-based) in the context of Chromatin Accessibility (GoT-ChA) to jointly genotype driver mutations in single-cell Assay for Transposase-Accessible Chromatin by sequencing (scATAC-seq). GoT-ChA has two unique advantages in the context of CM: (i) as genotyping is done with gDNA, it obviates limiting dependencies on targeted locus transcription, expanding the reach of single-cell multi-omics to any mutation in the genome, and (ii) it is compatible with single nuclei studies, critical for scalable application to solid human tissues. GoT-ChA is optimized for CM across tissues, and multiplexing capabilities are added to target multiple mutations. We leverage the transposase-enabled aspect of GoT-ChA as a backbone to enable profiling across other epigenetic marks, such as histone modifications, using single cell Cut & Tag approaches. Last, to expand genotype-phenotype linkage, we integrate GoT-ChA with single cell transcriptome and protein profiling. We develop the molecular and analytic tools to enable production scale capabilities for the GoT toolkit. Collectively, our work allows linking CM genotypes with broad phenotyping across epigenetic, transcriptomic and protein profiling.
As used herein in the specification, “a” or “an” may mean one or more. As used herein in the claim(s), when used in conjunction with the word “comprising”, the words “a” or “an” may mean one or more than one. For example, reference to an “antibody” is a reference to from one to many antibodies. As used herein “another” may mean at least a second or more.
An “isolated” molecule or cell is a molecule or a cell that is identified and separated from at least one contaminant molecule or cell with which it is ordinarily associated in the environment in which it was produced. Preferably, the isolated molecule or cell is free of association with all components associated with the production environment. The isolated molecule or cell is in a form other than in the form or setting in which it is found in nature. Isolated molecules therefore are distinguished from molecules existing naturally in cells; isolated cells are distinguished from cells existing naturally in tissues, organs, or individuals.
An “isolated” nucleic acid molecule is a nucleic acid molecule that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in the environment in which it was produced. Preferably, the isolated nucleic acid is free of association with all components associated with the production environment. The isolated nucleic acid molecules encoding the polypeptides and antibodies herein are in a form other than in the form or setting in which they are found in nature. Isolated nucleic acid molecules therefore are distinguished from nucleic acids encoding any polypeptides and antibodies herein that exist naturally in cells.
A “biological sample” encompasses a variety of sample types obtained from an individual and can be used in a diagnostic or monitoring assay. The definition encompasses blood and other liquid samples of biological origin, solid tissue samples such as a biopsy specimen or tissue cultures or cells derived therefrom and the progeny thereof. The definition also includes samples that have been manipulated in any way after their procurement, such as by treatment with reagents, solubilization, or enrichment for certain components, such as polynucleotides. The term “biological sample” encompasses a clinical sample, and also includes cells in culture, cell supernatants, cell lysates, serum, plasma, biological fluid, and tissue samples. The term “biological sample” includes urine, saliva, cerebrospinal fluid, interstitial fluid, ocular fluid, synovial fluid, blood fractions such as plasma and serum, and the like. The term “biological sample” also includes solid tissue samples, tissue culture samples, and cellular samples.
The term “sample,” as used herein, generally refers to a biological sample of a subject. The biological sample may comprise any number of macromolecules, for example, cellular macromolecules. The sample may be a cell sample. The sample may be a cell line or cell culture sample. The sample can include one or more cells. The sample can include one or more microbes. The biological sample may be a nucleic acid sample or protein sample. The biological sample may also be a carbohydrate sample or a lipid sample. The biological sample may be derived from another sample. The sample may be a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate. The sample may be a fluid sample, such as a blood sample, urine sample, or saliva sample. The sample may be a skin sample. The sample may be a cheek swab. The sample may be a plasma or serum sample. The sample may be a cell-free or cell free sample. A cell-free sample may include extracellular polynucleotides. Extracellular polynucleotides may be isolated from a bodily sample that may be selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.
The term “biological particle,” as used herein, generally refers to a discrete biological system derived from a biological sample. The biological particle may be a macromolecule. The biological particle may be a small molecule. The biological particle may be a virus. The biological particle may be a cell or derivative of a cell. The biological particle may be an organelle. The biological particle may be a rare cell from a population of cells. The biological particle may be any type of cell, including without limitation prokaryotic cells, eukaryotic cells, bacterial, fungal, plant, mammalian, or other animal cell type, mycoplasmas, normal tissue cells, tumor cells, or any other cell type, whether derived from single cell or multicellular organisms. The biological particle may be a constituent of a cell. The biological particle may be or may include DNA, RNA, organelles, proteins, or any combination thereof. The biological particle may be or include a chromosome or other portion of a genome. The biological particle may be or may include a matrix (e.g., a gel or polymer matrix) comprising a cell or one or more constituents from a cell (e.g., cell bead), such as DNA, RNA, organelles, proteins, or any combination thereof, from the cell. The biological particle may be obtained from a tissue of a subject. The biological particle may be a hardened cell. Such hardened cell may or may not include a cell wall or cell membrane. The biological particle may include one or more constituents of a cell, but may not include other constituents of the cell. An example of such constituents is a nucleus or an organelle. A cell may be a live cell. The live cell may be capable of being cultured, for example, being cultured when enclosed in a gel or polymer matrix, or cultured when comprising a gel or polymer matrix.
The term “vector,” as used herein, is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a “plasmid,” which refers to a circular double stranded DNA into which additional DNA segments may be ligated. Another type of vector is a phage vector. Another type of vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “recombinant expression vectors,” or simply, “expression vectors.” In general, expression vectors useful in recombinant DNA techniques are often in the form of plasmids. In the present specification, “plasmid” and “vector” may be used interchangeably as the plasmid is the most commonly used form of vector.
“Polynucleotide,” or “nucleic acid,” as used interchangeably herein, refer to polymers of nucleotides of any length, and include DNA and RNA. The nucleotides can be deoxyribonucleotides, ribonucleotides, modified nucleotides or bases, and/or their analogs, or any substrate that can be incorporated into a polymer by DNA or RNA polymerase or by a synthetic reaction. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and their analogs. If present, modification to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may comprise modification(s) made after synthesis, such as conjugation to a label. Other types of modifications include, for example, “caps,” substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), those containing pendant moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide(s). Further, any of the hydroxyl groups ordinarily present in the sugars may be replaced, for example, by phosphonate groups, phosphate groups, protected by standard protecting groups, or activated to prepare additional linkages to additional nucleotides, or may be conjugated to solid or semi-solid supports.
The 5′ and 3′ terminal OH can be phosphorylated or substituted with amines or organic capping group moieties of from 1 to 20 carbon atoms. Other hydroxyls may also be derivatized to standard protecting groups. Polynucleotides can also contain analogous forms of ribose or deoxyribose sugars that are generally known in the art, including, for example, 2′-O-methyl-, 2′-O-allyl-, 2′-fluoro- or 2′-azido-ribose, carbocyclic sugar analogs, α-anomeric sugars, epimeric sugars such as arabinose, xyloses or lyxoses, pyranose sugars, furanose sugars, sedoheptuloses, acyclic analogs, and abasic nucleoside analogs such as methyl riboside. One or more phosphodiester linkages may be replaced by alternative linking groups. These alternative linking groups include, but are not limited to, embodiments wherein phosphate is replaced by P(O)S (“thioate”), P(S)S (“dithioate”), (O)NR2 (“amidate”), P(O)R, P(O)OR′, CO, or CH2 (“formacetal”), in which each R or R′ is independently H or substituted or unsubstituted alkyl (1-20 C) optionally containing an ether (—O—) linkage, aryl, alkenyl, cycloalkyl, cycloalkenyl or aralkyl. Not all linkages in a polynucleotide need be identical. The preceding description applies to all polynucleotides referred to herein, including RNA and DNA.
A “host cell” includes an individual cell or cell culture that can be or has been a recipient for vector(s) for incorporation of polynucleotide inserts. Host cells include progeny of a single host cell, and the progeny may not necessarily be completely identical (in morphology or in genomic DNA complement) to the original parent cell due to natural, accidental, or deliberate mutation. A host cell includes cells transfected in vivo with a polynucleotide(s) of this disclosure.
The term “subject” as used herein refers to a living mammal and may be interchangeably used with the term “patient”. Examples of mammals include, but are not limited to, any member of the mammalian class: humans, non-human primates such as chimpanzees, and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, swine; domestic animals such as rabbits, dogs, and cats; laboratory animals including rodents, such as rats, mice and guinea pigs, and the like. The term does not denote a particular age or gender.
The term “culturing” refers to the in vitro propagation of cells or organisms on or in media of various kinds. It is understood that the descendants of a cell grown in culture may not be completely identical (i.e., morphologically, genetically, or phenotypically) to the parent cell. By “expanded” is meant any proliferation or division of cells.
Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
The term “genome,” as used herein, generally refers to genomic information from a subject, which may be, for example, at least a portion or an entirety of a subject's hereditary information. A genome can be encoded either in DNA or in RNA. A genome can comprise coding regions (e.g., that code for proteins) as well as non-coding regions. A genome can include the sequence of all chromosomes together in an organism. For example, the human genome ordinarily has a total of 46 chromosomes. The sequence of all of these together may constitute a human genome.
The terms “adaptor(s),” “adapter(s),” “adaptor molecule(s),” and “tag(s)” may be used synonymously. An adaptor or tag can be coupled to a polynucleotide sequence to be “tagged” by any approach, including ligation, hybridization, or other approaches. Adaptors may also be used to refer to a nucleic acid sequence or segment, such as a functional sequence. These adaptors may comprise nucleic acid sequences that may add a function, e.g., spacer sequence, primer sequencing site, barcode sequence, unique molecular identifier sequence, etc.
The terms “primer sequencing site” and “sequencing primer sequence” may be used interchangeably herein. Primer sequencing sites generally refer to nucleic acid sequences that can be used for sequencing.
The term “sequencing,” as used herein, generally refers to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides. Polynucleotides may comprise nucleic acid molecules, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single-stranded DNA). Sequencing can be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina®, Pacific Biosciences (PacBio®), MGI, Complete Genomics, Oxford Nanopore®, or Life Technologies (Ion Torrent®). Alternatively or in addition, sequencing may be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR, quantitative PCR, or real time PCR), or isothermal amplification. Such systems may provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., human), as generated by the systems from a sample provided by the subject. In some examples, such systems provide sequencing reads (also “reads” herein). A read may include a string of nucleic acid bases corresponding to a sequence of a nucleic acid molecule that has been sequenced. Sequencing can comprise short-read sequencing or long-read sequencing, or both. In some situations, systems and methods provided herein may be used with proteomic information.
The term “barcode,” as used herein, generally refers to a label, or identifier, that conveys or is capable of conveying information about an analyte. A barcode can be part of an analyte. A barcode can be independent of an analyte. A barcode can be a tag attached to an analyte (e.g., nucleic acid molecule) or a combination of the tag in addition to an endogenous characteristic of the analyte (e.g., size of the analyte or end sequence(s)). A barcode may be unique. Barcodes can have a variety of different formats. For example, barcodes can include: polynucleotide barcodes; random nucleic acid and/or amino acid sequences; and synthetic nucleic acid and/or amino acid sequences. A barcode can be attached to an analyte in a reversible or irreversible manner. A barcode can be added to, for example, a fragment of a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sample before, during, and/or after sequencing of the sample. Barcodes can allow for identification and/or quantification of individual sequencing-reads.
The term “real time,” as used herein, can refer to a response time of less than about 1 second, a tenth of a second, a hundredth of a second, a millisecond, or less. The response time may be greater than 1 second. In some instances, real time can refer to simultaneous or substantially simultaneous processing, detection or identification.
The term “molecular tag,” as used herein, generally refers to a molecule capable of binding to a macromolecular constituent. The molecular tag may bind to the macromolecular constituent with high affinity. The molecular tag may bind to the macromolecular constituent with high specificity. The molecular tag may comprise a nucleotide sequence. The molecular tag may comprise a nucleic acid sequence. The nucleic acid sequence may be at least a portion or an entirety of the molecular tag. The molecular tag may be a nucleic acid molecule or may be part of a nucleic acid molecule. The molecular tag may be an oligonucleotide or a polypeptide. The molecular tag may comprise a DNA aptamer. The molecular tag may be or comprise a primer. The molecular tag may be, or comprise, a protein. The molecular tag may comprise a polypeptide. The molecular tag may be a barcode.
The term “bead,” as used herein, generally refers to a particle. The bead may be a solid or semi-solid particle. The bead may be a gel bead. The gel bead may include a polymer matrix (e.g., matrix formed by polymerization or cross-linking). The polymer matrix may include one or more polymers (e.g., polymers having different functional groups or repeat units). Polymers in the polymer matrix may be randomly arranged, such as in random copolymers, and/or have ordered structures, such as in block copolymers. Cross-linking can be via covalent, ionic, or inductive interactions, or physical entanglement. The bead may be a macromolecule. The bead may be formed of nucleic acid molecules bound together. The bead may be formed via covalent or non-covalent assembly of molecules (e.g., macromolecules), such as monomers or polymers. Such polymers or monomers may be natural or synthetic. Such polymers or monomers may be or include, for example, nucleic acid molecules (e.g., DNA or RNA). The bead may be formed of a polymeric material. The bead may be magnetic or non-magnetic. The bead may be rigid. The bead may be flexible and/or compressible. The bead may be disruptable or dissolvable. The bead may be a solid particle (e.g., a metal-based particle including but not limited to iron oxide, gold or silver) covered with a coating comprising one or more polymers. Such coating may be disruptable or dissolvable.
The term “partition,” as used herein, generally refers to a space or volume that may be suitable to contain one or more species or conduct one or more reactions. A partition may be a physical compartment, such as a droplet or well. The partition may isolate space or volume from another space or volume. The droplet may be a first phase (e.g., aqueous phase) in a second phase (e.g., oil) immiscible with the first phase. The droplet may be a first phase in a second phase that does not phase separate from the first phase, such as, for example, a capsule or liposome in an aqueous phase. A partition may comprise one or more other (inner) partitions. In some cases, a partition may be a virtual compartment that can be defined and identified by an index (e.g., indexed libraries) across multiple and/or remote physical compartments. For example, a physical compartment may comprise a plurality of virtual compartments.
The present disclosure is generally directed to methods for genotyping a locus. Such a method comprises generating a plurality of biological particles, each biological particle comprising a plurality of nucleic acid fragments. The method further comprises generating a plurality of partitions, each partition comprising one of the plurality of biological particles comprising a plurality of nucleic acid fragments, each partition further comprising a primer pair and a barcoded bead. The primer pair may comprise a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of a specific locus within the nucleic acid fragment, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the specific locus.
The present disclosure is generally directed to methods for generating a barcoded nucleic acid fragment. The method comprises contacting a plurality of biological particles comprising chromatin with a transposase to generate a plurality of nucleic acid fragments of said chromatin. The method may further comprise partitioning said plurality of biological particles comprising said plurality of nucleic acid fragments, a plurality of primer pairs, and a plurality of barcoded beads into a plurality of partitions. A partition of said plurality of partitions may comprise a biological particle of said plurality of biological particles, one or more primer pairs of the plurality of primer pairs, and a barcoded bead of said plurality of barcoded beads. Said biological particle may comprise a nucleic acid fragment of said plurality of nucleic acid fragments. Said barcoded bead may be attached to a plurality of nucleic acid barcode molecules comprising a common barcode sequence. Each primer pair comprises: a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of a specific locus within the nucleic acid fragment, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the specific locus. The method may further comprise amplifying the specific locus. The method may further comprise generating a barcoded nucleic acid fragment using the amplified specific locus and a nucleic acid barcode molecule of said plurality of nucleic acid barcode molecules.
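By way of illustration only, once barcoded fragments of the amplified locus have been sequenced, reads sharing a common barcode sequence can be grouped to assign a genotype to the originating biological particle. The following is a minimal sketch of one such grouping, not the analysis pipeline of the disclosure. The function name `call_genotypes`, the read representation as (barcode, allele) pairs, the labels "WT", "MUT", "HET" and "NA", and the `min_reads` threshold are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def call_genotypes(reads, min_reads=2):
    """Assign a genotype label to each cell barcode.

    reads: iterable of (cell_barcode, allele) pairs, where allele is
           "WT" or "MUT" (hypothetical labels for this sketch).
    min_reads: hypothetical minimum number of locus reads required
               before a call is made; barcodes below it are "NA".
    """
    counts = defaultdict(Counter)
    for barcode, allele in reads:
        counts[barcode][allele] += 1

    calls = {}
    for barcode, tally in counts.items():
        wt, mut = tally["WT"], tally["MUT"]
        if wt + mut < min_reads:
            calls[barcode] = "NA"   # too little evidence to call
        elif wt and mut:
            calls[barcode] = "HET"  # both alleles observed
        elif mut:
            calls[barcode] = "MUT"  # only mutant reads observed
        else:
            calls[barcode] = "WT"   # only wildtype reads observed
    return calls


example_reads = [
    ("AAAC", "WT"), ("AAAC", "WT"),
    ("CCGT", "MUT"), ("CCGT", "WT"),
    ("GGTA", "MUT"),
]
example_calls = call_genotypes(example_reads)
```

In this sketch, a barcode with only mutant or only wildtype reads cannot be distinguished from a heterozygous particle in which one allele was not captured, so single-allele calls are best read as observed alleles rather than confirmed zygosity.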
A critical challenge in studying the phenotypes of clonal expansions, in both healthy tissues and overt malignancies, is that primary human samples are often composed of an admixture of wildtype and mutant cells. Thus, precision mapping of genotypes to phenotypes is obscured. Additionally, bulk population measurements largely aggregate heterogeneous groups of cells, hindering the identification of cell type- or cell state-specific phenotypes arising from the presence of somatic mutations. To circumvent these limitations, single-cell simultaneous capture of genotypes together with phenotypic information is required, enabling intrasample, cell type-specific mutant-to-wildtype comparisons. Application of such single-cell multi-omic approaches has shown that somatic mutations in human tissues exert a differing phenotypic effect as a function of cell state.
While droplet-based single cell technologies allow high-throughput linkage of cDNA-captured mutated genotypes with phenotypes, similar high-throughput methods for linking somatic genotypes with epigenetic profiles are lacking. Furthermore, cDNA based capture results in limiting dependencies on target gene expression and the distance of the locus from transcript end. For example, these dependencies led to inefficient capture of the lowly expressed JAK2V617F locus (<10% of cells), requiring gDNA capture via lower throughput plate-based single cell sequencing. The dependency on gene expression also limits the application to archival frozen tissues, as nuclei isolation further decreases the number of available mRNA molecules for genotyping. Collectively, these limitations restrict the ability to profile key lowly expressed mutations, as well as mutations leading to nonsense-mediated mRNA decay or affecting non-coding regions.
GoT-ChA addresses this challenge by delivering droplet-based, broadly available, high-throughput joint capture of genotypes and chromatin accessibility. We further show that GoT-ChA can be readily integrated with protein and mitochondrial DNA capture, enabling robust linkage of somatic genotypes to a variety of signals at single cell resolution. As GoT-ChA is based on gDNA rather than cDNA capture, it also obviates the limiting dependencies on mutated locus expression and location. Thus, GoT-ChA enables the interrogation of somatic mutations throughout the genome, and radically expands the range of human biological phenomena that can be investigated for epigenetic deregulation due to somatic mutations. Importantly, the ability to apply GoT-ChA to nuclei opens the possibility for application to archived frozen solid tissues or tumors, critical for the exciting emerging field of clonal mosaicism across human tissues.
To leverage the unique ability to chart the impact of somatic mutation on epigenetic differentiation landscapes, we focused on JAK2V617F-driven clonal expansions in primary human samples from polycythemia vera (PV) and myelofibrosis (MF) patients. These data revealed that the epigenetic consequences of the JAK2V617F mutation are highly cell state dependent. Indeed, the frequency of mutated cells expanded at the stage of committed erythroid and megakaryocytic progenitors, consistent with clinical phenotypes, demonstrating that the clonal representation of this mutation varies by differentiation stage and fate. Notably, the ability to profile single-cell chromatin landscapes allowed examination of patterns of HSPC priming and demonstrated that increases in the frequencies of mutated committed progenitors were heralded by increased accessibility of erythroid transcription factor motifs already in mutated HSPCs, consistent with aberrant lineage priming in MPN-initiating cells.
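As a purely illustrative sketch of how the frequency of mutated cells may be compared across differentiation stages within one sample, the fraction of mutant cells among genotyped cells can be tabulated per cell type. The function name `mutant_fraction_by_cell_type`, the cell type labels, and the toy input below are hypothetical and do not represent data from the disclosure.

```python
from collections import defaultdict


def mutant_fraction_by_cell_type(cells):
    """Return the mutant fraction among genotyped cells, per cell type.

    cells: iterable of (cell_type, genotype) pairs. Only genotypes
           "WT" and "MUT" are counted; any other label (e.g. "NA" for
           an ungenotyped cell) is skipped. Labels are hypothetical.
    """
    tallies = defaultdict(lambda: [0, 0])  # cell_type -> [mutant, genotyped]
    for cell_type, genotype in cells:
        if genotype not in ("WT", "MUT"):
            continue
        tallies[cell_type][1] += 1
        if genotype == "MUT":
            tallies[cell_type][0] += 1
    return {ct: mutant / total for ct, (mutant, total) in tallies.items()}


# Toy input with made-up labels, for illustration only.
toy_cells = [
    ("HSPC", "WT"), ("HSPC", "MUT"), ("HSPC", "WT"), ("HSPC", "WT"),
    ("EP", "MUT"), ("EP", "MUT"), ("EP", "WT"), ("EP", "NA"),
]
toy_fractions = mutant_fraction_by_cell_type(toy_cells)
```

Because mutant and wildtype cells are drawn from the same sample, a difference in this fraction between cell types reflects a within-individual comparison rather than a comparison across patients.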
JAK2V617F myeloproliferation is characterized by the presence of an inflammatory microenvironment, driving bone marrow fibrosis and extramedullary hematopoiesis. Previous work has linked JAK2-mediated activation of STAT1 and STAT3 to increased NF-κB signaling in mouse models, and highlighted cell-extrinsic effects of the microenvironment. Here, we show that the JAK2V617F mutant HSPCs also display epigenetic profiles that are consistent with cell-intrinsic pro-inflammatory phenotypes, with increased motif accessibility of NF-κB-, AP-1-, and TGF-β-associated transcription factors in mutant cells. The observation that pro-inflammatory phenotypes are, at least to some degree, linked directly to JAK2V617F in a cell-intrinsic fashion opens an avenue for potential combined therapeutic strategies for mutant-specific targeting, aimed at both JAK2V617F constitutive activation and pro-inflammatory signaling in mutant HSPCs. In another striking demonstration of cell-type specificity in mutational impact, JAK2V617F megakaryocytic progenitors, which produce mature megakaryocytes thought to drive characteristic marrow fibrosis through pro-fibrotic cytokine signaling, showed a pro-inflammatory landscape specific for AP-1 transcription factor activity, which has been linked with the fibrotic clinical phenotype of MF. In contrast, early HSPCs (HSPC1) showed a broad pro-inflammatory signature characterized not only by increased AP-1-, but also NF-κB- and TGF-β-associated transcription factor motif accessibility. The study of paired samples from a PV patient who progressed to MF showed that many of the changes described above, including increased pro-inflammatory NF-κB and AP-1 motif accessibility in HSPCs and aberrant regulation of the γ-globin locus in EPs, are already evident long before significant marrow fibrosis occurs, providing human data in support of a causal role of JUN/FOS in inducing fibrosis.
These data suggest that JAK2V617F-mediated inflammation and fibrosis result from a complex interplay between cell-extrinsic and cell-intrinsic effects that vary across different progenitor populations.
Interestingly, we observed a near-complete loss of differential accessibility signals between mutant and wildtype cells within patients treated with ruxolitinib therapy, including reversion of intrinsic pro-inflammatory phenotypes and differentiation biases. However, ruxolitinib treatment does not eliminate JAK2V617F mutant HSPCs. This is in line with the observation that while ruxolitinib reduces splenomegaly and disease symptoms, resulting in overall improvement in quality of life, it fails to prevent disease progression or eliminate the mutated clones in MPNs. Thus, while ruxolitinib treatment appears to abrogate JAK2V617F-mediated shifts in the epigenetic landscape of HSPCs, mutated cells may continue to promote disease progression through clonal evolution, even in the context of JAK inhibition. Accordingly, improved JAK2 inhibition for elimination of mutated cells may be critical for the prevention of disease progression.
While our cell mixing study directly testing heterozygous genotyping demonstrated a 34% rate of complete allelic capture, incomplete capture of targeted alleles during in-droplet genotyping reduces the capacity for classification of heterozygous cells. While MF patients tend to have homozygous JAK2V617F mutations, allelic dropout will result in misclassification of heterozygous cells as either homozygous wildtype or homozygous mutant. Nonetheless, genotype misclassifications would dilute the strength of the biological differences between genotypes, and thus the differential epigenomic alterations reported here likely serve as a lower bound to effect sizes. Further, application of GoT-ChA-ASAP in samples with in-phase mitochondrial variants may allow genotypes to be inferred, decreasing the impact of allelic dropout. In addition, although our multi-modal approach expanding GoT-ChA allows linking genotype to both epigenetic changes and protein expression levels, it does not provide simultaneous capture of transcriptional information, thus limiting the assessment of mutational impact on gene expression. While the 10x Genomics Multiome platform obtains both chromatin accessibility and gene expression information, the cell barcoding reaction does not utilize in-droplet PCR and thus precludes the usage of GoT-ChA. However, development of alternative multiomic technologies that retain an in-droplet barcoding PCR, such as ISSAAC-seq, which utilizes the 10x Genomics scATAC-seq platform as a foundation for simultaneous scRNA-seq, provides promising avenues for linking gene expression changes to mutation-specific epigenetic alterations assayed via GoT-ChA.
Furthermore, given that the transposase-enabled scATAC-seq has been leveraged to expand the range of assayable epigenomic profiles to other modalities such as histone modifications, GoT-ChA has the potential to link genotypes with additional epigenetic features and therefore provide a more comprehensive understanding of the cellular phenotypes driven by somatic mutations.
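To make the effect of allelic dropout on heterozygous cells concrete, the following sketch works through a deliberately simple model. It assumes, purely for illustration, that each of the two alleles in a heterozygous cell is captured independently with the same probability `p`; this independence model, the function names, and the interpretation of a "complete capture fraction" as the share of genotyped heterozygous cells with both alleles observed are assumptions of this example and are not stated in the disclosure.

```python
def het_capture_outcomes(p):
    """Outcome probabilities for a heterozygous cell under the assumed
    model: each allele is captured independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    one_allele = p * (1.0 - p)
    return {
        "both": p * p,            # correctly seen as heterozygous
        "wt_only": one_allele,    # would be misread as wildtype
        "mut_only": one_allele,   # would be misread as homozygous mutant
        "none": (1.0 - p) ** 2,   # cell is not genotyped at all
    }


def per_allele_rate_from_complete_fraction(f):
    """Invert the model: if a fraction f of genotyped heterozygous cells
    shows both alleles, then f = p^2 / (1 - (1 - p)^2) = p / (2 - p),
    which gives p = 2f / (1 + f)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    return 2.0 * f / (1.0 + f)


outcomes_at_half = het_capture_outcomes(0.5)
```

Under this model, every heterozygous cell that is genotyped on only one allele is indistinguishable from a homozygous cell, which is consistent with the statement above that misclassification dilutes observed differences between genotypes.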
Collectively, disclosed herein is a powerful novel single-cell multiomic approach that allows for the direct investigation of the impact of somatic mutations on chromatin accessibility in primary human patient samples. These data show that the JAK2V617F somatic mutation, central to MPN pathogenesis, leads to epigenetic rewiring, in a cell-intrinsic and cell type-specific manner. These results, thus, demonstrate the power of joint single-cell capture of genotypes and epigenomes for a high-resolution study of clonal outgrowths in primary human tissue. GoT-ChA may be of particular importance to the emerging field of clonal mosaicism, now understood to be ubiquitous across the human body. Clonal mosaicism arises when a post-zygotic mutational event is detectable in subpopulations of cells as an alternative genotype while not present in the germline genome. In non-malignant clonal mosaicism, previous investigations have been largely limited to genotyping, due to the inability to separate admixtures of wildtype and mutant cells for genotype-phenotype inferences in primary human samples. We envision that GoT-ChA will thus serve as a foundation for broad future explorations to uncover the critical link between mutated somatic genotypes and epigenetic alterations across human clonal outgrowths in malignant and non-malignant contexts.
“Transposase” is an enzyme that binds to the end of a transposon and catalyses its movement to another part of the genome by a cut and paste mechanism or a replicative transposition mechanism. Any suitable transposase may be used in the method described herein, such as Tn5. For example, a transposase may be a fusion protein with various functional domains in addition to the endogenous accessible chromatin binding/cutting activity. Some examples may include Tn5 fusions with dead Cas9 enzymes for genomic targeting of transposase activity, or Tn5 fusions with protein A/G which can be used in Cut&Tag assays to tagment regions where primary antibodies against histone marks or transcription factors are bound, providing information about genome-wide histone mark distribution or TF activity in conjunction with targeted genotyping.
A “barcode” is a label, or identifier, that conveys or is capable of conveying information (e.g., information about an analyte in a sample, a bead, and/or a capture probe). A barcode can be part of an analyte, or independent of an analyte. A barcode can be attached to an analyte. A particular barcode can be unique relative to other barcodes.
Barcodes can have a variety of different formats. For example, barcodes can include polynucleotide barcodes, random nucleic acid and/or amino acid sequences, and synthetic nucleic acid and/or amino acid sequences. A barcode can be attached to an analyte or to another moiety or structure in a reversible or irreversible manner. A barcode can be added to, for example, a strand of a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sample before or during sequencing of the sample. Barcodes can allow for identification and/or quantification of individual sequencing-reads (e.g., a barcode can be or can include a unique molecular identifier or “UMI”).
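By way of a non-limiting illustration, the use of barcodes and UMIs for quantification can be sketched as follows. The sketch assumes that each sequencing read has already been reduced to a (barcode, UMI) pair and that reads sharing both values are copies of one original molecule; the function name `count_unique_umis` and the example sequences are hypothetical and are not part of any disclosed protocol.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Count distinct UMIs observed for each barcode.

    `reads` is an iterable of (barcode, umi) string pairs. Reads that share
    both barcode and UMI are assumed to be copies of the same molecule and
    are therefore counted once.
    """
    umis_by_barcode = defaultdict(set)
    for barcode, umi in reads:
        umis_by_barcode[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in umis_by_barcode.items()}


# Hypothetical example reads, given as (barcode, UMI) pairs.
reads = [
    ("AAAC", "GGT"),
    ("AAAC", "GGT"),  # same barcode and UMI as the previous read
    ("AAAC", "TCA"),
    ("TTGG", "GGT"),
]
counts = count_unique_umis(reads)
```

In practice, sequencing errors in barcodes or UMIs would typically need to be handled before counting; that step is omitted from this sketch.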
A “probe” or a “target,” when used in reference to a nucleic acid or sequence of nucleic acids, is intended as a semantic identifier for the nucleic acid or sequence in the context of a method or composition, and does not limit the structure or function of the nucleic acid or sequence beyond what is expressly indicated.
An “adaptor,” an “adapter,” and a “tag” are terms that are used interchangeably in this disclosure, and refer to species that can be coupled to a polynucleotide sequence (in a process referred to as “tagging”) using any one of many different techniques including (but not limited to) ligation, hybridization, and tagmentation. Adaptors can also be nucleic acid sequences that add a function, e.g., spacer sequences, primer sequences/sites, barcode sequences, unique molecular identifier sequences.
The terms “hybridizing,” “hybridize,” “annealing,” and “anneal” are used interchangeably in this disclosure, and refer to the pairing of substantially complementary or complementary nucleic acid sequences within two different molecules. Pairing can be achieved by any process in which a nucleic acid sequence joins with a substantially or fully complementary sequence through base pairing to form a hybridization complex. For purposes of hybridization, two nucleic acid sequences are “substantially complementary” if at least 60% (e.g., at least 70%, at least 80%, or at least 90%) of their individual bases are complementary to one another.
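As a non-limiting illustration of the “substantially complementary” threshold, the following sketch computes the fraction of complementary bases between two DNA sequences. It assumes equal-length, ungapped sequences that are both written 5′ to 3′ and paired in antiparallel orientation, and it considers only Watson-Crick A/T and G/C pairs; these simplifications, and the function names, are assumptions made for illustration only.

```python
# Watson-Crick pairing for DNA bases (assumption: no RNA or modified bases).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fraction_complementary(seq_a, seq_b):
    """Fraction of positions in seq_a that pair with seq_b.

    Both sequences are written 5' to 3'; seq_b is reversed so that the two
    strands are compared in antiparallel orientation, without gaps.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    seq_b_reversed = seq_b[::-1].upper()
    matches = sum(
        1
        for base_a, base_b in zip(seq_a.upper(), seq_b_reversed)
        if COMPLEMENT.get(base_a) == base_b
    )
    return matches / len(seq_a)


def is_substantially_complementary(seq_a, seq_b, threshold=0.60):
    """Apply the 60% threshold stated above (other thresholds may be passed)."""
    return fraction_complementary(seq_a, seq_b) >= threshold


# Hypothetical example: a 10-base sequence and its exact reverse complement.
perfect = fraction_complementary("AACCGGTTAC", "GTAACCGGTT")
```

Passing a higher `threshold` (e.g., 0.70, 0.80, or 0.90) corresponds to the stricter example values given above.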
A “primer” is a single-stranded nucleic acid sequence having a 3′ end that can be used as a chemical substrate for a nucleic acid polymerase in a nucleic acid extension reaction. RNA primers are formed of RNA nucleotides, and are used in RNA synthesis, while DNA primers are formed of DNA nucleotides and used in DNA synthesis. Primers can also include both RNA nucleotides and DNA nucleotides (e.g., in a random or designed pattern). Primers can also include other natural or synthetic nucleotides described herein that can have additional functionality. In some examples, DNA primers can be used to prime RNA synthesis and vice versa (e.g., RNA primers can be used to prime DNA synthesis). Primers can vary in length. For example, primers can be about 6 bases to about 120 bases. For example, primers can include up to about 25 bases.
A “ligation” is a method of ligating two (or more) nucleic acid sequences that are in proximity with each other through enzymatic means (e.g., a ligase). In some embodiments, ligation can include a “gap-filling” step that involves incorporation of one or more nucleic acids by a polymerase, based on the nucleic acid sequence of a template nucleic acid molecule, spanning a distance between the two nucleic acid molecules of interest (see, e.g., U.S. Pat. No. 7,264,929, the entire contents of which are incorporated herein by reference).
A wide variety of different methods can be used for ligating nucleic acid molecules, including (but not limited to) “sticky-end” and “blunt-end” ligations. Additionally, single-stranded ligation can be used to perform ligation on a single-stranded nucleic acid molecule. Sticky-end ligations involve the hybridization of complementary single-stranded sequences between the two nucleic acid molecules to be joined, prior to the ligation event itself. Blunt-end ligations generally do not include hybridization of complementary regions from each nucleic acid molecule because both nucleic acid molecules lack a single-stranded overhang at the site of ligation.
The terms “detectable label,” “optical label,” and “label” are used interchangeably herein to refer to a directly or indirectly detectable moiety that is associated with (e.g., conjugated to) a molecule to be detected, e.g., a capture probe or analyte. The detectable label can be directly detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, can be indirectly detectable, e.g., by catalyzing chemical alterations of a chemical substrate compound or composition, which chemical substrate compound or composition is directly detectable. Detectable labels can be suitable for small scale detection and/or suitable for high-throughput screening. As such, suitable detectable labels include, but are not limited to, radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes.
The detectable label can be qualitatively detected (e.g., optically or spectrally), or it can be quantified. Qualitative detection generally includes a detection method in which the existence or presence of the detectable label is confirmed, whereas quantifiable detection generally includes a detection method having a quantifiable (e.g., numerically reportable) value such as an intensity, duration, polarization, and/or other properties. In some embodiments, the detectable label is bound to a feature or to a capture probe associated with a feature. For example, detectably labeled features can include a fluorescent, a colorimetric, or a chemiluminescent label attached to a bead (see, for example, Rajeswari et al., J. Microbiol Methods 139:22-28, 2017, and Forcucci et al., J. Biomed Opt. 10:105010, 2015, the entire contents of each of which are incorporated herein by reference).
A variety of steps can be performed to prepare a biological sample for analysis. Except where indicated otherwise, the preparative steps described below can generally be combined in any manner to appropriately prepare a particular sample for analysis.
A biological sample can be harvested from a subject (e.g., via surgical biopsy, whole subject sectioning), grown in vitro on a growth substrate or culture dish as a population of cells, or prepared as a tissue slice or tissue section. Grown samples may be sufficiently thin for analysis without further processing steps. Alternatively, grown samples, and samples obtained via biopsy or sectioning, can be prepared as thin tissue sections using a mechanical cutting apparatus such as a vibrating blade microtome. As another alternative, in some embodiments, a thin tissue section can be prepared by applying a touch imprint of a biological sample to a suitable substrate material.
The thickness of the tissue section can be a fraction of (e.g., less than 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, or 0.1) the maximum cross-sectional dimension of a cell. However, tissue sections having a thickness that is larger than the maximum cross-section cell dimension can also be used. For example, cryostat sections can be used, which can be, e.g., 10-20 micrometers thick.
More generally, the thickness of a tissue section typically depends on the method used to prepare the section and the physical characteristics of the tissue, and therefore sections having a wide variety of different thicknesses can be prepared and used. For example, the thickness of the tissue section can be at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20, 30, 40, or 50 micrometers. Thicker sections can also be used if desired or convenient, e.g., at least 70, 80, 90, or 100 micrometers or more. Typically, the thickness of a tissue section is between 1-100 micrometers, 1-50 micrometers, 1-30 micrometers, 1-25 micrometers, 1-20 micrometers, 1-15 micrometers, 1-10 micrometers, 2-8 micrometers, 3-7 micrometers, or 4-6 micrometers, but as mentioned above, sections with thicknesses larger or smaller than these ranges can also be analysed.
Multiple sections can also be obtained from a single biological sample. For example, multiple tissue sections can be obtained from a surgical biopsy sample by performing serial sectioning of the biopsy sample using a sectioning blade. Spatial information among the serial sections can be preserved in this manner, and the sections can be analysed successively to obtain three-dimensional information about the biological sample.
In some embodiments, the biological sample (e.g., a tissue section as described above) can be prepared by deep freezing at a temperature suitable to maintain or preserve the integrity (e.g., the physical characteristics) of the tissue structure. Such a temperature can be, e.g., less than −20° C., or less than −25° C., −30° C., −40° C., −50° C., −60° C., −70° C., −80° C., −90° C., −100° C., −110° C., −120° C., −130° C., −140° C., −150° C., −160° C., −170° C., −180° C., −190° C., or −200° C. The frozen tissue sample can be sectioned, e.g., thinly sliced, onto a substrate surface using any number of suitable methods. For example, a tissue sample can be prepared using a chilled microtome (e.g., a cryostat) set at a temperature suitable to maintain both the structural integrity of the tissue sample and the chemical properties of the nucleic acids in the sample. Such a temperature can be, e.g., less than −15° C., less than −20° C., or less than −25° C. A sample can be snap frozen in isopentane and liquid nitrogen. Frozen samples can be stored in a sealed container prior to embedding.
In some embodiments, the biological sample can be prepared using formalin-fixation and paraffin-embedding (FFPE), which are established methods. In some embodiments, cell suspensions and other non-tissue samples can be prepared using formalin-fixation and paraffin-embedding. Following fixation of the sample and embedding in a paraffin or resin block, the sample can be sectioned as described above. Prior to analysis, the paraffin-embedding material can be removed from the tissue section (e.g., deparaffinization) by incubating the tissue section in an appropriate solvent (e.g., xylene) followed by a rinse (e.g., 99.5% ethanol for 2 minutes, 96% ethanol for 2 minutes, and 70% ethanol for 2 minutes).
As an alternative to formalin fixation described above, a biological sample can be fixed in any of a variety of other fixatives to preserve the biological structure of the sample prior to analysis. For example, a sample can be fixed via immersion in ethanol, methanol, acetone, formaldehyde (e.g., 2% formaldehyde), paraformaldehyde-Triton, glutaraldehyde, or combinations thereof.
In some embodiments, acetone fixation is used with fresh frozen samples, which can include, but are not limited to, cortex tissue, mouse olfactory bulb, human brain tumor, human post-mortem brain, and breast cancer samples. In some embodiments, a compatible fixation method is chosen and/or optimized based on a desired workflow. For example, formaldehyde fixation may be chosen as compatible for workflows using IHC/IF protocols for protein visualization. As another example, methanol fixation may be chosen for workflows emphasizing RNA/DNA library quality. Acetone fixation may be chosen in some applications to permeabilize the tissue. When acetone fixation is performed, pre-permeabilization steps (described below) may not be performed. Alternatively, acetone fixation can be performed in conjunction with permeabilization steps.
As an alternative to paraffin embedding described above, a biological sample can be embedded in any of a variety of other embedding materials to provide a substrate to the sample prior to sectioning and other handling steps. In general, the embedding material is removed prior to analysis of tissue sections obtained from the sample. Suitable embedding materials include, but are not limited to, waxes, resins (e.g., methacrylate resins), epoxies, and agar.
To facilitate visualization, biological samples can be stained using a wide variety of stains and staining techniques. In some embodiments, a sample can be stained using any number of biological stains, including but not limited to, acridine orange, Bismarck brown, carmine, coomassie blue, cresyl violet, DAPI, eosin, ethidium bromide, acid fuchsine, hematoxylin, Hoechst stains, iodine, methyl green, methylene blue, neutral red, Nile blue, Nile red, osmium tetroxide, propidium iodide, rhodamine, or safranin.
The sample can be stained using known staining techniques, including May-Grünwald, Giemsa, hematoxylin and eosin (H&E), Jenner's, Leishman, Masson's trichrome, Papanicolaou, Romanowsky, silver, Sudan, Wright's, and/or Periodic Acid Schiff (PAS) staining techniques. PAS staining is typically performed after formalin or acetone fixation.
In some embodiments, the biological sample can be stained using a detectable label (e.g., radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes) as described elsewhere herein. In some embodiments, a biological sample is stained using only one type of stain or one technique. In some embodiments, staining includes biological staining techniques such as H&E staining. In some embodiments, staining includes identifying analytes using fluorescently-conjugated antibodies. In some embodiments, a biological sample is stained using two or more different types of stains, or two or more different staining techniques. For example, a biological sample can be prepared by staining and imaging using one technique (e.g., H&E staining and brightfield imaging), followed by staining and imaging using another technique (e.g., IHC/IF staining and fluorescence microscopy) for the same biological sample.
In some embodiments, biological samples can be destained. Methods of destaining or discoloring a biological sample are known in the art, and generally depend on the nature of the stain(s) applied to the sample. For example, H&E staining can be destained by washing the sample in HCl. In some embodiments, destaining can include 1, 2, 3, or more washes in HCl. In some embodiments, destaining can include adding HCl to a downstream solution (e.g., permeabilization solution). As another example, in some embodiments, one or more immunofluorescence stains are applied to the sample via antibody coupling. Such stains can be removed using techniques such as cleavage of disulfide linkages via treatment with a reducing agent and detergent washing, chaotropic salt treatment, treatment with antigen retrieval solution, and treatment with an acidic glycine buffer. Methods for multiplexed staining and destaining are described, for example, in Bolognesi et al., J. Histochem. Cytochem. 2017; 65 (8): 431-444, Lin et al., Nat Commun. 2015; 6:8390, Pirici et al., J. Histochem. Cytochem. 2009; 57:567-75, and Glass et al., J. Histochem. Cytochem. 2009; 57:899-905, the entire contents of each of which are incorporated herein by reference.
In an aspect, the systems and methods described herein provide for the compartmentalization, depositing, or partitioning of one or more particles (e.g., biological particles, macromolecular constituents of biological particles, beads, reagents, etc.) into discrete compartments or partitions (referred to interchangeably herein as partitions), where each partition maintains separation of its own contents from the contents of other partitions. The partition can be a droplet in an emulsion. A partition may comprise one or more other partitions.
A partition may include one or more particles. A partition may include one or more types of particles. For example, a partition of the present disclosure may comprise one or more biological particles and/or macromolecular constituents thereof. A partition may comprise one or more gel beads. A partition may comprise one or more cell beads. A partition may include a single gel bead, a single cell bead, or both a single cell bead and single gel bead. A partition may include one or more reagents. Alternatively, a partition may be unoccupied. For example, a partition may not comprise a bead. A cell bead can be a biological particle and/or one or more of its macromolecular constituents encased inside of a gel or polymer matrix, such as via polymerization of a droplet containing the biological particle and precursors capable of being polymerized or gelled. Unique identifiers, such as barcodes, may be injected into the droplets previous to, subsequent to, or concurrently with droplet generation, such as via a microcapsule (e.g., bead), as described elsewhere herein. Microfluidic channel networks (e.g., on a chip) can be utilized to generate partitions as described herein. Alternative mechanisms may also be employed in the partitioning of individual biological particles, including porous membranes through which aqueous mixtures of cells are extruded into non-aqueous fluids.
The partitions can be flowable within fluid streams. The partitions may comprise, for example, micro-vesicles that have an outer barrier surrounding an inner fluid center or core. In some cases, the partitions may comprise a porous matrix that is capable of entraining and/or retaining materials within its matrix. The partitions can be droplets of a first phase within a second phase, wherein the first and second phases are immiscible. For example, the partitions can be droplets of aqueous fluid within a non-aqueous continuous phase (e.g., oil phase). In another example, the partitions can be droplets of a non-aqueous fluid within an aqueous phase. In some examples, the partitions may be provided in a water-in-oil emulsion or oil-in-water emulsion. A variety of different vessels are described in, for example, U.S. Patent Application Publication No. 2014/0155295, which is entirely incorporated herein by reference for all purposes. Emulsion systems for creating stable droplets in non-aqueous or oil continuous phases are described in, for example, U.S. Patent Application Publication No. 2010/0105112, which is entirely incorporated herein by reference for all purposes.
In the case of droplets in an emulsion, allocating individual particles to discrete partitions may in one non-limiting example be accomplished by introducing a flowing stream of particles in an aqueous fluid into a flowing stream of a non-aqueous fluid, such that droplets are generated at the junction of the two streams. Fluid properties (e.g., fluid flow rates, fluid viscosities, etc.), particle properties (e.g., volume fraction, particle size, particle concentration, etc.), microfluidic architectures (e.g., channel geometry, etc.), and other parameters may be adjusted to control the occupancy of the resulting partitions (e.g., number of biological particles per partition, number of beads per partition, etc.). For example, partition occupancy can be controlled by providing the aqueous stream at a certain concentration and/or flow rate of particles. To generate single biological particle partitions, the relative flow rates of the immiscible fluids can be selected such that, on average, the partitions may contain less than one biological particle per partition in order to ensure that those partitions that are occupied are primarily singly occupied. In some cases, partitions among a plurality of partitions may contain at most one biological particle (e.g., bead, DNA, cell or cellular material). In some embodiments, the various parameters (e.g., fluid properties, particle properties, microfluidic architectures, etc.) may be selected or adjusted such that a majority of partitions are occupied, for example, allowing for only a small percentage of unoccupied partitions. The flows and channel architectures can be controlled so as to ensure a given number of singly occupied partitions, less than a certain level of unoccupied partitions and/or less than a certain level of multiply occupied partitions.
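The relationship between average loading and single occupancy can be illustrated with a simple calculation. The sketch below assumes that particles are distributed among partitions randomly and independently, so that the number of particles per partition follows a Poisson distribution; this statistical model is an assumption introduced here for illustration and may not hold for every loading scheme (for example, ordered or close-packed bead loading). The function name `poisson_occupancy` is hypothetical.

```python
import math


def poisson_occupancy(mean_particles_per_partition):
    """Partition occupancy fractions under an assumed Poisson loading model.

    Returns the expected fractions of empty, singly occupied, and multiply
    occupied partitions, plus the fraction of occupied partitions that hold
    exactly one particle.
    """
    lam = mean_particles_per_partition
    if lam <= 0:
        raise ValueError("mean particles per partition must be positive")
    p_empty = math.exp(-lam)          # P(0 particles)
    p_single = lam * math.exp(-lam)   # P(exactly 1 particle)
    p_multiple = 1.0 - p_empty - p_single
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        "single_given_occupied": p_single / (1.0 - p_empty),
    }


# Hypothetical loading levels: dilute (0.1 per partition) versus dense (1.0).
dilute = poisson_occupancy(0.1)
dense = poisson_occupancy(1.0)
```

Under this assumed model, an average of 0.1 particles per partition leaves most partitions empty, but about 95% of the occupied partitions are singly occupied, whereas at an average of 1.0 that share falls below 60%. This illustrates the trade-off between unoccupied and multiply occupied partitions described above.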
A partition may comprise one or more unique identifiers, such as barcodes. Barcodes may be previously, subsequently or concurrently delivered to the partitions that hold the compartmentalized or partitioned biological particle. For example, barcodes may be injected into droplets previous to, subsequent to, or concurrently with droplet generation. The delivery of the barcodes to a particular partition allows for the later attribution of the characteristics of the individual biological particle to the particular partition. Barcodes may be delivered, for example on a nucleic acid molecule (e.g., an oligonucleotide), to a partition via any suitable mechanism. Barcoded nucleic acid molecules can be delivered to a partition via a microcapsule. A microcapsule, in some instances, can comprise a bead. Beads are described in further detail below.
A bead may be porous, non-porous, solid, semi-solid, semi-fluidic, fluidic, and/or a combination thereof. In some instances, a bead may be dissolvable, disruptable, and/or degradable. In some cases, a bead may not be degradable. In some cases, the bead may be a gel bead. A gel bead may be a hydrogel bead. A gel bead may be formed from molecular precursors, such as a polymeric or monomeric species. A semi-solid bead may be a liposomal bead. Solid beads may comprise metals including iron oxide, gold, and silver. In some cases, the bead may be a silica bead. In some cases, the bead can be rigid. In other cases, the bead may be flexible and/or compressible.
A bead may be of any suitable shape. Examples of bead shapes include, but are not limited to, spherical, non-spherical, oval, oblong, amorphous, circular, cylindrical, and variations thereof.
Beads may be of uniform size or heterogeneous size. In some cases, the diameter of a bead may be at least about 10 nanometers (nm), 100 nm, 500 nm, 1 micrometer (μm), 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1 mm, or greater. In some cases, a bead may have a diameter of less than about 10 nm, 100 nm, 500 nm, 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1 mm, or less. In some cases, a bead may have a diameter in the range of about 40-75 μm, 30-75 μm, 20-75 μm, 40-85 μm, 40-95 μm, 20-100 μm, 10-100 μm, 1-100 μm, 20-250 μm, or 20-500 μm.
A bead may comprise natural and/or synthetic materials. For example, a bead can comprise a natural polymer, a synthetic polymer or both natural and synthetic polymers. Examples of natural polymers include proteins and sugars such as deoxyribonucleic acid, rubber, cellulose, starch (e.g., amylose, amylopectin), proteins, enzymes, polysaccharides, silks, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, ispaghula, acacia, agar, gelatin, shellac, sterculia gum, xanthan gum, corn sugar gum, guar gum, gum karaya, agarose, alginic acid, alginate, or natural polymers thereof. Examples of synthetic polymers include acrylics, nylons, silicones, spandex, viscose rayon, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethanes, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride) and/or combinations (e.g., co-polymers) thereof. Beads may also be formed from materials other than polymers, including lipids, micelles, ceramics, glass-ceramics, material composites, metals, other inorganic materials, and others.
In some instances, the bead may contain molecular precursors (e.g., monomers or polymers), which may form a polymer network via polymerization of the molecular precursors. In some cases, a precursor may be an already polymerized species capable of undergoing further polymerization via, for example, a chemical cross-linkage. In some cases, a precursor can comprise one or more of an acrylamide or a methacrylamide monomer, oligomer, or polymer. In some cases, the bead may comprise prepolymers, which are oligomers capable of further polymerization. For example, polyurethane beads may be prepared using prepolymers. In some cases, the bead may contain individual polymers that may be further polymerized together. In some cases, beads may be generated via polymerization of different precursors, such that they comprise mixed polymers, co-polymers, and/or block co-polymers. In some cases, the bead may comprise covalent or ionic bonds between polymeric precursors (e.g., monomers, oligomers, linear polymers), nucleic acid molecules (e.g., oligonucleotides), primers, and other entities. In some cases, the covalent bonds can be carbon-carbon bonds, thioether bonds, or carbon-heteroatom bonds.
Cross-linking may be permanent or reversible, depending upon the particular cross-linker used. Reversible cross-linking may allow for the polymer to linearize or dissociate under appropriate conditions. In some cases, reversible cross-linking may also allow for reversible attachment of a material bound to the surface of a bead. In some cases, a cross-linker may form disulfide linkages. In some cases, the chemical cross-linker forming disulfide linkages may be cystamine or a modified cystamine.
In some cases, disulfide linkages can be formed between molecular precursor units (e.g., monomers, oligomers, or linear polymers) or precursors incorporated into a bead and nucleic acid molecules (e.g., oligonucleotides). Cystamine (including modified cystamines), for example, is an organic agent comprising a disulfide bond that may be used as a crosslinker agent between individual monomeric or polymeric precursors of a bead. Polyacrylamide may be polymerized in the presence of cystamine or a species comprising cystamine (e.g., a modified cystamine) to generate polyacrylamide gel beads comprising disulfide linkages (e.g., chemically degradable beads comprising chemically-reducible cross-linkers). The disulfide linkages may permit the bead to be degraded (or dissolved) upon exposure of the bead to a reducing agent.
In some cases, chitosan, a linear polysaccharide polymer, may be crosslinked with glutaraldehyde via hydrophilic chains to form a bead. Crosslinking of chitosan polymers may be achieved by chemical reactions that are initiated by heat, pressure, change in pH, and/or radiation.
In some cases, a bead may comprise an acrydite moiety, which in certain aspects may be used to attach one or more nucleic acid molecules (e.g., barcode sequence, barcoded nucleic acid molecule, barcoded oligonucleotide, primer, or other oligonucleotide) to the bead. In some cases, an acrydite moiety can refer to an acrydite analogue generated from the reaction of acrydite with one or more species, such as, the reaction of acrydite with other monomers and cross-linkers during a polymerization reaction. Acrydite moieties may be modified to form chemical bonds with a species to be attached, such as a nucleic acid molecule (e.g., barcode sequence, barcoded nucleic acid molecule, barcoded oligonucleotide, primer, or other oligonucleotide). Acrydite moieties may be modified with thiol groups capable of forming a disulfide bond or may be modified with groups already comprising a disulfide bond. The thiol or disulfide (via disulfide exchange) may be used as an anchor point for a species to be attached or another part of the acrydite moiety may be used for attachment. In some cases, attachment can be reversible, such that when the disulfide bond is broken (e.g., in the presence of a reducing agent), the attached species is released from the bead. In other cases, an acrydite moiety can comprise a reactive hydroxyl group that may be used for attachment.
Functionalization of beads for attachment of nucleic acid molecules (e.g., oligonucleotides) may be achieved through a wide range of different approaches, including activation of chemical groups within a polymer, incorporation of active or activatable functional groups in the polymer structure, or attachment at the pre-polymer or monomer stage in bead production.
For example, precursors (e.g., monomers, cross-linkers) that are polymerized to form a bead may comprise acrydite moieties, such that when a bead is generated, the bead also comprises acrydite moieties. The acrydite moieties can be attached to a nucleic acid molecule (e.g., oligonucleotide), which may include a priming sequence (e.g., a primer for amplifying target nucleic acids, random primer, primer sequence for messenger RNA) and/or one or more barcode sequences. The one or more barcode sequences may include sequences that are the same for all nucleic acid molecules coupled to a given bead and/or sequences that are different across all nucleic acid molecules coupled to the given bead. The nucleic acid molecule may be incorporated into the bead.
In some cases, the nucleic acid molecule can comprise a functional sequence, for example, for attachment to a sequencing flow cell, such as, for example, a P5 sequence for Illumina® sequencing. In some cases, the nucleic acid molecule or derivative thereof (e.g., oligonucleotide or polynucleotide generated from the nucleic acid molecule) can comprise another functional sequence, such as, for example, a P7 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the nucleic acid molecule can comprise a barcode sequence. In some cases, the primer can further comprise a unique molecular identifier (UMI). In some cases, the primer can comprise an R1 primer sequence for Illumina sequencing. In some cases, the primer can comprise an R2 primer sequence for Illumina sequencing. Examples of such nucleic acid molecules (e.g., oligonucleotides, polynucleotides, etc.) and uses thereof, as may be used with compositions, devices, methods and systems of the present disclosure, are provided in U.S. Patent Pub. Nos. 2014/0378345 and 2015/0376609, each of which is entirely incorporated herein by reference.
A bead injected or otherwise introduced into a partition may comprise releasably, cleavably, or reversibly attached barcodes. A bead injected or otherwise introduced into a partition may comprise activatable barcodes. A bead injected or otherwise introduced into a partition may be a degradable, disruptable, or dissolvable bead.
Barcodes can be releasably, cleavably or reversibly attached to the beads such that barcodes can be released or be releasable through cleavage of a linkage between the barcode molecule and the bead, or released through degradation of the underlying bead itself, allowing the barcodes to be accessed or be accessible by other reagents, or both. In non-limiting examples, cleavage may be achieved through reduction of disulfide bonds, use of restriction enzymes, photo-activated cleavage, or cleavage via other types of stimuli (e.g., chemical, thermal, pH, enzymatic, etc.) and/or reactions, such as described elsewhere herein. Releasable barcodes may sometimes be referred to as being activatable, in that they are available for reaction once released. Thus, for example, an activatable barcode may be activated by releasing the barcode from a bead (or other suitable type of partition described herein). Other activatable configurations are also envisioned in the context of the described methods and systems.
In addition to, or as an alternative to, the cleavable linkages between the beads and the associated molecules, such as barcode-containing nucleic acid molecules (e.g., barcoded oligonucleotides), the beads may be degradable, disruptable, or dissolvable spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to particular chemical species or phase, exposure to light, reducing agent, etc.). In some cases, a bead may be dissolvable, such that material components of the beads are solubilized when exposed to a particular chemical species or an environmental change, such as a change in temperature or a change in pH. In some cases, a gel bead can be degraded or dissolved at elevated temperature and/or in basic conditions. In some cases, a bead may be thermally degradable such that when the bead is exposed to an appropriate change in temperature (e.g., heat), the bead degrades. Degradation or dissolution of a bead bound to a species (e.g., a nucleic acid molecule, e.g., barcoded oligonucleotide) may result in release of the species from the bead.
As will be appreciated from the above disclosure, the degradation of a bead may refer to the disassociation of a bound or entrained species from a bead, both with and without structurally degrading the physical bead itself. For example, the degradation of the bead may involve cleavage of a cleavable linkage via one or more species and/or methods described elsewhere herein. In another example, entrained species may be released from beads through osmotic pressure differences due to, for example, changing chemical environments. By way of example, alteration of bead pore sizes due to osmotic pressure differences can generally occur without structural degradation of the bead itself. In some cases, an increase in pore size due to osmotic swelling of a bead can permit the release of entrained species within the bead. In other cases, osmotic shrinking of a bead may cause a bead to better retain an entrained species due to pore size contraction.
A degradable bead may be introduced into a partition, such as a droplet of an emulsion or a well, such that the bead degrades within the partition and any associated species (e.g., oligonucleotides) are released within the droplet when the appropriate stimulus is applied. The free species (e.g., oligonucleotides, nucleic acid molecules) may interact with other reagents contained in the partition. For example, a polyacrylamide bead comprising cystamine and linked, via a disulfide bond, to a barcode sequence, may be combined with a reducing agent within a droplet of a water-in-oil emulsion. Within the droplet, the reducing agent can break the various disulfide bonds, resulting in bead degradation and release of the barcode sequence into the aqueous, inner environment of the droplet. In another example, heating of a droplet comprising a bead-bound barcode sequence in basic solution may also result in bead degradation and release of the attached barcode sequence into the aqueous, inner environment of the droplet.
Any suitable number of molecular tag molecules (e.g., primer, barcoded oligonucleotide) can be associated with a bead such that, upon release from the bead, the molecular tag molecules (e.g., primer, e.g., barcoded oligonucleotide) are present in the partition at a pre-defined concentration. Such pre-defined concentration may be selected to facilitate certain reactions for generating a sequencing library, e.g., amplification, within the partition. In some cases, the pre-defined concentration of the primer can be limited by the process of producing nucleic acid molecule (e.g., oligonucleotide) bearing beads.
In some cases, beads can be non-covalently loaded with one or more reagents. The beads can be non-covalently loaded by, for instance, subjecting the beads to conditions sufficient to swell the beads, allowing sufficient time for the reagents to diffuse into the interiors of the beads, and subjecting the beads to conditions sufficient to de-swell the beads. The swelling of the beads may be accomplished, for instance, by placing the beads in a thermodynamically favorable solvent, subjecting the beads to a higher or lower temperature, subjecting the beads to a higher or lower ion concentration, and/or subjecting the beads to an electric field. The swelling of the beads may be accomplished by various swelling methods. The de-swelling of the beads may be accomplished, for instance, by transferring the beads into a thermodynamically unfavorable solvent, subjecting the beads to a lower or higher temperature, subjecting the beads to a lower or higher ion concentration, and/or removing an electric field. The de-swelling of the beads may be accomplished by various de-swelling methods. Transferring the beads may cause pores in the bead to shrink. The shrinking may then hinder reagents within the beads from diffusing out of the interiors of the beads. The hindrance may be due to steric interactions between the reagents and the interiors of the beads. The transfer may be accomplished microfluidically. For instance, the transfer may be achieved by moving the beads from one co-flowing solvent stream to a different co-flowing solvent stream. The swellability and/or pore size of the beads may be adjusted by changing the polymer composition of the bead.
The barcodes that are releasable as described herein may sometimes be referred to as being activatable, in that they are available for reaction once released. Thus, for example, an activatable barcode may be activated by releasing the barcode from a bead (or other suitable type of partition described herein). Other activatable configurations are also envisioned in the context of the described methods and systems.
In addition to thermally cleavable bonds, disulfide bonds and UV sensitive bonds, other non-limiting examples of labile bonds that may be coupled to a precursor or bead include an ester linkage (e.g., cleavable with an acid, a base, or hydroxylamine), a vicinal diol linkage (e.g., cleavable via sodium periodate), a Diels-Alder linkage (e.g., cleavable via heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), or a phosphodiester linkage (e.g., cleavable via a nuclease (e.g., DNase)). A bond may be cleavable via other nucleic acid molecule targeting enzymes, such as restriction enzymes (e.g., restriction endonucleases), as described further below.
Numerous chemical triggers may be used to trigger the degradation of beads. Examples of these chemical changes may include, but are not limited to, pH-mediated changes to the integrity of a component within the bead, degradation of a component of a bead via cleavage of cross-linked bonds, and depolymerization of a component of a bead.
In accordance with certain aspects, biological particles may be partitioned along with lysis reagents in order to release the contents of the biological particles within the partition. In such cases, the lysis agents can be contacted with the biological particle suspension concurrently with, or immediately prior to, the introduction of the biological particles into the partitioning junction/droplet generation zone (e.g., junction 210), such as through an additional channel or channels upstream of the channel junction.
Also provided herein are kits comprising partitioning fluid, a barcoded bead, and a primer pair comprising: a 5′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 3′ end of a specific locus within the nucleic acid fragment, and a 3′ primer-adapter nucleic acid molecule comprising a nucleotide sequence that is complementary to a sequence located at the 5′ end of the specific locus. The kits may include one or more of the following: one, two, three, four, five or more, up to all of partitioning fluids, including both aqueous buffers and non-aqueous partitioning fluids or oils, nucleic acid barcode libraries that are releasably associated with beads, as described herein, microfluidic devices, reagents for disrupting cells, amplifying nucleic acids, and providing additional functional sequences on fragments of cellular nucleic acids or replicates thereof, as well as instructions for using any of the foregoing in the methods described herein.
The invention now being generally described, it will be more readily understood by reference to the following examples which are included merely for purposes of illustration of certain aspects and embodiments of the present invention, and are not intended to limit the invention.
Human CA46 (ATCC, #CRL-1648), HEL (ATCC, #TIB-180), and CCRF-CEM (ATCC, #CRM-CCL-119) cell lines were maintained according to standard procedures in RPMI-1640 (Thermo Fisher Scientific, #11-875-119) with 10% (or 20% for CA46 cells) FBS (Thermo Fisher Scientific, #10-437-028) at 37° C. with 5% CO2.
Patient samples were obtained either as fresh peripheral blood or cryopreserved mononuclear cells isolated from bone marrow biopsies or peripheral blood. For fresh peripheral blood, mononuclear cells were isolated within 48 hours of blood collection utilizing a Ficoll (Thermo Fisher Scientific, #45-001-750) gradient according to manufacturer's recommendations. Isolated mononuclear cells were then resuspended in staining buffer (Biolegend, #420201) and incubated with Human TruStain FxC (10 minutes at 4° C.; Biolegend, #422302) to block Fc receptor-mediated binding. Cells were then stained with a CD34-PE-Vio770 antibody (20 minutes at 4° C.; Miltenyi Biotec, clone AC136, #130-113-180) and DAPI (Invitrogen, #D1306). The samples were then sorted for DAPI-negative, CD34-positive cells using a BD Influx cell sorter at the Weill Cornell Medicine Flow Cytometry Core.
Single Nucleus ATAC-Seq with GoT-ChA.
Nuclei isolation. Cells of interest were first subjected to nuclei isolation. Briefly, cells were resuspended with lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 0.1% Nonidet P40 Substitute, 0.01% Digitonin, 1% BSA) and incubated on ice (3 minutes for patient samples, 5 minutes for cell lines), followed by adding chilled wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween-20) and centrifuging to pellet isolated nuclei. Nuclei were then resuspended in a Tris-based buffer for subsequent tagmentation and counted using trypan blue and a Countess II FL Automated Cell Counter.
Tagmentation. For a target recovery of 10,000 cells, 16,000 nuclei were tagmented in bulk using standard Nextera-loaded Tn5 transposase on a thermocycler with the following program: 37° C. for 60 minutes, hold at 4° C.
Single-cell emulsion generation. After tagmentation, partitioning oil, uniquely barcoded gel beads, nuclei, and PCR reagent mixture containing the two locus-specific GoT-ChA primers (at a final concentration of 300 nM) were loaded onto a microfluidics chip to create a droplet-based single-cell emulsion. The two locus-specific GoT-ChA primers used are: GoT-ChA_R1N (primer with a Read 1N handle sequence: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[22 bp locus specific] (SEQ ID NO: 1)) and GoT-ChA_IS2 (locus-specific primer with an IS2 handle sequence: AGCAAGTGAGAAGCATCGTGTC-[22 bp locus specific] (SEQ ID NO: 2)). Critically, the gel beads are each coated with oligonucleotides that begin with the P5 Illumina sequencing adapter followed by a 16-nucleotide unique cell barcode and ending with a partial Read 1 Nextera sequence (AATGATACGGCGACCACCGAGATCTACACNNNNNNNNNNNNNNNNTCGTCGGCAGCGTC (SEQ ID NO: 3)).
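By way of non-limiting illustration, the layout of the bead oligonucleotide described above (P5 adapter, 16-nucleotide cell barcode, partial Read 1 Nextera sequence) can be sketched in Python. The constants are taken from SEQ ID NO: 3; the function names `build_bead_oligo` and `extract_cell_barcode` are hypothetical helpers introduced only for this sketch and are not part of the described methods.

```python
from typing import Optional

# Segments of the bead oligonucleotide as written in SEQ ID NO: 3.
P5_ADAPTER = "AATGATACGGCGACCACCGAGATCTACAC"
READ1_PARTIAL = "TCGTCGGCAGCGTC"
BARCODE_LENGTH = 16  # 16-nucleotide cell barcode


def build_bead_oligo(cell_barcode: str) -> str:
    """Assemble P5 + cell barcode + partial Read 1 (hypothetical helper)."""
    if len(cell_barcode) != BARCODE_LENGTH or set(cell_barcode) - set("ACGT"):
        raise ValueError("cell barcode must be 16 nucleotides of A/C/G/T")
    return P5_ADAPTER + cell_barcode + READ1_PARTIAL


def extract_cell_barcode(oligo: str) -> Optional[str]:
    """Return the 16-nt barcode if the oligo has the expected layout, else None."""
    expected = len(P5_ADAPTER) + BARCODE_LENGTH + len(READ1_PARTIAL)
    if len(oligo) != expected:
        return None
    if not (oligo.startswith(P5_ADAPTER) and oligo.endswith(READ1_PARTIAL)):
        return None
    start = len(P5_ADAPTER)
    return oligo[start:start + BARCODE_LENGTH]
```

The partial Read 1 segment is the 5′ portion of the Read 1N handle carried by the GoT-ChA_R1N primer (SEQ ID NO: 1), which is what allows the bead oligonucleotide to prime on both tagmented fragments and GoT-ChA amplicons.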
In-droplet cell barcoding. After generation, the single-cell emulsion was transferred to a pre-chilled PCR strip tube and placed on a thermocycler for the following program: 72° C. for 5 minutes; 98° C. for 30 seconds; 12 cycles of 98° C. for 10 seconds, 59° C. for 30 seconds, 72° C. for 1 minute; hold at 15° C.
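As a worked check of the program above, the following Python sketch sums the programmed hold times of the in-droplet barcoding PCR. It assumes ramp times between temperatures and the final 15° C. hold are excluded; the variable and function names are illustrative only.

```python
from typing import List, Tuple

Step = Tuple[float, int]  # (temperature in degrees C, hold time in seconds)

# Steps transcribed from the in-droplet cell barcoding program.
INITIAL_STEPS: List[Step] = [(72.0, 300), (98.0, 30)]
CYCLE_STEPS: List[Step] = [(98.0, 10), (59.0, 30), (72.0, 60)]
N_CYCLES = 12


def programmed_seconds(initial: List[Step], cycle: List[Step], n_cycles: int) -> int:
    """Total programmed hold time, ignoring ramping and the final hold."""
    return sum(sec for _, sec in initial) + n_cycles * sum(sec for _, sec in cycle)


total = programmed_seconds(INITIAL_STEPS, CYCLE_STEPS, N_CYCLES)
print(f"{total} s ({total / 60:.1f} min)")
```

Under these assumptions the programmed time is 300 + 30 + 12 × 100 = 1,530 seconds, or 25.5 minutes.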
Emulsion cleanup and sample allocation for library construction. After cell barcoding, the emulsion is broken and barcoded nucleic acids are cleaned twice using magnetic beads. The first cleanup step utilizes Dynabeads MyOne SILANE, while the second utilizes SPRIselect beads at a 1.2× size selection ratio of beads:sample, eluting in a total volume of 45 μL. 5 μL of sample is set aside for GoT-ChA library construction, while the remaining 40 μL are used for standard ATAC-seq library construction.
ATAC-seq library construction. The 40 μL set aside for ATAC-seq is subjected to a sample indexing PCR, utilizing a partial P5 primer (AATGATACGGCGACCACCGAGA (SEQ ID NO: 4)) and a unique indexing primer beginning with the P7 Illumina sequencing adapter followed by an 8-nucleotide sample index and ending with a partial Read 2 Nextera sequence (CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTCTCGTGGGCTCGG, “X” denotes user-defined sample index (SEQ ID NO: 5)), with the following thermocycler program: 98° C. for 45 seconds; 9 cycles (if targeting 10,000 cells; more cycles needed if a lower nuclei input was used) of 98° C. for 20 seconds, 67° C. for 30 seconds, 72° C. for 20 seconds; 72° C. for 1 minute, 4° C. hold. After the indexing PCR, the sample is subjected to a 0.4×/1.2× double-sided SPRIselect cleanup, eluting in a final volume of 20 μL.
GoT-ChA library construction: hemi-nested PCR and streptavidin-biotin pull-down. To generate the GoT-ChA library, two PCRs are performed, using the 5 μL set aside after the cell barcoding PCR cleanup step. The first PCR utilizes a hemi-nested primer design with partial P5 (binds the P5 Illumina sequencing handle: AATGATACGGCGACCACCGAGATCTACAC (SEQ ID NO: 7)) and GoT-ChA_nested (a nested, biotinylated, locus specific primer with a TruSeq Small RNA Read 2 handle: /5BiosG/CCTTGGCACCCGAGAATTCCA-[22 bp locus specific sequence, proximal to the mutation site relative to the GoT-ChA_IS2 primer binding site] (SEQ ID NO: 8)) primers with the following thermocycler program: 95° C. for 3 min; 15 cycles of 95° C. for 20 s, 65° C. for 30 s and 72° C. for 20 s; followed by 72° C. for 5 min and ending with hold at 4° C. After a 1.2× SPRIselect clean up, biotinylated PCR product is bound and isolated using Dynabeads M-280 Streptavidin magnetic beads (Thermo Fisher Scientific, #11206D). Briefly, beads are washed three times with 1× sodium chloride-sodium phosphate-EDTA buffer (SSPE, VWR, #VWRV0810-4L), added to the purified PCR product, and incubated at room temperature for 15 minutes. The beads are then washed twice with 1× SSPE buffer and once with 10 mM Tris-HCl (pH 8.0) before resuspending in water.
GoT-ChA library construction: on-bead sample indexing. The bead-bound fragments are then amplified and sample indexed using P5 and RPI-X (binds the TruSeq Small RNA Read 2 handle and adds a sample index and P7 Illumina sequencing handle: CAAGCAGAAGACGGCATACGAGATXXXXXXXXXGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA, “X” denotes user-defined sample index (SEQ ID NO: 6)) primers with the following thermocycler program: 95° C. for 3 min; 6-10 cycles of 95° C. for 20 s, 65° C. for 30 s and 72° C. for 20 s; followed by 72° C. for 5 min and ending with hold at 4° C.
Library QC and Sequencing Parameters. Final libraries were quantified using a Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, #Q32854) and a High Sensitivity DNA chip (Agilent Technologies, #5067-4626) run on a Bioanalyzer 2100 system (Agilent Technologies) and sequenced on a NovaSeq 6000 at the Weill Cornell Medicine Genomics Resources Core Facility with the following parameters: paired-end 50 cycles; Read 1N 50 cycles, i7 Index 8 cycles, i5 Index 16 cycles, Read 2N 50 cycles. ATAC libraries were sequenced to a depth of 25,000 read pairs per nucleus, and GoT-ChA libraries were sequenced to 5,000 read pairs per nucleus.
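For planning purposes, the per-nucleus depths stated above translate into total read-pair requirements by simple multiplication. The following Python sketch is illustrative only; the example of 10,000 nuclei is an assumption borrowed from the target recovery stated in the tagmentation step, and the function name is hypothetical.

```python
# Per-nucleus sequencing depths stated in the sequencing parameters above.
ATAC_READ_PAIRS_PER_NUCLEUS = 25_000
GOTCHA_READ_PAIRS_PER_NUCLEUS = 5_000


def required_read_pairs(n_nuclei: int, read_pairs_per_nucleus: int) -> int:
    """Total read pairs needed to reach the stated per-nucleus depth."""
    if n_nuclei < 0:
        raise ValueError("n_nuclei must be non-negative")
    return n_nuclei * read_pairs_per_nucleus


n_nuclei = 10_000  # assumed recovery, matching the tagmentation target
atac_total = required_read_pairs(n_nuclei, ATAC_READ_PAIRS_PER_NUCLEUS)
gotcha_total = required_read_pairs(n_nuclei, GOTCHA_READ_PAIRS_PER_NUCLEUS)
print(atac_total, gotcha_total)
```

For 10,000 recovered nuclei this corresponds to 250 million read pairs for the ATAC library and 50 million read pairs for the GoT-ChA library.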
Single nucleus ATAC-seq with GoT-ChA (Modification).
Cells were subjected to nuclei isolation according to the Nuclei Isolation for Single Cell ATAC Sequencing protocol (version CG000169 Rev D, 10× Genomics). Briefly, cells were resuspended with lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 0.1% Nonidet P40 Substitute, 0.01% Digitonin, 1% BSA) and incubated on ice (3 minutes for patient samples, 5 minutes for cell lines), followed by adding chilled wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween-20) and centrifuging to pellet isolated nuclei. Nuclei were then resuspended in 1× diluted nuclei buffer (10× Genomics) and counted using trypan blue and a Countess II FL Automated Cell Counter.
Nuclei were subsequently processed according to the Chromium Next GEM Single Cell ATAC Solution user guide (version CG000209 Rev F, 10× Genomics) with the following modifications:
Final libraries were quantified using a Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, #Q32854) and a High Sensitivity DNA chip (Agilent Technologies, #5067-4626) run on a Bioanalyzer 2100 system (Agilent Technologies) and sequenced on a NovaSeq 6000 at the Weill Cornell Medicine Genomics Resources Core Facility with the following parameters: paired-end 50 cycles; Read 1N 50 cycles, i7 Index 8 cycles, i5 Index 16 cycles, Read 2N 50 cycles. ATAC libraries were sequenced to a depth of 25,000 read pairs per nucleus, and GoT-ChA libraries were sequenced to 5,000 read pairs per nucleus.
Raw sequencing data for GoT-ChA libraries were demultiplexed using cellranger-ATAC mkfastq. The GoT-ChA sequencing reads were then input into a series of custom pre-processing functions designed to result in a genotype-per-cell output.
First, the “FastqSplit” function takes input FASTQ files and splits them into smaller files, each containing a user-defined number of reads (n), for later parallelized processing. Next, “FastqFiltering” runs on each newly generated split FASTQ file and identifies read pairs that do not pass a set of user-defined parameters for base quality filtering. Default usage was designed to identify poor quality bases at or surrounding an SNV site, though the function includes parameters to easily adjust for filtering of all base pairs in paired read sequences or of only a single read for each pair. “BatchMutationCalling” is then run, which first identifies whether a read contains the expected sequence of the nested primer used during library construction. If so, then each read's paired cell barcode is matched to a provided whitelist, within a Hamming distance of two. All reads that pass these two criteria are then assessed for whether the read contains a specified wildtype or mutant sequence at a designated position. This process is performed for each read in each split and filtered FASTQ file via parallelized processing through a Slurm workload manager, ultimately outputting a cell barcode by genotyped reads matrix for each FASTQ file processed. Finally, “MergeMutationCalling” takes all matrices generated from each split FASTQ file and merges them together, grouping by cell barcode and summarizing the counts of reads that were identified to be wildtype, mutant, or neither.
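The calling and merging logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the actual “BatchMutationCalling” or “MergeMutationCalling” code: it assumes the nested primer sits at the start of the read, that the wildtype and mutant sequences have equal length, that a Hamming distance of two or fewer counts as a match, and that a barcode with more than one candidate whitelist match is discarded. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(barcode: str, whitelist: Set[str], max_dist: int = 2) -> Optional[str]:
    """Return the whitelist barcode within max_dist; None if absent or ambiguous."""
    if barcode in whitelist:
        return barcode
    hits = [w for w in whitelist
            if len(w) == len(barcode) and hamming(w, barcode) <= max_dist]
    return hits[0] if len(hits) == 1 else None  # assumption: drop ambiguous matches


def call_read(read: str, primer: str, site: int, wt: str, mut: str) -> Optional[str]:
    """Classify a read as WT, MUT, or NA; None if the primer is not found."""
    if not read.startswith(primer):  # assumption: primer at the read start
        return None
    window = read[site:site + len(wt)]
    if window == wt:
        return "WT"
    if window == mut:
        return "MUT"
    return "NA"


def genotype_reads(pairs: Iterable[Tuple[str, str]], whitelist: Set[str],
                   primer: str, site: int, wt: str, mut: str) -> Dict[str, Counter]:
    """Per-cell counts of WT/MUT/NA reads for one split FASTQ file."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for barcode, read in pairs:
        call = call_read(read, primer, site, wt, mut)
        cell = match_barcode(barcode, whitelist) if call is not None else None
        if cell is not None:
            counts[cell][call] += 1
    return counts


def merge_counts(chunks: List[Dict[str, Counter]]) -> Dict[str, Counter]:
    """Sum per-cell counts across all split files, grouped by cell barcode."""
    merged: Dict[str, Counter] = defaultdict(Counter)
    for chunk in chunks:
        for cell, c in chunk.items():
            merged[cell].update(c)
    return merged
```

Because each split file yields an independent per-cell count table, the per-file step can run in parallel and the merge reduces to summing counters keyed by cell barcode.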
The summarized genotyping data must then be integrated with the chromatin accessibility information via shared cell barcodes. Further, the genotyping data must be corrected for background noise. These two steps are achieved via a single function, “AddGenotypingArchR”, that is compatible with the ArchR scATAC-seq pipeline. This function separates genotyping data from barcodes that match true cells present in the ATAC dataset from barcodes that are deemed “non-cells” from empty droplets. The signal for each genotype is quantified in empty droplets and, using a percentile of that noise, is corrected from genotyping signals in true cells. Next, both the raw and noise-corrected genotyping data are added to the ArchR project's metadata by matching shared cell barcodes. Finally, the “FilterGenotyping” function is used to implement a minimum number of reads that is required for high confidence genotyping calls.
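A minimal Python sketch of the background correction and read-count filter follows. It is not the “AddGenotypingArchR” or “FilterGenotyping” implementation; it assumes that the correction subtracts a per-genotype percentile of the empty-droplet counts and floors at zero, and the 99th percentile, the nearest-rank percentile definition, and the minimum of three reads are placeholder values chosen only for illustration.

```python
import math
from typing import Dict, List, Set

GENOTYPES = ("WT", "MUT")


def percentile(values: List[int], pct: float) -> int:
    """Nearest-rank percentile of a list of counts (0 for an empty list)."""
    if not values:
        return 0
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def noise_correct(counts: Dict[str, Dict[str, int]], cell_barcodes: Set[str],
                  pct: float = 99.0) -> Dict[str, Dict[str, int]]:
    """Subtract empty-droplet noise (assumed: percentile, floored at zero)."""
    background = [c for bc, c in counts.items() if bc not in cell_barcodes]
    noise = {g: percentile([c.get(g, 0) for c in background], pct) for g in GENOTYPES}
    return {
        bc: {g: max(0, c.get(g, 0) - noise[g]) for g in GENOTYPES}
        for bc, c in counts.items() if bc in cell_barcodes
    }


def filter_genotyping(corrected: Dict[str, Dict[str, int]],
                      min_reads: int = 3) -> Dict[str, Dict[str, int]]:
    """Keep cells with at least min_reads corrected genotyping reads."""
    return {bc: c for bc, c in corrected.items()
            if c["WT"] + c["MUT"] >= min_reads}
```

Estimating noise separately for the wildtype and mutant signals matters because background from the genotyping library need not be symmetric between the two alleles.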
Purified antibodies were covalently conjugated to oligonucleotides containing unique barcodes for use in ASAP-seq experiments, as previously described (van Buggenum et al., Sci Rep 6, 22675, 2016). Briefly, custom oligonucleotides with a sequence designed to mimic that of TotalSeq-B (Biolegend) that also contain a 5′ amino modifier are labelled with a trans-cyclooctene (TCO) polyethylene glycol (PEG, 4 units) N-hydroxysuccinimide (NHS) linker. Separately, purified antibodies (without any carrier proteins such as BSA or gelatin) against desired intracellular proteins are labelled with a methyltetrazine (mTz)-PEG4-NHS linker. Modified oligos and antibodies were then conjugated at 4° C. overnight, after which TCO-PEG4-glycine was used to quench residual tetrazine reaction sites on the antibodies. Conjugates were then pooled if desired and remaining unbound oligonucleotides were removed with ammonium sulfate precipitation and PBS washes using 50 kDa Amicon filters (Millipore Sigma, #UFC505008).
Single Cell ASAP-Seq with GoT-ChA.
Samples were processed in a similar fashion to that described previously for standard scATAC-seq, with a few key differences as noted by the original authors (Mimitou et al., Nature Biotechnology, 2021, 39, pages 1246-1258) and additional minor modifications for incorporation of GoT-ChA:
Final libraries were quantified using a Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, #Q32854) and a High Sensitivity DNA chip (Agilent Technologies, #5067-4626) run on a Bioanalyzer 2100 system (Agilent Technologies) and sequenced on a NovaSeq 6000 at the Weill Cornell Medicine Genomics Resources Core Facility with the following parameters: paired-end 50 cycles; Read 1N 50 cycles, i7 Index 8 cycles, i5 Index 16 cycles, Read 2N 50 cycles. ATAC libraries were sequenced to a depth of 25,000 read pairs per cell, and both GoT-ChA and any protein tag libraries were sequenced to 5,000 read pairs per cell.
To link genotypes to chromatin accessibility profiles at a single-cell resolution, we developed GoT-ChA by modifying the 10× Genomics single cell ATAC-seq (scATAC-seq) platform (
To test the ability of GoT-ChA to accurately capture genotype along with chromatin accessibility information we performed cell line mixing studies. Two human cell lines of discrete cell types (CA46, a Burkitt's lymphoma cell line, and HEL, an erythroleukemic cell line) and homozygous genotypes (
Thus, GoT-ChA allows for high-throughput simultaneous capture of genotypes and chromatin accessibility profiles in single cells, with high accuracy and cell recovery independent of expression level and genomic localization of the targeted region.
To integrate genotyping into scATAC-seq, we modified the broadly utilized 10× Genomics platform by adding two custom primers (GoT-ChA primers) to the cell barcoding PCR reaction mixture prior to loading the microfluidics chip for droplet generation (
To test the ability of GoT-ChA to accurately capture genotype along with chromatin accessibility information, we performed a cell line mixing study. Two human cell lines of discrete cell types (CA46, a B lymphocyte cell line; and HEL, an erythroblast cell line) and differing genotypes for the TP53R248 locus (CA46, TP53R248Q homozygous mutant; HEL, TP53R248 homozygous wildtype;
Analysis of the matching unprocessed genotyping data showed the presence of two distinct modes in the distribution of genotyping reads per cell. We reasoned that these distributions reflect cells for which genotyping was successfully captured versus cells displaying background noise from the genotyping library. To address this aspect, we developed a computational framework (
We further tested GoT-ChA targeting a different genomic locus, the JAK2V617 hotspot, with a separate mixing study to directly address heterozygous genotyping. Three human cell lines (HEL, an erythroblast cell line; SET-2, a megakaryoblast cell line; CCRF-CEM, a T lymphoblast cell line) with discrete JAK2V617 genotypes (HEL, JAK2V617F homozygous mutant; SET-2, JAK2V617F heterozygous; CCRF-CEM, JAK2V617 homozygous wildtype;
Altogether, these data demonstrate that GoT-ChA allows for high-throughput simultaneous capture of genotypes and chromatin accessibility profiles in single cells, with high accuracy and cell recovery independent of expression level and genomic localization of the targeted region.
The JAK2V617F mutation has a central role in the pathogenesis of myeloproliferative neoplasms. We sought to explore how this mutation disrupts the regulatory chromatin landscape that determines cell-fate decisions of HSPCs. To address this question, we applied GoT-ChA to CD34+ sorted progenitor cells from seven patients with JAK2V617F-mutated MF with no additional mutations, who had either not been treated with JAK inhibition or were being treated with ruxolitinib, a JAK1/2 inhibitor, or fedratinib, a JAK2-specific inhibitor, at the time of sample collection. The quality of the scATAC-seq data was not affected by removal of a small portion of the sample for GoT-ChA genotyping library construction (
Analysis of the GoT-ChA genotyping data (
Projection of genotypes onto the cell differentiation map demonstrated intermingling of JAK2 wildtype and mutant cells throughout HSPCs and myeloid progenitor clusters, while the common lymphoid progenitors (CLPs) and B cell clusters were mainly comprised of wildtype cells (
Nonetheless, projection of genotype densities onto the differentiation map suggested an uneven distribution of mutant cells across cell subtypes (
Inflammatory disruption of the bone marrow microenvironment has been extensively documented in myelofibrosis. Indeed, secretion of inflammatory cytokines is a central feature of MPN pathophysiology and has been shown to provide a supportive niche for the expansion of mutant clones in various disease states, including MPNs and clonal hematopoiesis. However, defining how wildtype and JAK2V617F mutant early progenitors differ regarding cell-intrinsic epigenetic profiles in human myelofibrosis remains unknown due to the inability to directly compare mutated and wildtype cells within primary human samples.
To delineate the effects of JAK2V617F on chromatin accessibility, we first compared gene accessibility scores (as the number of cells profiled varied between samples, a linear mixed model (LMM) was used with patient sample identity explicitly modeled as a random effect to account for inter-patient variability, followed by a likelihood ratio test; see materials and methods) between wildtype and JAK2V617F-mutated cells within the HSPC1 cluster (
Consistently, gene pathway analysis revealed an enrichment in the inflammatory response pathway in mutant HSPC1 cells (Family-wise error rate [FWER]=0.1031; normalized enrichment score [NES]=1.375; Hallmark pathway M5932;
To further explore the regulatory underpinning of inflammatory phenotypes in HSPCs, we leveraged chromatin accessibility to infer transcription factor activity based on the accessibility of their DNA binding motifs (see materials and methods). Comparing wildtype and JAK2V617F-mutated cells within the HSPC1 cluster, we uncovered a subset of transcription factors that show increased motif accessibility (false discovery rate [FDR]<0.05 and Δz-score>0.25) in early mutant HSPCs (
By leveraging longitudinal sampling obtained from an untreated patient with PV that later progressed to MF (Pt-01), we explored whether the pro-inflammatory phenotype as measured by JUN and NFKB1 motif accessibility preceded the increase in bone marrow fibrosis observed in MF. Indeed, we observed that both JUN and NFKB1 motif accessibilities were increased in JAK2V617F mutant early HSPCs already at the PV stage (
Transcription factors involved in myeloid/erythroid differentiation (LYL1, SNAI1, TCF4, and MESP1) also exhibited increased accessibility of their DNA-binding motifs in mutant cells (
To uncover further changes in transcription factor activity within later HSPCs, we performed differential transcription factor motif accessibility between wildtype and mutant cells within the HSPC2 cluster (
Consistent with mutant cells in the HSPC1 cluster, HSPC2 mutant cells also exhibited a decrease in the accessibility of transcription factor motifs implicated in stem cell quiescence (NFYA/B/C, YY1, YY2, GATA2 and HLF;
To link the activation of NF-κB signaling to canonical JAK2 downstream targets, we correlated NF-κB-related transcription factor motif accessibility with members of the STAT family transcription factor motifs in HSPCs. We found increased correlation of STAT1, STAT3, STAT5A, and STAT5B motif accessibility with NF-κB factor motif accessibility, consistent with JAK2 activation of canonical STAT targets (
To validate the inflammatory and myeloid/erythroid signatures observed in mutant HSPCs, we leveraged bulk RNA-seq data of Lin−, Sca-1+, c-Kit+ (LSK) progenitor cells from a novel Dre-rox, Cre-lox dual recombinase Jak2V617F mouse model that allows for sequential knock-in followed by knock-out of the mutated allele (Jak2Rox/Lox/Jak2RL;
Collectively, gene accessibility and transcription factor motif accessibility comparisons revealed a subset of early HSPCs displaying cell-intrinsic proinflammatory phenotypes, as well as erythroid lineage priming in JAK2V617F mutant versus wildtype human HSPCs.
We next sought to define the epigenetic changes in the erythroid and megakaryocytic progenitors, the cell types undergoing significant clonal expansion of JAK2V617F-mutated cells (
Of note, BCL11A was one of the top transcription factors noted to have decreased motif accessibility in mutated erythroid progenitors (FDR=0.0032, Δz-score=−2.84;
A key clinical feature of myelofibrosis is a dramatic increase in megakaryocytes, which are thought to be one of the main cellular drivers of marrow fibrosis via pro-fibrotic cytokine and growth factor signaling. Differential transcription factor motif accessibility in MkPs revealed an increased activity of JUN and FOS family proteins (
To expand the reach of GoT-ChA for multimodality single-cell sequencing, we integrated it with ASAP-seq, a method that assays genome-wide chromatin accessibility simultaneously with targeted protein expression utilizing fixed whole cells (rather than nuclei in standard scATAC-seq). We applied the combined method to two MF patient samples, Pt-02 (untreated) and Pt-06 (ruxolitinib treated). Genotyping of the JAK2V617 locus was available for 2,663 out of 11,457 (23.2%) and for 489 out of 2,928 (16.7%) cells for Pt-02 and Pt-06, respectively (
To leverage the clonal phasing of mitochondrial variants and JAK2V617F, we developed a random forest classifier to impute missing JAK2 genotypes based on heteroplasmy levels. We trained the classifier on a random sampling of 90% of Pt-02 genotyped cells (training set) and assessed performance with the remaining 10% of genotyped cells (test set), resulting in a genotyping accuracy of 94.7% (
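The hold-out evaluation described above (90% training, 10% test) can be illustrated with the following Python sketch. It does not implement a random forest: a single-threshold rule on the heteroplasmy of one hypothetical mitochondrial variant stands in for the classifier, on the assumption (made only for this example) that mutant cells carry the variant at higher heteroplasmy. The fixed random seed and all function names are likewise illustrative.

```python
import random
from typing import List, Sequence, Tuple

Cell = Tuple[float, str]  # (heteroplasmy of one variant, genotype "WT" or "MUT")


def split_train_test(cells: Sequence[Cell], train_fraction: float = 0.9,
                     seed: int = 0) -> Tuple[List[Cell], List[Cell]]:
    """Randomly split genotyped cells into training and test sets."""
    shuffled = list(cells)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train: Sequence[Cell]) -> float:
    """Stand-in classifier: midpoint between mean WT and mean MUT heteroplasmy."""
    mut = [h for h, label in train if label == "MUT"]
    wt = [h for h, label in train if label == "WT"]
    if not mut or not wt:
        raise ValueError("training set must contain both genotypes")
    return (sum(mut) / len(mut) + sum(wt) / len(wt)) / 2


def predict(threshold: float, heteroplasmy: float) -> str:
    """Assumes mutant cells have higher heteroplasmy for this variant."""
    return "MUT" if heteroplasmy >= threshold else "WT"


def accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of held-out cells whose imputed genotype matches the call."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

Holding out genotyped cells that the classifier never sees during training is what allows the reported accuracy to be read as an estimate of imputation performance on ungenotyped cells.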
Cell clustering of the Pt-02 sample based on chromatin accessibility alone, and therefore agnostic to genotyping, protein, or mitochondrial data, resulted in the expected cell clusters (
We further leveraged GoT-ChA-ASAP for simultaneous measurement of protein expression, applying a panel of 21 cell surface protein markers to orthogonally validate cluster assignments (see materials and methods). Progenitor clusters HSPC1, HSPC2, and MPP showed increased CD34 and decreased CD38 staining, while MkPs showed increased CD41 and CD36, EPs showed high CD71, lymphoid clusters showed high CD99 staining, and T-cells showed high CD7 levels (
Overall, these results demonstrate the capability of GoT-ChA to deliver a highly multi-modal single-cell platform to link mutated genotypes with mitochondrial variants and cell surface proteins, together with chromatin accessibility, enabling discovery of clonal changes across multiple layers of information in a single, high-throughput, unified assay.
To overcome the limitations of existing technology, we adapted the droplet-based scRNA-seq platform (10× Genomics) to enable capture of somatic mutations in single cells. After generation of the barcoded cDNA library, we amplify the locus of interest, and integrate genotyping with scRNA-seq data via shared cell barcodes, demonstrating precision genotyping via species mixing studies (
We studied CD34+ cells from patients with CALR-mutated essential thrombocythemia. Mapping of mutant and wildtype cells across the differentiation topology demonstrated that they co-mingle throughout development, consistent with mutations arising in stem cells to populate the entire differentiation tree (
As ongoing clonal evolution results in multi-clonal populations, we require genotyping of multiple mutations in parallel. To test this ability, we targeted 3 mutations in CD34+ cells from a patient with myelofibrosis (
To demonstrate the unique discovery potential of single-cell multi-omics in the context of clonal mosaicism in normal tissues, we applied it to CM driven by DNMT3AR882 mutations (
To determine how aberrant DNA methylation may serve as a link between mutations in the de novo methylase DNMT3A and observed transcriptional dysregulation, we performed single-cell multi-omics that simultaneously profiles the cells' methylome and transcriptome, linked with mutation status (
Nanobody-tethered transposition may be performed as detailed at world wide web at biorxiv.org/content/10.1101/2022.03.08.483436v1
Profiling chromatin status at single-cell resolution has an enormous impact on functional epigenomic characterization in a variety of biological contexts. However, available protocols typically measure only a single histone modification profile in a single cell. Recently, the development of Nanobody-tethered transposition followed by sequencing (NTT-seq) allowed for multiplexed measurements of histone post-translational modifications and DNA binding proteins. Thus, up to three epigenomic features can be jointly measured in the same cell.
Since NTT-seq is based on the 10× Genomics single cell ATAC-seq platform, it would be compatible with the genotyping capture approach used in GoT-ChA. In this manner, multiple epigenomic features can be measured simultaneously with genotyping, in the same single cell.
Relation to the main protocol: adapting the genotyping step from the GoT-ChA protocol into the NTT-seq protocol is straightforward (
The proposed modification adds an additional and unique feature to our genotyping method, as epigenomic measurements can now be extended beyond chromatin accessibility profiles to specific post-translational histone modifications and DNA binding proteins. While not essential to practicing the invention, it uniquely allows one to uncover how somatic mutations disrupt the specific epigenetic features targeted in the context of clonal mosaicism and cancer.
To capture multiple epigenetic modifications simultaneously with genotypes, we developed GoT-NTT-seq. After nuclei isolation, the nuclei are stained with primary antibodies against the targets of interest. Critically, primary antibodies must be from different species or immunoglobulin isotypes to allow multiplexing. Next, stained nuclei are incubated with NTT-bodies directing transposition events towards regions in which the primary targets are bound. Importantly, each NTT-body contains a specific barcoded adaptor allowing for downstream de-multiplexing of the targeted sequences. After transposition, nuclei are flowed into the 10× Chromium solution machine, together with locus-specific genotyping primers (GoT primers). Importantly, as in GoT-ChA, the GoT primers contain a partial overlap with the read 1 sequence, allowing capture of the genotyping fragments by the gel beads containing the unique cellular barcodes. Once the fragments are generated, downstream library preparation and sequencing are performed (
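The de-multiplexing step described above can be illustrated with a minimal sketch: each sequenced fragment carries a cell barcode and the adaptor barcode of the NTT-body that generated it, and fragments are assigned to a target by adaptor barcode. The adaptor sequences, cell barcodes, and function name below are hypothetical, since the actual barcode sequences are not given in the text.

```python
from collections import Counter

# Hypothetical adaptor barcodes; the real NTT-body barcode sequences are not specified here.
ADAPTOR_TO_TARGET = {"AACC": "H3K27me3", "GGTT": "S2S5-PolII"}

def demultiplex(fragments):
    """Count fragments per (cell barcode, target); unknown adaptors are 'unassigned'."""
    counts = Counter()
    for cell_barcode, adaptor_barcode in fragments:
        target = ADAPTOR_TO_TARGET.get(adaptor_barcode, "unassigned")
        counts[(cell_barcode, target)] += 1
    return counts

# Toy fragments as (cell barcode, adaptor barcode) pairs.
fragments = [
    ("CELL1", "AACC"), ("CELL1", "AACC"), ("CELL1", "GGTT"),
    ("CELL2", "GGTT"), ("CELL2", "TTTT"),
]
per_cell = demultiplex(fragments)
for (cell, target), n in sorted(per_cell.items()):
    print(cell, target, n)
```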
To test the feasibility of GoT-NTT-seq, we performed a cell mixing study targeting the TP53 locus. We first generated a 1:1 mix of HEL (erythroblastic cell line homozygous wild type for TP53-R248) and CA46 (lymphoblastic cell line homozygous mutant for TP53-R248). To verify our multiplexing capabilities, we targeted both phosphorylated RNA polymerase II (S2S5-PolII) and the repressive histone modification H3K27me3. Quality control of the GoT-NTT-seq data showed that inclusion of the genotyping approach was not detrimental (
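In a mixing study of two homozygous cell lines, each cell's genotype can be called from the wild-type and mutant read counts at the targeted locus. The following is a minimal sketch of such a call, assuming a simple allele-fraction rule; the thresholds (`min_reads`, `purity`) and the read counts are illustrative assumptions, not the parameters used in the study.

```python
def call_genotype(wt_reads, mut_reads, min_reads=3, purity=0.9):
    """Call a cell WT, MUT, mixed, or NA from allele-specific read counts.

    min_reads and purity are hypothetical thresholds chosen for illustration.
    """
    total = wt_reads + mut_reads
    if total < min_reads:
        return "NA"  # too few genotyping reads to make a call
    mut_fraction = mut_reads / total
    if mut_fraction >= purity:
        return "MUT"
    if mut_fraction <= 1 - purity:
        return "WT"
    return "mixed"  # in a homozygous mix, suggests a doublet or contamination

# Toy (wild-type reads, mutant reads) per cell.
cells = {"HEL_like": (12, 0), "CA46_like": (0, 9), "doublet_like": (5, 6), "low_cov": (1, 0)}
calls = {name: call_genotype(wt, mut) for name, (wt, mut) in cells.items()}
print(calls)
```

For heterozygous loci in patient samples, an intermediate allele fraction is expected, so the "mixed" category would need a different interpretation than in this cell line setting.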
Linking cell genotypes with cell phenotypes (epigenetic, transcriptional and protein profiling) dramatically empowers our ability to study somatic evolution in normal tissues. Specifically, we develop:
Current state-of-the-art technology is limited in its ability to connect genotypes with other ‘-omics’ profiles at the single-cell level. Methods that integrate DNA genotyping with RNA expression in single cells have been developed. For example, G&T-seq can capture single-cell genomic and whole transcriptomic data by extracting and separating gDNA and mRNA for amplification and sequencing. scRNA-seq data has also been used to identify single-nucleotide variant (SNV) drivers via the Smart-seq2 protocol, which captures full-length transcripts. TARGET-seq uses sensitive genotyping through targeted amplification of both genomic DNA (gDNA) and complementary DNA (cDNA) to enable integration of clonal structure with transcriptomic states. Finally, fluidics methods for single-cell genotyping have recently been coupled with oligo-barcoded antibodies, although these methods are limited to a small number of cell surface markers. However, collectively, these technologies share similar limitations such as low-throughput or low efficiency of somatic genotyping due to sparse coverage, and may therefore be insufficient to fully map clonal expansions in tumorigenesis and therapeutic escape. This highlights the critical need for high-throughput methods to integrate single-cell genotyping with regulatory, transcriptional and protein data.
Rationale. Epigenetic rewiring determines lineage commitment and stem cell identity. In particular, chromatin accessibility states have been shown to precede differentiation states via ‘priming’ of cells toward specific lineages. Therefore, epigenetic encoding likely plays a significant role in the interplay between the impact of somatic mutations and the cell identity leading to clonal outgrowth. To directly link clonal genotypes and chromatin accessibility, we develop GoT-ChA, adapting 10× snATAC to include gDNA-based genotyping.
Approach. To determine the regulatory networks that govern the impact of somatic mutations, we have developed a prototype single-cell chromatin accessibility assay that integrates somatic mutation genotyping (GoT-ChA,
Our preliminary application of GoT-ChA to JAK2 mutated myelofibrosis CD34+ cells showed effective genotyping together with cell identity identification using chromatin accessibility profiles (
To develop and test the ability of GoT-ChA to target mutations across genes of interest, we perform GoT-ChA on admixtures of cell lines with known variants across the genes, cluster and identify cell lines via gene accessibility patterns (see
Our preliminary experience with the GoT-ChA application demonstrated robust genotyping across TP53 mutations as well as a JAK2 mutation (>50% genotyping efficiency, >98% precision), as well as the ability to capture multiple alleles in a heterozygously mutated locus. Notably, performance was maintained between cell line admixtures and clinical samples in terms of genotyping efficiency. The analysis of clinical samples also confirmed efficient genotyping across cell types without a strong dependence on chromatin accessibility, as anticipated from the method's design. These data importantly validate the ability of cell line admixtures proposed in the UG3 phase to serve as a robust tool for technology development.
Lastly, GoT-ChA is expanded to enable substantial multiplexed genotyping (20-30 loci total across the 10 recurrently mutated genes), particularly as many tissues exhibit complex clonal structures involving multiple concurrent mutations. Primer design is optimized by testing multiple primers for each locus to assess the impact of sequence specificity, distance from the targeted locus and the impact of local chromatin structure. These data aid in generating a primer design interface to ensure that GoT-ChA will be compatible with production-level scale. Our preliminary data support multiplexing for up to 3 targets (
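With cell line admixtures, the known genotype of each line provides a ground truth against which genotyping can be scored. The sketch below shows one way to compute the two metrics mentioned above, assuming that efficiency is the fraction of cells receiving a call and precision is the fraction of calls matching the known genotype; these definitions and the toy data are assumptions, not taken from the study.

```python
def efficiency_and_precision(truth, calls):
    """truth: barcode -> known genotype; calls: barcode -> called genotype (absent = no call)."""
    genotyped = [bc for bc in truth if bc in calls]
    correct = sum(calls[bc] == truth[bc] for bc in genotyped)
    efficiency = len(genotyped) / len(truth) if truth else float("nan")
    precision = correct / len(genotyped) if genotyped else float("nan")
    return efficiency, precision

# Toy admixture: known genotype per cell (e.g., assigned from accessibility-based clustering).
truth = {"c1": "WT", "c2": "WT", "c3": "WT", "c4": "WT",
         "c5": "MUT", "c6": "MUT", "c7": "MUT", "c8": "MUT"}
# Genotype calls for the subset of cells with sufficient genotyping reads; c6 is miscalled.
calls = {"c1": "WT", "c3": "WT", "c5": "MUT", "c6": "WT", "c8": "MUT"}

efficiency, precision = efficiency_and_precision(truth, calls)
print(f"efficiency: {efficiency:.3f}, precision: {precision:.2f}")
```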
Rationale. Histone modifications act in a concerted fashion to regulate chromatin states and transcription. While accessibility measurements from ATAC-seq are informative, the diversity and richness of chromatin states are best captured through the combinatorial binding patterns of multiple histone modifications. These assays can reveal the presence of poised and active enhancers, promoters, repressed, and heterochromatic regions. Profiling multiple chromatin marks is crucial for thorough chromatin state characterization in heterogeneous tissues. Up to now, several high throughput chromatin factor-profiling assays have been developed. Cleavage Under Targets and Tagmentation (CUT&Tag) has most successfully been adapted to single-cell resolution assays. Briefly, CUT&Tag generates a library from DNA fragments colocalizing with the chromatin factor of interest by fragmenting out DNA surrounding antibody-bound chromatin factor. This process is facilitated by pA-Tn5 fusion protein, where protein A (pA) has an affinity to antibodies and Tn5 is a transposase that fragments DNA and tags fragments with sequencing-ready adapters. Given our ability to add genotyping to snATAC-seq, integration with additional epigenetic modification profiling expands our ability to understand the impact of somatic mutations on the epigenome.
Approach. Available protocols typically measure only a single histone modification profile in a single cell. To fill this technological gap in the field, our group has adapted the CUT&Tag method into a multiplexable assay, where the protein A-Tn5 construct is replaced by selective Nanobody-tethered Tn5 (NTT-body) fusion proteins (
The genotyping efficiency of GoT-EpiM is optimized. The genotyping efficiency is increased through: i) producing high quality barcoded recombinant NTT-bodies, minimizing the amount of free sequencing adaptors that can interfere with the genotyping reaction; ii) titering the amount of genotyping primers to improve the efficiency of the reaction; and iii) prolonging the denaturation and hybridization steps in PCR to increase PCR fidelity in a microdroplet, as has been done previously. This approach results in a higher percentage of cells for which we can generate high-confidence genotyping data.
Rationale. Our work using genotyping of transcriptomes (see
Approach. To accomplish this goal, we leverage recent advances that report the ability to perform tagmentation on RNA/DNA hybrid molecules, including RNA/cDNA hybrid molecules that arise from reverse transcription. Importantly, recent work (ISSAAC-seq) demonstrates the ability to perform this tagmentation in situ and in single cells without performing cellular lysis, allowing tagmentation to take place without physically separating a cell's RNA and chromatin molecules. Reverse transcription is performed in situ, which creates an RNA/cDNA hybrid. When performing this workflow, the 10× ATAC-seq bead can capture tagged chromatin fragments and cDNA molecules using the capture sites that are introduced during tagmentation. The 10× protocol with integrated GoT-ChA adds a droplet barcode, which can uniquely identify the cell-of-origin for each modality after computational demultiplexing, allowing simultaneous genotype, chromatin accessibility and transcriptome profiling.
To test the feasibility of this approach, we performed a pilot species-mixing experiment where we simultaneously profiled chromatin accessibility and mRNA expression using the ISSAAC-seq protocol, followed by shallow (MiSeq) sequencing. Using a single lane of the 10× snATAC-seq kit, we successfully recovered paired modality profiles for 8,257 cells. Importantly, both modalities exhibited a high degree of species-specificity, and species classifications were fully concordant across paired RNA and ATAC profiles (
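Species specificity and cross-modality concordance in a species-mixing experiment can be assessed per cell barcode, as sketched below. The text does not name the two species or the thresholds used, so the human/mouse labels, the 0.9 cutoff, and the counts are all assumptions made for illustration.

```python
def classify_species(human_reads, mouse_reads, min_fraction=0.9):
    """Assign a species from read counts; min_fraction is a hypothetical purity cutoff."""
    total = human_reads + mouse_reads
    if total == 0:
        return "empty"
    human_fraction = human_reads / total
    if human_fraction >= min_fraction:
        return "human"
    if human_fraction <= 1 - min_fraction:
        return "mouse"
    return "mixed"

def concordance(paired_counts):
    """Fraction of barcodes whose ATAC and RNA profiles receive the same species label."""
    agree = sum(classify_species(*atac) == classify_species(*rna)
                for atac, rna in paired_counts.values())
    return agree / len(paired_counts)

# Toy data: barcode -> ((human, mouse) ATAC fragments, (human, mouse) RNA counts).
paired_counts = {
    "c1": ((950, 20), (400, 5)),
    "c2": ((10, 800), (3, 350)),
    "c3": ((500, 480), (200, 210)),   # mixed in both modalities, e.g. a doublet
    "c4": ((900, 30), (100, 100)),    # discordant between modalities
}
score = concordance(paired_counts)
print(f"concordant barcodes: {score:.2f}")
```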
Rationale. Integrating protein measurements into multimodal single-cell assays is widely adopted in the biological community (e.g., CITE-seq). Protein measurements are robust, particularly given their large copy number per cell, and we have shown that these data are highly complementary to scRNA-seq and informative for discovering cell types, states, and developmental trajectories. We introduced ASAP-seq, an assay that marks the location of DNA-protein interactions while simultaneously quantifying cell surface protein levels based on a panel of barcoded antibodies (antibody-derived tags; ADT). While we have shown that this approach is compatible with GoT-ChA and allows somatic genotypes to be connected with protein expression, this has been limited to cell surface markers. Given the use of extracted nuclei from frozen solid tissues, we require a technology that can connect genotypes with intra-nuclear protein capture (such as transcription factors) to highlight regulators whose abundance varies across somatic genotypes.
Approach. We have developed a single-cell profiling assay that simultaneously measures chromatin accessibility with intranuclear proteins (i.e. transcription factors), and intracellular proteins (i.e. phosphorylation-dependent signaling regulators). We use single-stranded binding protein to mask the negative charge of ADTs of intracellular proteins, substantially reducing background signal. In preliminary experiments, we profiled in vitro human brain organoids, generating high-quality chromatin accessibility measurements (7,095 fragments/cell), paired with measurements of 64 intracellular/nuclear proteins (
All publications and patents mentioned herein are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.
While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification and the claims below. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
This application is a U.S. National Stage Application of PCT/US22/52701, filed Dec. 13, 2022, which claims the benefit of priority to U.S. Provisional Patent Application Ser. No. 63/288,874, filed Dec. 13, 2021. The entire contents of these applications are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/052701 | 12/13/2022 | WO | |

Number | Date | Country |
---|---|---|
63288874 | Dec 2021 | US |