This application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. The ASCII copy, created on Aug. 30, 2022, is named FrudakisSeqListUpdate-ascii and is 63,612 bytes in size.
The present disclosure generally relates to next generation sequencing (NGS), including single-cell NGS (SCNGS), and more particularly to enhancements to single cell or nucleus next generation sequencing for improving throughput and reducing costs.
Deoxyribonucleic acid (DNA) evidence from a crime scene is almost always derived from a mixture of donors and, as a result, forensic scientists often produce “average” results of the crime scene evidence. For example, if two individuals handled a firearm or other weapon before it was used in a homicide by a third individual, the analyst gets a composite of all three (3) profiles mixed together in what is referred to as a “mixture.” Mixtures are often more complex and involve more than just three contributors. When the number of contributors increases and mixture proportions are relatively even, as is often the case, the mixtures cannot be resolved. As a result, the evidence from which they came is often uninformative to the case in court.
To illustrate by example, if there are three contributors, do the 11 and 13 alleles go together for one contributor, with the 15 and 15.3 to another, leaving the 16 as a homozygous third? Or does the 13 pair with 15 with some peak height imbalance, leaving the 11, 16 pairing and a 16 homozygote? Or are there four contributors, with four homozygotes and one heterozygote? Note that the problem is even more difficult for the “D2S1336” locus in
It is possible to combine them all into a single mixed (e.g., composite) profile and compare suspect standards with this mixed profile, but the statistics of such comparisons are weak due to the uncertainty involved. This weakness creates significant downstream problems when searching for unknown contributors in national databases, such as the Combined DNA Index System (CODIS), which is the United States' national DNA database created and maintained by the Federal Bureau of Investigation (FBI).
To further explain in relation to
In response to this problem, statistical geneticists have developed new software tools for Probabilistic Genotyping (PG) that alleviate the burden of manual de-convolution of alleles at each genetic position. These tools can consider more sophisticated metrics in a more standardized way. They are reliable in disentangling simpler mixtures of uneven proportions, but when a mixture involves too many contributors, and/or when the contributors are evenly represented (e.g., 0.3, 0.28, 0.32 in a 3-person mixture), which can occur even in otherwise simple mixtures, even these tools are not helpful.
Even when they are helpful, they are extremely difficult to explain in court, which limits their utility. To this day, you can say the words “mixture problem” to any forensic DNA analyst and they will immediately know what you are referring to. The mixture problem affects roughly two-thirds of all forensic samples analyzed in law enforcement crime labs, and, of these, two-thirds more than one-half of the profiles cannot be deconvoluted. They are either processed as less informative mixture profiles or, more often, simply discarded as not interpretable. Each year, a large number of violent crimes go unsolved because the perpetrator was either lucky or smart enough to use an item that had been used by or exposed to others before.
With single-cell next generation sequencing (SCNGS), a full transcriptome (i.e., the entire collection of RNA molecules) or genome (e.g., epigenome, exome, or full) is determined for each cell of a sample, such that each cell produces its own identity profile or “vector.” Cells are encapsulated in tiny microreactors or droplets, within which the molecular biology takes place in isolation from every other cell, generating work products (e.g., amplified DNA) that are tagged with a unique barcode identifier. While the technology infrastructure exists for applying human identity panels in SCNGS, low throughput and high costs have prevented adoption.
To further explain, a typical vendor offering enables up to eight (8) samples to be processed from cells to NGS-library-suitable DNA at a time, at a cost of over $1,000 to $3,000 per sample, depending on the number of samples processed and the vendor(s). Recent “cellplexing” versions of the technology are able to get the pricing down to about $60 per sample, with considerable work. The existing state-of-the-art forensic and diagnostics platforms, which do not provide single cell resolution and therefore do not solve the mixture problem or sensitivity problems, cost only $40 per sample. The further below $40 per sample that SCNGS profiling from forensic evidence can fall, the higher the likelihood of adoption of the technique.
Thus, there is a need in the field of forensic science for SCNGS to be further multiplexable relative to current state-of-the-art, so that profiles can be developed from cells contributed by different donors in sample mixtures. In forensics, it has been identified that one only needs a small sample of cells from each donor as exemplars for determining that donor's identity profile, not each of the donor's sometimes thousands of cells, of various cell types. Data from the entirety of the sample's complexity is not needed and, in fact, can consume valuable bandwidth in multiplexed reactions.
Similarly, in the field of pre-clinical research, biomarkers are used to track tumor types and subtypes apparent from SCNGS data, such that treatment efficacy can be assessed prior to chemotherapy or after chemotherapy in the event of relapse. Tumors exist in complex microenvironments incorporating various normal cell types, and various clones or subtypes of tumor cells. The vast majority of SCNGS technology is designed for deep assessments of samples, such that large numbers of cells are sequenced, revealing the entire subtype complexity. In some situations, such as diagnostic situations or Drug Sensitivity Screening (DSS) situations, it has been identified that data from the entirety of the sample's complexity is not needed and, in fact, can consume valuable bandwidth in multiplexed reactions.
To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts.
What are described herein are enhancements to single cell or nucleus next generation sequencing for significantly improving throughput and reducing costs.
The present inventive work relates to improved cost-effectiveness and functionality of human identity casework and diagnostics. Inventive aspects of the present disclosure may effect a disruptive transformation in these markets by enabling, for the first time, cost-effective targeted gene profiles from individual cells that are part of sample mixtures. What is enabled is a more economical and efficient version of “horizontal” or gene-targeted Single Cell Next Generation Sequencing (SCNGS), such that markets, heretofore resistant to adoption, are opened up. SCNGS may sometimes be referred to as “low throughput” NGS, and NGS is sometimes referred to as Massively Parallel Sequencing.
Notably, the present inventive work may help, once and for all, establish SCNGS as a means by which the “mixture problem” which has plagued the forensic genetics and human identity market since its inception in the late 90's can be solved in a cost-effective manner, likely to be adopted. In so doing, the present inventive work could help bring about an inflection point and establish a new gold standard or state-of-the-art in forensics and diagnostics. Just as notably, the inventive aspects disclosed herein may help establish a framework for translating recent SCNGS advances to the areas of diagnostics, which require low per-sample costs.
Although the present disclosure may focus on the forensic sciences in many instances, the mixture and sensitivity problems apply to clinical diagnostics as well, where, instead of losing information from cells corresponding to minor profile contributors, what is lost is information from cells corresponding to minor tumor subclones, or, in the case of cell culture authentication, for example, from minor cell culture contaminants.
A multi-step method and associated compositions for multiplexing different samples into single runs at significantly greater scale than heretofore possible, for use with typical vendor equipment and reagent platforms, are described.
According to some aspects of the present disclosure, what is involved may be or include a kit, constituting a set of consumables (e.g., conjugated multiplex primer solutions, plastics) and reagents (e.g., enzymes, deoxynucleotide triphosphates “dNTPs”, buffers, etc.) that seamlessly integrate with and elegantly transform recently-introduced SCNGS platforms for forensic genetics casework. The advance provided is in improving an established technology of amplifying deoxyribonucleic acid (DNA) from isolated cells such that it becomes cost-effective and efficient enough for application within the fields of forensic science and clinical diagnostics. According to some aspects, the present inventive work may enable SCNGS platforms to produce multiplex Short Tandem Repeat (STR) and Single Nucleotide Polymorphism (SNP) human identity (and related forensic phenotype) profiles from single cells over a large number of samples, such that the cost per sample and number of samples that can be processed in a day or week becomes competitive with the current, non-single cell state-of-the-art. In other aspects, the present inventive work may enable SCNGS platforms to produce sequence reads at significantly lower costs per sample for certain diagnostic target gene Research and Development (R&D) applications.
According to some aspects, the present inventive work may involve methods for the construction of conjugated multiplex primer sets and molecular biology workflow for seamless integration into existing SCNGS protocols to enable cells from different forensic or diagnostic samples to be multiplexed at high multiplicity into single runs through an SCNGS workflow. Instead of a single lane or channel of a SCNGS cartridge being devoted to a single sample as in standard SCNGS, or to 8 samples as with multiplexed SCNGS offered by some vendors, with the present invention, this lane can be devoted to dozens, hundreds or even thousands of samples, depending on the number of target genes and the desired depth per sample.
Single Cell (SC) systems to date have targeted the R&D market, and therefore have been designed as “vertical” platforms, allowing deep analysis of relatively few samples (such as the full genomes of many thousands of cells for a couple or few different samples). Because the goals of forensic and diagnostic applications of DNA sequencing are more “horizontal,” requiring relatively shallow data on a small, representative sampling of a sample, much of the capacity of present SC systems is not needed.
To overcome the above, SCNGS vendors have developed methods for tagging cells corresponding to discrete samples, so that cells of different samples can be combined prior to the expensive step of encapsulating cells into microdroplets, dramatically reducing the cost per sample. To do this, the vendors have used lipid vesicles containing oligonucleotide tags, where each tag is allocated to each sample of cells. The vesicles fuse with the cell membrane of these cells, releasing the tag oligos inside. Other gene targeting 5′ and 3′ primers, or just 5′ or just 3′ primers are provided on separate beads that are encapsulated with the cell, each bead containing a unique barcode that identifies the products from a particular cell. The other primers can be provided in bulk. The tag oligos become incorporated into amplification products from the cell by virtue of a universal sequence at the 3′ end, and tag these products by virtue of a barcode sequence at the 5′ end.
The problem with this method of multiplexing is that each of the cells in the sample consumes reagents needed to produce the amplification products. This consumption utilizes channel bandwidth in subsequent cartridge and surface chemistry platforms needed to produce the SCNGS reads. When each cell of a sample occupies space in these platforms, it limits the degree of multiplexing that can be accomplished due to the platform's limited space.
According to some aspects of the present disclosure, a mechanism is provided by which amplification of a cell in SCNGS is accomplished for only select cells of a sample. This selective processing reduces the complexity of each donor's contribution to the reads, enabling a horizontal analysis, or skimming of the samples, needed in some R&D and routine diagnostics applications and virtually all forensic science applications. Such processing may bring the cost per sample down to levels suitable for mass adoption.
To better illustrate in relation to the figures,
With targeted gene SCNGS of the background/related art, the basic idea is to create a large number of micrometer or nanometer sized spaces, which may be termed microvesicles or “reactors”; such spaces may be microdroplets or nanodroplets on a solid surface or substrate, or physical microchambers, or microwells or nanowells created via electrokinetic gating (all of the foregoing which may be referred to hereinafter as microspaces or nanospaces, or microreactors or nanoreactors). Into these spaces are introduced an individual cell or nucleus along with a bead linked to barcoded primers. Subsequent thermal cycling steps enable extension of the primers along the target gene region, followed by amplification of the target loci for subsequent reading with next generation sequencing (NGS).
Upon lysis of the cell, and introduction of reagents for Polymerase Chain Reaction (PCR), the oligos bind to genetic material released by the cell or nucleus, whereupon each oligo is extended in an extension step. The reagents for PCR may be or include, for example, dNTPs and, in some cases, reverse transcriptase, and thermostable or other DNA polymerase enzyme. For applications targeting ribonucleic acid (RNA) transcripts, complementary DNA (cDNA) is first created for each messenger RNA (mRNA). Once the bead is dissolved enzymatically or photochemically, the oligos are released, find the RNA, and convert it to cDNA. For applications targeting DNA loci, this cDNA step is not necessary; extension and amplification take place directly on the released genomic DNA.
Because each oligo attached to the bead contains a common and unique, bead-specific barcode incorporated into its sequence, all genetic sequences extended and subsequently amplified from it are identifiable as having been originally captured and extended from the oligos provided by a particular bead. As the bead was encapsulated with a single cell or nucleus in its nanoreactor, all amplification products generated inside the microvesicle may be attributable to that single cell.
For vendors targeting DNA instead of RNA, the oligos attached to the beads contain a universal sequence, a barcode sequence, and a 3′ common sequence. Here, each oligonucleotide sequence may be referred to or abbreviated in the form of “univ-bc-comm.” A panel of primers may be generated to target specific loci, where each primer contains the common sequence at the 5′ end, which thereby facilitates incorporation of the barcodes into the amplification products. The 3′ primers will often contain another, different universal primer sequence at their ends. Here, the result is that the amplification products from mRNA (or cDNA) or genomic DNA each contain a unique, cell and droplet specific barcode sequence.
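As an illustration only, the “univ-bc-comm” layout and the attribution of amplification products to a single bead (and hence a single cell) can be sketched as follows. The sequences, the barcode length, and the function names are hypothetical and are not taken from any vendor's chemistry; a real workflow would also have to tolerate sequencing errors, which this sketch does not.

```python
from collections import defaultdict

# Hypothetical sequences and lengths, chosen only for illustration.
UNIV = "GTCATCGGAA"   # universal sequence (assumed)
COMM = "TTGACC"       # 3' common sequence (assumed)
BC_LEN = 8            # barcode length in nucleotides (assumed)


def bead_oligo(barcode):
    """Assemble a bead oligo in the univ-bc-comm layout."""
    return UNIV + barcode + COMM


def extract_barcode(read):
    """Return the barcode if the read begins with univ-bc-comm, else None."""
    if not read.startswith(UNIV):
        return None
    barcode = read[len(UNIV):len(UNIV) + BC_LEN]
    remainder = read[len(UNIV) + BC_LEN:]
    if len(barcode) != BC_LEN or not remainder.startswith(COMM):
        return None
    return barcode


def group_by_cell(reads):
    """Group reads by bead barcode; each group stands for one cell or nucleus."""
    groups = defaultdict(list)
    for read in reads:
        barcode = extract_barcode(read)
        if barcode is not None:
            groups[barcode].append(read)
    return dict(groups)
```

Reads that do not carry the expected layout are simply dropped in this sketch; exact string matching is used for brevity.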
After this processing, the amplification products may be prepared for NGS, by amplifying each with a unique pair of tagged universal primers in a grid format. Using the grid format, samples of each column receive a 5′ universal primer with a unique tag, and samples of each row receive a 3′ primer that matches the second universal site linked to a unique tag. Here, the sample can be identified by the pair of tags incorporated into the resulting amplification products.
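The grid format described above can be sketched in a few lines: each column contributes one 5′ tag, each row contributes one 3′ tag, and the pair identifies the grid position. The tag sequences and function names below are hypothetical placeholders.

```python
from itertools import product


def build_grid(column_tags, row_tags):
    """Map each (5' column tag, 3' row tag) pair to a grid position label."""
    grid = {}
    for (c, col_tag), (r, row_tag) in product(
        enumerate(column_tags), enumerate(row_tags)
    ):
        grid[(col_tag, row_tag)] = f"row{r + 1}-col{c + 1}"
    return grid


def identify_sample(grid, five_prime_tag, three_prime_tag):
    """Look up the grid position for a tag pair; None if the pair is unknown."""
    return grid.get((five_prime_tag, three_prime_tag))
```

With this scheme, 12 column tags and 8 row tags are enough to label the 96 positions of a standard plate, since only the pair needs to be unique.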
The barcodes only resolve the molecule (one barcode) and the cells (another barcode) within the sample, and do not identify the sample. Therefore, each sample must be processed in isolation from the others. This processing takes several hours and consumes many expensive cartridges and reagent kits. Here, the run products must be tracked manually for later next generation sequencing, which is error prone. Costs are roughly $5,500 per sample all the way through NGS, which is far too expensive for routine diagnostics, R&D, or forensic science applications, which target costs of $40 or less.
After library preparation, sequences produced from each cell are identified by the unique barcode, and therefore the amplification products from all cells of a sample can be pooled together. Different samples of cells are then separated into different channels, however, because there is no way at this stage to identify the sample from which the cell and sequence was derived.
Using channel separation, each of the expensive cartridges and reagent kits can only handle eight (8) samples at a time. There is a new alternative method available that allows multiplexing of eight (8) samples per channel, but this only gets costs down to $687 per sample. A primary driver of costs is the expense of the reagents, labor and consumables for the single cell isolation portion of this workflow.
Thus, the above-described process may be multiplexed eight-fold with use of state-of-the-art SCNGS multiplexing. This multiplexing may be performed by adding lipid vesicles with tag primers to the cells of each sample. The vesicles fuse with the cells, and the tag primers are internalized by the cells. The tag primer may contain a sample-specific barcode and a 3′ end that matches a terminal sequence on the beads, allowing them to be incorporated into the amplification products produced by the bead primers.
Cartridges containing, for example, eight (8) sample wells of a microtiter plate may be used to process eight (8) samples at a time, at costs around several hundred dollars per sample. Multiplexing of samples to reduce per-sample-costs in this processing step may be achieved by using beads as described above which contain half of the primers targeting the specific regions of interest (e.g., bead 252 with the plurality of oligonucleotides 254 of
With use of the above-described sample multiplexing processing, the depth of each sample is not significantly diminished and the data produced are as rich as that produced from unmultiplexed sample processing. For example, the same number of reads per target locus in the same number of cells (e.g., thousands) is achieved. As is apparent, the above-described process consumes valuable channel bandwidth.
For forensic and certain diagnostics-targeted SCNGS applications, only a dozen or few dozen cells worth of data from each sample are needed for identification. In certain R&D applications, it is desired to track only cells of a certain type, such as tumor cells instead of normal cells. The remaining thousands of cells for which data are not needed consume reagents and cartridge bandwidth (e.g., well space or two-dimensional surface area), which is wasteful in these applications. As is apparent, conventional processes impose limitations on the number of samples that may be multiplexed and corresponding limitations on the cost-per-sample reduction that can be achieved.
It is understood that workflow 300 (without the dashed box) produces data at about $5,500 per sample and a throughput of eight (8) samples per week. The throughput (e.g., only 8 samples per week, depending on labor investment) and cost (e.g., up to $5,500 per sample) are problems that are debilitating for forensic science and diagnostics use, which may target costs at less than $40 per sample and a throughput of hundreds of samples per week for SCNGS to be adopted.
All of this background is known to those familiar with the current state-of-the-art and SCNGS datasets which are published monthly. Conventional “sample” multiplexing attempts to resolve some of the above-indicated problems. Conventional sample multiplexing involves the insertion of a conventional “sample tagging” step, where cells are allocated to microreactors, prior to performing library preparation process 306. Use of the conventional sample tagging step provides modest cost reductions down to hundreds of dollars per sample (e.g., about $500) and modest throughput gains up to 64 samples per week.
Even with sample multiplexing, SCNGS is not presently employed for forensics and routine diagnostics. The reason is that existing protocols and platforms provide for price-per-sample reductions to about $100, whereas competing bulk technologies that are currently used perform at prices of about $40 per sample or less. This particular problem may be referred to herein as the “SCNGS cost problem.” Further, breaking a set of samples into, for example, 8 sets of 8 multiplexed samples, imposes significantly more work in terms of pipetting and tracking than would be the case if the set of 64 samples were processed simultaneously. Laboratory technicians, daunted by the challenge, only attempt such multiplexing if they are highly-experienced and ambitious, which results in non-adoption of the processing. This other particular problem may be referred to herein as the “SCNGS throughput problem.”
According to at least some implementations, an enhanced sample multiplexing process involving enhanced sample tagging may be inserted prior to library preparation process 306 of
Thus, in some implementations, the present invention may constitute a series of steps in the form of a module 350 that fits into and is inserted into the workflow 300 as indicated in
In at least some implementations, the inventive processes may overcome both the SCNGS cost and throughput problems at the same time. Preferably, the above may be achieved with use of true sample multiplexing, at a meaningful scale, using an innovative molecular biology approach that is seamlessly-integrated into existing transcriptome or targeted DNA SCNGS workflows.
In some implementations, the inventive processes may achieve the above by allowing a multiplexing of samples prior to injection into microreactors, such that larger numbers of samples (e.g., 96, 384, or more) may be processed at a time. This may be accomplished by attaching the second set of primers to an agent, such as an antibody (e.g., proteoglycan or glycoprotein), such that only microreactors containing particular cells capable of binding to the agent receive both of the primers for each target locus, produce amplified DNA, and consume reagents and single-cell as well as NGS cartridge bandwidth.
The selective targeting of only particular cells of each sample may be achieved by using agents that recognize and bind to cell or nuclear membrane proteins and/or sugars that express the relevant epitopes or cellular targets. For example, with a forensics sample containing a mixture of tissue and cell types, epithelial cells may be targeted for selection. As another example, with a forensics sample containing a mixture of blood cells, CD34 positive white blood cells may be targeted for selection. As yet another example, with a cell culture sample used in drug sensitivity screening or other diagnosis, one could target cells expressing aberrant “glycomes” or cell-surface sugar profiles using proteoglycans or glycoproteins identified from other systematic screening processes, thereby eliminating normal cells from the analysis, and restricting the use of reagents and consumables bandwidth to tumor cells.
In at least some implementations, each cell may be essentially tagged with its “sample” membership (e.g., in a unique well of a microtiter plate), where a subset of the cells in each one of the samples is targeted. The targeting may be performed in order to maximize the efficiency with which the reagents and cartridge bandwidths are utilized, allowing for greater “stacking” of samples per cartridge. For example, if 384 samples are stacked into each sample well of the cartridges, the non-multiplexed $1,000 per sample cost of the single cell portion of the workflow may be reduced to under $1 per sample. Since bulk analysis protocols that lack single-cell resolution cost about $40 per sample, the motivation for adoption becomes not only that of gaining the power of single-cell resolution for solving the mixture and sensitivity problems, but of cost savings. Advantageously, the present inventive work has therefore opened up the forensic science market to SCNGS by alleviating the primary impediments to its adoption.
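The cost reasoning above can be made concrete with a deliberately crude model. The sketch below assumes that the single-cell cost of one cartridge sample well is fixed and is split evenly across all samples stacked into it; that assumption, and the function names, are illustrative and not part of the disclosure. Any further savings, for example from reduced reagent consumption when only a subset of cells is targeted, are not modeled.

```python
import math


def cost_per_sample(cost_per_well, samples_per_well):
    """Even split of a fixed per-well cost across the stacked samples."""
    return cost_per_well / samples_per_well


def samples_needed(cost_per_well, target_cost):
    """Smallest whole number of stacked samples whose even-split cost
    falls strictly below target_cost."""
    return math.floor(cost_per_well / target_cost) + 1


if __name__ == "__main__":
    for n in (1, 8, 96, 384):
        print(n, round(cost_per_sample(1000, n), 2))
    print(samples_needed(1000, 40))
```

Under this even-split assumption, a $1,000 well drops below the $40 bulk benchmark once 26 or more samples are stacked, and 384 stacked samples come to roughly $2.60 each; lower per-sample figures would depend on savings outside this simple model.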
Accordingly, the present inventive work may overcome both SCNGS cost and throughput problems at the same time with use of an innovative molecular biology approach and workflow that can seamlessly integrate into existing transcriptome or targeted DNA SCNGS workflows.
Beginning at a start block 402 of
Next, a plurality of single-stranded concatenated polynucleotides may be synthesized or generated (step 406 of
As illustrated in
In some implementations, each universal sequence (“univ”) may be 10, 15, 20 or 30 or more nucleotides in length and may be synthesized using any suitable state-of-the-art approach. In some implementations, the polynucleotide concatenates may have additional universal primer binding sites (e.g., other than “univ”) as needed, for example, to facilitate subsequent processing steps (e.g., as described later below). In some implementations, each gene sequence (“gene”) may have a size that is selected based on a desired melting temperature Tm for amplification reactions.
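One simple way to pick a “gene” segment length from a desired melting temperature is sketched below. It uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is only a rough estimate for short oligonucleotides and is not stated in the disclosure; nearest-neighbor models are generally more accurate. The function names and the minimum length are assumptions.

```python
def wallace_tm(seq):
    """Wallace-rule estimate of Tm in degrees C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def shortest_primer(template, target_tm, min_len=10):
    """Return the shortest prefix of template (at least min_len nt) whose
    estimated Tm reaches target_tm, or None if no prefix qualifies."""
    for length in range(min_len, len(template) + 1):
        candidate = template[:length]
        if wallace_tm(candidate) >= target_tm:
            return candidate
    return None
```

Growing the primer from a fixed start position keeps the sketch short; a practical design tool would also check for secondary structure and primer-dimer formation across the whole panel.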
For example,
Also as mentioned above, each single-stranded concatenated polynucleotide may be uniquely associated with a barcode (“bc”) (e.g., which is repeated in each repeating pattern of the polynucleotide concatenate). With reference back to
To illustrate the above by example, for a targeted set of loci where N=2, a bead may be associated with oligos including univ-bc-gene1 and univ-bc-gene2 (which provide the 5′ primer sequence for gene1 and gene2), and a single-stranded concatenated polynucleotide may be comprised of univ-bc-gene1+univ-bc-gene2 (which provide the 3′ primer sequence for gene1 and gene2).
In the illustrative example associated with the targeted set of CODIS STR loci, where the twenty-four (24) primers represented in the bead library are 5′ primers for each of 24 different target loci, there may be 24 “univ-bc-gene” sequences that are strung in the polynucleotide concatenate. Each single-stranded concatenated polynucleotide may be comprised of the repeating pattern of “univ-bc-gene” sequences, one (1) common “univ” sequence, one (1) common “bc” sequence, and the 24 different “gene” sequences representing the 3′ primers for the 24 different target loci. In alternative implementations of the above, the 5′ and 3′ primers may be reversed, where the beads with the oligos provide the 3′ primers and the single-stranded concatenated polynucleotides provide the 5′ primers.
For a gene set of twenty-four (24) CODIS STR loci, each single-stranded concatenated polynucleotide may be about three-thousand (3,000) nucleotides long; synthesis of long polynucleotides of this type is available today through established vendors, such as Twist Bioscience Corporation, of San Francisco, Calif., U.S.A.
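The repeating “univ-bc-gene” pattern can be illustrated with a short sketch that assembles a concatenate from one universal sequence, one barcode, and a list of locus-specific primer sequences. All sequences and names here are hypothetical placeholders; restriction sites, end moieties, and any additional universal sites are omitted.

```python
def build_concatenate(univ, barcode, gene_primers):
    """Join one univ-bc-gene unit per target locus into a single strand."""
    units = [univ + barcode + gene for gene in gene_primers]
    return "".join(units)


def split_units(concatenate, univ, barcode):
    """Recover the gene segments by splitting on the repeated univ+bc prefix.

    Assumes the univ+bc prefix does not occur inside any gene segment.
    """
    prefix = univ + barcode
    return [part for part in concatenate.split(prefix) if part]


if __name__ == "__main__":
    univ = "GTCATCGGAACTTGCAGGTC"            # assumed 20-nt universal sequence
    bc = "TTAGGCAT"                          # assumed 8-nt well barcode
    genes = ["GATTACAGATTACAGATTAC",         # placeholder primer for gene1
             "CCATGGCCATGGCCATGGCC"]         # placeholder primer for gene2
    strand = build_concatenate(univ, bc, genes)
    print(len(strand))
```

The total length is the sum of the unit lengths, so it grows linearly with the number of target loci for fixed segment sizes.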
In preferred implementations, at each “gene-univ” junction of each single-stranded concatenated polynucleotide, the “gene” and “univ” segments may be selected so as to create or fashion a restriction enzyme binding/cleaving site (see a plurality of sites 505, indicated as arrows in
Notably, each single-stranded concatenated polynucleotide is provided with a moiety on its 5′ end to facilitate a subsequent binding of the polynucleotide concatenate to a (e.g., cell) binding agent. In some implementations, the moiety is biotin. In the illustrative example, the binding agent is an antibody which may recognize an epitope found on the cellular/nuclear membrane of some of the cell types in the sample, and has at least one conjugated binding site for the 5′ end moiety (e.g., biotin) of the polynucleotide concatenate. As biotin binds to streptavidin, the antibody may be conjugated to streptavidin.
As shown in
In the illustrative example, the binding agent that is utilized in the process may be an antibody. In other implementations, other agents or chaperone entities may be utilized. For example, specific proteoglycans or glycoproteins may be utilized depending on the application, or even lipid vesicles for indiscriminate microdroplet targeting irrespective of the cell or cell type inside.
Even further in the illustrative example, an antibody epitope that is differentially-expressed amongst cells of the samples, such that only a fraction of the cells of each sample are targeted, may be utilized. In other implementations, an antibody epitope that is ubiquitously expressed on the membranes of all cells or nuclei may be utilized. With respect to the illustrative example of the antibody epitope that is differentially-expressed, certain alpha or beta integrin chains, such as beta 1, alpha 2 and alpha 3, which are regionally-expressed among epithelial cells depending on their position in various epithelia, including oral epithelium and, with beta 4 subunits, gastric epithelium, may be used to select only a fraction of the epithelial cells in each sample. With respect to oral epithelium, see, e.g., Thorup A, Dabelsteen E, Schou S, Gil S, Carter W and J Reibel; “Differential expression of integrins and laminin-5 in normal oral epithelia,” APMIS, 1997, July; 105(7):519-30. With respect to gastric epithelium, see, e.g., Virtanen I, Tani T, Back N, Happloa O, Laitinen L, Kiviluoto T, Salo J, Burgeson R, Lehto V and E Kivilaakso; “Differential expression of laminin chains and their integrin receptors in human gastric mucosa,” Am J Pathol. 1995; October; 147(4):1123-32.
Indeed, many of the integrin subunits involved in basal membrane adhesion may be regionally-expressed within anatomically-defined epithelia. Alternatively, one could target CD138+ (e.g., mature, circulating) B-cells from whole blood evidence samples (CD138 is expressed on the cell membrane of such cells), or CD1 positive cells indicative of epithelium, or CD54 positive cells indicative of endothelium, or CD340/HER-2, a well-known epithelial tumor antigen expressed on the surface of many breast and ovarian tumor cells. With respect to CD1 positive cells indicative of epithelium, see, e.g., the comprehensive list of such useful antigens at https://www.sinobiological.com of Sino Biological US Inc. of Chesterbrook, Pa., U.S.A., or more specifically, https://www.sinobiological.com/research/d-antigens/epithelial-cell. With respect to CD340/HER-2, see, e.g., Mitri and O'Reagan, “The HER2 Receptor in Breast Cancer. Pathophysiology, Clinical Use, and New Advances in Therapy,” Chemotherapy Research and Practice, 2012, 2012:743193, doi:10.1155/2012/743193.
Continuing in the flowchart 400 of
Next, the plurality of single-stranded concatenated polynucleotides in each well of the microtiter plate may be converted into double-stranded form (step 410 of
In
In some implementations, this terminal sequence utilized in this step may be the last gene in the set, in which case the sequence that is complementary to this sequence may be used as the primer. In alternative implementations, the terminal sequence may be a second universal sequence at the 5′ and 3′ ends of the polynucleotide concatenate. See, e.g., concatenated polynucleotides 502a of
Next, a binding agent (e.g., a chaperone molecule or entity) may be added to each well of the microtiter plate (step 412 of
In the illustrative example, the 5′ end moiety is biotin, which binds to streptavidin, and therefore this antibody may be conjugated to streptavidin. Such streptavidin-conjugated antibodies are well-known to those familiar with the state-of-the-art and are readily available from a variety of vendors; even kits for preparing streptavidin-conjugated antibodies are available. See, e.g., https://www.bio-rad-antibodies.com of Bio-Rad Laboratories, Inc. of Hercules, Calif., U.S.A., or more particularly, https://www.bio-rad-antibodies.com/kit/streptavidin-conjugation-kit-lnk16.html?f=kit. Streptavidin may be conjugated randomly to lysine residues of the antibody with an average of two molecules of streptavidin per antibody. See, e.g., Hoffmann et al., “Rapid conjugation of antibodies to toxins to select candidates for the development of anticancer Antibody Drug Conjugates (ADCs),” 2020, Sci Rep 10, 8869.
Next, different samples which comprise cells or nuclei (depending on the application and antibody epitope or agent binding specificity) may be placed into the microtiter plate (step 414 of
Thus, different concatenated polynucleotides from different samples (e.g., concatenated polynucleotide 504b of sample 541 and concatenated polynucleotide 506b of sample 543 of
Next, the contents of each well of the microtiter plate may be pooled into a single tube or well (step 416 of
Thus, the cells from all of the different samples may be pooled. In at least some cases, each cell of each sample will be bound to an antibody and, in the illustrative example, only some of the cells are bound because the epitope is differentially-expressed amongst cells within each sample. The antibody may be linked to a double-stranded polynucleotide concatenate containing an integrated barcode that is indicative of the well from which the cells or sample came (e.g., bc1, bc2, bc3, etc.).
Next, the antibody-bound cells/nuclei are then injected, fused, or otherwise integrated into microvesicles (e.g., microreactors, microdroplets, emulsion, etc.) together with the bead library (step 418 of
Thus, what is provided is an encapsulated, cell-bound antibody linked to a double-stranded polynucleotide concatenate containing 5′ primers (or 3′ primers), where the polynucleotide concatenate has integrated a barcode indicative of its sample origin, along with a vendor-supplied bead that is linked to oligonucleotides containing the 3′ primers (or 5′ primers) for the targeted loci. Within the bead primers is integrated a different barcode indicative of the cell, bead, and emulsion (indicated with green “G” color in
The following steps may be performed in any one of a number of different alternative orders. When components such as enzymes, buffers, or reagents (e.g., dNTPs) are added to microdroplets, this processing may be accomplished by any of a variety of different means. For example, means described in various background/related art may be utilized, such as the processing associated with U.S. Pat. No. 10,501,739, which may include the addition of the components in the fluid that is initially encapsulated with a cell and/or bead, the merging of emulsion microdroplets through the use of microfluidic junction nodes, the contacting of a microdroplet with a solution containing the component wherein the component enters the microdroplet through diffusion (perhaps even active transport), the injecting of the microdroplet with a solution containing the component, and/or the flowing of the component into a carrier fluid comprised of microdroplets.
The double-stranded concatenated polynucleotide may be digested with restriction endonuclease which is added to each microdroplet (step 420 of
The excess univ primers may then be used in linear amplification for creating an excess of “sense” univ-bc-gene strands for each element (step 422 of
Next, proteinase may be added to the microdroplets for disrupting of the nucleosome/DNA structures, thereby allowing access to the genome (step 424 of
In some implementations, the polynucleotide concatenate may include a dummy element at its 5′ end (see, e.g.,
The bead may then be degraded or dissolved, for releasing the oligos or primers (step 426 of
A thermostable DNA polymerase and dNTPs may then be added, and targeted PCR is carried out for amplifying the target gene panel (step 428 of
Thus, what are produced are amplification products where the polynucleotide barcode integrated into the primer at the 3′ end (or 5′ end) informs as to the sample (e.g., microtiter plate well) and the bead library barcode integrated into the primer present at the 5′ end (or 3′ end) informs as to the cell within that sample. Notably, the amplification products of the sample multiplex are obtained from only a subset of cells in each sample.
In
As is apparent, all of the amplification products may be pooled into a single tube and, not only is the information built into the amplification products informative as to the cell of origin, but the sample of origin as well. This has been accomplished as a “skim” on each sample, without wasting enzyme/reagents or generating amplicons for every cell of the sample. Because only a portion of the cells of each sample were sampled, what is produced is a “shallow” set of amplification products for each sample, enabling more samples to be combined or stacked into a well or channel of vendor-provided equipment and cartridges.
In this way, multiple samples (e.g., 96, 384, or more, depending on the plate configuration) may be combined for single runs through vendor-provided emulsion generation and NGS systems to reduce costs associated with these runs. These runs are the major driver of per-sample SCNGS costs, especially the emulsion generation runs. Each of these runs requires consumables and reagents that are the primary determinants of per-sample costs, and dividing these costs by a large number of samples enables a lower per-sample cost.
The barcodes now serve two purposes, not just one. In particular, they resolve the cells within the sample in addition to the well or sample identity. Therefore, all samples may be processed together in a few hours, consuming a single cartridge and reagent kit, and submitted to NGS at the same time to thereby consume only one channel of the NGS kits, where the resulting sequences may be computationally attributed (in a foolproof or guaranteed manner) to both cells and samples of origin during analysis.
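The dual attribution described above may be illustrated with a minimal sketch. The read layout (cell barcode at the start of the read, sample barcode at the end), the barcode lengths, the barcode sequences, and the function name `demultiplex` are all hypothetical and chosen only for illustration; they are not taken from the disclosure or from any vendor system.

```python
from collections import defaultdict

# Hypothetical sample barcodes, one per microtiter plate well (illustrative only).
SAMPLE_BARCODES = {"ACGT": "well_A1", "TGCA": "well_A2"}
SAMPLE_BC_LEN = 4   # assumed length of the well/sample barcode
CELL_BC_LEN = 6     # assumed length of the bead-derived cell barcode

def demultiplex(reads):
    """Group reads by (sample, cell barcode).

    Assumed layout: [cell barcode][target sequence][sample barcode].
    Reads whose sample barcode is not recognized are discarded.
    """
    groups = defaultdict(list)
    for read in reads:
        cell_bc = read[:CELL_BC_LEN]
        sample_bc = read[-SAMPLE_BC_LEN:]
        sample = SAMPLE_BARCODES.get(sample_bc)
        if sample is None:
            continue  # unknown sample barcode: drop rather than guess
        target = read[CELL_BC_LEN:-SAMPLE_BC_LEN]
        groups[(sample, cell_bc)].append(target)
    return dict(groups)

reads = [
    "AAAAAA" + "GATTACA" + "ACGT",  # cell AAAAAA, well_A1
    "AAAAAA" + "CCGG" + "ACGT",     # same cell, same well
    "CCCCCC" + "GATTACA" + "TGCA",  # cell CCCCCC, well_A2
    "GGGGGG" + "GATTACA" + "NNNN",  # unrecognized sample barcode
]
result = demultiplex(reads)
for key, targets in sorted(result.items()):
    print(key, targets)
```

In this sketch, every retained read carries both identifiers, so a pooled run can be split first by well and then by cell without any additional bookkeeping.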
With reference back to
A preferred embodiment of the type of product that may be provided is a 96, 384 or higher-well plate, within which are dried-down, antibody-polynucleotide concatenates, ready for cellular sample addition. In some implementations, these may be made to order, for the target genes of interest, or alternatively may be standardized for specific applications. In forensic science applications, as the evidence presented almost always contains one or more of epithelial, blood, and semen cells, it may be advisable to include antibodies recognizing epitopes for certain epithelium, white blood cells and semen, to “skim” each sample in a manner inclusive of the possible cell types that may be present and avoid type II (false negative) error.
Rather than load sample cells into the cartridge channel directly, the user may load the sample cells into the microtiter plate wells containing the antibody (or other agent) bound polynucleotide concatenates, incubate, pool together, and then load into the cartridge channel.
Prior to the present inventive work, the problems that limited SCNGS market growth with the current state-of-the-art were:
1) Throughput: only eight (8) samples per week per machine; and
2) Cost: from $687 per sample (low-order multiplexed, 8-fold) to $5,000 per sample (not multiplexed).
For example, drug discovery research requires large numbers of pre- and post-drug SCNGS datasets (one for pre-treatment cells, one for post-treatment cells). A screen of 384 drug compounds would require 768 tests, take 96 weeks with one machine, and cost $2M. This kind of experiment is simply not feasible for routine drug discovery screening requiring hundreds of such plates. As another example, in forensic science, the current gold standard, which suffers from sensitivity and resolution problems, costs $40 per sample, with a throughput of about four-hundred (400) samples per week per machine. Even though SCNGS is the solution to the “mixture problem” and “sensitivity problem”, the community simply has not adopted it at $5,000 per sample and eight (8) samples per week per machine, because these costs and throughput are simply not feasible for routine casework requiring that several hundred samples be analyzed affordably each week.
On the other hand, in some implementations of the present disclosure, what may now be achieved is:
1) Throughput—three-hundred eighty-four (384) samples or more per week.
2) Cost—$2-$30 per sample.
In the drug discovery example described above, a screen of 384 drug compounds would require 768 tests at a cost of $7,500 and take about one week to complete. Such costs and throughput are feasible for routine screening that requires hundreds of such plates. In the forensic science example, a cost of $3-$30 per sample and a throughput of 384 samples per week per machine compare favorably with the current gold standard of $40 per sample and 384 samples per week per machine, while solving the sample mixture and sensitivity problems plaguing the industry, making it feasible for routine casework requiring several hundreds of samples analyzed economically each week.
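The arithmetic behind the drug discovery example may be checked with a short sketch. Only figures stated in the text are used (384 compounds, two datasets per compound, eight samples per week for the prior approach, and total costs of $2M and $7,500); the variable names are illustrative, and the per-test costs are simply the stated totals divided by the number of tests.

```python
import math

compounds = 384
tests = 2 * compounds  # one pre-treatment and one post-treatment dataset each

# Prior approach: eight samples per week per machine.
prior_samples_per_week = 8
prior_weeks = math.ceil(tests / prior_samples_per_week)

# Stated total costs for the whole screen.
prior_total_cost = 2_000_000
new_total_cost = 7_500

# Implied per-test costs (stated total divided by number of tests).
prior_cost_per_test = prior_total_cost / tests
new_cost_per_test = new_total_cost / tests

print("tests:", tests)
print("weeks on one machine (prior approach):", prior_weeks)
print("implied cost per test (prior approach): $%.2f" % prior_cost_per_test)
print("implied cost per test (present approach): $%.2f" % new_cost_per_test)
```

Under these figures, the implied prior per-test cost falls between the $687 and $5,000 per-sample figures quoted for the current state-of-the-art, and the implied per-test cost for the present approach falls within the $2-$30 per-sample range.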
Note that, for vertical applications of SCNGS, it may not be advisable to multiplex samples too extensively, lest the channel depth of the NGS system be exceeded and the desired read depths per cell not achieved. The commercially-available limit presently available with background/related art cell multiplexing systems, which tag all cells of a sample indiscriminately, is eight (8). These methods utilize lipid vesicles containing oligonucleotides with N “bc”-“univ” sequences, where N is the number of samples, and a bulk set of 3′ primers (or 5′ primers) containing the “univ” sequence at its 5′ end. The lipids fuse with cells which are then partitioned into microdroplets with bead bound 5′ primers (or 3′ primers) and the resulting amplification products incorporate the “bc”-“univ” tags, thereby identifying the sample. The problem with this approach is that, without cell counting, the samples with higher cell densities will “hog” channel bandwidth relative to those with lower cell densities. And, due to the stochastic nature of polymerase chain reaction, cells from the sparser samples may be out-competed for polymerase and primers, resulting in their signal being lost in the NGS data. This may be the reason why the method is limited to multiplex factors of eight (8).
In elevating this limit substantially, the present inventive work has opened up horizontal SCNGS as a more efficient and cost-effective alternative to vertical SCNGS, where channel depth is not as important as costs per sample. With horizontal SCNGS, one is interested in finding exemplars of cellular diversity for diagnostic or forensic identification purposes, not in sequencing each and every cell of a particular subtype or origin. Thus, what is traded off is the depth of information from each sample in exchange for the ability to process an increased number of samples simultaneously, as it enables the achievement of lower per-sample costs.
Existing applications of horizontal SCNGS for targeted DNA sequencing waste much of the channel depth on redundancy, as they are only pseudo-horizontal in that, while they target a limited number of loci and allow for sample multiplexing, the amplification products are produced from all of the cells of the sample indiscriminately and the degree of multiplexing achievable is low. This follows from their design, which was originally tailored to vertical SCNGS applications that value (and must preserve) sample depth (e.g., the number of reads per cell and number of cells per sample). In the background/related art, the multiplexed, horizontal SCNGS uses bulk primers as companions to those found on the beads, and thus produces data for all cells in the sample indiscriminately, a la vertical SCNGS. With forensic and diagnostic applications designed to skim a sample for the presence of a major component (such as a donor, or a cell type), these methods are still too expensive and laborious. If the methods were more efficient at using the channel depth, they could lower the costs to implement them on a sample-to-sample basis. Lower cost points would enable widespread adoption for routine diagnostics and human identity tasks, which are characterized by high volume requirements and low-cost demands. Advantageously, the present inventive work provides the first such system to enable cost-effective horizontal SCNGS for these types of higher-throughput applications.
In some implementations, steps 404 through 428 of
In some implementations, it may be useful to stoichiometrically balance the 5′ and 3′ primer sets on the beads and antibodies or other agents, especially if each antibody/agent is bound to only one polynucleotide concatemer providing one (e.g., the 3′ primer) set, since the number of 5′ primers contributed by the beads will be contributed in far greater proportions per gene sequence. To accomplish this, the univ2 primers of
In some implementations, the method can be applied with equal efficacy, with or without the univ2 adaptors, if instead of a single polynucleotide concatemer being bound to the antibody, multiple polynucleotide concatenates are bound to multiple binding sites on the antibody. If the number of binding sites equals the number of target loci and gene sequences, the number of elements in the polynucleotide concatemer N may be one, as long as a set of elements covering all of the gene sequences required is used. In this case, there would be no need to digest the antibody bound nucleotide with the restriction enzyme and instead, one may rely on the protease to release the units for amplification through digestion of the protein antibody.
In some implementations, the method may be applied to whole transcriptomic SCNGS. Here, instead of using a concatemer of “univ-bc-gene” elements, a single “univ-bc-rand” element may be utilized, where “rand” refers to a randomized N-mer oligonucleotide (where, e.g., N may be 6, 8, 12, 20, etc.). If a sufficient number of random sequences are present in the randomer, the nucleotide, once liberated from the antibody, will prime for most of the cDNA sequences generated by vendor-supplied bead libraries using poly dT as the “gene” sequence.
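The element structures named above may be sketched as simple string assembly. All sequences below are hypothetical placeholders, the helper names are illustrative, and the sketch omits any spacer, restriction site, or terminal sequence that an actual construct might include between or around elements.

```python
import random

def make_element(univ, barcode, target):
    """Assemble one 'univ-bc-target' element as a plain string."""
    return univ + barcode + target

def make_concatenate(univ, barcode, gene_primers):
    """Join one 'univ-bc-gene' element per target primer, all sharing one barcode."""
    return "".join(make_element(univ, barcode, g) for g in gene_primers)

def random_nmer(n, rng):
    """Draw one randomized N-mer, as in a 'univ-bc-rand' element."""
    return "".join(rng.choice("ACGT") for _ in range(n))

# Placeholder sequences (not real primers, barcodes, or universal sequences).
UNIV = "AAGCAGTGGT"
BC1 = "ACGTAC"
GENE_PRIMERS = ["GATTACAG", "CCTTGGAA", "TTAGGCAT"]

concatenate = make_concatenate(UNIV, BC1, GENE_PRIMERS)

rng = random.Random(0)  # seeded only so the sketch is repeatable
n = 8
rand_element = make_element(UNIV, BC1, random_nmer(n, rng))
n_possible = 4 ** n     # number of distinct N-mers of length n

print(concatenate)
print(rand_element)
print("distinct %d-mers:" % n, n_possible)
```

The count of distinct N-mers grows as 4 to the power N, which is one way to gauge whether a given randomer length offers enough sequence diversity for the priming described above.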
In some implementations, the antibodies used may be monoclonal or polyclonal. Polyclonals may provide assurance that human variation in epitope expression does not result in type II (false negative) results for a given donor to the mixed forensic, R&D, or clinical diagnostics sample. Monoclonals that demonstrate low type II error may be desirable for quality assurance purposes, since antibody-epitope binding can be more easily qualified and quantified.
In some implementations, the process may include the introduction of enzymes and reagents into microdroplets. The likelihood of two microdroplets or a micro and nanodroplet fusing is a function of, in part, their size and concentration. What is desirable is to minimize cell and bead containing microdroplets from fusing, but facilitate the smaller restriction enzyme containing nanodroplets fusing with the cell and bead containing microdroplets; one could utilize a concentration of the former in excess of the latter.
As part of the development of the present inventive work, a software algorithm for modeling statistical dilution of components of unequal size has been developed to help manage the fusion goals.
An index system incorporating a cipher may be utilized for selecting the random barcodes of the polynucleotide concatenates used to tag the sample identities. Such a selection may be performed so that bioinformatically-attributing sequences to the appropriate sample may be foolproof or guaranteed. For example, consider a grid 700A shown in
In order to further demonstrate the creation of the polynucleotide concatenates, a software program has been written. In forensic science, the polynucleotide concatenates may incorporate Tm matched primer sequences for the CODIS set of STR loci, such as those shown in a table 700B of
Before concatenation in the polynucleotide, each primer may be joined with a universal and barcode sequence as shown in a table 700D of
Continuing on with the disclosure, note that, in forensic science applications, so-called “touch DNA” samples may contain very low numbers of epithelial cells derived from human skin. Here, the cell numbers may be so low that it may be possible to use an antibody that recognizes a ubiquitously-expressed epithelial, or general mammalian cell membrane epitope. This may be realized without crowding the reagent/channel space too much to prevent the scale of “stacking” or multiplexing achievable with differentially-expressed epitopes, which allow for “skimming” of a sample.
In at least some cases, it may not be possible to isolate intact cells from forensic evidence, since it is usually presented in a dried format (e.g., a dried swab), and therefore, the inventive processes may be only useful in forensic science with nuclei, in which case the antibody epitope would need to be a nuclear epitope. The use of a nuclear epitope would likely make it much less likely or even impossible to use a differentially-expressed epitope for “skimming” of the sample, but the cells on dried evidence swabs are almost always present in vanishingly small quantities. This would allow substantial stacking or multiplexing of samples without any one sample crowding too much of the “channel space” with its amplification products. The sparse cells present on dried evidence swabs may include, for example, epithelial cells from “touch DNA” evidence, white blood cells from whole blood evidence, and semen cells from Sexual Assault Evidence Kits (SAEKs).
The same applies to whole blood evidence, since only white blood cells contain nuclei, and white blood cells are present at low levels compared to red blood cells and platelets. Using a circulating B-cell epitope, such as CD138, will nevertheless allow for greater multiplexing capability and lower costs without sacrificing quality: because SCNGS provides millions of reads per experiment, one still expects a large enough number of exemplars from each donor to be confident in the profile generated.
With fresh buccal or saliva samples, which are notorious for high numbers of cells per sample, epitopes corresponding to ubiquitously-expressed epithelial gene products would be advisable, such as cell adhesion molecule (CAM), tight junction, desmosome, or gap junction epitopes, or, because all epithelial cells bind to the extracellular matrix through fibronectin via integrin proteins, integrin epitopes. Dried buccal samples would probably need nuclei dilution prior to being added to a set of samples, lest they crowd out other samples or consume too much of the channel space with their amplification products.
Thus, as described above, the inventive process may involve a way in which to “skim” or take a small fraction of cells of a given sample so that samples can be “stacked” or multiplexed prior to injection into microreactors, such that their amplification products contain barcodes enabling attribution to specific cells as well as the sample of origin. However, one might consider why the use of the agent-bound polynucleotide concatenate method is preferable to merely taking a small fraction of each sample, stacking or multiplexing these small fractions, and using the background/related art of lipid nanoreactor or antibody bound to a small tag oligonucleotide. Here, the small tag oligonucleotide contains a universal sequence and becomes incorporated into the amplification products generated by bead coated primers and ubiquitously provided primer partners.
In response to this inquiry, the latter, background/related art method may indeed be easier and involve fewer steps, but it does not allow for high-order stacking or multiplexing unless even numbers of cells from each sample are taken, which requires cell counting, a laborious process. Without cell counting, the samples with higher cell densities will “hog” channel bandwidth relative to those with lower cell densities. And, due to the stochastic nature of polymerase chain reaction, cells from the sparser samples may be out-competed for polymerase and primers, resulting in their signal being lost in the NGS data. Indeed, this is likely the reason why the current vendors that practice this method of multiplexing only allow for eight-sample combinations in each channel.
In at least some implementations of the present disclosure, such problems of the background/related art may be avoided by using an agent-bound polynucleotide concatenate (e.g., referred to as “antibody-bound polynucleotide concatenate” for simplicity) as a gatekeeper, wherein only cells that bind an antibody-bound polynucleotide concatenate will produce amplification product. Since the moles or numbers of antibodies or antibody-bound polynucleotide concatenates in each well can be easily equilibrated, the use of antibody-bound polynucleotide concatenates effectively normalizes each sample's contribution to the downstream amplification products, such that each sample consumes the same fraction of the channel bandwidth.
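The normalizing effect of the gatekeeper may be illustrated with a deliberately simplified model. The model assumes that each tag binds at most one cell, that every tag finds a cell whenever cells outnumber tags, and that channel share is proportional to the number of amplifying cells; these are simplifying assumptions for illustration, and the cell counts, tag count, and function name are hypothetical.

```python
def channel_shares(cells_per_sample, tags_per_well=None):
    """Return each sample's fraction of the amplifying cells in the pool.

    cells_per_sample: mapping of sample name to cell count (uncounted in practice).
    tags_per_well: number of antibody-bound concatenates per well; None models
                   indiscriminate tagging, where every cell amplifies.
    """
    if tags_per_well is None:
        passed = dict(cells_per_sample)
    else:
        # Simplification: one tag per cell, so at most tags_per_well cells pass.
        passed = {s: min(n, tags_per_well) for s, n in cells_per_sample.items()}
    total = sum(passed.values())
    return {s: n / total for s, n in passed.items()}

# Hypothetical samples spanning a 100-fold range in cell density.
cells = {"sample_1": 50_000, "sample_2": 5_000, "sample_3": 500}

ungated = channel_shares(cells)
gated = channel_shares(cells, tags_per_well=200)

for name in cells:
    print(name, "ungated: %.3f" % ungated[name], "gated: %.3f" % gated[name])
```

In this model, the densest sample takes roughly nine-tenths of the channel when all cells amplify, whereas an equal number of tags per well gives each sample an equal share regardless of its cell density, provided each sample contains at least as many cells as tags.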
A good example of the benefit of this effective normalization is with the SAEKs commonly encountered in forensic science. With SAEKs, there is a need to separate sperm cells from epithelial cells. Swabs are taken from body orifices, and often contain far more victim epithelial cells than suspect sperm cells. For this reason, a differential lysis procedure is most commonly used to separate the two cell types prior to analysis with capillary electrophoresis. In at least some implementations of the present disclosure, by using antibodies that recognize epitopes expressed on the surface of the sperm cell, this laborious process could be eliminated, since the epithelial cells in the sample mixture would not produce amplification products contributing to the SCNGS generated CODIS profiles. Further, since some items of evidence may contain a greater number of sperm cells than others, the use of the same number of antibody-bound polynucleotide concatenates for each sample ensures that each sample occupies an equal “slice” of the NGS analytical channel, such that the samples with high sperm counts do not swamp out those with low sperm counts. In this way, another marketable embodiment of the present inventive work would be quality-controlled 96, 384, or higher-well plates, within the wells of which are a constant number of moles of dried-down anti-human sperm antibody-polynucleotide concatenates, ready for cellular sample addition. Without the present inventive work, and without expensive and laborious cell counting and normalization between samples, the background/related art method of multiplexing would likely produce data for some contributors of some samples rather than each contributor to each sample. The added labor of cell counting would vastly reduce throughput and increase costs.
In
To carry out these three modules, what is described is a three-step microfluidic process with respect to microfluidic device 800A of
As described, the foregoing methods rely on “positive skimming”, where an agent binds to and selects the cells that will proceed to be sequenced. In some situations, the process may be better served with use of “negative skimming,” where an agent is utilized to bind to the cells that will be removed and not proceed to sequencing. For example, if an entire collection of tumor cell types is desired to be read from a tumor sample, and the cell surface markers of these types are not known a priori, biotinylated antibodies that recognize normal epithelium, endothelium, immune cells, etc., may be used, bound to a streptavidin-coated bead, and removed from the sample using a magnet. The sub-sample of cells left behind would thus not need to be bound by a separate agent linked to a polynucleotide concatenate. Instead of the polynucleotide concatenates, simple oligonucleotides may be used that contain the sample-identifying barcodes and either the 5′ primers or the 3′ primers for the target loci, as partners for those that will be provided on the bead. Because the sample is a sub-sample after “negative skimming,” these primers may be supplied in free solution form, and they no longer need to be bound to an agent to positively select cells for proceeding; all of the cells of the sub-sample may proceed to contribute to the SCNGS data without consuming too much of the channel bandwidth of the cartridge(s).
Beginning at a start block 902 of
Samples of cells may be placed in each well (step 906 of
A substrate, such as a streptavidin-coated magnetic bead, may be added to bind to the moiety-bound agent (step 910 of
The cells may be encapsulated into microdroplet emulsions along with beads bound to univ-bc-gene primers as described previously (step 914 of
Next, proteinase may be added to the microdroplets for disrupting of the nucleosome/DNA structures (e.g., as well as the chromatin), thereby allowing access to the genome (step 916 of
The bead may then be degraded or dissolved, for releasing the oligos or primers (step 918 of
Thus, what are produced are amplification products where the polynucleotide barcode integrated into the primer at the 3′ end (or 5′ end) informs as to the sample (e.g., microtiter plate well) and the bead library barcode integrated into the primer present at the 5′ end (or 3′ end) informs as to the cell within that sample. Notably, the amplification products of the sample multiplex are obtained from only a subset of cells in each sample.
Because of the reduction in the number of molecular biology steps, if used with a targeted DNA or genomic sequencing, such a negative skimming procedure can be used with a two-step microfluidics device such as those previously described in the background/related art (see, e.g., U.S. Pat. No. 10,501,739 to Eastburn) and if used with an RNA transcriptome, a one-step microfluidics device may be employed (see, e.g., U.S. Pat. No. 10,752,895 to Church et al.).
The foregoing therefore describes “negative skimming” and “positive skimming”. The “negative skimming” approach may be utilized such that no skimming takes place at all, resulting in all cells of each sample contributing to NGS reads, consuming more channel bandwidth. This would result in lower-order multiplexing than the skimming methods, because each sample would take up more of the channel bandwidth. In this case, however, the procedure is the same as steps 904 through 920 of
At least in some cases, however, both the negative and the non-skimming approaches of the present disclosure may suffer from the same limitations as the current state-of-the-art multiplexing approaches with SCNGS. In particular, cells are accepted and passed to amplification and NGS without normalization, so some samples may contribute far more cells to the bandwidth than others, making it prudent to limit the multiplex factor. Without use of an agent to bind the cells selected to “pass” to amplification and NGS, there may be no suitable means to normalize them without laborious and time-consuming cell counting and dilution methods. Therefore, this approach may not be suitable for the high-throughput environments that are desired.
In some implementations, the objectives of the present disclosure may be achieved with use of other agent-bound polynucleotides. For example, instead of a polynucleotide concatenate, a single element of barcode and universal sequences (“univ2-bc-univ3”) may be utilized to tag the cells, if the universal sequence (“univ3”) is also present at the 5′ end of one half of the primers used to amplify target sequences. If the beads contribute the 5′ primers, each sample is apportioned to a well of a microtiter plate, then the 3′ primers without any barcode sequence may be contributed in bulk across all samples/wells. If the agent-bound “univ2-bc-univ3” elements are apportioned to each well in a manner such that each well gets an agent-bound element with a different barcode bc sequence, then once this agent binds the cells and these primers and cells are encapsulated into a microdroplet (see, e.g.,
In
In
Unlike the polynucleotide concatenate method, this method only incorporates some of its amplicons with the sample tag, and due to the stochastic nature of polymerase chain reaction, the proportion of tagged to not-tagged will be different from cell to cell. To avoid not-tagged amplicons from consuming NGS channel space at the expense of tagged, another amplification step could be performed using the universal sequence on the first set of primers linked to the beads (see
The advantage of this approach is that, instead of partitioning agent-bound long concatenates with unique barcode sequences to wells of a plate, which would require 96 or 384 different long concatenates, agent-bound short elements may be apportioned to each well of a plate where each element contains a unique barcode. That is, one can save on expenses since 96 or 384 different short sequences are needed in contrast to long ones.
On the other hand, the disadvantage of this approach compared to the polynucleotide concatenate approach could be significant. Because the ratio of tagged to not-tagged target gene amplicons varies, often substantially, from cell to cell, an uncontrollable form of bias may be introduced into the results. Here, cellular proportions for certain unknowable gene targets may be unreliable and, after normalization PCR (designed to make the mass of DNA from each sample the same), some cells will “swamp out” other cells/samples and consume a disproportionate amount of NGS cartridge channel space. This would provide misleading results on the existence and proportions of cell types in the samples. In contrast, with the polynucleotide concatenate method, every target sequence that is amplified contains a cell and sample tag from the beginning, and therefore the opportunity for this type of amplification bias and misleading result may be avoided.
Swab aspects of the present disclosure are now described. With respect to the background/related art, forensic casework almost always uses cotton swabs. With cotton swabs there are often problems removing biological material from the cotton matrix; as the cotton swab dries after collection, the biological material can adhere to the swab. For example, due to the saccharic composition of the spermatocyte membrane, spermatocytes stick to solid supports, especially cotton. See, e.g., Lazzarino, M. F. et al., “DNA Recovery from Semen Swabs with the DNA IQ System,” 2008, Forensic Science Communications 10(1). In order to release the maximum amount of material from the swabs, a variety of buffers have been tested and compared to the standard differential extraction buffer. Use of detergents such as 1-2% sodium dodecyl sulfate (SDS) has been shown to increase sperm cell or nucleus recovery (see, e.g., Norris, J. V. et al., “Expedited, chemically enhanced sperm cell or nucleus recovery from cotton swabs for rape kit analysis,” 2007, J Forensic Sci 52(4): 800-5) as well as the recovery of other cell or nucleus types. Still, a not insignificant fraction of cases involve low levels of cellular material to begin with (e.g., “touch DNA” cases), and these often produce DNA results that are below the thresholds for reliable interpretation. Thus, it is important to maximize the release of cellular and/or genetic material from swabs prior to SCNGS analysis. Previous patent disclosures for nucleic acid purification have described this challenge and referenced the first uses of cellulase for recovery of genetic material from dried cotton swabs (see, e.g., U.S. Pat. No. 10,464,065 to Selden et al.), and the addition of low amounts of cellulase has been shown to release more epithelial and sperm cells from the cotton swab matrix than buffer elution alone. See, e.g., Voorhees, J. C. et al., “Enhanced elution of sperm from cotton swabs via enzymatic digestion for rape kit analysis,” 2006, J Forensic Sci 51(3): 574-9.
In addition to the use of cellulase, to facilitate the collection of discrete cells from swabs re-hydrated after desiccation, antiagglutinins such as antibodies can be used to prevent clumping of cells after they are released from the cellulose matrix. Various antiagglutinins are commercially available for this purpose, such as Goat Anti-PNA antiagglutinin of Vector Laboratories, and products are available for use with particular cell or nucleus types such as blood cells (anti-hemagglutinin).
Cellulase has not become an integral part of swab preparation protocols in the forensic science community. This is largely because the effects are modest, and because the original descriptions used exocellulases and were overly simplistic in their interpretation. Cellulases come in two varieties: exocellulases, which digest cellulose from the ends of the chain, and endocellulases, which digest at sites between the ends. For disrupting cotton swabs, which are made almost entirely of cellulose (see, e.g., entry for “Cotton” at Wikipedia), digestion at sites between the ends would be more valuable than digestion from the ends, primarily because there are far more of these sites.
Polyester swabs are also used in forensic science. However, given the presently disclosed SCNGS inventive work, its likelihood of disrupting and being adopted into the current state-of-the-art in forensic and diagnostic science, and the relevance of complete release of cells and/or their nuclei from desiccated swabs as a precursor for SCNGS, the present inventive work could contribute towards a new FBI and/or American National Standards Institute (ANSI) Quality Assurance Standard mandating cellulose swabs for the future in forensic science (with implications for analogous Clinical Laboratory Improvement Amendments or “CLIA” standards in diagnostics).
Here, what is described is the use of endocellulases for the extraction of cells and/or nuclei from, and preparation of, dried forensic cotton swabs for horizontal SCNGS, with or without the use of lysis buffers (such as those containing proteinase K, and/or hypotonic solutions/buffers) that facilitate the action of the endocellulase by enabling the release of intact cells or nuclei from within the swabs. What is also described is the use of endocellulases for the digestion of cellulose in cotton swabs and release of cellular material for use in SCNGS workflows. What is further disclosed is a combination of cellulase and antiagglutinin tailored to specific types of evidence swabs, such as buccal or blood swabs.
Accordingly, as described herein, the present inventive work was developed to effect a disruptive transformation of forensic genetics and diagnostics by enabling human identity and/or diagnostic profiles from individual contributors present in sample mixtures. At least some aspects of the present inventive work involve a “horizontal” approach to a recently introduced method of isolating and examining single cells in massively parallel formats, called SCNGS, as opposed to the “vertical” approach used today. In so doing, the present inventive work may enable larger numbers of samples to be analyzed for fewer numbers of cells per sample, reducing cost points such that forensic genetics and diagnostics applications of SCNGS become practical. Notably, the present inventive work may help solve, once and for all, the “mixture problem” which has plagued the forensic genetics and human identity market. In so doing, the present inventive work could represent an inflection point in the state-of-the-art in forensics and diagnostics as a fundamental, transformative disruption that brought about the effective, efficient and cost-compatible application of SCNGS to these fields.
In some implementations of the present disclosure, the following definitions may apply in relation to the above-described processes.
The term “nucleotide” is meant to convey any string of nucleosides forming any sequence, whether of a few nucleosides (“oligonucleotide”) or a large number (“polynucleotides”). A “nucleoside” is a 2′-deoxy and/or 2′-hydroxyl form of a nucleic acid building block, whether naturally occurring or synthetic, the latter of which are commonly referred to as “analogs”. A nucleoside consists of a nitrogenous base covalently attached to a sugar (ribose or deoxyribose) but without the phosphate group. A nucleotide, on the other hand, consists of a nitrogenous base covalently attached to a sugar (ribose or deoxyribose) and from one to three phosphate groups. See, e.g., Alberts B, Johnson A, Lewis J, Morgan D, Raff M, Roberts K, Walter P., “Molecular Biology of the Cell,” Sixth Edition, W. W. Norton & Company. The present inventive text considers the terms nucleosides, nucleotides, and analogues thereof as equivalent, represented as G, A, T, C or dG, dA, dT or dC, as long as they hybridize specifically to their cognate nucleotide, nucleoside, or analog, where G hybridizes with C and vice versa, and A hybridizes with T and vice versa, and will hereafter be referred to as “nucleotides”. Nucleotides may come in 2′-deoxy or 2′-hydroxyl forms, where they are referred to as a deoxynucleotide or a nucleotide, respectively. See, e.g., U.S. Pat. No. 10,501,739 to Eastburn, and U.S. Pat. No. 10,752,895 to Church et al. Examples of nucleotide analogues are given by Scheit. See, e.g., Scheit, K., “Nucleotide Analogues, Synthesis and Biological Function,” John Wiley, New York, 1980. Polynucleotides thereof with enhanced hybridization characteristics are described by Uhlmann and Peyman. See, e.g., Uhlmann E and Peyman A., “Antisense Oligonucleotides: A New Therapeutic Principle,” 1990, Chemical Reviews (90)4: 544. Modified nucleotides may include, for example, diaminopurine, 5-fluorouracil, 5-bromouracil, . . . , etc.
Nucleotides are polymerized into nucleic acid molecules via the process of DNA replication, which is the process referred to when using the terms “amplification” or “polymerization” and is extensively taught in various textbooks such as Kornberg 2006. See, e.g., Kornberg A., “DNA Replication,” 1980, WH Freeman & Co Ltd.; and Watson J, Baker T, Bell S, Gann A, Levine M, Losick R, “Molecular Biology of the Gene,” 7th Edition, 2014, Pearson Publishers.
“Oligonucleotides” as used herein refer to single-stranded nucleic acid molecules comprised of a string or series of from 5 to 10, 15, 20 up to 100 nucleotide bases linked in a 5′ to 3′ orientation. A “polynucleotide” is a set of linked oligonucleotides, and/or a string of single-stranded nucleic acid molecules comprised of a string or series of from 100, 200, 1000, 2000 to 10,000 nucleotide bases linked in a 5′ to 3′ orientation.
The notations 5′ and 3′ are used to express one end versus the other end of target amplicons, and can be used interchangeably. For example, if the vendor “bead library” provides amplification primers for one end of an amplicon, the “antibody-concatenate” would provide amplification primers for the other end whether the first end is called 5′ or 3′, and whether the corresponding second end is called 3′ or 5′.
“Complementary” or “substantially complementary” nucleotides are defined as those for which at least 80% or more of the nucleotides, usually 90 or 95% or more, appositionally overlap when iteratively compared, nucleotide by nucleotide along the length of at least one of the nucleotides, such that the nucleotides are capable of base-pairing or hybridizing to form a duplex of double-stranded DNA under any of a variety of salt and/or buffer conditions. Assessment of complementarity or substantial complementarity is accomplished via visual registration of the sequences in various overlap configurations and/or with computer devices, such as with the use of programs written in any language. One example is the Python programming language using the “zip” function with lists, tuples or arrays, embedded or not in dictionaries, where the program iterates over at least one sequence, starting with one nucleotide to define a stretch or length of sequence, and progresses along the sequence, re-adjusting the alignment phase by one nucleotide until the end is reached.
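The iterative, nucleotide-by-nucleotide comparison described above can be sketched as follows. This is a minimal illustration only, not a definitive implementation: the function names, the `min_overlap` parameter, and the choice to score each alignment phase as the fraction of matching positions are assumptions made for the example. The 80% threshold is taken from the definition above.

```python
# Minimal sketch: slide one strand along the reverse complement of the other,
# one nucleotide at a time, and score each alignment phase with zip().
# All names and the min_overlap parameter are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def best_complementarity(seq_a, seq_b, min_overlap=5):
    """Return (fraction, offset, overlap) for the best alignment phase.

    seq_a and seq_b are both written 5' to 3'. Two strands pair when seq_a
    matches the reverse complement of seq_b, so identity is checked against
    that reverse complement at every offset.
    """
    a = seq_a.upper()
    target = reverse_complement(seq_b)
    best = (0.0, 0, 0)
    for offset in range(-(len(target) - min_overlap), len(a) - min_overlap + 1):
        a_start = max(offset, 0)
        t_start = max(-offset, 0)
        overlap = min(len(a) - a_start, len(target) - t_start)
        if overlap < min_overlap:
            continue
        pairs = zip(a[a_start:a_start + overlap], target[t_start:t_start + overlap])
        fraction = sum(x == y for x, y in pairs) / overlap
        # Prefer a higher fraction; break ties with the longer overlap.
        if (fraction, overlap) > (best[0], best[2]):
            best = (fraction, offset, overlap)
    return best


def is_substantially_complementary(seq_a, seq_b, threshold=0.80, min_overlap=5):
    """Apply the 80% criterion to the best-scoring alignment phase."""
    fraction, _, _ = best_complementarity(seq_a, seq_b, min_overlap)
    return fraction >= threshold


if __name__ == "__main__":
    probe = "ATGCGTAC"
    print(best_complementarity(probe, reverse_complement(probe)))
    print(is_substantially_complementary(probe, "AAAAAAAA"))
```

Scoring only the overlapping stretch, rather than the full length of either sequence, is one possible reading of “along the length of at least one of the nucleotides”; a stricter variant could divide by the length of the shorter sequence instead.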
The term “hybridization” refers to the non-covalent union of two single-stranded nucleotide sequences based on sequence complementarity, such that they anneal or non-covalently bind together to form a duplex of double-stranded DNA. This union may be accomplished at low (<100 mM) or high (>100 mM) salt concentrations, whether buffered or not, usually below 1M salt concentrations, typically below 500, 200 or 100 mM salt concentrations, and is most often carried out under rigorous or stringent binding conditions such that sequence complementarity or substantial complementarity is required, and thus is at least substantially sequence-dependent. Longer oligo/polynucleotide complements generally require higher annealing temperatures than shorter complements, as do those of greater GC content, and sequence-specific, and therefore meaningful, hybridization is typically carried out at temperatures of at least 22 degrees Celsius, to 30, 37, 40, up to 90 degrees Celsius as described fully in Sambrook et al. See, e.g., Sambrook, Fritsch and Maniatis, “Molecular Cloning: A Laboratory Manual,” 2nd Edition, 1989, Cold Spring Harbor Press. When used, the term “hybridization” is intended to indicate the union, binding, or annealing of single-stranded nucleotide/oligonucleotide/polynucleotide sequences as a function of their sequence complementarity as complementary nucleotides under reasonably achievable solution conditions, and for this reason is considered a “sequence specific” event.
As used herein, the term “barcode” indicates any nucleotide sequence used to unambiguously identify an oligonucleotide or polynucleotide sequence, or set of sequences, in which it is embedded or to which it is attached. The sequence may be randomly or non-randomly generated, but in either case must be long enough (n) such that its occurrence in the larger oligo/polynucleotide in which it is embedded or attached is not expected by chance, as a function of the polynomial expansion of n elements given an equal or even a biased likelihood of occurrence of any of the four nucleotide types in any single position within the oligo/polynucleotide. The barcode may be a few, a dozen, or a few dozen nucleotides in length (e.g., from 5, 10, 15, 20, . . . , to about 50 or 75 or more nucleotides long). It is composed of dA, dT, dG and dC nucleotides in any combination or order, as well as any nucleotide derivatives or analogues thereof, and in the context of the present inventive work, represents an intersection of the elements comprising the set of gene-specific nucleotide containing oligo/polynucleotides attached to a given, discrete bead. Barcode sequences are essentially non-complementary, representing a minimally cross-hybridizing set. Their annealing temperatures (Tm) may be from 20 degrees Celsius, to 25, to 30, 35, 40, 45, 50, 55, 60, 65 up to about 90 degrees Celsius and each is defined by a unique sequence-dependent Tm.
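The length requirement above can be illustrated with a rough calculation. The sketch below assumes independent positions, so that the expected number of chance occurrences of a given barcode of length n in a host sequence of length L is (L − n + 1) multiplied by the product of the per-base frequencies, which reduces to (L − n + 1)/4^n when all four nucleotides are equally likely. The function names and the 0.01 tolerance are hypothetical choices for illustration and are not values taken from the present disclosure.

```python
# Minimal sketch: how long must a barcode be so that it is not expected to
# occur by chance within a host oligo/polynucleotide?
# Assumes independent positions; names and the tolerance are hypothetical.
from math import prod


def expected_chance_occurrences(barcode, host_length, base_freqs=None):
    """Expected number of chance matches to `barcode` in a random host sequence."""
    if base_freqs is None:
        base_freqs = {base: 0.25 for base in "ACGT"}  # equal likelihood
    n = len(barcode)
    positions = max(host_length - n + 1, 0)  # possible start positions
    p_match = prod(base_freqs[base] for base in barcode.upper())
    return positions * p_match


def min_barcode_length(host_length, max_expected=0.01):
    """Smallest n with (L - n + 1) / 4**n <= max_expected (equal base frequencies)."""
    n = 1
    while (host_length - n + 1) / 4 ** n > max_expected:
        n += 1
    return n


if __name__ == "__main__":
    print(expected_chance_occurrences("ACGT", 7))
    print(min_barcode_length(200))
    # A biased composition raises the chance of matching a GC-rich barcode.
    gc_rich = {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}
    print(expected_chance_occurrences("GGCC", 7, gc_rich))
```

Under these assumptions, a biased base composition changes only the per-position match probability, which is why a barcode drawn from the over-represented bases needs to be longer to reach the same tolerance.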
“Beads” are nano or micrometer sized spheres that are capable of adsorbing to, or otherwise binding to oligonucleotides and/or polynucleotides. In certain embodiments the bead has a magnetic core or cortex, and may be functionalized with a moiety or a protein such as streptavidin which binds to moieties such as biotin often incorporated at the 5′ end of oligo/polynucleotides. Examples include Dynabeads from ThermoFisher Scientific (see, e.g., Immunoprecipitation Dynabeads Products, Thermo Fisher Scientific—US) which are available in various sizes and configurations such as Streptavidin coated M-280 (current gold standard for isolation of biotinylated nucleic acids) (see, e.g., Magnetic Beads: Life Science Applications; e.g., news-medical.net), M-270, MyOne C1, DynaMag-2, and Pierce tradenames, AMSBIO's MagSi magnetic beads, IBA's MagStrep Type3 XT, which most often are magnetic allowing for convenient purification from cellular debris, reaction salts and buffers as reviewed in Bosnes et al., 2013. See, e.g., Bosnes M, Deggerdal A, Rian A, Korsnes L, Larsen F., “Magnetic Separation in Molecular Biology,” In: Hafeli U, Schutt W, Teller J, Zborowski M, editors. “Scientific and Clinical Applications of Magnetic Carriers,” Springer Science & Business Media, 2013, pp. 269-286. The term “bead” is meant to be taken as equivalent to any other solid substrate including slides, beads, chips, particles, strands, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, dishes, plates in any configuration 96-well, 28 well 384 well etc. made of any substance such as but not limited to paramagnetic materials, ceramic, plastic, glass, polystyrene, methylstyrene, acrylic polymers, titanium, any other metal, latex, sepharose, cellulose, nylon etc., such that might enable sequestration of multiplex oligo/polynucleotide primer sets with specific cells. 
The term “attach” means the covalent sharing of electrons between atoms of two specific molecules, where one of the molecules is part of a solid substrate or a bead and the two can be said to be “bound”. Beads may be purified using any means, such as magnetic separation or centrifugation, and the nucleic acids bound or attached to the beads can be purified using any method for the purification of beads or nucleic acids bound to beads, including via use of solid DNA binding substrates or conventional extraction and precipitation. See, e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 4th Edition, 2012, CSHL Press.
“Multiplexing” refers to the targeting of a set of unrelated, unlinked genetic positions, “loci”, or “genes”, during execution of a detection method, such as polymerase chain reaction or “amplification”. Multiplexing can take place inside of a cell or nucleus or outside of a cell, inside of a micro or nanometer sized reaction vessel or outside of one, or most commonly, in a reaction tube and can be carried out using primers in solution or bound to a solid substrate, such as a slide, plate, bead, semiconductor etc. Typically multiplexing is carried out by, instead of supplying a single 5′/3′ primer pair targeting a single genetic locus, providing a set or collection of 5′/3′ primer pairs during amplification set-up. “Amplification” means polymerase chain reaction (PCR) (see, e.g., Mullis et al., 1986, Cold Spring Harb Symp Quant Biol 51 Pt 1:263) and may include the use of gripped or anchored primers as in the case of “grip PCR”, or incorporate PCR in various schema for specific amplification purposes such as RACE PCR, Ligation Chain Reaction PCR, standard or nested PCR, real-time PCR and/or multiplexed PCR.
The term “sequencing” is meant to convey any method of sequence detection, such as Sanger sequencing, sequencing by synthesis on solid substrate or in solution, sequencing by hybridization (SBH), sequencing by ligation (SBL), TaqMan reporter probe digestion, pyrosequencing, etc.
The terms “nanodroplet”, “vessel” and “reactor” are intended to signify the same meaning in the context of the present inventive work, indicating a micro or nanometer sized vesicle or sphere which may contain a bead with nucleic acids attached to it, and may contain a cell, within which biochemical and molecular biology reactions take place, including DNA replication, polymerization, amplification, etc., such that the reactions are isolated from those taking place in other nanodroplets which are similarly isolated. By “isolated” it is meant that the nucleic acid oligo/polynucleotides in separate nanodroplets are unable to hybridize with those from another.
Thus, enhancements to single cell or nucleus next generation sequencing have been described herein. What are now described are the various approaches as presented above and herein.
In one illustrative example, a method of the present disclosure may involve obtaining a bead library which includes a bead coupled to a plurality of oligonucleotides, the plurality of oligonucleotides comprising a plurality of first primers of a set of primer pairs for a targeted set of loci, each oligonucleotide further including a first barcode sequence that is common to the plurality of oligonucleotides on the bead; binding a cell or nucleus of a sample to a polynucleotide concatenate via a binding agent, the polynucleotide concatenate comprising repeating units of univ-bc-gene sequences, represented as univ-bc-gene1, univ-bc-gene2, . . . , to univ-bc-geneN, wherein the univ represents a universal sequence, the bc represents a second barcode sequence that is common to the repeating units of univ-bc-gene sequences, and the gene1, gene2, . . . , to geneN comprise a respective plurality of second primers of the set of primer pairs for the targeted set of loci, and wherein N represents the number of loci in the targeted set of loci; wherein the univ and the gene are selected at each gene-univ junction of the univ-bc-gene sequences so as to create a restriction enzyme binding site at each gene-univ junction; and encapsulating, into a microvesicle, the bead coupled to the plurality of oligonucleotides and the cell or nucleus of the sample that is bound to the polynucleotide concatenate.
In some implementations of the above-described method, the method may further involve adding restriction endonuclease to the microvesicle for digestion, for separating the univ-bc-gene sequences from the polynucleotide concatenate into a plurality of individual univ-bc-gene units; causing a membrane of the cell or nucleus to be digested in the microvesicle, for primer access to a genome of the cell or nucleus; causing the bead to be degraded or digested in the microvesicle, for releasing the plurality of oligonucleotides from the bead; and performing a PCR process for amplifying regions of the genome based on the primer pairs for the targeted set of loci, which generates a plurality of amplicons each of which incorporates the first barcode sequence and the second barcode sequence; and wherein, for each amplicon, the first barcode sequence uniquely identifies the cell or the nucleus and the second barcode sequence uniquely identifies the sample.
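The structure of the polynucleotide concatenate and its separation into individual univ-bc-gene units can be sketched as a toy model. Every sequence below is invented for illustration and is not taken from the Sequence Listing. The recognition site (GAATTC), the convention that each gene primer ends in GAA and the universal sequence begins with TTC so that the site forms only at gene-univ junctions, and the simplification that the cut falls exactly at the junction are all assumptions of this sketch, not a description of any particular enzyme's cleavage position.

```python
# Toy model of a univ-bc-gene concatenate and its digestion into units.
# All sequences and names are hypothetical and chosen for illustration only.

SITE = "GAATTC"                      # assumed recognition site at each junction
UNIV = "TTCAGACGTGTGCTCTTCCGATCT"    # assumed universal sequence (starts with TTC)
SAMPLE_BC = "ACGTTGCA"               # assumed second (sample) barcode
GENE_PRIMERS = [                     # assumed second primers (each ends with GAA)
    "CCTGGGCTCTGTAAAGAA",
    "GGTGATTTTCCTCTTGAA",
    "ATTGCCAGCAAAAAAGAA",
]

# Repeating units univ-bc-gene1 ... univ-bc-geneN joined head to tail.
UNITS = [UNIV + SAMPLE_BC + gene for gene in GENE_PRIMERS]
CONCATENATE = "".join(UNITS)


def digest(concatenate, site=SITE, cut_offset=3):
    """Split the concatenate at every occurrence of `site`.

    cut_offset=3 places the cut between GAA and TTC, i.e. exactly at the
    gene-univ junction. This is a simplification made for the sketch.
    """
    fragments, start = [], 0
    pos = concatenate.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(concatenate[start:cut])
        start = cut
        pos = concatenate.find(site, pos + 1)
    fragments.append(concatenate[start:])
    return fragments


def parse_unit(unit):
    """Split one unit into its (univ, bc, gene) parts by fixed lengths."""
    u, b = len(UNIV), len(SAMPLE_BC)
    return unit[:u], unit[u:u + b], unit[u + b:]


if __name__ == "__main__":
    for fragment in digest(CONCATENATE):
        univ, bc, gene = parse_unit(fragment)
        print(bc, gene)
```

In this model each released unit carries the same sample barcode alongside a different locus-specific primer, which corresponds to how the second barcode is described as being common to all repeating units of one concatenate.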
In some implementations of the above-described method, binding the cell or the nucleus of the sample to the polynucleotide concatenate is performed by placing the polynucleotide concatenate into a well of a microtiter plate; placing the binding agent into the well of the microtiter plate, such that the binding agent binds to an end of the polynucleotide concatenate; and placing the sample of the cell or the nucleus into the well of the microtiter plate, so that the binding agent binds to the membrane of the cell or the nucleus of the sample, thereby binding the cell or the nucleus of the sample to polynucleotide concatenate. In some implementations of the method, where the polynucleotide concatenate comprises a single-stranded polynucleotide concatenate, the method may further involve converting the single-stranded polynucleotide concatenate into a double-stranded polynucleotide concatenate. In some implementations of the method, the steps of placing are repeated for each one of a plurality of samples of cells or nuclei that are placed into different wells of the microtiter plate, one sample per well, for binding the cells or the nuclei of each sample to a different polynucleotide concatenate, and the method further involves pooling together the different samples of the different cells or the nuclei; and performing microvesicle encapsulation after pooling together the different samples of the cells or the nuclei.
In some implementations of the above-described method, the binding agent is for binding to different types of cells of the sample for targeting a fraction of the cells of the sample; or the binding agent comprises an antibody which binds to an epitope of the membrane of the cell or nucleus of the sample; or the binding agent comprises an antibody which binds to an epitope that is differentially expressed amongst cells of the sample for targeting a fraction of the cells of the sample; or the binding agent comprises an antibody which binds to an epitope of the membrane of the cell or nucleus of the sample, the polynucleotide concatenate is provided with a moiety on its end, the moiety comprises biotin which binds to streptavidin, and the antibody comprises a streptavidin-conjugated antibody; or the bead library comprises a vendor bead library; or the first primers of the set of primer pairs comprise 3′ primers and the second primers of the set of primer pairs comprise 5′ primers; or the first primers of the set of primer pairs comprise 5′ primers and the second primers of the set of primer pairs comprise 3′ primers; or the targeted set of loci comprise CODIS STRs.
In another illustrative example, a method of the present disclosure may involve obtaining a bead library which includes a bead coupled to a plurality of oligonucleotides, the plurality of oligonucleotides comprising a plurality of first primers of a set of primer pairs for a targeted set of loci, each oligonucleotide further including a first barcode sequence that is common to the plurality of oligonucleotides on the bead but different from bead to bead; generating a plurality of polynucleotide concatenates based on at least the bead library, each polynucleotide concatenate comprising repeating units of univ-bc-gene sequences, represented as univ-bc-gene1, univ-bc-gene2, . . . , to univ-bc-geneN, wherein the univ represents a universal sequence, the bc represents a second barcode sequence that is common to the repeating units of univ-bc-gene sequences in the polynucleotide concatenate but different from the other polynucleotide concatenates, and the gene1, gene2, . . . , to geneN comprise a respective plurality of second primers of the set of primer pairs for the targeted set of loci, and wherein N represents the number of loci in the targeted set of loci; wherein the univ and the gene are selected at each gene-univ junction of the univ-bc-gene sequences of the plurality of polynucleotide concatenates so as to create a restriction enzyme binding site at each gene-univ junction; and for each one of the plurality of polynucleotide concatenates, placing the polynucleotide concatenate into a respective unique one of a plurality of wells of a microtiter plate, and converting the polynucleotide concatenate into a double-stranded polynucleotide concatenate, thereby producing a plurality of double-stranded polynucleotide concatenates; placing a binding agent into the plurality of wells, such that the binding agent binds to ends of the plurality of double-stranded polynucleotide concatenates; for each one of a plurality of samples of cells or nuclei, placing the sample into the respective unique one of the plurality of wells, so that the binding agent binds to a membrane of the cells or the nuclei of the sample, thereby binding the cell or the nucleus of the sample to the double-stranded polynucleotide concatenate, for thereby producing the plurality of samples of cells or nuclei that are respectively bound to the plurality of double-stranded polynucleotide concatenates; and pooling together the plurality of samples of cells or nuclei that are respectively bound to the plurality of double-stranded polynucleotide concatenates. The method may further involve, for each one of a plurality of different beads associated with the bead library, encapsulating, into a microvesicle, the bead coupled to the plurality of oligonucleotides and the cell or nucleus that is bound to the polynucleotide concatenate; adding restriction endonuclease to the microvesicle for digestion, for separating the univ-bc-gene sequences from the polynucleotide concatenate into a plurality of individual univ-bc-gene units; causing a membrane of the cell or nucleus to be digested in the microvesicle, for primer access to a genome of the cell or nucleus; causing the bead to be degraded or digested in the microvesicle, for releasing the plurality of oligonucleotides from the bead; and performing a PCR process for amplifying regions of the genome based on the primer pairs for the targeted set of loci, which generates a plurality of amplicons each of which incorporates the first barcode sequence and the second barcode sequence; wherein, for each amplicon, the first barcode sequence uniquely identifies the cell or nucleus and the second barcode sequence uniquely identifies the sample.
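Because each amplicon carries both barcodes, sequencing reads from a pooled run can in principle be sorted first by sample and then by cell. The following is a minimal sketch of that bookkeeping under strong simplifying assumptions: the read layout (cell barcode at the start, sample barcode at the end, fixed barcode lengths), the barcode sequences, and the sample names are all hypothetical, and real reads would also contain universal and primer sequences and would need error-tolerant barcode matching.

```python
# Minimal sketch: group pooled reads by sample barcode, then by cell barcode.
# The read layout, barcode lengths, and all sequences are hypothetical.
from collections import defaultdict

CELL_BC_LEN = 8     # assumed length of the first (bead/cell) barcode
SAMPLE_BC_LEN = 6   # assumed length of the second (sample) barcode


def demultiplex(reads, known_samples):
    """Return {sample_name: {cell_barcode: [insert, ...]}}.

    Reads whose sample barcode is not in `known_samples` are discarded.
    """
    table = defaultdict(lambda: defaultdict(list))
    for read in reads:
        cell_bc = read[:CELL_BC_LEN]
        sample_bc = read[-SAMPLE_BC_LEN:]
        insert = read[CELL_BC_LEN:-SAMPLE_BC_LEN]
        sample = known_samples.get(sample_bc)
        if sample is None:
            continue  # unrecognized sample tag
        table[sample][cell_bc].append(insert)
    # Convert to plain dicts so lookups do not silently create new keys.
    return {sample: dict(cells) for sample, cells in table.items()}


if __name__ == "__main__":
    samples = {"GGATCC": "swab_1", "TTAGGC": "swab_2"}
    reads = [
        "AAAACCCC" + "TCTATCTATCTA" + "GGATCC",
        "AAAACCCC" + "TCTATCTATCTATCTA" + "GGATCC",
        "GGGGTTTT" + "TCTATCTA" + "TTAGGC",
        "GGGGTTTT" + "TCTATCTA" + "NNNNNN",
    ]
    for sample, cells in demultiplex(reads, samples).items():
        for cell_bc, inserts in cells.items():
            print(sample, cell_bc, len(inserts))
```

Grouping by sample before cell reflects the pooling step: bead barcodes are drawn from a shared library, so a given cell barcode is only meaningful within the sample identified by the second barcode.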
In yet another illustrative example, a method of the present disclosure may involve obtaining a bead library which includes a bead coupled to a plurality of oligonucleotides, the plurality of oligonucleotides comprising a plurality of first primers of a set of primer pairs for a targeted set of loci, each first primer including a first barcode sequence that is common to the plurality of first primers on the bead; obtaining a plurality of second primers of the set of primer pairs for the targeted set of loci, in bulk, each being represented as univY-gene, wherein the univY represents a universal sequence Y and each gene represents one of the plurality of second primers; binding a cell or nucleus of a sample to a polynucleotide sequence via a binding agent, the polynucleotide sequence comprising a univX-bc-univY sequence, wherein the univX represents a universal sequence X, the bc represents a second barcode sequence, and the univY represents the universal sequence Y; and encapsulating, into a microvesicle, the bead coupled to the plurality of oligonucleotides comprising the plurality of first primers of the set of primer pairs, the plurality of second primers of the set of primer pairs, and the cell or the nucleus of the sample that is bound to the polynucleotide sequence.
In some implementations of the above-described method, the method may further involve causing a membrane of the cell or nucleus to be digested in the microvesicle, for primer access to a genome of the cell or nucleus; causing the bead to be degraded or digested in the microvesicle, for releasing the plurality of oligonucleotides from the bead; and performing a PCR process for amplifying regions of the genome based on the primer pairs for the targeted set of loci, which generates a plurality of amplicons each of which incorporates the first barcode sequence and the second barcode sequence; wherein, for each one of at least some of the amplicons, the first barcode sequence uniquely identifies the cell or the nucleus and the second barcode sequence uniquely identifies the sample. In some implementations of the above-described method, binding the cell or the nucleus of the sample to the polynucleotide concatenate is performed by placing the polynucleotide sequence into a well of a microtiter plate; placing the binding agent into the well of the microtiter plate, such that the binding agent binds to an end of the polynucleotide sequence; and placing the sample of the cell or the nucleus into the well of the microtiter plate, so that the binding agent binds to the membrane of the cell or the nucleus of the sample, thereby binding the cell or the nucleus of the sample to the polynucleotide sequence. In some implementations of the above-described method, the polynucleotide concatenate comprises a single-stranded polynucleotide concatenate, and the method further involves converting the single-stranded polynucleotide concatenate into a double-stranded polynucleotide concatenate.
In some implementations of the above-described method, the steps of placing are repeated for each one of a plurality of samples of cells or nuclei that are placed into different wells of the microtiter plate, one sample per well, for binding the cells or the nuclei of each sample to a different polynucleotide concatenate, and the method further involves pooling together the different samples of the different cells or the nuclei; and performing microvesicle encapsulation after pooling together the different samples of the cells or the nuclei.
In some implementations of the above-described method, the binding agent is for binding to different types of cells of the sample for targeting a fraction of the cells of the sample; or the binding agent comprises an antibody which binds to an epitope of the membrane of the cell or nucleus of the sample; or the binding agent comprises an antibody which binds to an epitope that is differentially expressed amongst cells of the sample for targeting a fraction of the cells of the sample; or the binding agent comprises an antibody which binds to an epitope of the membrane of the cell or nucleus of the sample; or the polynucleotide sequence is provided with a moiety on its end, the moiety comprises biotin which binds to streptavidin, and the antibody comprises a streptavidin-conjugated antibody; or the bead library comprises a vendor bead library; or the first primers of the set of primer pairs comprise 5′ primers and the second primers of the set of primer pairs comprise 3′ primers; or the targeted set of loci comprise CODIS STRs.
In even yet another illustrative example, a process of the present disclosure (e.g., a three-step microfluidic process) is described for single cell next generation sequencing with use of a microfluidic device configured for enabling a generation of emulsions of solution within an immiscible carrier fluid, where the generation of the emulsions include single cells with a first set of reagents and/or enzymes desired for carrying out a set of molecular biology reactions, and where the process involves transitioning the single cells into a first incubation chamber within which a first set of molecular biology reactions take place; merging the emulsions with a second set of reagents and/or enzymes; transitioning to a second incubation chamber within which a second set of molecular biology reactions take place; and merging the emulsions with a third set of reagents and/or enzymes, for creating the emulsions for polymerase chain reaction in preparation for sequencing.
In some implementations of the above-described process, the solution is aqueous or non-aqueous, and wherein the multi-step process is for enabling multiplexed single cell next generation sequencing through tagging of the cells within a collection of samples. In some implementations of the above-described process, the process is for use with a simple set of antibody-bound or bead-bound barcode tags or types, and/or bead libraries, for enabling the tagging of cells within microreactors or vessels such that only a fraction of the single cells of the sample are tagged, which constitutes a skimming of cellular diversity from the sample. In some implementations of the above-described process, the first set of molecular biology reactions is restriction endonuclease digestion and/or linear amplification using a DNA polymerase. In some implementations of the above-described process, the second set of molecular biology reactions comprise proteinase digestion. In some implementations of the above-described process, the third set of molecular biology reactions enables geometric DNA amplification using a DNA polymerase.
In another illustrative example, a method of the present disclosure which is for multiplexed SCNGS may involve encapsulating a single nucleotide bound bead with a single cell or cell nucleus within a microvessel or on a solid substrate, within the context of a plurality of vessels or substrate binding sites, each containing a single bead and nucleus, where the cells or nuclei of a sample are tagged for sample identity by virtue of their being bound to an agent that contains a linked polynucleotide of repeating units in the form of a concatemer, where a restriction endonuclease binding site exists at each of the junctions between repeating units, and where within each vessel, the nucleotides are liberated from the beads and agent and extended from cDNA or genomic DNA from the cell via primer extension and/or polymerase chain reaction, to create polymerase chain amplicons such that the liberated bead bound primers and liberated cell tagging primers are both incorporated and suitable for next generation or massively parallel sequencing.
In some implementations of the above-described method, the nucleotide or genetic barcoding of the cells is contributed by the bead primers, and the nucleotide or genetic barcoding of the sample is contributed by the agent bound nucleotides. In some implementations, the repeating unit is an oligonucleotide element of univ-bc-gene structure, where univ corresponds to a universal or common sequence, bc corresponds to a specific barcode sequence, and gene corresponds to a nucleotide region of a gene. In some implementations, a second and/or a third universal binding site is incorporated at the 5′ and 3′ ends of the concatemer. In some implementations, the agent binds to all or substantially all of the cells of a sample. In some implementations, the agent binds to only a fraction of the cells of a sample. In some implementations, the preferred agent is an antibody that recognizes an epitope or binding site on the surface of the cells or nuclei.
In some implementations of the above-described method, the method uses a single or collection of polyclonal antibodies that recognize an epitope or binding site on the surface of the cells or nuclei. In some implementations, the method uses a single or collection of monoclonal antibodies that recognize an epitope or binding site on the surface of the cells or nuclei. In some implementations, the method uses a mixture of polyclonal and monoclonal antibodies that recognize an epitope or binding site on the surface of the cells or nuclei. In some implementations, the preferred agent is a liposome that merges or fuses indiscriminately with all of the cells or nuclei. In some implementations, the agent is a protein or mixture of proteins that binds to proteins, sugars or other entities on the surface of the cells or nuclei. In some implementations, the agent is a peptide or mixture of peptides that binds to proteins, sugars or other entities on the surface of the cells or nuclei. In some implementations, the polynucleotide concatenate contains a biotin at its 5′ end, and binds to the agent through streptavidin molecules linked to the agent. In some implementations, the agent bound nucleotides comprise a nucleotide containing one unit, or a polynucleotide constituting a concatemer containing repeating units or segments, of a first common or universal primer binding sequence, a unique barcode sequence, and a sequence common to a class of nucleotides, such as poly dT, and where a restriction endonuclease binding site exists at each of the junctions between repeating units.
In some implementations of the above-described method, the number of repeating univ-bc-gene units is 1, and one such element is bound to the agent. In some implementations, the number of repeating univ-bc-gene units is 2 through 12, and one such element is bound to the agent. In some implementations, the number of repeating univ-bc-gene units is 13 through N, and one such element is bound to the agent, where N is the maximum number of units that may be reasonably incorporated into a synthesized polynucleotide with state-of-the-art polynucleotide synthesis methods. In some implementations, the number of repeating univ-bc-gene units is 1, 2, or any number that is reasonably incorporated in a synthesized polynucleotide, and the units are bound to multiple sites on the agent. In some implementations, the number of repeating univ-bc-gene units is more than two dozen.
In some implementations of the above-described method, the repeating units are the same, targeting bulk cDNA, or the same genomic DNA sequence. In some implementations, the repeating units are different, targeting different cDNA or genomic DNA sequences. In some implementations, the concatenate is synthesized as a single polynucleotide. In some implementations, the first univ-bc-gene unit of the polynucleotide concatenate is a dummy unit that remains bound to the agent after restriction endonuclease digestion, and its gene sequence is not part of the gene sequence set needed for target loci amplification. In some implementations, additional universal or common sequences at the 5′ and 3′ ends of the polynucleotide concatenate are incorporated, enabling the concatenate to be amplified prior to restriction endonuclease digestion. In some implementations, the method involves libraries of a plurality of agent bound polynucleotides. In some implementations, the libraries of a plurality of agent bound polynucleotides are converted to double-stranded DNA.
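By way of non-limiting illustration, the univ-bc-gene concatemer described above may be modeled in software as follows. This is a minimal string-level sketch in Python, not an implementation taken from the present disclosure: the sequences, the choice of the EcoRI recognition sequence GAATTC as the junction site, and the names Unit, build_concatemer, and digest are all assumptions introduced for the example, and digestion is simplified to splitting at the site rather than modeling the actual cut position or overhangs.

```python
from dataclasses import dataclass

# Assumed junction site (the EcoRI recognition sequence); illustrative only.
JUNCTION_SITE = "GAATTC"


@dataclass(frozen=True)
class Unit:
    """One univ-bc-gene element; all sequences used here are hypothetical."""
    univ: str  # universal (common) primer binding sequence
    bc: str    # sample barcode
    gene: str  # gene- or class-specific region, e.g. poly dT

    def sequence(self) -> str:
        return self.univ + self.bc + self.gene


def build_concatemer(units, site=JUNCTION_SITE):
    """Join units with the junction site between each adjacent pair."""
    for unit in units:
        if site in unit.sequence():
            raise ValueError("unit contains the junction site internally")
    return site.join(unit.sequence() for unit in units)


def digest(concatemer, site=JUNCTION_SITE):
    """Simplified digestion: split the concatemer at every junction site.

    A real endonuclease cuts at a defined position within or near its
    recognition sequence; that detail is deliberately ignored here.
    """
    return concatemer.split(site)


units = [
    Unit("ACGTACGTAC", "TTAGGC", "CATCATCAT"),     # dummy unit, stays on the agent
    Unit("ACGTACGTAC", "TTAGGC", "TTTTTTTTTTTT"),  # poly dT, targets bulk cDNA
    Unit("ACGTACGTAC", "TTAGGC", "AGCTTGCA"),      # hypothetical locus-specific region
]
fragments = digest(build_concatemer(units))
released = fragments[1:]  # the first (dummy) unit is assumed to remain agent-bound
```

In this sketch every unit carries the same sample barcode and differs only in its gene region, which corresponds to the implementations in which the repeating units target different cDNA or genomic DNA sequences.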
In yet another illustrative example, what is claimed is a process of co-localizing an agent bound polynucleotide containing a single element or a concatenate of elements with beads coupled to other primers, and single cells within a micro or nanometer sized space, such as a micro or nanodroplet, well, or vesicle (“microreactor” or “nanodroplet”), such that the nucleotide element or elements that result from restriction digestion of the concatenate serve as half of the primers for DNA amplification (e.g., the 3′ primers) and the primers contributed by the bead represent the other half of the primers (e.g., the 5′ primers). In some implementations, what is further claimed is a process of digesting the polynucleotide, inside of a micro or nanometer sized space, bound to the agent or released from it, using a restriction endonuclease capable of binding to the restriction endonuclease binding sites present at each of the junctions between repeating units in the concatemer of 1. In some implementations, what is further claimed is a process of utilizing an organized distribution of barcoded primers linked to the agent for tracing amplification products generated inside of emulsions, microvessels, microreactors, etc. to the sample of origin. In some implementations, what is further claimed is a process of utilizing a cipher to create barcodes linking amplification products generated inside of emulsions, microvessels, microreactors, etc. to the sample of origin. In some implementations, what is further claimed is a process of isolating such polynucleotide-linked, agent-bound cells in the preceding into micro or nanodroplets comprised of lipids or agarose.
In another illustrative example, what is claimed is a process of isolating such polynucleotide-linked, agent-bound cells into micro or nanodroplets comprised of droplets or wells, etched or not, into a solid 2-dimensional surface or substrate. In some implementations, what is further claimed is a process of co-localizing polynucleotide-linked, agent-bound cells into micro or nanometer sized spaces with single cells, captured through photon-assisted or electrokinetic gating. In some implementations, what is further claimed is a process where the primers constituted by the elements of the polynucleotides target specific human identity loci, such that each element contains a universal primer binding site, a discrete barcode and a set of gene specific primers or nucleotide class specific primer(s). In some implementations, what is further claimed is a process allowing a multiplexing of samples prior to integration of cells into the vessels, thereby reducing per sample costs for each SCNGS run. In some implementations, the method may be applied in the forensic sciences, or in routine diagnostics, or in R&D, or in Drug Discovery or Development.
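By way of non-limiting illustration, because each amplicon is described as incorporating both a bead-contributed cell barcode and an agent-contributed sample barcode, sequenced reads from a multiplexed run may be sorted first by sample and then by cell. The following Python sketch shows one way this could be done. The read layout (cell barcode at the start, sample barcode at the end), the barcode lengths, the exact-match lookup, and the name demultiplex are assumptions made for the example and are not specified by the present disclosure.

```python
from collections import defaultdict

CELL_BC_LEN = 8    # assumed length of the bead-contributed cell barcode
SAMPLE_BC_LEN = 6  # assumed length of the agent-contributed sample barcode


def demultiplex(reads, sample_barcodes):
    """Group read inserts by (sample, cell barcode).

    Assumed read layout: [cell barcode][insert][sample barcode].
    `sample_barcodes` maps a barcode sequence to a sample name.
    Returns (groups, unassigned); reads that are too short or whose
    sample barcode is unknown are returned unchanged in `unassigned`.
    """
    groups = defaultdict(list)
    unassigned = []
    for read in reads:
        if len(read) <= CELL_BC_LEN + SAMPLE_BC_LEN:
            unassigned.append(read)
            continue
        cell_bc = read[:CELL_BC_LEN]
        sample_bc = read[-SAMPLE_BC_LEN:]
        insert = read[CELL_BC_LEN:-SAMPLE_BC_LEN]
        sample = sample_barcodes.get(sample_bc)
        if sample is None:
            unassigned.append(read)
        else:
            groups[(sample, cell_bc)].append(insert)
    return dict(groups), unassigned
```

Exact matching is used only to keep the sketch short; a practical pipeline would typically also tolerate sequencing errors within the barcodes.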
In a further illustrative example, a method of the present disclosure for multiplexed SCNGS may involve encapsulating a single nucleotide bound bead with a single cell or cell nucleus within a microvessel or on a solid substrate, within the context of a plurality of vessels or substrate binding sites, each containing a single bead linked to one set of primers, and a single cell or nucleus, where the other set of primers creating pairs for each targeted DNA site is provided in bulk without any barcode, but contains a universal sequence at its 5′ end (univY-gene), where the cells or nuclei of a sample are tagged for sample identity by virtue of their being bound to an agent that contains a linked polynucleotide of a single unit in the form of a different universal, barcode and a universal region (univX-bc-univY), the nucleotides are liberated from the beads and agent and combined with bulk primers containing the partners extended from cDNA or genomic DNA from the cell via primer extension and/or polymerase chain reaction, to create polymerase chain amplicons, such that the liberated bead bound primers and liberated cell tagging primers are both incorporated and suitable for next generation or massively parallel sequencing.
In some implementations of the above-described method, the nucleotide or genetic barcoding of the cells is contributed by the bead primers, and the nucleotide or genetic barcoding of the sample is contributed by the agent bound nucleotides. In some implementations, the primers containing nucleotide or genetic barcoding of the cells represent one half of the primer pair for each targeted sequence, the 3′ primers are provided in bulk and represent the other half of the primer pair for each targeted sequence, and the nucleotide or genetic barcoding of the sample is contributed by the agent bound nucleotides. In some implementations, the agent bound polynucleotide contains, in order, a barcode sequence linked to a universal sequence, where the universal sequence is contained by each of the bulk primers at their 5′ ends. In some implementations, the agent bound polynucleotide contains, in order, a universal sequence linked to a barcode sequence linked to a universal sequence, where the universal sequence is contained by each of the bulk primers at their 5′ ends.
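By way of non-limiting illustration, the relationship between the agent bound univX-bc-univY tag and the bulk univY-gene primers may be sketched at the string level as follows. The sketch assumes that the universal sequence shared by the tag and the bulk primers (univY) is what allows the two to be combined into a univX-bc-univY-gene product; it ignores strand orientation, reverse complements, and reaction chemistry. All sequences, locus names, and the function name join_on_shared_universal are hypothetical.

```python
def join_on_shared_universal(sample_tag: str, bulk_primer: str, univ_y: str) -> str:
    """Combine a univX-bc-univY tag with a univY-gene primer via shared univY."""
    if not sample_tag.endswith(univ_y):
        raise ValueError("sample tag must end with the shared universal sequence")
    if not bulk_primer.startswith(univ_y):
        raise ValueError("bulk primer must start with the shared universal sequence")
    # Keep a single copy of univY at the join.
    return sample_tag + bulk_primer[len(univ_y):]


# Hypothetical sequences, for illustration only.
UNIV_X = "ACGTACGTAC"
UNIV_Y = "GGATCCTTGA"
SAMPLE_BC = "TTAGGC"

sample_tag = UNIV_X + SAMPLE_BC + UNIV_Y
bulk_primers = {
    "locus_a": UNIV_Y + "AGCTTGCA",
    "locus_b": UNIV_Y + "CCGTTAAC",
}
tagged_primers = {
    locus: join_on_shared_universal(sample_tag, primer, UNIV_Y)
    for locus, primer in bulk_primers.items()
}
```

Under these assumptions a single barcoded tag per sample suffices for any number of targeted loci, because the barcode-free bulk primers are shared across all samples.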
In some implementations of the above-described method, the polynucleotide is single-stranded. In some implementations, the polynucleotide is double-stranded. In some implementations, the agent binds to all or substantially all of the cells of a sample. In some implementations, the agent binds to only a fraction of the cells of a sample. In some implementations, the preferred agent is an antibody that recognizes an epitope or binding site on the surface of the cells or nuclei. In some implementations, the method may use a single or collection of polyclonal antibodies that recognize an epitope or binding site on the surface of the cells or nuclei. In some implementations, the method may use a single or collection of monoclonal antibodies that recognize an epitope or binding site on the surface of the cells or nuclei. In some implementations, the method may use a mixture of polyclonal and monoclonal antibodies that recognize an epitope or binding site on the surface of the cells or nuclei. In some implementations, the preferred agent is a liposome that merges or fuses indiscriminately with all of the cells or nuclei. In some implementations, the agent is a protein or mixture of proteins that binds to proteins, sugars or other entities on the surface of the cells or nuclei. In some implementations, the agent is a peptide or mixture of peptides that binds to proteins, sugars or other entities on the surface of the cells or nuclei. In some implementations, the polynucleotide concatenate contains a biotin at its 5′ end, and binds to the agent through streptavidin molecules linked to the agent. In some implementations, the polynucleotide represents a concatenate of multiple units separated at their junction by a restriction endonuclease site. In some implementations, the repeating units are different, targeting different cDNA or genomic DNA sequences.
In some implementations, the concatenate is synthesized as a single polynucleotide, as opposed to being created through ligation. In some implementations, the first unit of the polynucleotide concatenate is a dummy unit that remains bound to the agent after restriction endonuclease digestion.
In some implementations of the above-described method, the method involves incorporating additional universal or common sequences at the 5′ and 3′ ends of the unit(s), enabling subsequent common amplification in separate steps after gene amplification. In some implementations, the method involves libraries of a plurality of agent bound polynucleotides. In some implementations, the method involves libraries of a plurality of agent bound polynucleotides that are converted to double-stranded DNA. In some implementations, what is further claimed is a process of co-localizing an agent bound polynucleotide containing a single element or a concatenate of elements, with beads coupled to other primers, and bulk primers, and single cells within a micro or nanometer sized space, such as a micro or nanodroplet, well, or vesicle (“microreactor” or “nanodroplet”), such that the nucleotide element or elements that result from DNA amplification contain both a cell barcode nucleotide tag and a sample barcode nucleotide tag. In some implementations, what is further claimed is a process of digesting the polynucleotide, inside of a micro or nanometer sized space, bound to the agent or released from it, using a restriction endonuclease capable of binding to the restriction endonuclease binding sites present at each of the junctions between repeating units in the concatemer of 1. In some implementations, what is further claimed is a process of utilizing an organized distribution of barcoded primers linked to the agent for tracing amplification products generated inside of emulsions, microvessels, microreactors, etc. to the sample of origin. In some implementations, what is further claimed is a process of utilizing a cipher to create barcodes linking amplification products generated inside of emulsions, microvessels, microreactors, etc. to the sample of origin.
In some implementations, what is further claimed is a process of isolating such polynucleotide-linked, agent-bound cells in the preceding into micro or nanodroplets comprised of lipids or agarose.
In still another illustrative example, what is claimed is a process of isolating such polynucleotide-linked, agent-bound cells into micro or nanodroplets comprised of droplets or wells, etched or not, into a solid 2-dimensional surface or substrate. In some implementations, what is further claimed is a process of co-localizing polynucleotide-linked, agent-bound cells into micro or nanometer sized spaces with single cells, captured through photon-assisted or electrokinetic gating. In some implementations, what is further claimed is a process where the primers constituted by the elements of the polynucleotides target specific human identity loci, such that each element contains a universal primer binding site, a discrete barcode and a set of gene specific primers or nucleotide class specific primer(s). In some implementations, what is further claimed is a process allowing a multiplexing of samples prior to integration of cells into the vessels, thereby reducing per sample costs for each SCNGS run. In some implementations, the method may be applied in the Forensic Sciences, or in routine diagnostics, or in Research and Development (R&D), or in Drug Discovery or Development.
In an additional illustrative example, a three-step SCNGS process of the present disclosure is provided, which involves a microfluidic device enabling the generation of emulsions of solution within an immiscible carrier fluid, the creation of such emulsions containing single cells with reagents and enzymes needed for carrying out a set of molecular biology reactions, transition of the cells into an incubation chamber within which molecular biology reactions take place, the subsequent merging of the emulsions with a second set of reagents and/or enzymes, transition to a second incubation chamber within which a second set of molecular biology reaction(s) takes place, the subsequent merging of the emulsions with a third set of reagents and/or enzymes, creating emulsions suitable for polymerase chain reaction in preparation for sequencing.
In some implementations of the above-described process, the process is for enabling a multiplexed SCNGS approach through the tagging of cells within a collection of samples. In some implementations, the process is used with a relatively simple set of antibody bound or bead bound barcode tags or types, and/or bead libraries, that enable the tagging of cells within microreactors or vessels such that only a fraction of the cells of the sample are tagged, constituting a “skimming” of the cellular diversity from the sample. In some implementations, the solution is aqueous, or is non-aqueous. In some implementations, the first set of molecular biology reactions is restriction endonuclease digestion and/or linear amplification using a DNA polymerase. In some implementations, the second set of molecular biology reactions is proteinase digestion. In some implementations, the third set of molecular biology reactions enables geometric DNA amplification using a DNA polymerase.
In another illustrative example, what is claimed is the use of a plate or array of barcoded primers with an agent that recognizes and binds to cells of a sample differentially for the purposes of negatively ‘skimming’ a sample of cells for multiplexed SCNGS via negative selection, in that the antibody epitope is expressed on cells of known identity in the sample, the antibody is linked to a chemical entity that binds a solid substrate, enabling these cells to be removed from the sample, leaving only those that do not express the epitope for multiplexed SCNGS analysis, thereby skimming the sample and obviating the need for positive ‘skimming’ using an agent-bound polynucleotide concatenate.
In some implementations of the use of the plate or array, the agent is an antibody recognizing a differentially-expressed epitope. In some implementations, the solid substrate is a magnetic bead. In some implementations, the chemical entity is biotin. In some implementations, pre-prepared plates or other substrates containing wells/chambers/vessels are utilized, to segregate samples defined by oligonucleotides that contain the same sequences in each well except a unique identifier or barcode which is different in each well/chamber/vessel.
In yet another illustrative example of the present disclosure, what is claimed is the use of an array, grid or matrix of barcoded primers in 2D or 3D space, to enable tracking of sample identity among cells of a sample of cells that is not depleted of any cells, for the purposes of multiplexing samples for SCNGS. In some implementations, pre-prepared plates or other substrates containing wells/chambers/vessels are utilized, to segregate samples defined by oligonucleotides that contain the same sequences in each well except a unique identifier or barcode which is different in each well/chamber/vessel.
In yet a further illustrative example of the present disclosure, what may be provided is a computer-readable non-transitory storage medium storing instructions which, when executed by one or more computing devices, cause the one or more computing devices to perform any method described herein. In some implementations, a computer system may be configured to perform the techniques and/or methods for the analysis or process according to the embodiments described herein. A system of the present disclosure may include or be associated with multiple subsystems such as, for example, one or more machines, one or more computer systems, and one or more data repositories. In some implementations, the various subsystems of the system may be communicatively connected over one or more networks, which may include packet-switching or other types of network infrastructure devices (e.g., routers, switches, etc.) that are configured to facilitate information exchange between remote systems. In one embodiment, the system may be a device in which the various subsystems (e.g., machine(s), computer system(s), and possibly a data repository) are components that are communicatively and/or operatively coupled and integrated within the device. In some of these operational contexts, the data repository and/or computer system(s) of the embodiments may be configured within a cloud computing environment.
In a cloud computing environment, the storage devices comprising a data repository and/or the computing devices comprising a computer system may be allocated and instantiated for use as a utility and on-demand; thus, the cloud computing environment provides as services the infrastructure (e.g., physical and virtual machines, raw/block storage, firewalls, load-balancers, aggregators, networks, storage clusters, etc.), the platforms (e.g., a computing device and/or a solution stack that may include an operating system, a programming language execution environment, a database server, a web server, an application server, etc.), and the software (e.g., applications, application programming interfaces (APIs), etc.) necessary to perform any storage-related and/or computing tasks. Here, it is noted that in various embodiments, the techniques described herein can be performed by various systems and devices that include some or all of the above subsystems and components (e.g., sequencing machines, computer systems, and data repositories) in various configurations and form factors; thus, the example embodiments and configurations as described are to be regarded in an illustrative rather than a restrictive sense.
This application claims the benefit of U.S. Provisional Application No. 63/130,559 filed on Dec. 24, 2020 and entitled “Enhanced Methods And Compositions For The Application Of Single Cell Next Generation Sequencing In Diagnostics And Forensic Science,” and U.S. Provisional Application No. 63/277,529 filed on Nov. 9, 2021 and entitled “Enhanced Methods And Compositions For The Multiplexing Of Single Cell Or Nucleus Next Generation Sequencing Samples For Reducing Costs And Improving Throughput,” the contents of which are incorporated herein by reference in their entirety.