The subject matter disclosed herein is generally directed to methods, devices, and reagents for mutation detection in mammalian and bacterial cells during single replication events.
Somatic mutations are implicated in age-related diseases including cancer, neurodegeneration, and organ failure in humans (Fernandez et al. 2016; Rossi et al. 2008; Campbell et al. 2015; Vijg et al. 2017; Lodato et al. 2017). Somatic mutations steadily accumulate in our cells, but most somatic mutations are harmless. Occasionally a mutation affects a gene or a regulatory element in a way that leads to a phenotypic consequence (Stratton et al. 2009 Nature 458, 719-724). The detailed analysis of somatic mutations provides insights into the causes of mutagenesis. Moreover, identification and quantitative analysis of human somatic mutation accumulation are extremely important for several fields, such as improving healthcare through preventative medicine, detecting potential carcinogenic materials, and estimating the baseline risk of cancer and other diseases of aging. Understanding the intrinsic and extrinsic factors that contribute to mutagenesis (including mutation-inducing cancer therapies) is currently a top priority in cancer prevention and treatment (Drost et al. 2017; Wu et al. 2016; Vijg et al. 2017). However, available methods for somatic mutation detection are far from satisfactory, limiting our progress in understanding how mutations accumulate in cells and impact health (Gawad et al. 2016; Baslan and Hicks 2017). Current knowledge of somatic mutation rates is incomplete, and the rates are mostly measured indirectly. Despite the complexity of nucleoside chemistry, the wide range of DNA lesions characterized in humans, and the large variety of DNA replication and repair enzymes, only 30 distinct somatic mutation signatures have been defined from human tumor sequencing, and the etiology of more than half remains unknown (Petljak and Alexandrov 2016). Analysis of quantitative, high-resolution, and unbiased somatic mutation data has the best potential to link observed mutational spectra with molecular mechanisms.
However, the distributed and asynchronous nature of mutations in somatic cells makes these mutations difficult to detect and accurately quantify. Specifically, published rates of somatic mutation vary widely from 10⁻¹¹ to 10⁻⁷ single nucleotide variants (SNVs) per base pair per cell division (Li et al. 2014; Lynch 2010; Ju et al. 2017; Holstege et al. 2014; Blokzijl et al. 2016; Milholland et al. 2017) due to differences across cell types and reliance on uncertain cell division rate estimates (Tomasetti et al. 2017). Without strong selection such as that occurring in tumor development (McGranahan and Swanton 2017), these somatic mutation rates result in variant allele frequencies lower than can be reliably detected in bulk samples using standard sequencing approaches, which produce consensus false positive SNV error rates of 10⁻⁶ to 10⁻⁵ per base.
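The arithmetic behind this detection problem can be sketched as follows. This is a minimal illustration, not part of the disclosed method: the function names, the diploid heterozygous assumption, the example population sizes, and the use of 10⁻⁵ per base as the upper end of the consensus error range are all assumptions made for the example.

```python
# Illustrative sketch: compares the variant allele frequency (VAF) of an
# unselected somatic variant with an assumed per-base consensus error rate.

ASSUMED_ERROR_RATE_HIGH = 1e-5  # assumption: upper end of the quoted error range


def expected_vaf(cells_carrying: int, total_cells: int, ploidy: int = 2) -> float:
    """Fraction of sequenced alleles that carry a heterozygous variant."""
    if total_cells <= 0 or not 0 <= cells_carrying <= total_cells:
        raise ValueError("need 0 <= cells_carrying <= total_cells and total_cells > 0")
    return cells_carrying / (ploidy * total_cells)


def above_error_floor(vaf: float, error_rate: float = ASSUMED_ERROR_RATE_HIGH) -> bool:
    """Crude criterion: a variant is distinguishable only if its VAF exceeds the error rate."""
    return vaf > error_rate


# A variant private to one cell in a bulk sample of one million diploid cells.
bulk_vaf = expected_vaf(cells_carrying=1, total_cells=1_000_000)

# The same variant in a colony grown from that single cell (every cell carries it).
colony_vaf = expected_vaf(cells_carrying=1_000, total_cells=1_000)
```

Under these assumptions the bulk VAF falls below the error floor, whereas the colony VAF is close to one half, which is why sampling and expanding single cells enriches rare variants.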
Available approaches for somatic variant analysis have yielded valuable insights, but also have technical limitations that compromise quantitative mutation analysis. Accurate detection of rare alleles in bulk samples can be achieved by molecular consensus sequencing methods that use barcodes to track reads from individual input molecules, but this comes at the cost of added protocol complexity and a requirement for ultra-deep sequencing that makes genome-wide analysis challenging (Merkle et al. 2017; Martincorena et al. Science. 2015 Sep. 25; 349(6255):1483-9), and it does not provide direct insight into intra-lineage structures or dynamics. For example, Martincorena et al. used computational approaches on big data sets to study mutations. To enrich samples for somatic variants, single cells can be isolated and cloned or processed for sequencing directly. These cells or clones are typically separated by a large and unknown number of cell division events, which contributes to uncertainty in mutation rate calculations and severely limits the power to determine exactly when mutations arose within a lineage or to detect correlations among the mutations (Szikriszt et al. 2016 Genome Biology 17:99; Blokzijl et al. 2016; Drost et al. 2017; Milholland et al. 2017). For example, Szikriszt et al. used random single cell expansion and analysis to detect mutations caused by cytotoxic agents, but the method does not allow lineage construction. Direct single-cell methods do not require live cells or suffer from a selection bias against slower- or non-growing cells, but despite recent advances, remain compromised in the sensitivity and accuracy of variant detection compared with bulk approaches (Gawad et al. 2016; Chen et al. Science. 2017 Apr. 14; 356(6334):189-194; Lodato et al. 2015). For example, Chen et al. used sister cells for SNP correction in a single-cell whole genome amplification (WGA) method.
Mutation accumulation experiments that run for hundreds of generations or more to enrich lineages with large numbers of mutations are widely used for fast-growing bacterial cells (Tenaillon et al. 2016) but are impractical for mammalian cells. Other approaches for measuring mutation rates in vitro target specific genomic loci, which can lead to strongly biased estimates of genome-wide rates and genome-wide mutation spectra (Araten et al. 2005). Mutation analysis relying on powerful yeast genetics and tools not available in higher eukaryotes has been performed (Kennedy et al., PLoS Genet 2015 11(4): e1005151). Thus, there is a need for quantitative and accurate genome-wide somatic mutation analyses that are unbiased by positive or negative selection.
Bacteria have a relatively rapid mutation rate. Antimicrobial resistance (AMR) is the ability of a microbe to resist the effects of medication previously used to treat it. This broader term also covers antibiotic resistance, which applies to bacteria and antibiotics. Resistance arises in one of three ways: natural resistance in certain types of bacteria, genetic mutation, or acquisition of resistance by one species from another. Resistance can appear spontaneously because of random mutations or, more commonly, following gradual buildup over time and because of misuse of antibiotics or antimicrobials. Resistant microbes are increasingly difficult to treat, requiring alternative medications or higher doses, both of which may be more expensive or more toxic. Microbes resistant to multiple antimicrobials are called multidrug resistant (MDR), or sometimes superbugs. Antimicrobial resistance is on the rise, with millions of deaths every year. All classes of microbes develop resistance: fungi develop antifungal resistance, viruses develop antiviral resistance, protozoa develop antiprotozoal resistance, and bacteria develop antibiotic resistance. The consequences of the increase in resistant strains include higher morbidity and mortality, longer patient hospitalization, and an increase in treatment costs (B. Murray, New Engl. J. Med. 330: 1229-1230 (1994)). Thus, there is a need for novel methods to analyze mutations in bacteria with high accuracy, specificity, and time resolution.
Somatic mutations steadily accumulate in our cells. Most somatic mutations are harmless, but some lead to a phenotypic consequence. It is an objective of the present invention to provide a somatic mutation analysis approach that combines 1) the ability to identify groups of variants that arise during the replication of an individual cell, 2) high accuracy and sensitivity for genome-wide somatic SNVs, and 3) minimal positive and negative mutation selection biases. A further objective of the present invention is to provide methods and tools to elucidate exactly how mutations appear in the genome. A further objective of the present invention is to provide methods and tools to determine how mutations are correlated in space and time. For example, a mutation may occur in one generation that affects the mutation rate in downstream generations. A further objective of the present invention is to provide a quantitative model for neutral mutations in order to describe the selective forces that shape genomes. A further objective of the present invention is to provide for determining the effect of environmental stimuli and drugs on somatic mutations. It is another objective of the present invention to provide for personalized medicine by determining the effect of treatments on somatic mutations in cells obtained from a subject in need thereof. It is another objective of the present invention to provide for lineage sequencing of bacteria to detect mutations.
The methods of lineage sequencing of the present invention work in a controlled genetic background and use expectations about inheritance states, such that every lineage of the pedigree is processed, enabling reconstruction of the entire history of mutations in the clonal population of cells, suppression of sequencing error, and highlighting of rare variants. The dataset obtained is “single-cell resolved” in this sense, with single generation resolution of mutations occurring during the growth of the initial population, and the data are extraordinarily accurate, far superior to those of standard sequencing approaches, due to error correction enabled by the unique structure of lineage sequencing data. The unique data structure also enables determination of correlations among mutation rates along the genomes (after a single-cell replication event) and through time that cannot be observed using conventional approaches at any sequencing effort level.
In one aspect, the present invention provides for a method of detecting somatic mutations across a single cell lineage comprising: isolating single cells from a clonal population, each single cell representing a lineage segment of the clonal population; sequencing genomic DNA representing each single cell; and determining true somatic mutations along the single cell lineage of the clonal population based, at least in part, on the sequencing of the genomic DNA representing each single cell. As used herein, the terms “true variants” or “true somatic mutations” refer to mutations detected by sequencing that are not the result of sequencing errors. In certain embodiments, the method further comprises culturing a sub-clonal cell population from each single cell representing a lineage segment and sequencing genomic DNA from each sub-clonal cell population. In certain embodiments, the method further comprises whole genome amplification (WGA) and sequencing amplified DNA from each single cell. In certain embodiments, the determining step comprises generating a lineage structure of the clonal population that identifies and maps the origin of variants within the clonal population. In certain embodiments, the clonal population is derived from expanding a single parent cell, preferably a eukaryotic cell. In certain embodiments, the clonal population is expanded for at least 2 generations, more preferably, between 5 and 6 generations. In certain embodiments, the single parent cell may be expanded for 2 to 10 generations. In certain embodiments, the method comprises: expanding a single eukaryotic cell into 2 or more generations; isolating single cells from the expanded cells; culturing the single cells, whereby a colony for each cell is obtained; sequencing genomic DNA from each colony; and determining true somatic mutations along the single cell lineage.
The true somatic mutations may be determined by tracing the mutations across the cell lineage for each generation. Each mutation may be observed in at least two cells in the lineage. The method may further comprise detecting a mutation signature. The method may further comprise determining a mutation rate. The method may further comprise determining the set of mutations that occurred in individual cell cycles in the lineage. The method may further comprise clustering of mutations in the genome in an individual cell cycle. The method may further comprise determining relationships in the number, type, spectrum, clustering and/or overlap of mutations in mother and daughter cells and/or along a sub-lineage.
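One way to picture tracing a mutation across the lineage is to test whether the set of sampled cells carrying a candidate variant matches a complete branch of the known pedigree. The following is a minimal sketch under stated assumptions: the pedigree encoding (a dictionary mapping each cell to its daughters), the function names, and the choice to reject variants present in every sampled cell (treated here as pre-existing in the parent cell) are illustrative and are not taken from the disclosure.

```python
# Illustrative sketch: accept a candidate variant only if the sampled cells
# carrying it form exactly one sub-lineage (clade) of the known pedigree.
from typing import Dict, FrozenSet, Iterable, List, Optional

Pedigree = Dict[str, List[str]]  # hypothetical encoding: cell -> its daughter cells


def leaves_under(tree: Pedigree, node: str) -> FrozenSet[str]:
    """Return the sampled (terminal) cells descended from `node`."""
    children = tree.get(node, [])
    if not children:
        return frozenset([node])
    found = set()
    for child in children:
        found |= leaves_under(tree, child)
    return frozenset(found)


def clade_for_carriers(tree: Pedigree, root: str, carriers: Iterable[str]) -> Optional[str]:
    """Return the cell whose descendants are exactly `carriers`, or None."""
    target = frozenset(carriers)
    stack = [root]
    while stack:
        node = stack.pop()
        if leaves_under(tree, node) == target:
            return node
        stack.extend(tree.get(node, []))
    return None


def is_lineage_consistent(tree: Pedigree, root: str, carriers: Iterable[str],
                          min_cells: int = 2) -> bool:
    """True if the variant is seen in >= min_cells cells that form one sub-lineage."""
    carriers = set(carriers)
    if len(carriers) < min_cells:
        return False
    node = clade_for_carriers(tree, root, carriers)
    return node is not None and node != root


# Two-generation example: parent P divides into A and B, which each divide again.
pedigree = {"P": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
```

In this example a variant seen in A1 and A2 is consistent with having arisen in cell A, whereas a variant seen only in A1 and B1 matches no branch and would be treated as a likely error.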
In certain embodiments, the clonal population is derived from expanding a single parent cell exposed to or under exposure to one or more perturbations. The cell(s) may be exposed to the one or more perturbations prior to expanding and/or during the step of expanding. The cells may be exposed to a perturbation before expanding or both before and during expanding. The perturbation may be an environmental condition, a drug, or an agent capable of modulating expression of a gene. The environmental condition may be physical (e.g., temperature, atmospheric pressure, pH, growth media conditions, shear stress), chemical (e.g., carcinogen), or biological (e.g., cytokine, pathogen). The perturbation may also be a genetic perturbation of a coding or non-coding genomic region. The agent capable of modulating expression of a gene may be a CRISPR system, RNAi, TALE, or zinc finger protein. The agent may be inducible, such that the timing of perturbation may be controlled.
The method according to any embodiment herein may be performed in an automated device. In certain embodiments, the step of isolating single cells from a clonal population is performed in an automated device. In certain embodiments, the step of expanding a single parent cell is performed in an automated device. Automated device as used herein includes semi-automated devices. The device may be operably linked to a computing system.
In certain embodiments, the single cell may be loaded into a microfluidic device configured for segregation of single cells across the cell lineage. In certain embodiments, the single cells may be isolated with an optical tweezer. In certain embodiments, the single cells may be segregated into separate wells. In certain embodiments, expanding a single parent cell is performed on a microscope slide. In certain embodiments, the step of isolating single cells from a clonal population is performed on the microscope slide. The slide may be configured for cutting with UV light. The slide may be a polyethylene-naphthalate (PEN) membrane slide. Isolating single cells from the expanded cells may comprise laser microdissection. The cells may be captured on a sticky cap tube. The cells may be captured with catapulting.
In certain embodiments, methods for isolating single cells may comprise a combination of the methods described above. In certain embodiments, the expanding and isolating may be visually recorded, whereby single cells across a lineage are tracked. In certain embodiments, the sequencing may be whole genome or whole exome sequencing.
In certain embodiments, the single parent cell is a eukaryotic or bacterial cell. In certain embodiments, the single cell is obtained from a subject in need thereof. The single cell may be a stem cell or induced pluripotent stem cell (iPS).
In certain embodiments, the method further comprises detecting phenotypes during the step of expanding a single parent cell. In certain embodiments, more than one parent cell is expanded and the method further comprises selecting specific cells and/or sub-lineages for genome sequencing based on the observed phenotypes.
In another aspect, the present invention provides for a method of detecting mutations in bacteria during single replication events comprising: expanding a single non-eukaryotic cell into 2 or more generations; segregating single cells from the expanded cells; culturing the single cells, whereby a colony for each cell is obtained; DNA sequencing each colony; and determining true mutations along the single cell lineage. The cell may be expanded into 2 to 10 generations. True somatic mutations may be determined by tracing the mutations across the cell lineage for each generation. Each true mutation may be observed in at least one pair of daughter cells. The single cell may be exposed to a perturbation during the step of expanding into 2 or more generations. The perturbation may be an environmental condition, a drug, or an agent capable of modulating expression of a gene. The environmental condition may be physical, chemical, or biological. The drug may comprise an antibiotic, whereby mutations in response to the antibiotic are detected in single replication events. The method may further comprise detecting a mutation signature.
The single bacterial cell may be obtained by diluting a sample of bacteria. The single bacterial cell may be obtained by sorting a sample of bacteria. The single bacterial cell may be obtained by separation with an optical tweezer and live single cell microscopy.
The single bacterial cell may be loaded into a microfluidic device configured for segregation and recovery of single cells across the cell lineage. The single bacterial cells across a lineage may be segregated on a chip and expanded. The single bacterial cells across a lineage may be segregated into separate wells and expanded.
The method may further comprise determining the growth rate of the isolated single bacterial cells across a lineage. The growth rate may be determined in the presence of an antibiotic. The antibiotic may be at a low dose, such that resistant cells may be detected or the growth rates of resistant and sensitive cells may be compared.
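A growth rate comparison of this kind can be sketched with a simple exponential growth model. The example below is a minimal illustration that assumes exponential growth between two cell counts; the function names and the example counts and times are hypothetical and are not measurements from the disclosure.

```python
# Illustrative sketch: exponential growth rate and doubling time from two
# cell counts, used to compare growth with and without a low antibiotic dose.
import math


def growth_rate(n_start: float, n_end: float, hours: float) -> float:
    """Specific growth rate (per hour), assuming N(t) = N0 * exp(rate * t)."""
    if n_start <= 0 or n_end <= 0 or hours <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return math.log(n_end / n_start) / hours


def doubling_time(rate: float) -> float:
    """Doubling time in hours; infinite when the population is not growing."""
    if rate <= 0:
        return math.inf
    return math.log(2) / rate


# Hypothetical counts for one lineage segment over three hours.
rate_untreated = growth_rate(n_start=1, n_end=64, hours=3.0)  # six doublings
rate_treated = growth_rate(n_start=1, n_end=8, hours=3.0)     # three doublings
relative_growth = rate_treated / rate_untreated
```

With these hypothetical numbers the treated cells grow at half the untreated rate; a sub-lineage whose relative growth stays near one under the antibiotic would be a candidate for resistance.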
The DNA sequencing may comprise loading the bacterial cells from a cell lineage into a microfluidic device capable of segregating each colony and generating a sequencing library for each colony.
The single bacterial cell may be obtained from a subject in need thereof (e.g., a subject with an infection). The single bacterial cell may be obtained from an environmental sample.
These and other aspects, objects, features, and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of illustrated example embodiments.
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. Definitions of common terms and techniques in molecular biology may be found in Molecular Cloning: A Laboratory Manual, 2nd edition (1989) (Sambrook, Fritsch, and Maniatis); Molecular Cloning: A Laboratory Manual, 4th edition (2012) (Green and Sambrook); Current Protocols in Molecular Biology (1987) (F. M. Ausubel et al. eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (1995) (M. J. MacPherson, B. D. Hames, and G. R. Taylor eds.); Antibodies, A Laboratory Manual (1988) (Harlow and Lane, eds.); Antibodies A Laboratory Manual, 2nd edition 2013 (E. A. Greenfield ed.); Animal Cell Culture (1987) (R. I. Freshney, ed.); Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994); March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992); and Marten H. Hofker and Jan van Deursen, Transgenic Mouse Methods and Protocols, 2nd edition (2011).
As used herein, the singular forms “a”, “an”, and “the” include both singular and plural referents unless the context clearly dictates otherwise.
The term “optional” or “optionally” means that the subsequent described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.
The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within the respective ranges, as well as the recited endpoints.
The terms “about” or “approximately” as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, are meant to encompass variations of and from the specified value, such as variations of +/−10% or less, +/−5% or less, +/−1% or less, and +/−0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. It is to be understood that the value to which the modifier “about” or “approximately” refers is itself also specifically, and preferably, disclosed.
The terms “subject”, “individual” or “patient” are used interchangeably throughout this specification, and typically and preferably denote humans, but may also encompass reference to non-human animals, preferably warm-blooded animals, even more preferably mammals, such as, e.g., non-human primates, rodents, canines, felines, equines, ovines, porcines, and the like. The term “non-human animals” includes all vertebrates, e.g., mammals, such as non-human primates, (particularly higher primates), sheep, dog, rodent (e.g. mouse or rat), guinea pig, goat, pig, cat, rabbits, cows, and non-mammals such as chickens, amphibians, reptiles etc. In one embodiment, the subject is a non-human mammal. In another embodiment, the subject is human. In another embodiment, the subject is an experimental animal or animal substitute as a disease model. The term does not denote a particular age or sex. Thus, adult and newborn subjects, as well as fetuses, whether male or female, are intended to be covered. Examples of subjects include humans, dogs, cats, cows, goats, and mice. The term subject is further intended to include transgenic species.
Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.
All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.
Mutation data reveal the dynamic equilibrium between DNA damage and repair processes in cells and are indispensable to the understanding of age-related diseases, tumorigenesis, tumor evolution, and the acquisition of drug resistance. However, available genome-wide methods have a limited ability to resolve rare somatic variants and the relationships between such variants. Embodiments disclosed herein provide methods for a DNA sequencing approach, ‘lineage sequencing’, which carries out independent genome sequencing of lineages of close and known relationships in a clonal population and provides improved sensitivity and specificity of all genomic aberrations that occur in a given cell lineage, including CNVs, indels and SNP detection, in addition to unbiased precise/sensitive detection of de novo mutations genome-wide occurring during in vitro culture with time resolution as good as a single generation and dense lineage sampling. As used herein, the term “clonal” refers to cells derived from the same cell.
Lineage sequencing produces high-quality somatic mutation call sets with resolution as high as the single-cell level in subject lineages. Lineage sequencing entails sampling single cells from a population and sequencing sub-clonal sample sets derived from these cells. The method leverages knowledge of relationships among the cells to jointly call variants across the full sample set. This approach integrates data from multiple sequence libraries representing different sub-lineages (e.g., from sub-clonal sample sets) to support each variant and enables the precise assignment of mutations to lineage segments with resolution as high as a single cell cycle. In certain embodiments, cells can be cultured under continuous observation to link observed single-cell phenotypes with single-cell mutation data. In certain embodiments, the somatic mutation call sets produced by lineage sequencing are consistent with previous analyses in aggregate, while the high sensitivity, specificity, and resolution of the data provide a unique opportunity for quantitative analysis of variation in mutation rate, spectrum, and correlations among variants. In certain embodiments, lineage sequencing allows testing basic questions about mutagenesis, such as the uniformity of mutation rates within a lineage.
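The idea of integrating data from several sequence libraries to support one variant can be sketched by pooling read counts across the libraries expected to carry the variant and across those expected to lack it. This is only one simple way to pool evidence and is not asserted to be how the call sets were produced; the data layout, the function names, and both thresholds are hypothetical values chosen for illustration.

```python
# Illustrative sketch: pool alternate-allele read evidence across libraries
# to test whether a candidate variant fits a proposed lineage segment.
from typing import Dict, Iterable, Tuple

ReadCounts = Dict[str, Tuple[int, int]]  # hypothetical layout: library -> (alt_reads, total_reads)


def pooled_alt_fraction(counts: ReadCounts, libraries: Iterable[str]) -> float:
    """Alternate-allele fraction after summing reads over the given libraries."""
    alt = total = 0
    for lib in libraries:
        lib_alt, lib_total = counts[lib]
        alt += lib_alt
        total += lib_total
    return alt / total if total else 0.0


def supports_segment(counts: ReadCounts, carriers: Iterable[str],
                     min_carrier_fraction: float = 0.25,   # hypothetical threshold
                     max_other_fraction: float = 0.02) -> bool:  # hypothetical threshold
    """True if expected carriers all show the variant and other libraries do not."""
    carriers = set(carriers)
    others = set(counts) - carriers
    every_carrier_has_alt = all(counts[lib][0] > 0 for lib in carriers)
    return (every_carrier_has_alt
            and pooled_alt_fraction(counts, carriers) >= min_carrier_fraction
            and pooled_alt_fraction(counts, others) <= max_other_fraction)


# Hypothetical read counts at one genomic position in four sub-clonal libraries.
site_counts = {"A1": (14, 30), "A2": (16, 30), "B1": (0, 30), "B2": (1, 30)}
```

Here the variant is supported for the segment leading to A1 and A2 (pooled fraction 0.5 in the carriers, about 0.017 elsewhere), while a grouping of A1 with B1 is rejected because B1 shows no alternate reads.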
The methods of lineage sequencing of the present invention may apply the analysis to single cells in a small population of cells grown in vitro under controlled conditions. In certain embodiments, cells are grown under perturbation conditions. The lineage sequencing approach provides a high signal-to-noise ratio for rare mutations, genome-wide scope, and single-cell, single-generation resolution. This method enables the detection of the genetic alterations that are fixed into the genome of single cells within limited time periods in a controlled environment. One important application for this method is enabling direct and precise quantification of the mutagenicity of different environmental sources. The invention is useful as a tool to accurately quantify the genetic effect of different substances on human cells and thus quantify their mutagenicity. The currently used measurements lack single cell resolution and single generation resolution, and are biased against neutral/deleterious mutations. Minimally biased measurements with maximum resolution are crucial to reveal new information about the biochemical mechanisms of mutation and DNA repair. In addition, the ability to accurately quantify the mutagenic effect of a drug meets an unmet clinical need to determine the mutagenic effects of therapies prior to their widespread deployment, and can improve treatments to extend life expectancies for cancer patients.
Isolation of Single Cells from a Clonal Population
In certain example embodiments, single cells arising from different lineage segments across a clonal population are isolated. As used herein a “clonal population” refers to a population of cells arising from a single parent cell. Thus, the single cells arising from different lineage segments represent daughter cells replicated from the original parent cell and subsequent daughter cells arising from replication of further generations of daughter cells. In certain example embodiments, the single cell from a given lineage segment may have undergone 1, 2, 3, 4 or 5 cell divisions prior to isolation. The terms “lineage segment” and “generation” may be used interchangeably herein. In certain example embodiments, the clonal population may be expanded for at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 generations. Thus, a single cell is isolated from each generation or lineage segment (e.g., daughter cells from the last generation). The single cell arising from each generational lineage to be assayed may be isolated using any suitable technique for isolating single cells.
In certain embodiments, live cell imaging is used to track single cells. In certain embodiments, single cells across a lineage are visualized and/or recorded by a live cell imaging system. Live cell imaging is the study of living cells using time-lapse microscopy (see, e.g., Khodjakov, A. and Rieder, C. L., Imaging the division process in living tissue culture cells. Methods 38: 2-16 (2006); and Goldman, R. D. and Spector, D. L. (eds.), Live Cell Imaging: A Laboratory Manual. Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 631 pages (2005)). Live cell imaging may be used in the present invention in combination with microfluidic systems, optical tweezers and laser capture microscopy. For example, the separation of cells to different quadrants using optical tweezers may be recorded. In another example, live cell imaging may also be used with a microfluidic device of the present invention to track cells. In another example, live cell imaging may be used to generate movies or recordings of dividing cells (e.g., on a slide). In certain embodiments, the recordings are used to capture cells of a lineage using laser capture microscopy.
In certain embodiments, depreciation is minimized (i.e., thousands of single cells are accurately identified, tracked and measured via high-throughput microscopy). In certain embodiments, cells are labeled by the heterogeneous random uptake of fluorescent nanoparticles of different emission colors to generate a large number of unique digital codes (see, e.g., Rees et al., Nanoparticle vesicle encoding for imaging and tracking cell populations, Nat Methods. 2014 November; 11(11):1177-81. doi: 10.1038/nmeth.3105. Epub 2014 Sep. 14). Labeling with commercial highly fluorescent nanocrystals (e.g., Qtracker 525) exploits the stochastic uptake dynamics of the nanoparticles. The uptake of nanoparticles and their subsequent encapsulation within vesicles inside the cell is a random process, so the number of nanoparticle-labeled vesicles (NLVs) varies from cell to cell.
In certain example embodiments, the single cells may be isolated using a microfluidic device. Microfluidic devices may provide advantages in sorting and tracking of single cells, in automation, in reagent cost, and in control of experimental conditions.
Microfluidic devices disclosed herein may be silicone-based chips and may be fabricated using a variety of techniques, including, but not limited to, hot embossing, molding of elastomers, injection molding, LIGA, soft lithography, silicon fabrication and related thin film processing techniques. Suitable materials for fabricating the microfluidic devices include, but are not limited to, cyclic olefin copolymer (COC), polycarbonate, poly(dimethylsiloxane) (PDMS), and poly(methyl methacrylate) (PMMA). In one embodiment, soft lithography in PDMS may be used to prepare the microfluidic devices. For example, a mold may be made using photolithography which defines the location of the one or more flow channels and the array of microwells. The substrate material is poured into the mold and allowed to set to create a stamp. The stamp is then sealed to a solid support such as, but not limited to, glass.
Due to the hydrophobic nature of some polymers, such as PDMS, which absorbs some proteins and may inhibit certain biological processes, a passivating agent may be necessary (Shoffner et al. Nucleic Acids Research, 1996, 24:375-379). Suitable passivating agents are known in the art and include, but are not limited to, silanes, parylene, n-Dodecyl-β-D-maltoside (DDM), pluronic, Tween-20, other similar surfactants, polyethylene glycol (PEG), albumin, collagen, and other similar proteins and peptides.
The microfluidic devices may further comprise inlet and outlet ports, or openings, which in turn may be connected to valves, tubes, channels, chambers, and syringes and/or pumps for the introduction and extraction of fluids into and from the microfluidic device. The microfluidic devices may be connected to fluid flow actuators that allow directional movement of fluids within the microfluidic device. Example actuators include, but are not limited to, syringe pumps, mechanically actuated recirculating pumps, electroosmotic pumps, bulbs, bellows, diaphragms, or bubbles intended to force movement of fluids.
In example embodiments, the microfluidic device may be a hydrodynamic trap array device (see, e.g., Kimmerling et al., A microfluidic platform enabling single-cell RNA-seq of multigenerational lineages. Nat Commun. 2016 Jan. 6; 7:10220. doi: 10.1038/ncomms10220). In other example embodiments, the microfluidic device may be a system such as, but not limited to, the Fluidigm system (see, e.g., www.fluidigm.com/products/cl-system). In other example embodiments, a microfluidic device may be used for generating a sequencing library, integrating lysis, fragmentation, adapter tagging, purification, and size selection of 96 samples in parallel (see, e.g., International Publication Numbers WO 2015/050998 A2 and WO 2016/138290, each incorporated herein by reference in its entirety).
In one example embodiment, the single cells are isolated by loading the parent cell on a hydrodynamic trap device. In general, the device comprises an array of hydrodynamic traps that capture and culture single cells for multiple generations on a chip. The device can have multiple lanes for culturing more than one cell. In exemplary embodiments, a single cell is loaded into a lane of a trap array. The single cell is trapped in a first trap. Due to a pressure differential across the trap lanes, cells travel to the first unoccupied trap in the array. When the cell divides, a daughter cell is carried downstream and captured in a subsequent unoccupied trap. In certain embodiments, time lapse imaging of this process allows determination of single-cell proliferation kinetics. In certain embodiments, time lapse imaging of this process allows identification of lineal relationships between cells. In certain embodiments, each lane of traps can capture more than 4, more than 8, more than 16, more than 32, more than 64 or more than 128 cells, thus allowing lineage tracking for more than 2, 3, 4, 5, 6 or 7 generations, respectively. In certain embodiments, retrieval of cells is performed by reversing the pressure differential across trap lanes to collect each single cell as it exits the device from the loading channel.
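The relationship between trap capacity and trackable generations stated above can be illustrated with a short calculation. The following is a minimal sketch only; it assumes that every cell divides into exactly two daughters and that each descendant occupies one trap, and the function name `max_tracked_generations` is hypothetical.

```python
def max_tracked_generations(traps_per_lane: int) -> int:
    """Largest number of complete generations g such that the 2**g
    descendants of one founder cell still fit in a single lane.

    Assumption (not from the specification): synchronous binary
    division, one cell per trap, no cell loss.
    """
    if traps_per_lane < 1:
        raise ValueError("a lane needs at least one trap")
    # For a positive integer n, n.bit_length() - 1 equals floor(log2(n)).
    return traps_per_lane.bit_length() - 1


if __name__ == "__main__":
    for traps in (4, 8, 16, 32, 64, 128):
        print(f"{traps:>3} traps per lane: {max_tracked_generations(traps)} generations")
```

Under these assumptions, a lane holding 4 cells accommodates 2 generations and a lane holding 128 cells accommodates 7, in line with the ranges recited above.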
In certain embodiments, single cells across a lineage may be isolated using an optical tweezer. The terms “optical tweezing,” “optical trapping,” “laser tweezing,” “laser trapping” and “single-beam gradient force trap” are used interchangeably herein and refer to the use of a highly focused laser beam to provide an attractive or repulsive force, depending on the refractive index mismatch, to physically hold and move microscopic dielectric objects, similar to tweezers. Optical tweezing relies on a difference in refractive index between the cell and the surrounding solution and is applicable for use in combination with microfluidic systems (see, e.g., Landry et al., Optofluidic cell selection from complex microbial communities for single-genome analysis. Methods Enzymol. 2013; 531:61-90). As such, cells (or other structures with a boundary defined by a refractive index gradient), but not the solution itself, can be trapped using this method. High-performance microscope objectives can focus light very tightly to create an optical trap that has a capture radius comparable to the wavelength of light used to create the trap.
These properties allow an optical trap to be used with surgical precision to move one cell when many others are nearby, while minimizing the possibility of carrying forward an extraneous cell or foreign DNA. Contamination by environmental DNA or double sorting is a significant practical concern in single-cell genomics, particularly in de novo applications, where reference sequence to identify contaminating reads is not available. By contrast, other popular methods such as fluorescence-activated flow cytometry and micromanipulation work by subdividing a liquid sample, and require a high dilution of cells in clean buffer to reduce the chance of contamination.
While idealized representations often depict spherical objects being trapped, in practice, cells of essentially any morphology can be moved using an optical trap.
In certain other example embodiments, single cells may be isolated using laser capture microscopy. Laser capture microdissection (LCM) is a cell separation technique that combines microscopy with laser beam technology and allows targeting of specific cells or tissue regions that need to be separated from others (see, e.g., Vandewoestyne et al., Laser capture microdissection: should an ultraviolet or infrared laser be used? Anal Biochem. 2013 Aug. 15; 439(2):88-98. doi: 10.1016/j.ab.2013.04.023). Exemplary systems may be the MMI CellCut laser microdissection (LMD) system (Molecular Machines & Industries GmbH, Eching, Germany; www.molecular-machines.com/home) (see also, Espina, et al. Laser-capture microdissection technology. Expert. Rev. Mol. Diagn. 7, 647-657). In certain embodiments, cells of interest are grown on a slide and live cell imaging is used to record the dividing cells. In certain embodiments, one microscope is used for live cell imaging, allowing the locations of cells to be tracked and recorded. In certain embodiments, a second microscope is used for laser dissection of single cells identified by live cell imaging (e.g., cells tracked and recorded). In certain embodiments, the spatial coordinates of the cells on the slide are preserved when transferring from one microscope to another. In one embodiment, the slides are marked to maintain spatial orientation of the slides and the tracked cells. In certain embodiments, a laser is used to cut out cells from a coverslip (e.g., a coverslip that can be cut by a laser such as UV) and the cell is captured using a surface configured for capturing cells (e.g., a sticky cap of an Eppendorf tube). Capturing may also be performed using catapulting (see, e.g., Horneffer et al., Principles of laser-induced separation and transport of living cells. J Biomed Opt. 2007 September-October; 12(5):054016; and Vogel et al., Principles of laser microdissection and catapulting of histologic specimens and live cells, Methods Cell Biol. 2007; 82:153-205). Live cells captured using the live cell imaging and microdissection can be used to perform lineage sequencing as described herein.
In certain embodiments, the lineage tracing (e.g., live imaging, selecting cells of a lineage, and capturing single cells) is automated. In certain embodiments, a computer is trained to select cells that are part of a lineage with training sets and live cell imaging. The computer can select the dividing cells and capture them for downstream processing. In certain embodiments, the method uses a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently (see, e.g., Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham).
In certain embodiments, an optical tweezer may be used in combination with laser capture microscopy to facilitate capture of single cells by moving single cells to a location for capture.
In certain embodiments, each isolated single cell from each generational lineage is cultured to generate a sub-clonal cell population for each generational lineage. Each sub-clonal cell population may be cultured according to standard cell culture techniques. Imaging may be used to confirm seeding of a single cell in a single culture vessel or well. Sub-colonies may be grown to a density of between 10^3 and 10^9 cells, depending on the subsequent sequencing technique to be employed.
General techniques useful in the practice of this invention in cell culture and media uses are known in the art (e.g., Large Scale Mammalian Cell Culture (Hu et al. 1997. Curr Opin Biotechnol 8: 148); Serum-free Media (K. Kitano. 1991. Biotechnology 17: 73); or Large Scale Mammalian Cell Culture (Curr Opin Biotechnol 2: 375, 1991)). The terms “culturing” or “cell culture” are common in the art and broadly refer to maintenance of cells and potentially expansion (proliferation, propagation) of cells in vitro. Typically, animal cells, such as mammalian cells, such as human cells, are cultured by exposing them to (i.e., contacting them with) a suitable cell culture medium in a vessel or container adequate for the purpose (e.g., a 96-, 24-, or 6-well plate, a T-25, T-75, T-150 or T-225 flask, or a cell factory), at art-known conditions conducive to in vitro cell culture, such as temperature of 37° C., 5% v/v CO2 and >95% humidity.
Methods related to stem cells and differentiating stem cells are known in the art (see, e.g., “Teratocarcinomas and embryonic stem cells: A practical approach” (E. J. Robertson, ed., IRL Press Ltd. 1987); “Guide to Techniques in Mouse Development” (P. M. Wasserman et al. eds., Academic Press 1993); “Embryonic Stem Cells: Methods and Protocols” (Kursad Turksen, ed., Humana Press, Totowa N. J., 2001); “Embryonic Stem Cell Differentiation in Vitro” (M. V. Wiles, Meth. Enzymol. 225: 900, 1993); “Properties and uses of Embryonic Stem Cells: Prospects for Application to Human Biology and Gene Therapy” (P. D. Rathjen et al., 1993)). Differentiation of stem cells is reviewed, e.g., in Robertson. 1997. Meth Cell Biol 75: 173; Roach and McNeish. 2002. Methods Mol Biol 185: 1-16; and Pedersen. 1998. Reprod Fertil Dev 10: 31. For further elaboration of general techniques useful in the practice of this invention, the practitioner can refer to standard textbooks and reviews in cell biology, tissue culture, and embryology (see, e.g., Culture of Human Stem Cells (R. Ian Freshney, Glyn N. Stacey, Jonathan M. Auerbach, 2007); Protocols for Neural Cell Culture (Laurie C. Doering, 2009); Neural Stem Cell Assays (Navjot Kaur, Mohan C. Vemuri, 2015); Working with Stem Cells (Henning Ulrich, Priscilla Davidson Negraes, 2016); and Biomaterials as Stem Cell Niche (Krishnendu Roy, 2010)). For further methods of cell culture solutions and systems, see International Patent publication WO2014159356A1.
The term “medium” as used herein broadly encompasses any cell culture medium conducive to maintenance of cells, preferably conducive to proliferation of cells. Typically, the medium will be a liquid culture medium, which facilitates easy manipulation (e.g., decantation, pipetting, centrifugation, filtration, and such) thereof.
Typically, the medium will comprise a basal medium formulation as known in the art. Many basal media formulations (available, e.g., from the American Type Culture Collection, ATCC; or from Invitrogen, Carlsbad, Calif.) can be used, including but not limited to Eagle's Minimum Essential Medium (MEM), Dulbecco's Modified Eagle's Medium (DMEM), alpha modified Minimum Essential Medium (alpha-MEM), Basal Medium Eagle (BME), Iscove's Modified Dulbecco's Medium (IMDM), BGJb medium, F-12 Nutrient Mixture (Ham), Leibovitz L-15, DMEM/F-12, Essential Modified Eagle's Medium (EMEM), RPMI-1640, Medium 199, Waymouth's MB 752/1 or Williams Medium E, and modifications and/or combinations thereof. Compositions of basal media are generally known in the art and it is within the skill of one in the art to modify or modulate concentrations of media and/or media supplements as necessary for the cells cultured.
Such basal media formulations contain ingredients necessary for mammalian cell development, which are known per se. By means of illustration and not limitation, these ingredients may include inorganic salts (in particular salts containing Na, K, Mg, Ca, Cl, P and possibly Cu, Fe, Se and Zn), physiological buffers (e.g., HEPES, bicarbonate), nucleotides, nucleosides and/or nucleic acid bases, ribose, deoxyribose, amino acids, vitamins, antioxidants (e.g., glutathione) and sources of carbon (e.g., glucose, sodium pyruvate, sodium acetate), etc.
For use in culture, basal media can be supplied with one or more further components. For example, additional supplements can be used to supply the cells with the necessary trace elements and substances for optimal growth and expansion. Furthermore, antioxidant supplements may be added, e.g., β-mercaptoethanol. While many basal media already contain amino acids, some amino acids may be supplemented later, e.g., L-glutamine, which is known to be less stable when in solution. A medium may be further supplied with antibiotic and/or antimycotic compounds, such as, typically, mixtures of penicillin and streptomycin, and/or other compounds, exemplified but not limited to, amphotericin, ampicillin, gentamicin, bleomycin, hygromycin, kanamycin, mitomycin, mycophenolic acid, nalidixic acid, neomycin, nystatin, paromomycin, polymyxin, puromycin, rifampicin, spectinomycin, tetracycline, tylosin, and zeocin.
Lipids and lipid carriers can also be used to supplement cell culture media. Such lipids and carriers can include, but are not limited to cyclodextrin, cholesterol, linoleic acid conjugated to albumin, linoleic acid and oleic acid conjugated to albumin, unconjugated linoleic acid, linoleic-oleic-arachidonic acid conjugated to albumin, oleic acid unconjugated and conjugated to albumin, among others. Albumin can similarly be used in fatty-acid free formulations.
Also contemplated is supplementation of cell culture media with mammalian plasma or sera. Plasma or sera often contain cellular factors and components that facilitate cell viability and expansion. Optionally, plasma or serum may be heat inactivated. Heat inactivation is used in the art mainly to remove the complement. Heat inactivation typically involves incubating the plasma or serum at 56° C. for 30 to 60 min, e.g., 30 min, with steady mixing, after which the plasma or serum is allowed to gradually cool to ambient temperature. A skilled person will be aware of any common modifications and requirements of the above procedure. Optionally, plasma or serum may be sterilised prior to storage or use. Usual means of sterilisation may involve, e.g., filtration through one or more filters with pore size smaller than 1 μm, preferably smaller than 0.5 μm, e.g., smaller than 0.45 μm, 0.40 μm, 0.35 μm, 0.30 μm or 0.25 μm, more preferably 0.2 μm or smaller, e.g., 0.15 μm or smaller, 0.10 μm or smaller. Suitable sera or plasmas for use in media as taught herein may include human serum or plasma, or serum or plasma from non-human animals, preferably non-human mammals, such as, e.g., non-human primates (e.g., lemurs, monkeys, apes), foetal or adult bovine, horse, porcine, lamb, goat, dog, rabbit, mouse or rat serum or plasma, etc., or any combination of such. In certain preferred embodiments, a medium as taught herein may comprise bovine serum or plasma, preferably foetal bovine (calf) serum or plasma, more preferably foetal bovine (calf) serum (FCS or FBS). When culturing human cells, media may preferably comprise human serum or plasma, such as autologous or allogeneic human serum or plasma, preferably human serum, such as autologous or allogeneic human serum, more preferably autologous human serum or plasma, even more preferably autologous human serum.
In certain preferred embodiments, serum or plasma can be substituted in media by serum replacements, such as to provide for serum-free media (i.e., chemically defined media). The provision of serum-free media may be advantageous particularly with a view to administration of the media or fraction(s) thereof to subjects, especially to human subjects (e.g., improved bio-safety). By the term “serum replacement” it is broadly meant any composition that may be used to replace the functions (e.g., cell maintenance and growth supportive function) of animal serum in a cell culture medium. A conventional serum replacement may typically comprise vitamins, albumin, lipids, amino acids, transferrin, antioxidants, insulin and trace elements. Many commercialized serum replacement additives, such as KnockOut Serum Replacement (KOSR), N2, B27, Insulin-Transferrin-Selenium Supplement (ITS), and G5 are well known and are readily available to those skilled in the art.
Plasma or serum or serum replacement may be comprised in media as taught herein at a proportion (volume of plasma or serum or serum replacement/volume of medium) between about 0.5% v/v and about 40.0% v/v, preferably between about 5.0% v/v and about 20.0% v/v, e.g., between about 5.0% v/v and about 15.0% v/v, more preferably between about 8.0% v/v and about 12.0% v/v, e.g., about 10.0% v/v.
After the target cell density is achieved, genomic DNA may be extracted from each sub-clonal population using standard genomic DNA extraction techniques. Sequencing libraries may then be prepared using standard techniques known in the art. An example sequencing library construction protocol is further described in the working examples below. In preferred embodiments, the invention does not require an amplification step.
In preferred embodiments, the present invention uses next generation sequencing in order to detect mutations genome wide. Exemplary next generation sequencing technologies include, for example, Illumina sequencing, Ion Torrent sequencing, 454 sequencing, SOLiD sequencing, and nanopore sequencing amongst others. Methods for constructing sequencing libraries are known in the art (see, e.g., Head et al., Library construction for next-generation sequencing: Overviews and challenges. Biotechniques. 2014; 56(2): 61-77).
In an embodiment, sequencing is performed on colonies of expanded single cells. Not being bound by a theory, the expanded cells are free of the artifacts introduced by amplification, and the expanded cells provide genetic material that can later be used to validate the results. Whole-genome amplification (WGA) methods are limited by low accuracy of copy-number variation (CNV) detection and low amplification fidelity (Chen et al. 2017). Methods of expanding colonies in tissue culture are well known in the art. In certain example embodiments, sequencing is done using short-read shotgun sequencing. An exemplary short-read shotgun sequencing technique is described in further detail in the working examples below.
In certain embodiments, when colony expansion is difficult, single cells are sequenced after whole genome amplification of single cells (e.g., multiple displacement amplification (MDA) whole-genome amplification (WGA) chemistry). Since the development of MDA, many other WGA methods have been developed and successfully applied to single cells (Blainey 2013, Cal & Walsh et al., 2012). MDA works by the extension of 6-mer 3′-protected random primers on the DNA template (Dean, et al., 2001, Genome Res 11: 1095-1099). In MDA, a polymerase with strong strand displacement activity such as phi29 DNA polymerase or Bst DNA polymerase creates and displaces overlapping synthesis products from the template as single-stranded DNA under isothermal conditions (Dean, et al., 2001, Genome Res 11: 1095-1099; Zhang, et al., 2001, Mol Diagn 6: 141-150; Aviel-Ronen, et al., 2006, BMC Genomics 7). The displaced single-stranded DNA is a substrate for further priming and synthesis (Dean, et al., 2001, Genome Res 11: 1095-1099; Zhang, et al., 2001, Mol Diagn 6: 141-150). Phi29 DNA polymerase is typically specified for MDA due to its high accuracy owing to 3′-5′ exonuclease-mediated proofreading and exceptionally strong processivity in strand displacement synthesis, which can exceed 10,000 nt (Mellado, et al., 1980, Virology 104: 84-96; Blanco & Salas, 1984, Proceedings of the National Academy of Sciences 81: 5325; Blanco, et al., 1989, Journal of Biological Chemistry 264: 8935-8940; Morin, et al., 2012, Proceedings of the National Academy of Sciences 109: 8115-8120). This property of the polymerase evens out amplification on shorter genomic distances to produce high molecular weight products with more uniform amplification across the template than purely PCR-based methods, which typically produce products shorter than 1000 nt and exhibit greater amplification bias (Dean, et al., 2002, Proceedings of the National Academy of Sciences 99: 5261).
The terms “depth” or “coverage” as used herein refer to the number of times a nucleotide is read during the sequencing process. Depth can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as N×L/G. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2× redundancy (8×500/2,000). This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (sometimes also called coverage). A high coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly. The subject of DNA sequencing theory addresses the relationships of such quantities. Even though the sequencing accuracy for each individual nucleotide is very high, the very large number of nucleotides in the genome means that if an individual genome is only sequenced once, there will be a significant number of sequencing errors. Furthermore, rare single-nucleotide polymorphisms (SNPs) are common. Hence, to distinguish between sequencing errors and true SNPs, it is necessary to increase the sequencing accuracy even further by sequencing individual genomes a large number of times.
The term “deep sequencing” as used herein indicates that the total number of reads is many times larger than the length of the sequence under study. The term “deep” as used herein refers to a wide range of depths greater than 1× up to 100×.
The terms “low-pass sequencing” or “shallow sequencing” as used herein refer to a wide range of depths greater than or equal to 0.1× up to 1×.
The term “ultra-deep” as used herein refers to higher coverage (>100-fold), which allows for detection of sequence variants in mixed populations.
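The depth formula N×L/G and the depth ranges defined above may be illustrated as follows. This is a minimal sketch; the function names are hypothetical, and the label returned for depths below 0.1× is an assumption, since no term is defined herein for that range.

```python
def sequencing_depth(num_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Average depth = N x L / G, using the symbols defined in the text."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return num_reads * mean_read_length / genome_length


def depth_category(depth: float) -> str:
    """Map a depth to the terms defined in the text.

    low-pass: >= 0.1x up to 1x; deep: > 1x up to 100x; ultra-deep: > 100x.
    """
    if depth > 100:
        return "ultra-deep"
    if depth > 1:
        return "deep"
    if depth >= 0.1:
        return "low-pass"
    return "below low-pass range"  # assumption: not named in the text


if __name__ == "__main__":
    # Worked example from the text: 2,000 bp genome, 8 reads of 500 nt.
    depth = sequencing_depth(num_reads=8, mean_read_length=500, genome_length=2000)
    print(depth, depth_category(depth))
```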
Sequencing data may then be used to identify somatic mutations (single nucleotide variants) in the clonal population. In certain example embodiments, this may comprise identifying single nucleotide variants that occurred at the same locus in two or more, but less than all, sub-clones. Such groups of matching single nucleotide variants that coincide at the same genomic locus in multiple sub-clones may putatively represent de novo somatic mutations that occurred during expansion of the clonal population. These single nucleotide variants may be referred to as “branch variants.” By contrast, single nucleotide variants shared by all sub-clones were most likely present in the founding parent cell. Single nucleotide variants appearing in only one sub-clone likely represent variants that either appeared in the last round of cell division, appeared early in sub-clonal culture (or later if strongly selected), or represent technical errors in sequencing or variant calling. In certain example embodiments, determining true somatic mutations along a single cell lineage may comprise defining a set of branch variants comprising variants present in at least two, but less than all, sub-clones. Raw single nucleotide variants may be identified using methods such as MuTect (Cibulskis et al. 2013) or other comparable methods.
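The three classes of single nucleotide variants described above (shared by all sub-clones, shared by two or more but not all, and present in only one) may be separated by counting the sub-clones that carry each variant. The following is a minimal sketch; the input layout, the labels "founder", "branch" and "private", and the function name are illustrative assumptions and are not part of any variant caller's output format.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference base, alternate base)
Variant = Tuple[str, int, str, str]


def classify_variants(calls: Dict[str, Set[Variant]]) -> Dict[Variant, str]:
    """Label each variant by how many sub-clones carry it.

    calls maps a sub-clone name to the set of raw SNV calls for that sub-clone.
    """
    n_subclones = len(calls)
    carriers: Dict[Variant, Set[str]] = {}
    for subclone, variants in calls.items():
        for variant in variants:
            carriers.setdefault(variant, set()).add(subclone)

    labels: Dict[Variant, str] = {}
    for variant, who in carriers.items():
        if len(who) == n_subclones:
            labels[variant] = "founder"   # present in every sub-clone
        elif len(who) >= 2:
            labels[variant] = "branch"    # two or more, but not all
        else:
            labels[variant] = "private"   # one sub-clone only
    return labels


if __name__ == "__main__":
    v1, v2, v3 = ("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr3", 300, "T", "C")
    calls = {"A": {v1, v2, v3}, "B": {v1, v2}, "C": {v1}}
    for variant, label in sorted(classify_variants(calls).items()):
        print(variant, label)
```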
Using the branch variant set, mutational events and the flow of mutations through the lineage segments may be quantitatively reconstructed. Coincident single nucleotide variants, that is, identical single nucleotide variants appearing at matching genomic loci in two or more but not all sub-clones, are identified. In certain example embodiments, coincident single nucleotide variants are selected based on the quality of the coincident single nucleotide variant classification, which is assessed by calculating the probability for each sample to belong to its consensus-assigned group (“reference” or “alternative”) considering the base content in the read alignment at the locus in question for each sample. For example, the probability may be calculated using a binomial distribution test for each of the samples, under the assumption that each sample belongs either to its assigned reference or variant group (H0) or to another group (H1). The calculated probability can provide a heuristic score to rank the quality of coincident single nucleotide variant groups. For example:
H0: P=P1, the probability weight that the sample belongs to group 1, the assigned reference or variant group.
H1: P=P2, the probability weight that the sample belongs to group 2, a different group than assigned.
For each classified single nucleotide variant, the probability for every sample may be calculated, and the minimal probability across samples chosen to represent the quality of the coincident classification for that single nucleotide variant. Single nucleotide variants for which the quality of the coincident classification is less than 0.99 may be removed.
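One way the minimum-probability score described above might be computed is sketched below. This is not the scoring formula of the disclosure; it is an illustration under stated assumptions: each sample's probability weight is taken as the binomial likelihood of its observed alternate-read count under the assigned group, normalized by the sum of the likelihoods under both groups; the expected alternate-read fractions (`p_ref` = 0.01 for a reference sample, modeling sequencing error, and `p_alt` = 0.5 for a heterozygous variant sample) and all function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k alternate reads out of n under success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def membership_probability(alt_reads: int, total_reads: int, assigned: str,
                           p_ref: float = 0.01, p_alt: float = 0.5) -> float:
    """Probability weight that a sample belongs to its assigned group (H0)
    rather than the other group (H1). p_ref and p_alt are assumed values."""
    if assigned not in ("reference", "alternative"):
        raise ValueError("assigned must be 'reference' or 'alternative'")
    like_ref = binom_pmf(alt_reads, total_reads, p_ref)
    like_alt = binom_pmf(alt_reads, total_reads, p_alt)
    total = like_ref + like_alt
    if total == 0:
        return 0.0
    return (like_alt if assigned == "alternative" else like_ref) / total


def coincident_quality(samples: Iterable[Tuple[int, int, str]]) -> float:
    """Minimum membership probability over all samples at one locus.

    Each sample is (alternate read count, total read count, assigned group).
    """
    return min(membership_probability(a, n, g) for a, n, g in samples)


if __name__ == "__main__":
    locus = [(15, 30, "alternative"), (14, 28, "alternative"),
             (0, 30, "reference"), (0, 25, "reference")]
    score = coincident_quality(locus)
    print("keep" if score >= 0.99 else "remove", score)
```

In this sketch, a single sample whose read counts fit the other group better than its assigned group pulls the score for the whole locus below the 0.99 threshold.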
In certain example embodiments, a lineage structure of the clonal population may be determined that identifies and maps the single nucleotide variants within the clonal population. In certain example embodiments, the lineage structure may be determined by analysis of time lapse imaging of cells in the microfluidic trap array device. Single nucleotide variants may be grouped as coincident single nucleotide variants and branch variants called by evaluating the coincident single nucleotide variant quality score for the highest scoring group of sub clones over all the groups of sub clones consistent with the lineage structure determined by time lapse imaging. The coincident single nucleotide variant quality score may be determined as follows:
In another example embodiment, the sequencing data may be used to estimate the most likely lineage structure in each lineage experiment. For example, coincident single nucleotide variants may be identified by analysis of the raw SNV calls from the sub-clones generated by MuTect (see methods section on Lineage sequencing analysis pipeline). Sub-clones sharing the same SNVs may be grouped without restrictions by hierarchical clustering and for each coincident SNV, the quality score may be calculated as:
The frequency of coincident SNVs for each group of sub clones may then be tabulated and ranked to generate plots such as those shown in
By grouping the samples in each branch variant SNV and comparing to the base call at the same locus in a sister lineage, the reference allele and alternative allele may be determined, and a variant allele fraction calculated without depending on an external reference allele. However, for the first cell division, no such sister lineage exists within the dataset, and an external reference may be used. Suitable example methods for selecting a reference set are further discussed in the working examples below.
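The tabulation of coincident SNVs by sub-clone group, and the check of each group against a lineage structure such as one determined by time lapse imaging, may be sketched as follows. This is an illustration only: the lineage is represented as a hypothetical nested tuple of sub-clone names, a group is treated as consistent with the lineage when it equals the set of sub-clones descending from one internal division other than the first, and the function names are assumptions.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple, Union

# Hypothetical lineage encoding: a leaf is a sub-clone name, a division is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets below every division, in post-order (root division last)."""
    found: List[FrozenSet[str]] = []

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        left, right = node
        members = leaves(left) | leaves(right)
        found.append(members)
        return members

    leaves(tree)
    return found


def tabulate_groups(snv_groups: Iterable[FrozenSet[str]],
                    tree: Tree) -> List[Tuple[FrozenSet[str], int, bool]]:
    """Count coincident SNVs per sub-clone group, ranked by frequency, and
    flag whether each group is consistent with the lineage structure."""
    # Exclude the root clade: variants in all sub-clones are not branch variants.
    valid = set(clades(tree)[:-1])
    counts = Counter(snv_groups)
    return [(group, n, group in valid) for group, n in counts.most_common()]


if __name__ == "__main__":
    lineage: Tree = (("A", "B"), ("C", "D"))
    groups = ([frozenset("AB")] * 3) + ([frozenset("CD")] * 2) + [frozenset("AC")]
    for group, n, ok in tabulate_groups(groups, lineage):
        print(sorted(group), n, "consistent" if ok else "inconsistent")
```

Groups that are frequent but inconsistent with the imaged lineage may indicate technical error or an incorrect lineage assignment and can be inspected separately.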
In certain example embodiments, the method may further comprise determining a mutation spectrum or signature. Mutational spectra may be defined using standard approaches and reported in the standard format (Alexandrov et al. 2013b). Each SNV may be classified into one of six subtypes: C:G>A:T, C:G>G:C, C:G>T:A, T:A>A:T, T:A>C:G, and T:A>G:C, and further refined by including the sequence context of each mutated base (one base 3′ and one base 5′). For example, a C:G>T:A mutation can be characterized as TpCpG>TpTpG (mutated base underlined and presented as the pyrimidine partner of the mutated base pair), generating 96 possible mutation types (6 types of substitution × 4 types of 5′ base × 4 types of 3′ base). The mutational spectra may be compared with patterns extracted from clinical and cell line (ATCC) samples. The similarity between two mutational patterns A and B, each defined as a non-negative vector over the 96 mutation types, may be computed by cosine similarity: cos(A, B) = (Σ_i A_i·B_i) / (√(Σ_i A_i²) · √(Σ_i B_i²)), where the sums run over the 96 mutation types.
The cosine similarity value, which ranges from 0 to 1 for non-negative vectors, may be used to determine how similar two mutational spectra are to one another. 95% confidence intervals may be calculated by bootstrapping the SNV list 10^4 times.
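The 96-type classification, the cosine similarity, and the bootstrap confidence interval described above may be sketched as follows. This is a minimal illustration, not the analysis pipeline of the working examples; the channel label format (e.g., "T[C>T]G"), the percentile method for the interval, and all function names are assumptions.

```python
import math
import random
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
# 6 substitutions x 4 five-prime bases x 4 three-prime bases = 96 channels
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def channel(context: str, alt: str) -> str:
    """Channel for an SNV given its trinucleotide context (mutated base in
    the middle) and alternate base, reported from the pyrimidine strand."""
    if context[1] in "AG":  # purine reference base: use the opposite strand
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def spectrum(snvs: Sequence[Tuple[str, str]]) -> List[int]:
    """Counts over the 96 channels for a list of (context, alt) pairs."""
    counts = Counter(channel(context, alt) for context, alt in snvs)
    return [counts[ch] for ch in CHANNELS]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bootstrap_ci(snvs: Sequence[Tuple[str, str]], reference: Sequence[float],
                 n_boot: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval of the similarity to a reference pattern,
    obtained by resampling the SNV list with replacement."""
    rng = random.Random(seed)
    sims = sorted(
        cosine_similarity(spectrum(rng.choices(snvs, k=len(snvs))), reference)
        for _ in range(n_boot)
    )
    return sims[int(0.025 * n_boot)], sims[int(0.975 * n_boot) - 1]


if __name__ == "__main__":
    snvs = [("TCG", "T")] * 6 + [("CGA", "A")] * 2 + [("ATA", "C")] * 2
    reference = spectrum([("TCG", "T")] * 5 + [("ATA", "C")] * 5)
    print(cosine_similarity(spectrum(snvs), reference))
    print(bootstrap_ci(snvs, reference, n_boot=1000))
```

In this sketch, the example from the text, a C:G>T:A mutation in a TpCpG context, maps to the channel "T[C>T]G" whether it is reported from the C strand or the G strand.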
Accordingly, the methods disclosed herein may also be used to determine characteristic pathways or phenotypes of particular cell lineages. For example, the methods disclosed herein may be used on single cells derived from a patient to further diagnose a particular disease sub-type or phenotype, which in turn may be used to guide appropriate therapeutic regimens. In certain embodiments, patient cells are analyzed for genotoxicity to a particular drug or therapeutic regimen.
In certain embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture models are known in the art. Examples of cell lines include, but are not limited to, HT115, RPE1, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr−/−, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)).
In certain embodiments, the cell is a microbe. In certain example embodiments, the microbe is a bacterium. Examples of bacteria that can be detected in accordance with the disclosed methods include without limitation any one or more of (or any combination of) Acinetobacter baumanii, Actinobacillus sp., Actinomycetes, Actinomyces sp. (such as Actinomyces israelii and Actinomyces naeslundii), Aeromonas sp. (such as Aeromonas hydrophila, Aeromonas veronii biovar sobria (Aeromonas sobria), and Aeromonas caviae), Anaplasma phagocytophilum, Anaplasma marginal, e Alcaligenes xylosoxidans, Acinetobacter baumanii, Actinobacillus actinomycetemcomitans, Bacillus sp. (such as Bacillus anthracis, Bacillus cereus, Bacillus subtilis, Bacillus thuringiensis, and Bacillus stearothermophilus), Bacteroides sp. (such as Bacteroides fragilis), Bartonella sp. (such as Bartonella bacilliformis and Bartonella henselae, Bifidobacterium sp., Bordetella sp. (such as Bordetella pertussis, Bordetella parapertussis, and Bordetella bronchiseptica), Borrelia sp. (such as Borrelia recurrentis, and Borrelia burgdorferi), Brucella sp. (such as Brucella abortus, Brucella canis, Brucella melintensis and Brucella suis), Burkholderia sp. (such as Burkholderia pseudomallei and Burkholderia cepacia), Campylobacter sp. (such as Campylobacter jejuni, Campylobacter coli, Campylobacter lari and Campylobacter fetus), Capnocytophaga sp., Cardiobacterium hominis, Chlamydia trachomatis, Chlamydophila pneumoniae, Chlamydophila psittaci, Citrobacter sp. Coxiella burnetii, Corynebacterium sp. (such as, Corynebacterium diphtheriae, Corynebacterium jeikeum and Corynebacterium), Clostridium sp. (such as Clostridium perfringens, Clostridium difficile, Clostridium botulinum and Clostridium tetani), Eikenella corrodens, Enterobacter sp. (such as Enterobacter aerogenes, Enterobacter agglomerans, Enterobacter cloacae and Escherichia coli, including opportunistic Escherichia coli, such as enterotoxigenic E. 
coli, enteroinvasive E. coli, enteropathogenic E. coli, enterohemorrhagic E. coli, enteroaggregative E. coli and uropathogenic E. coli), Enterococcus sp. (such as Enterococcus faecalis and Enterococcus faecium), Ehrlichia sp. (such as Ehrlichia chaffeensis and Ehrlichia canis), Erysipelothrix rhusiopathiae, Eubacterium sp., Francisella tularensis, Fusobacterium nucleatum, Gardnerella vaginalis, Gemella morbillorum, Haemophilus sp. (such as Haemophilus influenzae, Haemophilus ducreyi, Haemophilus aegyptius, Haemophilus parainfluenzae, Haemophilus haemolyticus and Haemophilus parahaemolyticus), Helicobacter sp. (such as Helicobacter pylori, Helicobacter cinaedi and Helicobacter fennelliae), Kingella kingae, Klebsiella sp. (such as Klebsiella pneumoniae, Klebsiella granulomatis and Klebsiella oxytoca), Lactobacillus sp., Listeria monocytogenes, Leptospira interrogans, Legionella pneumophila, Peptostreptococcus sp., Mannheimia haemolytica, Moraxella catarrhalis, Morganella sp., Mobiluncus sp., Micrococcus sp., Mycobacterium sp. (such as Mycobacterium leprae, Mycobacterium tuberculosis, Mycobacterium paratuberculosis, Mycobacterium intracellulare, Mycobacterium avium, Mycobacterium bovis, and Mycobacterium marinum), Mycoplasma sp. (such as Mycoplasma pneumoniae, Mycoplasma hominis, and Mycoplasma genitalium), Nocardia sp. (such as Nocardia asteroides, Nocardia cyriacigeorgica and Nocardia brasiliensis), Neisseria sp. (such as Neisseria gonorrhoeae and Neisseria meningitidis), Pasteurella multocida, Plesiomonas shigelloides, Prevotella sp., Porphyromonas sp., Prevotella melaninogenica, Proteus sp. (such as Proteus vulgaris and Proteus mirabilis), Providencia sp. (such as Providencia alcalifaciens, Providencia rettgeri and Providencia stuartii), Pseudomonas aeruginosa, Propionibacterium acnes, Rhodococcus equi, Rickettsia sp. 
(such as Rickettsia rickettsii, Rickettsia akari and Rickettsia prowazekii, Orientia tsutsugamushi (formerly: Rickettsia tsutsugamushi) and Rickettsia typhi), Rhodococcus sp., Serratia marcescens, Stenotrophomonas maltophilia, Salmonella sp. (such as Salmonella enterica, Salmonella typhi, Salmonella paratyphi, Salmonella enteritidis, Salmonella choleraesuis and Salmonella typhimurium), Serratia sp. (such as Serratia marcescens and Serratia liquefaciens), Shigella sp. (such as Shigella dysenteriae, Shigella flexneri, Shigella boydii and Shigella sonnei), Staphylococcus sp. (such as Staphylococcus aureus, Staphylococcus epidermidis, Staphylococcus haemolyticus, Staphylococcus saprophyticus), Streptococcus sp. (such as Streptococcus pneumoniae (for example chloramphenicol-resistant serotype 4 Streptococcus pneumoniae, spectinomycin-resistant serotype 6B Streptococcus pneumoniae, streptomycin-resistant serotype 9V Streptococcus pneumoniae, erythromycin-resistant serotype 14 Streptococcus pneumoniae, optochin-resistant serotype 14 Streptococcus pneumoniae, rifampicin-resistant serotype 18C Streptococcus pneumoniae, tetracycline-resistant serotype 19F Streptococcus pneumoniae, penicillin-resistant serotype 19F Streptococcus pneumoniae, or trimethoprim-resistant serotype 23F Streptococcus pneumoniae), Streptococcus agalactiae, Streptococcus mutans, Streptococcus pyogenes, Group A streptococci, Streptococcus pyogenes, Group B streptococci, Streptococcus agalactiae, Group C streptococci, Streptococcus anginosus, Streptococcus equisimilis, Group D 
streptococci, Streptococcus bovis, Group F streptococci, Streptococcus anginosus, and Group G streptococci), Spirillum minus, Streptobacillus moniliformis, Treponema sp. (such as Treponema carateum, Treponema pertenue, Treponema pallidum and Treponema endemicum), Tropheryma whippelii, Ureaplasma urealyticum, Veillonella sp., Vibrio sp. (such as Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Vibrio alginolyticus, Vibrio mimicus, Vibrio hollisae, Vibrio fluvialis, Vibrio metchnikovii, Vibrio damsela and Vibrio furnissii), Yersinia sp. (such as Yersinia enterocolitica, Yersinia pestis, and Yersinia pseudotuberculosis) and Xanthomonas maltophilia, among others.
In certain example embodiments, the microbe is a fungus. Examples of fungi that can be detected in accordance with the disclosed methods include without limitation any one or more of (or any combination of) Aspergillus, Blastomyces, Candidiasis, Coccidioidomycosis, Cryptococcus neoformans, Cryptococcus gattii, Histoplasma, Mucormycosis, Pneumocystis, Sporothrix, fungal eye infections, ringworm, Exserohilum, and Cladosporium.
In certain example embodiments, the fungus is a yeast. Examples of yeast that can be detected in accordance with the disclosed methods include without limitation one or more of (or any combination of) an Aspergillus species, a Geotrichum species, a Saccharomyces species, a Hansenula species, a Candida species, a Kluyveromyces species, a Debaryomyces species, a Pichia species, or a combination thereof. In certain example embodiments, the fungus is a mold. Example molds include, but are not limited to, a Penicillium species, a Cladosporium species, a Byssochlamys species, or a combination thereof.
In certain example embodiments, the microbe is a protozoan. Examples of protozoa that can be detected in accordance with the disclosed methods and devices include without limitation any one or more of (or any combination of) Euglenozoa, Heterolobosea, Diplomonadida, Amoebozoa, Blastocystis, and Apicomplexa. Example Euglenozoa include, but are not limited to, Trypanosoma cruzi (Chagas disease), T. brucei gambiense, T. brucei rhodesiense, Leishmania braziliensis, L. infantum, L. mexicana, L. major, L. tropica, and L. donovani. Example Heterolobosea include, but are not limited to, Naegleria fowleri. Example Diplomonadida include, but are not limited to, Giardia intestinalis (G. lamblia, G. duodenalis). Example Amoebozoa include, but are not limited to, Acanthamoeba castellanii, Balamuthia mandrillaris, and Entamoeba histolytica. Example Blastocystis include, but are not limited to, Blastocystis hominis. Example Apicomplexa include, but are not limited to, Babesia microti, Cryptosporidium parvum, Cyclospora cayetanensis, Plasmodium falciparum, P. vivax, P. ovale, P. malariae, and Toxoplasma gondii.
In certain embodiments, cells are obtained from a subject in need thereof (e.g., a subject suffering from cancer or an infection). Single cells may be treated with potential drug candidates (therapeutic agents) for the subject. Mutation rates may be determined for the different drugs, and the drugs that cause the fewest mutations in cells of the subject may be selected for treatment. In certain embodiments, the cells assayed are stem cells or induced pluripotent stem (iPS) cells. Methods of obtaining cells from a subject are well known in the art. In certain embodiments, the therapeutically effective amount is decreased if the drug causes a significant number of mutations. In certain embodiments, the therapeutically effective amount is increased if the drug causes a significant number of mutations.
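By way of non-limiting illustration, the selection of a drug candidate by mutation rate may be sketched as follows. The class name, field names, drug labels, and counts are hypothetical placeholders introduced only for this sketch and are not data from the present disclosure; the rate is assumed here to be expressed as mutations per observed cell division.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugAssay:
    """One hypothetical assay result for a drug candidate."""
    drug: str        # placeholder drug label
    mutations: int   # de novo mutations counted in treated lineages
    divisions: int   # cell divisions observed under treatment

    @property
    def rate(self) -> float:
        """Mutations per observed cell division."""
        if self.divisions <= 0:
            raise ValueError("divisions must be positive")
        return self.mutations / self.divisions


def rank_by_mutation_rate(assays):
    """Return the assays ordered from least to most mutagenic."""
    return sorted(assays, key=lambda a: a.rate)


# Illustrative values only (not measured data).
assays = [
    DrugAssay("drug_A", mutations=120, divisions=30),
    DrugAssay("drug_B", mutations=45, divisions=30),
    DrugAssay("drug_C", mutations=90, divisions=15),
]
ranked = rank_by_mutation_rate(assays)
print([a.drug for a in ranked])
```

In this sketch the first entry of `ranked` corresponds to the candidate that caused the fewest mutations per division in the assayed cells.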
The terms “therapeutic agent”, “therapeutic capable agent” or “treatment agent” are used interchangeably and refer to a molecule or compound that confers some beneficial effect upon administration to a subject. The beneficial effect includes enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder or condition; and generally counteracting a disease, symptom, disorder or pathological condition.
As used herein, “treatment” or “treating,” or “palliating” or “ameliorating” are used interchangeably. These terms refer to an approach for obtaining beneficial or desired results including but not limited to a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
The term “effective amount” or “therapeutically effective amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results. The therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art. The term also applies to a dose that will provide an image for detection by any one of the imaging methods described herein. The specific dose may vary depending on one or more of: the particular agent chosen, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, the tissue to be imaged, and the physical delivery system in which it is carried.
In certain embodiments, the subject in need thereof has cancer. Examples of cancer relevant for the present invention include but are not limited to carcinoma, lymphoma, blastoma, sarcoma, and leukemia or lymphoid malignancies. More particular examples of such cancers include without limitation: squamous cell cancer (e.g., epithelial squamous cell cancer), lung cancer including small-cell lung cancer, non-small cell lung cancer, adenocarcinoma of the lung, squamous carcinoma of the lung and large cell carcinoma of the lung, cancer of the peritoneum, hepatocellular cancer, gastric or stomach cancer including gastrointestinal cancer, pancreatic cancer, glioma, glioblastoma, cervical cancer, ovarian cancer, liver cancer, bladder cancer, hepatoma, breast cancer, colon cancer, rectal cancer, colorectal cancer, endometrial cancer or uterine carcinoma, salivary gland carcinoma, kidney or renal cancer, prostate cancer, vulval cancer, thyroid cancer, hepatic carcinoma, anal carcinoma, penile carcinoma, as well as CNS cancer, melanoma, head and neck cancer, bone cancer, bone marrow cancer, duodenum cancer, oesophageal cancer, thyroid cancer, or hematological cancer.
Other non-limiting examples of cancers or malignancies include, but are not limited to: Acute Childhood Lymphoblastic Leukemia, Acute Lymphoblastic Leukemia, Acute Lymphocytic Leukemia, Acute Myeloid Leukemia, Adrenocortical Carcinoma, Adult (Primary) Hepatocellular Cancer, Adult (Primary) Liver Cancer, Adult Acute Lymphocytic Leukemia, Adult Acute Myeloid Leukemia, Adult Hodgkin's Disease, Adult Hodgkin's Lymphoma, Adult Lymphocytic Leukemia, Adult Non-Hodgkin's Lymphoma, Adult Primary Liver Cancer, Adult Soft Tissue Sarcoma, AIDS-Related Lymphoma, AIDS-Related Malignancies, Anal Cancer, Astrocytoma, Bile Duct Cancer, Bladder Cancer, Bone Cancer, Brain Stem Glioma, Brain Tumours, Breast Cancer, Cancer of the Renal Pelvis and Urethra, Central Nervous System (Primary) Lymphoma, Central Nervous System Lymphoma, Cerebellar Astrocytoma, Cerebral Astrocytoma, Cervical Cancer, Childhood (Primary) Hepatocellular Cancer, Childhood (Primary) Liver Cancer, Childhood Acute Lymphoblastic Leukemia, Childhood Acute Myeloid Leukemia, Childhood Brain Stem Glioma, Glioblastoma, Childhood Cerebellar Astrocytoma, Childhood Cerebral Astrocytoma, Childhood Extracranial Germ Cell Tumours, Childhood Hodgkin's Disease, Childhood Hodgkin's Lymphoma, Childhood Hypothalamic and Visual Pathway Glioma, Childhood Lymphoblastic Leukemia, Childhood Medulloblastoma, Childhood Non-Hodgkin's Lymphoma, Childhood Pineal and Supratentorial Primitive Neuroectodermal Tumours, Childhood Primary Liver Cancer, Childhood Rhabdomyosarcoma, Childhood Soft Tissue Sarcoma, Childhood Visual Pathway and Hypothalamic Glioma, Chronic Lymphocytic Leukemia, Chronic Myelogenous Leukemia, Colon Cancer, Cutaneous T-Cell Lymphoma, Endocrine Pancreas Islet Cell Carcinoma, Endometrial Cancer, Ependymoma, Epithelial Cancer, Esophageal Cancer, Ewing's Sarcoma and Related Tumours, Exocrine Pancreatic Cancer, Extracranial Germ Cell Tumour, Extragonadal Germ Cell Tumour, Extrahepatic Bile Duct Cancer, Eye Cancer, Female Breast 
Cancer, Gaucher's Disease, Gallbladder Cancer, Gastric Cancer, Gastrointestinal Carcinoid Tumour, Gastrointestinal Tumours, Germ Cell Tumours, Gestational Trophoblastic Tumour, Hairy Cell Leukemia, Head and Neck Cancer, Hepatocellular Cancer, Hodgkin's Disease, Hodgkin's Lymphoma, Hypergammaglobulinemia, Hypopharyngeal Cancer, Intestinal Cancers, Intraocular Melanoma, Islet Cell Carcinoma, Islet Cell Pancreatic Cancer, Kaposi's Sarcoma, Kidney Cancer, Laryngeal Cancer, Lip and Oral Cavity Cancer, Liver Cancer, Lung Cancer, Lymphoproliferative Disorders, Macroglobulinemia, Male Breast Cancer, Malignant Mesothelioma, Malignant Thymoma, Medulloblastoma, Melanoma, Mesothelioma, Metastatic Occult Primary Squamous Neck Cancer, Metastatic Primary Squamous Neck Cancer, Metastatic Squamous Neck Cancer, Multiple Myeloma, Multiple Myeloma/Plasma Cell Neoplasm, Myelodysplastic Syndrome, Myelogenous Leukemia, Myeloid Leukemia, Myeloproliferative Disorders, Nasal Cavity and Paranasal Sinus Cancer, Nasopharyngeal Cancer, Neuroblastoma, Non-Hodgkin's Lymphoma During Pregnancy, Nonmelanoma Skin Cancer, Non-Small Cell Lung Cancer, Occult Primary Metastatic Squamous Neck Cancer, Oropharyngeal Cancer, Osteo-/Malignant Fibrous Sarcoma, Osteosarcoma/Malignant Fibrous Histiocytoma, Osteosarcoma/Malignant Fibrous Histiocytoma of Bone, Ovarian Epithelial Cancer, Ovarian Germ Cell Tumour, Ovarian Low Malignant Potential Tumour, Pancreatic Cancer, Paraproteinemias, Purpura, Parathyroid Cancer,Penile Cancer, Pheochromocytoma, Pituitary Tumour, Plasma Cell Neoplasm/Multiple Myeloma, Primary Central Nervous System Lymphoma, Primary Liver Cancer, Prostate Cancer, Rectal Cancer, Renal Cell Cancer, Renal Pelvis and Urethra Cancer, Retinoblastoma, Rhabdomyosarcoma, Salivary Gland Cancer, Sarcoidosis Sarcomas, Sezary Syndrome, Skin Cancer, Small Cell Lung Cancer, Small Intestine Cancer, Soft Tissue Sarcoma, Squamous Neck Cancer, Stomach Cancer, Supratentorial Primitive Neuroectodermal and Pineal 
Tumours, T-Cell Lymphoma, Testicular Cancer, Thymoma, Thyroid Cancer, Transitional Cell Cancer of the Renal Pelvis and Urethra, Transitional Renal Pelvis and Urethra Cancer, Trophoblastic Tumours, Urethra and Renal Pelvis Cell Cancer, Urethral Cancer, Uterine Cancer, Uterine Sarcoma, Vaginal Cancer, Visual Pathway and Hypothalamic Glioma, Vulvar Cancer, Waldenstrom's Macroglobulinemia, or Wilms' Tumour.
The present invention is applicable to measuring the mutation rate and detecting mutation signatures in response to an external stimulus (i.e., a perturbation). Perturbations may include exposure to different physical parameters; exposure to different types, combinations, and/or concentrations of chemical and biological agents; and different genetic perturbations, including engineered genetic perturbations. The perturbation may be a drug. In preferred embodiments, the drugs are cancer drugs or antimicrobial drugs (e.g., antibiotics).
The therapeutic agent is, for example, a chemotherapeutic or biotherapeutic agent, radiation, or immunotherapy. Any suitable therapeutic treatment for a particular cancer may be administered. Examples of chemotherapeutic and biotherapeutic agents include, but are not limited to, an angiogenesis inhibitor, such as angiostatin K1-3, DL-α-Difluoromethyl-ornithine, endostatin, fumagillin, genistein, minocycline, staurosporine, and thalidomide; a DNA intercalator/cross-linker, such as Bleomycin, Carboplatin, Carmustine, Chlorambucil, Cyclophosphamide, cis-Diammineplatinum(II) dichloride (Cisplatin), Melphalan, Mitoxantrone, and Oxaliplatin; a DNA synthesis inhibitor, such as (±)-Amethopterin (Methotrexate), 3-Amino-1,2,4-benzotriazine 1,4-dioxide, Aminopterin, Cytosine β-D-arabinofuranoside, 5-Fluoro-5′-deoxyuridine, 5-Fluorouracil, Ganciclovir, Hydroxyurea, and Mitomycin C; a DNA-RNA transcription regulator, such as Actinomycin D, Daunorubicin, Doxorubicin, Homoharringtonine, and Idarubicin; an enzyme inhibitor, such as S(+)-Camptothecin, Curcumin, (−)-Deguelin, 5,6-Dichlorobenzimidazole 1-β-D-ribofuranoside, Etoposide, Formestane, Fostriecin, Hispidin, 2-Imino-1-imidazolidineacetic acid (Cyclocreatine), Mevinolin, Trichostatin A, Tyrphostin AG 34, and Tyrphostin AG 879; a gene regulator, such as 5-Aza-2′-deoxycytidine, 5-Azacytidine, Cholecalciferol (Vitamin D3), 4-Hydroxytamoxifen, Melatonin, Mifepristone, Raloxifene, all trans-Retinal (Vitamin A aldehyde), Retinoic acid, all trans (Vitamin A acid), 9-cis-Retinoic Acid, 13-cis-Retinoic acid, Retinol (Vitamin A), Tamoxifen, and Troglitazone; a microtubule inhibitor, such as Colchicine, docetaxel, Dolastatin 15, Nocodazole, Paclitaxel, Podophyllotoxin, Rhizoxin, Vinblastine, Vincristine, Vindesine, and Vinorelbine (Navelbine); and an unclassified antitumor agent, such as 17-(Allylamino)-17-demethoxygeldanamycin, 4-Amino-1,8-naphthalimide, Apigenin, Brefeldin A, Cimetidine, Dichloromethylene-diphosphonic acid, 
Leuprolide (Leuprorelin), Luteinizing Hormone-Releasing Hormone, Pifithrin-α, Rapamycin, Sex hormone-binding globulin, Thapsigargin, Vismodegib (Erivedge™), and Urinary trypsin inhibitor fragment (Bikunin). The antitumor agent may be a monoclonal antibody or antibody drug conjugate, such as rituximab (Rituxan®), alemtuzumab (Campath®), Ipilimumab (Yervoy®), Bevacizumab (Avastin®), Cetuximab (Erbitux®), panitumumab (Vectibix®), trastuzumab (Herceptin®), Tositumomab and 131I-tositumomab (Bexxar®), ibritumomab tiuxetan (Zevalin®), brentuximab vedotin (Adcetris®), siltuximab (Sylvant™), pembrolizumab (Keytruda®), ofatumumab (Arzerra®), obinutuzumab (Gazyva™), 90Y-ibritumomab tiuxetan, 131I-tositumomab, pertuzumab (Perjeta™), ado-trastuzumab emtansine (Kadcyla™), Denosumab (Xgeva®), and Ramucirumab (Cyramza™). The antitumor agent may be a small molecule kinase inhibitor, such as Vemurafenib (Zelboraf®), imatinib mesylate (Gleevec®), erlotinib (Tarceva®), gefitinib (Iressa®), lapatinib (Tykerb®), regorafenib (Stivarga®), sunitinib (Sutent®), sorafenib (Nexavar®), pazopanib (Votrient®), axitinib (Inlyta®), dasatinib (Sprycel®), nilotinib (Tasigna®), bosutinib (Bosulif®), ibrutinib (Imbruvica™), idelalisib (Zydelig®), crizotinib (Xalkori®), afatinib dimaleate (Gilotrif®), ceritinib (LDK378/Zykadia), trametinib (Mekinist®), dabrafenib (Tafinlar®), Cabozantinib (Cometriq™), and vandetanib (Caprelsa®). The antitumor agent may be a proteasome inhibitor, such as bortezomib (Velcade®) and carfilzomib (Kyprolis®). Optionally, the antitumor agent is a neoantigen. Neoantigens are tumor-associated peptides that serve as active pharmaceutical ingredients of vaccine compositions that stimulate antitumor responses and are described in U.S. Pat. No. 9,115,402, which is incorporated by reference herein in its entirety. The antitumor agent may be a cytokine such as interferons (IFNs), interleukins (ILs), or hematopoietic growth factors. 
The antitumor agent may be IFN-α, IL-2, Aldesleukin IL-2, Erythropoietin, Granulocyte-macrophage colony-stimulating factor (GM-CSF) or granulocyte colony-stimulating factor. The antitumor agent may be a targeted therapy such as toremifene (Fareston®), fulvestrant (Faslodex®), anastrozole (Arimidex®), exemestane (Aromasin®), letrozole (Femara®), ziv-aflibercept (Zaltrap®), Alitretinoin (Panretin®), temsirolimus (Torisel®), Tretinoin (Vesanoid®), denileukin diftitox (Ontak®), vorinostat (Zolinza®), romidepsin (Istodax®), bexarotene (Targretin®), pralatrexate (Folotyn®), lenalidomide (Revlimid®), belinostat (Beleodaq™), pomalidomide (Pomalyst®), Cabazitaxel (Jevtana®), enzalutamide (Xtandi®), abiraterone acetate (Zytiga®), radium 223 chloride (Xofigo®), or everolimus (Afinitor®). The antitumor agent may be a checkpoint inhibitor such as an inhibitor of the programmed death-1 (PD-1) pathway, for example an anti-PD1 antibody (Nivolumab). The inhibitor may be an anti-cytotoxic T-lymphocyte-associated antigen (CTLA-4) antibody. The inhibitor may target another member of the CD28 CTLA4 Ig superfamily such as BTLA, LAG3, ICOS, PDL1 or KIR. A checkpoint inhibitor may target a member of the TNFR superfamily such as CD40, OX40, CD137, GITR, CD27 or TIM-3. Additionally, the antitumor agent may be an epigenetic targeted drug such as HDAC inhibitors, kinase inhibitors, DNA methyltransferase inhibitors, histone demethylase inhibitors, or histone methylation inhibitors. The epigenetic drugs may be Azacitidine (Vidaza), Decitabine (Dacogen), Vorinostat (Zolinza), Romidepsin (Istodax), or Ruxolitinib (Jakafi).
In certain embodiments, the perturbation is an agent capable of modulating gene expression. In certain embodiments, the perturbation comprises a CRISPR system, RNAi, TALE, or zinc finger protein. Not being bound by a theory, targeting one or more genomic loci (e.g., genes) may have an effect on mutation rate. Cells may be obtained that already express a CRISPR system and guide RNA for a specific target. The CRISPR system may be inducible, such that the CRISPR system may be induced and lineage followed.
The present invention may be further illustrated and extended based on aspects of CRISPR-Cas development and use as set forth in the following articles and particularly as relates to delivery of a CRISPR protein complex and uses of an RNA guided endonuclease in cells and organisms:
each of which is incorporated herein by reference, may be considered in the practice of the instant invention, and discussed briefly below:
Also, “Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing”, Shengdar Q. Tsai, Nicolas Wyvekens, Cyd Khayter, Jennifer A. Foden, Vishal Thapar, Deepak Reyon, Mathew J. Goodwin, Martin J. Aryee, J. Keith Joung Nature Biotechnology 32(6): 569-77 (2014), relates to dimeric RNA-guided FokI Nucleases that recognize extended sequences and can edit endogenous genes with high efficiencies in human cells.
One type of programmable DNA-binding domain is provided by artificial zinc-finger (ZF) technology, which involves arrays of ZF modules to target new DNA-binding sites in the genome. Each finger module in a ZF array targets three DNA bases. A customized array of individual zinc finger domains is assembled into a ZF protein (ZFP).
ZFPs can comprise a functional domain. The first synthetic zinc finger nucleases (ZFNs) were developed by fusing a ZF protein to the catalytic domain of the Type IIS restriction enzyme FokI. (Kim, Y. G. et al., 1994, Chimeric restriction endonuclease, Proc. Natl. Acad. Sci. U.S.A. 91, 883-887; Kim, Y. G. et al., 1996, Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. U.S.A. 93, 1156-1160). Increased cleavage specificity can be attained with decreased off target activity by use of paired ZFN heterodimers, each targeting different nucleotide sequences separated by a short spacer. (Doyon, Y. et al., 2011, Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nat. Methods 8, 74-79). ZFPs can also be designed as transcription activators and repressors and have been used to target many genes in a wide variety of organisms.
Naturally occurring TALEs or “wild type TALEs” are nucleic acid binding proteins secreted by numerous species of proteobacteria. TALE polypeptides contain a nucleic acid binding domain composed of tandem repeats of highly conserved monomer polypeptides that are predominantly 33, 34 or 35 amino acids in length and that differ from each other mainly in amino acid positions 12 and 13. In advantageous embodiments the nucleic acid is DNA. As used herein, the term “polypeptide monomers”, “TALE monomers” or “monomers” will be used to refer to the highly conserved repetitive polypeptide sequences within the TALE nucleic acid binding domain and the term “repeat variable di-residues” or “RVD” will be used to refer to the highly variable amino acids at positions 12 and 13 of the polypeptide monomers. As provided throughout the disclosure, the amino acid residues of the RVD are depicted using the IUPAC single letter code for amino acids. A general representation of a TALE monomer which is comprised within the DNA binding domain is X_{1-11}-(X_{12}X_{13})-X_{14-33 or 34 or 35}, where the subscript indicates the amino acid position and X represents any amino acid. X_{12}X_{13} indicate the RVDs. In some polypeptide monomers, the variable amino acid at position 13 is missing or absent and in such monomers, the RVD consists of a single amino acid. In such cases the RVD may be alternatively represented as X*, where X represents X_{12} and (*) indicates that X_{13} is absent. The DNA binding domain comprises several repeats of TALE monomers and this may be represented as (X_{1-11}-(X_{12}X_{13})-X_{14-33 or 34 or 35})_z, where in an advantageous embodiment, z is at least 5 to 40. In a further advantageous embodiment, z is at least 10 to 26.
Each TALE monomer has a nucleotide binding affinity that is determined by the identity of the amino acids in its RVD. For example, polypeptide monomers with an RVD of NI preferentially bind to adenine (A), monomers with an RVD of NG preferentially bind to thymine (T), monomers with an RVD of HD preferentially bind to cytosine (C) and monomers with an RVD of NN preferentially bind to both adenine (A) and guanine (G). In yet another embodiment of the invention, monomers with an RVD of IG preferentially bind to T. Thus, the number and order of the polypeptide monomer repeats in the nucleic acid binding domain of a TALE determine its nucleic acid target specificity. In still further embodiments of the invention, monomers with an RVD of NS recognize all four base pairs and may bind to A, T, G or C. The structure and function of TALEs are further described in, for example, Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011), each of which is incorporated by reference in its entirety.
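By way of non-limiting illustration, the relationship between the order of RVDs and the preferred target sequence described above may be sketched as a lookup table. Only the six RVD preferences recited above are encoded; the function names are hypothetical, and the sketch does not model binding strength or sequence context.

```python
# RVD -> preferred base(s), limited to the preferences recited in the text.
RVD_TO_BASES = {
    "NI": "A",
    "NG": "T",
    "HD": "C",
    "NN": "AG",    # binds both A and G
    "IG": "T",
    "NS": "ATGC",  # recognizes all four bases
}


def target_bases(rvds):
    """Map an ordered list of RVDs to the bases each monomer prefers."""
    for rvd in rvds:
        if rvd not in RVD_TO_BASES:
            raise ValueError(f"unknown RVD: {rvd}")
    return [RVD_TO_BASES[rvd] for rvd in rvds]


def matches(rvds, dna):
    """True if every base of dna is a preferred base of the aligned monomer."""
    options = target_bases(rvds)
    if len(dna) != len(options):
        return False
    return all(base in opts for base, opts in zip(dna.upper(), options))


array = ["NI", "NG", "HD", "NN"]
print(target_bases(array))
print(matches(array, "ATCG"), matches(array, "ATCT"))
```

Because NN accepts either A or G, the four-monomer array in this sketch is compatible with both ATCA and ATCG, which illustrates how a degenerate RVD broadens the set of recognized sequences.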
The system, as described in the present technique or any of its components, may be embodied in the form of a computer system. Typical examples of a computer system include a general-purpose computer, a programmed microprocessor, a micro-controller, a peripheral integrated circuit element, and other devices or arrangements of devices that are capable of implementing the steps that constitute the method of the present technique.
The computer system may comprise a computer, an input device, a display unit and/or the Internet. The computer further comprises a microprocessor. The microprocessor is connected to a communication bus. The computer also includes a memory. The memory may include Random Access Memory (RAM) and Read Only Memory (ROM). The computer system further comprises a storage device. The storage device can be a hard disk drive or a removable storage drive such as a floppy disk drive, optical disk drive, etc. The storage device can also be other similar means for loading computer programs or other instructions into the computer system. The computer system also includes a communication unit. The communication unit allows the computer to connect to other databases and the Internet through an I/O interface. The communication unit allows the transfer as well as reception of data from other databases. The communication unit may include a modem, an Ethernet card, or any similar device which enables the computer system to connect to databases and networks such as LAN, MAN, WAN and the Internet. The computer system accepts inputs from a user through an input device that is accessible to the system through the I/O interface.
The computer system executes a set of instructions that are stored in one or more storage elements, in order to process input data. The storage elements may also hold data or other information as desired. The storage element may be in the form of an information source or a physical memory element present in the processing machine.
The set of instructions may include various commands that instruct the processing machine to perform specific tasks such as the steps that constitute the method of the present technique.
The set of instructions may be in the form of a software program. Further, the software may be in the form of a collection of separate programs, a program module within a larger program or a portion of a program module, as in the present technique. The software may also include modular programming in the form of object-oriented programming. The processing of input data by the processing machine may be in response to user commands, results of previous processing or a request made by another processing machine.
The present invention advantageously provides several applications for the described methods. The present invention advantageously provides for identification and quantitative analysis of the mutagenicity of different environmental exposures (physical, chemical, biological). The present invention advantageously provides for measuring the effect of drugs. This addresses an unmet clinical need to determine the mutagenic effects of therapies prior to their widespread deployment, and can improve treatments to extend life expectancies for cancer patients. Using the described methods, it is feasible to quantify the effect of drugs on the mutation accumulation phenotype and, in addition, to read the mutation-specific signatures that might shed light on the affected mechanism. The present invention advantageously provides for characterization of the mutation accumulation behavior of individuals, and for measuring the drug response and DNA damage mechanism of individuals, by performing the method on specific patient-derived stem cells or iPS cells.
The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
As described herein, somatic mutations steadily accumulate in our cells. While most somatic mutations are harmless, some lead to phenotypic consequences. Highly sensitive and specific detection of mutations by whole genome sequencing is desired.
The illustrated method provides dense sampling allowing reduced selection bias toward faster-dividing lineages that occurs when random single cells or sub-clones from samples of interest are obtained. The method allows known relationships between lineages because the cells are derived from a single cell over generations and the cells can be visualized over time. These known relationships allow the invention to have improved accuracy and/or sensitivity. The method allows the number of cell divisions relating to different sub-clones to be exactly known allowing for a quantitative mutation rate. Mutations that co-occurred in the same single-cell division event can be identified thereby allowing mutations to be spatially correlated. Mutation sets occurring in sequential single-cell division events can be compared thereby allowing mutations to be temporally correlated. Continuous microscopic observation enables recording of phenotypic correlates to mutations. The method also allows for the determination of the causes of mutations (e.g., effective repair, failed repair, distribution of fitness effects (DFE)).
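By way of non-limiting illustration, because the number of cell divisions is known, a quantitative mutation rate may be computed as a simple ratio, and mutations assigned to the same division event may be grouped as co-occurring. The function names, division labels, mutation identifiers, and counts below are hypothetical and serve only to show the arithmetic.

```python
def mutation_rate_per_division(mutations_by_division, n_divisions):
    """Average number of de novo mutations per observed cell division.

    mutations_by_division maps a division-event label to the mutations
    assigned to it; divisions with no assigned mutation may be omitted.
    """
    if n_divisions <= 0:
        raise ValueError("n_divisions must be positive")
    if len(mutations_by_division) > n_divisions:
        raise ValueError("more labelled divisions than observed divisions")
    total = sum(len(muts) for muts in mutations_by_division.values())
    return total / n_divisions


def co_occurring(mutations_by_division):
    """Division events to which two or more mutations were assigned."""
    return {
        division: sorted(muts)
        for division, muts in mutations_by_division.items()
        if len(muts) >= 2
    }


# Hypothetical example: 7 divisions observed, 3 mutations assigned in total.
example = {
    "div_1": ["chr1:1000"],
    "div_3": ["chr2:500", "chr7:900"],
}
rate = mutation_rate_per_division(example, n_divisions=7)
print(f"{rate:.3f} mutations per division")
print(co_occurring(example))
```

Passing the division count explicitly, rather than inferring it from the dictionary, reflects the point made above that the number of divisions is known independently from the observed lineage.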
Overall, lineage sequencing enables accurate single-cell sequencing by replacing WGA with culture and applying scalable error correction. The lineage sequencing variant calls are consistent with sensitive and accurate detection of mutations with single-generation resolution.
While low-cost DNA sequencing is transforming biological research and discovery, preparing large sample sets for sequencing with minuscule starting material is now the limiting factor in many applications. Applicants have developed two polydimethylsiloxane (PDMS) microfluidic devices that automate the key steps in whole genome sequencing (WGS) sample preparation, integrating lysis, fragmentation, adapter tagging, purification, and size selection of 96 samples in parallel (
In conclusion, using the microfluidic device, Applicants built 500 WGS libraries from clinical and environmental microbial samples. The low sample prep cost enabled Applicants to process each sample in triplicate to achieve >Q70 consensus base calling accuracy, which revealed clonality among the Pseudomonas isolates within patients, while isolates from different patients were vastly diverse. Despite a >100× reduction in sample and reagent input, the device-made libraries displayed superior complexity and orders of magnitude lower contamination compared with conventional benchtop methodologies. With this advantage, Applicants processed soil micro-colonies (~10^3 cells/colony) into WGS libraries to phylotype them and predict biosynthetic gene clusters. With this platform, Applicants are processing 3000+ clinical isolates of MRSA and PSSA. Applicants are also expanding this technology for other applications such as transcriptomics, metagenomics, and long-read DNA sequencing. The microfluidic system may also be used for lineage sequencing.
Here Applicants further describe lineage sequencing (
In this proof-of-concept demonstration, Applicants precisely controlled cells in the population using a microfluidic device (Kimmerling et al. 2016) and used time-lapse microscopy to provide independent knowledge of the relationships between sub clones (
To demonstrate the lineage sequencing concept (
In order to produce a list of provisional SNVs for each lineage experiment, sequence data from all pairs of sub clones were analyzed with MuTect (Cibulskis et al. 2013). SNVs that arose de novo during the lineage experiment were identified by filtering for groups of SNVs produced by the MuTect analysis that occurred at the same locus in two or more (but not all) sub clones. Such groups of matching SNVs that coincide at the same genomic locus in multiple sub clones putatively represent de novo somatic mutations that occurred during generations 1-5 in the lineage experiments. Applicants term these SNVs “branch variants” (
By contrast, non-coincident SNVs representing true variants arising within or after the last (sixth) generation of the HT115 lineage—the leaf variants—had to be identified within individual samples without the extra statistical power lent by joint variant calling across multiple samples. The leaf variants showed an allele fraction distribution distinct from the branch variants with most values lower than 0.5 and ranging down to uncertain instances of candidate variants with low allele fraction that are filtered out by MuTect (
The knowledge that branch variants are clonal is valuable in variant detection. For example, Applicants can easily segment mutations by ploidy of the genomic locus using the read depth in the standard-depth PCR-free data since variant alleles are fully represented. Leaf variants in the data include sub-clonal variants, and their detection is fraught with challenging tradeoffs in read depth and variant allele fraction cutoffs (
Mutations are driven by the chemistry of nucleotides, DNA, and the biology of their metabolism, including synthesis and repair. Each mutagenic pathway has particular characteristics and results in a distinct mutational spectrum (Alexandrov et al. 2013a, 2013b). As a result, observed mutational spectra provide clues about the mutational processes operating in cells. The spectrum of mutations caused by POLE proofreading deficiency has been previously characterized (Shinbrot et al. 2014). Applicants compared the mutational spectrum of the aggregated HT115 branch variant SNV call set with mutational spectra derived from tumor-normal whole genome sequencing of group A POLE colon tumors (Shinbrot et al. 2014) generated by the TCGA Research Network and found high similarity (
Activity of the DNA mismatch repair (MMR) pathway is known to be coordinated with DNA replication and to be most active during S phase, particularly in euchromatic early-replicating regions (Supek and Lehner 2015; Kunkel and Erie 2015). Using high-resolution genomic replication timing data, Applicants compared the frequency of SNVs and replication timing across the genome. SNVs were markedly suppressed in early-replicating regions relative to late-replicating regions (HT115 in
POLE mutations are also associated with a particular type of strand asymmetry called replication-class (R-class) asymmetry (Haradhvala et al. 2016), which arises due to POLE's specific role in synthesis and proofreading of the leading strand during DNA replication and the stereotyped locations of replication origins in much of the human genome. POLE-driven mutations appear in these regions in a polarized fashion with respect to the two DNA strands. For example, Applicants expect a high proportion of C>A relative to G>T in the DNA strand being synthesized as the leading strand (C>A in left replicating regions and G>T in right replicating regions relative to the genomic reference). Indeed, Applicants observed the predicted POLE R-class asymmetry among HT115 branch variant SNVs at the same level previously quantified in TCGA POLE mutant samples (Haradhvala et al. 2016) (
Applicants tested whether lineage structures could be estimated from the genomic data alone by blinding themselves to the time-lapse imaging data from the HT115 and RPE1 experiments. Applicants first filtered the raw SNV calls from MuTect to identify coincident SNV calls (
Lineage sequencing also allows data quality to be tested by quantifying variants that do not agree with the consensus lineage structure. Applicants estimated the specificity and sensitivity of the branch SNV calls as a function of the quality threshold for coincident variant calls, similar to the construction of a receiver operator characteristic (ROC) (
Applicants constructed a null statistical model for mutation accrual based on the assumptions that each somatic mutation event was independent of the others and that the number of mutations appearing in each daughter cell followed the same (Poisson) distribution. Applicants estimated the average mutation rate for HT115 cells as 173 SNVs per cell division with the 95% confidence interval (CI) [147, 203] (using the model), which corresponds to a rate of 3.0×10^-8, 95% CI [2.5×10^-8, 3.5×10^-8] SNV per bp per cell division (SNV/bp/division). The error in this estimate is driven principally by the variant counting statistics since Applicants have precise knowledge of the number of generations over which mutations accrued in each lineage segment. At haploid loci, leaf variant data can also be used to independently estimate the mutation rate, using SNVs with an allele fraction of one. These SNVs must occur prior to sub clone expansion to appear clonal; otherwise the SNV would be diluted in the population. Applicants calculated the mutation rate for branch variants, 2.9×10^-8 CI [1.3×10^-8, 5.2×10^-8] SNV/bp/division, and leaf variants, 4.2×10^-8 CI [2.2×10^-8, 6.5×10^-8] SNV/bp/division, at HT115 haploid sites, and found rates similar to those calculated for diploid/triploid sites, suggesting that homology-directed repair makes a minor impact in the HT115 experiment (
Applicants next tested the assumption that mutations accrue according to a Poisson process, or equivalently, that the cells in the lineage experiment growing under similar conditions at the same time, would exhibit the same average mutation rate. Applicants produced quantile-quantile (QQ) plots of p-value quantiles to compare the observed distribution of branch variant SNV counts across each lineage segment with the theoretical Poisson process model based on a constant rate of mutation. The QQ plots show poor concordance of the experimental data with the theoretical Poisson process model (
Applicants observed eleven instances where two SNVs occurred at identical genomic positions in related sub-lineages (HT115: 7 instances,
Here Applicants describe lineage sequencing, a new genome-wide technique that utilizes knowledge of the cell lineage structure to reconstruct mutational events occurring during lineage formation with high resolution, high accuracy, high precision, and minimal bias. Lineage sequencing is based on standard short-read next generation sequencing methods applied at standard shotgun sequencing coverage depth for each sequence library and utilizes joint variant calling across the set of sequence libraries informed by an estimate of the lineage structure to enhance statistical power. The proof of concept implementation achieves partial resolution of individual cell cycles with neither whole-genome amplification nor polymerase chain reaction amplification steps, and as a result, produces sequence libraries with highly uniform and accurate coverage of the genome. By applying lineage sequencing, Applicants measure the accumulation of SNVs by identifying >90% of SNVs that evolved during lineage formation and mapping these onto cells in the lineage with very high resolution (1-4 cell divisions, with little uncertainty in the number of cell divisions in each lineage segment where SNVs were assigned). Altogether, the method provides accurate and nearly complete estimates of cellular genotypes within the lineage, suggesting that lineage sequencing may emerge as a gold standard for somatic mutation quantification.
Applicants were able to distinguish two types of SNV call sets: high-confidence branch variants that occur during lineage formation and leaf variants which occur in the last round of cell divisions or during sub-clonal culture. Leaf variants can be compared to variants identified by the existing “double bottleneck” methodology (Szikriszt et al. 2016; Blokzijl et al. 2016; Drost et al. 2017). Leaf variants cannot be assigned to a specific cell division in the lineage with high confidence and suffer from low specificity, preventing precise quantification for mutation rate assessment.
By contrast, the branch variant analysis shows improved performance over alternative methods by providing a nearly complete and almost perfectly accurate list of somatic variants that accumulate within a specific number of cell divisions thanks to the support of each branch variant by multiple sequencing libraries. The use of the prior estimate of inter-sample relationships here is analogous to the use of known family relationships in human germline trio and quartet sequencing (Roach et al., Science. 2010 April 30; 328(5978): 636-639). In addition to the internal consistency of the branch variant analysis (
While the lineage structures Applicants derived from sequence data alone enabled recovery of the same set of relationships among the sub clones as observed in the time-lapse microscopy, they lack the independent information about the number of cell divisions and the duration of each cell cycle in the lineage that was available from the imaging data. In principle, the observed time-to-replication for each cell in the imaging data can be used to determine whether the number of cell cycles, or alternatively, the elapsed time, is a better predictor of the number of somatic mutations, and further, to constrain likely mechanisms of mutagenesis in these cells. The sample size provided insufficient power for this task, although the analysis illustrates the framework for cell-specific phenotype-mutation set correlative analysis (
Here, Applicants showed two different ways that information about the lineage structure can be obtained prior to joint variant calling to support lineage sequencing: 1) by time-lapse microscopy, and 2) using raw SNVs from whole genome sequence data (provided there is at least one detectable variant per cell to achieve full resolution of the lineage). In principle, other approaches for tracking or estimating lineage structures could also support lineage sequencing (Woodworth et al. 2017), for example, recently reported self-editing genomic barcoding approaches (Frieda et al. 2017; McKenna et al. 2016). With appropriate methods for cell sampling and sub-clonal culture, Applicants expect that the lineage sequencing concept can be applied to study mutations that occur in solid tissues or whole organisms.
Lineage sequencing facilitates in-line data quality evaluation by enabling candidate variants that do not agree with the consensus lineage structure to be quantified. Data quality assessments can in turn be used to optimize the biochemical and computational protocols for analysis of genomic aberrations occurring during lineage formation. Technical errors are introduced independently across sub clones and are uncorrelated, resulting in dramatically lower false positive error rates in the branch variant calls, where uncorrelated errors are excluded. The branch variant error probabilities are expected to fall by the square (or a higher power for variants supported by data from more than two sub clones) of the nominal consensus error rate, e.g., (10^-6)^2 = 10^-12. The extra statistical power lent by joint variant calling in lineage sequencing could conceivably ensure accurate and more sensitive mutation detection for direct single-cell readout that depends on noisy whole-genome amplification. In addition, sampling strategies for lineage sequencing can be tailored to optimize statistical power for different estimation tasks: for example, sparser sampling of cells from the lineage could improve mutation rate determination at constant sequencing effort for cells with few mutations per generation.
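The error-rate argument above can be illustrated with a short calculation. This is a minimal sketch that assumes technical errors are independent between sequencing libraries and occur with the same per-library probability; the function name and arguments are hypothetical and are not part of the described pipeline.

```python
def joint_error_probability(per_library_error, n_supporting_libraries):
    """Probability that the same technical error appears independently in
    every one of the supporting libraries (assumes independent, equal rates)."""
    return per_library_error ** n_supporting_libraries

# Nominal consensus error rate from the text, supported by two sub clones:
# the result is about 1e-12, i.e. the square of 1e-6.
print(joint_error_probability(1e-6, 2))
# A variant supported by three sub clones falls by the third power.
print(joint_error_probability(1e-6, 3))
```

Under these assumptions, each additional supporting sub clone multiplies the false positive probability by the per-library error rate, which is why branch variants shared by larger groups are the most reliable calls.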
These results show that mutations do not occur independently with uniform probability, but rather are heterogeneous across the genome and across closely related cells, even when environmental conditions are uniform. Applicants observed that mutations are not independent; rather, different mutations can occur at the same genomic locus in different cells. Lesion persistence probably explains this and serves as an example of one particular mechanism that causes strong correlations among somatic mutations.
In summary, lineage sequencing allows precise assessment of the rate and spectrum of somatic mutations with intra-lineage resolution as high as individual cell division events. Lineage sequencing can be combined with real-time phenotypic observation and/or perturbation to link cellular activity and responses with mutational events at the single-cell level. This detailed level of analysis enables the detection of individual biochemical events in cells and the parsing of mutational spectra with enhanced resolution. Applicants imagine that lineage sequencing could be applied widely to study spontaneous and exposure- or therapy-associated mutational processes in remarkable detail and to help identify the molecular mechanisms and genomic consequences of many mutation types.
Use of the microfluidic devices described herein for lineage sequencing represents only one embodiment for practicing the invention. Lineage sequencing can also be performed using live cell imaging and laser capture methods (
In certain embodiments, the lineage tracing is automated. Applicants can train the computer to select cells that are part of a lineage with training sets and live cell imaging (
In certain embodiments, lineage sequencing may be used for genotoxicity testing or in exposure models. In certain embodiments, primary cells are exposed to a toxin. The primary cells may be small lung airway epithelial cells. The toxin may be cigarette extract. Cell lines may include, but are not limited to, human lung cell lines (e.g., A549 and BEAS-2B), small airway primary cells (e.g., for purchase from Lonza), or immortalized small airway epithelial cells (e.g., HSAEC1-KT). In certain embodiments, the toxin is a chemotherapeutic agent (e.g., cisplatin). In certain embodiments, lineage sequencing is used to quantify the genotoxic damage from exposure to a drug or toxin and to identify repair-resistant DNA lesions and tolerance pathways.
In certain embodiments, an epigenetic mutational rate is measured. The present invention can be used to calculate lineages with single-cell epigenomics to enable charting of tumor evolution with therapy. In certain embodiments, cell lineages are isolated and DNA methylation patterns across the lineage are determined. DNA methylation may be determined using bisulfite sequencing methods. Bisulfite sequencing may be performed on single cells isolated from a lineage.
In certain embodiments, lineage sequencing can be combined with Diff-seq (Lin-Diff-seq) (see e.g., Aggeli et al., Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery, Nucleic Acids Res. 2018 Apr. 20; 46(7): e42). Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. In certain embodiments, Diff-seq is applied on DNA molecules obtained from captured lineages. Applicants applied Lin-Diff-seq on HT115 cells (
In certain embodiments, lineage sequencing is used to determine conditions that decrease the mutation rate (e.g., slower division rate) (see, e.g., Tomasetti and Vogelstein, Variation in cancer risk among tissues can be explained by the number of stem cell divisions, Science. 2015 Jan. 2; 347(6217): 78-81). In certain embodiments, lineage sequencing is used for functional genomics, such that specific phenotypes can be detected and associated with somatic mutation. For example, a branch variant or leaf variant may occur that is associated with growth rate or drug resistance. In certain embodiments, a phenotype may be detected in a movie generated by live cell imaging. In certain embodiments, lineage sequencing can be used to screen for agents or conditions that decrease the mutation rate in a population of cells. In certain embodiments, lineage sequencing can be used to screen for conditions that increase the mutation rate in a population of cells (e.g., ageing). In certain embodiments, lineage sequencing can be used to compare the mutation rate between different cells.
Cell Culture and Conditioned Media Preparation. Human HT115 epithelial colon carcinoma cells from the Cancer Cell Line Encyclopedia (CCLE; Barretina et al. 2012) were maintained in High Glucose Dulbecco's modified Eagle's medium (DMEM, Life Technologies, Inc.) supplemented with 15% fetal bovine serum (Mediatech #35-015-CV). The telomerase-immortalized RPE1 cells (ATCC) were maintained in DMEM/F12 medium (ThermoFisher) with 10% fetal bovine serum. Conditioned media for use with the microfluidic device was collected from the stock cell flask between 24 and 48 hours after fresh medium was added. The medium was centrifuged (4700 g for 5 min) and filtered (0.2 μm) before use.
Microfluidic Device.
Hydrodynamic trap array devices were fabricated in silicon and glass as described previously (Kimmerling et al. 2016). Prior to cell culture in the device, the system was flushed with 10% bleach for ten minutes for sterilization and cleaning, rinsed with water, and then flushed with conditioned cell culture media overnight to fully rinse and prime the system for cell culture. Finally, devices were flushed with a 0.1% poly-L-lysine solution (Sigma) for ten minutes to coat the channel surfaces and promote cell adhesion and growth, followed by a short wash of a few seconds with conditioned cell culture media just before cell loading.
Single-Cell Culture in the Hydrodynamic Trap Array.
A single-cell suspension was introduced at the downstream port of the system to load cells into the device (port P3 in Supplemental
Trypsinization for Cellular Re-Seeding or Release.
Following multigenerational growth in the device, cells were detached from the channel surfaces by flowing in a solution of 0.25% trypsin and EDTA (Gibco). In order to achieve rapid fluidic exchange while minimizing shear stress on cells within the trap array, the pressures were set to have a significant flow rate along the bypass channels (P1>>P2, P3) while maintaining only slight flow across the trap lanes (P2>P3). Fully dissociated cells were either re-seeded in the device for continued culture or released one at a time for downstream collection and sub-clonal outgrowth. Cellular re-seeding was carried out both across and within individual lanes of traps in the array. For instance, for longer-term lineage tracking, a single cell was loaded into one lane of the trap array and allowed to divide for two generations. After trypsin treatment, these cells were released into the left bypass channel (P3>P2), where the pressures were set to have no flow (P1=P2), and subsequently re-captured one cell per lane by maintaining slight flow across the array (P2>P3) and periodically increasing the upstream pressure (P1>P2) or setting it to atmospheric pressure (P1<P2) to move cells along the length of the left bypass channel and position them for capture within each lane of the trap array. Following subsequent rounds of cell divisions, cells were re-seeded within each trap lane by gently flowing forward across the trap array (P2>P3) after detachment with trypsin. In both cases, the cellular detachment and re-capture processes were recorded in order to maintain lineage information collected via time lapse imaging. For single cell collection downstream of the device, the pressures were set to have a substantial flow rate along the bypass channels (P1>>P2, P3) after detachment with trypsin; these pressures were set such that the volumetric flow rate along the bypass channel was approximately 15 μl per minute.
To release individual cells to the bypass channel, the flow direction was periodically reversed (P3>P2) until a cell reached the bypass channel at which point flow into the traps was re-established (P2>P3) to ensure no other cells were released. Each cell was flushed for 30 seconds (approximately 7.5 μl total volume—in order to clear the dead volume of the system) and collected directly into 70 μl of conditioned cell culture medium in a glass-bottom 384 well plate for continued sub-clonal outgrowth in a standard tissue culture incubator (37° C., 5% CO2).
Cell Identity Tracking and Lineage Reconstruction by Time Lapse Image Analysis.
Time-lapse imaging was conducted with a custom LabVIEW program (National Instruments), which drove a TTL-triggered white LED light source (ThorLabs) for illumination as well as two automated stages (Newport), which traversed the x and y axes to capture multiple fields of view for each frame. Images were taken every 3 minutes using a 10× magnification lens. Lineage structure and time-to-division measurements were determined by manually tracking the recorded image series using ImageJ software (example image series shown in Supp. movie 1). The trypsin release and re-seeding image series and the single cell release image series were captured continuously using a lower magnification (4×) lens that captured the entire device in a single field of view. These image series were analyzed with the assistance of custom Python code based on the OpenCV library, which pre-analyzed the movies and marked each cell with a different color to ease human analysis, which was performed with iMovie software (Apple, Inc.).
Cell Growth Measurements.
Single cells released into separate wells on a glass-bottom plate for sub-clonal culture were imaged immediately upon collection to validate the presence of a single cell per well. The growth of sub clones was monitored every 24-48 hours, and fresh conditioned medium was introduced every 48 hours. For the HT115 lineage, Applicants isolated 37 cells (out of 45 cells in the channel), of which 11 grew as sub-clonal cultures and were processed for sequencing. For RPE1, Applicants isolated 22 single cells (out of 26 cells in the channel), of which 15 grew as sub-clonal cultures and 13 were processed for sequencing.
gDNA Extraction and Library Construction.
Colonies were grown to at least 10^6 cells. Genomic DNA was extracted from each sample using the QIAamp DNA Mini kit (Qiagen). PCR-free library construction was performed by the Genomics Platform at the Broad Institute using their standard process. All sample information and tracking was performed by automated LIMS messaging. Samples underwent fragmentation by means of acoustic shearing using a Covaris focused ultrasound shearing instrument to provide fragments of approximately 385 bp. Following fragmentation, size selection was performed using a SPRI cleanup. Library preparation was performed using a commercially available kit (product KK8202, KAPALIBPREPKT, KAPA Biosystems) that entailed palindromic forked adapters with unique 8-base index sequences embedded within the adapter (the DNA oligonucleotide adapters were purchased from IDT). Following library construction, each library was quantified using quantitative PCR (qPCR; KAPA Library Quantification kit (ABI Prism) from KAPA Biosystems). This qPCR quantification assay was automated using Agilent's Bravo liquid handling platform. Based on the qPCR quantification, libraries were normalized to 1.7 nanomolar. Library samples were then pooled into groups of 24 samples, and the 24-plex pools were once again quantified by qPCR. Library pools were then combined with HiSeq X Cluster Amp Mix 1, 2 and 3 in a tube strip using the Hamilton Starlet Liquid Handling system. Cluster amplification of the templates was performed according to Illumina's protocol using the cBot instrument (Illumina). Clustered flow cells were then shotgun sequenced on HiSeqX to approximately 35-fold coverage of the genome using proprietary sequencing by synthesis (SBS) reagents (Illumina HiSeqX), then analyzed using RTA2.
Primary Analysis of Genomic Data.
Sequence read alignment, data aggregation, preliminary production analysis, and quality control proceeded after sequencing using the automated Picard pipeline (Broad Institute Genomics Platform, picard.sourceforge.net/). The Picard pipeline produces high quality recalibrated sample-level BAM files using the following procedure. Reads were extracted from sequencing instruments and aligned using BWA against the GRCh37/hg19 reference. Duplicate reads were marked for downstream interpretation in the analysis pipeline. Reads around known indel sites were realigned to produce improved alignments. Quality scores were recalibrated using the GATK base quality score recalibrator to increase the accuracy of reported base quality scores. Cell line identity was verified against reference genotype fingerprints. Data were aggregated per sample in BAM format including base calls, quality scores, and alignment data. Finally, summary metrics were generated to allow quality assessment.
Copy Number Variation Analysis.
Copy number variation (CNV) analysis for each cell line (Figure S3) was carried out by the procedure outlined below and showed that HT115 cells were largely diploid as expected. However, CNV analysis showed that the RPE1 cells were predominantly triploid, which might be related to genomic instability caused by telomerase/telomere dysfunction (Garbe et al. 2014), as the RPE1 line was originally immortalized by telomerase overexpression. For CNV analysis, reads were counted for each sample in 10,000-base bins using the GATK 4 tool SparkGenomeReadCounts and divided by the median bin coverage. By exploiting the fact that each sample came from a pure sub-clonal population, coverage was scaled by a factor x, where x minimized the objective sum over bins of |x·coverage(bin) − nearest_integer(x·coverage(bin))|, so that the scaled coverage values lined up with integers as well as possible. The absolute (but noisy) coverage values were input into a Hidden Markov Model (HMM) with integer copy number states 0, 1, 2, 3, 4, 5. This HMM learned transition matrix elements between states via an expectation-maximization (EM) algorithm. The observed coverage was modeled as a Cauchy distribution centered at the integer copy number values, and the model learned the width parameter of this emission distribution via the EM algorithm. The E step consisted of obtaining posteriors from the forward-backward algorithm, and the parameters were updated by an M step consisting of numerical optimization. Once the model had converged, the Viterbi algorithm was used to obtain segments of constant copy number.
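The coverage scaling step described above can be sketched as a one-dimensional search for the scale factor that brings median-normalized bin coverage closest to integer values. This is an illustrative sketch only: the grid search, the search bounds, and the function names are assumptions and do not come from the described procedure. A bounded range is assumed because the objective is trivially zero at x = 0 and also at integer multiples of the correct scale.

```python
def integer_misfit(x, coverage):
    """Sum over bins of |x*coverage - nearest integer(x*coverage)|."""
    return sum(abs(x * c - round(x * c)) for c in coverage)

def best_scale(coverage, lo=1.0, hi=3.5, step=0.01):
    """Grid-search the scale factor within [lo, hi] (assumed bounds)."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda x: integer_misfit(x, coverage))

# Toy, noise-free example: a mostly diploid sample whose median-normalized
# coverage is 1.0, with some bins at three copies (1.5) and one copy (0.5).
toy_coverage = [1.0] * 50 + [1.5] * 10 + [0.5] * 5
scale = best_scale(toy_coverage)
copy_numbers = sorted({round(scale * c) for c in toy_coverage})
print(scale, copy_numbers)
```

With real, noisy data the scaled values would not be exact integers, which is where the HMM segmentation described in the text takes over.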
SNV Analysis.
Raw single nucleotide variants (SNVs) were identified using MuTect (Cibulskis et al. 2013) within the “Firehose” pipeline, developed at the Broad Institute (www.broadinstitute.org/cancer/cga). MuTect was run for every pair of sub clones twice, such that each sample acted as a “tumor” sample in one run for a sub clone pair, and as a “normal” sample in another run of the same sub clone pair. The SNV lists from the MuTect runs for each lineage experiment were combined and de-duplicated using a custom Python script.
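The pairing scheme described above can be sketched as follows. This is a minimal illustration, not the custom script referred to in the text: the sub clone labels, the record fields (`sample`, `chrom`, `pos`, `ref`, `alt`), and the function names are hypothetical.

```python
from itertools import permutations

def tumor_normal_runs(sub_clones):
    """Every ordered (tumor, normal) pair, so that each pair of sub clones
    is run twice with the two roles swapped."""
    return list(permutations(sub_clones, 2))

def merge_calls(call_lists):
    """Combine per-run SNV lists, keeping the first record seen for each
    (sample, chrom, pos, ref, alt) key."""
    seen = set()
    merged = []
    for calls in call_lists:
        for call in calls:
            key = (call["sample"], call["chrom"], call["pos"],
                   call["ref"], call["alt"])
            if key not in seen:
                seen.add(key)
                merged.append(call)
    return merged

# Eleven sub clones, the number sequenced for the HT115 lineage.
runs = tumor_normal_runs([f"sc{i}" for i in range(1, 12)])
print(len(runs))  # 11 * 10 = 110 ordered pairs
```

Because each sample serves as the "tumor" against every other sample, the same SNV is typically reported by several runs, which is why a de-duplication step is needed before downstream analysis.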
Lineage Sequencing Analysis Pipeline.
Raw SNVs for each lineage experiment were extracted from the combined MuTect output list. Candidate coincident SNVs, that is, identical SNVs appearing at matching genomic loci in two or more but not all sub clones from a lineage experiment, were identified. DNA base counts were extracted for all sub clones at the locus for each coincident SNV. SNVs that had low coverage (<5 reads) of alternate alleles (compared with the reference) in one of the variant samples were not pursued further. The quality of each coincident SNV classification was assessed by calculating the probability for each sample to belong to its consensus-assigned group (‘reference’ or ‘alternative’) considering the base content in the read alignment at the locus in question for each sample. The probability was calculated using a binomial distribution test for each of the samples, assuming each sample can belong either to the assigned reference or variant group (H0) or to another group (H1). The calculated probability serves as a heuristic score to rank the quality of coincident SNV groups.
H0: P=P1, the probability weight that the sample belongs to group 1, the assigned reference or variant group.
H1: P=P2, the probability weight that the sample belongs to group 2, a different group than assigned.
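One way to turn these two hypotheses into a numerical score is sketched below. The text does not give the exact form of the binomial test, so everything specific here is an assumption made for illustration: the expected alternate-allele fractions (a sequencing error rate of 0.001 for the reference group, 0.5 for a heterozygous diploid variant group), the normalization P1/(P1+P2), and all function names.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of k alternate reads out of n total reads."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def group_weight(alt_reads, depth, assigned, err=0.001, het_af=0.5):
    """Weight that a sample belongs to its assigned group (H0) rather than
    the other group (H1). err and het_af are assumed allele fractions."""
    p_ref = binom_pmf(alt_reads, depth, err)
    p_var = binom_pmf(alt_reads, depth, het_af)
    p1, p2 = (p_var, p_ref) if assigned == "alternative" else (p_ref, p_var)
    total = p1 + p2
    return p1 / total if total > 0 else 0.0

def coincident_quality(samples):
    """Quality of a coincident SNV grouping: the minimum weight over all
    samples, each given as (alt_reads, depth, assigned_group)."""
    return min(group_weight(a, d, g) for a, d, g in samples)

clean = [(15, 30, "alternative"), (14, 32, "alternative"),
         (0, 35, "reference"), (0, 28, "reference")]
ambiguous = clean + [(3, 30, "reference")]  # 3 alt reads in a 'reference' sample
print(coincident_quality(clean) >= 0.99, coincident_quality(ambiguous) >= 0.99)
```

Taking the minimum over samples means a single poorly supported sample is enough to lower the score of the whole grouping, which matches the conservative use of the score as a ranking heuristic with a 0.99 cutoff.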
For each classified SNV, Applicants calculated this probability for every sample and chose the minimal probability to represent the quality of this coincident classification option for the SNV. Applicants removed all SNVs for which the quality of this coincident classification was less than 0.99 (
Two approaches were used to obtain a prior estimate of the lineage structure for use in calling the final branch variant sets:
Method 1: In the “lineage->called variants” approach, the lineage structure was determined by analysis of time lapse imaging of cells in the microfluidic trap array device. This lineage structure was subsequently used for calling branch variants. SNVs were grouped as coincident SNVs and branch variants called by evaluating the coincident SNV quality score for the highest scoring group of sub clones over all the groups of sub clones consistent with the lineage structure determined by time lapse imaging, including the option that an SNV is only truly present in a single sub clone and represents a leaf variant.
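The consistency check underlying Method 1 can be sketched as follows: given a lineage tree, enumerate the groups of sub clones that descend from each internal node, and test whether the set of sub clones carrying an SNV matches one of those groups. This is a simplified illustration that only checks consistency and does not reproduce the quality scoring over candidate groups described above; the nested-tuple tree encoding, the labels, and the function names are hypothetical.

```python
def clades(tree):
    """Return (leaves under this node, leaf sets of every internal node at
    or below it). A leaf is a bare label; an internal node is a tuple."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves, groups = frozenset(), []
    for child in tree:
        child_leaves, child_groups = clades(child)
        leaves |= child_leaves
        groups.extend(child_groups)
    groups.append(leaves)
    return leaves, groups

def classify(carriers, tree):
    """Classify the set of sub clones carrying an SNV against the lineage."""
    all_leaves, groups = clades(tree)
    carriers = frozenset(carriers)
    if not carriers or not carriers <= all_leaves:
        return "inconsistent"
    if len(carriers) == 1:
        return "leaf"            # present in a single sub clone
    if carriers == all_leaves:
        return "uninformative"   # present in all sub clones
    return "branch" if carriers in groups else "inconsistent"

toy_tree = ((("a", "b"), "c"), ("d", "e"))
for carriers in [{"a", "b"}, {"c"}, {"a", "d"}, {"a", "b", "c", "d", "e"}]:
    print(sorted(carriers), classify(carriers, toy_tree))
```

Carrier sets flagged as inconsistent in a sketch like this correspond to the candidate variants that disagree with the consensus lineage structure and can be counted to assess data quality.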
Method 2: In the “raw variants->lineage->called variants” approach, Applicants used the shotgun sequence data to estimate the most likely lineage structure in each lineage experiment. This was accomplished by identifying coincident SNVs by analysis of raw SNV calls from the sub clones generated by MuTect (see methods section on Lineage sequencing analysis pipeline). Sub clones sharing the same SNVs were grouped without restrictions by hierarchical clustering and for each coincident SNV, the quality score was calculated
The frequency of coincident SNVs for each group of sub clones was tabulated and ranked to generate the plots in
Classifying SNVs in the First Cell Division of the Lineage.
By grouping the samples in each branch variant SNV and comparing to the base call at the same locus in a sister lineage, Applicants could determine which was the reference allele, which was the alternative, and calculate the variant allele fraction without depending on an external reference allele. However, for the first cell division, no such sister lineage exists within the dataset, so Applicants needed an external reference to identify the reference genotype for the cell that founded each lineage experiment. In the HT115 lineage, this involved the split across sub clones {{47,54}; {34,44,49,63,48,45,38,56,57}} and in the RPE1 lineage, the split across sub clones {{46,39,34,37,38}; {28,32,36,24,27,44,23,22}}. To resolve this issue, Applicants compared each branch variant SNV in these groups to the reference genome (hg19) for RPE1 cells, and assigned alleles matching hg19 as reference. Applicants were less confident in using hg19 to identify reference alleles in the fast-mutating HT115 cells, so Applicants instead used base calls from an additional HT115 sub clone, sequenced from a different HT115 lineage experiment (seeded from the same HT115 cell stock), to identify reference alleles at the top of the HT115 lineage presented in this study. As an additional consistency check on whether the correct alleles after the first cell division were assigned as reference, Applicants verified that the majority of the SNVs represent homozygous-to-heterozygous changes as expected, which was indeed found to be the case.
Cell Line Authentication.
Cell identity was authenticated by the Broad Genomics Platform for HT115 using previously stored fingerprint genotypes. The fingerprint consists of the genotypes at 82 loci from the query sample, which were compared against fingerprints of all the Cancer Cell Line Encyclopedia (CCLE) cell lines using the farthest neighbor graph (FNG) algorithm. The highest correlation (0.83) was found between the HT115 sample and the CCLE HT115 sample. For RPE1 (which is not a CCLE cell line), Applicants used the ATCC short tandem repeat (STR)-based authentication service. The STR profile of the RPE1 sample validated with 100% similarity to hTERT RPE1.
POLE Tumor WGS Datasets.
Applicants assembled a collection of 10 whole genome tumor datasets that were published and annotated with POLE coding mutations and mutator phenotype from the Cancer Genome Atlas (TCGA; dbGAP: phs000178.v1.p1)(Haradhvala et al. 2016).
Mutation Spectrum Calculation.
Mutational spectra were defined using standard approaches and reported in the standard format (Alexandrov et al. 2013b). Each SNV was classified into one of six subtypes (C:G>A:T, C:G>G:C, C:G>T:A, T:A>A:T, T:A>C:G, and T:A>G:C) and further refined by including the sequence context of each mutated base (one base 3′ and one base 5′). For example, a C:G>T:A mutation can be characterized as TpCpG>TpTpG (mutated base underlined and presented as the pyrimidine partner of the mutated base pair), generating 96 possible mutation types (6 types of substitution × 4 types of 5′ base × 4 types of 3′ base). Applicants compared the mutational spectra from the HT115 and RPE1 branch variant and leaf variant SNV call sets with patterns extracted from POLE mutant clinical and cell line (ATCC) samples.
A cosine similarity value closer to 1 indicates that mutational spectra are more similar to one another. The 95% confidence intervals were calculated by bootstrapping the SNV list 10^4 times.
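The 96-type classification and the cosine comparison can be illustrated with the short sketch below. It assumes each SNV is supplied as a (5′ base, reference base, alternate base, 3′ base) tuple read from the genomic reference strand; the function names and the example SNVs are hypothetical, and bootstrapping is not shown.

```python
from itertools import product
from math import sqrt

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_type(five_prime, ref, alt, three_prime):
    """Label an SNV by its pyrimidine-centred trinucleotide context,
    e.g. 'T[C>T]G'. Purine references are reverse-complemented."""
    if ref in "AG":
        five_prime, ref, alt, three_prime = (
            COMPLEMENT[three_prime], COMPLEMENT[ref],
            COMPLEMENT[alt], COMPLEMENT[five_prime],
        )
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


# All 96 types: 6 substitutions x 4 five-prime bases x 4 three-prime bases.
SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"), ("T", "A"), ("T", "C"), ("T", "G")]
ALL_TYPES = [f"{f}[{r}>{a}]{t}" for (r, a), f, t in product(SUBSTITUTIONS, "ACGT", "ACGT")]


def spectrum(snvs):
    """Count SNVs per mutation type, in the fixed order of ALL_TYPES."""
    counts = dict.fromkeys(ALL_TYPES, 0)
    for snv in snvs:
        counts[mutation_type(*snv)] += 1
    return [counts[t] for t in ALL_TYPES]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


# Hypothetical SNVs: the second is the reverse-strand view of a TpCpG>TpTpG change.
spec_a = spectrum([("T", "C", "T", "G"), ("C", "G", "A", "A")])
spec_b = spectrum([("A", "T", "G", "A")])
```

Both SNVs in `spec_a` collapse onto the same `T[C>T]G` type, which is the point of reporting each change as the pyrimidine partner of the mutated base pair.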
Replication Timing Analysis.
DNA replication timing for HT115 cells was measured according to a previously described method (Koren et al. 2012). Briefly, 50 million cells were fixed with EtOH, treated with RNaseA, and stained with Propidium Iodide (PI) for DNA content. G1 and S phase cells were sorted using the FACSAria cell sorter (Becton Dickinson; 1 million cells per fraction), and genomic DNA was extracted and whole-genome sequenced. Replication timing was calculated by counting the number of S phase reads in consecutive windows containing 200 G1 reads along each chromosome, filtering outlier data points, and smoothing the data with a cubic smoothing spline. Applicants arbitrarily divided the genome into early (>60), intermediate (>33 and <60), and late (<33) replicating bins. The 95% confidence intervals (presented as error bars) were calculated by bootstrapping the SNV list 10^4 times. Replication timing data for the RPE1 dataset were obtained from a previously published dataset (Koren et al. 2012).
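The windowing step can be sketched as below. The example assumes sorted read start positions for one chromosome, treats each window as half-open, and uses a hypothetical function name; outlier filtering and spline smoothing are not shown.

```python
from bisect import bisect_left


def s_reads_per_g1_window(g1_positions, s_positions, g1_per_window=200):
    """Count S-phase reads in consecutive windows that each hold a fixed
    number of G1 reads. Both inputs are sorted read start positions on one
    chromosome. Returns (window_start, window_end, s_count) tuples, where
    each window covers positions start <= x < end."""
    windows = []
    for i in range(0, len(g1_positions) - g1_per_window + 1, g1_per_window):
        start = g1_positions[i]
        j = i + g1_per_window
        # A window ends where the next one begins, or just past the last G1 read.
        end = g1_positions[j] if j < len(g1_positions) else g1_positions[-1] + 1
        s_count = bisect_left(s_positions, end) - bisect_left(s_positions, start)
        windows.append((start, end, s_count))
    return windows


# Toy data with 2 G1 reads per window instead of 200.
g1 = [0, 10, 20, 30, 40, 50]
s = [5, 12, 13, 25, 41, 42, 43, 55]
toy_windows = s_reads_per_g1_window(g1, s, g1_per_window=2)
```

Defining windows by G1 read count rather than by base pairs normalizes for local coverage differences, so the S-phase count in each window reflects relative copy number during replication.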
Branch Variant SNV Occurrence in Genes and Exons.
Genomic regions consisting of genes, exons, and introns were determined from RefSeq gene tables (Pruitt et al. 2007). The total fraction of the genome occupied by each region type was calculated and normalized with attention to the locus-wise CNV calls. The log2(observed/expected) ratios of the branch variant SNV counts in each genomic region type were calculated. One-tailed binomial tests (custom Python code) were performed to calculate the statistical significance of deviations of the observed counts from the expected number of mutations (based on the average genome-wide count for each lineage experiment and the size of each genomic region type considered); P<0.01 was considered significant. The 95% confidence intervals (presented as error bars) were calculated by bootstrapping the SNV list 10^4 times, recalculating results for each list, and determining the 0.025 and 0.975 quantile values.
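A minimal sketch of the enrichment calculation follows. It is not Applicants' custom code: it assumes a non-zero observed count and a region fraction strictly between 0 and 1, takes the one-tailed test in the direction of the observed deviation, and uses hypothetical counts. Bootstrapping is not shown.

```python
from math import exp, lgamma, log, log2


def binom_log_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def enrichment(observed, total_snvs, region_fraction):
    """Return log2(observed/expected) and a one-tailed binomial P-value
    taken in the direction of the observed deviation."""
    expected = total_snvs * region_fraction
    ratio = log2(observed / expected)
    if observed >= expected:  # upper tail: P(X >= observed)
        ks = range(observed, total_snvs + 1)
    else:                     # lower tail: P(X <= observed)
        ks = range(0, observed + 1)
    p_value = sum(exp(binom_log_pmf(k, total_snvs, region_fraction)) for k in ks)
    return ratio, min(p_value, 1.0)


# Hypothetical example: 30 of 100 SNVs fall in a region type covering 20% of the genome.
ratio_up, p_up = enrichment(30, 100, 0.2)
ratio_down, p_down = enrichment(10, 100, 0.2)
```

Working in log space with `lgamma` keeps the probabilities finite for SNV counts in the thousands, where direct factorials would overflow a float.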
Analysis of Replication-Class (R Class) Asymmetry.
Left- and right-replicating regions were calculated from replication timing measurements as described previously (Haradhvala et al. 2016). Regions with 0.1<=slope<=0.3 units/interval were designated right-replicating, and regions with −0.3<=slope<=−0.1 were designated left-replicating. In order to determine the reference strand asymmetries (a control for other types of asymmetry), each of the twelve possible substitutions with respect to the genomic reference strand (six base pair changes × two orientations) was counted and normalized by the number of corresponding bases in the genome to measure mutations/Mb. Rates of complementary mutations (e.g. C>A and G>T) were then compared. To measure replicative strand asymmetries, all SNVs were counted with respect to the leading strand template as described (Haradhvala et al. 2016) using the left- and right-replicating regions defined above. For example, C>A mutations in the leading strand reference are considered to be genomic reference strand C>A mutations in left-replicating regions and genomic reference strand G>T mutations in right-replicating regions. Mutation rates were again calculated by normalizing for the number of corresponding bases in the genome within the intervals with defined replication direction, and the complementary mutations were compared. Error bars for mutation rates represent a 95% confidence interval for the underlying binomial probability of a given base being mutated, calculated from the beta distribution parameterized as Beta(n+1, N−n+1), where n is the number of a given mutation and N is the size of the genomic territory of the mutated base. P-values represent the binomial probability of seeing a given distribution of complementary mutations, assuming the probability of a given mutation is determined solely by the base composition within an interval (e.g. the probability of seeing a C>A instead of a G>T is the proportion of C:G base pairs with a C on the reference strand; this will be very near a value of 0.5). Strand asymmetries were calculated as log2 of the ratio of complementary mutation counts. Error bars represent a 95% confidence interval on the log2 quotient of the underlying binomial probabilities above. These confidence intervals were determined empirically by taking 1000 pairs of samples from the beta distribution above for both complementary mutation sets, taking the log2 quotient of each sampled pair, and determining the 0.025 and 0.975 quantiles of the resulting distribution.
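The empirical interval on the log2 asymmetry can be sketched with the standard library's beta sampler. The counts and territory sizes below are hypothetical, as are the function name and the simple index-based quantile rule.

```python
import random
from math import log2


def asymmetry_ci(n1, N1, n2, N2, draws=1000, seed=0):
    """log2 ratio of complementary mutation counts, with an empirical 95%
    interval on the log2 quotient of the underlying mutation probabilities.
    Each probability is sampled from Beta(n + 1, N - n + 1)."""
    rng = random.Random(seed)
    ratios = sorted(
        log2(rng.betavariate(n1 + 1, N1 - n1 + 1)
             / rng.betavariate(n2 + 1, N2 - n2 + 1))
        for _ in range(draws)
    )
    lower = ratios[int(0.025 * (draws - 1))]
    upper = ratios[int(0.975 * (draws - 1))]
    return log2(n1 / n2), lower, upper


# Hypothetical example: 200 C>A versus 100 G>T mutations over equal territories.
point, lower, upper = asymmetry_ci(200, 1_000_000, 100, 1_000_000)
```

When the two territories differ in size, the point estimate from raw counts and the interval from per-base probabilities are no longer directly comparable, so equal or near-equal territories are assumed here.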
Calculation of Branch Variant SNV Sensitivity and Specificity in Lineage Sequencing.
False positive and false negative SNV call rates were estimated by applying the “raw variants->lineage->called variants” approach to generate calls and analyzing the results in the context of the microscopy-derived lineage structure. The estimated consensus lineage structure agreed with the lineage structure determined by time lapse imaging and was assumed to be the true lineage structure for this analysis. In the following, coincident SNVs were considered to be true positives when in agreement with the consensus lineage structure, and false positives when conflicting with the consensus lineage structure.
Next, Applicants prepared a scrambled version of the dataset where sub clone labels were randomly scrambled for each SNV call, and false positive SNV call rates were calculated by summing the count of coincident SNVs that agreed with the consensus lineage in the scrambled dataset. The data were scrambled by building a list of 10,000 alternative scrambling patterns. Each pattern was checked to verify that each real sub clone set representing the known true lineage structure was indeed disrupted by the scrambling operation and would not create bias. For each SNV Applicants randomly picked a scrambling pattern from the list and reassigned SNVs to the new sub clone identities. The scrambled data set was reanalyzed and the false positive coincident SNV rate found to be 5 SNVs for the HT115 lineage experiment and 24 SNVs for the RPE1 lineage experiment where the quality threshold for accepting coincident SNVs was set to >=0.99 (Figure S8).
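The scrambling control can be sketched as below. The example assumes that a pattern "disrupts" the lineage when no true sub clone group is mapped onto a set that is itself a true group; that criterion, the toy lineage, and the function names are assumptions made for illustration.

```python
import random


def make_scrambling_pattern(sub_clones, true_groups, rng, max_tries=10_000):
    """Return a relabelling {old: new} that disrupts every true lineage
    group: no true group is mapped onto a set that is itself a true group."""
    true_sets = {frozenset(g) for g in true_groups}
    for _ in range(max_tries):
        shuffled = sub_clones[:]
        rng.shuffle(shuffled)
        pattern = dict(zip(sub_clones, shuffled))
        if all(frozenset(pattern[s] for s in g) not in true_sets for g in true_sets):
            return pattern
    raise RuntimeError("no disrupting pattern found")


def scramble_snv(carriers, patterns, rng):
    """Reassign one SNV's carrier sub clones using a randomly chosen pattern."""
    pattern = rng.choice(patterns)
    return frozenset(pattern[s] for s in carriers)


# Toy lineage with four sub clones and two true groups (hypothetical).
rng = random.Random(0)
sub_clones = [1, 2, 3, 4]
true_groups = [{1, 2}, {3, 4}]
patterns = [make_scrambling_pattern(sub_clones, true_groups, rng) for _ in range(100)]
scrambled = scramble_snv({1, 2}, patterns, rng)
```

Any coincident SNV that still agrees with the consensus lineage after this relabelling can only do so by chance, which is what makes the scrambled count usable as a false positive estimate.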
False negative SNV counts were estimated by counting the number of true SNVs that were filtered out as a result of their failure to surpass the required quality threshold for accepting coincident SNVs. Applicants recognize that these “missing” SNVs could likely be imputed to improve the sensitivity of the branch variant SNV call sets, but Applicants have not performed such an imputation in this study and seek here to report the “raw” false negative rate. Applicants start with the true positive branch variant SNVs, then look in sub clones where the consensus lineage structure would have predicted the same SNV to exist but none was called due to the “missing” SNV falling short of the quality threshold applied (0.99). From the counts of these “missing” SNVs, Applicants subtract the estimated false positive counts separately estimated at this quality threshold to arrive at the estimated false negative SNV count.
FN=[# branch variants<threshold]−[# scrambled branch variants<threshold]
With True Negative (TN), False Positive (FP), True Positive (TP), and False Negative (FN) counts, Applicants could estimate the specificity and sensitivity of lineage sequencing for the HT115 and RPE1 lineage sequencing datasets with the coincident SNV quality threshold at 0.99:
Specificity=TN/(TN+FP)=0.999(HT115 and RPE1)
Sensitivity=TP/(TP+FN)=0.964(HT115) and 0.918(RPE1).
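The arithmetic behind these estimates is shown in the short sketch below. All counts are hypothetical placeholders chosen only to exercise the formulas; they are not the study's values.

```python
def false_negatives(missing_real, missing_scrambled):
    """FN estimate: branch variants below threshold in the real data minus
    the corresponding count in the scrambled data."""
    return missing_real - missing_scrambled


def specificity(tn, fp):
    return tn / (tn + fp)


def sensitivity(tp, fn):
    return tp / (tp + fn)


# Hypothetical counts for illustration only.
fn = false_negatives(missing_real=110, missing_scrambled=10)
sens = sensitivity(tp=900, fn=fn)
spec = specificity(tn=9990, fp=10)
```

Repeating the same calculation over a range of quality thresholds gives sensitivity and specificity as functions of the threshold.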
Applicants then estimated the sensitivity and specificity of SNV calls using the above estimation approach as a function of the coincident SNV quality threshold value to produce the plots in
Generation of Lineage Dendrograms from a Sequence Distance Metric.
Dendrograms representing cell lineage relations between samples were generated by counting the number of high-quality coincident SNVs shared between every pair of samples, normalized by the total number of high-quality coincident SNVs. Subtracting this matrix of similarity scores from one yields a matrix of distance scores between every pair of samples. This pairwise distance matrix was used to render dendrograms using the MATLAB seqlinkage tool.
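The distance metric can be sketched as follows, with Python standing in for the MATLAB workflow. The example assumes each high-quality coincident SNV is represented by the set of samples carrying it and sets the diagonal to zero by convention; the data and the function name are hypothetical, and the dendrogram rendering itself is not shown.

```python
from itertools import combinations


def distance_matrix(snv_carriers, samples):
    """Pairwise distance = 1 - (shared coincident SNVs / total coincident SNVs).
    snv_carriers is a list of sets, one per high-quality coincident SNV."""
    total = len(snv_carriers)
    dist = {s: {t: 0.0 if s == t else 1.0 for t in samples} for s in samples}
    for a, b in combinations(samples, 2):
        shared = sum(1 for carriers in snv_carriers if a in carriers and b in carriers)
        dist[a][b] = dist[b][a] = 1.0 - shared / total
    return dist


# Hypothetical carrier sets for four coincident SNVs across four samples.
carriers = [{1, 2}, {1, 2}, {3, 4}, {1, 2, 3}]
dist = distance_matrix(carriers, samples=[1, 2, 3, 4])
```

Samples that share many coincident SNVs end up close together, so a linkage algorithm applied to this matrix groups sister sub clones first.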
Statistical Analysis of Mutation Rate and Quantile-Quantile Plotting.
The counts of branch variants (
Let y_i be the number of mutations per branch, with y_i^d | λ ~ Poisson(λ × number of generations).
Applicants assume a non-informative prior distribution λ ~ Gamma(α, β) with α=0, β=0. The posterior distribution is then:
λ | y ~ Gamma(α + n·ȳ^d, β + n),
where ȳ^d is the mean of the y_i^d values over the n branches.
Applicants then simulated the model (using custom R code) 10^6 times and determined the number of mutations per generation as well as the 0.025 and 0.975 quantiles of the resulting distribution. These counts were normalized by dividing by the total size of the genome, taking into account the regional copy number variation of each line (for HT115, a total of 5.7×10^9 bp, and for RPE1, a total of 9.15×10^9 bp), to obtain the mutation rate in units of SNV/bp/cell division.
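A Gamma-Poisson simulation of this kind can be sketched as below. Python is used here in place of the custom R code. The sketch uses the standard conjugate update with per-branch exposure, Gamma(α + Σ counts, β + Σ generations), which matches the posterior given above when every branch spans one generation; the branch counts, the reduced number of draws, and the function name are assumptions made for the example.

```python
import random


def mutation_rate_posterior(counts, generations, genome_bp,
                            alpha=0.0, beta=0.0, draws=10_000, seed=0):
    """Sample the posterior of the per-generation mutation count and return
    (mean, 0.025 quantile, 0.975 quantile), each divided by genome size."""
    rng = random.Random(seed)
    shape = alpha + sum(counts)
    rate = beta + sum(generations)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    mean = sum(samples) / draws
    lower = samples[int(0.025 * (draws - 1))]
    upper = samples[int(0.975 * (draws - 1))]
    return tuple(x / genome_bp for x in (mean, lower, upper))


# Hypothetical branch counts, one generation per branch, no normalization.
per_gen = mutation_rate_posterior([3, 5, 4], [1, 1, 1], genome_bp=1.0)
# Same counts normalized by the HT115 genome size stated in the text.
per_bp = mutation_rate_posterior([3, 5, 4], [1, 1, 1], genome_bp=5.7e9)
```

With α=0 and β=0 the posterior mean is simply total mutations divided by total generations, so the prior contributes nothing beyond the data.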
The P-value calculations for branch variant counts in each lineage segment were done by simulating the Poisson model 10^6 times with dependence on the segment length. The quantile-quantile plot was produced by plotting the log10 of the sorted observed P-values against the log10 of the expected P-values.
Haploid Mutation Count Validation.
The full list of branch and leaf variants in the HT115 lineage experiment was reduced to those SNVs that occur in haploid regions by retaining SNVs for which both the reference allele fraction was greater than 0.9 and the alternative allele fraction was also greater than 0.9. Leaf variant SNVs in this set are likely to be true SNVs that occurred in the last generation of the lineage experiment. The total length of haploid regions was calculated in consideration of measured copy number variation and totaled 3.05×10^8 bp. Applicants estimated the mutation rate in these regions in conjunction with the branch and leaf variant counts. Applicants then simulated the Poisson counting statistics 10^6 times as described before and determined the 0.025 and 0.975 quantiles of the resulting distribution.
Detection of Different SNVs that Occurred at Identical Genomic Positions.
In order to identify SNVs that occurred at identical genomic positions, Applicants modified the “raw variants->lineage->called variants” approach by allowing initial clustering into three groups (‘reference’, ‘alternative 1’ or ‘alternative 2’). Initial coincident SNVs were then determined by hierarchical clustering into three groups for each SNV (SciPy hierarchy fcluster). The center of every cluster was calculated, and the quality of coincident SNV classification was calculated as above, from the probability of each sample to belong to its consensus-assigned group:
The final quality of a multiple-event coincident mutation was determined by taking the minimum of the per-sample probabilities.
The probability of multiple variants coinciding, assuming uniform probability along the genome, was estimated by simulating 10^6 times the chance of obtaining 7 cases of coinciding variants from 3000 events for HT115 (or 4 cases from 700 events for RPE1) in 3×10^9 sites. This analysis yields an estimate of the upper limit of the p-value because the procedure for calling multiple mutation events further requires that the events co-occur within a sub-lineage.
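A simulation of this kind can be sketched as below. The sketch assumes that a "case" is an event landing on a site already hit by another event and uses far fewer trials than the 10^6 stated above to keep the run short; the function name and defaults are hypothetical.

```python
import random


def coincidence_p_value(n_events, n_sites, min_coincident, trials=1000, seed=0):
    """Fraction of simulated experiments in which at least `min_coincident`
    events land on a site already hit by another event, when events fall
    uniformly at random over n_sites positions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = [rng.randrange(n_sites) for _ in range(n_events)]
        coincident = n_events - len(set(positions))
        if coincident >= min_coincident:
            hits += 1
    return hits / trials


# Event and site counts from the text for HT115, with a reduced trial count.
p_ht115 = coincidence_p_value(3000, 3 * 10**9, 7, trials=20)
```

With 3000 events spread over 3×10^9 sites, the expected number of chance coincidences is roughly 3000²/(2 × 3×10^9) ≈ 0.0015, so a simulation with few trials is expected to return zero.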
The invention is further described by the following numbered paragraphs:
1. A method of detecting the mutation rate across a single cell lineage comprising:
2. The method according to paragraph 1, wherein the cell is expanded into 2 to 10 generations.
3. The method according to paragraph 1 or 2, wherein true somatic mutations are determined by tracing the mutations across the cell lineage for each generation.
4. The method according to any of paragraphs 1 to 3, wherein each mutation is observed in at least one pair of daughter cells.
5. The method according to any of paragraphs 1 to 4, wherein the method further comprises detecting a mutation signature.
6. The method according to any of paragraphs 1 to 5, wherein the cell(s) are exposed to the perturbation during the step of expanding.
7. The method according to any of paragraphs 1 to 6, wherein the perturbation is an environmental condition, a drug, or an agent capable of modulating expression of a gene.
8. The method according to paragraph 7, wherein the environmental condition is physical, chemical, or biological.
9. The method according to paragraph 7, wherein the agent capable of modulating expression of a gene is a CRISPR system, RNAi, TALE, or zinc finger protein.
10. The method according to paragraph 9, wherein the agent is inducible.
11. The method according to any of paragraphs 1 to 10, wherein at least steps (b) and (c) are performed in an automated device.
12. The method according to paragraph 11, wherein the device is operably linked to a computing system.
13. The method according to any of paragraphs 1 to 12, wherein the single cell is loaded into a microfluidic device configured for segregation of single cells across the cell lineage.
14. The method according to any of paragraphs 1 to 12, wherein the single cells are isolated with an optical tweezer.
15. The method according to any of paragraphs 1 to 12, wherein the single cells are isolated by live single cell microscopy.
16. The method according to any of paragraphs 1 to 12, wherein the single cells are segregated into separate wells.
17. The method according to any of paragraphs 1 to 16, wherein the expanding and isolating is visually recorded, whereby single cells across a lineage are tracked.
18. The method according to any of paragraphs 1 to 17, wherein sequencing is whole genome or whole exome sequencing.
19. The method according to any of paragraphs 1 to 18, wherein the single cell is obtained from a subject in need thereof.
20. The method according to paragraph 19, wherein the single cell is a stem cell or iPS.
21. A method of detecting mutations in bacteria during single replication events comprising:
22. The method according to paragraph 21, wherein the cell is expanded into 2 to 10 generations.
23. The method according to paragraph 21 or 22, wherein true somatic mutations are determined by tracing the mutations across the cell lineage for each generation.
24. The method according to any of paragraphs 21 to 23, wherein each mutation is observed in at least one pair of daughter cells.
25. The method according to any of paragraphs 21 to 24, wherein the single cell is exposed to a perturbation during the step of expanding into 2 or more generations.
26. The method according to paragraph 25, wherein the perturbation is an environmental condition, a drug, or an agent capable of modulating expression of a gene.
27. The method according to paragraph 26, wherein the environmental condition is physical, chemical, or biological.
28. The method according to paragraph 26, wherein the drug comprises an antibiotic, whereby mutations in response to the antibiotic are detected in single replication events.
29. The method according to any of paragraphs 21 to 28, wherein the method further comprises detecting a mutation signature.
30. The method according to any of paragraphs 21 to 29, wherein the single bacterial cell is obtained by diluting a sample of bacteria.
31. The method according to any of paragraphs 21 to 29, wherein the single bacterial cell is obtained by sorting a sample of bacteria.
32. The method according to any of paragraphs 21 to 29, wherein the single bacterial cell is obtained by separation with an optical tweezer and live single cell microscopy.
33. The method according to any of paragraphs 21 to 32, wherein the single bacterial cell is loaded into a microfluidic device configured for segregation and recovery of single cells across the cell lineage.
34. The method according to any of paragraphs 21 to 33, wherein single bacterial cells across a lineage are segregated on a chip and expanded.
35. The method according to any of paragraphs 21 to 33, wherein the single cells across a lineage are segregated into separate wells and expanded.
36. The method according to any of paragraphs 21 to 35, further comprising determining the growth rate of the isolated single cells across a lineage.
37. The method according to paragraph 36, wherein the growth rate is determined in the presence of an antibiotic.
38. The method according to any of paragraphs 21 to 37, wherein DNA sequencing comprises loading the bacterial cells from a cell lineage into a microfluidic device capable of segregating each colony and generating a sequencing library for each colony.
39. The method according to any of paragraphs 21 to 38, wherein the single bacterial cell is obtained from a subject in need thereof.
40. The method according to any of paragraphs 21 to 38, wherein the single bacterial cell is obtained from an environmental sample.
Various modifications and variations of the described methods, devices, compositions, and kits of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific embodiments, it will be understood that it is capable of further modifications and that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the art are intended to be within the scope of the invention. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth.
This application claims the benefit of U.S. Provisional Application Nos. 62/504,484, filed May 10, 2017 and 62/516,028, filed Jun. 6, 2017. The entire contents of the above-identified applications are hereby fully incorporated herein by reference.
This invention was made with government support under Grant Nos. ES002109 and AI110787 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/032157 | 5/10/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62516028 | Jun 2017 | US | |
62504484 | May 2017 | US |