Human cancer cell lines have been a driving force in cancer research and a useful tool for discovering oncogenic mechanisms and new therapeutic targets. However, large-scale characterization of cell lines has been limited to rudimentary metrics, such as viability in cell culture, because more complex phenotypes, e.g., in vivo behaviors, have not been tractable at scale. Most studies of metastasis rely on only a small number of experimental models, which makes it difficult to extrapolate findings to genetically diverse human tumors. While there are hundreds of human cancer cell lines, the prospect of in vivo testing of each cell line, one-by-one, is unattractive not only because of its labor intensity, but also because of the difficulty in sufficiently controlling for variability between animal experiments. Thus, there is an urgent and demonstrated need for improved methods for characterizing the metastatic potential of cancer cell lines in vivo.
As described below, the present invention features methods and compositions for characterizing the metastatic potential of cancer cell lines, as well as an interactive metastasis map featuring information that defines such cancer cell lines (e.g., their propensity to metastasize, organs where metastasis is typically observed, sequence data, genomic data, transcriptomic data, proteomic data, metabolomic data, drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, and annotated data relating to the cell of origin).
In one aspect, the present invention provides a method of characterizing the metastatic potential of a mixture of cancer cells in vivo, the method including systemically delivering to a non-human subject the plurality of cancer cells, where each cell contains a vector encoding as a single transcript a barcode, a detectable marker suitable for in vivo imaging, and a detectable marker suitable for cell selection and/or sorting. This method also includes imaging the cells and their descendants subsequent to delivery to locate where in the body the cell and/or its descendants are present, thereby characterizing the metastatic potential.
In another aspect, the invention provides a method of characterizing the metastatic potential of a mixture of cancer cells in vivo, the method including systemically delivering to a non-human subject the plurality of cancer cells, each cell comprising a vector encoding a barcode; and subsequent to delivery detecting the barcode in a cell, tissue, or organ to determine where in the body the cell and/or its descendants are present, thereby characterizing the metastatic potential.
In another aspect, the invention provides a method of generating a metastasis map, the method including systemically delivering to a non-human subject a plurality of cells, each cell containing a vector encoding as a single transcript, a barcode, a detectable marker suitable for in vivo imaging, and a detectable marker suitable for cell selection and/or sorting, detecting the cells and their descendants subsequent to delivery to identify where in the body the cell and/or its descendants are present, compiling the detection data in a database, and associating the data with the cell's identity, thereby generating a metastasis map.
In yet another aspect, the invention provides a method for generating a metastasis map, the method including systemically delivering to a non-human subject a plurality of cells, each cell comprising a vector encoding a barcode, detecting and quantitating expression of the barcode, compiling the expression data in a database, and associating the expression data with the cell's identity, thereby generating a metastasis map.
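The compiling and associating steps can be illustrated with a minimal sketch. It assumes that sequencing reads from each organ have already been reduced to barcode strings and that a lookup table from barcode to cell line is available. The barcode sequences, cell line names, and the function name `compile_barcode_counts` are hypothetical and are not part of the disclosure.

```python
from collections import Counter, defaultdict

# Hypothetical lookup from barcode sequence to cell line identity.
BARCODE_TO_CELL_LINE = {
    "ACGTACGT": "cell_line_A",
    "TTGCAAGC": "cell_line_B",
}


def compile_barcode_counts(reads_by_organ, barcode_to_cell_line):
    """Count barcode reads per organ and associate each count with a cell line.

    reads_by_organ: dict mapping organ name -> list of barcode strings.
    Returns a dict mapping organ -> {cell line -> read count}.
    Barcodes that are absent from the lookup table are ignored.
    """
    compiled = defaultdict(dict)
    for organ, reads in reads_by_organ.items():
        for barcode, n in Counter(reads).items():
            cell_line = barcode_to_cell_line.get(barcode)
            if cell_line is not None:
                compiled[organ][cell_line] = compiled[organ].get(cell_line, 0) + n
    return dict(compiled)


# Illustrative input: one unrecognized barcode ("NNNNNNNN") is dropped.
reads = {
    "brain": ["ACGTACGT", "ACGTACGT", "TTGCAAGC", "NNNNNNNN"],
    "lung": ["TTGCAAGC"],
}
metmap_counts = compile_barcode_counts(reads, BARCODE_TO_CELL_LINE)
print(metmap_counts)
```

In this sketch the nested dictionary stands in for the database; any store that keeps the organ, the cell line identity, and the count together would serve the same purpose.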
In some embodiments of these inventions, the methods also include allowing the plurality of cells to proliferate in the subject for a period of time (e.g., days, weeks, or months). In some embodiments, the methods also include isolating the cells from the subject and characterizing the identity of the cells and their abundance. In some embodiments, the method also includes sorting the isolated cells. In embodiments of the above aspects or any other aspect of the invention, the identity and quantity of the cells or the sorted cells is assessed by next-generation sequencing or quantitative PCR. In some embodiments, the methods include carrying out single cell RNA sequencing on each cell, thereby generating a transcriptome for each cell. In some embodiments, the cells are isolated from brain, lung, liver, bone, and/or another organ or tissue. In one embodiment of the methods presented above, the plurality of cells is derived from two or more distinct cell lines. In some embodiments, the plurality of cells is derived from at least about 50, 100, 200, 300, 400, 500 or more cell lines. In some embodiments of the methods wherein the cell has a vector encoding a marker suitable for imaging, the marker is a bioluminescent marker. In some embodiments, the imaging is used to monitor metastatic growth of the cells in vivo. In some embodiments, the expression levels of the barcode, the detectable marker suitable for in vivo imaging, and the detectable marker suitable for cell selection and/or sorting are correlated. In some embodiments, the abundance of the barcodes reflects the metastatic potentials of different cells. In some embodiments, barcode-enriched cells are characterized as highly metastatic, barcode-present cells are characterized as weakly metastatic, and barcode-depleted cells are characterized as non-metastatic. In some embodiments, the methods also include harvesting tissue of the non-human subject.
In some embodiments, the methods also include preparing a lysate from the tissue, and in some embodiments, the methods also include isolating the cells from the lysate and characterizing the identity and quantity of the cells. In some embodiments of the above aspects, the cells are isolated from the subject, characterized as to their identity and abundance, and the data included in the metastasis map. In some embodiments, a genomic, transcriptomic or proteomic profile of the cell is included in the metastasis map. In some embodiments, the identity of the cells or the sorted cells and their quantity is assessed by next-generation sequencing or quantitative PCR, and the data included in the metastasis map. In some embodiments, the data is used to generate a metastasis map that includes a visual representation of the anatomical position of the cells and their proliferation over time. In some embodiments, drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, annotated cell line data, a metabolite profile, a genomic profile, a transcriptomic profile, or a proteomic profile of the cell is included as an interactive feature within the visual representation.
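The characterization of barcode-enriched, barcode-present, and barcode-depleted cells described above can be sketched as follows. This is a minimal illustration only: it assumes that enrichment is judged by comparing each cell line's share of barcode reads in a harvested organ with its share in the pre-delivery pool, and the fold-change cutoffs (`enriched_fold`, `depleted_fold`) are hypothetical values chosen for the example, not thresholds taken from the disclosure.

```python
def classify_metastatic_potential(input_counts, organ_counts,
                                  enriched_fold=2.0, depleted_fold=0.1):
    """Label each cell line by the change in its barcode share.

    input_counts: dict of cell line -> barcode reads in the pre-delivery pool.
    organ_counts: dict of cell line -> barcode reads in the harvested organ.
    enriched_fold, depleted_fold: hypothetical cutoffs on the fold change.
    """
    input_total = sum(input_counts.values())
    organ_total = sum(organ_counts.values())
    if input_total <= 0 or organ_total <= 0:
        raise ValueError("both count tables must contain at least one read")

    labels = {}
    for cell_line, n_input in input_counts.items():
        if n_input <= 0:
            continue  # no fold change can be computed without input reads
        input_fraction = n_input / input_total
        organ_fraction = organ_counts.get(cell_line, 0) / organ_total
        fold_change = organ_fraction / input_fraction
        if fold_change >= enriched_fold:
            labels[cell_line] = "highly metastatic"   # barcode-enriched
        elif fold_change <= depleted_fold:
            labels[cell_line] = "non-metastatic"      # barcode-depleted
        else:
            labels[cell_line] = "weakly metastatic"   # barcode-present
    return labels


# Illustrative counts for three hypothetical cell lines.
pool = {"A": 100, "B": 100, "C": 100}
brain = {"A": 900, "B": 95, "C": 5}
labels = classify_metastatic_potential(pool, brain)
print(labels)
```

Normalizing to fractions before comparing keeps the labels independent of sequencing depth, which is one reasonable design choice among several.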
In another aspect, the invention provides a vector containing a single transcription cassette containing a detectable marker suitable for cell selection and/or sorting, a marker suitable for imaging a cell in vivo, and a barcode. In some embodiments, the vector is a viral vector, and in some instances the viral vector is a lentiviral vector. In some embodiments, the expression levels of the markers and the barcode are correlated. In some embodiments, the marker suitable for cell selection and/or sorting is GFP or mCherry. In some embodiments, the marker suitable for imaging is luciferase.
In yet another aspect, the invention provides a method for identifying the molecular features characteristic of a metastatic cell, wherein the method includes using the metastasis map generated using any of the methods disclosed herein to identify organ-specific patterns of metastasis. In some embodiments, the method also includes utilizing the organ-specific patterns of metastasis to identify molecular features that distinguish brain-metastatic from non-metastatic cell lines. In some embodiments, the method also includes using genomic data from each cell to identify a mutation associated with brain metastasis.
In yet another aspect, the invention provides a computer implemented method of generating a metastasis map quantifying metastatic potential, the method involving receiving, by a processor, a listing of vectors encoded as barcodes, the vectors being associated with a plurality of cells systemically delivered to a non-human subject; receiving, from an imaging device, images of the plurality of cells and their descendants within the non-human subject; storing, by the processor, the images of the plurality of cells and their descendants in a database and identifying, by the processor, locations of the plurality of cells and their descendants from the images using the barcodes; and generating, by the processor, the metastasis map based on the locations of the plurality of cells and their descendants. In some embodiments, the method also includes comparing the location of the plurality of cells and their descendants from an image at a first point in time to the location of the plurality of cells and their descendants from an image at a second point in time. In some embodiments, the method also includes isolating cells at a particular location for presentation within the metastasis map. In some embodiments, the method also includes identifying cell types for the plurality of cells and their descendants from the images, and in some embodiments, the method also includes isolating cell types for presentation within the metastasis map.
In other embodiments of the above aspects or any other aspect of the invention, the methods involve generating a visual representation of an anatomical position of the plurality of cells and their proliferation over time within the metastasis map. In some embodiments, the method also involves generating a genomic, transcriptomic or proteomic profile for the plurality of cells as an interactive feature within the metastasis map. In some embodiments, the method further includes analyzing the plurality of cells and their descendants to characterize at least one of their identity, quantity, and abundance for visualization within the metastasis map. In some embodiments, comparing the location of the plurality of cells and their descendants at the first point in time and the second point in time is used to monitor metastatic growth of the cells over time in vivo. In some embodiments, the metastasis map is generated as a heat map for particular locations within the non-human subject. In some embodiments, the metastasis map is generated as at least one of a heat map, a pie chart, a bar graph, a PCA plot, and a radar plot. In yet another embodiment, the metastasis map can be generated showing quantities of each cell type from the plurality of cells at a particular location.
In another aspect, the invention provides a system for generating a metastasis map quantifying metastatic potential, the system containing a CPU, a computer readable memory and a computer readable storage medium, program instructions to receive a listing of vectors encoded as barcodes, the vectors being associated with a plurality of cells systemically delivered to a non-human subject; program instructions to receive images of the plurality of cells and their descendants within the non-human subject from an imaging device; program instructions to store the images of the plurality of cells and their descendants in a database and program instructions to identify locations of the plurality of cells and their descendants from the images using the barcodes; program instructions to generate the metastasis map based on the locations of the plurality of cells and their descendants.
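The step of generating the metastasis map from identified locations can be sketched in code. The example below is an illustration under stated assumptions: it presumes that locations have already been identified and reduced to per-organ detection records, and it builds the numeric matrix that would underlie a heat map for one time point. The `Detection` record, its field names, and `build_heatmap` are hypothetical and are not defined in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One hypothetical detection record derived from imaging and barcodes."""
    cell_line: str
    organ: str
    time_point: int   # e.g., days after systemic delivery
    signal: float     # e.g., a quantity or abundance measure


def build_heatmap(detections, time_point):
    """Return (cell_lines, organs, matrix) for a single time point.

    matrix[i][j] is the summed signal of cell_lines[i] in organs[j].
    """
    selected = [d for d in detections if d.time_point == time_point]
    cell_lines = sorted({d.cell_line for d in selected})
    organs = sorted({d.organ for d in selected})
    matrix = [[0.0] * len(organs) for _ in cell_lines]
    for d in selected:
        row = cell_lines.index(d.cell_line)
        col = organs.index(d.organ)
        matrix[row][col] += d.signal
    return cell_lines, organs, matrix


# Illustrative records at two time points.
detections = [
    Detection("A", "brain", 7, 1.0),
    Detection("A", "brain", 28, 8.0),
    Detection("A", "lung", 28, 0.5),
    Detection("B", "lung", 28, 3.0),
]
rows, cols, matrix = build_heatmap(detections, time_point=28)
print(rows, cols, matrix)
```

Calling `build_heatmap` once per time point and comparing the resulting matrices is one simple way to represent the first-versus-second time point comparison described above.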
The invention provides methods and compositions for determining the metastatic potential of cancer cell lines in an efficient and large-scale manner. Compositions and articles defined by the invention were isolated or otherwise manufactured in connection with the examples provided below. Other features and advantages of the invention will be apparent from the detailed description, and from the claims.
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them below, unless specified otherwise.
By “alteration” is meant a change (increase or decrease) in the expression levels or activity of a gene or polypeptide as detected by standard art known methods such as those described herein. As used herein, an alteration includes a 10% change in expression levels, preferably a 25% change, more preferably a 40% change, and most preferably a 50% or greater change in expression levels.
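As a worked illustration of the thresholds in this definition, the sketch below computes the percent change between a reference and an observed expression level and reports the largest of the recited thresholds (10%, 25%, 40%, 50%) that the change meets. The function name and the choice to report the largest threshold met are assumptions made for the example.

```python
ALTERATION_THRESHOLDS = (50.0, 40.0, 25.0, 10.0)  # percent, from the definition


def alteration_threshold_met(reference_level, observed_level):
    """Return the largest recited threshold met by the percent change, or None.

    The change may be an increase or a decrease, so the absolute difference
    relative to the reference level is used.
    """
    if reference_level <= 0:
        raise ValueError("reference_level must be positive")
    percent_change = 100.0 * abs(observed_level - reference_level) / reference_level
    for threshold in ALTERATION_THRESHOLDS:
        if percent_change >= threshold:
            return threshold
    return None


print(alteration_threshold_met(100.0, 130.0))  # 30% increase
print(alteration_threshold_met(200.0, 80.0))   # 60% decrease
print(alteration_threshold_met(100.0, 105.0))  # 5% increase
```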
In this disclosure, “comprises,” “comprising,” “containing” and “having” and the like can have the meaning ascribed to them in U.S. Patent law and can mean “includes,” “including,” and the like; “consisting essentially of” or “consists essentially” likewise has the meaning ascribed in U.S. Patent law and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited are not changed by the presence of more than that which is recited, but excludes prior art embodiments.
“Detect” refers to identifying the presence, absence or amount of the analyte to be detected.
By “detectable label” is meant a composition that when linked to a molecule of interest renders the latter detectable, via spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include radioactive isotopes, magnetic beads, metallic beads, colloidal particles, fluorescent dyes, electron-dense reagents, enzymes (for example, as commonly used in an ELISA), biotin, digoxigenin, or haptens.
By “disease” is meant any condition or disorder that damages or interferes with the normal function of a cell, tissue, or organ. Examples of diseases include cancer (e.g., metastatic cancer). Examples of cancers include, without limitation, leukemias (e.g., acute leukemia, acute lymphocytic leukemia, acute myelocytic leukemia, acute myeloblastic leukemia, acute promyelocytic leukemia, acute myelomonocytic leukemia, acute monocytic leukemia, acute erythroleukemia, chronic leukemia, chronic myelocytic leukemia, chronic lymphocytic leukemia), polycythemia vera, lymphoma (Hodgkin's disease, non-Hodgkin's disease), Waldenstrom's macroglobulinemia, heavy chain disease, and solid tumors such as sarcomas and carcinomas (e.g., fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, colon carcinoma, pancreatic cancer, breast cancer, ovarian cancer, prostate cancer, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, cystadenocarcinoma, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, seminoma, embryonal carcinoma, Wilms' tumor, cervical cancer, uterine cancer, testicular cancer, lung carcinoma, small cell lung carcinoma, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, schwannoma, meningioma, melanoma, neuroblastoma, and retinoblastoma).
The invention provides a number of targets that are useful for the development of highly specific drugs to treat a disorder characterized by the methods delineated herein. In addition, the methods of the invention provide a facile means to identify therapies that are safe for use in subjects. In addition, the methods of the invention provide a route for analyzing virtually any number of compounds for effects on a disease described herein with high-volume throughput, high sensitivity, and low complexity.
By “fragment” is meant a portion of a polypeptide or nucleic acid molecule. This portion contains, preferably, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the entire length of the reference nucleic acid molecule or polypeptide. A fragment may contain 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides or amino acids.
By “genomic profile” is meant a collection of information relating to single nucleotide alterations and copy number alterations. A genomic profile may include all or a portion of the genomic sequence of one or more cells. A genomic profile may include deviations from a reference genomic sequence. For example, a genomic profile of a cancer cell may include single nucleotide variants or other mutations that are not present in a normal, non-cancerous cell.
By “harvesting” is meant collecting a biological sample from a subject. In some instances, harvesting includes excision of an organ. In other instances, harvesting includes a biopsy.
“Hybridization” means hydrogen bonding, which may be Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary nucleobases. For example, adenine and thymine are complementary nucleobases that pair through the formation of hydrogen bonds.
The terms “isolated,” “purified,” or “biologically pure” refer to material that is free to varying degrees from components which normally accompany it as found in its native state. “Isolate” denotes a degree of separation from original source or surroundings. “Purify” denotes a degree of separation that is higher than isolation. A “purified” or “biologically pure” protein is sufficiently free of other materials such that any impurities do not materially affect the biological properties of the protein or cause other adverse consequences. That is, a nucleic acid or peptide of this invention is purified if it is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. Purity and homogeneity are typically determined using analytical chemistry techniques, for example, polyacrylamide gel electrophoresis or high-performance liquid chromatography. The term “purified” can denote that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. For a protein that can be subjected to modifications, for example, phosphorylation or glycosylation, different modifications may give rise to different isolated proteins, which can be separately purified.
By “isolated polynucleotide” is meant a nucleic acid (e.g., a DNA) that is free of the genes which, in the naturally-occurring genome of the organism from which the nucleic acid molecule of the invention is derived, flank the gene. The term therefore includes, for example, a recombinant DNA that is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote; or that exists as a separate molecule (for example, a cDNA or a genomic or cDNA fragment produced by PCR or restriction endonuclease digestion) independent of other sequences. In addition, the term includes an RNA molecule that is transcribed from a DNA molecule, as well as a recombinant DNA that is part of a hybrid gene encoding additional polypeptide sequence.
By an “isolated polypeptide” is meant a polypeptide of the invention that has been separated from components that naturally accompany it. Typically, the polypeptide is isolated when it is at least 60%, by weight, free from the proteins and naturally-occurring organic molecules with which it is naturally associated. Preferably, the preparation is at least 75%, more preferably at least 90%, and most preferably at least 99%, by weight, a polypeptide of the invention. An isolated polypeptide of the invention may be obtained, for example, by extraction from a natural source, by expression of a recombinant nucleic acid encoding such a polypeptide; or by chemically synthesizing the protein. Purity can be measured by any appropriate method, for example, column chromatography, polyacrylamide gel electrophoresis, or by HPLC analysis.
By “marker” is meant any analyte (e.g., protein or polynucleotide) having an alteration in expression level or activity that is associated with a disease or disorder.
By “Metastasis Map” or “MetMap” is meant a collection of data related to the cancer cell lines. In one embodiment, a MetMap delineates the metastatic potential of each cell line in the collection.
“Metastatic potential” refers to the propensity of a cancer to develop secondary malignant growths at a distance from a primary site of cancer.
By “metastatic tumor” is meant a malignant growth that originates from a single cell that has survived in circulation, undergone extravasation, initiated tumor formation, and/or induced blood vessel remodeling.
As used herein, “obtaining” as in “obtaining an agent” includes synthesizing, purchasing, or otherwise acquiring the agent.
By “proteomic profile” is meant information about the expression of proteins. A proteomic profile may include all or a portion of the proteins present in a cell (e.g., cancer cell). A proteomic profile may include information about alterations in protein expression in a cancer cell relative to the protein expression of a reference cell. In some embodiments, the alteration is the presence or absence of a protein relative to a reference cell. The proteomic profile may include alterations in the amount of one or more proteins present in a cell compared to a reference cell. In some embodiments, a reference cell is a normal, non-cancerous cell derived from the same tissue the cancerous cell is derived from.
By “reduces” is meant a negative alteration of at least 10%, 25%, 50%, 75%, or 100%.
By “reference” is meant a standard or control condition.
A “reference sequence” is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset of or the entirety of a specified sequence; for example, a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence. For polypeptides, the length of the reference polypeptide sequence will generally be at least about 16 amino acids, preferably at least about 20 amino acids, more preferably at least about 25 amino acids, and even more preferably about 35 amino acids, about 50 amino acids, or about 100 amino acids. For nucleic acids, the length of the reference nucleic acid sequence will generally be at least about 50 nucleotides, preferably at least about 60 nucleotides, more preferably at least about 75 nucleotides, and even more preferably about 100 nucleotides or about 300 nucleotides or any integer thereabout or therebetween.
Nucleic acid molecules useful in the methods of the invention include any nucleic acid molecule that encodes a polypeptide of the invention or a fragment thereof. Such nucleic acid molecules need not be 100% identical with an endogenous nucleic acid sequence, but will typically exhibit substantial identity. Polynucleotides having “substantial identity” to an endogenous sequence are typically capable of hybridizing with at least one strand of a double-stranded nucleic acid molecule. By “hybridize” is meant pair to form a double-stranded molecule between complementary polynucleotide sequences (e.g., a gene described herein), or portions thereof, under various conditions of stringency. (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399; Kimmel, A. R. (1987) Methods Enzymol. 152:507).
For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl and 50 mM trisodium citrate, and more preferably less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In a preferred embodiment, hybridization will occur at 30° C. in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 37° C. in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 μg/ml denatured salmon sperm DNA (ssDNA). In a most preferred embodiment, hybridization will occur at 42° C. in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 μg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.
For most applications, washing steps that follow hybridization will also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature. As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25° C., more preferably of at least about 42° C., and even more preferably of at least about 68° C. In a preferred embodiment, wash steps will occur at 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 42° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 68° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art. Hybridization techniques are well known to those skilled in the art and are described, for example, in Benton and Davis (Science 196:180, 1977); Grunstein and Hogness (Proc. Natl. Acad. Sci., USA 72:3961, 1975); Ausubel et al. (Current Protocols in Molecular Biology, Wiley Interscience, New York, 2001); Berger and Kimmel (Guide to Molecular Cloning Techniques, 1987, Academic Press, New York); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.
By “substantially identical” is meant a polypeptide or nucleic acid molecule exhibiting at least 50% identity to a reference amino acid sequence (for example, any one of the amino acid sequences described herein) or nucleic acid sequence (for example, any one of the nucleic acid sequences described herein). Preferably, such a sequence is at least 60%, more preferably 80% or 85%, and more preferably 90%, 95% or even 99% identical at the amino acid level or nucleic acid to the sequence used for comparison.
Sequence identity is typically measured using sequence analysis software (for example, Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, BLAST, BESTFIT, GAP, or PILEUP/PRETTYBOX programs). Such software matches identical or similar sequences by assigning degrees of homology to various substitutions, deletions, and/or other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. In an exemplary approach to determining the degree of identity, a BLAST program may be used, with a probability score between e−3 and e−100 indicating a closely related sequence.
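To make the notion of percent identity concrete, the following sketch computes identity over two sequences that are assumed to be already aligned to equal length (with `-` marking gaps), and checks whether a substitution falls within one of the conservative groups listed above. It is a simplified illustration and not a reimplementation of BLAST, BESTFIT, GAP, or PILEUP/PRETTYBOX; the function names and the choice to count gap columns as mismatches are assumptions.

```python
# Conservative substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("GA"),    # glycine, alanine
    set("VIL"),   # valine, isoleucine, leucine
    set("DENQ"),  # aspartic acid, glutamic acid, asparagine, glutamine
    set("ST"),    # serine, threonine
    set("KR"),    # lysine, arginine
    set("FY"),    # phenylalanine, tyrosine
]


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def is_conservative_substitution(residue_a, residue_b):
    """True if two different residues belong to the same listed group."""
    return residue_a != residue_b and any(
        residue_a in group and residue_b in group for group in CONSERVATIVE_GROUPS
    )


identity = percent_identity("GAVLK", "GAVIK")
print(identity, is_conservative_substitution("L", "I"))
```

Under the definition of “substantially identical” given above, a result of at least 50% would meet the baseline threshold, with the higher recited percentages treated as preferred.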
By “subject” is meant a mammal, including, but not limited to, a human or non-human mammal, such as a bovine, equine, canine, ovine, or feline.
By “transcriptomic profile” is meant information about the expression levels of RNAs. In some embodiments, a transcriptomic profile includes expression profiling or splice variant analysis. In other embodiments, the transcriptomic profile includes information relating to mRNAs, tRNAs, or sRNAs. A transcriptomic profile may include all or a portion of the genes expressed in a cell. A transcriptomic profile may include alterations in gene expression relative to a reference cell, wherein the alteration can be the presence of a transcript not observed in the reference cell or the absence of a transcript that is present in the reference cell. The transcriptomic profile may include alterations in the amount of one or more transcripts present in a cell compared to a reference cell. A reference cell is a normal, non-cancerous cell derived from the same tissue the cancerous cell is derived from.
Unless specifically stated or obvious from context, as used herein, the term “or” is understood to be inclusive. Unless specifically stated or obvious from context, as used herein, the terms “a”, “an”, and “the” are understood to be singular or plural.
Unless specifically stated or obvious from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. About can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value. Unless otherwise clear from context, all numerical values provided herein are modified by the term about.
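The percentage reading of “about” can be expressed as a simple tolerance check. The sketch below is illustrative only; the function name `is_about` and the default tolerance of 10% (the widest percentage recited above) are assumptions for the example.

```python
def is_about(value, stated_value, tolerance=0.10):
    """True if value lies within the given fractional tolerance of stated_value.

    tolerance is a fraction, so 0.10 corresponds to "within 10%".
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(value - stated_value) <= tolerance * abs(stated_value)


print(is_about(54, 50))          # 4 away from 50, limit is 5
print(is_about(56, 50))          # 6 away from 50, limit is 5
print(is_about(50.4, 50, 0.01))  # within 1% of 50
```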
Ranges provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50.
The recitation of a listing of chemical groups in any definition of a variable herein includes definitions of that variable as any single group or combination of listed groups. The recitation of an embodiment for a variable or aspect herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.
Any compositions or methods provided herein can be combined with one or more of any of the other compositions and methods provided herein.
The invention features compositions and methods that are useful for determining the metastatic potential of cancer cell lines, as well as an interactive metastasis map featuring information that defines such cancer cell lines (e.g., their propensity to metastasize, organs where metastasis is typically observed, sequence data, genomic data, transcriptomic data, proteomic data, metabolomic data, drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, and annotated data relating to the cell of origin).
The invention is based, at least in part, on the discovery that a cancer cell's metastatic potential can be ascertained by systemically delivering the cell, in a modified form to allow detection, to a non-human subject. Accordingly, the invention provides compositions and methods for determining the metastatic potential of a plurality of cancer cell lines in vivo. These methods and compositions have been used to generate a map of the metastatic properties of individual cell lines, and this Metastasis Map (or MetMap) represents a novel and important tool for the study of metastatic cancer.
Nucleic Acid Constructs
Methods and compositions are provided herein for tracking cancer cells administered to a non-human subject in vivo. Compositions of the present invention can be used to modify cancer cells prior to administration to the subject so that the cells express identifying markers. Thus, one aspect of the present disclosure provides a nucleic acid construct comprising a barcode, a first detectable marker, and a second detectable marker. The first detectable marker allows in vivo imaging of the cells after administration to a non-human subject. In some embodiments, the first detectable marker is a bioluminescent marker, such as a luciferase. Luciferases, unlike fluorescent proteins, do not require an external light source to generate a signal, which makes this family of bioluminescent markers suitable for in vivo imaging.
The second detectable marker allows for cell selection, sorting, or both. Markers suitable for cell selection and/or sorting include, but are not limited to, fluorescent proteins. In some embodiments, the second marker is a green, red, blue, or yellow fluorescent protein (GFP, RFP, BFP, or YFP, respectively). In some embodiments, the second marker is mCherry. In some embodiments, the second detectable marker comprises an epitope to which an antibody specifically binds. In some embodiments, the antibody that specifically binds to the epitope is labeled.
In some embodiments of the present invention, the nucleic acid construct encodes a barcode but no detectable markers. In some embodiments, other selectable markers (e.g., antibiotic resistance genes) are encoded in the nucleic acid construct to enable efficient selection of transformed or transduced cells. In some embodiments, a surface protein on the cancer cell can be used to isolate or detect the cancer cell. In some embodiments, the surface protein comprises an epitope to which an antibody can specifically bind and mediate isolation of the cancer cell. In some embodiments, the antibody is labeled. In some embodiments, the label is a fluorescent or other visually detectable label.
The barcode is between 10 and 30 nucleotides in length. For example, the barcode contemplated herein may comprise 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. The barcodes are designed to reduce or eliminate nonspecific binding to the cancer cell's nucleic acid molecules (i.e., genomic DNA, RNA, etc.). In some embodiments, the barcode comprises a nucleic acid sequence that is not substantially complementary to any endogenous nucleic acid sequence present in the cancer cell. In some embodiments, the barcode is designed to diverge from perfect complementarity with an endogenous nucleic acid sequence present in the cancer cell by 2, 3, or 4 or more nucleotides. In some embodiments, the barcode is designed so that the most complementary sequences in an endogenous nucleic acid molecule present in the cancer cell have a conformation that disfavors barcode binding to the endogenous nucleic acid molecule.
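The divergence criterion described above can be illustrated with a short sketch. The following Python code is a simplified illustration only, not the design tool used in the examples: the function names, the default threshold, and the brute-force window scan are assumptions made for clarity, and a practical screen against a whole genome or transcriptome would likely rely on an indexed aligner instead. The sketch checks a candidate barcode and its reverse complement against each supplied endogenous sequence and accepts the barcode only if every alignment window differs by at least a chosen number of nucleotides.

```python
# Illustrative sketch only: names and thresholds are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(query: str, reference: str) -> int:
    """Smallest number of mismatches between query and any equal-length
    window of reference (ungapped comparison)."""
    k = len(query)
    if len(reference) < k:
        # No full-length window exists; treat as fully divergent.
        return k
    best = k
    for i in range(len(reference) - k + 1):
        window = reference[i:i + k]
        mismatches = sum(a != b for a, b in zip(query, window))
        best = min(best, mismatches)
    return best


def barcode_is_acceptable(barcode: str, endogenous_seqs, min_divergence: int = 2) -> bool:
    """Accept a barcode only if it, and its reverse complement, differ from
    every window of every endogenous sequence by at least min_divergence."""
    targets = (barcode, reverse_complement(barcode))
    return all(
        min_mismatches(target, ref) >= min_divergence
        for ref in endogenous_seqs
        for target in targets
    )
```

Setting `min_divergence` to 2, 3, or 4 corresponds to the divergence levels recited above. The sketch does not model the conformational considerations also mentioned in this paragraph.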
In some embodiments, the nucleic acid construct encoding the barcode and markers is a single expression cassette. Thus, the expression of each encoded element is correlated with the expression of the other elements. In some embodiments, the nucleic acid construct is a vector (e.g., recombinant plasmids). The term “recombinant vector” includes a vector (e.g., plasmid, phage, phasmid, virus, cosmid, fosmid, or other purified nucleic acid vector) that has been altered, modified or engineered such that it contains greater, fewer or different nucleic acid sequences than those included in the native or natural nucleic acid molecule from which the recombinant vector was derived. For example, a recombinant vector may include a nucleotide sequence encoding a polypeptide (i.e., the markers) and/or a polynucleotide (i.e., the barcode), or fragment thereof, operatively linked to regulatory sequences such as promoter sequences, terminator sequences, long terminal repeats, untranslated regions, and the like, as defined herein. Recombinant expression vectors allow for expression of the genes or nucleic acids included in them.
In some embodiments of the present disclosure, one or more nucleic acid constructs having a nucleotide sequence encoding one or more of the polypeptides or polynucleotides described herein are operatively linked to one or more regulatory sequences that can integrate the nucleic acid construct into a cancer cell genome. In some embodiments, cancer cells are stably transfected or transduced by the introduced nucleic acid construct. Modified cells can be selected, for example, by detecting the first or second marker. In some embodiments, the barcode and at least one of the marker genes are encoded in different nucleic acid constructs, which are introduced into the same cell by co-transfection or co-transduction. Any additional elements needed for optimal synthesis of polynucleotides or polypeptides described herein would be apparent to one of ordinary skill in the art.
In some embodiments, the nucleic acid construct comprises at least one adapter nucleic acid sequence that has a sequence complementary to that of a nucleic acid molecule used in a downstream sequencing reaction. For example, the adapters used in some embodiments are designed to be compatible with next-generation sequencing including, but not limited to, Ion Torrent and MiSeq platforms. In some embodiments, the length of the adapter is between 8 and 20 nucleotides. In some embodiments, the length of the adapter is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides. The adapter's sequence is designed to reduce or eliminate nonspecific binding of the adapter to an endogenous nucleic acid molecule. In some embodiments, the adapter is designed to have a sequence that is not substantially complementary to any nucleic acid sequence present in an endogenous nucleic acid molecule. In some embodiments, the adapter is designed to diverge from perfect complementarity with the endogenous nucleic acid molecule by 2, 3, or 4 or more nucleotides.
One aspect of the present disclosure provides a method for characterizing the metastatic potential of a mixture of cancer cell lines in vivo. In one embodiment, the method comprises modifying the cells to comprise a nucleic acid construct encoding a barcode, a first detectable marker, and a second detectable marker, such as the constructs described above. Each distinct cell line in the mixture of cell lines will be modified to express a unique barcode, and each barcode will only be used with a single cell line. The modified cells are systemically administered to a non-human subject and allowed to propagate in the non-human subject. After a period of time, the non-human subject is imaged to detect at least one of the markers encoded in the nucleic acid construct, which allows the location of the cells in the body of the non-human subject to be determined.
The non-human subject can be any non-human mammal. In some embodiments, the non-human mammal is a mouse, rat, rabbit, pig, goat, or other domesticated mammal. In some embodiments, the non-human animal is immunocompromised. In some embodiments, the non-human subject is an immunocompromised mouse, such as a NOD scid gamma (NSG) mouse.
Methods of introducing exogenous nucleic acid molecules into a cell are known in the art. For example, eukaryotic cells can take up nucleic acid molecules from the environment via transfection (e.g., calcium phosphate-mediated transfection). Transfection does not employ a virus or viral vector for introducing the exogenous nucleic acid into the recipient cell. Stable transfection of a eukaryotic cell comprises integration into the recipient cell's genome of the transfected nucleic acid, which can then be inherited by the recipient cell's progeny.
Eukaryotic cells (e.g., human cancer cells) can be modified via transduction, in which a virus or viral vector stably introduces an exogenous nucleic acid molecule to the recipient cell. Eukaryotic transduction delivery systems are known in the art. Transduction of most cell types can be accomplished with retroviral, lentiviral, adenoviral, adeno-associated, and avian virus systems, and such systems are well-known in the art. In some embodiments of the present disclosure, the viral vector system is a lentiviral system.
In some embodiments, the viral vectors are assembled or packaged in a packaging cell prior to contacting the intended recipient cell. In some embodiments, the vector system is a self-inactivating system, wherein the viral vector is assembled in a packaging cell, but after contacting the recipient cell, the viral vector is not able to be produced in the recipient cell. In some embodiments, the first detectable marker allows in vivo imaging of the cells after delivery to a non-human subject. In some embodiments, the first detectable marker is a bioluminescent marker, such as a luciferase. Luciferases, unlike fluorescent proteins, do not require an external light source to generate a signal, which makes this family of bioluminescent markers suitable for in vivo imaging. In some embodiments, luciferin or an analogous substrate is administered to the non-human subject, which is acted upon by the luciferase to generate bioluminescence. In some embodiments, in vivo imaging comprises bioluminescence imaging. Many imaging methodologies are known in the art that can be utilized in the methods presented herein. Examples of such methodologies include, but are not limited to, those disclosed in U.S. Publication Nos. 20180160099, 20170220733, 20170212986, 20170038574, 20160370295, 20160202185, 20140333750, 20140326922, 20140063194, and 20140038201, the contents of each are incorporated herein by reference in their entirety.
The second detectable marker is used to isolate and/or sort modified cancer cells from other cells. A technique for isolating or sorting cancer cells comprising a nucleic acid construct as described herein is flow cytometry. In fluorescence activated cell sorting (FACS), a fluorescent marker is used to distinguish modified from unmodified cells. In some embodiments, the second marker is a fluorescent polypeptide suitable for cell sorting. In some embodiments, the second marker is a polypeptide having an epitope that is specifically bound by a fluorescently labelled antibody. A gating strategy appropriate for the cells expressing the marker (or otherwise labeled) is used to segregate the cells. For example, modified cancer cells expressing a fluorescent protein (e.g., GFP or mCherry) can be separated from other cells in a sample by using a corresponding gating strategy. In one embodiment, a GFP gating strategy is employed. In some embodiments, an mCherry gating strategy is used. Other methods of isolating cells are known in the art and may be used to segregate modified cancer cells from non-modified cells and from cells derived from a non-human subject.
To determine the cell line from which a particular modified cancer cell is derived, the barcode within the modified cell is sequenced. Sequencing of the barcodes within the modified cancer cells is accomplished using a next-generation sequencing platform such as IonTorrent or MiSeq, but other platforms are contemplated herein. Additionally, single cell analysis (e.g., single cell RNA sequencing (RNA-seq)) can be used to determine barcode sequences and identify the cell lines from which the modified cancer cells present at a location or in a sample are derived. RNA-seq may also be used to generate transcriptome data for the modified cancer cells.
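As a minimal sketch of how sequenced barcodes might be assigned to cell lines, the Python code below counts reads containing each known barcode. It assumes exact substring matching and a one-to-one table of barcodes to cell lines; the function name, the barcode sequences, and the "unassigned" label are hypothetical and chosen only for illustration. A production pipeline would typically also tolerate sequencing errors and handle reverse-complemented reads.

```python
from collections import Counter


def count_barcode_reads(reads, barcode_to_line):
    """Count reads per cell line by exact barcode match.

    reads: iterable of read sequences (strings).
    barcode_to_line: dict mapping each unique barcode to one cell line.
    Reads that contain no known barcode are tallied as "unassigned".
    """
    counts = Counter()
    for read in reads:
        for barcode, line in barcode_to_line.items():
            if barcode in read:
                counts[line] += 1
                break  # each barcode is unique to one cell line
        else:
            counts["unassigned"] += 1
    return counts


# Hypothetical 12-nucleotide barcodes, for illustration only.
example_barcodes = {
    "ACGTTGCAAGCT": "MDAMB231",
    "TTGACCGATACG": "JIMT1",
}
example_reads = [
    "GGACGTTGCAAGCTCC",
    "AATTGACCGATACGTT",
    "GGACGTTGCAAGCTAA",
    "CCCCCCCCCCCCCCCC",
]
example_counts = count_barcode_reads(example_reads, example_barcodes)
```

Because each barcode is used with only a single cell line, the first match is sufficient to assign a read.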
The abundance of modified cancer cells present in a metastatic lesion is indicative of the metastatic potential of the cell lines from which the cells are derived. In some embodiments, the abundance of modified cancer cells is determined during cell isolation and/or cell sorting. In some embodiments, the modified cells are quantitated during next-generation sequencing or RNA-seq. Other methods of quantitating cells in a sample or tissue are known in the art.
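One way to express such abundance on an absolute scale, consistent with the approach used in the examples, is to multiply the total number of cancer cells isolated from an organ by each cell line's share of the barcode reads. The Python sketch below illustrates that arithmetic; the function name and the input format are assumptions, and the sketch presumes that barcode read fractions are proportional to cell fractions.

```python
def infer_cell_counts(barcode_reads, total_cancer_cells):
    """Estimate the number of cells per cell line in one organ sample.

    barcode_reads: dict mapping cell line name to barcode read count.
    total_cancer_cells: total cancer cells isolated from the organ
        (e.g., the number of sorted events).
    Returns a dict mapping cell line name to inferred cell number.
    """
    total_reads = sum(barcode_reads.values())
    if total_reads == 0:
        # No barcode reads recovered: nothing can be attributed.
        return {line: 0.0 for line in barcode_reads}
    return {
        line: total_cancer_cells * reads / total_reads
        for line, reads in barcode_reads.items()
    }
```

For example, if 2,000 cancer cells are isolated from an organ and a cell line accounts for 750 of 1,000 barcode reads, the sketch attributes 1,500 cells to that line.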
Generating Metastasis Maps
Another aspect of the present disclosure provides methods for generating a metastasis map of cancer cell lines. These methods include systemically delivering a mixture of cells derived from cancer lines to a non-human animal, wherein the cells are modified to comprise a vector encoding a barcode or a vector encoding a barcode and at least one marker as described above. The method for generating the map further involves detecting and quantitating the expression of the barcode, and these steps are also described above. The data derived from quantitating the expression of the barcode is then compiled in a database and associated with the cell's identity (i.e., identifying the cell line from which the cell derived).
The metastasis map may also include a genomic, transcriptomic, or proteomic profile of the cell line. In some embodiments, the metastasis map also includes drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, annotated cell line data, and/or a metabolite profile of the cell line. The data that constitute the profiles may be generated de novo using methods known in the art.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are well within the purview of the skilled artisan. Such techniques are explained fully in the literature, such as, “Molecular Cloning: A Laboratory Manual”, second edition (Sambrook, 1989); “Oligonucleotide Synthesis” (Gait, 1984); “Animal Cell Culture” (Freshney, 1987); “Methods in Enzymology”; “Handbook of Experimental Immunology” (Weir, 1996); “Gene Transfer Vectors for Mammalian Cells” (Miller and Calos, 1987); “Current Protocols in Molecular Biology” (Ausubel, 1987); “PCR: The Polymerase Chain Reaction”, (Mullis, 1994); “Current Protocols in Immunology” (Coligan, 1991). These techniques are applicable to the production of the polynucleotides and polypeptides of the invention, and, as such, may be considered in making and practicing the invention. Particularly useful techniques for particular embodiments will be discussed in the sections that follow.
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the assay, screening, and therapeutic methods of the invention, and are not intended to limit the scope of what the inventors regard as their invention.
Methods of monitoring metastasis are needed to better understand similarities and differences between different types of cancer. To test the feasibility and reliability of in vivo barcoding to monitor metastasis, a pilot study of four breast cell lines was performed (
The transcribing barcode design allowed co-capturing of cancer barcodes and cancer transcriptomes of metastases from bulk RNA-Seq analysis, and a workflow was developed that analyzed both (
Eight barcoded cell lines (the four cell lines modified to express either GFP or mCherry) were injected as a pool into the left ventricle of recipient mice. Bioluminescence imaging (BLI) revealed metastatic lesions throughout the body (
The results observed for barcodes quantitated by bulk RNA-Seq were validated by two methods: quantitative RT-PCR and single cell RNA sequencing (
Having validated the method for in vivo barcoding to monitor metastasis, a larger subset of breast cancer cells was evaluated for metastatic behaviors. Principal component analysis (PCA) of expression profiles stratified the breast cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) collection into 3 categories: (1) cell lines whose names begin with HS (termed HS cells), which display fibroblast morphology and characteristics, (2) cell lines enriched in the luminal subtype, and (3) cell lines enriched in the basal subtype (
Cell lines were individually barcoded, pooled at equal numbers, and injected into mice (
To quantify the cell line metastatic potentials on an absolute scale, the cell count for each cell line in different organs was inferred based on the total number of isolated cancer cells and their compositions as measured by barcode abundance. This metric was then used to compare cell lines across the three pools analyzed (pilot, group 1, and group 2) (
The analysis characterized some cell lines as pan-metastatic. For example, four cell lines, MDAMB231, HCC1187, JIMT1, and HCC1806 displayed pan-metastatic behaviors. Some showed a propensity for liver, lung, bone, or brain, and others were not metastatic (
Having demonstrated feasibility, the metastatic potential was mapped for 500 cancer cell lines spanning 21 cancer types to generate a pan-cancer Metastasis Map (MetMap). To facilitate high throughput profiling, cell lines were used that had been barcoded for use in the PRISM method, which was previously developed for in vitro testing of drug sensitivities (Yu et al., Nat. Biotechnol. 34: 419-23 (2016), the contents of which are hereby incorporated by reference in their entirety).
PRISM lines were pooled based on their in vitro doubling speed across mixed lineages, with 25 cell lines per pool. Because PRISM barcoded cells did not express GFP or luciferase, introducing labeling markers for cancer cell purification was analyzed to determine if it was critical for the method. One PRISM pool (of 25 cell lines) that contained the JIMT1 cell line was transformed with a GFP-luciferase vector, and cells were sorted by GFP expression (
The GFP-labeled and unlabeled cell pools were subjected to the same animal workflow, tissue dissociation, and mouse cell depletion. The GFP-labeled group was further sorted to purify cancer cells. Isolated GFP-labeled cancer cells or tissue lysates from the unlabeled cell lines were subjected to barcode amplification and sequencing. A comparison of the two experiments showed highly concordant results. Although the initial barcode distribution of the pre-injected pools had altered (
MetMap (
The resulting metastasis map (MetMap) is the largest ever generated (
It was also noted that the intracardiac injection approach allowed for the evaluation of far more cell lines in vivo compared to traditional subcutaneous (subQ) injection (
To assess whether MetMap reflects the metastatic behavior of various cancers, the metastatic potential was compared with clinical annotations of cell lines. Significant associations were found with (1) cancer lineage, (2) the site from which the cell line was derived, and (3) patient age, but not with gender or ethnicity (
Cell lines derived from metastases showed higher metastatic potential than lines derived from primary tumors. Interestingly, multiple cell lines derived from primary tumors known to give rise to metastases in patients were metastatic as xenografts (
The association with patient age was unexpected: a gradual decline in metastatic potential was observed as patient age increased (
Perhaps most importantly, extensive variation in metastatic potential was observed within individual lineages, thereby making it possible to search for associations between metastasis propensity and genomic features of the tumors. Of note, metastatic potential was not simply explained by cell line proliferation rate or mutational burden (
To investigate mechanisms involved in metastasis, efforts were focused on breast cancer and its potential for brain metastasis (see
Genomic data available for each of the cell lines were used to search for evidence of DNA-level mutations associated with brain metastasis. At the level of single nucleotide variant (SNV) mutations, Phosphatidylinositol-4,5-Bisphosphate 3-Kinase (PIK3CA) mutations were significantly associated with metastasis. Four of 7 metastatic lines harbored a PIK3CA mutation, compared to 0 of 14 non- or weakly-metastatic lines (p=2.3e-06, FDR=0.01,
Unbiased analysis of the DNA copy number landscape similarly pointed to an association with lipid metabolism. An association was observed between metastatic potential and deletions of chromosome 8p12-8p21.2 (p=7.3e-06, FDR=0.0017,
To ascertain the clinical relevance of these associations, clinical tumor datasets of breast cancer, among which EMC-MSK contains organ-specific metastasis relapse information for each patient, were analyzed (
To assess PI3K activity in these clinical cohorts, we utilized two PI3K-response signatures, one generated with PIK3CA mutant overexpression, and the other with PI3K-inhibitor treatment. Although the gene identities overlapped little between the two signatures, strong co-regulated patterns were observed in patient tumors (
Consistent with associations at the genetic level, expression analysis similarly showed an enrichment of a PI3K activation signature in the brain metastatic cell lines (
Transcriptomes of the breast cancer cell lines were analyzed to detect associations with brain metastasis. For this analysis, gene expression profiles of cell lines growing in vitro were compared to their profiles in in vivo metastatic lesions (see
RNA-Seq was used to characterize the transcriptomes, and this protocol captured cancer cell compositions and averaged in vivo transcriptomes of metastases from cell line pools in the breast cancer cohort study. To understand what the metastasis transcriptomes encoded, differential expression analysis was performed comparing the in vivo transcriptomes to those of cells in vitro. To properly account for the different cell line compositions in each metastasis, a composite in vitro transcriptome was modeled using the barcode composition and single cell line in vitro transcriptomes and then compared to the in vivo results (
In the pilot group experiments, MDAMB231 dominated lung, liver, kidney, and bone metastases in most samples (
Having confirmed the validity of these profiles, pathway enrichment analysis was performed to query consensus programs that the differential genes encode at the 5 metastasis sites (
To determine if a metabolite profile paralleled the gene expression profiles associated with brain metastatic potential, the abundance of 226 metabolites was analyzed across the breast cancer cell lines (Barretina et al.). As predicted from mRNA profiling, upregulation of cholesterol species in highly brain metastatic cells was observed (
In contrast, global downregulation was observed for triglycerides (triacylglycerols, TAGs) in brain metastatic cells (
To further investigate the functional significance of a lipid metabolic profile of cells with brain metastatic potential, genome-wide CRISPR/Cas9 viability screening data was analyzed to identify vulnerabilities associated with the brain-metastatic state (Meyers et al., Nat. Genet., 49: 1779-84 (2017), the contents of which are incorporated herein by reference in their entirety). Remarkably, SREBF1 was the top correlated dependency (i.e., cancer cells rely on SREBF1 to switch to a brain-metastatic state in vitro) for brain metastasis (p=5.9e-8, FDR=0.001,
SREBF1 is a pivotal transcription factor that mediates lipid synthesis downstream of the PI3K pathway. To understand whether SREBF1 confers the lipid state observed in brain metastatic cells, lipidomics were performed after knocking out SREBF1 in the brain metastatic cell lines JIMT1 (PIK3CA-mut) and HCC1806 (8p-loss). SREBF1 knock-out (KO) resulted in a dramatic shift in intracellular lipid content (
Given the repeated observation of lipid metabolism being associated with brain metastatic potential, the functional impact of perturbing the pathway on brain metastasis formation was assessed. Toward this goal, a pooled in vivo CRISPR screen of 29 gene candidates in brain metastatic growth was performed using the JIMT1 model (
To assess how it compared to systemic metastasis, an intracardiac injection assay was performed, focusing on SREBF1. The most dramatic phenotype was that of brain metastasis, where SREBF1-KO cells showed a 196-fold reduction in brain metastasis compared to WT controls (
To determine the generality of the SREBF1 requirement for breast cancer growth in the brain, it was knocked out in additional brain metastatic lines including HCC1954, MDAMB231 and HCC1806. As with JIMT1, a significant inhibition in brain metastatic growth was also observed in these lines, although the magnitude and duration of growth inhibition varied (
This restoration of growth was not explained by escape from genome-editing, as brain metastases at the end of the experiment had evidence of editing at the SREBF1 locus (
The present disclosure describes MetMap as a new large-scale in vivo characterization of human cancer cell lines that adds a missing dimension to in vitro studies. The MetMap resource currently has metastasis profiles of 125 cell lines spanning 22 tumor types—over an order of magnitude more than was previously available. Ideally, all available cancer cell lines would be characterized for their metastatic potential, thus creating an even larger repertoire of models for exploration of metastasis mechanisms. A limitation of the use of human cell lines for such experiments is that they require the use of immunodeficient mice for in vivo characterization, and the extent to which the immune system plays an important role in mediating organ-specific patterns of metastasis remains to be determined (Topalian et al., Cell 161: 185-86 (2015), the contents of which are incorporated herein by reference in their entirety).
Multiple lines of experimental and clinical evidence pointed to the role of lipid metabolism in governing the ability of cells to survive in the brain microenvironment. The importance of lipid metabolism in cancer has been recently highlighted by a number of studies (Pascual et al., Nature 541: 41-45 (2017); Zhang et al., Cancer Discov. 8: 1006-25 (2018); Nieman et al., Nat. Med. 17: 1498-1503 (2011), the contents of each are incorporated herein by reference in their entirety), but its role in brain metastasis has not been previously recognized. Particularly intriguing is the notion that interfering with lipid or cholesterol metabolism might abrogate metastatic growth in the brain. The development of brain-penetrant inhibitors of this pathway would allow for this hypothesis to be tested pharmacologically. More generally, this disclosure highlights the complex interplay between cancer cell survival and metabolic states that can vary widely from organ to organ. Exploiting such tumor microenvironmental differences may prove useful as a therapeutic strategy to combat cancer.
The results reported herein above were obtained using the following methods and materials.
All breast cell lines were obtained from CCLE and cultured under the recommended conditions. Cell line identities were confirmed by SNP fingerprinting as well as RNA-Seq, in comparison to the CCLE results (portals.broadinstitute.org/ccle). The Fluorescence-Luciferase-Barcode (FLB) construct was engineered using the FUW lentiviral vector backbone (a gift from David Baltimore, Addgene plasmid # 14882). Barcodes 26 nucleotides long were designed using barcode_generator.py (ver 2.8, comailab.genomecenter.ucdavis.edu/index.php/) and cloned into the landing pad C-terminal to the TGA stop codon of Fluorescence-Luciferase using Gibson assembly (New England Biolabs). Lentivirus preparation and cell infection were performed according to published protocols available at http://www.broadinstitute.org/rnai. Infected cells were subjected to FACS with a fixed gate for GFP or mCherry, using a Sony SH4800 sorter.
Animal work was performed in accordance with a protocol approved by the Broad Institute Institutional Animal Care and Use Committee (IACUC). Female NOD scid gamma (NSG) mice (The Jackson Laboratory), 5-6 weeks of age, were used. Cancer cells were suspended in PBS+0.4% BSA, and 100 μl of cell suspension was injected into the left ventricle of anesthetized mice (ketamine 100 mg/kg; xylazine 10 mg/kg). In vivo metastasis progression was monitored on a weekly basis via real-time BLI using the IVIS SpectrumCT Imaging System (PerkinElmer). Mice were anesthetized with inhaled isoflurane, injected intraperitoneally with D-Luciferin (150 mg/kg), and imaged with the auto exposure setting in prone and supine positions. At the end point, ex vivo BLI was performed by submerging the excised organs in DMEM/F12 media (Thermo Fisher Scientific) containing D-Luciferin for 10 min and imaging with the auto exposure setting. BLI analysis was performed using Living Image software (ver 4.5, PerkinElmer). In the case of the breast cancer cohort study (pilot, group 1, group 2 in
Tissue Processing and Cancer Cell Isolation from Organs
Organs including brain, lung, liver, and kidney were dissociated using a gentleMACS Octo Dissociator with Heaters (Miltenyi Biotec). Bones (from both hind limbs) were chopped into fine pieces and incubated in the dissociation buffer with vigorous shaking. The dissociated cell suspensions were filtered using 100 μm filters and washed twice with DMEM/F12. Cell suspensions were then washed with staining buffer (PBS+2 mM EDTA+0.5% BSA) and incubated with mouse cell depletion beads according to the instructions (Miltenyi Biotec). Cell suspensions were subjected to negative selection using an autoMACS Pro Separator (Miltenyi Biotec) to deplete mouse stroma. Brains were subjected to an additional myelin debris depletion step using myelin removal beads II (Miltenyi Biotec). The resultant cell suspensions were then subjected to FACS using a Sony SH4800 sorter, with the fixed gate for GFP or mCherry. DAPI staining was used to exclude dead cells. For bulk RNA-Seq, cells were sorted to a single tube in PBS+0.4% BSA+RNasin Plus RNase Inhibitor (Promega) and centrifuged at 1500 rpm×10 min, and cell pellets were frozen at −80° C. for downstream use. For single cell RNA-Seq, single cells were sorted into 96-well plates containing cold TCL buffer (Qiagen) containing 1% β-mercaptoethanol, snap frozen on dry ice, and then stored at −80° C. Ninety single cells were sorted per plate; the remaining wells were used for negative and positive controls.
Individual cell lines, cell line pools prior to injection, and cells isolated from metastases were subjected to RNA-Seq. RNA extraction was performed using Quick-RNA MicroPrep according to instructions (Zymo Research). RNA was quantified using RNA 6000 Pico Kit on a 2100 Bioanalyzer (Agilent). RNA samples from cell numbers lower than 500 were not measured but all were used as input for library preparation. cDNA was synthesized using Clontech SmartSeq v4 reagents from up to 2 ng RNA input according to manufacturer's instructions (Clontech). Full length cDNA was fragmented to a mean size of 150 bp with a Covaris M220 ultrasonicator and Illumina libraries were prepared from 2 ng of sheared cDNA using Rubicon Genomics Thruplex DNAseq reagents according to manufacturer's protocol. The finished dsDNA libraries were quantified by Qubit fluorometer, Agilent TapeStation 2200, and RT-qPCR using the Kapa Biosystems library quantification kit. Uniquely indexed libraries were pooled in equimolar ratios and sequenced on Illumina NextSeq500 runs with paired-end 75bp reads at the Dana-Farber Cancer Institute Molecular Biology Core Facilities. RT-qPCR quantification of barcodes was performed using Maxima First Strand cDNA Synthesis Kit, Taqman Fast Advanced Master Mix, custom synthesized Taqman probes, and QuantStudio 6 PCR System (ThermoFisher Scientific). Single cell RNA-Seq was performed as previously described (Ramaswamy, S. et al., Nat. Genet. 33, 49-54 (2003), the contents therein are hereby incorporated by reference in their entirety).
Scalable Metastatic Potential Profiling with Barcoded Cell Line Pools
To enable profiling of in vivo metastatic potential in a scalable manner, a barcoding vector was designed that contained (1) a fluorescence protein (GFP or mCherry) for cell sorting, (2) a luciferase for real-time in vivo imaging, and (3) a barcode for cell line identity (
Because the transcribed barcode design allows co-capture of cancer barcodes and cancer transcriptomes of metastases from bulk RNA-Seq, a workflow and analysis method was developed that reads out both (
To validate RNA-Seq-quantitated barcode results from the pilot study, RT-qPCR was performed using Taqman assays against the barcodes. An examination of individual barcoded lines showed that the Taqman probes were highly specific to the engineered barcodes and there was no cross detection (
Having validated the feasibility of the in vivo barcoding approach, efforts were focused on mapping the metastatic behaviors of basal-like breast cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE); basal-like breast cancer is a subtype that displays substantial heterogeneity in metastasis patterns from patient to patient. Principal component analysis (PCA) of expression profiles stratified breast cancer cell lines into 3 categories: (1) one group whose names all begin with HS and that displays fibroblast characteristics, (2) one enriched in luminal subtype, and (3) one enriched in basal subtype (
To quantify the cell line metastatic potentials on an absolute scale, the cell number was inferred for each cell line based on the total cancer cell counts and their barcode-quantitated compositions from each organ. This metric was used to compare cell lines across the 3 pool studies. For data visualization, a petal plot was developed that encodes 3 pieces of information: (1) metastatic potential as quantified by inferred cell number, (2) its confidence interval, which estimates animal variability, and (3) penetrance, the percentage of animals in the cohort in which the particular cell line was detected (
Drafting MetMap with PRISM Cell Line Pools
Expansion of metastatic potential mapping beyond breast cancer was attempted, with the aim of drafting a comprehensive MetMap for all solid tumor types. Focusing on one cancer type at a time would result in custom pooling and different group sizing, which was neither scalable nor standardizable. For pan-cancer characterization, it also did not make sense to perform bulk RNA-Seq on mixed cancer types, as lineage would be a strong confounder. In this case, readout at the DNA level would be sufficient. PRISM, a barcoded cell line mixture approach developed for high-throughput in vitro drug screening, was used. It was asked whether the PRISM platform could be applied for the in vivo MetMap purpose.
As part of PRISM profiling, cell lines were pooled based on their in vitro doubling time across mixed lineages, with a size of 25 lines per pool. PRISM barcoded cells did not harbor GFP or luciferase, thus in the first study, it was addressed whether it was critical to introduce the labeling markers for cancer cell purification. One PRISM pool (of 25 cell lines) was chosen that contained JIMT1, labeled with GFP-luciferase vector, and then sorted for GFP+ cells (
The positive control JIMT1 was pan-metastatic as expected. Importantly, cell lines such as MELHO, MHHES1 and PC14 substantially dropped in their initial abundance after GFP labeling, yet they gained similar in vivo enrichment as in the non-labeled experiment. These results suggested that we could quantitatively detect barcodes from crude lysates without the need of pure cancer cell isolation from PRISM.
The simplified workflow using PRISM pools for pan-cancer mapping was employed, and a total of 503 cancer cell lines across 21 cancer types were profiled (
Analysis of In Vivo Metastasis Transcriptomes with Polyclonal Cell Lines
RNA-Seq co-captured cancer cell composition and averaged in vivo transcriptomes of metastases from cell line pools in the breast cancer cohort study. To understand what the metastasis transcriptomes encoded, differential analysis was performed comparing the in vivo transcriptomes to those of cells in vitro. To properly account for the different cell line compositions in each metastasis, a composite in vitro transcriptome was modeled using the barcode composition and single cell line in vitro transcriptomes, and then compared to the actual in vivo results (
To assess whether such comparison identified genes relevant to metastasis, the top differentially expressed genes were inspected. Notably, MUCL1 (also termed small breast epithelial mucin, SBEM) and SCGB2A2 (also known as Mammaglobin, MGB1) were strongly induced in brain metastases as well as in other sites (
Since MDAMB231 is the most investigated cell line in breast cancer metastasis, it was asked whether genes previously identified and validated as metastasis mediators were induced in the in vivo transcriptomic profiles. In the pilot group experiments, MDAMB231 dominated lung, liver, kidney and bone metastases in most samples (
Having confirmed the validity of these profiles, pathway enrichment analysis was performed to query consensus programs that the differential genes encode in the 5 organ sites (
Barcode Quantification from RNA-Seq of Metastases
Since the RNA-Seq library preparation sheared the cDNA randomly into small pieces, demultiplexed RNA-Seq reads were mapped to the barcode references using Bowtie 2 (Langmead et al., Nat. Methods 9: 357-59 (2012), the contents of which are incorporated herein by reference in their entirety) local mode for barcode detection and quantification. Mapped reads were filtered with the criteria that reads (either 5′ or 3′) must cover over 50% of the barcodes from either end, and counted using samtools. Barcode percentage corresponding to cell composition was calculated for single cell lines, pre-injected cell mixtures, and in vivo metastasis samples.
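The read filter and percentage calculation described above can be illustrated with a minimal sketch. It assumes one reading of the filter (a read is kept if it overlaps more than 50% of the barcode and that overlap reaches one end of the barcode), half-open coordinates, and hypothetical function names; it is not the pipeline actually used, which relied on Bowtie 2 and samtools.

```python
from collections import Counter


def covers_barcode_end(read_start, read_end, barcode_start, barcode_end,
                       min_fraction=0.5):
    """Return True if the read overlaps more than `min_fraction` of the
    barcode and the overlap reaches the 5' or 3' end of the barcode.

    Coordinates are half-open [start, end) on the barcode reference
    (an assumption made for this sketch).
    """
    barcode_len = barcode_end - barcode_start
    overlap_start = max(read_start, barcode_start)
    overlap_end = min(read_end, barcode_end)
    overlap = max(0, overlap_end - overlap_start)
    touches_end = (overlap_start == barcode_start
                   or overlap_end == barcode_end)
    return touches_end and overlap > min_fraction * barcode_len


def barcode_percentages(assigned_barcodes):
    """Convert a list of per-read barcode assignments into percentages."""
    counts = Counter(assigned_barcodes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: 100.0 * n / total for barcode, n in counts.items()}
```

For example, with a 24-bp barcode placed at positions 10 to 34, a read spanning 0 to 25 would pass the filter, whereas a read spanning 15 to 30 would not, because it reaches neither end of the barcode.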
For the breast cohort study, the metastatic potential of cell line j targeting organ i was calculated as:
where ci is the total cancer cell number isolated from organ i and pj is the fractional proportion of cell line j estimated by barcode quantification, and n is the number of replicates of mice. To identify features that associate with brain metastatic potential, a 2-class comparison method was used (Ritchie et al., Nucleic Acids Res. 43: e47 (2015), the contents of which are incorporated herein by reference in their entirety). The analysis was performed on mutation, copy number, metabolite (available at https://portals.broadinstitute.org/ccle/), and CRISPR-gene dependency (CERES scores, available at https://depmap.org/portal/) separately. Copy number data were binarized using a cutoff of <=−1 (loss) and >=1 (gain).
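The equation for metastatic potential is not reproduced in the text above. One form consistent with the stated definitions would be the mean, over the n replicate mice, of the product of the total cancer cell number from organ i and the barcode-derived fraction of cell line j. The following sketch implements that assumed form; the function name and argument layout are hypothetical.

```python
def metastatic_potential(cell_counts, fractions):
    """Mean inferred cell number of one cell line in one organ.

    cell_counts[k]: total cancer cells isolated from the organ in mouse k (c_i)
    fractions[k]:   fraction of the cell line in that sample (p_j)

    Assumed form: (1/n) * sum_k cell_counts[k] * fractions[k].
    """
    if len(cell_counts) != len(fractions):
        raise ValueError("one fraction is required per replicate mouse")
    n = len(cell_counts)
    if n == 0:
        raise ValueError("at least one replicate is required")
    return sum(c * p for c, p in zip(cell_counts, fractions)) / n
```

Under this assumption, two mice with 1000 and 2000 cancer cells in an organ, of which the cell line makes up 50% and 25% respectively, would give an inferred value of 500 cells.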
Cancer Transcriptomic Analysis from RNA-Seq of Metastases
Potential mouse contaminating reads were removed by competitive mapping to the human/mouse hybrid genome using BBSplit (https://sourceforge.net/projects/bbmap/). Reads that uniquely mapped to the human genome were then used as input for mapping and gene-level counting with the RSEM package (Li et al., BMC Bioinformatics, 12: 323 (2011), the contents of which are incorporated herein by reference in their entirety). Gene count estimates were normalized using the TMM method (Robinson et al., Bioinformatics 26: 139-40 (2010), the contents of which are incorporated herein by reference in their entirety). For differential analysis, to properly account for the cancer cell composition differences in each in vivo sample, an in silico modeled in vitro mixture was generated first. For each in silico metastasis model, the estimated expression g of gene i is computed as a weighted average of the cell lines present in the corresponding in vivo sample:
$\hat{g}_i=\sum_{j=1}^{M} g_{i,j}\,p_j$, where $g_{i,j}$ is the baseline in vitro expression of gene i in cell line j, $p_j$ is the fractional proportion of cell line j in the in vivo sample, as estimated by barcode quantification, and M is the number of cell lines present in the in vivo sample. The in vivo and in silico counterparts were then compared using a paired design for each organ in voom-limma (Ritchie et al.). The three studies, pilot, group 1, and group 2, were analyzed separately. Overlap significance testing of two-set or multi-set intersections was performed using the cpsets function in the SuperExactTest package (Wang et al., Sci. Rep. 5: 16923 (2015), the contents of which are incorporated herein by reference in their entirety). Gene set enrichment analysis (GSEA) was performed using the GSEA-preranked method in the GSEA package (Subramanian et al., Proc. Natl. Acad. Sci. USA 102: 15545-50 (2005), the contents of which are incorporated herein by reference in their entirety). ssGSEA signature projection was performed in GenePattern (genepattern.broadinstitute.org) (Barbie et al., Nature 462: 108-12 (2009), the contents of which are incorporated herein by reference in their entirety). Gene signature data sets were from MSigDB (software.broadinstitute.org/gsea/msigdb/).
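The weighted-average model for the in silico mixture can be sketched as follows. The nested-dictionary data layout, the function name, and the cell line and gene labels in the example are hypothetical and chosen only for illustration.

```python
def in_silico_expression(in_vitro, proportions):
    """Weighted average of in vitro expression across cell lines.

    in_vitro:    {cell_line: {gene: baseline in vitro expression}}
    proportions: {cell_line: fractional proportion in the in vivo sample}

    Returns {gene: sum over cell lines of expression * proportion}.
    """
    mixture = {}
    for cell_line, proportion in proportions.items():
        for gene, expression in in_vitro[cell_line].items():
            mixture[gene] = mixture.get(gene, 0.0) + expression * proportion
    return mixture


# Hypothetical two-line mixture: 75% LINE_A and 25% LINE_B.
example = in_silico_expression(
    {"LINE_A": {"GENE1": 10.0, "GENE2": 0.0},
     "LINE_B": {"GENE1": 2.0, "GENE2": 8.0}},
    {"LINE_A": 0.75, "LINE_B": 0.25},
)
```

Each modeled profile would then be paired with its matching in vivo sample for the differential comparison.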
SREBF1 ChIP-Seq peak data were from ENCODE (www.encodeproject.org/) (Consortium et al., Nature 489, 57-74 (2012), the contents of which are incorporated herein by reference in their entirety) and analyzed using ChIPseeker (Yu et al., Bioinformatics 31: 2382-83 (2015), the contents of which are incorporated herein by reference in their entirety).
All PRISM cell lines were initially obtained from CCLE. Cell lines were adapted to the same culture condition in phenol red-free RPMI1640 media (ThermoFisher Scientific), and barcoded as previously described (Yu et al., Nat. Biotechnol. 34: 419-23 (2016), the contents of which are incorporated herein by reference in their entirety). PRISM cell lines were pooled based on their in vitro doubling speed bins, at equal numbers, in the format of 25 lines per pool. Cells were thawed and recovered for 48 hours prior to in vivo injection. To form the large pool of 498 cell lines, 20 PRISM pools were mixed at equal total number right before injection.
After the in vivo experiments, organs were subjected to tissue dissociation and mouse stroma depletion, and the dissociated cell pellets were frozen at −80° C. as discussed above. The pellets (<=50 mg dry weight) were lysed in 200 μL freshly prepared lysis buffer (with proteinase K), heat digested at 60° C., and denatured at 95° C. for 10 minutes. 20 μL of lysate was used for barcode amplification per 100 μL PCR volume (multiple technical replicates per sample). PCR was performed using the following conditions: 95° C. for 3 minutes; 98° C. for 20 seconds, 57° C. for 15 seconds, 72° C. for 10 seconds (30 cycles); 72° C. for 5 minutes; 4° C. stop. PCR libraries (technical replicates combined) were quantified using a 2100 Bioanalyzer (Agilent), normalized, pooled, and gel-purified using the QIAquick Gel Extraction Kit (Qiagen). Purified samples were quantified, and 2 nM of libraries with 25% spike-in PhiX DNA were sequenced on Illumina MiSeq or HiSeq at 800 K/mm2 cluster density.
De-multiplexed sequencing reads were mapped to the barcode reference to generate a table of cell line barcode counts for each sample/condition. Library-size normalized read counts for each sample were used for calculation of relative metastatic potential. Relative metastatic potential of cell line j targeting organ i, rMi,j was defined as:
where ci,j is the read count of cell line j from organ i, pj is the read count of cell line j from the pre-injected population, n (n=4 to 5) is the number of replicate mice, and m (m=4 to 5) is the number of replicates of the pre-injected population. Confidence intervals reflecting animal variance were calculated by bootstrapping.
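The equation for relative metastatic potential is not reproduced in the text above. One form consistent with the stated definitions would be the mean normalized read count across the n replicate mice divided by the mean normalized read count across the m pre-injection replicates. The sketch below implements that assumed ratio together with a percentile bootstrap that resamples animals only; the function names, the number of bootstrap iterations, and the 95% level are assumptions rather than details given in the text.

```python
import random
from statistics import mean


def relative_metastatic_potential(organ_counts, preinjection_counts):
    """Assumed form: mean organ read count / mean pre-injection read count."""
    denominator = mean(preinjection_counts)
    if denominator == 0:
        raise ValueError("cell line not detected in pre-injected population")
    return mean(organ_counts) / denominator


def bootstrap_ci(organ_counts, preinjection_counts,
                 n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval obtained by resampling mice
    with replacement (pre-injection replicates are held fixed)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(organ_counts) for _ in organ_counts]
        estimates.append(
            relative_metastatic_potential(resample, preinjection_counts))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Resampling at the level of animals is intended to mirror the statement that the confidence intervals reflect animal variance; other bootstrap schemes would also be compatible with the description.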
CRISPR/Cas9 versions of cell lines were generated by infecting luciferized cells with Cas9-Blast lentivirus and selecting in 5 μg/mL Blasticidin for 10 days with continuous passaging until non-infected controls were killed. For the pooled in vivo screen, JIMT1-Cas9 cells were infected with a CRISPR guide library (Table 3) in an arrayed fashion in 6-well plates and selected in 2 μg/mL Puromycin for 4 days. At this time, non-infected controls were killed, and no growth defect was observed in the perturbed cell lines. After antibiotic selection, cells were pooled and subjected to intracranial injection at 6e4 cells per animal in 1 of PBS. This was equivalent to 1e3 cells per guide on average per animal. Intracranial growth was allowed to progress for 4 weeks, and brain tissues were processed following the workflow of the PRISM in vivo assay, except that guides were amplified using primers targeting the guide vector. De-multiplexed sequencing reads were mapped to the guide reference to generate a table of barcode counts for each guide for each sample. Sequencing depth was normalized using the upper quartile method, and relative depletion was quantitated using a linear model in limma. For individual gene validation (
Protein lysates were prepared in RIPA Lysis Buffer (ThermoFisher Scientific)+cOmplete Mini EDTA-free Protease Inhibitor Cocktail (Roche). Western blot was performed using NuPAGE gel (ThermoFisher Scientific)+Wet/Tank Blotting (Bio-Rad)+Odyssey detection system (LI-COR). SREBF1 primary antibodies (14088-1-AP, Proteintech), GAPDH (D16H11) XP® Rabbit mAb (Cell Signaling), and IRDye® 800CW Goat anti-Mouse IgG, IRDye® 680RD Goat anti-Rabbit IgG secondary antibodies (LI-COR) were used.
JIMT1 luciferized cells were infected with Cas9-Blast lentivirus (Sanjana et al., Nat. Methods 11: 783-84 (2014), the contents of which are incorporated herein by reference in their entirety) and selected in Blasticidin (5 μg/mL) for 10 days with continuous passaging until non-infected controls were all killed. JIMT1-Cas9 cells were then infected with lentiGuide-Puro virus encoding SREBF1-targeting (ACAGGGGTGGAGCTGAACTG) or non-targeting (CTCCGTTATGTGGCATGAGA) guides. Infected cells were selected in Blasticidin (5 μg/mL)+Puromycin (2 μg/mL) for 4 days until non-infected controls were all killed. Knockout was confirmed by western blot 10 days after infection. Protein lysates were prepared in Cell Lysis Buffer (Cell Signaling) plus cOmplete Mini EDTA-free Protease Inhibitor Cocktail (Roche). Western blot was performed using NuPAGE gel (ThermoFisher Scientific) plus iBlot 2 transfer (ThermoFisher Scientific) plus Odyssey detection system (LI-COR). SREBF1 primary antibodies (sc-17755, sc-365513, Santa Cruz), GAPDH (D16H11) XP® Rabbit mAb (Cell Signaling), and IRDye® 800CW Goat anti-Mouse IgG, IRDye® 680RD Goat anti-Rabbit IgG secondary antibodies (LI-COR) were used.
Tumor sphere assay was performed in Aggrewell400 24-well plates, according to manufacturer's instructions (StemCell Technologies). Each well contains approximately 1200 micro-wells. Cells were seeded at a density of 4000 cells/well, corresponding to 1-3 cells per micro-well. At the end point, tumor spheres were imaged and quantified using IncuCyte S3 System (EssenBioscience), using whole-well imaging modality.
METABRIC, TCGA, and MSK targeted sequencing breast cancer datasets were downloaded from cBioPortal. The EMC-MSK dataset including 615 primary tumors (GSE2034, GSE2603, GSE5327, GSE12276), and the 65 metastasis sample dataset (GSE14020) were collected and processed as previously described (Zhang, X. H. et al., Cell 154, 1060-1073, (2013), the contents of which are incorporated by reference in their entirety). Paired primary breast tumor and brain metastasis RNA-Seq was available from Vareslija et al. To exclude the confounding effect of brain stroma contamination in this dataset, a contamination indicator generated from GSE52604 was applied, and the contaminating effect was regressed out, generating a corrected gene matrix. PI3K-response signatures were from Gatza et al. and Creighton et al., respectively. Signature analysis was conducted as described (Malladi, S. et al., Cell 165, 45-60, (2016), the contents of which are incorporated by reference in their entirety). Hierarchical clustering and heatmap generation were performed using the gplots package. Log-rank tests of survival curve difference were calculated using the survival package. A multivariate Cox proportional hazards model was built using the coxph function (
In some embodiments, the steps of the methodologies and analysis provided herein can be implemented and/or supplemented through the use of computing devices. Any suitable computing device can be used to implement the computing devices and methods/functionality described herein and be converted to a specific system for performing the operations and features described herein through modification of hardware, software, and firmware, in a manner significantly more than mere execution of software on a generic computing device, as would be appreciated by those of skill in the art. One illustrative example of such a computing device 1500 is depicted in
The computing device 1500 can include a bus 1510 that can be coupled to one or more of the following illustrative components, directly or indirectly: a memory 1512, one or more processors 1514, one or more presentation components 1516, input/output ports 1518, input/output components 1520, and a power supply 1524. One of skill in the art will appreciate that the bus 1510 can include one or more busses, such as an address bus, a data bus, or any combination thereof. One of skill in the art additionally will appreciate that, depending on the intended
The computing device 1500 can include or interact with a variety of computer-readable media. For example, computer-readable media can include Random Access Memory (RAM); Read Only Memory (ROM); Electronically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVD) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices that can be used to encode information and can be accessed by the computing device 1500.
The memory 1512 can include computer-storage media in the form of volatile and/or nonvolatile memory. The memory 1512 may be removable, non-removable, or any combination thereof. Exemplary hardware devices are devices such as hard drives, solid-state memory, optical-disc drives, and the like. The computing device 1500 can include one or more processors that read data from components such as the memory 1512, the various I/O components 1520, etc. Presentation component(s) 1516 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
The I/O ports 1518 can enable the computing device 1500 to be logically coupled to other devices, such as I/O components 1520. Some of the I/O components 1520 can be built into the computing device 1500. Examples of such I/O components 1520 include a microphone, joystick, recording device, game pad, satellite dish, scanner, printer, wireless device, networking device, and the like.
From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.
The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.
All patents and publications mentioned in this specification are herein incorporated by reference to the same extent as if each independent patent and publication was specifically and individually indicated to be incorporated by reference.
This application claims the benefit of U.S. Provisional Application No. 62/837,525, filed Apr. 23, 2019, the entire contents of which are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/029584 | 4/23/2020 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62837525 | Apr 2019 | US |