The molecular pathways for cellular processes such as metabolism, energy production, and signal transduction have been described in some detail. In contrast, the transcriptional circuitries that control the gene expression programs that define cell identity have yet to be mapped in most cells. For such mapping, it is essential to identify the set of key transcription factors that are responsible for control of cell identity and to determine how they function together to regulate cell-type-specific gene expression programs.
In some aspects, the disclosure provides a method of identifying the core regulatory circuitry of a cell or tissue, comprising: a) identifying a group of transcription factor encoding genes in a cell or tissue which are associated with a super-enhancer; b) determining which transcription factor encoding genes identified in a) comprise autoregulated transcription factor encoding genes, wherein a transcription factor encoding gene identified in a) comprises an autoregulated transcription factor encoding gene if the transcription factor encoded by the transcription factor encoding gene is predicted to bind to the super-enhancer associated with the transcription factor encoding gene; and c) identifying the core regulatory circuitry of the cell or tissue, wherein the core regulatory circuitry of the cell or tissue comprises autoregulated transcription factor encoding genes identified in b) which form an interconnected autoregulatory loop, wherein the autoregulated transcription factor encoding genes identified in b) form an interconnected autoregulatory loop if each transcription factor encoded by an autoregulated transcription factor encoding gene identified in b) is predicted to bind to the super-enhancer associated with each of the other autoregulated transcription factor encoding genes identified in b).
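Steps a) through c) above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the `predicted_binders` mapping (gene name to the set of transcription factors predicted to bind the super-enhancer associated with that gene), and the toy gene names are all hypothetical. Selecting the largest fully interconnected set by brute-force enumeration is also an assumption made here for clarity; it is only practical for small candidate sets.

```python
from itertools import combinations

def autoregulated(se_tf_genes, predicted_binders):
    """Step b: keep genes whose encoded TF is predicted to bind the
    super-enhancer associated with its own gene."""
    return {g for g in se_tf_genes if g in predicted_binders.get(g, set())}

def is_interconnected(genes, predicted_binders):
    """True if each TF is predicted to bind the super-enhancer associated
    with each of the other genes in the set."""
    return all(a in predicted_binders.get(b, set())
               for a in genes for b in genes if a != b)

def core_regulatory_circuitry(se_tf_genes, predicted_binders):
    """Step c: largest set of autoregulated genes forming an interconnected
    autoregulatory loop (ties broken by sorted order; an assumption)."""
    candidates = sorted(autoregulated(se_tf_genes, predicted_binders))
    for size in range(len(candidates), 0, -1):
        for subset in combinations(candidates, size):
            if is_interconnected(subset, predicted_binders):
                return set(subset)
    return set()

# Hypothetical toy input: step a) output plus motif-based binding predictions.
se_tf_genes = {"TF_A", "TF_B", "TF_C", "TF_D", "TF_E"}
predicted_binders = {
    "TF_A": {"TF_A", "TF_B", "TF_C"},
    "TF_B": {"TF_A", "TF_B", "TF_C"},
    "TF_C": {"TF_A", "TF_B", "TF_C", "TF_D"},
    "TF_D": {"TF_D"},   # autoregulated, but not bound by the other TFs
    "TF_E": {"TF_A"},   # not autoregulated
}
crc = core_regulatory_circuitry(se_tf_genes, predicted_binders)
print(sorted(crc))  # ['TF_A', 'TF_B', 'TF_C']
```

In this toy input, TF_D passes step b) but is excluded in step c) because the other factors are not predicted to bind its super-enhancer.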
In some embodiments, the core regulatory circuitry comprises the autoregulated transcription factor encoding genes forming the interconnected autoregulatory loop, the transcription factors encoded by the autoregulated transcription factor encoding genes, a super-enhancer associated with the autoregulated transcription factor encoding genes, or a component of the super-enhancer.
In some embodiments, the method further includes d) determining at least one target of at least one transcription factor encoded by at least one autoregulated transcription factor encoding gene. In some embodiments, the at least one target of the at least one transcription factor encoded by the at least one autoregulated transcription factor encoding gene comprises a gene which encodes a reprogramming factor or a cell identity gene. In some embodiments, the transcription factor encoded by the transcription factor encoding gene is predicted to bind to the super-enhancer associated with the transcription factor encoding gene if the super-enhancer associated with the transcription factor encoding gene comprises at least one DNA sequence motif predicted for the transcription factor encoded by the transcription factor encoding gene. In some embodiments, each transcription factor encoded by the autoregulated transcription factor encoding gene is predicted to bind to the super-enhancer associated with each of the other autoregulated transcription factor encoding genes if the super-enhancers associated with each of the other autoregulated transcription factor encoding genes comprise at least one DNA sequence motif predicted for each of the transcription factors encoded by each of the other autoregulated transcription factor encoding genes.
In some embodiments, the at least one DNA sequence motif is located between 500 bp upstream and 500 bp downstream of the super-enhancer associated with the transcription factor encoding gene.
In some embodiments, the cell comprises a) a blood cell selected from the group consisting of a CD14+ monocyte, a CD56+ monocyte, a CD4+ T cell, a CD3+ T cell, a CD4+ primary T cell, a CD4+ memory T cell, a CD4+ naïve T cell, a CD4+ CD127+ T cell, a CD8+ primary T cell, a CD8+ memory T cell, a CD8+ naïve T cell, a CD19+ B cell, a CD20+ B cell, and a CD34+ HSC cell; b) a brain cell selected from the group consisting of astrocytes, glial cells, and neurons; c) a fibroblast selected from the group consisting of dermal fibroblast and fibroblast; d) skeletal myoblasts; e) a colon crypt; f) an embryonic stem cell; g) a hepatocyte; h) a tumor cell; i) a keratinocyte; j) a macrophage; k) lymphocytes; l) regulatory T cells (Tregs); m) NK cells; n) pancreatic beta cells; o) cardiac muscle cells; p) nerve cells; and q) chondrocytes.
In some embodiments, the tissue comprises a) brain tissue selected from the group consisting of brain hippocampus, brain inferior temporal lobe, brain angular gyrus, and brain mid frontal lobe; b) internal tissue selected from the group consisting of spleen, bladder, mammary epithelium, adipose, ovarian, adrenal gland, pancreatic, and lung; d) thymus; e) muscle tissue selected from the group consisting of skeletal muscle, psoas muscle, duodenum smooth muscle, and stomach smooth muscle; f) heart tissue selected from the group consisting of right ventricle, aorta, left ventricle, and right atrium; g) digestive tissue selected from the group consisting of esophagus, gastric, sigmoid colon, and small intestine; and h) tumor tissue.
In some aspects, the disclosure provides a method of identifying the cell identity program of a cell or tissue, comprising a) identifying the core regulatory circuitry of a cell or tissue of interest, wherein the core regulatory circuitry of the cell or tissue of interest comprises at least one autoregulated transcription factor encoding gene associated with a super-enhancer in the cell or tissue of interest, at least one transcription factor encoded by the at least one autoregulated transcription factor encoding gene, at least one super-enhancer associated with the at least one autoregulated transcription factor encoding gene, and optionally at least one component of the super-enhancer; and b) identifying the cell identity program of the cell or tissue, wherein the cell identity program of the cell or tissue comprises the core regulatory circuitry identified in a) and at least one target of the at least one transcription factor encoded by the at least one autoregulated transcription factor encoding gene in the core regulatory circuitry.
In some embodiments, the at least one target comprises a gene comprising at least one enhancer element predicted to be bound by the at least one transcription factor. In some embodiments, the at least one enhancer element predicted to be bound by the at least one transcription factor comprises a DNA sequence motif associated with a super-enhancer.
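The relationship between the core regulatory circuitry, its targets, and the cell identity program can be sketched as follows. This is an illustrative sketch only: `enhancer_binders` (gene name to the set of transcription factors with a predicted binding motif in an enhancer element of that gene) and all gene names are hypothetical placeholders, and a gene counts as a target here if at least one circuitry factor is predicted to bind one of its enhancer elements.

```python
def cell_identity_program(crc_tfs, enhancer_binders):
    """Combine the core regulatory circuitry with its predicted targets.

    A target is any gene outside the circuitry whose enhancer elements are
    predicted to be bound by at least one circuitry TF.
    """
    crc_tfs = set(crc_tfs)
    targets = {
        gene
        for gene, binders in enhancer_binders.items()
        if gene not in crc_tfs and binders & crc_tfs
    }
    return {"core_regulatory_circuitry": crc_tfs, "targets": targets}

# Hypothetical toy input.
crc_tfs = {"TF_A", "TF_B"}
enhancer_binders = {
    "GENE_X": {"TF_A"},           # bound by a circuitry TF: target
    "GENE_Y": {"TF_Z"},           # bound only by a non-circuitry TF
    "GENE_W": {"TF_B", "TF_Z"},   # bound by a circuitry TF: target
    "TF_A": {"TF_A", "TF_B"},     # circuitry member, not listed as a target
}
program = cell_identity_program(crc_tfs, enhancer_binders)
print(sorted(program["targets"]))  # ['GENE_W', 'GENE_X']
```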
In some aspects, the disclosure provides a method of modulating the identity of a cell, comprising modulating at least one component of a cell identity program of the cell. In some embodiments, the at least one component of the cell identity program in the cell comprises the core regulatory circuitry of the cell or at least one target modulated by the at least one component of the core regulatory circuitry of the cell. In some embodiments, the modulating the at least one component of the cell identity program in the cell comprises contacting the cell with an agent that modulates at least one component of the cell identity program of the cell.
In some embodiments, the cell comprises a cell listed in Table 2 and the at least one component of the cell identity program comprises at least one component listed in Table 2 selected from the group consisting of (i) at least one gene encoding a master transcription factor, (ii) the master transcription factor encoded by the at least one gene, (iii) a target of the master transcription factor, and (iv) at least one super-enhancer associated with any of (i)-(iii), or at least one component of the super-enhancer.
In some embodiments, the method further includes (i) modulating at least two components of the cell identity program in the cell, (ii) modulating at least three components of the cell identity program in the cell, (iii) modulating at least four components of the cell identity program in the cell, or (iv) modulating at least five components of the cell identity program in the cell. In some embodiments, the method further includes (i) modulating at least one component of the core regulatory circuitry in the cell and at least one target of a master transcription factor in the core regulatory circuitry; (ii) modulating at least two components of the core regulatory circuitry in the cell and at least two targets of a master transcription factor in the core regulatory circuitry; (iii) modulating at least three components of the core regulatory circuitry in the cell and at least three targets of a master transcription factor in the core regulatory circuitry; (iv) modulating at least four components of the core regulatory circuitry in the cell and at least four targets of a master transcription factor in the core regulatory circuitry; and (v) modulating at least five components of the core regulatory circuitry in the cell and at least five targets of a master transcription factor in the core regulatory circuitry of the cell.
In some aspects, the disclosure provides a method of diagnosing a cell identity program-related disorder comprising determining whether the cell identity program of the cell or tissue is enriched for disease-associated variations. In some embodiments, the determining comprises: a) obtaining a sample comprising a cell or tissue of interest; and b) detecting the presence of disease-associated variations in components of the cell identity program of the cell or tissue of interest, wherein the cell identity program of the cell or tissue is enriched for disease-associated variations if at least two disease-associated variations are detected in the components of the cell identity program of the cell or tissue of interest.
In some embodiments, the cell identity program of the cell or tissue is enriched for disease-associated variations if (i) at least three; (ii) at least four; (iii) at least five; or (iv) at least six disease-associated variations are detected in the components of the cell identity program of the cell or tissue of interest. In some embodiments, the disease-associated variations comprise GWAS variants. In some embodiments, the disease-associated variations comprise GWAS variants in a super-enhancer associated with the core regulatory circuitry in the cell or tissue of interest selected from the group consisting of (i) at least one gene encoding a master transcription factor, (ii) the master transcription factor encoded by the at least one gene, or (iii) at least one target of the master transcription factor. In some embodiments, the GWAS variant is selected from the group consisting of (i) a GWAS variant from Alzheimer disease present in the cell identity program of brain hippocampus; (ii) a GWAS variant from systemic lupus erythematosus present in the cell identity program of CD20 cells; (iii) a GWAS variant from fasting insulin trait present in the cell identity program of adipose nuclei; (iv) a GWAS variant from ulcerative colitis present in the cell identity program of sigmoid colon; and (v) a GWAS variant from electrocardiographic traits present in the cell identity program of left ventricle.
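The count-based enrichment criterion described above (at least two, or in some embodiments at least three to six, disease-associated variations detected in components of the cell identity program) can be sketched as follows. This is a hedged illustration: representing components as genomic intervals `(chrom, start, end)` with half-open coordinates and variants as `(chrom, position)` pairs is an assumption made here, as are the function names and the toy coordinates. The sketch implements the threshold count only and is not a statistical test.

```python
def variants_in_components(variants, components):
    """Return the variants that fall inside any component interval.

    components: iterable of (chrom, start, end), half-open [start, end)
    variants:   iterable of (chrom, position)
    """
    return [
        (chrom, pos)
        for chrom, pos in variants
        if any(c == chrom and start <= pos < end for c, start, end in components)
    ]

def is_enriched(variants, components, min_variants=2):
    """True if at least `min_variants` variants are detected in the components."""
    return len(variants_in_components(variants, components)) >= min_variants

# Hypothetical toy input.
components = [("chr1", 1000, 2000), ("chr2", 500, 900)]
variants = [("chr1", 1500), ("chr1", 2500), ("chr2", 600), ("chr3", 100)]

print(variants_in_components(variants, components))  # [('chr1', 1500), ('chr2', 600)]
print(is_enriched(variants, components))                  # True
print(is_enriched(variants, components, min_variants=3))  # False
```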
In some aspects, the disclosure provides a method of treating a cell identity program-related disorder in a subject in need thereof, comprising modulating at least one abnormal component of a cell identity program in a diseased cell or tissue of the subject.
In some embodiments, modulating at least one abnormal component of the cell identity program in the diseased cell or tissue of the subject comprises administering to the subject an effective amount of an agent that modulates the at least one abnormal component of the cell identity program. In some embodiments, the agent is selected from the group consisting of small organic or inorganic molecules; saccharides; oligosaccharides; polysaccharides; a biological macromolecule selected from the group consisting of peptides, proteins, peptide analogs and derivatives; peptidomimetics; nucleic acids selected from the group consisting of siRNAs, shRNAs, antisense RNAs, ribozymes, and aptamers; an extract made from biological materials selected from the group consisting of bacteria, plants, fungi, animal cells, and animal tissues; naturally occurring or synthetic compositions; and any combination thereof. In some embodiments, the diseased cell or tissue comprises a tumor cell or tissue. In some embodiments, the diseased cell or tissue comprises a cell or tissue listed in Table 2, and the abnormal component comprises at least one component of the cell identity program of the cell listed in Table 2 selected from the group consisting of (i) a gene encoding a master transcription factor, (ii) the master transcription factor encoded by the gene, (iii) a target of the master transcription factor, and (iv) a super-enhancer associated with any of (i)-(iii), or a component of the super-enhancer.
In some embodiments, the method further includes diagnosing the subject as having the cell identity program-related disorder.
In some aspects, the disclosure provides a method of reprogramming a cell of a first cell type to a cell of a second cell type, the method comprising modulating at least one component of the core regulatory circuitry of the second cell type in the cell of the first cell type.
In some embodiments, (i) the at least one component comprises a transcriptional repressor or transcriptional co-repressor and modulating comprises repressing the at least one component; and/or (ii) the at least one component comprises a transcriptional activator or transcriptional co-activator and modulating comprises activating the at least one component. In some embodiments, activating the at least one component comprises (i) expressing the at least one component of the core regulatory circuitry of the second cell type in the cell of the first type; (ii) introducing the at least one component of the core regulatory circuitry of the second cell type into the cell of the first type; (iii) contacting the cell with an agent that activates expression of the at least one component of the core regulatory circuitry of the second cell type in the cell of the first type; and (iv) any combination of (i)-(iii). In some embodiments, modulating (e.g., activating) the at least one component of the core regulatory circuitry of the second cell type in the cell of the first type occurs ex vivo. In some embodiments, modulating (e.g., repressing) the at least one component of the core regulatory circuitry of the second cell type in the cell of the first type occurs ex vivo.
In some embodiments, modulating (e.g., activating) the at least one component of the core regulatory circuitry of the second cell type in the cell of the first type occurs in vivo. In some embodiments, modulating (e.g., repressing) the at least one component of the core regulatory circuitry of the second cell type in the cell of the first type occurs in vivo.
In some embodiments, the method includes inhibiting at least one component of the core regulatory circuitry of the first cell type. In some embodiments, the (i) cell of the first cell type comprises the core regulatory circuitry of a diseased cell, and the cell of the second cell type comprises the core regulatory circuitry of a normal cell; (ii) cell of the first cell type comprises the core regulatory circuitry of a terminally differentiated cell, and the cell of the second cell type comprises the core regulatory circuitry of a less differentiated cell; (iii) cell of the first cell type comprises the core regulatory circuitry of a first somatic cell type, and the cell of the second cell type comprises the core regulatory circuitry of a second somatic cell type; (iv) cell of the first cell type comprises the core regulatory circuitry of a somatic cell, and the cell of the second cell type comprises the core regulatory circuitry of an embryonic cell; (v) cell of the first cell type comprises the core regulatory circuitry of a first tissue type, and the cell of the second type comprises the core regulatory circuitry of a second tissue type; (vi) cell of the first cell type comprises the core regulatory circuitry of a skin or fat cell, and the cell of the second cell type comprises the core regulatory circuitry of a tissue; and (vii) cell of the first cell type comprises the core regulatory circuitry of a tumor cell or tissue, and the cell of the second cell type comprises the core regulatory circuitry of a healthy cell or tissue.
In some aspects, the disclosure provides a method of identifying a candidate modulator of at least one component of the core regulatory circuitry of a cell or tissue, comprising: a) contacting a cell or tissue with a test agent; and b) assessing the ability of the test agent to modulate at least one component of the core regulatory circuitry of the cell or tissue, wherein the test agent is identified as a candidate modulator of the at least one component of the core regulatory circuitry of the cell or tissue if the at least one component of the core regulatory circuitry is activated or inhibited in the presence of the test agent.
In some embodiments, the at least one component of the core regulatory circuitry of the cell or tissue comprises a reprogramming factor or a cell identity gene. In some embodiments, the at least one component of the core regulatory circuitry of the cell or tissue comprises a disease-associated variant.
In some aspects, the disclosure provides a method of reprogramming a cell comprising contacting the cell with the candidate modulator identified according to a method described herein. In some embodiments, at least one component of the core regulatory circuitry of the cell comprises a disease-associated variant. In some embodiments, contacting occurs in vivo or ex vivo.
In some aspects, the disclosure provides a method of identifying a candidate modulator of at least one component of the cell identity program of a cell or tissue, comprising: a) contacting a cell or tissue with a test agent; and b) assessing the ability of the test agent to modulate at least one component of the cell identity program of the cell or tissue, wherein the test agent is identified as a candidate modulator of the at least one component of the cell identity program of the cell or tissue if the at least one component of the cell identity program of the cell or tissue is activated or inhibited in the presence of the test agent.
In some embodiments, the at least one component of the cell identity program of the cell or tissue comprises a reprogramming factor or a cell identity gene. In some embodiments, the at least one component of the cell identity program of the cell or tissue comprises a disease-associated variant.
In some aspects, the disclosure provides a method of reprogramming a cell comprising contacting the cell with the candidate modulator identified according to a method described herein. In some embodiments, at least one component of the core regulatory circuitry of the cell comprises a disease-associated variant. In some embodiments, contacting occurs in vivo or ex vivo.
In some aspects, the disclosure provides a method of identifying a target for drug discovery comprising identifying a variation in at least one component of the core regulatory circuitry of a cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects, wherein the at least one component of the core regulatory circuitry of the cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects comprises a disease-associated variant, and wherein the disease-associated variant is a target for drug discovery.
In some aspects, the disclosure provides a method of identifying a target for drug discovery comprising identifying a variation in at least one component of the cell identity program of a cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects, wherein the at least one component of the cell identity program of the cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects comprises a disease-associated variant, and wherein the disease-associated variant is a target for drug discovery.
In some embodiments, the target for drug discovery comprises a target for diagnostic purposes.
In some aspects, the disclosure provides a method of identifying a target for anti-cancer drug discovery comprising: a) comparing the core regulatory circuitry of a tumor cell or tissue with the core regulatory circuitry of a corresponding non-tumor cell or tissue; and b) identifying at least one component that differs between the core regulatory circuitry of the tumor cell or tissue and the corresponding non-tumor cell or tissue, wherein the at least one component that differs between the core regulatory circuitry of the tumor cell or tissue and the corresponding non-tumor cell or tissue is identified as a target for anti-cancer drug discovery.
In some embodiments, a gene regulated by the at least one component is identified as a target for anti-cancer drug discovery. In some embodiments, the at least one component differs in sequence, expression, and/or activity.
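The comparison in steps a) and b) above can be sketched as a simple difference between two circuitry descriptions. This is an illustrative sketch under assumptions: each circuitry is represented here as a hypothetical mapping from component name to a dictionary of `sequence`, `expression`, and `activity` labels, and two components are treated as differing if one is absent from the other circuitry or if any label is unequal. How such labels would be measured is not addressed here.

```python
ATTRIBUTES = ("sequence", "expression", "activity")

def differing_components(tumor_crc, normal_crc):
    """Return {component: reasons} for components that differ between the
    tumor circuitry and the corresponding non-tumor circuitry."""
    diffs = {}
    for name in sorted(set(tumor_crc) | set(normal_crc)):
        if name not in normal_crc:
            diffs[name] = ["tumor-only"]
        elif name not in tumor_crc:
            diffs[name] = ["non-tumor-only"]
        else:
            changed = [k for k in ATTRIBUTES
                       if tumor_crc[name].get(k) != normal_crc[name].get(k)]
            if changed:
                diffs[name] = changed
    return diffs

# Hypothetical toy input; labels are placeholders, not measured data.
normal_crc = {
    "TF_A": {"sequence": "ref", "expression": "high", "activity": "normal"},
    "TF_B": {"sequence": "ref", "expression": "high", "activity": "normal"},
}
tumor_crc = {
    "TF_A": {"sequence": "ref", "expression": "high", "activity": "normal"},
    "TF_B": {"sequence": "variant", "expression": "high", "activity": "normal"},
    "TF_C": {"sequence": "ref", "expression": "high", "activity": "normal"},
}
candidates = differing_components(tumor_crc, normal_crc)
print(candidates)  # {'TF_B': ['sequence'], 'TF_C': ['tumor-only']}
```

Each key in the result would correspond to a candidate target in the sense of step b); genes regulated by those components could be added in a further step.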
In some aspects, the disclosure provides a method of identifying an anti-cancer agent comprising identifying a modulator of the target for anti-cancer drug discovery identified according to a method described herein.
In some aspects, the disclosure provides a method of treating a cancer characterized by tumor cell or tissue comprising the target for anti-cancer drug discovery, comprising administering to a subject suffering from the cancer an effective amount of the anti-cancer agent identified according to a method described herein.
The practice of the present invention will typically employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant nucleic acid (e.g., DNA) technology, immunology, and RNA interference (RNAi) which are within the skill of the art. Non-limiting descriptions of certain of these techniques are found in the following publications: Ausubel, F., et al., (eds.), Current Protocols in Molecular Biology, Current Protocols in Immunology, Current Protocols in Protein Science, and Current Protocols in Cell Biology, all John Wiley & Sons, N.Y., edition as of December 2008; Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2001; Harlow, E. and Lane, D., Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1988; Freshney, R. I., “Culture of Animal Cells, A Manual of Basic Technique”, 5th ed., John Wiley & Sons, Hoboken, N.J., 2005. Non-limiting information regarding therapeutic agents and human diseases is found in Goodman and Gilman's The Pharmacological Basis of Therapeutics, 11th Ed., McGraw Hill, 2005, Katzung, B. (ed.) Basic and Clinical Pharmacology, McGraw-Hill/Appleton & Lange; 10th ed. (2006) or 11th edition (July 2009). Non-limiting information regarding genes and genetic disorders is found in McKusick, V. A.: Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. Baltimore: Johns Hopkins University Press, 1998 (12th edition) or the more recent online database: Online Mendelian Inheritance in Man, OMIM™. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, Md.) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, Md.), as of May 1, 2010, World Wide Web URL: http://www.ncbi.nlm.nih.gov/omim/ and in Online Mendelian Inheritance in Animals (OMIA), a database of genes, inherited disorders and traits in animal species (other than human and mouse), at http://omia.angis.org.au/contact.shtml. All patents, patent applications, and other publications (e.g., scientific articles, books, websites, and databases) mentioned herein are incorporated by reference in their entirety. In case of a conflict between the specification and any of the incorporated references, the specification (including any amendments thereof, which may be based on an incorporated reference), shall control. Standard art-accepted meanings of terms are used herein unless indicated otherwise. Standard abbreviations for various terms are used herein.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
Aspects of the disclosure relate to methods of identifying the core regulatory circuitry and/or cell identity programs of cells or tissues, and related diagnostic, treatment, and screening methods involving the core regulatory circuitry and/or cell identity programs identified.
In embryonic stem cells and a few other cell types, master transcription factors (TFs) have been shown to function together in a core regulatory circuit (CRC) that controls the gene expression programs that define cell identity (Boyer et al., 2005; Lee and Young, 2011; Odom et al., 2006; Lien et al., 2002; Novershtern et al., 2011). In these CRCs, the master TFs regulate their own genes and other genes key to cell identity through their binding of the super-enhancers associated with those genes (Whyte et al., 2013; Hnisz et al., 2013). Work described herein exploits novel features of super-enhancers and TF binding site sequences for 43 cell types and tissues to construct models of CRCs for a broad spectrum of cell types throughout the human body. Cell Identity Program models for these cells, which consist of the master TFs forming the CRCs and their target genes, contain the vast majority of master TFs and reprogramming factors described for specific cell types in the literature and cluster according to known cell lineages. The work described herein also demonstrates that the master TFs in the CRCs have binding site sequences in the enhancers of the majority of cell identity genes that are expressed in each cell/tissue type. Surprisingly, the work described herein also demonstrates that the regulatory elements within the Cell Identity Program models are highly enriched in disease-associated sequence variation, and shows how tumor cells can modify the CRC to create gene expression programs associated with tumor pathology. These maps of core regulatory circuitry provide founding models to test and expand knowledge of regulatory circuitry, provide guidance for reprogramming studies, and should facilitate understanding of disease causality.
Accordingly, aspects of the disclosure relate to methods for identifying the core regulatory circuitry of a cell or tissue. In some aspects, a method of identifying the core regulatory circuitry of a cell or tissue comprises: a) identifying a group of transcription factor encoding genes in a cell or tissue which are associated with a super-enhancer; b) determining which transcription factor encoding genes identified in a) comprise autoregulated transcription factor encoding genes, wherein a transcription factor encoding gene identified in a) comprises an autoregulated transcription factor encoding gene if a transcription factor encoded by the transcription factor encoding gene is predicted to bind to a super-enhancer associated with the transcription factor encoding gene; and c) identifying the core regulatory circuitry of the cell or tissue, wherein the core regulatory circuitry of the cell or tissue comprises autoregulated transcription factor encoding genes identified in b) which form an interconnected autoregulatory loop, wherein the autoregulated transcription factor encoding genes identified in b) form an interconnected autoregulatory loop if each transcription factor encoded by an autoregulated transcription factor encoding gene identified in b) is predicted to bind to a super-enhancer associated with each of the other autoregulated transcription factor encoding genes identified in b). An exemplary embodiment of a method for identifying the core regulatory circuitry of a cell or tissue is depicted in
As used herein, the phrase “interconnected autoregulatory loop” refers to a network of autoregulated transcription factor encoding genes in which the transcription factor encoded by each gene is predicted to bind each of the super-enhancers associated with the other autoregulated transcription factor encoding genes in the network. The concept of an autoregulatory loop is depicted in
As used herein, the phrase “super-enhancer” refers to clusters of enhancers which drive the expression of genes encoding the master transcription factors and other genes key to cell identity. The disclosure contemplates the use of any super-enhancer. Exemplary super-enhancers are disclosed in PCT International Application No. PCT/US2013/066957 (attorney docket no. WIBR-137-WO1), filed Oct. 25, 2013, the entirety of which is incorporated by reference herein.
As used herein, the phrase “super-enhancer component” refers to a component, such as a protein, that has a higher local concentration, or exhibits a higher occupancy, at a super-enhancer, as opposed to a normal enhancer or an enhancer outside a super-enhancer, and in embodiments, contributes to increased expression of the associated gene. In an embodiment, the super-enhancer component is a nucleic acid (e.g., an RNA, such as an eRNA transcribed from the super-enhancer). In an embodiment, the nucleic acid is not chromosomal nucleic acid. In an embodiment, the component is involved in the activation or regulation of transcription. In some embodiments, the super-enhancer component comprises RNA polymerase II, Mediator, cohesin, Nipbl, p300, CBP, Chd7, Brd4, and components of the esBAF (Brg1) or an Lsd1-Nurd complex.
As used herein, “enhancer” refers to a short region of DNA to which proteins (e.g., transcription factors) bind to enhance transcription of a gene. As used herein, “transcriptional coactivator” refers to a protein or complex of proteins that interacts with transcription factors to stimulate transcription of a gene. In some embodiments, the transcriptional coactivator is Mediator. In some embodiments, the transcriptional coactivator is Med1 (Gene ID: 5469). In some embodiments, the transcriptional coactivator is a Mediator component. As used herein, “Mediator component” comprises or consists of a polypeptide whose amino acid sequence is identical to the amino acid sequence of a naturally occurring Mediator complex polypeptide. The naturally occurring Mediator complex polypeptide can be, e.g., any of the approximately 30 polypeptides found in a Mediator complex that occurs in a cell or is purified from a cell (see, e.g., Conaway et al., 2005; Kornberg, 2005; Malik and Roeder, 2005). In some embodiments a naturally occurring Mediator component is any of Med1-Med31 or any naturally occurring Mediator polypeptide known in the art. For example, a naturally occurring Mediator complex polypeptide can be Med6, Med7, Med10, Med12, Med14, Med15, Med17, Med21, Med24, Med27, Med28 or Med30. In some embodiments a Mediator polypeptide is a subunit found in a Med11, Med17, Med20, Med22, Med8, Med18, Med19, Med6, Med30, Med21, Med4, Med7, Med31, Med10, Med1, Med27, Med26, Med14, Med15 complex. In some embodiments a Mediator polypeptide is a subunit found in a Med12/Med13/CDK8/cyclin complex. Mediator is described in further detail in PCT International Application No. WO 2011/100374, the teachings of which are incorporated herein by reference in their entirety.
In some embodiments, the method of identifying the core regulatory circuitry comprises d) determining at least one target of at least one transcription factor encoded by at least one autoregulated transcription factor encoding gene. In some embodiments, the at least one target of the at least one transcription factor encoded by the at least one autoregulated transcription factor encoding gene comprises a gene which encodes a reprogramming factor or a cell identity gene.
Any suitable method can be used to determine whether the transcription factor encoded by the transcription factor encoding gene is predicted to bind to the super-enhancer associated with the transcription factor encoding gene, e.g., motif analysis or searching. In some embodiments, the transcription factor encoded by the transcription factor encoding gene is predicted to bind to the super-enhancer associated with the transcription factor encoding gene if the super-enhancer associated with the transcription factor encoding gene comprises at least one DNA sequence motif predicted for the transcription factor encoded by the transcription factor encoding gene. In some embodiments, each transcription factor encoded by the autoregulated transcription factor encoding gene is predicted to bind to the super-enhancer associated with each of the other autoregulated transcription factor encoding genes if the super-enhancers associated with each of the other autoregulated transcription factor encoding genes comprise at least one DNA sequence motif predicted for each of the transcription factors encoded by each of the other autoregulated transcription factor encoding genes.
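By way of non-limiting illustration, the autoregulation and interconnected-loop criteria can be sketched in Python as follows. The sketch assumes that motif analysis has already been performed and summarized as a mapping from each transcription factor encoding gene to the set of transcription factors having at least one predicted motif in that gene's super-enhancer. The function names, the gene labels (TF_A to TF_D), and the exhaustive subset search are hypothetical choices made for illustration only and are not taken from the disclosure.

```python
from itertools import combinations


def autoregulated(motifs_in_se):
    """Return TFs whose own super-enhancer carries a motif predicted for that TF."""
    return {tf for tf, binders in motifs_in_se.items() if tf in binders}


def interconnected_loops(motifs_in_se):
    """Return the largest sets of autoregulated TFs in which every member has a
    predicted motif in the super-enhancer of every other member.

    Brute-force search over subsets; adequate only for small candidate lists.
    """
    candidates = sorted(autoregulated(motifs_in_se))
    for size in range(len(candidates), 1, -1):
        loops = [
            set(group)
            for group in combinations(candidates, size)
            if all(a in motifs_in_se[b] for a in group for b in group)
        ]
        if loops:
            return loops
    return []


# Hypothetical input: key = TF encoding gene, value = TFs with a predicted
# motif in the super-enhancer associated with that gene.
example = {
    "TF_A": {"TF_A", "TF_B", "TF_C"},
    "TF_B": {"TF_A", "TF_B", "TF_C"},
    "TF_C": {"TF_A", "TF_B", "TF_C", "TF_D"},
    "TF_D": {"TF_A"},
}

print(autoregulated(example))
print(interconnected_loops(example))
```

In this hypothetical input, TF_D is not autoregulated because its own super-enhancer lacks a TF_D motif, so it is excluded before the loop search begins.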
The at least one DNA sequence motif can be located within any range upstream or downstream of the super-enhancer associated with the transcription factor encoding gene (e.g., autoregulated transcription factor encoding gene). In some embodiments, the at least one DNA sequence motif is located between 10,000 bp upstream and 10,000 bp downstream of the super-enhancer associated with the transcription factor encoding gene. In some embodiments, the at least one DNA sequence motif is located between 5,000 bp upstream and 5,000 bp downstream of the super-enhancer associated with the transcription factor encoding gene. In some embodiments, the at least one DNA sequence motif is located between 500 bp upstream and 500 bp downstream of the super-enhancer associated with the transcription factor encoding gene. In some embodiments, the at least one DNA sequence motif is located between 50 bp upstream and 50 bp downstream of the super-enhancer associated with the transcription factor encoding gene.
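As a non-limiting sketch, the positional windows recited above can be tested as shown below. The sketch assumes one possible reading of the ranges, namely that a motif qualifies if it lies entirely within the super-enhancer interval extended by the stated number of base pairs on each side. The function name, the coordinate convention (0-based start, exclusive end, single chromosome), and the example coordinates are hypothetical.

```python
def motif_within_flank(se_start, se_end, motif_start, motif_end, flank_bp):
    """Return True if the motif lies inside the super-enhancer interval
    extended by flank_bp upstream and downstream (assumed interpretation)."""
    window_start = max(0, se_start - flank_bp)
    window_end = se_end + flank_bp
    return motif_start >= window_start and motif_end <= window_end


# Hypothetical super-enhancer at 100,000-120,000 and a motif 4 kb downstream.
for flank in (10_000, 5_000, 500, 50):
    print(flank, motif_within_flank(100_000, 120_000, 124_000, 124_010, flank))
```

With these hypothetical coordinates the motif qualifies under the 10,000 bp and 5,000 bp windows but not under the 500 bp or 50 bp windows.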
In some embodiments, the methods described herein comprise obtaining ChIP-seq data for histone H3K27Ac, e.g., as a marker of an enhancer, e.g., a super-enhancer associated with a transcription factor encoding gene. In some embodiments, the H3K27Ac ChIP-seq data can be used to create a catalogue of super-enhancers for a cell or tissue of interest described herein.
Aspects of the disclosure involve cells of interest. The disclosure contemplates any cell of interest. In some embodiments, the cell comprises a cell of ectoderm lineage. In some embodiments, the cell comprises a cell of endoderm lineage. In some embodiments, the cell comprises a cell of mesoderm lineage. In some embodiments, the cell comprises an embryonic cell (e.g., embryonic stem cell). In some embodiments, the cell comprises a pluripotent cell (e.g., an induced pluripotent stem cell). In some embodiments, the cell comprises a somatic cell. In some embodiments, the cell comprises a multipotent cell. In some embodiments, the cell comprises a progenitor cell. In some embodiments, the cell comprises a cell listed in Table 1. In some embodiments, the cell comprises a cell listed in Table 2. In some embodiments, the cell comprises a) a blood cell selected from the group consisting of a CD14+ monocyte, a CD56+ monocyte, a CD4+ T cell, a CD3+ T cell, a CD4+ primary T cell, a CD4+ memory T cell, a CD4+ naïve T cell, a CD4+CD127+ T cell, a CD8+ primary T cell, a CD8+ memory T cell, a CD8+ naïve T cell, a CD19+ B cell, a CD20+ B cell, and a CD34+ HSC cell; b) a brain cell selected from the group consisting of astrocytes, glial cells, and neurons; c) a fibroblast selected from the group consisting of dermal fibroblast and fibroblast; d) skeletal myoblasts; e) a colon crypt; f) an embryonic stem cell; g) a hepatocyte; h) a tumor cell; i) a keratinocyte; j) a macrophage; k) lymphocytes; l) regulatory T cells (Tregs); m) NK cells; n) pancreatic beta cells; o) cardiac muscle cells; p) nerve cells; and q) chondrocytes (e.g., for cartilage repair).
In some embodiments, the cell comprises a diseased cell. In some embodiments, the cell comprises a cell that harbors a disease-associated variant (e.g., a GWAS variant). In some embodiments, the tumor cell is a cell from a cancer selected from the group consisting of ovarian cancer, bladder cancer, lung cancer, cervical cancer, breast cancer, prostate cancer, gliomas, fibrosarcomas, retinoblastomas, melanomas, soft tissue sarcomas, osteosarcomas, leukemias, stomach cancer, colon cancer, carcinoma of the kidney, gastrointestinal cancer, salivary gland cancer, pancreatic cancer, Hodgkin's disease, non-Hodgkin's lymphomas, acute and chronic lymphocytic leukemias, multiple myeloma, neuroblastoma, Wilms' tumor, testicular cancer, soft-tissue sarcomas, chronic lymphocytic leukemia, primary macroglobulinemia, chronic granulocytic leukemia, primary brain carcinoma, malignant pancreatic insulinoma, malignant carcinoid carcinomas, malignant melanomas, choriocarcinomas, mycosis fungoides, head and neck carcinomas, osteogenic sarcoma, pancreatic carcinomas, acute granulocytic leukemia, hairy cell leukemia, neuroblastoma, rhabdomyosarcoma, Kaposi's sarcoma, genitourinary carcinomas, thyroid carcinomas, esophageal carcinomas, malignant hypercalcemia, cervical hyperplasia, renal cell carcinomas, endometrial carcinomas, polycythemia vera, essential thrombocytosis, adrenal cortex carcinomas, skin cancer, and prostatic carcinomas.
Aspects of the disclosure involve tissues of interest. The disclosure contemplates any tissue of interest. In some embodiments, the tissue comprises tissue of mesoderm lineage. In some embodiments, the tissue comprises tissue of endoderm lineage. In some embodiments, the tissue comprises tissue of ectoderm lineage. In some embodiments, the tissue comprises germ tissue. In some embodiments, the tissue comprises a) brain tissue selected from the group consisting of brain hippocampus, brain inferior temporal lobe, brain angular gyrus, and brain mid frontal lobe; b) internal tissue selected from the group consisting of spleen, bladder, mammary epithelium, adipose, ovarian, adrenal gland, pancreatic, and lung; d) thymus; e) muscle tissue selected from the group consisting of skeletal muscle, psoas muscle, duodenum smooth muscle, and stomach smooth muscle; f) heart tissue selected from the group consisting of right ventricle, aorta, left ventricle, and right atrium; g) digestive tissue selected from the group consisting of esophagus, gastric, sigmoid colon, and small intestine; and h) tumor tissue.
In an embodiment the sample includes a cell or tissue, e.g., a cell or tissue from any of human cells; fetal cells; embryonic stem cells or embryonic stem cell-like cells, e.g., cells from the umbilical vein, e.g., endothelial cells from the umbilical vein; muscle, e.g., myotube, fetal muscle; blood cells, e.g., cancerous blood cells, fetal blood cells, monocytes; B cells, e.g., Pro-B cells; brain, e.g., astrocyte cells, angular gyrus of the brain, anterior caudate of the brain, cingulate gyrus of the brain, hippocampus of the brain, inferior temporal lobe of the brain, middle frontal lobe of the brain, brain cancer cells; T cells, e.g., naïve T cells, memory T cells; CD4 positive cells; CD25 positive cells; CD45RA positive cells; CD45RO positive cells; IL-17 positive cells; cells stimulated with PMA; Th cells; Th17 cells; CD255 positive cells; CD127 positive cells; CD8 positive cells; CD34 positive cells; duodenum, e.g., smooth muscle tissue of the duodenum; skeletal muscle tissue; myoblast; stomach, e.g., smooth muscle tissue of the stomach, e.g., gastric cells; CD3 positive cells; CD14 positive cells; CD19 positive cells; CD20 positive cells; CD34 positive cells; CD56 positive cells; prostate, e.g., prostate cancer; colon, e.g., colorectal cancer cells; crypt cells, e.g., colon crypt cells; intestine, e.g., large intestine; e.g., fetal intestine; bone, e.g., osteoblast; pancreas, e.g., pancreatic cancer; adipose tissue; adrenal gland; bladder; esophagus; heart, e.g., left ventricle, right ventricle, left atrium, right atrium, aorta; lung, e.g., lung cancer cells; skin, e.g., fibroblast cells; ovary; psoas muscle; sigmoid colon; small intestine; spleen; thymus, e.g., fetal thymus; breast, e.g., breast cancer; cervix, e.g., cervical cancer; mammary epithelium; liver, e.g., liver cancer.
In some embodiments, the tumor tissue is tumor tissue from a cancer selected from the group consisting of ovarian cancer, bladder cancer, lung cancer, cervical cancer, breast cancer, prostate cancer, gliomas, fibrosarcomas, retinoblastomas, melanomas, soft tissue sarcomas, osteosarcomas, leukemias, stomach cancer, colon cancer, carcinoma of the kidney, gastrointestinal cancer, salivary gland cancer, pancreatic cancer, Hodgkin's disease, non-Hodgkin's lymphomas, acute and chronic lymphocytic leukemias, multiple myeloma, neuroblastoma, Wilms' tumor, testicular cancer, soft-tissue sarcomas, chronic lymphocytic leukemia, primary macroglobulinemia, chronic granulocytic leukemia, primary brain carcinoma, malignant pancreatic insulinoma, malignant carcinoid carcinomas, malignant melanomas, choriocarcinomas, mycosis fungoides, head and neck carcinomas, osteogenic sarcoma, pancreatic carcinomas, acute granulocytic leukemia, hairy cell leukemia, neuroblastoma, rhabdomyosarcoma, Kaposi's sarcoma, genitourinary carcinomas, thyroid carcinomas, esophageal carcinomas, malignant hypercalcemia, cervical hyperplasia, renal cell carcinomas, endometrial carcinomas, polycythemia vera, essential thrombocytosis, adrenal cortex carcinomas, skin cancer, and prostatic carcinomas.
In some embodiments, the cell or tissue of interest comprises a cell or tissue that is affected by a disease. Exemplary diseases include, without limitation, an autoimmune disease, a metabolic disease, a cardiovascular disease, a neurological disease, a psychiatric disease, a renal disease, a liver disease, a dermatological disease, a pancreatic disease, a glandular disease, a lymph disease, an ophthalmological disease, an orthopedic disease, an inflammatory disease, a hematological disease, an infectious disease, a cell-type specific disease, an olfactory disease, etc. In some embodiments, the cell or tissue affected by a disease is obtained from a subject suffering from the disease.
Aspects of the disclosed methods include obtaining a biological sample from a subject comprising a cell or tissue of interest. A biological sample used in the methods described herein will typically comprise or be derived from cells or tissues isolated from a subject. The cells or tissues may comprise cells or tissues affected by a disease described herein. In some embodiments, the cells or tissues are isolated from a tumor cell or tissue described herein.
Samples can be, e.g., surgical samples, tissue biopsy samples, fine needle aspiration biopsy samples, core needle samples. The sample may be obtained using methods known in the art. A sample can be subjected to one or more processing steps. In some embodiments the sample is frozen and/or fixed. In some embodiments the sample is sectioned and/or embedded, e.g., in paraffin. In some embodiments, tumor cells, e.g., epithelial tumor cells, are separated from at least some surrounding stromal tissue (e.g., stromal cells and/or extracellular matrix). Cells or tissue of interest can be isolated using, e.g., tissue microdissection, e.g., laser capture microdissection. It should be appreciated that a sample can be a sample isolated from any of the subjects described herein.
In some embodiments, cells of the sample are lysed. Nucleic acids or polypeptides may be isolated from the samples (e.g., cells or tissues of interest). In some embodiments DNA, optionally isolated from a sample, is amplified. A wide variety of methods are available for detection of DNA, e.g., DNA of super-enhancers associated with autoregulated transcription factor encoding genes, DNA of an autoregulated transcription factor encoding gene, a DNA sequence motif, etc. In some embodiments RNA, optionally isolated from a sample, is reverse transcribed and/or amplified. A wide variety of solution phase or solid phase methods are available for detection of RNA, e.g., mRNA encoding a master transcription factor or autoregulated transcription factor, mRNA encoding a target of a master transcription factor. Suitable methods include, e.g., hybridization-based approaches (e.g., nuclease protection assays, Northern blots, microarrays, in situ hybridization), amplification-based approaches (e.g., reverse transcription polymerase chain reaction (which can be a real-time PCR reaction)), or sequencing (e.g., RNA-Seq, which uses high throughput sequencing techniques to quantify RNA transcripts (see, e.g., Wang, Z., et al. Nature Reviews Genetics 10, 57-63, 2009)). In some embodiments of interest a quantitative PCR (qPCR) assay is used. Other methods include electrochemical detection, bioluminescence-based methods, fluorescence-correlation spectroscopy, etc.
Aspects of the methods described herein involve detecting the levels or presence of expression products, e.g., an expression product of a component of the core regulatory circuitry comprising a disease associated variation (e.g., a single nucleotide polymorphism), an autoregulated transcription factor, an expression product of a target gene of a master transcription factor, etc. Levels of expression products, e.g., of master transcription factor target genes, may be assessed using any suitable method. Either mRNA or protein level may be measured. A “polypeptide”, “peptide” or “protein” refers to a molecule comprising at least two covalently attached amino acids. A polypeptide can be made up of naturally occurring amino acids and peptide bonds and/or synthetic peptidomimetic residues and/or bonds. Polypeptides described herein include naturally purified products, products of chemical synthetic procedures, and products produced by recombinant techniques from a prokaryotic or eukaryotic host, including, for example, bacterial, yeast, higher plant, insect and mammalian cells.
Exemplary methods for measuring mRNA include hybridization-based assays, polymerase chain reaction assays, sequencing, in situ hybridization, etc. Exemplary methods for measuring protein levels include ELISA assays, Western blot, mass spectrometry, or immunohistochemistry. It will be understood that suitable controls and normalization procedures can be used to accurately quantify expression. Values can also be normalized to account for the fact that different samples may contain different proportions of a cell type of interest, e.g., tumor cells or tissues compared to corresponding non-tumor cells or tissues (e.g., healthy cells or tissues).
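As one non-limiting example of a normalization procedure for qPCR data, relative expression can be computed with the widely used comparative Ct calculation (2 raised to the negative delta-delta-Ct). This particular calculation is not specified in the disclosure and is offered only as an assumption. It presumes that the target and reference amplicons amplify with approximately equal, near-complete efficiency. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Comparative Ct estimate of target expression in a sample relative to a
    control, each normalized to a reference gene (assumes ~100% efficiency)."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier in
# the sample than in the control, with an unchanged reference gene.
print(relative_expression(22.0, 18.0, 24.0, 18.0))
```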
Aspects of the disclosure relate to methods of identifying the cell identity program of a cell or tissue. Generally, the methods of identifying the cell identity program of a cell or tissue incorporate the methods of identifying the core regulatory circuitry and extend those methods according to exemplary embodiments depicted in
In some aspects, the disclosure provides a method of identifying the cell identity program of a cell or tissue, comprising: a) identifying the core regulatory circuitry of a cell or tissue of interest, wherein the core regulatory circuitry of the cell or tissue of interest comprises at least one autoregulated transcription factor encoding gene associated with a super-enhancer in the cell or tissue of interest, at least one transcription factor encoded by the at least one autoregulated transcription factor encoding gene, at least one super-enhancer associated with the at least one autoregulated transcription factor encoding gene, and optionally at least one component of the super-enhancer; and b) identifying the cell identity program of the cell or tissue, wherein the cell identity program of the cell or tissue comprises the core regulatory circuitry identified in a) and at least one target of the at least one transcription factor encoded by the at least one autoregulated transcription factor encoding gene in the core regulatory circuitry.
As used herein, the phrase “cell identity program” refers to the core regulatory circuitry of a cell or tissue and targets of master transcription factors that are part of the core regulatory circuitry of the cell or tissue, as is depicted in
The disclosure contemplates the use of any target of a master transcription factor that is part of the core regulatory circuitry of a cell or tissue, e.g., at least one target which comprises a gene comprising at least one enhancer element predicted to be bound by the at least one transcription factor. In some embodiments, the at least one enhancer element predicted to be bound by the at least one transcription factor comprises a DNA sequence motif associated with a super-enhancer.
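By way of non-limiting illustration, assembling a cell identity program from a core regulatory circuitry and predicted targets can be sketched as below. The sketch assumes that enhancer motif predictions are available as a mapping from each gene to the set of transcription factors predicted to bind its enhancer elements. Listing core circuitry genes separately from targets is an illustrative choice, and all names are hypothetical.

```python
def cell_identity_program(core_tfs, enhancer_motifs):
    """Combine core circuitry TFs with genes whose enhancer elements carry a
    motif predicted for at least one core TF."""
    core = set(core_tfs)
    targets = {
        gene
        for gene, predicted_binders in enhancer_motifs.items()
        # Core TF genes are reported under the core circuitry, not as targets
        # (an assumption made for this sketch).
        if gene not in core and core & set(predicted_binders)
    }
    return {"core_circuitry": core, "targets": targets}


# Hypothetical inputs.
program = cell_identity_program(
    {"TF_A", "TF_B"},
    {"GENE_1": {"TF_A"}, "GENE_2": {"TF_X"}, "TF_B": {"TF_A"}},
)
print(program)
```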
Surprisingly, and unexpectedly, the work described herein demonstrates the cell identity programs constructed for 43 different human cell and tissue types. Exemplary cell identity programs for 43 different human cell and tissue types are shown in Table 2.
Aspects of the disclosure relate to methods for modulating cell identity. Generally, the methods of modulating cell identity disclosed herein involve modulating at least one component of a cell identity program of a cell. The at least one component of the cell identity program in the cell comprises the core regulatory circuitry of the cell or at least one target modulated by the at least one component of the core regulatory circuitry of the cell. The disclosure contemplates the use of any suitable method for modulating the at least one component of a cell identity program of a cell. In some embodiments, modulating the at least one component of the cell identity program in the cell comprises contacting the cell with an agent that modulates at least one component of the cell identity program of the cell. The expressions “activate”, “inhibit”, “modulate”, “increase”, “decrease” or the like, e.g., which denote quantitative differences between two states, refer to at least statistically significant differences between the two states. For example, “modulating at least one component of the cell identity program” means that the sequence, expression, or activity of the at least one component of the cell identity program is modified, activated, increased, inhibited, or decreased in the presence of the agent by at least a statistically significant amount compared to the sequence, expression, or activity of the at least one component of the cell identity program in the absence of the agent. Such terms are applied herein to, for example, rates of cell proliferation, percentages of surviving cells, percentages of altered or modified sequences, levels of expression, levels of transcriptional or translational activity, levels of enzymatic or protein activity, percentages of conversion of a cell of a first cell type to a cell of a second cell type, etc.
It should be appreciated that the at least one component can comprise any component of the cell identity program including one or more components of the core regulatory circuitry or targets of autoregulated transcription factors expressed by the core regulatory circuitry. In some embodiments, the cell comprises a cell listed in Table 2 and the at least one component of the cell identity program comprises at least one component listed in Table 2 selected from the group consisting of (i) at least one gene encoding a master transcription factor, (ii) the master transcription factor encoded by the at least one gene, (iii) a target of the master transcription factor, (iv) at least one super-enhancer associated with any of (i)-(iii), and (v) at least one component of the super-enhancer.
The methods for modulating cell identity contemplate modulating any or all components of the cell identity program of a particular cell or tissue. Generally, it is expected that the extent of modulation of any particular cell or tissue from a first type to a second type is proportionate to the number of components in the cell identity program modulated relative to the total number of components in the cell identity program. In some embodiments, the method comprises modulating at least two components, at least three components, at least four components, or at least five components, of the cell identity program in the cell. In some embodiments, the method comprises modulating at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 33%, at least 40%, or at least 50% of the components in the cell identity program. In some embodiments, the method comprises modulating at least 55%, at least 60%, at least 70%, at least 75%, at least 80%, or at least 90% of the components in the cell identity program of a cell. In some embodiments, the method comprises modulating 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or up to 100% of the components of the cell identity program of the cell.
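As a non-limiting sketch, the percentage of cell identity program components that are modulated can be computed as shown below, so that a given intervention can be compared against the recited thresholds. The function name and component labels are hypothetical, and components outside the program are simply ignored.

```python
def percent_modulated(program_components, modulated_components):
    """Percentage of the cell identity program's components that are modulated."""
    program = set(program_components)
    if not program:
        raise ValueError("cell identity program is empty")
    hit = program & set(modulated_components)
    return 100.0 * len(hit) / len(program)


# Hypothetical program of eight components, two of which are modulated.
program = ["TF_A", "TF_B", "TF_C", "SE_A", "SE_B", "GENE_1", "GENE_2", "GENE_3"]
pct = percent_modulated(program, ["TF_A", "GENE_2", "UNRELATED"])
print(pct, pct >= 25.0)
```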
In some embodiments, the method comprises modulating at least one component of the core regulatory circuitry in the cell, and at least one target of a master transcription factor in the core regulatory circuitry. In some embodiments, the method comprises modulating at least two components of the core regulatory circuitry in the cell and at least two targets of a master transcription factor in the core regulatory circuitry. In some embodiments, the method comprises modulating at least three components of the core regulatory circuitry in the cell and at least three targets of a master transcription factor in the core regulatory circuitry. In some embodiments, the method comprises modulating at least four components of the core regulatory circuitry in the cell and at least four targets of a master transcription factor in the core regulatory circuitry. In some embodiments, the method comprises modulating at least five components of the core regulatory circuitry in the cell and at least five targets of a master transcription factor in the core regulatory circuitry of the cell. In some embodiments, the method comprises modulating at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20 or at least 25 components of the core regulatory circuitry in the cell and at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20 or at least 25 targets of the master transcription factors in the core regulatory circuitry.
In some embodiments, the method comprises modulating all components of the core regulatory circuitry in the cell, and at least one target of a master transcription factor in the core regulatory circuitry. In some embodiments, the method comprises modulating at least one component of the core regulatory circuitry in the cell, and all of the targets of the master transcription factor in the core regulatory circuitry. In some embodiments, the method comprises modulating all components of the core regulatory circuitry in the cell. In some embodiments, the method comprises modulating all targets of master transcription factors in the core regulatory circuitry.
In some aspects, the disclosure relates to reprogramming cells of a first cell type to cells of a second cell type, e.g., to alter the identity of the cell of the first cell type. In some aspects, the disclosure provides a method of reprogramming a cell of a first cell type to a cell of a second cell type, the method comprising modulating at least one component of the core regulatory circuitry of the second cell type in the cell of the first cell type. In some aspects, the disclosure provides a method of reprogramming a cell of a first cell type to a cell of a second cell type, the method comprising modulating at least one component of the cell identity program of the second cell type in the cell of the first cell type. In some contexts, “modulating at least one component of the core regulatory circuitry and/or cell identity program” comprises activating the at least one component of the core regulatory circuitry and/or cell identity program, e.g., activating a transcriptional coactivator. Those skilled in the art will appreciate that activation of the at least one component of the core regulatory circuitry and/or cell identity program can be accomplished in a variety of ways, e.g., alone or in combination with conventional reprogramming methods. In some embodiments, activating the at least one component comprises expressing the at least one component of the core regulatory circuitry and/or cell identity program of the second cell type in the cell of the first type. Such expression can be accomplished using methods such as DNA transfection, for example transient transfection, mRNA transfection, viral infection, etc. It should be appreciated that expression of core regulatory circuitry for purposes of reprogramming can be conditional, e.g., inducible, e.g., under control of an inducible promoter, e.g., using an inducible expression system, e.g., Tet-On, Tet-Off.
In some embodiments, activating the at least one component comprises introducing the at least one component of the core regulatory circuitry and/or cell identity program of the second cell type into the cell of the first type. For example, at least one component of the core regulatory circuitry and/or cell identity program of the second cell type, e.g., in polypeptide form, can be directly introduced into the cell of the first cell type. Such polypeptides may, for example, be purified from natural sources, produced in vitro or in vivo in suitable expression systems using recombinant DNA technology (e.g., by recombinant host cells or in transgenic animals or plants), synthesized through chemical means such as conventional solid phase peptide synthesis, and/or methods involving chemical ligation of synthesized peptides (see, e.g., Kent, S., J Pept Sci., 9(9):574-93, 2003 or U.S. Pub. No. 20040115774), or any combination of the foregoing. In some embodiments, activating the at least one component comprises contacting the cell with an agent that activates expression of the at least one component of the core regulatory circuitry and/or cell identity program of the second cell type in the cell of the first type. In some embodiments, activation of the at least one component of the core regulatory circuitry and/or cell identity program of the second cell type comprises any combination of the above methods.
In some contexts, “modulating at least one component of the core regulatory circuitry and/or cell identity program” comprises repressing the at least one component of the core regulatory circuitry and/or cell identity program. For example, if the at least one component of the core regulatory circuitry and/or cell identity program comprises a repressor, reducing the repressor's activity in the context of several other transcriptional activators, for example transiently, could result in activation of the core regulatory circuitry and/or cell identity program of the second cell type, thereby reprogramming the cell. The disclosure contemplates any suitable method of repressing the at least one component of the core regulatory circuitry and/or cell identity program (e.g., transcriptional repressor). Exemplary methods of repressing the at least one component include contacting the cell or tissue with a dominant negative mutant of the transcriptional repressor, contacting the cell or tissue with a nucleic acid that inhibits transcription or translation of the transcriptional repressor, e.g., antisense oligonucleotides directed against the sequence encoding the transcriptional repressor or a regulatory element that drives expression of the transcriptional repressor, e.g., a super-enhancer or DNA sequence binding motif, shRNA, microRNA, aptamers, small molecule inhibitors that interfere with binding between the transcriptional repressor and a regulatory element, etc.
It should be appreciated that the extent of reprogramming of the cell from the first cell type to the cell of the second cell type is likely to increase proportionately to the extent of core regulatory circuitry and/or cell identity program components of the cell of the second cell type activated in the cell of the first cell type. In other words, the more the activation profile of core regulatory circuitry and/or cell identity program components of the cell of the first type resembles the core regulatory circuitry and/or cell identity program of the cell of the second type, the more the cell of the first type will phenotypically resemble the cell of the second type, i.e., the reprogramming efficiency will increase with increased activation of the desired core regulatory circuitry and/or cell identity program components. For the avoidance of doubt, it should be appreciated that the expressions “activation profile” and “activation of the core regulatory circuitry and/or cell identity program” refer to the overall effect that modulation of the components of the core regulatory circuitry and/or cell identity programs has on the cell or tissue, taking into account the fact that both activating a transcriptional activator or coactivator and repressing or inhibiting a transcriptional repressor or corepressor result in an overall net effect that favors increased activity or activation of the core regulatory circuitry and/or cell identity program in such a way that the identity of the cell is reprogrammed from the cell of the first type to the cell of the second type as a result of such increased activity or activation.
In some embodiments, modulating the at least one component of the core regulatory circuitry and/or cell identity program increases the overall activation or activity of the core transcriptional circuitry and/or cell identity program (e.g., by driving the expression of core transcriptional circuitry target genes) by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, or 95% or more. In some embodiments, modulating the at least one component of the core regulatory circuitry and/or cell identity program increases the overall activation or activity of the core transcriptional circuitry and/or cell identity program by at least 1.1 fold, 1.2 fold, 1.3 fold, 1.4 fold, 1.5 fold, 1.6 fold, 1.7 fold, 1.8 fold, 1.9 fold, 2.0 fold, 2.5 fold, 3 fold, 4 fold, 5 fold, 6 fold, 7 fold, 8 fold.
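For illustration only, the percentage and fold values recited above can be related by simple arithmetic, under the assumption that "fold" denotes the ratio of activity after modulation to activity before modulation, so that a 1.5 fold value corresponds to a 50% increase. That reading, the function names, and the activity units are assumptions made for this sketch.

```python
def fold_change(activity_before, activity_after):
    """Ratio of activity after modulation to activity before modulation."""
    if activity_before <= 0:
        raise ValueError("baseline activity must be positive")
    return activity_after / activity_before


def fold_to_percent_increase(fold):
    """Convert a fold value (ratio) into a percent increase over baseline."""
    return (fold - 1.0) * 100.0


def percent_increase_to_fold(percent):
    """Convert a percent increase over baseline into a fold value (ratio)."""
    return 1.0 + percent / 100.0


# Hypothetical readout: target gene expression rises from 40 to 60 units.
fold = fold_change(40.0, 60.0)
print(fold, fold_to_percent_increase(fold))
```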
In some embodiments, at least two components, at least three components, at least four components, at least five components, at least six components, at least seven components, at least eight components, at least nine components, or at least ten components of the core regulatory circuitry and/or cell identity program of the second cell type are modulated (e.g., activated and/or repressed) in the cell of the first type. In some embodiments, at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 33%, at least 35%, at least 40%, at least 45%, at least 50% or more of the components of the core regulatory circuitry of the cell of the second type are modulated (e.g., activated and/or repressed) in the cell of the first type. In some embodiments, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, or at least 90% of the components of the core regulatory circuitry and/or cell identity program of the cell of the second type are modulated (e.g., activated and/or repressed) in the cell of the first type. In some embodiments, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% of the components of the core regulatory circuitry and/or cell identity program of the cell of the second type are modulated (e.g., activated and/or repressed) in the cell of the first type.
In some embodiments, modulating the at least one component of the core regulatory circuitry and/or cell identity program of the second cell type in the cell of the first type occurs ex vivo. In some embodiments, modulating the at least one component of the core regulatory circuitry and/or cell identity program of the second cell type in the cell of the first type occurs in vivo. In some embodiments, the method of reprogramming optionally comprises modulating (e.g., inhibiting) at least one component of the core regulatory circuitry and/or cell identity program of the first cell type.
It should be appreciated that the methods can be used to reprogram any cell of a first cell type to a cell of a second cell type as long as the core regulatory circuitry and/or cell identity program of the cell of the second cell type is known. In some embodiments, the cell of the first cell type comprises the core regulatory circuitry and/or cell identity program of a diseased cell, and the cell of the second cell type comprises the core regulatory circuitry and/or cell identity program of a normal cell. In some embodiments, the cell of the first cell type comprises the core regulatory circuitry and/or cell identity program of a terminally differentiated cell, and the cell of the second cell type comprises the core regulatory circuitry and/or cell identity program of a less differentiated cell. In some embodiments, the cell of the first cell type comprises the core regulatory circuitry and/or cell identity program of a first somatic cell type, and the cell of the second cell type comprises the core regulatory circuitry and/or cell identity program of a second somatic cell type. In some embodiments, the cell of the first cell type comprises the core regulatory circuitry and/or cell identity program of a somatic cell, and the cell of the second cell type comprises the core regulatory circuitry and/or cell identity program of an embryonic cell. In some embodiments, the cell of the first cell type comprises the core regulatory circuitry and/or cell identity program of a first tissue type, and the cell of the second type comprises the core regulatory circuitry and/or cell identity program of a second tissue type. In some embodiments, the cell of the first cell type comprises the core regulatory circuitry and/or cell identity program of a skin or fat cell, and the cell of the second cell type comprises the core regulatory circuitry and/or cell identity program of an internal cell or tissue. 
In some embodiments, the cell of the first cell type comprises the core regulatory circuitry and/or cell identity program of a tumor cell or tissue, and the cell of the second cell type comprises the core regulatory circuitry and/or cell identity program of a healthy cell or tissue.
In some embodiments, nucleic acids encoding one or more core regulatory circuitry components can be incorporated into a vector, which can be introduced into a cell whose reprogramming is desired. Accordingly, in some embodiments, the disclosure provides kits comprising at least one nucleic acid encoding a core regulatory circuitry component of a cell type of interest.
In some embodiments, reprogramming is effected without genetically modifying the cell being reprogrammed. In some embodiments, cells to be reprogrammed may be obtained from a patient (or donor, optionally one who is immunocompatible with the patient), reprogrammed ex vivo, and at least some of the resulting cells can be administered to the patient for purposes of cell-based therapy, e.g., regenerative medicine, e.g., restoring a degenerated, injured, damaged, or dysfunctional organ or tissue, cell-based immunotherapy (e.g., for cancer or an infection), or used to construct a tissue or organ ex vivo, which can be implanted into the patient. In some embodiments, the cells can optionally be expanded ex vivo prior to reprogramming, after reprogramming, or both.
In some aspects, the disclosure provides methods for determining a subset of core regulatory circuitry components for a cell or tissue that are sufficient to effect reprogramming of the cell or tissue, comprising systematically introducing all but a first, a second, a third, . . . up to an Nth (where N is an integer equal to the total number of core regulatory circuitry components for the cell or tissue) of the core regulatory circuitry components into the cell or tissue to be reprogrammed, and evaluating combinations of core regulatory circuitry components that are effective in reprogramming the cell or tissue.
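The systematic omission described above can be sketched in Python as a leave-one-out enumeration. This is a minimal illustration only: the component names and the `leave_k_out_sets` helper are hypothetical, and the assay that scores each combination for reprogramming efficiency is not modeled.

```python
from itertools import combinations

def leave_k_out_sets(components, k=1):
    """Yield (omitted, introduced) pairs, omitting k components at a time."""
    components = list(components)
    for omitted in combinations(components, k):
        introduced = [c for c in components if c not in omitted]
        yield omitted, introduced

# Hypothetical core regulatory circuitry components, for illustration only.
crc = ["TF_A", "TF_B", "TF_C", "TF_D"]
loo = list(leave_k_out_sets(crc, k=1))
for omitted, introduced in loo:
    print("omit", omitted, "-> introduce", introduced)
```

Raising `k` enumerates combinations in which two or more components are withheld at once, which may help narrow down a smaller sufficient subset.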
The reprogramming methods described herein can be used for any purpose which would be desirable to a skilled person, e.g., use in cell therapy, e.g., autologous cell therapy. As an example, fibroblasts can be obtained from an individual and reprogrammed to muscle cells ex vivo for use in tissue repair. As another example, white fat can be reprogrammed to brown fat.
Aspects of the disclosure relate to diagnosing cell identity program-related disorders. As used herein, a “cell identity program-related disorder” refers to any disease, condition, or disorder that is caused by, correlated with, or associated with a deviation in sequence, expression, or activity of a component of a cell identity program in a cell or tissue, e.g., a diseased cell or tissue of interest, e.g., obtained from a subject suffering from any disease, condition, or disorder described herein. In some aspects, a method of diagnosing a cell identity program-related disorder comprises determining whether the cell identity program of the cell or tissue is enriched for disease-associated variations. Any suitable method can be used to determine enrichment of disease-associated variations in the cell identity program of a cell or tissue of interest. In some embodiments, determining whether the cell identity program of the cell or tissue is enriched for disease-associated variations comprises obtaining a sample comprising a cell or tissue of interest, and detecting the presence of disease-associated variations in components of the cell identity program of the cell or tissue of interest, wherein the cell identity program of the cell or tissue is enriched for disease-associated variations if at least two disease-associated variations are detected in the components of the cell identity program of the cell or tissue of interest.
Those skilled in the art will appreciate that the sensitivity and specificity of the diagnostic methods may increase as a function of the overall number of disease-associated variations detected in the cell identity program relative to the overall number of components in the cell identity program. In some embodiments, the cell identity program of the cell or tissue is enriched for disease-associated variations if at least three, at least four, at least five, or at least six disease-associated variations are detected in the components of the cell identity program of the cell or tissue of interest. In some embodiments, the cell identity program of the cell or tissue is enriched for disease-associated variations if at least 7, at least 8, at least 9, or at least 10 disease-associated variations are detected in the components of the cell identity program. In some embodiments, the cell identity program of the cell or tissue is enriched for disease-associated variations if at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, or at least 10% of the components of the cell identity program are determined to contain a disease-associated variation. In some embodiments, the cell identity program of the cell or tissue is enriched for disease-associated variations if at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, at least 20%, at least 25% or more of the components of the cell identity program are determined to contain a disease-associated variation. In some embodiments, the cell identity program of the cell or tissue is enriched for disease-associated variations if at least 30%, at least 33%, at least 35%, at least 37%, at least 39%, at least 42%, at least 45%, at least 47%, at least 50%, at least 55%, at least 60% or more of the components of the cell identity program are determined to contain a disease-associated variation.
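As an illustration of the two kinds of enrichment criteria recited above (a minimum number of detected variations, or a minimum fraction of components containing a variation), the following Python sketch applies either threshold to a hypothetical program. The function names, component names, and variant identifiers are assumptions made for the example; how variations are detected is outside its scope.

```python
def enrichment_summary(variants_by_component):
    """Return (total variations detected, fraction of components with >= 1 variation)."""
    n_components = len(variants_by_component)
    total = sum(len(v) for v in variants_by_component.values())
    affected = sum(1 for v in variants_by_component.values() if v)
    fraction = affected / n_components if n_components else 0.0
    return total, fraction

def is_enriched(variants_by_component, min_variants=2, min_fraction=None):
    """Apply a count threshold, or a fraction threshold when min_fraction is given."""
    total, fraction = enrichment_summary(variants_by_component)
    if min_fraction is not None:
        return fraction >= min_fraction
    return total >= min_variants

# Hypothetical cell identity program: component -> detected variations.
program = {
    "TF_A": ["variant_1"],
    "TF_B": [],
    "SE_1": ["variant_2", "variant_3"],
    "GENE_X": [],
}
print(enrichment_summary(program))
print(is_enriched(program), is_enriched(program, min_fraction=0.6))
```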
As used herein, the phrases “disease-associated variations” and “disease-associated variants” refer to variations in sequences, expression levels, or activity of components of a cell identity program in a particular cell or tissue of interest. In some embodiments, the disease-associated variations comprise single nucleotide polymorphisms (SNPs). In some embodiments, the disease-associated variations comprise GWAS variants. Any SNPs linked to a phenotypic trait or disease can be of use herein. In some embodiments, the SNP comprises one of more than 5,000 SNPs and diseases identified in more than 1,600 GWAS studies described in PCT International Application No. PCT/US2013/066957 (attorney docket no. WIBR-137-WO1), filed Oct. 25, 2013, the entirety of which is incorporated by reference herein.
In some embodiments, the disease-associated variations comprise GWAS variants in a super-enhancer associated with the core regulatory circuitry in the cell or tissue of interest selected from the group consisting of (i) at least one gene encoding a master transcription factor, (ii) the master transcription factor encoded by the at least one gene, or (iii) at least one target of the master transcription factor. In some embodiments, the GWAS variant is selected from the group consisting of (i) a GWAS variant from Alzheimer disease present in the cell identity program of brain hippocampus; (ii) a GWAS variant from systemic lupus erythematosus present in the cell identity program of CD20 cells; (iii) a GWAS variant from fasting insulin trait present in the cell identity program of adipose nuclei; (iv) a GWAS variant from ulcerative colitis present in the cell identity program of sigmoid colon; and (v) a GWAS variant from electrocardiographic traits present in the cell identity program of left ventricle.
Aspects of the disclosure relate to various methods of treatment, e.g., treating cell identity program-related disorders. In some aspects, the disclosure provides a method of treating a cell identity program-related disorder in a subject in need thereof, comprising modulating at least one abnormal component of a cell identity program in a diseased cell or tissue of the subject. As used herein, “abnormal component” of a cell identity program refers to a component of a cell identity program which differs in sequence, expression and/or activity in the diseased cell or tissue compared to the sequence, expression or activity of the component in the corresponding healthy or normal cell or tissue. In some embodiments, modulating at least one abnormal component of the cell identity program in the diseased cell or tissue of the subject comprises administering to the subject an effective amount of an agent that modulates the at least one abnormal component of the cell identity program.
Aspects of the disclosure involve the use of agents. The disclosure contemplates the use of any agent that is suitable for a specified purpose, e.g., agents that modulate at least one component of a cell identity program, e.g., at least one abnormal component. Exemplary agents of use herein include, without limitation, small organic or inorganic molecules; saccharides; oligosaccharides; polysaccharides; a biological macromolecule selected from the group consisting of peptides, proteins, peptide analogs and derivatives; peptidomimetics; nucleic acids selected from the group consisting of siRNAs, shRNAs, antisense RNAs, ribozymes, and aptamers; an extract made from biological materials selected from the group consisting of bacteria, plants, fungi, animal cells, and animal tissues; naturally occurring or synthetic compositions; and any combination thereof.
In some embodiments, the diseased cell or tissue comprises a tumor cell or tissue. In some embodiments, the diseased cell or tissue comprises a cell or tissue listed in Table 2, and the abnormal component comprises at least one component of the cell identity program of the cell listed in Table 2 selected from the group consisting of (i) a gene encoding a master transcription factor, (ii) the master transcription factor encoded by the gene, (iii) a target of the master transcription factor, (iv) a super-enhancer associated with any of (i)-(iii), or a component of the super-enhancer. In some embodiments, the method comprises diagnosing the subject as having the cell identity program-related disorder, e.g., according to a method described herein.
Aspects of the disclosure relate to identifying candidate modulators of core regulatory circuitry components of cells or tissues. Such candidate modulators can be useful, e.g., for reprogramming cells or tissues or treating diseases in which one or more components of the core regulatory circuitry comprises an abnormal component, e.g., the component comprises a disease-associated variant. In some aspects, the disclosure provides a method of identifying a candidate modulator of at least one component of the core regulatory circuitry of a cell or tissue, comprising: a) contacting a cell or tissue with a test agent; and b) assessing the ability of the test agent to modulate at least one component of the core regulatory circuitry of the cell or tissue, wherein the test agent is identified as a candidate modulator of the at least one component of the core regulatory circuitry of the cell or tissue if the at least one component of the core regulatory circuitry is activated or inhibited in the presence of the test agent. Activation or inhibition of the at least one component of the core regulatory circuitry can be measured by detecting and quantifying expression or activity of the at least one component of the core regulatory circuitry.
In some embodiments, the at least one component of the core regulatory circuitry of the cell or tissue comprises a reprogramming factor or a cell identity gene. In some embodiments, the at least one component of the core regulatory circuitry of the cell or tissue comprises a disease-associated variant.
In some aspects, the disclosure relates to methods of reprogramming cells comprising contacting the cells with candidate modulators identified according to the methods described herein. In some embodiments, at least one component of the core regulatory circuitry of the cell comprises a disease-associated variant. In some embodiments, contacting occurs in vivo or ex vivo.
Aspects of the disclosure relate to methods of identifying candidate modulators of cell identity program components in cells or tissue. In some aspects, the disclosure provides a method of identifying a candidate modulator of at least one component of the cell identity program of a cell or tissue, comprising: a) contacting a cell or tissue with a test agent; and b) assessing the ability of the test agent to modulate at least one component of the cell identity program of the cell or tissue, wherein the test agent is identified as a candidate modulator of the at least one component of the cell identity program of the cell or tissue if the at least one component of the cell identity program of the cell or tissue is activated or inhibited in the presence of the test agent. In some embodiments, the at least one component of the cell identity program of the cell or tissue comprises a reprogramming factor or a cell identity gene. In some embodiments, the at least one component of the cell identity program of the cell or tissue comprises a disease-associated variant.
In some aspects, the disclosure provides a method of reprogramming a cell comprising contacting the cell with the candidate modulator identified according to a method described herein. In some embodiments, at least one component of the core regulatory circuitry of the cell comprises a disease-associated variant. In some embodiments, contacting occurs in vivo or ex vivo.
Aspects of the disclosure relate to methods of identifying targets for drug discovery (e.g., cancer drug discovery). Such methods are useful for identifying core regulatory circuitry or cell identity programs of tumor cells or tissues which can be modulated in a way that shifts the tumor cells or tissues back towards the normal state, e.g., if a core regulatory circuitry component is overexpressed in tumor cells or tissue compared to normal cells or tissue, inhibiting its expression or activity in the tumor could shift the tumor cells or tissues back towards the normal state.
In some aspects, the disclosure provides a method of identifying a target for drug discovery comprising identifying a variation in at least one component of the core regulatory circuitry of a cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects, wherein the at least one component of the core regulatory circuitry of the cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects comprises a disease-associated variant, and wherein the disease-associated variant is a target for drug discovery.
In some aspects, the disclosure provides a method of identifying a target for drug discovery comprising identifying a variation in at least one component of the cell identity program of a cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects, wherein the at least one component of the cell identity program of the cell or tissue that is more prevalent in subjects suffering from a disease than in healthy subjects comprises a disease-associated variant, and wherein the disease-associated variant is a target for drug discovery.
In some embodiments, the target for drug discovery comprises a target for diagnostic purposes.
In some aspects, the disclosure provides a method of identifying a target for anti-cancer drug discovery comprising: a) comparing the core regulatory circuitry of a tumor cell or tissue with the core regulatory circuitry of a corresponding non-tumor cell or tissue; and b) identifying at least one component that differs between the core regulatory circuitry of the tumor cell or tissue and the corresponding non-tumor cell or tissue, wherein the at least one component that differs between the core regulatory circuitry of the tumor cell or tissue and the corresponding non-tumor cell or tissue is identified as a target for anti-cancer drug discovery. In some embodiments, a gene regulated by the at least one component is identified as a target for anti-cancer drug discovery. In some embodiments, the at least one component differs in sequence, expression, and/or activity.
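The comparison in steps a) and b) can be illustrated with a short Python sketch. It assumes, purely for illustration, that each core regulatory circuitry is represented as a mapping from component name to an expression value, and that a component "differs" if it is present in only one circuitry or if its expression changes by at least a chosen fold (`min_fold`, an arbitrary assumption). Differences in sequence or activity are not modeled, and the component names and values are hypothetical.

```python
def candidate_targets(tumor, normal, min_fold=2.0):
    """Return components that differ between tumor and non-tumor circuitries."""
    # Components present in only one of the two circuitries.
    candidates = set(tumor) ^ set(normal)
    # Shared components whose expression differs by at least min_fold.
    for name in set(tumor) & set(normal):
        hi = max(tumor[name], normal[name])
        lo = min(tumor[name], normal[name])
        if hi > 0 and (lo == 0 or hi / lo >= min_fold):
            candidates.add(name)
    return sorted(candidates)

# Hypothetical expression values, for illustration only.
tumor_crc = {"TF_A": 8.0, "TF_B": 2.0, "TF_C": 1.0}
normal_crc = {"TF_A": 2.0, "TF_B": 2.0, "TF_D": 3.0}
targets = candidate_targets(tumor_crc, normal_crc)
print(targets)
```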
In some aspects, the disclosure provides a method of identifying an anti-cancer agent comprising identifying a modulator of the target for anti-cancer drug discovery identified according to a method described herein.
In some aspects, the disclosure provides a method of treating a cancer characterized by tumor cell or tissue comprising the target for anti-cancer drug discovery, comprising administering to a subject suffering from the cancer an effective amount of the anti-cancer agent identified according to a method described herein.
In some embodiments one or more steps of a method described herein is performed at least in part by a machine, e.g., computer (e.g., is computer-assisted) or other apparatus (device) or by a system comprising one or more computers or devices. “Computer-assisted” as used herein encompasses methods in which a computer is used to gather, process, manipulate, display, visualize, receive, transmit, store, or in any way handle or analyze information (e.g., data, results, structures, sequences, etc.). A method may comprise causing the processor of a computer to execute instructions to gather, process, manipulate, display, receive, transmit, or store data or other information. The instructions may be embodied in a computer program product comprising a computer-readable medium. A computer-readable medium may be any tangible medium (e.g., a non-transitory storage medium) having computer usable program instructions embodied in the medium. Any combination of one or more computer usable or computer readable medium(s) may be utilized in various embodiments. A computer-usable or computer-readable medium may be or may be part of, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. Examples of a computer-readable medium include, e.g., a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (e.g., EPROM or Flash memory), a portable compact disc read-only memory (CDROM), a floppy disk, an optical storage device, or a magnetic storage device. In some embodiments a method comprises transmitting or receiving data or other information over a communication network. The data or information may be generated at or stored on a first computer-readable medium at a first location, transmitted over the communication network, and received at a second location, where it may be stored on a second computer-readable medium. 
A communication network may, for example, comprise one or more intranets or the Internet.
In some embodiments, a method of identifying the CRC and/or CIP may be embodied on a non-transitory computer-readable medium. In some embodiments, a CRC and/or CIP identified in accordance with the methods described herein may be embodied on a non-transitory computer-readable medium. In some embodiments a computer is used in sample tracking, data acquisition, and/or data management. For example, in some embodiments a sample ID is entered into a database stored on a computer-readable medium in association with a measurement or determination of a sequence, expression and/or activity. The sample ID may subsequently be used to retrieve a result of determining sequence, expression and/or activity in the sample. In some embodiments, automated image analysis of a sample is performed using appropriate software, comprising computer-readable instructions to be executed by a computer processor. For example, a program such as ImageJ (Rasband, W. S., ImageJ, U. S. National Institutes of Health, Bethesda, Md., USA, http://imagej.nih.gov/ij/, 1997-2012; Schneider, C. A., et al., Nature Methods 9: 671-675, 2012; Abramoff, M. D., et al., Biophotonics International, 11(7): 36-42, 2004) or others having similar functionality may be used. In some embodiments, an automated imaging system is used. In some embodiments an automated image analysis system comprises a digital slide scanner. In some embodiments the scanner acquires an image of a slide (e.g., following IHC for detection of a gene product) and, optionally, stores or transmits data representing the image. Data may be transmitted to a suitable display device, e.g., a computer monitor or other screen. In some embodiments an image or data representing an image is added to a patient medical record.
In some embodiments a machine, e.g., an apparatus or system, is adapted, designed, or programmed to perform an assay for measuring or determining sequence, expression or activity of a cell identity program component listed in Table 2. In some embodiments an apparatus or system may include one or more instruments (e.g., a PCR machine), an automated cell or tissue staining apparatus, a device that produces, records, or stores images, and/or one or more computer processors. The apparatus or system may perform a process using parameters that have been selected for detection and/or quantification of a gene product of a master transcription factor listed in Table 2, e.g., in samples of tumor cells or tissue. The apparatus or system may be adapted to perform the assay on multiple samples in parallel and/or may comprise appropriate software to provide an interpretation of the result. The apparatus or system may comprise appropriate input and output devices, e.g., a keyboard, display, printer, etc. In some embodiments a slide scanning device such as those available from Aperio Technologies (Vista, Calif.), e.g., the ScanScope AT, ScanScope CS, or ScanScope FL, is used.
One skilled in the art readily appreciates that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The details of the description and the examples herein are representative of certain embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Modifications therein and other uses will occur to those skilled in the art. These modifications are encompassed within the spirit of the invention. It will be readily apparent to a person skilled in the art that varying substitutions and modifications may be made to the invention disclosed herein without departing from the scope and spirit of the invention.
The articles “a” and “an” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to include the plural referents. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention also includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process. Furthermore, it is to be understood that the invention provides all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the listed claims is introduced into another claim dependent on the same base claim (or, as relevant, any other claim) unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise. It is contemplated that all embodiments described herein are applicable to all different aspects of the invention where appropriate. It is also contemplated that any of the embodiments or aspects can be freely combined with one or more other such embodiments or aspects whenever appropriate. Where elements are presented as lists, e.g., in Markush group or similar format, it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. 
It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements, features, etc., certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements, features, etc. For purposes of simplicity those embodiments have not in every case been specifically set forth in so many words herein. It should also be understood that any embodiment or aspect of the invention can be explicitly excluded from the claims, regardless of whether the specific exclusion is recited in the specification. For example, any one or more nucleic acids, polypeptides, cells, species or types of organism, disorders, subjects, or combinations thereof, can be excluded.
Where the claims or description relate to a composition of matter, e.g., a nucleic acid, polypeptide, cell, or non-human transgenic animal, it is to be understood that methods of making or using the composition of matter according to any of the methods disclosed herein, and methods of using the composition of matter for any of the purposes disclosed herein are aspects of the invention, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise. Where the claims or description relate to a method, it is to be understood that methods of making compositions useful for performing the method, and products produced according to the method, are aspects of the invention, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.
Where ranges are given herein, the invention includes embodiments in which the endpoints are included, embodiments in which both endpoints are excluded, and embodiments in which one endpoint is included and the other is excluded. It should be assumed that both endpoints are included unless indicated otherwise. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. It is also understood that where a series of numerical values is stated herein, the invention includes embodiments that relate analogously to any intervening value or range defined by any two values in the series, and that the lowest value may be taken as a minimum and the greatest value may be taken as a maximum. Numerical values, as used herein, include values expressed as percentages. For any embodiment of the invention in which a numerical value is prefaced by “about” or “approximately”, the invention includes an embodiment in which the exact value is recited. For any embodiment of the invention in which a numerical value is not prefaced by “about” or “approximately”, the invention includes an embodiment in which the value is prefaced by “about” or “approximately”. “Approximately” or “about” generally includes numbers that fall within a range of 1% or in some embodiments within a range of 5% of a number or in some embodiments within a range of 10% of a number in either direction (greater than or less than the number) unless otherwise stated or otherwise evident from the context (except where such number would impermissibly exceed 100% of a possible value). 
It should be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one act, the order of the acts of the method is not necessarily limited to the order in which the acts of the method are recited, but the invention includes embodiments in which the order is so limited. It should also be understood that unless otherwise indicated or evident from the context, any product or composition described herein may be considered “isolated”.
The molecular pathways for cellular processes such as metabolism, energy production, and signal transduction have been described in some detail. In contrast, the transcriptional circuitries that control the gene expression programs that define cell identity have yet to be mapped in most cells. For such mapping, it is essential to identify the set of key transcription factors that are responsible for control of cell identity and to determine how they function together to regulate cell-type-specific gene expression programs.
The key transcription factors responsible for the control of embryonic stem cell identity have been identified and their genome-wide occupancy and functions have been investigated extensively. This small set of master transcription factors has been identified through genetic perturbation and by virtue of their ability to reprogram cells of various types into the pluripotent state characteristic of ESCs (Yamanaka and Blau, 2010; Hanna et al., 2010; Stadtfeld and Hochedlinger, 2010; Young, 2011). These ESC master transcription factors bind to clusters of enhancers, called super-enhancers, which drive the expression of genes encoding the master transcription factors themselves as well as other genes key to cell identity. The master transcription factors thus form an interconnected autoregulatory circuitry that is at the core of the transcriptional network and that controls the pluripotent gene expression program of ESCs. Little is known about the core transcriptional circuitries of most human cell types, but there has been considerable progress in identifying transcription factors that are essential for cell identity and cellular reprogramming in a number of cell types. For example, master transcription factors have been identified for various hematopoietic cells, hepatocytes, pancreatic islets, heart and neurons (Graf and Enver, 2009; Vierbuchen et al., Nature 2010; Zhou et al., Nature 2008; McCulley and Black, Curr Top Dev Biol 2012). These factors tend to share two features: (1) they are encoded by genes whose expression is driven by super-enhancers and (2) they bind their own SEs as well as those of other master TFs. We have used these two properties to create models of core transcriptional regulatory circuitries (CRCs) for a broad range of human cell types. 
We describe these CRCs, criteria that we used for initial validation, evidence that non-cancer disease-associated variation is concentrated in these CRCs, and how tumor cells can modify CRCs to produce oncogenic gene expression programs.
Results
Cell Identity Program Maps for Human Primary Cells and Tissues
To construct maps of the core regulatory circuitry (CRC) driving the cell identity program of human cell types, we used the logic outlined in
Previous studies have shown that master TFs bind their own enhancers (Lee and Young, 2013; Chen et al., 2008; Chew et al., 2005; Matoba et al., 2006), so we next identified the subset of SE-associated TF genes whose products were predicted to bind their own SEs (
In ESCs and a few other cell types, the master TFs bind to the enhancers of their own genes as well as those of other master TFs, forming an interconnected autoregulatory loop (Boyer et al., 2005; Odom et al., 2006; Lien et al., Dev Biol 2002; Novershtern et al., Cell 2011). These auto-regulatory loops form the core regulatory circuit of the cell's identity program. We next identified the auto-regulated SE-associated TF genes encoding transcription factors that are also predicted to bind each of the super-enhancers of the other auto-regulated transcription factors, and assembled the largest fully inter-connected network of auto-regulated transcription factors (
To further define cell identity programs, we extended the concept that master TFs of ESCs bind the super-enhancers of key cell-type-specific genes that are expressed in these cells (Young, 2011; Lee and Young, 2013). We thus identified, for all cell types under study, all SE-associated genes whose SEs contained motifs for all of the transcription factors in the CRC (
This approach allowed us to generate models of cell identity programs for 43 human primary cells and tissue types (Table 2).
Cell Identity Program Factors Cluster According to Known Lineages
During the course of development, cells diverge into different lineages, which give rise to specific panels of differentiated cell types. The progressive differentiation of each cell type requires sequential activation or repression of transcriptional circuits, which have been especially well described for hematopoietic stem cell differentiation (Novershtern et al., Cell 2011; McArtur et al., 2009). We hypothesized that differentiated cell types arising from the same developmental tissue would be more likely to share the same master transcription factors than cell types originating from tissues whose fates diverged earlier during development. To test this hypothesis, we carried out a hierarchical clustering analysis on the lists of factors we predicted to be part of the Cell Identity Program for each cell type. We obtained a dendrogram that remarkably recapitulated known lineage patterns (
CRC Master TFs have Binding Sites in Majority of Cell Identity Genes
In ESCs, the CRC master transcription factors occupy the enhancers of the majority of active cell identity genes (Kagey et al., 2010). We investigated whether the master transcription factors in the CRCs for the larger set of human cell types described here have binding site sequences in the enhancers of most active cell identity genes. The results show that this is indeed the case. Work described herein demonstrates that about 50% of the SE-associated genes in each cell type have binding sites in their super-enhancer regulatory sequences for all the transcription factors in the CRC. Most of the known reprogramming factors are part of either the CRC or the Cell Identity Program. We also observed that most of the cell identity genes have motifs in their regulatory sequences for at least one of the transcription factors of the CRC. These results suggest that the master TFs in the CRCs of most human cell types do indeed occupy the majority of active cell identity genes.
Cell Identity Programs are Enriched in Disease-Associated Sequence Variation
Work described herein demonstrates that the regulatory elements within the CRCs are enriched in disease-associated sequence variation (
Discussion
Work described herein provides the first maps of core regulatory circuitry of cell identity for a broad range of human cell types and tissues. These CRC maps provide founding models to test and expand knowledge of regulatory circuitry, provide guidance for reprogramming studies, and should facilitate understanding of disease causality.
Experimental Procedures
ChIP-seq Data
H3K27ac ChIP-seq sequence reads were either downloaded from GEO or generously shared by the NIH Roadmap Epigenome project (Bernstein et al., 2010) and were aligned to the hg19 version of the human genome using Bowtie 0.12.9 (Langmead et al., 2009) with parameters -k 2 -m 2 -n 2 --best.
CTC Mapper
During the course of work described herein, an algorithm was developed to identify the transcriptional core circuitry of a cell. It takes as input a file containing H3K27ac ChIP-seq reads aligned to the human genome, together with the associated aligned input ChIP-seq control file, both in BAM format. Briefly, super-enhancers and master transcription factors are identified using MACS 1.4.2 (Zhang et al., 2008) and ROSE (Loven et al., 2013), and a motif analysis is carried out on the super-enhancer constituent sequences extended 500 bp on each side using FIMO from the MEME suite (Matys et al., 2006). Interconnected auto-regulatory loops and their target genes are identified as described in the Experimental Procedures.
Lineage Clustering
Cell-type clustering based on core circuitry gene lists was done in R. A distance matrix was built from the genes shared between the core circuitry gene lists of the cell types, using either all the genes in the core regulatory circuits or only the genes forming the interconnected autoregulatory loops, with the R dist function and the euclidean method. The R hclust function with the complete method was applied to the distance matrix to generate the dendrograms.
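The clustering step can be illustrated with a minimal Python sketch. It is not the R code used in the work described herein. It assumes that each cell type is encoded as a presence/absence vector over the union of all circuitry genes (the exact encoding passed to dist is not stated above), and the function names and toy gene lists are hypothetical.

```python
from itertools import combinations
from math import sqrt


def membership_vectors(gene_lists):
    """Encode each cell type as a 0/1 vector over the union of all genes (assumed encoding)."""
    genes = sorted(set().union(*gene_lists.values()))
    return {cell: [1 if g in members else 0 for g in genes]
            for cell, members in gene_lists.items()}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(gene_lists):
    """Agglomerative clustering; returns the merge history as (cluster_a, cluster_b, height)."""
    vectors = membership_vectors(gene_lists)
    clusters = [frozenset([cell]) for cell in sorted(vectors)]
    merges = []

    def linkage(pair):
        # Complete linkage: cluster distance is the largest pairwise member distance.
        a, b = pair
        return max(euclidean(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical toy input: placeholder gene names, not data from this work.
toy_lists = {
    "cell_1": {"TF_A", "TF_B", "TF_C"},
    "cell_2": {"TF_A", "TF_B", "TF_D"},
    "cell_3": {"TF_X", "TF_Y", "TF_Z"},
}
for a, b, height in complete_linkage(toy_lists):
    print(sorted(a), sorted(b), round(height, 3))
```

In this toy input, the two cell types that share two of three factors merge first, and the unrelated cell type joins last, which is the behavior a lineage dendrogram relies on.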
GWAS Variant Analysis
Disease or trait-associated GWAS variants that had a dbSNP identifier and were found associated with the trait or disease in at least two independent studies were selected from the NHGRI (National Human Genome Research Institute) catalog of GWAS variants (www.genome.gov/gwastudies). Non-coding GWAS variants were identified as those that do not overlap with hg19 exonic regions. For each disease or trait, the GWAS variants were mapped to the super-enhancer regions identified in a cell-type relevant to the disease.
Identification of Super-Enhancers
First, super-enhancers are called as described in (Hnisz et al., 2013). Briefly, H3K27ac enriched regions are called using MACS 1.4.2 (Zhang et al., 2008) with parameters -p 1e-9 --keep-dup=auto -w -S --space=50 on each H3K27ac ChIP-seq alignment and its corresponding input control. ROSE (Loven et al., 2013) is then used to identify super-enhancers from the H3K27ac enriched regions. H3K27ac enriched regions are considered as enhancers and are stitched together when they occur within 12.5 kb of each other. In order to distinguish the H3K27ac enhancer signal from the H3K27ac promoter signal, constituent enhancers that are fully contained within 2 kb of a TSS are disregarded for stitching. Enhancer clusters that have an H3K27ac input-subtracted signal above a computed threshold defined by ranking the H3K27ac signal at enhancer clusters are identified as super-enhancers. Super-enhancers are then assigned to the closest active gene, considering the distance of the TSS to the center of the super-enhancer. We considered as expressed the top two-thirds of genes ranked by H3K27ac read density within ±500 bp of their TSS. Genes called expressed using this metric show 90% overlap with genes having GRO-seq signal above background in their gene body (data not shown).
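The stitching and gene-assignment rules described above can be sketched as follows. This is an illustrative simplification and not the ROSE implementation. It handles one chromosome at a time, reads "within 2 kb of a TSS" as a symmetric window around the TSS (an assumption), and omits the signal-ranking threshold. All function names and coordinates are hypothetical.

```python
STITCH_DISTANCE = 12_500  # bp; stitching distance stated in the text
TSS_WINDOW = 2_000        # bp; promoter exclusion window stated in the text


def near_tss(region, tss_positions, window=TSS_WINDOW):
    """True if the region lies entirely within `window` bp of any TSS (assumed symmetric)."""
    start, end = region
    return any(tss - window <= start and end <= tss + window for tss in tss_positions)


def stitch(regions, tss_positions=(), max_gap=STITCH_DISTANCE):
    """Merge same-chromosome (start, end) regions separated by at most max_gap bp."""
    kept = sorted(r for r in regions if not near_tss(r, tss_positions))
    stitched = []
    for start, end in kept:
        if stitched and start - stitched[-1][1] <= max_gap:
            # Close enough to the previous cluster: extend it.
            stitched[-1] = (stitched[-1][0], max(stitched[-1][1], end))
        else:
            stitched.append((start, end))
    return stitched


def closest_active_gene(region, active_tss):
    """Assign a region to the active gene whose TSS is nearest the region center."""
    center = (region[0] + region[1]) / 2
    return min(active_tss, key=lambda gene: abs(active_tss[gene] - center))


# Hypothetical coordinates on a single chromosome.
peaks = [(1_000, 2_000), (10_000, 11_000), (30_000, 31_000), (49_500, 50_500)]
clusters = stitch(peaks, tss_positions=[50_000])
print(clusters)
print(closest_active_gene(clusters[0], {"GENE_A": 5_000, "GENE_B": 40_000}))
```

The last peak is dropped before stitching because it falls entirely inside the promoter window, so promoter-proximal H3K27ac signal does not inflate an enhancer cluster.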
Identification of Master Transcription Factor Candidates
Super-enhancer-associated transcription factors are then selected from the lists of super-enhancer-associated genes using a list of transcription factors consisting of the concatenation of the AnimalTFDB (Zhang et al., 2012), TcoF (Schaefer et al., 2011), and Heinaniemi (ref) lists of factors. The super-enhancer-associated transcription factors are considered as the master transcription factor candidates for this cell type.
Motif Analysis
Super-enhancer constituent DNA sequences from all the identified super-enhancers in a given cell are extracted and extended 500 bp on each side to allow for transcription factor binding motif identification within and adjacent to H3K27ac peaks. A motif search is carried out on these sequences using FIMO (Find Individual Motif Occurrences) from the MEME (Multiple EM for Motif Elicitation) suite (Matys et al., 2006) to identify all occurrences of the DNA sequence motifs contained in a compiled library of motifs at a p-value threshold of 1e-4. The compiled library of motifs we used was composed of the TRANSFAC database motifs, which we manually annotated to better associate the TRANSFAC motif designators with the official gene symbols, and the vertebrate motifs from the MEME database (updated on Jan. 23, 2014): JASPAR CORE 2014 vertebrates (Mathelier et al., 2014), Jolma 2013 (Jolma et al., 2013), Homeodomains (Berger et al., 2008), mouse UniPROBE (Robasky et al., 2011), and mouse and human ETS factors (Wei et al., 2010).
Identification of Interconnected Auto-Regulatory Loops and Associated Genes
The extended constituents that have motifs for each of the master transcription factor candidates are then identified, and the official gene symbol of their associated genes is recovered using a dictionary associating each vertebrate motif with the official symbol or alias of its associated gene. From this list of genes, the transcription factors that have binding sites for their own protein products in their assigned extended super-enhancer constituents are defined as putative auto-regulated transcription factors. Interconnected auto-regulatory loops of the transcriptional core circuitry are then identified as the largest inter-connected network of auto-regulated transcription factors, using an algorithm based on the identification of the maximum clique from graph theory. Super-enhancer-associated genes that contain binding motifs in their super-enhancer extended constituents for each of the predicted master transcription factors in the interconnected auto-regulatory loop are defined as target genes of the predicted master transcription factors. For each factor, we calculated the ratio of PubMed (http://www.ncbi.nlm.nih.gov/pubmed) entries returned by queries combining the official gene symbol or aliases with a list of terms related to the cell type the factor was extracted from (Table 2) over the PubMed entries returned for the factor alone. For ease of representation, the 15 factors with the highest ratio were shown on the maps.
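The loop-identification logic above can be sketched in Python as a maximum-clique search. This is a minimal illustration that uses a basic Bron-Kerbosch recursion. The text states only that the algorithm is based on the maximum clique, so the choice of search procedure here is an assumption. The input structure se_motifs (gene symbol mapped to the set of factors with motifs in that gene's extended super-enhancer constituents) and all gene names are hypothetical.

```python
def autoregulated(se_motifs, candidates):
    """Candidates whose own super-enhancer carries a motif for their own product."""
    return {tf for tf in candidates if tf in se_motifs.get(tf, set())}


def largest_loop(se_motifs, candidates):
    """Largest set of autoregulated factors in which every pair binds each other's SE."""
    nodes = autoregulated(se_motifs, candidates)
    # Undirected edge only when binding is predicted in both directions.
    neighbours = {
        a: {b for b in nodes if b != a and a in se_motifs[b] and b in se_motifs[a]}
        for a in nodes
    }
    best = set()

    def expand(clique, remaining, excluded):
        nonlocal best
        if not remaining and not excluded:
            # `clique` is maximal; keep it if it is the largest seen so far.
            if len(clique) > len(best):
                best = set(clique)
            return
        for v in list(remaining):
            expand(clique | {v}, remaining & neighbours[v], excluded & neighbours[v])
            remaining = remaining - {v}
            excluded = excluded | {v}

    expand(set(), set(nodes), set())
    return best


def target_genes(se_motifs, loop):
    """SE-associated genes whose SE has motifs for every factor in the loop."""
    return {gene for gene, tfs in se_motifs.items() if loop and loop <= tfs}


# Hypothetical motif calls: placeholder names, not data from this work.
se_motifs = {
    "TF1": {"TF1", "TF2", "TF3"},
    "TF2": {"TF1", "TF2", "TF3"},
    "TF3": {"TF1", "TF2", "TF3", "TF4"},
    "TF4": {"TF3", "TF4"},
    "TF5": {"TF1"},              # no motif for its own product: not autoregulated
    "GENE_X": {"TF1", "TF2", "TF3"},
    "GENE_Y": {"TF1", "TF2"},
}
loop = largest_loop(se_motifs, {"TF1", "TF2", "TF3", "TF4", "TF5"})
print(sorted(loop))
print(sorted(target_genes(se_motifs, loop)))
```

When several cliques share the maximum size, this sketch returns one of them arbitrarily. How ties are resolved is not specified in the text.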
Transcription Factor Binding Predictions Validation
Oct4, Sox2 and Nanog ChIP-seq data were used to evaluate the predictions of the binding of transcription factors to super-enhancer extended constituent sequences. We identified the set of super-enhancer constituents, extended 500 bp on each side, that had DNA motifs for each transcription factor, and those that overlapped transcription factor binding sites identified by the MACS program run on the ChIP-seq data with parameters -p 1e-9 --keep-dup=auto -w -S --space=50. The true positive rate of transcription factor binding at super-enhancer constituents was calculated by dividing the number of motif-containing super-enhancer constituents that are bound by the factor by the total number of motif-containing super-enhancer constituents. Fold enrichments of true positives in super-enhancer sequences were next calculated by comparing the true positive rates at super-enhancers to the true positive rates obtained using a set of random genomic regions of the same size as the super-enhancer extended constituents.
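The two quantities defined above reduce to simple ratios. The sketch below assumes that each constituent carries an identifier and that overlap with ChIP-seq peaks has already been computed. The function names and numbers are hypothetical and illustrative only.

```python
def true_positive_rate(motif_regions, bound_regions):
    """Fraction of motif-containing constituents that are also bound in ChIP-seq."""
    motif_regions = set(motif_regions)
    if not motif_regions:
        raise ValueError("no motif-containing regions to evaluate")
    return len(motif_regions & set(bound_regions)) / len(motif_regions)


def fold_enrichment(se_rate, random_rate):
    """True positive rate at super-enhancers relative to size-matched random regions."""
    return se_rate / random_rate


# Hypothetical region identifiers.
se_rate = true_positive_rate({"r1", "r2", "r3", "r4"}, {"r1", "r2", "r3", "other"})
print(se_rate, fold_enrichment(se_rate, 0.25))
```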
GWAS Variant Enrichment Significance
Enrichment of the disease-associated GWAS variants in the super-enhancers of the core regulatory circuitry was calculated as the chance of capturing the same or a greater number of disease or trait-associated variants in a random set of genomic sequences, using a permutation test. For each super-enhancer in the super-enhancer set of each relevant cell type, a genomic sequence of the same size and from the same chromosome was randomly selected; this sampling was repeated 10,000 times to calculate each empirical p-value.
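The permutation test can be sketched as follows. This is a minimal illustration under stated assumptions: random regions are placed uniformly along the same chromosome, the empirical p-value is the plain fraction of permutations that capture at least the observed number of variants, and no regions such as assembly gaps are excluded. The function names, chromosome size, and coordinates are hypothetical.

```python
import random


def count_variants(regions, variants):
    """Number of distinct variants falling inside any (chrom, start, end) region."""
    captured = set()
    for chrom, start, end in regions:
        for pos in variants.get(chrom, ()):
            if start <= pos < end:
                captured.add((chrom, pos))
    return len(captured)


def empirical_p_value(regions, variants, chrom_sizes, n_perm=10_000, seed=0):
    """Fraction of size- and chromosome-matched random sets capturing >= observed variants."""
    rng = random.Random(seed)
    observed = count_variants(regions, variants)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for chrom, start, end in regions:
            length = end - start
            # Keep the region size and chromosome; draw a new start position.
            new_start = rng.randint(0, chrom_sizes[chrom] - length)
            shuffled.append((chrom, new_start, new_start + length))
        if count_variants(shuffled, variants) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical toy input.
super_enhancers = [("chr1", 1_000, 2_000)]
gwas_variants = {"chr1": [1_100, 1_200, 1_300]}
print(empirical_p_value(super_enhancers, gwas_variants, {"chr1": 1_000_000}, n_perm=1_000))
```

Some implementations report (hits + 1) / (n_perm + 1) to avoid an empirical p-value of exactly zero. Which convention was used in the work described herein is not stated.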
This application claims the benefit of U.S. Provisional 61/955,764, filed Mar. 19, 2014. The entire teachings of the above application(s) are incorporated herein by reference.
This invention was made with government support under RO1-HG002668 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61955764 | Mar 2014 | US