A Sequence Listing is provided herewith as a text file, “373038WOSEQ LIST.txt” created on Oct. 16, 2020 and having a size of 53,248 bytes. The contents of the text file are incorporated by reference herein in their entirety.
The first demonstration that differentiated somatic cells can be reprogrammed into induced pluripotent stem cells (iPSCs) utilized ectopic expression of four factors: Oct4 (O), Sox2 (S), Klf4 (K), and c-Myc (M) (Takahashi and Yamanaka, 2006). For many years, Oct4 has been considered indispensable in the reprogramming process, because it is the only one of those four that is sufficient to induce pluripotency alone and its family members cannot replace its function (Kim et al., 2009a; Kim et al., 2009b; Nakagawa et al., 2008). Mechanistic investigations have shown that reprogramming is initiated by the global cooperative engagement of three pioneer factors Oct4, Sox2, and Klf4, followed by genome-wide epigenetic remodeling and two transcriptional waves (Chen et al., 2016; Chronis et al., 2017; Polo et al., 2012; Smith et al., 2016; Soufi et al., 2012; Sridharan et al., 2009). These studies emphasize the cooperative effect of Oct4, Sox2 and Klf4 (Chronis et al., 2017; Sridharan et al., 2009) but do not explain why Oct4 is unique, and the function of Sox2 and Klf4 in this process remains underappreciated.
Methods and compositions are described herein for precisely controlling factor stoichiometry during cellular reprogramming by using polycistronic cassettes. Surprisingly, the data described herein show that in the absence of ectopic Oct4, polycistronic Sox2, Klf4, and c-Myc (referred to, for example, as the S2AK2AM polycistronic construct) was sufficient to establish pluripotency in several types of differentiated somatic cells. In some cases, c-Myc was optional and use of polycistronic Sox2 and Klf4 (for example, S2AK) was sufficient. The stoichiometry of Sox2 and Klf4 was more important for this reprogramming (e.g., than that of c-Myc), as disruption of the Sox2 and Klf4 factor balance led to a significant decrease or failure in iPSC generation. Genome-wide investigations revealed cooperative binding of Sox2 and Klf4, leading to gradual activation and establishment of the pluripotency network. Moreover, parallel transcriptomic analysis with secondary S2AK2AM embryonic fibroblasts (2° MEFs) and neural progenitor cells (2° NPCs) demonstrated convergent reprogramming trajectories and similar efficiency. The results shown herein illustrate the stoichiometric sufficiency of Sox2 and Klf4 in pluripotency induction without ectopic Oct4. The data provided herein demonstrate the core functions of Sox2 and Klf4 in pluripotency induction.
As described herein, in the absence of ectopic Oct4 expression, polycistronic Sox2, Klf4, and c-Myc was sufficient to establish pluripotency in several types of differentiated somatic cells. In some cases, c-Myc was not needed. The stoichiometry of Sox2 and Klf4 was important for this reprogramming, as disruption of the factor balance led to a significant decrease or failure in iPSC generation. To optimize the stoichiometry of Sox2 and Klf4, polycistronic expression cassettes are described herein that include a promoter operably linked to a nucleic acid segment encoding Sox2, Klf4, and optionally c-Myc. The nucleic acid segment can also include one or more peptide linkers between the Sox2, Klf4, and optional c-Myc coding regions. For example, the 2A “self-cleaving” peptides can be used as peptide linkers between the Sox2, Klf4, and optional c-Myc coding regions. Such linkers provide cleavage between the Sox2, Klf4, and optional c-Myc polypeptides. One example of a polycistronic expression cassette includes an open reading frame that includes the Sox2, Klf4, and c-Myc coding regions, where there is a cleavable 2A peptide linker between and in frame with the Sox2 and Klf4 coding regions, and where there is a 2A peptide linker between and in frame with the Klf4 and c-Myc coding regions (referred to as S2AK2AM). Examples of cleavable linker sequences are provided herein.
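The element order of such a cassette can be illustrated with a minimal Python sketch. The sketch only models the order of the promoter, coding regions, and 2A linkers, and derives the short construct names used herein (S2AK2AM, S2AK). The function names, the one-letter code table, and the placeholder labels "promoter" and "2A linker" are hypothetical conveniences for illustration and are not part of any construct described herein.

```python
# Illustrative only: models element ORDER in a polycistronic cassette,
# not actual nucleotide sequences. All names below are hypothetical.

FACTOR_CODES = {"Sox2": "S", "Klf4": "K", "c-Myc": "M"}


def cassette_name(factors, linker="2A"):
    """Build a short construct name such as 'S2AK2AM' from the factor order."""
    return linker.join(FACTOR_CODES[factor] for factor in factors)


def cassette_layout(factors, promoter="promoter", linker="2A linker"):
    """Return the 5'-to-3' element order: promoter, then coding regions
    separated by an in-frame cleavable linker."""
    layout = [promoter]
    for index, factor in enumerate(factors):
        if index > 0:
            layout.append(linker)  # linker sits between adjacent coding regions
        layout.append(factor)
    return layout


if __name__ == "__main__":
    with_myc = ["Sox2", "Klf4", "c-Myc"]
    without_myc = ["Sox2", "Klf4"]
    print(cassette_name(with_myc), cassette_layout(with_myc))
    print(cassette_name(without_myc), cassette_layout(without_myc))
```

Because the factor order is an input, the same sketch also covers arrangements in which the Klf4 coding region is placed 5′ to the Sox2 coding region.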
A “Klf polypeptide” refers to any of the naturally-occurring members of the family of Krüppel-like factors (Klfs), zinc-finger proteins that contain amino acid sequences similar to those of the Drosophila embryonic pattern regulator Krüppel, or variants of the naturally-occurring members that maintain transcription factor activity similar (within at least 50%, 80%, or 90% activity) compared to the closest related naturally occurring family member, or polypeptides comprising at least the DNA-binding domain of the naturally occurring family member, and can further comprise a transcriptional activation domain. See, Dang, D. T., Pevsner, J. & Yang, V. W. Int. J. Biochem. Cell Biol. 32, 1103-1121 (2000). Exemplary Klf family members include Klf1, Klf2, Klf3, Klf4, Klf5, Klf6, Klf7, Klf8, Klf9, Klf10, Klf11, Klf12, Klf13, Klf14, Klf15, Klf16, and Klf17. Klf2 and Klf4 were found to be factors capable of generating iPS cells in mice, and related genes Klf1 and Klf5 did as well, although with reduced efficiency. See, Nakagawa, et al., Nature Biotechnology 26:101-106 (2007). In some embodiments, variants have at least 85%, 90%, 95%, 97%, 98%, 99%, or 99.5% amino acid sequence identity across their whole sequence compared to a naturally occurring Klf polypeptide family member such as to those listed above or such as listed in Genbank. Klf polypeptides (e.g., Klf1, Klf4, and Klf5) can be from human, mouse, rat, bovine, porcine, or other animals. Generally, the same species of protein will be used with the species of cells being manipulated.
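As one illustration of how percent amino acid sequence identity across a whole sequence could be estimated, the following Python sketch performs a simple global (Needleman-Wunsch style) alignment and reports identical positions as a percentage of the alignment length. The scoring values (match +1, mismatch -1, gap -1), the choice of alignment length as the denominator, and the toy sequences are assumptions made for illustration. Other scoring schemes and identity definitions exist and can give different values, so this sketch is not the method by which any percentage recited herein was determined.

```python
# Minimal sketch: percent identity from a global pairwise alignment.
# Scoring parameters and the identity denominator are illustrative assumptions.

def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return the two gapped strings of one optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned residues as a percentage of alignment length."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


if __name__ == "__main__":
    reference = "ACDEFGHIKL"  # toy sequence, not a real Klf polypeptide
    variant = "ACDEFGHIKV"    # toy variant with one substitution
    print(percent_identity(reference, variant))
```

The same calculation applies to the percent identity thresholds recited for the Sox, Myc, and Oct polypeptide variants.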
The Klf4 polypeptide can be used as a pluripotency factor encoded in the polycistronic expression cassette. For example, the Klf4 polypeptide employed can have NCBI accession no. CAX16088 (mouse Klf4), NP_004226.3 (GI: 194248077) (human Klf4), or NP_001300981.1 (GI: 930697457) (human Klf4). A sequence for human Klf4 accession no. NP_004226.3 (GI: 194248077) is shown below as SEQ ID NO:1.
The SEQ ID NO:1 Klf4 polypeptide is encoded, for example, by a cDNA with NCBI accession number NM_004235.6.
The sequence for human Klf4 accession no. NP_001300981.1 (GI: 930697457) is shown below as SEQ ID NO:2.
The SEQ ID NO:2 Klf4 polypeptide is encoded, for example, by a cDNA with NCBI accession number Klf4 NM_001314052.2.
A “Sox polypeptide” refers to any of the naturally-occurring members of the SRY-related HMG-box (Sox) transcription factors, characterized by the presence of the high-mobility group (HMG) domain, or variants thereof that maintain transcription factor activity similar (within at least 50%, 80%, or 90% activity) compared to the closest related naturally occurring family member, or polypeptides comprising at least the DNA-binding domain of the naturally occurring family member, and can further comprise a transcriptional activation domain. See, e.g., Dang, D. T., et al., Int. J. Biochem. Cell Biol. 32:1103-1121 (2000). Exemplary Sox polypeptides include, e.g., Sox1, Sox2, Sox3, Sox4, Sox5, Sox6, Sox7, Sox8, Sox9, Sox10, Sox11, Sox12, Sox13, Sox14, Sox15, Sox17, Sox18, Sox21, and Sox30. Sox1 has been shown to yield iPS cells with a similar efficiency as Sox2, and genes Sox3, Sox15, and Sox18 have also been shown to generate iPS cells, although with somewhat less efficiency than Sox2. See, Nakagawa, et al., Nature Biotechnology 26:101-106 (2007). In some embodiments, variants have at least 85%, 90%, 95%, 97%, 98%, 99%, or 99.5% amino acid sequence identity across their whole sequence compared to a naturally occurring Sox polypeptide family member such as to those listed above or such as listed in Genbank. Sox polypeptides (e.g., Sox1, Sox2, Sox3, Sox15, or Sox18) can be from human, mouse, rat, bovine, porcine, or other animals. Generally, the same species of protein will be used with the species of cells being manipulated. The Sox2 polypeptide can be used as a pluripotency factor encoded in the polycistronic expression cassette.
For example, the Sox2 polypeptide encoded in the polycistronic expression cassette can have accession number CAA83435 (human Sox2), which has the following sequence (SEQ ID NO:3).
The Sox2 polypeptide is encoded, for example, by a cDNA with NCBI accession number NM_003106.4.
A “Myc polypeptide” refers to any of the naturally-occurring members of the Myc family (see, e.g., Adhikary, S. & Eilers, M. Nat. Rev. Mol. Cell Biol. 6:635-645 (2005)), or variants thereof that maintain transcription factor activity similar (within at least 50%, 80%, or 90% activity) compared to the closest related naturally occurring family member, or polypeptides comprising at least the DNA-binding domain of the naturally occurring family member, and can further comprise a transcriptional activation domain. Exemplary Myc polypeptides include, e.g., c-Myc, N-Myc and L-Myc. In some embodiments, variants have at least 85%, 90%, 95%, 97%, 98%, 99%, or 99.5% amino acid sequence identity across their whole sequence compared to a naturally occurring Myc polypeptide family member, such as to those listed above or such as listed in Genbank. Myc polypeptides (e.g., c-Myc) can be from human, mouse, rat, bovine, porcine, or other animals. Generally, the same species of protein will be used with the species of cells being manipulated. The Myc polypeptide(s) can be a pluripotency factor. For example, in some cases the Myc polypeptide can be a human Myc polypeptide with accession number CAA25015 (human Myc), which has the following sequence (SEQ ID NO:4).
The Myc polypeptide with SEQ ID NO:4 is partially encoded, for example, by a nucleic acid with NCBI accession number X00196.1.
An “Oct polypeptide” refers to any of the naturally-occurring members of the Octamer family of transcription factors, or variants thereof that maintain transcription factor activity similar (within at least 50%, 80%, or 90% activity) compared to the closest related naturally occurring family member, or polypeptides comprising at least the DNA-binding domain of the naturally occurring family member, and can further comprise a transcriptional activation domain. Exemplary Oct polypeptides include Oct-1, Oct-2, Oct-3/4, Oct-6, Oct-7, Oct-8, Oct-9, and Oct-11. For example, Oct3/4 (referred to herein as “Oct4”) contains the POU domain, a 150 amino acid sequence conserved among Pit-1, Oct-1, Oct-2, and unc-86. See, Ryan, A. K. & Rosenfeld, M. G. Genes Dev. 11, 1207-1225 (1997). In some embodiments, variants have at least 85%, 90%, 95%, 97%, 98%, 99%, or 99.5% amino acid sequence identity across their whole sequence compared to a naturally occurring Oct polypeptide family member such as to those listed above or such as listed in Genbank accession number NP_002692.2 (human Oct4) or NP_038661.1 (mouse Oct4). Oct polypeptides (e.g., Oct3/4) can be from human, mouse, rat, bovine, porcine, or other animals. Generally, the same species of protein will be used with the species of cells being manipulated. The Oct polypeptide(s) can be a pluripotency factor.
One example of an Oct4 polypeptide sequence is available in the NCBI database with accession number NP_002692.2 (human Oct4), shown below as SEQ ID NO:5.
A cDNA nucleotide sequence for the human Oct4 polypeptide having SEQ ID NO:5 is available in the NCBI database as accession number NM_002701.4 (GI:116235483), which is shown below as SEQ ID NO:6.
The nucleic acid segments encoding Sox2, Klf4, and optionally c-Myc, are joined to form a larger polycistronic nucleic acid segment. As illustrated herein, the positions of the Sox2, Klf4, and optional c-Myc coding regions within the polycistronic nucleic acid can vary. In some cases, the Klf4 coding region is 5′ to the Sox2 and optional c-Myc coding regions. In other cases, the Sox2 coding region is 5′ to the Klf4 and optional c-Myc coding regions. In some cases, the c-Myc coding region is not included in the polycistronic nucleic acid. In general, the polycistronic nucleic acid is constructed so that the Sox2 and Klf4 polypeptides are expressed at approximately equivalent levels.
Cleavage sites can be included in frame between the segments encoding Sox2, Klf4, and optionally c-Myc. Cleavable peptide linkers to be used between the Klf4, Sox2, and/or c-Myc coding regions can include, for example, 2A or LP4 sequences (de Felipe et al., Trends Biotechnol 24(2):68-75 (2006); Sun et al. Processing and targeting of proteins derived from polyprotein with 2A and LP4/2A as peptide linkers in a maize expression system, PLOS (2017)).
The cleavable linker can have a variety of sequences. The mechanism of 2A-mediated “self-cleavage” involves ribosome skipping the formation of a glycyl-prolyl peptide bond at the C-terminus of the 2A. Hence, the cleavable linker can have a Gly-Pro at its C-terminus linkage junction. A conserved sequence GDVEXNPGP (SEQ ID NO:7) (where X is any amino acid) is shared by different 2A linkers at their C-termini and is needed for generating steric hindrance and ribosome skipping.
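The conserved C-terminal motif can be checked programmatically. The following Python sketch tests whether a candidate linker ends with GDVEXNPGP and splits it at the Gly-Pro junction where ribosome skipping is described as occurring. The four linker sequences in the table are commonly published P2A, T2A, E2A, and F2A sequences supplied here as assumptions for illustration; they are not taken from the Sequence Listing. The function names are hypothetical.

```python
import re

# "X" in GDVEXNPGP is modeled as any of the 20 standard amino acids.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"
MOTIF_AT_C_TERMINUS = re.compile(rf"GDVE[{STANDARD_AA}]NPGP$")

# Commonly published 2A sequences, shown without the optional N-terminal GSG.
# These are illustrative assumptions, not the sequences of the Sequence Listing.
EXAMPLE_LINKERS = {
    "P2A": "ATNFSLLKQAGDVEENPGP",
    "T2A": "EGRGSLLTCGDVEENPGP",
    "E2A": "QCTNYALLKLAGDVESNPGP",
    "F2A": "VKQTLNFDLLKLAGDVESNPGP",
}


def has_conserved_motif(linker):
    """True if the linker ends with the conserved GDVEXNPGP motif."""
    return MOTIF_AT_C_TERMINUS.search(linker) is not None


def split_at_skip_site(linker):
    """Split at the C-terminal Gly-Pro junction.

    Returns (residues that remain on the upstream polypeptide,
             residue that begins the downstream polypeptide).
    """
    if not linker.endswith("GP"):
        raise ValueError("linker does not end with Gly-Pro")
    return linker[:-1], linker[-1]


if __name__ == "__main__":
    for name, sequence in EXAMPLE_LINKERS.items():
        upstream, downstream = split_at_skip_site(sequence)
        print(name, has_conserved_motif(sequence), upstream, downstream)
```

Prepending an optional GSG spacer does not affect the check, because the motif is anchored at the C-terminus of the linker.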
The first discovered 2A was F2A (foot-and-mouth disease virus), after which E2A (equine rhinitis A virus), P2A (porcine teschovirus-1 2A), and T2A (Thosea asigna virus 2A) were identified. The LP4 linker peptide is from a natural polyprotein occurring in the seed of Impatiens balsamina and can be split between the first and second amino acids during post-translational processing. Examples of cleavable linkers that can be used to link the Sox2 and Klf4, and optionally the c-Myc, proteins together include (where the N-terminal GSG can be present but may not be needed in some cases):
An example of an amino acid sequence for a S2AK2AM polypeptide is shown below as SEQ ID NO:16.
An example of an amino acid sequence for a S2AK polypeptide is shown below as SEQ ID NO:17.
Polycistronic nucleic acid segments encoding Sox2, Klf4, and optionally c-Myc, can be introduced into cells to facilitate conversion of cells into stem cells (e.g., pluripotent stem cells), or into other cell types. Nucleic acid segments encoding Sox2, Klf4, and optionally c-Myc can be inserted into or employed with any suitable expression system. The polycistronic Sox2, Klf4, and optionally c-Myc nucleic acids can be part of an expression cassette or expression vector that includes a promoter segment operably linked to the nucleic acid segment encoding the Sox2, Klf4, and optionally c-Myc.
Recombinant expression is usefully accomplished using a vector. Vectors include but are not limited to plasmids, viral nucleic acids, viruses, phage nucleic acids, phages, cosmids, and artificial chromosomes. The vector can also include other elements required for transcription (and translation if a marker gene or other protein encoded segment is included in the vector). Such expression cassettes and/or expression vectors can express sufficient amounts of the Sox2, Klf4, and optionally c-Myc to increase conversion of starting cells into stem cells or into cells of another phenotypic lineage.
Expression vectors and/or expression cassettes encoding polycistronic Sox2, Klf4, and optionally c-Myc can include promoters for driving the expression (transcription) of the polycistronic Sox2, Klf4, and optionally c-Myc. The vector can include a promoter operably linked to a polycistronic nucleic acid segment encoding Sox2, Klf4, and optionally c-Myc. Expression can include transcriptional activation, where transcription is increased above basal levels in the target starting cell by 10-fold or more, by 100-fold or more, such as by 1000-fold or more.
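The fold-activation thresholds recited above can be expressed as a short calculation. The following Python sketch computes fold activation as induced expression divided by basal expression and reports the highest recited threshold (10-fold, 100-fold, or 1000-fold) that is met. The function names and the numeric inputs are hypothetical; the sketch assumes both measurements are in the same units and does not represent data from any experiment described herein.

```python
# Illustrative arithmetic only; the input values are made up.

RECITED_THRESHOLDS = (1000, 100, 10)  # checked from highest to lowest


def fold_activation(induced_level, basal_level):
    """Ratio of induced to basal transcript level (same units for both)."""
    if basal_level <= 0:
        raise ValueError("basal level must be positive")
    return induced_level / basal_level


def highest_threshold_met(fold):
    """Return the largest recited threshold that the fold value reaches,
    or None if activation is below 10-fold."""
    for threshold in RECITED_THRESHOLDS:
        if fold >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    fold = fold_activation(induced_level=2500.0, basal_level=2.0)
    print(fold, highest_threshold_met(fold))
```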
As used herein, vector refers to any carrier containing exogenous DNA. Thus, vectors are agents that transport the exogenous nucleic acid into a cell without degradation and include a promoter yielding expression of the polycistronic Sox2, Klf4, and optionally c-Myc in the cells into which it is delivered. A variety of prokaryotic and eukaryotic expression vectors are suitable for carrying, encoding and/or expressing polycistronic Sox2, Klf4, and optionally c-Myc mRNA. Such expression vectors include, for example, TetO-FUW, pET, pET3d, pCR2.1, pBAD, pUC, viral, and yeast vectors. The vectors can be used, for example, in a variety of in vivo and in vitro situations. For example, some of the experimental work illustrated herein involves use of or modification of the TetO-FUW vector.
The expression cassette, expression vector, and sequences in the cassette or vector can be heterologous. The promoter and/or other regulatory segments can be heterologous to the polycistronic segment encoding the Sox2, Klf4, and optionally c-Myc.
As used herein, the term “heterologous” when used in reference to an expression cassette, expression vector, regulatory sequence, promoter, or nucleic acid refers to an expression cassette, expression vector, regulatory sequence, or nucleic acid that has been manipulated in some way. For example, a heterologous promoter can be a promoter that is not naturally linked to a nucleic acid segment of interest, or that has been introduced into cells by cell transformation procedures. A heterologous nucleic acid or promoter also includes a nucleic acid or promoter that is native to an organism but that has been altered in some way (e.g., placed in a different chromosomal location, mutated, added in multiple copies, linked to a non-native promoter or enhancer sequence, etc.).
Heterologous coding regions can be distinguished from endogenous coding regions, for example, when the heterologous coding regions are joined to nucleotide sequences comprising regulatory elements such as promoters that are not found naturally associated with the coding region, or when the heterologous coding regions are associated with portions of a chromosome not found in nature (e.g., genes expressed in loci where the protein encoded by the coding region is not normally expressed). Similarly, heterologous promoters can be promoters that are linked to a coding region to which they are not linked in nature.
Viral vectors that can be employed include those relating to lentivirus, adenovirus, adeno-associated virus, herpes virus, vaccinia virus, polio virus, AIDS virus, neuronal trophic virus, Sindbis and other viruses. Also useful are any viral families which share the properties of these viruses which make them suitable for use as vectors. Retroviral vectors that can be employed include those described by Verma, I. M., Retroviral vectors for gene transfer. In M
A variety of regulatory elements can be included in the expression cassettes and/or expression vectors, including promoters, enhancers, translational initiation sequences, transcription termination sequences and other elements.
A “promoter” is generally a sequence or sequences of DNA that function when in a relatively fixed location in regard to the transcription start site. For example, the promoter can be upstream of the coding region for the Sox2, Klf4 and (optionally) c-Myc. A “promoter” contains core elements required for basic interaction of RNA polymerase and transcription factors and can contain upstream elements and response elements. “Enhancer” generally refers to a sequence of DNA that functions at no fixed distance from the transcription start site and can be either 5′ or 3′ to the transcription unit. Furthermore, enhancers can be within an intron as well as within the coding sequence itself. They are usually between 10 and 300 bases in length, and they function in cis. Enhancers function to increase transcription from nearby promoters. Enhancers, like promoters, also often contain response elements that mediate the regulation of transcription. Enhancers often determine the regulation of expression.
Expression vectors used in eukaryotic host cells (e.g., animal, human or nucleated cells) can also contain sequences necessary for the termination of transcription which can affect mRNA expression. For mRNA, these regions are transcribed as polyadenylated segments in the untranslated portion of the mRNA encoding the Sox2, Klf4, and optionally c-Myc polypeptides. The 3′ untranslated regions also include transcription termination sites. The identification and use of 3′ untranslated regions including polyadenylation signals in expression constructs is well established.
The expression of Sox2, Klf4, and optionally c-Myc from a polycistronic expression cassette or expression vector can be controlled by any promoter capable of expression in prokaryotic cells or eukaryotic cells. Such promoters can include ubiquitously acting promoters, inducible promoters, or developmentally regulated promoters. Ubiquitously acting promoters include, for example, a CMV-β-actin promoter. Inducible promoters can include those that are active in particular cell populations or that respond to the presence of drugs such as tetracycline or doxycycline. Examples of prokaryotic promoters that can be used include, but are not limited to, SP6, T7, T5, tac, bla, trp, gal, lac, or maltose promoters. Examples of eukaryotic promoters that can be used include, but are not limited to, constitutive promoters, e.g., viral promoters such as CMV, SV40 and RSV promoters, as well as regulatable promoters, e.g., an inducible or repressible promoter such as the tet promoter, the hsp70 promoter and a synthetic promoter regulated by CRE. Vectors for bacterial expression include pGEX-5X-3, and for eukaryotic expression include pCIneo-CMV.
The expression cassette or vector can include a nucleic acid sequence encoding a marker product. This marker product is used to determine if the gene has been delivered to the cell and once delivered is being expressed. Preferred marker genes are fluorescent proteins, such as red fluorescent protein, green fluorescent protein, yellow fluorescent protein. The E. coli lacZ gene can also be employed as a marker. In some embodiments the marker can be a selectable marker. When such selectable markers are successfully transferred into a host cell, the transformed host cell can survive if placed under selective pressure. There are two widely used distinct categories of selective regimes. The first category is based on a cell's metabolism and the use of a mutant cell line which lacks the ability to grow independent of a supplemented media. The second category is dominant selection which refers to a selection scheme used in any cell type and does not require the use of a mutant cell line. These schemes typically use a drug to arrest growth of a host cell. Those cells which have a novel gene would express a protein conveying drug resistance and would survive the selection. Examples of such dominant selection use the drugs neomycin (Southern P. and Berg, P., J. Molec. Appl. Genet. 1:327 (1982)), mycophenolic acid, (Mulligan, R. C. and Berg, P. Science 209: 1422 (1980)) or hygromycin (Sugden, B. et al., Mol. Cell. Biol. 5: 410-413 (1985)).
Gene transfer can be obtained using direct transfer of genetic material in, but not limited to, plasmids, viral vectors, viral nucleic acids, phage nucleic acids, phages, cosmids, and artificial chromosomes, or via transfer of genetic material in cells or carriers such as cationic liposomes. Such methods are well known in the art and readily adaptable for use in the method described herein. Transfer vectors can be any nucleotide construction used to deliver genes into cells (e.g., a plasmid), or as part of a general strategy to deliver genes, e.g., as part of recombinant retrovirus or adenovirus (Ram et al. Cancer Res. 53:83-88, (1993)). Appropriate means for transfection, including viral vectors, chemical transfectants, or physico-mechanical methods such as electroporation and direct diffusion of DNA, are described by, for example, Wolff, J. A., et al., Science, 247, 1465-1468, (1990); and Wolff, J. A. Nature, 352, 815-818, (1991).
For example, the polycistronic Sox2, Klf4, and optionally c-Myc nucleic acid segment, expression cassette and/or vector can be introduced to a cell by any method including, but not limited to, calcium-mediated transformation, electroporation, microinjection, lipofection, particle bombardment and the like. The cells can be expanded in culture and then administered to a subject, e.g. a mammal such as a human. The amount or number of cells administered can vary but amounts in the range of about 10^6 to about 10^9 cells can be used. The cells are generally delivered in a physiological solution such as saline or buffered saline. The cells can also be delivered in a vehicle such as a population of liposomes, exosomes or microvesicles.
The polycistronic expression cassette(s) and/or expression vector(s) encoding the Sox2, Klf4, and optionally c-Myc can be introduced into starting cells or any cell subjected to the methods described herein. For example, the cells can be contacted with viral particles that include the expression cassettes. For example, retroviruses and/or lentiviruses are suitable for expression of Sox2, Klf4, and optionally c-Myc. Commonly used retroviral vectors are “defective”, i.e. unable to produce viral proteins required for productive infection. Rather, replication of the vector requires growth in a packaging cell line. To generate viral particles comprising nucleic acids of interest, the retroviral nucleic acids comprising the nucleic acid of interest are packaged into viral capsids by a packaging cell line. Different packaging cell lines provide a different envelope protein to be incorporated into the capsid, this envelope protein determining the specificity of the viral particle for the cells. Envelope proteins are of at least three types, ecotropic, amphotropic and xenotropic. Retroviruses packaged with ecotropic envelope protein, e.g. MMLV, are capable of infecting most murine and rat cell types and are generated by using ecotropic packaging cell lines such as BOSC23 (Pear et al. (1993) Proc. Natl. Acad. Sci. 90:8392-8396). Retroviruses bearing amphotropic envelope protein, e.g. 4070A (Danos et al, supra.), are capable of infecting most mammalian cell types, including human, dog and mouse, and are generated by using amphotropic packaging cell lines such as PA12 (Miller et al. (1985) Mol. Cell. Biol. 5:431-437); PA317 (Miller et al. (1986) Mol. Cell. Biol. 6:2895-2902); GRIP (Danos et al. (1988) Proc. Natl. Acad. Sci. 85:6460-6464). Retroviruses packaged with xenotropic envelope protein, e.g. AKR env, are capable of infecting most mammalian cell types, except murine cells. 
The appropriate packaging cell line may be used to ensure that the subject cells are targeted by the packaged viral particles. Suitable methods of introducing the retroviral vectors comprising expression cassettes into packaging cell lines and of collecting the viral particles that are generated by the packaging lines are well known in the art.
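The relationship between envelope type, host range, and packaging cell line described above can be summarized as a simple lookup. The following Python sketch encodes only the species and packaging lines named above; the host-range sets are a deliberate simplification (for example, “most mammalian cell types” is reduced to the species explicitly named), and the data structure and function names are hypothetical. It is not a substitute for empirically confirming that a given viral preparation transduces a given cell type.

```python
# Simplified lookup built from the envelope descriptions in the text.
# Host-range sets list only species explicitly named; this is an assumption
# made for illustration, not a complete statement of tropism.

ENVELOPES = {
    "ecotropic": {
        "example_envelope": "MMLV",
        "hosts": {"mouse", "rat"},
        "packaging_lines": ["BOSC23"],
    },
    "amphotropic": {
        "example_envelope": "4070A",
        "hosts": {"human", "dog", "mouse"},
        "packaging_lines": ["PA12", "PA317", "GRIP"],
    },
    "xenotropic": {
        "example_envelope": "AKR env",
        "hosts": {"human", "dog"},  # mammalian cells other than murine cells
        "packaging_lines": [],      # none named in the text
    },
}


def envelopes_for(species):
    """Return the envelope types whose listed host range includes the species."""
    return sorted(name for name, info in ENVELOPES.items()
                  if species in info["hosts"])


def packaging_lines_for(species):
    """Return packaging cell lines named in the text for suitable envelopes."""
    lines = []
    for name in envelopes_for(species):
        lines.extend(ENVELOPES[name]["packaging_lines"])
    return lines


if __name__ == "__main__":
    for target in ("mouse", "human", "rat"):
        print(target, envelopes_for(target), packaging_lines_for(target))
```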
The polycistronic expression cassette(s) and/or expression vector(s) encoding the Sox2, Klf4, and optionally c-Myc can be integrated into the genomes of the cells, or the polycistronic expression vectors can be maintained episomally for the time needed to redirect the cells to a stem cell lineage. Episomal introduction and expression of pluripotency factors is desirable because the mammalian cell genome is not altered by insertion of the episomal vectors and because the episomal vectors are lost over time. Hence, use of episomal expression vectors allows expression of pluripotency factors for the short time that is needed to convert non-pluripotent mammalian cells to pluripotent cells, while avoiding possible chromosomal mutation and later expression of pluripotency factors if differentiation into another cell type is desired.
Episomal plasmid vectors with the polycistronic expression cassette(s) encoding the Sox2, Klf4, and optionally c-Myc, can be introduced into mammalian cells as described for example, in Yu et al., Human induced pluripotent stem cells free of vector and transgene sequences, Science 324(5928): 797-801 (2009); United States Patent Application 20120076762, and Okita et al., A more efficient method to generate integration-free human iPS cells, N
For example, the polycistronic expression cassette can be included within, and the Sox2, Klf4, and optionally c-Myc can be expressed from, an episomal vector that has EBNA-1 (Epstein-Barr nuclear antigen-1) and oriP, or Large T and SV40ori sequences, so that the vectors can be episomally present and replicated without incorporation into a chromosome.
The polycistronic expression cassettes and/or vectors can be introduced into mammalian cells in the form of DNA, protein or mature mRNA by a technique such as lipofection, binding with a cell membrane-permeable peptide, liposomal transfer/fusion, or microinjection. When in the form of DNA, a vector such as a virus, a plasmid, or an artificial chromosome can be employed. Examples of viral vectors include retrovirus vectors, lentivirus vectors (e.g., according to Takahashi, K. and Yamanaka, S., Cell, 126: 663-676 (2006); Takahashi, K. et al., Cell, 131: 861-872 (2007); Yu, J. et al., Science, 318: 1917-1920 (2007)), adenovirus vectors (e.g., Okita K, et al., Science 322: 949 (2008)), adeno-associated virus vectors, and Sendai virus vectors (Proc Jpn Acad Ser B Phys Biol Sci. 85: 348-62, 2009), the contents of each of which references are incorporated herein by reference in their entireties. Also, examples of artificial chromosome vectors that can be used include human artificial chromosome (HAC), yeast artificial chromosome (YAC), and bacterial artificial chromosome (BAC and PAC) vectors. As a plasmid, a plasmid for mammalian cells can be used (e.g., Okita K, et al., Science 322: 949 (2008)). A vector can contain regulatory sequences such as a promoter, an enhancer, a ribosome binding sequence, a terminator, and a polyadenylation site, so that a pluripotency factor can be expressed.
Starting cells are cells targeted for transformation by the polycistronic Sox2, Klf4, and optionally c-Myc expression cassette or expression vector.
A starting population of cells may be derived from essentially any source and may be heterogeneous or homogeneous. The term “selected cell” or “selected cells” is also used to refer to starting cells. In certain embodiments, the cells to be transformed as described herein are adult cells, including essentially any accessible adult cell type(s). The cells can, for example, be autologous or allogeneic cells (relative to a subject to be treated or who may receive the cells). In some cases, the starting cells are adult progenitor cells or adult somatic cells. In still other embodiments, the starting cells include any type of cell from a newborn, including, but not limited to newborn cord blood, progenitor cells, and tissue-derived cells (e.g., somatic cells). In some embodiments, the starting population of cells does not include pluripotent stem cells. In other embodiments, the starting population of cells can include pluripotent stem cells. Accordingly, a starting population of cells that is transformed by the polycistronic Sox2, Klf4, and optionally c-Myc expression cassettes or expression vectors described herein, can be essentially any live cell type, particularly a somatic cell type.
As illustrated herein, fibroblasts can be reprogrammed to cross lineage boundaries and to be directly converted to pluripotent stem cells. However, the polycistronic expression cassettes and vectors can be used to convert or initiate conversion of starting cells to another cell type. Various cell types from all three germ layers have been shown to be suitable for somatic cell reprogramming by genetic manipulation, including, but not limited to, liver and stomach (Aoi et al., Science 321(5889):699-702 (2008)); pancreatic β cells (Stadtfeld et al., Cell Stem Cell 2: 230-40 (2008)); mature B lymphocytes (Hanna et al., Cell 133: 250-264 (2008)); human dermal fibroblasts (Takahashi et al., Cell 131, 861-72 (2007); Yu et al., Science 318(5854) (2007); Lowry et al., Proc Natl Acad Sci USA 105, 2883-2888 (2008); Aasen et al., Nat Biotechnol 26(11): 1276-84 (2008)); meningiocytes (Qin et al., J Biol Chem 283(48):33730-5 (2008)); neural stem cells (DiSteffano et al., Stem Cells Devel. 18(5): (2009)); and neural progenitor cells (Eminli et al., Stem Cells 26(10): 2467-74 (2008)). Any starting cells can be transformed with the polycistronic Sox2, Klf4, and optionally c-Myc expression cassette or expression vectors described herein to initiate reprogramming to other cell types.
In some embodiments the starting cells can transiently or continuously express Sox2, Klf4, and optionally c-Myc by incubation under cell culture conditions.
Starting cells are treated for a time and under conditions sufficient to convert the starting cells across lineage and/or differentiation boundaries to form stem cells, especially pluripotent stem cells, or de-differentiated stem cells that may not be completely pluripotent. This process is referred to as ‘reprogramming.’ In some cases, the pluripotent stem cells or de-differentiated cells so formed can be differentiated into other types of cells (e.g., neural, cardiac, pancreatic, liver and other types of cells, or progenitors of such cells).
The time for conversion of starting cells into induced pluripotent stem cells or de-differentiated stem cells that may not be completely pluripotent can vary. For example, the starting cells can be incubated until stem cell markers are expressed. Such stem cell markers can include Nanog, SSEA1, Oct4, and combinations thereof. In another example, the starting cells can be incubated until markers of a different cell type are expressed. In some cases, the starting cells are incubated for a time sufficient to form teratomas that contain all three germ layers, or that can generate chimeric mice.
The time for conversion of starting cells into induced pluripotent stem cells can therefore vary. For example, the starting cells can be incubated under cell culture conditions for at least about 3 days, or for at least about 4 days, or for at least about 5 days, or for at least about 6 days, or for at least about 7 days, or for at least about 8 days, or for at least about 9 days, or for at least about 10 days, or for at least about 11 days, or for at least about 12 days, or for at least about 13 days, or for at least about 14 days, or for at least about 15 days, or for at least about 16 days, or for at least about 17 days, or for at least about 18 days, or for at least about 19 days.
In some embodiments, the stem cells so formed can be expanded or further incubated under cell culture conditions for about 5 days to about 35 days, or about 7 days to about 33 days, or about 10 days to about 30 days, or about 12 days to about 27 days, or about 15 days to about 25 days, or about 18 days to about 23 days.
The Examples illustrate some of the experiments performed and results obtained during development of the invention.
This Example illustrates some of the materials and methods used in the development of the invention.
HEK293T/17 cells (female) were cultured in DMEM (Invitrogen) supplemented with 10% FBS.
Mouse embryonic fibroblasts (MEFs) (mixed sex, because male and female embryos were combined to generate the primary cells) were prepared from E13.5 embryos, and mouse tail tip fibroblasts (TTFs) (male) were derived from a 14-month-old adult male mouse. MEFs and TTFs were cultured in MEF medium (DMEM supplemented with 10% FBS and non-essential amino acids (NEAA, Invitrogen)).
Mouse primary neural progenitor cells (NPCs) (mixed sex, because male and female embryos were combined to generate the primary cells) were prepared from the heads of E13.5 embryos and maintained on matrigel (BD, 356231)-coated plates in NPC medium (Neurobasal medium (Invitrogen), 2% B27 (Invitrogen), 1% GlutaMAX™ (Invitrogen), 1% penicillin/streptomycin (Invitrogen), 2 μg/ml heparin (Sigma Aldrich), 20 ng/ml bFGF (Thermo Fisher Scientific), and 20 ng/ml EGF (R&D)).
Mouse ESCs (male) and iPSCs (male) were maintained on feeders in ESC medium (Knock Out-DMEM (Invitrogen) with 5% ES-FBS (Invitrogen) and 15% Knock Out-serum replacement (KSR, Invitrogen), 1% GlutaMAX™, 1% NEAA, 0.1 mM 2-mercaptoethanol (Sigma Aldrich), 10 ng/ml leukemia inhibitory factor (LIF, Millipore), 3 μM CHIR99021 (Selleck), and 1 μM PD0325901 (Selleck)).
For microinjection, iPSCs (male) were maintained under feeder-free N2B27 condition (50% DMEM/F12 (Invitrogen), 50% Neurobasal Medium, 0.5% N2 (Invitrogen), 1% B27, 0.1 mM 2-mercaptoethanol, 10 ng/ml LIF, 25 μg/ml BSA (Invitrogen), 3 μM CHIR99021, and 1 μM PD0325901).
OG2 Mice (B6; CBA-Tg(Pou5f1-EGFP)2Mnn/J) (male and female) were from the Jackson Laboratory (004654). CD-1 (ICR) mice (male and female) were from Charles River (#022). OG2 mice were crossed to obtain OG2 MEFs as well as NPCs in the resulting embryos at embryonic day 13.5. Male OG2 mice at 14-months were used for the derivation of TTFs.
Super-ovulated female CD1 (ICR) mice were mated to CD1 (ICR) males for blastocyst preparation and further microinjection experiments. E13.5 embryos of tetraploid complementation assay were used for derivation of secondary MEFs and NPCs.
All animal procedures were approved by the Institutional Animal Care and Use Committee at Tsinghua University, Beijing, as well as the Institutional Animal Care and Use Committee at the Institute of Zoology, Chinese Academy of Sciences, Beijing.
Plasmids generated in this study are listed in Table 1.
TetO-FUW-OSKM (Catalog no. 20321), TetO-FUW-Oct4 (Catalog no. 20323), TetO-FUW-Sox2 (Catalog no. 20326), TetO-FUW-Klf4 (Catalog no. 20322), TetO-FUW-c-Myc (Catalog no. 20324), and FUW-M2rtTA (Catalog no. 20342) are from Addgene. See also Brambrink et al., Cell Stem Cell 2: 151-159 (Feb. 2008). All plasmids in this study are based on the TetO-FUW backbone. For cloning, the backbone was digested with appropriate enzymes and each insert (e.g., the Sox2, Klf4, and c-Myc coding regions) was recovered by gel extraction. All inserts were amplified by PCR using the KOD Xtreme HS Polymerase (Novagen, 71975-3), and ligated into the polycistronic expression cassette using T4 ligase or Gibson Assembly Master Mix (NEB, E2611). All plasmids were confirmed by enzyme digestion and sequencing.
For lentivirus preparation, HEK293T cells were plated 1 day ahead to reach about 70% confluency for transfection, and the VSV-G envelope-expressing plasmid pMD2.G (Addgene, 12259) and psPAX2 (Addgene, 12260) were used for lentiviral packaging. Plasmids (1.8 μg) carrying the gene of interest were mixed with psPAX2 (1.35 μg) and pMD2.G (0.45 μg) for each well of a six-well plate, and Lipofectamine® 3000 Reagent (Thermo Fisher Scientific, L3000) was used for transfection. Five to eight hours later, the medium was changed to fresh MEF medium. Supernatant containing the virus was harvested at 48 hours, passed through a 0.45-μm filter to remove cell debris, and mixed with 1 volume of fresh medium for immediate use.
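The per-well plasmid masses given above correspond to a 4:3:1 mass ratio of transfer plasmid to psPAX2 to pMD2.G. The following is a minimal sketch for scaling these amounts to several wells. It assumes simple linear scaling per well, which is not stated in the protocol, and the function and variable names are illustrative only.

```python
# Per-well plasmid masses (micrograms) for one well of a six-well plate,
# taken from the protocol text above.
PER_WELL_UG = {"transfer": 1.8, "psPAX2": 1.35, "pMD2.G": 0.45}


def plasmid_mix(n_wells):
    """Return total micrograms of each plasmid for n_wells wells.

    Assumption (not from the source): amounts scale linearly with well count.
    """
    if n_wells <= 0:
        raise ValueError("n_wells must be positive")
    return {name: round(ug * n_wells, 3) for name, ug in PER_WELL_UG.items()}


# Example: a full six-well plate.
six_well_plate = plasmid_mix(6)
```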
For infection, mouse embryonic fibroblasts (MEFs) or neural progenitor cells (NPCs) were incubated with the lentiviral supernatant in the presence of 5 μg/ml polybrene (Millipore) for 8 hours or overnight. Medium was changed back to MEF or NPC medium after the infection to allow the cells to recover.
E13.5 embryos were used for MEF derivation. After embryo recovery, the head, limbs, and internal organs, especially the gonads, were removed under a dissection microscope. The remaining bodies of the embryos were finely minced with two blades and digested in 0.05% Trypsin-EDTA for 15 minutes. MEF medium was then added to stop the trypsinization. Further dissociation of the tissues was performed by pipetting up and down a few times. Cells were then collected by centrifugation and plated onto 15 cm dishes for expansion (passage 0, P0). MEFs were used before passage 4 for all tests.
One day prior to the experiment, Poly-D-lysine (PDL)/Laminin-coated plates were prepared for NPC cultures. Briefly, 12-well culture plates were filled with PDL (10 μg/ml in distilled water) and incubated overnight in a 37° C. incubator. On the next day, the solution was removed from the wells. The wells were then washed three times with distilled water and air-dried. Laminin (5 μg/ml in distilled water) was then added and incubated in a 37° C. incubator for 4 hours to overnight. Laminin was removed from the wells before using the plate.
E13.5 embryos were used for NPC derivation. The embryo was decapitated with dissecting forceps. Skin and skull were peeled back from the head to expose the brain. The whole brain was picked out using curved forceps and placed into cold DPBS. After rinsing with DPBS twice, the brain was placed in a 35-cm dish and finely minced with sharp scissors. The minced tissue was transferred to a 15 ml centrifuge tube and digested with 1 ml of 0.05% Trypsin-EDTA at 37° C. for 7 minutes. To stop the enzymatic reaction, 5 ml of NPC medium was added to the tube, followed by centrifugation and removal of the supernatant. The tissue pellet was further dissociated in 1 ml of NPC medium by pipetting up and down several times and filtered through a 70 μm cell strainer. Cells were then plated onto a PDL/Laminin-coated 12-well plate and cultured in NPC medium for several days. During culture, NPCs proliferated and detached from the plate to form floating neural spheres (P0). Spheres were then collected and digested to single NPCs with StemPro Accutase (Thermo Fisher Scientific). From then on, NPCs were cultured adherently on matrigel-coated plates for the following passages. NPCs were used before passage 4 for all tests.
For tail tip fibroblast (TTF) derivation, 14-month old adults were used. The tail was peeled, minced into 1 mm pieces, and cultured in a 60-cm dish. Medium was half changed every 3 days until fibroblasts migrated out of the graft pieces. Cells were then passaged and ready for use (P1).
Reprogramming and Derivation of iPSC Lines
Oct4-GFP (OG2) MEFs or TTFs were seeded onto gelatin-coated plates at a density of 10,000 cells/cm2. After transduction, cells were allowed to recover in MEF medium for 24-36 hours. Cells were then replated at a density of 10,000 cells/cm2, except where otherwise indicated. For NPCs, 5,000 cells/cm2 were seeded on Poly-D-lysine (PDL)/Laminin-coated six-well plates. After transduction, cells were allowed to recover in NPC medium for 24-36 hours. To start reprogramming, cultures were switched to reprogramming medium (ESC medium without CHIR99021 and PD0325901) with 1 μg/ml doxycycline. Doxycycline was used to induce expression of protein(s) from the polycistronic expression cassette. Introduction of doxycycline was denoted as day 0. During the entire process, medium was refreshed every other day for the first 10 days and every day afterwards. From day 10, ESC medium with 1 μg/ml doxycycline was used. EGFP-positive colonies were usually counted on day 12 and were ready for iPSC derivation on day 16.
For iPSC line derivation, the reprogramming cultures were incubated with 1 mg/ml collagenase B (Roche) for 20 minutes at 37° C. Single colonies were picked up under microscope and digested in 0.05% trypsin for 5-10 minutes for single-cell suspensions. Cells were then seeded on feeders in normal ESC medium, and these cells are considered as passage 0 (P0) iPSCs.
To calculate the EGFP-positive colony efficiency precisely, 2° MEFs or NPCs were seeded into 48-well plates. 24 hours later (day 0), half of the wells were stained with Hoechst 33342 (Thermo Fisher Scientific), and the exact cell numbers in each well were recorded by counting the stained nuclei. The other half of the wells was switched to reprogramming medium with 1 μg/ml doxycycline for further culture. During the experiment, medium was changed every other day, and the EGFP-positive colonies were counted on day 12. The final efficiency was calculated by dividing the EGFP-positive colony numbers by the initial cell numbers recorded on day 0.
Another method was also used. Single cells were seeded into the wells of 96-well plates with feeders. The next day, MEF medium was switched to reprogramming medium with 1 μg/ml doxycycline (day 0). During the reprogramming process, medium was changed every 4 days, and EGFP-positive colony numbers were counted on day 16. The efficiency was calculated by dividing the total EGFP-positive colony numbers by the well numbers.
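The two efficiency calculations described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the numbers in the usage lines are made-up inputs rather than experimental results.

```python
def colony_efficiency(egfp_colonies, initial_cells):
    """Plate-based method: EGFP-positive colonies divided by the
    number of cells counted (by nuclear staining) on day 0."""
    if initial_cells <= 0:
        raise ValueError("initial_cells must be positive")
    return egfp_colonies / initial_cells


def single_cell_efficiency(colonies_per_well):
    """Single-cell seeding method: total EGFP-positive colonies
    divided by the number of wells seeded with one cell each."""
    if not colonies_per_well:
        raise ValueError("at least one well is required")
    return sum(colonies_per_well) / len(colonies_per_well)


# Hypothetical inputs for illustration.
eff_plate = colony_efficiency(egfp_colonies=30, initial_cells=1000)
eff_single = single_cell_efficiency([0, 1, 0, 0, 1, 0, 0, 0])
```

Both functions return a fraction; multiplying by 100 gives the percentage.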
iPSCs were cultured under N2B27 condition without feeders. On the day of injection, cells were suspended in Blastocyst Injection Medium (25 mM HEPES-buffered DMEM plus 10% FBS, pH 7.4).
For generation of chimeric mice, super-ovulated female CD1 (ICR) mice (4-week old) were mated to CD1 (ICR) males. Morulae (2.5 d post-coitum) were collected and cultured overnight in KSOM medium (Millipore) at 37° C. in 5% CO2. The next morning, the blastocysts were ready for iPSCs injection, and approximately 10 cells were injected for each blastocyst. Injected blastocysts were cultured in KSOM medium at 37° C. in 5% CO2 for 1-2 hours and then implanted into uteri of 2.5 d post-coitum pseudo-pregnant CD1 (ICR) female mice.
For tetraploid complementation assay, two cell-stage CD1 (ICR) embryos were electrofused to produce tetraploid embryos, and approximately 10 iPSCs were injected into the reconstructed tetraploid blastocysts. Injected blastocysts were cultured in KSOM medium at 37° C. in 5% CO2 for 1-2 hours and then implanted into uteri of 0.5 d post-coitum pseudo-pregnant CD1 (ICR) female mice. E13.5 embryos were dissected for generation of secondary MEFs and NPCs (2° MEFs and NPCs).
For gonadal contribution, the injected embryos were recovered 13 days (E13.5) after implantation. The gonadal regions of each embryo were collected and visualized under microscope for EGFP signal.
To validate the induction of reprogramming factors, 2° MEFs and NPCs were plated on 24-well plates at a density of 20,000 cells/cm2. After culture in reprogramming medium with 1 μg/ml doxycycline for 48 hours, cells were fixed for immunofluorescent staining to test the expression of Sox2 and Klf4.
To test the influence of initial cell density on final reprogramming efficiency, 2° MEFs and NPCs were plated on feeders in 12-well plates at densities of 500 cells/well, 1,000 cells/well, 2,000 cells/well, and 4,000 cells/well, respectively. Cells were reprogrammed as previously described. On reprogramming day 12, EGFP-positive colony numbers were counted under a fluorescent microscope.
To validate the requirement of doxycycline during reprogramming, 2° MEFs and NPCs were plated on feeders in 12-well plates at the density of 1,000 cells/well. Doxycycline was removed from reprogramming medium from day 0 to day 12. EGFP-positive colony numbers were counted on day 16.
To test the reprogramming kinetics with small molecules, 2° MEFs and NPCs were plated on feeders in 12-well plates at density of 1,000 cells/well. Cells were cultured in reprogramming medium with 1 μg/ml doxycycline, 1 μM A83-01 and 10 Forskolin for 12 days. The cell morphology was recorded for the reprogramming kinetics. All conditions were repeated in triplicate.
Secondary (2°) MEFs and NPCs were seeded on feeders and reprogrammed as described before. On reprogramming day 6, EGFP-positive and EGFP-negative cells were sorted by flow cytometry, and equal numbers of each population were replated separately onto new 6-well plates with feeders. Cells were cultured in reprogramming medium with 1 μg/ml doxycycline for 6 more days, and the number of EGFP-positive colonies was counted.
To generate teratomas, iPSCs maintained on feeders were switched to matrigel-coated plates and cultured in ESC medium without CHIR99021 and PD0325901. Then iPSCs were trypsinized and suspended in culture medium containing 2% matrigel. Then 1.0×10⁶ cells were subcutaneously injected into the hind limbs of SCID mice. Five weeks after the injection, tumors were dissected and fixed in 4% polyformaldehyde (Sigma Aldrich), followed by paraffin sectioning and haematoxylin-eosin (HE) staining.
Bisulfite treatment was done with the EpiTect Bisulfite Kit (Qiagen, 59104), exactly following the protocol provided for cultured cells. Recovered DNA was amplified by two-round PCR with primers targeting the Oct4 promoter, and the PCR products were ligated into the T-vector pMD20 (Clontech, 3270). Ten randomly selected clones were sequenced. PCR primers used are listed in Table 2.
Karyotype analysis of iPS cell lines was performed at Cell Line Genetics by analyzing the Giemsa banding (Meisner and Johnson, 2008). Briefly, iPSCs undergoing active division were blocked at metaphase with 0.1 μg/ml colcemid. Then iPSCs were trypsinized to single cells with 0.05% trypsin-EDTA. KCl hypotonic solution (0.075 M) was used to resuspend and swell the iPSCs by gently swirling and incubating at room temperature for 20 min. Subsequently, iPSCs were fixed in fixative (3:1 v/v ratio of methanol to acetic acid), followed by preparation of slides for karyotyping.
Reprogramming cells were treated with 1 mg/ml of Collagenase B for 10-30 min depending on the cell density, followed by 5-minute trypsinization with 0.05% trypsin. Cells were then suspended in culture medium and filtered through a 40 μm cell strainer. Flow cytometry analysis or sorting was performed on a BD FACS Aria III. The treatments with collagenase B, filtration, and sorting usually lead to a 30- to 50-fold decrease in the generation of EGFP-positive colonies. All data were analyzed with FlowJo v10.
Cell lysis samples or immunoprecipitation (IP) samples were loaded onto 10% SDS-PAGE gel for separation and then transferred to nitrocellulose membranes 0.45 μm (BioRad, 1620115). The following antibodies were used for immuno-blotting (IB): anti-Oct4 (Abcam ab19857), anti-Sox2 (Millipore AB5603 for IP, Abcam ab79351 for IB), anti-Klf4 (Stemgent 09-0021), and anti-actin (Santa Cruz sc-47778).
Secondary MEFs (10,000 cells/cm2) were plated onto a gelatin-coated 10-cm dish and cultured in reprogramming medium with 1 μg/ml doxycycline for 2 days. Cells were lysed with 500 μL ice-cold IP buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% TritonX-100, 0.1% NP-40, and 1.5 mM EDTA) on ice for 20 minutes. Protein A dynabead slurry (20 μL, Life Sciences Technologies, 10001D) was used for each IP test. Target and co-IP proteins were eluted with SDS sample buffer for direct western detection.
Cells were washed three times with DPBS and fixed with 4% PFA for 30 minutes at 4° C. Donkey serum (10% in DPBS) with 1% BSA was used for blocking for 1 hour at 4° C. Triton-X100 (0.3%) was added during blocking when staining nuclear proteins. Antibodies were diluted in DPBS with 1% BSA. The following primary antibodies were used for staining: anti-Sox2 (Millipore, AB5603; Abcam, ab79351), anti-Klf4 (Stemgent, 09-0021; R&D, AF3158), anti-c-Myc (Epitomics, 1472-1), anti-Nanog (Abcam, 80892), and anti-SSEA-1 (Stemgent, 09-0095).
Single visual field imaging was performed with a fluorescent microscope (IX83, Olympus); images were taken and analyzed using CellSens Dimension. For imaging and analysis of multiple visual fields, cell culture plates were scanned using an automated microscope (Lionheart FX, BioTek). Images were stitched together and analyzed using Gen5 Software.
For cultured cells, samples were lysed, and total RNA was extracted with the RNeasy Plus mini kit (Qiagen, 74136) with QiaShredder (Qiagen, 79656) according to the manufacturer's instructions. For sorted cells, samples at the indicated time points were collected and lysed in TRIzol™ Reagent (Invitrogen, 15596026), and total RNA was extracted as follows. Linear acrylamide (Thermo Fisher Scientific, AM9520) was added to the lysed cell samples to enhance the precipitation of RNA. Chloroform was then added, and the mixture was shaken vigorously to extract RNA. After centrifugation, the RNA dissolved in the aqueous phase was carefully transferred into an RNase-free tube and mixed thoroughly with 1 volume of isopropanol (Sigma Aldrich). Samples were then placed at −20° C. overnight to precipitate RNA. On the next day, the samples were centrifuged to pellet the RNA at the bottom of the tube, and the isopropanol was carefully removed. The RNA pellet was then washed with 75% ethanol to eliminate possible residual traces of guanidinium. After centrifugation, the ethanol was removed with a pipet tip, and the pellet was air-dried for 10 minutes. Finally, total RNA was dissolved in 20 μl of nuclease-free water, pipetting up and down several times if necessary.
To test the gene expression level, total RNA was used for qPCR experiments. Genomic DNA elimination and reverse transcription were performed using the iScript cDNA synthesis kit (Bio-Rad), and qPCR was performed with iQ™ SYBR Green Supermix (Bio-Rad) on a CFX384 Real-Time PCR System (Bio-Rad). All reactions were done in quadruplicate. All data were statistically analyzed in Prism 7 with the built-in analysis methods.
Total RNA of samples at the indicated times were used for sequencing. Sequencing libraries were generated using NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB #E7530L), according to the manufacturer's instructions. A total amount of 2 μg RNA per sample was used as input material for library preparation. The library fragments were purified with QiaQuick PCR kits (Qiagen, 28106), quality-controlled by Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA) and quantified by qPCR. Libraries were then sequenced using Illumina HiSeq 2500 platform and 150 bp paired-end (PE150) reads were generated.
All ChIP experiments were performed with the EZ-ChIP Chromatin Immunoprecipitation kit (Millipore, 17-371), following the protocol provided with the kit with minor modifications. Briefly, day 0 or day 2 reprogramming cells (˜1.0×10⁷) in a 15-cm dish were crosslinked by adding 0.55 ml of 37% formaldehyde to 20 ml of growth medium. Then 1 ml of 2.5 M glycine (20×) was added to quench the unreacted formaldehyde. Cells in each 15-cm plate were collected and resuspended in 830 μl of lysis buffer. Genomic DNA was then sheared to a length of 100-500 bp on a Covaris S220 Sonicator with optimized conditions. For Sox2 or Klf4 ChIP, 1.0×10⁷ reprogramming cells and 10 μg of antibody were used for each experiment, and for H3K27ac ChIP, 5.0×10⁶ reprogramming cells and 2 μg of antibody were used. Finally, DNA fragments were recovered with the NucleoSpin Gel and PCR Clean-up kit (MACHEREY-NAGEL, 740609) and used for either qPCR or library preparation. The primary antibodies used are as follows: anti-Sox2 (Millipore, AB5603), anti-Klf4 (R&D, AF3158), and anti-H3K27ac (Abcam, ab4729).
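As a quick check of the crosslinking and quenching volumes above, the final concentrations follow from the dilution relation C1·V1 = C2·V2. The sketch below assumes the volumes are simply additive; the function name is illustrative. Under that assumption, the formaldehyde ends up at roughly 1% and the glycine at about 0.12 M.

```python
def final_concentration(stock_conc, stock_volume_ml, starting_volume_ml):
    """Concentration after adding stock_volume_ml of a stock solution to
    starting_volume_ml of liquid, assuming volumes are additive."""
    total_volume = stock_volume_ml + starting_volume_ml
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ml / total_volume


# 0.55 ml of 37% formaldehyde added to 20 ml of growth medium.
formaldehyde_percent = final_concentration(37.0, 0.55, 20.0)

# 1 ml of 2.5 M glycine added to the 20.55 ml of crosslinking mixture.
glycine_molar = final_concentration(2.5, 1.0, 20.55)
```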
Sequencing libraries were generated with NEBNext® Ultra™ II DNA Library Prep Kit for Illumina (E7645S), according to the manufacturer's instructions. Briefly, 4 ng of ChIP DNA and 40 ng of Input DNA were used for library preparation. NEBNext Multiplex Oligos for Illumina (Set 1, NEB #E7335; Set 2, NEB #E7500) were used for PCR amplification of adaptor-ligated DNA. Libraries were purified with SPRIselect® Reagent Kit (Beckman Coulter, Inc. #B23317), quality controlled by Bioanalyzer 2100, and quantified by qPCR. Sequencing was performed on Illumina NextSeq 550AR using single end 50-bp reads.
Statistical analyses were performed in GraphPad Prism 7. Significance and the value of n were calculated with the indicated methods in each figure legend. The data are presented as the mean±SD. *p<0.05; **p<0.01; ns, non-significant.
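The reporting convention above (mean±SD with significance marks) can be sketched in a few lines. This is an illustration using the Python standard library rather than GraphPad Prism; it assumes the sample standard deviation, and the function names and replicate values are hypothetical.

```python
import statistics


def summarize(values):
    """Return (mean, sample SD) for a list of replicate measurements."""
    return statistics.mean(values), statistics.stdev(values)


def significance_label(p_value):
    """Map a P-value to the marks used in the figure legends."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


# Hypothetical triplicate colony counts.
mean, sd = summarize([10, 12, 14])
label = significance_label(0.03)
```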
Before alignment, low quality reads and those containing adapter or poly-N sequences were removed using FastQC. The remaining reads were mapped to the mm9 genome assembly using the default parameters of the STAR (2.5.1b) aligner.
To cluster samples from different reprogramming time points, the Manhattan distance between samples was computed, and hierarchical clustering was then applied using hclust.
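The clustering step can be illustrated with a small pure-Python sketch. It is not the R code used in the study: it assumes complete linkage (the default of R's hclust), uses toy expression vectors invented for illustration, and all function names are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hierarchical_clustering(samples):
    """Agglomerative clustering with complete linkage.

    samples maps a sample name to its expression vector.
    Returns the merges as (cluster_a, cluster_b, linkage_distance).
    """
    def linkage(c1, c2):
        # Complete linkage: largest pairwise distance between members.
        return max(manhattan(samples[a], samples[b]) for a in c1 for b in c2)

    clusters = [(name,) for name in samples]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy data (hypothetical values, three genes per sample).
toy = {"day0": [0, 0, 0], "day2": [5, 5, 0], "day4": [6, 5, 1], "iPSC": [10, 10, 10]}
merges = hierarchical_clustering(toy)
```

In this toy input the two intermediate samples merge first, and the iPSC sample joins last, which mirrors how closely related time points group together in a dendrogram.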
Differential expression analysis of two groups was performed using the DESeq2 R package (1.10.1) to identify differentially expressed genes (DEGs). DESeq2 provides statistical routines for determining differential expression from digital gene expression data using a model based on the negative binomial distribution. The resulting P-values were adjusted using the Benjamini and Hochberg approach to control the false discovery rate. Genes with an adjusted P-value <0.05 found by DESeq2 were designated as differentially expressed.
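The Benjamini and Hochberg adjustment mentioned above can be written out directly. The sketch below is a standalone illustration of the procedure, not DESeq2 itself; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for four genes.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
# Genes passing the adjusted P-value < 0.05 cutoff used in the text.
significant = [i for i, p in enumerate(adj) if p < 0.05]
```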
Principal Components Analysis (PCA) was performed in R with the R package gmodels (2.16.2). The fast.prcomp function was used for efficient computation of principal components and singular value decompositions.
Gene ontology (GO) enrichment of DEGs during reprogramming was calculated using the DAVID 6.8 functional annotation bioinformatics tool (see website at david.ncifcrf.gov). Terms that had a P-value <0.05 were defined as significantly enriched.
The correlation of all RNA sequencing data between MEF or NPC samples at different reprogramming times was analyzed in R using pheatmap (1.0.10). The correlation of 112 pluripotency-associated genes between reprogramming NPCs and MEFs was analyzed using corrplot (0.84).
Alignment of the ChIP-seq reads was done using Bowtie2 with mouse genome build mm9. The result was then filtered by MAPQ score with samtools (0.1.19) to keep only reads with MAPQ larger than 10 (Langmead et al., 2009). To identify regions of ChIP-seq enrichment over background, peak calling was performed with MACS2 (2.1.0), using the corresponding input DNA as control for each sample (Zhang et al., 2008). Default parameters in MACS were used. The number of reads per million mapped reads (RPM) was calculated in each peak and the corresponding input control of that peak.
The fasta sequences for the peak regions called from MACS were collected and used as input for the motif finding algorithm MEME-ChIP (maximum motif width=30, assuming any number of motifs per sequence) (Machanick and Bailey, 2011).
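The MAPQ filter and the RPM normalization described above reduce to two small calculations. The sketch below operates on plain Python records rather than on BAM files, so it illustrates the arithmetic only; the record layout and function names are assumptions.

```python
def filter_by_mapq(reads, threshold=10):
    """Keep reads whose MAPQ is strictly larger than the threshold."""
    return [read for read in reads if read["mapq"] > threshold]


def rpm(reads_in_peak, total_mapped_reads):
    """Reads per million mapped reads for one peak."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peak * 1_000_000 / total_mapped_reads


# Hypothetical reads; only the MAPQ field matters for the filter.
reads = [{"name": "r1", "mapq": 5}, {"name": "r2", "mapq": 10}, {"name": "r3", "mapq": 30}]
kept = filter_by_mapq(reads)

# Hypothetical counts: 50 reads in a peak out of 20 million mapped reads.
peak_rpm = rpm(50, 20_000_000)
```

Computing the same value for the matching input control gives the background level against which the ChIP signal in that peak can be compared.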
Genomic Regions Enrichment of Annotations Tool (GREAT) was used to analyze the peak distribution (McLean et al., 2010). For each peak, the smallest distance was calculated between the peak and the nearby Transcription Start Site (TSS) of genes (negative distance for peaks upstream of TSS). The distributions of distance for the peaks from different samples were compared. Bedtools was used to intersect the peaks from Sox2 and Klf4 to identify the colocalized (Sox_Klf) peaks, Sox_solo, and Klf_solo peaks.
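The two peak-level operations above, signed distance to the nearest TSS and classification of peaks into Sox_Klf, Sox_solo, and Klf_solo sets, can be sketched without GREAT or Bedtools. The code assumes half-open (chrom, start, end) intervals, a minimum overlap of 1 bp, peak midpoints for distance, and strand-aware upstream direction; none of these choices are specified in the text, and all names and coordinates are illustrative.

```python
def signed_distance_to_nearest_tss(peak_mid, tss_sites):
    """Signed distance (bp) from a peak midpoint to the nearest TSS.

    tss_sites holds (position, strand) pairs on the same chromosome.
    Negative values mean the peak lies upstream of the TSS.
    """
    best = None
    for position, strand in tss_sites:
        distance = peak_mid - position if strand == "+" else position - peak_mid
        if best is None or abs(distance) < abs(best):
            best = distance
    return best


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_peaks(sox2_peaks, klf4_peaks):
    """Split peaks into co-localized and factor-specific sets."""
    def hit(peak, others):
        return any(overlaps(peak, other) for other in others)

    return {
        "Sox_Klf": [p for p in sox2_peaks if hit(p, klf4_peaks)],
        "Sox_solo": [p for p in sox2_peaks if not hit(p, klf4_peaks)],
        "Klf_solo": [q for q in klf4_peaks if not hit(q, sox2_peaks)],
    }


# Hypothetical coordinates.
sox2 = [("chr1", 100, 200), ("chr1", 500, 600)]
klf4 = [("chr1", 150, 250), ("chr2", 500, 600)]
groups = classify_peaks(sox2, klf4)
nearest = signed_distance_to_nearest_tss(900, [(1000, "+"), (5000, "-")])
```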
Sox2, Klf4, and H3K27 acetylation ChIP-seq signals of the Sox_Klf, Sox_solo and Klf_solo peaks were analyzed and quantitatively measured, sorted by the intensity of Sox2 in Sox_Klf and Sox_solo and by the intensity of Klf4 in Klf_solo. Ngsplots was used to create the heatmap and average profile plot in
Genes with the TSS within +/−5 kb of the Sox_Klf, Sox_solo and Klf_solo peaks were identified. A Mann-Whitney U test was performed to measure the statistical significance of the difference between the normalized reads of each group of genes.
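For readers unfamiliar with the Mann-Whitney U test, the sketch below computes the U statistic and a two-sided P-value for two groups of values. It uses the normal approximation without tie or continuity correction, which is an assumption made here for brevity; an established statistics package would normally be used for real analyses. The function name and input values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided P) using the normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Assign the average rank to runs of tied values.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical normalized read values for two groups of genes.
u_stat, p_val = mann_whitney_u([1, 2, 3], [4, 5, 6])
```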
The enrichment of binding peaks of Sox2, Klf4 and H3K27 acetylation in pluripotency-associated regions was visualized in IGV (2.4.10). ChIP-qPCR was further conducted for detection of Sox2 and Klf4 binding property from the first exon to the distal enhancer of Oct4. Primers used for qPCR are listed in Table 2.
The accession number for the RNA-seq data and ChIP-seq data is NCBI GEO: GSE98280.
This Example describes use of polycistronic expression cassettes to precisely and conveniently control the stoichiometry of multiple factors at the single-cell level.
Polycistronic cassettes were constructed with 2A peptide cleavage sequences (de Felipe et al., 2006) between the segments encoding the reprogramming factors (e.g., Oct4, Klf4, Sox2, and/or c-Myc). Various combinations of two-pioneer factors were initially tested, and c-Myc (M) was included in all combinations because of its purported function in enhancing reprogramming efficiency through transcriptional amplification (Lin et al., 2012; Nie et al., 2012).
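The cassette shorthand used in this document (for example, S2AK2AM) joins one-letter factor abbreviations with "2A" for each 2A peptide sequence. The sketch below generates the names of the three combinations of two pioneer factors plus c-Myc; the helper names are hypothetical, and the code describes naming only, not construct design.

```python
from itertools import combinations

# One-letter abbreviations used for the reprogramming factors.
ABBREVIATIONS = {"Oct4": "O", "Sox2": "S", "Klf4": "K", "c-Myc": "M"}


def cassette_name(factors):
    """Build the shorthand name for an ordered list of factors,
    with '2A' marking each 2A peptide sequence between them."""
    return "2A".join(ABBREVIATIONS[factor] for factor in factors)


def two_pioneer_cassettes():
    """Names for every pair of pioneer factors followed by c-Myc."""
    pioneers = ["Oct4", "Sox2", "Klf4"]
    return [cassette_name(list(pair) + ["c-Myc"]) for pair in combinations(pioneers, 2)]


names = two_pioneer_cassettes()
```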
Thus, polycistronic Oct4, Sox2, and c-Myc (O2AS2AM), Oct4, Klf4, and c-Myc (O2AK2AM), and Sox2, Klf4, and c-Myc (S2AK2AM), were derived from a previous O2AS2AK2AM plasmid (Carey et al., 2009). These cassettes were transduced into mouse embryonic fibroblasts (MEFs) (
Three combinations were first tested for their capacity to induce reprogramming in OG2 MEFs following a widely used method (Takahashi and Yamanaka, 2006). OG2 MEFs harbor an EGFP reporter under control of the endogenous Oct4 promoter, so the EGFP signal can be used as a mark of reprogramming efficiency (Szabo et al., 2002). During the 2-week reprogramming, EGFP-positive colonies were counted on days 4, 7, 10, and 14. Surprisingly, EGFP-positive colonies were observed by day 7 under the S2AK2AM condition, and on day 14, about 60 EGFP-positive colonies were produced per 100,000 starting MEFs (0.06%) (
S2AK2AM generated typical iPSC-like colonies, and iPSC lines were derived from these colonies. When these lines were passaged in ESC medium, they formed ESC-like domed colonies and were Oct4-EGFP positive, which remained unchanged even after 20 passages (
The functional pluripotency of these lines was then tested by examining their capacity to form teratomas and chimeras. S2AK2AM iPSCs were able to form teratomas that contained all three germ layers and were successfully used to generate chimeric mice (
This Example describes experiments illustrating the capacity of S2AK2AM to reprogram different cell types to form pluripotent stem cells.
OG2 neural progenitor cells (NPCs), which expressed the NPC markers Nestin, Sox2, and Pax6 and formed neural spheres, were transduced with S2AK2AM and exposed to a similar reprogramming protocol. As in OG2 MEFs, the EGFP reporter in OG2 NPCs is under control of the Oct4 promoter, so the EGFP signal can be used as a mark of reprogramming efficiency (Szabo et al., 2002). EGFP-positive colonies were obtained after 2 weeks, and stable iPSC lines were established (
Next, a more differentiated cell type, OG2 adult mouse tail tip fibroblasts (TTFs), was examined. Similarly, following S2AK2AM transduction and the reprogramming protocol, iPSC lines were obtained from the EGFP-positive colonies, and their pluripotent gene expression was not distinguishable from R1 ESCs.
This Example describes S2AK2AM-mediated reprogramming of MEFs and NPCs from embryos generated via the 4N assay with S2AK2AM iPSCs. These embryo derived MEFs and NPCs were referred to as secondary S2AK2AM cells (or S2AK2AM 2° MEFs and NPCs), because these cells were 100% iPSC derived (
These 2° MEFs and NPCs responded to doxycycline robustly. After 12 hours of induction, the Sox2 and Klf4 proteins were readily detected (
The inventors then evaluated whether the 2° MEFs could be reprogrammed. After 2 days, all cells underwent dramatic morphological changes simultaneously, which became more pronounced on day 3 (
During the MEF reprogramming, approximately 3% of the cells were reprogrammed to form EGFP-positive colonies (
With the optimized conditions, the efficiency of generating EGFP-positive colonies was then precisely calculated. Exact cell numbers were counted before reprogramming, and after 12 days. As shown in
Finally, the temporal requirement of exogenous factors for MEF reprogramming was examined. Doxycycline was removed from day 1 to day 12 (
Similar results were also observed for 2° NPCs (
This Example illustrates that in addition to providing simultaneous expression of Sox2, Klf4, and c-Myc, the other advantage of S2AK2AM is that the Sox2, Klf4, and c-Myc stoichiometry from the polycistronic cassettes is stable at the single-cell level.
Optimal Sox2, Klf4, and c-Myc stoichiometry was verified by observing the signal intensity of Sox2 and Klf4 as analyzed by immunostaining. In single cells transduced with S2AK2AM, the Sox2 and Klf4 expression signals were generally equivalent, which was in sharp contrast to the mosaic pattern observed in cells transduced with three vectors individually expressing Sox2, Klf4, and c-Myc (S+K+M) (
The effect of disrupting the factor stoichiometry was then tested by moving one factor to a monocistronic cassette, resulting in a combination of monocistronic Sox2 plus polycistronic Klf4 and c-Myc (S+K2AM), monocistronic Klf4 plus polycistronic Sox2 and c-Myc (K+S2AM) and monocistronic c-Myc plus polycistronic Sox2 and Klf4 (M+S2AK) (
The inventors then tested how the disruption of Sox2 and Klf4 stoichiometry would affect the reprogramming outcome. To facilitate the comparison of reprogramming efficiency, viral titrations were adjusted to achieve comparable percentages of cells co-expressing Sox2 and Klf4 in all conditions (
The inventors further investigated how the stoichiometry of Sox2 and Klf4 affected S2AK2AM reprogramming by manipulating the ratio of the two factors. Sox2 (+Sox2) or Klf4 (+Klf4) were individually overexpressed in 2° MEFs (
The inventors then examined if polycistronic Sox2 and Klf4 was sufficient for iPSC generation without co-expression of c-Myc. The two-factor combinations, S2AK, S2AM, and K2AM, were used for reprogramming (
This Example describes experiments designed to understand how the transcriptional network changed from distinct differentiation lineage pathways towards pluripotency, to gain insights into S2AK2AM reprogramming.
Because of the well-characterized function of Oct4 in pluripotency induction and its early detection in both MEF and NPC reprogramming, the inventors used the activation of endogenous Oct4 to monitor the S2AK2AM reprogramming to pluripotency. As shown in
Compared to day 0 MEFs, the numbers of differentially expressed genes (DEGs) detected in the reprogramming intermediates on days 2, 4, 8, and 12 and in iPSCs were 1941, 3523, 3910, 2972, and 3969, respectively.
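The DEG counts above can be tabulated per sample, as in the following minimal sketch. The counts and the up/down split for the day 0/2 switch (699 upregulated, 1242 downregulated) are taken from this description; the variable and function names are illustrative assumptions and do not reflect the analysis software actually used.

```python
# DEG counts relative to day 0 MEFs, as reported in this description.
deg_vs_day0 = {
    "day 2": 1941,
    "day 4": 3523,
    "day 8": 3910,
    "day 12": 2972,
    "iPSC": 3969,
}

# Up/down split reported for the day 0/2 switch in MEFs.
day2_up, day2_down = 699, 1242


def split_matches_total(up: int, down: int, total: int) -> bool:
    """Check that the up- and downregulated counts add up to the DEG total."""
    return up + down == total


for sample, count in deg_vs_day0.items():
    print(f"{sample:>6}: {count} DEGs vs day 0")

# The day 2 total is consistent with the reported up/down split.
print(split_matches_total(day2_up, day2_down, deg_vs_day0["day 2"]))
```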
Hierarchical clustering placed the reprogramming intermediates from day 2 to day 12 close to each other, indicating that two major transcriptional switches occur between days 0 and 2 (day0/2) and between day 12 and mature iPSCs (day12/iPSC) (
Next, the inventors evaluated whether similar switches occur during NPC reprogramming. Because EGFP-positive cells were not visible on day 4, sorting was only performed on days 8 and 12 (
There were 699 upregulated genes during the day 0/2 switch in MEF reprogramming. GO analysis revealed an overrepresentation of epithelial genes, indicating that a mesenchymal-to-epithelial transition (MET) was involved. Interestingly, epithelial genes were also highly enriched among the 880 genes upregulated during the day 0/2 switch of NPC reprogramming. This indicates that by day 2, both MEFs and NPCs were reprogrammed towards intermediates with the characteristics of epithelial cells. These analyses indicate that S2AK2AM reprogramming might lead to convergent molecular trajectories after the day 0/2 transcriptional switch in both cell types.
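Overrepresentation of a gene category in a gene list is commonly assessed with a one-sided hypergeometric test. The following is a minimal sketch under that assumption; this description does not state which test or tool was used for the GO analysis. Apart from the list size of 699, every number in the usage example (background size, term size, number of hits) is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits: int, list_size: int,
                           term_size: int, background_size: int) -> float:
    """Return P(X >= hits) for a list of list_size genes drawn without
    replacement from background_size genes, term_size of which are
    annotated with the category of interest."""
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 60 of the 699 upregulated genes carry an
# "epithelial" annotation that covers 500 of 20000 background genes.
p_value = hypergeom_enrichment_p(hits=60, list_size=699,
                                 term_size=500, background_size=20000)
print(f"enrichment p-value: {p_value:.3g}")
```

Under these hypothetical numbers only about 17 annotated genes would be expected by chance (699 × 500 / 20000), so 60 hits gives a very small p-value.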
The inventors compared the transcriptional profiles of day 0 MEFs and NPCs.
Surprisingly, on day 2, the number of DEGs between reprogramming MEFs and NPCs dropped sharply by 93.8% to 174, indicating the transcriptional similarity of MEF and NPC intermediates. The cell types continued to converge over the course of reprogramming, with no detectable difference in gene expression on day 12 (
PCA and correlation analysis clearly supported the disappearance of transcriptional difference between the cell types (
This Example describes the major molecular events governing the two transcriptional switches.
For the day 0/2 switch, many genes were differentially expressed, with 699 upregulated versus 1242 downregulated in MEFs and 880 upregulated versus 1245 downregulated in NPCs (
In the MEF gene set, gene ontology (GO) analysis showed that the downregulated genes were mostly responsible for tissue development, and tissue expression analysis revealed enrichment of genes related to fibroblasts and mesenchymal stem cells (Table 3A-3B).
These analyses indicated the silencing of the MEF program during the day 0/2 switch. Downregulation of fibroblast markers was confirmed by qPCR (
Similarly, in the NPC reprogramming, the 908 downregulated genes were mainly associated with nervous system development, including Nestin, Lhx2, and Nlgn1, among others. Genes expressed in brain, hypothalamus, and cerebellum were overrepresented. Thus, with both MEF and NPC reprogramming, our data indicate that the removal of original cell identities marks the day 0/2 transcriptional switch.
This Example illustrates how the pluripotency network was established during S2AK2AM reprogramming by showing that the expression of pluripotency genes was significantly upregulated.
During MEF reprogramming to iPSCs, 1615 genes were upregulated. These genes were divided into groups based on the timepoint at which they reached a threshold of twofold upregulation, and a pattern of progressive activation of genes was established (
A similar analysis was performed for NPC reprogramming. Lin28a, Lin28b, Zfp296, Cdh1, Oct4, and Zscan10 were upregulated by day 4. After that, Nanog, Sall4, Tcl1, Fgf4, Zfp42, Gdf3, Utf1, Fbxo15, Esrrb, Dppa4/5, and Nodal were activated on day 8. Fewer genes were found to be activated by day 12, including Tdgf1, Dppa3, Eras, and Tex19.1. This list was similar to that from MEF reprogramming, with the leading activation of Oct4, Lin28a/b, Zfp296, and Cdh1, followed by a group of other key pluripotency factors. These observations indicate that, independent of the original cell identity, the pluripotency network was gradually established in a similar way during MEF and NPC reprogramming.
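The grouping of genes by the first timepoint at which they reach twofold upregulation can be sketched as follows. This is an illustrative sketch only: the expression values are invented, the pseudocount of 1 is an assumption added to avoid division by zero, and the function names are hypothetical. The gene names and the twofold threshold are taken from this description.

```python
def first_activation_day(expr_by_day, baseline_day=0, fold=2.0, pseudocount=1.0):
    """Return the earliest day on which expression reaches `fold` times the
    baseline, or None if the threshold is never reached."""
    base = expr_by_day[baseline_day] + pseudocount
    for day in sorted(d for d in expr_by_day if d != baseline_day):
        if (expr_by_day[day] + pseudocount) / base >= fold:
            return day
    return None


def group_by_activation(expression, fold=2.0):
    """Map each activation day to the list of genes first activated on it."""
    groups = {}
    for gene, series in expression.items():
        day = first_activation_day(series, fold=fold)
        groups.setdefault(day, []).append(gene)
    return groups


# Invented expression values (arbitrary units) keyed by day.
toy_expression = {
    "Oct4":  {0: 1.0, 2: 1.5, 4: 8.0, 8: 40.0, 12: 60.0},
    "Nanog": {0: 1.0, 2: 1.0, 4: 1.2, 8: 9.0, 12: 30.0},
    "Tdgf1": {0: 2.0, 2: 2.0, 4: 2.5, 8: 3.0, 12: 12.0},
}

activation_groups = group_by_activation(toy_expression)
for day in sorted(activation_groups):
    print(f"day {day}: {', '.join(activation_groups[day])}")
```

With these toy values, Oct4 falls in the day 4 group, Nanog in the day 8 group, and Tdgf1 in the day 12 group, mirroring the progressive pattern described above.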
To further verify the similar kinetics of pluripotency activation in MEFs and NPCs, 112 pluripotency-related genes were selected, and their expression levels in MEF and NPC reprogramming intermediates were compared in parallel. This correlation analysis revealed that the intermediates at each time point were highly similar (
At the day12/iPSC transition,
This Example describes the genome binding patterns of Sox2 and Klf4, which illustrate how S2AK2AM facilitates reprogramming.
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) was performed on day 2 reprogrammed MEFs. Overexpressed proteins tended to bind promiscuously across the genome, so to capture the true binding events, two independent experiments were conducted and only those peaks observed consistently (31236 for Sox2 and 1175 for Klf4) were used in this study. De novo motif discovery showed that Sox2 and Klf4 motifs were highly enriched in the immunoprecipitated DNA fragments, verifying the effectiveness of our experiments (
Interestingly, the Klf4 motif was overrepresented in the Sox2 peaks and vice versa (
To further investigate their cooperativity, the global colocalization of Sox2 and Klf4 in the genome was analyzed. About 80% of the Klf4 peaks were bound by Sox2 (Sox_Klf peaks) (
The inventors then examined whether this cooperative binding facilitated the activation of their target genes. Sox2 binding (Sox_Klf and Sox2_solo) led to increased H3K27 acetylation on day 2, but a similar effect was not observed for Klf4 (Klf4_solo) (
The inventors then investigated whether Sox2 and Klf4 bindings were the same between the S2AK2AM condition and Sox2 or Klf4 overexpression alone. The samples for Sox2 or Klf4 overexpression alone (Sox2_tetO or Klf4_tetO) were from previous data (Chronis et al., 2017). Although the binding motifs were similar (
This Example illustrates how Sox2 and Klf4 cooperate in binding and activating pluripotent gene loci. Previously, the inventors had shown that Oct4, Lin28a/b, Zfp296, and Sox21 were upregulated early during MEF reprogramming. In this Example, the inventors investigated whether Sox2 and Klf4 co-occupied these genes.
Because of the critical role of Oct4 in pluripotency induction and maintenance, the inventors studied this case individually with ChIP-qPCR. Primers were designed to cover a large region from the first exon to the distal enhancer of Oct4 along the Oct4 regulatory region of chromosome 17 (
More interestingly, the inventors noticed that co-binding of Sox2 and Klf4 occurred on one of the 231 ESC-specific superenhancers upstream of Oct4. These superenhancers were reported by Whyte and colleagues in 2013, and are associated with the high expression of nearby pluripotent genes (Whyte et al., 2013). The inventors examined whether other ESC-specific superenhancers were also bound by Sox2 or Klf4. Interestingly, Sox2 binding also occurred on four superenhancers close to Nanog and Sox2, and these superenhancers have been shown to be essential for Nanog and Sox2 expression in ESCs (Blinka et al., 2016; Li et al., 2014; Zhou et al., 2014). An Fgf4 superenhancer was also bound by Sox2. These results demonstrate that on day 2 of S2AK2AM reprogramming, Sox2 and Klf4 cooperatively bound and remodeled some pluripotent gene loci even prior to their transcriptional activation, indicating their function in early priming towards pluripotency.
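Determining whether a superenhancer is bound by a factor amounts to testing whether any ChIP-seq peak for that factor overlaps the superenhancer interval. The following is a minimal sketch of such an overlap test. All coordinates are made up for illustration and are not real genomic positions, and the function names are hypothetical; this description does not specify the tools used for this step.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def bound_regions(regions, peaks):
    """Return the regions overlapped by at least one peak."""
    return [r for r in regions if any(overlaps(r, p) for p in peaks)]


def co_bound_regions(regions, peaks_a, peaks_b):
    """Return the regions overlapped by peaks from both factors."""
    bound_b = set(bound_regions(regions, peaks_b))
    return [r for r in bound_regions(regions, peaks_a) if r in bound_b]


# Made-up intervals standing in for superenhancers and ChIP-seq peaks.
superenhancers = [("chr17", 1000, 5000), ("chr6", 200, 900), ("chr3", 50, 400)]
sox2_peaks = [("chr17", 1200, 1400), ("chr6", 850, 950)]
klf4_peaks = [("chr17", 4900, 5100), ("chr3", 500, 600)]

sox2_bound = bound_regions(superenhancers, sox2_peaks)
co_bound = co_bound_regions(superenhancers, sox2_peaks, klf4_peaks)
print("Sox2-bound:", sox2_bound)
print("Sox2 and Klf4 co-bound:", co_bound)
```

The pairwise scan is quadratic in the number of intervals, which is adequate for a few hundred superenhancers; for genome-wide peak sets, a sorted sweep or an interval tree would be the more usual choice.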
All patents and publications referenced or mentioned herein are indicative of the levels of skill of those skilled in the art to which the invention pertains, and each such referenced patent or publication is hereby specifically incorporated by reference to the same extent as if it had been incorporated by reference in its entirety individually or set forth herein in its entirety. Applicants reserve the right to physically incorporate into this specification any and all materials and information from any such cited patents or publications.
The following statements are intended to describe and summarize various embodiments of the invention according to the foregoing description in the specification.
The specific methods and compositions described herein are representative of preferred embodiments and are exemplary and not intended as limitations on the scope of the invention. Other objects, aspects, and embodiments will occur to those skilled in the art upon consideration of this specification and are encompassed within the spirit of the invention as defined by the scope of the claims. It will be readily apparent to one skilled in the art that varying substitutions and modifications may be made to the invention disclosed herein without departing from the scope and spirit of the invention. The invention illustratively described herein suitably may be practiced in the absence of any element or elements, or limitation or limitations, which is not specifically disclosed herein as essential. The methods and processes illustratively described herein suitably may be practiced in differing orders of steps, and the methods and processes are not necessarily restricted to the orders of steps indicated herein or in the claims.
The terms and expressions that have been employed are used as terms of description and not of limitation, and there is no intent in the use of such terms and expressions to exclude any equivalent of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention as claimed. Thus, it will be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention as defined by the appended claims and statements of the invention. Under no circumstances may the patent be interpreted to be limited to the specific examples or embodiments or methods specifically disclosed herein. Under no circumstances may the patent be interpreted to be limited by any statement made by any Examiner or any other official or employee of the Patent and Trademark Office unless such statement is specifically and without qualification or reservation expressly adopted in a responsive writing by Applicants.
This application claims benefit of priority to the filing date of U.S. Provisional Application Ser. No. 62/916,830, filed Oct. 18, 2019, the contents of which are specifically incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/055943 | 10/16/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62916830 | Oct 2019 | US |