This application contains a Sequence Listing in computer readable form (filename: J022770007WO00-SEQ-HJD; 1.50 MB—ASCII text file; created Oct. 3, 2018), which is incorporated herein by reference in its entirety and forms part of the disclosure.
Selectable markers are widely adopted in transgenesis and genome editing for selecting engineered cells with a desired genotype. Antibiotic resistance genes (encoding antibiotic resistance proteins) provide resistance to specific antibiotics so that only cells expressing these resistance genes survive and multiply. Antibiotic resistance genes/antibiotics available for use in eukaryotic cells include hygB/Hygromycin, neo/Geneticin®/G418, pac/Puromycin, Sh ble/Phleomycin D1 (Zeocin™), and bsd/Blasticidin. Fluorescent proteins, such as green fluorescent protein (GFP), provide another means of cell selection, for example, via fluorescence-activated cell sorting (FACS) techniques or fluorescence microscopy.
There is a limited number of antibiotic resistance genes/antibiotics available for use in eukaryotic (e.g., mammalian) cells; thus, selection schemes for identifying cells containing multiple transgenes are limited. Not only is there a limited number of distinct genes that confer antibiotic resistance in eukaryotic cells, but simultaneous use of as few as three different antibiotic resistance genes can adversely affect the health of transgenic cells. While antibiotic selection can be performed serially, this process is time-consuming. These limitations on selection schemes for identifying transgenic cells are problematic when there is a need to identify cells into which multiple transgenes have been introduced (e.g., to generate a transgenic organism, such as an animal model, e.g., a mouse model).
Provided herein are methods, compositions and kits useful for the production and/or identification of, for example, cells and/or organisms harboring two or more transgenes (e.g., double-transgenics, triple-transgenics, etc.). For example, the compositions and kits may be used for the production and/or identification of cells and/or organisms harboring two, three, or four transgenes. This technology is based, at least in part, on a protein splicing mechanism initiated by an intein auto-processing domain, which facilitates the joining (conjugation) specifically in multi-transgenic cells of multiple (e.g., two, three, or four) separate selectable marker protein fragments (double-transgenic cells, triple-transgenic cells, or quadruple-transgenic cells). Joining of the two, three, four, or more separate selectable marker protein fragments in the multi-transgenic cells produces a full-length selectable marker protein that confers, for example, antibiotic resistance (an antibiotic resistance protein) or is capable of fluorescence under an appropriate wavelength of light (fluorescent protein). Cells expressing a full-length antibiotic resistance gene survive in the presence of the corresponding antibiotic and thus are selected as multi-transgenic (e.g., double-transgenic, triple-transgenic, or quadruple-transgenic) cells. Likewise, cells expressing a full-length functioning fluorescent protein fluoresce under the appropriate wavelength of light and thus are selected as multi-transgenic (e.g., double-transgenic, triple-transgenic, or quadruple-transgenic) cells.
Thus, the present disclosure provides, in some embodiments, methods comprising delivering to a composition comprising eukaryotic cells two or more vectors, wherein each vector comprises (i) a nucleotide sequence encoding a selectable marker protein fragment linked to an N-terminal intein protein fragment and/or a C-terminal intein protein fragment and (ii) a nucleotide sequence encoding a molecule of interest, wherein the intein protein fragments, when joined in frame to form full-length functional proteins, catalyze joining of the selectable marker protein fragments to produce a full-length selectable marker protein. For example, when two vectors are delivered to a population of cells (e.g., under transfection conditions), some cells will take up the first vector (the vector is introduced in the cells), some cells will take up the second vector, and some cells will take up both vectors. Only those cells that take up both vectors are capable of expressing a full-length functioning selectable marker protein, thus only those cells are selected as double-transgenic cells.
In some embodiments, methods herein comprise delivering to a composition comprising eukaryotic cells (a) a first vector comprising (i) a nucleotide sequence encoding a first selectable marker protein fragment (e.g., antibiotic resistance protein fragment or fluorescent protein fragment) upstream from a nucleotide sequence encoding an N-terminal intein protein fragment and (ii) a nucleotide sequence encoding a first molecule, and (b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal intein protein fragment upstream from a nucleotide sequence encoding a second selectable marker protein fragment (e.g., antibiotic resistance protein fragment or fluorescent protein fragment) and (ii) a nucleotide sequence encoding a second molecule, wherein the N-terminal intein protein fragment and the C-terminal intein protein fragment catalyze joining of the first selectable marker protein fragment to the second selectable marker protein fragment to produce a full-length selectable marker protein. When the two vectors are delivered to a population of cells (e.g., under transfection conditions), some cells will take up the first vector (the vector is introduced in the cells), some cells will take up the second vector, and some cells will take up both vectors. Only those cells that take up both vectors are capable of expressing a full-length functioning selectable marker protein, thus only those cells are selected as double-transgenic cells.
In other embodiments, methods comprise delivering to eukaryotic cells (a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a selectable marker protein (e.g., antibiotic resistance protein or fluorescent protein), which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest, (b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the selectable marker protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and (c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the selectable marker protein and (ii) a nucleotide sequence encoding a third molecule of interest, wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the selectable marker protein to the central fragment of the selectable marker protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the selectable marker protein to the C-terminal fragment of the selectable marker protein, to produce a full-length selectable marker protein. When the three vectors are delivered to a population of cells (e.g., under transfection conditions), some cells will take up the first vector (the vector is introduced in the cells), some cells will take up the second vector, some cells will take up the third vector, some cells will take up two different vectors, and some cells will take up all three vectors. Only those cells that take up all three vectors are capable of expressing a full-length functional selectable marker protein, thus only those cells are selected as triple-transgenic cells.
In still other embodiments, methods comprise delivering to eukaryotic cells (a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a selectable marker protein (e.g., antibiotic resistance protein or fluorescent protein), which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest, (b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a first central fragment of the selectable marker protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, (c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a second central fragment of the selectable marker protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a third intein and (ii) a nucleotide sequence encoding a third molecule of interest, and (d) a fourth vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the third intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the selectable marker protein and (ii) a nucleotide sequence encoding a fourth molecule of interest, wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the selectable marker protein to the first central fragment of the selectable marker protein, the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the first central fragment of the selectable marker protein to the second central fragment of the selectable marker protein, and the N-terminal fragment and the C-terminal fragment of the third intein catalyze joining of the second central fragment of the selectable marker protein to the C-terminal fragment of the selectable marker protein, to produce a full-length selectable marker protein. When the four vectors are delivered to a population of cells (e.g., under transfection conditions), some cells will take up the first vector (the vector is introduced in the cells), some cells will take up the second vector, some cells will take up the third vector, some will take up the fourth vector, some cells will take up two different vectors, some cells will take up three different vectors, and some will take up all four vectors. Only those cells that take up all four vectors are capable of expressing a full-length functional selectable marker protein, thus only those cells are selected as quadruple-transgenic cells.
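The uptake outcomes described above for two, three, and four vectors can be illustrated with a short sketch. The following Python code is a hypothetical illustration only and is not part of the disclosed methods; the function names (`uptake_outcomes`, `survives_selection`, `selected_outcomes`) are assumptions introduced here. It enumerates every combination of vectors that a cell could take up and marks as selectable only the combination containing all vectors, on the simplifying assumption that a full-length selectable marker protein forms if and only if every marker fragment is present in the same cell.

```python
from itertools import combinations


def uptake_outcomes(n_vectors):
    """Enumerate every non-empty combination of vectors a cell could take up.

    Vectors are numbered 1..n_vectors (hypothetical labels for illustration).
    """
    vectors = range(1, n_vectors + 1)
    outcomes = []
    for k in range(1, n_vectors + 1):
        for combo in combinations(vectors, k):
            outcomes.append(frozenset(combo))
    return outcomes


def survives_selection(taken_up, n_vectors):
    """Assumption: a cell is selected only if it holds all vectors,
    and therefore all selectable marker protein fragments."""
    return taken_up == frozenset(range(1, n_vectors + 1))


def selected_outcomes(n_vectors):
    """Return the uptake outcomes that would pass selection."""
    return [o for o in uptake_outcomes(n_vectors)
            if survives_selection(o, n_vectors)]


if __name__ == "__main__":
    for n in (2, 3, 4):
        total = len(uptake_outcomes(n))
        passing = [sorted(o) for o in selected_outcomes(n)]
        print(f"{n} vectors: {total} uptake outcomes, selected: {passing}")
```

Under this assumption, exactly one uptake outcome passes selection regardless of the number of vectors, which corresponds to the double-, triple-, or quadruple-transgenic cells described above.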
It should be understood that any one embodiment described herein, including those only disclosed in the examples or one section of the specification, is intended to be able to combine with any one or more other embodiments unless explicitly disclaimed.
Provided herein, in some aspects, are methods of producing transgenic (e.g., multi-transgenic, such as double transgenic or triple transgenic) organisms, into which more than one transgene (or other genetic element) is introduced. As shown in
Another exemplary method of the present disclosure comprises delivering to a population of cells (a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest, (b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and (c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the antibiotic resistance protein and (ii) a nucleotide sequence encoding a third molecule of interest. Some cells of the population will take up a single vector (carrying only a fragment of the intein, a fragment of the selectable marker protein, and a single transgene), while other cells of the population will take up two vectors or all three vectors (and thus all intein fragments, all selectable marker protein fragments, and all transgenes of interest). In cells that take up all three vectors, following translation, the intein protein fragments spontaneously and non-covalently assemble (cooperatively fold) into an intein structure to catalyze joining of the N-terminal fragment of the selectable marker protein to the central fragment, and the central fragment to the C-terminal fragment of the selectable marker protein to produce a full-length selectable marker protein, which enables specific selection of those triple-transgenic cells. 
For example, if the selectable marker protein is an antibiotic resistance protein, only triple-transgenic cells expressing the full-length (functional) antibiotic resistance protein will survive selection in the presence of the particular antibiotic. If the selectable marker protein is a fluorescent protein, as another example, only triple-transgenic cells expressing the full-length (functional) fluorescent protein will emit a detectable signal such that only those signal-emitting cells are selected.
An intein (intervening protein) carries out a unique auto-processing event known as protein splicing in which it excises itself out from a larger precursor polypeptide through the cleavage of two peptide bonds and, in the process, ligates the flanking extein (external protein) sequences through the formation of a new peptide bond. This rearrangement occurs post-translationally (or possibly co-translationally), as intein genes are found embedded in frame within other protein-coding genes. Furthermore, intein-mediated protein splicing is spontaneous; it requires no external factor or energy source, only the folding of the intein domain. In nature, the precursor protein contains three segments—an N-extein (N-terminal portion of the protein) followed by the intein followed by a C-extein (C-terminal portion of the protein). Following splicing, the resulting protein contains the N-extein linked to the C-extein.
There are two types of inteins: cis-splicing inteins are single polypeptides that are embedded in a host protein, whereas trans-splicing inteins (referred to as split inteins) are separate polypeptides that mediate protein splicing after the intein pieces and their protein cargo associate (see, e.g., Paulus, H Annu Rev Biochem 69:447-496 (2000); and Saleh L, Perler F B Chem Rec 6:183-193 (2006)). Split inteins catalyze a series of chemical rearrangements that require the intein to be properly assembled and folded. The first step in splicing involves an N—S acyl shift in which the N-extein polypeptide is transferred to the side chain of the first residue of the intein. This is then followed by a trans-(thio)esterification reaction in which this acyl unit is transferred to the first residue of the C-extein (which is either serine, threonine, or cysteine) to form a branched intermediate. In the penultimate step of the process, this branched intermediate is cleaved from the intein by a transamidation reaction involving the C-terminal asparagine residue of the intein. This then sets up the final step of the process involving an S—N acyl transfer to create a normal peptide bond between the two exteins (Lockless, S W, Muir, T W PNAS 106(27): 10999-11004 (2009)).
To date, there are at least 70 different intein alleles, distinguished not only by the type of host gene in which the inteins are embedded, but also the integration point within that host gene (Perler, F B Nucleic Acids Res. 30: 383-384 (2002); Pietrokovski, S Trends Genet. 17: 465-472 (2001)). A small fraction (less than 5%) of the identified intein genes encode split inteins. Unlike the more common contiguous inteins, split inteins are transcribed and translated as two separate polypeptides, the N-intein and C-intein, each fused to one extein. Upon translation, the intein fragments spontaneously and non-covalently assemble (cooperatively fold) into the canonical intein structure to carry out protein splicing in trans. The first two split inteins to be characterized, from the cyanobacteria Synechocystis species PCC6803 (Ssp) and Nostoc punctiforme PCC73102 (Npu), are orthologs naturally found inserted in the α subunit of DNA Polymerase III (DnaE). Npu is especially notable due to its remarkably fast rate of protein trans-splicing (t1/2=50 s at 30° C.). This half-life is significantly shorter than that of Ssp (t1/2=80 min at 30° C.) (Shah, N H et al. J. Am. Chem. Soc. 135: 5839 (2013)).
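To give a sense of scale for the two half-lives cited above, the following Python sketch compares the fraction of precursor spliced over time for Npu and Ssp. It assumes simple first-order kinetics, which is an assumption made here for illustration and is not stated in the cited reference as summarized above; the function and constant names are likewise hypothetical.

```python
import math

# Half-lives at 30 degrees C as cited in the text above.
NPU_T_HALF_S = 50.0          # Npu: 50 seconds
SSP_T_HALF_S = 80.0 * 60.0   # Ssp: 80 minutes, expressed in seconds


def fraction_spliced(t_seconds, half_life_seconds):
    """Fraction of precursor spliced after t_seconds.

    Assumption: first-order kinetics, so the unspliced fraction
    halves once per half-life.
    """
    return 1.0 - 0.5 ** (t_seconds / half_life_seconds)


def time_to_fraction(fraction, half_life_seconds):
    """Time in seconds needed to reach a given spliced fraction (0 <= fraction < 1)."""
    return half_life_seconds * math.log2(1.0 / (1.0 - fraction))


if __name__ == "__main__":
    print("Half-life ratio (Ssp/Npu):", SSP_T_HALF_S / NPU_T_HALF_S)
    for minutes in (1, 5, 30):
        t = minutes * 60.0
        print(minutes, "min:",
              round(fraction_spliced(t, NPU_T_HALF_S), 3), "(Npu)",
              round(fraction_spliced(t, SSP_T_HALF_S), 3), "(Ssp)")
```

Under the first-order assumption, the ratio of the two half-lives (80 min to 50 s) is 96, so Npu would reach any given spliced fraction roughly two orders of magnitude sooner than Ssp.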
Herein, split inteins are used to catalyze the joining of two fragments (e.g., an N-terminal fragment and a C-terminal fragment) of a selectable marker protein, such as an antibiotic resistance protein or a fluorescent protein to produce a functional, full-length protein (e.g.,
A split intein may be a natural split intein or an engineered split intein. Natural split inteins naturally occur in a variety of different organisms. The largest known family of split inteins is found within the DnaE genes of at least 20 cyanobacterial species (Caspi J, et al. Mol. Microbiol. 50: 1569-1577 (2003)). Thus, in some embodiments of the present disclosure, a natural split intein is selected from DnaE inteins. Non-limiting examples of DnaE inteins include Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
In some embodiments, a split intein is an engineered split intein. Engineered split inteins may be produced from contiguous inteins (where a contiguous intein is artificially split) or may be modified natural split inteins that, for example, promote efficient protein purification, ligation, modification and cyclization (e.g., NpuGEP and CfaGEP, as described by Stevens, A J PNAS 114(32): 8538-8543 (2017)). Methods for engineering split inteins are described, for example, by Aranko, A S et al. Protein Eng Des Sel. 27(8): 263-271 (2014), incorporated herein by reference. In some embodiments, the engineered split intein is engineered from DnaB inteins (Wu, H, et al. Biochim Biophys Acta 1387(1-2): 422-432 (1998)). For example, the engineered split intein may be a SspDnaB S1 intein. In some embodiments, the engineered split intein is engineered from GyrB inteins. For example, the engineered split intein may be a SspGyrB S11 intein.
In some embodiments, wherein triple-transgenics are produced, for example, the first intein may be the same as the second intein (e.g., both DnaE inteins). In other embodiments, two different inteins may be used (e.g., a DnaE intein and a DnaB intein). In some embodiments, the first intein is a NpuDnaE intein and the second intein is a NpuDnaE intein.
Transgenic (e.g., double and/or triple transgenic) cells of the present disclosure are selected based on their expression of a full-length selectable marker protein. A selectable marker protein, generally, confers a trait suitable for artificial selection. Examples of selectable marker proteins include antibiotic resistance proteins and fluorescent proteins.
An antibiotic resistance gene is a gene encoding a protein that confers resistance to a particular antibiotic or class of antibiotics. Non-limiting examples of antibiotic resistance genes for use in eukaryotic cells include those encoding proteins that confer resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin. Non-limiting examples of antibiotic resistance genes for use in prokaryotic cells include those encoding proteins that confer resistance to hygromycin, G418, puromycin, phleomycin D1, blasticidin, kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin D, tetracycline and chloramphenicol.
Hygromycin B is an antibiotic produced by the bacterium Streptomyces hygroscopicus. It is an aminoglycoside that kills bacteria, fungi and higher eukaryotic cells by inhibiting protein synthesis. Hygromycin phosphotransferase (HPT), encoded by the hpt gene (also referred to as the hph or aphIV gene) originally derived from Escherichia coli, detoxifies the aminocyclitol antibiotic hygromycin B. Thus, in some embodiments, the selectable marker gene of the present disclosure is the hpt gene.
G418 (GENETICIN®) is an aminoglycoside antibiotic similar in structure to gentamicin B1. It is produced by Micromonospora rhodorangea. G418 blocks polypeptide synthesis by inhibiting the elongation step in both prokaryotic and eukaryotic cells. Resistance to G418 is conferred by the neo gene from Tn5 encoding an aminoglycoside 3′-phosphotransferase, APH(3′)-II. G418 is an analog of neomycin sulfate and has a mechanism similar to that of neomycin. Thus, in some embodiments, the selectable marker gene of the present disclosure is the neo gene.
Puromycin is an aminonucleoside antibiotic, derived from Streptomyces alboniger, that causes premature chain termination during translation taking place in the ribosome. Puromycin is active against both prokaryotic and eukaryotic cells. Resistance to puromycin is conferred through expression of the puromycin N-acetyl-transferase (pac) gene. Thus, in some embodiments, the selectable marker gene of the present disclosure is the pac gene.
Phleomycin D1 (e.g., ZEOCIN®) is a glycopeptide antibiotic and one of the phleomycins from Streptomyces verticillus belonging to the bleomycin family of antibiotics. It is a broad-spectrum antibiotic that is effective against most bacteria, filamentous fungi, yeast, plant, and animal cells. It causes cell death by intercalating into DNA and induces double strand breaks of the DNA. Resistance to phleomycin D1 is conferred by the product of the Sh ble gene first isolated from Streptoalloteichus hindustanus. Thus, in some embodiments, the selectable marker gene of the present disclosure is the Sh ble gene.
Blasticidin S is an antibiotic that is produced by Streptomyces griseochromogenes. Blasticidin prevents the growth of both eukaryotic and prokaryotic cells by inhibiting the termination step of translation and, to a lesser extent, peptide bond formation by the ribosome. Resistance to blasticidin is conferred by at least three different genes: bls (an acetyltransferase) from Streptoverticillum spp.; bsr (a blasticidin-S deaminase) from Bacillus cereus (other bsr genes are known as well); and bsd (another deaminase) from Aspergillus terreus. Thus, in some embodiments, the selectable marker gene of the present disclosure is the bls gene, the bsr gene, or the bsd gene.
Non-limiting examples of fluorescent proteins that may be used as provided herein include TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, EYFP, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, mOrange2, mRaspberry, mCherry, mStrawberry, mScarlet, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, mRuby2, mPlum, HcRed-Tandem, mKate2, mNeptune, NirFP, TagRFP657, IFP1.4 and iRFP.
Full-length selectable marker genes, in some embodiments, are produced by joining in the same cell two selectable marker gene fragments. In some embodiments, with reference to any full-length protein, one of the fragments is an N-terminal fragment (N-extein), while the other fragment is a C-terminal fragment (C-extein). Thus, in some embodiments, a first antibiotic resistance protein fragment is an N-terminal antibiotic resistance protein fragment, and a second antibiotic resistance protein fragment is a C-terminal antibiotic resistance protein fragment. In other embodiments, a first fluorescent protein fragment is an N-terminal fluorescent protein fragment, and a second fluorescent protein fragment is a C-terminal fluorescent protein fragment.
In other embodiments, full-length selectable marker genes are produced by joining in the same cell three or more selectable marker gene fragments. In some embodiments, with reference to any full-length protein, one of the fragments is an N-terminal fragment, one or more (e.g., 1, 2, or 3) of the fragments is a central fragment, and one of the fragments is a C-terminal fragment.
An N-terminal fragment may be any protein fragment that includes the free amine group (—NH2) of the full-length protein. A C-terminal fragment may be any protein fragment that includes the free carboxyl group (—COOH). A central fragment may be any protein fragment that is located between the N-terminal fragment and the C-terminal fragment of the full-length protein.
For example, amino acids 1-89 of the protein encoded by the hygromycin resistance gene (a 341-amino acid protein) may be referred to as the N-terminal protein fragment, while amino acids 90-341 may be referred to as the C-terminal fragment. Similarly, with reference to
As another example, amino acids 1-52 of the protein encoded by the hygromycin resistance gene (a 341-amino acid protein) may be referred to as the N-terminal protein fragment, amino acids 53-89 may be referred to as the central protein fragment, and amino acids 90-341 may be referred to as the C-terminal fragment. Similarly, amino acids 1-89 of the protein encoded by the hygromycin resistance gene may be referred to as the N-terminal protein fragment, amino acids 90-240 may be referred to as the central fragment, and amino acids 241-341 may be referred to as the C-terminal fragment.
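The fragment boundaries in the preceding examples can be checked with a small calculation. The following Python sketch is illustrative only; the function names `split_fragments` and `label_fragments` and their parameters are hypothetical. It takes the length of a full-length protein and the last residue of each fragment other than the final one, and returns 1-based, inclusive residue ranges labeled as N-terminal, central, or C-terminal fragments.

```python
def split_fragments(length, last_residues):
    """Return 1-based inclusive (start, end) ranges for each fragment.

    length        -- number of amino acids in the full-length protein
    last_residues -- last residue of every fragment except the final one,
                     in strictly increasing order
    """
    last_residues = list(last_residues)
    if last_residues != sorted(set(last_residues)):
        raise ValueError("split points must be strictly increasing")
    if any(r < 1 or r >= length for r in last_residues):
        raise ValueError("split points must fall inside the protein")
    bounds = [0] + last_residues + [length]
    return [(bounds[i] + 1, bounds[i + 1]) for i in range(len(bounds) - 1)]


def label_fragments(ranges):
    """Label the first range N-terminal, the last C-terminal, others central."""
    labeled = []
    for i, rng in enumerate(ranges):
        if i == 0:
            name = "N-terminal"
        elif i == len(ranges) - 1:
            name = "C-terminal"
        else:
            name = "central"
        labeled.append((name, rng))
    return labeled


if __name__ == "__main__":
    # Split points taken from the examples in the text (341-amino acid protein).
    for cuts in ([89], [52, 89], [89, 240]):
        print(cuts, label_fragments(split_fragments(341, cuts)))
```

For the 341-amino acid protein, split points of 52 and 89 give ranges 1-52, 53-89, and 90-341, and split points of 89 and 240 give ranges 1-89, 90-240, and 241-341, matching the examples above.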
The methods and compositions of the present disclosure are used, in some embodiments, to produce multi-transgenic (e.g., double and/or triple transgenic) cells and/or organisms. Thus, in some embodiments, the methods use one vector that encodes a first molecule (a first molecule of interest) and another vector that encodes a second molecule (a second molecule of interest). In some embodiments, the methods use yet another vector that encodes a third molecule of interest. Additional vectors (e.g., encoding additional central fragments of a selectable marker protein) may encode additional molecules of interest. Molecules of interest may be, for example, polypeptides (e.g., proteins and peptides) or polynucleotides (e.g., nucleic acids, such as DNA or RNA).
In some embodiments, the first molecule (e.g., located on the first vector) is a protein. In some embodiments, the second molecule (e.g., located on the second vector) is a protein. In some embodiments, the third molecule (e.g., located on the third vector) is a protein. Examples of proteins of interest include, but are not limited to, enzymes, cytokines, transcription factors, hormones, growth factors, blood factors, antigens and antibodies.
In some embodiments, the first molecule is a peptide. In some embodiments, the second molecule is a peptide. In some embodiments, the third molecule is a peptide.
In some embodiments, the first molecule is a messenger RNA (mRNA). In some embodiments, the second molecule is an mRNA. In some embodiments, the third molecule is an mRNA. The mRNA, in some embodiments, encodes a vaccine or other antigenic molecule.
In some embodiments, the first molecule is a non-coding RNA (an RNA that does not encode a protein). In some embodiments, the second molecule is a non-coding RNA. In some embodiments, the third molecule is a non-coding RNA. Examples of non-coding RNA include, but are not limited to, RNA interference molecules, such as microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
Methods of the present disclosure include the use of at least two or at least three different vectors. A vector is any nucleic acid that may be used as a vehicle to carry exogenous (foreign) genetic material into a cell. A vector, in some embodiments, is a DNA sequence that includes an insert (e.g., transgene) and a larger sequence that serves as the backbone of the vector. Non-limiting examples of vectors include plasmids, viruses/viral vectors, cosmids, and artificial chromosomes, any of which may be used as provided herein. In some embodiments, the vector is a viral vector, such as a viral particle. In some embodiments, the vector is an RNA-based vector, such as a self-replicating RNA vector. In some embodiments, the first vector is a plasmid, the second vector is a plasmid, and/or the third vector is a plasmid. A vector, as provided herein, includes a promoter operably linked to a nucleic acid encoding a fragment of an intein and a fragment of selectable marker protein. In some embodiments, a vector also comprises a promoter operably linked to a nucleic acid, such as a transgene, encoding a molecule of interest.
In some embodiments, one vector (e.g., a first vector) comprises a nucleotide sequence encoding a first selectable marker protein fragment upstream from a nucleotide sequence encoding an N-terminal intein protein fragment, while the other vector (e.g., a second vector) comprises a nucleotide sequence encoding a C-terminal intein protein fragment upstream from a second antibiotic resistance protein fragment (see, e.g.,
In some embodiments, (a) a first vector comprises a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein, (b) a second vector comprises a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein, and (c) a third vector comprises a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the antibiotic resistance protein. This configuration is equivalent to (a) a first vector comprising a nucleotide sequence encoding an N-terminal fragment of a first intein, which is downstream from a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, (b) a second vector comprising a nucleotide sequence encoding an N-terminal fragment of a second intein, which is downstream from a nucleotide sequence encoding a central fragment of the antibiotic resistance protein, which is downstream from a nucleotide sequence encoding a C-terminal fragment of the first intein, and (c) a third vector comprising a nucleotide sequence encoding a C-terminal fragment of the antibiotic resistance protein, which is downstream from a nucleotide sequence encoding a C-terminal fragment of the second intein.
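The three-vector configuration described above can be represented as a simple data structure to check which combinations of vectors yield a complete chain of marker fragments. The following Python sketch is a hypothetical model only: the `Construct` type, the `splice` function, and the labels "intein1" and "intein2" are assumptions introduced for illustration. As a simplification, it assumes that each intein pair is distinct and that an N-terminal intein fragment pairs only with the C-terminal fragment of the same intein; it does not model embodiments in which the first and second inteins are the same.

```python
from typing import List, NamedTuple, Optional


class Construct(NamedTuple):
    """Intein/marker portion of one vector, listed upstream to downstream."""
    c_intein: Optional[str]   # C-terminal intein fragment (None on the first vector)
    marker_fragment: str      # fragment of the selectable marker protein
    n_intein: Optional[str]   # N-terminal intein fragment (None on the last vector)


def splice(constructs: List[Construct]) -> Optional[List[str]]:
    """Return the ordered marker fragments if a full-length chain can form,
    otherwise None. Assumes distinct (non-cross-reacting) intein pairs."""
    starts = [c for c in constructs if c.c_intein is None]
    if len(starts) != 1:
        return None  # no construct carrying the N-terminal marker fragment
    by_c_intein = {c.c_intein: c for c in constructs if c.c_intein is not None}
    chain = [starts[0]]
    # Follow each N-terminal intein fragment to its matching C-terminal partner.
    while chain[-1].n_intein is not None:
        nxt = by_c_intein.get(chain[-1].n_intein)
        if nxt is None or nxt in chain:
            return None  # partner fragment missing, so the chain is incomplete
        chain.append(nxt)
    return [c.marker_fragment for c in chain]


# The three vectors of the configuration described above (labels are hypothetical).
VECTOR_1 = Construct(None, "N-terminal", "intein1")
VECTOR_2 = Construct("intein1", "central", "intein2")
VECTOR_3 = Construct("intein2", "C-terminal", None)

if __name__ == "__main__":
    print(splice([VECTOR_1, VECTOR_2, VECTOR_3]))  # all three vectors present
    print(splice([VECTOR_1, VECTOR_3]))            # second vector missing
```

In this model, only a cell holding all three constructs yields the ordered fragments N-terminal, central, and C-terminal; any cell missing a vector returns no full-length product.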
Methods of the present disclosure may be used for the production of transgenic cells and organisms by introducing into host cells the vectors (e.g., first and second vectors) described herein. The cells into which the vectors are introduced may be eukaryotic or prokaryotic. In some embodiments, the cells are eukaryotic. Examples of eukaryotic cells for use as provided herein include mammalian cells, plant cells (e.g., crop cells), insect cells (e.g., Drosophila) and fungal cells (e.g., Saccharomyces). Mammalian cells may be, for example, human cells (stem cells or cells from an established cell line), primate cells, equine cells, bovine cells, porcine cells, canine cells, feline cells, or rodent cells (e.g., mouse or rat). Examples of mammalian cells for use as provided herein include, but are not limited to, Chinese hamster ovary (CHO) cells, human embryonic kidney (HEK) 293 cells, HeLa cells, and NS0 cells. In some embodiments, the cells are prokaryotic. Examples of prokaryotic cells for use as provided herein include bacterial cells. Bacterial cells may be, for example, Escherichia spp. (e.g., Escherichia coli), Streptococcus spp. (e.g., Streptococcus pyogenes, Streptococcus viridans, Streptococcus pneumoniae), Neisseria spp. (e.g., Neisseria gonorrhoeae, Neisseria meningitidis), Corynebacterium spp. (e.g., Corynebacterium diphtheriae), Bacillus spp. (e.g., Bacillus anthracis, Bacillus subtilis), Lactobacillus spp., Clostridium spp. (e.g., Clostridium tetani, Clostridium perfringens, Clostridium novyi), Mycobacterium spp. (e.g., Mycobacterium tuberculosis), Shigella spp. (e.g., Shigella flexneri, Shigella dysenteriae), Salmonella spp. (e.g., Salmonella typhi, Salmonella enteritidis), Klebsiella spp. (e.g., Klebsiella pneumoniae), Yersinia spp. (e.g., Yersinia pestis), Serratia spp. (e.g., Serratia marcescens), Pseudomonas spp. (e.g., Pseudomonas aeruginosa, Pseudomonas mallei), Eikenella spp. (e.g., Eikenella corrodens), Haemophilus spp. (e.g., Haemophilus influenzae, Haemophilus ducreyi, Haemophilus aegyptius), Vibrio spp. (e.g., Vibrio cholerae, Vibrio natriegens), Legionella spp. (e.g., Legionella micdadei, Legionella bozemani), Brucella spp. (e.g., Brucella abortus), Mycoplasma spp. (e.g., Mycoplasma pneumoniae) or Streptomyces spp. (e.g., Streptomyces coelicolor, Streptomyces lividans, Streptomyces albus).
Methods of the present disclosure, in some embodiments, include delivering vectors to a composition comprising cells and maintaining the composition under conditions that permit introduction of nucleic acid (e.g., first, second, and third vectors) into the cells and permit nucleic acid expression in the cells to produce transgenic cells. Conditions required for the introduction of nucleic acid (e.g., vectors) into cells are well known. These conditions include, for example, transformation (of prokaryotic cells) conditions, transfection (of eukaryotic cells) conditions, transduction (via virus/viral vector) conditions, and electroporation conditions, any of which may be used as provided herein. Thus, in some embodiments, methods of the present disclosure include transfecting eukaryotic (e.g., mammalian) cells, while in other embodiments, the methods include transforming prokaryotic (e.g., bacterial) cells.
The selection of transgenic cells, e.g., multi-transgenic cells such as double-, triple-, and/or quadruple-transgenic cells, depends on the type of selectable marker used. For example, if the selectable marker protein is an antibiotic resistance protein, the selection step may include exposing the cells to a specific antibiotic and selecting only those cells that survive. If the selectable marker protein is a fluorescent protein, the selection step may include simply viewing the cells under a microscope and selecting cells that fluoresce, or the selection step may include other fluorescent selection methods, such as fluorescence-activated cell sorting (FACS).
In some embodiments, cells are transduced with viral vectors (e.g., viruses) carrying the nucleic acids as described herein. In some embodiments, prior to transduction (or other transfection method), cells are seeded, for example, on well plates (e.g., 12-well plates) at a density of 1×10⁴ to 1×10⁶ per well. In some embodiments, 100 μL to 500 μL, e.g., 100, 150, 200, 250, 300, 350, 400, 450, or 500 μL of each viral vector is added to each well.
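As a worked illustration of the seeding and dosing ranges stated above, the following Python sketch totals the cells and viral vector volumes for one plate. The function name `transduction_plan` and the example values (12 wells, 1×10⁵ cells per well, three vectors, 200 μL of each) are hypothetical choices within the stated ranges, not values prescribed by the disclosure.

```python
def transduction_plan(n_wells: int, cells_per_well: int,
                      n_vectors: int, ul_per_vector: int) -> dict:
    """Total up cells and viral vector volumes for one plate.

    The range checks mirror the ranges given in the text:
    1e4 to 1e6 cells per well, and 100 to 500 uL of each viral vector per well.
    """
    if not 10_000 <= cells_per_well <= 1_000_000:
        raise ValueError("cells_per_well outside 1e4 to 1e6")
    if not 100 <= ul_per_vector <= 500:
        raise ValueError("ul_per_vector outside 100 to 500 uL")
    return {
        "total_cells": n_wells * cells_per_well,          # cells needed for the plate
        "virus_ul_per_well": n_vectors * ul_per_vector,   # combined volume added to each well
        "stock_ul_per_vector": n_wells * ul_per_vector,   # volume of each vector stock consumed
    }


# Hypothetical example: a 12-well plate, triple transduction, 200 uL of each vector.
plan = transduction_plan(n_wells=12, cells_per_well=100_000,
                         n_vectors=3, ul_per_vector=200)
```

The combined per-well volume grows with the number of vectors, which may be worth checking against the working volume of the chosen plate format.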
The present disclosure also provides kits that may be used, for example, to produce and screen for transgenic cells and/or organisms. The kits may include any two or more components as described herein. For example, a kit may comprise (a) a first vector comprising a nucleotide sequence encoding a first selectable marker protein fragment upstream from a nucleotide sequence encoding an N-terminal intein protein fragment; and (b) a second vector comprising a nucleotide sequence encoding a C-terminal intein protein fragment upstream from a nucleotide sequence encoding a second selectable marker protein fragment, wherein the N-terminal intein protein fragment and the C-terminal intein protein fragment catalyze joining of the first selectable marker protein fragment to the second selectable marker protein fragment to produce a full-length selectable marker protein.
In some embodiments, a kit comprises (a) a first vector comprising a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein, (b) a second vector comprising a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein, and (c) a third vector comprising a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the antibiotic resistance protein, wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the antibiotic resistance protein to the central fragment of the antibiotic resistance protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the antibiotic resistance protein to the C-terminal fragment of the antibiotic resistance protein, to produce a full-length antibiotic resistance protein.
In some embodiments, the kits further comprise any one or more of the following components: buffers, salts, cloning enzymes (e.g., LR clonase), competent cells (e.g., competent bacterial cells), transfection reagents, antibiotics, and/or instructions for performing the methods described herein.
Additional embodiments of the present disclosure are encompassed by the following numbered paragraphs:
1. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a nucleotide sequence encoding a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the antibiotic resistance protein and (ii) a nucleotide sequence encoding a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of the N-terminal fragment and the C-terminal fragment of the antibiotic resistance protein to produce a full-length antibiotic resistance protein.
2. The method of paragraph 1 further comprising maintaining the eukaryotic cells under conditions that permit introduction of the first and second vectors into the eukaryotic cells to produce transgenic eukaryotic cells.
3. The method of paragraph 2 further comprising selecting the transgenic eukaryotic cells that comprise the full-length antibiotic resistance protein.
4. The method of any one of paragraphs 1-3, wherein the eukaryotic cells are mammalian cells.
5. The method of any one of paragraphs 1-4, wherein the antibiotic resistance protein confers resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin.
6. The method of any one of paragraphs 1-5, wherein the intein is a split intein.
7. The method of paragraph 6, wherein the split intein is a natural split intein.
8. The method of paragraph 7, wherein the natural split intein is selected from DnaE inteins.
9. The method of paragraph 8, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
10. The method of paragraph 6, wherein the split intein is an engineered split intein.
11. The method of paragraph 10, wherein the engineered split intein is engineered from DnaB inteins.
12. The method of paragraph 11, wherein the engineered split intein is a SspDnaB S1 intein.
13. The method of paragraph 10, wherein the engineered split intein is engineered from GyrB inteins.
14. The method of paragraph 13, wherein the engineered split intein is a SspGyrB S11 intein.
15. The method of any one of paragraphs 1-14, wherein the first and/or second molecule is a protein.
16. The method of any one of paragraphs 1-15, wherein the first and/or second molecule is a non-coding ribonucleic acid (RNA).
17. The method of paragraph 16, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
18. The method of any one of paragraphs 1-17, wherein the first and/or second vector is a plasmid vector or a viral vector.
19. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of a hygB gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the hygB gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the hygB gene to a protein fragment encoded by the C-terminal fragment of the hygB gene to produce full-length hygromycin B phosphotransferase.
20. The method of paragraph 19, wherein the first amino acid of the protein fragment encoded by the second hygB gene fragment is cysteine.
21. The method of paragraph 20, wherein
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-89 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 90-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-200 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 201-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-53 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 54-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-240 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 241-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-292 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 293-341 of SEQ ID NO: 1.
22. The method of any one of paragraphs 19-21, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
23. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of a bsr gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the bsr gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the bsr gene to a protein fragment encoded by the C-terminal fragment of the bsr gene to produce full-length blasticidin-S deaminase.
24. The method of paragraph 23, wherein the protein fragment encoded by the N-terminal fragment of the bsr gene comprises an amino acid sequence identified by amino acids 1-102 of SEQ ID NO: 4, and the protein fragment encoded by the C-terminal fragment of the bsr gene comprises an amino acid sequence identified by amino acids 103-140 of SEQ ID NO: 4.
25. The method of paragraph 23 or 24, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
26. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of a pac gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the pac gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the pac gene to a protein fragment encoded by the C-terminal fragment of the pac gene to produce full-length puromycin N-acetyl-transferase.
27. The method of paragraph 26, wherein
the protein fragment encoded by the N-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 1-63 of SEQ ID NO: 2, and the protein fragment encoded by the C-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 64-199 of SEQ ID NO: 2;
the protein fragment encoded by the N-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 1-119 of SEQ ID NO: 2, and the protein fragment encoded by the C-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 120-199 of SEQ ID NO: 2;
the protein fragment encoded by the N-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 1-100 of SEQ ID NO: 2, and the protein fragment encoded by the C-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 101-199 of SEQ ID NO: 2.
28. The method of paragraph 26 or 27, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
29. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of a neo gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the neo gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the neo gene to a protein fragment encoded by the C-terminal fragment of the neo gene to produce full-length aminoglycoside 3′-phosphotransferase.
30. The method of paragraph 29, wherein
the protein fragment encoded by the N-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 1-133 of SEQ ID NO: 3 and the protein fragment encoded by the C-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 134-267 of SEQ ID NO: 3; or
the protein fragment encoded by the N-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 1-194 of SEQ ID NO: 3 and the protein fragment encoded by the C-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 195-267 of SEQ ID NO: 3.
31. The method of paragraph 29 or 30, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
32. A method comprising delivering to a composition comprising eukaryotic cells
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a nucleotide sequence encoding a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the fluorescent protein and (ii) a nucleotide sequence encoding a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of the N-terminal fragment of the fluorescent protein to the C-terminal fragment of the fluorescent protein to produce a full-length fluorescent protein.
33. The method of paragraph 32 further comprising maintaining the eukaryotic cells under conditions that permit introduction of the first and second vectors into the eukaryotic cells to produce transgenic eukaryotic cells.
34. The method of paragraph 33 further comprising selecting the transgenic eukaryotic cells that comprise the full-length fluorescent protein.
35. The method of any one of paragraphs 32-34, wherein the eukaryotic cells are mammalian cells.
36. The method of any one of paragraphs 32-35, wherein the fluorescent protein is selected from TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, EYFP, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, mOrange2, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, mRuby2, mPlum, HcRed-Tandem, mKate2, mNeptune, NirFP, TagRFP657, IFP1.4 and iRFP.
37. The method of any one of paragraphs 32-36, wherein the intein is a split intein.
38. The method of paragraph 37, wherein the split intein is a natural split intein.
39. The method of paragraph 38, wherein the natural split intein is selected from DnaE inteins.
40. The method of paragraph 39, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
41. The method of paragraph 37, wherein the split intein is an engineered split intein.
42. The method of paragraph 41, wherein the engineered split intein is engineered from DnaB inteins.
43. The method of paragraph 42, wherein the engineered split intein is a SspDnaB S1 intein.
44. The method of paragraph 41, wherein the engineered split intein is engineered from GyrB inteins.
45. The method of paragraph 44, wherein the engineered split intein is a SspGyrB S11 intein.
46. The method of any one of paragraphs 32-45, wherein the first and/or second molecule is a protein.
47. The method of any one of paragraphs 32-46, wherein the first and/or second molecule is a non-coding ribonucleic acid (RNA).
48. The method of paragraph 47, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
49. The method of any one of paragraphs 32-48, wherein the first and/or second vector is a plasmid vector or a viral vector.
50. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of an egfp gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the egfp gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the egfp gene to a protein fragment encoded by the C-terminal fragment of the egfp gene to produce full-length EGFP protein.
51. The method of paragraph 50, wherein the protein fragment encoded by the N-terminal fragment of the egfp gene comprises an amino acid sequence identified by amino acids 1-175 of SEQ ID NO: 5, and the protein fragment encoded by the C-terminal fragment of the egfp gene comprises an amino acid sequence identified by amino acids 175-239 of SEQ ID NO: 5.
52. The method of paragraph 50 or 51, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
53. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of an mScarlet gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the mScarlet gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the mScarlet gene to a protein fragment encoded by the C-terminal fragment of the mScarlet gene to produce full-length mScarlet protein.
54. The method of paragraph 53, wherein
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-46 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 47-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-48 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 49-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-51 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 52-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-75 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 76-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-122 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 123-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-140 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 141-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-163 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 164-232 of SEQ ID NO: 6.
55. The method of paragraph 53 or 54, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
56. A eukaryotic cell, comprising
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a nucleotide sequence encoding a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the antibiotic resistance protein and (ii) a nucleotide sequence encoding a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of the N-terminal fragment and the C-terminal fragment of the antibiotic resistance protein to produce a full-length antibiotic resistance protein.
57. The cell of paragraph 56, wherein the eukaryotic cell is a mammalian cell.
58. The cell of paragraph 56 or 57, wherein the antibiotic resistance protein confers resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin.
59. The cell of any one of paragraphs 56-58, wherein the intein is a split intein.
60. The cell of paragraph 59, wherein the split intein is a natural split intein.
61. The cell of paragraph 60, wherein the natural split intein is selected from DnaE inteins.
62. The cell of paragraph 61, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
63. The cell of paragraph 59, wherein the split intein is an engineered split intein.
64. The cell of paragraph 63, wherein the engineered split intein is engineered from DnaB inteins.
65. The cell of paragraph 64, wherein the engineered split intein is a SspDnaB S1 intein.
66. The cell of paragraph 63, wherein the engineered split intein is engineered from GyrB inteins.
67. The cell of paragraph 66, wherein the engineered split intein is a SspGyrB S11 intein.
68. The cell of any one of paragraphs 56-67, wherein the first and/or second molecule is a protein.
69. The cell of any one of paragraphs 56-68, wherein the first and/or second molecule is a non-coding ribonucleic acid (RNA).
70. The cell of paragraph 69, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
71. The cell of any one of paragraphs 56-70, wherein the first and/or second vector is a plasmid vector or a viral vector.
72. A cell comprising
(a) a first vector comprising (i) an N-terminal fragment of a hygB gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the hygB gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the hygB gene to a protein fragment encoded by the C-terminal fragment of the hygB gene to produce full-length hygromycin B phosphotransferase.
73. The cell of paragraph 72, wherein the first amino acid of the protein fragment encoded by the second hygB gene fragment is cysteine.
74. The cell of paragraph 73, wherein
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-89 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 90-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-200 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 201-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-53 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 54-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-240 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 241-341 of SEQ ID NO: 1;
the protein fragment encoded by the N-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 1-292 of SEQ ID NO: 1, and the protein fragment encoded by the C-terminal fragment of the hygB gene comprises an amino acid sequence identified by amino acids 293-341 of SEQ ID NO: 1.
75. The cell of any one of paragraphs 72-74, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
76. A eukaryotic cell, comprising
(a) a first vector comprising (i) an N-terminal fragment of a bsr gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the bsr gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the bsr gene to a protein fragment encoded by the C-terminal fragment of the bsr gene to produce full-length blasticidin-S deaminase.
77. The cell of paragraph 76, wherein the protein fragment encoded by the N-terminal fragment of the bsr gene comprises an amino acid sequence identified by amino acids 1-102 of SEQ ID NO: 4, and the protein fragment encoded by the C-terminal fragment of the bsr gene comprises an amino acid sequence identified by amino acids 103-140 of SEQ ID NO: 4.
78. The cell of paragraph 76 or 77, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
79. A eukaryotic cell, comprising
(a) a first vector comprising (i) an N-terminal fragment of a pac gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the pac gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the pac gene to a protein fragment encoded by the C-terminal fragment of the pac gene to produce full-length puromycin N-acetyl-transferase.
80. The cell of paragraph 79, wherein
the protein fragment encoded by the N-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 1-63 of SEQ ID NO: 2, and the protein fragment encoded by the C-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 64-199 of SEQ ID NO: 2;
the protein fragment encoded by the N-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 1-119 of SEQ ID NO: 2, and the protein fragment encoded by the C-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 120-199 of SEQ ID NO: 2; or
the protein fragment encoded by the N-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 1-100 of SEQ ID NO: 2, and the protein fragment encoded by the C-terminal fragment of the pac gene comprises an amino acid sequence identified by amino acids 101-199 of SEQ ID NO: 2.
81. The cell of paragraph 79 or 80, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
82. A eukaryotic cell, comprising
(a) a first vector comprising (i) an N-terminal fragment of a neo gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the neo gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the neo gene to a protein fragment encoded by the C-terminal fragment of the neo gene to produce full-length aminoglycoside 3′-phosphotransferase.
83. The cell of paragraph 82, wherein
the protein fragment encoded by the N-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 1-133 of SEQ ID NO: 3, and the protein fragment encoded by the C-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 134-267 of SEQ ID NO: 3; or
the protein fragment encoded by the N-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 1-194 of SEQ ID NO: 3, and the protein fragment encoded by the C-terminal fragment of the neo gene comprises an amino acid sequence identified by amino acids 195-267 of SEQ ID NO: 3.
84. The cell of paragraph 82 or 83, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
85. A eukaryotic cell, comprising
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a nucleotide sequence encoding a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the fluorescent protein and (ii) a nucleotide sequence encoding a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of the N-terminal fragment of the fluorescent protein to the C-terminal fragment of the fluorescent protein to produce a full-length fluorescent protein.
86. The cell of paragraph 85 further comprising maintaining the eukaryotic cells under conditions that permit introduction of the first and second vectors into the eukaryotic cells to produce transgenic eukaryotic cells.
87. The cell of paragraph 86 further comprising selecting the transgenic eukaryotic cells that comprise the full-length fluorescent protein.
88. The cell of any one of paragraphs 85-87, wherein the eukaryotic cell is a mammalian cell.
89. The cell of any one of paragraphs 85-88, wherein the fluorescent protein is selected from TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, EYFP, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, mOrange2, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, mRuby2, mPlum, HcRed-Tandem, mKate2, mNeptune, NirFP, TagRFP657, IFP1.4 and iRFP.
90. The cell of any one of paragraphs 85-89, wherein the intein is a split intein.
91. The cell of paragraph 90, wherein the split intein is a natural split intein.
92. The cell of paragraph 91, wherein the natural split intein is selected from DnaE inteins.
93. The cell of paragraph 92, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
94. The cell of paragraph 90, wherein the split intein is an engineered split intein.
95. The cell of paragraph 94, wherein the engineered split intein is engineered from DnaB inteins.
96. The cell of paragraph 95, wherein the engineered split intein is a SspDnaB S1 intein.
97. The cell of paragraph 94, wherein the engineered split intein is engineered from GyrB inteins.
98. The cell of paragraph 97, wherein the engineered split intein is a SspGyrB S11 intein.
99. The cell of any one of paragraphs 85-98, wherein the first and/or second molecule is a protein.
100. The cell of any one of paragraphs 85-99, wherein the first and/or second molecule is a non-coding ribonucleic acid (RNA).
101. The cell of paragraph 100, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
102. The cell of any one of paragraphs 85-101, wherein the first and/or second vector is a plasmid vector or a viral vector.
103. A eukaryotic cell, comprising
(a) a first vector comprising (i) an N-terminal fragment of an egfp gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the egfp gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the egfp gene to a protein fragment encoded by the C-terminal fragment of the egfp gene to produce full-length EGFP protein.
104. The cell of paragraph 103, wherein the protein fragment encoded by the N-terminal fragment of the egfp gene comprises an amino acid sequence identified by amino acids 1-175 of SEQ ID NO: 5, and the protein fragment encoded by the C-terminal fragment of the egfp gene comprises an amino acid sequence identified by amino acids 175-239 of SEQ ID NO: 5.
105. The cell of paragraph 103 or 104, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
106. A eukaryotic cell, comprising
(a) a first vector comprising (i) an N-terminal fragment of an mScarlet gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein and (ii) a first molecule of interest; and
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the mScarlet gene and (ii) a second molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of a protein fragment encoded by the N-terminal fragment of the mScarlet gene to a protein fragment encoded by the C-terminal fragment of the mScarlet gene to produce full-length mScarlet protein.
107. The cell of paragraph 106, wherein
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-46 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 47-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-48 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 49-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-51 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 52-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-75 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 76-232 of SEQ ID NO: 6;
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-140 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 141-232 of SEQ ID NO: 6; or
the protein fragment encoded by the N-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 1-163 of SEQ ID NO: 6, and the protein fragment encoded by the C-terminal fragment of the mScarlet gene comprises an amino acid sequence identified by amino acids 164-232 of SEQ ID NO: 6.
108. The cell of paragraph 106 or 107, wherein
the N-terminal fragment of the intein is identified by SEQ ID NO:16, and the C-terminal fragment of the intein is identified by SEQ ID NO:17;
the N-terminal fragment of the intein is identified by SEQ ID NO:7, and the C-terminal fragment of the intein is identified by SEQ ID NO:8; or
the N-terminal fragment of the intein is identified by SEQ ID NO:18 or SEQ ID NO:9, and the C-terminal fragment of the intein is identified by SEQ ID NO:19 or SEQ ID NO:10.
109. A composition comprising the cell of any one of paragraphs 85-108.
110. A kit, comprising
(a) a first vector comprising a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein; and
(b) a second vector comprising a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the antibiotic resistance protein,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of the N-terminal fragment and the C-terminal fragment of the antibiotic resistance protein to produce a full-length antibiotic resistance protein.
111. The kit of paragraph 110, wherein the antibiotic resistance protein confers resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin.
112. A kit, comprising
(a) a first vector comprising a nucleotide sequence encoding an N-terminal fragment of a fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of an intein; and
(b) a second vector comprising a nucleotide sequence encoding a C-terminal fragment of the intein, which is upstream from a C-terminal fragment of the fluorescent protein,
wherein the N-terminal fragment and the C-terminal fragment of the intein catalyze joining of the N-terminal fragment of the fluorescent protein to the C-terminal fragment of the fluorescent protein to produce a full-length fluorescent protein.
113. The kit of paragraph 112, wherein the fluorescent protein is selected from TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, EYFP, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, mOrange2, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, mRuby2, mPlum, HcRed-Tandem, mKate2, mNeptune, NirFP, TagRFP657, IFP1.4 and iRFP.
114. The kit of any one of paragraphs 110-113, wherein the intein is a split intein.
115. The kit of paragraph 114, wherein the split intein is a natural split intein or an engineered split intein.
116. The kit of paragraph 115, wherein the natural split intein is selected from DnaE inteins.
117. The kit of paragraph 116, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
118. The kit of paragraph 115, wherein the engineered split intein is engineered from DnaB inteins or GyrB inteins.
119. The kit of paragraph 118, wherein the engineered split intein is a SspDnaB S1 intein.
120. The kit of paragraph 118, wherein the engineered split intein is a SspGyrB S11 intein.
121. The kit of any one of paragraphs 112-120, further comprising any one or more of the following components: buffers, salts, cloning enzymes, competent cells, transfection reagents, antibiotics, and/or instructions for performing the methods described herein.
122. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest,
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and
(c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the antibiotic resistance protein and (ii) a nucleotide sequence encoding a third molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the antibiotic resistance protein to the central fragment of the antibiotic resistance protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the antibiotic resistance protein to the C-terminal fragment of the antibiotic resistance protein, to produce a full-length antibiotic resistance protein.
123. The method of paragraph 122 further comprising maintaining the eukaryotic cells under conditions that permit introduction of the first, second, and third vectors into the eukaryotic cells to produce transgenic eukaryotic cells.
124. The method of paragraph 123 further comprising selecting the transgenic eukaryotic cells that comprise the full-length antibiotic resistance protein.
125. The method of any one of paragraphs 122-124, wherein the eukaryotic cells are mammalian cells.
126. The method of any one of paragraphs 122-125, wherein the antibiotic resistance protein confers resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin.
127. The method of paragraph 126, wherein the antibiotic resistance protein confers resistance to hygromycin.
128. The method of any one of paragraphs 122-127, wherein the first intein is a split intein.
129. The method of any one of paragraphs 122-128, wherein the second intein is a split intein.
130. The method of paragraph 128 or 129, wherein the split intein is a natural split intein.
131. The method of paragraph 130, wherein the natural split intein is selected from DnaE inteins.
132. The method of paragraph 131, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
133. The method of paragraph 132, wherein the first intein is an NpuDnaE intein and the second intein is an NpuDnaE intein.
134. The method of any one of paragraphs 122-133, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a protein.
135. The method of any one of paragraphs 122-133, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a non-coding ribonucleic acid (RNA).
136. The method of paragraph 135, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
137. The method of any one of paragraphs 122-136, wherein the first vector, second vector, third vector, or any combination thereof, is a plasmid vector or a viral vector.
138. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of a hygB gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest,
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a central fragment of the hygB gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and
(c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a C-terminal fragment of the hygB gene and (ii) a nucleotide sequence encoding a third molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the protein fragment encoded by the N-terminal fragment of the hygB gene to the protein fragment encoded by the central fragment of the hygB gene, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the protein fragment encoded by the central fragment of the hygB gene to the protein fragment encoded by the C-terminal fragment of the hygB gene, to produce a full-length hygromycin B phosphotransferase.
139. The method of paragraph 138, wherein the first vector encodes the sequence identified by SEQ ID NO: 29, the second vector encodes the sequence identified by SEQ ID NO: 61, and the third vector encodes the sequence identified by SEQ ID NO: 23.
140. The method of paragraph 138, wherein the first vector encodes the sequence identified by SEQ ID NO: 21, the second vector encodes the sequence identified by SEQ ID NO: 61, and the third vector encodes the sequence identified by SEQ ID NO: 35.
141. A eukaryotic cell comprising:
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest,
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and
(c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the antibiotic resistance protein and (ii) a nucleotide sequence encoding a third molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the antibiotic resistance protein to the central fragment of the antibiotic resistance protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the antibiotic resistance protein to the C-terminal fragment of the antibiotic resistance protein, to produce a full-length antibiotic resistance protein.
142. The eukaryotic cell of paragraph 141, wherein the eukaryotic cell is a mammalian cell.
143. The eukaryotic cell of paragraph 141 or 142, wherein the antibiotic resistance protein confers resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin.
144. The eukaryotic cell of paragraph 143, wherein the antibiotic resistance protein confers resistance to hygromycin.
145. The eukaryotic cell of any one of paragraphs 141-144, wherein the first intein is a split intein.
146. The eukaryotic cell of any one of paragraphs 142-145, wherein the second intein is a split intein.
147. The eukaryotic cell of paragraph 145 or 146, wherein the split intein is a natural split intein.
148. The eukaryotic cell of paragraph 147, wherein the natural split intein is selected from DnaE inteins.
149. The eukaryotic cell of paragraph 148, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
150. The eukaryotic cell of paragraph 149, wherein the first intein is an NpuDnaE intein and the second intein is an NpuDnaE intein.
151. The eukaryotic cell of any one of paragraphs 142-150, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a protein.
152. The eukaryotic cell of any one of paragraphs 142-150, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a non-coding ribonucleic acid (RNA).
153. The eukaryotic cell of paragraph 152, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
154. The eukaryotic cell of any one of paragraphs 142-153, wherein the first vector, second vector, third vector, or any combination thereof, is a plasmid vector or a viral vector.
155. A composition comprising the eukaryotic cell of any one of paragraphs 142-154.
156. A kit comprising:
(a) a first vector comprising a nucleotide sequence encoding an N-terminal fragment of an antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein,
(b) a second vector comprising a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the antibiotic resistance protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein, and
(c) a third vector comprising a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the antibiotic resistance protein,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the antibiotic resistance protein to the central fragment of the antibiotic resistance protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the antibiotic resistance protein to the C-terminal fragment of the antibiotic resistance protein, to produce a full-length antibiotic resistance protein.
157. The kit of paragraph 156, wherein the antibiotic resistance protein confers resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin.
158. The kit of paragraph 157, wherein the antibiotic resistance protein confers resistance to hygromycin.
159. The kit of any one of paragraphs 156-158, wherein the first intein is a split intein.
160. The kit of any one of paragraphs 156-159, wherein the second intein is a split intein.
161. The kit of paragraph 159 or 160, wherein the split intein is a natural split intein.
162. The kit of paragraph 161, wherein the natural split intein is selected from DnaE inteins.
163. The kit of paragraph 162, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
164. The kit of paragraph 163, wherein the first intein is an NpuDnaE intein and the second intein is an NpuDnaE intein.
165. The kit of any one of paragraphs 156-164, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a protein.
166. The kit of any one of paragraphs 156-164, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a non-coding ribonucleic acid (RNA).
167. The kit of paragraph 166, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
168. The kit of any one of paragraphs 156-167, wherein the first vector, second vector, third vector, or any combination thereof, is a plasmid vector or a viral vector.
169. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest,
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and
(c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the fluorescent protein and (ii) a nucleotide sequence encoding a third molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the fluorescent protein to the central fragment of the fluorescent protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the fluorescent protein to the C-terminal fragment of the fluorescent protein, to produce a full-length fluorescent protein.
170. The method of paragraph 169 further comprising maintaining the eukaryotic cells under conditions that permit introduction of the first, second, and third vectors into the eukaryotic cells to produce transgenic eukaryotic cells.
171. The method of paragraph 170 further comprising selecting the transgenic eukaryotic cells that comprise the full-length fluorescent protein.
172. The method of any one of paragraphs 169-171, wherein the eukaryotic cells are mammalian cells.
173. The method of any one of paragraphs 169-172, wherein the fluorescent protein is selected from TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, EYFP, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOκ, mScarlet, mKO2, mOrange, mOrange2, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, mRuby2, mPlum, HcRed-Tandem, mKate2, mNeptune, NirFP, TagRFP657, IFP1.4 and iRFP.
174. The method of paragraph 173, wherein the fluorescent protein is mScarlet.
175. The method of any one of paragraphs 169-174, wherein the first intein is a split intein.
176. The method of any one of paragraphs 169-175, wherein the second intein is a split intein.
177. The method of paragraph 175 or 176, wherein the split intein is a natural split intein.
178. The method of paragraph 177, wherein the natural split intein is selected from DnaE inteins.
179. The method of paragraph 178, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
180. The method of paragraph 179, wherein the first intein is an NpuDnaE intein and the second intein is an NpuDnaE intein.
181. The method of any one of paragraphs 169-180, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a protein.
182. The method of any one of paragraphs 169-180, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a non-coding ribonucleic acid (RNA).
183. The method of paragraph 182, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
184. The method of any one of paragraphs 169-183, wherein the first vector, second vector, third vector, or any combination thereof, is a plasmid vector or a viral vector.
185. A method comprising delivering to eukaryotic cells
(a) a first vector comprising (i) an N-terminal fragment of an mScarlet gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest,
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a central fragment of the mScarlet gene, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and
(c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a C-terminal fragment of the mScarlet gene and (ii) a nucleotide sequence encoding a third molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the protein fragment encoded by the N-terminal fragment of the mScarlet gene to the protein fragment encoded by the central fragment of the mScarlet gene, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the protein fragment encoded by the central fragment of the mScarlet gene to the protein fragment encoded by the C-terminal fragment of the mScarlet gene, to produce a full-length mScarlet protein.
186. The method of paragraph 185, wherein the first vector encodes the sequence identified by SEQ ID NO: 121, the second vector encodes the sequence identified by SEQ ID NO: 123, and the third vector encodes the sequence identified by SEQ ID NO: 125.
187. A eukaryotic cell comprising:
(a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest,
(b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and
(c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the fluorescent protein and (ii) a nucleotide sequence encoding a third molecule of interest,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the fluorescent protein to the central fragment of the fluorescent protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the fluorescent protein to the C-terminal fragment of the fluorescent protein, to produce a full-length fluorescent protein.
188. The eukaryotic cell of paragraph 187, wherein the eukaryotic cell is a mammalian cell.
189. The eukaryotic cell of paragraph 187 or 188, wherein the fluorescent protein is selected from TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, EYFP, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOκ, mScarlet, mKO2, mOrange, mOrange2, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, mRuby2, mPlum, HcRed-Tandem, mKate2, mNeptune, NirFP, TagRFP657, IFP1.4 and iRFP.
190. The eukaryotic cell of paragraph 189, wherein the fluorescent protein is mScarlet.
191. The eukaryotic cell of any one of paragraphs 187-190, wherein the first intein is a split intein.
192. The eukaryotic cell of any one of paragraphs 187-191, wherein the second intein is a split intein.
193. The eukaryotic cell of paragraph 191 or 192, wherein the split intein is a natural split intein.
194. The eukaryotic cell of paragraph 193, wherein the natural split intein is selected from DnaE inteins.
195. The eukaryotic cell of paragraph 194, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
196. The eukaryotic cell of paragraph 195, wherein the first intein is an NpuDnaE intein and the second intein is an NpuDnaE intein.
197. The eukaryotic cell of any one of paragraphs 187-196, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a protein.
198. The eukaryotic cell of any one of paragraphs 187-196, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a non-coding ribonucleic acid (RNA).
199. The eukaryotic cell of paragraph 198, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
200. The eukaryotic cell of any one of paragraphs 187-199, wherein the first vector, second vector, third vector, or any combination thereof, is a plasmid vector or a viral vector.
201. A composition comprising the eukaryotic cell of any one of paragraphs 187-200.
202. A kit comprising:
(a) a first vector comprising a nucleotide sequence encoding an N-terminal fragment of a fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein,
(b) a second vector comprising a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the fluorescent protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein, and
(c) a third vector comprising a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the fluorescent protein,
wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the fluorescent protein to the central fragment of the fluorescent protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the fluorescent protein to the C-terminal fragment of the fluorescent protein, to produce a full-length fluorescent protein.
203. The kit of paragraph 202, wherein the fluorescent protein is selected from TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, EYFP, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOκ, mScarlet, mKO2, mOrange, mOrange2, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, mRuby2, mPlum, HcRed-Tandem, mKate2, mNeptune, NirFP, TagRFP657, IFP1.4 and iRFP.
204. The kit of paragraph 203, wherein the fluorescent protein is mScarlet.
205. The kit of any one of paragraphs 202-204, wherein the first intein is a split intein.
206. The kit of any one of paragraphs 202-205, wherein the second intein is a split intein.
207. The kit of paragraph 206, wherein the split intein is a natural split intein.
208. The kit of paragraph 207, wherein the natural split intein is selected from DnaE inteins.
209. The kit of paragraph 208, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
210. The kit of paragraph 209, wherein the first intein is an NpuDnaE intein and the second intein is an NpuDnaE intein.
211. The kit of any one of paragraphs 202-210, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a protein.
212. The kit of any one of paragraphs 202-210, wherein the first molecule of interest, second molecule of interest, third molecule of interest, or any combination thereof is a non-coding ribonucleic acid (RNA).
213. The kit of paragraph 212, wherein the non-coding RNA is a microRNA (miRNA), antisense RNA, short-interfering RNA (siRNA) or short-hairpin RNA (shRNA).
214. The kit of any one of paragraphs 202-213, wherein the first vector, second vector, third vector, or any combination thereof, is a plasmid vector or a viral vector.
215. The kit of any one of paragraphs 202-214, further comprising any one or more of the following components: buffers, salts, cloning enzymes, competent cells, transfection reagents, antibiotics, and/or instructions for performing the methods described herein.
216. A transgenic selection method comprising delivering to a composition comprising eukaryotic cells (a) a first vector comprising (i) a nucleotide sequence encoding a first selectable marker protein fragment (e.g., antibiotic resistance protein fragment or fluorescent protein fragment) upstream from a nucleotide sequence encoding an N-terminal intein protein fragment and (ii) a nucleotide sequence encoding a first molecule, and (b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal intein protein fragment upstream from a second selectable marker protein fragment (e.g., antibiotic resistance protein fragment or fluorescent protein fragment) and (ii) a nucleotide sequence encoding a second molecule, wherein the N-terminal intein protein fragment and the C-terminal intein protein fragment catalyze joining of the first selectable marker protein fragment to the second selectable marker protein fragment to produce a full-length selectable marker protein.
217. A transgenic selection method comprising delivering to eukaryotic cells (a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a selectable marker protein (e.g., antibiotic resistance protein or fluorescent protein), which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest, (b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a central fragment of the selectable marker protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, and (c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the selectable marker protein and (ii) a nucleotide sequence encoding a third molecule of interest, wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the selectable marker protein to the central fragment of the selectable marker protein, and the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the central fragment of the selectable marker protein to the C-terminal fragment of the selectable marker protein, to produce a full-length selectable marker protein.
218. A transgenic selection method comprising delivering to eukaryotic cells (a) a first vector comprising (i) a nucleotide sequence encoding an N-terminal fragment of a selectable marker protein (e.g., antibiotic resistance protein or fluorescent protein), which is upstream from a nucleotide sequence encoding an N-terminal fragment of a first intein and (ii) a nucleotide sequence encoding a first molecule of interest, (b) a second vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the first intein, which is upstream from a nucleotide sequence encoding a first central fragment of the selectable marker protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a second intein and (ii) a nucleotide sequence encoding a second molecule of interest, (c) a third vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the second intein, which is upstream from a nucleotide sequence encoding a second central fragment of the selectable marker protein, which is upstream from a nucleotide sequence encoding an N-terminal fragment of a third intein and (ii) a nucleotide sequence encoding a third molecule of interest, and (d) a fourth vector comprising (i) a nucleotide sequence encoding a C-terminal fragment of the third intein, which is upstream from a nucleotide sequence encoding a C-terminal fragment of the selectable marker protein and (ii) a nucleotide sequence encoding a fourth molecule of interest, wherein the N-terminal fragment and the C-terminal fragment of the first intein catalyze joining of the N-terminal fragment of the selectable marker protein to the first central fragment of the selectable marker protein, the N-terminal fragment and the C-terminal fragment of the second intein catalyze joining of the first central fragment of the selectable marker protein to the second central fragment of the selectable marker protein, and the N-terminal fragment and the C-terminal fragment of the third intein catalyze joining of the second central fragment of the selectable marker protein to the C-terminal fragment of the selectable marker protein, to produce a full-length selectable marker protein.
219. The method of any one of paragraphs 216-218 further comprising maintaining the eukaryotic cells under conditions that permit introduction of the vectors into the eukaryotic cells to produce transgenic eukaryotic cells.
220. The method of paragraph 219 further comprising selecting the transgenic eukaryotic cells that comprise the full-length selectable marker protein.
221. The method of any one of paragraphs 216-220, wherein the eukaryotic cells are mammalian cells.
222. The method of any one of paragraphs 216-221, wherein the antibiotic resistance protein confers resistance to hygromycin, G418, puromycin, phleomycin D1 or blasticidin.
223. The method of any one of paragraphs 216-222, wherein the intein is a split intein.
224. The method of paragraph 223, wherein the split intein is a natural split intein.
225. The method of paragraph 224, wherein the natural split intein is selected from DnaE inteins.
226. The method of paragraph 225, wherein the DnaE inteins are selected from Synechocystis sp. DnaE (SspDnaE) inteins and Nostoc punctiforme (NpuDnaE) inteins.
227. The method of paragraph 223, wherein the split intein is an engineered split intein.
228. The method of paragraph 227, wherein the engineered split intein is engineered from DnaB inteins.
229. The method of paragraph 228, wherein the engineered split intein is a SspDnaB S1 intein.
230. The method of paragraph 227, wherein the engineered split intein is engineered from GyrB inteins.
231. The method of paragraph 230, wherein the engineered split intein is a SspGyrB S11 intein.
232. The method of any one of paragraphs 216-231, wherein the molecules are selected from proteins.
233. The method of any one of paragraphs 216-231, wherein the molecules are selected from non-coding ribonucleic acids (RNAs).
234. The method of paragraph 233, wherein the non-coding RNAs are selected from microRNAs (miRNAs), antisense RNAs, short-interfering RNAs (siRNAs), and short-hairpin RNAs (shRNAs).
235. The method of any one of paragraphs 216-234, wherein the vectors are selected from plasmid vectors and viral vectors.
The present disclosure is further illustrated by the following Examples. These Examples are provided to aid in the understanding of the disclosure, and should not be construed as a limitation thereof.
Selectable markers are often used in genetic engineering to isolate cells with desired genotypes [1]. However, there are a limited number of well-characterized antibiotic resistance genes for use in eukaryotic cells and a limited number of fluorescent proteins whose spectra can be unambiguously differentiated by equipment in ordinary laboratories. Researchers often run out of selectable markers when they need to incorporate multiple transgenes into a cell. Moreover, selection with multiple antibiotics at the same time is often harsh on cells. “Selectable marker recycling” may provide a workaround, but it requires multiple rounds of transgenesis, selection, and removal of selection markers [2]. To allow multiple transgenes to be selected by one selection scheme at the same time, we have created split antibiotic resistance and fluorescent protein genes wherein a gene encoding an antibiotic resistance or fluorescent protein is split into two or more segments fused to inteins (“markertrons”) that can be rejoined by protein trans-splicing [3] (
We started out with engineering 2-markertron intein-split resistance (Intres) genes for double transgenesis. Since flanking residues and local protein folding can affect efficiency of intein-mediated trans-splicing, we set out to identify split points in each of the four commonly used antibiotic resistance genes compatible with two well-characterized split inteins derived from NpuDnaE [4, 5] and SspDnaB [6]. To facilitate assessment of the effectiveness of double transgenic selection, we cloned markertrons onto lentiviral vectors expressing TagBFP2 or mCherry fluorescent proteins as test transgenes (
To facilitate adoption of Intres markers, we created Gateway-compatible lentiviral vectors for convenient restriction-ligation-independent LR clonase recombination of transgenes [8] (FIG. 7A). We tested the functionality of these vectors by recombining TagBFP2 and mCherry into the N- and C-Intres vectors, respectively, and found robust selection of double transgenic cells (
To test whether split fluorescent markers can be used for transgene selection, we screened for NpuDnaE split points for mScarlet fluorescent protein (
With the split points identified for 2-markertron Intres genes, we set out to engineer higher-degree split markers. We tested combinations of split points to partition a marker gene into three or more markertrons to allow for co-selection of more than two “unlinked” transgenes with one antibiotic (
To facilitate the use of 3-markertron Intres, we created Gateway compatible lentiviral vectors with these markers (
We further tested the feasibility of 4-markertron hygromycin Intres genes (
CRISPR/Cas has recently emerged as a powerful technology for genome engineering and editing. Although gene knockout based on NHEJ-mediated insertions/deletions (indels) occurs at high frequency, precise editing and knock-in based on homology directed repair (HDR) using exogenous repair templates (a.k.a. targeting constructs) are inefficient. We tested whether split selectable markers can be used to enrich for cells with biallelic knock-in at the AAVS1 locus. We constructed targeting constructs with homology arms flanking the target site, and a splice acceptor-2A peptide to trap the markertrons within intron one of the host gene PPP1R12C. However, we did not obtain any live cells after CRISPR/Cas knock-in experiments using these targeting constructs and two weeks of antibiotic selection (data not shown). We suspected that the endogenous promoter of the host gene PPP1R12C might not drive sufficient expression of markertrons to reconstitute enough antibiotic resistance protein to counteract the action of the antibiotics. We thus tested an alternative strategy: expressing Intres markertrons from a TetO promoter whose activity can be titrated by doxycycline (dox) concentration. To allow comparison of Intres-mediated biallelic selection versus full-length (FL) non-split selectable markers, we implemented several different targeting construct designs. First, we drove expression of an FL resistance gene (e.g., Hygro) together with rtTA under a constitutive EF1a promoter and a separate test Intres (e.g., Blast Intres) under a dox-inducible TetO promoter (
In the Examples above, we have engineered split antibiotic resistance and fluorescent protein genes that can allow selection for two or more “unlinked” transgenes. By inserting unnatural residues into selectable markers, we showed that novel high-efficiency split points can be utilized, expanding the positions available for engineering. We demonstrated that split selectable markers can be incorporated into lentiviral vectors or gene targeting constructs in CRISPR/Cas9 genome editing experiments to enable enrichment of cells with double transgenesis or biallelic knock-ins. By combining two or more split points, we showed that 3- and 4-split markers can be generated to allow higher-degree transgenic selection. Future development of even higher-degree split selectable markers may enable “hyper-engineering” of cells containing tens of transgenes or targeted knock-ins.
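The selection logic summarized above can be illustrated with a toy model. The following Python sketch is not part of the disclosed methods; it only mimics the rule that a full-length marker forms when every markertron in the chain is present in the same cell. The class name `Markertron`, the function `trans_splice`, the intein labels (`intein1`, `intein2`), and the fragment strings are all hypothetical placeholders, not real sequences, and the model assumes that each intein pairs only with its own partner half.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Markertron:
    """One marker fragment with its flanking split-intein halves.

    int_c: label of the intein whose C-terminal half precedes the fragment
           (None for the N-terminal markertron).
    int_n: label of the intein whose N-terminal half follows the fragment
           (None for the C-terminal markertron).
    """
    fragment: str
    int_c: Optional[str] = None
    int_n: Optional[str] = None


def trans_splice(markertrons: Sequence[Markertron]) -> Optional[str]:
    """Return the joined marker, or None if no full-length product can form."""
    # Exactly one markertron must carry the N-terminal marker fragment.
    starts = [m for m in markertrons if m.int_c is None]
    if len(starts) != 1:
        return None
    # Index the remaining markertrons by the intein half that precedes them.
    by_int_c = {m.int_c: m for m in markertrons if m.int_c is not None}
    current, product = starts[0], starts[0].fragment
    # Bounded loop so a mislabeled (cyclic) set cannot run forever.
    for _ in range(len(markertrons)):
        if current.int_n is None:
            return product  # reached the C-terminal fragment
        nxt = by_int_c.get(current.int_n)
        if nxt is None:
            return None  # the matching intein half is missing from the cell
        product += nxt.fragment
        current = nxt
    return None


# Placeholder fragments for a 3-markertron marker.
n_part = Markertron("NNNN", int_n="intein1")
mid_part = Markertron("MMMM", int_c="intein1", int_n="intein2")
c_part = Markertron("CCCC", int_c="intein2")

print(trans_splice([n_part, mid_part, c_part]))  # NNNNMMMMCCCC
print(trans_splice([n_part, c_part]))            # None
```

In this simplified picture, a cell lacking any one vector yields no full-length product, which is the property the selection scheme relies on.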
Cloning
To generate a test plasmid for each markertron, we first generated a Gateway donor plasmid containing its ORF and then recombined it into a lentiviral destination vector with TagBFP2 (Plasmid 94: pLX-DEST-IRES-TagBFP2), EGFP (Plasmid 95: pLX-DEST-IRES-EGFP), or mCherry (Plasmid 96: pLX-DEST-IRES-mCherry) reporters, which were derived from pLX302 (addgene.org/25896/) by removing the puromycin resistance gene and inserting IRES-fluorescent genes downstream of the Gateway cassette. The markertron-ORF Gateway donor plasmids were generated either by a nested fusion PCR procedure to combine the intein with the coding sequence of fragments of the selectable marker, followed by insertion into the pCR8-GW-TOPO plasmid by sequence- and ligation-independent cloning (SLIC) (Li, M. Z. & Elledge, S. J. SLIC: a method for sequence- and ligation-independent cloning. Gene Synthesis: Methods and Protocols, 51-59 (2012)), or by PCR-amplifying the relevant fragment of the selectable marker, followed by insertion by SLIC into “scaffold” plasmids (Plasmids 27-32) containing the intein sequences. DNA sequences encoding inteins were codon-optimized for Homo sapiens and synthesized as gBlocks (IDT), with AC1947 GB encoding the NpuDnaE intein and AC1949 GB encoding the SspDnaB intein. Selectable marker fragments were amplified from plasmids containing these markers. See Table 1 for plasmids.
Cell Culture
All cells were cultivated in Dulbecco's modified Eagle's medium (DMEM) (Sigma) with 10% fetal bovine serum (FBS) (Lonza), 4% Glutamax (Gibco), 1% sodium pyruvate (Gibco) and penicillin-streptomycin (Gibco). Incubator conditions were 37° C. and 5% CO2.
Virus Production
A viral packaging mix of pLP1, pLP2, and VSV-G was co-transfected with each lentiviral vector into Lenti-X 293T cells (ClonTech), seeded the day before in 6-well plates at a density of 1.2×10⁶ cells per well, using Lipofectamine 3000. Media was changed 6 h after transfection, and the cells were then incubated overnight. At 28 h post-transfection, the media supernatant containing virus was filtered using 0.45 μm PES filters and then stored at −80° C. until use.
Transduction
The day prior to transduction, target cells (HEK293T, MCF7, U2-OS) were seeded into 12-well plates at a density of 1.5×10⁵ cells per well. Prior to transduction, media was changed to media containing 10 μg/mL polybrene, 1 mL per well. 250 μL of each respective virus (500 μL total for experimental samples with two viruses added) was added to each well and incubated overnight. Media was changed 24 h post-infection. At 4 days post-infection, cells were split into duplicate plates. At 5 days post-infection, media with antibiotic (hygromycin) was added to each respective well of one replicate plate (the other remained under no selection). Antibiotic selection continued for 2 weeks before analysis by FACS.
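The per-well quantities in the transduction protocol above can be tabulated with a short calculation. The Python sketch below is illustrative only: the constants are taken from the protocol text, while the helper name `plan_well`, the `WellPlan` record, and the extrapolation to more than two viruses per well are assumptions made for this example.

```python
from dataclasses import dataclass

POLYBRENE_UG_PER_ML = 10.0   # polybrene concentration stated in the protocol
MEDIA_ML_PER_WELL = 1.0      # polybrene-containing media per well
VIRUS_UL_PER_VECTOR = 250.0  # volume of each virus added per well


@dataclass(frozen=True)
class WellPlan:
    n_viruses: int
    virus_ul: float       # total virus volume added to the well
    polybrene_ug: float   # polybrene mass in the 1 mL of media
    total_ml: float       # media plus virus volume


def plan_well(n_viruses: int) -> WellPlan:
    """Compute per-well volumes for a well receiving n_viruses viruses.

    Values for more than two viruses are a simple linear extrapolation
    (an assumption; the protocol states only the one- and two-virus cases).
    """
    if n_viruses < 0:
        raise ValueError("n_viruses must be non-negative")
    virus_ul = VIRUS_UL_PER_VECTOR * n_viruses
    return WellPlan(
        n_viruses=n_viruses,
        virus_ul=virus_ul,
        polybrene_ug=POLYBRENE_UG_PER_ML * MEDIA_ML_PER_WELL,
        total_ml=MEDIA_ML_PER_WELL + virus_ul / 1000.0,
    )


double = plan_well(2)
print(double.virus_ul, double.polybrene_ug, double.total_ml)  # 500.0 10.0 1.5
```

Note that adding virus dilutes the polybrene slightly below 10 μg/mL; the sketch reports the mass present rather than the final concentration.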
Fluorescence-Activated Cell Sorting
Cells were trypsinized, suspended in media, and then analyzed on an LSRFortessa X-20 (BD Bioscience) flow cytometer using FACSDiVa software, version 8, on an HP Z230 workstation. Fifty thousand events were collected in each run.
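As a rough illustration of how collected events might be summarized when two fluorescent reporters (for example, TagBFP2 and mCherry) mark the two transgenes, the Python sketch below computes the fraction of events positive in both channels. It is a hypothetical post-processing step, not the analysis performed in the FACSDiVa software; the function name, the threshold values, and the example intensities are invented for illustration.

```python
from typing import Iterable, Tuple


def double_positive_fraction(
    events: Iterable[Tuple[float, float]],
    bfp_threshold: float,
    cherry_threshold: float,
) -> float:
    """Fraction of events above both gating thresholds.

    Each event is a (TagBFP2 intensity, mCherry intensity) pair. The
    thresholds are hypothetical gates that would normally be set from
    untransduced control cells.
    """
    event_list = list(events)
    if not event_list:
        raise ValueError("no events supplied")
    hits = sum(
        1
        for bfp, cherry in event_list
        if bfp > bfp_threshold and cherry > cherry_threshold
    )
    return hits / len(event_list)


# Four made-up events: one double positive, two single positive, one negative.
example = [(900.0, 1200.0), (50.0, 1500.0), (800.0, 30.0), (20.0, 10.0)]
print(double_positive_fraction(example, 100.0, 100.0))  # 0.25
```

Comparing this fraction between the selected and unselected replicate plates would give one simple readout of enrichment for double transgenic cells.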
All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.
In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.
The terms “about” and “substantially” preceding a numerical value mean ±10% of the recited numerical value.
Where a range of values is provided, each value between the upper and lower ends of the range is specifically contemplated and described herein.
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application No. 62/616,281, filed Jan. 11, 2018, U.S. provisional application No. 62/608,478, filed Dec. 20, 2017, U.S. provisional application No. 62/624,629, filed Jan. 31, 2018, and U.S. provisional application No. 62/571,672, filed Oct. 12, 2017, each of which is incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/055412 | 10/11/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62624629 | Jan 2018 | US | |
62616281 | Jan 2018 | US | |
62608478 | Dec 2017 | US | |
62571672 | Oct 2017 | US |