The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jul. 29, 2020, is named M2-PCT_SL.txt and is 54,500 bytes in size.
The present disclosure is in the field of transgenesis. New compositions for use in inserting a transgene into a cell, and methods utilizing said new compositions, are provided herein.
Methods for insertion of exogenous genes (transgenes) into cells are increasingly important in the fields of genetic research and gene therapy. Although a number of methods for introducing transgenes into cells exist, all are beset with problems of one sort or another. Transfection methods (i.e., simply contacting cells with naked DNA or a DNA conjugate) have a low efficiency and often result in the exogenous sequences undergoing rearrangement in the recipient cell.
Viral vectors; including adenovirus, adeno-associated virus (AAV), retrovirus, foamy virus, herpesvirus, and poxvirus vectors; have also been used for inserting transgenes into cells. Viral transgenesis is more efficient than simple transfection, and can provide stable transgenesis if the virally-introduced transgene is integrated into the recipient cell genome, or maintained in the recipient cell as an episome. However, viral vectors require modification of the viral genome so that replication is blocked or inefficient; which, in turn, requires that the debilitated vector virus be propagated in the presence of a helper virus (which supplies, in trans, the functions missing in the vector virus), requiring complicated culture systems.
An additional drawback associated with the use of viral vectors is the limitation on the size of the transgene that can be inserted into a viral vector, since even vector viruses must retain a certain amount of viral sequence to work effectively as a delivery vehicle, and most viruses are unable to package DNA molecules any larger than about 110% of viral genome size.
Another problem with the use of viral vectors in gene therapy is the ability of the capsid proteins of the vector virus to induce an immune response, which can destroy or damage the vector before the transgene is stably introduced into the recipient cell.
One class of viral vectors is retroviruses. Retroviruses (which include the genus of lentiviruses) have a single-stranded RNA genome. A repeated sequence (R) is present at the extreme 5′ and 3′ ends of the retroviral genome. Immediately interior to the R sequence, at the 5′ end of viral RNA, is a sequence known as U5. Immediately interior to the R sequence, at the 3′ end of viral RNA, is a sequence known as U3. A schematic diagram of a generic retroviral RNA genome, showing the location of the R, U5 and U3 sequences, is shown in
During the retroviral infectious cycle, the RNA genome is copied into a single-stranded DNA molecule (by a process of reverse transcription, catalyzed by the reverse transcriptase enzyme, product of the viral pol gene). The single-stranded DNA product of reverse transcription is then copied (again by reverse transcriptase) to form a double-stranded viral DNA molecule. Due to the nature of the copying processes (e.g., requirements for primers), the U3 sequence becomes appended to the 5′ end of the double-stranded viral DNA genome (exterior to the R sequence); and the U5 sequence is appended to the 3′ end of the double-stranded viral DNA genome (exterior to the R sequence), forming identical long terminal repeat (LTR) sequences at the termini of the double-stranded DNA genome. A schematic diagram of a generic retroviral double-stranded DNA genome, showing the location of the LTRs, and their constituent R, U5 and U3 sequences, is shown in
Following conversion of the single-stranded RNA genome to a double-stranded DNA genome, the double-stranded DNA genome, flanked by its LTRs, is inserted into the host cell genome. This insertion reaction is catalyzed by the viral integrase protein (also a product of the pol gene), and requires a double-stranded, blunt-ended DNA molecule, with the inverted terminal repeat sequence 5′-ACTG-3′ (for HIV-1) as a substrate. The integrase protein removes the terminal TG residues on each strand, generating a double-stranded DNA molecule with a two-nucleotide 5′ overhang (5′-AC-3′) at each end. This molecule serves as a substrate for strand transfer by the int protein and is integrated into the host cell genome.
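The end-processing step described above can be illustrated with a short sketch. It assumes a blunt-ended duplex whose top strand begins with 5′-ACTG-3′ and ends with 5′-CAGT-3′; the interior sequence and the function names are hypothetical placeholders chosen only for illustration, and the model ignores everything about the enzyme other than which nucleotides are removed.

```python
# Illustrative sketch only; the interior sequence below is a made-up placeholder.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_process(top, n=2):
    """Model 3' processing of a blunt-ended duplex.

    `top` is the top strand (5'->3'). Returns the processed strands, the
    dinucleotides removed from each 3' end, and the 5' overhangs left at
    each end (all written 5'->3').
    """
    bottom = reverse_complement(top)
    return {
        "top": top[:-n],
        "bottom": bottom[:-n],
        "removed": (top[-n:], bottom[-n:]),
        # The first n bases of each strand lose their pairing partners.
        "overhangs": (top[:n], bottom[:n]),
    }


substrate = "ACTG" + "GAAGGGCTAATT" + "CAGT"  # hypothetical blunt substrate
result = three_prime_process(substrate)
print(result["removed"])    # dinucleotides trimmed from the 3' ends
print(result["overhangs"])  # unpaired 5' dinucleotides
```

Because the termini are inverted repeats, both strands begin with ACTG and end with CAGT. The dinucleotide removed from each 3′ end therefore reads GT when written 5′ to 3′ (TG when counted inward from the 3′ terminus), and the unpaired 5′ dinucleotide at each end is AC.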
Retrovirus genomes are generally 8 kb or more in length and because, in most cases, all viral structural genes can be removed and replaced with exogenous sequences, retroviral vectors have a high capacity, requiring only that the transgene be flanked by viral LTRs to facilitate integration. However, the efficiency of stable transgenesis using retroviruses is comparatively low, and most retroviruses (excepting lentiviruses) are unable to infect non-dividing cells. Furthermore, when retrovirus vectors are used in gene therapy applications, retroviral capsid proteins can trigger immune responses.
For the reasons discussed above, there remains a need for transgenesis systems which have the benefits of viral vectors, such as high efficiency of genomic integration; but that do not suffer from the drawbacks associated with viral vectors, such as limited capacity and immunogenicity.
Disclosed herein are nucleic acid compositions, and methods for their manufacture and use, that promote highly efficient insertion of transgenes, at levels commonly achieved with viral vectors, but without the use of virus particles. The compositions include transgene cassettes, which have a linear double-stranded DNA structure that resembles a retroviral pre-integration substrate, characterized by blunt ends, a terminal 5′-ACTG-3′ sequence and truncated retroviral long terminal repeat (LTR) sequences. Nucleic acid vectors (insertion vectors) comprising transgene cassettes are also provided.
Transgene cassettes can be released from an insertion vector (e.g., a double-stranded circular plasmid DNA molecule) by cleavage with a restriction enzyme that generates blunt ends. Insertion vectors comprise one or more pairs of att sites, optionally with a negative selection marker disposed therebetween, for convenient insertion of transgenes using gateway cloning methods. Exterior to the att sites, insertion cassettes contain truncated retroviral long terminal repeat (LTR) sequences, a 5′-ACTG-3′ sequence and recognition sites for a blunt end-generating restriction enzyme.
Integration of a transgene into the genome of a cell is accomplished by contacting the cell with a transgene cassette and a source of retroviral integrase (e.g., DNA or mRNA encoding a retroviral integrase (int) enzyme). The integrase protein recognizes the transgene cassette as a substrate for integration, and integrates the transgene cassette into the genome of the recipient cell.
Accordingly, in certain embodiments, provided herein is a polynucleotide (i.e., a transgene cassette) comprising: (a) one or more selection markers, wherein the selection markers are flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by recognition sites for a restriction enzyme, wherein cleavage of the recognition sites generates blunt ends, and wherein the sequence 5′-ACTG-3′ is present at or near the termini of the polynucleotide.
In certain embodiments, the polynucleotide described in the preceding paragraph is a double-stranded DNA molecule. In additional embodiments, the polynucleotide is single-stranded DNA or RNA.
Selection markers can be positive selection markers (i.e., the presence of the marker promotes cell viability in the presence of a selective agent) or negative selection markers (e.g., a marker that is inhibitory to cell viability so that cells survive when the marker is removed or replaced by exogenous sequences). Exemplary positive selection markers include those encoding resistance to antibiotics such as, for example, penicillin, ampicillin, tetracycline and chloramphenicol. Exemplary negative selection markers include the DNA gyrase inhibitor ccdB.
In certain embodiments, the att sites present in the transgene cassette are attR sites. In further embodiments, the first att site is attR4 and the second att site is attR3. In additional embodiments, the att sites are attL sites, attP sites or attB sites. Mutants and variants of att sites such as, for example, attP3, attP4, attR1, attR2, attR3, attR4, attL1, attL2, attL3 and attL4 are known in the art.
Truncated retroviral LTR sequences can be obtained from the genome of any retrovirus, as known in the art. In certain embodiments, the retrovirus is a lentivirus and the transgene cassette contains truncated lentiviral LTRs. In additional embodiments, the lentivirus is HIV, and the transgene cassette contains truncated HIV LTRs. In further embodiments, the lentivirus is HIV-1, and the transgene cassette contains truncated HIV-1 LTRs.
In certain embodiments, a truncated retroviral LTR is one in which one or more transcriptional regulatory sequences, normally present in the U3 region, are removed. Accordingly, certain truncated LTRs contain deleted U3 (dU3), R and U5 sequences. In additional embodiments of a truncated retroviral LTR, all U3 sequences are removed. Accordingly, certain truncated LTRs contain R and U5 sequences, but no U3 sequences. In certain embodiments, the first truncated LTR comprises R and U5 sequence elements and the second truncated LTR comprises dU3, R and U5 sequence elements. In additional embodiments, the first truncated LTR comprises the nucleotide sequence:
In additional embodiments, the second truncated LTR sequence comprises the nucleotide sequence:
In further embodiments, the first truncated LTR comprises the nucleotide sequence:
and the second truncated LTR sequence comprises the nucleotide sequence:
The termini of the transgene cassette comprise recognition sites for a restriction enzyme whose cleavage results in production of blunt ends. In certain embodiments, the recognition sites comprise six or more nucleotide pairs (i.e., six, seven, eight, nine, ten, twelve or more nucleotide pairs). The longer the recognition site, the less likely it is that the restriction enzyme that recognizes that site will also recognize a site in the transgene insert (thereby destroying the integrity of the transgene). Generally both recognition sites will be recognized by the same restriction enzyme, but it is also possible to have recognition sites for different restriction enzymes at each end of the cassette, as long as both enzymes generate blunt ends after cleavage. In certain embodiments, the recognition sites are the same at both ends of the cassette and are recognized by a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.
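The relationship between recognition-site length and the chance of an unwanted internal site can be sketched numerically. The example below assumes a random sequence with equal base frequencies, which real transgenes only approximate. The recognition sequences shown for PmeI, ScaI and BstZ17I are not given in this disclosure; they are taken from general knowledge of these enzymes and should be confirmed against supplier documentation before use. Function names and the test insert are hypothetical.

```python
# Recognition sequences below are assumptions from general knowledge,
# not from this disclosure. All three are palindromic, so scanning one
# strand is sufficient.
BLUNT_CUTTERS = {"PmeI": "GTTTAAAC", "ScaI": "AGTACT", "BstZ17I": "GTATAC"}


def expected_site_count(seq_len, site_len):
    """Expected occurrences of one specific site in a random sequence
    of length seq_len with equal base frequencies."""
    positions = max(seq_len - site_len + 1, 0)
    return positions / 4 ** site_len


def internal_sites(insert, site):
    """Return the 0-based start positions of every occurrence of site."""
    positions = []
    start = insert.find(site)
    while start != -1:
        positions.append(start)
        start = insert.find(site, start + 1)
    return positions


# Expected number of chance sites in a 3,000-bp insert.
for name, site in BLUNT_CUTTERS.items():
    print(name, len(site), round(expected_site_count(3000, len(site)), 4))

# Screening a made-up insert for internal ScaI sites before choosing an enzyme.
toy_insert = "AAGTACTTTGTATACAGTACT"
print(internal_sites(toy_insert, BLUNT_CUTTERS["ScaI"]))
```

Under these assumptions a six-nucleotide site is expected roughly once per 4,096 positions and an eight-nucleotide site roughly once per 65,536 positions. An insert would in practice be screened directly, as in the last two lines, rather than relying on the estimate.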
Transgene cassettes also contain the sequence 5′-ACTG-3′ at or near the termini of the polynucleotide. In certain embodiments, the sequence 5′-ACTG-3′ is present exactly at the termini of the transgene cassette, such that the transgene cassette terminates in blunt ends having the sequence
In other embodiments, one additional nucleotide pair is present, outside the sequence 5′-ACTG-3′, at the termini of the transgene cassette. In additional embodiments, two additional nucleotide pairs are present, outside the sequence 5′-ACTG-3′, at the termini of the transgene cassette. In further embodiments, three, four or five additional nucleotide pairs are present, outside the sequence 5′-ACTG-3′, at the termini of the transgene cassette.
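How many additional nucleotide pairs lie outside the 5′-ACTG-3′ sequence depends on how that sequence overlaps the recognition site. The sketch below assumes recognition sequences and blunt cut positions for PmeI (GTTT^AAAC) and ScaI (AGT^ACT) from general knowledge rather than from this disclosure. The vector sequences are short hypothetical stand-ins, and the nine-nucleotide search window (four nucleotides of ACTG plus up to five additional pairs) is an illustrative choice.

```python
# (recognition sequence, cut offset within the site); assumed values,
# not taken from this disclosure.
BLUNT_CUTTERS = {"PmeI": ("GTTTAAAC", 4), "ScaI": ("AGTACT", 3)}


def blunt_digest(seq, site, cut_offset):
    """Cut a linear sequence at every occurrence of site; return fragments."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def extra_pairs_outside_actg(fragment, window=9):
    """Nucleotide pairs lying outside ACTG (left) and CAGT (right).

    Returns (left, right); -1 means the motif was not found in the window.
    """
    left = fragment[:window].find("ACTG")
    tail = fragment[-window:]
    idx = tail.rfind("CAGT")
    right = -1 if idx == -1 else len(tail) - (idx + 4)
    return left, right


core = "TTGACCATTGGA"  # hypothetical stand-in for LTRs, att sites and insert

# ScaI: ACTG starts exactly at the cut, so the cassette ends in ACTG/CAGT.
sca_vector = "CCTT" + "AGTACT" + "G" + core + "C" + "AGTACT" + "TTCC"
sca_cassette = blunt_digest(sca_vector, *BLUNT_CUTTERS["ScaI"])[1]
print(sca_cassette, extra_pairs_outside_actg(sca_cassette))

# PmeI: ACTG overlaps the site, leaving two extra pairs at each terminus.
pme_vector = "CCTT" + "GTTTAAAC" + "TG" + core + "CA" + "GTTTAAAC" + "TTCC"
pme_cassette = blunt_digest(pme_vector, *BLUNT_CUTTERS["PmeI"])[1]
print(pme_cassette, extra_pairs_outside_actg(pme_cassette))
```

In this toy layout, cleavage with ScaI yields a cassette with no additional pairs outside ACTG, whereas cleavage with PmeI yields a cassette with two additional pairs at each terminus.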
In certain embodiments, provided herein is a transgene cassette comprising (a) sequences encoding chloramphenicol resistance and the ccdB locus, wherein the sequences encoding chloramphenicol resistance and the ccdB locus are flanked by (b) an upstream attR4 site and a downstream attR3 site, wherein the att sites are flanked by (c) a 5′ dLTR sequence comprising R and U5 sequence elements upstream of the attR4 site and a 3′ dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attR3 site, wherein the 5′ and 3′ dLTR sequences are flanked by (d) recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I and wherein all or part of the sequence 5′-ACTG-3′ is present within or near the recognition site for the restriction enzyme.
In certain embodiments of the transgene cassette described in the preceding paragraph, the 5′ dLTR sequence comprises SEQ ID NO:4, and the 3′ dLTR sequence comprises SEQ ID NO:5.
In additional embodiments, polynucleotides whose nucleotide sequences are homologous to that of the transgene cassette are provided. The nucleotide sequences of the homologous polynucleotides are at least 50% homologous, at least 60% homologous, at least 70% homologous, at least 75% homologous, at least 80% homologous, at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 96% homologous, at least 97% homologous, at least 98% homologous, or at least 99% homologous to the sequence of the transgene cassettes described herein. Such homologous polynucleotides can be DNA or RNA and can be single-stranded or double-stranded.
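The percentage thresholds recited above can be illustrated with a simple identity calculation. This disclosure does not specify how homology is to be computed; the sketch assumes two sequences that have already been aligned to equal length (gaps written as "-") and counts identical positions, which is only one of several possible conventions. The sequences and function names are hypothetical.

```python
THRESHOLDS = (50, 60, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99)


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are skipped; a gap opposite
    a nucleotide counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def highest_threshold_met(pid):
    """Largest recited threshold that pid satisfies, or None."""
    met = [t for t in THRESHOLDS if pid >= t]
    return max(met) if met else None


reference = "ACTGGAAGGGCTAATTCACT"  # made-up sequences for illustration
candidate = "ACTGGAAGGGCTTATTCACT"  # one substitution in 20 positions
pid = percent_identity(reference, candidate)
print(pid, highest_threshold_met(pid))
```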
In additional embodiments, polynucleotides having nucleotide sequences complementary to the sequence of either strand of the transgene cassette are provided. Such polynucleotides can be DNA or RNA. In further embodiments, this disclosure provides polynucleotides that hybridize under stringent conditions to a transgene cassette as disclosed herein.
Also provided are nucleic acid vectors (e.g., plasmid vectors) comprising a transgene cassette as disclosed herein; i.e., transgene vectors. Accordingly, in certain embodiments, provided herein is a plasmid comprising: (a) one or more selection markers, wherein the selection markers are flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by recognition sites for a restriction enzyme, wherein cleavage of the recognition sites generates blunt ends and wherein all or part of the sequence 5′-ACTG-3′ is present within or near the recognition site for the restriction enzyme.
In additional embodiments, provided herein is a plasmid comprising (a) sequences encoding chloramphenicol resistance and the ccdB locus, wherein the sequences encoding chloramphenicol resistance and the ccdB locus are flanked by (b) an upstream attR4 site and a downstream attR3 site, wherein the att sites are flanked by (c) a 5′ dLTR sequence comprising R and U5 sequence elements upstream of the attR4 site and a 3′ dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attR3 site, wherein the 5′ and 3′ dLTR sequences are flanked by (d) first and second 5′-ACTG-3′ sequences, wherein all or part of the first and second 5′-ACTG-3′ sequences are within or near (e) recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.
Also provided are plasmid vectors comprising a transgene cassette and a transgene. In certain embodiments, the transgene is located between the att sites of the transgene cassette, having been inserted by gateway cloning methodology, and optionally replacing one or more selection markers that were present between the att sites prior to insertion of the transgene. In certain embodiments, att sites present in the transgene vector (e.g., attR4 and attR3) are converted into different att sites (e.g., attP4 and attP3) in the process of transgene insertion. Transgenes are introduced by one-way, two-way or three-way gateway cloning, as known in the art. See, for example, Hartley et al. (2000) Genome Research 10:1788-1795.
Any sequence, coding or noncoding, can serve as a transgene. For example, a transgene can encode a detectable moiety; e.g., a fluorescent protein, such as green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), red fluorescent protein, yellow fluorescent protein, tdTomato and the like. A transgene can also encode an enzymatic activity (e.g., β-galactosidase, β-glucuronidase, luciferase, or an oxidoreductase). A transgene can also encode a therapeutic protein, such as globin or a coagulation factor.
Accordingly, in certain embodiments, provided herein is a polynucleotide comprising: (a) a transgene, wherein the transgene is flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by recognition sites for a restriction enzyme, wherein cleavage of the recognition sites generates blunt ends; the polynucleotide further comprising the sequence 5′-ACTG-3′ at or near its termini (i.e., at the termini of the polynucleotide, or within one, two, three, four or five nucleotide pairs of the termini of the polynucleotide); and optionally wherein a selection marker is not present between the two att sites. In certain embodiments, the 5′ dLTR sequence comprises SEQ ID NO:4, and the 3′ dLTR sequence comprises SEQ ID NO:5. In further embodiments, this polynucleotide is present in a plasmid. In additional embodiments, this polynucleotide is a linear, double-stranded DNA molecule.
In additional embodiments, provided herein is a polynucleotide comprising (a) a transgene, wherein the transgene is flanked by (b) an upstream attP4 site and a downstream attP3 site, wherein the att sites are flanked by (c) a 5′ dLTR sequence comprising R and U5 sequence elements upstream of the attP4 site and a 3′ dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attP3 site, wherein the 5′ and 3′ dLTR sequences are flanked by recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the sequence 5′-ACTG-3′ is present within or near the recognition site for the restriction enzyme, and optionally wherein a selection marker is not present between the attP4 and attP3 sites. In certain embodiments, the 5′ dLTR sequence comprises SEQ ID NO:4, and the 3′ dLTR sequence comprises SEQ ID NO:5. In further embodiments, this polynucleotide is present in a plasmid. In additional embodiments, this polynucleotide is a linear, double-stranded DNA molecule.
In certain embodiments, the compositions disclosed herein comprise a plurality of DNA molecules resulting from cleavage of a plasmid with a restriction enzyme that generates blunt ends, wherein the plasmid comprises a transgene-containing transgene cassette. In additional embodiments, the restriction enzyme is selected from the group consisting of PmeI, ScaI and BstZ17I.
Accordingly, in certain embodiments, provided herein is a plurality of DNA molecules, one of which comprises: (a) a transgene, wherein the transgene is flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by (d) partial restriction enzyme recognition sites generated by cleavage with a restriction enzyme that generates blunt ends; and further comprising all or part of the sequence 5′-ACTG-3′ at or near its termini (i.e., at its terminus, or within one, two, three, four or five nucleotide pairs of its terminus). In further embodiments, the restriction enzyme is selected from the group consisting of PmeI, ScaI and BstZ17I. In certain embodiments, the 5′ dLTR sequence comprises SEQ ID NO:4, and the 3′ dLTR sequence comprises SEQ ID NO:5.
In additional embodiments, this disclosure provides a plurality of DNA molecules, one of which comprises (a) a transgene, wherein the transgene is flanked by (b) an upstream attP4 site and a downstream attP3 site, wherein the att sites are flanked by (c) a 5′ dLTR sequence comprising R and U5 sequence elements upstream of the attP4 site and a 3′ dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attP3 site, wherein the 5′ and 3′ dLTR sequences are flanked by (d) partial restriction enzyme recognition sites generated by cleavage with a restriction enzyme that generates blunt ends; and further comprising all or part of the sequence 5′-ACTG-3′ at or near its termini (i.e., at its terminus, or within one, two, three, four or five nucleotide pairs of its terminus). In further embodiments, the restriction enzyme is selected from the group consisting of PmeI, ScaI and BstZ17I. In certain embodiments, the 5′ dLTR sequence comprises SEQ ID NO:4, and the 3′ dLTR sequence comprises SEQ ID NO:5.
Also provided are nucleic acids (double-stranded DNA, single-stranded DNA and/or RNA) encoding a retroviral integrase protein. If the integrase-encoding nucleic acid is DNA, it can be present in a DNA vector (e.g., a plasmid) in either double-stranded or single-stranded form. The integrase can further comprise one or more nuclear localization signals (NLS) in addition to the endogenous integrase NLS.
Also provided are combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., lentiviral; e.g., HIV; e.g., HIV-1) integrase and a transgene-containing transgene cassette (as described above). Further provided are combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., HIV; e.g., HIV-1) integrase and a plurality of DNA molecules (e.g., linear double stranded DNA molecules) comprising a transgene-containing transgene cassette as described above. For use in methods for targeted integration of a transgene, any of the combinations described previously in this paragraph can further comprise a polynucleotide encoding a fusion between dCas9 and psip1a (or a polypeptide comprising a fusion between dCas9 and psip1a); and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.
Additionally provided herein are methods for introducing a transgene into the genome of a cell, wherein the methods comprise contacting the cell with a combination of a transgene-containing transgene cassette and a nucleic acid encoding a retroviral integrase protein. Contacting can be by, for example, transfection, electroporation, injection or any other method of introducing nucleic acids into a cell. Transgene-containing transgene cassettes have been described above and can be one of a plurality of the products of digestion of a plasmid with a blunt end-generating restriction enzyme. Alternatively, a transgene-containing transgene cassette can be an isolated DNA (or RNA) molecule.
The integrase-encoding nucleic acid can be DNA or mRNA. The retroviral integrase protein can be from any retrovirus. In certain embodiments, the retrovirus is a lentivirus. In additional embodiments, the lentivirus is HIV. In further embodiments, the HIV is HIV-1.
In certain embodiments, provided herein is a plasmid comprising (a) a first recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I; (b) the sequence 5′-ACTG-3′; (c) a first truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:4 that is interior to the first recognition site and the 5′-ACTG-3′ sequence; (d) an attR4 site that is interior to the first truncated LTR sequence; (e) the ccdB locus; (f) an attR3 site that is exterior to the ccdB locus; (g) a second truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:5 that is exterior to the attR3 site; (h) the sequence 5′-CAGT-3′; and (i) a second recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the second recognition site is the same as the first recognition site; and wherein the 5′-CAGT-3′ sequence and the second recognition site are exterior to the second truncated LTR sequence. In certain embodiments, the 5′-ACTG-3′ sequence overlaps with the first recognition site and the 5′-CAGT-3′ sequence overlaps with the second recognition site.
In additional embodiments, provided herein is a plasmid comprising, in sequence (a) a first recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I; (b) the sequence 5′-ACTG-3′; (c) a first truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:4 that is interior to the first recognition site and the 5′-ACTG-3′ sequence; (d) an attP4 site that is interior to the first truncated LTR sequence; (e) a transgene; (f) an attP3 site that is exterior to the transgene; (g) a second truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:5 that is exterior to the attP3 site; (h) the sequence 5′-CAGT-3′; and (i) a second recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the second recognition site is the same as the first recognition site and wherein the 5′-CAGT-3′ sequence and the second recognition site are exterior to the second truncated LTR sequence. In certain embodiments, the 5′-ACTG-3′ sequence overlaps with the first recognition site and the 5′-CAGT-3′ sequence overlaps with the second recognition site.
In additional embodiments, methods and compositions for targeted integration of transgenes are provided. The methods utilize a fusion protein in which psip1a (LEDGF/p75) amino acid sequences are joined to amino acid sequences of dCas9, optionally through a flexible linker such as (GGS)5. Nucleic acids (i.e., polynucleotides) encoding these fusion proteins are also provided. Also utilized in methods for targeted integration is a guide RNA comprising a portion that is complementary to a target genomic sequence and a portion comprising an RNA hairpin that binds to dCas9. The guide RNA tethers the fusion protein to the target genomic sequence (via its interaction with dCas9) and the psip1a portion of the fusion protein binds to a preintegration complex comprising integrase protein and a transgene cassette.
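Two of the components named above lend themselves to a brief sketch: the (GGS)5 linker, written out in one-letter amino acid code, and the portion of the guide RNA that is complementary to a target genomic sequence. The target sequence and function names below are hypothetical. The hairpin portion of the guide RNA and the order of the dCas9 and psip1a domains in the fusion are not specified here and are deliberately left out.

```python
def flexible_linker(unit="GGS", repeats=5):
    """Write out a repeated-unit linker such as (GGS)5."""
    return unit * repeats


# DNA base -> complementary RNA base.
RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def complementary_rna(target_dna):
    """RNA that base-pairs with target_dna; both are written 5'->3'."""
    return target_dna.upper().translate(RNA_COMPLEMENT)[::-1]


linker = flexible_linker()
print(linker, len(linker))

target_strand = "GATTACAGGCTTAACGTCCA"  # made-up 20-nt genomic target
print(complementary_rna(target_strand))
```

The strand convention matters: the function returns the RNA that pairs with the strand supplied, so the caller must decide which genomic strand is treated as the target.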
Accordingly, also provided are combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., lentiviral; e.g., HIV; e.g., HIV-1) integrase, a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.
Additional embodiments provide combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., HIV; e.g., HIV-1) integrase, a plurality of DNA molecules (e.g., linear double stranded DNA molecules) comprising a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.
The disclosure also provides methods for targeted insertion of a transgene into the genome of a cell, the method comprising contacting the cell with a combination comprising a nucleic acid (DNA or RNA) encoding a retroviral (e.g., lentiviral; e.g., HIV; e.g., HIV-1) integrase, a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.
In additional embodiments, the disclosure provides methods for targeted insertion of a transgene into the genome of a cell, the method comprising contacting the cell with a combination comprising a nucleic acid (DNA or RNA) encoding a retroviral (e.g., HIV; e.g., HIV-1) integrase, a plurality of DNA molecules (e.g., linear double stranded DNA molecules) comprising a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.
Fish were sorted into five groups depending on degree of expression of the transgene (Group 0: no expression through Group 4: highest level of expression), and results are expressed as the percentage of total individuals examined that fell into each group. For each pair of bars, white coloring indicates the percentage of fish in Group 0; light stippling indicates the percentage of fish in Group 1; heavy stippling indicates the percentage of fish in Group 2; dark shading indicates the percentage of fish in Group 3; and black indicates the percentage of fish in Group 4.
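The percentage-per-group tabulation described above amounts to the following calculation. The scores in the example are invented solely to show the arithmetic and do not represent any experimental result; the function name is likewise hypothetical.

```python
from collections import Counter


def group_percentages(scores, groups=range(5)):
    """Percentage of scored individuals falling into each expression group."""
    scores = list(scores)
    if not scores:
        raise ValueError("no individuals scored")
    unknown = set(scores) - set(groups)
    if unknown:
        raise ValueError(f"scores outside the defined groups: {sorted(unknown)}")
    counts = Counter(scores)
    return {g: 100.0 * counts[g] / len(scores) for g in groups}


# Invented scores for ten fish (Group 0 = no expression, Group 4 = highest).
example_scores = [0, 0, 1, 2, 2, 3, 4, 4, 4, 4]
print(group_percentages(example_scores))
```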
Fish were sorted into five groups depending on degree of expression of the transgene, and results are expressed as the percentage of total individuals examined that fell into each group. The percentage of fish in each group (Group 0 through Group 4) is indicated by shading, as in
For each pair of bars, the right-most bar (indicated by “+” beneath the graph) shows percentage of individuals stably expressing red fluorescence after co-injection of tdTomato-containing transgene cassette and integrase mRNA; the left-most bar (indicated by “−” beneath the graph) shows results of control injections of tdTomato-containing transgene cassette only. Fish were sorted into groups depending on their degree of red fluorescence: fish in Group 1 (indicated by light shading) exhibited partial fluorescence in heart; and fish in Group 2 (indicated by darker shading) exhibited full fluorescence in heart.
Practice of the present disclosure employs, unless otherwise indicated, standard methods and conventional techniques in the fields of cell biology, molecular biology, biochemistry, cell culture, recombinant DNA and related fields as are within the skill of the art. Such techniques are described in the literature and thereby available to those of skill in the art. See, for example, Alberts, B. et al., “Molecular Biology of the Cell,” 6th edition, Garland Science, New York, N.Y., 2015; Watson et al., “Molecular Biology of the Gene,” 7th edition, Pearson, London, 2014; Lodish et al. “Molecular Cell Biology,” 8th edition, W.H. Freeman, New York, N.Y., 2016; Voet, D. et al. “Fundamentals of Biochemistry: Life at the Molecular Level,” 5th edition, John Wiley & Sons, Hoboken, N.J., 2016; Sambrook, J. et al., “Molecular Cloning: A Laboratory Manual,” 3rd edition, Cold Spring Harbor Laboratory Press, 2001; Ausubel, F. et al., “Current Protocols in Molecular Biology,” John Wiley & Sons, New York, 1987 and periodic updates; Freshney, R. I., “Culture of Animal Cells: A Manual of Basic Technique,” 4th edition, John Wiley & Sons, Somerset, N.J., 2000; and the series “Methods in Enzymology,” Academic Press, San Diego, Calif.
A “transgene vector,” or “pLTR vector,” as disclosed herein, is a DNA plasmid vector which, when cleaved by an appropriate restriction enzyme, generates a DNA molecule that resembles the substrate for integration of a retroviral DNA genome. Transgene vectors are characterized by sequences that facilitate introduction of an exogenous gene (e.g., att sites), flanked by truncated retroviral long terminal repeat (LTR) sequences, which are in turn flanked by the sequence 5′-ACTG-3′, which in turn overlaps with, or is flanked by, recognition sites for a restriction enzyme whose cleavage generates blunt ends and whose recognition sequence optionally contains six or more nucleotides. A transgene vector that is suitable for insertion of a transgene, but which does not comprise a transgene, is denoted an “insertion vector.”
A “transgene” is any DNA sequence inserted into a transgene vector as described herein. A transgene will often be a sequence encoding a protein, but can also be, e.g., a regulatory sequence (e.g., promoter, enhancer) or a sequence encoding a regulatory RNA, such as an antisense RNA or a siRNA.
A “transgene cassette” refers to a nucleic acid (e.g., DNA) molecule comprising a transgene (or one or more selection markers) flanked by sequences promoting recombination (e.g., att sites), which recombination-promoting sequences are in turn flanked by truncated LTR sequences, which truncated LTR sequences are in turn flanked by 5′-ACTG-3′ sequences, which 5′-ACTG-3′ sequences in turn overlap with, or are flanked by, recognition sequences for a restriction enzyme that, upon cleavage, generates blunt ends. A transgene cassette can be a portion of a transgene vector, wherein the transgene vector contains additional sequences such as, for example, replication origins, transcriptional regulatory sequences and additional selection markers. A transgene cassette can also be an isolated DNA molecule resulting from cleavage of a transgene vector with a blunt end-generating restriction enzyme as described herein. A transgene cassette may or may not comprise a transgene; if a transgene cassette comprises a transgene, it is denoted a “transgene-containing transgene cassette.”
The terms “interior” (or “internal”) and “exterior” (or “external”) refer to relative location within a transgene cassette or transgene vector. Taking the transgene (or the selection marker(s) present in the vector before insertion of the transgene) as center; a first element being “interior to” a second element means that the first element is closer to the transgene (or selection marker) than is the second element. Alternatively, a first element being “exterior to” a second element means that the second element is closer to the transgene (or selection marker) than is the first element.
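The interior/exterior convention can be illustrated with a minimal Python sketch. The element labels and the list layout below are illustrative assumptions that follow the order of elements described above for a transgene cassette; the function names are hypothetical and form no part of the disclosure.

```python
# Hypothetical left-to-right layout of a transgene cassette, following the
# order of elements described above (labels are illustrative only).
CASSETTE = [
    "RE site", "ACTG", "5' dLTR", "att",
    "transgene",
    "att", "3' dLTR", "ACTG", "RE site",
]


def distance_to_center(index, layout=CASSETTE, center="transgene"):
    """Number of positions separating an element from the central transgene."""
    return abs(index - layout.index(center))


def is_interior_to(first, second, layout=CASSETTE):
    """True if the element at index `first` is closer to the transgene
    than the element at index `second`."""
    return distance_to_center(first, layout) < distance_to_center(second, layout)


def is_exterior_to(first, second, layout=CASSETTE):
    """True if the element at index `first` is farther from the transgene
    than the element at index `second`."""
    return is_interior_to(second, first, layout)
```

In this sketch, the att site at index 3 is interior to the 5′ dLTR at index 2, and the restriction enzyme site at index 0 is exterior to the adjacent 5′-ACTG-3′ sequence at index 1.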
An “integrase vector,” as disclosed herein, is a DNA plasmid vector containing sequences encoding a retroviral or lentiviral integrase protein. An integrase vector can also contain control sequences that regulate expression of the integrase protein. Such control sequences can be, for example, promoters for in vitro transcription, such as, for example, a SP6 promoter or a T7 promoter or the like; or a promoter (optionally in operative linkage with an enhancer) able to function in a eukaryotic cell. Such promoters and enhancers are known in the art. Sites specifying transcription termination and polyadenylation can also be present.
A restriction enzyme recognition site (or recognition sequence) is a DNA sequence to which a restriction enzyme binds in the process of DNA cleavage by the restriction enzyme. For most restriction enzymes, their recognition site is also the site at which the restriction enzyme cleaves DNA. However, certain restriction enzymes (e.g., FokI) cleave at a site that is distinct from the sequence at which they bind.
Cleavage of DNA by a restriction enzyme generates two DNA ends at the site of cleavage. If the terminal nucleotide of those ends is base-paired, the ends are denoted “blunt ends.” If one or more of the 5′-terminal nucleotides are not base-paired, the ends are said to have a 5′ extension or a 5′-overhang. If one or more of the 3′-terminal nucleotides are not base-paired, the ends are said to have a 3′ extension or a 3′-overhang. 5′- and 3′-overhangs can consist of one, two, three, four or more unpaired nucleotides.
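The classification of ends described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: cut positions are expressed in top-strand coordinates, and the function names are hypothetical. The ScaI (AGT^ACT), EcoRI (G^AATTC) and PstI (CTGCA^G) cut positions mentioned in the usage note are commonly cited values used only for illustration; EcoRI and PstI are not named in the present disclosure.

```python
def classify_end(top_cut, bottom_cut):
    """Classify the ends produced by a restriction cut.

    Both positions are given in top-strand coordinates, i.e., as the number
    of nucleotide pairs lying to the left of the cut on each strand.
    Returns a (type, overhang_length) tuple.
    """
    if top_cut == bottom_cut:
        # Terminal nucleotides remain base-paired.
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # Unpaired nucleotides carry a 5' terminus.
        return ("5' overhang", bottom_cut - top_cut)
    # Unpaired nucleotides carry a 3' terminus.
    return ("3' overhang", top_cut - bottom_cut)


def classify_palindromic(site, top_cut):
    """For a palindromic site, the bottom-strand cut mirrors the top-strand cut."""
    return classify_end(top_cut, len(site) - top_cut)
```

For example, classify_palindromic("AGTACT", 3) reports a blunt end, classify_palindromic("GAATTC", 1) reports a 5′-overhang of four nucleotides, and classify_palindromic("CTGCAG", 5) reports a 3′-overhang of four nucleotides.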
“Homology” or “identity” or “similarity” as used herein refers to the relationship between two nucleic acid molecules based on an alignment of their nucleotide sequences. Homology and identity can each be determined by comparing a position in each sequence which may be aligned for purposes of comparison. For example, a “reference sequence” can be compared with a “test sequence.” When a position in the reference sequence is occupied by the same nucleotide at an equivalent position in the test sequence, then the molecules are identical at that position; when the equivalent position is occupied by a similar nucleotide residue (e.g., similar in steric and/or electronic nature, and/or in its hydrogen-bonding properties), then the molecules can be referred to as homologous (similar) at that position. The relatedness of two sequences, when expressed as a percentage of homology/similarity or identity, is a function of the number of identical or similar nucleotides at positions shared by the sequences being compared. In comparing two sequences, the absence of nucleotide residues, or presence of extra residues, in one sequence as compared to the other, also decreases the identity and homology/similarity.
As used herein, the term “identity” refers to the percentage of identical nucleotide residues at corresponding positions in two or more sequences when the sequences are aligned to maximize sequence matching, i.e., taking into account gaps and insertions. Identity can be readily calculated by known methods, including but not limited to those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988). Methods to determine identity are designed to give the highest degree of match between the sequences tested. Moreover, methods to determine identity are codified in publicly available computer programs. Computer program methods to determine identity between two sequences include, but are not limited to, the GCG program package (Devereux et al. (1984) Nucleic Acids Research 12:387), BLASTP, BLASTN, and FASTA (Altschul et al. (1990) J. Molec. Biol. 215:403-410; Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402). The BLAST X program is publicly available from NCBI and other sources. See, e.g., BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, Md. 20894; Altschul et al. (1990) J. Mol. Biol. 215:403-410. The well-known Smith-Waterman algorithm can also be used to determine identity.
For sequence comparison, typically one sequence acts as a reference sequence, to which one or more test sequences are compared. Sequences are generally aligned for maximum correspondence over a designated region, e.g., a region at least about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 or more nucleotides in length, and the region can be as long as the full-length of the reference nucleotide sequence. When using a sequence comparison algorithm, test and reference sequences are input into a computer program, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
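By way of illustration only, the percent identity calculation can be sketched in Python for two sequences that have already been aligned. The sketch assumes that gaps are written as "-" and that every alignment column containing a gap counts against identity, consistent with the statement above that absent or extra residues decrease identity. Programs such as BLAST may use different denominators and scoring parameters, so this is not a description of any particular program; the function name is hypothetical.

```python
def percent_identity(aligned_ref, aligned_test):
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must be the same length, with gaps written as '-'.
    Columns in which both sequences have a gap are ignored; columns in
    which only one sequence has a gap count as non-identical.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_ref.upper(), aligned_test.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

For example, a four-nucleotide reference aligned to a test sequence that differs at one position gives 75% identity under these assumptions.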
Examples of algorithms that are suitable for determining percent sequence identity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215:403-410 and Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov (visited Jul. 22, 2019). Further exemplary algorithms include ClustalW (Higgins et al. (1994) Nucleic Acids Res. 22:4673-4680), available at www.ebi.ac.uk/Tools/clustalw/index.html (visited Jul. 22, 2019).
Sequence identity between two nucleic acids can also be described in terms of annealing, reassociation, or hybridization of two polynucleotides to each other, mediated by base-pairing. Hybridization between polynucleotides proceeds according to well-known and art-recognized base-pairing properties, such that adenine base-pairs with thymine or uracil, and guanine base-pairs with cytosine. The property of a nucleotide that allows it to base-pair with a second nucleotide is called complementarity. Thus, adenine is complementary to both thymine and uracil, and vice versa; similarly, guanine is complementary to cytosine and vice versa. An oligonucleotide or polynucleotide which is complementary along its entire length with a target sequence is said to be perfectly complementary, perfectly matched, or fully complementary to the target sequence, and vice versa. Two polynucleotides can have related sequences, wherein the majority of bases in the two sequences are complementary, but one or more bases are noncomplementary, or mismatched. In such a case, the sequences can be said to be substantially complementary to one another. If two polynucleotide sequences are such that they are complementary at all nucleotide positions except one, the sequences have a single nucleotide mismatch with respect to each other.
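The base-pairing rules set out above can be expressed as a short Python sketch. It is offered as an illustration under stated assumptions: sequences are written 5′ to 3′, the two strands pair in antiparallel orientation, only the standard pairs named above are considered, and the function names are hypothetical.

```python
# Standard base pairs named above: A with T or U, G with C.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
DNA_COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the DNA strand that is fully complementary to `seq` (5' to 3')."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(strand, other):
    """Count positions at which two equal-length strands, both written
    5' to 3', fail to base-pair when annealed in antiparallel orientation."""
    if len(strand) != len(other):
        raise ValueError("strands must have equal length")
    partner = reversed(other.upper())
    return sum(1 for a, b in zip(strand.upper(), partner) if (a, b) not in PAIRS)
```

A count of zero corresponds to perfectly complementary sequences, and a count of one corresponds to a single nucleotide mismatch. Under these rules the reverse complement of ACTG is CAGT.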
Conditions for hybridization are well-known to those of skill in the art and can be varied within relatively wide limits. Hybridization stringency refers to the degree to which hybridization conditions disfavor the formation of hybrids containing mismatched nucleotides, thereby promoting the formation of perfectly matched hybrids or hybrids containing fewer mismatches; with higher stringency correlated with a lower tolerance for mismatched hybrids. Factors that affect the stringency of hybridization include, but are not limited to, temperature, pH, ionic strength, and concentration of organic solvents such as formamide and dimethylsulfoxide. As is well known to those of skill in the art, hybridization stringency is increased by higher temperatures, lower ionic strengths, and higher concentrations of denaturing solvents such as formamide. See, for example, Ausubel et al., supra; Sambrook et al., supra; M. A. Innis et al. (eds.) PCR Protocols, Academic Press, San Diego, 1990; B. D. Hames et al. (eds.) Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford, 1985; and van Ness et al., (1991) Nucleic Acids Res. 19:5143-5151.
Thus, in the formation of hybrids (duplexes) between two polynucleotides, the polynucleotides are incubated together in solution under conditions of temperature, ionic strength, pH, etc., that are favorable to hybridization, i.e., under hybridization conditions. Hybridization conditions are chosen, in some circumstances, to favor hybridization between two nucleic acids having perfectly-matched sequences, as compared to a pair of nucleic acids having one or more mismatches in the hybridizing sequence. In other circumstances, hybridization conditions are chosen to allow hybridization between mismatched sequences, favoring hybridization between nucleic acids having fewer mismatches.
The degree of hybridization between two polynucleotides, also known as hybridization strength, is determined by methods that are well-known in the art. A preferred method is to determine the melting temperature (Tm) of the hybrid duplex. This is accomplished, for example, by subjecting a duplex in solution to gradually increasing temperature and monitoring the denaturation of the duplex, for example, by absorbance of ultraviolet light, which increases with the unstacking of base pairs that accompanies denaturation. Tm is generally defined as the temperature midpoint of the transition in ultraviolet absorbance that accompanies denaturation. Alternatively, if Tms are known, a hybridization temperature (at fixed ionic strength, pH and solvent concentration) can be chosen that is below the Tm of the desired duplex and above the Tm of an undesired duplex. In this case, determination of the degree of hybridization is accomplished simply by testing for the presence of duplex polynucleotide. Adsorption to hydroxyapatite can also be used to distinguish single-stranded nucleic acids from double-stranded nucleic acids.
Hybridization conditions are selected following standard methods in the art. See, for example, Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y. For example, hybridization reactions can be conducted under stringent conditions. An example of stringent hybridization conditions is hybridization at 50° C. or higher in 0.1×SSC (15 mM sodium chloride/1.5 mM sodium citrate). Another example of stringent hybridization conditions is overnight incubation at 42° C. in a solution comprising 50% formamide, 5×SSC (0.75 M NaCl, 75 mM trisodium citrate) and 50 mM sodium phosphate (pH 7.6), followed by washing in 0.1×SSC at about 65° C. Optionally, one or more of 5×Denhardt's solution, 10% dextran sulfate, and/or 20 mg/ml heterologous nucleic acid (e.g., yeast tRNA, denatured, sheared salmon sperm DNA) can be included in a hybridization reaction. Stringent hybridization conditions are hybridization conditions that are at least as stringent as the above representative conditions, where conditions are considered to be at least as stringent if they are at least about 80% as stringent, typically at least 90% as stringent as the above specific stringent conditions.
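The SSC concentrations quoted above are simple multiples of the 1× composition. A minimal Python sketch of that arithmetic follows; the 1× values (150 mM sodium chloride, 15 mM trisodium citrate) are inferred from the 0.1× and 5× figures given above, and the names used are illustrative only.

```python
# 1x SSC composition in millimolar, inferred from the 0.1x and 5x values
# quoted above (assumption for illustration).
SSC_1X_MM = {"sodium chloride": 150.0, "trisodium citrate": 15.0}


def ssc_concentrations(fold):
    """Return solute concentrations (mM) for an SSC solution at `fold` x strength."""
    return {solute: fold * mm for solute, mm in SSC_1X_MM.items()}
```

Under these assumptions, ssc_concentrations(5) gives 750 mM sodium chloride and 75 mM trisodium citrate, and ssc_concentrations(0.1) gives approximately 15 mM and 1.5 mM, respectively.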
The term “substantially identical” is used herein to refer to a first nucleic acid sequence that contains a sufficient or minimum number of nucleotides that are identical to aligned nucleotides in a second nucleic acid sequence such that the first and second nucleotide sequences possess a common functional property (e.g., enhancing the expression, stability or transport of mRNA).
The term “homology” describes a mathematically based comparison of sequence similarities which is used to identify sequences with similar functions or motifs. A reference nucleotide sequence (e.g., a sequence as disclosed herein) is used as a “query sequence” to perform a search against public databases to, for example, identify other family members, related sequences or homologues. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul et al. (1990) J. Mol. Biol. 215:403-410. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to a reference nucleotide sequence. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402. When utilizing the BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and BLAST) can be used (see ncbi.nlm.nih.gov).
Nucleic acids and polynucleotides of the present disclosure encompass those having a nucleotide sequence that is at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.9% or 100% identical to any of SEQ ID NOs:1-5.
Nucleotide analogues are known in the art. Accordingly, nucleic acids (i.e., SEQ ID NOs:1-5) comprising nucleotide analogues are also encompassed by the present disclosure.
Transgene vectors are based on Gateway destination vectors and are designed so that, after insertion of transgene sequences, cleavage of the vector with an appropriate restriction enzyme generates a DNA molecule resembling a retroviral pre-integration substrate. Thus, a transgene vector contains a transgene cassette comprising one or more pairs of att sites to facilitate insertion of the transgene by Gateway cloning methods. The att sites are flanked externally by truncated retroviral (e.g., lentiviral) LTR sequences (denoted 5′ dLTR and 3′ dLTR herein) which, in turn, are flanked (externally) by the inverted repeat sequence 5′-ACTG-3′. The 5′-ACTG-3′ sequences are flanked, in turn, by recognition sites for a restriction enzyme whose cleavage generates blunt-ended products. In certain embodiments, the 5′-ACTG-3′ sequences overlap with the recognition site for the blunt end-generating restriction enzyme. In certain embodiments, the recognition sites are six nucleotide pairs or greater in length. A schematic diagram of a transgene cassette is shown in
In certain embodiments of a transgene vector, one or more selection markers are located between the att sites, to allow for selection of vectors containing an inserted transgene. The selection marker can be a negative selection marker (e.g., the ccdB gene) that causes cell death or blocks cell growth; so that replacement of the negative selection marker by transgene sequences allows survival of cells harboring a transgene-containing vector. Selection markers are known in the art and include, for example, β-lactamase, ccdB, dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin-N-acetyl transferase, hygromycin phosphotransferase, aminoglycoside-3-phosphotransferase, ble; and sequences encoding resistance to ampicillin, tetracycline, kanamycin, chloramphenicol, G418, gentamycin and neomycin.
A. Restriction Enzyme Recognition Sites
Integration of the retroviral double-stranded DNA genome requires a blunt-ended genome, terminating in the inverted repeat sequence
as a substrate for retroviral integrase activity. Accordingly, for transgene integration according to the present invention, the transgene is present on a blunt-ended DNA molecule; hence the restriction enzyme recognition sites that flank the transgene cassette are sites whose cleavage results in production of a blunt end (i.e., recognition sites for a blunt end-generating restriction enzyme) and whose recognition site contains all or part of the sequence 5′-ACTG-3′.
In addition, to avoid the possibility of cleavage within the transgene itself, it is preferable that the recognition site contain six nucleotide pairs or more; e.g., six nucleotide pairs, seven nucleotide pairs, eight nucleotide pairs, nine nucleotide pairs, ten nucleotide pairs, eleven nucleotide pairs, twelve nucleotide pairs or more. However, depending on the size and nucleotide sequence of the transgene, blunt end-generating restriction enzymes whose recognition sites contain four or five nucleotide pairs can also be used.
Exemplary restriction enzymes for use in the methods described herein, which produce blunt ends and whose recognition sequences contain all or part of the sequence 5′-ACTG-3′, include ScaI, PmeI and BstZ17I, whose recognition sequences are shown in Table 1.
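The preference for recognition sites of six nucleotide pairs or more can be illustrated with a rough estimate of how often a site of a given length is expected to occur by chance within a transgene. The Python sketch below assumes a random sequence with equal frequencies of the four nucleotides, which real transgenes only approximate, and it scans only the strand supplied, which is sufficient for palindromic sites. The function names are hypothetical, and the ScaI site AGTACT used in the usage note is the commonly cited sequence, assumed here for illustration.

```python
def expected_sites(seq_len, site_len):
    """Expected number of chance occurrences of one specific site of
    `site_len` nucleotide pairs in a random sequence of `seq_len` pairs,
    assuming equal base frequencies."""
    windows = max(seq_len - site_len + 1, 0)
    return windows / 4 ** site_len


def find_sites(sequence, site):
    """Return 0-based start positions of `site` on the strand supplied."""
    sequence, site = sequence.upper(), site.upper()
    return [
        i
        for i in range(len(sequence) - len(site) + 1)
        if sequence[i:i + len(site)] == site
    ]
```

Under these assumptions, a 3,000 nucleotide pair transgene is expected to contain a given four-pair site roughly a dozen times, a given six-pair site less than once, and a given eight-pair site far less often. In practice, a candidate transgene sequence would be checked directly, for example with find_sites(transgene, "AGTACT") for ScaI.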
Additional restriction enzyme recognition sequences suitable for use in the transgene vectors described herein include those whose cleavage generates blunt ends terminating in the sequence 5′-ACTG-3′, or in which the sequence 5′-ACTG-3′ is within 1, 2, 3, 4 or 5 base pairs of a blunt-ended terminus. In addition, restriction enzymes generating 5′-overhanging ends which can be repaired by a DNA polymerase to generate (1) a blunt end terminating in the sequence 5′-ACTG-3′; or (2) a blunt end in which the sequence 5′-ACTG-3′ is within 1, 2, 3, 4 or 5 base pairs of the blunt-ended terminus, can also be used. Furthermore, restriction enzymes generating 3′-overhanging ends which can be processed by a protein having 3′-specific, single-stranded exonuclease activity (e.g., S1 nuclease, mung bean nuclease, E. coli exonuclease I, E. coli exonuclease X, E. coli DNA polymerase I, E. coli DNA polymerase II, E. coli DNA polymerase III, E. coli exonuclease T), to generate (1) a blunt end terminating in the sequence 5′-ACTG-3′; or (2) a blunt end in which the sequence 5′-ACTG-3′ is within 1, 2, 3, 4 or 5 base pairs of the blunt-ended terminus, can also be used.
B. Inverted Repeat Sequence
For integration of a double-stranded viral DNA genome into a host cell chromosome, the blunt-ended inverted repeat sequence
is required at the termini of the double-stranded viral DNA genome. The 3′-processing activity of the viral integrase (int) protein removes the terminal GT dinucleotide, leaving a 5′ extension of the dinucleotide AC at both ends of the DNA molecule, which allows the molecule to serve as a substrate for strand transfer (i.e., integration).
Accordingly, the transgene vectors disclosed herein contain, at both ends of the transgene cassette, the inverted repeat (IR) sequence
This 5′-ACTG-3′ sequence can be part of the blunt end-generating restriction enzyme recognition site (as discussed in the previous section) or can overlap, either fully or partially, with the recognition site.
C. Truncated LTRs
The termini of retroviral and lentiviral genomes consist of identical long terminal repeat (LTR) sequences. A typical LTR contains three sequence elements: U5, a sequence unique to the 5′ end of the RNA genome; U3, a sequence unique to the 3′ end of the RNA genome; and R, a sequence contained at both the 5′ and 3′ ends of the RNA genome external to the U5 and U3 sequences. A generalized structure of a retroviral RNA genome, focusing on the terminal sequences, is shown in
During the infective cycle, the single-stranded RNA genome is converted to a double-stranded DNA molecule. Due to the nature of the reverse transcription reaction, certain terminal genomic sequences are duplicated and transferred to the other end of the genome, generating long terminal repeat (LTR) sequences, as shown schematically in
The LTR-containing double-stranded DNA genome is the substrate for integration; however, not all LTR sequences are required for integration of viral double-stranded DNA. In particular, many, if not all, of the approximately 50 transcriptional regulatory elements present in the U3 region are unnecessary for integration. Accordingly, in the transgene vectors and transgene cassettes disclosed herein, not all U3 sequences are present in the truncated LTRs (dLTRs). In particular, the 5′ dLTR does not contain any U3 sequences, consisting only of R and U5 sequences; and the 3′ dLTR contains an internally deleted U3 (dU3) region (that retains only the Sp1 and GATA-3 binding sites) along with R and U5 sequences.
The derivation of the 5′ dLTR and 3′ dLTR are shown in more detail in
D. att Sites
Transgene vectors are designed for rapid and simple insertion of transgenes using the Gateway cloning system. See, for example, Hartley et al., supra. Accordingly, the transgene vectors disclosed herein, based on Gateway destination vectors, contain one or more pairs of att sites.
att sites are DNA sequences involved in the integration of the bacteriophage λ genome into, and its excision from, the E. coli chromosome. The bacteriophage contains two sequences denoted attP, which, in the presence of a recombinase protein, recombine with a pair of bacterial sequences known as attB sites. The result of the recombination reaction is an E. coli genome containing an integrated λ genome, in which the integrated λ genome is flanked by hybrid att sites denoted attL and attR. Excision of an integrated λ genome is catalyzed by the xis protein, resulting in the regeneration of the attP sites in the phage genome and regeneration of the attB sites in the bacterial genome.
In a vector with a single pair of att sites, one att site lies just interior to the 5′ dLTR sequence, and the other att site lies just interior to the 3′ dLTR sequence. In certain embodiments, transgene vectors contain two pairs of att sites. In additional embodiments, transgene vectors contain three pairs of att sites: a first pair of att sites for 5′ entry clones; a second pair of att sites for middle entry clones and a third pair of att sites for 3′ entry clones as described, for example, by Kwan et al. (2007) Devel. Dynamics 236:3088-3099. Exemplary pairs of att sites include:
att L1 and att L2
att L3 and att L4
att R1 and att R2
att R3 and att R4
att B1 and att B2
att B3 and att B4
att P1 and att P2
att P3 and att P4
Retroviral integrase proteins are encoded by a portion of the retroviral pol gene, near its 3′ end. Integrase proteins comprise approximately 300 to 400 amino acids and include three domains that are joined by linkers of varying length. The N-terminal domain includes two pairs of zinc-chelating histidine and cysteine residues (the HHCC motif) in which a bound Zn2+ ion stabilizes a helix-turn-helix structure. The catalytic core domain is characterized by three acidic amino acids: two aspartic acid residues and a glutamic acid residue (the DDE motif) with the second aspartic acid and the glutamic acid being separated by approximately 35 residues. The DDE motif is also involved in metal ion chelation. Also within the central region of HIV-1 integrase is a non-canonical nuclear localization signal (NLS), having the amino acid sequence IIGQVRDQAEHLK (SEQ ID NO:12), which is in part responsible for the ability of HIV to infect non-dividing cells. The C-terminal domain of integrase proteins is the least well-conserved but contains β-strand barrels resembling those found in the SH3 domain and includes determinants for DNA binding and multimerization (retroviral integrases are active only as multimers: a dimer is capable of 3′-end processing, but a tetramer is required for strand transfer and integration). Certain retroviral integrases also contain an N-terminal extension.
A nucleic acid comprising sequences encoding a polypeptide having retroviral integrase activity can be, for example, an mRNA molecule. Such mRNA molecules can be generated, for example, by in vitro transcription of a DNA molecule having appropriate transcriptional control sequences such as, for example, a bacteriophage T7 promoter or a bacteriophage SP6 promoter. Transcription termination can be regulated by the presence of a transcriptional terminator sequence, or an RNA molecule can be generated as the result of run-off transcription from a linear DNA template. Optionally, such integrase mRNAs contain translational regulatory sequences; e.g., a Kozak sequence or an internal ribosome entry site (IRES).
Alternatively, sequences encoding polypeptides having retroviral integrase activity are present in a DNA molecule, for example, a plasmid. In these cases, promoter and enhancer sequences, additional transcriptional regulatory sequences such as transcription termination signals and polyadenylation signals, insulators and translational regulatory sequences (such as Kozak sequences and internal ribosome entry sites) can also be present in the plasmid. See also Masuda (2011) Frontiers in Microbiology 2:1-5 (Article 210).
In additional embodiments, the disclosure provides integrase proteins (and nucleic acids encoding them) that have been engineered to contain one or more additional nuclear localization signals. For example, in addition to the endogenous NLS present in HIV-1 integrase, NLS sequences from SV40 (PKKKRKV, SEQ ID NO:13), c-myc (PAAKRVKLD, SEQ ID NO:14), the HIV Vpr protein (RRTRNGASKS, SEQ ID NO:15) and hnRNPA1 (SSNFGPMLGGNRFFRSSPY, SEQ ID NO:16) are introduced at the N-terminus and/or the C-terminus of the integrase protein. In certain embodiments, a linker sequence is present between the integrase protein and the exogenous nuclear localization signal(s) at the N- and/or C-terminus. Since different nuclear localization signals are recognized by different importin proteins (e.g., the HIV integrase NLS is recognized by importin α3 and the HIV Vpr NLS is recognized by importin α1, while other NLS sequences are recognized by importin β), integrase proteins containing multiple different nuclear localization signals will accumulate at higher levels in cell nuclei, thereby increasing integration efficiency.
The transgene cassettes and transgene vectors disclosed herein are Gateway-compatible; accordingly, it is straightforward to include not only coding sequences, but also 5′ and 3′ regulatory sequences, such as, for example, enhancers, promoters, transcription termination sites, polyadenylation signals and translation initiation sites, using two-way or three-way Gateway cloning protocols. Accordingly, transgene-containing transgene cassettes, and integrated transgenes obtained by the methods described herein, can contain transcriptional and translational regulatory sequences to control the expression (e.g., temporal expression and/or regional expression) of the integrated transgene. Certain regulatory sequences, known in the art, can also provide constitutive expression of a transgene (e.g., actin promoter, CMV promoter, 3-GPDH promoter, ribosomal promoters). Transcriptional regulatory sequences include, for instance, promoters, enhancers, polyadenylation signals and insulators.
Promoters active in eukaryotic cells are known in the art and include, for example, viral promoters (e.g., SV40 early promoter, SV40 late promoter, cytomegalovirus major immediate early (MIE) promoter, herpes simplex virus thymidine kinase (HSV-TK) promoter), EF1-alpha (translation elongation factor-1 α subunit) promoter, Ubc (ubiquitin C) promoter, PGK (phosphoglycerate kinase) promoter, actin promoter and others. See also Boshart et al., GenBank Accession No. K03104; Uetsuki et al. (1989) J. Biol. Chem. 264:5791-5798; Schorpp et al. (1996) Nucleic Acids Res. 24:1787-1788; Hamaguchi et al. (2000) J. Virology 74:10778-10784; and Dreos et al. (2013) Nucleic Acids Res. 41(D1):D157-D164. Tissue-specific promoters, such as the cMLC2 promoter, which specifies transcription in myocardial cells, can also be used.
Enhancer elements, and their nucleotide sequences, are known in the art. Certain enhancers can be used to direct tissue-specific expression of genes (e.g., transgenes) to which they are operatively linked. For example, the Fli1EP enhancer directs transcription to endothelial cells.
Polyadenylation signals, and their nucleotide sequences, are known in the art. Generally, a polyadenylation signal is present downstream, in the transcriptional sense, of the transgene. Polyadenylation signals that are active in eukaryotic cells include, but are not limited to, the SV40 polyadenylation signal, the bovine growth hormone (BGH) gene polyadenylation signal and the herpes simplex virus thymidine kinase gene polyadenylation signal. The polyadenylation signal directs 3′ end cleavage of pre-mRNA, polyadenylation of the pre-mRNA at the cleavage site and termination of transcription downstream of the polyadenylation signal. A core sequence AAUAAA is generally present in the polyadenylation signal. See also Cole et al. (1985) Mol. Cell. Biol. 5:2104-2113.
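As a simple illustration, the presence of the AAUAAA core sequence in a candidate polyadenylation signal can be checked with a short Python sketch. The function name is hypothetical, and the sketch locates only the core hexamer; it does not evaluate the additional sequence context on which a functional polyadenylation signal depends.

```python
def find_polya_core(sequence, core="AAUAAA"):
    """Return 0-based start positions of the polyadenylation core sequence.

    The input may be written as RNA or as the sense-strand DNA; thymine is
    converted to uracil before searching.
    """
    rna = sequence.upper().replace("T", "U")
    return [i for i in range(len(rna) - len(core) + 1) if rna.startswith(core, i)]
```

An empty list indicates that the core hexamer is absent from the sequence supplied.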
In further embodiments, the vectors and transgene cassettes disclosed herein contain an insulator element, also known as a matrix attachment region (MAR) or scaffold attachment region (SAR). MAR and SAR sequences act, inter alia, to insulate the chromatin structure of adjacent sequences. Thus, in a stably transformed cell, in which heterologous sequences are chromosomally integrated, an insulator sequence can prevent repression of transcription of a transgene that has integrated into a region of the cellular genome having a repressive chromatin structure. Accordingly, inclusion of one or more insulator sequences in a vector can facilitate expression of a transgene from the vector in stably-transformed cells.
Exemplary insulator elements include those from the human interferon beta gene (IBM), the chicken (G. gallus) lysozyme gene 5′ matrix attachment region (CLM), the human interferon alpha-2 gene (IAM), the mouse S4 MAR/SAR and the human X29 MAR/SAR. The insulator can be located at any location within the vector or the cassette. In certain embodiments, insulator elements are located within the transgene cassette upstream (in the transcriptional sense) of a promoter. In additional embodiments, insulator elements are present at both ends of a transgene.
In certain embodiments, the vectors also include, within an expression cassette (as defined above), a post-transcriptional regulatory element (PRE). In certain embodiments, the post-transcriptional regulatory element is a cis-acting element that promotes mRNA stability. In other embodiments, the post-transcriptional regulatory element is a cis-acting element that promotes transport of RNA from the nucleus to the cytoplasm. Exemplary PREs include the human hepatitis B virus PRE (HPRE) and the woodchuck hepatitis virus post-transcriptional regulatory element (WPRE). See, e.g., U.S. Pat. No. 6,136,597; Huang & Liang (1993) Mol. Cell. Biol. 13:7476-7486; Huang & Yen (1994) J. Virol. 68:3193-3199; Donello et al. (1996) J. Virol. 70:4345-4351; and Donello et al. (1998) J. Virology 72:5085-5092. Sub-elements of the HPRE (α element and β element) and WPRE (α element, β element and γ element) have been identified. Accordingly, chimeric PREs containing mixtures of HPRE and WPRE sub-elements are also contemplated for use in the compositions disclosed herein.
Additional post-transcriptional regulatory elements include, but are not limited to, the 5′-untranslated region of the human Hsp70 gene, the SP163 sequence from the vascular endothelial growth factor (VEGF) gene, the tripartite leader sequence associated with adenovirus late mRNAs and the first intron of the human cytomegalovirus immediate early gene. See, for example, Mariati et al. (2010) Protein Expression and Purification 69:9-15.
A transgene can comprise an intron which, in certain instances, can increase production of mRNA from an integrated transgene. Exemplary introns that can be used include the human β-globin intron and the first intron of the human cytomegalovirus major immediate early (MIE) gene, also known as “intron A.”
Vectors containing a transgene cassette can contain a replication origin that functions in prokaryotic cells. Replication origins that function in prokaryotic cells are known in the art and include, but are not limited to, the oriC origin of E. coli; plasmid origins such as, for example, the pSC101 origin, the pBR322 origin (rep) and the pUC origin; and viral (i.e., bacteriophage) replication origins (e.g., the f1 replication origin). Methods for identifying prokaryotic replication origins are provided, for example, in Sernova & Gelfand (2008) Brief. Bioinformatics 9(5):376-391.
Selection markers, both positive and negative, are known in the art. An exemplary selection marker that functions in eukaryotic cells is the glutamine synthetase (GS) gene; selection is applied by culturing cells in medium lacking glutamine or medium containing methionine sulfoximine. Another exemplary selection marker that functions in eukaryotic cells is the gene encoding resistance to neomycin (neo); selection is applied by culturing cells in medium containing neomycin or G418. An exemplary gene encoding neomycin resistance is the TN5 Neo gene. Additional selection markers include sequences encoding dihydrofolate reductase (DHFR, imparts resistance to methotrexate), puromycin-N-acetyl transferase (provides resistance to puromycin), hygromycin kinase (provides resistance to hygromycin B), hygromycin phosphotransferase, aminoglycoside-3-phosphotransferase, ble, and genes encoding resistance to zeocin. Yet additional selection markers that function in eukaryotic cells are known in the art. Selective agents that can be used in the methods disclosed herein are known in the art and include, but are not limited to, G418, methotrexate, neomycin, geneticin, puromycin, bleomycin, Zeocin, blasticidin, hygromycin, methionine sulfoximine and L-glutamine. Any of the sequences encoding a selection marker as described above can be operatively linked to a promoter and/or a polyadenylation signal.
The vectors disclosed herein can also contain one or more selection markers that function in prokaryotic cells. Selection markers that function in prokaryotic cells are known in the art and include, for example, sequences that encode polypeptides conferring resistance to a selective agent such as, for example, ampicillin, kanamycin, chloramphenicol, or tetracycline. An example of a polypeptide conferring resistance to ampicillin (and other beta-lactam antibiotics) is the beta-lactamase (bla) enzyme. Kanamycin resistance can result from activity of the neomycin phosphotransferase gene; and chloramphenicol resistance is mediated by chloramphenicol acetyl transferase.
Negative selection markers that are active in prokaryotic cells include the ccdB gene, which encodes a DNA gyrase inhibitor.
The vectors disclosed herein can be any nucleic acid vector known in the art. Exemplary vectors include plasmids, cosmids, bacterial artificial chromosomes (BACs) and viral vectors.
Any sequence, coding or noncoding, can serve as a transgene. For example, a transgene can encode a detectable moiety; e.g., a fluorescent protein, such as green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), red fluorescent protein, yellow fluorescent protein, tdTomato, luciferase and the like. A transgene can also encode an enzymatic activity (e.g., β-galactosidase, β-glucuronidase, luciferase and the like). A transgene can also encode a therapeutic protein, such as globin, a coagulation factor, or a therapeutic antibody.
A transgene can encode, for example, a recombinant protein, a fusion protein, an antibody, a cytokine, a hormone, an enzyme or a clotting factor. Exemplary antibodies include monoclonal antibodies, single chain antibodies, bispecific antibodies, and antibody conjugates.
Exemplary transgenes include those encoding therapeutic proteins, e.g., hormones (such as, for example, growth hormone), cytokines (e.g., erythropoietin), antibodies, monoclonal antibodies (e.g., rituximab), antibody conjugates, fusion proteins (e.g., IgG-fusion proteins), interleukins, CD proteins, MHC proteins, enzymes and clotting factors.
Exemplary cytokines include, but are not limited to, erythropoietin, granulocyte colony-stimulating factor (G-CSF), filgrastim, and PEGfilgrastim.
Exemplary hormones include, but are not limited to, human growth hormone, luteinizing hormone (Luveris), and epoetin (Procrit).
Insertion of a transgene into a transgene vector is conducted using standard gateway cloning procedures, which results in conversion of the att sites present in the transgene vector into different att sites in the transgene-containing transgene vector. For example, in certain embodiments, attR sites (e.g., attR4 and attR3) present in a transgene vector are converted to attP sites (e.g., attP4 and attP3) in the process of inserting a transgene into the vector. Depending on the method of inserting transgene sequences, multiple att sites can be present in a transgene-containing transgene vector. For example, a transgene-containing transgene vector constructed by three-way gateway cloning will comprise four att sites.
The compositions disclosed herein can be used for convenient, high-efficiency, non-viral insertion of a transgene into the genome of a cell, by contacting the cell with a combination comprising (1) a transgene-containing transgene cassette and (2) a nucleic acid comprising sequences encoding a polypeptide having retroviral integrase activity. A transgene-containing transgene cassette can be an isolated, double-stranded DNA molecule or it can be one of a plurality of DNA molecules generated by digestion of a transgene-containing transgene vector with a restriction enzyme. Contact can be by any method known in the art, including transfection, injection, electroporation, biolistic delivery, protoplast fusion, polyethylene glycol (PEG)-mediated methods, polyethyleneimine (PEI)-mediated methods, DEAE-dextran-mediated methods, calcium phosphate co-precipitation, and lipid-based particles (e.g., lipofection).
The methods and compositions described herein achieve high-efficiency transgene integration. In certain embodiments, at least 5% of cells exposed to a transgene undergo stable integration of the transgene into the genome (i.e. 5% efficiency of integration). In additional embodiments, the efficiency of integration is greater than 10%, greater than 15%, greater than 20%, greater than 25%, greater than 30%, greater than 35%, greater than 40%, greater than 45%, greater than 50%, greater than 55%, greater than 60%, greater than 65%, greater than 70%, greater than 75%, greater than 80%, greater than 85%, greater than 90%, greater than 95%, or greater than 98%.
The cell can be any type of cell, including eukaryotic, prokaryotic or archaeal. Exemplary eukaryotic cells include fungal cells (e.g., Trichoderma sp., Pichia pastoris, Schizosaccharomyces pombe and Saccharomyces cerevisiae), plant cells (e.g., Arabidopsis cells and tobacco BY2 cells), insect cells (e.g., Sf9, Sf21, and Drosophila S2 cells), vertebrate cells, teleost cells (e.g., Danio sp., e.g. Danio rerio or zebrafish), mammalian cells, primate cells and human cells. The transgene-containing transgene cassette can be an isolated and/or purified nucleic acid or can be part of a collection of nucleic acid molecules resulting from restriction enzyme digestion of a larger DNA molecule, e.g., a plasmid.
Cultured mammalian cell lines, useful for expression of recombinant polypeptides, include Chinese hamster ovary (CHO) cells, human embryonic kidney (HEK) cells, virally transformed HEK cells (e.g., HEK293 cells), NS0 cells, SP2/0 cells, CV-1 cells, baby hamster kidney (BHK) cells, 3T3 cells, Jurkat cells, HeLa cells, COS cells, PER.C6 cells, CAP® cells, CAP-T® cells (the latter two cell lines being commercially available from Cevec Pharmaceuticals, Cologne, Germany) and cancer cell lines such as A549 and PANC-1. A number of derivatives of CHO cells are also available such as, for example, CHO-DXB11, CHO-DG-44, CHO-K1 and CHO-S. Derivatives of any of the cells described herein obtained, for example, by mutagenesis, selection, gene knock-out, targeted integration (e.g., CRISPR/Cas9; zinc finger nucleases) or cloning, are also provided. Mammalian primary cells can also be used. Myeloma and hybridoma cells can also be used.
Nucleic acids comprising sequences encoding retroviral integrase activity, for use in these methods, are described elsewhere herein.
Each retrovirus encodes its own integrase protein, has unique LTR sequences and has a unique 5′ terminal sequence of its double-stranded DNA pre-integration intermediate. Accordingly, the present disclosure provides additional transgene vectors and transgene cassettes containing dLTR sequences and 5′-terminal inverted repeat sequences of a retrovirus other than HIV-1 and methods in which such transgene vectors and transgene cassettes are used in conjunction with nucleic acids encoding an integrase protein from the virus used to provide the dLTR and inverted repeat sequences.
For certain applications, it is desirable to insert a transgene(s) at a specific location in the genome of the target cell or target organism. Targeted integration is achieved by taking advantage of elements of the CRISPR-Cas9 targeting system. The Cas9 protein is an RNA-guided DNA endonuclease that cleaves DNA sequences that are complementary to a guide RNA. Guide RNAs can be synthesized to be complementary to any DNA sequence of choice, and are thereby able to target the Cas9 endonuclease to any DNA sequence of choice (i.e., a genomic DNA sequence complementary to the targeting portion of the sequence of the guide RNA). Moreover, mutants of Cas9 that lack endonuclease activity (so-called “dead Cas9” or dCas9) can be fused to functional domains (such as transcriptional activation domains and transcriptional repression domains) to target the activity of these domains to particular genomic sequences (e.g., promoters).
dCas9 is a catalytically inactive mutant of the Streptococcus pyogenes Cas9 protein that lacks endonuclease activity. The dCas9 protein remains capable of binding to DNA/RNA duplexes and therefore can be targeted to a particular chromosomal sequence using a guide RNA of appropriate nucleotide sequence.
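As a rough illustration of how a targeting sequence for a guide RNA might be chosen, the following Python sketch scans one strand of a DNA sequence for protospacers that are immediately followed by an NGG motif, the protospacer adjacent motif (PAM) recognized by S. pyogenes Cas9. The function name, the 20-nucleotide protospacer length and the example sequence are illustrative assumptions and are not taken from the disclosure.

```python
def find_protospacers(dna, spacer_len=20):
    """Return (start, protospacer, pam) for each NGG PAM on the given strand.

    Only the supplied strand is scanned; a fuller search would also
    examine the reverse complement.
    """
    dna = dna.upper()
    hits = []
    for i in range(spacer_len, len(dna) - 2):
        pam = dna[i:i + 3]
        if pam[1:] == "GG":  # N-G-G, where N is any nucleotide
            hits.append((i - spacer_len, dna[i - spacer_len:i], pam))
    return hits


# Hypothetical sequence, for illustration only.
example = "ATGCATGCATGCATGCATGCAGGTT"
for start, spacer, pam in find_protospacers(example):
    print(start, spacer, pam)
```

The targeting portion of the guide RNA would correspond to the protospacer; the PAM itself is present in the genomic target but is not part of the guide.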
The amino acid sequence of S. pyogenes dCas9 is:
Lens epithelium-derived growth factor (LEDGF/p75), also known as psip1a, PC4 or SFRS1-interacting protein, is a host factor that participates in integration of the HIV genome into a host chromosome. The C-terminal portion of this protein contains an integrase-binding domain, which interacts with lentiviral integrase proteins and with other cellular proteins. The psip1a protein also binds to chromosomal DNA, thereby tethering integrase to chromosomal DNA at the integration site.
The amino acid sequence of zebrafish psip1a is:
For targeted integration using the transgene vectors disclosed herein, the transgene vector and integrase-encoding nucleic acid are supplemented with a nucleic acid (e.g., DNA, RNA) encoding a fusion between dCas9 and the psip1a (LEDGF) protein, in conjunction with a guide RNA whose targeting region is complementary to the genomic sequence at which integration is desired. The guide RNA targets the dCas9 portion of the fusion protein to the target genomic sequence, while the psip1a portion of the fusion protein interacts with integrase to tether the integrase/transgene cassette pre-integration complex to the target genomic sequence, thereby facilitating integration at the target genomic sequence. A schematic diagram illustrating this method is shown in
Accordingly, in certain embodiments for targeted integration of a transgene, the following constituents are introduced into the target cell:
(1) single guide RNA (sgRNA) with a sequence complementary to the target genomic sequence and a hairpin sequence that binds dCas9,
(2) a dCas9-psip1a fusion protein, or mRNA encoding a dCas9-psip1a fusion protein,
(3) mRNA encoding an integrase, and
(4) a transgene cassette.
In additional embodiments, sequences encoding the dCas9-psip1a fusion protein are present on a DNA molecule (e.g., a plasmid) and are under the transcriptional and translational control of elements that are active in the target cell.
In additional embodiments, sequences encoding the integrase protein are present on a DNA molecule (e.g., a plasmid) and are under the transcriptional and translational control of elements that are active in the target cell.
The foregoing methods for targeted integration rely on binding of the psip1a portion of the psip1a-dCas9 fusion protein to integrase molecules that are present at both ends of the transgene cassette in a preintegration complex. However, endogenous psip1a (already present in the cell) can compete with the psip1a-dCas9 fusion protein for binding to the integrase proteins present in the preintegration complex. Accordingly, in certain embodiments, the psip1a-dCas9 fusion protein is overexpressed in target cells, for example, by injecting RNA encoding the psip1a-dCas9 fusion protein at a molar excess to integrase RNA, by injecting a quantity of RNA encoding the psip1a-dCas9 fusion protein that will produce a molar excess of psip1a-dCas9 fusion protein to endogenous psip1a, or by introducing an expression vector containing sequences encoding the psip1a-dCas9 fusion protein (instead of RNA encoding the psip1a-dCas9 fusion protein) in which the sequences encoding the psip1a-dCas9 fusion protein are under the transcriptional control of sequences that express, or can be induced to express, the psip1a-dCas9-encoding sequence at high levels. In additional embodiments, inhibition of expression of endogenous psip1a, for example, by blocking splicing of psip1a pre-mRNA with morpholino compounds, can also be used to enhance the efficiency of targeted integration.
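Because equal masses of a longer RNA contain fewer molecules than equal masses of a shorter RNA, a molar excess has to be worked out from transcript length as well as mass. The following Python sketch shows one way to do so, using the common rule-of-thumb molecular weight for single-stranded RNA (about 320.5 g/mol per nucleotide plus 159 g/mol). The function names and the transcript lengths are hypothetical; the actual transcript lengths are not stated here.

```python
def rna_picomoles(mass_ng, length_nt):
    """Approximate picomoles of single-stranded RNA from mass and length."""
    mol_weight = length_nt * 320.5 + 159.0   # g/mol, rule-of-thumb average
    return mass_ng / mol_weight * 1000.0     # ng / (g/mol) = nmol; x1000 = pmol


def molar_ratio(mass_a_ng, len_a_nt, mass_b_ng, len_b_nt):
    """Moles of RNA 'a' per mole of RNA 'b'."""
    return rna_picomoles(mass_a_ng, len_a_nt) / rna_picomoles(mass_b_ng, len_b_nt)


# Hypothetical lengths: a 5,600-nt fusion transcript and a 1,000-nt
# integrase transcript, injected at equal mass.
ratio = molar_ratio(25.0, 5600, 25.0, 1000)
print(f"fusion : integrase molar ratio = {ratio:.2f}")
```

Under these assumed lengths, equal masses give fewer fusion transcripts than integrase transcripts, so a molar excess of the fusion RNA would require a correspondingly larger mass.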
Translational control elements (e.g., Kozak sequences or the like) which are active at high levels in the host cell can also be included in vectors for overexpression of the psip1a-dCas9 fusion protein.
Transgene plasmids (pLTR vectors) were constructed by modifying the Gateway cloning destination vector pminiTol2 R4R3 (Addgene #40970, see also Kwan et al. (2007) Devel. Dynamics 236:3088-3099), which contains an attR4/attR3 gateway cassette flanked by Tol2 transposon sequences.
Briefly, the upstream and downstream miniTol2 sequences were replaced by two truncated HIV-1 LTR sequences. The upstream miniTol2 sequence was replaced with sequences containing the R and U5 sequences of the HIV-1 LTR (5′-dLTR; template from Addgene #14883). The downstream miniTol2 sequence was replaced with sequences containing dU3, R and U5 sequences of the HIV-1 LTR (3′-dLTR; template from Addgene #19319).
For sequence replacement, DNA molecules were constructed that contained the replacement sequence (5′ dLTR or 3′ dLTR) with the sequence 5′-ACTG-3′ appended to the 5′ end of the replacement sequence, and terminating in a recognition site for a blunt end-generating restriction enzyme (e.g., ScaI, PmeI or BstZ17I). Replacement DNA molecules were amplified by PCR, using Addgene 14883 and 19319 as templates, using Platinum™ Taq DNA Polymerase High Fidelity (Invitrogen). The amplification products were then inserted into the pminiTol2R4R3 vector. 5′ dLTR-containing PCR products were ligated into NdeI/XhoI-digested pminiTol2R4R3. 3′ dLTR-containing PCR products were ligated into ApaI/SacII-digested pminiTol2R4R3.
A schematic diagram of the vector is shown in
Transgenes, and optionally regulatory sequences, are inserted into the transgene vector using standard gateway cloning methods. One-way, two-way, or three-way insertions can be used, depending on the nature of the transgene and associated (e.g., regulatory) sequences. See, e.g., Hartley et al., supra for additional details of methods for one-way, two-way and three-way insertions.
Plasmids were amplified in One Shot® TOP10 E. coli cells (Invitrogen, Carlsbad, Calif.) and purified using a PureLink® Quick Plasmid Miniprep Kit (Invitrogen) for subsequent microinjection, transfection, or production of mRNA by in vitro transcription.
The pCS2-integrase and pCS2-integrase-2A-tdTomato overexpression vectors were constructed using standard gateway cloning protocols with pCSDest2 (Addgene #22424), p3E-2a-tdTomato (Addgene #67707) and pME-integrase. pME-integrase was generated by conducting a standard gateway BP reaction using wild-type HIV-1 integrase in pET15b (Addgene #61668) as a template for PCR. A Kozak sequence was present in the vector for regulation of translation of the integrase sequences. All constructs were verified by DNA sequencing.
The p5E-CMV/SP6 plasmid (a 5′ entry gateway clone containing the CMV promoter) was obtained from Dr. Nathan Lawson. p5E-cmlc2 was obtained from a zebrafish Tol2 kit generated by Dr. Chien Chi-Bin. See Kwan et al. (2007) Devel. Dynamics 236:3088-3099. cmlc2 is a promoter that specifies transcription in the heart.
This example shows that co-injection of an EGFP-expressing transgene cassette and integrase-encoding mRNA, into zebrafish embryos, results in high-efficiency, stable transfection.
Adult zebrafish were housed in an Aquaneering (San Diego, Calif.) zebrafish housing system at 28° C. on a 14-hour light and 10-hour dark cycle. Single-pair crosses were used to generate fertilized embryos for microinjection to test for stable genomic integration of transgenes. After analysis, selected embryos were incubated in egg water at 28° C. for up to 6 days post-fertilization (dpf) before being raised in the main system.
A transgene cassette comprising sequences encoding enhanced green fluorescent protein (EGFP) under the control of a CMV promoter (pLTR-CMV-EGFP) was constructed by inserting a CMV promoter, EGFP cDNA and a BGH polyadenylation signal into the vector described in Example 1 using a 3-way (i.e., 5′ entry (CMV promoter), middle entry (EGFP) and 3′ entry (polyadenylation signal)) gateway insertion. See
Integrase-encoding mRNA was generated using a mMESSAGE mMACHINE® SP6 Transcription Kit (Invitrogen) with pCS2-Integrase, linearized with NotI, as a template. RNA was purified by phenol/chloroform extraction and ethanol precipitation.
One-cell zebrafish embryos were co-injected with the EGFP transgene cassette and the integrase mRNA, as shown schematically in
The injected embryos were analyzed for the expression of the EGFP transgene at 6 days post-fertilization (dpf). For fluorescence analysis, live embryos were placed in egg water containing 1× tricaine. Fluorescence images were acquired using a Leica M165 FC stereo microscope. Injected embryos were categorized in five different groups (Group 0 through Group 4) based on the degree of EGFP expression, with Group 0 showing no EGFP fluorescence and Group 4 showing the highest amount of EGFP fluorescence. Groups 2-4 represent successful genome integration with strong transgene expression and a high potential for germ line transmission in F1 fish. Group 0 and Group 1 represent fish in which no integration occurred (Group 0) or a very small amount of integration occurred (Group 1).
A comparison of integration levels using two different doses of injected nucleic acid (a high dose of 25 ng/μl each of mRNA and DNA or a low dose of 12.5 ng/μl each) was performed, and the results were quantified. As shown in
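The scoring scheme described above can be summarized computationally. The following Python sketch tallies embryos by group and reports the percentage falling in Groups 2-4, which are the groups treated as successful integration. The function name and the example scores are hypothetical and are not data from the experiments.

```python
from collections import Counter


def integration_summary(groups):
    """Tally per-embryo scores (0-4) and return (counts, % in Groups 2-4)."""
    counts = Counter(groups)
    total = sum(counts.values())
    integrated = sum(n for group, n in counts.items() if group >= 2)
    return counts, 100.0 * integrated / total


# Hypothetical scores for ten embryos, for illustration only.
scores = [0, 0, 1, 2, 2, 3, 3, 4, 4, 4]
counts, percent_integrated = integration_summary(scores)
print(dict(sorted(counts.items())), percent_integrated)
```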
Existing methods for construction of transgenic zebrafish (and other organisms) without using viral vectors include (1) Tol2-mediated transgenesis and (2) meganuclease (e.g., I-SceI)-mediated transgenesis. Accordingly, the methods described herein were compared to these two methods of performing transgenesis in zebrafish.
To test for the ability to direct tissue-specific expression of a transgene introduced by the methods disclosed herein, a transgene cassette containing sequences encoding EGFP under the control of the Fli1ep enhancer (which directs transcription in endothelial cells) was constructed and denoted pLTR-Fli1ep:EGFP-pA. The p5E-fli1ep plasmid, containing the Fli1ep enhancer, was obtained from Dr. Nathan Lawson.
As in Example 3, fish that developed from injected embryos were grouped into five categories based on the degree of EGFP expression (negative expression: Group 0, low expression: Group 1 and increasing degrees of positive expression: Groups 2, 3 and 4). Fluorescent images of zebrafish that developed from embryos that had been injected with integrase mRNA and a transgene cassette containing sequences encoding enhanced green fluorescent protein under the transcriptional control of the endothelial-specific Fli1ep enhancer showed that, in Groups 2, 3 and 4, EGFP expression was primarily restricted to the vasculature. In addition, the levels of stable transgene integration were 57% in fish injected with 25 ng/μl and 27% in fish injected with 12.5 ng/μl (
In additional experiments using the catalytically-deficient integrase mutants D116A and E152A, a much lower integration efficiency (approximately 10%) was obtained; and all integrants were in Group 2 (i.e., low level of integration). These results indicate that, although a certain amount of integration can occur in the absence of integrase activity, high levels of integration depend on functional integrase.
This example shows that high levels of stable integration are obtained following co-transfection, into cultured human cells, of (1) a transgene cassette containing EGFP-encoding sequences under the transcriptional control of a CMV promoter and (2) a plasmid encoding HIV-1 integrase under the transcriptional control of a CMV promoter (pCS2-Integrase-2A-tdTomato). The transgene cassette was obtained by cleavage of the pLTR-CMV-EGFP plasmid (described in Example 3) with BstZ17I. The design of the experiment is shown schematically in
Two human epithelial cancer lines, A549 and PANC-1, were used in these experiments. Human lung cancer cell line A549 was acquired from ATCC (#CCL-185) and maintained in F12 medium supplied with 10% fetal bovine serum at 37° C. in a humidified atmosphere of 5% CO2/95% air in the presence of antibiotics. The human pancreatic cancer line PANC-1 was obtained from Sigma (#87092802) and maintained in DMEM with 10% fetal bovine serum at 37° C. in a humidified atmosphere of 5% CO2/95% air in the presence of antibiotics.
Transfection was conducted using Lipofectamine® 3000 (Invitrogen, Carlsbad, Calif.) according to the manufacturer's instructions. Briefly, one day before transfection, cells were seeded at a density of 2×10⁵ cells/well in a 12-well plate. After 24 hours, the cells were rinsed with phosphate-buffered saline (PBS). Each group was transfected with a mixture of 1 μg BstZ17I-digested pLTR-CMV-EGFP and 1 μg of pCS2-Integrase-2A-tdTomato, using Lipofectamine®-p3000 mixture in Opti-MEM for 4 hours, after which an equal volume of complete medium was added. In control experiments, cells were transfected with the EGFP transgene cassette and a plasmid that lacked sequences encoding integrase (pCS2-2ATomato-pA).
One day after transfection, the cells were subcultured and analyzed by flow cytometry to determine the number of cells that received both DNA molecules. Single cell suspensions of the samples were prepared by trypsinization, and the fluorescence intensity of each sample was evaluated on an LSR II flow cytometer (BD Biosciences, San Jose, Calif.). For each analysis, at least 10,000 events were recorded. Green (GFP) and red (tdTomato) fluorescent signals were used as indicators for successful co-transfection of transgene and integrase plasmid, respectively, and the percentages of double positive events (both red and green fluorescence) were calculated using FACSDiva software (BD Biosciences). Untransfected cells served as a negative control.
Seven days after transfection (approximately three passages), at which time only stable transfectants persist, the degree of integration was determined by fluorescence imaging using a Leica M165 FC stereomicroscope. At least four images were taken in random locations of the dish for each experimental group. Representative images are shown in
To quantify the percentage of cells with positive GFP expression, all images were analyzed and processed consistently using ImageJ by adjusting the threshold and counting the positive pixels.
Quantified results were averaged and normalized to the transfection efficiency.
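The quantification steps above (thresholding, counting positive pixels, averaging and normalizing) can be sketched in Python as follows. This is not the ImageJ procedure itself; images are represented as plain nested lists of pixel intensities, and normalization is assumed to mean dividing the mean positive fraction by the fraction of co-transfected cells, which is one plausible reading of the text. All names and values are hypothetical.

```python
def positive_fraction(image, threshold):
    """Fraction of pixels whose intensity exceeds the threshold."""
    pixels = [p for row in image for p in row]
    return sum(p > threshold for p in pixels) / len(pixels)


def normalized_integration(images, threshold, transfection_efficiency):
    """Mean positive fraction over all images, divided by the fraction
    (0-1) of cells that were co-transfected (assumed normalization)."""
    fractions = [positive_fraction(img, threshold) for img in images]
    return (sum(fractions) / len(fractions)) / transfection_efficiency


# Two tiny hypothetical 2x2 "images", for illustration only.
img1 = [[10, 200], [220, 5]]
img2 = [[0, 0], [250, 0]]
result = normalized_integration([img1, img2], threshold=100,
                                transfection_efficiency=0.5)
print(result)
```

The same threshold is applied to every image, mirroring the requirement that all images be processed consistently.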
As noted elsewhere herein, retroviral integrases require a linear double-stranded DNA molecule, containing the terminal inverted repeat sequence 5′-ACTG-3′, as a substrate for end processing and strand transfer (i.e., integration). In this example, the effect, on integration efficiency, of the location of the 5′-ACTG-3′ sequence (the IR sequence), with respect to the termini of the transgene cassette, was tested. To this end, four versions of a transgene vector containing sequences encoding the red fluorescent protein tdTomato, under the transcriptional control of the cardiac-specific cMLC2 promoter and the BGH polyadenylation site, were generated. Each had a different end structure external to the IR sequences. Cleavage of the transgene vector with ScaI generated perfect 5′-ACTG-3′ blunt-ends on the resulting transgene DNA cassette; while cleavage with BstZ17I generated a transgene cassette with one additional terminal nucleotide exterior to the IR sequence (5′-TACTG-3′) and cleavage with PmeI generated a transgene cassette with two extra nucleotides exterior to the IR sequence (5′-AAACTG-3′). Double digestion with MluI and ApaI generated ends with 4-nucleotide overhangs exterior to the IR sequence.
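The end structures described above follow from where each blunt-cutting enzyme cleaves within its recognition site. The following Python sketch derives the terminal sequence of the cassette for ScaI (AGT^ACT), BstZ17I (GTA^TAC) and PmeI (GTTT^AAAC), under the assumption that each recognition site is placed so that it overlaps the 5′-ACTG-3′ IR sequence as far as possible. The function names are hypothetical, and the overlap rule is an inference from the end sequences stated in the text rather than a description of the actual vector sequences.

```python
# Recognition sites of three blunt cutters; "^" marks the cut position.
BLUNT_CUTTERS = {
    "ScaI": "AGT^ACT",
    "BstZ17I": "GTA^TAC",
    "PmeI": "GTTT^AAAC",
}
IR = "ACTG"  # terminal inverted repeat required by the integrase


def junction(site):
    """Append the IR to the site, sharing the longest suffix/prefix overlap."""
    for k in range(len(IR), 0, -1):
        if site.endswith(IR[:k]):
            return site + IR[k:]
    return site + IR


def cassette_end(enzyme):
    """5' terminal sequence of the cassette after blunt cleavage."""
    left, right = BLUNT_CUTTERS[enzyme].split("^")
    return junction(left + right)[len(left):]


for name in BLUNT_CUTTERS:
    print(name, cassette_end(name))
```

Under this assumption, ScaI leaves the IR exactly at the terminus, while BstZ17I and PmeI leave one and two extra nucleotides, respectively, exterior to the IR.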
One-cell embryos were injected with 12.5 ng/μl of integrase-encoding mRNA and 12.5 ng/μl of each of four different tdTomato-encoding transgene cassettes. Fish developing from injected embryos were analyzed for red fluorescence at 6 days post-fertilization (dpf) and categorized into three groups: Group 0 (no fluorescence); Group 1 (partial fluorescence in heart) and Group 2 (full fluorescence in heart). The percentage of embryos in Groups 1 and 2 (i.e., percentage of embryos in which transgene was stably integrated) is shown in
In additional experiments, the contribution of the LTR sequences that are present in the transgene cassette was investigated. The following results were obtained:
(a) transgenes whose expression was directed by an endothelium-specific enhancer, flanked on both ends with a 21-nucleotide U3 sequence that included a 5′-ACTG-3′ blunt-ended sequence (i.e., no dLTR sequences), integrated efficiently in the presence of integrase; however, integration was non-specific;
(b) transgenes with a single downstream 3′-dLTR (i.e., no upstream 5′ dLTR) integrated with higher efficiency than transgenes flanked by both a 5′-dLTR and a 3′-dLTR;
(c) transgenes with a single upstream 5′-dLTR (i.e., no downstream 3′ dLTR) integrated with lower efficiency than transgenes flanked by both a 5′-dLTR and a 3′-dLTR.
Statistical Analysis
All assays were carried out in triplicate or more. Data were expressed as a mean or stacked mean with standard deviation (SD). The Student's t-test was used to compare the means between groups to determine statistical significance, with a p value <0.05 considered statistically significant.
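For orientation, the summary statistics and test statistic named above can be computed as in the following Python sketch, which uses only the standard library. It calculates the mean, the sample standard deviation and a two-sample Student's t statistic with pooled variance; whether the original analysis used a paired or unpaired test, or equal variances, is not stated, so the unpaired equal-variance form is an assumption. The replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical triplicate measurements, for illustration only.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_stat, dof = student_t(control, treated)
print(mean(control), stdev(control), mean(treated), stdev(treated), t_stat, dof)
```

The p value would then be obtained by comparing the t statistic against the t distribution with the returned degrees of freedom, for example with a statistics package.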
A vector encoding a fusion between LEDGF (psip1a) and dCas9 was constructed as follows. Sequences encoding zebrafish psip1a (zpsip1a) cDNA were cloned from zebrafish DNA and inserted by gateway cloning into the pME entry vector. Cas9 sequences were obtained as a KpnI/NheI fragment produced by double digestion of the dCas9 plasmid #100091 (Addgene, Watertown, Mass.). The psip1a sequence, the cas9 sequence, linearized pCS expression vector (Miyoshi et al. (1998) J. Virol. 72:8150-8157), a nuclear localization sequence (NLS) and sequences encoding (GGS)5 (SEQ ID NO:17) linkers were joined by Gibson assembly (Gibson et al. (2009) Nature Methods 6:343-345) to generate two fusions: one in which dCas9 sequences are upstream of psip1a sequences; the other in which dCas9 sequences are downstream of psip1a sequences. Schematically, the two fusions have the following structures:
The nucleotide sequence of the pCS-NLS-dCas9-(GGS)5-zpsip1a vector is:
Vector backbone sequences are represented by uppercase letters. Underlined segments of the sequence are as follows:
35-51: SP6 promoter
64-78: β-globin translational leader sequence
94-114: nuclear localization sequence from SV40 large T-antigen
139-4242: dCas9
4243-4287: (GGS)5 linker (SEQ ID NO:17) (not underlined)
4294-5535: zebrafish psip1a
5573-5768: SV40 polyadenylation signal
7020-7880: AmpR gene.
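The coordinates listed above can be checked for internal consistency. The following Python sketch computes the length of each annotated segment, assuming the coordinates are 1-based and inclusive, and confirms that the linker segment has the length expected for five GGS repeats. The variable and function names are hypothetical.

```python
# (name, start, end) as listed for pCS-NLS-dCas9-(GGS)5-zpsip1a.
FEATURES = [
    ("SP6 promoter", 35, 51),
    ("beta-globin translational leader", 64, 78),
    ("SV40 NLS", 94, 114),
    ("dCas9", 139, 4242),
    ("(GGS)5 linker", 4243, 4287),
    ("zebrafish psip1a", 4294, 5535),
    ("SV40 polyadenylation signal", 5573, 5768),
    ("AmpR gene", 7020, 7880),
]


def feature_lengths(features):
    """Length in nucleotides, assuming 1-based inclusive coordinates."""
    return {name: end - start + 1 for name, start, end in features}


lengths = feature_lengths(FEATURES)
for name, n in lengths.items():
    print(f"{name}: {n} nt")

# Five GGS repeats are 15 residues, i.e. 45 nucleotides of coding sequence.
print(lengths["(GGS)5 linker"] == 3 * len("GGS" * 5))
```

A check of this kind can catch off-by-one errors in an annotation, since coding segments are expected to have lengths divisible by three.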
A map of this vector is shown in
The nucleotide sequence of the pCS-zpsip1a-(GGS)5-dCas9-NLS vector is:
Vector backbone sequences are represented by uppercase letters. Underlined segments of the sequence are as follows:
35-51: SP6 promoter
64-78: β-globin translational leader sequence
91-1329: zebrafish psip1a
1330-1374: (GGS)5 linker (SEQ ID NO:17) (not underlined)
1381-5484: dCas9
5539-5559: nuclear localization sequence from SV40 large T-antigen
5609-5804: SV40 polyadenylation signal
7056-7916: AmpR gene.
A map of this vector is shown in
Additional vectors are constructed with different linker sequences between the Cas9-encoding and psip1a-encoding sequences. In these constructs, the (GGS)5 linker (SEQ ID NO:17) is replaced by the more rigid (EAAAK)n linker (in which n=1-4) (SEQ ID NO:18) and the flexible (GGGGS)n linker (in which n=1-4) (SEQ ID NO:19).
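The linker variants named above can be enumerated directly from their repeat units. The following Python sketch builds the amino acid sequence of each (EAAAK)n and (GGGGS)n linker for n=1-4; the dictionary layout and names are illustrative assumptions.

```python
def repeat_linker(unit, n):
    """Amino acid sequence of a linker made of n copies of a repeat unit."""
    return unit * n


linkers = {
    f"({unit}){n}": repeat_linker(unit, n)
    for unit in ("EAAAK", "GGGGS")
    for n in range(1, 5)
}

for name, seq in linkers.items():
    print(name, seq, len(seq))
```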
This plasmid was constructed by gateway cloning using p5E-CMV, pME-tdTomato, and the two-way Gateway cloning vector pLTRB-R4R2. The nucleotide sequence of this vector is:
Underlined segments of the sequence are as follows:
A map of this vector is shown in
This example describes targeted integration of a tdTomato transgene in zebrafish. Transgenic zebrafish (pTol2-CMV:EGFP-pA) that contained an integrated EGFP gene were constructed by Tol2-mediated transgenesis of embryos as described in Example 4. One-cell embryos obtained from the resulting adult zebrafish were used as target organisms. For each experiment, approximately 200 embryos were injected with a mixture of:
in which the GGG sequence at the 3′ end is the protospacer adjacent motif (PAM) sequence.
Because the target embryos are transgenic for EGFP, they exhibit green fluorescence. However, if the tdTomato-encoding transgene cassette is integrated at the target sequence, the EGFP gene will be disrupted and the cell will exhibit red fluorescence, due to the integrated tdTomato transgene.
Injected embryos were cultured in egg water (60 μg/ml Instant Ocean® sea salt) at 28.5° C. Five hours after injection, embryos were analyzed by confocal fluorescence microscopy. The results, shown in
Transgenic zebrafish (made, e.g., by I-SceI-mediated methods, Tol2-mediated methods, or the methods disclosed herein) containing an integrated EGFP gene (or any other gene providing a fluorescent readout) are selected in which a single exogenous EGFP gene is integrated at a locus that does not contain a coding region or regulatory element. This is achieved, for example, by outcrossing transgenic fish until a strain is obtained that contains a single EGFP insertion in a non-coding, non-regulatory region (confirmed, e.g., by determining the DNA sequence of the insertion site). Such a strain is used as a test system, e.g., for optimizing the methods and compositions disclosed herein. For example, targeted integration, into the EGFP sequences of such strains, of transgene cassettes containing sequences encoding a non-green fluorescent molecule, such as tdTomato, results in loss of green fluorescence and acquisition of red fluorescence.
This example provides results of an experiment to determine the effect of additional NLS sequences, in the integrase protein, on the efficiency of integration. The pFLi1ep:EGFP-pA transgene cassette (see Example 5) was co-injected into one-cell embryos with mRNA encoding one of three different integrase proteins: wild-type HIV-1 integrase, HIV-1 integrase with a c-myc NLS attached to the N-terminus, and HIV-1 integrase with a c-myc NLS attached to the C-terminus.
Six days post-fertilization, embryos were analyzed by confocal fluorescence microscopy and sorted into Groups (0 through 4) as described in Examples 3 and 5. The results, shown in
This application is a United States National Stage Application filed under 35 U.S.C. § 371 of PCT Patent Application Serial No. PCT/US2020/070344, filed Jul. 31, 2020, which claims priority to U.S. Provisional Patent Application No. 62/881,822, filed Aug. 1, 2019, the disclosures of which are hereby incorporated by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/070344 | 7/31/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62881822 | Aug 2019 | US |