The present invention relates to polypeptides with transpositional activity, particularly engineered polypeptides with transpositional activity, nucleic acids encoding such polypeptides, vectors comprising said nucleic acids, a cell comprising said nucleic acid, methods of integrating an exogenous nucleic acid into the genome using said polypeptides, said polypeptides for use in medicine and/or several gene therapies; and a pharmaceutical composition comprising said polypeptides, nucleic acids, vectors or cells. Particularly, the present invention relates to said polypeptide, which is engineered to have a gain of specificity of integration into the genome.
Polypeptides with transposase activity are a popular tool for therapeutic genome engineering. Transposons or transposable elements include a (short) nucleic acid sequence with terminal repeat sequences upstream and downstream thereof. Active transposons encode enzymes that facilitate the excision and insertion of the nucleic acid into target DNA sequences.
The Sleeping Beauty (SB) transposon, a member of the Tc1/mariner superfamily, is one of the known tools used for therapeutic genome engineering. This exemplary tool has been investigated in the past, and in particular a hyperactive variant was developed, which results in more efficient insertion of transposons of varying size into the nucleic acid of a cell, or of DNA into the genome of a cell, thus allowing more efficient transcription/translation than with previously known transposons.
SB is a synthetic transposon that was reconstructed based on sequences of transpositionally inactive elements isolated from fish genomes. SB is the most thoroughly studied vertebrate transposon to date, and it supports a full spectrum of genetic engineering applications, including the generation of transgenic cell lines, induced pluripotent stem cell (iPSC) reprogramming, phenotype-driven insertional mutagenesis screens in the area of cancer biology, germline gene transfer in experimental animals and somatic gene therapy both ex vivo and in vivo. On the genomic scale, SB transposons exhibit a close-to-random integration profile with a slight bias towards integration into genes and their upstream regulatory sequences in cultured mammalian cell lines. On a local scale, SB preferentially inserts into TA dinucleotides (like Mos1), and shows additional target site preferences based on physical properties of the DNA, including bendability, A-philicity and a symmetrical pattern of hydrogen bonding sites in the major groove of the tDNA. However, random integration into the genome carries a certain genotoxic risk in human applications. Thus, there is a need for a transposon system in which integration is secured and insertion into the genome does not pose a risk with respect to oncogenic transformation, or at least in which that risk is reduced.
The present inventors have unexpectedly found specific transposase variants which exhibit, inter alia, a gain of specificity of integration into the genome, in particular into palindromic AT repeat target sequences. The inventors demonstrate enhanced DNA bendability of target sites enriched in non-nucleosomal DNA by the discovered variants. The variants were further found to de-target exons as well as transcriptional regulatory regions of genes in the human genome, thus enhancing safety and utility of the variants in gene therapy applications.
In a first aspect (aspect 1a), the present invention provides a polypeptide with transposase activity comprising a variant of a naturally occurring transposase containing the following secondary structural elements:
Alternatively, the invention provides, in aspect 1b, a polypeptide with transpositional activity comprising or consisting of a transposase of the Tc1/mariner superfamily, wherein an amino acid position that corresponds to amino acid position 248, 247 and/or 187 of a sleeping beauty (SB) transposase of SEQ ID NO: 1 is substituted with a different amino acid, wherein the transposase has a gain in specificity of integration into the genome.
The gain in specificity means that the number of integration events into exons upon transposition with the polypeptide is reduced by at least 25% compared to the number of integration events into exons observed when using SB according to SEQ ID NO: 1.
Optionally, the transposase is a sleeping beauty transposase or a variant thereof having at least 70% sequence identity to SEQ ID NO: 1.
The polypeptide disclosed herein may, in one embodiment, comprise a substitution in the amino acid position that corresponds to amino acid position 248 of the SB transposase, wherein, preferably, said substitution is selected from the group consisting of K248R, K248S, K248V, K248I, and K248C.
In one embodiment, the polypeptide optionally comprises a substitution in the amino acid position that corresponds to amino acid position 247 of the SB transposase, wherein, preferably, said substitution is selected from the group consisting of P247R, P247C, P247A and P247S.
The polypeptide may, in one embodiment, comprise a substitution in the amino acid position that corresponds to amino acid position 187 of the SB transposase, wherein, preferably, said substitution is selected from the group consisting of H187A, H187N, H187C, H187Q, H187G, H187I, H187L, H187M, H187S, H187V, H187W, H187K, H187R, H187E, H187P and H187T.
In one embodiment, the polypeptide may comprise one or more of the following substitutions:
Alternatively, the invention provides, in aspect 1c, a polypeptide comprising SEQ ID NO: 1 or a variant thereof with transpositional activity and at least 70% sequence identity to SEQ ID NO: 1 and wherein the SEQ ID NO: 1 or the variant thereof comprises one or more of the following substitutions:
Said polypeptide can have the secondary structural elements recited above. It can be a polypeptide of aspect 1a. Preferably, it exhibits a gain of specificity of integration into the genome compared to SB100X.
In a second aspect, the invention relates to a nucleic acid comprising a nucleic acid sequence encoding a polypeptide of the first aspect.
In a third aspect, the invention relates to a vector comprising the nucleic acid molecule of the second aspect.
In a fourth aspect, the invention relates to a cell comprising the nucleic acid of the second aspect or the vector of the third aspect.
In a fifth aspect, the invention relates to an in vitro method of integrating an exogenous nucleic acid into the genome of a cell, the method comprising the steps of:
In a sixth aspect, the invention relates to an in vivo method of integrating an exogenous nucleic acid into the genome of a cell of a subject, the method comprising the step of administering the polypeptide, the nucleic acid, or the vector of any of the aforementioned aspects of the invention and a nucleic acid comprising the exogenous nucleic acid or a vector comprising said nucleic acid to the subject.
In another aspect, the invention relates to the mentioned polypeptide, nucleic acid, vector, or cell for use in medicine, in particular, in gene therapy.
In a further aspect, the invention relates to a pharmaceutical composition comprising the mentioned polypeptide, nucleic acid, vector or cell and a pharmaceutically acceptable carrier, adjuvant or vehicle.
In the following, the content of the figures comprised in this specification is described. In this context please also refer to the detailed description of the invention above and/or below.
Before the present invention is described in detail below, it is to be understood that this invention is not limited to the particular methodology, protocols and reagents described herein as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art.
Preferably, the terms used herein are defined as described in “A multilingual glossary of biotechnological terms: (IUPAC Recommendations)”, Leuenberger, H. G. W, Nagel, B. and Kölbl, H. eds. (1995), Helvetica Chimica Acta, CH-4010 Basel, Switzerland, and as described in “Pharmaceutical Substances: Syntheses, Patents, Applications” by Axel Kleemann and Jurgen Engel, Thieme Medical Publishing, 1999; the “Merck Index: An Encyclopedia of Chemicals, Drugs, and Biologicals”, edited by Susan Budavari et al., CRC Press, 1996, and the United States Pharmacopeia-25/National Formulary-20, published by the United States Pharmacopeial Convention, Inc., Rockville Md., 2001.
The practice of the present invention will employ, unless otherwise indicated, conventional methods of chemistry, biochemistry, cell biology, immunology, and recombinant DNA techniques which are explained in the literature in the field.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. In the following passages, different aspects of the invention are defined in more detail. Each aspect so defined may be combined with any other aspect or aspects unless clearly indicated to the contrary. In particular, any feature indicated as being optional, preferred or advantageous may be combined with any other feature or features indicated as being optional, preferred or advantageous.
In the following, the elements of the present invention will be described. These elements are listed with specific embodiments; however, it should be understood that they may be combined in any manner and in any number to create additional embodiments. The variously described examples and preferred embodiments should not be construed to limit the present invention to only the explicitly described embodiments. This description should be understood to support and encompass embodiments which combine the explicitly described embodiments with any number of the disclosed and/or preferred elements. Furthermore, any permutations and combinations of all described elements in this application should be considered disclosed by the description of the present application unless the context indicates otherwise.
In the following, some definitions of terms frequently used in this specification are provided. These terms will, in each instance of their use in the remainder of the specification, have the respectively defined meaning and preferred meanings.
As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural referents, unless the content clearly dictates otherwise.
The term “transposase” as used herein refers to an enzyme that is a component of a functional nucleic acid-protein complex capable of transposition and which is mediating transposition. The term “transposase” also refers to integrases from retrotransposons or of retroviral origin. A “transposition reaction” as used herein refers to a reaction where a transposon inserts into a target nucleic acid. Primary components in a transposition reaction are a transposon and a transposase or an integrase enzyme.
The terms “naturally occurring transposase” or “wild-type transposase” are used interchangeably in the context of the present invention as those unmodified amino acid sequences of the transposases that have been isolated from a naturally occurring species. Such amino acid sequences are readily available at EBI or NCBI. Sleeping Beauty is meant to be included in the term “naturally occurring transposase”.
The term “transpositional activity” as used herein refers to the activity of a given transposase that can be assessed in a transposition reaction. Suitable experimental set-ups are described in the experimental section herein or may use the classical binary transposition assay as described in Ivics, 1997 Cell 91:501-510.
The term “transposase” refers to a continuous stretch of amino acids that has transpositional activity of a full length naturally occurring transposase, e.g., at least 1% of the activity, at least 10% of the activity, at least 20% of the activity, at least 50% of the activity or optimally 100% of the activity or more. A transposase may lack 1 to 10 N- and/or C-terminal amino acids of the full length naturally occurring transposase. Preferably, it may lack the methionine at position M1 if the transposase is comprised in a fusion protein which comprises another protein at the N-terminus of the transposase.
The term “substitution” as used in the context of the present invention refers to the exchange of one nucleotide with another nucleotide in a nucleic acid sequence, or of one amino acid with another amino acid in an amino acid sequence.
The term “consensus sequence” as used in the context of the present invention refers to a calculated order of most frequent residues, either nucleotide or amino acid, found at each position in a sequence alignment between two or more sequences. It represents the results of a multiple sequence alignment in which related sequences are compared to each other and similar sequence motifs are calculated. Conserved sequence motifs are depicted as consensus sequences, which indicate identical amino acids, i.e. amino acids identical among the compared sequences, conserved amino acids, i.e. amino acids which vary among the compared amino acid sequence but wherein all amino acids belong to a certain functional or structural group of amino acids, e.g. polar or neutral, and variable amino acids, i.e. amino acids which show no apparent relatedness among the compared sequence.
The phrase “at a corresponding position” as used in the context of the present invention refers to the amino acid that aligns in an alignment of amino acid sequences with an amino acid sequence of a naturally occurring full length reference transposase, preferably with the full-length amino acid sequence of SB according to SEQ ID NO: 1. The position of the reference transposase is determined from the first N-terminal amino acid. Accordingly, position 187 of SB within the full-length amino acid sequence according to SEQ ID NO: 1 is “H”. Thus, an amino acid position corresponding to position 187 of SB in another transposase is an amino acid that aligns with “H” of SB at position 187 of SEQ ID NO: 1 in an alignment as exemplary shown in
In the context of present invention, the “primary structure” of a protein or polypeptide is the sequence of amino acids in the polypeptide chain. The “secondary structure” of a protein is the general three-dimensional form of local segments of the protein. It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure. In proteins, the secondary structure is defined by patterns of hydrogen bonds between backbone amide and carboxyl groups. The “tertiary structure” of a protein is the three-dimensional structure of the protein determined by the atomic coordinates. The “quaternary structure” is the arrangement of multiple folded or coiled protein or polypeptide molecules in a multi-subunit complex.
The term “folding” or “protein folding” as used herein refers to the process by which a protein assumes its three-dimensional shape or conformation, i.e. whereby the protein is directed to form a specific three-dimensional shape through noncovalent and/or covalent interactions, such as but not limited to hydrogen bonding, metal coordination, hydrophobic forces, van der Waals forces, pi-pi interactions, electrostatic effects and/or intramolecular Cys bonds. The term “folded protein” thus, refers to the three-dimensional shape of parts or all of the protein, such as its secondary, tertiary, or quaternary structure.
The term “beta strand” as used in the context of the present invention refers to a 5 to 10 amino acid long section within a polypeptide chain in which the torsion angle of N-Cα-C-N in the backbone is about 120 degrees. Beta strands within a given protein sequence can be predicted after (multiple) sequence alignment (e.g. using Clustal Omega) by retrieving annotations from PDB files. Prediction of beta strands can be performed using commonly available software tools like JPred. Start and end positions of the seven beta strands can be confirmed by additional structural alignment of PDB files using PyMOL.
The terms “β sheet” or “beta sheet” as used interchangeably in the context of the present invention refer to two or more β-strands which form the β sheet. The β sheet is a common motif of regular secondary structure in proteins. Because peptide chains have a directionality conferred by their N-terminus and C-terminus, β-strands too can be said to be directional. They are usually represented in protein topology diagrams by an arrow pointing toward the C-terminus. Adjacent β-strands can form hydrogen bonds in antiparallel, parallel, or mixed arrangements. In an antiparallel arrangement, the successive β-strands alternate directions so that the N-terminus of one strand is adjacent to the C-terminus of the next. This is the arrangement that produces the strongest inter-strand stability because it allows the inter-strand hydrogen bonds between carbonyls and amines to be planar, which is their preferred orientation. A β structure is characterized by long extended polypeptide chains. The amino acid composition of β strands tends to favor hydrophobic (water fearing) amino acid residues. The side chains of these residues tend to be less soluble in water than those of more hydrophilic (water loving) residues. β structures tend to be found inside the core structure of proteins where the hydrogen bonds between strands are protected from competition with water molecules.
The terms “α-helix” or “alpha helix” as used interchangeably in the context of the present invention refer to a common motif in the secondary structure of proteins, namely a right-handed helix conformation in which every backbone N-H group hydrogen bonds to the backbone C═O group of the amino acid located four residues earlier along the protein sequence. Among types of local structure in proteins, the α-helix is the most extreme and the most predictable from sequence, as well as the most prevalent.
The terms “intervening region” or “loop” as used interchangeably in the context of the present invention refer to the amino acid residues between two β sheets or between a β sheet and a helix. Generally, the “intervening region” or “loop” is less structured or not structured at all, which provides it with flexibility.
To refer to a substitution within a given amino acid sequence, the following designation is used: “XNo.Z” means that the amino acid “X” of the original amino acid sequence at a particular position “No.” is substituted by amino acid “Z”, whereas “XNo.Z/X′No.′Z′/X″No.″Z″” is intended to mean that original amino acids “X”, “X′” and “X″” at positions “No.”, “No.′” and “No.″” may be substituted alternatively with several different amino acids that are indicated as “Z”, “Z′” and “Z″”, respectively. The amino acid position is indicated with respect to a specific length amino acid sequence, preferably a wild-type, full length sequence of a particular transposase.
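The designation described above can be handled programmatically. The following is a minimal sketch, assuming one-letter amino acid codes and 1-based positions; the names `Substitution`, `parse_substitution` and `apply_substitution` are hypothetical helpers introduced for illustration, and the three-residue sequence in the usage note is a toy example, not SEQ ID NO: 1.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str      # amino acid "X" in the original sequence
    position: int      # 1-based position "No."
    replacement: str   # substituting amino acid "Z"


# One-letter code, digits, one-letter code, e.g. "K248R" (assumed format).
_DESIGNATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(designation: str) -> Substitution:
    """Split a designation such as 'K248R' into its three parts."""
    match = _DESIGNATION.match(designation.strip())
    if match is None:
        raise ValueError(f"not a valid substitution designation: {designation!r}")
    original, position, replacement = match.groups()
    return Substitution(original, int(position), replacement)


def apply_substitution(sequence: str, designation: str) -> str:
    """Return a copy of `sequence` carrying the designated substitution."""
    sub = parse_substitution(designation)
    index = sub.position - 1  # convert the 1-based position to a 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {sub.position} is outside the sequence")
    if sequence[index] != sub.original:
        raise ValueError(
            f"expected {sub.original} at position {sub.position}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + sub.replacement + sequence[index + 1:]
```

For example, `parse_substitution("K248R")` yields the original residue K, position 248 and replacement R, and `apply_substitution("MKH", "K2R")` turns the toy sequence MKH into MRH. Checking the original residue before substituting guards against applying a designation to a sequence with a different numbering.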
The terms “vector” or “expression vector” are used interchangeably and refer to a polynucleotide or a mixture of a polynucleotide and proteins capable of being introduced or of introducing the collection of nucleic acids of the present invention or one nucleic acid that is part of the collection of nucleic acids of the invention into a cell, preferably a mammalian cell. Examples of vectors include but are not limited to plasmids, cosmids, phages, viruses or artificial chromosomes. In particular, a vector is used to transport the promoter and the collection of the nucleic acids or one nucleic acid that is part of the collection of nucleic acids of the invention into a suitable host cell. Expression vectors may contain “replicon” polynucleotide sequences that facilitate the autonomous replication of the expression vector in a host cell. Once in the host cell, the expression vector may replicate independently of or coincidental with the host chromosomal DNA, and several copies of the vector and its inserted DNA can be generated. In case that replication incompetent expression vectors are used—which is often the case for safety reasons—the vector may not replicate but merely direct expression of the nucleic acid. Depending on the type of expression vector the expression vector may be lost from the cell, i.e. only transiently expresses the neo-antigens encoded by the nucleic acid or may be stable in the cell. Expression vectors typically contain expression cassettes, i.e. the necessary elements that permit transcription of the nucleic acid into an mRNA molecule.
The term “viral vector” is used in the context of the present invention to refer to a single or double stranded nucleic acid sequence that can assemble into an infectious viral particle. This nucleic acid sequence may be a full or partial viral genome. In the latter case the viral genome preferably comprises one or more heterologous genes. For some viral particles only very short sequences of the viral genome are required to allow assembly of an infectious viral particle. For example, for assembly of an infectious adeno-associated viral particle, only a short (about 200 bp long) repeat sequence placed at the 5′ and 3′ ends of a heterologous nucleic acid of a given length (typically between 4.5 and 5.3 kb for an adeno-associated virus) will allow assembly of infectious adeno-associated viral particles. The minimal nucleic acid sequences for assembly of a given virus are well known. The larger the viral genome and the smaller the minimal viral sequence for assembly of an infectious viral particle, the bigger the heterologous gene(s) that can be inserted into the viral vector.
“Sequence similarity” indicates the percentage of amino acids that either are identical or that represent conservative amino acid substitutions. The term “sequence identity” between two amino acid sequences indicates the percentage of amino acids that are identical between two given sequences. The alignment for determining sequence similarity, preferably sequence identity, can be done with art known tools, preferably using the best sequence alignment, for example, using CLC main Workbench (CLC bio) or Align, using standard settings, preferably EMBOSS: needle, Matrix: Blosum62, Gap Open 10.0, Gap Extend 0.5. The percentage of identity is determined with reference to the full-length sequence that is used for comparison and not just for the sequence or sequence stretch with the highest similarity. Thus, an amino acid sequence that shares 100% sequence identity with 50 consecutive amino acids of the 100 amino acid long sequence that is used for comparison only has 50% sequence identity (on the assumption that there are no further amino acids that share any identity outside the 50 consecutive amino acids).
The term “pharmaceutically acceptable”, as used herein, refers to the non-toxicity of a material, which, preferably, does not interact with the action of the active agent of the pharmaceutical composition. In particular, “pharmaceutically acceptable” means approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopoeia, European Pharmacopoeia or other generally recognized pharmacopeia for use in animals, and more particularly in humans.
The term “carrier” refers to an organic or inorganic component, of a natural or synthetic nature, with which the active component is combined in order to facilitate, enhance or enable application. According to the invention, the term “carrier” also includes one or more compatible solid or liquid fillers, diluents, excipients or encapsulating substances, which are suitable for administration to a subject. Possible carrier substances (e.g., diluents) are, for example, sterile water, Ringer's solution, Lactated Ringer's solution, physiological saline, bacteriostatic saline (e.g., saline containing 0.9% benzyl alcohol), phosphate-buffered saline (PBS), Hank's solution, fixed oils, polyalkylene glycols, hydrogenated naphthalenes and biocompatible lactide polymers, lactide/glycolide copolymers or polyoxyethylene/polyoxypropylene copolymers. In one embodiment, the carrier is PBS. The resulting solutions or suspensions are preferably isotonic to the blood of the recipient. Suitable carriers and their formulations are described in greater detail in Remington's Pharmaceutical Sciences, 17th ed., 1985, Mack Publishing Co.
The term “cell” is used in the context of the present invention to refer to a eukaryotic or a prokaryotic cell, such as a bacterial cell, a yeast cell, or a cell of a mammal, preferably a cell of a human, a mouse, a rat, a rabbit, a dog, a monkey, or a cat.
In the following, different aspects of the invention are defined in more detail. Each aspect so defined may be combined with any other aspect or aspects unless clearly indicated to the contrary. In particular, any feature indicated as being preferred or advantageous may be combined with any other feature or features indicated as being preferred or advantageous.
In the work leading to the present invention, it was surprisingly shown that variants of a polypeptide with transposase activity have a remarkable gain of specificity of integration into the genome.
The invention provides a polypeptide with transpositional activity comprising or consisting of a transposase of the Tc1/mariner superfamily, wherein an amino acid position that corresponds to amino acid position 248, 247 and/or 187 of a sleeping beauty (SB) transposase of SEQ ID NO: 1 is substituted with a different amino acid, wherein the transposase has a gain in specificity of integration into the genome. The polypeptide may be a sleeping beauty transposase or a variant thereof having at least 70% sequence identity to SEQ ID NO: 1.
“Gain in specificity” preferably means that the number of integration events into exons upon transposition with the polypeptide is reduced by at least 10%, preferably, at least 25% compared to the number of integration events into exons observed when using SB according to SEQ ID NO: 1.
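The threshold above can be checked with a simple calculation once integration sites have been mapped and annotated. The following is only a sketch: the text compares numbers of integration events into exons, and normalising each count by the total number of mapped integration events is an assumption made here so that data sets of different size can be compared. The function names are hypothetical, and the counts in the usage note are invented toy values, not experimental data.

```python
def exon_integration_reduction(
    variant_exon: int,
    variant_total: int,
    reference_exon: int,
    reference_total: int,
) -> float:
    """Percent reduction of the exon integration fraction versus the reference.

    Assumption: counts are normalised by the total number of mapped
    integration events in each data set before they are compared.
    """
    if variant_total <= 0 or reference_total <= 0:
        raise ValueError("total integration counts must be positive")
    if reference_exon <= 0:
        raise ValueError("reference must have at least one exon integration")
    variant_fraction = variant_exon / variant_total
    reference_fraction = reference_exon / reference_total
    return 100.0 * (1.0 - variant_fraction / reference_fraction)


def has_gain_in_specificity(
    variant_exon: int,
    variant_total: int,
    reference_exon: int,
    reference_total: int,
    threshold_percent: float = 25.0,  # the text recites at least 10%, preferably at least 25%
) -> bool:
    """True if exon integration is reduced by at least `threshold_percent`."""
    reduction = exon_integration_reduction(
        variant_exon, variant_total, reference_exon, reference_total
    )
    return reduction >= threshold_percent
```

With toy counts of 30 exon hits in 1000 integrations for a variant and 50 exon hits in 1000 integrations for the reference, the reduction is about 40%, so `has_gain_in_specificity(30, 1000, 50, 1000)` evaluates to true at the 25% threshold.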
The polypeptide may, in one embodiment, comprise a substitution in the amino acid position that corresponds to amino acid position 248 of the SB transposase, wherein, preferably, said substitution is selected from the group consisting of K248R, K248S, K248V, K248I, and K248C.
The polypeptide may, in one embodiment, comprise a substitution in the amino acid position that corresponds to amino acid position 247 of the SB transposase, wherein, preferably, said substitution is selected from the group consisting of P247R, P247C, P247A and P247S.
The polypeptide may, in one embodiment, comprise a substitution in the amino acid position that corresponds to amino acid position 187 of the SB transposase, wherein, preferably, said substitution is selected from the group consisting of H187A, H187N, H187C, H187Q, H187G, H187I, H187L, H187M, H187S, H187V, H187W, H187K, H187R, H187E, H187P and H187T.
The polypeptide may also comprise a combination of substitutions in these positions, preferably, of the recited substitutions. Other mutations, e.g., mutations that increase transpositional activity (for example, as recited below), may also be comprised.
The polypeptide of the invention may comprise one or more of the following substitutions:
In one aspect, the present invention also provides a polypeptide with transpositional activity comprising, consisting essentially of, or consisting of a variant of a naturally occurring transposase with the following secondary structural elements:
The present invention is based on the surprising discovery that substituting certain amino acids in transposases leads to transposases which gain specificity of integration into the genome, in particular into palindromic AT repeat target sequences, which enhance the DNA bendability of target sites enriched in non-nucleosomal DNA, and which de-target exons as well as transcriptional regulatory regions of genes in the human genome. All these properties contribute to enhancing the safety and utility of the polypeptides of the invention in gene therapy applications. These properties are very favourable for a transposase because it is less likely to inactivate or mutate an expressed gene. The improved integration pattern can be assessed by the experiments described in Examples 3 and 7 of the present application and presented in
Preferably the variant of the invention retains transpositional activity of the full length wild-type transposase, i.e. has at least 1%, at least 10%, at least 20%, at least 30%, at least 50%, at least 75%, optionally 100% or more of the transpositional activity of the full length wild type transposase on which the variant is based, preferably 75%, optionally 100% of the transpositional activity of SB according to SEQ ID NO: 1.
Thus, α1 helix-α2 helix-β1 sheet-β2 sheet-β3 sheet-β4 sheet-β5 sheet-α3 helix-β6 sheet-η1-α4 helix-η2-α5 helix-α6 helix-α7 helix-α8 helix of other transposases will begin and end at amino acid positions corresponding to the above indicated amino acid positions within SB. This allows the skilled person to determine the secondary structural elements: α1 helix-α2 helix-β1 sheet-β2 sheet-β3 sheet-β4 sheet-β5 sheet-α3 helix-β6 sheet-η1-α4 helix-η2-α5 helix-α6 helix-α7 helix-α8 helix within any given transposase and to correspondingly determine the loop connecting the β3 sheet and the β4 sheet and the amino acid in η1 connecting the β6 sheet with α4 helix within any given transposase.
Furthermore, the skilled person can analyse the three-dimensional structure and amino acid sequence in relation to that structure in other transposases and predict the alignment of amino acids based on the three-dimensional structures. Moreover, computer programs can assist in predicting the secondary structure of a protein or polypeptide of interest, for example of any other transposase of interest. One method is based on homology modelling. Two polypeptides or proteins which have a sequence identity of greater than 30%, or a similarity of greater than 40%, often have similar structural topologies. The growth of the protein structural database (PDB) has provided enhanced predictability of secondary structure, including the potential number of folds within a polypeptide's or protein's structure. Further methods of predicting secondary structure of proteins and polypeptides are known in the art and include for example threading, profile analysis, and evolutionary linkage. Thus, a skilled person can easily add further transposases of this superfamily to the alignment, allowing the identification of amino acids corresponding to amino acid positions 187, 247 and 248 of SB in these transposases, and/or to the amino acids in the loop connecting the β3 sheet and the β4 sheet of any given transposase, or the amino acid in η1 connecting the β6 sheet with α4 helix of any given transposase.
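Identifying a corresponding position from an alignment can be sketched as follows. The sketch assumes that a pairwise alignment between the reference transposase and another transposase is already available as two gapped strings of equal length with `-` as the gap character; `corresponding_position` is a hypothetical helper, and the aligned strings in the usage note are toy sequences rather than real transposase sequences.

```python
from typing import Optional


def corresponding_position(
    aligned_reference: str,
    aligned_other: str,
    reference_position: int,
) -> Optional[int]:
    """Map a 1-based reference position to the aligned position in another sequence.

    Returns the 1-based position in the other sequence, or None if the
    reference residue is aligned with a gap.
    """
    if len(aligned_reference) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")
    if reference_position < 1:
        raise ValueError("positions are counted from 1")
    reference_count = 0  # residues of the reference seen so far
    other_count = 0      # residues of the other sequence seen so far
    for reference_residue, other_residue in zip(aligned_reference, aligned_other):
        if other_residue != "-":
            other_count += 1
        if reference_residue != "-":
            reference_count += 1
            if reference_count == reference_position:
                return other_count if other_residue != "-" else None
    raise ValueError("reference position lies beyond the end of the reference")
```

With the toy alignment `MK-HLA` (reference) and `MKQH-A` (other sequence), reference position 3 (H) maps to position 4 in the other sequence because of the inserted Q, while reference position 4 (L) maps to nothing because it faces a gap. Applied to a real alignment against SEQ ID NO: 1, the same walk would locate the residues that align with positions 187, 247 and 248.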
The loop connecting the β3 sheet and the β4 sheet of a given transposase may comprise in the naturally occurring sequence an H, F, L, S or Y. In this case it is preferred to substitute these amino acids with another amino acid that is likely to alter the protein-DNA interaction of the transposase.
If the naturally occurring amino acid in the loop connecting the β3 sheet and the β4 sheet is H it is preferred that this amino acid is substituted with V, A, N, C, Q, G, I, L, M, F, P, R, S, T, W, K, Y, or E, preferably with V, I, L, M, P, S, or T, more preferably with V, I, L, T or P, most preferably with V, P or T.
If the naturally occurring amino acid in the loop connecting the β3 sheet and the β4 sheet is F it is preferred that this amino acid is substituted with V, A, N, C, Q, G, I, L, M, P, R, S, T, W, K or Y, preferably with V, I, L, M, P, S, or T, more preferably with V, I, L, T or P, most preferably with V, P or T.
If the naturally occurring amino acid in the loop connecting the β3 sheet and the β4 sheet is Y it is preferred that this amino acid is substituted with V, A, N, C, Q, G, I, L, M, F, P, R, S, T, W, or K, preferably with V, I, L, M, P, S, or T, more preferably with V, I, L, T or P, most preferably with V, P or T.
If the naturally occurring amino acid in the loop connecting the β3 sheet and the β4 sheet is L it is preferred that this amino acid is substituted with V, A, N, C, Q, G, I, M, F, P, R, S, T, W, H, Y or K, preferably with V, P or T.
If the naturally occurring amino acid in the loop connecting the β3 sheet and the β4 sheet is S it is preferred that this amino acid is substituted with V, A, N, C, Q, G, I, L, M, F, P, R, T, W, H, Y or K, preferably with V, P or T.
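The substitution preferences listed above for the loop connecting the β3 sheet and the β4 sheet can be collected in a lookup table. The following is a minimal sketch: the residue lists are transcribed from the preceding paragraphs, while the table name, the function name and the numeric tier levels (0 for the broadest list, increasing towards the most preferred list) are illustrative assumptions.

```python
# Tiers per naturally occurring loop residue, broadest list first,
# most preferred list last (transcribed from the text above).
LOOP_SUBSTITUTIONS = {
    "H": ["VANCQGILMFPRSTWKYE", "VILMPST", "VILTP", "VPT"],
    "F": ["VANCQGILMPRSTWKY", "VILMPST", "VILTP", "VPT"],
    "Y": ["VANCQGILMFPRSTWK", "VILMPST", "VILTP", "VPT"],
    "L": ["VANCQGIMFPRSTWHYK", "VPT"],
    "S": ["VANCQGILMFPRTWHYK", "VPT"],
}


def preference_tier(native: str, substitute: str):
    """Return the index of the narrowest tier containing the substitute.

    Returns None if the native residue is not covered by the table or the
    substitute is not listed for it.
    """
    tiers = LOOP_SUBSTITUTIONS.get(native)
    if tiers is None:
        return None
    best = None
    for level, allowed in enumerate(tiers):
        if substitute in allowed:
            best = level  # tiers are nested, so the last hit is the narrowest
    return best


print(preference_tier("H", "V"))  # listed in the most preferred tier for H
print(preference_tier("H", "E"))  # listed only in the broadest tier for H
```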
The η1 connecting the β6 sheet with the α4 helix of a given transposase may comprise in the naturally occurring sequence K, I, S, T, V, P or C (position 248 in SB) or Q, P, S, or T (position 247 in SB). In this case it is preferred to substitute these amino acids with another amino acid that is likely to alter the protein-DNA interaction of the transposase.
If the naturally occurring amino acid in η1 connecting the β6 sheet with α4 helix is at least one naturally occurring Q, P, S, or T it is preferred that this amino acid is substituted with R, C, S or A, preferably with R or S.
If the naturally occurring amino acid in η1 connecting the β6 sheet with α4 helix is at least one naturally occurring K, I, S, T, V, P or C it is preferred that this amino acid is substituted with R, V, I, C, A, P, or Q, preferably with I, C, or R, most preferably with R.
In a further preferred embodiment both a naturally occurring Q, P, S, or T and a naturally occurring K, I, S, T, V, P or C in η1 connecting the β6 sheet with the α4 helix are substituted as outlined above; preferably both are substituted with R.
In a further embodiment or in another aspect of the present invention the substitution may also occur in a variant of a naturally occurring transposase. The term “variant of a naturally occurring transposase” refers to an amino acid sequence that has at least 70% amino acid sequence identity to the amino acid sequence of a naturally occurring transposase, preferably to one of the transposases according to SEQ ID NO: 1 to 17. More preferably a variant of a transposase that is further modified by one or more substitutions as outlined above has at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to the amino acid sequence of a naturally occurring transposase, preferably to one of the transposases according to SEQ ID NO: 1 to 17, most preferably to the SB transposase according to SEQ ID NO: 1. If the substitutions of the invention are in the context of a variant of a naturally occurring transposase, the degree of identity is determined without the substitution of the invention. Thus, a variant of a naturally occurring transposase may have 70% amino acid sequence identity to SEQ ID NO: 1 and additionally a substitution according to (i), (ii) and/or (iii) as outlined above.
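The rule that the degree of identity is determined without the substitutions of the invention can be sketched as follows. This is a simplified illustration that assumes two ungapped sequences of equal length; the function name and the toy sequences are hypothetical, and the excluded positions stand in for positions such as 187, 247 and 248 of SB.

```python
def identity_excluding_positions(reference: str, variant: str, excluded: set) -> float:
    """Percent identity of two equal-length sequences.

    The 1-based positions in `excluded` are counted as matches, i.e. they are
    treated as if the variant still carried the reference residue there.
    """
    if len(reference) != len(variant):
        raise ValueError("this sketch assumes ungapped sequences of equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = 0
    for pos, (r, v) in enumerate(zip(reference, variant), start=1):
        if pos in excluded or r == v:
            matches += 1
    return 100.0 * matches / len(reference)


# Hypothetical 10-residue toy sequences differing at positions 3 and 7.
REFERENCE = "MGKSKEISQD"
VARIANT = "MGRSKEVSQD"
print(identity_excluding_positions(REFERENCE, VARIANT, set()))  # both differences count
print(identity_excluding_positions(REFERENCE, VARIANT, {7}))    # position 7 is disregarded
```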
It is further preferred that the number of integration events into exons is reduced in the polypeptide of the present invention based on the amino acid sequence of SEQ ID NO: 1 to 17, e.g., by at least 25%, more preferably at least 40% or more preferably at least 50%, if compared to the number of integration events into exons observed when using a transposase according to SEQ ID NO: 1 to 17, respectively, if the polypeptide of the invention comprises or consists of a variant of a transposase according to SEQ ID NO: 1 to 17.
In a preferred embodiment the polypeptide of the present invention comprises, consists or essentially consists of the full-length naturally occurring transposase, preferably comprises, consists or essentially consists of the full-length naturally occurring transposase according to SEQ ID NO: 1 to 17 or a variant thereof. In this preferred embodiment the reference amino acid sequence for the determination of the variant is the full-length sequence. Thus, in this preferred embodiment of the polypeptide of the present invention, the variant comprises, consists or essentially consists of an amino acid sequence that has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to the amino acid sequence of a naturally occurring transposase, preferably to a transposase according to SEQ ID NO: 1 to 17, most preferably to the transposase SB according to SEQ ID NO: 1. Again, the amino acid sequence identity of the variant of a naturally occurring transposase is determined in the absence of the substitutions of the invention according to (i), (ii) and/or (iii).
In each of the above cases a variant of the invention retains transpositional activity of the naturally occurring or wild-type transposase. The amino acid changes may have an effect on the producibility and/or stability of the resulting transposases or may affect another activity, e.g. the transpositional activity. Substitutions that increase the transpositional activity of transposases, in particular of SB, are described further below and have previously been described in WO 2009/003671. The presence of such substitutions in addition to the substitutions according to (i), (ii) and/or (iii) described above is particularly preferred.
In one embodiment of the first aspect of the present invention, the naturally occurring transposase is selected from the group consisting of Sleeping Beauty (SB) (SEQ ID NO: 1), Tdr1 (SEQ ID NO: 2), ZB (SEQ ID NO: 3), FP (SEQ ID NO: 4), Passport (SEQ ID NO: 5), TCB2 (SEQ ID NO: 6), S (SEQ ID NO: 7), Quetzal (SEQ ID NO: 8), Paris (SEQ ID NO: 9), Tc1 (SEQ ID NO: 10), Minos (SEQ ID NO: 11), Uhu (SEQ ID NO: 12), Bari (SEQ ID NO: 13), Tc3 (SEQ ID NO: 14), Impala (SEQ ID NO: 15), Himar (SEQ ID NO: 16), or Mos1 (SEQ ID NO: 17), preferably the transposase is SB (SEQ ID NO: 1), ZB (SEQ ID NO: 3), FP (SEQ ID NO: 4), Passport (SEQ ID NO: 5), or Minos (SEQ ID NO: 11). In the most preferred embodiment, the transposase is SB (SEQ ID NO: 1).
In a particular preferred embodiment or in a further aspect of the invention, the polypeptide of the present invention comprises, consists or essentially consists of the transposase of SEQ ID NO: 1 to 17 or a variant thereof comprising at least one of the substitutions according to (i), (ii) or (iii) outlined above.
The transposases that are closely related to SB not only share secondary structural elements that are arranged in the above indicated order but also share significant amino acid similarity or even identity in certain key regions involved in DNA-protein interaction. Thus, the substitutions of the invention in this subgroup of transposases that are closely related to SB can be characterized by consensus amino acid sequences that surround the amino acid at position 187 of SB. This consensus sequence is X1X2X3X12X4X5GX6 (SEQ ID NO: 18) with the meaning for the variables indicated in the following. The other consensus amino acid sequence, which surrounds the amino acids at positions 247 and 248 of SB, is X7X8DX9X10X13X11X14H (SEQ ID NO: 19), wherein the variables have the meanings indicated in the following. A given transposase that is closely related to SB may comprise an amino acid sequence that fulfills the consensus sequence according to SEQ ID NO: 18 or SEQ ID NO: 19 or both. It is preferred that the transposase that is closely related to SB comprises amino acid sequences fulfilling both consensus sequences. Such transposases are highly similar to SB both in a region surrounding amino acid position 187 of SB and in a region surrounding amino acid positions 247 and 248. Thus, the substitutions (i), (ii), and/or (iii) of the present invention are preferably comprised in a transposase comprising two consecutive amino acid sequences that fulfill the consensus according to SEQ ID NO: 18 and SEQ ID NO: 19, respectively. Accordingly, in one embodiment, the polypeptide of the first aspect of the present invention or alternatively in another aspect the present invention relates to a polypeptide with transpositional activity comprising, consisting essentially of, or consisting of a variant of a naturally occurring transposase, which comprises
In a preferred embodiment,
If there is only one substitution in a polypeptide of the invention, it is preferably not H187F, H187Y or K248A.
If any of these substitutions are present in a variant of a transposase the sequence identity of the variant to the respective wild-type transposase sequence is again determined prior to the introduction of a substitution according to (i), (ii), and/or (iii).
In one embodiment, the polypeptide according to the invention, comprises SEQ ID NO: 1 or variants thereof with transpositional activity and at least 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1, and wherein the SEQ ID NO: 1 or the variant thereof comprises one or more of the following substitutions:
In a preferred embodiment, the polypeptide according to the invention comprises SEQ ID NO: 1 or variants thereof with transpositional activity and at least 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1, wherein the SEQ ID NO: 1 or the variant thereof comprises a substitution that increases its transpositional activity, wherein said substitution preferably is one or more of the substitutions described above. An increase in transpositional activity may be an increase compared to the activity of wtSB, or SB of SEQ ID NO: 1. As noted above, various variants of naturally occurring transposases have been described. For example, WO 2009/003671 describes variants of transposases, in particular of SB, that are hyperactive, i.e. that have an increased transpositional activity in comparison to the wild-type transposase, in particular SB. It is preferred that the substitutions of the invention are introduced into variants of the transposase that comprise substitutions or groups of substitutions that enhance one or more properties, in particular the transpositional activity, of the transposase. Thus, in one embodiment, the polypeptide according to the invention further comprises at least one of the following substitutions or groups of substitutions:
(1) K14R, K13D, K13A, K30R, K33A, T83A, I100L, R115H, R143L, R147E, A205K/H207V/K208R/D210E, H207V/K208R/D210E, R214D/K215A/E216V/N217Q; M243Q, E267D, T314N, and/or G317E;
The polypeptide of the first aspect of the invention most preferably comprises or consists of an amino acid sequence that is based on SEQ ID NO: 1 and wherein the following amino acids are substituted within the amino acid sequence of SEQ ID NO: 1:
In a preferred embodiment, the polypeptide of the first aspect comprises or consists of an amino acid sequence of SEQ ID NO: 1 wherein the following amino acids have been substituted, which increase the transpositional activity: K14R, K33A, R115H, R214D/K215A/E216V/N217Q, M243H, and T314N, and further comprising the following substitutions of the invention: H187I, H187V, H187E, H187T, H187P or H187L, preferably H187V; and/or P247R, P247S, P247C, or P247A, preferably P247R; and/or K248R, K248A, K248S, K248V, K248I or K248C, preferably K248R. Particularly preferred embodiments of the polypeptide of the first aspect of the invention have an amino acid sequence comprising or consisting of SEQ ID NO: 1 wherein the following amino acids have been substituted:
Of course, further substitutions may also be comprised, but preferably, the sequence identity to SEQ ID NO: 1 is at least 70%, at least 80%, at least 90% or at least 95%.
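The substitution notation used throughout this description (for example K14R, or grouped substitutions such as R214D/K215A/E216V/N217Q) can be parsed and applied programmatically. The following is a minimal sketch; the function names are hypothetical, and the five-residue toy sequence is an assumption for illustration only and is not SEQ ID NO: 1. The wild-type residue is checked before each substitution so that a mismatch between notation and sequence is detected.

```python
import re

# One substitution: wild-type residue, 1-based position, new residue (e.g. "K14R").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitutions(spec: str):
    """Split a specification such as 'K14R, R214D/K215A' into tuples."""
    result = []
    for token in re.split(r"[,/;\s]+", spec.strip()):
        if not token:
            continue
        match = _SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"unrecognised substitution: {token!r}")
        result.append((match.group(1), int(match.group(2)), match.group(3)))
    return result


def apply_substitutions(sequence: str, spec: str) -> str:
    """Apply substitutions to a sequence, verifying each wild-type residue."""
    residues = list(sequence)
    for wild_type, position, new in parse_substitutions(spec):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


print(parse_substitutions("K14R, R214D/K215A/E216V/N217Q"))
print(apply_substitutions("MKHPK", "H3V, P4R/K5R"))  # hypothetical toy sequence
```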
Particularly preferred transposases of the present invention that combine an increased transpositional activity with the properties of the substitutions of the present invention are the following (amino acid substitutions increasing the transpositional activity are highlighted by bold+underline and the amino acid substitutions of the invention are highlighted by bold, underline and italic):
The inventive polypeptides (transposase variants) have several advantages compared to approaches in the prior art, the most prominent being a remarkable gain of specificity of integration into the genome.
In a second aspect, the invention relates to a nucleic acid comprising a nucleic acid sequence encoding a polypeptide of the first aspect.
In one embodiment of the second aspect of the invention, the (encoding) nucleic acid sequence is operably linked to at least one transcription control unit.
In one embodiment of the nucleic acid according to the invention the nucleic acid additionally comprises at least an open reading frame. In another embodiment, the nucleic acid additionally comprises at least a regulatory region of a gene. Preferably, the regulatory region is a transcriptional regulatory region, and more specifically the regulatory region is selected from the group consisting of a promoter, an enhancer, a silencer, a locus control region, and a border element. Nucleic acids according to the present invention typically comprise ribonucleic acids, including mRNA, or deoxyribonucleic acids, including cDNA, chromosomal DNA, extrachromosomal DNA and plasmid DNA, or viral DNA or RNA, including also a recombinant viral vector. In one embodiment of the nucleic acid according to the invention the nucleic acid is DNA or RNA and in another preferred embodiment the nucleic acid is part of a plasmid or a recombinant viral vector. An inventive nucleic acid is preferably selected from any nucleic acid sequence encoding the amino acid sequence of the inventive polypeptide. Therefore, all nucleic acid variants coding for the above mentioned inventive mutated polypeptide variants are encompassed, including nucleic acid variants with varying nucleotide sequences due to the degeneracy of the genetic code. In particular, nucleotide sequences of nucleic acid variants which lead to an improved expression of the encoded polypeptide in a selected host organism are preferred. Tables for appropriately adjusting a nucleic acid sequence to the host cell's specific transcription/translation machinery are known to a skilled person. In general, it is preferred to adapt the G/C content of the nucleotide sequence to the specific host cell conditions. For expression in human cells an increase of the G/C content by at least 10%, more preferably at least 20%, 30%, 50%, 70% and even more preferably 90% of the maximum G/C content (coding for the respective inventive peptide variant) is preferred.
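The relationship between the degeneracy of the genetic code and the G/C content of a coding sequence can be illustrated with a short sketch. It uses the standard genetic code and, for each codon, selects a synonymous codon with the highest G/C count, which yields the maximum G/C content attainable without changing the encoded amino acid sequence. The function names and the nine-nucleotide example are illustrative assumptions; the sketch does not model host-specific codon usage, which in practice would also be taken into account.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def _check(dna: str) -> None:
    if not dna or len(dna) % 3 != 0:
        raise ValueError("expected a non-empty coding sequence of whole codons")


def gc_fraction(dna: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    _check(dna)
    return sum(base in "GC" for base in dna) / len(dna)


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    _check(dna)
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def max_gc_recode(dna: str) -> str:
    """Replace every codon by a synonymous codon with the highest G/C count."""
    _check(dna)
    recoded = []
    for i in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        recoded.append(max(synonyms, key=lambda c: sum(b in "GC" for b in c)))
    return "".join(recoded)


original = "ATGAAAGCT"  # hypothetical fragment encoding M-K-A
recoded = max_gc_recode(original)
print(translate(original), translate(recoded))
print(gc_fraction(original), gc_fraction(recoded))
```

An intermediate increase, such as a given percentage of the maximum, could be obtained by recoding only a subset of the codons.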
Preparation and purification of such nucleic acids and/or derivatives are usually carried out by standard procedures.
These sequence variants preferably lead to inventive polypeptides or proteins selected from variants of the polypeptide with transpositional activity of the invention comprising an amino acid sequence according to SEQ ID NO: 1-17, preferably SEQ ID NO: 1, which have at least one amino acid substituted as compared to the native amino acid sequence of the corresponding polypeptide with transpositional activity. Therefore, inventive nucleic acid sequences code for modified (non-natural) variants of said polypeptide. Further, promoters or other expression control regions can be operably linked with the nucleic acid encoding the inventive polypeptide to regulate expression of the polypeptide/protein in a quantitative or in a tissue-specific manner.
In a third aspect, the invention relates to a vector comprising the nucleic acid molecule of the second aspect as described above. As already mentioned above the nucleic acid encoding the inventive polypeptide may be RNA or DNA. Similarly, either the inventive nucleic acid encoding the inventive polypeptide or the transposon can be a linear fragment or a circularized, isolated fragment or be inserted into a vector, preferably as a plasmid or as recombinant viral DNA.
In a fourth aspect, the invention relates to a cell comprising the nucleic acid of the second aspect or the vector of the third aspect.
In one embodiment the cell is from an animal, preferably a vertebrate, which is preferably selected from the group consisting of a fish, a bird, or a mammal, preferably a mammal, e.g., a human. The cell can be an immune cell, e.g., a T cell, such as a primary T cell. It can also be a tumor cell.
In a fifth aspect, the invention relates to an in vitro method of integrating an exogenous nucleic acid into the genome of a cell, the method comprising the steps of:
The polypeptide may also be provided to the cell in the form of a polypeptide, e.g., by adding said polypeptide to the cell culture medium.
In one embodiment, the nucleic acid of the invention, the vector of the invention and/or the exogenous nucleic acid is provided in the cell using a method selected from the group consisting of electroporation, microinjection and lipofection. Electroporation has proven to be particularly advantageous, e.g., for transducing primary cells.
The cell is from an animal, preferably a vertebrate, which is preferably selected from the group consisting of a fish, a bird, or a mammal, preferably a mammal, such as a human.
In a sixth aspect, the invention relates to an in vivo method of integrating an exogenous nucleic acid into the genome of a cell of a subject, the method comprising the step of administering to the subject the polypeptide, the nucleic acid, or the vector of any of the aforementioned aspects of the invention and a nucleic acid comprising the exogenous nucleic acid or a vector comprising said nucleic acid.
In one embodiment the exogenous nucleic acid is comprised in the nucleic acid or the vector of the invention. It can also be administered separately.
In one embodiment the nucleic acid or the vector of the invention or the exogenous nucleic acid is provided in the cell using a method selected from the group consisting of electroporation, microinjection, lipoprotein particles and virus-like particles.
The exogenous nucleic acid may be DNA or RNA.
In another aspect, the invention relates to the mentioned polypeptide, nucleic acid, vector, or cell for use in medicine, in particular in gene therapy.
In one embodiment the gene therapy includes but is not limited to autologous or heterologous T cell therapy, gene therapy targeting any cell type in the blood, hematopoietic stem cell therapy, liver gene therapy, gene therapy of the central nervous system, eye gene therapy, muscle gene therapy, skin gene therapy and/or a gene therapy for treatment of cancer.
In general the therapeutic applications of this invention may be manifold and thus the polypeptide, the nucleic acid, the vector, the cell and especially the methods of integrating an exogenous nucleic acid into the genome of a cell according to the invention may also find use in therapeutic applications, in which the aforementioned are employed to stably integrate a therapeutic nucleic acid (“nucleic acid of therapeutic interest”), e.g. a gene, into the genome of a target cell, i.e. gene therapy applications. This may also be of interest for vaccination therapy for the integration of antigens into antigen presenting cells, e.g. specific tumor antigens, e.g. MAGE-1, for tumor vaccination or pathological antigens for the treatment of infectious diseases derived from pathogens, e.g. leprosy, tetanus, Whooping Cough, Typhoid Fever, Paratyphoid Fever, Cholera, Plague, Tuberculosis, Meningitis, Bacterial Pneumonia, Anthrax, Botulism, Bacterial Dysentery, Diarrhoea, Food Poisoning, Syphilis, Gastroenteritis, Trench Fever, Influenza, Scarlet Fever, Diphtheria, Gonorrhoea, Toxic Shock Syndrome, Lyme Disease, Typhus Fever, Listeriosis, Peptic Ulcers, and Legionnaires' Disease; for the treatment of viral infections resulting in e.g. Acquired Immunodeficiency Syndrome, Adenoviridae Infections, Alphavirus Infections, Arbovirus Infections, Borne Disease, Bunyaviridae infections, Caliciviridae Infections, Chickenpox, Condyloma Acuminata, Coronaviridae Infections, Coxsackievirus Infections, Cytomegalovirus Infections, Dengue, DNA Virus Infections, Ecthyma, Contagious, Encephalitis, Arbovirus, Epstein-Barr Virus Infections, Erythema Infectiosum, Hantavirus Infections, Hemorrhagic Fevers, Viral, Hepatitis, Viral, Human, Herpes Simplex, Herpes Zoster, Herpes Zoster Oticus, Herpesviridae Infections, Infectious Mononucleosis, Influenza in Birds, Influenza, Human, Lassa Fever, Measles, Molluscum Contagiosum, Mumps, Paramyxoviridae Infections, Phlebotomus Fever, Polyomavirus Infections, Rabies, Respiratory Syncytial Virus Infections, Rift Valley Fever, RNA Virus Infections, Rubella, Slow Virus Diseases, Smallpox, Subacute Sclerosing Panencephalitis, Tumor Virus Infections, Warts, West Nile Fever, Virus Diseases, Yellow Fever; for the treatment of protozoan infections resulting in e.g. malaria. The subject methods may be used to deliver a wide variety of therapeutic nucleic acids. Therapeutic nucleic acids of interest include genes that replace defective genes in the target host cell, such as those responsible for genetic defect based diseased conditions; genes which have therapeutic utility in the treatment of cancer; and the like.
In a further aspect, the invention relates to a pharmaceutical composition comprising the mentioned polypeptide, nucleic acid, vector or cell and a pharmaceutically acceptable carrier, adjuvant or vehicle.
The pharmaceutical composition may further comprise one or more carriers and/or excipients, all of which are preferably pharmaceutically acceptable. According to the invention, a pharmaceutical composition contains an effective amount of the active agents, e.g., the polypeptide, nucleic acid, vector, or cell described herein, to generate the desired reaction or the desired effect. A pharmaceutical composition in accordance with the present invention is preferably sterile. Pharmaceutical compositions can be provided in a uniform dosage form and may be prepared in a manner known per se. A pharmaceutical composition in accordance with the present invention may, e.g., be in the form of a solution or suspension.
Pharmaceutically acceptable carriers, adjuvants or vehicles that may be used in the compositions of this invention include, but are not limited to, ion exchangers, alumina, aluminum stearate, lecithin, serum proteins, such as human serum albumin, buffer substances such as phosphates, glycine, sorbic acid, potassium sorbate, partial glyceride mixtures of saturated vegetable fatty acids, water, salts or electrolytes, such as protamine sulfate, disodium hydrogen phosphate, potassium hydrogen phosphate, sodium chloride, zinc salts, colloidal silica, magnesium trisilicate, polyvinyl pyrrolidone, cellulose-based substances, sodium carboxymethylcellulose, polyacrylates, waxes, polyethylene-polyethylene glycol, polyoxypropylene-block polymers, polyethylene glycol and wool fat. The pharmaceutical compositions of the present invention may be administered orally, parenterally, by inhalation spray, topically, rectally, nasally, buccally, vaginally or via an implanted reservoir. The term parenteral as used herein includes subcutaneous, intravenous, intramuscular, intra-articular, intra-synovial, intrasternal, intrathecal, intrahepatic, intralesional and intracranial injection or infusion techniques. Preferably, the pharmaceutical compositions are administered orally, intraperitoneally or intravenously. Sterile injectable forms of the pharmaceutical compositions of this invention may be an aqueous or oleaginous suspension. These suspensions may be formulated according to techniques known in the art using suitable dispersing or wetting agents and suspending agents. The sterile injectable preparation may also be a sterile injectable solution or suspension in a non-toxic parenterally acceptable diluent or solvent, for example as a solution in 1,3-butanediol. Among the acceptable vehicles and solvents that may be employed are water, Ringer's solution and isotonic sodium chloride solution. In addition, sterile, fixed oils are conventionally employed as a solvent or suspending medium.
The inventive pharmaceutical composition is preferably suitable for the treatment of diseases, particularly diseases caused by gene defects such as cystic fibrosis, hypercholesterolemia, hemophilia, e.g. A, B, C or XIII, immune deficiencies including HIV, Huntington disease, α1-antitrypsin deficiency, as well as cancer selected from colon cancer, melanomas, kidney cancer, lymphoma, acute myeloid leukemia (AML), acute lymphoid leukemia (ALL), chronic myeloid leukemia (CML), chronic lymphocytic leukemia (CLL), gastrointestinal tumors, lung cancer, gliomas, thyroid cancer, mamma carcinomas, prostate tumors, hepatomas, diverse virus-induced tumors such as e.g. papilloma virus induced carcinomas (e.g. cervix carcinoma), adenocarcinomas, herpes virus induced tumors (e.g. Burkitt's lymphoma, EBV induced B cell lymphoma), Hepatitis B induced tumors (hepatocellular carcinomas), HTLV-1 and HTLV-2 induced lymphoma, acoustic neurinoma, lung cancer, pharyngeal cancer, anal carcinoma, glioblastoma, lymphoma, rectum carcinoma, astrocytoma, brain tumors, stomach cancer, retinoblastoma, basalioma, brain metastases, medulloblastoma, vaginal cancer, pancreatic cancer, testis cancer, melanoma, bladder cancer, Hodgkin syndrome, meningioma, Schneeberger's disease, bronchial carcinoma, pituitary cancer, mycosis fungoides, gullet cancer, breast cancer, neurinoma, spinalioma, Burkitt's lymphoma, laryngeal cancer, thymoma, corpus carcinoma, bone cancer, non-Hodgkin lymphoma, urethra cancer, CUP-syndrome, oligodendroglioma, vulva cancer, intestinal cancer, oesophagus carcinoma, small intestine tumors, craniopharyngioma, ovarial carcinoma, ovarian cancer, liver cancer, leukemia, or cancers of the skin or the eye; etc.
There are a variety of alternative techniques and procedures available to those of skill in the art which would similarly permit one to successfully practice the intended invention.
In order to assess the relative effects of single amino acid replacements at positions 187, 247 and 248 on transposition, the SB100X transposase was subjected to saturation mutagenesis by incorporating all possible amino acids by site-directed PCR mutagenesis. All constructs encoding the mutant SB100X transposases showed protein expression levels comparable to that of SB100X by Western blot analysis.
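The set of single amino acid replacements covered by such a saturation mutagenesis can be enumerated as follows. This is a minimal sketch; the wild-type residues at positions 187, 247 and 248 (H, P and K) are taken from the substitution names used in this description, and the variable and function names are illustrative assumptions.

```python
# The 20 proteinogenic amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residues at the three positions, as implied by names such as H187V.
WILD_TYPE = {187: "H", 247: "P", 248: "K"}


def saturation_mutants(wild_type: dict) -> dict:
    """List every single-residue replacement for each position."""
    return {
        position: [f"{residue}{position}{aa}" for aa in AMINO_ACIDS if aa != residue]
        for position, residue in wild_type.items()
    }


mutants = saturation_mutants(WILD_TYPE)
for position, names in mutants.items():
    print(position, len(names), names[:3])
```

Each position yields nineteen replacements, one for every amino acid other than the wild-type residue.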
Next, we evaluated the mutants for their transposition activities relative to SB100X by applying a cell-based transposition assay in human cells that was fine-tuned to obtain a single transposon integration per cell. Briefly, a donor plasmid carrying a puromycin (puro) resistance gene-marked transposon was co-transfected together with a helper plasmid encoding either SB100X, the inactive E279D transposase (D3) or a mutant variant of the SB100X transposase.
In contrast to the H187 mutations, the vast majority of the P247 mutations was either completely inactive or displayed a severe reduction in both overall transposition (
Similar to P247, the amino acid exchanges at position 248 strongly reduced transposition in almost all of the nineteen mutants (
As described recently, some of the K248 mutants displayed a segregation of transposon excision and integration activities (data not shown).
In order to investigate if the mutations have an impact on target site selection, we used some of the SB mutants described above for the generation of transposon insertion site libraries in human HepG2 cells, and compared both the local attributes as well as genome-wide distributions of these insertion sites to those generated by SB100X. For position 187 we selected 6 mutants (H187R, H187E, H187S, H187T, H187V, H187P) with a marked impact on transposition efficiency (
Although SB100X and the P247S and P247R mutants target this particular sequence only 2-3% of the time, some other mutants display significantly higher frequencies of integration into this motif (18% for H187P, 21% for H187V and 39% for K248R,
We determined the relative frequencies of integration into genomic features including genic and non-genic regions, cancer genes, exons, introns, 5′- and 3′-UTRs and sequences flanking genes upstream and downstream within 10-kb windows, relative to a computer-generated random data set. We not only compared insertions within these genomic features generated by SB100X and its mutants, but also involved MLV gammaretroviral and HIV lentiviral integration sites in the analysis. Both of these viral systems are popular vectors in gene therapy. As established previously, SB transposon insertions show only a minor bias toward genes and their flanking regions, in contrast to MLV and HIV insertions that are enriched in loci flanking transcriptional start sites (TSS) and within actively transcribed genes, respectively (
Next, we selected a small subset of our mutants; namely the P247R, H187V and K248R mutants, and addressed if these mutants generate a recognizably different genome-wide distribution profile than that of SB100X. The data presented in
Next, we profiled insertions generated by SB100X and the P247R, H187V and K248R mutants in functional genomic segments. These segments are defined by co-occurring epigenetic signal patterns, which are clustered together computationally to comprise various functional partitions of the human genome. We used 25-state chromatin models of human HepG2 cells. As above, we included MLV and HIV insertions in the analysis. First, in line with previous observations, SB transposon insertions show only a minor bias towards promoters, TSSs, enhancers and transcriptional regulatory regions with an open chromatin structure, in contrast to MLV and HIV insertions that display the highest enrichment in promoter regions including TSSs and transcribed regions, respectively (
Integration of therapeutic gene constructs into safe sites in the human genome would prevent insertional mutagenesis and associated risks of oncogenesis in gene therapy. Genomic “safe harbors” (GSHs) are regions of the human genome that are able to accommodate the predictable expression of newly integrated DNA without adverse effects on the host cell or organism. GSHs can be bioinformatically allocated to chromosomal sites or regions if they satisfy the following criteria: (i) no overlap with transcription units, (ii) a distance of at least 50 kb from the 5′-end of any gene, (iii) a distance of at least 300 kb from cancer-related genes and (iv) from microRNA genes, and (v) a location outside of ultra-conserved elements (UCEs) 58, 68. We have previously established that the SB transposon system has a significantly more favorable insertion profile than MLV- and HIV-based viral integration systems with respect to frequencies of insertions into GSHs.
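The five criteria above can be expressed as a simple filter over annotated insertion sites. The following is a minimal sketch under stated assumptions: the class, field and function names are hypothetical, the distances are assumed to have been computed beforehand from genome annotation, and criterion (iv) is read as a minimum distance of 300 kb from microRNA genes.

```python
from dataclasses import dataclass


@dataclass
class SiteAnnotation:
    """Precomputed annotation of one insertion site (hypothetical layout)."""
    in_transcription_unit: bool
    dist_to_gene_5prime_bp: int
    dist_to_cancer_gene_bp: int
    dist_to_mirna_bp: int
    in_ultraconserved_element: bool


def is_genomic_safe_harbor(site: SiteAnnotation) -> bool:
    """Apply criteria (i) to (v) as stated in the text."""
    return (
        not site.in_transcription_unit                # (i)
        and site.dist_to_gene_5prime_bp >= 50_000     # (ii)
        and site.dist_to_cancer_gene_bp >= 300_000    # (iii)
        and site.dist_to_mirna_bp >= 300_000          # (iv), assumed reading
        and not site.in_ultraconserved_element        # (v)
    )


def gsh_fraction(sites) -> float:
    """Fraction of insertion sites that satisfy all five criteria."""
    sites = list(sites)
    if not sites:
        raise ValueError("no insertion sites given")
    return sum(is_genomic_safe_harbor(s) for s in sites) / len(sites)


# Hypothetical example sites, not measured data.
example_sites = [
    SiteAnnotation(False, 80_000, 400_000, 350_000, False),  # meets all criteria
    SiteAnnotation(True, 0, 400_000, 350_000, False),        # inside a transcription unit
    SiteAnnotation(False, 80_000, 120_000, 350_000, False),  # too close to a cancer gene
    SiteAnnotation(False, 60_000, 500_000, 900_000, False),  # meets all criteria
]
print(gsh_fraction(example_sites))
```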
The data presented above establish that some of our SB transposase mutants significantly de-target insertions away from exons as well as transcriptional regulatory regions of genes, thereby inherently implying that a larger fraction of insertions catalyzed by these enzymes lands in GSHs. We analyzed our insertion site datasets with respect to the relative frequencies of integration into GSHs by the P247R, H187V and K248R transposase variants and, as above, we included MLV and HIV insertions in the analysis (
High-density integration profiling of the Hermes transposon in yeast revealed a strong association of integration sites with nucleosome-free chromatin. In addition, recent evidence indicates that Tc1/mariner transposons preferentially integrate at linker regions between nucleosomes. We mapped our insertion datasets with respect to nucleosome occupancy as determined by MNase-Seq data. As seen previously, transposon insertions mediated by SB100X are underrepresented in nucleosomal DNA (
The above data establish that the H187V and K248R transposase mutants detarget exons and transcriptional regulatory regions of genes and avoid nucleosomal DNA for integration. However, these data do not necessarily shed light on causative relationships between these independent observations. For example, exons tend to be associated with nucleosomes; thus, depletion of integrations in exonic sequences by H187V and K248R may be a mere reflection of these transposase variants avoiding nucleosomal DNA. However, transcriptional regulatory regions including enhancers and TSSs are clearly depleted in nucleosomes; nonetheless, we found that H187V and K248R detarget these genomic regions, suggesting that nucleosome occupancy in itself cannot sufficiently explain why H187V and K248R integrations are depleted in both exons and regulatory regions. Because the genomic segments are defined by chromatin marks, the question arises whether it is the local chromatin structure or the underlying primary DNA sequence that modulates integration frequencies in these segments. Consistent with previous findings 27, the SB100X transposase catalyzes an almost random insertion profile in human cells with a slight bias for euchromatin marks (including H3K4me1, H3K27ac, H3K36me3 and H3K79me2) (
As in Example 3, the relative frequencies of integration into genomic features including genic and non-genic regions, cancer genes, exons, introns, 5′- and 3′-UTRs and sequences flanking genes upstream and downstream within 10-kb windows, were determined relative to a computer-generated random data set. Again, we included MLV gammaretroviral and HIV lentiviral integration sites in the analysis.
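For clarity, the notion of a relative integration frequency can be sketched as the fraction of observed insertions falling within a genomic feature divided by the corresponding fraction in the random control set. The Python sketch below is a minimal illustration under assumed conventions (single chromosome, half-open intervals); the function names are hypothetical and the coordinates are made up.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), single chromosome (assumed)


def upstream_window(gene_start: int, gene_end: int, strand: str,
                    size: int = 10_000) -> Interval:
    """Flanking window of `size` bp upstream of a gene, respecting strand."""
    if strand == "+":
        return (max(0, gene_start - size), gene_start)
    return (gene_end, gene_end + size)


def fraction_in_feature(sites: List[int], feature: List[Interval]) -> float:
    """Share of insertion sites that fall inside any interval of the feature."""
    if not sites:
        raise ValueError("no insertion sites supplied")
    hits = sum(any(s <= pos < e for s, e in feature) for pos in sites)
    return hits / len(sites)


def relative_frequency(observed: List[int], random_control: List[int],
                       feature: List[Interval]) -> float:
    """Observed fraction in the feature divided by the random-control fraction."""
    expected = fraction_in_feature(random_control, feature)
    if expected == 0:
        raise ValueError("random control has no sites in this feature")
    return fraction_in_feature(observed, feature) / expected


# Toy example: the feature is the 10-kb window upstream of a plus-strand gene.
feature = [upstream_window(50_000, 60_000, "+")]   # (40_000, 50_000)
observed_sites = [41_000, 45_000, 70_000, 90_000]  # 2 of 4 inside the window
random_sites = [42_000, 70_000, 80_000, 95_000]    # 1 of 4 inside the window
print(relative_frequency(observed_sites, random_sites, feature))
```

A value above 1 indicates enrichment relative to random expectation and a value below 1 indicates depletion. Whether a deviation is statistically significant would require a separate test, which is not shown here.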
This time, we analyzed a subset of additional mutants, namely H187P, H187R, H187E, H187S, H187T, P247S, P247A, P247C, K248C, K248I and K248V, and again asked whether these mutants generate a recognizably different genome-wide distribution profile than that of SB100X (
Number | Date | Country | Kind |
---|---|---|---|
21195842.6 | Sep 2021 | EP | regional |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2022/075007 | 9/8/2022 | WO | |