The invention relates to a method for engineering a LAGLIDADG homing endonuclease variant, having mutations in two functional subdomains, each binding a distinct part of a modified DNA target half-site, said LAGLIDADG homing endonuclease variant being able to cleave a chimeric DNA target sequence comprising the nucleotides bound by each subdomain.
The invention relates also to a LAGLIDADG homing endonuclease variant obtainable by said method, to a vector encoding said variant, to a cell, an animal or a plant modified by said vector and to the use of said I-CreI endonuclease variant and derived products for genetic engineering, genome therapy and antiviral therapy.
Meganucleases are by definition sequence-specific endonucleases with large (>14 bp) cleavage sites that can deliver DNA double-strand breaks (DSBs) at specific loci in living cells (Thierry and Dujon, Nucleic Acids Res., 1992, 20, 5625-5631). Meganucleases have been used to stimulate homologous recombination in the vicinity of their target sequences in cultured cells and plants (Rouet et al., Mol. Cell. Biol., 1994, 14, 8096-106; Choulika et al., Mol. Cell. Biol., 1995, 15, 1968-73; Donoho et al., Mol. Cell. Biol., 1998, 18, 4070-8; Elliott et al., Mol. Cell. Biol., 1998, 18, 93-101; Sargent et al., Mol. Cell. Biol., 1997, 17, 267-77; Puchta et al., Proc. Natl. Acad. Sci. USA, 1996, 93, 5055-60; Chiurazzi et al., Plant Cell, 1996, 8, 2057-2066), making meganuclease-induced recombination an efficient and robust method for genome engineering. The use of meganuclease-induced recombination has long been limited by the repertoire of natural meganucleases, and the major limitation of the current technology is the requirement for the prior introduction of a meganuclease cleavage site in the locus of interest. Thus, the making of artificial meganucleases with tailored substrate specificities is under intense investigation. Such proteins could be used to cleave genuine chromosomal sequences and open new perspectives for genome engineering in a wide range of applications. For example, meganucleases could be used to induce the correction of mutations linked with monogenic inherited diseases, and bypass the risk due to the randomly inserted transgenes used in current gene therapy approaches (Hacein-Bey-Abina et al., Science, 2003, 302, 415-419).
Recently, Zinc-Finger DNA binding domains of Cys2-His2 type Zinc-Finger Proteins (ZFP) could be fused with the catalytic domain of the FokI endonuclease, to induce recombination in various cell types, including human lymphoid cells (Smith et al., Nucleic Acids Res., 1999, 27, 674-81; Pabo et al., Annu. Rev. Biochem., 2001, 70, 313-40; Porteus and Baltimore, Science, 2003, 300, 763; Urnov et al., Nature, 2005, 435, 646-651; Bibikova et al., Science, 2003, 300, 764). The binding specificity of ZFPs is relatively easy to manipulate, and a repertoire of novel artificial ZFPs, able to bind many (g/a)nn(g/a)nn(g/a)nn sequences, is now available (Pabo et al., precited; Segal and Barbas, Curr. Opin. Biotechnol., 2001, 12, 632-7; Isalan et al., Nat. Biotechnol., 2001, 19, 656-60). However, preserving a very narrow specificity is one of the major issues for genome engineering applications, and presently it is unclear whether ZFPs would fulfill the very strict requirements for therapeutic applications. Furthermore, these fusion proteins have demonstrated high toxicity in cells (Porteus and Baltimore, precited; Bibikova et al., Genetics, 2002, 161, 1169-1175), probably due to a low level of specificity.
In nature, meganucleases are essentially represented by homing endonucleases (HEs), a family of endonucleases encoded by mobile genetic elements, whose function is to initiate DNA double-strand break (DSB)-induced recombination events in a process referred to as homing (Chevalier and Stoddard, Nucleic Acids Res., 2001, 29, 3757-74; Kostriken et al., Cell, 1983, 35, 167-74; Jacquier and Dujon, Cell, 1985, 41, 383-94). Several hundred HEs have been identified in bacteria, eukaryotes, and archaea (Chevalier and Stoddard, precited); however, the probability of finding a HE cleavage site in a chosen gene is very low.
Given their biological function and their exceptional cleavage properties in terms of efficacy and specificity, HEs provide ideal scaffolds to derive novel endonucleases for genome engineering. Data have been accumulated over the last decade, characterizing the LAGLIDADG family, the largest of the four HE families (Chevalier and Stoddard, precited). LAGLIDADG refers to the only sequence actually conserved throughout the family and is found in one or (more often) two copies in the protein. Proteins with a single motif, such as I-CreI, form homodimers and cleave palindromic or pseudo-palindromic DNA sequences, whereas the larger, double motif proteins, such as I-SceI, are monomers and cleave non-palindromic targets. Seven different LAGLIDADG proteins have been crystallized, and they exhibit a very striking conservation of the core structure, that contrasts with the lack of similarity at the primary sequence level (Jurica et al., Mol. Cell., 1998, 2, 469-76; Chevalier et al., Nat. Struct. Biol., 2001, 8, 312-6; Chevalier et al., J. Mol. Biol., 2003, 329, 253-69; Moure et al., J. Mol. Biol., 2003, 334, 685-95; Moure et al., Nat. Struct. Biol., 2002, 9, 764-70; Ichiyanagi et al., J. Mol. Biol., 2000, 300, 889-901; Duan et al., Cell, 1997, 89, 555-64; Bolduc et al., Genes Dev., 2003, 17, 2875-88; Silva et al., J. Mol. Biol., 1999, 286, 1123-36). In this core structure, two characteristic αββαββα folds, also called LAGLIDADG homing endonuclease core domains, contributed by two monomers, or by two domains in double LAGLIDADG proteins, are facing each other with a two-fold symmetry. DNA binding depends on the four β strands from each domain, folded into an antiparallel β-sheet, and forming a saddle on the DNA helix major groove (
Two approaches for deriving novel endonucleases from homing endonucleases are under investigation:
Analysis of the I-CreI/DNA crystal structure indicates that 9 amino acids make direct contacts with the homing site (Chevalier et al., 2003; Jurica et al., precited), whose randomization would result in 20^9 combinations, a number beyond any screening capacity today.
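By way of illustration only, the size of that sequence space can be checked with a short calculation. The sketch below assumes that each of the 9 contacting positions may take any of the 20 standard amino acids; the variable names are hypothetical and do not come from the source.

```python
# Sequence space for full randomization of the DNA-contacting residues.
# Assumption: every position may take any of the 20 standard amino acids.
AMINO_ACIDS = 20
CONTACT_POSITIONS = 9  # residues in direct contact with the homing site

combinations = AMINO_ACIDS ** CONTACT_POSITIONS
print(f"{combinations:,} combinations (~{combinations:.2e})")
# prints: 512,000,000,000 combinations (~5.12e+11)
```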
Therefore, several laboratories have relied on a semi-rational approach (Chica et al., Curr. Opin. Biotechnol., 2005, 16, 378-384) to limit the diversity of the mutant libraries to be handled: a small set of relevant residues is chosen according to structural data. Nevertheless, this was still not sufficient to create redesigned endonucleases cleaving chosen sequences:
The construction of chimeric and single chain artificial HEs has suggested that a combinatorial approach could be used to obtain novel meganucleases cleaving novel (non-palindromic) target sequences: different monomers or core domains could be fused in a single protein, to achieve novel specificities. These results mean that the two DNA binding domains of an I-CreI dimer behave independently; each DNA binding domain binds a different half of the DNA target site (
To reach a larger number of sequences, it would be extremely valuable to be able to identify smaller independent subdomains that could be combined (
However, a combinatorial approach is much more difficult to apply within a single monomer or domain than between monomers since the structure of the binding interface is very compact and the two different ββ hairpins which are responsible for virtually all base-specific interactions do not constitute separate subdomains, but are part of a single fold. For example, in the internal part of the DNA binding regions of I-CreI, the gtc triplet is bound by one residue from the first hairpin (Q44), and two residues from the second hairpin (R68 and R70; see
In spite of this lack of apparent modularity at the structural level, the Inventor has identified separable functional subdomains, able to bind distinct parts of a homing endonuclease half-site (
The different subdomains can be modified separately to engineer new cleavage specificities, and the combination of different subdomains in one meganuclease (homodimer, heterodimer, single-chain chimeric molecule) considerably increases the number of DNA targets which can be cleaved by meganucleases. Thus, the identification of a small number of new cleavers for each subdomain allows for the design of a very large number of novel endonucleases with new specificities. This approach was used to assemble four sets of mutations into heterodimeric homing endonucleases with fully engineered specificity, to cleave a model target (COMB1) or a sequence from the human RAG1 gene. This is the first time a homing endonuclease has been entirely redesigned to cleave a naturally occurring sequence.
Furthermore, in former studies, the targets of the engineered proteins differed from the initial wild-type substrate by 1 to 6 base pairs per site, whereas the 22 bp COMB1 and RAG1 sequences differ from the I-CreI cleavage site (C1221) by 9 and 16 bp, respectively.
This new combinatorial approach, which can be applied to any homing endonuclease (monomer with two domains or homodimer), considerably enriches the number of DNA sequences that can be targeted, resulting in the generation of dedicated meganucleases able to cleave sequences from many genes of interest. The generation of collections of I-CreI derivatives and the ability to combine them intramolecularly as well as intermolecularly increases the number of attainable 22-mer targets to at least 1.57×10^7 ((64×62)^2).
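For illustration, the quoted lower bound can be reproduced by evaluating the expression (64×62)^2 directly. In the sketch below the two factors are taken as given from the text; the variable names are hypothetical labels, not terms used by the source.

```python
# Evaluate the target-count expression quoted in the text: (64 x 62)^2.
# The factors 64 and 62 are taken as given; the names are illustrative only.
first_factor = 64
second_factor = 62

per_half_site = first_factor * second_factor  # 3,968
total_targets = per_half_site ** 2            # both halves of the 22-mer
print(f"{total_targets:,} (~{total_targets:.2e})")
# prints: 15,745,024 (~1.57e+07)
```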
In addition, for genome engineering applications, the major advantage of HEs is their exquisite specificity, a feature that becomes essential when engaging in therapeutic applications.
Therefore, this approach provides a general method to create novel endonucleases cleaving chosen sequences. Potential applications include the cleavage of viral genomes specifically or the correction of genetic defects via double-strand break induced recombination, both of which lead to therapeutics.
The invention relates to a method for engineering a LAGLIDADG homing endonuclease variant derived from a parent LAGLIDADG homing endonuclease by mutation of two functional subdomains of the core domain, comprising at least the steps of:
(a) constructing a first variant having mutation(s) in a first functional subdomain of the core domain which interacts with a first part of one half of said parent LAGLIDADG homing endonuclease cleavage site, by:
(a1) replacing at least one amino acid of a first subdomain corresponding to that situated from positions 26 to 40 in I-CreI, with a different amino acid,
(a2) selecting and/or screening the first variants from step (a1) which are able to cleave a first DNA target sequence derived from said parent LAGLIDADG homing endonuclease half-site, by replacement of at least one nucleotide of said first part of the half-site, with a different nucleotide,
(b) constructing a second variant having mutation(s) in a second functional subdomain of the core domain which interacts with a second part of said parent LAGLIDADG homing endonuclease half-site, by:
(b1) replacing at least one amino acid of a second subdomain corresponding to that situated from positions 44 to 77 in I-CreI, with a different amino acid,
(b2) selecting and/or screening the second variants from step (b1) which are able to cleave a second DNA target sequence derived from said parent LAGLIDADG homing endonuclease half-site, by replacement of at least one nucleotide of said second part of the half-site, with a different nucleotide,
(c) constructing a third variant which has mutation(s) in the first and the second functional subdomains of said parent LAGLIDADG homing endonuclease, by:
(c1) combining the mutation(s) of two variants from step (a1) and step (b1) in a single variant, and
(c2) selecting and/or screening the variants from step (c1) which are able to cleave a chimeric DNA target sequence comprising the first part of the first variant DNA target half-site and the second part of the second variant DNA target half-site.
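The bookkeeping behind steps (c1) and (c2) may be sketched as follows. This is a minimal illustration, not the claimed laboratory protocol: variants are modelled as position-to-residue maps, half-sites as 11-nucleotide strings, and the function names, residues and toy sequences are hypothetical choices made for the example.

```python
# Illustrative sketch of steps (c1) and (c2); not the laboratory protocol itself.
FIRST_SUBDOMAIN = range(26, 41)    # positions 26 to 40, as in step (a1)
SECOND_SUBDOMAIN = range(44, 78)   # positions 44 to 77, as in step (b1)


def combine_mutations(first_variant, second_variant):
    """Step (c1): merge two {position: amino acid} maps into one variant."""
    for pos in first_variant:
        if pos not in FIRST_SUBDOMAIN:
            raise ValueError(f"position {pos} is outside the first subdomain")
    for pos in second_variant:
        if pos not in SECOND_SUBDOMAIN:
            raise ValueError(f"position {pos} is outside the second subdomain")
    return {**first_variant, **second_variant}


def chimeric_half_site(first_target, second_target, second_part):
    """Step (c2): keep the first target, but take the second part
    (a 0-based, end-exclusive slice) from the second target."""
    if len(first_target) != len(second_target):
        raise ValueError("half-sites must have the same length")
    start, end = second_part
    return first_target[:start] + second_target[start:end] + first_target[end:]


# Hypothetical inputs: residues and toy 11-nt half-sites chosen for illustration.
first_variant = {28: "K", 30: "N", 33: "S", 38: "R", 40: "E"}
second_variant = {44: "K", 68: "R", 70: "E"}
combined = combine_mutations(first_variant, second_variant)

first_target = "cggtacgtcgt"    # altered in the first part only
second_target = "caaaacgaggt"   # altered in the second part only
chimera = chimeric_half_site(first_target, second_target, second_part=(6, 9))
print(sorted(combined))
print(chimera)
```

The chimeric half-site produced here is only the sequence to be tested; whether a combined variant actually cleaves it remains an experimental question answered by the selection or screening of step (c2).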
According to the method of the invention, each substitution is at the position of an amino acid residue which interacts with a DNA target half-site. The DNA-interacting residues of LAGLIDADG homing endonucleases are well-known in the art. The residues which are mutated may interact with the DNA backbone or with the nucleotide bases, directly or via a water molecule.
According to an advantageous embodiment of said method, the amino acid in step a1) or b1) is replaced with an amino acid which is selected from the group consisting of A, C, D, E, G, H, K, N, P, Q, R, S, T, L, V, W and Y. According to another advantageous embodiment of said method, the amino acid which is replaced in step a1) is situated from positions 28 to 40 in I-CreI.
According to another advantageous embodiment of said method, the amino acid which is replaced in step b1) is situated from positions 44 to 70 in I-CreI.
According to the method of the invention, each part of the DNA target half-site comprises at least two consecutive nucleotides, preferably three consecutive nucleotides, and the first and the second part are separated by at least one nucleotide, preferably at least two nucleotides.
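The two constraints stated above (minimum length of each part, minimum separation between the parts) can be expressed as a simple check. The following sketch is illustrative only; it assumes each part is given as a 0-based, end-exclusive index pair within a half-site, and the function name `check_parts` is hypothetical.

```python
def check_parts(first_part, second_part, min_len=2, min_gap=1):
    """Return True if both parts span at least `min_len` consecutive
    nucleotides and are separated by at least `min_gap` nucleotides.

    Each part is a (start, end) pair, 0-based and end-exclusive.
    """
    (s1, e1), (s2, e2) = sorted([first_part, second_part])
    long_enough = (e1 - s1) >= min_len and (e2 - s2) >= min_len
    separated = (s2 - e1) >= min_gap
    return long_enough and separated


# Two triplets separated by two nucleotides satisfy the preferred conditions.
print(check_parts((1, 4), (6, 9), min_len=3, min_gap=2))
# Adjacent parts (no intervening nucleotide) do not.
print(check_parts((1, 4), (4, 7)))
```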
According to another advantageous embodiment of said method, the first and the second part of said half-site are situated in the external and the internal quarter of said half-site, respectively.
According to the method of the invention, the parent DNA target may be palindromic, non-palindromic or pseudo-palindromic.
According to the invention, the positions of the subdomains are defined by reference to the I-CreI structure (pdb accession code 1g9y). Knowing the positions of the subdomains in I-CreI, one skilled in the art can easily deduce the corresponding positions in another LAGLIDADG homing endonuclease, using well-known protein structure analysis software such as PyMOL. For example, for I-MsoI, the two functional subdomains are situated from positions 30 to 43 and 47 to 75, respectively.
According to the method of the invention, the amino acid mutation(s) in step a1) or b1) are introduced in either a wild-type LAGLIDADG homing endonuclease or a functional variant thereof.
The parent LAGLIDADG homing endonuclease may be selected from the group consisting of: I-SceI, I-ChuI, I-CreI, I-CsmI, PI-SceI, PI-TliI, PI-MtuI, I-CeuI, I-SceII, I-SceIII, HO, PI-CivI, PI-CtrI, PI-AaeI, PI-BsuI, PI-DhaI, PI-DraI, PI-MavI, PI-MchI, PI-MfuI, PI-MflI, PI-MgoI, PI-MinI, PI-MkaI, PI-MleI, PI-MmaI, PI-MshI, PI-MsmI, PI-MthI, PI-MtuI, PI-NpuI, PI-PfuI, PI-RmaI, PI-SpbI, PI-SspI, PI-FacI, PI-PhoI, PI-TagI, PI-ThyI, PI-TkI, PI-TspI, I-MsoI, and I-AniI; preferably, I-CreI, I-SceI, I-ChuI, I-DmoI, I-CsmI, PI-SceI, PI-PfuI, PI-TliI, PI-MtuI, I-MsoI, I-AniI and I-CeuI; more preferably, I-CreI, I-MsoI, I-SceI, I-AniI, I-DmoI, PI-SceI, and PI-PfuI; still more preferably I-CreI.
Functional variants comprise mutations that do not affect the protein structure. For example, the parent homing endonuclease may be an I-CreI variant comprising one or more mutations selected from the group consisting of:
Step a1) or b1) may comprise the introduction of additional mutations, particularly at other positions contacting the DNA target sequence or interacting directly or indirectly with said DNA target.
This step may be performed by generating a library of variants as described in the International PCT Application WO 2004/067736.
The combination of mutations in step c1) may be performed by amplifying overlapping fragments comprising each of the two subdomains, according to well-known overlapping PCR techniques.
The selection and/or screening in step a2), b2) or c2) may be performed by using a cleavage assay in vitro or in vivo, as described in the International PCT Application WO 2004/067736.
According to another advantageous embodiment of said method, step a2), b2), and/or c2) are performed in vivo, under conditions where the double-strand break in the mutated DNA target sequence which is generated by said variant leads to the activation of a positive selection marker or a reporter gene, or the inactivation of a negative selection marker or a reporter gene, by recombination-mediated repair of said DNA double-strand break.
For example, the cleavage activity of the variant of the invention may be measured by a direct repeat recombination assay, in yeast or mammalian cells, using a reporter vector, as described in the PCT Application WO 2004/067736. The reporter vector comprises two truncated, non-functional copies of a reporter gene (direct repeats) and a chimeric DNA target sequence within the intervening sequence, cloned in a yeast or a mammalian expression vector. The chimeric DNA target sequence is made of the combination of the different parts of each initial variant half-site. Expression of the variant results in a functional endonuclease which is able to cleave the chimeric DNA target sequence. This cleavage induces homologous recombination between the direct repeats, resulting in a functional reporter gene, whose expression can be monitored by appropriate assay.
According to another advantageous embodiment of said method, it comprises a further step d1) of expressing one variant obtained in step c2), so as to allow the formation of homodimers. Said homodimers are able to cleave a palindromic or pseudo-palindromic chimeric target sequence comprising two different parts, each from one of the two initial variants half-sites (
According to another advantageous embodiment of said method, it comprises a further step d′1) of co-expressing one variant obtained in step c2) and a wild-type LAGLIDADG homing endonuclease or a functional variant thereof, so as to allow the formation of heterodimers. Preferably, two different variants obtained in step c2) are co-expressed. Said heterodimers are able to cleave a non-palindromic chimeric target sequence comprising four different parts (A, B, C′, D′;
For example, host cells may be modified by one or two recombinant expression vector(s) encoding said variant(s). The cells are then cultured under conditions allowing the expression of the variant(s) and the homodimers/heterodimers which are formed are then recovered from the cell culture.
According to the method of the invention, single-chain chimeric endonucleases may be constructed by the fusion of one variant obtained in step c2) with a homing endonuclease domain/monomer. Said domain/monomer may be from a wild-type homing endonuclease or a functional variant thereof.
Methods for constructing single-chain chimeric molecules derived from homing endonucleases are well-known in the art (Epinat et al., Nucleic Acids Res., 2003, 31, 2952-62; Chevalier et al., Mol. Cell., 2002, 10, 895-905; Steuer et al., Chembiochem., 2004, 5, 206-13; International PCT Applications WO 03/078619 and WO 2004/031346). Any such method may be applied for constructing single-chain chimeric endonucleases derived from the variants as defined in the present invention.
The subject matter of the present invention is also a LAGLIDADG homing endonuclease variant obtainable by the method as defined above.
In a first preferred embodiment of said variant, it is an I-CreI variant having at least two substitutions, one in each of the two subdomains situated from positions 26 to 40 and 44 to 77 of I-CreI, respectively.
In a more preferred embodiment, said substitution(s) in the subdomain situated from positions 44 to 77 of I-CreI are in positions 44, 68, 70, 75 and/or 77.
In another more preferred embodiment, said substitution(s) in the functional subdomain situated from positions 26 to 40 of I-CreI are in positions 26, 28, 30, 32, 33, 38 and/or 40 of I-CreI.
In another more preferred embodiment of said variant, it has at least one first substitution in positions 28 to 40 of I-CreI and one second substitution in positions 44 to 70 of I-CreI.
Preferably, said variant has amino acid residues in positions 44, 68 and 70, which are selected from the group consisting of: A44/A68/A70, A44/A68/G70, A44/A68/H70, A44/A68/K70, A44/A68/N70, A44/A68/Q70, A44/A68/R70, A44/A68/S70, A44/A68/T70, A44/D68/H70, A44/D68/K70, A44/D68/R70, A44/G68/H70, A44/G68/K70, A44/G68/N70, A44/G68/P70, A44/G68/R70, A44/H68/A70, A44/H68/G70, A44/H68/H70, A44/H68/K70, A44/H68/N70, A44/H68/Q70, A44/H68/R70, A44/H68/S70, A44/H68/T70, A44/K68/A70, A44/K68/G70, A44/K68/H70, A44/K68/K70, A44/K68/N70, A44/K68/Q70, A44/K68/R70, A44/K68/S70, A44/K68/T70, A44/N68/A70, A44/N68/E70, A44/N68/G70, A44/N68/H70, A44/N68/K70, A44/N68/N70, A44/N68/Q70, A44/N68/R70, A44/N68/S70, A44/N68/T70, A44/Q68/A70, A44/Q68/D70, A44/Q68/G70, A44/Q68/H70, A44/Q68/N70, A44/Q68/R70, A44/Q68/S70, A44/R68/A70, A44/R68/D70, A44/R68/E70, A44/R68/G70, A44/R68/H70, A44/R68/K70, A44/R68/L70, A44/R68/N70, A44/R68/R70, A44/R68/S70, A44/R68/T70, A44/S68/A70, A44/S68/G70, A44/S68/K70, A44/S68/N70, A44/S68/Q70, A44/S68/R70, A44/S68/S70, A44/S68/T70, A44/T68/A70, A44/T68/G70, A44/T68/H70, A44/T68/K70, A44/T68/N70, A44/T68/Q70, A44/T68/R70, A44/T68/S70, A44/T68/T70, D44/D68/H70, D44/N68/S70, D44/R68/A70, D44/R68/K70, D44/R68/N70, D44/R68/Q70, D44/R68/R70, D44/R68/S70, D44/R68/T70, E44/H68/H70, E44/R68/A70, E44/R68/H70, E44/R68/N70, E44/R68/S70, E44/R68/T70, E44/S68/T70, G44/H68/K70, G44/Q68/H70, G44/R68/Q70, G44/R68/R70, G44/T68/D70, G44/T68/P70, G44/T68/R70, H44/A68/S70, H44/A68/T70, H44/R68/A70, H44/R68/D70, H44/R68/E70, H44/R68/G70, H44/R68/N70, H44/R68/R70, H44/R68/S70, H44/R68/T70, H44/S68/G70, H44/S68/S70, H44/S68/T70, H44/T68/S70, H44/T68/T70, K44/A68/A70, K44/A68/D70, K44/A68/E70, K44/A68/G70, K44/A68/H70, K44/A68/N70, K44/A68/Q70, K44/A68/S70, K44/A68/T70, K44/D68/A70, K44/D68/T70, K44/E68/G70, K44/E68/N70, K44/E68/S70, K44/G68/A70, K44/G68/G70, K44/G68/N70, K44/G68/S70, K44/G68/T70, K44/H68/D70, K44/H68/E70, K44/H68/G70, K44/H68/N70, K44/H68/S70, K44/H68/T70, K44/K68/A70, K44/K68/D70, K44/K68/H70, K44/K68/T70, K44/N68/A70, K44/N68/D70, K44/N68/E70, K44/N68/G70, K44/N68/H70, K44/N68/N70, K44/N68/Q70, K44/N68/S70, K44/N68/T70, K44/P68/H70, K44/Q68/A70, K44/Q68/D70, K44/Q68/E70, K44/Q68/S70, K44/Q68/T70, K44/R68/A70, K44/R68/D70, K44/R68/E70, K44/R68/G70, K44/R68/H70, K44/R68/N70, K44/R68/Q70, K44/R68/S70, K44/R68/T70, K44/S68/A70, K44/S68/D70, K44/S68/H70, K44/S68/N70, K44/S68/S70, K44/S68/T70, K44/T68/A70, K44/T68/D70, K44/T68/E70, K44/T68/G70, K44/T68/H70, K44/T68/N70, K44/T68/Q70, K44/T68/S70, K44/T68/T70, N44/A68/H70, N44/A68/R70, N44/H68/N70, N44/H68/R70, N44/K68/G70, N44/K68/H70, N44/K68/R70, N44/K68/S70, N44/N68/R70, N44/P68/D70, N44/Q68/H70, N44/Q68/R70, N44/R68/A70, N44/R68/D70, N44/R68/E70, N44/R68/G70, N44/R68/H70, N44/R68/K70, N44/R68/N70, N44/R68/R70, N44/R68/S70, N44/R68/T70, N44/S68/G70, N44/S68/H70, N44/S68/K70, N44/S68/R70, N44/T68/H70, N44/T68/K70, N44/T68/Q70, N44/T68/R70, N44/T68/S70, P44/N68/D70, P44/T68/T70, Q44/A68/A70, Q44/A68/H70, Q44/A68/R70, Q44/G68/K70, Q44/G68/R70, Q44/K68/G70, Q44/N68/A70, Q44/N68/H70, Q44/N68/S70, Q44/P68/P70, Q44/Q68/G70, Q44/R68/A70, Q44/R68/D70, Q44/R68/E70, Q44/R68/G70, Q44/R68/H70, Q44/R68/N70, Q44/R68/Q70, Q44/R68/S70, Q44/S68/H70, Q44/S68/R70, Q44/S68/S70, Q44/T68/A70, Q44/T68/G70, Q44/T68/H70, Q44/T68/R70, R44/A68/G70, R44/A68/T70, R44/G68/T70, R44/H68/D70, R44/H68/T70, R44/N68/T70, R44/R68/A70, R44/R68/D70, R44/R68/E70, R44/R68/G70, R44/R68/N70, R44/R68/Q70, R44/R68/S70, R44/R68/T70, R44/S68/G70, R44/S68/N70, R44/S68/S70, R44/S68/T70, S44/D68/K70, S44/H68/R70, S44/R68/G70, S44/R68/N70, S44/R68/R70, S44/R68/S70, T44/A68/K70, T44/A68/R70, T44/H68/R70, T44/K68/R70, T44/N68/P70, T44/N68/R70, T44/Q68/K70, T44/Q68/R70, T44/R68/A70, T44/R68/D70, T44/R68/E70, T44/R68/G70, T44/R68/H70, T44/R68/K70, T44/R68/N70, T44/R68/Q70, T44/R68/R70, T44/R68/S70, T44/R68/T70, T44/S68/K70, T44/S68/R70, T44/T68/K70, and T44/T68/R70.
Preferably, said variant has amino acids in positions 28, 30, 33, 38 and 40 respectively, which are selected from the group consisting of: QNYKR, RNKRQ, QNRRR, QNYKK, QNTQK, QNRRK, KNTQR, SNRSR, NNYQR, KNTRQ, KNSRE, QNNQK, SNYRK, KNSRD, KNRER, KNSRS, RNRDR, ANSQR, QNYRK, QNKRT, RNAYQ, KNRQE, NNSRK, NNSRR, QNYQK, QNYQR, SNRQR, QNRQK, ENRRK, KNNQA, SNYQK, TNRQR, QNTQR, KNRTQ, KNRTR, QNEDH, RNYNA, QNYTR, RNTRA, HNYDS, QNYRA, QNYAR, SNQAA, QNYEK, TNNQR, QNYRS, KNRQR, QNRAR, QNNQR, RNRER, KNRAR, KNTAA, KNRKA, RNAKS, KNRNA, TNESD, RNNQD, RNRYQ, KNYQN, KNRSS, KNRYA, ANNRK, KNRAT, KNRNQ, TNTQR, KNRQY, QNSRK, RNYQS, QNRQR, KNRAQ, ANRQR, KNRQQ, KNRQA, KNTAS, KAHRS, KHHRS, KDNHS, KESRS, KHTPS, KGHYS, KARQS, KSRGS, KSHHS, KNHRS, KRRES, KDGHS, KRHGS, KANQS, KDHKS, KKHRS, KQNQS, KQTQS, KGRQS, KRPGS, KRGNS, KNAQS, KNHNS, KHHAS, KRGSS, KSRQS, KTDHS, KHHQS, KADHS, KSHRS, KNRAS, KSHQS, KDAHS, KNHES, KDRTS, KDRSS, KAHQS, KRGTS, KNHSS, KQHQS, KNHGS, KNDQS, KNDQS, KDRGS, KNHAS, KHMAS, KSSHS, KGVAS, KSVQS, KDVHS, RDVQS, KGVQS, KGVTS, KGVHS, KGVRS, KGVGS, RAVGS, RDVRS, RNVQS, and NTVDS.
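Each five-letter code in this list gives, in order, the residues at positions 28, 30, 33, 38 and 40. As an illustration, the sketch below expands such a code into the slash-separated notation used elsewhere in this description (for example Q28/N30/Y33/K38/R40); the function name `expand_code` is hypothetical, and the length check is merely a convenience for spotting malformed entries.

```python
POSITIONS = (28, 30, 33, 38, 40)  # positions named in the paragraph above


def expand_code(code):
    """Expand a five-letter code such as 'QNYKR' into 'Q28/N30/Y33/K38/R40'."""
    if len(code) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} residues, got {len(code)}: {code!r}")
    return "/".join(f"{aa}{pos}" for aa, pos in zip(code, POSITIONS))


print(expand_code("QNYKR"))  # prints: Q28/N30/Y33/K38/R40
print(expand_code("KNSRE"))  # prints: K28/N30/S33/R38/E40
```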
In another more preferred embodiment, said variant cleaves a chimeric DNA target comprising a sequence having the formula:
$c_{-11}n_{-10}n_{-9}n_{-8}m_{-7}y_{-6}n_{-5}n_{-4}n_{-3}k_{-2}y_{-1}r_{+1}m_{+2}n_{+3}n_{+4}n_{+5}r_{+6}k_{+7}n_{+8}n_{+9}n_{+10}g_{+11}$ (I),
wherein n is a, t, c, or g, m is a or c, y is c or t, k is g or t, r is a or g (SEQ ID NO: 2), providing that when $n_{-10}n_{-9}n_{-8}$ is aaa and $n_{-5}n_{-4}n_{-3}$ is gtc then $n_{+8}n_{+9}n_{+10}$ is different from ttt and $n_{+3}n_{+4}n_{+5}$ is different from gac, and when $n_{+8}n_{+9}n_{+10}$ is ttt and $n_{+3}n_{+4}n_{+5}$ is gac then $n_{-10}n_{-9}n_{-8}$ is different from aaa and $n_{-5}n_{-4}n_{-3}$ is different from gtc.
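For illustration, formula (I) can be read as a 22-letter degenerate pattern running from position −11 to position +11 and translated into a regular expression using the letter definitions given above. The sketch below is a hedged example: the names `FORMULA_I` and `matches_formula_i` are hypothetical, and the proviso (the excluded triplet combinations) is deliberately not encoded.

```python
import re

# Letter definitions as given in the text (n, m, y, k, r), plus the fixed bases.
DEGENERATE = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "n": "[acgt]", "m": "[ac]", "y": "[ct]", "k": "[gt]", "r": "[ag]",
}

# Formula (I), positions -11 to +11, written without the position subscripts.
FORMULA_I = "cnnnmynnnkyrmnnnrknnng"
FORMULA_I_RE = re.compile("".join(DEGENERATE[ch] for ch in FORMULA_I))


def matches_formula_i(seq):
    """Return True if a 22-nt sequence fits the degenerate pattern of formula (I).

    The proviso attached to formula (I) is NOT checked here.
    """
    return FORMULA_I_RE.fullmatch(seq.lower()) is not None


# Toy 22-mer built around the triplets named in the proviso (aaa, gtc, gac, ttt);
# the bases at the remaining positions are an illustrative choice.
print(matches_formula_i("caaaacgtcgtacgacgttttg"))  # prints: True
print(matches_formula_i("gaaaacgtcgtacgacgttttg"))  # prints: False (position -11 is not c)
```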
According to the invention, said chimeric DNA target may be palindromic, pseudopalindromic or non-palindromic. Preferably, the nucleotide sequence from positions −11 to −8 and +8 to +11 and/or the nucleotide sequence from positions −5 to −3 and/or +3 to +5 are palindromic.
More preferably, for cleaving a chimeric DNA target, wherein $n_{-4}$ is t or $n_{+4}$ is a, said variant has a glutamine (Q) in position 44.
More preferably, for cleaving a chimeric DNA target, wherein $n_{-4}$ is a or $n_{+4}$ is t, said variant has an alanine (A) or an asparagine (N) in position 44; the I-CreI variants comprising A44, R68, S70 or A44, R68, S70, N75 are examples of such variants.
More preferably, for cleaving a chimeric DNA target, wherein $n_{-4}$ is c or $n_{+4}$ is g, said variant has a lysine (K) in position 44; the I-CreI variants comprising K44, R68, E70 or K44, R68, E70, N75 are examples of such variants.
More preferably, for cleaving a chimeric DNA target, wherein $n_{-9}$ is g or $n_{+9}$ is c, said variant has an arginine (R) or a lysine (K) in position 38. The I-CreI variants having the following amino acid residues in positions 28, 30, 33, 38 and 40, respectively, are examples of such variants: Q28/N30/Y33/K38/R40, R28/N30/K33/R38/Q40, Q28/N30/R33/R38/R40, Q28/N30/Y33/K38/K40, K28/N30/T33/R38/Q40, K28/N30/S33/R38/E40, S28/N30/Y33/R38/K40, K28/N30/S33/R38/D40, K28/N30/S33/R38/S40, Q28/N30/Y33/R38/K40, Q28/N30/K33/R38/T40, N28/N30/S33/R38/K40, N28/N30/S33/R38/R40, E28/N30/R33/R38/K40, R28/N30/T33/R38/A40, Q28/N30/Y33/R38/A40, Q28/N30/Y33/R38/S40, K28/N30/R33/K38/A40, R28/N30/A33/K38/S40, A28/N30/N33/R38/K40, Q28/N30/S33/R38/K40, K28/A30/H33/R38/S40, K28/H30/H33/R38/S40, K28/E30/S33/R38/S40, K28/N30/H33/R38/S40, K28/D30/H33/K38/S40, K28/E30/H33/R38/S40, K28/S30/H33/R38/S40, and K28/G30/V33/R38/S40.
More preferably, said DNA target comprises a nucleotide triplet in positions −10 to −8, which is selected from the group consisting of: aac, aag, aat, acc, acg, act, aga, agc, agg, agt, ata, atg, cag, cga, cgg, ctg, gac, gag, gat, gaa, gcc, gga, ggc, ggg, ggt, gta, gtg, gtt, tac, tag, tat, taa, tcc, tga, tgc, tgg, tgt or ttg, and/or a nucleotide triplet in positions +8 to +10, which is the reverse complementary sequence of said nucleotide triplet in positions −10 to −8.
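The relationship between the triplet in positions −10 to −8 and the triplet in positions +8 to +10 is a reverse complement, which also gives a direct test for whether a whole target is palindromic. The following sketch illustrates both; the function names are hypothetical, and the 22-mer used in the example is a toy sequence chosen for illustration.

```python
# Standard Watson-Crick complement for lowercase DNA letters.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complementary sequence of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A DNA target is palindromic when it equals its own reverse complement."""
    return seq == reverse_complement(seq)


# Triplet at -10 to -8 and the matching triplet expected at +8 to +10.
print(reverse_complement("aac"))  # prints: gtt
print(reverse_complement("cgg"))  # prints: ccg

# Toy 22-mer used only as an example of a palindromic target.
print(is_palindromic("caaaacgtcgtacgacgttttg"))  # prints: True
```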
In a second preferred embodiment of said variant, it is an I-MsoI variant having at least two substitutions, one in each of the two subdomains situated from positions 30 to 43 and 47 to 75 of I-MsoI, respectively.
Furthermore, other residues may be mutated on the entire sequence of the parent LAGLIDADG homing endonuclease, and in particular in the C-terminal half of said sequence. For example, the substitutions in the C-terminal half of I-CreI (positions 80 to 163) are preferably in positions: 80, 82, 85, 86, 87, 94, 96, 100, 103, 114, 115, 117, 125, 129, 131, 132, 147, 151, 153, 154, 155, 157, 159 and 160 of I-CreI.
The variants of the invention may include one or more residues inserted at the NH2 terminus and/or COOH terminus of the parent LAGLIDADG homing endonuclease sequence. For example, a methionine residue is introduced at the NH2 terminus, a tag (epitope or polyhistidine sequence) is introduced at the NH2 terminus and/or COOH terminus; said tag is useful for the detection and/or the purification of said polypeptide.
The variants of the invention may be either a monomer or single-chain chimeric endonuclease comprising two LAGLIDADG homing endonuclease domains in a single polypeptide, or a homodimer or heterodimer comprising two such domains in two separate polypeptides. According to the invention, one or both monomer(s)/domain(s) may be mutated in the two subdomains as defined above. One monomer/domain may be from a parent LAGLIDADG homing endonuclease or a functional variant thereof.
According to another preferred embodiment of the invention, said variant is a monomer, a single-chain chimeric molecule or a heterodimer, wherein both LAGLIDADG homing endonuclease domains comprise mutations in at least two separate subdomains, as defined above, said mutations in one domain being different from those in the other domain.
The subject-matter of the present invention is also a polynucleotide fragment encoding a variant or a mutated domain thereof, as defined above; said polynucleotide may encode one domain of a monomer, one monomer of a homodimer or heterodimer, or two domains of a monomer or single-chain molecule, as defined above.
The subject-matter of the present invention is also a recombinant vector comprising at least one polynucleotide fragment encoding a variant, as defined above. Said vector may comprise a polynucleotide fragment encoding the monomer of a homodimeric variant or the two domains of a monomeric variant or a single-chain molecule. Alternatively, said vector may comprise two different polynucleotide fragments, each encoding one of the monomers of a heterodimeric variant.
One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”.
A vector according to the present invention comprises, but is not limited to, a YAC (yeast artificial chromosome), a BAC (bacterial artificial chromosome), a baculovirus vector, a phage, a phagemid, a cosmid, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. In general, expression vectors of utility in recombinant DNA techniques are often in the form of “plasmids” which refer generally to circular double-stranded DNA loops which, in their vector form, are not bound to the chromosome. Large numbers of suitable vectors are known to those of skill in the art.
Viral vectors include retrovirus, adenovirus, parvovirus (e.g. adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g. measles and Sendai), positive strand RNA viruses such as picornavirus and alphavirus, and double-stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, fowlpox and canarypox). Other viruses include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example.
Vectors can comprise selectable markers, for example: neomycin phosphotransferase, histidinol dehydrogenase, dihydrofolate reductase, hygromycin phosphotransferase, herpes simplex virus thymidine kinase, adenosine deaminase, glutamine synthetase, and hypoxanthine-guanine phosphoribosyl transferase for eukaryotic cell culture; TRP1 for S. cerevisiae; tetracycline, rifampicin or ampicillin resistance in E. coli.
Preferably said vectors are expression vectors, wherein the sequence(s) encoding the variant of the invention is placed under control of appropriate transcriptional and translational control elements to permit production or synthesis of said variant. Therefore, said polynucleotide is comprised in an expression cassette. More particularly, the vector comprises a replication origin, a promoter operatively linked to said encoding polynucleotide, a ribosome-binding site, an RNA-splicing site (when genomic DNA is used), a polyadenylation site and a transcription termination site. It can also comprise an enhancer. Selection of the promoter will depend upon the cell in which the polypeptide is expressed. Preferably, when said variant is a heterodimer, the two polynucleotides encoding each of the monomers are included in one vector which is able to drive the expression of both polynucleotides simultaneously.
According to another advantageous embodiment of said vector, it includes a targeting construct comprising sequences sharing homologies with the region surrounding the chimeric DNA target sequence as defined above.
More preferably, said targeting DNA construct comprises:
a) sequences sharing homologies with the region surrounding the chimeric DNA target sequence as defined above, and
b) sequences to be introduced flanked by sequences as in a).
The invention also concerns a prokaryotic or eukaryotic host cell which is modified by a polynucleotide or a vector as defined above, preferably an expression vector.
The invention also concerns a non-human transgenic animal or a transgenic plant, characterized in that all or part of their cells are modified by a polynucleotide or a vector as defined above.
As used herein, a cell refers to a prokaryotic cell, such as a bacterial cell, or eukaryotic cell, such as an animal, plant or yeast cell.
The polynucleotide sequence(s) encoding the variant as defined in the present invention may be prepared by any method known by the man skilled in the art. For example, they are amplified from a cDNA template, by polymerase chain reaction with specific primers. Preferably the codons of said cDNA are chosen to favour the expression of said protein in the desired expression system.
The recombinant vector comprising said polynucleotides may be obtained and introduced in a host cell by the well-known recombinant DNA and genetic engineering techniques.
The variant of the invention is produced by expressing the polypeptide(s) as defined above; preferably said polypeptide(s) are expressed or co-expressed in a host cell modified by one or two expression vector(s), under conditions suitable for the expression or co-expression of the polypeptides, and the variant is recovered from the host cell culture.
The subject-matter of the present invention is further the use of a variant, one or two polynucleotide(s), preferably included in expression vector(s), a cell, a transgenic plant, a non-human transgenic mammal, as defined above, for molecular biology, for in vivo or in vitro genetic engineering, and for in vivo or in vitro genome engineering, for non-therapeutic purposes.
Non-therapeutic purposes include for example (i) gene targeting of specific loci in cell packaging lines for protein production, (ii) gene targeting of specific loci in crop plants, for strain improvements and metabolic engineering, (iii) targeted recombination for the removal of markers in genetically modified crop plants, (iv) targeted recombination for the removal of markers in genetically modified microorganism strains (for antibiotic production for example).
According to an advantageous embodiment of said use, it is for inducing a double-strand break in a site of interest comprising a chimeric DNA target sequence, thereby inducing a DNA recombination event, a DNA loss or cell death.
According to the invention, said double-strand break is for: repairing a specific sequence, modifying a specific sequence, restoring a functional gene in place of a mutated one, attenuating or activating an endogenous gene of interest, introducing a mutation into a site of interest, introducing an exogenous gene or a part thereof, inactivating or detecting an endogenous gene or a part thereof, translocating a chromosomal arm, or leaving the DNA unrepaired and degraded.
According to another advantageous embodiment of said use, said variant, polynucleotide(s), vector, cell, transgenic plant or non-human transgenic mammal are associated with a targeting DNA construct as defined above.
The subject-matter of the present invention is also a method of genetic engineering, characterized in that it comprises a step of double-strand nucleic acid breaking in a site of interest located on a vector comprising a chimeric DNA target as defined hereabove, by contacting said vector with a variant as defined above, thereby inducing a homologous recombination with another vector presenting homology with the sequence surrounding the cleavage site of said variant.
The subject-matter of the present invention is also a method of genome engineering, characterized in that it comprises the following steps: 1) double-strand breaking a genomic locus comprising at least one chimeric DNA target of a variant as defined above, by contacting said target with said variant; 2) maintaining said broken genomic locus under conditions appropriate for homologous recombination with a targeting DNA construct comprising the sequence to be introduced in said locus, flanked by sequences sharing homologies with the targeted locus.
The subject-matter of the present invention is also a method of genome engineering, characterized in that it comprises the following steps: 1) double-strand breaking a genomic locus comprising at least one chimeric DNA target of a variant as defined above, by contacting said cleavage site with said variant; 2) maintaining said broken genomic locus under conditions appropriate for homologous recombination with chromosomal DNA sharing homologies to regions surrounding the cleavage site.
The subject-matter of the present invention is also a composition characterized in that it comprises at least one variant, one or two polynucleotide(s), preferably included in expression vector(s), as defined above.
In a preferred embodiment of said composition, it comprises a targeting DNA construct comprising the sequence which repairs the site of interest flanked by sequences sharing homologies with the targeted locus.
The subject-matter of the present invention is also the use of at least one variant, one or two polynucleotide(s), preferably included in expression vector(s), as defined above, for the preparation of a medicament for preventing, improving or curing a genetic disease in an individual in need thereof, said medicament being administered by any means to said individual.
The subject-matter of the present invention is also a method for preventing, improving or curing a genetic disease in an individual in need thereof, said method comprising at least the step of administering to said individual a composition as defined above, by any means.
The subject-matter of the present invention is also the use of at least one variant, one or two polynucleotide(s), preferably included in expression vector(s), as defined above, for the preparation of a medicament for preventing, improving or curing a disease caused by an infectious agent that presents a DNA intermediate, in an individual in need thereof, said medicament being administered by any means to said individual.
The subject-matter of the present invention is also a method for preventing, improving or curing a disease caused by an infectious agent that presents a DNA intermediate, in an individual in need thereof, said method comprising at least the step of administering to said individual a composition as defined above, by any means.
The subject-matter of the present invention is also the use of at least one variant, one or two polynucleotide(s), preferably included in expression vector(s), as defined above, in vitro, for inhibiting the propagation, inactivating or deleting an infectious agent that presents a DNA intermediate, in biological derived products or products intended for biological uses or for disinfecting an object.
The subject matter of the present invention is also a method for decontaminating a product or a material from an infectious agent that presents a DNA intermediate, said method comprising at least the step of contacting a biological derived product, a product intended for biological use or an object, with a composition as defined above, for a time sufficient to inhibit the propagation, inactivate or delete said infectious agent.
In a particular embodiment, said infectious agent is a virus. For example said virus is an adenovirus (Ad11, Ad21), herpesvirus (HSV, VZV, EBV, CMV, herpesvirus 6, 7 or 8), hepadnavirus (HBV), papovavirus (HPV), poxvirus or retrovirus (HTLV, HIV).
The subject-matter of the present invention is also the use of at least one homing endonuclease variant, as defined above, as a scaffold for making other meganucleases. For example a third round of mutagenesis and selection/screening can be performed on said variants, for the purpose of making novel, third generation homing endonucleases.
According to another advantageous embodiment of said uses, said homing endonuclease variant is associated with a targeting DNA construct as defined above.
The use of the homing endonuclease variant and the methods of using said homing endonuclease variant according to the present invention include also the use of the single-chain chimeric endonuclease derived from said variant, the polynucleotide(s), vector, cell, transgenic plant or non-human transgenic mammal encoding said variant or single-chain chimeric endonuclease, as defined above.
In addition to the preceding features, the invention further comprises other features which will emerge from the description which follows, which refers to examples illustrating the I-CreI meganuclease variants and their uses according to the invention, as well as to the appended drawings in which:
The method for producing meganuclease variants and the assays based on cleavage-induced recombination in mammal or yeast cells, which are used for screening variants with altered specificity, are described in the International PCT Application WO 2004/067736 and Epinat et al., Nucleic Acids Res., 2003, 31, 2952-2962. These assays result in a functional LacZ reporter gene which can be monitored by standard methods (
A) Material and methods
I-CreI scaffold protein open reading frames were synthesized, as described previously (Epinat et al., N.A.R., 2003, 31, 2952-2962). The I-CreI scaffold proteins include wild-type I-CreI, I-CreI D75N (I-CreI N75), I-CreI R70S, D75N (I-CreI S70 N75), I-CreI I24V, R70S, D75N (I-CreI V24 S70 N75), and I-CreI I24V, R70S (I-CreI V24 S70). Combinatorial libraries were derived from the I-CreI scaffold proteins by replacing different combinations of residues potentially involved in the interactions with the bases in positions ±3 to 5 of one DNA target half-site (Q44, R68, R70, D75 and I77). The diversity of the meganuclease libraries was generated by PCR using degenerated primers harboring a unique degenerated codon at each of the selected positions. For example, mutation D75N was introduced by replacing codon 75 with aac. Then, PCR on the I-CreI N75 cDNA template was performed using primers from Sigma harboring codon VVK (18 codons, amino acids ADEGHKNPQRST) at positions 44, 68 and 70. The final PCR product was digested with specific restriction enzymes, and cloned back into the I-CreI ORF digested with the same restriction enzymes, in pCLS0542. In this 2 micron-based replicative vector marked with the LEU2 gene, I-CreI variants are under the control of a galactose inducible promoter (Epinat et al., precited). After electroporation in E. coli, 7×10⁴ clones were obtained, representing 12 times the theoretical diversity at the DNA level (18³ = 5832).
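The codon and amino acid counts quoted for the VVK degenerate codon can be checked with a short sketch. It assumes the standard IUPAC meanings (V = A, C or G; K = G or T; N = any base) and the standard genetic code; the function name `expand_degenerate_codon` is illustrative and does not come from the cited protocols.

```python
from itertools import product

# IUPAC nucleotide codes needed for the degenerate codons (assumed standard meanings).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "V": "ACG", "K": "GT", "N": "ACGT"}

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand_degenerate_codon(codon):
    """Return the concrete codons and the sorted set of residues they encode."""
    codons = ["".join(c) for c in product(*(IUPAC[b] for b in codon.upper()))]
    amino_acids = "".join(sorted({CODON_TABLE[c] for c in codons}))
    return codons, amino_acids


codons, amino_acids = expand_degenerate_codon("VVK")
dna_diversity = len(codons) ** 3      # three randomized positions (44, 68 and 70)
coverage = 7e4 / dna_diversity        # clones obtained / DNA-level diversity
print(len(codons), amino_acids, dna_diversity, round(coverage, 1))
```

Under these assumptions the sketch yields 18 codons encoding the twelve residues ADEGHKNPQRST, a DNA-level diversity of 5832, and a coverage of about 12-fold for 7×10⁴ clones.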
The C1221 twenty-four bp palindrome (tcaaaacgtcgtacgacgttttga, SEQ ID NO: 1) is a repeat of the half-site of the nearly palindromic natural I-CreI target (tcaaaacgtcgtgagacagtttgg, SEQ ID NO: 24). C1221 is cleaved as efficiently as the I-CreI natural target in vitro and ex vivo in both yeast and mammalian cells. The 64 palindromic targets were derived from C1221 as follows: 64 pairs of oligonucleotides (ggcatacaagtttcaaaacnnngtacnnngttttgacaatcgtctgtca (SEQ ID NO: 25) and reverse complementary sequences) were ordered from Sigma, annealed and cloned into pGEM-T Easy (PROMEGA) in the same orientation. Next, a 400 bp PvuII fragment was excised and cloned into the yeast vector pFL39-ADH-LACURAZ, also called pCLS0042, and the mammalian vector pcDNA3.1-LACURAZ-ΔURA, both described previously (Epinat et al., 2003, precited), resulting in 64 yeast reporter vectors (target plasmids).
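As an illustration of how the 64 palindromic targets relate to C1221, the following sketch enumerates them in silico. It assumes that, within each oligonucleotide, the second nnn is the reverse complement of the first, which is what makes each 24 bp target palindromic; the function names are illustrative only.

```python
from itertools import product

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_5nnn_targets():
    """Map each triplet at positions -5 to -3 to its 24 bp palindromic target."""
    targets = {}
    for triplet in map("".join, product("acgt", repeat=3)):
        # Positions +3 to +5 carry the reverse complement of positions -5 to -3.
        targets[triplet] = ("tcaaaac" + triplet + "gtac"
                            + reverse_complement(triplet) + "gttttga")
    return targets


C1221 = "tcaaaacgtcgtacgacgttttga"
targets = palindromic_5nnn_targets()
print(len(targets), targets["gtc"] == C1221)
```

With the triplet gtc the sketch returns C1221 itself, and every generated sequence equals its own reverse complement.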
Alternatively, double-stranded target DNA, generated by PCR amplification of the single stranded oligonucleotides, was cloned using the Gateway protocol (INVITROGEN) into yeast and mammalian reporter vectors.
The library of meganuclease expression variants was transformed into the leu2 mutant haploid yeast strain FYC2-6A: alpha, trp1Δ63, leu2Δ1, his3Δ200. A classical chemical/heat shock protocol derived from Gietz and Woods (Methods Enzymol., 2002, 350, 87-96), that routinely gives 10⁶ independent transformants per μg of DNA, was used for transformation. Individual transformant (Leu+) clones were picked into 96-well microplates. 13824 colonies were picked using a colony picker (QpixII, GENETIX), and grown in 144 microtiter plates.
The 64 target plasmids were transformed using the same protocol, into the haploid yeast strain FYBL2-7B: a, ura3Δ851, trp1Δ63, leu2Δ1, lys2Δ202, resulting in 64 tester strains.
Meganuclease expressing clones were mated with each of the 64 target strains, and diploids were tested for beta-galactosidase activity, by using the screening assay illustrated on
The clones showing an activity against at least one target were isolated (first screening). The spotting density was then reduced to 4 spots/cm² and each positive clone was tested against the 64 reporter strains in quadruplicate, thereby creating complete profiles (secondary screening).
The open reading frame (ORF) of positive clones identified during the primary and/or secondary screening in yeast was amplified by PCR on yeast colonies, by using the pair of primers: ggggacaagtttgtacaaaaaagcaggcttcgaaggagatagaaccatggccaataccaaatataacaaagagttcc (SEQ ID NO: 26) and ggggaccactttgtacaagaaagctgggatagtcggccgccggggaggatttcttcttctcgc (SEQ ID NO: 27) from PROLIGO. Briefly, a yeast colony was picked, resuspended in 100 μl of LGlu liquid medium and cultured overnight. After centrifugation, the yeast pellet was resuspended in 10 μl of sterile water and used to perform a PCR reaction in a final volume of 50 μl containing 1.5 μl of each specific primer (100 pmol/μl). The PCR conditions were one cycle of denaturation for 10 minutes at 94° C., 35 cycles of denaturation for 30 s at 94° C., annealing for 1 min at 55° C., extension for 1.5 min at 72° C., and a final extension for 5 min. The resulting PCR products were then sequenced.
f) Re-Cloning of primary hits
The open reading frames (ORFs) of positive clones identified during the primary screening were recloned using the Gateway protocol (Invitrogen). ORFs were amplified by PCR on yeast colonies, as described in e). PCR products were then cloned in: (i) yeast gateway expression vector harboring a galactose inducible promoter, LEU2 or KanR as selectable marker and a 2 micron origin of replication, and (ii) a pET 24d(+) vector from NOVAGEN. Resulting clones were verified by sequencing (MILLEGEN).
I-CreI is a dimeric homing endonuclease that cleaves a 22 bp pseudo-palindromic target. Analysis of I-CreI structure bound to its natural target has shown that in each monomer, eight residues establish direct interactions with seven bases (Jurica et al., 1998, precited). Residues Q44, R68, R70 contact three consecutive base pairs at position 3 to 5 (and −3 to −5,
In a first library, the I-CreI scaffold was mutated from D75 to N to decrease likely energetic strains caused by the replacement of the basic residues R68 and R70 in the library that satisfy the hydrogen-acceptor potential of the buried D75 in the I-CreI structure. The D75N mutation did not affect the protein structure, but decreased the toxicity of I-CreI in overexpression experiments. Next, positions 44, 68 and 70 were randomized.
In a second library, the I-CreI scaffold was mutated from R70 to S and I24 to V (I-CreI V24, S70); these mutations did not affect the protein structure. Next, positions 44, 68, 75 and 77 were randomized.
64 palindromic targets resulting from substitutions in positions ±3, ±4 and ±5 of a palindromic target cleaved by I-CreI (Chevalier et al., 2003, precited) were generated, as described in
A robot-assisted mating protocol was used to screen a large number of meganucleases from our library. The general screening strategy is described in
The results from the library of I-CreI N75 mutants having variation at positions 44, 68 and 70 are detailed hereafter. 13,824 meganuclease expressing clones (about 2.3-fold the theoretical diversity) were spotted at high density (20 spots/cm²) on nylon filters and individually tested against each one of the 64 target strains (884,608 spots). 2100 clones showing an activity against at least one target were isolated (
The 350 validated clones showed very diverse patterns. Some of these new profiles shared some similarity with the wild type scaffold whereas many others were totally different. Various examples are shown on
Clustering was done using hclust from the R package, on the quantitative data from the primary, low density screening. Both variants and targets were clustered using standard hierarchical clustering with Euclidean distance and Ward's method (Ward, J. H., American Stat. Assoc., 1963, 58, 236-244). Mutant and target dendrograms were reordered to optimize positions of the clusters and the mutant dendrogram was cut at the height of 8 to define the clusters.
Next, hierarchical clustering was used to determine whether families could be identified among the numerous and diverse cleavage patterns of the variants. Since primary and secondary screening gave congruent results, quantitative data from the first round of yeast low density screening was used for analysis, to permit a larger sample size. Both variants and targets were clustered using standard hierarchical clustering with Euclidean distance and Ward's method (Ward, J. H., precited) and seven clusters were defined (
1 frequencies according to the cleavage index, as described in FIG. 6b
2 in each position, residues present in more than ⅓ of the cluster are indicated
For each cluster, a set of preferred targets could be identified on the basis of the frequency and intensity of the signal (
Analysis of the residues found in each cluster showed strong biases for position 44: Q is overwhelmingly represented in clusters 1 and 2, whereas A and N are more frequent in clusters 3 and 4, and K in clusters 6 and 7. Meanwhile, these biases were correlated with strong base preferences for DNA positions ±4, with a large majority of t:a base pairs in clusters 1 and 2, a:t in clusters 3, 4 and 5, and c:g in clusters 6 and 7 (see Table I). The structure of I-CreI bound to its target shows that residue Q44 interacts with the bottom strand in position −4 (and the top strand of position +4, see
The 75 hybrid target sequences were cloned as follows: oligonucleotides were designed that contained two different half sites of each mutant palindrome (PROLIGO). Double-stranded target DNA, generated by PCR amplification of the single stranded oligonucleotides, was cloned using the Gateway protocol (INVITROGEN) into yeast and mammalian reporter vectors. Yeast reporter vectors were transformed into S. cerevisiae strain FYBL2-7B (MATα, ura3Δ851, trp1Δ63, leu2Δ1, lys2Δ202).
Variants are homodimers capable of cleaving palindromic sites. To test whether the list of cleavable targets could be extended by creating heterodimers that would cleave hybrid cleavage sites (as described in
Altogether, a total of 112 combinations of 14 different proteins were tested in yeast, and 37.5% of the combinations (42/112) revealed a positive signal on their predicted chimeric target. Quantitative data are shown for six examples on
The variants are generated according to the experimental procedures described in example 1.
I-CreI wt (I-CreI D75), I-CreI D75N (I-CreI N75) and I-CreI S70 N75 open reading frames were synthesized, as described previously (Epinat et al., N.A.R., 2003, 31, 2952-2962). Combinatorial libraries were derived from the I-CreI N75, I-CreI D75 and I-CreI S70 N75 scaffolds, by replacing different combinations of residues, potentially involved in the interactions with the bases in positions ±8 to 10 of one DNA target half-site (Q26, K28, N30, S32, Y33, Q38 and S40). The diversity of the meganuclease libraries was generated by PCR using degenerated primers harboring a unique degenerated codon at each of the selected positions.
Mutation D75N was introduced by replacing codon 75 with aac. Then, the three codons at positions N30, Y33 and Q38 (Ulib4 library) or K28, N30 and Q38 (Ulib5 library) were replaced by a degenerated codon VVK (18 codons) coding for 12 different amino acids (A,D,E,G,H,K,N,P,Q,R,S,T). In consequence, the maximal (theoretical) diversity of these protein libraries was 12³ or 1728. However, in terms of nucleic acids, the diversity was 18³ or 5832.
In Lib4, ordered from BIOMETHODES, an arginine in position 70 of the I-CreI N75 scaffold was first replaced with a serine (R70S). Then positions 28, 33, 38 and 40 were randomized. The regular amino acids (K28, Y33, Q38 and S40) were replaced with one out of 10 amino acids (A,D,E,K,N,Q,R,S,T,Y). The resulting library has a theoretical complexity of 10000 in terms of proteins.
In addition, small libraries of complexity 225 (15²) resulting from the randomization of only two positions were constructed in an I-CreI N75 or I-CreI D75 scaffold, using the NVK degenerate codon (24 codons, amino acids ACDEGHKNPQRSTWY).
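The theoretical library sizes quoted above follow from raising the number of allowed residues (or codons) to the number of randomized positions. The sketch below recomputes them; the dictionary layout and function name are illustrative, and the DNA-level figure for the two-position NVK libraries (24²) is derived here from the quoted codon count rather than stated in the text.

```python
# (number of allowed amino acids, number of randomized positions), as quoted in the text.
LIBRARIES = {
    "Ulib4": (12, 3),
    "Ulib5": (12, 3),
    "Lib4": (10, 4),
    "two-position NVK libraries": (15, 2),
}


def theoretical_diversity(alphabet_size, positions):
    """Number of distinct combinations when each position varies independently."""
    return alphabet_size ** positions


protein_diversity = {name: theoretical_diversity(size, n)
                     for name, (size, n) in LIBRARIES.items()}

# DNA-level diversity uses the codon count of the degenerate codon instead.
vvk_dna_diversity = theoretical_diversity(18, 3)   # VVK, three positions
nvk_dna_diversity = theoretical_diversity(24, 2)   # NVK, two positions (derived)

for name, value in protein_diversity.items():
    print(name, value)
```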
Fragments carrying combinations of the desired mutations were obtained by PCR, using a pair of degenerated primers coding for 10, 12 or 15 different amino acids, and as DNA template, the I-CreI N75 (
The 64 palindromic targets derived from C1221 were constructed as described in example 1, by using 64 pairs of oligonucleotides (ggcatacaagtttcnnnacgtcgtacgacgtnnngacaatcgtctgtca (SEQ ID NO: 28) and reverse complementary sequences).
The open reading frame (ORF) of positive clones identified during the first and/or secondary screening in yeast was amplified by PCR on yeast colonies using primers: PCR-Gal10-F (gcaactttagtgctgacacatacagg, SEQ ID NO: 29) and PCR-Gal10-R (acaaccttgattgcagacttgacc, SEQ ID NO: 30).
All analyses of protein structures were realized using Pymol. The structures from I-CreI correspond to pdb entry 1g9y. Residue numbering in the text always refers to these structures, except for residues in the second I-CreI protein domain of the homodimer where residue numbers were set as for the first domain.
I-CreI is a dimeric homing endonuclease that cleaves a 22 bp pseudo-palindromic target. Analysis of I-CreI structure bound to its natural target has shown that in each monomer, eight residues establish direct interactions with seven bases (Jurica et al., 1998, precited). According to these structural data, the bases of the nucleotides in positions ±8 to 10 establish direct contacts with I-CreI amino-acids N30, Y33, Q38 and indirect contacts with I-CreI amino-acids K28 and S40 (
An exhaustive protein library vs. target library approach was undertaken to engineer locally this part of the DNA binding interface. Randomization of 5 amino acid positions would lead to a theoretical diversity of 20⁵ = 3.2×10⁶. However, libraries with lower diversity were generated by randomizing 2, 3 or 4 residues at a time, resulting in a diversity of 225 (15²), 1728 (12³) or 10,000 (10⁴). This strategy allowed an extensive screening of each of these libraries against the 64 palindromic 10NNN DNA targets using a yeast based assay described previously (Epinat et al., 2003, precited and International PCT Application WO 2004/067736) and whose principle is described in
First, the I-CreI scaffold was mutated from D75 to N. The D75N mutation did not affect the protein structure, but decreased the toxicity of I-CreI in overexpression experiments.
Next the Ulib4 library was constructed: residues 30, 33 and 38, were randomized, and the regular amino acids (N30, Y33, and Q38) replaced with one out of 12 amino acids (A,D,E,G,H,K,N,P,Q,R,S,T). The resulting library has a complexity of 1728 in terms of protein (5832 in terms of nucleic acids).
Then, two other libraries were constructed: Ulib5 and Lib4. In Ulib5, residues 28, 30 and 38 were randomized, and the regular amino acids (K28, N30, and Q38) replaced with one out of 12 amino acids (ADEGHKNPQRST). The resulting library has a complexity of 1728 in terms of protein (5832 in terms of nucleic acids). In Lib4, an Arginine in position 70 was first replaced with a Serine. Then, positions 28, 33, 38 and 40 were randomized, and the regular amino acids (K28, Y33, Q38 and S40) replaced with one out of 10 amino acids (A,D,E,K,N,Q,R,S,T,Y). The resulting library has a complexity of 10000 in terms of proteins.
In a primary screening experiment, 20000 clones from Ulib4, 10000 clones from Ulib5 and 20000 clones from Lib4 were mated with each one of the 64 tester strains, and diploids were tested for beta-galactosidase activity. All clones displaying cleavage activity with at least one out of the 64 targets were tested in a second round of screening against the 64 targets, in quadruplicate, and each cleavage profile was established, as shown on
After secondary screening and sequencing of positives over the entire coding region, a total of 1484 unique mutants were isolated showing a cleavage activity against at least one target. Different patterns could be observed.
Altogether, this large collection of mutants allowed the targeting of all of the 64 possible DNA sequences differing at positions ±10, ±9, and ±8 (
Thus, hundreds of novel variants were obtained, including mutants with novel substrate specificity; these variants can keep high levels of activity and the specificity of the novel proteins can be even narrower than that of the wild-type protein for its target.
Hierarchical clustering was used to establish potential correlations between specific protein residues and target bases, as previously described (Arnould et al., J. Mol. Biol., 2006, 355, 443-458). Clustering was done on the quantitative data from the secondary screening, using hclust from the R package. Variants were clustered using standard hierarchical clustering with Euclidean distance and Ward's method (Ward, J. H., American Statist. Assoc., 1963, 58, 236-244). The mutant dendrogram was cut at the height of 17 to define the clusters. For the analysis, the cumulated intensity of cleavage of a target within a cluster was calculated as the sum of the cleavage intensities of all of the cluster's mutants with this target, normalized to the sum of the cleavage intensities of all of the cluster's mutants with all targets.
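The normalization described for the cumulated intensities can be written out as a short sketch. The data layout (a nested dictionary of mutant to target to intensity), the function name and the example values are all hypothetical and serve only to illustrate the calculation.

```python
def cumulated_intensities(cleavage, cluster):
    """Per-target cleavage intensity summed over a cluster, normalized to the cluster total.

    cleavage: {mutant: {target: intensity}} (hypothetical layout)
    cluster:  iterable of mutant names belonging to one cluster
    """
    totals = {}
    for mutant in cluster:
        for target, intensity in cleavage[mutant].items():
            totals[target] = totals.get(target, 0.0) + intensity
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No cleavage at all in this cluster: report zero for every target.
        return {target: 0.0 for target in totals}
    return {target: value / grand_total for target, value in totals.items()}


# Invented intensities for two mutants and two targets.
example = {
    "mutant_1": {"aaa": 3.0, "aac": 1.0},
    "mutant_2": {"aaa": 3.0, "aac": 1.0},
}
frequencies = cumulated_intensities(example, ["mutant_1", "mutant_2"])
print(frequencies)
```

By construction, the values returned for one cluster sum to 1, so they can be read as target frequencies within that cluster.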
Ten different mutant clusters were identified (Table II).
1 Target and base frequencies correspond to cumulated intensity of cleavage (as described in Materials and Methods).
2 In each position, residues present in more than 15% of the cluster are indicated
Analysis of the residues found in each cluster showed strong biases for all randomized positions. None of the residues is mutated in all libraries used in this study, and the residues found in the I-CreI scaffold were expected to be overrepresented. Indeed, K28, N30 and S40 were the most frequent residues in all 10 clusters, and no conclusion for DNA/protein interactions can really be inferred. However, Y33 was the most represented residue only in clusters 7, 8 and 10, whereas strong occurrence of other residues, such as H, R, G, T, C, P or S, was observed in the seven other clusters. The wild type Q38 residue was overrepresented in all clusters but one, R and K being more frequent in cluster 4.
Meanwhile, strong correlations were observed between the nature of residues 33 and 38 and substrate discrimination at positions ±10 and ±9 of the target.
Prevalence of Y33 was associated with high frequencies of adenine (74.9% and 64.3% in clusters 7 and 10, respectively), and this correlation was also observed, although to a lesser extent, in clusters 4, 5 and 8. H33 or R33 were correlated with a guanine (63.0%, 56.3% and 58.5%, in clusters 1, 4 and 5, respectively) and T33, C33 or S33 with a thymine (45.6% and 56.3% in clusters 3 and 9, respectively). G33 was relatively frequent in cluster 2, the cluster with the most even base representation in ±10. These results are consistent with the observations of Seligman and collaborators (Nucleic Acids Res., 2002, 30, 3870-3879), who showed previously that a Y33R or Y33H mutation shifted the specificity of I-CreI toward a guanine and Y33C, Y33T, Y33S (and also Y33L) towards a thymine in position ±10.
In addition, R38 and K38 were associated with an exceptionally high frequency of guanine in cluster 4, while in all the other clusters, the wild type Q38 residue was overrepresented, as well as an adenine in ±9 of the target.
The structure of I-CreI bound to its target (Chevalier et al., 2003, precited; Jurica et al., 1998, precited) has shown that Y33 and Q38 contact two adenines in −10 and −9 (
This example shows that an I-CreI target can be separated in two parts, bound by different subdomains, behaving independently. In the I-CreI DNA target, positions ±5, ±4 and ±3 are bound by residues 44, 68 and 70 (
In order to determine if positions ±5 to ±3 and ±10 to ±8 are bound by two different, independent functional subdomains, mutants with altered specificity in the ±5 to ±3 region, but still binding C1221, were assayed for their cleavage properties in the ±10 to ±8 region.
All analyses of protein structures were realized using Pymol. The structures from I-CreI correspond to pdb entry 1g9y. Residue numbering in the text always refers to these structures, except for residues in the second I-CreI protein domain of the homodimer where residue numbers were set as for the first domain.
Mutants were generated as described in example 1, by mutating positions 44, 68, 70 and 75, and screening for clones able to cleave C1221 derived targets. Mutant expressing plasmids were transformed into S. cerevisiae strain FYC2-6A (MATα, trp1Δ63, leu2Δ1, his3Δ200).
The 64 palindromic targets derived from C1221 by mutation in ±5 to ±3 were constructed as described in example 1, by using 64 pairs of oligonucleotides (ggcatacaagtttcaaaacnnngtacnnngttttgacaatcgtctgtca (SEQ ID NO:31) and reverse complementary sequences).
Mating was performed as described in example 1, using a low gridding density (about 4 spots/cm²).
B) Results
64 targets corresponding to all possible palindromic targets derived from C1221 were constructed by mutagenesis of bases ±10 to ±8, as shown on
As shown on
The objective here is to determine whether it is possible to combine separable functional subdomains in the I-CreI DNA-binding interface, in order to cleave novel DNA targets.
The identification of distinct groups of mutations in the I-CreI coding sequence that alter the cleavage specificity towards two different regions of the C1221 target sequence (10NNN (positions −10 to −8 and +8 to +10: ±8 to 10 or ±10 to 8; example 4) and 5NNN (positions −5 to −3 and +3 to +5: ±3 to 5 or ±5 to 3; example 1)) raises the possibility of combining these two groups of mutants intramolecularly to generate a combinatorial mutant capable of cleaving a target sequence simultaneously altered at positions 10NNN and 5NNN (
Positions 28, 30, 33, 38 and 40 on the one hand, and 44, 68 and 70 on the other hand, are on the same DNA-binding fold, and there is no structural evidence that they should behave independently. However, the two sets of mutations are clearly on two spatially distinct regions of this fold (
Therefore, a model non-palindromic target sequence that would be a patchwork of four cleaved 5NNN and 10NNN targets was designed. This target, COMB1, differs from the C1221 consensus sequence at positions ±3, ±4, ±5, ±8, ±9 and ±10 (
Throughout the text and figures, combinatorial mutants for COMB sequences are named with an eight-letter code, after residues at positions 28, 30, 33, 38, 40, 44, 68 and 70 (for example, NNSRK/AAR stands for I-CreI 28N30N33S38R40K44A68A70R75N). Parental controls are named with a five-letter or three-letter code, after residues at positions 28, 30, 33, 38 and 40 (NNSRK stands for I-CreI 28N30N33S38R40K70S75N) or 44, 68 and 70 (AAR stands for I-CreI 44A68A70R75N).
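By way of illustration only, the naming convention can be expressed as a small decoder. This is a sketch, not part of the described method: the function names are hypothetical, and the default 75N suffix is an assumption reflecting the D75N scaffold used for these mutants.

```python
FIRST_SET = (28, 30, 33, 38, 40)  # positions named by the five-letter code
SECOND_SET = (44, 68, 70)         # positions named by the three-letter code


def decode_variant(code):
    """Map an eight-letter code such as 'NNSRK/AAR' to {position: residue}."""
    first, second = code.split("/")
    if len(first) != len(FIRST_SET) or len(second) != len(SECOND_SET):
        raise ValueError(f"expected a 5+3 letter code, got {code!r}")
    residues = dict(zip(FIRST_SET, first))
    residues.update(zip(SECOND_SET, second))
    return residues


def long_name(code, scaffold="75N"):
    """Spell out the full variant name; the scaffold suffix is an assumption."""
    residues = decode_variant(code)
    body = "".join(f"{pos}{aa}" for pos, aa in sorted(residues.items()))
    return "I-CreI " + body + scaffold


print(long_name("NNSRK/AAR"))  # I-CreI 28N30N33S38R40K44A68A70R75N
```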
All target sequences described in these examples are 22 or 24 bp palindromic sequences. Therefore, they will be described only by the first 11 or 12 nucleotides, followed by the suffix _P, solely to indicate that the sequence is palindromic (for example, target 5′ tcaaaacgtcgtacgacgttttga 3′ (SEQ ID NO:1), cleaved by the I-CreI protein, will be called tcaaaacgtcgt_P).
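By way of illustration only, a shortened name carrying the _P suffix can be expanded back into the full palindromic target by appending the reverse complement of the half-site. The helper name below is hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def expand_palindrome(name):
    """Expand a half-site name such as 'tcaaaacgtcgt_P' to the full target."""
    if not name.endswith("_P"):
        raise ValueError(f"{name!r} does not carry the _P suffix")
    half = name[:-2]
    # Complement each base, then reverse, to obtain the mirrored right half.
    return half + half.translate(COMPLEMENT)[::-1]


print(expand_palindrome("tcaaaacgtcgt_P"))  # tcaaaacgtcgtacgacgttttga
```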
Basically, four series of mutations in the I-CreI monomer were obtained as described in examples 1 and 4, respectively. In a first step, a D75N mutation was introduced in the I-CreI scaffold, in order to decrease likely energetic strains caused by the replacement of the basic residues R68 and R70 in the library that satisfy the hydrogen-acceptor potential of the buried D75 in the I-CreI structure.
In this example, mutants able to cleave the 10NNN part (tctggacgtcgt_P target (SEQ ID NO:37)) of COMB2 were obtained by mutagenesis of positions 28, 30, 33 or 28, 33, 38 and 40 (Table III), and mutants able to cleave the 5NNN part (tcaaaacgacgt_P (SEQ ID NO:38)) of COMB2 were obtained by mutagenesis of positions 44, 68 and 70 (Table III).
In example 8, mutants able to cleave the 10NNN part (tcgatacgtcgt_P (SEQ ID NO:44)) of COMB3 were obtained by mutagenesis of positions 28, 30, 33 or 28, 33, 38 and 40 (Table IV), and mutants able to cleave the 5NNN part (tcaaaaccctgt_P (SEQ ID NO:45)) of COMB3 were obtained by mutagenesis of positions 44, 68 and 70 (Table IV).
Then, for each combined target (COMB2 or COMB3), mutations at positions 28, 30, 33, 38 and 40 from mutants cleaving 10NNN targets were combined with mutations at positions 44, 68 and 70 from mutants cleaving 5NNN targets, and the ability of the resulting combinatorial mutants to cleave the appropriate target sequence, COMB2 (tctggacgacgt_P (SEQ ID NO:39); this example) or COMB3 (tcgataccctgt_P (SEQ ID NO:46); example 8), was assayed.
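By way of illustration only, the way a combined target is assembled from a 10NNN target and a 5NNN target can be sketched as follows. The sketch assumes 12-nucleotide half-sites numbered −12 to −1, so that the first seven nucleotides (−12 to −6) come from the 10NNN target and the last five (−5 to −1) from the 5NNN target; the function name is hypothetical.

```python
def combine_half_sites(target_10nnn, target_5nnn):
    """Combine two _P half-site names into the combined _P target name."""
    for name in (target_10nnn, target_5nnn):
        if not name.endswith("_P") or len(name) != 14:
            raise ValueError(f"expected a 12-nucleotide _P name, got {name!r}")
    half_10 = target_10nnn[:-2]
    half_5 = target_5nnn[:-2]
    # Positions -12 to -6 (including 10NNN) + positions -5 to -1 (including 5NNN).
    return half_10[:7] + half_5[7:] + "_P"


comb2 = combine_half_sites("tctggacgtcgt_P", "tcaaaacgacgt_P")
comb3 = combine_half_sites("tcgatacgtcgt_P", "tcaaaaccctgt_P")
print(comb2)  # tctggacgacgt_P
print(comb3)  # tcgataccctgt_P
```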
In order to generate an I-CreI coding sequence containing mutations derived from different libraries (amino acids 28,30,33,38,40 and 44,68,70 or 44,68,70,75,77), separate overlapping PCR reactions were carried out that amplify the 5′ end (aa positions 1-43) or the 3′ end (positions 39-167) of the I-CreI coding sequence (
Targets were cloned as described in example 1.
c) Mating of Homing Endonuclease Expressing Clones and Screening in Yeast
Mating of homing endonuclease expressing clones and screening in yeast was performed as described in example 1, using a high gridding density (about 20 spots/cm²).
I-CreI mutants cleaving tctggacgtcgt_P (SEQ ID NO:37) and tcaaaacgacgt_P (SEQ ID NO:38) were identified as described in examples 1 and 4. Three variants, mutated in positions 30, 33, 38, 40 and 70, capable of cleaving the sequence tctggacgtcgt_P (SEQ ID NO:37; Table III) were combined with 31 different variants, mutated in positions 44, 68 and 70, capable of cleaving the sequence tcaaaacgacgt_P (SEQ ID NO:38; Table III). Both sets of proteins are mutated in position 70. However, the hypothesis of two separable functional subdomains implies that this position has little impact on the specificity in ±10 to ±8. Therefore, in the combined protein, only the residues 30, 33, 38 and 40 from the first set of proteins were used, residue 70 being picked from the second set of proteins.
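By way of illustration only, the combination rule can be sketched with variants represented as mappings from position to residue. The two parents shown are the NNSRK and AAR controls defined by the naming convention of this example; they are used here only to show the rule and are not stated to be among the variants combined above. The function name is hypothetical.

```python
FIRST_SUBDOMAIN = (28, 30, 33, 38, 40)


def combine(parent_10nnn, parent_5nnn):
    """Take first-subdomain residues from the 10NNN parent, the rest from the 5NNN parent."""
    combined = dict(parent_5nnn)  # positions 44, 68, 70 and 75 come from the 5NNN parent
    for pos in FIRST_SUBDOMAIN:
        if pos in parent_10nnn:   # position 70 of the 10NNN parent is deliberately ignored
            combined[pos] = parent_10nnn[pos]
    return combined


nnsrk = {28: "N", 30: "N", 33: "S", 38: "R", 40: "K", 70: "S", 75: "N"}
aar = {44: "A", 68: "A", 70: "R", 75: "N"}

combo = combine(nnsrk, aar)
name = "".join(f"{pos}{aa}" for pos, aa in sorted(combo.items()))
print(name)  # 28N30N33S38R40K44A68A70R75N

# Three 10NNN parents crossed with 31 5NNN parents give 93 combinations.
n_combinations = 3 * 31
print(n_combinations)
```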
The resulting 93 mutants were assayed for cleavage in yeast containing a LacZ assay with the combined target sequence COMB2 (tctggacgacgt_P: SEQ ID NO:39). Thirty-two combined mutants were capable of cleaving the target (Table III and
These results indicate that combining mutations at positions 28, 30, 33, 38, 40 and 44, 68, 70 can give rise to functional endonucleases with the expected specificity for approximately 30% of the tested combinations. This study identifies residues 28-40 on the one hand, and 44-70 on the other hand, as part of two separable DNA-binding subdomains (
1 Mutations identified in I-CreI N75 variants cleaving the chosen 5GAC target.
2 Mutations identified in I-CreI S70 N75 variants cleaving the chosen 10TGG target.
The experimental procedures are described in example 7.
Seven variants mutated in positions 28, 33, 38, 40 and 70, and capable of cleaving the sequence tcgatacgtcgt_P (SEQ ID NO:44, Table IV) were combined with 30 different variants mutated in positions 44, 68 and 70, and capable of cleaving the sequence tcaaaaccctgt_P (SEQ ID NO:45, Table IV). Mutations in position 70 are found in both sets of proteins. However, the hypothesis of two separable functional subdomains implies that this position has little impact on the specificity in ±10 to ±8. Therefore, in the combined protein, only the residues 30, 33, 38 and 40 from the first set of proteins were used, residue 70 being picked from the second set of proteins.
The resulting 210 mutants were assayed for cleavage in yeast containing a LacZ assay with the combined target sequence COMB3 (tcgataccctgt_P (SEQ ID NO:46)). Seventy-seven combined mutants were capable of cleaving the target (Table IV). Cleavage of the combined target sequence is specific to the combinatorial mutant as each of the parent mutants was unable to cleave the combined sequence. In addition, while the parental mutants displayed efficient cleavage of the 5NNN and 10NNN target sequences, all combinatorial mutants displayed no significant activity for these sequences or for the original C1221 sequence.
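By way of illustration only, the fraction of functional combinations can be recomputed from the counts reported for COMB2 (32 of 93, example 7) and COMB3 (77 of 210, this example). The variable names are hypothetical.

```python
# (number of 10NNN parents, number of 5NNN parents, active combined mutants)
results = {
    "COMB2": (3, 31, 32),
    "COMB3": (7, 30, 77),
}

success_rates = {}
for target, (n_10nnn, n_5nnn, active) in results.items():
    tested = n_10nnn * n_5nnn  # every pairwise combination was assayed
    success_rates[target] = round(100 * active / tested, 1)
    print(f"{target}: {active}/{tested} = {success_rates[target]}%")
```

Both values fall in the range of roughly one third of the tested combinations.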
These results indicate that combining mutations at positions 28, 30, 33, 38, 40 and 44, 68, 70 can give rise to functional endonucleases with the expected specificity for approximately 30% of the tested combinations. This study identifies residues 28-40 on the one hand, and 44-70 on the other hand, as part of two separable DNA-binding subdomains (
1 Mutations identified in I-CreI N75 variants cleaving the chosen 5CCT target.
2 Mutations identified in I-CreI S70 N75 variants cleaving the chosen 10GAT target.
3 Mutations identified in an I-CreI N75 variant cleaving the chosen 10GAT target.
The objective here is to determine whether it is possible to identify and combine separable functional subdomains in the I-CreI DNA-binding interface, in order to cleave novel DNA targets. All target sequences described in this example are 24 bp palindromic sequences. Therefore, they will be described only by the first 12 nucleotides, followed by the suffix _P, solely to indicate that the sequence is palindromic (for example, target 5′ tcaaaacgtcgtacgacgttttga 3′ (SEQ ID NO:1), cleaved by the I-CreI protein, will be called tcaaaacgtcgt_P).
Two series of mutations in the I-CreI monomer were obtained as described in examples 1 and 4. In a first step, a D75N mutation was introduced in the I-CreI scaffold, in order to decrease likely energetic strains caused by the replacement of the basic residues R68 and R70 in the library that satisfy the hydrogen-acceptor potential of the buried D75 in the I-CreI structure. Then mutants able to cleave the tcaacacgtcgt_P (SEQ ID NO:47) target were obtained by mutagenesis of positions 28, 30, 33 or 28, 33, 38 and 40 (Table V), and mutants able to cleave tcaaaaccctgt_P (SEQ ID NO:48) were obtained by mutagenesis of positions 44, 68 and 70 (Table V).
Positions 28, 30, 33, 38 and 40 on the one hand, and 44, 68 and 70 on the other hand, are on the same DNA-binding fold, and there is no structural evidence that they should behave independently. However, the two sets of mutations are clearly on two spatially distinct regions of this fold (
The experimental procedures are described in example 7.
Five variants, mutated in positions 28, 30, 33, 38, 40 and 70, and capable of cleaving the sequence tcaacacgtcgt_P (SEQ ID NO:47, Table V) were combined with 34 different variants mutated in positions 44, 68 and 70, and capable of cleaving the sequence tcaaaaccctgt_P (SEQ ID NO:48, Table V). Mutations in position 70 are found in both sets of proteins. However, the hypothesis of two separable functional subdomains implies that this position has little impact on the specificity in ±10 to ±8. Therefore, in the combined protein, only the residues 30, 33, 38 and 40 from the first set of proteins were used, residue 70 being picked from the second set of proteins. The resulting 170 mutants were assayed for cleavage in yeast containing a LacZ assay with the combined target sequence tcaacaccctgt_P (SEQ ID NO:49). Thirty-seven combined mutants were capable of cleaving the target (
I-CreI and I-CreI N75 are indicated as references. Amino-acid residues are indicated only when different from I-CreI.
The objective here is to determine whether it is possible to identify and combine separable functional subdomains in the I-CreI DNA-binding interface, in order to cleave novel DNA targets. All target sequences described in this example are 24 bp palindromic sequences. Therefore, they will be described only by the first 12 nucleotides, followed by the suffix _P, solely to indicate that the sequence is palindromic (for example, target 5′ tcaaaacgtcgtacgacgttttga 3′ (SEQ ID NO:1), cleaved by the I-CreI protein, will be called tcaaaacgtcgt_P).
Two series of mutations in the I-CreI monomer were obtained as described in examples 1 and 4. In a first step, a D75N mutation was introduced in the I-CreI scaffold, in order to decrease likely energetic strains caused by the replacement of the basic residues R68 and R70 in the library that satisfy the hydrogen-acceptor potential of the buried D75 in the I-CreI structure. Then mutants able to cleave the tcaacacgtcgt_P target (SEQ ID NO:50) were obtained by mutagenesis of positions 28, 30, 33 or 28, 33, 38 and 40 (Table VI), and mutants able to cleave tcaaaactttgt_P (SEQ ID NO:51) were obtained by mutagenesis of positions 44, 68 and 70 (Table VI).
Positions 28, 30, 33, 38 and 40 on the one hand, and 44, 68 and 70 on the other hand, are on the same DNA-binding fold, and there is no structural evidence that they should behave independently. However, the two sets of mutations are clearly on two spatially distinct regions of this fold (
The experimental procedures are described in example 7.
Five variants mutated in positions 28, 30, 33, 40 and 70, and capable of cleaving the sequence tcaacacgtcgt_P (SEQ ID NO:50) were combined with 29 different variants mutated in positions 44, 68 and 70, and capable of cleaving the sequence tcaaaactttgt_P (SEQ ID NO:51). Mutations in position 70 are found in both sets of proteins. However, the hypothesis of two separable functional subdomains implies that this position has little impact on the specificity in ±10 to ±8. Therefore, in the combined protein, only the residues 30, 33, 38 and 40 from the first set of proteins were used, residue 70 being picked from the second set of proteins. The resulting 145 mutants were assayed for cleavage in yeast containing a LacZ assay with the combined target sequence tcaacactttgt_P (SEQ ID NO:52). Twenty-three active combined mutants were identified. However, for all of them, one parental mutant also cleaved the target. Nevertheless, this demonstrates a large degree of freedom between the two sets of mutations. Combined mutants capable of cleaving the target were capable of cleaving the combined sequence as individual mutants (
I-CreI and I-CreI N75 are indicated as references. Amino-acid residues are indicated only when different from I-CreI.
Novel I-CreI variants were expressed, purified, and analyzed for in vitro cleavage as reported previously (Arnould et al., precited). Circular dichroism (CD) measurements were performed on a Jasco J-810 spectropolarimeter using a 0.2 cm path length quartz cuvette. Equilibrium unfolding was induced by increasing the temperature at a rate of 1° C./min (using a programmable Peltier thermoelectric). Samples were prepared by dialysis against 25 mM potassium phosphate buffer, pH 7.5, at protein concentrations of 20 μM.
Four combinatorial mutants cleaving COMB2 or COMB3, and their corresponding parent mutants were analyzed in vitro in order to compare their relative cleavage efficiencies. As can be observed in
Importantly, the differences in activity levels between mutants were also consistent with the variations observed in yeast, and this congruency was further confirmed by the in vitro study of 4 additional mutants cleaving COMB3. Thus, the variations of signal observed in yeast are not due to differences in expression levels, but really reflect differences in binding and/or cleavage properties.
Finally, analysis of the structure and stability of this group of combinatorial mutants was performed using far-UV CD (
The experimental procedures are described in example 3.
To determine if combinatorial mutants could function efficiently as heterodimers, a subset of mutants capable of cleaving the palindromic sites COMB2 and COMB3 were co-expressed in yeast and assayed for their ability to cleave the chimeric site COMB1, corresponding to the fusion of the two half sites of the original targets (
Cleavage of the COMB1 target was also detected in vitro when the KNHQS/KAS and NNSRK/ARR purified proteins were incubated together with the COMB1 target in our conditions, while incubation of a single protein did not give rise to any detectable cleavage activity. However, the cleavage efficiency was extremely low, which might result from slow heterodimer formation in vitro. Indeed, Silva et al. showed that engineered derivatives of I-DmoI had to be coexpressed in E. coli to form active heterodimers (Nucleic Acids Res., 2004, 32, 3156-3168), and it is not clear whether I-CreI homodimers can exchange subunits easily. Actually, it cannot be excluded that low levels of cleavage could result from an alternative pathway, such as subsequent nicking by the two homodimers in solution, and we are currently investigating this issue.
Altogether, these results indicate that a combinatorial approach can generate artificial HEs capable of effectively cleaving chimeric target sites altered at positions 10NNN and 5NNN. The generation of collections of I-CreI derivatives allows today for cleavage of all 64 10NNN targets and 62 out of the 64 5NNN targets (our unpublished data). The ability to combine them intramolecularly as well as intermolecularly increases the number of attainable 22-mers to at least 1.57×10⁷ ((64×62)²).
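By way of illustration only, the estimate of attainable targets can be checked with a short calculation: each half-site combines one of 64 cleavable 10NNN motifs with one of 62 cleavable 5NNN motifs, and a heterodimer pairs two independently chosen half-sites. The variable names are hypothetical.

```python
n_10nnn = 64  # 10NNN targets cleaved by available variants
n_5nnn = 62   # 5NNN targets cleaved by available variants

half_sites = n_10nnn * n_5nnn     # intramolecular combination within one monomer
full_targets = half_sites ** 2    # intermolecular combination of two monomers

print(half_sites)                 # 3968
print(full_targets)               # 15745024
print(f"{full_targets:.2e}")      # 1.57e+07
```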
The experimental procedures are as described in examples 3 and 7, with the exception that the combinatorial mutants for the RAG target were generated as libraries, in contrast to the combinatorial mutants for the COMB targets (example 7) which were generated individually.
To analyse the effectiveness of a combinatorial approach for designing HEs for natural target sites, the human RAG1 gene was analysed for potential sites compatible with mutants present in the 10NNN and 5NNN libraries. RAG1 has been shown to form a complex with RAG2 that is responsible for the initiation of V(D)J recombination, an essential step in the maturation of immunoglobulins and T lymphocyte receptors (Oettinger et al., Science, 1990, 248, 1517-1523; Schatz et al., Cell, 1989, 59, 1035-1048). Patients with mutations in RAG1 display severe combined immune deficiency (SCID) due to the absence of T and B lymphocytes. SCID can be treated by allogeneic hematopoietic stem cell transfer from a familial donor, and recently certain types of SCID have been the subject of gene therapy trials (Fischer et al., Immunol. Rev., 2005, 203, 98-109).
Analysis of the genomic locus of RAG1 revealed a potential target site located 11 bp upstream of the coding exon of RAG1, that was called RAG1.1 (
In contrast with the mutants used for COMB targets, which were generated individually, mutants used for RAG targets were generated in libraries. For the RAG1.2 target sequence, a library with a putative complexity of 1300 mutants was generated. Screening of 2256 clones yielded 64 positives (2.8%), which after sequencing, turned out to correspond to 49 unique endonucleases. For RAG1.3, 2280 clones were screened, and 88 positives were identified (3.8%), corresponding to 59 unique endonucleases. In both cases, the combinatorial mutants were unable to cleave the 5NNN and 10NNN target sequences as well as the original C1221. In contrast with COMB mutants, which were generated and tested individually, RAG mutants were generated as libraries. Nevertheless, no obvious bias was detected in these libraries, and these frequencies should be representative of the real frequency of functional positives. This lower success rate, compared with screening with the COMB targets, could be due to the additional mutations at positions 75 and 77, or to the additional changes at positions ±6, ±7 and ±11 in these targets.
As for COMB1, a panel of mutants able to cleave the palindromic targets was then co-expressed in yeast to test the RAG1.1 target cleavage.
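By way of illustration only, the screening frequencies reported above can be recomputed, together with a rough estimate of how much of the RAG1.2 library was sampled. The coverage estimate assumes that every library member is equally likely to be picked, which is an assumption of this sketch and not a statement of the described method; the function names are hypothetical.

```python
def hit_rate(positives, screened):
    """Fraction of screened clones scored as positive."""
    return positives / screened


def expected_coverage(library_size, clones_screened):
    """Expected fraction of library members sampled at least once,
    assuming uniform, independent sampling (an assumption of this sketch)."""
    return 1.0 - (1.0 - 1.0 / library_size) ** clones_screened


rag1_2_rate = hit_rate(64, 2256)
rag1_3_rate = hit_rate(88, 2280)
rag1_2_coverage = expected_coverage(1300, 2256)

print(f"RAG1.2 positives: {rag1_2_rate:.2%}")
print(f"RAG1.3 positives: {rag1_3_rate:.2%}")
print(f"RAG1.2 library coverage (uniform sampling assumed): {rag1_2_coverage:.0%}")
```

Under this assumption, most but not all of the putative 1300 RAG1.2 combinations would have been sampled at least once.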
| | Number | Date | Country |
|---|---|---|---|
| Parent | 12091632 | Apr 2008 | US |
| Child | 13916716 | | US |