The present invention relates to RNA-guided recombineering-editing systems using phage recombination enzymes as well as methods, vectors, nucleic acid compositions, and kits thereof.
The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system, originally found in bacteria and archaea as part of the immune system to defend against invading viruses, forms the basis for genome editing technologies that can be programmed to target specific stretches of a genome or other DNA for editing at precise locations. While various CRISPR-based tools are available, the majority are geared towards editing short sequences. Long-sequence editing is highly sought after in the engineering of model systems, therapeutic cell production and gene therapy. Prior studies have developed technologies to improve Cas9-mediated homology-directed repair (HDR), and tools leveraging nucleic acid modification enzymes with Cas9, e.g., prime-editing, demonstrated editing up to 80 base-pairs (bp) in length. Despite this progress, there is continued demand for large-scale mammalian genome engineering with high efficiency and fidelity.
Provided herein are systems and methods that facilitate nucleic acid editing in a manner that allows large-scale nucleic acid editing with high accuracy and low off-target errors. These systems and methods employ a combination of microbial recombination components with CRISPR recombination components.
For example, disclosed herein are systems comprising a Cas protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, and a microbial recombination protein. The microbial recombination protein may be, for example, RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof. In some embodiments, the system further comprises donor DNA. In some embodiments, the target DNA sequence is a genomic DNA sequence in a host cell.
In some embodiments, the system further comprises a recruitment system comprising at least one aptamer sequence and an aptamer binding protein functionally linked to the microbial recombination protein as part of a fusion protein. In some embodiments, the aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence. In some embodiments, the RNA aptamer sequence is part of the nucleic acid molecule. In some embodiments, the nucleic acid molecule comprises two RNA aptamer sequences. In some embodiments, the microbial recombination protein is functionally linked to the aptamer binding protein as a fusion protein. In some embodiments, the binding protein comprises a MS2 coat protein, a lambda N22 peptide, or a functional derivative, fragment, or variant thereof. In some embodiments, the fusion protein further comprises a linker and/or a nuclear localization sequence.
Disclosed herein are compositions comprising a nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein. The microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof. The compositions may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence. In some embodiments, the nucleic acid molecule further comprises at least one RNA aptamer sequence. In some embodiments, the polynucleotide comprising a nucleic acid sequence encoding a Cas protein further comprises a sequence encoding at least one peptide aptamer sequence.
Also disclosed herein are vectors comprising a nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein. The microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof. The vectors may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence. In some embodiments, the nucleic acid molecule further comprises at least one RNA aptamer sequence. In some embodiments, the polynucleotide comprising a nucleic acid sequence encoding a Cas protein further comprises a sequence encoding at least one peptide aptamer sequence.
In some embodiments, the RecE and RecT recombination proteins are derived from E. coli. In some embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to amino acid sequences selected from the group consisting of SEQ ID NOs: 1-8. In some embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to the amino acid sequence of SEQ ID NO: 9.
In some embodiments, the Cas protein is Cas9 or Cas12a. In some embodiments, the Cas protein is catalytically dead. In some embodiments, the Cas9 protein is wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9. In some embodiments, the Cas9 protein is a Cas9 nickase (e.g., wild-type Streptococcus pyogenes Cas9 with an amino acid substitution at position 10 of D10A).
Also disclosed is a eukaryotic cell comprising the systems or vectors disclosed herein.
Further disclosed herein are methods of altering a target genomic DNA sequence in a host cell. The methods comprise contacting the systems, compositions, or vectors described herein with a target DNA sequence (e.g., introducing the systems, compositions, or vectors described herein into a host cell comprising a target genomic DNA sequence). Kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods are also disclosed herein.
Other aspects and embodiments of the disclosure will be apparent in light of the following detailed description and accompanying figures.
The present disclosure is directed to a system and the components for DNA editing. In particular, the disclosed system is based on CRISPR targeting and homology directed repair by phage recombination enzymes. The system results in superior recombination efficiency and accuracy at a kilobase scale.
To facilitate an understanding of the present technology, a number of terms and phrases are defined below. Additional definitions are set forth throughout the detailed description.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
For the recitation of numeric ranges herein, each intervening number there between with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. For example, any nomenclature used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those that are well known and commonly used in the art. The meaning and scope of the terms should be clear; in the event, however, of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
The terms “complementary” and “complementarity” refer to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base-pairing or other non-traditional types of pairing. The degree of complementarity between two nucleic acid sequences can be indicated by the percentage of nucleotides in a nucleic acid sequence which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 50%, 60%, 70%, 80%, 90%, and 100% complementary). Two nucleic acid sequences are “perfectly complementary” if all the contiguous nucleotides of a nucleic acid sequence will hydrogen bond with the same number of contiguous nucleotides in a second nucleic acid sequence. Two nucleic acid sequences are “substantially complementary” if the degree of complementarity between the two nucleic acid sequences is at least 60% (e.g., 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100%) over a region of at least 8 nucleotides (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides), or if the two nucleic acid sequences hybridize under at least moderate, preferably high, stringency conditions. Exemplary moderate stringency conditions include overnight incubation at 37° C. in a solution comprising 20% formamide, 5×SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C., or substantially similar conditions, e.g., the moderately stringent conditions described in Sambrook et al., infra.
High stringency conditions are conditions that use, for example (1) low ionic strength and high temperature for washing, such as 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50° C., (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin (BSA)/0.1% Ficoll/0.1% polyvinylpyrrolidone (PVP)/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride and 75 mM sodium citrate at 42° C., or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at (i) 42° C. in 0.2×SSC, (ii) 55° C. in 50% formamide, and (iii) 55° C. in 0.1×SSC (preferably in combination with EDTA). Additional details and an explanation of stringency of hybridization reactions are provided in, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (2001); and Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and John Wiley & Sons, New York (1994).
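By way of non-limiting illustration, the percentage-based definition of complementarity given above may be sketched in Python as follows. The function names, the restriction to equal-length ungapped sequences, and the treatment of only Watson-Crick pairs (excluding non-traditional pairing such as G-U wobble) are assumptions made for this sketch and are not part of the disclosure. The default thresholds of 60% and 8 nucleotides mirror the “substantially complementary” definition; the hybridization-based alternative is not modeled.

```python
# Watson-Crick partners only; "U" is accepted so RNA can be compared to DNA.
# Non-traditional pairs (e.g., G-U wobble) are deliberately not counted here.
WC_PAIRS = {
    ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions in seq_a that can pair with seq_b.

    Both inputs are written 5'->3'. seq_b is reversed so that the two
    strands are compared in antiparallel orientation. This sketch assumes
    equal-length sequences with no gaps or bulges.
    """
    a = seq_a.upper()
    b = seq_b.upper()[::-1]
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((x, y) in WC_PAIRS for x, y in zip(a, b))
    return 100.0 * paired / len(a)


def is_substantially_complementary(
    seq_a: str, seq_b: str, min_percent: float = 60.0, min_length: int = 8
) -> bool:
    """Apply the percentage/length test only (hybridization is not modeled)."""
    return (
        len(seq_a) >= min_length
        and percent_complementarity(seq_a, seq_b) >= min_percent
    )
```

For example, under these assumptions a sequence and its exact reverse complement score 100%, and a single mismatch in an 8-nucleotide region lowers the score to 87.5%.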
A cell has been “genetically modified,” “transformed,” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell. The presence of the exogenous DNA results in permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that comprise a population of daughter cells containing the transforming DNA. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis. A “cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.
As used herein, a “nucleic acid” or a “nucleic acid sequence” refers to a polymer or oligomer of pyrimidine and/or purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively. The present technology contemplates any deoxyribonucleotide, ribonucleotide, or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated, or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogenous or homogenous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states. In some embodiments, a nucleic acid or nucleic acid sequence comprises other kinds of nucleic acid structures such as, for instance, a DNA/RNA helix, peptide nucleic acid (PNA), morpholino nucleic acid (see, e.g., Braasch and Corey, Biochemistry, 41(14): 4503-4510 (2002) and U.S. Pat. No. 5,034,506, incorporated herein by reference), locked nucleic acid (LNA; see Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 97: 5633-5638 (2000), incorporated herein by reference), cyclohexenyl nucleic acids (see Wang, J. Am. Chem. Soc., 122: 8595-8602 (2000), incorporated herein by reference), and/or a ribozyme.
Hence, the term “nucleic acid” or “nucleic acid sequence” may also encompass a chain comprising non-natural nucleotides, modified nucleotides, and/or non-nucleotide building blocks that can exhibit the same function as natural nucleotides (e.g., “nucleotide analogs”); further, the term “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin, which may be single or double-stranded, and represent the sense or antisense strand. The terms “nucleic acid,” “polynucleotide,” “nucleotide sequence,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
A “peptide” or “polypeptide” is a linked sequence of two or more amino acids linked by peptide bonds. The peptide or polypeptide can be natural, synthetic, or a modification or combination of natural and synthetic. Polypeptides include proteins such as binding proteins, receptors, and antibodies. The proteins may be modified by the addition of sugars, lipids or other moieties not included in the amino acid chain. The terms “polypeptide” and “protein,” are used interchangeably herein.
As used herein, the term “percent sequence identity” refers to the percentage of nucleotides or nucleotide analogs in a nucleic acid sequence, or amino acids in an amino acid sequence, that is identical with the corresponding nucleotides or amino acids in a reference sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Hence, in case a nucleic acid according to the technology is longer than a reference sequence, additional nucleotides in the nucleic acid, that do not align with the reference sequence, are not taken into account for determining sequence identity. Methods and computer programs for alignment are well known in the art, including BLAST, Align 2, and FASTA.
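By way of non-limiting illustration, the percent sequence identity definition above may be sketched in Python for two sequences that have already been aligned (for example, by one of the alignment programs mentioned above). The function name, the use of “-” as the gap character, and the choice of denominator (the number of reference positions in the alignment) are assumptions made for this sketch; alignment programs may report identity over a different denominator, such as the full alignment length.

```python
def percent_identity(
    aligned_query: str, aligned_reference: str, gap: str = "-"
) -> float:
    """Percent identity of a query to a reference, given a pairwise alignment.

    Columns in which the reference has a gap correspond to additional query
    residues that do not align with the reference; they are skipped, in line
    with the definition above. A gap in the query opposite a reference
    residue counts as a non-identical position.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if r == gap:
            continue  # extra query residue: not taken into account
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("alignment contains no reference positions")
    return 100.0 * matches / compared
```

Under these assumptions, a query carrying one extra residue relative to an otherwise identical reference still scores 100%, while a query missing one of five reference residues scores 80%.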
A “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, e.g., an “insert,” may be attached or incorporated so as to bring about the replication of the attached segment in a cell.
The term “wild-type” refers to a gene or a gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “modified,” “mutant,” or “polymorphic” refers to a gene or gene product that displays modifications in sequence and/or functional properties (e.g., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.
In bacteria and archaea, CRISPR/Cas systems provide immunity by incorporating fragments of invading phage, virus, and plasmid DNA into CRISPR loci and using corresponding CRISPR RNAs (“crRNAs”) to guide the degradation of homologous sequences. Each CRISPR locus encodes acquired “spacers” that are separated by repeat sequences. Transcription of a CRISPR locus produces a “pre-crRNA,” which is processed to yield crRNAs containing spacer-repeat fragments that guide effector nuclease complexes to cleave dsDNA sequences complementary to the spacer. Three different types of CRISPR systems are known (type I, type II, and type III), classified based on the Cas protein type and the use of a proto-spacer-adjacent motif (PAM) for selection of proto-spacers in invading DNA. The endogenous type II systems comprise the Cas9 protein and two noncoding RNAs: trans-activating crRNA (tracrRNA) and a precursor crRNA (pre-crRNA) array containing nuclease guide sequences (also referred to as “spacers”) interspaced by identical direct repeats (DRs). tracrRNA is important for processing the pre-crRNA and formation of the Cas9 complex. First, tracrRNAs hybridize to repeat regions of the pre-crRNA. Second, endogenous RNaseIII cleaves the hybridized crRNA-tracrRNAs, and a second event removes the 5′ end of each spacer, yielding mature crRNAs that remain associated with both the tracrRNA and Cas9. Third, each mature complex locates a target double stranded DNA (dsDNA) sequence and cleaves both strands using the nuclease activity of Cas9.
CRISPR/Cas gene editing systems have been developed to enable targeted modifications to a specific gene of interest in eukaryotic cells. CRISPR/Cas gene editing systems are commonly based on the RNA-guided Cas9 nuclease from the type II prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system. Engineering CRISPR/Cas systems for use in eukaryotic cells typically involves reconstitution of the crRNA-tracrRNA-Cas9 complex. In human cells, for example, the Cas9 amino acid sequence may be codon-optimized and modified to include an appropriate nuclear localization signal, and the crRNA and tracrRNA sequences may be expressed individually or as a single chimeric molecule via an RNA polymerase II promoter. Typically, the crRNA and tracrRNA sequences are expressed as a chimera and are referred to collectively as “guide RNA” (gRNA) or single guide RNA (sgRNA). Thus, the terms “guide RNA,” “single guide RNA,” and “synthetic guide RNA,” are used interchangeably herein and refer to a nucleic acid sequence comprising a tracrRNA and a pre-crRNA array containing a guide sequence. The terms “guide sequence,” “guide,” and “spacer,” are used interchangeably herein and refer to the about 20 nucleotide sequence within a guide RNA that specifies the target site. In CRISPR/Cas9 systems, the guide RNA contains an approximate 20-nucleotide guide sequence followed by a protospacer adjacent motif (PAM) that directs Cas9 via Watson-Crick base pairing to a target sequence.
In some embodiments, the disclosure provides a system for RNA-guided recombineering utilizing tools from CRISPR gene editing systems. The system comprises: a Cas protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and a microbial recombination protein.
Cas protein families are described in further detail in, e.g., Haft et al., PLoS Comput. Biol., 1(6): e60 (2005), incorporated herein by reference. The Cas protein may be any Cas endonuclease. In some embodiments, the Cas protein is Cas9 or Cas12a, otherwise referred to as Cpf1. In one embodiment, the Cas9 protein is a wild-type Cas9 protein. The Cas9 protein can be obtained from any suitable microorganism, and a number of bacteria express Cas9 protein orthologs or variants. In some embodiments, the Cas9 is from Streptococcus pyogenes or Staphylococcus aureus. Cas9 proteins of other species are known in the art (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference) and may be used in connection with the present disclosure. The amino acid sequences of Cas proteins from a variety of species are publicly available through the GenBank and UniProt databases.
In some embodiments, the Cas9 protein is a Cas9 nickase (Cas9n). Wild-type Cas9 has two catalytic nuclease domains facilitating double-stranded DNA breaks. A Cas9 nickase protein is typically engineered through inactivating point mutation(s) in one of the catalytic nuclease domains causing Cas9 to nick or enzymatically break only one of the two DNA strands using the remaining active nuclease domain. Cas9 nickases are known in the art (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference) and include, for example, Streptococcus pyogenes Cas9 with point mutations at D10 or H840. In select embodiments, the Cas9 nickase is Streptococcus pyogenes Cas9n (D10A).
In some embodiments, the Cas protein is a catalytically dead Cas. For example, catalytically dead Cas9 is essentially a DNA-binding protein due to, typically, two or more mutations within its catalytic nuclease domains which leave the protein with very little or no catalytic nuclease activity. Streptococcus pyogenes Cas9 may be rendered catalytically dead by mutations of D10 and at least one of E762, H840, N854, N863, or D986, typically H840 and/or N863 (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference). Mutations in corresponding orthologs are known, such as N580 in Staphylococcus aureus Cas9. Oftentimes, such mutations cause catalytically dead Cas proteins to possess no more than 3% of the normal nuclease activity.
In some embodiments, the system comprises a nucleic acid molecule comprising a guide RNA sequence complementary to a target DNA sequence. The guide RNA sequence, as described above, specifies the target site with an approximate 20-nucleotide guide sequence followed by a protospacer adjacent motif (PAM) that directs Cas9 via Watson-Crick base pairing to a target sequence.
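By way of non-limiting illustration, the selection of candidate target sites consisting of an approximately 20-nucleotide sequence followed by a PAM may be sketched in Python as follows. The function names are hypothetical, and the sketch assumes a 20-nucleotide guide length, the 5′-NGG-3′ PAM commonly associated with Streptococcus pyogenes Cas9, and a search of only the strand supplied; other Cas proteins recognize different PAMs, and a practical design tool would also scan the opposite strand and score off-target sites.

```python
import re


def find_target_sites(
    dna: str, guide_length: int = 20, pam_pattern: str = "[ACGT]GG"
) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each candidate site on this strand.

    A zero-width lookahead is used so that overlapping candidate sites are
    all reported. The default PAM pattern (NGG) is an assumption.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{guide_length}}})({pam_pattern}))")
    return [
        (m.start(), m.group(1), m.group(2))
        for m in pattern.finditer(dna.upper())
    ]


def to_guide_rna_spacer(protospacer: str) -> str:
    """Write the guide (spacer) sequence as RNA by replacing T with U.

    The PAM itself is part of the target DNA and is not included.
    """
    return protospacer.upper().replace("T", "U")


# Usage: list candidate sites in a short, made-up sequence.
if __name__ == "__main__":
    example = "A" * 20 + "TGG" + "CCC"
    for start, protospacer, pam in find_target_sites(example):
        print(start, to_guide_rna_spacer(protospacer), pam)
```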
The terms “target DNA sequence,” “target nucleic acid,” “target sequence,” and “target site” are used interchangeably herein to refer to a polynucleotide (nucleic acid, gene, chromosome, genome, etc.) to which a guide sequence (e.g., a guide RNA) is designed to have complementarity, wherein hybridization between the target sequence and a guide sequence promotes the formation of a Cas9/CRISPR complex, provided sufficient conditions for binding exist. In some embodiments, the target sequence is a genomic DNA sequence. The term “genomic,” as used herein, refers to a nucleic acid sequence (e.g., a gene or locus) that is located on a chromosome in a cell. The target sequence and guide sequence need not exhibit complete complementarity, provided that there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, referenced herein and incorporated by reference. The strand of the target DNA that is complementary to and hybridizes with the DNA-targeting RNA is referred to as the “complementary strand” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the DNA-targeting RNA) is referred to as the “noncomplementary strand” or “non-complementary strand.”
The target genomic DNA sequence may encode a gene product. The term “gene product,” as used herein, refers to any biochemical product resulting from expression of a gene. Gene products may be RNA or protein. RNA gene products include non-coding RNA, such as tRNA, rRNA, micro RNA (miRNA), and small interfering RNA (siRNA), and coding RNA, such as messenger RNA (mRNA). In some embodiments, the target genomic DNA sequence encodes a protein or polypeptide.
In some embodiments, for instance, when the system includes a Cas9 nickase or a catalytically dead Cas9, two nucleic acid molecules comprising a guide RNA sequence may be utilized. The two nucleic acid molecules may have the same or different guide RNA sequences, thus complementary to the same or different target DNA sequence. In some embodiments, the guide RNA sequences of the two nucleic acid molecules are complementary to target DNA sequences at opposite ends (e.g., 3′ or 5′) and/or on opposite strands of the insert location.
In some embodiments, the system further comprises a recruitment system comprising at least one aptamer sequence and an aptamer binding protein functionally linked to the microbial recombination protein as part of a fusion protein.
In some embodiments, the aptamer sequence is an RNA aptamer sequence. In some embodiments, the nucleic acid molecule comprising the guide RNA also comprises one or more RNA aptamers, or distinct RNA secondary structures or sequences that can recruit and bind another molecular species, an adaptor molecule, such as a nucleic acid or protein. The RNA aptamers can be naturally occurring or synthetic oligonucleotides that have been engineered through repeated rounds of in vitro selection or SELEX (systematic evolution of ligands by exponential enrichment) to bind to a specific target molecular species. In some embodiments, the nucleic acid comprises two or more aptamer sequences. The aptamer sequences may be the same or different and may target the same or different adaptor proteins. In select embodiments, the nucleic acid comprises two aptamer sequences.
Any RNA aptamer/aptamer binding protein pair known may be selected and used in connection with the present disclosure (see, e.g., Jayasena, S. D., Clinical Chemistry, 1999. 45(9): p. 1628-1650; Gelinas, et al., Current Opinion in Structural Biology, 2016. 36: p. 122-132; and Hasegawa, H., Molecules, 2016; 21(4): p. 421, incorporated herein by reference).
A number of RNA aptamer binding, or adaptor, proteins exist, including a diverse array of bacteriophage coat proteins. Examples of such coat proteins include but are not limited to: MS2, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s and PRR1. In some embodiments, the RNA aptamer binds MS2 bacteriophage coat protein or a functional derivative, fragment or variant thereof. MS2 binding RNA aptamers commonly have a simple stem-loop structure, classically defined by a 19 nucleotide RNA molecule with a single bulged adenine on the 5′ leg of the stem (Witherell G. W., et al., (1991) Prog. Nucleic Acid Res. Mol. Biol., 40, 185-220, incorporated herein by reference). However, a number of vastly different primary sequences were found to be able to bind the MS2 coat protein (Parrott A M, et al., Nucleic Acids Res. 2000; 28(2):489-497, and Buenrostro J D, et al., Nature Biotechnology 2014; 32, 562-568, incorporated herein by reference). Any of the RNA aptamer sequences known to bind the MS2 bacteriophage coat protein may be utilized in connection with the present disclosure. In select embodiments, the MS2 RNA aptamer sequence comprises: AACAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO:145), AGCAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO:146), or AGCGUGAGGAUCACCCAUGCCUGCAG (SEQ ID NO:147).
N-proteins (Nut-utilization site proteins) of bacteriophages contain arginine-rich conserved RNA recognition motifs of ˜20 amino acids, referred to as N peptides. The RNA aptamer may bind a phage N peptide or a functional derivative, fragment or variant thereof. In some embodiments, the phage N peptide is the lambda or P22 phage N peptide or a functional derivative, fragment or variant thereof.
In select embodiments, the N peptide is lambda phage N22 peptide, or a functional derivative, fragment or variant thereof. In some embodiments, the N22 peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNARTRRRERRAEKQAQWKAAN (SEQ ID NO: 149). N22 peptide, the 22 amino acid RNA-binding domain of the λ bacteriophage antiterminator protein N (λN-(1-22) or λN peptide), is capable of specifically binding to specific stem-loop structures, including but not limited to the BoxB stem-loop. See, for example Cilley and Williamson, RNA 1997; 3(1):57-67, incorporated herein by reference. A number of different BoxB stem-loop primary sequences are known to bind the N22 peptide and any of those may be utilized in connection with the present disclosure. In some embodiments, the N22 peptide RNA aptamer sequence comprises a nucleotide sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCCCUGAAAAAGGGC (SEQ ID NO: 150), GCCCUGAAGAAGGGC (SEQ ID NO: 151), GCGCUGAAAAAGCGC (SEQ ID NO: 152), GCCCUGACAAAGGGC (SEQ ID NO: 153), and GCGCUGACAAAGCGC (SEQ ID NO: 154). In some embodiments, the N22 peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 150-154.
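By way of non-limiting illustration, the stem-loop character of the BoxB sequences recited above (SEQ ID NOs: 150-154) may be checked with the following Python sketch, which counts how many consecutive Watson-Crick pairs can form between the 5′ and 3′ ends of a sequence and reports the unpaired loop between them. The function names are hypothetical, and the sketch examines pairing potential only; it is not a structure prediction and does not account for non-Watson-Crick pairs or folding energetics.

```python
# Watson-Crick RNA pairs only (an illustrative simplification).
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# BoxB sequences as recited above (SEQ ID NOs: 150-154).
BOXB_SEQUENCES = [
    "GCCCUGAAAAAGGGC",
    "GCCCUGAAGAAGGGC",
    "GCGCUGAAAAAGCGC",
    "GCCCUGACAAAGGGC",
    "GCGCUGACAAAGCGC",
]


def terminal_stem_length(rna: str) -> int:
    """Count consecutive base pairs formed by walking inward from both ends."""
    rna = rna.upper()
    i, j = 0, len(rna) - 1
    pairs = 0
    while i < j and (rna[i], rna[j]) in RNA_PAIRS:
        pairs += 1
        i += 1
        j -= 1
    return pairs


def loop_sequence(rna: str) -> str:
    """Return the unpaired nucleotides enclosed by the terminal stem."""
    stem = terminal_stem_length(rna)
    return rna.upper()[stem:len(rna) - stem]


if __name__ == "__main__":
    for seq in BOXB_SEQUENCES:
        print(seq, terminal_stem_length(seq), loop_sequence(seq))
```

Under these assumptions, each of the five recited sequences can form a five-base-pair terminal stem enclosing a five-nucleotide loop.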
In select embodiments, the N peptide is the P22 phage N peptide, or a functional derivative, fragment or variant thereof. A number of different BoxB stem-loop primary sequences are known to bind the P22 phage N peptide and variants thereof and any of those may be utilized in connection with the present disclosure. See, for example Cocozaki, Ghattas, and Smith, Journal of Bacteriology 2008; 190(23):7699-7708, incorporated herein by reference. In some embodiments, the P22 phage N peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNAKTRRHERRRKLAIERDTI (SEQ ID NO: 155). In some embodiments, the P22 phage N peptide RNA aptamer sequence comprises a sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCGCUGACAAAGCGC (SEQ ID NO: 156) and CCGCCGACAACGCGG (SEQ ID NO: 157). In some embodiments, the P22 phage N peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 156-157, UGCGCUGACAAAGCGCG (SEQ ID NO: 158) or ACCGCCGACAACGCGGU (SEQ ID NO: 159).
In some embodiments, the aptamer sequence is a peptide aptamer sequence. The peptide aptamers can be naturally occurring or synthetic peptides that are specifically recognized by an affinity agent. Such aptamers include, but are not limited to, a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a 7× His tag, a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, or a VSV-G epitope. Corresponding aptamer binding proteins are well-known in the art and include, for example, primary antibodies, biotin, affimers, single domain antibodies, and antibody mimetics.
An exemplary peptide aptamer includes a GCN4 peptide (Tanenbaum et al., Cell 2014; 159(3):635-646, incorporated herein by reference). Antibodies or GCN4 binding proteins can be used as the aptamer binding proteins.
In some embodiments, the peptide aptamer sequence is conjugated to the Cas protein. The peptide aptamer sequence may be fused to the Cas in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus). In select embodiments, the peptide aptamer is fused to the C-terminus of the Cas protein.
In some embodiments, between 1 and 24 peptide aptamer sequences may be conjugated to the Cas protein. The aptamer sequences may be the same or different and may target the same or different aptamer binding proteins. In select embodiments, 1 to 24 tandem repeats of the same peptide aptamer sequence are conjugated to the Cas protein. In preferred embodiments between 4 and 18 tandem repeats are conjugated to the Cas protein. The individual aptamers may be separated by a linker region. Suitable linker regions are known in the art. The linker may be flexible or configured to allow the binding of affinity agents to adjacent aptamers without or with decreased steric hindrance. The linker sequences may provide an unstructured or linear region of the polypeptide, for example, with the inclusion of one or more glycine and/or serine residues. The linker sequences can be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids in length.
In some embodiments, the fusion protein comprises a microbial recombination protein functionally linked to an aptamer binding protein. The microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
In select embodiments, the microbial recombination protein is RecE or RecT, or a derivative or variant thereof. Derivatives or variants of RecE and RecT are functionally equivalent proteins or polypeptides which possess substantially similar function to wild type RecE and RecT. RecE and RecT derivatives or variants include biologically active amino acid sequences similar to the wild-type sequences but differing due to amino acid substitutions, additions, deletions, truncations, post-translational modifications, or other modifications. In some embodiments, the derivatives may improve translation, purification, biological half-life, or activity, or may eliminate or lessen any undesirable side effects or reactions. The derivatives or variants may be naturally occurring polypeptides, synthetic or chemically synthesized polypeptides, or genetically engineered polypeptides. RecE and RecT bioactivities are known to, and easily assayed by, those of ordinary skill in the art, and include, for example, exonuclease and single-stranded nucleic acid binding activities, respectively.
The RecE or RecT may be from a number of microbial organisms, including Escherichia coli, Pantoea brenneri, Type-F symbiont of Plautia stali, Providencia sp. MGF014, Shigella sonnei, and Pseudobacteriovorax antillogorgiicola, among others. In preferred embodiments, the RecE and RecT proteins are derived from Escherichia coli.
In some embodiments, the fusion protein comprises RecE, or a derivative or variant thereof. The RecE, or derivative or variant thereof, may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. The RecE, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. In select embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. In exemplary embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-3.
In some embodiments, the fusion protein comprises RecT, or a derivative or variant thereof. The RecT, or derivative or variant thereof, may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. The RecT, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. In select embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. In exemplary embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to the amino acid sequence of SEQ ID NO: 9.
Truncations may be from either the C-terminal or N-terminal end, or both. For example, as demonstrated in Example 6 below, a diverse set of truncations from either end or both provided a functional product. In some embodiments, one or more (2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 100, 120 or more) amino acids may be truncated from the C-terminal end, the N-terminal end, or both, as compared to the wild-type sequence.
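The percentage thresholds recited above can be illustrated with a simple comparison. The following is a rough sketch only: it computes positional percent identity between two equal-length, pre-aligned sequences, which is an assumed stand-in for whatever similarity measure is applied in practice (a substitution-matrix-based similarity score would generally differ). Because SEQ ID NOs: 1-14 are not reproduced in this section, the N22 peptide (SEQ ID NO: 149) is used as the reference, and `VARIANT` is a hypothetical sequence invented for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions that are identical (equal-length input only)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length, pre-aligned sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def meets_threshold(a: str, b: str, threshold: float) -> bool:
    """True if the two sequences reach the given percent identity."""
    return percent_identity(a, b) >= threshold

N22 = "GNARTRRRERRAEKQAQWKAAN"      # SEQ ID NO: 149, taken from the text
VARIANT = "GNAKTRRRERRAEKQAQWKAAQ"  # hypothetical variant with two substitutions

print(round(percent_identity(N22, VARIANT), 1))
print(meets_threshold(N22, VARIANT, 70.0), meets_threshold(N22, VARIANT, 90.0))
```

Sequences of different lengths would first need a pairwise alignment, which this sketch deliberately leaves out.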
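Terminal truncation as described above amounts to removing a chosen number of residues from one or both ends of the wild-type sequence. A minimal sketch follows; the function `truncate` and the placeholder sequence are hypothetical and are not sequences from the disclosure.

```python
def truncate(seq: str, n_term: int = 0, c_term: int = 0) -> str:
    """Remove `n_term` residues from the N-terminus and `c_term` from the C-terminus."""
    if n_term < 0 or c_term < 0:
        raise ValueError("truncation lengths must be non-negative")
    if n_term + c_term >= len(seq):
        raise ValueError("truncation would remove the entire sequence")
    return seq[n_term:len(seq) - c_term]

# Placeholder 20-residue sequence (hypothetical, for illustration only).
WILD_TYPE = "ACDEFGHIKLMNPQRSTVWY"

print(truncate(WILD_TYPE, n_term=2))            # N-terminal truncation only
print(truncate(WILD_TYPE, c_term=3))            # C-terminal truncation only
print(truncate(WILD_TYPE, n_term=2, c_term=3))  # truncation from both ends
```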
In the fusion protein, the microbial recombination protein may be linked to either terminus of the aptamer binding protein in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus). In select embodiments, the microbial recombination protein N-terminus is linked to the aptamer binding protein C-terminus. Thus, the overall fusion protein from N- to C-terminus comprises the aptamer binding protein (N- to C-terminus) linked to the microbial recombination protein (N- to C-terminus).
In some embodiments, the fusion protein further comprises a linker between the microbial recombination protein and the aptamer binding protein. The linkers may comprise any amino acid sequence of any length. The linkers may be flexible such that they do not constrain either of the two components they link together in any particular orientation. The linkers may essentially act as a spacer. In select embodiments, the linker links the C-terminus of the microbial recombination protein to the N-terminus of the aptamer binding protein. In select embodiments, the linker comprises the amino acid sequence of the 16-residue XTEN linker, SGSETPGTSESATPES (SEQ ID NO: 15) or the 37-residue EXTEN linker, SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS (SEQ ID NO: 148).
In some embodiments, the fusion protein further comprises a nuclear localization sequence (NLS). The nuclear localization sequence may be at any location within the fusion protein (e.g., C-terminal of the aptamer binding protein, N-terminal of the aptamer binding protein, C-terminal of the microbial recombination protein). In select embodiments, the nuclear localization sequence is linked to the C-terminus of the microbial recombination protein. A number of nuclear localization sequences are known in the art (see, e.g., Lange, A., et al., J Biol Chem. 2007; 282(8): 5101-5105, incorporated herein by reference) and may be used in connection with the present disclosure. The nuclear localization sequence may be the SV40 NLS, PKKKRKV (SEQ ID NO:16); the Ty1 NLS, NSKKRSLEDNETEIKVSRDTWNTKNMRSLEPPRSKKRIH (SEQ ID NO: 17); the c-Myc NLS, PAAKRVKLD (SEQ ID NO:18); the biSV40 NLS, KRTADGSEFESPKKKRKV (SEQ ID NO: 19); and the Mut NLS, PEKKRRRPSGSVPVLARPSPPKAGKSSCI (SEQ ID NO: 20). In select embodiments, the nuclear localization sequence is the SV40 NLS, PKKKRKV (SEQ ID NO: 16).
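One of the fusion layouts described above can be sketched as simple string concatenation in N- to C-terminal order: aptamer binding protein, linker, microbial recombination protein, then a C-terminal NLS. This is an illustration under stated assumptions, not a construct from the disclosure: the XTEN linker and SV40 NLS sequences are taken from the text, while `BINDING_PLACEHOLDER` and `RECOMBINASE_PLACEHOLDER` are hypothetical stand-ins for the actual protein sequences, and other orientations recited above are equally possible.

```python
XTEN_LINKER = "SGSETPGTSESATPES"  # SEQ ID NO: 15, from the text
SV40_NLS = "PKKKRKV"              # SEQ ID NO: 16, from the text

def assemble_fusion(binding_protein: str, recombinase: str,
                    linker: str = XTEN_LINKER, nls: str = SV40_NLS) -> str:
    """Concatenate components N- to C-terminus: binding protein, linker, recombinase, NLS."""
    return binding_protein + linker + recombinase + nls

# Hypothetical placeholder sequences (not real MCP or RecT sequences).
BINDING_PLACEHOLDER = "M" + "A" * 10
RECOMBINASE_PLACEHOLDER = "G" * 12

fusion = assemble_fusion(BINDING_PLACEHOLDER, RECOMBINASE_PLACEHOLDER)
print(fusion)
print(len(fusion))
```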
The Cas protein and the fusion protein are desirably included in a single composition alone, in combination with each other, and/or the polynucleotide(s) (e.g., a vector) comprising the guide RNA sequence and the aptamer sequence. The Cas protein and/or the fusion protein may or may not be physically or chemically bound to the polynucleotide. The Cas protein and/or the microbial recombination protein can be associated with a polynucleotide using any suitable method for protein-protein linking or protein-virus linking known in the art.
The disclosure further provides compositions and vectors comprising a polynucleotide comprising a nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an RNA aptamer binding protein.
The compositions or vectors may further comprise at least one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence. In some embodiments, the nucleic acid molecule comprising a guide RNA sequence further comprises at least one RNA aptamer sequence. In some embodiments, the polynucleotide comprising a nucleic acid sequence encoding a Cas protein further comprises a sequence encoding at least one peptide aptamer sequence.
Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the aptamer sequences, the Cas proteins, the microbial recombination proteins, and the aptamer binding proteins set forth above in connection with the inventive system also are applicable to the polynucleotides of the recited compositions and vectors.
The nucleic acid sequence encoding the Cas protein and/or the nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein can be provided to a cell on the same vector (e.g., in cis) as the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence. In such embodiments, a unidirectional promoter can be used to control expression of each nucleic acid sequence. In another embodiment, a combination of bidirectional and unidirectional promoters can be used to control expression of multiple nucleic acid sequences.
In other embodiments, a nucleic acid sequence encoding the Cas protein, the nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein, and the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence can be provided to a cell on separate vectors (e.g., in trans). Each of the nucleic acid sequences in each of the separate vectors can comprise the same or different expression control sequences. The separate vectors can be provided to cells simultaneously or sequentially.
The vector(s) comprising the nucleic acid sequences encoding the Cas protein and encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein can be introduced into a host cell that is capable of expressing the polypeptide encoded thereby, including any suitable prokaryotic or eukaryotic cell. As such, the disclosure provides an isolated cell comprising the vector or nucleic acid sequences disclosed herein. Preferred host cells are those that can be easily and reliably grown, have reasonably fast growth rates, have well characterized expression systems, and can be transformed or transfected easily and efficiently. Examples of suitable prokaryotic cells include, but are not limited to, cells from the genera Bacillus (such as Bacillus subtilis and Bacillus brevis), Escherichia (such as E. coli), Pseudomonas, Streptomyces, Salmonella, and Erwinia. Suitable eukaryotic cells are known in the art and include, for example, yeast cells, insect cells, and mammalian cells. Examples of suitable yeast cells include those from the genera Kluyveromyces, Pichia, Rhinosporidium, Saccharomyces, and Schizosaccharomyces. Exemplary insect cells include Sf-9 and HI5 (Invitrogen, Carlsbad, Calif.) and are described in, for example, Kitts et al., Biotechniques, 14: 810-817 (1993); Luckow, Curr. Opin. Biotechnol., 4: 564-572 (1993); and Luckow et al., J. Virol., 67: 4566-4579 (1993), incorporated herein by reference. Desirably, the host cell is a mammalian cell, and in some embodiments, the host cell is a human cell. A number of suitable mammalian and human host cells are known in the art, and many are available from the American Type Culture Collection (ATCC, Manassas, Va.). Examples of suitable mammalian cells include, but are not limited to, Chinese hamster ovary cells (CHO) (ATCC No. CCL61), CHO DHFR- cells (Urlaub et al., Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)), human embryonic kidney (HEK) 293 or 293T cells (ATCC No. CRL1573), and 3T3 cells (ATCC No. CCL92). Other suitable mammalian cell lines are the monkey COS-1 (ATCC No. CRL1650) and COS-7 cell lines (ATCC No. CRL1651), as well as the CV-1 cell line (ATCC No. CCL70). Further exemplary mammalian host cells include primate, rodent, and human cell lines, including transformed cell lines. Normal diploid cells, cell strains derived from in vitro culture of primary tissue, as well as primary explants, are also suitable. Other suitable mammalian cell lines include, but are not limited to, mouse neuroblastoma N2A cells, HeLa, HEK, A549, HepG2, mouse L-929 cells, and BHK or HaK hamster cell lines. Methods for selecting suitable mammalian host cells and methods for transformation, culture, amplification, screening, and purification of cells are known in the art.
The disclosure also provides a method of altering a target DNA. In some embodiments, the method alters genomic DNA sequence in a cell, although any desired nucleic acid may be modified. When applied to DNA contained in cells, the method comprises introducing the systems, compositions, or vectors described herein into a cell comprising a target genomic DNA sequence. Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the Cas proteins, the microbial recombination proteins, the recruitment systems, and polynucleotides encoding thereof, the cell, the target genomic DNA sequence, and components thereof, set forth above in connection with the inventive system are also applicable to the method of altering a target genomic DNA sequence in a cell. The systems, composition or vectors may be introduced in any manner known in the art including, but not limited to, chemical transfection, electroporation, microinjection, biolistic delivery via gene guns, or magnetic-assisted transfection, depending on the cell type.
Upon introducing the systems described herein into a cell comprising a target genomic DNA sequence, the guide RNA sequence binds to the target genomic DNA sequence in the cell genome, the Cas protein associates with the guide RNA and may induce a double strand break or single strand nick in the target genomic DNA sequence and the aptamer recruits the microbial recombination proteins to the target genomic DNA sequence through the aptamer binding protein of the fusion protein, thereby altering the target genomic DNA sequence in the cell. When introducing the compositions, or vectors described herein into the cell, the nucleic acid molecule comprising a guide RNA sequence, the Cas9 protein, and the fusion protein are first expressed in the cell.
In some embodiments, the cell is in an organism or host, such that introducing the disclosed systems, compositions, or vectors into the cell comprises administration to a subject. The method may comprise providing or administering to the subject, in vivo or by transplantation of ex vivo treated cells, the systems, compositions, or vectors of the present disclosure.
A “subject” may be human or non-human and may include, for example, animal strains or species used as “model systems” for research purposes, such as a mouse model as described herein. Likewise, subject may include either adults or juveniles (e.g., children). Moreover, subject may mean any living organism, preferably a mammal (e.g., human or non-human) that may benefit from the administration of compositions contemplated herein. Examples of mammals include, but are not limited to, any member of the Mammalian class: humans; non-human primates such as chimpanzees and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, and swine; domestic animals such as rabbits, dogs, and cats; laboratory animals including rodents, such as rats, mice and guinea pigs; and the like. Examples of non-mammals include, but are not limited to, birds, fish, and the like. In one embodiment of the methods and compositions provided herein, the mammal is a human.
As used herein, the terms “providing,” “administering,” and “introducing” are used interchangeably and refer to the placement of the systems of the disclosure into a subject by a method or route which results in at least partial localization of the system to a desired site. The systems can be administered by any appropriate route which results in delivery to a desired location in the subject.
The phrase “altering a DNA sequence,” as used herein, refers to modifying at least one physical feature of a DNA sequence of interest. DNA alterations include, for example, single or double strand DNA breaks, deletion, or insertion of one or more nucleotides, and other modifications that affect the structural integrity or nucleotide sequence of the DNA sequence. The modifications of a target sequence in genomic DNA may lead to, for example, gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, gene knock-down, and the like.
In some embodiments, the systems and methods described herein may be used to correct one or more defects or mutations in a gene (referred to as “gene correction”). In such cases, the target genomic DNA sequence encodes a defective version of a gene, and the system further comprises a donor nucleic acid molecule which encodes a wild-type or corrected version of the gene. In other words, the target genomic DNA sequence is a “disease-associated” gene. The term “disease-associated gene” refers to any gene or polynucleotide whose gene products are expressed at an abnormal level or in an abnormal form in cells obtained from a disease-affected individual as compared with tissues or cells obtained from an individual not affected by the disease. A disease-associated gene may be expressed at an abnormally high level or at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene, the mutation or genetic variation of which is directly responsible or is in linkage disequilibrium with a gene(s) that is responsible for the etiology of a disease. Examples of genes responsible for such “single gene” or “monogenic” diseases include, but are not limited to, adenosine deaminase, α-1 antitrypsin, cystic fibrosis transmembrane conductance regulator (CFTR), β-hemoglobin (HBB), oculocutaneous albinism II (OCA2), Huntingtin (HTT), dystrophia myotonica-protein kinase (DMPK), low-density lipoprotein receptor (LDLR), apolipoprotein B (APOB), neurofibromin 1 (NF1), polycystic kidney disease 1 (PKD1), polycystic kidney disease 2 (PKD2), coagulation factor VIII (F8), dystrophin (DMD), phosphate-regulating endopeptidase homologue, X-linked (PHEX), methyl-CpG-binding protein 2 (MECP2), and ubiquitin-specific peptidase 9Y, Y-linked (USP9Y). Other single gene or monogenic diseases are known in the art and described in, e.g., Chial, H. Rare Genetic Disorders: Learning About Genetic Disease Through Gene Mapping, SNPs, and Microarray Data, Nature Education 1(1):192 (2008), incorporated herein by reference; Online Mendelian Inheritance in Man (OMIM); and the Human Gene Mutation Database (HGMD).
In another embodiment, the target genomic DNA sequence can comprise a gene, the mutation of which contributes to a particular disease in combination with mutations in other genes. Diseases caused by the contribution of multiple genes which lack simple (e.g., Mendelian) inheritance patterns are referred to in the art as a “multifactorial” or “polygenic” disease. Examples of multifactorial or polygenic diseases include, but are not limited to, asthma, diabetes, epilepsy, hypertension, bipolar disorder, and schizophrenia. Certain developmental abnormalities also can be inherited in a multifactorial or polygenic pattern and include, for example, cleft lip/palate, congenital heart defects, and neural tube defects.
In another embodiment, the method of altering a target genomic DNA sequence can be used to delete nucleic acids from a target sequence in a cell by cleaving the target sequence and allowing the cell to repair the cleaved sequence in the absence of an exogenously provided donor nucleic acid molecule. Deletion of a nucleic acid sequence in this manner can be used in a variety of applications, such as, for example, to remove disease-causing trinucleotide repeat sequences in neurons, to create gene knock-outs or knock-downs, and to generate mutations for disease models in research.
The term “donor nucleic acid molecule” refers to a nucleotide sequence that is inserted into the target DNA (e.g., genomic DNA). As described above, the donor DNA may include, for example, a gene or part of a gene, a sequence encoding a tag or localization sequence, or a regulating element. The donor nucleic acid molecule may be of any length. In some embodiments, the donor nucleic acid molecule is between 10 and 10,000 nucleotides in length, for example, between about 100 and 5,000 nucleotides in length, between about 200 and 2,000 nucleotides in length, between about 500 and 1,000 nucleotides in length, between about 500 and 5,000 nucleotides in length, between about 1,000 and 5,000 nucleotides in length, or between about 1,000 and 10,000 nucleotides in length.
The disclosed systems and methods overcome challenges encountered during conventional gene editing, including low efficiency and off-target events, particularly with kilobase-scale nucleic acids. In some embodiments, the disclosed systems and methods improve the efficiency of gene editing. For example, the disclosed systems and methods can have a 2- to 10-fold increase in efficiency over conventional CRISPR-Cas9 systems and methods, as shown in Examples 2, 3, and 5. In some embodiments, the improvement in efficiency is accompanied by a reduction in off-target events. The off-target events may be reduced by greater than 50% compared to conventional CRISPR-Cas9 systems and methods, for example, a reduction of off-target events by about 90% is shown in Example 3. Another aspect of increasing the overall accuracy of a gene editing system is reducing the on-target insertion-deletions (indels), a byproduct of HDR editing. In some embodiments, the disclosed systems and methods reduce the on-target indels by greater than 90% compared to conventional CRISPR-Cas9 systems and methods, as shown in Example 3.
The disclosure further provides kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods described herein. For example, kits may include CRISPR reagents (Cas protein, guide RNA, vectors, compositions, etc.), recombineering reagents (recombination protein-aptamer binding protein fusion protein, the aptamer sequence, vectors, compositions, etc.), transfection or administration reagents, negative and positive control samples (e.g., cells, template DNA), cells, containers housing one or more components (e.g., microcentrifuge tubes, boxes), detectable labels, detection and analysis instruments, software, instructions, and the like.
Any element of any suitable CRISPR/Cas gene editing system known in the art can be employed in the systems and methods described herein, as appropriate. CRISPR/Cas gene editing technology is described in detail in, for example, U.S. Pat. Nos. 8,546,553; 8,697,359; 8,771,945; 8,795,965; 8,865,406; 8,871,445; 8,889,356; 8,889,418; 8,895,308; 8,906,616; 8,932,814; 8,945,839; 8,993,233; 8,999,641; 9,115,348; 9,149,049; 9,493,844; 9,567,603; 9,637,739; 9,663,782; 9,404,098; 9,885,026; 9,951,342; 10,087,431; 10,227,610; 10,266,850; 10,601,748; 10,604,771; and 10,760,064; and U.S. Patent Application Publication Nos. US2010/0076057; US2014/0113376; US2015/0050699; US2015/0031134; US2014/0357530; US2014/0349400; US2014/0315985; US2014/0310830; US2014/0310828; US2014/0309487; US2014/0294773; US2014/0287938; US2014/0273230; US2014/0242699; US2014/0242664; US2014/0212869; US2014/0201857; US2014/0199767; US2014/0189896; US2014/0186919; US2014/0186843; and US2014/0179770, each incorporated herein by reference.
The following examples further illustrate the invention but should not be construed as in any way limiting its scope.
Materials and Methods
RecE/T Homolog Screening The RefSeq non-redundant protein database was downloaded from NCBI on Oct. 29, 2019. The database was searched with E. coli Rac prophage RecT (NP_415865.1) and RecE (NP_415866.1) as queries using position-specific iterated (PSI)-BLAST (ref. 1) to retrieve protein homologs. Hits were clustered with CD-HIT (ref. 2), and representative sequences were selected from each cluster for multiple alignment with MUSCLE (ref. 3). Then, FastTree (ref. 4) was used for maximum likelihood tree reconstruction with default parameters. A diverse set of RecE/T homologs was selected, synthesized by GenScript, and cloned into pMPH MCP vectors for testing.
Plasmid construction pX330, pMPH and pU6-(BbsI)_CBh-Cas9-T2A-BFP plasmids were obtained from Addgene. Tested effector DNA fragments were ordered from IDT, Genewiz, and GenScript. The fragments were Gibson assembled into the backbones using NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs). All sgRNAs (Table 1) were inserted into backbones using Golden Gate cloning. All constructs were sequence-verified with Sanger sequencing of prepped plasmids.
Cell culture Human Embryonic Kidney (HEK) 293T, HeLa and HepG2 were maintained in Dulbecco's Modified Eagle's Medium (DMEM, Life Technologies), with 10% fetal bovine serum (FBS, HyClone), 100 U/mL penicillin, and 100 μg/mL streptomycin (Life Technologies) at 37° C. with 5% CO2.
hES-H9 cells were maintained in mTeSR1 medium (StemCell Technologies) at 37° C. with 5% CO2. Culture plates were pre-coated with Matrigel (Corning) 12 hours prior to use, and cells were supplemented with 10 μM Y27632 (Sigma) for the first 24 hours after passaging. Culture media was changed every 24 hours.
Transfection HEK293T cells were seeded into 96-well plates (Corning) 12-24 hours prior to transfection at a density of 30,000 cells/well, and 250 ng of total DNA was transfected per well. HeLa and HepG2 cells were seeded into 48-well plates (Corning) one day prior to transfection at a density of 50,000 and 30,000 cells/well respectively, and 400 ng of total DNA was transfected per well. Transfections were performed with Lipofectamine 3000 (Life Technologies) following the manufacturer's instructions.
Electroporation For hES-H9 related transfection experiments, P3 Primary Cell 4D-Nucleofector™ X Kit S (Lonza) was used following the manufacturer's protocol. For each reaction, 300,000 cells were nucleofected with 4 μg total DNA using the DC100 Nucleofector Program.
Fluorescence-activated cell sorting (FACS) mKate knock-in efficiency was analyzed on a CytoFLEX flow cytometer (Beckman Coulter; Stanford Stem Cell FACS Core). 72 hours after transfection, cells were washed once with PBS and dissociated with TrypLE Express Enzyme (Thermo Fisher Scientific). Cell suspension was then transferred to a 96-well U-bottom plate (Thermo Fisher Scientific) and centrifuged at 300×G for 5 minutes. After removing the supernatant, pelleted cells were resuspended with 50 μl 4% FBS in PBS, and cells were sorted within 30 minutes of preparation.
RFLP HEK293T cells were transfected with plasmid DNA and PCR templates and harvested after 72 hours for genomic DNA using the QuickExtract DNA Extraction Solution (Biosearch Technologies) following the manufacturer's protocol. The target genomic region was amplified using specific primers outside of the homology arms of the PCR template. PCR products were purified with Monarch PCR & DNA Cleanup Kit (New England BioLabs). 300 ng of purified product was digested with BsrGI (EMX1, New England BioLabs) or XbaI (VEGFA, NEB), and the digested products were analyzed on a 5% Mini-PROTEAN TBE gel (Bio-Rad).
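The fragment pattern expected from such a digest can be previewed in silico. The sketch below assumes that successful HDR introduces the restriction site into the amplicon (the section does not state this explicitly) and uses the known recognition sequences of BsrGI (TGTACA) and XbaI (TCTAGA), both of which cut after the first T on the top strand. The toy amplicons are hypothetical and far shorter than a real PCR product.

```python
# Recognition site and top-strand cut offset (cut falls after this many bases of the site).
SITES = {"BsrGI": ("TGTACA", 1), "XbaI": ("TCTAGA", 1)}

def digest(amplicon: str, enzyme: str) -> list[int]:
    """Return top-strand fragment lengths after complete digestion of a linear amplicon."""
    site, offset = SITES[enzyme]
    fragments, start = [], 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments

# Hypothetical toy amplicons: the edited allele carries a BsrGI site, the unedited one does not.
EDITED = "AAAA" + "TGTACA" + "CCCCC"
UNEDITED = "AAAA" + "TGAACA" + "CCCCC"

print(digest(EDITED, "BsrGI"))
print(digest(UNEDITED, "BsrGI"))
```

Because both sites are palindromic, scanning one strand is sufficient. On a gel, the edited allele would appear as two smaller bands and the unedited allele as a single full-length band.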
Next-Generation Sequencing Library Preparation 72 hours after transfection, genomic DNA was extracted using QuickExtract DNA Extraction Solution (Biosearch Technologies). 200 ng total DNA was used for NGS library preparation. Genes of interest were amplified using specific primers (Table 2) for the first round PCR reaction. Illumina adapters and index barcodes were added to the fragments with a second round PCR using the primers listed in Table 2. Round 2 PCR products were purified by gel electrophoresis on a 2% agarose gel using the Monarch DNA Gel Extraction Kit (NEB). The purified product was quantified with Qubit dsDNA HS Assay Kit (Thermo Fisher) and sequenced on an Illumina MiSeq according to the manufacturer's instructions.
CCATCTCATCCCTGCGTGTCTCCCAGCGT
CCTCTCTATGGGCAGTCGGTGATgTTGGA
CCATCTCATCCCTGCGTGTCTCCACAAAA
CCTCTCTATGGGCAGTCGGTGATgGCTGA
CCATCTCATCCCTGCGTGTCTCCACACAC
CCTCTCTATGGGCAGTCGGTGATgAATGT
CCATCTCATCCCTGCGTGTCTCCGGCTAC
CCTCTCTATGGGCAGTCGGTGATgAGGAC
CCATCTCATCCCTGCGTGTCTCCGCAGGC
CCTCTCTATGGGCAGTCGGTGATgCCCTC
CCATCTCATCCCTGCGTGTCTCCGGAGG
CCTCTCTATGGGCAGTCGGTGATgCAAAT
CCATCTCATCCCTGCGTGTCTCCTGAGCG
CCTCTCTATGGGCAGTCGGTGATgGCCAG
High-throughput Sequencing Data Analysis Processed (demultiplexed, trimmed, and merged) sequencing reads were analyzed to determine editing outcomes using CRISPResso2 (ref. 5) by aligning sequenced amplicons to reference and expected HDR amplicons. The quantification window was increased to 10 bp surrounding the expected cut site to better capture diverse editing outcomes, but substitutions were ignored to avoid inclusion of sequencing errors. Only reads containing no mismatches to the expected amplicon were considered for HDR quantification; reads containing indels that partially matched the expected amplicons were included in the overall reported indel frequency.
Statistical Analysis Unless otherwise stated, all statistical analyses and comparisons were performed using a t-test, with a 1% false-discovery rate (FDR) using the two-stage step-up method of Benjamini, Krieger and Yekutieli (Benjamini, Y., et al., Biometrika 93, 491-507 (2006), incorporated herein by reference). All experiments were performed in triplicate unless otherwise noted to ensure sufficient statistical power in the analysis.
Determination of editing at predicted Cas9 off-target sites To evaluate RecT/RecE off-target editing activity at known Cas9 off-target sites, the same genomic DNA extracts used for knock-in analysis were used as templates for PCR amplification of the top predicted off-target sites (the highest-scoring sites predicted by CRISPOR, a web-based analysis tool) for the EMX1 and VEGFA guides. Primer sequences are listed in Table 2.
iGUIDE Off-target Analysis Genome-wide, unbiased off-target analysis was performed following the iGUIDE pipeline (Nobles, C. L., et al. Genome Biol 20, 14 (2019), incorporated herein by reference), which is based on the previously developed GUIDE-seq method (Tsai, S., et al. Nat Biotechnol 33, 187-197 (2015), incorporated herein by reference). HEK293T cells were transfected in 20 μL Lonza SF Cell Line Nucleofector Solution on a Lonza Nucleofector 4-D with program DS-150 according to the manufacturer's instructions. 300 ng of gRNA-Cas9 plasmids (or 150 ng of each gRNA-Cas9n plasmid for the double nickase), 150 ng of the effector plasmids, and 5 pmol of double-stranded oligonucleotides (dsODN) were transfected. Cells were harvested after 72 hours for genomic DNA using the Agencourt DNAdvance reagent kit. 400 ng of purified gDNA was then fragmented to an average of 500 bp and ligated with adaptors using the NEBNext Ultra II FS DNA Library Prep kit following the manufacturer's instructions. Two rounds of nested anchored PCR from the oligo tag to the ligated adaptor sequence were performed to amplify targeted DNA, and the amplified library was purified, size-selected, and sequenced using Illumina MiSeq V2 PE300. Sequencing data were analyzed using the published iGUIDE pipeline, with the addition of a downsampling step to ensure an unbiased comparison across samples.
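The read-classification rule described above can be illustrated with a toy example. This is a deliberately simplified sketch and not the analysis tool's algorithm: it assumes reads are already trimmed to the amplicon, treats only exact matches to the expected HDR amplicon as HDR, calls any read whose length differs from the reference an indel, and ignores substitutions. The real analysis aligns reads and restricts scoring to a quantification window around the cut site. All sequences and the function name `classify_read` are hypothetical.

```python
from collections import Counter

def classify_read(read: str, reference: str, hdr_amplicon: str) -> str:
    """Assign a read to 'HDR', 'indel', or 'unmodified' under the simplified rules above."""
    if read == hdr_amplicon:
        return "HDR"  # perfect match to the expected edited amplicon
    if len(read) != len(reference):
        return "indel"  # length change relative to the reference
    return "unmodified"  # identical to reference, or substitutions only (ignored)

# Hypothetical toy amplicons: the HDR product carries a 3-nt insertion.
REFERENCE = "ACGTACGTAC"
HDR_AMPLICON = "ACGTTTTACGTAC"

reads = [
    HDR_AMPLICON,   # precise knock-in
    HDR_AMPLICON,   # precise knock-in
    REFERENCE,      # unedited
    "ACGTCGTAC",    # 1-nt deletion
    "ACGAACGTAC",   # substitution only, ignored
]

counts = Counter(classify_read(r, REFERENCE, HDR_AMPLICON) for r in reads)
hdr_percent = 100.0 * counts["HDR"] / len(reads)
print(dict(counts), hdr_percent)
```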
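For readers unfamiliar with the two-stage step-up procedure named above, the following sketch shows its general shape as described in Benjamini, Krieger and Yekutieli (2006): a first Benjamini-Hochberg pass at level q/(1+q) estimates the number of true null hypotheses, and a second pass is run at a correspondingly relaxed level. The function names and the example p-values are hypothetical, and this is not a record of how the reported analyses were actually computed.

```python
def bh_reject_count(sorted_p: list[float], alpha: float) -> int:
    """Benjamini-Hochberg step-up: largest rank k with p_(k) <= (k / m) * alpha."""
    m = len(sorted_p)
    k = 0
    for rank, p in enumerate(sorted_p, start=1):
        if p <= rank / m * alpha:
            k = rank
    return k

def bky_two_stage(pvals: list[float], q: float = 0.01) -> list[bool]:
    """Two-stage step-up FDR control; returns a reject flag per input p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    sorted_p = [pvals[i] for i in order]
    q1 = q / (1 + q)                      # stage-1 level
    r1 = bh_reject_count(sorted_p, q1)    # stage-1 rejections
    if r1 == 0 or r1 == m:
        k = r1                            # nothing or everything rejected: stop
    else:
        m0_hat = m - r1                   # estimated number of true nulls
        k = bh_reject_count(sorted_p, q1 * m / m0_hat)
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Hypothetical p-values from five comparisons.
print(bky_two_stage([0.5, 0.001, 0.9, 0.002, 0.003], q=0.01))
```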
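The section does not specify how the downsampling step was implemented. One common approach, sketched here purely as an assumption, is to subsample every sample's reads at random down to the depth of the shallowest sample so that detected off-target counts are compared at equal sequencing depth. The function `downsample`, the seed, and the toy read identifiers are hypothetical.

```python
import random

def downsample(samples: dict[str, list[str]], seed: int = 0) -> dict[str, list[str]]:
    """Randomly subsample each sample's reads to the smallest read count among samples."""
    depth = min(len(reads) for reads in samples.values())
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    return {name: rng.sample(reads, depth) for name, reads in samples.items()}

# Hypothetical read identifiers for three conditions with unequal depth.
raw = {
    "Cas9_only": [f"read{i}" for i in range(50)],
    "Cas9_plus_RecT": [f"read{i}" for i in range(30)],
    "no_guide_control": [f"read{i}" for i in range(80)],
}

balanced = downsample(raw)
print({name: len(reads) for name, reads in balanced.items()})
```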
In contrast to mammals, convenient recombineering-editing tools are available for bacteria, e.g., the phage lambda Red and RecE/T systems. Microbial recombineering has two major steps: template DNA is chewed back by exonucleases (Exo), then the single-strand annealing protein (SSAP) supports homology directed repair by the template, optionally facilitated by a nuclease inhibitor. A system for RNA-guided targeting of RecE/T recombineering activities was developed and achieved kilobase (kb)-scale human gene editing without DNA cutting.
Candidate microbial systems with recombineering activities were surveyed. Two lines of reasoning guided the search: 1) Orthogonality: prioritizing proteins with minimal resemblance to mammalian repair enzymes; 2) Parsimony: focusing on systems with fewest interdependent components. Three protein families were identified: lambda Red, RecE/T, and phage T7 gp6 (Exo) and gp2.5 (SSAP) recombination machinery. Based on phylogenetic reconstruction, RecE/T proteins were determined to be the most distant from eukaryotic recombination proteins and among the most compact (
The NCBI protein database was systematically searched for RecE/T homologs. To develop a portable tool, evolutionary relationships and lengths were examined (
The top 12 candidates were codon-optimized and MS2 coat protein (MCP) fusions were constructed to recruit these RecE/T homologs, hereafter termed “recombinator”, to wild-type Streptococcus pyogenes Cas9 (wtCas9) via MS2 RNA aptamers. To understand their respective molecular effects as Exo and SSAP, each was tested independently (
To validate RecE/T recombineering in human cells, homology directed repair (HDR) was measured at five genomic sites with two templates. While the RecE variants (RecE_587, RecE_CTD) demonstrated variable increases in knock-in efficiency, RecT significantly enhanced HDR in all cases, replacing ~16 bp sequences at EMX1 and VEGFA, and knocking in a ~1 kb cassette at HSP90AA1, DYNLT1, and AAVS1 (
Three tests on REDITv1 were performed to explore: 1) activity across cell types, 2) optimal designs of HDR template, and 3) specificity. REDITv1 activity was robust across multiple genomic sites in HEK, A549, HepG2, and HeLa cells (
To alleviate unwanted edits, a version of REDIT with non-cutting Cas9 nickases (Cas9n) was assessed. A similar strategy was previously employed (Ran, F. A., et al., Cell (2013), 154: 1380-1389, incorporated herein by reference) to address off-target issues but had low HDR efficiency. REDIT was tested to determine if this system could overcome the limitation of endogenous repair and promote nicking-mediated recombination. Indeed, the nickase version demonstrated higher efficiencies, with the best results from Cas9n(D10A) with single- and double-nicking. This Cas9n(D10A) variant was designated REDITv2N (
The off-target activity of REDITv2N was investigated using GUIDE-seq. Results showed minimal off-target cleavage and a reduction of off-target sites (OTSs) by ~90% compared to REDITv1 (
Another byproduct of HDR editing is on-target insertion-deletions (indels). They could drastically lower yields of gene-editing, especially for long sequences. Indel formation was measured in an EMX1 knock-in experiment using deep sequencing. REDITv2N increased HDR to the same efficiency as its counterpart using wtCas9 (
Concepts from GUIDE-seq, LAM-PCR, and TLA were used to develop an NGS-based assay to identify genome-wide insertion sites (GIS), or GIS-seq (
REDIT was examined for long sequence editing ability in the absence of any nicking/cutting of the target DNA. Remarkably, when using catalytically dead Cas9 (dCas9) to construct REDITv2D, an exact genomic knock-in of a kilobase cassette was observed in human cells (
Microscopy analysis revealed incomplete nuclei-targeting of REDITv1, particularly REDITv1 RecT (
Finally, REDITv3 was utilized in hESCs to engineer kilobase knock-in alleles in human stem cells. REDITv3N single- and double-nicking designs resulted in 5-fold and 20-fold increased HDR efficiencies over no-recombinator controls, respectively (
To further investigate RecT and RecE_587 variants, both RecT and RecE_587 were truncated at various lengths as shown in
The truncated versions of both RecT and RecE_587 retained significant recombineering activity when used with different Cas9s. In particular, compared with the full-length RecT(1-269aa), the new truncated versions such as RecT(93-264aa) are over 30% smaller, yet they preserved essentially the full activity of RecT in stimulating recombination in eukaryotic cells. Similarly, compared with the full-length RecE(1-280aa), truncated versions such as RecE_587(120-221aa) and RecE_587(120-209aa) are over 60% smaller but still retained high recombination activities in human cells. These truncated versions not only demonstrate the potential to further engineer minimal functional recombineering enzymes using RecE and RecT protein variants, but also provide valuable compact recombineering tools for human genome editing that are well suited to in vitro, ex vivo, and in vivo delivery given their small size.
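The size reductions quoted above follow directly from the residue ranges. The short calculation below reproduces them, assuming each range is inclusive of both end residues; the helper names `region_length` and `percent_smaller` are hypothetical.

```python
def region_length(start, end):
    """Number of residues in an inclusive range start..end."""
    return end - start + 1


def percent_smaller(truncated_len, full_len):
    """Percent reduction in length relative to the full-length protein."""
    return 100.0 * (1 - truncated_len / full_len)


rect_full = region_length(1, 269)       # 269 aa
rect_trunc = region_length(93, 264)     # 172 aa
rece_full = region_length(1, 280)       # 280 aa
rece_trunc_a = region_length(120, 221)  # 102 aa
rece_trunc_b = region_length(120, 209)  # 90 aa

print(round(percent_smaller(rect_trunc, rect_full), 1))    # 36.1
print(round(percent_smaller(rece_trunc_a, rece_full), 1))  # 63.6
print(round(percent_smaller(rece_trunc_b, rece_full), 1))  # 67.9
```

These values are consistent with the "over 30% smaller" and "over 60% smaller" statements in the text.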
Overall, REDIT combined the specificity of CRISPR genome-targeting with the efficiency of RecE/RecT recombineering. The disclosed high-efficiency, low-error system makes a powerful addition to existing CRISPR toolkits. The balanced efficiency and accuracy of REDITv3N make it an attractive therapeutic option for knock-in of large cassettes in immune and stem cells.
The reconstructed RecE and RecT phylogenetic trees with eukaryotic recombination enzymes from yeast and human (
Three exonuclease proteins were used: the exonuclease from phage Lambda, the RecE587 core domain of E. coli RecE protein, and the exonuclease (gene name gp6) from phage T7 (
Similar measurements were made testing the genome editing efficiencies of three single-strand DNA annealing proteins (SSAPs) from the same three species of microbes as the exonucleases, namely Bet protein from phage Lambda, RecT protein from E. coli, and SSAP (gene name gp2.5) from phage T7 (
In these experiments, the genome recombineering activities of all three major families of phage/microbial recombination systems were systematically measured and validated in eukaryotic cells (lambda phage exonuclease and beta proteins; E. coli prophage RecE and RecT proteins; T7 phage exonuclease gp6 and single-strand binding gp2.5 proteins). All six proteins from the three systems achieved efficient gene editing, knocking kilobase-long sequences into the mammalian genome across two genomic loci. Overall, the exonucleases showed ~3-fold higher recombination efficiency (up to 4% mKate genome knock-in) when compared with no-recombinator controls. The single-strand annealing proteins (SSAPs) showed higher activities, with 4-fold to 8-fold higher gene-editing activities over the control groups. This demonstrated that microbial recombination proteins in the exonuclease and SSAP families can generally be engineered via the Cas9-based fusion protein system to achieve highly efficient genome recombination in mammalian cells.
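The fold changes reported here are ratios of knock-in efficiency with a recombinator to knock-in efficiency in the matched no-recombinator control. The sketch below shows the arithmetic only. The helper names are hypothetical, and the worked numbers simply combine the "~3-fold" and "up to 4%" figures from the text for illustration; they may not come from the same measurement.

```python
def fold_change(treated_pct, control_pct):
    """Ratio of knock-in efficiency with a recombinator to the control."""
    if control_pct <= 0:
        raise ValueError("control efficiency must be positive")
    return treated_pct / control_pct


def implied_control(treated_pct, fold):
    """Control efficiency implied by a treated efficiency and a fold change."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return treated_pct / fold


# Illustration only: a 4% knock-in at ~3-fold over control would imply
# a control efficiency of roughly 1.3%.
control = implied_control(4.0, 3.0)
print(f"{control:.2f}")  # 1.33
```

By the same arithmetic, a 4-fold to 8-fold improvement over the same control level would correspond to knock-in efficiencies several percentage points higher, though the actual values depend on the locus-specific controls.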
In order to demonstrate the generalizability of REDIT protein design, alternative recruitment systems were developed and tested. For a more compact REDIT system, the REDIT recombinator proteins were fused to the N22 peptide, and the sgRNA included boxB, the short cognate sequence of the N22 peptide, in place of the MS2 aptamers recognized by MCP (
A REDIT system using SunTag recruitment, a protein-based recruitment system, was developed (
mKate knock-in experiments (
In order to demonstrate the generalizability of REDIT protein design and develop a versatile REDIT system applicable to a range of CRISPR enzymes, a Cpf1/Cas12a-based REDIT system using the SunTag recruitment design was developed (
These results showed that the microbial recombination proteins (exonucleases and single-strand annealing proteins) could be engineered using alternative designs such as the SunTag recruitment system to perform genome editing in eukaryotic cells. These protein-based recruitment systems do not require RNA aptamers or RNA-binding proteins; instead, they take advantage of fusion protein domains connected directly to the CRISPR enzymes to recruit REDIT proteins.
In addition to the flexibility in recruitment system design, these results using Cpf1/Cas12a-type CRISPR enzymes also demonstrated the general adaptability of REDIT proteins to various CRISPR systems for genome recombination. Cpf1/Cas12a enzymes have different catalytic residues and DNA-recognition mechanisms from the Cas9 enzymes. Hence, the REDIT recombination proteins (exonucleases and single-strand annealing proteins) could function independently of the specific choice of CRISPR enzyme component (Cas9, Cpf1/Cas12a, and others). This demonstrated the generalizability of the REDIT system and opens up the possibility of using additional CRISPR enzymes (known and unknown) as components of the REDIT system to achieve accurate genome editing in eukaryotic cells.
Fifteen different species of microbes having RecE/RecT proteins were selected for a screen of various RecE and RecT proteins across the microbial kingdom (Table 3). Each protein was codon-optimized and synthesized. As previously described for the E. coli RecE/RecT-based REDIT systems, each protein was fused via an E-XTEN linker to the MCP protein with an additional nuclear localization signal. An mKate knock-in gene-editing assay was used to measure efficiencies at the DYNLT1 locus (
Pantoea stewartii
Pantoea stewartii
Pantoea brenneri
Pantoea brenneri
Pantoea dispersa
Pantoea dispersa
Providencia stuartii
Providencia stuartii
Providencia sp. MGF014
Providencia sp. MGF014
Providencia alcalifaciens DSM 30120
Providencia alcalifaciens DSM 30120
Shewanella putrefaciens
Shewanella putrefaciens
Bacillus sp. MUM 116
Bacillus sp. MUM 116
Shigella sonnei
Shigella sonnei
Salmonella enterica
Salmonella enterica
Acetobacter
Acetobacter
Salmonella enterica subsp. enterica
Salmonella enterica subsp. enterica
Photobacterium sp. JCM 19050
Photobacterium sp. JCM 19050
Next, to benchmark the RecT-based REDIT design, it was compared with three categories of existing HDR-enhancing tools (
The effect of template HA lengths on the editing efficiency of REDIT was quantified when using the canonical HDR donor bearing HAs of at least 100 bp on each side (
The knock-in cells were clonally isolated and the target genomic region was amplified using primers binding completely outside of the donor DNAs for colony Sanger sequencing (
Furthermore, the efficiencies of REDIT and Cas9 were compared when making different lengths of editing. For longer edits, 2-kb knock-in cassettes were used (
The sensitivity of REDIT's ability to promote HDR in the presence or absence of two distinctive pharmacological inhibitors of RAD51, B02 and RI-1 (
Mirin, a potent chemical inhibitor of DSB repair that has also been shown to prevent MRN complex formation and MRN-dependent ATM activation and to inhibit Mre11 exonuclease activity, was also used. When cells were treated with Mirin, only the editing efficiencies of the Cas9 reference experiments were affected, whereas those of the REDIT versions were essentially the same as in the vehicle-treated groups across all genomic targets (
To test if cell cycle inhibition affected recombination, cells were chemically synchronized at the G1/S boundary using double thymidine block (DTB). REDIT versions had reduced editing efficiencies under DTB treatment, though they maintained higher editing efficiencies than the Cas9 reference experiments under DNA repair pathway inhibition, when Mirin, RI-1, or B02 was combined with DTB treatment (
To validate REDIT in different contexts, REDIT was applied in human embryonic stem cells (hESCs) to test its ability to engineer long sequences in non-transformed human cells. Robust stimulation of HDR was observed across all three genomic sites (HSP90AA1, ACTB, OCT4/POU5F1) using REDIT and REDITdn (
In vivo use of dCas9-EcRecT (SAFE-dCas9) was tested by delivering the cleavage-free dCas9 editor via hydrodynamic tail vein injection. The gene editing vectors and template DNA used are shown in
At approximately seven days after injection, the perfused mouse livers were dissected. The lobes of the liver were homogenized and processed to extract liver genomic DNA from the primary hepatocytes. The extracted genomic DNA was used for three different downstream analyses: 1) PCR using knock-in-specific primers and agarose gel electrophoresis (
In addition, in vivo use was tested using adeno-associated virus (AAV) delivery into the lungs of LTC mice. LTC mice carry three genomic alleles: 1) the Lkb1 (flox/flox) allele allows Lkb1 knockout (Lkb1-KO) when Cre is expressed; 2) the R26(LSL-TdTom) allele allows detection of AAV-transduced cells via the TdTom red fluorescent protein; and 3) the H11(LSL-Cas9) allele allows expression of Cas9 in AAV-transduced cells. Schematics of the REDIT gene editing vector and Cas9 control vectors are shown in
Approximately fourteen weeks after the AAV injection, perfused mouse lungs were dissected. Fixed lung tissue was used for imaging analysis to identify tumor formation from successful gene-editing (
Escherichia coli RecE amino acid sequence
Escherichia coli RecE_587 amino acid sequence
Escherichia coli CTDRecE amino acid sequence
Pantoea brenneri RecE amino acid sequence
Providencia sp. MGF014 RecE amino acid sequence
Shigella sonnei RecE amino acid sequence
Pseudobacteriovorax antillogorgiicola
Escherichia coli RecT amino acid sequence
Pantoea brenneri RecT amino acid sequence
Providencia sp. MGF014 RecT amino acid sequence
Shigella sonnei RecT amino acid sequence
Pseudobacteriovorax antillogorgiicola
CgTgttgagggcgttggagcggggagaaggccaggggtcactccagga
GGAGCTACTAACTTCAGCCTGCTGAAGCAGGCTGGAGACGTGGAGGAG
AACCCTGGACCTgccaccatggtgagcgagctgattaaggagaacatg
cacatgaagctgtacatggagggcaccgtgaacaaccaccacttcaag
agaatcaaggcggtcgagggcggccctctccccttcgccttcgacatc
ctggctaccagcttcatgtacggcagcaaaaccttcatcaaccacacc
cagggcatccccgacttctttaagcagtccttccccgagggcttcaca
tgggagagagtcaccacatacgaagatgggggcgtgctgaccgctacc
caggacaccagcctccaggacggctgcctcatctacaacgtcaagatc
agaggggtgaacttcccatccaacggccctgtgatgcagaagaaaaca
ctcggctgggaggcctccaccgagacactgtaccccgctgacggcggc
ctggaaggcagagccgacatggccctgaagctcgtgggcgggggccac
ctgatctgcaaccttaagaccacatacagatccaagaaacccgctaag
aacctcaagatgcccggcgtctactatgtggacaggagactggaaaga
atcaaggaggccgacaaagagacatacgtcgagcagcacgaggtggct
gtggccagatactgcgacctccctagcaaactggggcacaaacttaat
tccTAACCaGCtGTCCtGCCTATGGCCTTTCTCCTTTTGTCTCTAGTT
CCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgtgag
cgagctgattaaggagaacatgcacatgaagctgtacatggagggcac
cgtgaacaaccaccacttcaagtgcacatccgagggcgaaggcaagcc
ctacgagggcacccagaccatgagaatcaaggcggtcgagggcggccc
tctccccttcgccttcgacatcctggctaccagcttcatgtacggcag
caaaaccttcatcaaccacacccagggcatccccgacttctttaagca
gtccttccccgagggcttcacatgggagagagtcaccacatacgaaga
tgggggcgtgctgaccgctacccaggacaccagcctccaggacggctg
cctcatctacaacgtcaagatcagaggggtgaacttcccatccaacgg
ccctgtgatgcagaagaaaacactcggctgggaggcctccaccgagac
actgtaccccgctgacggcggcctggaaggcagagccgacatggccct
gaagctcgtgggcgggggccacctgatctgcaaccttaagaccacata
cagatccaagaaacccgctaagaacctcaagatgcccggcgtctacta
tgtggacaggagactggaaagaatcaaggaggccgacaaagagacata
cgtcgagcagcacgaggtggctgtggccagatactgcgacctccctag
caaactggggcacaaacttaattccTAaATCTgTGGCTGAGGGATGAC
gagagatctggcagcggaGGAAGCGGAGCTACTAACTTCAGCCTGCTG
AAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgtgagcgagctg
attaaggagaacatgcacatgaagctgtacatggagggcaccgtgaac
aaccaccacttcaagtgcacatccgagggcgaaggcaagccctacgag
ggcacccagaccatgagaatcaaggcggtcgagggcggccctctcccc
ttcgccttcgacatcctggctaccagcttcatgtacggcagcaaaacc
ttcatcaaccacacccagggcatccccgacttctttaagcagtccttc
cccgagggcttcacatgggagagagtcaccacatacgaagatgggggc
gtgctgaccgctacccaggacaccagcctccaggacggctgcctcatc
tacaacgtcaagatcagaggggtgaacttcccatccaacggccctgtg
atgcagaagaaaacactcggctgggaggcctccaccgagacactgtac
cccgctgacggcggcctggaaggcagagccgacatggccctgaagctc
gtgggcgggggccacctgatctgcaaccttaagaccacatacagatcc
aagaaacccgctaagaacctcaagatgcccggcgtctactatgtggac
aggagactggaaagaatcaaggaggccgacaaagagacatacgtcgag
cagcacgaggtggctgtggccagatactgcgacctccctagcaaactg
gggcacaaacttaattccTAaactagggacaggattggtgacagaaaa
TGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgccaccatgg
tgagcgagctgattaaggagaacatgcacatgaagctgtacatggagg
gcaccgtgaacaaccaccacttcaagtgcacatccgagggcgaaggca
agccctacgagggcacccagaccatgagaatcaaggcggtcgaagacg
gccctctccccttcaccttcgacatcctggctaccagcttcatgtacg
gcaacaaaaccttcatcaaccacacccagggcatccccgacttcttta
agcagtccttccccgagggcttcacatgggagagagtcaccacatacg
aagatgggggcgtgctgaccgctacccaggacaccagcctccaggacg
gctgcctcatctacaacgtcaagatcagaggggtgaacttcccatcca
acggccctgtgatgcagaagaaaacactcggctgggaggcctccaccg
agacactgtaccccgctgacggcggcctggaaggcagagccgacatgg
ccctgaagctcgtgggcgggggccacctgatctgcaaccttaagacca
catacagatccaagaaacccgctaagaacctcaagatgcccggcgtct
actatgtggacaggagactggaaagaatcaaggaggccgacaaagaga
catacgtcgagcagcacgaggtggctgtggccagatactgcgacctcc
ctagcaaactggggcacaaacttaattccTAaTGACTAGGAATGGGGG
Pantoea stewartii RecT DNA
Pantoea stewartii RecE DNA
Pantoea brenneri RecT DNA
Pantoea brenneri RecE DNA
Pantoea dispersa RecT DNA
Pantoea dispersa RecE DNA
Providencia stuartii RecT DNA
Providencia stuartii RecE DNA
Providencia sp. MGF014 RecT DNA
Providencia sp. MGF014 RecE DNA
Shewanella putrefaciens RecT DNA
Shewanella putrefaciens RecE DNA
Bacillus sp. MUM 116 RecT DNA
Bacillus sp. MUM 116 RecE DNA
Shigella sonnei RecT DNA
Shigella sonnei RecE DNA
Salmonella enterica RecT DNA
Salmonella enterica RecE DNA (SEQ ID NO: 104):
Acetobacter RecT DNA
Acetobacter RecE DNA
Salmonella enterica subsp. enterica serovar
Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecE DNA
Pseudobacteriovorax antillogorgiicola RecT DNA
Pseudobacteriovorax antillogorgiicola RecE DNA
Photobacterium sp. JCM 19050 RecT DNA
Photobacterium sp. JCM 19050 RecE DNA
Providencia alcalifaciens DSM 30120 RecT DNA
Pantoea stewartii RecT Protein
Pantoea stewartii RecE Protein
Pantoea brenneri RecT Protein
Pantoea brenneri RecE Protein
Pantoea dispersa RecT Protein
Pantoea dispersa RecE Protein
Providencia stuartii RecT Protein
Providencia stuartii RecE Protein
Providencia sp. MGF014 RecT Protein
Providencia sp. MGF014 RecE Protein
Shewanella putrefaciens RecT Protein
Shewanella putrefaciens RecE Protein
Bacillus sp. MUM 116 RecT Protein
Bacillus sp. MUM 116 RecE Protein
Shigella sonnei RecT Protein
Shigella sonnei RecE Protein
Salmonella enterica RecT Protein
Salmonella enterica RecE Protein
Acetobacter RecT Protein
Acetobacter RecE Protein
Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecT Protein
Salmonella enterica subsp. enterica serovar
Pseudobacteriovorax antillogorgiicola RecT Protein
Pseudobacteriovorax antillogorgiicola RecE Protein
Photobacterium sp. JCM 19050 RecT Protein
Photobacterium sp. JCM 19050 RecE Protein
Providencia alcalifaciens DSM 30120 RecT Protein
Providencia alcalifaciens DSM 30120 RecE Protein
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
Preferred embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
This application claims the benefit of U.S. Provisional Application No. 62/984,618, filed Mar. 3, 2020, and U.S. Provisional Application No. 63/146,447, filed Feb. 5, 2021, the contents of each of which are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/020513 | 3/2/2021 | WO | |

Number | Date | Country |
---|---|---|
62984618 | Mar 2020 | US |
63146447 | Feb 2021 | US |