The instant application contains a sequence listing, which has been submitted in ASCII format by electronic submission and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Aug. 16, 2021, is named P13668WO00_ST25.txt and is 297,208 bytes in size.
The present disclosure relates to compositions and methods for editing genomic sequences and for modulating gene expression in plants.
CRISPR-Cas9, since its first demonstration as an RNA-guided nuclease, has been rapidly applied for genome editing in eukaryotes including plants. Predominant use of CRISPR-Cas9 has been based on targeted mutagenesis through error-prone non-homologous end joining (NHEJ) repair of Cas9-induced DNA double-strand breaks (DSBs). In recent years, Cas9-derived base editors such as cytosine base editors (CBEs) and adenine base editors (ABEs) have gained momentum for conferring precise base changes in genomes of interest. Dual base editors that confer simultaneous C-to-T and A-to-G base edits have also been developed, including synchronous programmable adenine and cytosine editor (SPACE), A&C-BEmax, and Target-ACEmax demonstrated in human cells, and saturated targeted endogenous mutagenesis editors (STEMEs) demonstrated in plants. Furthermore, a SWISS platform was developed for simultaneous adenine base editing, cytosine base editing, and indel formation in plant genomes. These multifunctional CRISPR systems, however, are limited solely to genome editing.
Aside from genome editing, the CRISPR-Cas9 system has been repurposed for genome reprogramming. CRISPR activation (CRISPRa) systems allow for transcriptional activation, and such systems were developed in mammalian cells and plant cells. Conversely, CRISPR interference (CRISPRi) systems have been used for transcriptional repression in mammalian cells and plants. These transcription regulation systems are based on deactivated Cas9 (dCas9), which lacks nuclease activity while retaining single guide RNA (sgRNA)-mediated DNA binding activity. Coupled with engineered sgRNA scaffolds for the recruitment of activators and repressors, CRISPR-dCas9 was demonstrated for simultaneous transcriptional activation and repression in human cells and yeast. Alternatively, the DNA cleavage activity of Cas9 can be abolished without compromising DNA binding by using truncated protospacers. Based on a similar principle, a nuclease active AsCas12a-VPR fusion was engineered for orthogonal genome editing and transcriptional activation in mice. However, direct fusion of a transcriptional activator to a Cas protein would prevent its use for transcriptional repression. On the other hand, nuclease active Cas12a was also used to develop a dual functional CRISPR system for simultaneous gene editing and repression in Corynebacterium glutamicum. Orthogonal genome editing and transcriptional activation could also be achieved by using orthogonal Cas9 proteins. However, programming functionalities through guide RNAs appears to be more versatile as well as easier for vector construction and delivery. Further, a robust CRISPR system for simultaneous genome editing, transcriptional activation, and repression is yet to be developed in any organism. One constraint that has limited orthogonal CRISPR applications in plants is the lack of a highly efficient CRISPRa system.
It is an objective of the present disclosure to provide an improved CRISPRa system with higher levels of gene activation. It is a further objective of the present disclosure to provide an orthogonal CRISPR system for simultaneous genome editing, gene activation, and gene repression. Additional objectives, features, and advantages will become apparent based on the disclosure contained herein.
The presently disclosed subject matter relates generally to genome engineering. A potent CRISPR transcriptional activation system in plants, termed CRISPR-Act3.0, is provided. The system provides higher levels of gene activation than all other gene activation systems reported in plants to date. Further provided is a comprehensive platform called CRISPR-Combo, which allows for simultaneous and combinational gene activation, editing, and repression. The platform enables multiple genome engineering outcomes in plants including potent single or multi-gene activation; simultaneous gene editing and gene activation; simultaneous gene editing and gene repression; simultaneous gene activation and repression; and simultaneous gene editing, activation, and repression. The gene editing may include non-homologous end joining (NHEJ) based mutagenesis, base editing, prime editing, and homology-directed repair (HDR).
Systems for activating expression of a target nucleic acid are provided, the systems comprising (i) a Cas polypeptide, or a polynucleotide encoding the Cas polypeptide; (ii) a guide polynucleotide comprising an aptamer; (iii) a polypeptide comprising an adapter domain and a multimerized epitope, wherein the adapter domain binds the aptamer, or a polynucleotide encoding the polypeptide; and (iv) a polypeptide comprising an affinity domain and a transcriptional activation domain, wherein the affinity domain binds the multimerized epitope, or a polynucleotide encoding the polypeptide.
Systems for simultaneously activating expression of a target nucleic acid and modifying a nucleotide sequence at a target site in a genome are provided, the systems comprising: (i) a Cas polypeptide, or a polynucleotide encoding the Cas polypeptide; (ii) a dead guide polynucleotide that mediates increased expression of a target nucleic acid, wherein the dead guide polynucleotide comprises an aptamer; (iii) a polypeptide comprising an adapter domain and a multimerized epitope, wherein the adapter domain binds the aptamer, or a polynucleotide encoding the polypeptide; (iv) a polypeptide comprising an affinity domain and a transcriptional activation domain, wherein the affinity domain binds the multimerized epitope, or a polynucleotide encoding the polypeptide; and (v) a guide polynucleotide that mediates sequence-specific cleavage at a target site in the genome.
Systems for simultaneously activating expression of a first target nucleic acid and repressing expression of a second target nucleic acid are provided, the systems comprising (i) a Cas polypeptide, or a polynucleotide encoding the Cas polypeptide; (ii) a first dead guide polynucleotide that mediates increased expression of the first target nucleic acid, wherein the first dead guide polynucleotide comprises an aptamer; (iii) a polypeptide comprising an adapter domain and a multimerized epitope, wherein the adapter domain binds the aptamer, or a polynucleotide encoding the polypeptide; (iv) a polypeptide comprising an affinity domain and a transcriptional activation domain, wherein the affinity domain binds the multimerized epitope, or a polynucleotide encoding the polypeptide; and (v) a second dead guide polynucleotide that mediates reduced expression of the second target nucleic acid.
Systems for simultaneously activating expression of a first target nucleic acid, repressing expression of a second target nucleic acid, and modifying a nucleotide sequence at a target site in a genome are provided, the systems comprising (i) a Cas polypeptide, or a polynucleotide encoding the Cas polypeptide; (ii) a first dead guide polynucleotide that mediates increased expression of the first target nucleic acid, wherein the first dead guide polynucleotide comprises an aptamer; (iii) a polypeptide comprising an adapter domain and a multimerized epitope, wherein the adapter domain binds the aptamer, or a polynucleotide encoding the polypeptide; (iv) a polypeptide comprising an affinity domain and a transcriptional activation domain, wherein the affinity domain binds the multimerized epitope, or a polynucleotide encoding the polypeptide; (v) a second dead guide polynucleotide that mediates reduced expression of the second target nucleic acid; and (vi) a guide polynucleotide that mediates sequence-specific cleavage at a target site in the genome.
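As an illustrative summary of the system configurations recited above, the following minimal Python sketch models each guide polynucleotide by the role it plays and derives the set of outcomes that a given collection of guides would program with a single Cas polypeptide. The class, field, and function names are hypothetical and are introduced only for illustration; the sketch is a bookkeeping aid under those assumptions, not a description of any construct of the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class GuideRole(Enum):
    # Hypothetical labels for the three guide types named in the text.
    ACTIVATE = "guide comprising an aptamer (increases expression)"
    REPRESS = "dead guide (reduces expression)"
    EDIT = "guide mediating sequence-specific cleavage"


@dataclass(frozen=True)
class Guide:
    target: str      # name of the target nucleic acid or target site
    role: GuideRole  # what this guide is intended to do


def outcomes(guides):
    """Return the set of roles programmed by a collection of guides."""
    return {g.role for g in guides}


def needs_activator_parts(guides):
    """True when an activating guide is present.

    In the systems recited above, the adapter/multimerized-epitope
    polypeptide and the affinity/activation-domain polypeptide accompany
    the aptamer-bearing guide.
    """
    return GuideRole.ACTIVATE in outcomes(guides)


# Example: activation, repression, and editing programmed together.
combo = [
    Guide("first target nucleic acid", GuideRole.ACTIVATE),
    Guide("second target nucleic acid", GuideRole.REPRESS),
    Guide("genomic target site", GuideRole.EDIT),
]
print(sorted(role.name for role in outcomes(combo)))
```

The design choice here mirrors the point made in the text that functionality is determined by the guides rather than by swapping the Cas polypeptide: adding or removing a guide changes the outcome set without changing any other component.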
Methods to use these systems to activate the expression of a target nucleic acid in a plant cell, repress the expression of a target nucleic acid in a plant cell, and/or modify a nucleotide sequence at a target site in a genome of a plant cell are described herein. Modified plants and plant cells are also encompassed.
While multiple example embodiments are disclosed, still other example embodiments of the inventions will become apparent to those skilled in the art from the following detailed description, which shows and describes illustrative example embodiments of the invention. Accordingly, the figures and detailed description are to be regarded as illustrative and not restrictive in any way.
The following drawings form part of the specification and are included to further demonstrate certain example embodiments or various aspects of the invention. In some instances, example embodiments can be best understood by referring to the accompanying figures in combination with the detailed description presented herein. The description and accompanying figures may highlight a certain specific example, or a certain aspect of the invention. However, one skilled in the art will understand that portions of the examples or aspects provided in the present disclosure may be used in combination with other examples or aspects of the invention.
A highly robust CRISPRa system, CRISPR-Act3.0, developed through systematically exploring different effector recruitment strategies and various transcription activators, is provided. The CRISPR-Act3.0 system results in four- to six-fold higher activation than the state-of-the-art CRISPRa systems. In addition, CRISPR-Act3.0 allows simultaneous modification of multiple traits, which are stably transmitted to the T3 generation. RNA-guided CRISPR-Cas9 nuclease, its derived base editors, CRISPRa systems, and CRISPRi systems are nearly always used in isolation, leaving their potential combinational power untapped. The present disclosure also provides a versatile CRISPR-Combo platform for simultaneous genome editing, gene activation, and gene repression in plants. Based on a single Cas polypeptide, the multifunctionality of CRISPR-Combo is programmed through sgRNA engineering. Hence, implementation of CRISPR-Combo is as simple as that of any multiplexed CRISPR system.
So that the present invention may be more readily understood, certain terms are first defined. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one skilled in the art to which example embodiments of the invention pertain. Many methods and materials similar, modified, or equivalent to those described herein can be used in the practice of the example embodiments without undue experimentation; the preferred materials and methods are described herein. In describing the example embodiments and claiming the present invention, the following terminology will be used in accordance with the definitions set out below.
It is to be understood that all terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting in any manner or scope. For example, as used in this specification and the appended claims, the singular forms “a,” “an” and “the” can include plural referents unless the context clearly indicates otherwise. Similarly, the word “or” is intended to include “and” unless the context clearly indicates otherwise. The word “or” means any one member of a particular list and also includes any combination of members of that list. Further, all units, prefixes, and symbols may be denoted in their SI accepted form.
Numeric ranges recited within the specification are inclusive of the numbers defining the range and include each integer within the defined range. Throughout this disclosure, various aspects of this invention are presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges, fractions, and individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6, and decimals and fractions, for example, 1.2, 3.8, 1½, and 4¾. This applies regardless of the breadth of the range.
The methods, systems, and compositions of the present disclosure may comprise, consist essentially of, or consist of the components described herein. As used herein, “consisting essentially of” means that the methods, systems, and compositions may include additional steps or components, but only if the additional steps or components do not materially alter the basic and novel characteristics of the claimed methods, systems, and compositions.
The term “CRISPR/Cas” or “clustered regularly interspaced short palindromic repeats” or “CRISPR” refers to DNA loci containing short repetitions of base sequences followed by short segments of spacer DNA from previous exposures to a virus or plasmid. Bacteria and archaea have evolved adaptive immune defenses termed CRISPR/CRISPR-associated (Cas) systems that use short RNA to direct degradation of foreign nucleic acids. In bacteria, the CRISPR system provides acquired immunity against invading foreign DNA via RNA-guided DNA cleavage.
The “CRISPR/Cas9” system or “CRISPR/Cas9-mediated gene editing” refers to a type II CRISPR/Cas system that has been modified for genome editing/engineering. It is typically comprised of a “guide” RNA (gRNA) and a non-specific CRISPR-associated endonuclease (Cas9). “Guide RNA (gRNA)” is used interchangeably herein with “short guide RNA (sgRNA)” or “single guide RNA (sgRNA).” The sgRNA is a short synthetic RNA composed of a “scaffold” sequence necessary for Cas9-binding and a user-defined approximately 20 nucleotide “spacer” or “targeting” sequence which defines the genomic target to be modified. The genomic target of Cas9 can be changed by changing the targeting sequence present in the sgRNA.
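The two-part composition of an sgRNA described above can be illustrated with a short Python sketch that joins a user-defined spacer to a scaffold. The function name, the example spacer, and the scaffold placeholder are hypothetical; no actual scaffold sequence is implied, and placing the spacer 5' of the scaffold reflects the usual Cas9 sgRNA arrangement rather than anything recited in this definition.

```python
def make_sgrna(spacer_dna: str, scaffold_rna: str) -> str:
    """Join a spacer (given as DNA, converted to RNA) to a scaffold.

    The spacer is placed 5' of the scaffold. The scaffold is passed
    through unchanged, so a placeholder may be used for illustration.
    """
    spacer = spacer_dna.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be a non-empty string of A, C, G, T")
    # Transcription replaces thymine (T) with uracil (U).
    return spacer.replace("T", "U") + scaffold_rna


# Retargeting only requires a different spacer; the scaffold is reused.
SCAFFOLD = "[scaffold]"  # placeholder, not a real sequence
print(make_sgrna("GATTACAGATTACAGATTAC", SCAFFOLD))
print(make_sgrna("CCGGAATTCCGGAATTCCGG", SCAFFOLD))
```

Keeping the scaffold as a separate argument reflects the statement that the genomic target is changed by changing only the targeting sequence.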
“CRISPRa” system refers to a modification of the CRISPR/Cas system that functions to activate or increase gene expression.
“dCas9” as used herein refers to a catalytically dead Cas9 protein that lacks endonuclease activity.
The term “dead guide RNA” refers to a guide RNA, which is catalytically inactive yet maintains target-site binding capacity.
“Encoding” refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and the non-coding strand, used as the template for transcription of a gene or cDNA, can be referred to as encoding the protein or other product of that gene or cDNA.
As used herein, the term “exogenous” refers to any material introduced from or produced outside an organism, cell, tissue or system.
The term “expression” as used herein is defined as the transcription and/or translation of a particular nucleotide sequence driven by its promoter.
“Isolated” means altered or removed from the natural state. For example, a nucleic acid or a peptide naturally present in a living plant is not “isolated,” but the same nucleic acid or peptide partially or completely separated from the coexisting materials of its natural state is “isolated.” An isolated nucleic acid or protein can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.
The term “knockdown” as used herein refers to a decrease in gene expression of one or more genes.
The term “knockout” as used herein refers to the ablation of gene expression of one or more genes.
“Operably linked” refers to the association of nucleic acid fragments in a single fragment so that the function of one is regulated by the other. For example, a promoter is operably linked with a nucleic acid fragment when it is capable of regulating the transcription of that nucleic acid fragment.
A “vector” is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell.
The present disclosure relates to the use of recombinant polypeptides to modulate (e.g., activate, repress) expression of a target nucleic acid.
As used herein, a “polypeptide” is an amino acid sequence including a plurality of consecutive polymerized amino acid residues (e.g. at least about 15 consecutive polymerized amino acid residues). “Polypeptide” refers to an amino acid sequence, oligopeptide, peptide, protein, or portions thereof, and the terms “polypeptide” and “protein” are used interchangeably.
Polypeptides as described herein also include polypeptides having various amino acid additions, deletions, or substitutions relative to the native amino acid sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain non-conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure, and thus may be referred to as conservatively modified variants. A conservatively modified variant may include individual substitutions, deletions or additions to a polypeptide sequence which result in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well-known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure. The following eight groups contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)). A modification of an amino acid to produce a chemically similar amino acid may be referred to as an analogous amino acid.
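The eight substitution groups listed above lend themselves to a simple lookup. The following Python sketch, offered only as an illustration, checks whether replacing one residue with another falls within any one of those groups; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical. Methionine appears in two of the listed groups, so it is treated as conservative with respect to members of either.

```python
# The eight groups as listed in the text, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two different residues share at least one listed group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("D", "E"))  # same group (2)
print(is_conservative("C", "L"))  # C and L share no listed group
```

Because membership is tested group by group, sharing a residue between two groups does not merge them: cysteine and leucine are each paired with methionine but not with one another.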
Recombinant polypeptides of the present disclosure that are composed of individual polypeptide domains may be described based on the individual polypeptide domains of the overall recombinant polypeptide. A domain in such a recombinant polypeptide refers to the particular stretches of contiguous amino acid sequences with a particular function or activity. For example, in a recombinant polypeptide that is a fusion of a transcriptional activator polypeptide and an affinity polypeptide, the contiguous amino acids that make up the transcriptional activator polypeptide may be described as the transcriptional activator domain in the overall recombinant polypeptide, and the contiguous amino acids that make up the affinity polypeptide may be described as the affinity domain in the overall recombinant polypeptide. Individual domains in an overall recombinant protein may also be referred to as units of the recombinant protein. Recombinant polypeptides that are composed of individual polypeptide domains may also be referred to as fusion polypeptides.
Certain embodiments of the present disclosure relate to a polypeptide comprising an adapter domain and a multimerized epitope domain. In certain embodiments, the adapter domain is recombinantly fused to a multimerized epitope domain (e.g., an adapter-multimerized epitope fusion protein). The adapter domain may be in an N-terminal orientation or a C-terminal orientation relative to the multimerized epitope domain. The multimerized epitope domain may be in an N-terminal orientation or a C-terminal orientation relative to the adapter domain. In some embodiments, an adapter-multimerized epitope fusion protein may be a direct fusion of an adapter domain and a multimerized epitope domain. In some embodiments, an adapter-multimerized epitope fusion protein may be an indirect fusion of an adapter domain and a multimerized epitope domain. In embodiments where the fusion is indirect, a linker domain or other contiguous amino acid sequence may separate the adapter domain and the multimerized epitope domain.
Certain embodiments of the present disclosure relate to a polypeptide comprising an affinity domain and a transcriptional activation domain. In certain embodiments, an affinity domain is recombinantly fused to the transcriptional activator domain (e.g., an affinity-transcriptional activator fusion protein). The transcriptional activator domain of an affinity-transcriptional activator fusion protein may be in an N-terminal orientation or a C-terminal orientation relative to the affinity polypeptide. The affinity polypeptide domain of an affinity-transcriptional activator fusion protein may be in an N-terminal orientation or a C-terminal orientation relative to the transcriptional activator polypeptide domain. In some embodiments, an affinity-transcriptional activator fusion protein may be a direct fusion of an affinity domain and transcriptional activator domain. In some embodiments, an affinity-transcriptional activator fusion protein may be an indirect fusion of an affinity polypeptide domain and a transcriptional activator domain. In embodiments where the fusion is indirect, a linker domain or other contiguous amino acid sequence may separate the affinity domain and the transcriptional activator domain.
Certain aspects of the present disclosure involve targeting a transcriptional activator to a target nucleic acid such that the transcriptional activator activates the expression/transcription of the target nucleic acid. In some embodiments, a transcriptional activator is present in a recombinant polypeptide that contains a transcriptional activator polypeptide and an affinity polypeptide.
Transcriptional activators are polypeptides that facilitate the activation of transcription/expression of a nucleic acid (e.g., a gene). Transcriptional activators may be DNA-binding proteins that bind to enhancers, promoters, or other regulatory elements of a nucleic acid, which then promotes expression of the nucleic acid. Transcriptional activators may interact with proteins that are components of transcriptional machinery or other proteins that are involved in regulation of transcription in a manner that promotes expression of the nucleic acid.
Transcriptional activators of the present disclosure may be endogenous to the host plant, or they may be exogenous/heterologous to the host plant. In some embodiments, the transcriptional activator is a viral transcriptional activator. In some embodiments, the transcriptional activator is derived from Herpes Simplex Virus. For example, one or more copies of a Herpes Simplex Virus Viral Protein 16 (VP16) domain may be used herein. In some embodiments, at least two, at least three, or at least four or more copies of a VP16 domain may be used as a transcriptional activator. A polypeptide containing 4 copies of the Herpes Simplex Virus Viral Protein 16 (VP16) domain is known as a VP64 domain.
In some embodiments, the transcriptional activator is a VP64 polypeptide. A VP64 polypeptide of the present disclosure may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 2.
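Percent amino acid identity thresholds such as those recited above depend on how identity is computed. As a minimal sketch only, the following Python function computes identity for two sequences that have already been aligned to equal length, with "-" marking gaps, and uses the number of alignment columns as the denominator. That convention, the function name, and the toy sequences are assumptions made for illustration; the disclosure does not specify an alignment method here, and other tools may use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all columns of a pairwise alignment.

    Both inputs must be pre-aligned to the same length, with '-' as the
    gap character. A column counts as a match only when both sequences
    carry the same residue (gap-to-gap columns are not matches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy sequences, not taken from any SEQ ID NO.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(percent_identity("ACD-F", "ACDEF"))
```

A candidate sequence would then be compared against a threshold, for example `percent_identity(a, b) >= 90.0`, once the two sequences have been aligned by a method of the practitioner's choosing.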
In certain embodiments, the transcriptional activator is a TAL activation domain (TAD) derived from the transcription activator-like effector (TALE) proteins from the plant pathogen Xanthomonas. In some embodiments, the transcriptional activator comprises two repeats of TAD (2xTAD). A TAD polypeptide of the present disclosure may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 4.
Other exemplary transcriptional activators include, for example, the EDLL motif present in the ERF/EREBP family of transcriptional regulators in plants, activation domains of or full-length transcription factors, plant endogenous and exogenous histone acetylases (e.g. p300 from mammals), histone methylases (e.g. H3K4 methylation depositors (SDG2)), histone demethylases (e.g. H3K9 demethylases (IBM1)), Polymerase II subunits, and various combinations of the above mentioned transcriptional activators. For example, 2xTAD and VP64 may each be fused to an affinity polypeptide.
Additional transcriptional activators that may be used in the methods and compositions described herein will be readily apparent to those of skill in the art.
Certain embodiments of the present disclosure relate to recombinant polypeptides that contain an affinity polypeptide. Affinity polypeptides of the present disclosure may bind to one or more epitopes (e.g. a multimerized epitope). In some embodiments, an affinity polypeptide is present in a recombinant polypeptide that contains a transcriptional activator polypeptide and an affinity polypeptide.
A variety of affinity polypeptides are known in the art and may be used herein. Generally, the affinity polypeptide should be stable in the conditions present in the intracellular environment of a plant cell. Additionally, the affinity polypeptide should specifically bind to its corresponding epitope with minimal cross-reactivity.
The affinity polypeptide may be an antibody such as, for example, an scFv. The antibody may be optimized for stability in the plant intracellular environment. When a GCN4 epitope is used in the methods described herein, a suitable affinity polypeptide that is an antibody may contain an anti-GCN4 scFv domain.
In embodiments where the affinity polypeptide is an scFv antibody, the polypeptide may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 14.
Other exemplary affinity polypeptides include, for example, proteins with SH2 domains or the domain itself, 14-3-3 proteins, proteins with SH3 domains or the domain itself, the Alpha-Syntrophin PDZ protein interaction domain, the PDZ signal sequence, or proteins from plants, which can recognize AGO hook motifs (e.g., AGO4 from Arabidopsis thaliana).
Certain embodiments of the present disclosure relate to recombinant polypeptides that contain an epitope or a multimerized epitope. Epitopes of the present disclosure may bind to an affinity polypeptide. In some embodiments, an epitope or multimerized epitope is present in a recombinant polypeptide that contains an adapter polypeptide and an epitope or multimerized epitope.
Epitopes of the present disclosure may be used for recruiting affinity polypeptides (and any polypeptides they may be recombinantly fused to) to an adapter polypeptide. In embodiments where an adapter polypeptide is fused to an epitope or a multimerized epitope, the adapter polypeptide may be fused to one copy of an epitope, multiple copies of an epitope, more than one different epitope, or multiple copies of more than one different epitope as further described herein.
A variety of epitopes and multimerized epitopes are known in the art and may be used herein. In general, the epitope or multimerized epitope may be any polypeptide sequence that is specifically recognized by an affinity polypeptide of the present disclosure. Exemplary epitopes may include a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, a VSV-G epitope, and a GCN4 epitope.
Other exemplary amino acid sequences that may serve as epitopes and multimerized epitopes include, for example, phosphorylated tyrosines in specific sequence contexts recognized by SH2 domains, characteristic consensus sequences containing phosphoserines recognized by 14-3-3 proteins, proline-rich peptide motifs recognized by SH3 domains, the PDZ protein interaction domain or the PDZ signal sequence, and the AGO hook motif from plants.
Epitopes described herein may also be multimerized. Multimerized epitopes may include at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, or at least 24 or more copies of an epitope.
Multimerized epitopes may be present as tandem copies of an epitope, or each individual epitope may be separated from another epitope in the multimerized epitope by a linker or other amino acid sequence. Suitable linker regions are known in the art and are described herein. The linker may be configured to allow the binding of affinity polypeptides to adjacent epitopes without, or without substantial, steric hindrance. Linker sequences may also be configured to provide an unstructured or linear region of the polypeptide to which they are recombinantly fused. The linker sequence may comprise e.g. one or more glycines and/or serines. The linker sequences may be e.g. at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 or more amino acids in length.
In some embodiments, the epitope is a GCN4 epitope (SEQ ID NO: 16). In some embodiments, the multimerized epitope contains at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, or at least 24 copies of a GCN4 epitope. In some embodiments, the multimerized epitope contains 10 copies of a GCN4 epitope (SEQ ID NO: 18).
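The arrangement of epitope copies, optional linkers, and adapter orientation described above can be sketched in a few lines of Python. The sketch below treats each domain as an opaque string; the bracketed tokens are placeholders rather than real sequences, and the function names and the "GS" linker are hypothetical choices made only to show the layout.

```python
def multimerize(epitope: str, copies: int, linker: str = "") -> str:
    """Return `copies` of an epitope, optionally separated by a linker.

    An empty linker gives tandem copies; a non-empty linker is placed
    between adjacent copies only, not at either end.
    """
    if copies < 2:
        raise ValueError("a multimerized epitope has at least 2 copies")
    return linker.join([epitope] * copies)


def fuse(adapter: str, multimer: str, adapter_first: bool = True,
         linker: str = "") -> str:
    """Fuse an adapter domain to a multimerized epitope.

    `adapter_first=True` places the adapter N-terminal to the epitopes;
    an empty linker corresponds to a direct fusion.
    """
    parts = (adapter, multimer) if adapter_first else (multimer, adapter)
    return linker.join(parts)


# Ten copies separated by a glycine-serine linker, adapter at the N-terminus.
ten_copies = multimerize("[epitope]", 10, linker="GS")
print(fuse("[adapter]", ten_copies, adapter_first=True, linker="GGS"))
```

Separating the two steps keeps the copy number, the inter-epitope linker, and the adapter orientation independently adjustable, which matches the way the embodiments above vary each of them.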
Various linkers may be used in the construction of recombinant proteins as described herein. In general, linkers are short peptides that separate the different domains in a multi-domain protein. They may play an important role in fusion proteins, affecting the crosstalk between the different domains, the yield of protein production, and the stability and/or the activity of the fusion proteins. Linkers are generally classified into 2 major categories: flexible or rigid. Flexible linkers are typically used when the fused domains require a certain degree of movement or interaction, and these linkers are usually composed of small amino acids such as, for example, glycine (G), serine (S) or proline (P).
The degree of movement between domains allowed by flexible linkers is an advantage in some fusion proteins. However, it has been reported that flexible linkers can sometimes reduce protein activity due to inefficient separation of the two domains. In such cases, rigid linkers may be used, since they enforce a fixed distance between domains and promote their independent functions. A thorough description of several linkers is provided in Chen X et al., Advanced Drug Delivery Reviews 65:1357-1369 (2013).
Various linkers may be used in, for example, the construction of recombinant polypeptides as described herein. Linkers may be used in e.g., adapter-multimerized epitope fusion proteins as described herein to separate the coding sequences of the adapter domain and the multimerized epitope domain. Linkers may be used in e.g., affinity-transcriptional activator fusion proteins as described herein to separate the coding sequences of the affinity domain and the transcriptional activator domain. For example, a variety of flexible linkers, rigid linkers, short linkers, and long linkers may be used as described herein.
A variety of shorter or longer linker regions are known in the art, for example corresponding to a series of glycine residues, a series of adjacent glycine-serine dipeptides, a series of adjacent glycine-glycine-serine tripeptides, or known linkers from other proteins.
Recombinant polypeptides of the present disclosure may contain one or more nuclear localization signals (NLS). Nuclear localization signals may also be referred to as nuclear localization sequences, domains, peptides, or other terms readily apparent to those of skill in the art. A nuclear localization signal is a translocation sequence that, when present in a polypeptide, directs that polypeptide to localize to the nucleus of a eukaryotic cell.
Various nuclear localization signals may be used in recombinant polypeptides of the present disclosure. For example, one or more SV40-type NLS or one or more REX NLS may be used in recombinant polypeptides. Recombinant polypeptides may also contain two or more tandem copies of a nuclear localization signal. For example, recombinant polypeptides may contain at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten copies, either tandem or not, of a nuclear localization signal.
Recombinant polypeptides of the present disclosure may contain one or more nuclear localization signals that contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 20.
Recombinant polypeptides of the present disclosure may contain one or more tags that allow for e.g., purification and/or detection of the recombinant polypeptide. Various tags may be used herein and are well-known to those of skill in the art. Exemplary tags may include HA, GST, FLAG, MBP, etc., and multiple copies of one or more tags may be present in a recombinant polypeptide.
Recombinant polypeptides of the present disclosure may contain one or more reporters that allow for e.g., visualization and/or detection of the recombinant polypeptide. A reporter polypeptide is a protein that may be readily detectable due to its biochemical characteristics such as, for example, enzymatic activity or chemifluorescent features. Reporter polypeptides may be detected in a number of ways depending on the characteristics of the particular reporter. For example, a reporter polypeptide may be detected by its ability to generate a detectable signal (e.g., fluorescence), by its ability to form a detectable product, etc. Various reporters may be used herein and are well-known to those of skill in the art. Exemplary reporters may include GFP, GUS, mCherry, luciferase, etc., and multiple copies of one or more reporters may be present in a recombinant polypeptide.
Recombinant polypeptides of the present disclosure may contain one or more polypeptide domains that serve a particular purpose depending on the particular goal/need. For example, recombinant polypeptides may contain translocation sequences that target the polypeptide to a particular cellular compartment or area. Suitable features will be readily apparent to those of skill in the art.
CRISPR systems naturally use small base-pairing guide RNAs to target and cleave foreign DNA elements in a sequence-specific manner (Wiedenheft et al., 2012). There are diverse CRISPR systems in different organisms that may be used to target proteins of the present disclosure to a target nucleic acid. One of the simplest systems is the type II CRISPR system from Streptococcus pyogenes. Only a single gene encoding the Cas9 protein and two RNAs, a mature CRISPR RNA (crRNA) and a partially complementary trans-acting RNA (tracrRNA), are necessary and sufficient for RNA-guided silencing of foreign DNAs (Jinek et al., 2012). Maturation of crRNA requires tracrRNA and RNase III (Deltcheva et al., 2011). However, this requirement can be bypassed by using an engineered small guide RNA (gRNA) containing a designed hairpin that mimics the tracrRNA-crRNA complex (Jinek et al., 2012). Base pairing between the gRNA and target DNA normally causes double-strand breaks (DSBs) due to the endonuclease activity of Cas9.
It is known that the endonuclease domains of the Cas9 protein can be mutated to create a programmable RNA-dependent DNA-binding protein (dCas9) (Qi et al., 2013). The fact that duplex gRNA-dCas9 binds target sequences without endonuclease activity has been used to tether regulatory proteins, such as transcriptional activators or repressors, to promoter regions in order to modify gene expression (Gilbert et al., 2013), and Cas9 transcriptional activators have been used for target specificity screening and paired nickases for cooperative genome engineering (Mali et al., 2013, Nature Biotechnology 31:833-838). Thus, dCas9 may be used as a modular RNA-guided platform to recruit different proteins to DNA in a highly specific manner.
A variety of Cas proteins may be used in the methods of the present disclosure. There are several Cas9 genes present in different bacterial species (Esvelt, K et al., 2013, Nature Methods). One of the best-characterized Cas9 proteins is the Cas9 protein from S. pyogenes, which, in order to be active, needs to bind a gRNA with a specific sequence and requires the presence of a PAM motif (NGG, where N is any nucleotide) at the 3′ end of the target locus. However, Cas9 proteins from other bacterial species differ in 1) the sequence of the gRNA they can bind and 2) the sequence of the PAM motif. Therefore, other Cas9 proteins such as, for example, those from Streptococcus thermophilus or Neisseria meningitidis may also be utilized herein. Indeed, these two Cas9 proteins are smaller (around 1100 amino acids) than S. pyogenes Cas9 (around 1400 amino acids), which may confer some advantages during cloning or protein expression.
Cas9 proteins from a variety of bacteria have been used successfully in engineered CRISPR-Cas9 systems. There are also versions of Cas9 proteins available in which the codon usage has been more highly optimized for expression in eukaryotic systems, such as human codon optimized Cas9 (Cell, 152:1173-1183) and plant optimized Cas9 (Nature Biotechnology, 31:688-691).
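As a rough illustration of the PAM requirement described above, the following sketch scans a DNA string for candidate S. pyogenes Cas9 target sites, i.e., a protospacer immediately followed on its 3′ side by NGG. The 20-nt protospacer length, the forward-strand-only scan, and the function name `find_spcas9_sites` are assumptions made for this example and are not taken from the disclosure.

```python
def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each NGG PAM on the given strand.

    Only the strand supplied is scanned; a full search would also scan the
    reverse complement. `protospacer_len` defaults to an assumed 20 nt.
    """
    seq = seq.upper()
    sites = []
    # i is the index of the first PAM base (the "N" of NGG).
    for i in range(protospacer_len, len(seq) - 2):
        pam = seq[i:i + 3]
        if pam[1:] == "GG":
            start = i - protospacer_len
            sites.append((start, seq[start:i], pam))
    return sites


example = "ACGTACGTACGTACGTACGT" + "AGGCGG"
sites = find_spcas9_sites(example)
```

In this example two overlapping sites are reported, one for the AGG PAM and one for the CGG PAM three bases downstream. A Cas protein with a different PAM would need a different test in place of the `GG` comparison.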
Cas9 proteins may also be modified for various purposes. For example, Cas9 proteins may be engineered to contain a nuclear-localization sequence (NLS). Cas9 proteins may be engineered to contain an NLS at the N-terminus of the protein, at the C-terminus of the protein, or at both the N- and C-terminus of the protein. Engineering a Cas9 protein to contain an NLS may assist with directing the protein to the nucleus of a host cell. Cas9 proteins may be engineered such that they are unable to cleave nucleic acids (e.g. nuclease-deficient dCas9 polypeptides). One of skill in the art would be able to readily identify a suitable Cas9 protein for use in the methods and compositions of the present disclosure.
Exemplary Cas proteins that may be used in the methods and compositions of the present disclosure may include, for example, a Cas protein having the amino acid sequence of SEQ ID NO: 22, 24, or 26, homologs thereof, and fragments thereof.
In some embodiments, the Cas polypeptide is a SpCas9 polypeptide. SpCas9 polypeptides may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 22.
In some embodiments, the Cas polypeptide is a SpRY polypeptide. SpRY polypeptides may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 24.
In some embodiments, the Cas polypeptide is an AaCas12b polypeptide. AaCas12b polypeptides may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 26.
Fusion proteins comprising a Cas polypeptide and an effector domain are provided. In certain embodiments, the effector domain of the fusion protein can be a nucleotide deaminase or a catalytic domain thereof. The nucleotide deaminase may be an adenosine deaminase or a cytidine deaminase. In general, a Cas polypeptide fused with a deaminase domain can target a sequence in the genome of a plant through the direction of a guide RNA to perform base editing, including the introduction of C to T or A to G substitutions. In some embodiments, the adenosine deaminase can be, without limit, a member of the enzyme family known as adenosine deaminases that act on RNA (ADARs), a member of the enzyme family known as adenosine deaminases that act on tRNA (ADATs), or an adenosine deaminase domain-containing (ADAD) family member. In some embodiments, the cytidine deaminase can be, without limit, a member of the enzyme family known as apolipoprotein B mRNA-editing complex (APOBEC) family deaminase, an activation-induced deaminase (AID), or a cytidine deaminase 1 (CDA1).
An adenosine deaminase domain of the present disclosure may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 33.
A cytidine deaminase domain of the present disclosure may contain an amino acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 31.
In certain embodiments, the disclosure includes use of “dead guide RNAs”. These 14-nt or 15-nt guide RNAs have been shown to be catalytically inactive yet maintain target-site binding capacity (Kiani et al. (2015) Nat Methods 12, 1051-1054; Dahlman et al. (2015) Nat Biotechnol 33(11): 1159-1161). Thus, these catalytically dead guide RNAs can be utilized to modulate gene expression using a catalytically active Cas nuclease. Therefore, a single active Cas nuclease can be repurposed to simultaneously perform genome editing and regulate gene transcription by using both types of gRNAs in the same cell.
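The length-based distinction between the two guide types can be illustrated with a short sketch. The 14-nt and 15-nt lengths come from the text above; the 20-nt length for an editing guide, the choice to shorten a guide from its 5′ end (keeping the PAM-proximal 3′ end), and the function names are assumptions made only for this example.

```python
DEAD_GUIDE_LENGTHS = {14, 15}  # lengths stated in the text
EDITING_GUIDE_LENGTH = 20      # assumed full-length spacer for this sketch


def guide_function(spacer: str) -> str:
    """Classify a spacer by length: 'regulation', 'editing', or 'unclassified'."""
    n = len(spacer)
    if n in DEAD_GUIDE_LENGTHS:
        return "regulation"
    if n == EDITING_GUIDE_LENGTH:
        return "editing"
    return "unclassified"


def to_dead_guide(spacer: str, length: int = 15) -> str:
    """Shorten a spacer to a dead-guide length.

    Assumption: bases are removed from the 5' end so that the
    PAM-proximal 3' end of the spacer is retained.
    """
    if length not in DEAD_GUIDE_LENGTHS:
        raise ValueError("length must be 14 or 15")
    if len(spacer) < length:
        raise ValueError("spacer is shorter than the requested length")
    return spacer[-length:]


full = "ACGTACGTACGTACGTACGT"
dead = to_dead_guide(full)
```

Under these assumptions, `full` would be paired with the active Cas nuclease for editing, while `dead` would direct the same nuclease to a site for transcriptional regulation without cleavage.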
In certain embodiments, the guide RNA is provided with one or more distinct RNA loop(s) or distinct sequence(s) (e.g. an aptamer) that can recruit an adapter protein. In particular embodiments, the aptamer is a minimal hairpin aptamer, which selectively binds MS2 bacteriophage coat protein (SEQ ID NO: 36) and is introduced into the guide RNA, such as in the stem-loop and/or in a tetraloop.
In some embodiments, the guide RNA comprises an MS2 aptamer having a nucleic acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NO: 34.
A variety of promoters may be used to drive expression of the guide RNA. Guide RNAs may be expressed using a Pol III promoter such as, for example, the U3 promoter, U6 promoter, or the H1 promoter (eLife 2013 2:e00471). For example, an approach in plants has been described using three different Pol III promoters from three different Arabidopsis U6 genes, and their corresponding gene terminators (BMC Plant Biology 2014 14:327). One skilled in the art would readily understand that many additional Pol III promoters could be utilized to simultaneously express many guide RNAs to many different locations in the genome. The use of different Pol III promoters for each gRNA expression cassette may be desirable to reduce the chances of natural gene silencing that can occur when multiple copies of identical sequences are expressed in plants.
In some embodiments, the guide RNA is driven by a U3 promoter. In some embodiments, the guide RNA is driven by a promoter having a nucleic acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NO: 27.
Alternatively, a tRNA-gRNA expression cassette (Xie, X et al, 2015, Proc Natl Acad Sci USA. 2015 Mar. 17; 112(11):3570-5) may be used to deliver multiple gRNAs simultaneously with high expression levels. In such an embodiment, a tRNA in such a cassette may have a nucleic acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NO: 28.
Certain embodiments of the present disclosure relate to recombinant nucleic acids encoding recombinant proteins of the present disclosure. Certain aspects of the present disclosure relate to recombinant nucleic acids encoding various portions/domains of recombinant proteins of the present disclosure.
As used herein, the terms “polynucleotide,” “nucleic acid,” and variations thereof shall be generic to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), to any other type of polynucleotide that is an N-glycoside of a purine or pyrimidine base, and to other polymers containing non-nucleotidic backbones, provided that the polymers contain nucleobases in a configuration that allows for base pairing and base stacking, as found in DNA and RNA. Thus, these terms include known types of nucleic acid sequence modifications, for example, substitution of one or more of the naturally occurring nucleotides with an analog, and inter-nucleotide modifications. As used herein, the symbols for nucleotides and polynucleotides are those recommended by the IUPAC-IUB Commission of Biochemical Nomenclature.
Sequences of the polynucleotides of the present disclosure may be prepared by various suitable methods known in the art, including, for example, direct chemical synthesis or cloning. For direct chemical synthesis, formation of a polymer of nucleic acids typically involves sequential addition of 3′-blocked and 5′-blocked nucleotide monomers to the terminal 5′-hydroxyl group of a growing nucleotide chain, wherein each addition is effected by nucleophilic attack of the terminal 5′-hydroxyl group of the growing chain on the 3′-position of the added monomer, which is typically a phosphorus derivative, such as a phosphotriester, phosphoramidite, or the like. Such methodology is known to those skilled in the art and is described in the pertinent texts and literature (e.g., in Matteucci et al., (1980) Tetrahedron Lett 21:719-722; U.S. Pat. Nos. 4,500,707; 5,436,327; and 5,700,637). In addition, the desired sequences may be isolated from natural sources by splitting DNA using appropriate restriction enzymes, separating the fragments using gel electrophoresis, and thereafter, recovering the desired polynucleotide sequence from the gel via techniques known to those skilled in the art, such as utilization of polymerase chain reactions (PCR; e.g., U.S. Pat. No. 4,683,195).
The nucleic acids employed in the methods and compositions described herein may be codon optimized relative to a parental template for expression in a particular host cell. Cells differ in their usage of particular codons, and codon bias corresponds to relative abundance of particular tRNAs in a given cell type. By altering codons in a sequence so that they are tailored to match with the relative abundance of corresponding tRNAs, it is possible to increase expression of a product (e.g., a polypeptide) from a nucleic acid. Similarly, it is possible to decrease expression by deliberately choosing codons corresponding to rare tRNAs. Thus, codon optimization/deoptimization can provide control over nucleic acid expression in a particular cell type (e.g., bacterial cell, plant cell, mammalian cell, etc.). Methods of codon optimizing a nucleic acid for tailored expression in a particular cell type are well-known to those of skill in the art.
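A minimal sketch of the codon selection idea described above follows. The usage table is a toy table with invented frequencies covering only a few amino acids; it does not represent any real organism, and the function name `back_translate` and its `prefer_rare` flag are hypothetical. Real codon optimization typically weighs additional factors that are ignored here.

```python
# Toy codon-usage table: amino acid -> {codon: relative frequency}.
# Frequencies are invented for illustration and are NOT real usage data.
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.7, "AAA": 0.3},
    "G": {"GGC": 0.4, "GGA": 0.3, "GGT": 0.2, "GGG": 0.1},
    "S": {"TCC": 0.35, "AGC": 0.25, "TCT": 0.2,
          "TCA": 0.1, "AGT": 0.06, "TCG": 0.04},
    "*": {"TGA": 0.5, "TAA": 0.3, "TAG": 0.2},
}


def back_translate(protein: str, usage: dict, prefer_rare: bool = False) -> str:
    """Choose one codon per residue from `usage`.

    By default the most frequent codon is chosen (optimization); with
    prefer_rare=True the least frequent codon is chosen (deoptimization).
    A residue missing from `usage` raises KeyError.
    """
    pick = min if prefer_rare else max
    codons = []
    for residue in protein:
        table = usage[residue]
        codons.append(pick(table, key=table.get))
    return "".join(codons)


optimized = back_translate("MKGS*", TOY_USAGE)
deoptimized = back_translate("MKGS*", TOY_USAGE, prefer_rare=True)
```

Both outputs encode the same polypeptide; only the synonymous codon choices differ, which is the lever the paragraph above describes for raising or lowering expression.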
Various methods are known to those of skill in the art for identifying similar (e.g. homologs, orthologs, paralogs, etc.) polypeptide and/or polynucleotide sequences, including phylogenetic methods, sequence similarity analysis, and hybridization methods.
Phylogenetic trees may be created for a gene family by using a program such as CLUSTAL (Thompson et al. Nucleic Acids Res. 22: 4673-4680 (1994); Higgins et al. Methods Enzymol 266: 383-402 (1996)) or MEGA (Tamura et al. Mol. Biol. & Evo. 24:1596-1599 (2007)). Once an initial tree for genes from one species is created, potential orthologous sequences can be placed in the phylogenetic tree and their relationships to genes from the species of interest can be determined. Evolutionary relationships may also be inferred using the Neighbor-Joining method (Saitou and Nei, Mol. Biol. & Evo. 4:406-425 (1987)). Homologous sequences may also be identified by a reciprocal BLAST strategy. Evolutionary distances may, for example, be computed using the Poisson correction method (Zuckerkandl and Pauling, pp. 97-166 in Evolving Genes and Proteins, edited by V. Bryson and H. J. Vogel. Academic Press, New York (1965)).
In addition, evolutionary information may be used to predict gene function. Functional predictions of genes can be greatly improved by focusing on how genes became similar in sequence (i.e., by evolutionary processes) rather than on the sequence similarity itself (Eisen, Genome Res. 8: 163-167 (1998)). Many specific examples exist in which gene function has been shown to correlate well with gene phylogeny (Eisen, Genome Res. 8: 163-167 (1998)). By using a phylogenetic analysis, one skilled in the art would recognize that the ability to deduce similar functions conferred by closely-related polypeptides is predictable.
When a group of related sequences is analyzed using a phylogenetic program such as CLUSTAL, closely related sequences typically cluster together or in the same clade (a group of similar genes). Groups of similar genes can also be identified with pair-wise BLAST analysis (Feng and Doolittle, J. Mol. Evol. 25: 351-360 (1987)). Analysis of groups of similar genes with similar functions that fall within one clade can yield sub-sequences that are particular to the clade. These sub-sequences, known as consensus sequences, can be used not only to define the sequences within each clade, but also to define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543 (2001)).
To find sequences that are homologous to a reference sequence, BLAST nucleotide searches can be performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to a nucleotide sequence encoding a protein of the disclosure. BLAST protein searches can be performed with the BLASTX program, score=50, wordlength=3, to obtain amino acid sequences homologous to a protein or polypeptide of the disclosure. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used.
Methods for the alignment of sequences and for the analysis of similarity and identity of polypeptide and polynucleotide sequences are well-known in the art.
As used herein “sequence identity” refers to the percentage of residues that are identical in the same positions in the sequences being analyzed. As used herein “sequence similarity” refers to the percentage of residues that have similar biophysical/biochemical characteristics in the same positions (e.g., charge, size, hydrophobicity) in the sequences being analyzed.
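The two definitions above can be made concrete with a small sketch that operates on two sequences that have already been aligned (equal length, with "-" marking gaps). Several conventions exist for the denominator; this example assumes that every alignment column counts, including columns with a gap in one sequence. The residue groupings used for similarity are one illustrative grouping, not a grouping defined by this disclosure, and the function names are hypothetical.

```python
# Illustrative grouping of amino acids by broad biochemical character.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def _columns(a: str, b: str):
    """Pair up aligned positions, dropping columns that are gaps in both."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper())
            if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("no comparable columns")
    return cols


def percent_identity(a: str, b: str) -> float:
    """Percentage of columns holding the same residue in both sequences."""
    cols = _columns(a, b)
    same = sum(1 for x, y in cols if x == y)
    return 100.0 * same / len(cols)


def percent_similarity(a: str, b: str) -> float:
    """Percentage of columns whose residues are identical or share a group."""
    cols = _columns(a, b)
    similar = 0
    for x, y in cols:
        if x == "-" or y == "-":
            continue  # a gap is never counted as similar
        if x == y or (x in GROUP_OF and GROUP_OF.get(x) == GROUP_OF.get(y)):
            similar += 1
    return 100.0 * similar / len(cols)
```

For the aligned pair `MKVLA` and `MKILA`, four of five positions are identical, and the remaining V/I pair falls in the same group, so similarity is higher than identity. A threshold such as "at least about 90% identity" can then be checked by comparing the returned value against 90.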
Methods of alignment of sequences for comparison are well-known in the art, including manual alignment and computer assisted sequence alignment and analysis. This latter approach is a preferred approach in the present disclosure, due to the increased throughput afforded by computer-assisted methods. As noted below, a variety of computer programs for performing sequence alignment are available or can be produced by one of skill in the art.
The determination of percent sequence identity and/or similarity between any two sequences can be accomplished using a mathematical algorithm. Examples of such mathematical algorithms are the algorithm of Myers and Miller, CABIOS 4:11-17 (1988); the local homology algorithm of Smith et al., Adv. Appl. Math. 2:482 (1981); the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970); the search-for-similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85:2444-2448 (1988); and the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA 87:2264-2268 (1990), modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993).
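As an illustration of one of the algorithms named above, the following is a compact sketch of global alignment scoring in the style of Needleman and Wunsch. It is a simplification: it uses a linear gap penalty and a flat match/mismatch score rather than a substitution matrix, returns only the optimal score (not the alignment itself), and the default scoring values are arbitrary choices for this example.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Optimal global alignment score of `a` and `b` (linear gap penalty).

    Uses two rows of the dynamic-programming table at a time, so memory
    grows with len(b) rather than len(a) * len(b).
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against empty `b`
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]
```

For example, aligning `ACGT` against `ACT` is best done with one gap (`ACGT` over `AC-T`), giving three matches and one gap under the default scores. Established implementations in the programs listed herein would normally be used in practice.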
Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity and/or similarity. Such implementations include, for example: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the AlignX program, version 10.3.0 (Invitrogen, Carlsbad, Calif.); and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. Gene 73:237-244 (1988); Higgins et al. CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Res. 16:10881-90 (1988); Huang et al. CABIOS 8:155-65 (1992); and Pearson et al., Meth. Mol. Biol. 24:307-331 (1994). The BLAST programs of Altschul et al. J. Mol. Biol. 215:403-410 (1990) are based on the algorithm of Karlin and Altschul (1990) supra.
Polynucleotides homologous to a reference sequence can be identified by hybridization to each other under stringent or under highly stringent conditions. Single-stranded polynucleotides hybridize when they associate based on a variety of well characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. The stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc. present in both the hybridization and wash solutions and incubations (and number thereof), as described in more detail in references cited below (e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (“Sambrook”) (1989); Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, vol. 152 Academic Press, Inc., San Diego, Calif. (“Berger and Kimmel”) (1987); and Anderson and Young, “Quantitative Filter Hybridisation.” In: Hames and Higgins, eds., Nucleic Acid Hybridisation, A Practical Approach. Oxford, IRL Press, 73-111 (1985)).
Encompassed by the disclosure are polynucleotide sequences that are capable of hybridizing to the disclosed polynucleotide sequences and fragments thereof under various conditions of stringency (see, for example, Wahl and Berger, Methods Enzymol. 152: 399-407 (1987); and Kimmel, Methods Enzymol. 152: 507-511 (1987)). Full-length cDNA, homologs, orthologs, and paralogs of polynucleotides of the present disclosure may be identified and isolated using well-known polynucleotide hybridization methods.
With regard to hybridization, conditions that are highly stringent, and means for achieving them, are well known in the art. See, for example, Sambrook et al. (1989) (supra); Berger and Kimmel (1987) pp. 467-469 (supra); and Anderson and Young (1985) (supra).
Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson and Young (1985) (supra)). In addition, one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecylsulfate (SDS), polyvinyl-pyrrolidone, Ficoll and Denhardt's solution. Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time. In some instances, conditions of even greater stringency may be desirable or required to reduce non-specific and/or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.
Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or for highly similar fragments such as genes that duplicate functional enzymes from closely related organisms. The stringency can be adjusted either during the hybridization step or in the post-hybridization washes. Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency. As a general guideline, high stringency is typically performed at Tm - 5° C. to Tm - 20° C., moderate stringency at Tm - 20° C. to Tm - 35° C. and low stringency at Tm - 35° C. to Tm - 50° C. for duplex >150 base pairs. Hybridization may be performed at low to moderate stringency (25-50° C. below Tm), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at Tm - 25° C. for DNA-DNA duplex and Tm - 15° C. for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.
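The general guideline above amounts to simple arithmetic on the melting temperature, which can be sketched as follows. The offsets are the ones stated in the guideline for duplexes longer than 150 base pairs; the melting temperature itself is taken as an input (no formula for estimating Tm is assumed here), and the function names are hypothetical.

```python
# Degrees Celsius below Tm for each stringency level, as (nearest, farthest).
# Listed from most to least stringent.
OFFSETS_BELOW_TM = {
    "high": (5, 20),
    "moderate": (20, 35),
    "low": (35, 50),
}


def hybridization_window(tm_c: float, stringency: str):
    """Return (lowest, highest) temperature in deg C for a stringency level."""
    nearest, farthest = OFFSETS_BELOW_TM[stringency]
    return (tm_c - farthest, tm_c - nearest)


def classify_temperature(tm_c: float, temp_c: float):
    """Name the stringency level for `temp_c`, or None if outside all ranges.

    The ranges share their end points (for example, Tm - 20 belongs to both
    'high' and 'moderate'); the more stringent level is reported in that case.
    """
    delta = tm_c - temp_c
    for level, (nearest, farthest) in OFFSETS_BELOW_TM.items():
        if nearest <= delta <= farthest:
            return level
    return None
```

For a duplex with a Tm of 80° C., for instance, `hybridization_window(80, "high")` gives a range of 60° C. to 75° C., while 50° C. would be classified as moderate stringency.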
High stringency conditions may be used to select nucleic acid sequences with high degrees of identity to the disclosed sequences. An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5° C. to 20° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
Hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements of the present disclosure include, for example: 6×SSC and 1% SDS at 65° C.; 50% formamide, 4×SSC at 42° C.; 0.5×SSC to 2.0×SSC, 0.1% SDS at 50° C. to 65° C.; or 0.1×SSC to 2×SSC, 0.1% SDS at 50° C. to 65° C.; with a first wash step of, for example, 10 minutes at about 42° C. with about 20% (v/v) formamide in 0.1×SSC, and with, for example, a subsequent wash step with 0.2×SSC and 0.1% SDS at 65° C. for 10, 20 or 30 minutes.
For identification of less closely related homologs, wash steps may be performed at a lower temperature, e.g., 50° C. An example of a low stringency wash step employs a solution and conditions of at least 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 min. Greater stringency may be obtained at 42° C. in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 min. Wash procedures will generally employ at least two final wash steps. Additional variations on these conditions will be readily apparent to those skilled in the art (see, for example, US Patent Application No. 20010010913).
If desired, one may employ wash steps of even greater stringency, including conditions of 65° C. to 68° C. in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS, or about 0.2×SSC, 0.1% SDS at 65° C. and washing twice, each wash step of 10, 20 or 30 min in duration, or about 0.1×SSC, 0.1% SDS at 65° C. and washing twice for 10, 20 or 30 min. Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3° C. to about 5° C., and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6° C. to about 9° C.
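The wash solutions above are given in millimolar salt concentrations, while other passages herein use SSC dilutions. The two can be related with the short sketch below, which assumes the commonly used definition of 1×SSC as 150 mM NaCl with 15 mM trisodium citrate; that definition is not stated in this disclosure, and the function names are hypothetical.

```python
# Assumed composition of 1x SSC (common laboratory definition).
NACL_MM_PER_1X = 150.0
CITRATE_MM_PER_1X = 15.0


def ssc_to_mm(fold: float):
    """Return (NaCl mM, trisodium citrate mM) for an SSC dilution such as 0.2."""
    if fold < 0:
        raise ValueError("fold must be non-negative")
    return (round(fold * NACL_MM_PER_1X, 6), round(fold * CITRATE_MM_PER_1X, 6))


def nacl_mm_to_ssc(nacl_mm: float) -> float:
    """Return the SSC dilution whose NaCl concentration equals `nacl_mm`."""
    if nacl_mm < 0:
        raise ValueError("concentration must be non-negative")
    return round(nacl_mm / NACL_MM_PER_1X, 6)
```

Under this assumption, the low stringency wash (30 mM NaCl, 3 mM trisodium citrate) corresponds to 0.2×SSC, and the more stringent wash (15 mM NaCl, 1.5 mM trisodium citrate) corresponds to 0.1×SSC.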
Various types of nucleic acids may be targeted for gene editing, activation, and repression as will be readily apparent to one of skill in the art. The gene editing may include non-homologous end joining (NHEJ) based mutagenesis (e.g. deletions and insertions; indels), base editing, prime editing, and homology-directed repair (HDR). The target nucleic acid may be located within the coding region of a target gene or upstream or downstream thereof. Moreover, the target nucleic acid may reside endogenously in a target gene or may be heterologous, e.g., inserted into the gene using techniques such as homologous recombination. For example, a target gene of the present disclosure can be operably linked to a control region, such as a promoter, that contains a sequence that can be recognized by e.g., a guide RNA of the present disclosure such that a transcriptional activator of the present disclosure may be targeted to that sequence. In some embodiments, the target nucleic acid is not a target of and/or does not naturally associate with the naturally-occurring transcriptional activator polypeptide.
In some embodiments, the target nucleic acid is endogenous to the plant where the expression of one or more genes is activated according to the methods described herein. In some embodiments, the target nucleic acid is a transgene of interest that has been inserted into a plant. Methods of introducing transgenes into plants are well known in the art. Transgenes may be inserted into plants in order to provide a production system for a desired protein, or may be added to the genetic complement in order to modulate the metabolism of a plant.
Suitable target nucleic acids will be readily apparent to one of skill in the art depending on the particular need or outcome. The target nucleic acid may be in e.g., a region of euchromatin (e.g. highly expressed gene), or the target nucleic acid may be in a region of heterochromatin (e.g. centromere DNA). Use of transcriptional activators according to the methods described herein to induce transcriptional activation in a region of heterochromatin or other highly methylated region of a plant genome may be especially useful in certain research embodiments. For example, activation of a retrotransposon in a plant genome may find use in inducing mutagenesis of other genomic regions in that genome. A target nucleic acid of the present disclosure may be methylated or it may be unmethylated.
The CRISPRa system enables simultaneous activation of many genes in plants and can be used in applications such as: activation of plant endogenous morphogenic genes such as BABY BOOM (BBM) and WUSCHEL (WUS) for promoting plant species- or genotype-independent regeneration, a bottleneck in generating transgenic or gene-edited crops; activation of endogenous florigen gene(s) (e.g. FT) for early flowering in plants; activation of morphogenic genes and florigen genes to promote rapid plant regeneration and shorten the plant life cycle for crop breeding; activation of plant endogenous metabolic pathway genes for improving the production of certain metabolites and creating nutritious foods to improve human health; activation of plant immune responsive genes, especially in a pathogen-inducible fashion, to confer designated resistance to plant diseases such as rice blast disease, soybean rust disease and citrus greening or Huanglongbing (HLB) disease; activation of plant enzyme genes for herbicide resistance (examples include ALS activation for resistance to imidazolinone and sulfonylurea, EPSPS activation for resistance to glyphosate, ACC activation for resistance to haloxyfop-R-methyl and quizalofop, TubA2 activation for resistance to trifluralin, GS2 activation for resistance to glufosinate, and CESA3 activation for resistance to C17); activation of plant-specific development pathways to promote growth, high yield, changed aboveground morphology, altered root structures, climate resistance, etc.; activation of pathways for improved nutrition deposition in plant cells, tissues, and organs including leaves, fruits, roots and seeds in a cell- or tissue-specific manner; activation of C4 photosynthesis pathway genes in C3 plants for improved photosynthesis; activation of plant stress-responsive pathways, especially in a stress-inducible fashion, to confer designated resistance to environmental stresses such as heat, cold, drought stresses, etc., to achieve climate resilience.
The CRISPR-Combo system enables simultaneous gene editing, activation, and repression. The technology can be used in many applications such as: simultaneous gene editing and morphogenic genes (e.g. BBM and WUS) activation in crops, which allows for accelerated regeneration of gene-edited crops; simultaneous gene editing and florigen genes (e.g. FT) activation, which allows for fast-track breeding of gene-edited crop products; simultaneous gene editing and activation of morphogenic genes and florigen genes to promote plant regeneration and shorten the juvenile stage to accelerate the gene-editing based crop production pipeline; simultaneous gene editing and activation of an endogenous herbicide resistance gene or an endogenous marker gene to generate genome-edited plants without the use of a conventional selection marker that is provided exogenously (e.g., part of the T-DNA vector) such as nptII, hpt, bar, and gox, that confer resistance to kanamycin, hygromycin, phosphinothricin, and glyphosate, respectively; simultaneous gene editing and activation of one or many metabolic pathways in plants for sophisticated crop engineering; simultaneous gene editing and transcriptional regulation of one or many metabolic pathways in plants in a spatiotemporal manner; simultaneous gene editing and self-activation of the CRISPR-Combo components, through a positive regulation feedback loop, for robust expression in plant cells; simultaneous gene editing and repression of a DNA repair pathway (NHEJ or HDR) to control the gene editing outcomes; simultaneous gene editing and activation of a DNA repair pathway (NHEJ or HDR) to control the gene editing outcomes; simultaneous gene editing, activation of a DNA repair pathway and repression of another DNA repair pathway to control the gene editing outcomes; simultaneous gene editing, activation of morphogenic and/or florigen genes, and repression of a DNA repair pathway in plants; simultaneous gene editing to destroy invading DNA viruses in plants, with concurrent activation of plant defense pathways for engineering synergistic and robust viral defense in plants; simultaneous gene activation and editing of the same enzyme genes to engineer super herbicide resistance in crops.
As used herein, a “plant” refers to any of various photosynthetic, eukaryotic multi-cellular organisms of the kingdom Plantae, characteristically producing embryos, containing chloroplasts, having cellulose cell walls and lacking locomotion. As used herein, a “plant” includes any plant or part of a plant at any stage of development, including seeds, suspension cultures, plant cells, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, microspores, and progeny thereof. Also included are cuttings, and cell or tissue cultures. As used in conjunction with the present disclosure, plant tissue includes, for example, whole plants, plant cells, plant organs, e.g., leaves, stems, roots, meristems, plant seeds, protoplasts, callus, cell cultures, and any groups of plant cells organized into structural and/or functional units.
Any plant cell may be used in the present disclosure so long as it remains viable after being transformed with a sequence of nucleic acids. Preferably, the plant cell is not adversely affected by the transduction of the necessary nucleic acid sequences, the subsequent expression of the proteins or the resulting intermediates.
As disclosed herein, a broad range of plant types may be modified to incorporate recombinant polypeptides and/or polynucleotides of the present disclosure. Suitable plants that may be modified include both monocotyledonous (monocot) plants and dicotyledonous (dicot) plants.
Examples of suitable plants may include, for example, species of the Family Gramineae, including Sorghum bicolor and Zea mays; species of the genera: Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Pisum, Phaseolus, Lolium, Oryza, Avena, Hordeum, Secale, and Triticum.
In some embodiments, plant cells may include, for example, those from corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), duckweed (Lemna), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia spp.), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers.
Examples of suitable vegetable plants may include, for example, tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo).
Examples of suitable ornamental plants may include, for example, azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
Examples of suitable conifer plants may include, for example, loblolly pine (Pinus taeda), slash pine (Pinus elliottii), Ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), Monterey pine (Pinus radiata), Douglas-fir (Pseudotsuga menziesii), Western hemlock (Tsuga canadensis), Sitka spruce (Picea glauca), redwood (Sequoia sempervirens), silver fir (Abies amabilis), balsam fir (Abies balsamea), Western red cedar (Thuja plicata), and Alaska yellow-cedar (Chamaecyparis nootkatensis).
Examples of suitable leguminous plants may include, for example, guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, peanuts (Arachis sp.), crown vetch (Vicia sp.), hairy vetch, adzuki bean, lupine (Lupinus sp.), trifolium, common bean (Phaseolus sp.), field bean (Pisum sp.), clover (Melilotus sp.), Lotus, trefoil, lens, and false indigo.
Examples of suitable forage and turf grass may include, for example, alfalfa (Medicago spp.), orchard grass, tall fescue, perennial ryegrass, creeping bentgrass, and redtop.
Examples of suitable crop plants and model plants may include, for example, Arabidopsis, corn, rice, alfalfa, sunflower, canola, soybean, cotton, peanut, sorghum, wheat, and tobacco.
The plants of the present disclosure may be genetically modified in that recombinant nucleic acids have been introduced into the plants, and as such the genetically modified plants do not occur in nature. A suitable plant of the present disclosure is one capable of expressing one or more nucleic acid constructs encoding one or more recombinant proteins.
As used herein, the terms “transgenic plant” and “genetically modified plant” are used interchangeably and refer to a plant, which contains within its genome a recombinant nucleic acid. Generally, the recombinant nucleic acid is stably integrated within the genome such that the polynucleotide is passed on to successive generations. However, in certain embodiments, the recombinant nucleic acid is transiently expressed in the plant. The recombinant nucleic acid may be integrated into the genome alone or as part of a recombinant expression cassette. “Transgenic” is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of exogenous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic.
“Recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide” as used herein refers to a polymer of nucleic acids wherein at least one of the following is true: (a) the sequence of nucleic acids is foreign to (i.e., not naturally found in) a given host cell; (b) the sequence may be naturally found in a given host cell, but in an unnatural (e.g., greater than expected) amount; or (c) the sequence of nucleic acids contains two or more subsequences that are not found in the same relationship to each other in nature. For example, regarding instance (c), a recombinant nucleic acid sequence will have two or more sequences from unrelated genes arranged to make a new functional nucleic acid. Specifically, the present disclosure describes the introduction of an expression vector into a plant cell, where the expression vector contains a nucleic acid sequence coding for a protein that is not normally found in a plant cell or contains a nucleic acid coding for a protein that is normally found in a plant cell but is under the control of different regulatory sequences. With reference to the plant cell's genome, then, the nucleic acid sequence that codes for the protein is recombinant. A protein that is referred to as recombinant generally implies that it is encoded by a recombinant nucleic acid sequence which may be present in the plant cell. Recombinant proteins of the present disclosure may also be exogenously supplied directly to host cells (e.g. plant cells).
A “recombinant” polypeptide, protein, or enzyme of the present disclosure, is a polypeptide, protein, or enzyme that may be encoded by a “recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide.”
In some embodiments, the genes encoding the recombinant proteins in the plant cell may be heterologous to the plant cell. In certain embodiments, the plant cell does not naturally produce one or more polypeptides of the present disclosure, and contains heterologous nucleic acid constructs capable of expressing one or more genes necessary for producing those molecules. In certain embodiments, the plant cell does not naturally produce one or more polypeptides of the present disclosure, and is provided the one or more polypeptides through exogenous delivery of the polypeptides directly to the plant cell without the need to express a recombinant nucleic acid encoding the recombinant polypeptide in the plant cell.
Recombinant nucleic acids and/or recombinant proteins of the present disclosure may be present in host cells (e.g. plant cells). In some embodiments, recombinant nucleic acids are present in an expression vector, and the expression vector may be present in host cells (e.g. plant cells).
Recombinant polypeptides of the present disclosure may be introduced into plant cells via any suitable methods known in the art. For example, a recombinant polypeptide can be exogenously added to plant cells and the plant cells are maintained under conditions such that the recombinant polypeptide is involved with targeting one or more target nucleic acids to activate the expression of the target nucleic acids in the plant cells. Alternatively, a recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be expressed in plant cells. Additionally, in some embodiments, a recombinant polypeptide of the present disclosure may be transiently expressed in a plant via viral infection of the plant. Methods of introducing recombinant proteins via viral infection or via the introduction of RNAs into plants are well known in the art. For example, Tobacco Rattle Virus (TRV) has been successfully used to introduce zinc finger nucleases in plants to cause genome modification (“Nontransgenic Genome Modification in Plant Cells”, Plant Physiology 154:1079-1087 (2010)).
A recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be expressed in a plant with any suitable plant expression vector. Typical vectors useful for expression of recombinant nucleic acids in higher plants are well known in the art and include, for example, vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens (e.g., see Rogers et al., Meth. in Enzymol. (1987) 153:253-277). These vectors are plant integrating vectors in that on transformation, the vectors integrate a portion of vector DNA into the genome of the host plant. Exemplary A. tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 (e.g., see Schardl et al., Gene (1987) 61:1-11; and Berger et al., Proc. Natl. Acad. Sci. USA (1989) 86:8402-8406); and plasmid pBI 101.2 that is available from Clontech Laboratories, Inc. (Palo Alto, Calif.).
In addition to regulatory domains, recombinant polypeptides of the present disclosure can be expressed as a fusion protein that is coupled to, for example, a maltose binding protein (“MBP”), glutathione S transferase (GST), hexahistidine, c-myc, or the FLAG epitope for ease of purification, monitoring expression, or monitoring cellular and subcellular localization.
Moreover, a recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be modified to improve expression of the recombinant protein in plants by using codon preference. When the recombinant nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended plant host where the nucleic acid is to be expressed. For example, recombinant nucleic acids of the present disclosure can be modified to account for the specific codon preferences and GC content preferences of monocotyledons and dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. (1989) 17: 477-498).
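Because the paragraph above refers to GC content preferences of the intended plant host, one simple way to characterize a candidate coding sequence is to compute its overall GC fraction and the GC fraction at third codon positions (often called GC3). The following is a minimal sketch; the function names and the example sequence are illustrative assumptions, and no host-specific target values are implied.

```python
def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of codons whose third base is G or C.

    Expects an in-frame coding sequence whose length is a multiple of 3.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    third_positions = cds[2::3]  # bases at indices 2, 5, 8, ...
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    example_cds = "ATGGCCAAGTAA"  # hypothetical 4-codon sequence, for illustration only
    print(f"GC:  {gc_content(example_cds):.3f}")
    print(f"GC3: {gc3_content(example_cds):.3f}")
```

Comparing these values before and after synonymous recoding gives a quick check that a redesigned sequence has moved toward the composition preferred by the intended host.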
The present disclosure further provides expression vectors encoding recombinant polypeptides of the present disclosure. A nucleic acid sequence coding for the desired recombinant nucleic acid of the present disclosure can be used to construct a recombinant expression vector, which can be introduced into the desired host cell. A recombinant expression vector will typically contain a nucleic acid encoding a recombinant protein of the present disclosure, operably linked to transcriptional initiation regulatory sequences which will direct the transcription of the nucleic acid in the intended host cell, such as tissues of a transformed plant. Recombinant nucleic acids e.g. encoding recombinant polypeptides of the present disclosure may be expressed on multiple expression vectors or they may be expressed on a single expression vector.
For example, plant expression vectors may include (1) a cloned gene under the transcriptional control of 5′ and 3′ regulatory sequences and (2) a dominant selectable marker. Such plant expression vectors may also include, if desired, a promoter regulatory region (e.g., one conferring inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific/selective expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.
In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter (e.g. a promoter functional in plants or a plant-specific promoter). A plant promoter, or functional fragment thereof, can be employed to control the expression of a recombinant nucleic acid of the present disclosure in regenerated plants. The selection of the promoter used in expression vectors will determine the spatial and temporal expression pattern of the recombinant nucleic acid in the modified plant, e.g., the nucleic acid encoding the recombinant polypeptide of the present disclosure is only expressed in the desired tissue or at a certain time in plant development or growth. Certain promoters will express recombinant nucleic acids in all plant tissues and are active under most environmental conditions and states of development or cell differentiation (i.e., constitutive promoters). Other promoters will express recombinant nucleic acids in specific cell types (such as leaf epidermal cells, mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers; for example) and the selection will reflect the desired location of accumulation of the gene product. Alternatively, the selected promoter may drive expression of the recombinant nucleic acid under various inducing conditions.
Examples of suitable constitutive promoters may include, for example, the core promoter of the Rsyn7, the core CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), CaMV 19S (Lawton et al., 1987), rice actin (Wang et al., 1992; U.S. Pat. No. 5,641,876; and McElroy et al., Plant Cell (1985) 2:163-171); ubiquitin (Christensen et al., Plant Mol. Biol. (1989) 12:619-632; and Christensen et al., Plant Mol. Biol. (1992) 18:675-689), pEMU (Last et al., Theor. Appl. Genet. (1991) 81:581-588), MAS (Velton et al., EMBO J. (1984) 3:2723-2730), nos (Ebert et al., 1987), Adh (Walker et al., 1987), the P- or 2′-promoter derived from T-DNA of Agrobacterium tumefaciens, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S. Pat. No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, the GRP 1-8 promoter, and other transcription initiation regions from various plant genes known to skilled artisans, and constitutive promoters described in, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; and 5,608,142.
In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a UBQ10 promoter. In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NO: 29.
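A percent identity threshold such as those recited above can be checked once two sequences have been aligned. The following is a minimal sketch, assuming the sequences are already aligned to equal length with "-" as the gap character, and assuming identity is defined as identical columns divided by total alignment columns; alignment tools differ in how they treat gaps, so this definition is an assumption rather than the definition used in the present disclosure. The function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns for two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the aligned pair has at least the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_threshold("ACGTACGTAC", "ACGTACGTAA", 95.0))
```

In practice the alignment itself would come from a dedicated alignment program; only the counting step is sketched here.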
Examples of suitable tissue specific promoters may include, for example, the lectin promoter (Vodkin et al., 1983; Lindstrom et al., 1990), the corn alcohol dehydrogenase 1 promoter (Vogel et al., 1989; Dennis et al., 1984), the corn light harvesting complex promoter (Simpson, 1986; Bansal et al., 1992); the corn heat shock protein promoter (Odell et al., Nature (1985) 313:810-812; Rochester et al., 1986), the pea small subunit RuBP carboxylase promoter (Poulsen et al., 1986; Cashmore et al., 1983), the Ti plasmid mannopine synthase promoter (Langridge et al., 1989), the Ti plasmid nopaline synthase promoter (Langridge et al., 1989), the petunia chalcone isomerase promoter (Van Tunen et al., 1988), the bean glycine rich protein 1 promoter (Keller et al., 1989), the truncated CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), the potato patatin promoter (Wenzler et al., 1989), the root cell promoter (Conkling et al., 1990); the maize zein promoter (Reina et al., 1990; Kriz et al., 1987; Wandelt and Feix, 1989; Langridge and Feix, 1983), the globulin-1 promoter (Belanger and Kriz, 1991), the α-tubulin promoter, the cab promoter (Sullivan et al., 1989), the PEPCase promoter (Hudspeth & Grula, 1989), the R gene complex-associated promoters (Chandler et al., 1989), and the chalcone synthase promoters (Franken et al., 1991).
Alternatively, the plant promoter can direct expression of a recombinant nucleic acid of the present disclosure in a specific tissue or may be otherwise under more precise environmental or developmental control. Such promoters are referred to here as “inducible” promoters. Environmental conditions that may affect transcription by inducible promoters include, for example, pathogen attack, anaerobic conditions, or the presence of light. Examples of inducible promoters include, for example, the AdhI promoter which is inducible by hypoxia or cold stress; the Hsp70 promoter which is inducible by heat stress, and the PPDK promoter which is inducible by light. Examples of promoters under developmental control include, for example, promoters that initiate transcription only, or preferentially, in certain tissues, such as leaves, roots, fruit, seeds, or flowers. An exemplary promoter is the anther specific promoter 5126 (U.S. Pat. Nos. 5,689,049 and 5,689,051). The operation of a promoter may also vary depending on its location in the genome. Thus, an inducible promoter may become fully or partially constitutive in certain locations.
Moreover, any combination of a constitutive or inducible promoter, and a non-tissue specific or tissue specific promoter may be used to control the expression of various recombinant polypeptides of the present disclosure.
The recombinant nucleic acids of the present disclosure and/or a vector housing a recombinant nucleic acid of the present disclosure, may also contain a regulatory sequence that serves as a 3′ terminator sequence. One of skill in the art would readily recognize a variety of terminators that may be used in the recombinant nucleic acids of the present disclosure. For example, a recombinant nucleic acid of the present disclosure may contain a 3′ NOS terminator.
In some embodiments, recombinant nucleic acids of the present disclosure contain a transcriptional termination site. Transcription termination sites may include, for example, OCS terminators and NOS terminators.
In some embodiments, the vector comprises a nucleic acid sequence with at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% nucleic acid sequence identity to the nucleic acid sequence of any of SEQ ID NOs: 37-47.
Plant transformation protocols as well as protocols for introducing recombinant nucleic acids of the present disclosure into plants may vary depending on the type of plant or plant cell, e.g., monocot or dicot, targeted for transformation. Suitable methods of introducing recombinant nucleic acids of the present disclosure into plant cells and subsequent insertion into the plant genome include, for example, microinjection (Crossway et al., Biotechniques (1986) 4:320-334), electroporation (Riggs et al., Proc. Natl. Acad. Sci. USA (1986) 83:5602-5606), Agrobacterium-mediated transformation (U.S. Pat. No. 5,563,055), direct gene transfer (Paszkowski et al., EMBO J. (1984) 3:2717-2722), and ballistic particle acceleration (U.S. Pat. No. 4,945,050; Tomes et al. (1995). “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment,” in Plant Cell, Tissue, and Organ Culture: Fundamental Methods; ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al., Biotechnology (1988) 6:923-926).
Additionally, recombinant polypeptides of the present disclosure can be targeted to a specific organelle within a plant cell. Targeting can be achieved by providing the recombinant protein with an appropriate targeting peptide sequence. Examples of such targeting peptides include, for example, secretory signal peptides (for secretion or cell wall or membrane targeting), plastid transit peptides, chloroplast transit peptides, mitochondrial target peptides, vacuole targeting peptides, nuclear targeting peptides, and the like (e.g., see Reiss et al., Mol. Gen. Genet. (1987) 209(1):116-121; Settles and Martienssen, Trends Cell Biol (1998) 12:494-501; Scott et al., J Biol Chem (2000) 10:1074; and Luque and Correas, J Cell Sci (2000) 113:2485-2495).
The modified plant may be grown into plants in accordance with conventional ways (e.g. McCormick et al., Plant Cell Reports (1986) 81-84). These plants may then be grown, and pollinated with either the same transformed strain or different strains, with the resulting progeny having the desired phenotypic characteristic. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited, and then seeds may be harvested to ensure the desired phenotype or other property has been achieved.
The present disclosure also provides plants derived from plants having increased expression, reduced expression, or a genomic edit as a consequence of the methods of the present disclosure. A plant having increased expression, reduced expression, or a genomic edit as a consequence of the methods of the present disclosure may be crossed with itself or with another plant to produce an F1 plant. In some embodiments, one or more of the resulting F1 plants can also have increased expression, reduced expression, or a genomic edit of the target nucleic acid.
Further provided are methods of screening plants derived from plants having increased expression, reduced expression, or a genomic edit as a consequence of the methods of the present disclosure. In some embodiments, the derived plants (e.g. F1 or F2 plants resulting from or derived from crossing the plant having increased expression, reduced expression, or a genomic edit as a consequence of the methods of the present disclosure with another plant) can be selected from a population of derived plants. For example, provided are methods of selecting one or more of the derived plants that (i) lack recombinant nucleic acids, and (ii) have increased expression, reduced expression, or a genomic edit of the target nucleic acid.
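The two selection criteria above, (i) absence of recombinant nucleic acids and (ii) presence of the desired expression change or genomic edit, can be applied as a simple filter over per-plant assay results. The following is a minimal sketch; the record fields, the class name, and the plant identifiers are hypothetical, and the assays that would populate them (for example, PCR for the construct and sequencing of the target site) are assumptions rather than methods specified here.

```python
from dataclasses import dataclass


@dataclass
class DerivedPlant:
    plant_id: str
    has_recombinant_nucleic_acid: bool  # assumed result of an assay for the construct
    has_desired_change: bool  # increased/reduced expression or a genomic edit


def select_plants(plants):
    """Keep plants that (i) lack recombinant nucleic acids and (ii) carry the change."""
    return [
        p
        for p in plants
        if not p.has_recombinant_nucleic_acid and p.has_desired_change
    ]


if __name__ == "__main__":
    # Hypothetical population, for illustration only.
    population = [
        DerivedPlant("F2-01", True, True),
        DerivedPlant("F2-02", False, True),
        DerivedPlant("F2-03", False, False),
    ]
    for plant in select_plants(population):
        print(plant.plant_id)
```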
A target nucleic acid of the present disclosure in a plant cell of the present disclosure may have its expression increased/upregulated/activated by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% as compared to a corresponding control.
A target nucleic acid of the present disclosure in a plant cell of the present disclosure may have its expression reduced/downregulated/repressed by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% as compared to a corresponding control.
Various controls will be readily apparent to one of skill in the art. For example, a control may be a corresponding plant or plant cell that does not contain recombinant polypeptides of the present disclosure (e.g. wild-type plant or plant cell).
Methods of probing the expression level of a nucleic acid are well-known to those of skill in the art. For example, qRT-PCR analysis may be used to determine the expression level of a population of nucleic acids isolated from a nucleic acid-containing sample (e.g., plants, plant tissues, or plant cells).
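As one illustration of how a percentage increase or reduction relative to a control could be derived from qRT-PCR data, the sketch below applies the commonly used 2^-ΔΔCt calculation. The disclosure does not specify this calculation; the function names, the use of a single reference gene, and the assumption of roughly 100% amplification efficiency are illustrative assumptions only.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target by the 2^-ddCt calculation.

    Assumes (hypothetically) one reference gene and ~100% primer efficiency.
    Ct values are threshold cycles from qRT-PCR.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt in control
    return 2.0 ** -(delta_sample - delta_control)


def percent_change(fold):
    """Convert a fold value into a signed percent change versus control."""
    return (fold - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    fold = relative_expression(22.0, 18.0, 25.0, 18.0)
    print(f"fold = {fold:.2f}, change = {percent_change(fold):+.1f}%")
```

Under this convention a positive percent change corresponds to increased expression and a negative value to reduced expression as compared to the corresponding control.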
Growing conditions sufficient for the recombinant polypeptides of the present disclosure to be expressed in the plant to be targeted to and modulate the expression of one or more target nucleic acids of the present disclosure are well known in the art and include any suitable growing conditions disclosed herein. Typically, the plant is grown under conditions sufficient to express a recombinant polypeptide of the present disclosure, and for the expressed recombinant polypeptides to be localized to the nucleus of cells of the plant in order to be targeted to and modulate the expression of the target nucleic acids (if those targets are present in the nucleus). Generally, the conditions sufficient for the expression of the recombinant polypeptide will depend on the promoter used to control the expression of the recombinant polypeptide. For example, if an inducible promoter is utilized, expression of the recombinant polypeptide in a plant will require that the plant be grown in the presence of the inducer.
Growing conditions sufficient for the recombinant polypeptides of the present disclosure to be expressed in the plant to be targeted to and modulate the expression of one or more target nucleic acids may vary depending on a number of factors (e.g. species of plant, use of inducible promoter, etc.). Suitable growing conditions may include, for example, ambient environmental conditions, standard greenhouse conditions, growth in long days under standard environmental conditions (e.g. 16 hours of light, 8 hours of dark), growth in 12 hour light: 12 hour dark day/night cycles, etc.
Various time frames may be used to observe changes in expression of a target nucleic acid according to the methods of the present disclosure. Plants may be observed/assayed for changes in expression of a target nucleic acid after, for example, about 5 days of growth, about 10 days of growth, about 15 days of growth, about 20 days of growth, about 25 days of growth, about 30 days of growth, about 35 days of growth, about 40 days of growth, about 50 days of growth, or 55 days or more of growth.
The following numbered embodiments also form part of the present disclosure:
Our previous CRISPR-Act2.0 system utilized an engineered gRNA2.0 (gR2.0) scaffold that contains two MS2 RNA aptamers for recruiting activator VP64 through the MS2-MCP interaction. We reasoned that by installing more MS2 aptamers into the single guide RNA (sgRNA) scaffold the system could recruit more VP64 that might lead to improved gene activation. We adopted sgRNA scaffolds containing 8xMS2 (gR8xMS2) and 16xMS2 (gR16xMS2), as both scaffolds were previously demonstrated to recruit many copies of fluorescent proteins for live cell imaging of mammalian cells (
These earlier attempts suggested that some strategies which successfully recruit fluorescent proteins for DNA imaging do not result in gene activation through recruitment of transcription activators such as VP64. This could be due to the complex process of gene activation, as it requires further recruitment of transcription machinery based on the activators. The SunTag system has been previously established for gene activation in both human cells and plants. In the SunTag system, the tandemly arrayed GCN4 epitopes are directly fused to the C-terminus of dCas9 to recruit VP64 through a single-chain antibody scFv. We hypothesized that coupling the SunTag system with the MS2-MCP interaction would recruit more VP64 (
Encouraged by the success in combining the SunTag system with the MS2 system, we next developed two new activators 2xTAD (TAL Activation Domain) and 2xTAD-VP64 and compared them with previously reported activators VP64, TV and VPR to test this platform by targeting OsER1 in rice protoplasts (
We next sought to visualize the CRISPR-Act3.0-mediated activation by using an mCherry reporter system. Two randomly selected promoters, ProOsTPR-like and ProOsCCR1, were used to drive mCherry expression, generating two corresponding mCherry reporter constructs (
The tRNA-based processing system is highly compact and efficient for multiplexing sgRNAs in plants, yeast, Drosophila, and human cells. To enable efficient multiplexed gene activation in rice, we developed a streamlined cloning system for one-step assembly of up to six tRNA-gRNA2.0 cassettes (
However, we also found that highly efficient singular sgRNAs could be identified using a protoplast-based prescreen process. In most cases, the activation level with a single sgRNA would be strong enough for the target gene, which reserves much room for multiplexing many genes as only one sgRNA is used for one gene. To demonstrate this one sgRNA for one gene strategy, we sought to apply CRISPR-Act3.0 to target metabolic pathway genes with the M-tRNA system. In a first demonstration, we targeted seven enzyme-encoding genes in the β-carotene pathway in rice. For each gene, three to four sgRNAs were tested in rice protoplasts in the prescreen step. Prescreen data in rice protoplasts showed four of seven genes could be activated 10-fold or higher (
Furthermore, we used Agrobacterium-mediated transformation to introduce an M-Act3.0 vector targeting the six enzyme-encoding genes in the proanthocyanidin pathway and the no-sgRNA control vector into the indica rice variety Kasalath. The M-Act3.0 system resulted in a similar activation pattern for all target genes except OsLAR in both rice protoplast cells and transgenic callus (
It is worth noting that the final T-DNA vector expressing dpcoCas9-Act3.0 and M-tRNA components could cause DNA rearrangements in A. tumefaciens EHA105 even though different promoters (ZmUbi, UBQ10 (ubiquitin-10) or cauliflower mosaic virus 35S) were used to drive the dpcoCas9 expression (
To assess CRISPR-Act3.0 in dicot plants, we simultaneously targeted two genes, AtFT (regulating flowering) and AtTCL1 (regulating trichome development), in the model plant Arabidopsis by the dpcoCas9 based CRISPR-Act3.0 system. Each gene was targeted with two sgRNAs and the four corresponding sgRNAs were assembled based on the streamlined cloning system (
Since zCas9 resulted in high-efficiency genome editing in dicot plants such as Arabidopsis and carrot, the dzCas9-Act3.0 system presumably should work well for gene activation in dicot plants. We tested dzCas9-Act3.0 in tomato. Four different sgRNAs (gR1 to gR4) were designed to target the promoter of the SFT gene in tomato. Based on a protoplast assay, gR1 and gR2 each resulted in 240-fold transcription activation, while gR3 and gR4 generated about 30-fold and 20-fold transcription activation, respectively (
Our work here, along with earlier studies in plants, has shown that gene activation efficiency varies among different sgRNAs for the same target gene. When designing sgRNAs, we had already focused on the most effective promoter region, which is 0 bp to −250 bp from the transcription start site (TSS), according to earlier studies in humans. To provide further guidance in sgRNA design for implementing CRISPR-Act3.0 in plants, we investigated the protoplast-based gene activation data from 56 sgRNAs targeting the −3 bp to −261 bp region from the TSS of 16 genes in rice. We found that most sgRNAs were effective in the 0 bp to −200 bp region from the TSS (
The narrow high-activity targeting window, high-activity GC contents and preference of targeting noncoding strand DNA would collectively limit the sgRNA choice in designing and implementing CRISPR-Act3.0 in plants. Additionally, it is also important to avoid targeting cis-regulatory elements so that binding of the CRISPR-Act3.0 components will not interfere with the recruitment of endogenous transcription factors and regulators necessary for transcription. In light of all these issues, it could be challenging to find many potentially good target sites for CRISPR-Act3.0 based on SpCas9, which recognizes NGG (N=A, C, G or T) protospacer adjacent motifs (PAMs). The limited target choices when targeting AtTCL1 in Arabidopsis may partly explain the relatively low level of gene activation that we observed for this gene (
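The target-site constraints discussed above can be illustrated with a short sketch that enumerates NGG-PAM protospacers on both strands within a window upstream of the TSS. This is a hypothetical helper, not a tool described in the disclosure: the function names, the coordinate convention (offset of the protospacer's leftmost top-strand base relative to the TSS), the requirement that the whole protospacer lie inside the window, and the 20-nt default length are assumptions made for illustration. GC fraction is reported without any cutoff, because no threshold is taken from the text.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_ngg_sites(seq, tss, window=200, spacer_len=20):
    """List NGG-PAM sites lying fully within `window` bp upstream of the TSS.

    seq: top-strand sequence, 5'->3'; tss: 0-based index of the TSS in seq.
    Returns (strand, offset, protospacer, gc) tuples sorted by offset, where
    offset is the top-strand start of the protospacer minus tss.
    """
    seq = seq.upper()
    lo = tss - window
    # Lookaheads allow overlapping candidate sites to be reported.
    top = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    bottom = re.compile(r"(?=CC[ACGT]([ACGT]{%d}))" % spacer_len)
    sites = []
    for strand, pattern in (("+", top), ("-", bottom)):
        for m in pattern.finditer(seq):
            start = m.start(1)
            if start < lo or start + spacer_len > tss:
                continue  # protospacer not fully inside the window
            spacer = m.group(1) if strand == "+" else revcomp(m.group(1))
            sites.append((strand, start - tss, spacer, gc_fraction(spacer)))
    return sorted(sites, key=lambda s: s[1])


if __name__ == "__main__":
    demo = "A" * 10 + "ACGT" * 5 + "TGG" + "A" * 10  # toy sequence
    for site in find_ngg_sites(demo, tss=len(demo)):
        print(site)
```

Swapping the two regular expressions for a different PAM pattern would be one way to explore how a relaxed PAM requirement enlarges the candidate list.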
We however realized that the improved Cas12b activation system was not as strong as the SpCas9 based CRISPR-Act3.0 system. Subsequently, we decided to relax the PAM requirements of SpCas9 in CRISPR-Act3.0. One promising SpCas9 variant is Cas9-NG that recognizes NG PAMs in human cells and in plants. Another promising SpCas9 variant is SpRY, which was recently claimed as near-PAM-less since it can edit NR (R=G and A) PAM sites with high efficiency and NY (Y=C and T) PAM sites with relatively low efficiency. To compare both SpCas9 variants, we engineered dzCas9-NG-Act3.0 and dSpRY-Act3.0 (based on the same maize codon-optimized Cas9) (
In plant functional genomics, a central question is to define the causal relationships between gene expression and phenotypic features in plants. CRISPRa represents a promising approach to streamline and expedite such research by targeting gene activation in plants. To improve activation potency, targeting flexibility and scalability of CRISPRa in plants, we applied an engineering approach to systematically explore different sgRNA scaffolds and transcription activators to develop the next-generation CRISPRa systems. We successfully developed CRISPR-Act3.0, which consists of dCas9-VP64, gR2.0 scaffold with 2xMS2 stem loops, 10xGCN4 SunTag fused to RNA binding protein MCP and 2xTAD activators fused to scFv (
To make the CRISPR-Act3.0 systems user-friendly, we developed an efficient toolbox for multiplexed sgRNA assembly of up to six gRNA2.0 cassettes in one step based on polymerase chain reaction (PCR)-free modular Golden Gate cloning and Gateway cloning systems (
Previous studies have demonstrated that the CRISPRa potency is highly sensitive to sgRNA target position relative to the TSS. The optimal targeting windows for CRISPRa in mammalian cells, bacteria and plants had been reported to be the 200 bp, 60-90 bp and 350 bp upstream regions of the TSS, respectively. However, only limited sgRNAs and a few genes were tested in these studies. By analyzing activation data from 16 genes with 56 sgRNAs, we identified the 0 bp to −200 bp region from the TSS as a high-activity window for CRISPR-Act3.0-based gene activation in plants (
Toward this end, we developed an improved dAaCas12b-based activation system for targeting VTTV PAMs with a new engineered sgRNA scaffold Aac.4 (
In conclusion, we have developed a highly efficient CRISPR-Act3.0 toolbox for multiplexed gene activation in plants, which would aid many applications including rewiring metabolic pathways, investigating gene regulatory networks, and genome-wide screens for identifying key genes in regulating plant development and stress responses.
To assess inactivation of CRISPR-Cas9's DNA cleavage activity through sgRNA engineering, we tested targeted mutagenesis by Cas9 at OsYSA and OsMAPK5 loci using different protospacer lengths in rice protoplasts. Restriction fragment length polymorphism (RFLP) analysis showed that 17 to 20-nt protospacers conferred efficient mutagenesis while 14 to 16-nt protospacers were unable to cause mutations at the target site (
The above experiments proved the principle of a first CRISPR-Combo system that allows for simultaneous genome editing and gene activation in an orthogonal manner programmed by the normal sgRNA (gR1.0) and sgRNA-2.0 (gR2.0) with 20-nt protospacers and 15-nt protospacers, respectively (
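The length-based programming described above can be summarized in a small sketch. The length ranges are those reported for the rice protoplast assays (17 to 20 nt conferring mutagenesis, 14 to 16 nt not causing mutations); everything else is a labeled assumption, including the function names, the "untested" category for other lengths, and the convention that a protospacer is written 5' to 3' with its PAM-proximal end last, so that shortening removes PAM-distal bases.

```python
EDIT_LENGTHS = range(17, 21)      # 17 to 20 nt: mutagenesis observed
ACTIVATE_LENGTHS = range(14, 17)  # 14 to 16 nt: no mutations observed


def expected_outcome(protospacer):
    """Classify a protospacer by length, following the reported ranges."""
    n = len(protospacer)
    if n in EDIT_LENGTHS:
        return "cleavage"
    if n in ACTIVATE_LENGTHS:
        return "binding_only"
    return "untested"  # lengths outside 14-20 nt are not covered here


def truncate_for_activation(protospacer, length=15):
    """Shorten a protospacer to a binding-only length.

    Assumption (not stated in the text): the PAM-distal 5' bases are the
    ones removed, so the PAM-proximal `length` bases are kept.
    """
    if length not in ACTIVATE_LENGTHS:
        raise ValueError("length must be 14, 15, or 16")
    if len(protospacer) < length:
        raise ValueError("protospacer shorter than requested length")
    return protospacer[-length:]


if __name__ == "__main__":
    full = "ACGTACGTACGTACGTACGT"  # toy 20-nt protospacer
    short = truncate_for_activation(full)
    print(full, expected_outcome(full))
    print(short, expected_outcome(short))
```

In such a scheme, a guide intended for editing would pair a 20-nt protospacer with the gR1.0 scaffold, and a guide intended for activation would pair a 15-nt protospacer with the gR2.0 scaffold.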
Recently, Cas9 variant SpRY was demonstrated for PAM-less genome editing in human cells and plants. Based on the CRISPR-Combo principle, we further demonstrated that SpRY-Act3.0 enables orthogonal gene activation and knockout by simultaneously targeting OsBBM1, OsGW2 and OsGN1a at both NGG and NGC PAMs (
We next sought to develop CRISPR-Combo systems suitable for simultaneous base editing and gene activation. For C-to-T base editing, CBE-Cas9n-Act3.0 was generated by implanting the highly efficient A3A/Y130E-Cas9-UGI into the CRISPR-Act3.0 system (
To broaden the targeting scope, we also generated CBE-SpRYn-Act3.0 and ABE-SpRYn-Act3.0. By simultaneously targeting OsBBM1, OsALS and OsEPSPS, we found CBE-SpRYn-Act3.0 only generated a low level of gene activation of OsBBM1 in rice protoplasts, while ABE-SpRYn-Act3.0 failed for gene activation (
The CRISPR-Combo systems are enabling technologies due to their ability for simultaneous gene editing and activation. For demonstration in this study, we decided to focus on addressing some of the most pressing challenges in plant genome editing. The first challenge is achieving accelerated breeding of genome-edited transgene-free plants. We recently showed that activation of the florigen gene FT in Arabidopsis by CRISPR-Act3.0 promoted early flowering. We reasoned that a genome editing pipeline with simultaneous activation of such a florigen gene using CRISPR-Combo would have three benefits compared to traditional genome editing experiments. First, it would drastically reduce the plant breeding life cycle. Second, the transgenic plants with extra-early flowering phenotypes (plants showed four leaves when flower buds became visible) would suggest high levels of CRISPR-Combo expression, indicating high levels of genome editing in these lines. Hence, the easy-to-score extra-early flowering phenotype would indicate high-efficiency genome editing, saving much effort for molecular genotyping. Third, selection of normal flowering plants in the next generation of genome-edited early flowering plants would drastically reduce the effort of genotyping for transgene-free genome-edited plants by at least 75%, based on the Mendelian segregation pattern of a single transgene.
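The 75% figure follows from single-locus Mendelian segregation, which the sketch below works through. It assumes, for illustration only, that the T1 parent is hemizygous for one transgene insertion, that it is self-pollinated, and that flowering time perfectly reports transgene presence; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def t2_transgene_copy_fractions():
    """Expected T2 fractions by transgene copy number after selfing a
    hemizygous single-insertion T1 plant (gametes 'T' or '-')."""
    gametes = ["T", "-"]
    fractions = {}
    for egg, pollen in product(gametes, repeat=2):
        copies = (egg == "T") + (pollen == "T")
        fractions[copies] = fractions.get(copies, Fraction(0)) + Fraction(1, 4)
    return fractions


if __name__ == "__main__":
    fractions = t2_transgene_copy_fractions()
    transgene_free = fractions[0]
    print("transgene-free fraction:", transgene_free)
    print("plants that can be skipped:", 1 - transgene_free)
```

If only the normal-flowering (presumed transgene-free) quarter of the T2 population needs molecular genotyping, the remaining three quarters can be set aside on phenotype alone.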
To assess this improved genome editing pipeline, we decided to test two CRISPR-Combo systems, Cas9-Act3.0 and CBE-Cas9n-Act3.0, in Arabidopsis. For Cas9-Act3.0, multiplexed gene editing (GE) of AtPYL1 and AtAP1 was pursued with simultaneous activation of AtFT, with three different sgRNAs according to the design guideline (
We next sought to evaluate whether the early flowering phenotype could be reliably used as a phenotypic marker for transgenic plants in the next (T2) generation. We focused on the progeny of some extra-early flowering T1 plants. Plants from each T2 population were again classified as extra-early flowering, early flowering, and standard (
Three representative CBE-Cas9n-Act3.0-mediated atals atacc2 T2 lines 14-#23, 17-#11, and 17-#22 were selected to determine herbicide resistance. T2 line 14-#23 contains the P197F mutation of AtALS and the P1864L mutation of AtACC2. Both 17-#11 and 17-#22 contain the P197S mutation of AtALS and the P1864L mutation of AtACC2. We found that the descendants of T2 lines 14-#23, 17-#11, and 17-#22 were tribenuron-resistant, consistent with the previous report that P197F and P197S mutations of AtALS confer Arabidopsis plants with herbicide resistance (Chen et al., Sci China Life Sci (2017) 60:520-523). However, whether P1864L of AtACC2 would confer herbicide resistance was unknown. We found that the three atals atacc2 T2 lines' descendants could survive in the MS medium supplemented with both tribenuron and haloxyfop, indicating P1864L of AtACC2 confers Arabidopsis plants with haloxyfop resistance (
In addition, we investigated whether the Cas9-Act3.0-A+GE and CBE-Cas9n-Act3.0-A+BE constructs induced potential off-target events at AtFT target sites with a 15 nt sgRNA (
Many plant species are recalcitrant to tissue culture and regeneration. Even when a plant species can be regenerated, the process is often lengthy and tedious. These challenges prevent the wide use of genome editing in many plant species. To overcome these challenges, ectopic expression of morphogenic genes was successfully applied to boost plant tissue culture and de novo meristem regeneration. The pluripotent nature of plant cells and the presence of morphogenic genes in every plant genome led us to hypothesize that plant regeneration could be stimulated by activation of endogenous morphogenic genes. Hence, we explored the idea of promoting plant tissue culture in genome editing experiments with CRISPR-Combo. We conducted targeted mutagenesis of the Pt4CL1 gene in poplar, a model woody plant, using two constructs: one used Cas9 for genome editing (Cas9-GE), and the other employed the CRISPR-Combo system to simultaneously edit Pt4CL1 and activate the poplar WUSCHEL gene (PtWUS). Compared to the conventional Cas9-GE construct, the CRISPR-Combo construct (Cas9-Act3.0-A+GE) resulted in rapid tissue culture with accelerated root initiation and shoot growth (
We decided to assess the CRISPR-Combo system for simultaneous editing of Pt4CL1 and activation of other morphogenic genes. In one case, we targeted PtWOX11 for activation. Analysis of 10 randomly chosen CRISPR-Combo T0 lines showed a high level of PtWOX11 activation, up to 800-fold (
To test CRISPR-Combo for morphogen activation in another plant species, we chose tomato and selected seven morphogenic genes, including SlWUS, SlFAD-BD, SlE2F, SlARF7, SlARF19, SlBBM, and SlSTM, which are involved in callus formation and shoot meristem development. For each gene, we screened multiple sgRNAs and identified those that can mount high levels of gene activation for these individual genes (
Having demonstrated the use of CRISPR-Combo for boosting tissue culture in dicot plants, we next pursued enhancing tissue culture with CRISPR-Combo in a monocot crop, rice. Previous studies have reported improvements in tissue culture of monocot plants by overexpression of WUSCHEL (WUS) and BABY BOOM (BBM) sourced from maize (Lowe et al., Plant Cell (2016) 28:1998-2015). We hypothesized that activation of OsBBM1 alone or together with OsWUS would promote rice tissue culture independent of exogenous plant hormones. The EHA105:Cas9-Act3.0-GE strain-infected control callus explants could not produce any hygromycin-resistant calluses on hormone (2,4-D)-free regeneration and selection medium (RSM). However, about 20% of callus explants infected with EHA105:Cas9-Act3.0-A-GE strains with activation of OsBBM1 or OsBBM1 & OsWUS1 showed hygromycin-resistant callus growth (
We built a novel CRISPR-Combo platform to unleash versatile genome engineering in plants. We developed CRISPR-Combo systems for simultaneous targeted mutagenesis by NHEJ and gene activation, as well as simultaneous base editing and gene activation. While there are numerous applications, we focused on the demonstration of CRISPR-Combo for enabling plant genome editing experiments. In one example, we showed that CRISPR-Combo facilitated accelerated breeding and selection of transgene-free genome-edited plants through simultaneous activation of the endogenous florigen, FT. Although the concept was demonstrated in the model plant Arabidopsis, the technology is readily transferable into crop plants. In a second example, we directly worked on an application of CRISPR-Combo in the bioenergy crop poplar. We showed that by simultaneous activation of endogenous morphogenic genes such as WUS, regeneration of genome-edited poplar plants could be accelerated. Based on our data, it is conceivable that activation of florigen genes and morphogenic genes may be combined to further fast-track the breeding of transgene-free genome-edited crops. Hence, CRISPR-Combo greatly contributes to the improvement and application of genome editing in plants.
Previously, simultaneous genome editing and gene regulation was demonstrated using orthogonal Cas9 systems that require the expression of multiple Cas9 proteins sourced from different bacteria. By contrast, CRISPR-Combo is based on a single Cas9, which can be used for simultaneous genome editing, gene activation, and gene repression (
We pursued CRISPR interference (CRISPRi) for OsKU70 and OsKU80, both of which are involved in the canonical non-homologous end joining (NHEJ) DNA repair pathway in rice. For OsKU70, four of five sgRNAs (gR1 to gR5) combined with dCas9 resulted predominantly in gene repression. In particular, gR1 reduced the OsKU70 expression level by 73.7%. Similarly, all five sgRNAs (gR1 to gR5) resulted in significant gene repression of OsKU80. The gR4 sgRNA reduced the OsKU80 expression level by 94.5% (
This application claims priority to provisional application U.S. Ser. No. 63/066,674, filed Aug. 17, 2020, which is hereby incorporated herein by reference in its entirety.
This invention was made with government support under IOS1758745 and IOS2029889 awarded by the National Science Foundation. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/046281 | 8/17/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63066674 | Aug 2020 | US |