The instant application contains a Sequence Listing, which has been submitted electronically in .xml format and is hereby incorporated by reference in its entirety. Said xml copy, created on Nov. 28, 2022, is named 2003080-0237.xml and is 202,812 bytes in size.
Type V CRISPR/Cas12a effector proteins (also referred to as Cpf1 effector proteins) have been described as an alternative to Cas9 effector proteins for genome editing applications (Zetsche et al., Cell 163:759-771 (2015); Shmakov et al., Mol Cell. 60 (3): 385-97 (2015); Kleinstiver et al., Nat Biotechnol 34 (8): 869-74 (2016); Kim et al., Nat Biotechnol 34 (8): 863-8 (2016)). Cas12a effector proteins possess a number of potentially advantageous properties that include, but are not limited to: recognition of T-rich protospacer-adjacent motif (PAM) sequences, relatively greater genome-wide specificities in human cells compared to wild-type Streptococcus pyogenes Cas9 (SpCas9), an endoribonuclease activity to process pre-crRNAs that simplifies the simultaneous targeting of multiple sites (multiplexing), DNA endonuclease activity that generates a 5′ DNA overhang (rather than a blunt double-strand break as observed with SpCas9), and cleavage of the protospacer DNA sequence on the end most distal from the PAM (compared with cleavage at the PAM proximal end of the protospacer as is observed with SpCas9).
Given these capabilities there is a need to develop Cas12a effector proteins that provide a suitable alternative to Cas9 effector proteins for genome editing applications in humans.
The present disclosure provides strategies, systems, compositions, and methods related to engineered Type V CRISPR/Cas12a effector proteins and variants thereof with increased activity(ies) for altering a cell, e.g., altering a structure, e.g., altering a sequence, of a target nucleic acid of a cell, compared to other Type V CRISPR/Cas12a effector proteins described in the art. For example, in some embodiments, the present disclosure provides for strategies, systems, compositions, and methods related to engineered Cas12a effector proteins and variants thereof with increased activity(ies) for introducing double strand and/or single strand breaks in a target nucleic acid sequence, compared to other Cas12a effector proteins described in the art.
Among other things, the present disclosure provides strategies, systems, compositions, and methods related to engineered Cas12a effector proteins and variants thereof that are fused to one or more heterologous protein domains (or “fusion proteins”). For example, in some embodiments, Cas12a effector proteins may be fused to one or more heterologous protein domains such as a deaminase or catalytic domain for base editing. The fusion proteins provided herein exhibit increased activity(ies) compared to fusion proteins known in the art.
The disclosed Cas12a effector proteins, and related strategies, systems, compositions, and methods, present several advantages compared to other Cas12a effector proteins known in the art. For example, in some embodiments, the described Cas12a effector proteins, and related strategies, systems, compositions, and methods, create a single and/or double strand break in a target and/or non-target nucleic acid sequence with higher efficiency compared to other Cas12a effector proteins known in the art. Moreover, in some embodiments, the described Cas12a effector proteins, and related strategies, systems, compositions, and methods, alter the genomes of at least a plurality of cells at a higher rate compared to other Cas12a effector proteins known in the art.
The teachings described herein will be more fully understood from the following description of various exemplary embodiments, when read together with the accompanying drawing. It should be understood that the drawing described below is for illustration purposes only and is not intended to limit the scope of the present teachings in any way.
Unless otherwise specified, each of the following terms has the meaning set forth in this section.
The indefinite articles “a” and “an” refer to at least one of the associated noun, and are used interchangeably with the terms “at least one” and “one or more.” The conjunctions “or” and “and/or” are used interchangeably as non-exclusive disjunctions.
The term “cancer” (also used interchangeably with the term “neoplastic”), as used herein, refers to cells having the capacity for autonomous growth, e.g., an abnormal state or condition characterized by rapidly proliferating cell growth. Cancerous disease states may be categorized as pathologic, e.g., characterizing or constituting a disease state, e.g., malignant tumor growth, or may be categorized as non-pathologic, e.g., a deviation from normal but not associated with a disease state, e.g., cell proliferation associated with wound repair.
The terms “CRISPR/Cas effector protein”, “Cas enzyme”, “CRISPR enzyme”, “CRISPR protein”, “Cas protein” and “CRISPR/Cas” are generally used interchangeably and at all points of reference herein refer by analogy to new CRISPR/Cas effector proteins further described in this application, unless otherwise apparent, such as by specific reference to Cas12a or Cpf1. In some embodiments, a CRISPR/Cas effector protein is part of a fusion protein comprising one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the CRISPR enzyme). In some embodiments, one or more heterologous protein domains comprises a deaminase. In some embodiments, one or more heterologous protein domains comprises a reverse transcriptase domain. In some embodiments, a CRISPR/Cas effector protein is a nuclease. In some embodiments, a CRISPR/Cas effector protein is a nickase. In some embodiments, a CRISPR/Cas effector protein is engineered (e.g., made by hand of man). In some embodiments, a CRISPR/Cas effector protein is a variant CRISPR/Cas effector protein.
The term “CRISPR/Cas nuclease” as used herein refers to any CRISPR/Cas protein with DNA nuclease activity, e.g., a Cas12a protein that exhibits specific association (or “targeting”) to a DNA target site, e.g., within a genomic sequence in a cell in the presence of a guide molecule. The strategies, systems, and methods disclosed herein can use any combination of CRISPR/Cas nuclease(s) disclosed herein.
The term “CRISPR/Cas nickase” as used herein refers to any CRISPR/Cas protein with DNA nickase activity, e.g., a Cas12a protein that exhibits specific association (or “targeting”) to a DNA target site, e.g., within a genomic sequence in a cell in the presence of a guide molecule. The strategies, systems, and methods disclosed herein can use any combination of CRISPR/Cas nickase(s) disclosed herein.
The term “fuse” or “fused” refers to the covalent linkage between two polypeptides in a fusion protein. The polypeptides may be fused via a peptide bond, either directly to each other or via a linker. The term “fusion protein” refers to a protein having at least two polypeptides covalently linked, either directly or via a linker (e.g., an amino acid linker). The polypeptides forming a fusion protein may be linked C-terminus to N-terminus, C-terminus to C-terminus, N-terminus to N-terminus, or N-terminus to C-terminus. The polypeptides of the fusion protein may be in any order and may include more than one of either or both of the constituent polypeptides. The term “fusion protein” encompasses conservatively modified variants, polymorphic variants, alleles, mutants, subsequences, interspecies homologs, and fragments of the polypeptides that make up the fusion protein. A fusion protein may be a protein developed from a fusion gene that is created through adjoining of two or more genes originally coding for separate proteins. Translation of this fusion gene may result in a single or multiple polypeptides with functional properties derived from each of the original proteins.
The term “operably linked” refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. A control element “operably linked” to a functional element is associated in such a way that expression and/or activity of the functional element is achieved under conditions compatible with the control element. In some embodiments, “operably linked” control elements are contiguous (e.g., covalently linked) with coding elements of interest; in some embodiments, control elements act in trans to the functional element of interest. In some embodiments, “operably linked” refers to functional linkage between a regulatory sequence and a heterologous nucleic acid sequence resulting in expression of the latter. For example, a first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. In some embodiments, for example, a functional linkage may include transcriptional control. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Operably linked DNA sequences can be contiguous with each other and, e.g., where necessary to join two protein coding regions, are in the same reading frame.
The term “nuclease” as used herein refers to any protein that catalyzes the cleavage of phosphodiester bonds. In some embodiments the nuclease is a DNA nuclease. In some embodiments the nuclease is a “nickase” which causes a single-strand break when it cleaves double-stranded DNA, e.g., genomic DNA in a cell. In some embodiments the nuclease causes a double-strand break when it cleaves double-stranded DNA, e.g., genomic DNA in a cell. In some embodiments the nuclease binds a specific target site within the double-stranded DNA that overlaps with or is adjacent to the location of the resulting break. In some embodiments, the nuclease causes a double-strand break that contains overhangs ranging from 0 (blunt ends) to 22 nucleotides in both 3′ and 5′ orientations. As discussed herein, CRISPR/Cas nucleases are exemplary nucleases that can be used in accordance with the strategies, systems, and methods of the present disclosure.
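By way of illustration only, the relationship between the two strand cut positions and the resulting end type (blunt, 5′ overhang, or 3′ overhang) can be sketched in a few lines of Python. The function name `overhang` and its coordinate convention are hypothetical and are not part of the disclosure; both cuts are expressed as boundary positions measured along the top strand in its 5′ to 3′ direction.

```python
def overhang(top_cut: int, bottom_cut: int) -> tuple[str, int]:
    """Classify the ends left by a staggered double-strand break.

    Both arguments are boundary positions in top-strand coordinates
    (a cut at position k falls between bases k-1 and k of the top strand).
    Hypothetical helper for illustration only.
    """
    length = abs(top_cut - bottom_cut)
    if length == 0:
        return ("blunt", 0)
    # If the top strand is cut upstream of the bottom strand, the
    # single-stranded tails that remain end in 5' termini.
    kind = "5'" if top_cut < bottom_cut else "3'"
    return (kind, length)


# Example values (assumptions, not taken from this section):
# cuts 18 nt and 23 nt into a protospacer on the two strands.
print(overhang(18, 23))
print(overhang(17, 17))
```

Under this convention, identical cut positions give a blunt end, and the overhang length is simply the distance between the two cuts.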
The term “nucleic acid” in its broadest sense, refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. As will be clear from context, in some embodiments, “nucleic acid” refers to an individual nucleic acid residue (e.g., a nucleotide and/or nucleoside); in some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising individual nucleic acid residues. In some embodiments, a “nucleic acid” is or comprises RNA; in some embodiments, a “nucleic acid” is or comprises DNA. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleic acid analogs. In some embodiments, a nucleic acid analog differs from a nucleic acid in that it does not utilize a phosphodiester backbone. For example, in some embodiments, a nucleic acid is, comprises, or consists of one or more “peptide nucleic acids,” which are known in the art, have peptide bonds instead of phosphodiester bonds in the backbone, and are considered within the scope of the present invention. Alternatively or additionally, in some embodiments, a nucleic acid has one or more phosphorothioate and/or 5′-N-phosphoramidite linkages rather than phosphodiester bonds. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine).
In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 2-thiocytidine, methylated bases, intercalated bases, and combinations thereof). In some embodiments, a nucleic acid comprises one or more modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose) as compared with those in natural nucleic acids. In some embodiments, a nucleic acid has a nucleotide sequence that encodes a functional gene product such as an RNA or protein. In some embodiments, a nucleic acid includes one or more introns. In some embodiments, nucleic acids are prepared by one or more of isolation from a natural source, enzymatic synthesis by polymerization based on a complementary template (in vivo or in vitro), reproduction in a recombinant cell or system, and chemical synthesis. In some embodiments, a nucleic acid is at least 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000 or more residues long. In some embodiments, a nucleic acid is partly or wholly single stranded; in some embodiments, a nucleic acid is partly or wholly double stranded. In some embodiments a nucleic acid has a nucleotide sequence comprising at least one element that encodes, or is the complement of a sequence that encodes, a polypeptide. In some embodiments, a nucleic acid has enzymatic activity.
The terms “orthologue” (also referred to as “ortholog” herein) and “homologue” (also referred to as “homolog” herein) are known in the art. By means of further guidance, a “homologue” of a protein as used herein is a protein of the same species which performs the same or a similar function as the protein it is a homologue of. Homologous proteins may but need not be structurally related, or are only partially structurally related. An “orthologue” of a protein as used herein is a protein of a different species which performs the same or a similar function as the protein it is an orthologue of. Orthologous proteins may but need not be structurally related, or are only partially structurally related. Homologs and orthologs may be identified by homology modelling (see, e.g., Greer, Science vol. 228 (1985) 1055, and Blundell et al. Eur J Biochem vol 172 (1988), 513) or “structural BLAST” (Dey F, Cliff Zhang Q, Petrey D, Honig B. Toward a “structural BLAST”: using structural relationships to infer function. Protein Sci. 22 (4): 359-66 (2013)). See also Shmakov et al. (2015) for application in the field of CRISPR/Cas loci.
The term “endogenous,” as used herein in the context of nucleic acids refers to a native nucleic acid (e.g., a gene, a protein coding sequence) in its natural location, e.g., within the genome of a cell.
The term “exogenous,” as used herein in the context of nucleic acids refers to a nucleic acid (whether native or non-native) that has been artificially introduced into a man-made construct (e.g., a knock-in cassette, or a donor template) or into the genome of a cell using, for example, gene editing or genetic engineering techniques, e.g., HDR based integration techniques.
The term “guide molecule” or “guide RNA” or “gRNA” or “gRNA molecule” when used in reference to a CRISPR/Cas system is any nucleic acid that promotes the specific association (or “targeting”) of a CRISPR/Cas effector protein, e.g., a Cas12 effector protein, to a DNA target site such as within a genomic sequence in a cell. While guide molecules are typically RNA molecules, it is well known in the art that chemically modified RNA molecules, including DNA/RNA hybrid molecules, can be used as guide molecules.
The term “linker” is used to refer to that portion of a multi-element agent that connects different elements to one another. For example, those of ordinary skill in the art appreciate that a polypeptide whose structure includes two or more functional or organizational domains often includes a stretch of amino acids between such domains that links them to one another. In some embodiments, a polypeptide comprising a linker element has an overall structure of the general form S1-L-S2, wherein S1 and S2 may be the same or different and represent two domains associated with one another by the linker. In some embodiments, a polypeptide linker is at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more amino acids in length. In some embodiments, a linker is characterized in that it tends not to adopt a rigid three-dimensional structure, but rather provides flexibility to the polypeptide. A variety of different linker elements that can appropriately be used when engineering polypeptides (e.g., fusion polypeptides) are known in the art (see e.g., Holliger et al., Proc. Natl. Acad. Sci. USA 90:6444-6448 (1993); Poljak et al., Structure 2:1121-1123 (1994)).
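As a purely illustrative sketch of the general form S1-L-S2, the following Python snippet assembles a fusion polypeptide sequence from two domain sequences and a repeated linker unit. The function name `build_fusion`, the placeholder domain sequences, and the choice of a glycine-serine repeat (GGGGS) as the flexible linker are assumptions made for the example and are not taken from the disclosure.

```python
def build_fusion(s1: str, s2: str, linker_unit: str = "GGGGS", repeats: int = 3) -> str:
    """Return the one-letter amino acid sequence S1-L-S2.

    s1, s2      : domain sequences (placeholders in this example)
    linker_unit : repeat unit of the linker L (assumed Gly-Ser unit)
    repeats     : number of times the unit is repeated
    """
    linker = linker_unit * repeats
    if len(linker) < 2:
        raise ValueError("linker should be at least 2 amino acids long")
    return s1 + linker + s2


# Placeholder domains; real domains would be far longer.
fusion = build_fusion("MKT", "AEL", repeats=2)
print(fusion)
```

Changing `repeats` is a simple way to explore linker lengths within the ranges recited above.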
The term “polyadenylation” refers to the covalent linkage of a polyadenylyl moiety, or its modified variant, to a messenger RNA molecule. In eukaryotic organisms, most messenger RNA (mRNA) molecules are polyadenylated at the 3′ end. In some embodiments, a 3′ poly(A) tail is a long sequence of adenine nucleotides (e.g., 50, 60, 70, 100, 200, 500, 1000, 2000, 3000, 4000, or 5000) added to the pre-mRNA through the action of an enzyme, polyadenylate polymerase. In higher eukaryotes, a poly(A) tail can be added onto transcripts that contain a specific sequence, the polyadenylation signal or “poly(A) sequence.” A poly(A) tail and proteins bound to it aid in protecting mRNA from degradation by exonucleases. Polyadenylation can affect transcription termination, export of the mRNA from the nucleus, and translation. Typically, polyadenylation occurs in the nucleus immediately after transcription of DNA into RNA, but additionally can also occur later in the cytoplasm. After transcription has been terminated, the mRNA chain can be cleaved through the action of an endonuclease complex associated with RNA polymerase. The cleavage site can be characterized by the presence of the base sequence AAUAAA near the cleavage site. After mRNA has been cleaved, adenosine residues can be added to the free 3′ end at the cleavage site. As used herein, a “poly(A) sequence” is a sequence that triggers the endonuclease cleavage of an mRNA and the addition of a series of adenosines to the 3′ end of the cleaved mRNA.
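For illustration, the AAUAAA motif mentioned above can be located in a transcript with a simple string scan. The Python sketch below is a minimal example; the function name `find_polya_signals` is hypothetical, and the scan only reports motif positions without attempting to predict the actual cleavage site.

```python
def find_polya_signals(mrna: str, signal: str = "AAUAAA") -> list[int]:
    """Return 0-based start positions of the poly(A) signal in a transcript.

    DNA-style input is accepted: any T is read as U before searching.
    Overlapping occurrences are reported.
    """
    rna = mrna.upper().replace("T", "U")
    positions = []
    start = rna.find(signal)
    while start != -1:
        positions.append(start)
        start = rna.find(signal, start + 1)
    return positions


print(find_polya_signals("GGCAAUAAAGGC"))
```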
The term “polypeptide” refers to any polymeric chain of residues (e.g., amino acids) that are typically linked by peptide bonds. In some embodiments, a polypeptide has an amino acid sequence that occurs in nature. In some embodiments, a polypeptide has an amino acid sequence that does not occur in nature. In some embodiments, a polypeptide has an amino acid sequence that is engineered in that it is designed and/or produced through action of the hand of man. In some embodiments, a polypeptide may comprise or consist of natural amino acids, non-natural amino acids, or both. In some embodiments, a polypeptide may include one or more pendant groups or other modifications, e.g., modifying or attached to one or more amino acid side chains, at a polypeptide's N-terminus, at a polypeptide's C-terminus, or any combination thereof. In some embodiments, such pendant groups or modifications may be acetylation, amidation, lipidation, methylation, pegylation, etc., including combinations thereof. In some embodiments, polypeptides may contain L-amino acids, D-amino acids, or both and may contain any of a variety of amino acid modifications or analogs known in the art. In some embodiments, useful modifications may be or include, e.g., terminal acetylation, amidation, methylation, etc. In some embodiments, a protein may comprise natural amino acids, non-natural amino acids, synthetic amino acids, and combinations thereof.
The term “polynucleotide” (including, but not limited to “nucleotide sequence”, “nucleic acid”, “nucleic acid molecule”, “nucleic acid sequence”, and “oligonucleotide”) as used herein refers to a series of nucleotide bases (also called “nucleotides”) in DNA and RNA, and means any chain of two or more nucleotides. In some embodiments, polynucleotides, nucleotide sequences, nucleic acids, etc. can be chimeric mixtures or derivatives or modified versions thereof, single-stranded or double-stranded. In some such embodiments, modifications can occur at the base moiety, sugar moiety, or phosphate backbone, for example, to improve stability of the molecule, its hybridization parameters, etc. In general, a nucleotide sequence typically carries genetic information, including, but not limited to, the information used by cellular machinery to make proteins and enzymes. In some embodiments, a nucleotide sequence and/or genetic information comprises double- or single-stranded genomic DNA, RNA, any synthetic and genetically manipulated polynucleotide, and/or sense and/or antisense polynucleotides. In some embodiments, nucleic acids contain modified bases.
Conventional IUPAC notation is used in nucleotide sequences presented herein, as shown in Table 1, below (see also Cornish-Bowden, Nucleic Acids Res. 13 (9): 3021-30 (1985), incorporated by reference herein). It should be noted, however, that “T” denotes “Thymine or Uracil” in those instances where a sequence may be encoded by either DNA or RNA, for example in certain CRISPR/Cas guide molecules.
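For illustration, the standard IUPAC single-letter nucleotide codes can be expressed as a lookup table and used to test whether a concrete sequence satisfies a degenerate pattern. The Python sketch below is a minimal example; the names `IUPAC_CODES`, `matches`, and `sequence_matches` are hypothetical, and U is folded into T so that “T” stands for thymine or uracil as noted above.

```python
# Standard IUPAC nucleotide codes mapped to the bases they permit.
# U is treated as T so that DNA and RNA sequences compare equally.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches(code: str, base: str) -> bool:
    """Return True if a concrete base is permitted by an IUPAC code."""
    base = base.upper().replace("U", "T")
    return base in IUPAC_CODES[code.upper()]


def sequence_matches(pattern: str, sequence: str) -> bool:
    """Return True if every base of `sequence` fits the IUPAC `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(matches(p, b) for p, b in zip(pattern, sequence))


print(sequence_matches("TTTV", "TTTC"))
print(sequence_matches("TTTV", "TTTT"))
```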
The terms “prevent,” “preventing,” and “prevention” as used herein refer to the prevention of a disease in a mammal, e.g., in a human, including (a) avoiding or precluding the disease; (b) affecting the predisposition toward the disease; or (c) preventing or delaying the onset of at least one symptom of the disease.
As used herein, the term “recombinant” is intended to refer to polypeptides that are designed, engineered, prepared, expressed, created, manufactured, and/or isolated by recombinant means, such as polypeptides expressed using a recombinant expression construct transfected into a host cell; polypeptides isolated from a recombinant, combinatorial human polypeptide library; polypeptides isolated from an animal (e.g., a mouse, rabbit, sheep, fish, etc.) that is transgenic for or otherwise has been manipulated to express a gene or genes, or gene components that encode and/or direct expression of the polypeptide or one or more component(s), portion(s), element(s), or domain(s) thereof; and/or polypeptides prepared, expressed, created or isolated by any other means that involves splicing or ligating selected nucleic acid sequence elements to one another, chemically synthesizing selected sequence elements, and/or otherwise generating a nucleic acid that encodes and/or directs expression of a polypeptide or one or more component(s), portion(s), element(s), or domain(s) thereof. In some embodiments, one or more of such selected sequence elements is found in nature. In some embodiments, one or more of such selected sequence elements is designed in silico. In some embodiments, one or more such selected sequence elements results from mutagenesis (e.g., in vivo or in vitro) of a known sequence element, e.g., from a natural or synthetic source such as, for example, in the germline of a source organism of interest (e.g., of a human, a mouse, etc.).
The term “reference” describes a standard or control relative to which a comparison is performed. For example, in some embodiments, an agent, animal, individual, population, sample, sequence or value of interest is compared with a reference or control agent, animal, individual, population, sample, sequence or value. In some embodiments, a reference or control is tested and/or determined substantially simultaneously with the testing or determination of interest. In some embodiments, a reference or control is a historical reference or control, optionally embodied in a tangible medium. Typically, as would be understood by those skilled in the art, a reference or control is determined or characterized under comparable conditions or circumstances to those under assessment. Those skilled in the art will appreciate when sufficient similarities are present to justify reliance on and/or comparison to a particular possible reference or control. In some embodiments, a reference is a negative control reference; in some embodiments, a reference is a positive control reference.
The term “sample” typically refers to an aliquot of material obtained or derived from a source of interest. In some embodiments, a source of interest is a biological or environmental source. In some embodiments, a source of interest may be or comprise a cell or an organism, such as a microbe (e.g., virus), a plant, or an animal (e.g., a human). In some embodiments, a source of interest is or comprises biological tissue or fluid. In some embodiments, a biological tissue or fluid may be or comprise amniotic fluid, aqueous humor, ascites, bile, bone marrow, blood, breast milk, cerebrospinal fluid, cerumen, chyle, chyme, ejaculate, endolymph, exudate, feces, gastric acid, gastric juice, lymph, mucus, pericardial fluid, perilymph, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum, semen, serum, smegma, sputum, synovial fluid, sweat, tears, urine, vaginal secretions, vitreous humour, vomit, and/or combinations or component(s) thereof. In some embodiments, a biological fluid may be or comprise an intracellular fluid, an extracellular fluid, an intravascular fluid (blood plasma), an interstitial fluid, a lymphatic fluid, and/or a transcellular fluid. In some embodiments, a biological fluid may be or comprise a plant exudate. In some embodiments, a biological tissue or sample may be obtained, for example, by aspirate, biopsy (e.g., fine needle or tissue biopsy), swab (e.g., oral, nasal, skin, or vaginal swab), scraping, surgery, washing or lavage (e.g., bronchioalveolar, ductal, nasal, ocular, oral, uterine, vaginal, or other washing or lavage). In some embodiments, a biological sample is or comprises cells obtained from an individual. In some embodiments, a sample is a “primary sample” obtained directly from a source of interest by any appropriate means.
In some embodiments, as will be clear from context, the term “sample” refers to a preparation that is obtained by processing (e.g., by removing one or more components of and/or by adding one or more agents to) a primary sample, for example, by filtering using a semi-permeable membrane. Such a “processed sample” may comprise, for example, nucleic acids or proteins extracted from a sample or obtained by subjecting a primary sample to one or more techniques such as amplification or reverse transcription of nucleic acid, isolation and/or purification of certain components, etc.
The term “subject” as used herein means a human or non-human animal. In some embodiments a human subject can be any age (e.g., a fetus, infant, child, young adult, or adult). In some embodiments a human subject may be at risk of or suffer from a disease, or may be in need of alteration of a gene or a combination of specific genes. Alternatively, in some embodiments, a subject may be a non-human animal, which may include, but is not limited to, a mammal. In some embodiments, a non-human animal is a non-human primate, a rodent (e.g., a mouse, rat, hamster, guinea pig, etc.), a rabbit, a dog, a cat, and so on. In some embodiments of this disclosure, the non-human animal subject is livestock, e.g., a cow, a horse, a sheep, a goat, etc. In some embodiments, the non-human animal subject is poultry, e.g., a chicken, a turkey, a duck, etc.
The terms “treatment,” “treat,” and “treating,” as used herein refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress, ameliorate, reduce severity of, prevent or delay the recurrence of a disease, disorder, or condition or one or more symptoms thereof, and/or improve one or more symptoms of a disease, disorder, or condition as described herein. In some embodiments, a condition includes an injury. In some embodiments, an injury may be acute or chronic (e.g., tissue damage from an underlying disease or disorder that causes, e.g., secondary damage such as tissue injury). In some embodiments, treatment may be administered to a subject after one or more symptoms have developed and/or after a disease has been diagnosed. Treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, in some embodiments, treatment may be administered to a susceptible subject prior to the onset of symptoms (e.g., in light of genetic or other susceptibility factors). In some embodiments, treatment may also be continued after symptoms have resolved, for example to prevent or delay their recurrence. In some embodiments, treatment results in improvement and/or resolution of one or more symptoms of a disease, disorder or condition.
The term “variant” as used herein refers to an entity such as a polypeptide or polynucleotide that shows significant structural identity with a reference entity but differs structurally from the reference entity in the presence or level of one or more chemical moieties as compared with the reference entity. In many embodiments, a variant also differs functionally from its reference entity. In general, whether a particular entity is properly considered to be a “variant” of a reference entity is based on its degree of structural identity with the reference entity. As used herein, the term “functional variant” refers to a variant that confers the same function as the reference entity. It is to be understood that a functional variant need not be functionally equivalent to the reference entity as long as it confers the same function as the reference entity.
CRISPR/Cas effector systems according to the present disclosure comprise, but are not limited to, naturally-occurring Class 2 CRISPR effector proteins such as Cas12a (Cpf1), as well as other Cas12 effector proteins and effector proteins derived or obtained therefrom. In functional terms, CRISPR/Cas effector systems are defined as comprising a CRISPR/Cas effector protein that: (A) interacts with (e.g., complexes with) a gRNA molecule; and (B) together with the gRNA molecule, associates with, and optionally alters, cleaves or modifies, a target region of a DNA that includes (1) a sequence complementary to the targeting domain of the gRNA and, optionally, (2) an additional sequence referred to as a “protospacer adjacent motif,” or “PAM,” which is described in greater detail below. As the following examples will illustrate, CRISPR/Cas effector proteins can be defined, in broad terms, by their PAM specificity and cleavage activity, even though variations may exist between individual CRISPR/Cas effector proteins that share the same PAM specificity or cleavage activity. Skilled artisans will appreciate that some aspects of the present disclosure relate to systems and methods that can be implemented using any suitable CRISPR/Cas effector proteins having a certain PAM specificity and/or cleavage activity. For this reason, unless otherwise specified, the term CRISPR/Cas effector proteins should be understood as a generic term, and not limited to any species (e.g., Acidaminococcus sp. vs. Lachnospiraceae bacterium) or variation (e.g., full-length vs. truncated or split; naturally-occurring PAM specificity vs. engineered PAM specificity, etc.) of CRISPR/Cas effector proteins.
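To make the relationship between a PAM and its adjacent target region concrete, the following Python sketch scans one strand of a DNA sequence for a T-rich PAM and reports the protospacer immediately 3′ of it. This is a minimal illustration under stated assumptions: the default PAM pattern (5′-TTTV-3′, written here as the regular expression `TTT[ACG]`) and the 20-nucleotide protospacer length are values commonly reported in the literature for certain Cas12a orthologs, not values recited in this section, and the function name `find_cas12a_sites` is hypothetical.

```python
import re


def find_cas12a_sites(seq: str, pam_regex: str = "TTT[ACG]", spacer_len: int = 20):
    """List candidate target sites on the given strand.

    Returns (pam_start, pam, protospacer) tuples, where the protospacer
    is the `spacer_len` bases immediately 3' of the PAM. The default
    PAM and length are assumptions for illustration only.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for m in re.finditer(f"(?=({pam_regex}))", seq):
        pam = m.group(1)
        proto_start = m.start() + len(pam)
        protospacer = seq[proto_start:proto_start + spacer_len]
        if len(protospacer) == spacer_len:  # skip sites truncated by the sequence end
            sites.append((m.start(), pam, protospacer))
    return sites


example = "GG" + "TTTC" + "ACGTACGTACGTACGTACGT" + "AA"
for site in find_cas12a_sites(example):
    print(site)
```

Only the strand supplied is scanned; a fuller search would also scan the reverse complement, and an engineered PAM specificity could be explored by passing a different `pam_regex`.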
In general, a CRISPR/Cas effector protein can be delivered to the cell as a protein or a nucleic acid encoding the protein, e.g., a DNA molecule or mRNA molecule. The protein or nucleic acid can be combined with other delivery agents, e.g., lipids or polymers in a lipid or polymer nanoparticle and targeting agents such as antibodies or other binding agents with specificity for the cell. The DNA molecule can be a nucleic acid vector, such as a viral genome or circular double-stranded DNA, e.g., a plasmid. Nucleic acid vectors encoding a CRISPR/Cas effector protein can include other coding or non-coding elements. For example, a CRISPR/Cas effector protein can be delivered as part of a viral genome (e.g., in an AAV, adenoviral or lentiviral genome) that includes certain genomic backbone elements (e.g., inverted terminal repeats, in the case of an AAV genome).
The CRISPR/Cas effector proteins described herein have activities and properties that can be useful in a variety of applications, but the skilled artisan will appreciate that CRISPR/Cas effector proteins can also be modified in certain instances, to alter cleavage activity, PAM specificity, or other structural or functional features.
For example, a CRISPR/Cas effector system may comprise a nuclease, nickase, inactive or dead CRISPR/Cas effector protein, or base editor as described herein. In some embodiments, a nuclease may nick both a target strand and a non-target strand of a DNA sequence to create a double-strand break, which in turn creates indels in the genome of a cell comprising the DNA sequence as described herein. In some embodiments, a CRISPR/Cas effector system comprises a nickase. In some embodiments, a CRISPR/Cas effector system comprises a CRISPR/Cas effector protein with no nuclease/nickase/cutting activity which simply binds to a target nucleic acid sequence, e.g., an inactive or dead Cas12a effector protein or dCas12a effector protein. It is contemplated that the nuclease, nickase, inactive or dead CRISPR/Cas effector proteins described herein can be delivered to a cell in vitro, in vivo, or ex vivo.
In some embodiments, the present disclosure describes CRISPR/Cas effector protein fusions for improved base editing activity (“base editors”). In some embodiments, base editors comprise a CRISPR/Cas effector protein that nicks only a target strand of a target nucleic sequence and is fused to a deaminase; the deaminase then makes either a U or an I base edit, which after repair leads to a permanent C to T or A to G change, respectively, in the genome of a cell as described herein. In some embodiments, base editors comprise a dead CRISPR/Cas (e.g., dCas12a) effector protein having one or more mutations as described herein. In some embodiments, base editors comprise a wild-type CRISPR/Cas effector protein having one or more mutations as described herein. In some embodiments, base editors comprise a CRISPR/Cas effector protein that is a nickase as described herein. In some embodiments, base editors comprise a CRISPR/Cas effector protein that is a nickase having one or more mutations as described herein. In some embodiments, base editors can be used for a DNA target nucleic sequence that requires a CRISPR/Cas effector protein with a T-rich PAM, e.g., those within introns to correct splicing-defect mutations. In some embodiments, a Cas12a effector protein described herein may be fused to a deaminase or catalytic domain thereof to produce a base editor (BE), e.g., as described by PCT Publication Nos. WO 2018/176009A1, WO 2018/213708A1, WO 2018/213726A1, WO 2019/041296A1, WO 2019/126762A2, WO 2019/120310A1, WO 2019/161783A1, WO 2021/016086A1, WO 2021/087246A1, WO 2021/123397A1, or WO 2021/155109A1, the contents of each of which are hereby incorporated herein by reference in their entirety. It is contemplated that the base editors described herein can be delivered to a cell in vitro, in vivo, or ex vivo.
CRISPR/Cas effector proteins may also optionally include a tag, such as, but not limited to, a nuclear localization signal, to facilitate movement of the CRISPR/Cas effector protein into the nucleus. In some embodiments, the CRISPR/Cas effector protein can incorporate C- and/or N-terminal nuclear localization signals. Nuclear localization sequences are known in the art.
In some embodiments, CRISPR/Cas effector systems and methods of their use are described in US Publication No. 2019/0062735 A1, the disclosure of which is incorporated by reference herein in its entirety.
The present disclosure describes the use of Cas12a effector proteins, derived from a Cas12a locus denoted as subtype V-A, and variants thereof. Such effector proteins are also referred to herein as Cas12a effector proteins. Presently, the subtype V-A locus encompasses Cas1, Cas2, a distinct gene denoted Cas12a and a CRISPR array. Cpf1 (CRISPR-associated protein Cpf1, subtype PREFRAN) (Cas12a) is a large protein (about 1300 amino acids) that contains a RuvC-like nuclease domain homologous to the corresponding domain of Cas9 along with a counterpart to the characteristic arginine-rich cluster of Cas9. However, Cas12a lacks the HNH nuclease domain that is present in all Cas9 proteins, and the RuvC-like domain is contiguous in the Cas12a sequence, in contrast to Cas9 where it contains long inserts including the HNH domain. Accordingly, in particular embodiments, a Cas12a effector protein comprises only a RuvC-like nuclease domain.
A crystal structure of Acidaminococcus sp. Cas12a in complex with crRNA and a dsDNA target including a TTTN PAM sequence has been solved by Yamano et al., Cell 165 (4): 949-962 (2016). Cas12a has two lobes: a REC (recognition) lobe, and a NUC (nuclease) lobe. The REC lobe includes REC1 and REC2 domains, which lack similarity to any known protein structures. The NUC lobe, meanwhile, includes three RuvC domains (RuvC-I, -II and -III) and a bridge helix (BH) domain. However, the Cas12a REC lobe lacks an HNH domain, and includes other domains that also lack similarity to known protein structures: a structurally unique PAM-interacting (PI) domain, three Wedge (WED) domains (WED-I, -II and -III), and a nuclease (Nuc) domain.
In some embodiments, a Cas12a effector protein is derived from an organism from the genus of Eubacterium. In some embodiments, the CRISPR effector protein is a Cas12a effector protein derived from an organism from the bacterial species of Eubacterium rectale (ErCas12a, e.g., MAD7). In some embodiments, the amino acid sequence of a Cas12a effector protein corresponds to NCBI Reference Sequence WP_055225123.1, NCBI Reference Sequence WP_055237260.1, NCBI Reference Sequence WP_055272206.1, or GenBank ID OLA16049.1. In some embodiments, the homologue or orthologue of Cas12a as referred to herein has a sequence homology or identity of at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% with one or more of the Cas12a sequences disclosed herein, e.g., one or more of the ErCas12a sequences disclosed herein. In further embodiments, the homologue or orthologue of Cas12a as referred to herein has a sequence identity of at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% with a wild-type ErCas12a. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 15. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 16. In some embodiments, a Cas12a effector protein has a sequence homology or sequence identity of at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% with NCBI Reference Sequence WP_055225123.1, NCBI Reference Sequence WP_055237260.1, NCBI Reference Sequence WP_055272206.1, GenBank ID OLA16049.1, SEQ ID NO: 15, or SEQ ID NO: 16. A skilled person will understand that this includes truncated forms of a Cas12a effector protein whereby the sequence identity is determined over the length of the truncated form. In some embodiments, the ErCas12a effector protein recognizes the PAM sequence of TTTN or CTTN.
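By way of illustration only, the TTTN and CTTN PAM sequences named above can be located in a DNA string with a short script. The following is a minimal sketch, not a method of the present disclosure: the function names `pam_regex` and `find_pam_sites` are hypothetical, N is assumed to stand for any of A, C, G, or T, and only the strand supplied by the caller is scanned (a reverse-complement scan would be needed to cover the opposite strand).

```python
import re

# Assumption: only the symbols needed for the PAMs named in the text
# are supported; N is taken to mean any of the four DNA bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def pam_regex(pam):
    """Translate a PAM such as 'TTTN' into a regular expression string."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(seq, pams=("TTTN", "CTTN")):
    """Return sorted (start_index, matched_pam) pairs on the given strand.

    Indices are 0-based. A lookahead is used so that overlapping PAM
    occurrences are all reported.
    """
    seq = seq.upper()
    hits = set()
    for pam in pams:
        pattern = re.compile("(?=(" + pam_regex(pam) + "))")
        for match in pattern.finditer(seq):
            hits.add((match.start(), match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not taken from any sequence listing.
    for start, pam in find_pam_sites("GGTTTACCTTGAA"):
        print(start, pam)
```

Each reported index marks the first base of a PAM; a candidate protospacer would begin immediately 3′ of the four-base match, with its length chosen by the user.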
In some embodiments, a Cas12a effector protein may be from an organism of a genus which includes, but is not limited to Acidaminococcus sp., Lachnospiraceae bacterium, Francisella tularensis subsp. novicida, Moraxella bovoculi, or Eubacterium rectale. In some embodiments, a Cas12a effector protein may be from an organism of a species which includes, but is not limited to Acidaminococcus sp. BV3L6 (AsCas12a); Lachnospiraceae bacterium ND2006 (LbCas12a); or Lachnospiraceae bacterium MA2020 (Lb2Cas12a). In some embodiments, the homologue or orthologue of Cas12a as referred to herein has a sequence homology or identity of at least 80%, more preferably at least 85%, even more preferably at least 90%, such as for instance at least 95%, such as for instance at least 97%, such as for instance at least 98%, such as for instance at least 99% with one or more of the Cas12a sequences disclosed herein. In further embodiments, the homologue or orthologue of Cas12a as referred to herein has a sequence identity of at least 80%, more preferably at least 85%, even more preferably at least 90%, such as for instance at least 95%, such as for instance at least 97%, such as for instance at least 98%, such as for instance at least 99% with a wild-type ErCas12a, FnCas12a, AsCas12a, LbCas12a, Lb2Cas12a, MbCas12a, or MG29-1. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 1. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 2. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 3. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 4. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 5. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 14. In some embodiments, a Cas12a effector protein comprises an amino acid sequence of SEQ ID NO: 15.
In some embodiments, a Cas12a effector protein has a sequence homology or identity of at least 60%, more particularly at least 70%, at least 80%, more preferably at least 85%, even more preferably at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%, with ErCas12a, AsCas12a, FnCas12a, LbCas12a, Lb2Cas12a, MbCas12a, or MG29-1. In some embodiments, a Cas12a effector protein as referred to herein has a sequence identity of at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%, with a wild-type ErCas12a, AsCas12a, FnCas12a, LbCas12a, Lb2Cas12a, MbCas12a, or MG29-1. In some embodiments, a Cas12a effector protein has less than 60% sequence identity with AsCas12a. In some embodiments, a Cas12a effector protein has less than 60% sequence identity with ErCas12a. In some embodiments, a Cas12a effector protein has less than 60% sequence identity with FnCas12a. In some embodiments, a Cas12a effector protein has less than 60% sequence identity with LbCas12a. In some embodiments, a Cas12a effector protein has less than 60% sequence identity with Lb2Cas12a. In some embodiments, a Cas12a effector protein has less than 60% sequence identity with MbCas12a. In some embodiments, a Cas12a effector protein has less than 60% sequence identity with MG29-1. A skilled person will understand that this includes truncated forms of a Cas12a effector protein whereby the sequence identity is determined over the length of the truncated form.
In some embodiments, a homologue or orthologue of Cas12a as referred to herein has a sequence homology or identity of at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% with Cas12a. In further embodiments, the homologue or orthologue of Cas12a as referred to herein has a sequence identity of at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% with a wild-type Cas12a. Where the Cas12a has one or more mutations (mutated), the homologue or orthologue of the Cas12a as referred to herein has a sequence identity of at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% with the mutated Cas12a.
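For illustration, the identity thresholds recited above, and the convention that identity for a truncated form is determined over the length of that truncated form, might be evaluated as follows. This is a minimal sketch under stated assumptions: the two sequences are assumed to have been aligned beforehand by any suitable tool and supplied as equal-length strings with "-" marking gaps, and the names `percent_identity` and `meets_threshold` are hypothetical. Columns in which the query (e.g., the truncated form) has a gap are ignored, whereas a reference gap opposite a query residue is counted as a mismatch; other conventions exist.

```python
def percent_identity(aligned_ref, aligned_query):
    """Percent identity over the columns where the query has a residue.

    Both arguments are rows of a pre-computed pairwise alignment
    (assumption), so they must have the same length.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for ref_res, query_res in zip(aligned_ref.upper(), aligned_query.upper()):
        if query_res == "-":
            # Outside the (possibly truncated) query: not counted.
            continue
        compared += 1
        if ref_res == query_res:
            matches += 1
    if compared == 0:
        raise ValueError("query contains no residues")
    return 100.0 * matches / compared


def meets_threshold(aligned_ref, aligned_query, threshold=80.0):
    """True if identity is at least `threshold` percent."""
    return percent_identity(aligned_ref, aligned_query) >= threshold


if __name__ == "__main__":
    # Toy alignment: the query is truncated at both ends.
    ref = "MKTLLDEF"
    query = "--TLIDE-"
    print(percent_identity(ref, query))
    print(meets_threshold(ref, query, 80.0))
```

In the toy alignment, four of the five query residues match the reference, so the truncated form would satisfy an "at least 80%" threshold but not an "at least 85%" threshold.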
Cas12a effector proteins may also refer to Cas12a nucleases, Cas12a nickases, and/or dead Cas12a effector proteins, and related variants thereof as described herein. In some embodiments, Cas12a effector proteins are fused to one or more heterologous protein domains (“fusion proteins”) for base editing as described herein.
The foregoing list of modifications is intended to be exemplary in nature, and the skilled artisan will appreciate, in view of the instant disclosure, that other modifications may be possible or desirable in certain applications. For brevity, therefore, exemplary systems, methods and compositions of the present disclosure are presented with reference to particular CRISPR/Cas effector proteins, but it should be understood that the CRISPR/Cas effector proteins used may be modified in ways that do not alter their operating principles. Such modifications are within the scope of the present disclosure.
Turning first to modifications that alter cleavage activity of Cas12a effector proteins, the present disclosure describes substitutions (or mutations) that reduce or eliminate activity of domains within the NUC lobe. In general, mutations that reduce or eliminate activity in nuclease domains result in CRISPR/Cas effector proteins with nickase activity, but it should be noted that the type of nickase activity varies depending on which domain is inactivated. For example, exemplary mutations at positions corresponding to K1000, S1001, e.g., K1000G, S1001G in AsCas12a may be made as described by PCT Publication No. WO 2019/233990A1, the entire contents of which are incorporated herein by reference.
As another example, exemplary mutations are included that alter the PAM specificity of ErCas12a variants, e.g., those at positions K535, K594, e.g., K535R, K594L, e.g., K535R/N539S and K535R/N539S/K594L/E730Q as described in WO 2020/086475A1, or those at positions K169, N264, D529, K535, N539, and K594, which correspond to exemplary mutations at positions K177, N272, D537, K543, N547, and K602, e.g., K177R, N272A, D537R, K543V, K543R, N547R, and K602R, as described in WO 2021/074191A1, the entire contents of each of which are incorporated herein by reference. However, it is noted that the combination of substitutions described by the present disclosure results in unexpectedly increased activity compared to variants described by the art.
In some embodiments, a Cas12a effector protein is a Cas12a variant comprising 1, 2, 3, 4, 5, 6, or 7 of the amino acid substitutions corresponding to substitutions at M537, F870, S186, R301, T315, Q1014, and E174 in AsCas12a. In some embodiments, a Cas12a effector protein is a Cas12a variant comprising 1, 2, 3, 4, 5, 6, or 7 of the amino acid substitutions corresponding to substitutions at M537R, F870L, S186K, R301K, T315R, Q1014R, and E174R in AsCas12a. In some embodiments, a Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions corresponding to substitutions at K1000, S1001, e.g., K1000G, S1001G in AsCas12a). In some embodiments, a Cas12a effector protein is a Cas12a nickase comprising one or more additional amino acid substitutions corresponding to substitutions at R1226 in AsCas12a (e.g., at a position corresponding to a substitution at R1226, e.g., R1226A in AsCas12a).
In some embodiments, an AsCas12a effector protein comprises amino acid substitutions E174, S542, and K548 in AsCas12a. In some embodiments, a Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at a position corresponding to a substitution at R1226, e.g., R1226A in AsCas12a).
In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2, 3, 4, 5, 6, or 7 of the amino acid substitutions corresponding to substitutions at M537, F870, S186, R301, T315, Q1014, and E174 in AsCas12a. In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2, 3, 4, 5, 6, or 7 of the amino acid substitutions corresponding to substitutions at M537R, F870L, S186K, R301K, T315R, Q1014R, and E174R in AsCas12a. In some embodiments, an ErCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions corresponding to K1000, S1001, e.g., K1000G, S1001G in AsCas12a). In some embodiments, an ErCas12a variant is an ErCas12a PAM variant comprising one or more additional amino acid substitutions (e.g., at positions K535, K594, e.g., K535R, K594L, or at positions K169, N264, D529, K535, N539, and K594, e.g., K169R, N264A, D529R, K535V, K535R, N539R, and K594R). In some embodiments, an ErCas12a effector protein is an ErCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1173, e.g., R1173A).
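The substitution labels used herein (e.g., M537R) name the wild-type residue, its 1-based position, and the replacement residue. Purely as an illustration, such labels can be parsed and applied to an amino acid string as sketched below. The names `parse_substitution` and `apply_substitutions` are hypothetical, the example uses a toy peptide rather than any sequence of the Sequence Listing, and each label is checked against the unmodified input sequence so that a mismatched wild-type residue is reported rather than silently overwritten.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter new residue.
SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'M537R' into ('M', 537, 'R')."""
    match = SUBSTITUTION_RE.match(label.strip().upper())
    if match is None:
        raise ValueError("unrecognized substitution label: " + repr(label))
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitutions(sequence, labels):
    """Return a copy of `sequence` carrying every substitution in `labels`.

    The wild-type residue of each label is verified against the original
    sequence; a mismatch or out-of-range position raises ValueError.
    """
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if position < 1 or position > len(sequence):
            raise ValueError(label + ": position outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(
                label + ": sequence has " + sequence[position - 1]
                + " at position " + str(position)
            )
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    # Toy peptide for demonstration only.
    print(apply_substitutions("MKTIF", ["K2R", "F5L"]))
```

The wild-type check is a deliberate design choice: it catches the common error of applying a label numbered for one orthologue (e.g., AsCas12a) to the sequence of another.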
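Positions in one Cas12a orthologue that "correspond to" positions in AsCas12a are typically read off a sequence alignment. The sketch below illustrates that bookkeeping only and is not the procedure used to derive the positions recited herein: it assumes a pairwise alignment has already been computed and supplied as two equal-length strings with "-" for gaps, and the function name `corresponding_position` is hypothetical.

```python
def corresponding_position(aligned_ref, aligned_other, ref_pos):
    """Map a 1-based reference position to the other sequence's numbering.

    Returns the 1-based position in the other sequence that shares an
    alignment column with `ref_pos`, or None if the other sequence has a
    gap in that column. Raises ValueError if `ref_pos` does not exist in
    the reference.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    other_count = 0
    for ref_res, other_res in zip(aligned_ref, aligned_other):
        if ref_res != "-":
            ref_count += 1
        if other_res != "-":
            other_count += 1
        if ref_res != "-" and ref_count == ref_pos:
            return other_count if other_res != "-" else None
    raise ValueError("ref_pos is outside the reference sequence")


if __name__ == "__main__":
    # Toy alignment for demonstration only.
    ref = "MKQ--TLE"
    other = "M-QAATIE"
    print(corresponding_position(ref, other, 4))
    print(corresponding_position(ref, other, 2))
```

In the toy alignment, reference position 4 shares a column with position 5 of the other sequence, while reference position 2 falls opposite a gap and therefore has no counterpart.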
In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1 or 2 amino acid substitutions at positions selected from I524 and F840. In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1 or 2 of the amino acid substitutions selected from I524R and F840L. In some embodiments, an ErCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K969, K970, e.g., K969G, K970G). In some embodiments, an ErCas12a variant is an ErCas12a PAM variant comprising one or more additional amino acid substitutions (e.g., at positions K535, K594, e.g., K535R, K594L, or at positions K169, N264, D529, K535, N539, and K594, e.g., K169R, N264A, D529R, K535V, K535R, N539R, and K594R). In some embodiments, an ErCas12a effector protein is an ErCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1173, e.g., R1173A).
In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2, 3, 4, 5, 6, or 7 amino acid substitutions selected from substitutions at I524, F840, S181, T292, K982, K169, and D1055. In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2, 3, 4, 5, 6 or 7 of the amino acid substitutions selected from substitutions at I524R, F840L, S181K, T292R, K982R, K169R, and D1055Y. In some embodiments, an ErCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K969, K970, e.g., K969G, K970G). In some embodiments, an ErCas12a variant is an ErCas12a PAM variant comprising one or more additional amino acid substitutions (e.g., at positions K535, K594, e.g., K535R, K594L, or at positions K169, N264, D529, K535, N539, and K594, e.g., K169R, N264A, D529R, K535V, K535R, N539R, and K594R). In some embodiments, an ErCas12a effector protein is an ErCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1173, e.g., R1173A).
In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2, 3, 4 or 5 amino acid substitutions at positions selected from I524, F840, S181, T292, and K982. In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2, 3, 4, or 5 of the amino acid substitutions selected from I524R, F840L, S181K, T292R, and K982R. In some embodiments, an ErCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K969, K970, e.g., K969G, K970G). In some embodiments, an ErCas12a variant is an ErCas12a PAM variant comprising one or more additional amino acid substitutions (e.g., at positions K535, K594, e.g., K535R, K594L, or at positions K169, N264, D529, K535, N539, and K594, e.g., K169R, N264A, D529R, K535V, K535R, N539R, and K594R). In some embodiments, an ErCas12a effector protein is an ErCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1173, e.g., R1173A).
In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2 or 3 amino acid substitutions at positions selected from I524, F840, and K169. In some embodiments, a Cas12a effector protein is an ErCas12a variant comprising 1, 2 or 3 of the amino acid substitutions selected from I524R, F840L, and K169R. In some embodiments, an ErCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K969, K970, e.g., K969G, K970G). In some embodiments, an ErCas12a variant is an ErCas12a PAM variant comprising one or more additional amino acid substitutions (e.g., at positions K535, K594, e.g., K535R, K594L, or at positions K169, N264, D529, K535, N539, and K594, e.g., K169R, N264A, D529R, K535V, K535R, N539R, and K594R). In some embodiments, an ErCas12a effector protein is an ErCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1173, e.g., R1173A).
In some embodiments, an ErCas12a effector protein comprises amino acid substitutions I524R and F840L. In some embodiments, an ErCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K969, K970, e.g., K969G, K970G). In some embodiments, an ErCas12a variant is an ErCas12a PAM variant comprising one or more additional amino acid substitutions (e.g., at positions K535, K594, e.g., K535R, K594L, or at positions K169, N264, D529, K535, N539, and K594, e.g., K169R, N264A, D529R, K535V, K535R, N539R, and K594R). In some embodiments, an ErCas12a effector protein is an ErCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1173, e.g., R1173A).
In some embodiments, an ErCas12a effector protein comprises amino acid substitutions K169R, D529R, and K535R. In some embodiments, an ErCas12a effector protein is an ErCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1173, e.g., R1173A).
In some embodiments, a Cas12a effector protein is an FnCas12a variant comprising 1 or 2 amino acid substitutions at positions selected from N602 and F879. In some embodiments, a Cas12a effector protein is an FnCas12a variant comprising 1 or 2 of the amino acid substitutions selected from N602R and F879L. In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K1013, R1014, e.g., K1013G, R1014G). In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1218, e.g., R1218A).
In some embodiments, a Cas12a effector protein is an FnCas12a variant comprising 1, 2, 3, 4 or 5 amino acid substitutions at positions selected from N602, F879, P196, S334, and K1026. In some embodiments, a Cas12a effector protein is an FnCas12a variant comprising 1, 2, 3, 4, or 5 of the amino acid substitutions selected from N602R, F879L, P196K, S334R, and K1026R. In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K1013, R1014, e.g., K1013G, R1014G). In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1218, e.g., R1218A).
In some embodiments, a Cas12a effector protein is an FnCas12a variant comprising 1, 2 or 3 amino acid substitutions at positions selected from N602, F879, and E184. In some embodiments, a Cas12a effector protein is an FnCas12a variant comprising 1, 2 or 3 of the amino acid substitutions selected from N602R, F879L, and E184R. In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K1013, R1014, e.g., K1013G, R1014G). In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1218, e.g., R1218A).
In some embodiments, an FnCas12a effector protein comprises amino acid substitutions N602R and F879L. In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K1013, R1014, e.g., K1013G, R1014G). In some embodiments, an FnCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1218, e.g., R1218A).
In some embodiments, an FnCas12a effector protein comprises amino acid substitutions E184R, N607R, and K613R. In some embodiments, an FnCas12a effector protein is an FnCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1218, e.g., R1218A).
In some embodiments, a Cas12a effector protein amino acid sequence comprises an amino acid sequence having at least about 90%, 95%, 97%, 98%, 99% or 100% identity to an FnCas12a amino acid sequence described herein.
In some embodiments, a Cas12a effector protein is an Lb2Cas12a variant comprising 1 or 2 amino acid substitutions at positions selected from R507 and T778. In some embodiments, a Cas12a effector protein is an Lb2Cas12a variant comprising the amino acid substitution of T778L. In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K913, R914, e.g., K913G, R914G). In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at a position corresponding to a substitution at R1124, e.g., R1124A).
In some embodiments, a Cas12a effector protein is an Lb2Cas12a variant comprising 1, 2, 3, 4 or 5 amino acid substitutions at positions selected from R507, T778, S167, E271, and K926. In some embodiments, a Cas12a effector protein is an Lb2Cas12a variant comprising 1, 2, 3, or 4 of the amino acid substitutions selected from T778L, S167K, E271R, and K926R. In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K913, R914, e.g., K913G, R914G). In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1124, e.g., R1124A).
In some embodiments, a Cas12a effector protein is an Lb2Cas12a variant comprising 1, 2 or 3 amino acid substitutions at positions selected from R507, T778, and K155. In some embodiments, a Cas12a effector protein is an Lb2Cas12a variant comprising 1 or 2 of the amino acid substitutions selected from T778L and K155R. In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K913, R914, e.g., K913G, R914G). In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1124, e.g., R1124A).
In some embodiments, an Lb2Cas12a effector protein comprises amino acid substitution T778L. In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K913, R914, e.g., K913G, R914G). In some embodiments, an Lb2Cas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1124, e.g., R1124A).
In some embodiments, an Lb2Cas12a effector protein comprises amino acid substitutions K155R, N512R, and K518R. In some embodiments, an Lb2Cas12a effector protein is an Lb2Cas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1124, e.g., R1124A).
In some embodiments, a Cas12a effector protein amino acid sequence comprises an amino acid sequence having at least about 90%, 95%, 97%, 98%, 99% or 100% identity to an Lb2Cas12a amino acid sequence described herein.
In some embodiments, a Cas12a effector protein is an LbCas12a variant comprising 1 or 2 amino acid substitutions at positions selected from N527 and E795. In some embodiments, a Cas12a effector protein is an LbCas12a variant comprising 1 or 2 of the amino acid substitutions selected from N527R and E795L. In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K932, N933, e.g., K932G, N933G). In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1138, e.g., R1138A).
In some embodiments, a Cas12a effector protein is an LbCas12a variant comprising 1, 2, 3, 4 or 5 amino acid substitutions at positions selected from N527, E795, S168, S286, and K945. In some embodiments, a Cas12a effector protein is an LbCas12a variant comprising 1, 2, 3, 4, or 5 of the amino acid substitutions selected from N527R, E795L, S168K, S286R, and K945R. In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K932, N933, e.g., K932G, N933G). In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1138, e.g., R1138A).
In some embodiments, a Cas12a effector protein is an LbCas12a variant comprising 1, 2 or 3 amino acid substitutions at positions selected from N527, E795, and D156. In some embodiments, a Cas12a effector protein is an LbCas12a variant comprising 1, 2 or 3 of the amino acid substitutions selected from N527R, E795L, and D156R. In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K932, N933, e.g., K932G, N933G). In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1138, e.g., R1138A).
In some embodiments, an LbCas12a effector protein comprises amino acid substitutions N527R and E795L. In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K932, N933, e.g., K932G, N933G). In some embodiments, an LbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1138, e.g., R1138A).
In some embodiments, an LbCas12a effector protein comprises amino acid substitutions D156R, G532R, and K538R. In some embodiments, an LbCas12a effector protein is an LbCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1138, e.g., R1138A).
In some embodiments, a Cas12a effector protein amino acid sequence comprises an amino acid sequence having at least about 90%, 95%, 97%, 98%, 99% or 100% identity to an LbCas12a amino acid sequence described herein.
In some embodiments, a Cas12a effector protein is an MbCas12a variant comprising 1 or 2 amino acid substitutions at positions selected from N568 and M825. In some embodiments, a Cas12a effector protein is an MbCas12a variant comprising 1 or 2 of the amino acid substitutions selected from N568R and M825L. In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K965, R966, e.g., K965G, R966G). In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1171, e.g., R1171A).
In some embodiments, a Cas12a effector protein is an MbCas12a variant comprising 1, 2, 3, 4 or 5 amino acid substitutions at positions selected from N568, M825, H184, G292, and N978. In some embodiments, a Cas12a effector protein is an MbCas12a variant comprising 1, 2, 3, 4, or 5 of the amino acid substitutions selected from N568R, M825L, H184K, G292R, and N978R. In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K965, R966, e.g., K965G, R966G). In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1171, e.g., R1171A).
In some embodiments, a Cas12a effector protein is an MbCas12a variant comprising 1, 2 or 3 amino acid substitutions at positions selected from N568, M825, and D172. In some embodiments, a Cas12a effector protein is an MbCas12a variant comprising 1, 2 or 3 of the amino acid substitutions selected from N568R, M825L, and D172R. In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K965, R966, e.g., K965G, R966G). In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1171, e.g., R1171A).
In some embodiments, an MbCas12a effector protein comprises amino acid substitutions N568R and M825L. In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K965, R966, e.g., K965G, R966G). In some embodiments, an MbCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1171, e.g., R1171A).
In some embodiments, an MbCas12a effector protein comprises amino acid substitutions D172R, N563R, and K569R. In some embodiments, an MbCas12a effector protein is an MbCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1218, e.g., R1218A).
In some embodiments, a Cas12a effector protein amino acid sequence comprises an amino acid sequence having at least about 90%, 95%, 97%, 98%, 99% or 100% identity to an MbCas12a amino acid sequence described herein.
In some embodiments, a Cas12a effector protein is an AsCas12a variant comprising 1, 2, 3, 4, 5, 6, 7, or 8 amino acid substitutions at positions selected from M537, F870, E174, S186, R301, T315, Q1014, and I1088. In some embodiments, a Cas12a effector protein is an AsCas12a variant comprising 1, 2, 3, 4, 5, 6, 7 or 8 of the amino acid substitutions selected from M537R, F870L, E174R, S186K, R301K, T315R, Q1014R, and I1088Y. In some embodiments, an AsCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K1000, K1001, e.g., K1000G, K1001G). In some embodiments, an AsCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1226, e.g., R1226A).
In some embodiments, a Cas12a effector protein is an AsCas12a variant comprising 1 or 2 amino acid substitutions at positions selected from K603 and I1088. In some embodiments, a Cas12a effector protein is an AsCas12a variant comprising the amino acid substitution I1088Y. In some embodiments, an AsCas12a variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1226, e.g., R1226A).
In some embodiments, an AsCas12a effector protein comprises amino acid substitutions E174R, S542R, and K548R. In some embodiments, an AsCas12a effector protein is an AsCas12a nickase comprising one or more additional amino acid substitutions (e.g., at position R1226, e.g., R1226A).
In some embodiments, a Cas12a effector protein amino acid sequence comprises an amino acid sequence having at least about 90%, 95%, 97%, 98%, 99% or 100% identity to an AsCas12a amino acid sequence described herein.
In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1, 2, 3, 4, 5, 6, or 7 of the amino acid substitutions corresponding to substitutions at M537, F870, S186, R301, T315, Q1014, and E174 in AsCas12a. In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1, 2, 3, 4, 5, 6, or 7 of the amino acid substitutions corresponding to M537R, F870L, S186K, R301K, T315R, Q1014R, and E174R in AsCas12a. In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions corresponding to K1000, S1001, e.g., K1000G, S1001G in AsCas12a). In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1192, e.g., R1192A).
In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1 or 2 amino acid substitutions at positions selected from A572 and F849. In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1 or 2 of the amino acid substitutions selected from A572R and F849L. In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K983, R984, e.g., K983G, R984G). In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1192, e.g., R1192A).
In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1, 2, 3, 4 or 5 amino acid substitutions at positions selected from A572, F849, S184, R292, T306, and K996. In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1, 2, 3, 4, or 5 of the amino acid substitutions selected from A572R, F849L, S184K, R292K, T306R, and K996R. In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K983, R984, e.g., K983G, R984G). In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1192, e.g., R1192A).
In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1, 2 or 3 amino acid substitutions at positions selected from A572, F849, and E172. In some embodiments, a Cas12a effector protein is an MG29-1 variant comprising 1, 2 or 3 of the amino acid substitutions selected from A572R, F849L, and E172R. In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K983, R984, e.g., K983G, R984G). In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1192, e.g., R1192A).
In some embodiments, a Cas12a effector protein comprises amino acid substitutions A572R and F849L. In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at positions K983, R984, e.g., K983G, R984G). In some embodiments, an MG29-1 variant is a nickase comprising one or more additional amino acid substitutions (e.g., at position R1192, e.g., R1192A).
In some embodiments, an MG29-1 effector protein comprises amino acid substitutions E172R, N577R, and K583R. In some embodiments, an MG29-1 effector protein is an MG29-1 nickase comprising one or more additional amino acid substitutions (e.g., at position R1192, e.g., R1192A).
In some embodiments, a Cas12a effector protein amino acid sequence comprises an amino acid sequence having at least about 90%, 95%, 97%, 98%, 99% or 100% identity to an MG29-1 amino acid sequence described herein.
Other suitable modifications of a Cas12a amino acid sequence are known to those of ordinary skill in the art. Some exemplary amino acid sequences of wild-type Cas12a (Cpf1) effector proteins and variants thereof are provided below:
Cas12a effector proteins can be, in some embodiments, size-optimized or truncated, for instance via one or more deletions that reduce the size of the effector protein while still retaining gRNA association, target and PAM recognition, and cleavage activities. In some embodiments, CRISPR/Cas effector proteins are bound, covalently or non-covalently, to another polypeptide, nucleotide, or other structure, optionally by means of a linker. Exemplary bound effector proteins and linkers are described by Guilinger et al., Nature Biotech. 32:577-582 (2014), the contents of which are hereby incorporated by reference herein in their entirety.
Additional suitable Cas12a effector proteins and variants thereof will be apparent to the skilled artisan based on the present disclosure. Moreover, a number of amino acid sequences of wild-type Cas12a effector protein orthologues are provided in US Publication No. 2021/0079366 A1, the disclosure of which is hereby incorporated herein by reference in its entirety. Exemplary suitable Cas12a effector proteins may include, but are not limited to, those provided in Table 2.
In one aspect, the present disclosure provides Cas12a effector proteins fused to one or more heterologous protein domains (“fusion proteins”) for base editing as described herein. In some embodiments, one or more heterologous protein domains comprise or are deaminase domains and/or polypeptides. Any deaminase domain and/or polypeptide useful for base editing may be used in a fusion protein of the present disclosure. A cytosine base editor (CBE), as used herein, comprises a cytosine deaminase. An adenine base editor (ABE), as used herein, comprises an adenine deaminase.
In some embodiments, a deaminase comprises or is a cytosine deaminase or a cytidine deaminase. A “cytosine deaminase” and “cytidine deaminase” as used herein refer to a polypeptide or domain thereof that catalyzes or is capable of catalyzing cytosine deamination in that the polypeptide or domain catalyzes or is capable of catalyzing the removal of an amine group from a cytosine base. Thus, a cytosine deaminase may result in conversion of cytosine to a thymidine (through a uracil intermediate), causing a C to T conversion, or a G to A conversion in the complementary strand in the genome. Thus, in some embodiments, a cytosine deaminase encoded by a polynucleotide of the present disclosure generates a C to T conversion in the sense (e.g., “+”: template) strand of the target nucleic acid and/or a G to A conversion in antisense (e.g., complementary) strand of the target nucleic acid. In some embodiments, a cytosine deaminase encoded by a polynucleotide of the present disclosure generates a C to T, G, or A conversion in the complementary strand in the genome.
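The relationship between a C to T conversion on one strand and the corresponding G to A conversion on the complementary strand can be illustrated with a short sketch. The sequence, the edited position, and the function names below are hypothetical, and the sketch models only the final sequence outcome (not the uracil intermediate or cellular repair).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def convert_c_to_t(strand, position):
    """Model the outcome of cytosine deamination at a 0-based position (C -> T)."""
    if strand[position] != "C":
        raise ValueError(f"position {position} is {strand[position]}, not C")
    return strand[:position] + "T" + strand[position + 1:]


def differences(before, after):
    """List (index, old base, new base) for every position that changed."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


# Hypothetical six-base example with a cytosine at index 3.
edited_strand_before = "GATCAG"
edited_strand_after = convert_c_to_t(edited_strand_before, 3)

partner_before = reverse_complement(edited_strand_before)
partner_after = reverse_complement(edited_strand_after)

print(differences(edited_strand_before, edited_strand_after))  # C -> T
print(differences(partner_before, partner_after))              # G -> A
```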
In some embodiments, a cytosine deaminase may be any known or later identified cytosine deaminase from any organism (see, e.g., U.S. Pat. No. 10,167,457 and Thuronyi et al. Nat. Biotechnol. 37:1070-1079 (2019), each of which is incorporated by reference herein for its disclosure of cytosine deaminases). Cytosine deaminases can catalyze the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. Thus, in some embodiments, a deaminase or deaminase domain may be a cytidine deaminase domain, catalyzing the hydrolytic deamination of cytosine to uracil. In some embodiments, a cytosine deaminase may be a variant of a naturally-occurring cytosine deaminase, including, but not limited to, a primate (e.g., a human, monkey, chimpanzee, gorilla), a dog, a cow, a rat, or a mouse cytosine deaminase. Thus, in some embodiments, a cytosine deaminase useful with the invention may be about 70% to about 100% identical to a wild-type cytosine deaminase (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical, and any range or value therein, to a naturally occurring cytosine deaminase).
In some embodiments, a cytosine deaminase useful with the invention may be an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase. In some embodiments, a cytosine deaminase may be an APOBEC1 deaminase, an APOBEC2 deaminase, an APOBEC3A deaminase, an APOBEC3B deaminase, an APOBEC3C deaminase, an APOBEC3D deaminase, an APOBEC3F deaminase, an APOBEC3G deaminase, an APOBEC3H deaminase, an APOBEC4 deaminase, a human activation induced deaminase (hAID), an rAPOBEC1, FERNY, and/or a CDA1, optionally a pmCDA1, an atCDA1 (e.g., At2g19570), and evolved versions of the same. Evolved deaminases are disclosed in, for example, U.S. Pat. No. 10,113,163, Gaudelli et al., Nature 551 (7681): 464-471 (2017), and Thuronyi et al., Nature Biotechnology 37:1070-1079 (2019), each of which is incorporated by reference herein for its disclosure of deaminases and evolved deaminases. In some embodiments, a cytosine deaminase may be an APOBEC1 deaminase having the amino acid sequence of SEQ ID NO: 57. In some embodiments, a cytosine deaminase may be an APOBEC3A deaminase having the amino acid sequence of SEQ ID NO: 58. In some embodiments, a cytosine deaminase may be a CDA1 deaminase, optionally a CDA1 having the amino acid sequence of SEQ ID NO: 59. In some embodiments, a cytosine deaminase may be a FERNY deaminase, optionally a FERNY having the amino acid sequence of SEQ ID NO: 60. In some embodiments, a cytosine deaminase may be an rAPOBEC1 deaminase, optionally an rAPOBEC1 deaminase having the amino acid sequence of SEQ ID NO: 61. In some embodiments, a cytosine deaminase may be an hAID deaminase, optionally an hAID having the amino acid sequence of SEQ ID NO: 62 or SEQ ID NO: 63.
In some embodiments, a cytosine deaminase may be about 70% to about 100% identical (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or 100% identical) to the amino acid sequence of a naturally occurring cytosine deaminase (e.g., “evolved deaminases”) (see, e.g., SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66). In some embodiments, a cytosine deaminase useful with the invention may be about 70% to about 99.5% identical (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% identical) to the amino acid sequence of any one of SEQ ID NOs: 57-66 (e.g., at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of any one of SEQ ID NOs: 57-66). In some embodiments, a polynucleotide encoding a cytosine deaminase may be codon optimized for expression in a mammal and the codon optimized polynucleotide may be about 70% to 99.5% identical to the reference polynucleotide.
In some embodiments a cytosine base editor (CBE) of the present disclosure comprises a cytidine deaminase fused to a Cas12a nickase tethered to one (BE3) or two (BE4) monomers of uracil glycosylase inhibitor (UGI). In some embodiments the cytidine deaminase is PpAPOBEC1, e.g., having the following sequence:
In some embodiments the cytosine base editor comprises PpAPOBEC1 fused to a Cas12a nickase tethered to two (BE4) monomers of uracil glycosylase inhibitor (UGI), e.g., as described in Yu et al., Nat. Comm. 11:2052, 2020 and WO2020160517A1 (the entire contents of each of which are incorporated herein by reference).
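The domain composition of such a cytosine base editor can be summarized in a small data structure. The sketch below is illustrative only: the class and function names are hypothetical, the N- to C-terminal ordering shown is an assumption made for the example, and linkers and any other elements are omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FusionLayout:
    """Ordered list of domain names making up a fusion protein (illustrative)."""
    name: str
    domains: Tuple[str, ...]


def cytosine_base_editor_layout(deaminase, ugi_copies):
    """Describe a deaminase / Cas12a nickase / UGI fusion with one or two UGI monomers."""
    if ugi_copies not in (1, 2):
        raise ValueError("this sketch covers one (BE3) or two (BE4) UGI monomers")
    name = "BE3" if ugi_copies == 1 else "BE4"
    # Assumed ordering for illustration: deaminase, then nickase, then UGI monomer(s).
    domains = (deaminase, "Cas12a nickase") + ("UGI",) * ugi_copies
    return FusionLayout(name=name, domains=domains)


be4_layout = cytosine_base_editor_layout("PpAPOBEC1", ugi_copies=2)
print(be4_layout.name, " - ".join(be4_layout.domains))
```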
In some embodiments, a deaminase comprises or is an adenine deaminase or an adenosine deaminase. An “adenine deaminase” and “adenosine deaminase” as used herein refer to a polypeptide or domain thereof that catalyzes or is capable of catalyzing the hydrolytic deamination (e.g., removal of an amine group from adenine) of adenine or adenosine. In some embodiments, an adenine deaminase may catalyze the hydrolytic deamination of adenosine or deoxyadenosine to inosine or deoxyinosine, respectively. In some embodiments, an adenine deaminase may catalyze the hydrolytic deamination of adenine or adenosine in DNA. In some embodiments, an adenine deaminase encoded by a nucleic acid may generate an A to G conversion in the sense (e.g., “+”: template) strand of the target nucleic acid or a T to C conversion in the antisense (e.g., complementary) strand of the target nucleic acid.
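By way of illustration, the A to G outcome and the corresponding T to C change on the complementary strand can be traced in two steps: deamination of adenine to inosine (written here as "I"), followed by inosine being read as guanine. The sequence, position, and function names in this sketch are hypothetical, and cellular repair and replication are not modeled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def deaminate_adenine(strand, position):
    """Replace the adenine at a 0-based position with inosine ('I')."""
    if strand[position] != "A":
        raise ValueError(f"position {position} is {strand[position]}, not A")
    return strand[:position] + "I" + strand[position + 1:]


def read_inosine_as_guanine(strand):
    """Inosine pairs like guanine, so it is resolved to G in the final sequence."""
    return strand.replace("I", "G")


# Hypothetical six-base example with an adenine at index 3.
original = "GTCAGT"
intermediate = deaminate_adenine(original, 3)
edited = read_inosine_as_guanine(intermediate)

partner_before = reverse_complement(original)
partner_after = reverse_complement(edited)

print(original, "->", intermediate, "->", edited)
print(partner_before, "->", partner_after)
```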
An adenine deaminase may be any known or later identified adenine deaminase from any organism (see, e.g., U.S. Pat. No. 10,113,163, which is incorporated by reference herein for its disclosure of adenine deaminases).
In some embodiments, an adenine deaminase may be a variant of a naturally-occurring adenine deaminase. Thus, in some embodiments, an adenine deaminase may be about 70% to 100% identical to a wild-type adenine deaminase (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical, and any range or value therein, to a naturally occurring adenine deaminase). In some embodiments, an adenine deaminase does not occur in nature and may be referred to as an engineered, mutated or evolved adenine deaminase. Thus, for example, an engineered, mutated or evolved adenine deaminase polypeptide or an adenine deaminase domain may be about 70% to 99.9% identical to a naturally occurring adenine deaminase polypeptide/domain (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8% or 99.9% identical, and any range or value therein, to a naturally occurring adenine deaminase polypeptide or adenine deaminase domain). In some embodiments, the adenosine deaminase may be from a bacterium (e.g., Escherichia coli, Staphylococcus aureus, Haemophilus influenzae, Caulobacter crescentus). In some embodiments, a polynucleotide encoding an adenine deaminase polypeptide/domain may be codon optimized for expression in a mammal and the codon optimized polynucleotide may be about 70% to 99.5% identical to the reference polynucleotide.
In some embodiments, an adenine deaminase domain may be a wild-type tRNA-specific adenosine deaminase domain, e.g., a tRNA-specific adenosine deaminase (TadA) and/or a mutated/evolved adenosine deaminase domain, e.g., mutated/evolved tRNA-specific adenosine deaminase domain (TadA*). In some embodiments, a TadA domain may be from E. coli. In some embodiments, a TadA may be modified, e.g., truncated, missing one or more N-terminal and/or C-terminal amino acids relative to a full-length TadA (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal and/or C-terminal amino acid residues may be missing relative to a full-length TadA). In some embodiments, a TadA polypeptide or TadA domain does not comprise an N-terminal methionine. In some embodiments, a wild-type E. coli TadA comprises the amino acid sequence of SEQ ID NO: 71. In some embodiments, a mutated/evolved E. coli TadA* comprises the amino acid sequence of SEQ ID NOs: 72-75 (e.g., SEQ ID NOs: 72, 73, 74, or 75). In some embodiments, a polynucleotide encoding a TadA/TadA* may be codon optimized for expression in a mammal. In some embodiments, an adenine deaminase may comprise all or a portion of an amino acid sequence of any one of SEQ ID NOs: 76-81. In some embodiments, an adenine deaminase may comprise all or a portion of an amino acid sequence of any one of SEQ ID NOs: 71-81.
In some embodiments, an adenine base editor of the present disclosure comprises an adenosine deaminase with one or more mutations to reduce undesirable RNA editing activity. In some embodiments, the base editor comprises an engineered E. coli TadA, e.g., with the mutations found in ABEs 0.1, 0.2, 1.1, 1.2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.10, 2.11, 2.12, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.1, 4.2, 4.3, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 5.10, 5.11, 5.12, 5.13, 5.14, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 7.10, ABEmax as described in Gaudelli et al., Nature 551 (7681): 464-471, 2017 and Koblan et al., Nat. Biotechnol. 36 (9): 843-846, 2018 (the entire contents of each of which are incorporated herein by reference) or any of the ABE8s variants, e.g., ABE8.17-m, described in Gaudelli et al., Nat. Biotechnol. 38:892-900, 2020 and US20210130805A1 (the entire contents of each of which are incorporated herein by reference). The mutations can include substitution with any other amino acid other than the wild-type amino acid. In some embodiments the substitution is with alanine or glycine. For example, the engineered E. coli TadA sequence present in ABE7.10 (TadA*7.10) is as follows:
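The terminal truncations described above amount to simple slicing of the sequence string. The following sketch is illustrative only; the function names and the ten-residue toy sequence are hypothetical, and the limit of 20 residues per terminus simply mirrors the exemplary range recited above.

```python
MAX_TRUNCATION = 20  # mirrors the exemplary 1-20 residue range recited above


def truncate_termini(sequence, n_terminal=0, c_terminal=0):
    """Remove `n_terminal` residues from the start and `c_terminal` from the end."""
    for count in (n_terminal, c_terminal):
        if not 0 <= count <= MAX_TRUNCATION:
            raise ValueError(f"truncation must be between 0 and {MAX_TRUNCATION}")
    if n_terminal + c_terminal >= len(sequence):
        raise ValueError("truncation would remove the whole sequence")
    return sequence[n_terminal:len(sequence) - c_terminal]


def remove_initial_methionine(sequence):
    """Drop a leading 'M' if present; otherwise return the sequence unchanged."""
    return sequence[1:] if sequence.startswith("M") else sequence


# Hypothetical toy sequence, not a TadA sequence.
toy_domain = "MACDEFGHIK"
print(truncate_termini(toy_domain, n_terminal=1, c_terminal=2))
print(remove_initial_methionine(toy_domain))
```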
In ABE7.10, a wild-type E. coli TadA sequence is fused to this engineered E. coli TadA sequence using a 32 amino acid linker, forming a heterodimer, the sequence of which is as follows:
As a further example, ABE8.17-m comprises a monomeric construct containing TadA*7.10 with V82S and Q154R mutations (TadA*8.17) as follows:
In some embodiments an adenine base editor (ABE) of the present disclosure comprises an adenosine deaminase fused to a Cas12a nickase, e.g., a Cas12a nickase fused to a wild-type E. coli TadA, e.g., of SEQ ID NO: 68:
In some embodiments, a nucleic acid of the present disclosure may further encode a glycosylase inhibitor (e.g., a uracil glycosylase inhibitor (UGI) such as uracil-DNA glycosylase inhibitor). Thus, in some embodiments, a nucleic acid encoding a Cas12a effector protein and a cytosine deaminase and/or adenine deaminase may further encode a glycosylase inhibitor, optionally wherein the glycosylase inhibitor may be codon optimized for expression in a mammal. In some embodiments, the present disclosure provides fusion proteins comprising a Cas12a effector protein and a UGI and/or one or more polynucleotides encoding the same, optionally wherein the one or more polynucleotides may be codon optimized for expression in a mammal. In some embodiments, the present disclosure provides fusion proteins comprising a Cas12a effector protein, a deaminase domain (e.g., an adenine deaminase domain and/or a cytosine deaminase domain) and a UGI and/or one or more polynucleotides encoding the same, optionally wherein the one or more polynucleotides may be codon optimized for expression in a mammal. In some embodiments, the invention provides fusion proteins, wherein a Cas12a effector protein, a deaminase domain, and/or a UGI may be fused to any combination of peptide tags and affinity polypeptides as described herein, which may thereby recruit the deaminase domain and/or UGI to the Cas12a effector protein and to a target nucleic acid. In some embodiments, a guide nucleic acid may be linked to a recruiting RNA motif and one or more of the deaminase domain and/or UGI may be fused to an affinity polypeptide that is capable of interacting with the recruiting RNA motif, thereby recruiting the deaminase domain and UGI to a target nucleic acid.
A “uracil glycosylase inhibitor” or “UGI” may be any protein or polypeptide or domain thereof that is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme. In some embodiments, a UGI comprises a wild-type UGI or a fragment thereof. In some embodiments, a UGI is about 70% to about 100% identical (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or 100% identical and any range or value therein) to the amino acid sequence of a naturally occurring UGI. In some embodiments, a UGI may comprise the amino acid sequence of:
or a polypeptide having about 70% to about 99.5% identity to the amino acid sequence of SEQ ID NO: 82 (e.g., at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 82). For example, in some embodiments, a UGI may comprise a fragment of the amino acid sequence of SEQ ID NO: 82 that is 100% identical to a portion of consecutive amino acid residues (e.g., 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80 consecutive amino acid residues; e.g., about 10, 15, 20, 25, 30, 35, 40, 45, to about 50, 55, 60, 65, 70, 75, 80 consecutive amino acid residues) of the amino acid sequence of SEQ ID NO: 82. In some embodiments, a UGI may be a variant of a known UGI (e.g., SEQ ID NO: 82) having about 70% to about 99.5% identity (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5% identity, and any range or value therein) to the known UGI. In some embodiments, a polynucleotide encoding a UGI may be codon optimized for expression in a mammal and the codon optimized polynucleotide may be about 70% to about 99.5% identical to the reference polynucleotide.
Guide RNA (gRNA) Molecules
A gRNA molecule or gRNA for use in a CRISPR/Cas12a genome editing system generally includes a targeting domain and a complementarity domain (alternately referred to as a “handle”). It should also be noted that, in gRNAs for use with Cas12a, the targeting domain is usually present at or near the 3′ end, rather than the 5′ end as in connection with Cas9 gRNAs (the handle is at or near the 5′ end of a Cas12a gRNA).
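The 5′ handle / 3′ targeting domain arrangement of a Cas12a gRNA can be sketched as a simple concatenation. The class name and both sequences below are placeholders chosen for illustration; they are assumptions of this example and are not sequences taken from the present disclosure.

```python
from dataclasses import dataclass

_RNA_BASES = set("ACGU")


@dataclass(frozen=True)
class Cas12aGuide:
    """A gRNA modeled as a 5' handle followed by a 3' targeting domain."""
    handle: str
    targeting_domain: str

    def __post_init__(self):
        for part in (self.handle, self.targeting_domain):
            if not part or set(part) - _RNA_BASES:
                raise ValueError("expected a non-empty RNA sequence (A, C, G, U)")

    def sequence(self):
        """Full gRNA written 5' to 3': handle first, targeting domain last."""
        return self.handle + self.targeting_domain

    def targeting_span(self):
        """0-based (start, end) of the targeting domain within the full gRNA."""
        start = len(self.handle)
        return start, start + len(self.targeting_domain)


# Placeholder sequences for illustration only.
guide = Cas12aGuide(handle="UAAUUUCUACUCUUGUAGAU",
                    targeting_domain="GGCAUCGAUCGUACGAUCGA")
print(guide.sequence())
print(guide.targeting_span())
```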
Those of skill in the art will appreciate, however, that although structural differences may exist between gRNAs from different prokaryotic species, the principles by which gRNAs operate are generally consistent. Because of this consistency of operation, gRNAs can be defined, in broad terms, by their targeting domain sequences, and skilled artisans will appreciate that a given targeting domain sequence can be incorporated in any suitable gRNA, including a unimolecular or chimeric gRNA, or a gRNA that includes one or more chemical modifications and/or sequential modifications (substitutions, additional nucleotides, truncations, etc.). Thus, for economy of presentation in this disclosure, gRNAs may be described solely in terms of their targeting domain sequences.
More generally, skilled artisans will appreciate that some aspects of the present disclosure relate to systems, methods and compositions that can be implemented using multiple CRISPR/Cas effector proteins. For this reason, unless otherwise specified, the term gRNA should be understood to encompass any suitable gRNA that can be used with any CRISPR/Cas effector system, and not only those gRNAs that are compatible with a particular species of Cas12a. By way of illustration, the term gRNA can, in some embodiments, include a gRNA for use with any CRISPR/Cas effector protein occurring in a Class 2 CRISPR system, such as a Type V CRISPR system, or a CRISPR/Cas effector protein derived or adapted therefrom.
In some embodiments a method or system of the present disclosure may use more than one gRNA. In some embodiments, two or more gRNAs may be used to create two or more double strand breaks in the genome of a cell.
In some embodiments using more than one gRNA, a double-strand break may be caused by a dual-gRNA paired “nickase” strategy.
gRNA Design
Methods for selection and validation of target nucleic sequences as well as off-target analyses have been described previously, e.g., in Fu et al., Nat Biotechnol 32(3): 279-84 (2014); Heigwer et al., Nat Methods 11(2): 122-3 (2014); Bae et al., Bioinformatics 30(10): 1473-5 (2014); and Xiao et al., Bioinformatics 30(8): 1180-1182 (2014). As a non-limiting example, gRNA design may involve the use of a software tool to optimize the choice of potential target nucleic sequences corresponding to a user's target nucleic sequence, e.g., to minimize total off-target activity across the genome. While off-target activity is not limited to cleavage, the cleavage efficiency at each off-target nucleic sequence can be predicted, e.g., using an experimentally-derived weighting scheme. These and other guide selection methods are described in detail in Park et al., Bioinformatics 34 (6): 1077-1079 (2018), the disclosure of which is hereby incorporated herein by reference in its entirety.
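As a non-limiting sketch of the kind of computation such a tool may perform, the example below scans one strand for candidate sites and scores a mismatched sequence with position weights. Several assumptions are made purely for illustration: a TTTV PAM (V being A, C, or G) located immediately 5′ of a 20-nucleotide protospacer, scanning of the supplied strand only, and uniform placeholder weights that are not experimentally derived. It does not reproduce the method of any cited reference.

```python
PAM_LENGTH = 4


def find_candidate_sites(dna, spacer_length=20):
    """Return (PAM start, PAM, protospacer) for every TTTV PAM on the given strand."""
    sites = []
    last_start = len(dna) - PAM_LENGTH - spacer_length
    for start in range(last_start + 1):
        pam = dna[start:start + PAM_LENGTH]
        if pam[:3] == "TTT" and pam[3] in "ACG":
            spacer_start = start + PAM_LENGTH
            sites.append((start, pam, dna[spacer_start:spacer_start + spacer_length]))
    return sites


def mismatch_positions(target, other):
    """0-based positions at which two equal-length sequences differ."""
    if len(target) != len(other):
        raise ValueError("sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(target, other)) if a != b]


def weighted_mismatch_score(target, other, weights):
    """Multiply (1 - weight) over mismatched positions; 1.0 means no mismatches."""
    score = 1.0
    for position in mismatch_positions(target, other):
        score *= 1.0 - weights[position]
    return score


# Hypothetical input built from a flank, a PAM, a 20-nt protospacer, and a flank.
protospacer = "CGTACGTACGTACGTACGTA"
dna = "GG" + "TTTA" + protospacer + "CC"
sites = find_candidate_sites(dna)
print(sites)

# Placeholder weights (uniform); a real scheme would be experimentally derived.
weights = [0.1] * len(protospacer)
off_target = "CGTACGTACGTACGTACGAT"  # differs at the last two positions
print(weighted_mismatch_score(protospacer, off_target, weights))
```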
gRNA Modifications
In some embodiments, gRNAs as used herein may be modified or unmodified gRNAs. In some embodiments, gRNAs as used herein may be modified for increased activity compared to unmodified gRNAs. In some embodiments, a gRNA may include one or more modifications. In some embodiments, the one or more modifications may include a phosphorothioate linkage modification, a phosphorodithioate (PS2) linkage modification, a 2′-O-methyl modification, or combinations thereof. In some embodiments, the one or more modifications may be at the 5′ end of the gRNA, at the 3′ end of the gRNA, or combinations thereof.
In some embodiments, a gRNA modification may comprise one or more phosphorodithioate (PS2) linkage modifications.
In some embodiments, a gRNA used herein includes one or more or a stretch of deoxyribonucleic acid (DNA) bases, also referred to herein as a “DNA extension.” In some embodiments, a gRNA used herein includes a DNA extension at the 5′ end of the gRNA. In some embodiments, the DNA extension may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 DNA bases long. For example, in some embodiments, the DNA extension may be 1, 2, 3, 4, 5, 10, 15, 20, or 25 DNA bases long. In some embodiments, the DNA extension may include one or more DNA bases selected from adenine (A), guanine (G), cytosine (C), or thymine (T). In some embodiments, the DNA extension includes the same DNA bases. For example, the DNA extension may include a stretch of adenine (A) bases. In some embodiments, the DNA extension may include a stretch of thymine (T) bases. In some embodiments, the DNA extension includes a combination of different DNA bases.
Exemplary suitable 5′ extensions for Cas12a guide RNAs are provided in Table 3 below:
In some embodiments, a gRNA used herein includes a DNA extension as well as a chemical modification, e.g., one or more phosphorothioate linkage modifications, one or more phosphorodithioate (PS2) linkage modifications, one or more 2′-O-methyl modifications, or one or more additional suitable chemical gRNA modification disclosed herein, or combinations thereof. In some embodiments, the one or more modifications may be at the 5′ end of the gRNA, at the 3′ end of the gRNA, or combinations thereof.
Without wishing to be bound by theory, it is contemplated that any DNA extension may be used with any gRNA disclosed herein, so long as it does not hybridize to the target nucleic acid being targeted by the gRNA and it also exhibits an increase in editing at the target nucleic acid site relative to a gRNA which does not include such a DNA extension.
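The sequence-level condition above, that a DNA extension should not hybridize to the target nucleic acid, can be screened crudely in software; whether an extension increases editing must be determined experimentally. The sketch below builds a homopolymer extension and flags any extension window whose reverse complement occurs in a supplied target strand. The function names, the window length of 8, and the example sequences are hypothetical choices for illustration, and the check is a rough heuristic rather than a thermodynamic prediction.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def make_homopolymer_extension(base, length):
    """Build a DNA extension of 1 to 100 identical bases, e.g. a stretch of A."""
    if len(base) != 1 or base not in "ACGT":
        raise ValueError("base must be one of A, C, G, T")
    if not 1 <= length <= 100:
        raise ValueError("length must be between 1 and 100")
    return base * length


def may_hybridize(extension, target_strand, min_run=8):
    """True if any `min_run`-base window of the extension could pair with the strand."""
    for start in range(len(extension) - min_run + 1):
        window = extension[start:start + min_run]
        if reverse_complement(window) in target_strand:
            return True
    return False


extension = make_homopolymer_extension("A", 10)
t_rich_target = "GGC" + "T" * 9 + "CC"   # hypothetical strand with a T run
mixed_target = "GGCATGCATGCATGCC"        # hypothetical strand without one
print(may_hybridize(extension, t_rich_target))
print(may_hybridize(extension, mixed_target))
```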
In some embodiments, a gRNA used herein includes one or more or a stretch of ribonucleic acid (RNA) bases, also referred to herein as an “RNA extension”. In some embodiments, a gRNA used herein includes an RNA extension at the 5′ end of the gRNA, the 3′ end of the gRNA, or a combination thereof. In some embodiments, the RNA extension may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 RNA bases long. For example, in some embodiments, the RNA extension may be 1, 2, 3, 4, 5, 10, 15, 20, or 25 RNA bases long. Exemplary suitable 5′ extensions for Cas12a guide RNAs are provided in Table 3 above. In some embodiments, the RNA extension may include one or more RNA bases selected from adenine (rA), guanine (rG), cytosine (rC), or uracil (rU), in which the “r” represents RNA, 2′-hydroxy. In some embodiments, the RNA extension includes the same RNA bases. For example, the RNA extension may include a stretch of adenine (rA) bases. In some embodiments, the RNA extension includes a combination of different RNA bases. In some embodiments, a gRNA used herein includes an RNA extension as well as one or more phosphorothioate linkage modifications, one or more phosphorodithioate (PS2) linkage modifications, one or more 2′-O-methyl modifications, one or more additional suitable gRNA modifications, e.g., chemical modifications, disclosed herein, or combinations thereof. In some embodiments, the one or more modifications may be at the 5′ end of the gRNA, at the 3′ end of the gRNA, or combinations thereof. In some embodiments, a gRNA including an RNA extension may comprise a sequence set forth herein.
It is contemplated that gRNAs used herein may also include an RNA extension and a DNA extension. In some embodiments, the RNA extension and DNA extension may both be at the 5′ end of the gRNA, the 3′ end of the gRNA, or a combination thereof. In some embodiments, the RNA extension is at the 5′ end of the gRNA and the DNA extension is at the 3′ end of the gRNA. In some embodiments, the RNA extension is at the 3′ end of the gRNA and the DNA extension is at the 5′ end of the gRNA.
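By way of non-limiting illustration only, the placement of RNA and/or DNA extensions at the 5′ end and/or 3′ end of a gRNA can be sketched in Python as follows. The sketch represents a gRNA as a list of base tokens. The “r” prefix follows the notation used above for RNA bases; the “d” prefix for DNA bases, the function names, and the example sequences are assumptions made for this illustration and are not sequences of the present disclosure.

```python
# Illustrative sketch only: a gRNA is modeled as a list of base tokens.
# "rA"/"rG"/"rC"/"rU" follow the text above; the "d" prefix for DNA bases
# and all names below are assumptions made for this example.
RNA_BASES = {"rA", "rG", "rC", "rU"}


def homopolymer_extension(base: str, length: int) -> list[str]:
    """Build an RNA extension consisting of one repeated RNA base."""
    if base not in RNA_BASES:
        raise ValueError(f"not an RNA base token: {base!r}")
    if not 1 <= length <= 100:
        raise ValueError("extension length is expected to be 1 to 100 bases")
    return [base] * length


def add_extensions(grna, five_prime=(), three_prime=()):
    """Return a new token list with extensions added at either or both ends."""
    return list(five_prime) + list(grna) + list(three_prime)


# Hypothetical core gRNA (placeholder tokens, not a disclosed sequence).
core = ["rU", "rA", "rA", "rU"]

# RNA extension at the 5' end and a short DNA extension at the 3' end.
extended = add_extensions(
    core,
    five_prime=homopolymer_extension("rA", 5),
    three_prime=["dT", "dT"],
)
print("-".join(extended))
```

Swapping the `five_prime` and `three_prime` arguments gives the opposite arrangement, and passing only one of them models a gRNA with a single extension.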
In some embodiments, a gRNA which includes a modification, e.g., a DNA extension at the 5′ end and/or a chemical modification as disclosed herein, is complexed with a CRISPR/Cas effector protein, e.g., a Cas12a effector protein, to form an RNP, which is then employed to edit a target cell, e.g., a pluripotent stem cell or a progeny thereof.
Certain exemplary modifications discussed in this section can be included at any position within a gRNA sequence including, without limitation, at or near the 5′ end (e.g., within 1-10, 1-5, or 1-2 nucleotides of the 5′ end) and/or at or near the 3′ end (e.g., within 1-10, 1-5, or 1-2 nucleotides of the 3′ end). In some cases, modifications are positioned within functional motifs, such as a stem loop structure of a Cas12a gRNA, and/or a targeting domain of a gRNA.
As one example, the 5′ end of a gRNA can include a eukaryotic mRNA cap structure or cap analog (e.g., a G(5′)ppp(5′)G cap analog, a m7G(5′)ppp(5′)G cap analog, or a 3′-O-Me-m7G(5′)ppp(5′)G anti-reverse cap analog (ARCA)), as shown below:
The cap or cap analog can be included during either chemical or enzymatic synthesis of the gRNA.
Along similar lines, the 5′ end of the gRNA can lack a 5′ triphosphate group. For instance, in vitro transcribed gRNAs can be phosphatase-treated (e.g., using calf intestinal alkaline phosphatase) to remove a 5′ triphosphate group.
Another common modification involves the addition, at the 3′ end of a gRNA, of a plurality (e.g., 1-10, 10-20, or 25-200) of adenine (A) residues referred to as a poly A tract. The poly A tract can be added to a gRNA during chemical or enzymatic synthesis, using a polyadenosine polymerase (e.g., E. coli Poly(A) Polymerase).
Guide RNAs can be modified at a 3′ terminal U ribose. For example, the two terminal hydroxyl groups of the U ribose can be oxidized to aldehyde groups, with concomitant opening of the ribose ring, to afford a modified nucleoside as shown below:
wherein “U” can be an unmodified or modified uridine.
The 3′ terminal U ribose can be modified with a 2′3′ cyclic phosphate as shown below:
wherein “U” can be an unmodified or modified uridine.
Guide RNAs can contain 3′ nucleotides that can be stabilized against degradation, e.g., by incorporating one or more of the modified nucleotides described herein. In some embodiments, uridines can be replaced with modified uridines, e.g., 5-(2-amino)propyl uridine and 5-bromo uridine, or with any of the modified uridines described herein; adenosines and guanosines can be replaced with modified adenosines and guanosines, e.g., with modifications at the 8-position, e.g., 8-bromo guanosine, or with any of the modified adenosines or guanosines described herein.
In some embodiments, sugar-modified ribonucleotides can be incorporated into a gRNA, e.g., wherein the 2′ OH-group is replaced by a group selected from H, —OR, —R (wherein R can be, e.g., alkyl, cycloalkyl, aryl, aralkyl, heteroaryl or sugar), halo, —SH, —SR (wherein R can be, e.g., alkyl, cycloalkyl, aryl, aralkyl, heteroaryl or sugar), amino (wherein amino can be, e.g., NH2, alkylamino, dialkylamino, heterocyclyl, arylamino, diarylamino, heteroarylamino, diheteroarylamino, or amino acid); or cyano (—CN). In some embodiments, the phosphate backbone can be modified as described herein, e.g., with a phosphothioate (PhTx) group. In some embodiments, one or more of the nucleotides of the gRNA can each independently be a modified or unmodified nucleotide including, but not limited to 2′-sugar modified, such as, 2′-O-methyl, 2′-O-methoxyethyl, or 2′-Fluoro modified including, e.g., 2′-F or 2′-O-methyl, adenosine (A), 2′-F or 2′-O-methyl, cytidine (C), 2′-F or 2′-O-methyl, uridine (U), 2′-F or 2′-O-methyl, thymidine (T), 2′-F or 2′-O-methyl, guanosine (G), 2′-O-methoxyethyl-5-methyluridine (Teo), 2′-O-methoxyethyladenosine (Aeo), 2′-O-methoxyethyl-5-methylcytidine (m5Ceo), and any combinations thereof.
Guide RNAs can also include “locked” nucleic acids (LNA) in which the 2′ OH-group can be connected, e.g., by a C1-6 alkylene or C1-6 heteroalkylene bridge, to the 4′ carbon of the same ribose sugar. Any suitable moiety can be used to provide such bridges, including without limitation methylene, propylene, ether, or amino bridges; O-amino (wherein amino can be, e.g., NH2, alkylamino, dialkylamino, heterocyclyl, arylamino, diarylamino, heteroarylamino, or diheteroarylamino, ethylenediamine, or polyamino) and aminoalkoxy or O(CH2)n-amino (wherein amino can be, e.g., NH2, alkylamino, dialkylamino, heterocyclyl, arylamino, diarylamino, heteroarylamino, or diheteroarylamino, ethylenediamine, or polyamino).
In some embodiments, a gRNA can include a modified nucleotide which is multicyclic (e.g., tricyclo), or which is an “unlocked” form, such as glycol nucleic acid (GNA) (e.g., R-GNA or S-GNA, where ribose is replaced by glycol units attached to phosphodiester bonds), or threose nucleic acid (TNA, where ribose is replaced with α-L-threofuranosyl-(3′→2′)).
Generally, gRNAs include the sugar group ribose, which is a 5-membered ring having an oxygen. Exemplary modified gRNAs can include, without limitation, replacement of the oxygen in ribose (e.g., with sulfur (S), selenium (Se), or alkylene, such as, e.g., methylene or ethylene); addition of a double bond (e.g., to replace ribose with cyclopentenyl or cyclohexenyl); ring contraction of ribose (e.g., to form a 4-membered ring of cyclobutane or oxetane); ring expansion of ribose (e.g., to form a 6- or 7-membered ring having an additional carbon or heteroatom, such as for example, anhydrohexitol, altritol, mannitol, cyclohexanyl, cyclohexenyl, and morpholino that also has a phosphoramidate backbone). Although the majority of sugar analog alterations are localized to the 2′ position, other sites are amenable to modification, including the 4′ position. In some embodiments, a gRNA comprises a 4′-S, 4′-Se or a 4′-C-aminomethyl-2′-O-Me modification.
In some embodiments, deaza nucleotides, e.g., 7-deaza-adenosine, can be incorporated into a gRNA. In some embodiments, O- and N-alkylated nucleotides, e.g., N6-methyl adenosine, can be incorporated into a gRNA. In some embodiments, one or more or all of the nucleotides in a gRNA are deoxynucleotides.
In some embodiments, a bifunctional cross-linker is used to link a 5′ end of a first gRNA fragment and a 3′ end of a second gRNA fragment, and the 3′ or 5′ ends of the gRNA fragments to be linked are modified with functional groups that react with the reactive groups of the cross-linker. In general, these modifications comprise one or more of amine, sulfhydryl, carboxyl, hydroxyl, alkene (e.g., a terminal alkene), azide and/or another suitable functional group. Multifunctional (e.g., bifunctional) cross-linkers are also generally known in the art, and may be either heterofunctional or homofunctional, and may include any suitable functional group, including without limitation isothiocyanate, isocyanate, acyl azide, an NHS ester, sulfonyl chloride, tosyl ester, tresyl ester, aldehyde, amine, epoxide, carbonate (e.g., Bis(p-nitrophenyl) carbonate), aryl halide, alkyl halide, imido ester, carboxylate, alkyl phosphate, anhydride, fluorophenyl ester, HOBt ester, hydroxy methyl phosphine, O-methylisourea, DSC, NHS carbamate, glutaraldehyde, activated double bond, cyclic hemiacetal, NHS carbonate, imidazole carbamate, acyl imidazole, methylpyridinium ether, azlactone, cyanate ester, cyclic imidocarbonate, chlorotriazine, dehydroazepine, 6-sulfo-cytosine derivatives, maleimide, aziridine, TNB thiol, Ellman's reagent, peroxide, vinylsulfone, phenylthioester, diazoalkanes, diazoacetyl, epoxide, diazonium, benzophenone, anthraquinone, diazo derivatives, diazirine derivatives, psoralen derivatives, alkene, phenyl boronic acid, etc. In some embodiments, a first gRNA fragment comprises a first reactive group and the second gRNA fragment comprises a second reactive group. For example, the first and second reactive groups can each comprise an amine moiety, which are crosslinked with a carbonate-containing bifunctional crosslinking reagent to form a urea linkage. 
In other instances, (a) the first reactive group comprises a bromoacetyl moiety and the second reactive group comprises a sulfhydryl moiety, or (b) the first reactive group comprises a sulfhydryl moiety and the second reactive group comprises a bromoacetyl moiety, which are crosslinked by reacting the bromoacetyl moiety with the sulfhydryl moiety to form a bromoacetyl-thiol linkage. These and other cross-linking chemistries are known in the art, and are summarized in the literature, including by Greg T. Hermanson, Bioconjugate Techniques, 3rd Ed. 2013, published by Academic Press.
Additional suitable gRNA modifications will be apparent to those of ordinary skill in the art based on the present disclosure. Suitable gRNA modifications include, for example, those described in PCT Publication Nos. WO2019070762A1, WO2016089433A1, WO2016164356A1, or WO2017053729A1, the entire contents of each of which are incorporated herein by reference.
Exemplary gRNAs
Non-limiting examples of guide RNAs suitable for certain embodiments embraced by the present disclosure are provided herein. Those of ordinary skill in the art will be able to envision suitable guide RNA sequences for a specific CRISPR effector protein, e.g., a Cas12a effector protein, from the disclosure of the targeting domain sequence, either as a DNA or RNA sequence. For example, a guide RNA comprising a targeting sequence consisting of RNA nucleotides would include the RNA sequence corresponding to the targeting domain sequence provided as a DNA sequence, and contain uracil instead of thymidine nucleotides. Suitable gRNA scaffold sequences are known to those of ordinary skill in the art. For a Cas12a, for example, a suitable scaffold sequence comprises a sequence selected from Table 4 or a pair of sequences selected from Table 5. In Table 5, it is understood that a “modulator sequence” listed herein may constitute the nucleotide sequence of a modulator nucleic acid. Alternatively, additional nucleotide sequences can be comprised in the modulator nucleic acid 5′ and/or 3′ to a “modulator sequence” listed herein. In the consensus PAM sequences of Table 4 and Table 5, N represents A, C, G or T. Where the PAM sequence is preceded by “5′,” it means that the PAM is located immediately upstream of the target nucleotide sequence when using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate.
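By way of non-limiting illustration only, the PAM convention described above (a 5′ PAM located immediately upstream of the target nucleotide sequence on the non-target strand, with N representing A, C, G or T) can be sketched in Python as follows. The PAM “TTTN”, the target length, the example sequence, and the function name are hypothetical values chosen for this illustration; they are not taken from Table 4 or Table 5. The sketch scans only the strand it is given.

```python
import re

# N matches any of A, C, G or T, as stated for the consensus PAM sequences.
PAM_SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_targets(non_target_strand: str, pam: str, target_len: int):
    """List (pam_start, matched_pam, target) for each 5' PAM occurrence.

    The target is the target_len nucleotides immediately downstream (3') of
    the PAM on the supplied strand. A lookahead is used so that overlapping
    candidate sites are all reported.
    """
    pam_pattern = "".join(PAM_SYMBOLS[symbol] for symbol in pam.upper())
    regex = re.compile(f"(?=({pam_pattern})([ACGT]{{{target_len}}}))")
    return [
        (m.start(), m.group(1), m.group(2))
        for m in regex.finditer(non_target_strand.upper())
    ]


# Hypothetical non-target-strand sequence, 5' to 3'.
hits = find_targets("GGTTTACATGCATGCC", pam="TTTN", target_len=5)
print(hits)  # [(2, 'TTTA', 'CATGC')]
```

To consider sites on the opposite strand, the same function could be applied to the reverse complement of the input.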
Additional exemplary gRNA sequences include:
For an MG29-1 effector protein, a suitable guide RNA may comprise a backbone sequence comprising TAATTTCTACTGTTGTAGAT (SEQ ID NO: 55).
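By way of non-limiting illustration only, the conversion of a targeting domain provided as a DNA sequence into the corresponding RNA sequence (uracil in place of thymidine), and its combination with a backbone sequence, can be sketched in Python as follows. The backbone is SEQ ID NO: 55 as written above. Placing the backbone 5′ of the targeting domain is an assumption of this sketch, and the targeting domain shown is a hypothetical placeholder, not a sequence of the present disclosure.

```python
# SEQ ID NO: 55 as written in the text (DNA alphabet).
BACKBONE_DNA = "TAATTTCTACTGTTGTAGAT"


def dna_to_rna(seq: str) -> str:
    """Return the RNA sequence corresponding to a DNA sequence (T -> U)."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("T", "U")


def build_guide(targeting_domain_dna: str, backbone_dna: str = BACKBONE_DNA) -> str:
    """Join backbone and targeting domain as RNA.

    Assumption for this sketch: the backbone is placed 5' of the targeting
    domain.
    """
    return dna_to_rna(backbone_dna) + dna_to_rna(targeting_domain_dna)


# Hypothetical targeting domain given as DNA.
guide = build_guide("ACGTTGCA")
print(guide)  # UAAUUUCUACUGUUGUAGAUACGUUGCA
```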
It will be understood that the exemplary targeting sequences provided herein are not limiting, and additional suitable sequences, e.g., variants of the specific sequences disclosed herein, will be apparent to the skilled artisan based on the present disclosure in view of the general knowledge in the art.
It will be understood that the exemplary gRNAs disclosed herein are provided to illustrate non-limiting embodiments embraced by the present disclosure. Additional suitable gRNA sequences will be apparent to the skilled artisan based on the present disclosure, and the disclosure is not limited in this respect.
In one aspect the present disclosure provides systems for editing the genome of a cell. In some embodiments, a Cas12a effector protein causes a double-strand break. In some embodiments a Cas12a effector protein causes a single-strand break, e.g., in some embodiments a Cas12a effector protein is a nickase.
Genome editing systems and methods comprising a Cas12a effector protein can be implemented (e.g., administered or delivered to a cell or a subject) in a variety of ways, and different implementations may be suitable for distinct applications. For instance, a genome editing system is implemented, in some embodiments, as a protein/RNA complex (a ribonucleoprotein, or RNP). In some embodiments, a genome editing system and/or method is implemented as one or more nucleic acids encoding a Cas12a effector protein and guide RNA components described herein (optionally with one or more additional components). In some embodiments, a genome editing system and/or method is implemented as one or more vectors comprising such nucleic acids, for instance a viral vector such as an adeno-associated virus. In some embodiments, a genome editing system and/or method is implemented as a combination of any of the foregoing. Additional or modified implementations that operate according to the principles set forth herein will be apparent to the skilled artisan and are within the scope of this disclosure.
In some embodiments, genome editing systems and/or methods may be capable of target disruption, such as target mutation or alteration, such as leading to gene knockout. In some embodiments, genome editing systems and/or methods may involve replacement of particular target sites, such as leading to target correction. In some embodiments, genome editing systems and/or methods may involve removal of particular target sites, such as leading to target deletion. In some embodiments, genome editing systems and methods comprise a Cas12a effector protein comprising a Cas12a dual nickase for homology directed repair (HDR). In some embodiments, genome editing systems and/or methods may involve modulation of target site functionality, such as target site activity or accessibility, leading for instance to (transcriptional and/or epigenetic) gene or genomic region activation or gene or genomic region silencing.
The present disclosure further provides a method of altering a cell, e.g., altering the structure, e.g., altering the sequence, of a target nucleic acid of a cell, comprising contacting the cell with: (a) a gRNA molecule as described herein and (b) a Cas12a effector protein or fusion protein as described herein, and optionally, (c) a second gRNA molecule as described herein. In another aspect, disclosed herein is a method of treating a subject (e.g., a subject suffering from a disease, e.g., a cancer), e.g., altering the structure, e.g., sequence, of a target nucleic acid of the subject, comprising contacting the subject (or a cell from the subject) with: (a) a gRNA as described herein; and (b) a Cas12a effector protein or fusion protein as described herein, and optionally, (c) a second gRNA molecule as described herein.
In some embodiments, the contacting comprises delivering to the cell a Cas12a effector protein or fusion protein of (b) as a protein or an mRNA, and a nucleic acid molecule which encodes (a) and optionally (c). In some embodiments, the contacting comprises delivering to the cell a Cas12a effector protein or fusion protein of (b) as a protein or an mRNA, the gRNA of (a) as an RNA, and optionally the second gRNA of (c), as an RNA.
In some embodiments, (a) and (b) are present on one nucleic acid molecule, e.g., one vector, e.g., one viral vector, e.g., an AAV vector. Exemplary AAV vectors that may be used in any of the described compositions and methods include an AAV1 vector, a modified AAV1 vector, an AAV2 vector, a modified AAV2 vector, an AAV3 vector, a modified AAV3 vector, an AAV4 vector, a modified AAV4 vector, an AAV5 vector, a modified AAV5 vector, an AAV6 vector, a modified AAV6 vector, an AAV7 vector, a modified AAV7 vector, an AAV8 vector, an AAV.rh10 vector, a modified AAV.rh10 vector, an AAV.rh32/33 vector, a modified AAV.rh32/33 vector, an AAV.rh43 vector, a modified AAV.rh43 vector, an AAV.rh64R1 vector, and a modified AAV.rh64R1 vector. In some embodiments, (a) is present on a first nucleic acid molecule, e.g., a first vector, e.g., a first viral vector, e.g., a first AAV vector; and (b) is present on a second nucleic acid molecule, e.g., a second vector, e.g., a second viral vector, e.g., a second AAV vector. The first and second nucleic acid molecules may be AAV vectors.
In some embodiments, (a) and (c) are present on one nucleic acid molecule, e.g., one vector, e.g., one viral vector, e.g., one AAV vector. In some embodiments, (a) and (c) are on different vectors. For example, (a) may be present on a first nucleic acid molecule, e.g., a first vector, e.g., a first viral vector, e.g., a first AAV vector; and (c) may be present on a second nucleic acid molecule, e.g., a second vector, e.g., a second viral vector, e.g., a second AAV vector. In some embodiments, the first and second nucleic acid molecules are AAV vectors.
In some embodiments, (a), (b), and (c) are present on one nucleic acid molecule, e.g., one vector, e.g., one viral vector, e.g., an AAV vector. In some embodiments, the nucleic acid molecule is an AAV vector. In some embodiments, one of (a), (b), and (c) is present on a first nucleic acid molecule, e.g., a first vector, e.g., a first viral vector, e.g., a first AAV vector; and a second and third of (a), (b), and (c) are encoded on a second nucleic acid molecule, e.g., a second vector, e.g., a second viral vector, e.g., a second AAV vector. The first and second nucleic acid molecules may be AAV vectors.
In some embodiments, (a) is present on a first nucleic acid molecule, e.g., a first vector, e.g., a first viral vector, e.g., a first AAV vector; and (b) and (c) are present on a second nucleic acid molecule, e.g., a second vector, e.g., a second viral vector, e.g., a second AAV vector. The first and second nucleic acid molecules may be AAV vectors.
In some embodiments, (b) is present on a first nucleic acid molecule, e.g., a first vector, e.g., a first viral vector, e.g., a first AAV vector; and (a) and (c) are present on a second nucleic acid molecule, e.g., a second vector, e.g., a second viral vector, e.g., a second AAV vector. The first and second nucleic acid molecules may be AAV vectors.
In some embodiments, (c) is present on a first nucleic acid molecule, e.g., a first vector, e.g., a first viral vector, e.g., a first AAV vector; and (b) and (a) are present on a second nucleic acid molecule, e.g., a second vector, e.g., a second viral vector, e.g., a second AAV vector. The first and second nucleic acid molecules may be AAV vectors.
In some embodiments, (a), (b) and (c) are each present on a different nucleic acid molecule, e.g., different vectors, e.g., different viral vectors, e.g., different AAV vectors. For example, (a) may be on a first nucleic acid molecule, (b) on a second nucleic acid molecule, and (c) on a third nucleic acid molecule. The first, second and third nucleic acid molecules may be AAV vectors.
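By way of non-limiting illustration only, the arrangements of (a), (b), and (c) across one, two, or three nucleic acid molecules described above correspond to the set partitions of three components, which can be enumerated with the following Python sketch. The function name and output format are assumptions made for this illustration; the sketch does not consider vector packaging capacity or any other practical constraint.

```python
from typing import Iterator


def set_partitions(items: list[str]) -> Iterator[list[list[str]]]:
    """Yield every way of grouping items into non-empty groups.

    Each group stands for one nucleic acid molecule (e.g., one vector).
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place the first component on each molecule that already exists...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or on a molecule of its own.
        yield [[first]] + partition


components = ["(a) gRNA", "(b) Cas12a effector", "(c) second gRNA"]
arrangements = list(set_partitions(components))
for arrangement in arrangements:
    print(" | ".join(" + ".join(group) for group in arrangement))
```

For three components the sketch yields five arrangements: all three on one molecule, each of the three one-plus-two splits, and each component on its own molecule.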
AAV vectors may be formulated as AAV particles as described herein. In some embodiments, AAV particles comprise (i) an AAV polynucleotide construct (e.g., a recombinant AAV polynucleotide construct), and (ii) a capsid comprising capsid proteins. In some embodiments, an AAV polynucleotide construct comprises a polynucleotide sequence encoding a CRISPR/Cas12a effector protein or a characteristic portion thereof. In some embodiments, an AAV polynucleotide construct comprises a polynucleotide sequence encoding a gRNA molecule or a characteristic portion thereof.
In certain embodiments, the contacting comprises delivering to the cell the gRNA of (a) as an RNA, optionally the second gRNA of (c) as an RNA, and a nucleic acid composition that encodes a Cas12a effector protein or fusion protein of (b).
In some embodiments, a gRNA molecule as described herein and a Cas12a effector protein or fusion protein as described herein or a nucleic acid encoding the Cas12a effector protein or fusion protein, and optionally the gRNA molecule, and further, optionally, a second gRNA molecule, as described herein can be delivered to a cell via a lipid-based system. A lipid-based system can comprise any components and/or structures known in the art. In some embodiments, a lipid-based system is or comprises a lipid nanoparticle (LNP).
A CRISPR/Cas effector protein or fusion protein can be delivered to the cell as a protein or a nucleic acid encoding the protein, e.g., a DNA molecule or mRNA molecule. The guide molecule can be delivered as an RNA molecule or encoded by a DNA molecule. A CRISPR/Cas effector protein or fusion protein can also be delivered with a guide molecule as a ribonucleoprotein (RNP) and introduced into the cell via nucleofection (electroporation).
In some embodiments, the method of altering a cell, e.g., altering the structure, e.g., altering the sequence, of a target nucleic acid of a cell, comprises altering one or more target genes expressed by target cells as described herein. In some embodiments, the method of altering a cell comprises altering two or more target genes expressed by target cells as described herein. In some embodiments, the method of altering a cell comprises altering three or more target genes expressed by target cells as described herein. In some embodiments, the method of altering a cell comprises altering four or more target genes expressed by target cells as described herein. In some embodiments, the method of altering a cell comprises altering five or more target genes expressed by target cells as described herein. In some embodiments, the method of altering a cell comprises altering six or more target genes expressed by target cells as described herein. In certain embodiments, the method of altering a cell comprises altering seven or more target genes expressed by target cells as described herein. In certain embodiments, the method of altering a cell comprises altering each of the target genes described herein.
In some embodiments, a contacting step comprises contacting the cell with a nucleic acid composition as described herein. In some embodiments, a contacting step comprises contacting the cell with a composition as described herein. In some embodiments, the composition is a ribonucleoprotein composition.
In some embodiments, a nucleic acid composition further comprises (c) a third nucleotide sequence that encodes a second gRNA molecule comprising a targeting domain that is complementary with a target domain from a target cell. In some embodiments, a second gRNA targets the same target position as the first gRNA molecule.
The presently disclosed subject matter further provides a reaction mixture comprising a gRNA molecule as described herein, a nucleic acid composition as described herein, or a composition as described herein, and a cell, e.g., a cell from a subject who would benefit from one or more alterations at one or more cell target positions in the one or more target genes.
The presently disclosed subject matter further provides a kit comprising (a) a gRNA molecule as described herein, or a nucleic acid composition that encodes the gRNA, and one or more of the following: (b) a Cas12a effector protein or fusion protein as described herein; (c) a second gRNA molecule as described herein.
Additionally, the presently disclosed subject matter provides a gRNA molecule as described herein for use in treating a disease, e.g., a cancer, in a subject. In some embodiments, the gRNA molecule is used in combination with (b) a Cas12a effector protein or fusion protein.
The presently disclosed subject matter further provides use of a gRNA molecule as described herein in the manufacture of a medicament for treating a disease, e.g., a cancer, in a subject. In certain embodiments, the medicament further comprises (b) a Cas12a effector protein or fusion protein.
A skilled person will understand that modulation of target site functionality may involve a CRISPR effector protein variant (such as for instance generation of a catalytically inactive or dead CRISPR effector) and/or functionalization (such as for instance fusion of the CRISPR effector with a heterologous functional domain, such as a deaminase), as described herein. Accordingly, in some embodiments the present disclosure relates to engineered compositions for site directed base editing comprising modified CRISPR effector protein and functional domain(s). In some embodiments, a functional domain comprises a deaminase or catalytic domain thereof, including a cytidine and/or adenine deaminase. Example functional domains suitable for use in the embodiments disclosed herein are discussed in further detail herein.
All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.
Throughout this specification, unless the context requires otherwise, the words “comprise”, “comprises” and “comprising” will be understood to imply the inclusion of a stated step or element or group of steps or elements but not the exclusion of any other step or element or group of steps or elements. By “consisting of” is meant including, and limited to, whatever follows the phrase “consisting of.” Thus, the phrase “consisting of” indicates that the listed elements are required or mandatory, and that no other elements may be present. By “consisting essentially of” is meant including any elements listed after the phrase, and limited to other elements that do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements. Thus, the phrase “consisting essentially of” indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present depending upon whether or not they affect the activity or action of the listed elements.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. The contents of database entries, e.g., NCBI nucleotide or protein database entries provided herein, are incorporated herein in their entirety. Where database entries are subject to change over time, the contents as of the filing date of the present application are incorporated herein by reference. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.
The disclosure is further illustrated by the following examples. The examples are provided for illustrative purposes only. They are not to be construed as limiting the scope or content of the disclosure in any way.
The present example describes AsCas12a effector proteins (nucleases) comprising one or more substitutions at certain residues with increased activity compared to wild-type AsCas12a proteins and other AsCas12a proteins.
The disclosure contemplates that certain amino acid residues of an AsCas12a effector protein may be substituted (or mutated) to generate AsCas12a effector proteins with increased activity(ies). An AsCas12a effector protein comprising two or more substitutions (e.g., M537R and F870L, e.g., SEQ ID NO: 6) has been demonstrated to result in increased activity compared to wild-type AsCas12a (SEQ ID NO: 1) (e.g., see Zhang et al., Nature Communications, 12:3908 (2021), the disclosure of which is hereby incorporated herein by reference in its entirety).
As described in Example 5, Q571K and C1003Y substitutions in Lb2Cas12a effector proteins increased activity compared to wild-type Lb2Cas12a. It is an insight of the present disclosure that these substitutions, alone or in combination with other mutations, are expected to confer increased activity in AsCas12a effector proteins.
It was appreciated that amino acid sequences surrounding Lb2Cas12a Q571K and C1003Y mutations are conserved across Cas12a effector proteins. For example,
The present example also describes a variety of AsCas12a effector proteins comprising one or more mutations (e.g., E174R, S542R, and K548R, e.g., SEQ ID NO: 83) that exhibit higher activity compared to wild-type AsCas12a effector proteins (SEQ ID NO: 1).
The present example also describes a variety of AsCas12a effector proteins comprising one or more mutations that exhibit higher activity compared to wild-type AsCas12a effector proteins or AsCas12a effector proteins comprising SEQ ID NO: 6. This example refers to such exemplary AsCas12a effector proteins as “charge mutants”.
A variety of charge mutants were made and/or can be made by rational design as described herein (e.g., SEQ ID NOs: 7-10). For example, amino acid substitutions described by this example were designed as those residues that are spatially segregated from substitutions made in Exemplary AsCas12a variant 1 (SEQ ID NO: 6).
Charge mutants (e.g., Exemplary AsCas12a variant 3 (SEQ ID NO: 7), Exemplary AsCas12a variant 4 (SEQ ID NO: 8), Exemplary AsCas12a variant 5 (SEQ ID NO: 9), and Exemplary AsCas12a variant 6 (SEQ ID NO: 10)), Exemplary AsCas12a variant 1 (SEQ ID NO: 6), and wild-type AsCas12a (SEQ ID NO: 1) were formulated as RNPs and administered to target cells at different concentrations to determine knock out efficiency of a target gene (TRAC) as determined by flow cytometry (see
Additional combinations of amino acid substitutions (mutations) relative to exemplary AsCas12a wild-type amino acid sequence SEQ ID NO: 1 are shown in Table 6. Alignments of AsCas12a sequences and exemplary substitutions as described herein are provided in
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into AsCas12a effector (nuclease) proteins.
The present example describes AsCas12a effector proteins (nickases) comprising one or more substitutions at certain residues with increased activity compared to wild-type AsCas12a proteins.
An AsCas12a effector protein comprising two or more substitutions (e.g., K1000G and S1001G) has been demonstrated to result in a nickase version of wild-type AsCas12a (SEQ ID NO: 1). Additional combinations of these nickase-inducing mutations with amino acid substitutions (mutations) that increase activity relative to exemplary AsCas12a wild-type amino acid sequence SEQ ID NO: 1 are shown in Table 7. Alignments of sequences and exemplary substitutions as described herein are provided in
An AsCas12a effector protein comprising a substitution (e.g., R1226A) has been demonstrated to result in a nickase version of an AsCas12a effector protein (SEQ ID NO: 83). Additional combinations of this nickase-inducing mutation with amino acid substitutions (mutations) that increase activity relative to exemplary AsCas12a wild-type amino acid sequence (SEQ ID NO: 1) are shown in Table 7. Alignments of sequences and exemplary substitutions as described herein are provided in
In some embodiments, an AsCas12a effector protein can comprise a combination of mutations at one or more positions including E174, S542, K548, and R1226. In some embodiments, an AsCas12a effector protein can comprise a combination of E174R, S542R, K548R, and R1226A mutations.
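By way of non-limiting illustration only, the substitution notation used in these examples (wild-type residue, 1-based position, replacement residue, e.g., E174R) can be applied to an amino acid sequence with the following Python sketch. The function name is an assumption made for this illustration, and the short sequence used is a hypothetical toy sequence, not SEQ ID NO: 1 or any other sequence of the present disclosure.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply substitutions such as 'E174R' to a protein sequence.

    Each substitution is checked against the residue currently at that
    position, so a mismatch between notation and sequence raises an error.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3)
        )
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical five-residue toy sequence.
variant = apply_substitutions("MKESR", ["E3R", "S4R", "R5A"])
print(variant)  # MKRRA
```

With a full-length wild-type sequence loaded from the Sequence Listing, the same call could be made with a list such as `["E174R", "S542R", "K548R", "R1226A"]`.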
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into Cas12a effector (nickase) proteins.
The present example describes FnCas12a effector proteins (nucleases) comprising one or more substitutions at certain residues with increased activity compared to wild-type FnCas12a proteins.
The disclosure contemplates that certain amino acid residues of an FnCas12a effector protein may be substituted (or mutated) to generate FnCas12a effector proteins with increased activity(ies). For example, the present disclosure describes that N602 and/or F879 residues of an amino acid sequence provided in SEQ ID NO: 2 can be substituted (or mutated). The present disclosure also describes that other residues of an amino acid sequence provided in SEQ ID NO: 2 can be substituted (or mutated) in various combinations.
As described in Example 1, an AsCas12a effector protein comprising two or more substitutions (e.g., M537R and F870L, e.g., SEQ ID NO: 6) has been demonstrated to result in increased activity compared to wild-type AsCas12a (SEQ ID NO: 1) (e.g., see Zhang et al., Nature Communications, 12:3908 (2021), the disclosure of which is hereby incorporated herein by reference in its entirety). It is an insight of the present disclosure that FnCas12a effector proteins and AsCas12a effector proteins share sequence conservation of domains (see
Additional combinations of amino acid substitutions (mutations) relative to exemplary FnCas12a wild-type amino acid sequence SEQ ID NO: 2 are shown in Table 8. Alignments of FnCas12a sequences with corresponding AsCas12a or Lb2Cas12a sequences and exemplary substitutions as described herein are provided in
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into FnCas12a effector (nuclease) proteins.
The present example describes FnCas12a effector proteins (nickases) comprising one or more substitutions at certain residues with increased activity compared to wild-type FnCas12a proteins.
The disclosure contemplates that certain amino acid residues of an FnCas12a effector protein may be substituted (or mutated) to generate FnCas12a effector proteins that are nickases, optionally with increased activity(ies). For example, the present disclosure describes that K1013 and/or R1014 residues of an amino acid sequence provided in SEQ ID NO: 2 can be substituted (or mutated).
An AsCas12a effector protein comprising two or more substitutions (e.g., K1000G and S1001G) has been demonstrated to result in a nickase version of wild-type AsCas12a (SEQ ID NO: 1). It is an insight of the present disclosure that FnCas12a effector proteins and AsCas12a effector proteins share sequence conservation around these positions (see
An AsCas12a effector protein comprising a substitution (e.g., R1226A) has been demonstrated to result in a nickase version of an AsCas12a effector protein (SEQ ID NO: 83). It is an insight of the present disclosure that FnCas12a effector proteins and AsCas12a effector proteins share sequence conservation around this position (see
Additional combinations of these nickase-inducing mutations with amino acid substitutions (mutations) that increase activity relative to exemplary FnCas12a wild-type amino acid sequence SEQ ID NO: 2 are shown in Table 9. Alignments of FnCas12a sequences with corresponding AsCas12a or Lb2Cas12a sequences and exemplary substitutions as described herein are provided in
In some embodiments, an FnCas12a effector protein can comprise a combination of the R1218A mutation with mutations of residues E184, N607, and K613 of FnCas12a to produce effector proteins with increased nickase activity. In particular, residues E184, N607, and K613 of FnCas12a can be mutated to E184R, N607R, and K613R, which correspond to E174R, S542R, and K548R in an exemplary AsCas12a effector protein (SEQ ID NO: 84).
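By way of non-limiting illustration, a corresponding residue in one orthologue can be located from a gapped pairwise alignment by counting non-gap characters in each aligned sequence. The following Python sketch is an assumption for illustration only. The function name `map_position` is hypothetical, and the two aligned strings are invented toy sequences, not any sequence of the Sequence Listing or any alignment of the present disclosure.

```python
from typing import Optional


def map_position(aligned_ref: str, aligned_query: str, ref_pos: int) -> Optional[int]:
    """Return the 1-based query position aligned to 1-based ref_pos.

    Both inputs are rows of one pairwise alignment with '-' marking gaps.
    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_i = 0    # residues of the reference seen so far
    query_i = 0  # residues of the query seen so far
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_i += 1
        if q != "-":
            query_i += 1
        if r != "-" and ref_i == ref_pos:
            return query_i if q != "-" else None
    raise ValueError(f"reference position {ref_pos} is not in the alignment")


# Hypothetical toy alignment (not real Cas12a sequences).
ref_row = "MK-ESLR"    # ungapped: M1 K2 E3 S4 L5 R6
query_row = "MKTE-LR"  # ungapped: M1 K2 T3 E4 L5 R6

print(map_position(ref_row, query_row, 3))  # 4 (E3 aligns to E4)
print(map_position(ref_row, query_row, 4))  # None (S4 aligns to a gap)
```

A mapped position identifies only which residue is aligned. Whether a substitution at that residue has a comparable effect in the orthologue is a separate question that the sketch does not address.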
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into Cas12a effector (nickase) proteins.
The present example describes Lb2Cas12a effector proteins comprising one or more substitutions at certain residues with increased activity compared to wild-type Lb2Cas12a proteins. A strategy similar to that described in Example 1 was used to determine certain amino acid substitutions for improvement of activity(ies) of the Lb2Cas12a effector proteins described by this example.
Lb2Cas12a effector proteins are smaller (e.g., about 300 base pairs smaller) than other Cas12a orthologues (such as AsCas12a). Although an Lb2Cas12a effector protein comprising Q571K and C1003Y substitutions exhibits increased activity compared to wild-type Lb2Cas12a (see Tran et al., Molecular Therapy Nucleic Acids, 24:P40-53 (2021), the disclosure of which is hereby incorporated herein by reference in its entirety), it is contemplated by the present disclosure that additional mutations made to Lb2Cas12a may further increase activity and make this Cas12a orthologue more attractive for genome editing applications.
As described in Example 1, an AsCas12a effector protein comprising two or more substitutions (e.g., M537R and F870L, e.g., SEQ ID NO: 6) has been demonstrated to result in increased activity compared to wild-type AsCas12a (SEQ ID NO: 1) (e.g., see Zhang et al., Nature Communications, 12:3908 (2021), the disclosure of which is hereby incorporated herein by reference in its entirety). It is an insight of the present disclosure that Lb2Cas12a effector proteins and AsCas12a effector proteins share sequence conservation of domains (see
Additional combinations of amino acid substitutions (mutations) relative to exemplary Lb2Cas12a wild-type amino acid sequence SEQ ID NO: 3 are shown in Table 10. Alignments of Lb2Cas12a sequences and exemplary substitutions as described herein are provided in
The present example describes Lb2Cas12a effector proteins (nickases) comprising one or more substitutions at certain residues with increased activity compared to wild-type Lb2Cas12a proteins.
The disclosure also contemplates that certain amino acid residues of an Lb2Cas12a effector protein may be substituted (or mutated) to generate Lb2Cas12a effector proteins that are nickases, optionally with increased activity(ies). For example, the present disclosure describes that K913 and/or R914 residues of an amino acid sequence provided in SEQ ID NO: 3 can be substituted (or mutated).
An AsCas12a effector protein comprising two or more substitutions (e.g., K1000G and S1001G) has been demonstrated to result in a nickase version of wild-type AsCas12a (SEQ ID NO: 1). It is an insight of the present disclosure that Lb2Cas12a effector proteins and AsCas12a effector proteins share sequence conservation around these positions (see
An AsCas12a effector protein comprising a substitution (e.g., R1226A) has been demonstrated to result in a nickase version of an AsCas12a effector protein (SEQ ID NO: 83). It is an insight of the present disclosure that Lb2Cas12a effector proteins and AsCas12a effector proteins share sequence conservation around this position (see
Additional combinations of these nickase-inducing mutations with amino acid substitutions (mutations) that increase activity relative to exemplary Lb2Cas12a wild-type amino acid sequence SEQ ID NO: 3 are shown in Table 11. Alignments of Lb2Cas12a sequences and exemplary substitutions as described herein are provided in
In some embodiments, an Lb2Cas12a effector protein can comprise a combination of the R1218A mutation with mutations of residues K155, N512, and K518 of Lb2Cas12a to produce effector proteins with increased nickase activity. In particular, residues K155, N512, and K518 of Lb2Cas12a can be mutated to K155R, N512R, and K518R, which correspond to E174R, S542R, and K548R in an exemplary AsCas12a effector protein (SEQ ID NO: 84).
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into Lb2Cas12a effector (nickase) proteins.
The present example describes LbCas12a effector proteins comprising one or more substitutions at certain residues with increased activity compared to wild-type LbCas12a proteins. A strategy similar to that described in Example 1 was used to determine certain amino acid substitutions for improvement of activity(ies) of the LbCas12a effector proteins described by this example.
As described in Example 1, an AsCas12a effector protein comprising two or more substitutions (e.g., M537R and F870L, e.g., SEQ ID NO: 6) has been demonstrated to result in increased activity compared to wild-type AsCas12a (SEQ ID NO: 1) (e.g., see Zhang et al., Nature Communications, 12:3908 (2021), the disclosure of which is hereby incorporated herein by reference in its entirety). It is an insight of the present disclosure that LbCas12a effector proteins and AsCas12a effector proteins share sequence conservation of domains (see
Additional combinations of amino acid substitutions (mutations) relative to exemplary LbCas12a wild-type amino acid sequence SEQ ID NO: 4 are shown in Table 12. Alignments of LbCas12a sequences and exemplary substitutions as described herein are provided in
The present example describes LbCas12a effector proteins (nickases) comprising one or more substitutions at certain residues with increased activity compared to wild-type LbCas12a proteins.
The disclosure contemplates that certain amino acid residues of an LbCas12a effector protein may be substituted (or mutated) to generate LbCas12a effector proteins that are nickases, optionally with increased activity(ies). For example, the present disclosure describes that K932 and/or N933 residues of an amino acid sequence provided in SEQ ID NO: 4 can be substituted (or mutated).
An AsCas12a effector protein comprising two or more substitutions (e.g., K1000G and S1001G) has been demonstrated to result in a nickase version of wild-type AsCas12a (SEQ ID NO: 1). It is an insight of the present disclosure that LbCas12a effector proteins and AsCas12a effector proteins share sequence conservation around these positions (see
An AsCas12a effector protein comprising a substitution (e.g., R1226A) has been demonstrated to result in a nickase version of an AsCas12a effector protein (SEQ ID NO: 83). It is an insight of the present disclosure that LbCas12a effector proteins and AsCas12a effector proteins share sequence conservation around this position (see
Additional combinations of these nickase-inducing mutations with amino acid substitutions (mutations) that increase activity relative to exemplary LbCas12a wild-type amino acid sequence SEQ ID NO: 4 are shown in Table 13. Alignments of LbCas12a sequences and exemplary substitutions as described herein are provided in
In some embodiments, an LbCas12a effector protein can comprise a combination of a mutation of residue R1138 with mutations of residues D156, G532, and K538 of LbCas12a to produce effector proteins with increased nickase activity. In particular, residues D156, G532, and K538 of LbCas12a can be mutated to D156R, G532R, and K538R, which correspond to E174R, S542R, and K548R in an exemplary AsCas12a effector protein (SEQ ID NO: 84).
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into LbCas12a effector (nickase) proteins.
The present example describes MbCas12a effector proteins comprising one or more mutations at certain residues with increased activity compared to wild-type MbCas12a proteins. A strategy similar to that described in Example 1 was used to determine certain amino acid substitutions for improvement of activity(ies) of the MbCas12a effector proteins described by this example.
As described in Example 1, an AsCas12a effector protein comprising two or more substitutions (e.g., M537R and F870L, e.g., SEQ ID NO: 6) has been demonstrated to result in increased activity compared to wild-type AsCas12a (SEQ ID NO: 1) (e.g., see Zhang et al., Nature Communications, 12:3908 (2021), the disclosure of which is hereby incorporated herein by reference in its entirety). It is an insight of the present disclosure that MbCas12a effector proteins and AsCas12a effector proteins share sequence conservation of domains (see
Additional combinations of amino acid substitutions (mutations) relative to exemplary MbCas12a wild-type amino acid sequence SEQ ID NO: 5 are shown in Table 14. Alignments of MbCas12a sequences and exemplary substitutions as described herein are provided in
The present example describes MbCas12a effector proteins (nickases) comprising one or more substitutions at certain residues with increased activity compared to wild-type MbCas12a proteins.
The disclosure contemplates that certain amino acid residues of an MbCas12a effector protein may be substituted (or mutated) to generate MbCas12a effector proteins that are nickases, optionally with increased activity(ies). For example, the present disclosure describes that K965 and/or R966 residues of an amino acid sequence provided in SEQ ID NO: 5 can be substituted (or mutated).
An AsCas12a effector protein comprising two or more substitutions (e.g., K1000G and S1001G) has been demonstrated to result in a nickase version of wild-type AsCas12a (SEQ ID NO: 1). It is an insight of the present disclosure that MbCas12a effector proteins and AsCas12a effector proteins share sequence conservation around these positions (see
An AsCas12a effector protein comprising a substitution (e.g., R1226A) has been demonstrated to result in a nickase version of an AsCas12a effector protein (SEQ ID NO: 83). It is an insight of the present disclosure that MbCas12a effector proteins and AsCas12a effector proteins share sequence conservation around this position (see
Additional combinations of these nickase-inducing mutations with amino acid substitutions (mutations) that increase activity relative to exemplary MbCas12a wild-type amino acid sequence SEQ ID NO: 5 are shown in Table 15. Alignments of MbCas12a sequences and exemplary substitutions as described herein are provided in
In some embodiments, an MbCas12a effector protein can comprise a combination of a mutation of residue R1171 with mutations of residues D172, N563, and K569 of MbCas12a to produce effector proteins with increased nickase activity. In particular, residues D172, N563, and K569 of MbCas12a can be mutated to D172R, N563R, and K569R, which correspond to E174R, S542R, and K548R in an exemplary AsCas12a effector protein (SEQ ID NO: 84).
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into MbCas12a effector (nickase) proteins.
The present example describes MG29-1 effector proteins comprising one or more mutations at certain residues with increased activity compared to naturally occurring MG29-1 proteins. A strategy similar to that described in Example 1 was used to determine certain amino acid substitutions for improvement of activity(ies) of the MG29-1 effector proteins described by this example.
As described in Example 1, an AsCas12a effector protein comprising two or more substitutions (e.g., M537R and F870L, e.g., SEQ ID NO: 6) has been demonstrated to result in increased activity compared to wild-type AsCas12a (SEQ ID NO: 1) (e.g., see Zhang et al., Nature Communications, 12:3908 (2021), the disclosure of which is hereby incorporated herein by reference in its entirety). It is an insight of the present disclosure that MG29-1 effector proteins and AsCas12a effector proteins share sequence conservation of domains (see
Additional combinations of amino acid substitutions (mutations) relative to exemplary naturally occurring MG29-1 amino acid sequence SEQ ID NO: 14 are shown in Table 16. Alignments of MG29-1 sequences and exemplary substitutions as described herein are provided in
The present example describes MG29-1 effector proteins (nickases) comprising one or more substitutions at certain residues with increased activity compared to naturally occurring MG29-1 proteins.
The disclosure contemplates that certain amino acid residues of an MG29-1 effector protein may be substituted (or mutated) to generate MG29-1 effector proteins that are nickases, optionally with increased activity(ies). For example, the present disclosure describes that K983 and/or R984 residues of an amino acid sequence provided in SEQ ID NO: 14 can be substituted (or mutated).
An AsCas12a effector protein comprising two or more substitutions (e.g., K1000G and S1001G) has been demonstrated to result in a nickase version of wild-type AsCas12a (SEQ ID NO: 1). It is an insight of the present disclosure that MG29-1 effector proteins and AsCas12a effector proteins share sequence conservation around these positions (see
An AsCas12a effector protein comprising the R1226A mutation has been demonstrated to result in a nickase version of an AsCas12a effector protein (SEQ ID NO: 83). It is an insight of the present disclosure that MG29-1 effector proteins and AsCas12a effector proteins share sequence conservation around this position (see
Additional combinations of these nickase-inducing mutations with amino acid substitutions (mutations) that increase activity relative to exemplary naturally occurring MG29-1 amino acid sequence SEQ ID NO: 14 are shown in Table 17. Alignments of MG29-1 sequences and exemplary substitutions as described herein are provided in
In some embodiments, an MG29-1 effector protein can comprise a combination of a mutation of residue R1192 with mutations of residues E172, N577, and K583 of MG29-1 to produce effector proteins with increased nickase activity. In particular, residues E172, N577, and K583 of MG29-1 can be mutated to E172R, N577R, and K583R, which correspond to E174R, S542R, and K548R in an exemplary AsCas12a effector protein (SEQ ID NO: 84).
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into MG29-1 effector (nickase) proteins.
The present example describes ErCas12a effector proteins (nucleases) comprising one or more substitutions at certain residues with increased activity compared to wild-type ErCas12a proteins.
The disclosure contemplates that certain amino acid residues of an ErCas12a effector protein may be substituted (or mutated) to generate ErCas12a effector proteins with increased activity(ies). For example, the present disclosure describes that I524 and/or F840 residues of an amino acid sequence provided in SEQ ID NO: 15 can be substituted (or mutated). The present disclosure also describes that other residues of an amino acid sequence provided in SEQ ID NO: 15 can be substituted (or mutated) in various combinations.
An AsCas12a effector protein comprising two or more substitutions (e.g., M537R and F870L, e.g., SEQ ID NO: 6) has been demonstrated to result in increased activity compared to wild-type AsCas12a (SEQ ID NO: 1) (e.g., see Zhang et al., Nature Communications, 12:3908 (2021)). It is an insight of the present disclosure that ErCas12a (MAD7) effector proteins and AsCas12a effector proteins share sequence conservation of domains (see
Additional combinations of amino acid substitutions (mutations) relative to exemplary ErCas12a wild-type amino acid sequence SEQ ID NO: 15 are shown in Table 18. Alignments of ErCas12a sequences with corresponding AsCas12a or Lb2Cas12a sequences and exemplary substitutions as described herein are provided in
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into ErCas12a effector (nuclease) proteins.
The present example describes ErCas12a effector proteins (nickases) comprising one or more substitutions at certain residues with increased activity compared to wild-type ErCas12a proteins.
The disclosure contemplates that certain amino acid residues of an ErCas12a effector protein may be substituted (or mutated) to generate ErCas12a effector proteins that are nickases, optionally with increased activity(ies). For example, the present disclosure describes that K969 and/or K970 residues of an amino acid sequence provided in SEQ ID NO: 15 can be substituted (or mutated).
An AsCas12a effector protein comprising two or more substitutions (e.g., K1000G and S1001G) has been demonstrated to result in a nickase version of wild-type AsCas12a (SEQ ID NO: 1). It is an insight of the present disclosure that ErCas12a (MAD7) effector proteins and AsCas12a effector proteins share sequence conservation around these positions (see
An AsCas12a effector protein comprising a substitution (e.g., R1226A) has been demonstrated to result in a nickase version of an AsCas12a effector protein (SEQ ID NO: 83). It is an insight of the present disclosure that ErCas12a effector proteins and AsCas12a effector proteins share sequence conservation around this position (see
Additional combinations of these nickase-inducing mutations with amino acid substitutions (mutations) that increase activity relative to exemplary ErCas12a wild-type amino acid sequence SEQ ID NO: 15 are shown in Table 19. Alignments of ErCas12a sequences with corresponding AsCas12a or Lb2Cas12a sequences and exemplary substitutions as described herein are provided in
In some embodiments, an ErCas12a effector protein can comprise a combination of a mutation of residue R1173 with mutations of residues K169, D529, and K535 of ErCas12a to produce effector proteins with increased nickase activity. In particular, residues K169, D529, and K535 of ErCas12a can be mutated to K169R, D529R, and K535R, which correspond to E174R, S542R, and K548R in an exemplary AsCas12a effector protein (SEQ ID NO: 84).
It is contemplated that amino acid substitutions that can be made are not limited by the present disclosure, and other substitutions can be incorporated into ErCas12a effector (nickase) proteins.
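By way of non-limiting illustration, the correspondences recited in the foregoing examples between the AsCas12a substitutions E174R, S542R, and K548R and substitutions in other effector proteins can be collected into a lookup structure. The following Python sketch is an assumption for illustration only. The names `AS_REFERENCE`, `CORRESPONDING`, and `corresponding_substitution` are hypothetical. The entries restate only the residue labels named in the examples above, and the Lb2Cas12a entries use K155, N512, and K518 as named in that example.

```python
from typing import Dict, List

# Reference substitutions in an exemplary AsCas12a effector protein.
AS_REFERENCE: List[str] = ["E174R", "S542R", "K548R"]

# Corresponding substitutions as recited in the examples, in the same order.
CORRESPONDING: Dict[str, List[str]] = {
    "FnCas12a": ["E184R", "N607R", "K613R"],
    "Lb2Cas12a": ["K155R", "N512R", "K518R"],
    "LbCas12a": ["D156R", "G532R", "K538R"],
    "MbCas12a": ["D172R", "N563R", "K569R"],
    "MG29-1": ["E172R", "N577R", "K583R"],
    "ErCas12a": ["K169R", "D529R", "K535R"],
}


def corresponding_substitution(ortholog: str, as_label: str) -> str:
    """Look up the substitution recited as corresponding to an AsCas12a label."""
    if ortholog not in CORRESPONDING:
        raise KeyError(f"no correspondences recorded for {ortholog!r}")
    if as_label not in AS_REFERENCE:
        raise KeyError(f"{as_label!r} is not one of {AS_REFERENCE}")
    return CORRESPONDING[ortholog][AS_REFERENCE.index(as_label)]


print(corresponding_substitution("FnCas12a", "S542R"))  # N607R
print(corresponding_substitution("ErCas12a", "K548R"))  # K535R
```

The table is limited to the three activity-related positions because the foregoing examples recite a full cross-orthologue correspondence only for those positions.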
It is to be understood that while the disclosure has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the present disclosure, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Applications 63/283,690, filed Nov. 29, 2021, 63/283,770, filed Nov. 29, 2021, 63/283,965, filed Nov. 29, 2021, 63/301,953, filed Jan. 21, 2022, 63/301,955, filed Jan. 21, 2022, and 63/301,956, filed Jan. 21, 2022, the contents of each of which are hereby incorporated by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/080510 | 11/28/2022 | WO | |

Number | Date | Country |
---|---|---|
63301953 | Jan 2022 | US |
63301955 | Jan 2022 | US |
63301956 | Jan 2022 | US |
63283690 | Nov 2021 | US |
63283770 | Nov 2021 | US |
63283965 | Nov 2021 | US |