The present invention relates to the field of gene-editing.
Researchers are aggressively exploring the use of Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) systems in order to modify DNA. To date, the vast majority of the work in this field has been in Cas9 systems. In these systems, a tracrRNA (trans-activating CRISPR RNA) and a crRNA (CRISPR RNA) hybridize to recruit a Cas9 protein and then direct the Cas9 protein to a DNA location that is complementary to a sequence within the crRNA. The complementary sequence within the DNA thus becomes a target site, and the Cas9 protein may, based on its functional domain, cause editing at this target site.
Despite the now well-recognized power of the Cas9 systems, those systems are not effective in all applications. Among the limitations of Cas9 systems are that the functional domains upon which the Cas9 systems can act are defined by the functional domain of the Cas9 protein that one uses and that the use of both a tracrRNA and a crRNA can be cumbersome.
Other Cas proteins are known. Among these other Cas proteins, the potential of which has not been fully explored, are those within the Type V family, particularly those that do not require the presence of a tracrRNA to function. Within these systems, one may use a single guide RNA (gRNA) that contains a crRNA sequence. This crRNA sequence can associate with the Cas protein of interest without needing to be associated with a tracrRNA. The absence of a need for a tracrRNA provides an underexplored possibility of developing improved gRNAs as well as complexes and systems that incorporate and use them.
The present invention provides novel and non-obvious gRNA-ligand binding complexes, base editing complexes, and methods for base editing. Through the use of various embodiments of the present invention, one may be able to efficiently and effectively cause base editing ex vivo, in vitro, and in vivo. Further, some embodiments of the present invention provide modular designs that allow for the same Type V Cas protein to be directed to different targeting sites and optionally associated with different effector proteins at the same or different sites.
According to a first embodiment, the present invention provides a gRNA-ligand binding complex, wherein the gRNA-ligand binding complex comprises: (a) a gRNA, wherein the gRNA is 35 to 60 or 36 to 60 nucleotides long and the gRNA has a crRNA sequence, wherein the crRNA sequence is 35 to 60 or 36 to 60 nucleotides long and the crRNA sequence comprises a Cas association region, wherein the Cas association region is 14 to 37 or 18 to 30 nucleotides long and a targeting region, wherein the targeting region is 14 to 37 or 18 to 30 or 18 to 20 nucleotides long and the Cas association region is capable of retaining association with an RNA binding domain of a Type V Cas protein in the absence of a tracrRNA; and (b) a ligand binding moiety, wherein the ligand binding moiety is either (i) directly bound to the gRNA, or (ii) bound to the gRNA through a linker. In one embodiment, the gRNA of the gRNA-ligand binding complex comprises or consists essentially of a chemically modified or unmodified sequence that is or encodes SEQ ID NO: 137.
According to a second embodiment, the present invention provides a base editing complex comprising: a gRNA-ligand binding complex of the present invention and a Type V Cas protein, wherein the Cas association region of the gRNA-ligand binding complex is associated with the Type V Cas protein. Optionally, the ligand binding moiety is reversibly associated with a ligand that is attached to or a part of an effector molecule.
According to a third embodiment, the present invention provides a method for base editing. The method comprises exposing a base editing complex of the present invention to double stranded DNA (“dsDNA”) or single stranded DNA (“ssDNA”). The base editing complex may be exposed to the dsDNA or ssDNA under conditions that permit base editing.
When an effector is attached to (or contains) a ligand, the system has a modular design. The presence of the ligand binding moiety within the gRNA-ligand binding complex allows that complex to associate with the corresponding ligand associated with (or contained within) the effector. Thus, the ligand binding moiety is associated with the gRNA in a manner and orientation that allows it to be capable of associating with a ligand. Similarly, the ligand is attached to or associated with the effector in a manner that renders it capable of reversibly associating with the ligand binding moiety.
When the ligand and the ligand binding moiety are associated with each other, the effector that is associated with the ligand will become part of any base editing complex that contains the gRNA-ligand binding complex. When the base editing complex also contains a Cas protein, that Cas protein and the effector can be retained in the same locality, e.g., at or near a target site of interest.
Thus, if one wishes to use a particular effector with the Cas protein, one only needs to associate that effector with the ligand that is capable of reversibly associating with the ligand binding moiety that is part of the base editing complex that contains that Cas protein. To change the effector from one system to the next, one need only change the effector-ligand. Consequently, one can use the same gRNA-ligand binding complex and its associated Cas protein with a plurality of different effectors. The plurality of different effectors may be used sequentially in the same system by associating and dissociating their ligands with the ligand binding moieties or simultaneously or sequentially in different systems.
Reference will now be made in detail to various embodiments of the present invention, examples of which are illustrated in the accompanying figures. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, unless otherwise indicated or implicit from context, the details are intended to be examples and should not be deemed to limit the scope of the invention in any way. Additionally, features described in connection with the various or specific embodiments are not to be construed as not appropriate for use in connection with other embodiments disclosed herein unless such exclusivity is explicitly stated or implicit from context.
Headers are provided herein for the convenience of the reader and do not limit the scope of any of the embodiments disclosed herein.
Definitions
Unless otherwise stated or apparent from context, the following terms shall have the meanings set forth below:
The phrase “2′ modification” refers to a nucleotide unit having a sugar moiety that is modified at the 2′ position of the sugar moiety. Examples of a 2′ modification include a 2′-O-alkyl modification, which forms a 2′-O-alkyl modified nucleotide, and a 2′ halogen modification, which forms a 2′ halogen modified nucleotide.
The phrase “2′-O-alkyl modified nucleotide” refers to a nucleotide unit having a sugar moiety, for example, a deoxyribosyl or ribosyl moiety, that is modified at the 2′ position such that an oxygen atom is attached both to the carbon atom located at the 2′ position of the sugar and to an alkyl group. In various embodiments, the alkyl moiety consists of or consists essentially of carbon(s) and hydrogens. When the O moiety and the alkyl group to which it is attached are viewed as one group, they may be referred to as an O-alkyl group, e.g., —O-methyl, —O-ethyl, —O-propyl, —O-isopropyl, —O-butyl, —O-isobutyl, —O-ethyl-O-methyl (—OCH2CH2OCH3), and —O-ethyl-OH (—OCH2CH2OH). A 2′-O-alkyl modified nucleotide may be substituted or unsubstituted.
The phrase “2′ halogen modified nucleotide” refers to a nucleotide unit having a sugar moiety, for example, a deoxyribosyl moiety, that is modified at the 2′ position such that the carbon at that position is directly attached to a halogen species, e.g., F, Cl, or Br.
A “ligand binding moiety” refers to a moiety, such as an aptamer (e.g., an oligonucleotide or peptide) or another compound, that binds to a specific ligand and can be reversibly or irreversibly associated with that ligand.
The term “modified nucleotide” refers to a nucleotide having at least one modification in the chemical structure of the base, sugar and/or phosphate, including, but not limited to, 5-position pyrimidine modifications, 8-position purine modifications, modifications at cytosine exocyclic amines, and substitution of 5-bromo-uracil or 5-iodouracil; and 2′-modifications, including but not limited to, sugar-modified ribonucleotides in which the 2′-OH is replaced by a group such as an H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN, wherein R is an alkyl.
Modified bases refer to nucleotide bases such as, for example, adenine, guanine, cytosine, thymine, uracil, xanthine, inosine, and queuosine that have been modified by the replacement or addition of one or more atoms or groups. Some examples of these types of modifications include, but are not limited to, alkylated, halogenated, thiolated, aminated, amidated, or acetylated bases, alone and in various combinations. More specific modified bases include, for example, 5-propynyluridine, 5-propynylcytidine, 6-methyladenine, 6-methylguanine, N,N,-dimethyladenine, 2-propyladenine, 2-propylguanine, 2-aminoadenine, 1-methylinosine, 3-methyluridine, 5-methylcytidine, 5-methyluridine and other nucleotides having a modification at the 5 position, 5-(2-amino)propyluridine, 5-halocytidine, 5-halouridine, 4-acetylcytidine, 1-methyladenosine, 2-methyladenosine, 3-methylcytidine, 6-methyluridine, 2-methylguanosine, 7-methylguanosine, 2,2-dimethylguanosine, 5-methylaminoethyluridine, 5-methyloxyuridine, deazanucleotides such as 7-deaza-adenosine, 6-azouridine, 6-azocytidine, 6-azothymidine, 5-methyl-2-thiouridine, other thio bases such as 2-thiouridine and 4-thiouridine and 2-thiocytidine, dihydrouridine, pseudouridine, queuosine, archaeosine, naphthyl and substituted naphthyl groups, any O— and N-alkylated purines and pyrimidines such as N6-methyladenosine, 5-methylcarbonylmethyluridine, uridine 5-oxyacetic acid, pyridine-4-one, pyridine-2-one, phenyl and modified phenyl groups such as aminophenol or 2,4,6-trimethoxy benzene, modified cytosines that act as G-clamp nucleotides, 8-substituted adenines and guanines, 5-substituted uracils and thymines, azapyrimidines, carboxyhydroxyalkyl nucleotides, carboxyalkylaminoalkyl nucleotides, and alkylcarbonylalkylated nucleotides. Modified nucleotides also include those nucleotides that are modified with respect to the sugar moiety, as well as nucleotides having sugars or analogs thereof that are not ribosyl. 
For example, the sugar moieties may be, or be based on, mannoses, arabinoses, glucopyranoses, galactopyranoses, 4-thioribose, and other sugars, heterocycles, or carbocycles.
The phrase “codes for” and the term “encodes” mean that one sequence contains a sequence that is identical to a referenced nucleotide sequence, a DNA or RNA equivalent of the referenced nucleotide sequence, or a DNA or RNA complement of the referenced nucleotide sequence. Thus, when one refers to a sequence that codes for or encodes a recited DNA sequence, one refers to a sequence that, unless otherwise specified, is any one of the following: the same DNA sequence, a complement of the DNA sequence, the RNA equivalent of that sequence, or the RNA complement of that sequence, or any of the aforementioned in which one or more ribonucleotides is substituted for its deoxyribonucleotide counterpart or one or more deoxyribonucleotides is substituted for its ribonucleotide counterpart.
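By way of non-limiting illustration, the four basic forms named in this definition can be enumerated for a short DNA sequence. The following is a minimal sketch only: the function name `encoding_forms` is hypothetical, the complement is taken base by base without reversing strand orientation (an assumption, since the definition does not specify orientation), and the mixed ribonucleotide/deoxyribonucleotide variants are not modeled.

```python
# Illustrative sketch: the function name and the base-by-base complement
# convention are assumptions, not part of the specification.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def encoding_forms(dna):
    """Return the four basic forms that 'encode' a recited DNA sequence."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("expected an unmodified DNA sequence (A, C, G, T)")
    dna_complement = dna.translate(DNA_COMPLEMENT)
    return {
        "same DNA": dna,
        "DNA complement": dna_complement,
        # The RNA equivalent replaces thymine with uracil.
        "RNA equivalent": dna.replace("T", "U"),
        "RNA complement": dna_complement.replace("T", "U"),
    }


forms = encoding_forms("ATGC")
for label, sequence in forms.items():
    print(f"{label}: {sequence}")
```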
The term “complementarity” refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence by either traditional Watson-Crick base-pairing or other non-traditional types. A percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%, over a region of for example, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more consecutive nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
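The percent complementarity arithmetic described above (e.g., 5 out of 10 residues being 50% complementary) can be sketched as follows. This is an illustrative example only: it counts Watson-Crick pairs exclusively, whereas the definition also admits non-traditional pairing, and the function name and the convention that the second strand is written 3′ to 5′ are assumptions made for the sketch.

```python
# Watson-Crick partners for RNA and DNA letters (an illustrative simplification;
# non-traditional pairs are deliberately left out).
WC_PAIRS = {
    ("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(strand, other):
    """Percentage of residues in `strand` that can pair with `other`.

    `strand` is written 5'->3' and `other` is written 3'->5', so position i
    of one strand faces position i of the other.
    """
    if len(strand) != len(other):
        raise ValueError("strands must be the same length")
    if not strand:
        raise ValueError("strands must not be empty")
    paired = sum(
        (a, b) in WC_PAIRS for a, b in zip(strand.upper(), other.upper())
    )
    return 100.0 * paired / len(strand)


# All ten residues pair; in the second call only the first five do.
full = percent_complementarity("ACGUACGUAC", "UGCAUGCAUG")
half = percent_complementarity("ACGUACGUAC", "UGCAUCGUAC")
print(full, half)
```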
The terms “hybridization” and “hybridizing” refer to a process in which completely, substantially, or partially complementary nucleic acid strands come together under specified hybridization conditions to form a double-stranded structure or region in which the two constituent strands are joined by hydrogen bonds. Unless otherwise stated, the hybridization conditions are naturally occurring or lab designed conditions. Although hydrogen bonds typically form between adenine and thymine or uracil (A and T or U) or between cytosine and guanine (C and G), other base pairs may form (see e.g., Adams et al., The Biochemistry of the Nucleic Acids, 11th ed., 1992).
The term “nucleotide” refers to a ribonucleotide or a deoxyribonucleotide or modified form thereof, as well as an analog thereof. Nucleotides include species that comprise purines, e.g., adenine, hypoxanthine, guanine, and their derivatives and analogs, as well as pyrimidines, e.g., cytosine, uracil, thymine, and their derivatives and analogs. Preferably, a nucleotide comprises a cytosine, uracil, thymine, adenine, or guanine moiety. Further, the term nucleotide also includes those species that have a detectable label, such as for example a radioactive or fluorescent moiety, or mass label attached to the nucleotide. The term nucleotide also includes what are known in the art as universal bases. By way of example, universal bases include but are not limited to 3-nitropyrrole, 5-nitroindole, or nebularine. Nucleotide analogs are, for example, meant to include nucleotides with bases such as inosine, queuosine, xanthine, sugars such as 2′-methyl ribose, and non-natural phosphodiester internucleotide linkages such as methylphosphonates, phosphorothioates, phosphoroacetates and peptides.
The terms “subject” and “patient” are used interchangeably herein to refer to an organism, e.g., a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets such as dogs and cats. The tissues, cells and their progeny of an organism or other biological entity obtained in vivo or cultured in vitro are also encompassed within the terms subject and patient. Additionally, in some embodiments, a subject may be an invertebrate animal, for example, an insect or a nematode; while in others, a subject may be a plant or a fungus.
As used herein, “treatment,” “treating,” “palliating,” and “ameliorating” are used interchangeably. These terms refer to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the complexes of the present invention may be administered to a subject, or a subject's cells or tissues, or those of another subject extracorporeally before re-administration, at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, condition, or symptom, even though the disease, condition, or symptom might not have yet been manifested.
As disclosed herein, a number of ranges of values are provided. It is understood that each intervening value, to the tenth of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither, or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
The term “about” generally refers to plus or minus 10% of the indicated number. For example, “about 10%” may indicate a range of 9% to 11%, and “about 20” may mean from 18-22. Other meanings of “about” may be apparent from the context, such as rounding off; for example “about 1” may also mean from 0.5 to 1.4.
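The general plus or minus 10% meaning of “about” reduces to simple arithmetic, sketched below. The function name `about_range` and its `tolerance` parameter are hypothetical, and the sketch covers only the percentage meaning; the context-dependent rounding meaning (e.g., “about 1” meaning from 0.5 to 1.4) is not modeled.

```python
def about_range(value, tolerance=0.10):
    """Return the (low, high) interval spanned by 'about value'.

    Only the general plus-or-minus-10% meaning is modeled here.
    """
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


# "about 20" spans roughly 18 to 22; "about 10%" spans roughly 9% to 11%.
low_20, high_20 = about_range(20)
low_10, high_10 = about_range(10)
print(low_20, high_20, low_10, high_10)
```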
Discussion
According to a first embodiment, the present invention comprises a gRNA-ligand binding complex that comprises, consists essentially of, or consists of both a gRNA and a ligand binding moiety. This complex has the ability to retain association with a Type V Cas protein in the absence of a tracrRNA. Within the gRNA-ligand binding complex, the gRNA may be covalently bound directly to the ligand binding moiety or bound to the ligand binding moiety through a linker.
gRNA
The gRNA of the gRNA-ligand binding complex is a single strand of nucleotides. The nucleotides may be entirely RNA or a combination of ribonucleotides and other nucleotides such as deoxyribonucleotides. Each nucleotide may be unmodified, or one or more nucleotides may be modified, e.g., with one of the following modifications: 2′-O-methyl, 2′ fluoro or 2′aminopurine. In some embodiments over one or more ranges of one to forty or two to twenty or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, or 36 nucleotides, there are consecutively modified nucleotides or a modification pattern of every second, or every third or every fourth nucleotide being modified at its 2′ position with all other nucleotides being unmodified. Additionally or alternatively, between one or more pairs or every pair of consecutive nucleotides, there may be modified or unmodified internucleotide linkages.
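The modification patterns described above (consecutive, every second, every third, or every fourth nucleotide within a range) can be illustrated with a short sketch. The function names, the 0-based indexing, the choice to modify the first nucleotide of the range, and the `m` prefix used to mark a modified nucleotide are all assumptions made for illustration; they are not notation used elsewhere herein.

```python
def modification_pattern(sequence_length, start, span, every=1):
    """Return 0-based indices of the nucleotides to modify.

    every=1 gives consecutive modification; every=2, 3 or 4 modifies every
    second, third or fourth nucleotide, beginning with the first nucleotide
    of the range (an assumption of this sketch).
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    if start < 0 or span < 1 or start + span > sequence_length:
        raise ValueError("range must lie within the sequence")
    return list(range(start, start + span, every))


def annotate(sequence, positions, tag="m"):
    """Render the sequence with a hypothetical tag before each modified nucleotide."""
    marked = set(positions)
    return "".join(
        tag + nt if i in marked else nt for i, nt in enumerate(sequence)
    )


# Every second nucleotide over the first six positions of a ten-nucleotide strand.
positions = modification_pattern(10, start=0, span=6, every=2)
print(positions)
print(annotate("ACGUAC", positions))
```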
In some embodiments, the gRNA is 35 to 60 or 36 to 60 nucleotides long or 40 to 55 nucleotides long. The gRNA has a sequence that may consist of, consist essentially of or comprise a crRNA sequence. Within the crRNA sequence are a Cas association region, which also may be referred to as the repeat region, that is 14 to 37 or 18 to 30 nucleotides long or 20 to 25 nucleotides long and a targeting region, which also may be referred to as a spacer region, that is 14 to 37 or 18 to 30 nucleotides long or 20 to 25 nucleotides long.
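A candidate design may be checked against the broadest length ranges recited above with a sketch such as the following. The `CrRNA` class and `length_problems` function are hypothetical names, and the sketch assumes, for simplicity, that the gRNA consists of the crRNA sequence with the Cas association region 5′ of the targeting region. The narrower ranges (e.g., 20 to 25 nucleotides) could be substituted in the constants.

```python
from dataclasses import dataclass

# Length ranges in nucleotides, taken from the broadest ranges recited above.
GRNA_RANGE = (35, 60)
CAS_ASSOCIATION_RANGE = (14, 37)
TARGETING_RANGE = (14, 37)


@dataclass(frozen=True)
class CrRNA:
    """Hypothetical container: Cas association region 5' of the targeting region."""
    cas_association: str
    targeting: str

    @property
    def sequence(self):
        return self.cas_association + self.targeting


def length_problems(crrna):
    """Return human-readable range violations (an empty list if none)."""
    checks = [
        ("gRNA", len(crrna.sequence), GRNA_RANGE),
        ("Cas association region", len(crrna.cas_association), CAS_ASSOCIATION_RANGE),
        ("targeting region", len(crrna.targeting), TARGETING_RANGE),
    ]
    return [
        f"{name} is {n} nt; expected {low} to {high} nt"
        for name, n, (low, high) in checks
        if not low <= n <= high
    ]


# Placeholder sequences: only the lengths matter for this check.
ok = length_problems(CrRNA("A" * 20, "C" * 23))
too_short = length_problems(CrRNA("A" * 10, "C" * 20))
print(ok)
print(too_short)
```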
The targeting region contains the targeting sequence, which is a variable sequence that may be selected based on where one wishes for the Cas protein and/or effector to cause base editing. Thus, the targeting region may be designed to include a region that is complementary and capable of hybridization to a pre-selected target site of interest. For example, the region of complementarity between the targeting region and the corresponding target site sequence may be about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more than 25 consecutive nucleotides in length or it may be at least 80%, at least 85%, at least 90%, or at least 95% complementary to a region of DNA over 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more than 25 consecutive nucleotides.
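The selection of a target site by complementarity can be sketched as a scan of a DNA strand for windows that meet a chosen percent complementarity with the targeting region. This is a simplified illustration only: the function name and threshold parameter are hypothetical, only Watson-Crick RNA:DNA pairs are counted, and requirements that a given Cas protein may impose on a target site are not modeled.

```python
# Partner in DNA for each RNA base (Watson-Crick pairs only).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def find_target_sites(targeting_region, dna_strand, min_percent=80.0):
    """Return (start, percent) for each DNA window meeting the threshold.

    Both inputs are written 5'->3'. Hybridization is antiparallel, so each
    DNA window is reversed before it is compared with the targeting region.
    """
    rna = targeting_region.upper()
    dna = dna_strand.upper()
    n = len(rna)
    if n == 0:
        raise ValueError("targeting region must not be empty")
    hits = []
    for start in range(len(dna) - n + 1):
        facing = dna[start:start + n][::-1]
        matches = sum(
            RNA_TO_DNA_PARTNER.get(r) == d for r, d in zip(rna, facing)
        )
        percent = 100.0 * matches / n
        if percent >= min_percent:
            hits.append((start, percent))
    return hits


# A perfectly complementary site embedded at position 4 of a short DNA strand.
perfect = find_target_sites("ACGUACGUAC", "TTTTGTACGTACGTAAAA", min_percent=100.0)
# The same site with one mismatched base still meets a 90% threshold.
one_mismatch = find_target_sites("ACGUACGUAC", "TTTTGTACGTACGAAAAA", min_percent=90.0)
print(perfect, one_mismatch)
```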
The Cas association region of the gRNA is designed such that it is capable of retaining association with an RNA binding domain of a Type V Cas protein in the absence of a tracrRNA. (Not all nucleotides within the Cas association region need directly associate with the Cas protein.) Preferably, this association is possible under both naturally occurring conditions and under laboratory conditions in which the complex is to be used. In some embodiments, the gRNA has or encodes one of the following sequences:
The downstream portion of the crRNA sequence, shown as N16-30 in SEQ ID NO: 1 to SEQ ID NO: 12 or SEQ ID NO: 67 to SEQ ID NO: 71, corresponds to the targeting region, and the sequence upstream of that portion corresponds to the Cas association region. N refers to any modified or unmodified nucleotide. In SEQ ID NO: 1 to 12, N14-30 means that there can be 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. In some embodiments, rather than there being 14 to 40 N nucleotides, there are 14 to 37 N nucleotides. In some embodiments, N is 16 to 30. In some embodiments, the Cas association region has a sequence that is at least 80%, at least 85%, at least 90%, or at least 95% similar to or the same as the Cas association region (the region upstream of the Ns) of SEQ ID NO: 1 to SEQ ID NO: 12 or SEQ ID NO: 67 to SEQ ID NO: 71 or of a wildtype crRNA in a naturally occurring condition or endogenous to a naturally occurring or genetically modified organism.
Ligand Binding Moiety
The ligand binding moiety is an element that is capable of reversibly associating with a ligand by for example, forming non-covalent interactions. In some embodiments, the ligand binding moiety is an aptamer. The ligand binding moiety may be bound to the gRNA directly, e.g., through a covalent bond, or through a linker. The association of the ligand binding moiety with the gRNA, regardless of whether directly through a covalent bond or through a linker, may be at any of a number of locations. A ligand binding moiety is bound directly to a gRNA if it is bound to a nucleotide within the gRNA, e.g., to the backbone phosphate of a unit or to a sugar moiety or to a nitrogenous base of a nucleotide.
By way of non-limiting examples, the ligand binding moiety may be bound directly (through e.g., a covalent bond) to the 3′ end of the gRNA or to the 5′ end of the gRNA. Thus, the ligand binding moiety may be bound to the first or last nucleotide in the gRNA. When the ligand binding moiety is a nucleotide sequence and it is bound directly to the 5′ end or the 3′ end of the gRNA, it may be in the same 5′ to 3′ orientation as the gRNA. In these circumstances, there is a continuous strand of nucleotides that contains both the ligand binding moiety and the gRNA either
The ligand binding moiety may also be attached to the gRNA at a position other than the 5′ end or the 3′ end. When the ligand binding moiety is a nucleotide sequence it may be inserted in the gRNA, and thus there may for example be a first section of the gRNA that is 5′ of the ligand binding moiety and a second section of the gRNA that is 3′ of the ligand binding moiety such that there is one oligonucleotide sequence:
In some embodiments, when the ligand binding moiety is not a nucleotide sequence, it may be attached at either the 5′ or 3′ end of the gRNA to the phosphorous moiety, the sugar (at, e.g., the 2′, 3′ or 5′ position) or the nitrogenous base. In other embodiments when the ligand binding moiety is not a nucleotide sequence, it may be bound to the gRNA at a location other than the 5′ or 3′ end of the gRNA, for example, it may be bound between two consecutive nucleotides as follows:
In some embodiments, one or more linkers binds the ligand binding moiety to the gRNA. In these embodiments, the linker and the ligand binding moiety each may independently comprise, consist essentially of, or consist of nucleotides. In some embodiments, each of the linker and the ligand binding moiety may independently comprise, consist essentially of, or consist of a moiety other than nucleotides. In some embodiments, one of the linker and the ligand binding moiety comprises, consists essentially of, or consists of a moiety other than nucleotides, while the other of the linker and the ligand binding moiety comprises, consists essentially of, or consists of nucleotides.
When the ligand binding moiety is a nucleotide sequence and it is bound through a linker that is also a nucleotide sequence to the 5′ end or the 3′ end of the gRNA, each of the gRNA, the linker and ligand binding moiety may be in the same 5′ to 3′ orientation. In these circumstances, there is a continuous strand of nucleotides that contains both the ligand binding moiety and the gRNA either
The ligand binding moiety may also be attached to the gRNA through a linker or two linkers at a position other than the 5′ end or the 3′ end. When the ligand binding moiety and the linker(s) are nucleotide sequences, they may be inserted in the gRNA, and thus there may, for example, be a first section of the gRNA that is 5′ of the ligand binding moiety and a second section of the gRNA that is 3′ of the ligand binding moiety. There may also be one or two linker sequences.
When there is only one linker sequence it may be either 5′ or 3′ of the ligand binding moiety such that the complex is
When there are two linker sequences, a first linker may be 5′ of the ligand binding moiety and the second linker may be 3′ of the ligand binding moiety such that the complex is
In some embodiments, each of the first section of gRNA, the first linker, the ligand binding moiety, the second linker, and the second section of gRNA are nucleotide sequences in the same orientation. In other embodiments, one or more of the first linker, ligand binding moiety and the second linker are in the opposite orientation to that of the first section of gRNA and the second section of gRNA, which are in the same orientation.
When the ligand binding moiety is between the first section of the gRNA and the second section of the gRNA (and if one or two linkers are present they are also between the first section of the gRNA and the second section of the gRNA), in some embodiments, the first section of the gRNA contains the entire Cas association region and the second section of the gRNA contains the entire targeting region. In other embodiments, the first section of the gRNA contains the entire Cas association region and a portion of the targeting region, while the second section of the gRNA contains the remainder of the targeting region. In other embodiments, the first section of the gRNA contains a portion of the Cas association region, while the second section of the gRNA contains the remainder of the Cas association region and the entire targeting region. Relative to a gRNA that does not contain the ligand binding moiety, in a complex in which the ligand binding moiety is inserted into the gRNA, there may be no deletion of nucleotides from either the Cas association region or the targeting region. Alternatively, there may be a deletion of one or more nucleotides (e.g., 1 to 10 nucleotides) at the location of insertion.
When there are two linkers present, they may be of sufficient complementarity such that they can hybridize to each other. For example, each linker may be 1 to 20 nucleotides long and the linkers may be at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or 100% complementary and have no bulges or one or more bulges.
When the linker is not a nucleotide sequence, it may be bound to the 5′ most nucleotide within the gRNA, the 3′ most nucleotide within the gRNA or a nucleotide other than the 5′ most nucleotide or the 3′ most nucleotide within the gRNA. Further, a linker that is not a nucleotide or oligonucleotide may be attached at any position of a sugar or nitrogenous base or be attached to or replace an internucleotide linkage. Additionally, in some embodiments, there are two non-nucleotide linkers or one nucleotide linker and one non-nucleotide linker.
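Where the ligand binding moiety and any linkers are nucleotide sequences in the same orientation as the gRNA, the assembly of the continuous strand described above can be sketched as a string operation. The function and parameter names are hypothetical; the sketch covers insertion at any position (position 0 or the full length giving a 5′ or 3′ extension), zero, one, or two flanking linkers, and an optional deletion at the insertion site. Opposite-orientation inserts and non-nucleotide moieties are not modeled.

```python
def insert_ligand_binding_moiety(
    grna, moiety, position, linker_5p="", linker_3p="", deleted=0
):
    """Return first gRNA section + linker + moiety + linker + second gRNA section.

    All sequences are written 5'->3'. `position` is the number of gRNA
    nucleotides kept 5' of the insert; `deleted` is the number of gRNA
    nucleotides removed at the insertion site (0 means no deletion).
    """
    if not 0 <= position <= len(grna):
        raise ValueError("position must lie within the gRNA")
    if deleted < 0 or position + deleted > len(grna):
        raise ValueError("deletion must lie within the gRNA")
    first_section = grna[:position]
    second_section = grna[position + deleted:]
    return first_section + linker_5p + moiety + linker_3p + second_section


# Placeholder sequences chosen only to make the sections easy to see.
no_linker = insert_ligand_binding_moiety("AAAAACCCCC", "GG", position=5)
two_linkers = insert_ligand_binding_moiety(
    "AAAAACCCCC", "GG", position=5, linker_5p="U", linker_3p="U", deleted=1
)
print(no_linker, two_linkers)
```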
In some embodiments, the gRNA forms a loop and the ligand binding moiety or the linker if present is bound to the loop. When bound to the loop of the gRNA, either directly or through a linker, the bonding may, for example, be at the first nucleotide in the loop, the second nucleotide in the loop, the third nucleotide in the loop, the fourth nucleotide in the loop, the center nucleotide in the loop if the loop has an odd number of nucleotides or one of the two center most nucleotides in the loop if the loop has an even number of nucleotides, or the last nucleotide in the loop. Any one or more of the aforementioned nucleotides and/or the 5′ and/or 3′ internucleotide linkage corresponding to them may be modified. These modifications may, for example, occur where the ligand binding moiety is bound to the gRNA (directly or through a linker) or only at locations other than where the ligand binding moiety is bound to the gRNA (directly or through a linker). For example, the ligand binding moiety can be attached to a 2′ position of a sugar or attached to a nitrogenous base in the gRNA oligonucleotide sequence.
In some embodiments, the ligand binding moiety comprises, consists essentially of, or consists of an oligonucleotide sequence that is unmodified or comprises one or more modified nucleotides. For example, the ligand binding moiety may be 10 to 50 or 18 to 50 nucleotides long. In one embodiment, the ligand binding moiety comprises, consists essentially of, or consists of SEQ ID NO: 13 (ACAUGAGGAUCACCCAUGU) or a sequence that is substantially similar to SEQ ID NO: 13. In some embodiments the ligand binding moiety forms a stem-loop structure. If there is no linker present, the ligand binding moiety may appear as an extension of the gRNA sequence immediately 5′ or 3′ of the gRNA or as an insert in the gRNA.
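The capacity of SEQ ID NO: 13 to form a stem-loop can be illustrated by counting how many nucleotides at its 5′ end can pair with the antiparallel nucleotides at its 3′ end. This is a minimal sketch, not a structure prediction: the function name is hypothetical, only Watson-Crick pairs are counted (G-U wobble pairs are ignored), and counting stops at the first unpaired position.

```python
SEQ_ID_NO_13 = "ACAUGAGGAUCACCCAUGU"

# Watson-Crick RNA pairs only; wobble pairs are ignored in this sketch.
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def terminal_stem_length(rna):
    """Count consecutive base pairs formed between the 5' and 3' ends."""
    i, j = 0, len(rna) - 1
    pairs = 0
    while i < j and (rna[i], rna[j]) in RNA_PAIRS:
        pairs += 1
        i += 1
        j -= 1
    return pairs


stem = terminal_stem_length(SEQ_ID_NO_13)
# The 5' ACAUG pairs with the 3' CAUGU, giving a five-base-pair terminal stem.
print(len(SEQ_ID_NO_13), stem)
```

At 19 nucleotides, the sequence also falls within the 10 to 50 and 18 to 50 nucleotide ranges given above.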
In some embodiments, the ligand binding moiety comprises, consists essentially of, or consists of biotin or streptavidin.
In some embodiments the ligand binding moiety can attach covalently or non-covalently.
In some embodiments, the ligand binding moiety is selected from the group consisting of moieties that associate with the following ligands: MS2 coat protein (MCP), Ku, PP7 coat protein (PCP), Com RNA binding protein or the binding domain thereof, SfMu, Sm7, Tat, Glutathione S-transferase (GST), CSY4, Qbeta, COM, pumilio, Anti-His Tag (6H7), SNAP-Tag, lambdaN22, a lectin (in which case the ligand binding moiety may be a carbohydrate, glycan, or oligosaccharide), and PDGF beta-chain. In some embodiments, the ligand binding moiety is an aptamer that comprises deoxyribonucleotides, ribonucleotides or a combination of both. Therefore, as non-limiting examples, one may use DNA aptamers, RNA aptamers, DNA aptamers with modified nucleosides in the backbones, RNA aptamers with modified nucleosides in the backbones and combinations thereof.
In some embodiments, a naturally occurring MS2 aptamer is used as the ligand binding moiety. In other embodiments, one uses an MS2 C-5 mutant or an MS2 F-5 mutant or a modified MS2, e.g., MS2 in which there are one or more, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12, modified nucleotides, such as a 2-aminopurine at position 10, wherein position 10 is the tenth nucleotide from the 5′ end of an aptamer. The 2-aminopurine may, for example, be 2′ deoxy-2-aminopurine or 2′ ribose 2-aminopurine. The modification at any one position may be in addition to a modification at another position or to the exclusion of a modification at any or all of the other positions.
In some embodiments, the ligand binding moiety is an aptamer that comprises a 5′ modified nucleotide, wherein the 5′ modified nucleotide comprises at least one of a 2′ modification, a 5′ PO4 group, or a modification of the nitrogenous base.
In some embodiments, the ligand binding moiety is an aptamer that is or comprises one part of an aptamer-ligand pair and, as discussed below, the effector is linked to or comprises the other part of the aptamer-ligand pair. For example, the aptamer may comprise an MS2 operator motif that specifically binds to an MS2 coat protein, MCP. As persons of ordinary skill in the art will appreciate, the aptamer can alternatively comprise the MCP moiety (or other ligand), in which case the effector would comprise or be linked to the MS2 operator motif (or other corresponding ligand binding moiety).
Linkers
A linker, when present, may be a species that connects the ligand binding moiety to the gRNA. It may be attached to each of the ligand binding moiety and the gRNA at one location or it may be attached to either or both of the gRNA and the ligand binding moiety at a plurality of locations. Attachments at a plurality of locations may allow for greater control in three dimensional space of the ligand binding moiety and in turn the effector to be used.
By way of non-limiting examples, a linker may attach to the gRNA at one location and to the ligand binding moiety at two or more locations; or the linker may attach to the ligand binding moiety at one location and to the gRNA at two or more locations. When the linker is attached to the gRNA at two or more locations, the linker may be attached to the gRNA exclusively in the targeting region or exclusively in the Cas association region or in both regions.
In some embodiments, the linker comprises, consists essentially of, or consists of an oligonucleotide sequence and optionally the linker comprises at least one or a plurality of 2′ modifications, e.g., all nucleotides are 2′ modified nucleotides within the linker. The nucleotide sequence may be random or intentionally designed not to be undesirably complementary to sequence within the aptamer, the gRNA or the target site of the DNA. In some embodiments in which there are two linkers, the two linkers flank the ligand binding moiety.
In some embodiments, the linker comprises, consists essentially of, or consists of at least one phosphorothioate linkage.
In some embodiments, the linker comprises, consists essentially of, or consists of a levulinyl moiety.
In some embodiments, the linker comprises, consists essentially of, or consists of an ethylene glycol moiety.
In some embodiments, the linker comprises or is selected from the group consisting of 18S, 9S or C3.
In some embodiments, the linker is a nucleotide sequence that is one to sixty or one to twenty-four or two to twenty or five to fifteen nucleotides long. Additionally, in some embodiments, the linker is GC rich, e.g., having at least 50%, at least 60%, at least 70%, at least 80% or at least 90% GC nucleotides. When a linker comprises nucleotides, it may, for example, be single stranded or double stranded or partially single stranded and partially double stranded. Additionally, when a linker is an oligonucleotide, the linker may be exclusively RNA, exclusively DNA or a combination thereof.
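The length and GC-content criteria for an oligonucleotide linker lend themselves to a simple screen. The following Python sketch is illustrative only; the function names and the default thresholds (taken from the broadest ranges recited above) are assumptions, and a practical design would also consider secondary structure and unwanted complementarity.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C nucleotides in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def is_candidate_linker(seq: str, min_len: int = 1, max_len: int = 60,
                        min_gc: float = 0.5) -> bool:
    """Check a linker against a length range and a minimum GC fraction."""
    return min_len <= len(seq) <= max_len and gc_fraction(seq) >= min_gc


# A five-nucleotide linker screened against the "five to fifteen nucleotides"
# range and the "at least 80% GC" threshold.
print(is_candidate_linker("GCGCA", min_len=5, max_len=15, min_gc=0.8))
```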
In some embodiments, the linker is a nucleotide sequence that is upstream or downstream of the ligand binding moiety. When the linker is upstream of a ligand binding moiety and the gRNA is upstream of the linker, there may be another sequence that is complementary to the linker that is downstream of the ligand binding moiety. Similarly, when the linker is downstream of a ligand binding moiety and the gRNA is downstream of the linker, there may be another sequence that is complementary to the linker that is upstream of the ligand binding moiety. As persons of ordinary skill in the art will recognize, complementarity is determined when the oligonucleotide self-folds and the strands align with each relevant section in a 5′ to 3′ direction.
Thus, in some embodiments, the ligand binding moiety, e.g., MS2 has an upstream sequence that is 1 to 12 nucleotides long and a downstream sequence that is 1 to 12 nucleotides long, wherein the upstream and downstream sequences immediately flank the ligand binding moiety (i.e., there are no other nucleotides between the ligand binding moiety and each of the upstream and downstream sequences) and the upstream sequence is complementary to the downstream sequence. In some embodiments, each of the upstream sequence and the downstream sequence is 1 nucleotide long, 2 nucleotides long, 3 nucleotides long, 4 nucleotides long, 5 nucleotides long, 6 nucleotides long, 7 nucleotides long, 8 nucleotides long, 9 nucleotides long, 10 nucleotides long, 11 nucleotides long, or 12 nucleotides long. In one embodiment each of the upstream sequence and the downstream sequence comprises or is GC. When there are both upstream and downstream sequences, they may also be referred to as extension sequences.
In some embodiments, each of the upstream and downstream sequences is two nucleotides long or three nucleotides long or four nucleotides long. In some embodiments, one of the upstream and downstream sequences is two nucleotides long and the other of the upstream sequence and the downstream sequence is four nucleotides long. In some embodiments, one or both of the linker sequences is or encodes GC or GCGC. In some embodiments, one of the upstream and downstream linkers is GC and the other of the upstream and downstream linkers is GCGC.
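The complementarity of upstream and downstream extension sequences can be illustrated with a short Python sketch. It assumes that the two flanks pair in an antiparallel orientation when the oligonucleotide folds back on itself, so that the upstream sequence must equal the reverse complement of the downstream sequence. The function names are hypothetical.

```python
# RNA complement table (A-U and G-C pairs only).
COMPLEMENT = str.maketrans("ACGU", "UGCA")

SEQ_ID_NO_13 = "ACAUGAGGAUCACCCAUGU"


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def flanks_can_pair(upstream: str, downstream: str) -> bool:
    """True when the upstream flank is fully complementary to the downstream flank."""
    return upstream == reverse_complement(downstream)


def add_extensions(moiety: str, upstream: str, downstream: str) -> str:
    """Place extension sequences immediately 5' and 3' of a ligand binding moiety."""
    return upstream + moiety + downstream


for flank in ("GC", "GCGC"):
    print(flank, flanks_can_pair(flank, flank))
print(add_extensions(SEQ_ID_NO_13, "GC", "GC"))
```

Both GC and GCGC are their own reverse complements, which is why the same sequence can serve as both the upstream and the downstream extension. A GC flank paired with a GCGC flank is only partially complementary, and the sketch reports it as not fully pairing.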
Modifications
In some embodiments, at least one of the gRNA or the ligand binding moiety is modified, or if a linker is present, at least one of the gRNA, the ligand binding moiety or the linker is modified. The modification refers to the introduction of a moiety or species that does not occur under naturally occurring conditions. Modifications may be used to increase one or both of stability and specificity. In some embodiments, stability is improved with respect to resistance to one or both of the active domain of the Cas protein (e.g., RuvC domain) and the active domain of one or more other enzymes within the system into which a complex of the present invention is introduced, including but not limited to any effector. The resistance may, in some embodiments, be caused by steric hindrance. In some embodiments, the modification(s) is/are located within and/or between one or more if not all of the nucleotides within the targeting region.
Specificity is improved when a modification reduces the likelihood of an off-target effect and/or increases the likelihood that a base editing complex of the present invention will reach its target site. Nucleotides may be modified at the ribose, phosphate linkage, and/or base moiety. For example, a phosphorothioate backbone may be used, at one, a plurality or all positions within the gRNA, the targeting region or the Cas association region and/or the ligand binding moiety and/or linker if present.
In some embodiments, the modification is the presence of one or more 2′ modified nucleotides (e.g., 2′-O-methyl or 2′-fluoro) and/or the presence of a phosphorothioate internucleotide linkage or the introduction of a 5′-PO4 group of the gRNA and/or ligand binding moiety.
When more than one modification is present, the modifications may, for example, all be in the targeting region; all be in the Cas association region; all be in the ligand binding moiety; all be in the linker if present; be in both the targeting region and the Cas association region; be in both the Cas association region and the ligand binding moiety; be in both the Cas association region and the linker if present; be in both the targeting region and the ligand binding moiety; be in both the targeting region and the linker if present; be in both the ligand binding moiety and the linker if present; be in all three of the Cas association region, the targeting region and the ligand binding moiety; be in the Cas association region, the targeting region and the linker if present; be in the Cas association region, the ligand binding moiety and the linker if present; be in the targeting region, the ligand binding moiety and the linker if present; or be in each of the Cas association region, the targeting region, the ligand binding moiety and the linker if present.
In some embodiments, there are one to sixty or one to thirty or one to ten or ten to twenty or twenty to thirty or thirty to forty or forty to fifty or fifty to sixty 2′ modifications. By way of non-limiting examples, the set of 2′ modifications may be located in the targeting region; the set of 2′ modifications may be located in the ligand binding moiety if the ligand binding moiety is or comprises an oligonucleotide sequence; or the set of 2′ modifications may be located in the Cas association region. The modifications may be on consecutive nucleotides or there may be one or more pairs of unmodified nucleotides between modified nucleotides in regular or irregular patterns. By way of a further non-limiting example, within a gRNA any one or more of positions 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 comprises a 2′-O-alkyl group, wherein the positions are measured from the 5′ end or the 3′ end of the gRNA.
In some embodiments, in addition to or in the absence of 2′ modified nucleotides there are modified internucleotide linkages such as a phosphorothioate linkage. Examples of modifications to the backbones of the gRNA, the aptamer (if an oligonucleotide), and the linker (if present and an oligonucleotide), include but are not limited to phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3′-alkylene phosphonates, 5′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, 5′ to 5′ or 2′ to 2′ linkage. Suitable oligonucleotides having inverted polarity comprise a single 3′ to 3′ linkage at the 3′-most internucleotide linkage, i.e., a single inverted nucleoside residue that may be abasic (the nucleobase is missing or has a hydroxyl group in place thereof). Various salts (such as, for example, potassium or sodium), mixed salts and free acid forms of the aforementioned internucleotide linkages are also included within the scope of the present invention.
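Because positions may be counted from either end of the gRNA, it can help to convert them to a single indexing scheme. The Python sketch below is illustrative only: the function names, the "5p"/"3p" labels and the "m" prefix used to mark a 2′ modified nucleotide are ad hoc conventions assumed for this example, and the sequence shown is a hypothetical placeholder.

```python
def modified_indices(length: int, positions: list[int], end: str = "5p") -> list[int]:
    """Convert 1-based positions counted from the 5' or 3' end to 0-based indices."""
    if end not in ("5p", "3p"):
        raise ValueError("end must be '5p' or '3p'")
    for pos in positions:
        if not 1 <= pos <= length:
            raise ValueError(f"position {pos} is outside the sequence")
    if end == "5p":
        indices = [pos - 1 for pos in positions]
    else:
        # Position 1 from the 3' end is the last nucleotide of the sequence.
        indices = [length - pos for pos in positions]
    return sorted(indices)


def annotate(seq: str, indices: list[int], mark: str = "m") -> str:
    """Prefix each modified nucleotide with a marker (ad hoc notation)."""
    marked = set(indices)
    return " ".join(mark + base if i in marked else base
                    for i, base in enumerate(seq))


# Hypothetical 8-nucleotide placeholder sequence, modified at positions 1-2
# counted from the 5' end and positions 1-2 counted from the 3' end.
seq = "AUGCAUGC"
both_ends = modified_indices(len(seq), [1, 2], "5p") + modified_indices(len(seq), [1, 2], "3p")
print(annotate(seq, both_ends))
```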
Also within the scope of the present invention is the use of polynucleotide backbones that do not include a phosphorus atom therein and instead have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These modifications include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.
In some embodiments, one or more of the parts of a complex has one to sixty or one to twenty or one to ten or ten to twenty or twenty to thirty or thirty to forty or forty to fifty or fifty to sixty phosphorothioate linkages. These phosphorothioate linkages may: all be in the Cas association region; all be in the ligand binding moiety; all be in the linker; be in both the targeting region and the Cas association region; be in both the Cas association region and the ligand binding moiety; be in both the Cas association region and the linker if present; be in both the targeting region and the ligand binding moiety; be in both the targeting region and the linker if present; be in both the ligand binding moiety and the linker if present; be in all three of the Cas association region, the targeting region and the ligand binding moiety; be in the Cas association region, the targeting region and the linker if present; be in the Cas association region, the ligand binding moiety and the linker if present; be in the targeting region, the ligand binding moiety and the linker if present; or be in each of the Cas association region, the targeting region, the ligand binding moiety and the linker if present.
Any nucleotide within a complex of the present invention may include one or more substituted sugar moieties. These nucleotides may comprise a sugar substituent group selected from: OH; H; F; O—, S—, or N-alkyl; O—, S—, or N-alkenyl; O—, S— or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. Particularly suitable are O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10. Other suitable nucleotides comprise a sugar substituent group selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. By way of a non-limiting example, a suitable modification includes 2′-methoxyethoxy (2′-O—CH2CH2OCH3, also known as 2′-O-(2-methoxyethyl) or 2′-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504) or another alkoxyalkoxy group. A further suitable modification includes 2′-dimethylaminooxyethoxy, i.e., an O(CH2)2ON(CH3)2 group, also known as 2′-DMAOE, and 2′-dimethylaminoethoxyethoxy (also known in the art as 2′-O-dimethyl-amino-ethoxy-ethyl or 2′-DMAEOE), i.e., 2′-O—CH2—O—CH2—N(CH3)2.
Other suitable sugar substituent groups include methoxy (—O—CH3), aminopropoxy (—OCH2CH2CH2NH2), allyl (—CH2—CH═CH2), —O-allyl (—O—CH2—CH═CH2) and fluoro (F). 2′-sugar substituent groups may be in the arabino (up) position or ribo (down) position. A suitable 2′-arabino modification is 2′-F. Similar modifications may also be made at other positions on the oligomeric compound, particularly the 3′ position of the sugar on the 3′ terminal nucleoside or in 2′-5′ linked oligonucleotides and the 5′ position of the 5′ terminal nucleotide.
Any nucleotide within a complex of the present invention may also include nucleobase (often referred to in the art simply as “base”) modifications or substitutions. Modified nucleobases include, but are not limited to other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (—C≡C—CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-amino-adenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Further modified nucleobases include, but are not limited to tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g. 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), pyridoindole cytidine (H-pyrido(3′,2′:4,5)pyrrolo(2,3-d)pyrimidin-2-one) and 5-methoxy uracil.
Heterocyclic base moieties may also include, but are not limited to, those in which the purine or pyrimidine base is replaced with other heterocycles, for example, 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone. Examples of other nucleobases include those disclosed in U.S. Pat. No. 3,687,808, those disclosed in The Concise Encyclopedia Of Polymer Science And Engineering, pages 858-859, Kroschwitz, J. I., ed. John Wiley & Sons, 1990, those disclosed by Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613, and those disclosed by Sanghvi, Y. S., Chapter 15, Antisense Research and Applications, pages 289-302, Crooke, S. T. and Lebleu, B., ed., CRC Press, 1993. Certain of these nucleobases are useful for increasing the binding affinity of an oligomeric compound: 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. Additionally, 5-methylcytosine substitutions may be advantageous when combined with 2′-O-methoxyethyl sugar modifications.
In some embodiments, there are two ligand binding moieties associated with a gRNA: a first ligand binding moiety and a second ligand binding moiety. Optionally, when there are two ligand binding moieties, there may be two linkers: a first linker and a second linker, wherein the first ligand binding moiety is attached to the first linker and the second ligand binding moiety is attached to the second linker. In these embodiments, the first linker and the second linker may each be attached to the Cas association region; or the first linker and the second linker may each be attached to the targeting region; or one of the first linker and the second linker may be attached to the Cas association region and the other of the first linker and the second linker may be attached to the targeting region.
Base Editing Complexes
According to another embodiment of the present invention, there is a base editing complex. The base editing complex comprises, consists essentially of, or consists of a gRNA-ligand binding complex of the present invention; and a Type V Cas protein, wherein the Cas association region of the gRNA-ligand binding complex is associated with the Type V Cas protein. Thus, the gRNA is capable of associating with the Cas protein and delivering the Cas protein to the target nucleic acid without the need of a tracrRNA.
An example of a base editing complex of the present invention is shown in
Type V Cas Protein
In general, a Cas protein includes at least one RNA binding domain. The RNA binding domain interacts with the gRNA at the Cas association region. The Type V Cas protein that is of use in the present invention is one with which the gRNA-ligand binding complex can associate without there being a tracrRNA present. In some embodiments, the Type V Cas protein is an endonuclease that contains a RuvC domain. This RuvC domain may be mutated such that the endonuclease activity is deactivated. In some embodiments, the protein is a nickase that contains an active or deactivated RuvC domain.
Examples of Type V Cas proteins that may be of use in connection with the present invention include, but are not limited to, Cas12a, MAD7 (an engineered variant of ErCas12a), Cas12h, Cas12i, and Cas12j (CasPhi, also known as Casϕ) in active or deactivated form.
In some embodiments, the Type V Cas proteins comprise a fusion protein having: (a) an active, partially deactivated or deactivated Type V Cas protein; and (b) a uracil DNA glycosylase (UNG) inhibitor peptide (UGI). The UGI peptide can be fused directly to the Type V Cas protein or through a linker peptide comprised of 1 to 100 amino acid residues. In some embodiments, the UGI comprises the wild type UGI sequence from the Bacillus phage PBS2 (https://www.ncbi.nlm.nih.gov/protein/P14739): MLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 140). In some embodiments, the UGI comprises a variant of SEQ ID NO: 140 that comprises a fragment of the wild type UGI peptide or a homologous amino acid sequence to SEQ ID NO: 28. In some embodiments, the UGI fragment or homologous sequence comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% homology to the wild type UGI peptide sequence (SEQ ID NO: 140).
In some embodiments, the active or deactivated Type V Cas protein comprises a fusion with two or more UGI peptides or variants. The UGI peptides, or variants of the UGI peptide, can be connected directly to another UGI peptide or Type V Cas protein or via a linker of 1 to 100 amino acid residues to another UGI peptide or Type V Cas protein.
The Cas protein or Cas protein fusion may be provided in purified or isolated form or can be part of a composition or complex. Preferably, when in a composition, the protein is first purified to some extent, more preferably to a high level of purity (e.g., about 80%, 90%, 95%, or 99% or higher). Compositions in which the complexes and components of the present invention may be stored and transported may be any type of composition desired, e.g., aqueous compositions suitable for use as, or inclusion in, a composition for RNA-guided targeting.
Effectors
The base editing complexes of the present invention may contain an effector that is attached to a ligand. The ligand is capable of reversibly or irreversibly associating with the ligand binding moiety. Thus, the ligand binding moiety recruits an effector, e.g., a base editing enzyme that is fused to or otherwise associated with the ligand, because the ligand binding moiety is capable of retaining association with the ligand. This design may be particularly advantageous because it provides a modular design in which the nucleic acid sequence targeting function of the gRNA and effector function reside in different molecules. For example, to introduce modifications serially at the same site, one may use different effectors that are associated with the same ligand. Conversely, to introduce the same modifications at different sites, one may use the same ligand binding moiety with different gRNAs while using the same effector-ligand. Thus, this design allows one to multiplex a system without an undesirable burden of fusing effectors to either gRNAs or Cas proteins.
Examples of effectors that may be of use in connection with the present invention are deaminases such as those that have cytidine deamination or adenine deamination activity, as well as transcriptional regulators, repair enzymes, epigenetic modifiers, histone acetylases, deacetylases, methylases (of histones and nucleotides), and demethylases (of histones and nucleotides). In some embodiments, the effector is selected from the group consisting of AID, CDA, APOBEC1, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, ADA, ADAR and tRNA adenosine deaminase. Examples of effectors and the types of genetic change that they cause are provided in Table 1.
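The modularity described above can be pictured with a small data model in which targeting and effector functions are separate objects that are matched only through a binding pair. The Python sketch below is a conceptual illustration, not a description of any software used with the invention: the class and function names are hypothetical, the targeting sequences are placeholders, and only the MS2/MCP pair is taken from the text.

```python
from dataclasses import dataclass
from itertools import product

# Ligand binding moiety -> ligand. The MS2 operator motif / MS2 coat protein
# (MCP) pair is named in the text; further pairs could be added the same way.
BINDING_PAIRS = {"MS2": "MCP"}


@dataclass(frozen=True)
class GuideRNA:
    name: str
    targeting_region: str        # hypothetical placeholder sequence
    ligand_binding_moiety: str


@dataclass(frozen=True)
class EffectorLigand:
    effector: str
    ligand: str


def can_recruit(guide: GuideRNA, partner: EffectorLigand) -> bool:
    """True when the guide's ligand binding moiety pairs with the effector's ligand."""
    return BINDING_PAIRS.get(guide.ligand_binding_moiety) == partner.ligand


def assemble(guides, partners):
    """List every (guide, effector) combination joined by a matching binding pair."""
    return [(g.name, p.effector) for g, p in product(guides, partners)
            if can_recruit(g, p)]


guides = [GuideRNA("site_A", "ACGUACGUACGUACGUACGU", "MS2"),
          GuideRNA("site_B", "UGCAUGCAUGCAUGCAUGCA", "MS2")]
partners = [EffectorLigand("APOBEC1", "MCP"), EffectorLigand("ADA", "MCP")]
print(assemble(guides, partners))
```

Two guides and two effector-ligand fusions yield four combinations without constructing any new fusion to the gRNA or the Cas protein, which is the multiplexing advantage the paragraph describes.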
In some embodiments, the base editing complex comprises two or more effectors. When there are two effectors they may be referred to as: a first effector and a second effector. Each effector may be attached to a different ligand binding moiety through a different ligand. Alternatively, when there are two effectors present, one is attached to a ligand and associated with the gRNA through the ligand binding moiety and another is attached directly to the Cas protein. Examples of sequences of deaminases that may be incorporated into the present invention include but are not limited to:
Ligands
As noted above, the effector is bound to a ligand, e.g., by one or more covalent bonds. A non-exhaustive list of examples of ligand binding moiety-ligand pairs that may be used in various embodiments of the present invention is provided in Table 2. Both unmodified and chemically modified versions of the ligand binding moieties and ligands are within the scope of the present invention.
Some of the sequences for the above binding pairs are listed below.
In each of the aforementioned sequences, one may, for example, use the identical sequence or sequences that have one or more insertions, deletions or substitutions in one or both sequences of a binding pair. By way of a non-limiting example, for either or both members of a binding pair one may use a sequence that is at least 80%, at least 85%, at least 90%, at least 95%, or at least 98% the same as an aforementioned sequence.
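One way to evaluate whether a variant meets a stated percentage threshold is to align it globally with the reference sequence and count identical positions. The Python sketch below uses a basic Needleman-Wunsch alignment; the scoring values (match +1, mismatch -1, gap -1) and the choice to divide by the alignment length are assumptions of this example, since the text does not prescribe a particular identity calculation. The variant shown is a hypothetical single substitution of SEQ ID NO: 13.

```python
def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    """Globally align two sequences and return identical positions as a
    percentage of the alignment length."""
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        return 100.0
    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting matches and columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns


reference = "ACAUGAGGAUCACCCAUGU"   # SEQ ID NO: 13
variant = "ACAUGAGGAUUACCCAUGU"     # hypothetical single substitution
identity = percent_identity(reference, variant)
print(f"{identity:.1f}% identical; meets 90% threshold: {identity >= 90}")
```

With one substitution in a 19-nucleotide sequence the identity is 18/19, or about 94.7%, which satisfies the 80%, 85% and 90% thresholds but not the 95% threshold.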
Additional Chemistries
In some embodiments, the base-editing complexes of the present invention are combined with additional chemistry technologies. For example, in some embodiments, a base editing complex further comprises a cysteine/selenocysteine tag. In some embodiments, the base editing complex comprises or is associated with elements for cycloaddition via click chemistry.
Methods For Base-Editing
In another embodiment, the present invention provides methods for base editing. In these methods, one exposes a base editing complex of the present invention to double-stranded DNA or to a solution that contains dsDNA or to a cell that contains dsDNA or to a subject. The method may occur in vitro or be conducted in vivo or ex vivo and may comprise delivering the base editing complex to a subject as part of a medicament for treatment.
These methods may, for example, be used to modify an immune cell selected from a T cell (including a primary T cell), Natural Killer (NK cell), B cell, or CD34+ hematopoietic stem progenitor cell (HSPC). The immune cell may be an engineered immune cell, such as a T cell comprising a CAR or TCR. The methods herein may thus be applied to engineer further a cell that has already been modified to include a CAR and/or TCR that is useful in therapy. By way of further example, primary immune cells, either naturally occurring within a host animal or patient, or derived from a stem cell or an induced pluripotent stem cell (iPSC) may be genetically modified using the methods and complexes provided herein. Suitable stem cells include, but are not limited to, mammalian stem cells such as human stem cells, including, but not limited to, hematopoietic, neural, embryonic, induced pluripotent stem cells (iPSC), mesenchymal, mesodermal, liver, pancreatic, muscle, and retinal stem cells. Other stem cells include, but are not limited to, mammalian stem cells such as mouse stem cells, e.g., mouse embryonic stem cells.
Provided herein are also methods for genome engineering (e.g., altering or manipulating the expression of one or more genes or one or more gene products) in prokaryotic or eukaryotic cells, in vitro, in vivo, or ex vivo. In particular, the methods provided herein may be useful for targeted base editing disruption in mammalian cells including primary human T cells, natural killer (NK) cells, CD34+ hematopoietic stem and progenitor cells (HSPCs), such as HSPCs isolated from umbilical cord blood or bone marrow and cells differentiated from them, as well as HSPCs isolated from mobilized peripheral blood.
Also provided herein are genetically engineered cells arising from haematopoietic stem cells, such as T cells, that have been modified according to the methods described herein.
In some cases, the methods are configured to produce genetically engineered T cells arising from HSCs or iPSCs, that are suitable as “universally acceptable” cells for therapeutic application. Haemopoietic stem cells (HSCs) arise from hemangioblasts, which can give rise to HSCs, vascular smooth muscle cells and angioblasts, which differentiate into vascular endothelial cells. HSCs can give rise to common myeloid and common lymphoid progenitors from which arise T cells, Natural Killer (NK) cells, B cells, myeloblasts, erythroblasts and other cells involved in the production of cells of blood, bone marrow, spleen, lymph nodes, and thymus. Such methods can also be applied to natural killer (NK) cells, CD34+ hematopoietic stem and progenitor cells (HSPCs), such as HSPCs isolated from umbilical cord blood or bone marrow and cells differentiated from them, as well as HSPCs isolated from mobilized peripheral blood.
In another aspect, provided herein are methods for targeting diseases for base editing correction. In some of the methods, the base editing complexes are delivered to a subject for treatment. The target sequence can be any disease-associated polynucleotide or gene. Examples of useful applications of mutation or correction of an endogenous gene sequence according to the present invention include but are not limited to: alterations of disease-associated gene mutations, alterations in sequences encoding splice sites, alterations in regulatory sequences, alterations in sequences to cause a gain-of-function mutation, and/or alterations in sequences to cause a loss-of-function mutation, and targeted alterations of sequences encoding structural characteristics of a protein.
Delivery of Components Into Cells
The base editing complexes or their components may be delivered to target cells and organisms via various methods and in various formats (DNA, RNA or protein) or a combination of these different formats. The base editing components may be delivered as: (a) DNA polynucleotides that encode the relevant sequence for the protein effectors or the guide RNAs; (b) synthetic RNA encoding the sequence for the protein effectors (messenger RNA) or the guide RNAs; or (c) purified protein for the effector. When delivered in protein format, the Type V Cas protein can be assembled with the guide RNAs to form a ribonucleoprotein complex (RNP) for delivery into target cells and organisms.
For example, the components or complexes as assembled may be delivered together or separately by electroporation, by nucleofection, by transfection, via nanoparticles, via viral mediated RNA delivery, via non-viral mediated delivery, via extracellular vesicles (for example, exosome and microvesicles), via eukaryotic cell transfer (for example, by recombinant yeast) and other methods that can package molecules such that they can be delivered to a target viable cell without changes to the genomic landscape.
Other methods include, but are not limited to, non-integrative transient transfer of DNA polynucleotides that include the relevant sequence for the protein recruitment so that the molecule can be transcribed into the desired RNA molecule and, for amino acid containing components, translated into a protein or protein fragment. This includes, without limitation, DNA-only vehicles (for example, plasmids, MiniCircles, MiniVectors, MiniStrings, Protelomerase generated DNA molecules (for example Doggybones), artificial chromosomes (for example HAC), and cosmids), delivery of DNA vehicles by nanoparticles, via extracellular vesicles (for example, exosomes and microvesicles), via eukaryotic cell transfer (for example, by recombinant yeast), transient viral transfer by AAV, non-integrating viral particles (for example, lentivirus and retrovirus based systems), cell penetrating peptides and other technology that can mediate the introduction of DNA into a cell without direct integration into the genomic landscape. Another method for the introduction of the RNA components includes the use of integrative gene transfer technology for stable introduction of the machinery for RNA transcription into the genome of the target cells. This can be controlled via constitutive or promoter inducible systems to attenuate the RNA expression, and it can also be designed so that the system can be removed after the utility has been met (for example, by introducing a Cre-Lox recombination system). Such technology for stable gene transfer includes, but is not limited to, integrating viral particles (for example, lentivirus, adenovirus and retrovirus based systems), transposase mediated transfer (for example Sleeping Beauty and Piggybac), exploitation of the non-homologous repair pathways introduced by DNA breaks (for example, utilizing CRISPR and TALEN technology) and a surrogate DNA molecule, and other technology that encourages integration of the target DNA into a cell of interest.
The various components of the complexes of the present invention, if not synthesized enzymatically within a cell or solution, may be created chemically or, if naturally occurring, isolated and purified from naturally occurring sources. Methods for chemically and enzymatically synthesizing the various embodiments of the present invention are well known to persons of ordinary skill in the art. Similarly, methods for ligating or introducing covalent bonds between components of the present invention are also well known to persons of ordinary skill in the art.
Applications
By way of a non-limiting example, the complexes of the present invention may be used to recruit transcriptional activators such as p65 and VP64, as well as moieties that introduce epigenetic modifications or affect HDR. The complexes of the present invention can also be used for the following applications: base editing, genome editing, genome screening, generation of therapeutic cells, genome tagging, epigenome editing, karyotype engineering, chromatin imaging, transcriptome and metabolic pathway engineering, genetic circuits engineering, cell signaling sensing, cellular events recording, lineage information reconstruction, gene drive, DNA genotyping, miRNA quantification, in vivo cloning, site-directed mutagenesis, genomic diversification, and proteomic analysis in situ. In some embodiments, a cell or a population of cells is exposed to a base editing complex of the present invention and the cell or cells are introduced to a subject by infusion.
Applications also include research of human diseases such as cancer immunotherapy, antiviral therapy, bacteriophage therapy, cancer diagnosis, pathogen screening, microbiota remodeling, stem-cell reprogramming, immunogenomic engineering, vaccine development, and antibody production.
Vector Construction:
The coding sequence for a deactivated version of CasPhi and 2xUGI fusion (dCasPhi-2xUGI) was obtained and cloned into an expression vector under the control of the mouse CMV promoter in a T2A polycistronic cassette with a red fluorescent protein-puromycin fusion. The coding sequence for MS2 coat protein lizard Anolis Apobec fusion (MCP-lizAnoA1) (“AnoA1”) is:
which was obtained and cloned into an expression vector under control of the mouse CMV promoter. The sequence for crRNA containing the MS2 ligand binding moiety and unique spacer regions were cloned into an expression vector under control of the hU6 promoter.
HEK 293T cells (ATCC, #CRL-11268) were seeded at 20,000 cells per well in a 96-well plate one day prior to transfection. Cells were co-transfected using DharmaFECT Duo Transfection Reagent (Horizon Discovery, #T-2010) and 200 ng dCasPhi-2xUGI plasmid, 100 ng AnoA1 plasmid, and 100 ng crRNA plasmid. The plasmid crRNA consisted of a direct repeat length of e.g., 35 nucleotides and different spacer sequences of 18 or 20 nucleotides targeting transcripts within Site2 or B2M gene targets. Additionally, they have the MS2 ligand binding moiety at the 5′ terminus, the 3′ terminus, internally not at either the 5′ or 3′ terminus, or combinations therein. See SEQ ID NO: 35 to 58 in Table 3 below.
Cell Processing
Cells were lysed in 100 μL of a buffer containing proteinase K (Thermo Scientific, #FEREO0492), RNase A (Thermo Scientific, #FEREN0531), and Phusion HF buffer (Thermo Scientific, #F-518L) for 30 min at 56° C., followed by a 5 min heat inactivation at 95° C. This cell lysate was used to generate PCR amplicons spanning the region containing the base editing site(s). PCR amplicons between 400-1000 bp in length were sequenced by Sanger sequencing.
Editing Analysis
Base editing efficiencies were calculated using the Chimera analysis tool, an adaptation of the open source tool BEAT. Chimera determines editing efficiency by first subtracting the background noise to define the expected variability in a sample. This allows the estimation of editing efficiency without the need to normalize to control samples. Following this, Chimera filters out any outliers from the noise using the Median Absolute Deviation (MAD) method and then assesses the editing efficiency of the base editor over the span of the 18-20 bp input guide sequence.
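By way of illustration only, the noise-handling steps described above may be sketched in Python as follows. The Chimera implementation itself is not reproduced here. The function names, the use of a modified z-score with the 0.6745 scaling constant, and the cutoff of 3.5 are assumptions chosen for the sketch and are not taken from Chimera or BEAT.

```python
from statistics import median


def mad_filter(values, cutoff=3.5):
    """Keep values whose modified z-score (MAD-based) is within the cutoff.

    The 0.6745 constant and the 3.5 cutoff are conventional choices and
    are assumptions of this sketch.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to scale by: keep only values equal to the median.
        return [v for v in values if v == med]
    return [v for v in values if abs(0.6745 * (v - med) / mad) <= cutoff]


def background_corrected(raw_percent, noise_values, cutoff=3.5):
    """Subtract the mean of the outlier-filtered noise from a raw signal."""
    kept = mad_filter(noise_values, cutoff)
    background = sum(kept) / len(kept)
    # Editing efficiency cannot be negative, so clamp at zero.
    return max(0.0, raw_percent - background)
```

In this sketch, a single aberrant noise value (for example, a mixed peak outside the guide window) is discarded before the background is averaged, so that it does not inflate the correction applied to the target position.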
Table 3 provides a list of plasmid guide sequences. Spacer region sequences are in bold. Direct repeat sequences are underlined. The ligand binding moiety sequence is italicized.
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
AGGCTGGCCCGCCCCGCA
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
GTGTTCCAGTTTCCTTTA
TAATAGATTGCTCCTTACGAGGAGAC
AGGCTG
GCCCGCCCCGCA
TAATAGATTGCTCCTTACGAGGAGAC
GTGTTCC
AGTTTCCTTTA
CTTTCAAGACTGCGCACATGAGGATCACCCATGT
GGCCCGCCCCGCA
CTTTCAAGACTGCGCACATGAGGATCACCCATGT
CAGTTTCCTTTA
CTTTCAAGACTAATAGATTGCTCCTTACA
ACATG
AGGATCACCCATGT
TGCGAGGAGAC
AGGCTGGC
CCGCCCCGCA
CTTTCAAGACTAATAGATTGCTCCTTACA
ACATG
AGGATCACCCATGT
TGCGAGGAGAC
GTGTTCCA
GTTTCCTTTA
CTTTCAAGACTAATAGATTGCTCCTTAACAACAT
GAGGATCACCCATGT
TGCCGAGGAGACAGGCTG
GCCCGCCCCGCA
CTTTCAAGACTAATAGATTGCTCCTTAACAACAT
GAGGATCACCCATGT
TGCCGAGGAGACGTGTTC
CAGTTTCCTTTA
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
AGGCTGGCCCGCCCCGCAGCGCACATGAG
GATCACCCATGTGC
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
GTGTTCCAGTTTCCTTTAGCGCACATGAGG
ATCACCCATGTGC
AATAGATTGCTCCTTACGAGGAGAC
AGGCTGG
CCCGCCCCGCA
AATAGATTGCTCCTTACGAGGAGAC
GTGTTCCA
GTTTCCTTTA
GCACATGAGGATCACCCATGTGCAATAGATTGCT
CCTTACGAGGAGAC
AGGCTGGCCCGCCCCGC
A
GCACATGAGGATCACCCATGTGCAATAGATTGCT
CCTTACGAGGAGAC
GTGTTCCAGTTTCCTTTA
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
AGGAATGCCCGCCAGCGC
TAATAGATTGCTCCTTACGAGGAGAC
AGGAAT
GCCCGCCAGCGC
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
AGGCTGGCCCGCCCCGCAGT
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
GTGTTCCAGTTTCCTTTACA
CTTTCAAGACTAATAGATTGCTCCTTACGAGGA
GAC
AGGAATGCCCGCCAGCGCGA
TAATAGATTGCTCCTTACGAGGAGAC
AGGCTG
GCCCGCCCCGCAGT
TAATAGATTGCTCCTTACGAGGAGAC
GTGTTCC
AGTTTCCTTTACA
TAATAGATTGCTCCTTACGAGGAGAC
AGGAAT
GCCCGCCAGCGCGA
The inventors used sequences from Table 3 to evaluate the effects of MS2 placement on base editing levels at two genomic target sites shown. Editing results are shown for eleven target C residues, which are identified in
For these experiments, guide RNAs targeting: (A) Site2 gRNA4, SEQ ID NO:s 35, 37, 39, 45, 47, and 49 (see
Based on the data, 5′MS2 pre-cr configuration results in higher levels of base editing at both sites, compared to other MS2 placements. Thus, for a CasPhi system a gRNA may encode SEQ ID NO 137 and/or SEQ ID NO 138 (see
Vector Construction
The coding sequence for CasPhi was obtained and cloned into an expression vector under the control of the mouse CMV promoter in a T2A polycistronic cassette with a red fluorescent protein-puromycin fusion. The sequence for crRNA and unique spacer regions were cloned into an expression vector under control of the hU6 promoter.
HEK 293T cells (ATCC, #CRL-11268) were seeded at 20,000 cells per well in a 96-well plate one day prior to transfection. Cells were co-transfected using DharmaFECT Duo Transfection Reagent (Horizon Discovery, #T-2010) and 200 ng CasPhi plasmid, and 100 ng crRNA plasmid. The plasmid crRNA consisted of a direct repeat length of 35 nucleotides.
Cell Processing:
Cells were lysed in 100 μL of a buffer containing proteinase K (Thermo Scientific, #FEREO0492), RNase A (Thermo Scientific, #FEREN0531), and Phusion HF buffer (Thermo Scientific, #F-518L) for 30 min at 56° C., followed by a 5 min heat inactivation at 95° C. This cell lysate may be used to generate PCR amplicons spanning the region containing the base editing site(s). PCR amplicons between 400-1000 bp in length may be digested with T7 endonuclease I (T7EI) enzyme (NEB, M0302S) in the presence of NEB buffer 2 (NEB, B7002S) for 25 minutes. The digested PCR product may be run on 2% agarose gel for 90 minutes at 80 volts, imaged, and analyzed using Horizon Discovery's online T7EI calculator
(https://horizondiscovery.com/en/ordering-and-calculation-tools/t7ei-calculator).
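The method used by the online calculator is not described in this text. By way of illustration only, a commonly used estimate of editing from T7EI gel band intensities may be sketched in Python as follows: the fraction of cleaved product is computed from the band intensities, and the editing percentage is taken as 100 × (1 − √(1 − fraction cleaved)). The function name and the example intensities are hypothetical.

```python
import math


def t7ei_percent_editing(uncut, cleaved_bands):
    """Estimate % editing from T7EI gel band intensities.

    uncut: intensity of the full-length (undigested) band.
    cleaved_bands: intensities of the cleavage product bands.
    Uses the commonly cited estimate 100 * (1 - sqrt(1 - f)), where f is
    the cleaved fraction; this is an assumption of the sketch and not
    necessarily the calculation performed by any particular tool.
    """
    cleaved = sum(cleaved_bands)
    total = uncut + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = cleaved / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))
```

For example, an uncut band of intensity 64 with two cleavage bands of intensity 18 each gives a cleaved fraction of 0.36 and an estimated editing level of about 20%.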
SEQ ID NO: 36, 38, 40, 42, 44, and 46 from Table 3 were used in this example.
The percentages of editing from C to T are shown in
Different deactivating mutations of CasPhi were compared for base editing efficiency—D369A, E566A, and D369A/E566A/D658A. HEK293T cells were transfected with dCasPhi-2xUGI plasmid+AnoA1 plasmid+the indicated plasmid gRNAs, 35, 37, 39, and 45 for Site2 gRNA4 (
These data demonstrate the capability of using dCasPhi 5′ MS2 pre-cr guides with different deactivating mutations to perform base editing at several different target C residues at HEK Site2. Table 4 provides deactivated CasPhi sequences.
In one experiment, HEK293T cells were electroporated with dCasPhi-2xUGI mRNA+AnoA1 mRNA+the indicated extended length spacer synthetic gRNAs for Site2 gRNA5, SEQ ID NO: 101 and 102 (see Table 5). In a second experiment, HEK 293T cells were transfected with dCasPhi-2xUGI plasmid+AnoA1 plasmid+the indicated gRNA plasmids for Site2 gRNA5, SEQ ID NO: 36, 38, 54, and 57 (see Table 3). The cells were harvested, and base editing levels were analyzed using Chimera software. The data show % C>T conversion at the indicated cytosine positions along the spacers. The percentage of C to T editing for each experiment is shown in
gRNAs used in this example may more generally be represented by the schematics of
These data show that base editing works when using different spacer lengths in a gRNA that contains a ligand binding moiety.
mRNA Preparation:
Messenger RNA were prepared from DNA vectors carrying the T7 promoter and the coding sequences for dCasPhi-2xUGI and AnoA1 following the standard protocols for mRNA in vitro transcription.
RNA Synthesis:
All crRNA were synthesized by Horizon Discovery using either 2′-acetoxy ethyl orthoester (2′-ACE) or 2′-tert-butyldimethylsilyl (2′-TBDMS) protection chemistries or by Agilent. Chemical modifications were included where noted. RNA oligos were 2′-deprotected/desalted and purified by either high-performance liquid chromatography (HPLC) or polyacrylamide gel electrophoresis (PAGE). Oligos were resuspended in 10 mM Tris pH 7.5 buffer prior to electroporation.
HEK 293T cells (ATCC, #CRL-11268) were electroporated using the Invitrogen™ Neon™ Transfection System, 10 μl Kit. A mixture of 50,000 cells, 650 ng of dCasPhi-2xUGI mRNA and 200 ng AnoA1 mRNA, and 6 μM of synthetic crRNA may be electroporated at 1150V for 20 ms and for 2 pulses. The chemically synthesized crRNA consisted of different direct repeat lengths of 21 or 35 nucleotides, different spacer sequences targeting transcripts within Site2 or B2M gene targets, and the MS2 ligand binding moiety at the 5′ terminus, the 3′ terminus, internally not at either the 5′ or 3′ terminus, or combinations therein. Each sequence optionally contained chemical modifications at one or more bases and within one or more linkages. Cells were plated in a 96-well plate with full serum media and harvested after 48-72 hours for further processing.
Cell Processing
Cells were lysed in 100 μL of a buffer containing proteinase K (Thermo Scientific, #FEREO0492), RNase A (Thermo Scientific, #FEREN0531), and Phusion HF buffer (Thermo Scientific, #F-518L) for 30 min at 56° C., followed by a 5 min heat inactivation at 95° C. This cell lysate was used to generate PCR amplicons spanning the region containing the base editing site(s). PCR amplicons between 400-1000 bp in length may be sequenced by Sanger sequencing. PCR amplicons may be purified (Qiagen, #28181) and submitted for NGS sequencing.
Editing Analysis
Base editing efficiencies were calculated using the Chimera analysis tool, an adaptation of the open source tool BEAT. Chimera determines editing efficiency by first subtracting the background noise to define the expected variability in a sample. This allows the estimation of editing efficiency without the need to normalize to control samples. Following this, Chimera filters out any outliers from the noise using the Median Absolute Deviation (MAD) method and then assesses the editing efficiency of the base editor over the span of the 18-20 bp input guide sequence. High throughput sequencing data analysis, specifically frequency of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), was performed as follows: barcoded samples were demultiplexed and the demultiplexed, paired-end reads were merged using a custom Python script, which filters out any reads with mismatches in the overlapping region and keeps the higher Phred score for each overlapping base. The non-overlapping portions of the reads were then trimmed off and merged reads containing any base with a Phred score<30 were filtered out. The resulting reads were aligned using Bowtie2 and a mpileup file was generated using SAMtools.
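The custom Python script referred to above is not reproduced in this text. By way of illustration only, the merging and quality-filtering rules described above may be sketched as follows. The sketch assumes that the two reads have already been oriented to the same strand and trimmed to their overlapping region, and that quality strings use Phred+33 encoding; the function names are hypothetical.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string (Phred+33 assumed) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def merge_overlap(seq_a, qual_a, seq_b, qual_b, min_phred=30):
    """Merge two overlapping read segments of equal length.

    Returns (sequence, qualities) for a merged read, or None when the
    read pair is filtered out. Filtering rules follow the description:
      1. any mismatch in the overlapping region discards the pair;
      2. the higher Phred score is kept for each overlapping base;
      3. a merged read with any base below min_phred is discarded.
    """
    if not (len(seq_a) == len(seq_b) == len(qual_a) == len(qual_b)):
        raise ValueError("segments and quality lists must have equal length")
    if seq_a != seq_b:
        return None
    merged_qual = [max(a, b) for a, b in zip(qual_a, qual_b)]
    if any(q < min_phred for q in merged_qual):
        return None
    return seq_a, merged_qual
```

In this sketch, a base that is low quality in one read but high quality in its mate survives, whereas a base that is low quality in both reads causes the merged read to be discarded. Alignment with Bowtie2 and pileup generation with SAMtools would follow as separate steps.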
Table 5 provides examples of chemically synthesized guides for use with CasPhi (Cas12j) and that were successfully delivered through electroporation.
The chemical modifications are noted (m=2′-O-methyl; *=phosphorothioate). Spacer region sequences are in bold. Direct repeat sequences are underlined. The ligand binding moiety sequence is italicized.
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
AGGCU
GGCCCGCCCCm
G*mC*A
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
GUGUU
CCAGUUUCCUm
U*mU*A
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGC
UGGCCCGCCCC
mG*mC*A
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
GUGU
UCCAGUUUCCU
mU*mU*A
mG*mC*UUUCAA
GACUGCGCACAU
GAGGAUCACCCA
UGUGCAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGC
UGGCCCGCCCC
mG*mC*A
mG*mC*UUUCAA
GACUGCGCACAU
GAGGAUCACCCA
UGUGCAAUAGA
UUGCUCCUUACG
AGGAGAC
GUGU
UCCAGUUUCCU
mU*mU*A
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
AGGCU
GGCCCGCCCCG
CAGCGCACAUGA
GGAUCACCCAUG
mU*mG*C
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
GUGUU
CCAGUUUCCUU
UAGCGCACAUGA
GGAUCACCCAUG
mU*mG*C
mA*mA*UAGAUU
GCUCCUUACGAG
GAGAC
AGGCUG
GCCCGCCCCmG
*mC*A
mA*mA*UAGAUU
GCUCCUUACGAG
GAGAC
GUGUUC
CAGUUUCCUmU
*mU*
GGAUCACCCAUG
UGCAAUAGAUU
GCUCCUUACGAG
GAGAC
AGGCUG
GCCCGCCCCmG
*mC*A
GGAUCACCCAUG
UGCAAUAGAUU
GCUCCUUACGAG
GAGAC
GUGUUC
CAGUUUCCUmU
*mU*A
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
AGGAA
UGCCCGCCAGm
C*mG*C
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGA
AUGCCCGCCAG
mC*mG*C
UCAAGACUAAU
AGAUUGCUCCU
UACGAGGAGAC
GUGUUCCAGUU
UCCUmU*mU*A
ACAUGAGGAUCA
CCCAUGUGCCUU
UCAAGACUAAU
AGAUUGCUCCU
UACGAGGAGAC
GUGUUCCAGUU
UCCUmU*mU*A
GAGGAUCACCCA
UGUGCUCUCUCU
CUAAUAGAUUG
CUCCUUACGAGG
AGAC
GUGUUCC
AGUUUCCUmU*
mU*A
ACAUGAGGAUCA
CCCAUGUGCUCU
AAGACUAAUAG
AUUGCUCCUUAC
GAGGAGAC
GUG
UUCCAGUUUCC
UmU*mU*A
CAAGACUAAUA
GAUUGCUCCUU
ACGAGGAGAC
G
UGUUCCAGUUU
CCUmU*mU*AGC
CACCCAUGmU*m
mC*mU*UUCAAG
ACUAAUAGAUU
GCUCCUUACGAG
GAGAC
GUGUUC
CAGUUUCCUmU
*mU*AGCUCUCU
AGGAUCACCCAU
GmU*mG*C
CAAGACUAAUA
GAUUGCUCCUU
ACGAGGAGAC
G
UGUUCCAGUUU
CCUmU*mU*AGC
UCUCUCUCUCGC
ACAUGAGGAUCA
CCCAUGmU*mG*
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
AGGCU
GGCCCGCCCCG
CmA*mG*
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGC
UGGCCCGCCCC
GCmA*mG*U
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
GUGUU
CCAGUUUCCUU
UmA*mC*A
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
GUGU
UCCAGUUUCCU
UUmA*mC*A
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
CUCUC
CCGCUCUGCAC
CmC*mU*C
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
CUCU
CCCGCUCUGCA
CCmC*mU*C
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
AGGAA
UGCCCGCCAGC
GmC*mG*A
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGA
AUGCCCGCCAG
CGmC*mG*A
AUCACCCAUGUG
AAUAGAUUGCU
CCUUACGAGGA
GAC
GUGUUCCA
GUUUCCUUUAC
A
GGAUCACCCAUG
UGCCUUUCAAG
ACUAAUAGAUU
GCUCCUUACGAG
GAGAC
GUGUUC
CAGUUUCCUUU
AC*mA
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
GUGU
UCCAGUUUCCU
UUA*mC*mA
UGAGGAUCACCC
AUGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
GUGU
UCCAGUUUCCU
UUmAmC*mA
AUGAGGAUCACC
CAUGUGCCUUUC
AAGACUAAUAG
AUUGCUCCUUAC
GAGGAGAC
GUG
UUCCAGUUUCC
UUU*mA*mC*mA
mG*mC*mUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
GUGU
UCCAGUUUCCU
UUmAmC*mA
AUCACCCAUGUG
AAUAGAUUGCU
CCUUACGAGGA
GAC
AGGAAUGC
CCGCCAGCGCG
A
GGAUCACCCAUG
UGCCUUUCAAG
ACUAAUAGAUU
GCUCCUUACGAG
GAGAC
AGGAAU
GCCCGCCAGCG
CG*mA
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGA
AUGCCCGCCAG
CGC*mG*mA
UGAGGAUCACCC
AUGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGA
AUGCCCGCCAG
CGmCmG*mA
AUGAGGAUCACC
CAUGUGCCUUUC
AAGACUAAUAG
AUUGCUCCUUAC
GAGGAGAC
AGG
AAUGCCCGCCA
GCG*mC*mG*m
A
mG*mC*mUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
AGGA
AUGCCCGCCAG
CGmCmG*mA
mG*mC*UUUCAA
GACUAAUAGAU
UGCUCCUUACGA
GGAGAC
CUCUC
CCGCUCUGCAm
C*mC*C
GAGGAUCACCCA
UGUGCCUUUCA
AGACUAAUAGA
UUGCUCCUUACG
AGGAGAC
CUCU
CCCGCUCUGCA
mC*mC*C
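By way of illustration only, the modification notation used in Table 5 (m = 2′-O-methyl on the nucleotide that follows; * = phosphorothioate linkage after the preceding nucleotide) may be read programmatically as sketched below in Python. The class and function names are hypothetical, and the sketch assumes that line breaks within a table entry carry no meaning.

```python
import re
from typing import List, NamedTuple


class Nucleotide(NamedTuple):
    base: str
    two_prime_o_methyl: bool       # "m" prefix in the notation
    phosphorothioate_after: bool   # "*" suffix in the notation


# One token: optional "m", one base letter, optional "*".
_TOKEN = re.compile(r"(m?)([ACGUT])(\*?)")


def parse_modified_rna(text: str) -> List[Nucleotide]:
    """Parse a guide written in the m/* notation into annotated nucleotides."""
    compact = "".join(text.split())  # drop spaces and line wraps
    nucleotides = []
    pos = 0
    while pos < len(compact):
        match = _TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(
                f"unexpected character at position {pos}: {compact[pos]!r}"
            )
        nucleotides.append(
            Nucleotide(match.group(2), bool(match.group(1)), bool(match.group(3)))
        )
        pos = match.end()
    return nucleotides


def bare_sequence(text: str) -> str:
    """Return the unmodified nucleotide sequence of a notated guide."""
    return "".join(n.base for n in parse_modified_rna(text))
```

For example, the fragment mG*mC*UUUCAA parses to eight nucleotides, the first two of which carry both a 2′-O-methyl modification and a following phosphorothioate linkage. An entry with a stray or incomplete token raises an error rather than being silently accepted.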
HEK293T cells were electroporated with dCasPhi-2xUGI mRNA+AnoA1 mRNA+the indicated synthetic gRNAs for (
HEK293T cells were electroporated with dCasPhi-2xUGI mRNA+AnoA1 mRNA+the indicated synthetic gRNAs, SEQ ID NO: 75. The dCasPhi mRNAs with different codon optimizations are noted in Table 6 below. The cells were harvested, and base editing levels were analyzed using Chimera software. The data, which is summarized in
Guide RNAs for Site2 gRNA5, SEQ ID NOs: 107, 108, 109, 110, 111, and 112 (table 5), and B2M gRNA6 SEQ ID NO: 113, 114, 115, 86, 116, 117 and 118 (table 5), were synthesized with different combinations of chemical modifications on 5′ and 3′ ends (see table of sequences for details). HEK293T cells were electroporated with dCasPhi-2xUGI mRNA+AnoA1 mRNA+synthetic gRNAs. Base editing levels were measured by Chimera or NGS and compared to the gRNA with mN*mN* . . . mN*mN*N modifications. Data show % C>T conversion at the indicated cytosines along the spacer. The results are shown in
These data indicate that there are several chemical modification patterns that offer significantly improved base editing levels over unmodified chemically synthesized guides, e.g., a single 2′-O-methyl modification and phosphorothioate linkage at both the 5′ and 3′ end as well as additional incorporations of two or three 2′-O-methyl modifications and phosphorothioate linkages.
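By way of illustration only, a per-position % C>T conversion of the kind reported in these examples may be tallied from read-level (NGS) data as sketched below in Python. The sketch assumes that each read has already been aligned to the reference without gaps, so that index i in a read corresponds to index i in the reference; the function name is hypothetical. The Sanger-based Chimera calculation is a different procedure and is not represented here.

```python
def c_to_t_percent(reference, reads):
    """Return {1-based position: % of reads showing T} for each reference C.

    reference: reference sequence over the window of interest (DNA letters).
    reads: aligned read strings, gap-free and in the same orientation.
    Positions with no usable base calls are omitted from the result.
    """
    result = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        # Collect unambiguous base calls covering this position.
        called = [r[i] for r in reads if len(r) > i and r[i] in "ACGT"]
        if not called:
            continue
        result[i + 1] = 100.0 * called.count("T") / len(called)
    return result
```

For example, with the reference ACCA and the four reads ACCA, ATCA, ATTA, and ACCA, the cytosine at position 2 shows 50% conversion and the cytosine at position 3 shows 25% conversion.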
Guide RNAs were designed with (UC) linkers on the 5′ end, and/or between the MS2 and the spacer, and/or on the 3′ end of the guide. SEQ ID NO: 73, 75, 79, 92, 93, 94, 95, 96, 97, and 98 (table 5). RNAs were synthesized by the same method with the same chemical modifications. HEK293T cells were electroporated with dCasPhi-2xUGI mRNA+AnoA1 deaminase mRNA+synthetic gRNA. Base editing levels were measured by Chimera or NGS and compared to the gRNA without linkers. Data show the levels of C>T conversion at the indicated cytosines along the spacer sequence.
These data, summarized in
CD3+ T Cell Isolation from Fresh Blood Sources and Culturing: PBMCs were isolated from blood sources (e.g., CPD Blood bags, apheresis cones, leukopaks, etc.) by layering on Lymphoprep using SepMate columns (STEMCELL Technologies). Then total CD3+ T cells were isolated using negative selection with the EasySep Human T Cell Isolation Kit (STEMCELL Technologies). T cells were checked by flow cytometry and then cultured in Immunocult XT media (STEMCELL Technologies) with 1× Penicillin/Streptomycin (Thermofisher) at 37° C. and 5% CO2.
T Cell Electroporation: After 48-72 hours post-activation, T cells were electroporated using the Neon Electroporator (Thermofisher). Neon Electroporator conditions were 1600 V/10 ms/3 pulses with a 10 μL tip with 250k cells, a combined total mRNA concentration of 100 ng/μL for both the Deaminase-MCP and nCasPhi-UGI-UGI (synthesized by Trilink), and a final crRNA concentration of 2 μM. Post-electroporation, cells were transferred to Immunocult XT media with 100U IL-2, 100U IL-7 and 100U IL-15 (STEMCELL Technologies) and cultured at 37° C. and 5% CO2 for 48-72 hours.
CD3+ T Cell Activation: T cells were activated by using a 1:1 bead:cell ratio of Dynabeads Human T Activator CD3/CD28 beads (Thermofisher) cultured in Immunocult XT media (STEMCELL Technologies) in the presence of 100U/ml IL-2 (STEMCELL Technologies) and 1× Penicillin/Streptomycin (Thermofisher) at 37° C. and 5% CO2 for 48-72 hours. Post-activation, beads were removed by placement on a magnet and the transfer of the cells back into culture.
Genomic DNA Analysis: Genomic DNA was released from lysed cells 48-72 hours post-electroporation. Loci of interest were amplified by PCR and the products were then sent for Sanger sequencing. Data were analyzed by Chimera.
Synthetic crRNA Sequence (without aptamer) against B2M locus: SEQ ID NO: 84 from Table 5.
Synthetic crRNA Sequence (with 1xMS2 aptamer) against B2M locus: SEQ ID NO: 85 from Table 5.
T lymphocytes were stimulated and then electroporated in the presence of different aptamer designs with the same deaminase. The data, summarized in
mRNA Preparation:
Messenger RNA are prepared from DNA vectors carrying the T7 promoter and the coding sequences for dCas12a-UGI and AnoA1 following the standard protocols for mRNA in vitro transcription.
RNA Synthesis:
All crRNA were synthesized by Horizon Discovery using either 2′-acetoxy ethyl orthoester (2′-ACE) or 2′-tert-butyldimethylsilyl (2′-TBDMS) protection chemistries or by Agilent. Chemical modifications were included where noted. RNA oligos were 2′-deprotected/desalted and purified by either high-performance liquid chromatography (HPLC) or polyacrylamide gel electrophoresis (PAGE). Oligos were resuspended in 10 mM Tris pH 7.5 buffer prior to electroporation.
HEK 293T cells (ATCC, #CRL-11268) were electroporated using the Invitrogen™ Neon™ Transfection System, 10 μL Kit. A mixture of 50,000 cells, 1 μg of mRNA, and 6 μM of synthetic crRNA were electroporated at 1150V for 20 ms and for 2 pulses. mRNA was mixed at a 3:1 molar ratio of dCas12a-2xUGI to AnoA1. Cells were plated in a 96-well plate with full serum growth media and harvested after 72 hours for further processing.
Cell Processing
Cells may be lysed in 100 μL of a buffer containing proteinase K (Thermo Scientific, #FEREO0492), RNase A (Thermo Scientific, #FEREN0531), and Phusion HF buffer (Thermo Scientific, #F-518L) for 30 min at 56° C., followed by a 5 min heat inactivation at 95° C. This cell lysate may be used to generate PCR amplicons spanning the region containing the base editing site(s). PCR amplicons between 400-1000 bp in length may be sequenced by Sanger sequencing.
Editing Analysis
Base editing efficiencies may be calculated using the Chimera analysis tool, an adaptation of the open source tool BEAT. Chimera determines editing efficiency by first subtracting the background noise to define the expected variability in a sample. This allows the estimation of editing efficiency without the need to normalize to control samples. Following this, Chimera filters out any outliers from the noise using the Median Absolute Deviation (MAD) method and then assesses the editing efficiency of the base editor over the span of the 23 nt input guide sequence.
Table 7 provides chemically synthesized gRNA for use with Cas12a and the gRNA used in this example. Chemically modified synthetic guides SEQ ID NO: 142-144 demonstrated desirable levels of base editing. Spacer region sequences are in bold. Direct repeat sequences are underlined. The ligand binding moiety sequence is italicized.
ACUAAGUGUAGAU
CAGCCCGCTGGCCCTGTAAAmG
*mG*A
ACUUUUUAAUAAUUUCUACUAAGUGUAGAU
CAGCC
CGCTGGCCCTGTAAAmG*mG*A
mU*mA*AUUUCUACUAAGUGUAGAU
CAGCCCGCTG
GCCCTGTAAAGGAGCGCACAUGAGGAUCACCCAUGm
U*mG*C
mG*mU*CAAAAGACUUUUUAAUAAUUUCUACUAAGU
GUAGAU
CAGCCCGCTGGCCCTGTAAAGGAGCGCAC
AUGAGGAUCACCCAUGmU*mG*C
mG*mU*CAAAAGACUUUUUAAUAAUUUCUACUAAGU
GUAGAU
CAGCCCGCTGGCCCTGTAAAmG*mG*A
HEK293T cells were electroporated with dCas12a-2xUGI mRNA+AnoA1 mRNA+the indicated synthetic gRNAs. The cells were harvested, and base editing levels were analyzed by NGS. The data, summarized in
The coding sequence for a deactivated version of Cas12i2 and 2xUGI fusion (dCas12i2-UGI) was obtained and cloned into an expression vector under the control of the mouse CMV promoter in a T2A polycistronic cassette with a red fluorescent protein-puromycin fusion. The coding sequence for MS2 coat protein lizard Anolis Apobec fusion (AnoA1) was obtained and cloned into an expression vector under control of the mouse CMV promoter. The coding sequence for MS2 coat protein human APOBEC3A (hA3A) fusion (MCP-hA3A) was obtained and cloned into an expression vector under control of the mouse CMV promoter. The sequence for crRNA containing the MS2 ligand binding moiety and unique spacer regions were cloned into an expression vector under control of the hU6 promoter.
HEK 293T cells (ATCC, #CRL-11268) were seeded at 20,000 cells per well in a 96-well plate one day prior to transfection. Cells were co-transfected using DharmaFECT Duo Transfection Reagent (Horizon Discovery, #T-2010) and 75-200 ng dCas12i2-UGI plasmid, 75-100 ng AnoA1 or hA3A plasmid, and 50-100 ng crRNA plasmid. The plasmid crRNA consisted of a direct repeat length of 31 nucleotides, different spacer sequences of 31 nucleotides targeting transcripts within Site2 or B2M gene targets, and the MS2 ligand binding moiety at the 5′ terminus, the 3′ terminus, internally not at either the 5′ or 3′ terminus, or combinations therein. Sequences are provided in Table 8 below.
Cells were grown for 72 hours post-transfection and harvested for further processing.
Cell Processing
Cells were lysed in 100 μL of a buffer containing proteinase K (Thermo Scientific, #FEREO0492), RNase A (Thermo Scientific, #FEREN0531), and Phusion HF buffer (Thermo Scientific, #F-518L) for 30 min at 56° C., followed by a 5 min heat inactivation at 95° C. This cell lysate was used to generate PCR amplicons spanning the region containing the base editing site(s). PCR amplicons between 400-1000 bp in length were sequenced by Sanger sequencing.
Editing Analysis
Base editing efficiencies were calculated using the Chimera analysis tool, an adaptation of the open source tool BEAT. Chimera determines editing efficiency by first subtracting the background noise to define the expected variability in a sample. This allows the estimation of editing efficiency without the need to normalize to control samples. Following this, Chimera filters out any outliers from the noise using the Median Absolute Deviation (MAD) method and then assesses the editing efficiency of the base editor over the span of the 31 bp input guide sequence.
GGATCACCCA
TGTGCAGAA
ATCCGTCTTT
CATTGACGG
ACAGATGG
GGCTGGAC
AATTTTTCC
CCCTTT
AGAAATCCG
TCTTTCATTG
ACGG
ACAGA
TGGGGCTG
GACAATTTT
TCCCCCTTT
GGATCACCCA
TGTGC
AGAAATCCG
TCTTTCATTG
ACGG
ACAGA
TGGGGCTG
GACAATTTT
TCCCCCTTT
GGATCACCCA
TGTGCAGAA
ATCCGTCTTT
CATTGACGG
CAGTTTCCT
TTACAGGGC
CAGCGGGC
TGGAA
AGAAATCCG
TCTTTCATTG
ACGG
CAGTT
TCCTTTACA
GGGCCAGC
GGGCTGGA
AGCGCACAT
GAGGATCACC
CATGTGC
AGAAATCCG
TCTTTCATTG
ACGG
CAGTT
TCCTTTACA
GGGCCAGC
GGGCTGGA
A
GGATCACCCA
TGTGCAGAA
ATCCGTCTTT
CATTGACGG
AGGAATGC
CCGCCAGC
GCGACGCC
TCCACTT
AGAAATCCG
TCTTTCATTG
ACGG
AGGAA
TGCCCGCCA
GCGCGACG
CCTCCACTT
GGATCACCCA
TGTGC
AGAAATCCG
TCTTTCATTG
ACGG
AGGAA
TGCCCGCCA
GCGCGACG
CCTCCACTT
SEQ ID NO: 128, 129, and 130 from Table 8 were used in this example.
HEK293T cells were transfected with plasmids for: dCas12i2-2xUGI+AnoA1 or hA3A+the indicated gRNAs. The cells were harvested and base editing levels analyzed by Chimera. The data, summarized in
This application is a national stage application of international application serial number PCT/US2022/011289, filed Jan. 5, 2022, which claims the benefit of the filing date of U.S. Provisional Application Ser. No. 63/133,942, filed Jan. 5, 2021, the entire disclosures of which are incorporated by reference as if set forth fully herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/011289 | 1/5/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63133942 | Jan 2021 | US |