The present invention relates to the field of gene-editing.
Researchers are aggressively exploring the use of Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) systems in order to modify DNA. To date, the vast majority of the work in this field has been in Cas9 systems. In these systems, a tracrRNA (trans-activating CRISPR RNA) and a crRNA (CRISPR RNA) that are either in the form of two separate strands of nucleotides or regions of a single strand of nucleotides hybridize to recruit a Cas9 protein and then direct the Cas9 protein to a DNA location that is complementary to a sequence within the crRNA. The complementary sequence within the DNA thus becomes a target site, and the Cas9 protein may, based on its functional domain, cause editing at this target site.
Despite the now well recognized power of the Cas9 systems, those systems are not effective in all applications. Among the limitations of Cas9 systems is that the functional domains upon which the Cas9 systems can act are defined by the functional domain of the Cas9 protein that one uses.
Other Cas proteins are known. Among these other Cas proteins, the potential of which has not been fully explored, are those within the Type V family. The use of enzymes from the Type V family is particularly underexplored when one seeks to introduce multiple edits at or near a target site. Therefore, there is a need to develop improved gRNAs (guide RNAs), as well as complexes and systems that incorporate and use them.
The present invention provides novel and non-obvious gRNA-ligand binding complexes, base editing complexes, and methods for base editing and genome modification. Through the use of various embodiments of the present invention, one may be able to efficiently and effectively cause base editing ex vivo, in vitro, and in vivo. Further, some embodiments of the present invention provide modular designs that allow for the same Type V Cas protein to be directed to different targeting sites and optionally associated with different effector proteins at the same or different sites.
According to a first embodiment, the present invention provides a gRNA-ligand binding complex, wherein the gRNA-ligand binding complex comprises: (a) a gRNA, wherein the gRNA contains 60 to 210 nucleotides or 80 to 180 nucleotides and the gRNA comprises (i) a crRNA sequence, wherein the crRNA sequence is 36 to 60 nucleotides long or 35 to 60 nucleotides long and the crRNA sequence comprises a Cas association region, wherein the Cas association region is 18 to 30 nucleotides long and a targeting region, wherein the targeting region is 18 to 30 nucleotides long, and (ii) a tracrRNA sequence, wherein the tracrRNA sequence is 45 to 120 nucleotides long and wherein the tracrRNA sequence comprises an anti-repeat region and a distal region, wherein the anti-repeat region is at least 80% complementary to the Cas association region over at least 18 consecutive nucleotides of the Cas association region and the Cas association region and the anti-repeat region are capable of hybridizing to form a hybridization region, wherein the hybridization region is capable of retaining association with an RNA binding domain of a Type V Cas protein; and (b) a ligand binding moiety, wherein the ligand binding moiety is either (i) directly bound to the gRNA, or (ii) bound to the gRNA through a linker.
According to a second embodiment, the present invention provides a base editing complex comprising: a gRNA-ligand binding complex of the present invention and a Type V Cas protein, wherein the Cas association region and the anti-repeat region of the gRNA-ligand binding complex are associated with the Type V Cas protein. Optionally, the ligand binding moiety is reversibly associated with a ligand that is attached to or a part of an effector molecule.
According to a third embodiment, the present invention provides a method for base editing. The method comprises exposing a base editing complex of the present invention to double stranded DNA (“dsDNA”) or to single stranded DNA (“ssDNA”). The base editing complex may be exposed to the dsDNA or ssDNA under conditions that permit base editing.
According to a fourth embodiment, the gRNA-ligand binding complex comprises or encodes SEQ ID NO: 28.
According to a fifth embodiment, the gRNA-ligand binding complex comprises or encodes any of SEQ ID NO: 59 to SEQ ID NO: 65.
According to a sixth embodiment, the gRNA-ligand binding complex comprises or encodes any of SEQ ID NO: 67 to SEQ ID NO: 71.
When an effector is attached to (or contains) a ligand, the system has a modular design. The presence of the ligand binding moiety within the gRNA-ligand binding complex allows that complex to associate with the corresponding ligand associated with (or contained within) the effector. Thus, the ligand binding moiety is associated with the gRNA in a manner and orientation that allows it to be capable of associating with a ligand. Similarly, the ligand is attached to or associated with the effector in a manner that renders it capable of reversibly associating with the ligand binding moiety.
When the ligand and the ligand binding moiety are associated with each other, the effector that is associated with the ligand will become part of any base editing complex that contains the gRNA-ligand binding complex. When the base editing complex also contains a Cas protein, that Cas protein and the effector can be retained in the same locality, e.g., at or near a target site of interest.
Thus, if one wishes to use a particular effector with the Cas protein, one only needs to associate that effector with the ligand that is capable of reversibly associating with the ligand binding moiety that is part of the base editing complex that contains that Cas protein. To change the effector from one system to the next, one need only change the effector-ligand. Consequently, one can use the same gRNA-ligand binding complex and its associated Cas protein with a plurality of different effectors. The plurality of different effectors may be used sequentially in the same system by associating and dissociating their ligands with the ligand binding moieties or simultaneously or sequentially in different systems.
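By way of illustration only, the modular pairing described above can be sketched as a small data model in Python. The class names, the `can_recruit` function, and the effector names are hypothetical and were chosen for this sketch. The MS2 aptamer and MCP pairing is used only because it is one of the pairings mentioned in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideComplex:
    """A gRNA-ligand binding complex together with its associated Cas protein."""
    cas_protein: str
    ligand_binding_moiety: str
    recognized_ligand: str  # ligand that the moiety reversibly associates with


@dataclass(frozen=True)
class Effector:
    """An effector that is attached to, or contains, a ligand."""
    name: str
    ligand: str


def can_recruit(guide: GuideComplex, effector: Effector) -> bool:
    """True when the effector's ligand matches the moiety's recognized ligand."""
    return guide.recognized_ligand == effector.ligand


# Hypothetical example: one complex, several interchangeable effectors.
guide = GuideComplex("Type V Cas protein", "MS2 aptamer", "MCP")
effectors = [
    Effector("effector A", "MCP"),
    Effector("effector B", "MCP"),
    Effector("effector C", "PCP"),
]
recruitable = [e.name for e in effectors if can_recruit(guide, e)]
```

In this sketch, changing the effector requires no change to the `GuideComplex` object, which mirrors the point that only the effector-ligand needs to be changed from one system to the next.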
ligand binding moiety at different locations in which the tracrRNA and the crRNA are separate strands of oligonucleotides.
Reference will now be made in detail to various embodiments of the present invention, examples of which are illustrated in the accompanying figures. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, unless otherwise indicated or implicit from context, the details are intended to be examples and should not be deemed to limit the scope of the invention in any way. Additionally, features described in connection with the various or specific embodiments are not to be construed as not appropriate for use in connection with other embodiments disclosed herein unless such exclusivity is explicitly stated or implicit from context.
Headers are provided herein for the convenience of the reader and do not limit the scope of any of the embodiments disclosed herein.
Unless otherwise stated or apparent from context, the following terms shall have the meanings set forth below:
The phrase “2′ modification” refers to a nucleotide unit having a sugar moiety that is modified at the 2′ position of the sugar moiety. An example of a 2′ modification is a 2′-O-alkyl modification that forms a 2′-O-alkyl modified nucleotide or a 2′ halogen modification that forms a 2′ halogen modified nucleotide.
The phrase “2′-O-alkyl modified nucleotide” refers to a nucleotide unit having a sugar moiety, for example a deoxyribosyl or ribosyl moiety that is modified at the 2′ position such that an oxygen atom is attached both to the carbon atom located at the 2′ position of the sugar and to an alkyl group. In various embodiments, the alkyl moiety consists of or consists essentially of carbon(s) and hydrogens. When the O moiety and the alkyl group to which it is attached are viewed as one group, they may be referred to as an O-alkyl group, e.g., —O-methyl, —O-ethyl, —O-propyl, —O-isopropyl, —O-butyl, —O-isobutyl, —O-ethyl-O-methyl (—OCH2CH2OCH3), and —O-ethyl-OH (—OCH2CH2OH). A 2′-O-alkyl modified nucleotide may be substituted or unsubstituted.
The phrase “2′ halogen modified nucleotide” refers to a nucleotide unit having a sugar moiety, for example, a deoxyribosyl moiety, that is modified at the 2′ position such that the carbon at that position is directly attached to a halogen species, e.g., F, Cl, or Br.
A “ligand binding moiety” refers to a moiety, such as an aptamer (e.g., an oligonucleotide or peptide) or another compound, that binds to a specific ligand and can reversibly or irreversibly be associated with that ligand.
The term “modified nucleotide” refers to a nucleotide having at least one modification in the chemical structure of the base, sugar and/or phosphate, including, but not limited to, 5-position pyrimidine modifications, 8-position purine modifications, modifications at cytosine exocyclic amines, and substitution of 5-bromo-uracil or 5-iodouracil; and 2′-modifications, including but not limited to, sugar-modified ribonucleotides in which the 2′-OH is replaced by a group such as an H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN.
Modified bases refer to nucleotide bases such as, for example, adenine, guanine, cytosine, thymine, uracil, xanthine, inosine, and queuosine that have been modified by the replacement or addition of one or more atoms or groups. Some examples of these types of modifications include, but are not limited to, alkylated, halogenated, thiolated, aminated, amidated, or acetylated bases, alone and in various combinations. More specific modified bases include, for example, 5-propynyluridine, 5-propynylcytidine, 6-methyladenine, 6-methylguanine, N,N-dimethyladenine, 2-propyladenine, 2-propylguanine, 2-aminoadenine, 1-methylinosine, 3-methyluridine, 5-methylcytidine, 5-methyluridine and other nucleotides having a modification at the 5 position, 5-(2-amino)propyluridine, 5-halocytidine, 5-halouridine, 4-acetylcytidine, 1-methyladenosine, 2-methyladenosine, 3-methylcytidine, 6-methyluridine, 2-methylguanosine, 7-methylguanosine, 2,2-dimethylguanosine, 5-methylaminoethyluridine, 5-methyloxyuridine, deazanucleotides such as 7-deaza-adenosine, 6-azouridine, 6-azocytidine, 6-azothymidine, 5-methyl-2-thiouridine, other thio bases such as 2-thiouridine and 4-thiouridine and 2-thiocytidine, dihydrouridine, pseudouridine, queuosine, archaeosine, naphthyl and substituted naphthyl groups, any O- and N-alkylated purines and pyrimidines such as N6-methyladenosine, 5-methylcarbonylmethyluridine, uridine 5-oxyacetic acid, pyridine-4-one, pyridine-2-one, phenyl and modified phenyl groups such as aminophenol or 2,4,6-trimethoxy benzene, modified cytosines that act as G-clamp nucleotides, 8-substituted adenines and guanines, 5-substituted uracils and thymines, azapyrimidines, carboxyhydroxyalkyl nucleotides, carboxyalkylaminoalkyl nucleotides, and alkylcarbonylalkylated nucleotides. Modified nucleotides also include those nucleotides that are modified with respect to the sugar moiety, as well as nucleotides having sugars or analogs thereof that are not ribosyl.
For example, the sugar moieties may be, or be based on, mannoses, arabinoses, glucopyranoses, galactopyranoses, 4-thioribose, and other sugars, heterocycles, or carbocycles.
The phrase “codes for” and the term “encodes” mean that one sequence contains either a sequence that is identical to a referenced nucleotide sequence, a DNA or RNA equivalent of the referenced nucleotide sequence, or a sequence that is a DNA or RNA complement of the referenced nucleotide sequence. Thus, when one refers to a sequence that codes for or encodes a recited DNA sequence, one refers to a sequence that, unless otherwise specified, is any one of the following: the same DNA sequence, a complement of the DNA sequence, the RNA equivalent of that sequence, or the RNA complement of that sequence, or any of the aforementioned in which one or more ribonucleotides is substituted for its deoxyribonucleotide counterpart or one or more deoxyribonucleotides is substituted for its ribonucleotide counterpart.
The term “complementarity” refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence by either traditional Watson-Crick base-pairing or other non-traditional types of base pairs. A percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively). “Perfectly complementary” means that all of the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%, over a region of, for example, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more consecutive nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
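The percent-complementarity arithmetic described above can be illustrated with a minimal Python sketch. It assumes Watson-Crick pairing only (A-U and G-C, with T treated as U), two strands of equal length, and an ungapped antiparallel alignment. The function name `percent_complementarity` is hypothetical and is not part of the disclosure.

```python
# Illustrative sketch only: counts Watson-Crick pairs between two equal-length
# strands aligned antiparallel. Non-traditional pairs (e.g., G-U wobble) are
# deliberately ignored here, although the definition above permits them.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Return the percentage of residues in strand_a (written 5'->3') that pair
    with the antiparallel partner residue in strand_b (also written 5'->3')."""
    a = strand_a.upper().replace("T", "U")
    b = strand_b.upper().replace("T", "U")
    if len(a) != len(b) or not a:
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse strand_b so that position i of a faces its antiparallel partner.
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(a, reversed(b)))
    return 100.0 * paired / len(a)
```

For two ten-nucleotide strands in which eight residues pair, the function returns 80.0, matching the "8 out of 10 being 80%" example in the definition.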
The terms “hybridization” and “hybridizing” refer to a process in which completely, substantially, or partially complementary nucleic acid strands come together under specified hybridization conditions to form a double-stranded structure or region in which the two constituent strands are joined by hydrogen bonds. Unless otherwise stated, the hybridization conditions are naturally occurring or lab designed conditions. Although hydrogen bonds typically form between adenine and thymine or uracil (A and T or U) or between cytosine and guanine (C and G), other base pairs may form (see e.g., Adams et al., The Biochemistry of the Nucleic Acids, 11th ed., 1992).
The term “nucleotide” refers to a ribonucleotide or a deoxyribonucleotide or modified form thereof, as well as an analog thereof. Nucleotides include species that comprise purines, e.g., adenine, hypoxanthine, guanine, and their derivatives and analogs, as well as pyrimidines, e.g., cytosine, uracil, thymine, and their derivatives and analogs. Preferably, a nucleotide comprises a cytosine, uracil, thymine, adenine, or guanine moiety. Further, the term nucleotide also includes those species that have a detectable label, such as for example a radioactive or fluorescent moiety, or mass label attached to the nucleotide. The term nucleotide also includes what are known in the art as universal bases. By way of example, universal bases include but are not limited to 3-nitropyrrole, 5-nitroindole, or nebularine. Nucleotide analogs are, for example, meant to include nucleotides with bases such as inosine, queuosine, xanthine, sugars such as 2′-methyl ribose, and non-natural phosphodiester internucleotide linkages such as methylphosphonates, phosphorothioates, phosphoroacetates and peptides.
The terms “subject” and “patient” are used interchangeably herein to refer to an organism, e.g., a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets such as dogs and cats. The tissues, cells and their progeny of an organism or other biological entity obtained in vivo or cultured in vitro are also encompassed within the terms subject and patient. Additionally, in some embodiments, a subject may be an invertebrate animal, for example, an insect or a nematode; while in others, a subject may be a plant or a fungus.
As used herein, “treatment,” “treating,” “palliating,” and “ameliorating” are used interchangeably. These terms refer to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the complexes of the present invention may be administered to a subject, or a subject's cells or tissues, or those of another subject extracorporeally before re-administration, at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom might not have yet been manifested.
As disclosed herein, a number of ranges of values are provided. It is understood that each intervening value, to the tenth of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither, or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
The term “about” generally refers to plus or minus 10% of the indicated number. For example, “about 10%” may indicate a range of 9% to 11%, and “about 20” may mean from 18-22. Other meanings of “about” may be apparent from the context, such as rounding off; for example “about 1” may also mean from 0.5 to 1.4.
According to a first embodiment, the present invention comprises a gRNA-ligand binding complex that contains both a gRNA and a ligand binding moiety. This complex has the ability to retain association with a Type V Cas protein. Within the gRNA-ligand binding complex, the gRNA may be covalently bound directly to the ligand binding moiety or bound to the ligand binding moiety through a linker.
The gRNA of the gRNA-ligand binding complex is either a single strand of nucleotides that has at least one region that is self-complementary or two strands of nucleotides each of which has at least one region that is complementary to a region of the other strand. Within the gRNA, regardless of whether it is a single strand of nucleotides or two strands of nucleotides, there may be one or more loops. The gRNA comprises two parts: a tracrRNA and a crRNA.
The nucleotides within the gRNA may be entirely RNA or a combination of ribonucleotides and other nucleotides such as deoxyribonucleotides. Each nucleotide may be unmodified, or one or more nucleotides may be modified, e.g., with one of the following modifications: 2′-O-methyl, 2′ fluoro or 2-aminopurine. In some embodiments over one or more ranges of one to forty or two to twenty or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, or 36 nucleotides, there are consecutively modified nucleotides or a modification pattern of every second, or every third or every fourth nucleotide being modified at its 2′ position with all other nucleotides being unmodified. Additionally or alternatively, between one or more pairs or every pair of consecutive nucleotides, there may be modified or unmodified internucleotide linkages.
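The periodic modification patterns described above (every second, every third, or every fourth nucleotide modified at its 2′ position) can be sketched in Python as follows. The function names and the `m` prefix used to mark a modified nucleotide are assumptions made for illustration; the disclosure does not prescribe any particular notation.

```python
from typing import Iterable, List


def modified_positions(length: int, every: int, first: int = 1) -> List[int]:
    """Return the 1-based positions that carry a 2' modification when every
    `every`-th nucleotide is modified, starting at position `first`."""
    if length < 1 or every < 1 or first < 1:
        raise ValueError("length, every, and first must all be at least 1")
    return list(range(first, length + 1, every))


def annotate(sequence: str, positions: Iterable[int], mark: str = "m") -> str:
    """Prefix each modified nucleotide with `mark`; leave the others as-is."""
    chosen = set(positions)
    return "".join(
        f"{mark}{nt}" if i in chosen else nt
        for i, nt in enumerate(sequence, start=1)
    )
```

For example, `annotate("GUCAAU", modified_positions(6, 2))` marks positions 1, 3, and 5 and leaves all other nucleotides unmodified.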
In some embodiments, the crRNA is 35 to 60 nucleotides or 36 to 60 nucleotides long or 40 to 55 nucleotides long. Within the crRNA sequence are a Cas association region, which also may be referred to as a repeat region, that may be 18 to nucleotides long or 20 to 25 nucleotides long and a targeting region, which also may be referred to as a spacer region, that may be 18 to 30 nucleotides long or 20 to nucleotides long.
The targeting region contains the targeting sequence, which is a variable sequence that may be selected based on where one wishes for the Cas protein and/or effector to cause base editing. Thus, the targeting region may be designed to include a region that is complementary and capable of hybridization to a pre-selected target site of interest. For example, the region of complementarity between the targeting region and the corresponding target site sequence may be about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more than 25 consecutive nucleotides in length or it may be at least 80%, at least 85%, at least 90%, or at least 95% complementary to a region of DNA over 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more than 25 consecutive nucleotides. The targeting region is a region that does not hybridize with the tracrRNA, and it may be downstream of the Cas association region. The Cas association region is designed based on the RNA binding domain of a Cas protein with which it is intended to associate. (Not all nucleotides within the Cas association region need directly associate with the Cas protein.)
The tracrRNA sequence may, for example, be 30 to 210 nucleotides long or 45 to 120 nucleotides long or 60 to 100 nucleotides long or 70 to 90 nucleotides long. The tracrRNA sequence comprises an anti-repeat region and a distal region. In some embodiments, the anti-repeat region is 18 to 60 nucleotides long or 25 to 50 nucleotides long or 30 to 40 nucleotides long. The distal region may, for example, be 18 to 60 nucleotides long or 25 to 50 nucleotides long or 30 to 40 nucleotides long. The distal region is a region that does not hybridize with the crRNA, and it may be upstream of the anti-repeat region.
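As a hedged illustration, a candidate tracrRNA design could be screened against the broadest length ranges recited in the preceding paragraph. The dictionary keys and the function name `out_of_range` are hypothetical; the narrower ranges recited above could be substituted in the same way.

```python
from typing import Dict, List, Tuple

# Broadest ranges recited above, in nucleotides (inclusive bounds).
RANGES: Dict[str, Tuple[int, int]] = {
    "tracrRNA": (30, 210),
    "anti_repeat": (18, 60),
    "distal": (18, 60),
}


def out_of_range(lengths: Dict[str, int]) -> List[str]:
    """Return the names of regions whose length is missing or falls outside
    the corresponding range; an empty list means every length is in range."""
    problems = []
    for name, (low, high) in RANGES.items():
        n = lengths.get(name)
        if n is None or not low <= n <= high:
            problems.append(name)
    return problems
```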
The anti-repeat region is at least 80%, at least 85%, at least 90%, at least 95%, or 100% complementary to the Cas association region over at least 18 consecutive nucleotides of the Cas association region, and consequently, the Cas association region and the anti-repeat region are capable of hybridizing to form a hybridization region. When the tracrRNA and the crRNA hybridize over the hybridization region, the gRNA is capable of retaining association with an RNA binding domain of a Type V Cas protein. Preferably, this association is possible under both naturally occurring conditions and under laboratory conditions in which the complex is to be used.
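The complementarity criterion above (at least 80% over at least 18 consecutive nucleotides of the Cas association region) can be checked with a sliding-window comparison, sketched below in Python. The sketch assumes Watson-Crick pairing only and an ungapped antiparallel alignment, so it does not model bulges or non-traditional base pairs. The function name `meets_threshold` and its parameters are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def meets_threshold(cas_association: str, anti_repeat: str,
                    window: int = 18, min_percent: float = 80.0) -> bool:
    """Return True if any `window`-nucleotide stretch of the Cas association
    region is at least `min_percent` complementary to some equally long
    stretch of the anti-repeat region (both sequences written 5'->3')."""
    cas = cas_association.upper().replace("T", "U")
    # Reverse the anti-repeat so that the two strands face each other antiparallel.
    anti = anti_repeat.upper().replace("T", "U")[::-1]
    for i in range(len(cas) - window + 1):
        for j in range(len(anti) - window + 1):
            paired = sum(
                (x, y) in WATSON_CRICK
                for x, y in zip(cas[i:i + window], anti[j:j + window])
            )
            if 100.0 * paired / window >= min_percent:
                return True
    return False
```

If either region is shorter than the window, no stretch can be compared and the function returns `False`.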
If the tracrRNA and crRNA are part of a contiguous strand of nucleotides, there may be a loop region between the tracrRNA and the crRNA of for example, 4 to or 8 to 15 nucleotides. In 5′ to 3′ direction, the gRNA may comprise, consist essentially of or consist of the distal region, the anti-repeat region, the loop, the Cas association region and the targeting region.
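The 5′ to 3′ arrangement of a single-strand gRNA described above can be sketched as a simple Python data structure. The class and method names are hypothetical, and any sequences supplied to it are placeholders rather than sequences of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleStrandGuide:
    """Regions of a single-strand gRNA, each written 5'->3'."""
    distal: str
    anti_repeat: str
    loop: str
    cas_association: str
    targeting: str

    def sequence(self) -> str:
        """Concatenate the regions in the 5'->3' order recited above."""
        return (self.distal + self.anti_repeat + self.loop
                + self.cas_association + self.targeting)

    def tracr_length(self) -> int:
        """Length of the tracrRNA portion (distal plus anti-repeat regions)."""
        return len(self.distal) + len(self.anti_repeat)

    def cr_length(self) -> int:
        """Length of the crRNA portion (Cas association plus targeting regions)."""
        return len(self.cas_association) + len(self.targeting)
```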
Examples of tracrRNAs and crRNAs that may be of use in connection with the present invention appear below (all in 5′→3′ direction with N being any nucleotide, e.g., A, C, G, or U). In each sequence below, a specific number of N nucleotides is shown. However, in any of these sequences, the number of N nucleotides may be 16-30, which means that there can be 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 N nucleotides.
Alicyclobacillus acidoterrestris, AacCas12b crRNA
Bacillus thermoamylovorans, Bth Cas12b tracrRNA
Bacillus thermoamylovorans, Bth Cas12b crRNA
The ligand binding moiety is an element that is capable of reversibly associating with a ligand by, for example, forming non-covalent interactions. In some embodiments, the ligand binding moiety is an aptamer. The ligand binding moiety may be bound to the gRNA directly, e.g., through a covalent bond, or through a linker. The association of the ligand binding moiety with the gRNA, regardless of whether directly through a covalent bond or through a linker, may be at any of a number of locations, for example, in the tracrRNA in either the anti-repeat region or the distal region, in the crRNA in either the targeting region or the Cas association region, or in a loop between the tracrRNA and the crRNA, if present. A ligand binding moiety is bound directly to a gRNA if it is bound to a nucleotide within the gRNA, e.g., to the backbone phosphate of a unit or to a sugar moiety or to a nitrogenous base of a nucleotide.
By way of non-limiting examples, the ligand binding moiety may be bound directly (through, e.g., a covalent bond) to the 3′ end of the gRNA or to the 5′ end of the gRNA if the gRNA is a single strand (which correspond to the 3′ end of the crRNA and the 5′ end of the tracrRNA, respectively) or to the 3′ end of the crRNA, the 3′ end of the tracrRNA, the 5′ end of the tracrRNA or the 5′ end of the crRNA if they are separate strands. Thus, the ligand binding moiety may be bound to the first or last nucleotide in the gRNA or in either strand of the gRNA. Alternatively, the ligand binding moiety may be bound to a nucleotide that is not the first or last nucleotide in the gRNA (if single stranded) or tracrRNA or crRNA (if separate strands).
When the ligand binding moiety is a nucleotide sequence, and it is bound directly to the 5′ end or the 3′ end of the gRNA, it may be in the same 5′ to 3′ orientation as the gRNA. In these circumstances, there is a continuous strand of nucleotides that contains both the ligand binding moiety and the gRNA and thus is either
The ligand binding moiety may also be attached to the gRNA at a position other than the 5′ end or the 3′ end. When the ligand binding moiety is a nucleotide sequence, it may be inserted in the gRNA, and thus there may, for example, be a first section of the gRNA that is 5′ of the ligand binding moiety and a second section of the gRNA that is 3′ of the ligand binding moiety such that there is one oligonucleotide sequence: 5′-[first section of gRNA]-[ligand binding moiety]-[second section of gRNA]-3′. Relative to a gRNA that does not contain the ligand binding moiety, in the complex that contains the gRNA with the ligand binding moiety inserted, there may be no deletion of nucleotides from either the Cas association region or the targeting region. Alternatively, there may be a deletion of one or more nucleotides (e.g., 1 to 10 nucleotides) at the location of insertion.
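The insertion arrangement described above, with or without a deletion at the insertion site, can be sketched in Python as a string operation. The function name `insert_moiety` and its parameters are hypothetical. The sketch assumes that deleted nucleotides are removed from the gRNA immediately 3′ of the insertion point, which is only one possible reading of "at the location of insertion."

```python
def insert_moiety(grna: str, moiety: str, position: int, deleted: int = 0) -> str:
    """Return 5'-[first section]-[moiety]-[second section]-3'.

    `position` is the number of gRNA nucleotides kept 5' of the insertion;
    `deleted` is the number of gRNA nucleotides removed immediately 3' of it.
    """
    if not 0 < position < len(grna):
        raise ValueError("insertion must be internal to the gRNA")
    if deleted < 0 or position + deleted >= len(grna):
        raise ValueError("deletion must leave a non-empty second section")
    first_section = grna[:position]
    second_section = grna[position + deleted:]
    return first_section + moiety + second_section
```

With `deleted=0`, every nucleotide of the original gRNA is retained and the overall length grows by the length of the inserted moiety.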
In some embodiments, the ligand binding moiety forms a stem and loop complex. In some embodiments, this stem and loop complex of the ligand binding moiety may be at a location at which in the absence of the ligand binding moiety there is a bulge or another stem-loop complex.
In some embodiments, when the ligand binding moiety is not a nucleotide sequence and it is bound to the gRNA at a location other than the 5′ or 3′ end, it may, for example, be bound between two consecutive nucleotides:
In other embodiments, the ligand binding moiety is bound to a linker that is bound to the 3′ end of the gRNA or to the 5′ end of the gRNA or to another one or more locations within the gRNA. In some embodiments, each of the linker and the ligand binding moiety may independently comprise, consist essentially of, or consist of nucleotides. In some embodiments, each of the linker and the ligand binding moiety may independently comprise, consist essentially of, or consist of a moiety other than nucleotides.
When the ligand binding moiety is a nucleotide sequence and it is bound through a linker to the 5′ end or the 3′ end of the gRNA, each of the gRNA, the linker and the ligand binding moiety may be in the same 5′ to 3′ orientation. In these circumstances, there is a continuous strand of nucleotides that contains both the ligand binding moiety and the gRNA either
The ligand binding moiety may also be attached to the gRNA through a linker at a position other than the 5′ end or the 3′ end. When the ligand binding moiety and the linker are nucleotide sequences, they may be inserted in the gRNA, and thus there may, for example, be a first section of the gRNA that is 5′ of the ligand binding moiety and linker and a second section of the gRNA that is 3′ of the ligand binding moiety and linker. In some embodiments, there are two linkers that flank the ligand binding moiety.
When there is only one linker sequence it may be either 5′ or 3′ of the ligand binding moiety such that the complex is
When there are two linker sequences, a first linker may be 5′ of the ligand binding moiety and the second linker may be 3′ of the ligand binding moiety, such that the complex is
In some embodiments, each of the first section of gRNA, the first linker, the ligand binding moiety, the second linker and the second section of gRNA is a nucleotide sequence in the same orientation. In other embodiments, one or more of the first linker, the ligand binding moiety and the second linker are in the opposite orientation to that of the first section of gRNA and the second section of gRNA, which are in the same orientation.
In some embodiments, when the ligand binding moiety is between the first section of the gRNA and the second section of the gRNA (and if one or two linkers are present they are also between the first section of the gRNA and the second section of the gRNA):
When there are two linkers present, they may be sufficiently complementary that they can hybridize to each other. For example, each linker may be 1 to 20 nucleotides long and the linkers may be at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or 100% complementary and have no bulges or one or more bulges. In some embodiments, the linkers are the same size; in other embodiments, the linkers are different sizes.
When a ligand binding moiety is bound to the loop of the gRNA, either directly or through a linker, the bonding may, for example, be at the first nucleotide in the loop, the second nucleotide in the loop, the third nucleotide in the loop, the fourth nucleotide in the loop, the center nucleotide in the loop if the loop has an odd number of nucleotides or one of the two center most nucleotides in the loop if the loop has an even number of nucleotides, or the last nucleotide in the loop. Any one or more of the aforementioned nucleotides and/or the 5′ and/or 3′ internucleotide linkage corresponding to them may be modified. These modifications may, for example, occur where the ligand binding moiety is bound to the gRNA (directly or through a linker) or only at locations other than where the ligand binding moiety is bound to the gRNA (directly or through a linker). For example, the ligand binding moiety can be attached to a 2′ position of a sugar or attached to a nitrogenous base in the gRNA oligonucleotide sequence.
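The center-of-loop attachment positions described above depend only on whether the loop length is odd or even, which can be sketched in Python as follows. Positions are 1-based and counted from the 5′ end of the loop; the function name `center_positions` is hypothetical.

```python
from typing import List


def center_positions(loop_length: int) -> List[int]:
    """Return the 1-based center position of a loop with an odd number of
    nucleotides, or the two centermost positions when the number is even."""
    if loop_length < 1:
        raise ValueError("loop must contain at least one nucleotide")
    mid = (loop_length + 1) // 2
    return [mid] if loop_length % 2 else [mid, mid + 1]
```

For a five-nucleotide loop the sketch returns position 3, and for an eight-nucleotide loop it returns positions 4 and 5.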
In some embodiments, the ligand binding moiety comprises, consists essentially of, or consists of an oligonucleotide sequence that is unmodified or comprises one or more modified nucleotides. For example, the ligand binding moiety may be 10 to 50 or 18 to 50 nucleotides long. In some embodiments, the ligand binding moiety forms a stem-loop structure. If there is no linker present, the ligand binding moiety may appear as an extension of the gRNA sequence immediately 5′ or 3′ of the gRNA or 5′ or 3′ of the tracrRNA or the crRNA, or it may appear as an insertion.
In some embodiments, the ligand binding moiety comprises, consists essentially of, or consists of biotin or streptavidin.
In some embodiments, the ligand binding moiety is selected from the group consisting of moieties that associate with the following ligands: MS2 coat protein (MCP), Ku, PP7 coat protein (PCP), Com RNA binding protein or the binding domain thereof, SfMu, Sm7, Tat, Glutathione S-transferase (GST), CSY4, Qbeta, COM, pumilio, Anti-His Tag (6H7), SNAP-Tag, lambdaN22, a lectin (in which case ligand binding moiety may be carbohydrate or glycan or oligosaccharide), and PDGF beta-chain. In some embodiments, the ligand binding moiety is an aptamer that comprises deoxyribonucleotides, ribonucleotides or a combination of both. Therefore, as non-limiting examples, one may use DNA aptamers, RNA aptamers, DNA aptamers with modified nucleosides in the backbones, RNA aptamers with modified nucleosides in the backbones and combinations thereof.
In some embodiments, a naturally occurring MS2 aptamer is used as the ligand binding moiety. In other embodiments, one uses an MS2 C-5 mutant or an MS2 F-5 mutant or a modified MS2, e.g., MS2 in which there is one or more, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12, modified nucleotides such as an amino purine, at position 10, wherein position 10 is the tenth nucleotide from the 5′ end of an aptamer. The 2-amino purine may, for example, be 2′ deoxy-2-aminopurine or 2′ ribose 2-aminopurine. The modification at any one position may be in addition to a modification at another position or to the exclusion of a modification at any or all of the other positions.
In some embodiments, the ligand binding moiety is an aptamer that comprises a 5′ modified nucleotide, wherein the 5′ modified nucleotide comprises at least one of a 2′ modification, a 5′ PO4 group, or a modification of the nitrogenous base.
In some embodiments, the ligand binding moiety is an aptamer that is or comprises one part of an aptamer-ligand pair, and as discussed below, the effector is linked to or comprises the other part of the aptamer-ligand pair. For example, the aptamer may comprise an MS2 operator motif that specifically binds to an MS2 coat protein, MCP. As persons of ordinary skill in the art will appreciate, the aptamer can alternatively comprise the MCP moiety (or other ligand), in which case the effector would comprise or be linked to the MS2 operator motif (or other corresponding ligand binding moiety).
A linker, when present, may be a species that connects the ligand binding moiety to the gRNA. It may be attached to each of the ligand binding moiety and the gRNA at one location or it may be attached to either or both of the gRNA and the ligand binding moiety at a plurality of locations. Attachments at a plurality of locations may allow for greater control in three dimensional space of the ligand binding moiety and in turn the effector to be used.
By way of non-limiting examples, the linker may attach to the gRNA at one location and to the ligand binding moiety at two or more locations; or the linker may attach to the ligand binding moiety at one location and to the gRNA at two or more locations. When the linker is attached to the gRNA at two or more locations, the linker may be attached to the gRNA exclusively in the targeting region, exclusively in the Cas association region or in both the targeting region and the Cas association region, exclusively in the anti-repeat region, exclusively in the distal region, in both the anti-repeat region and the distal region, in both the distal region and the targeting region, in both the anti-repeat region and the Cas association region, in both the anti-repeat region and the targeting region, or in both the Cas association region and the distal region.
In some embodiments, the linker comprises, consists essentially of, or consists of an oligonucleotide sequence and optionally the linker comprises at least one or a plurality of 2′ modifications, e.g., all nucleotides are 2′ modified nucleotides within the linker. The nucleotide sequence may be random or intentionally designed not to be undesirably complementary to sequence within the aptamer, the gRNA or the target site of the DNA.
In some embodiments, the linker comprises, consists essentially of, or consists of at least one phosphorothioate linkage.
In some embodiments, the linker comprises, consists essentially of, or consists of a levulinyl moiety.
In some embodiments, the linker comprises, consists essentially of, or consists of an ethylene glycol moiety.
In some embodiments, the linker comprises or is selected from the group consisting of 18S, 9S or C3.
In some embodiments, the linker is a nucleotide sequence that is one to sixty or one to twenty-four or two to twenty or five to fifteen nucleotides long.
Additionally, in some embodiments, the linker is GC rich, e.g., having at least 50%, at least 60%, at least 70%, at least 80% or at least 90% GC nucleotides. When a linker comprises nucleotides, it may, for example, be single stranded or double stranded or partially single stranded and partially double stranded. Additionally, when a linker is an oligonucleotide, the linker may be exclusively RNA, exclusively DNA or a combination thereof.
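By way of a non-limiting illustration, whether a candidate oligonucleotide linker meets one of the GC thresholds recited above can be checked with a short calculation. The following Python sketch is illustrative only; the function names and the example linker sequence are hypothetical and are not taken from the disclosure.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def is_gc_rich(seq: str, threshold: float = 0.5) -> bool:
    """True if the GC fraction is at least the threshold (e.g., 0.5 to 0.9)."""
    return gc_fraction(seq) >= threshold


linker = "GCGCAUGC"  # hypothetical 8-nucleotide RNA linker, for illustration
print(gc_fraction(linker))      # 0.75
print(is_gc_rich(linker, 0.7))  # True
print(is_gc_rich(linker, 0.8))  # False
```

The same calculation applies to RNA, DNA or mixed linkers, because only G and C are counted.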
In some embodiments, the linker is a nucleotide sequence that is upstream or downstream of the ligand binding moiety. When the linker is upstream of a ligand binding moiety and the gRNA is upstream of the linker, there may be another sequence that is complementary to the linker that is downstream of the ligand binding moiety. Similarly, when the linker is downstream of a ligand binding moiety and the gRNA is downstream of the linker, there may be another sequence that is complementary to the linker that is upstream of the ligand binding moiety. As persons of ordinary skill in the art will recognize, complementarity is determined when the oligonucleotide self-folds and the strands align with each relevant section in a 5′ to 3′ direction.
Thus, in some embodiments, the ligand binding moiety, e.g., MS2, has an upstream sequence that is 1 to 12 nucleotides long and a downstream sequence that is 1 to 12 nucleotides long, wherein the upstream and downstream sequences immediately flank the ligand binding moiety (i.e., there are no other nucleotides between the ligand binding moiety and each of the upstream and downstream sequences) and the upstream sequence is complementary to the downstream sequence. In some embodiments, each of the upstream sequence and the downstream sequence is 1 nucleotide long, 2 nucleotides long, 3 nucleotides long, 4 nucleotides long, 5 nucleotides long, 6 nucleotides long, 7 nucleotides long, 8 nucleotides long, 9 nucleotides long, 10 nucleotides long, 11 nucleotides long, or 12 nucleotides long. In one embodiment each of the upstream sequence and the downstream sequence comprises or is GC. When there are both upstream and downstream sequences, they may also be referred to as extension sequences.
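By way of a non-limiting illustration, the complementarity of a pair of extension sequences can be checked by comparing the downstream sequence, read 5′ to 3′, with the reverse complement of the upstream sequence. The following Python sketch is illustrative only; it assumes RNA sequences, strict Watson-Crick pairing (G-U wobble pairs are not counted), and full-length pairing, and the function names are hypothetical.

```python
# Watson-Crick partners for RNA; other characters raise KeyError (assumption).
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna.upper()))


def extensions_can_pair(upstream: str, downstream: str) -> bool:
    """True if both extension sequences are 1 to 12 nucleotides long and the
    downstream sequence is the reverse complement of the upstream sequence."""
    if not (1 <= len(upstream) <= 12 and 1 <= len(downstream) <= 12):
        return False
    return downstream.upper() == reverse_complement(upstream)


# "GC" is its own reverse complement, so GC on both sides can form a stem.
print(extensions_can_pair("GC", "GC"))  # True
print(extensions_can_pair("GA", "GC"))  # False
```

Because both sequences are written 5′ to 3′, a GC extension on each side of the ligand binding moiety pairs G with C and C with G when the oligonucleotide folds back on itself.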
In some embodiments, at least one of the gRNA or the ligand binding moiety is modified, or if a linker is present, at least one of the gRNA, the ligand binding moiety or the linker is modified. The modification refers to the introduction of a moiety or species that does not occur under naturally occurring conditions. Modifications may be used to increase one or both of stability and specificity. In some embodiments, stability is improved with respect to resistance to one or both of the active domain of the Cas protein (e.g., RuvC domain) and the active domain of one or more other enzymes within the system into which a complex of the present invention is introduced, including but not limited to any effector. Specificity is improved when a modification reduces the likelihood of an off-target effect and/or increases the likelihood that a base editing complex of the present invention will reach its target site. Nucleotides may be modified at the ribose, phosphate linkage, and/or base moiety. For example, a phosphorothioate backbone may be used, at one, a plurality or all positions within the gRNA, the anti-repeat region, the distal region, the targeting region or the Cas association region and/or the ligand binding moiety and/or linker if present.
In some embodiments, the modification is the presence of one or more 2′ modified nucleotides (e.g., 2′-O-methyl or 2′-fluoro) and/or the presence of a phosphorothioate internucleotide linkage or the introduction of a 5′-PO4 group of the gRNA and/or ligand binding moiety.
In some embodiments, a modification or set of modifications is selected such that it imparts resistance to a RuvC active nuclease domain relative to a gRNA-ligand binding complex that lacks that modification or set of modifications. The resistance may, in some embodiments, be caused by steric hindrance. In some embodiments, the modification(s) is/are located within and/or between one or more if not all of the nucleotides within the targeting region.
When more than one modification is present, the modifications may, for example, all be in the targeting region; all be in the Cas association region; all be in the anti-repeat region; all be in the distal region; all be in the ligand binding moiety; all be in the linker if present; be in both the targeting region and the Cas association region; be in both the Cas association region and the ligand binding moiety; be in both the Cas association region and the linker if present; be in both the targeting region and the ligand binding moiety; be in both the targeting region and the linker if present; be in both the ligand binding moiety and the linker if present; be in all three of the Cas association region, the targeting region and the ligand binding moiety; be in the Cas association region, the targeting region and the linker if present; be in the Cas association region, the ligand binding moiety and the linker if present; be in the targeting region, the ligand binding moiety and the linker if present; be in each of the Cas association region, the targeting region, the ligand binding moiety and the linker if present; be in the anti-repeat region and the distal region; be in the anti-repeat region and the Cas association region; be in the anti-repeat region and the targeting region; be in the distal region and the Cas association region; be in the distal region and targeting region; be in all regions of the gRNA except the distal region; be in all regions of the gRNA except the anti-repeat region; be in all regions of the gRNA except the Cas association region; be in all regions of the gRNA except the targeting region; be in each of the targeting region, the Cas association region, the anti-repeat region and the distal region; be in the ligand binding moiety and all regions of the gRNA except the anti-repeat region; be in the ligand binding moiety and all regions of the gRNA except the Cas association region; be in the ligand binding moiety and in all regions of the gRNA except the targeting region; and be in the ligand binding moiety and each of the targeting region, the Cas association region, the anti-repeat region, and the distal region.
In some embodiments, there are one to sixty or one to thirty or one to ten or ten to twenty or twenty to thirty or thirty to forty or forty to fifty or fifty to sixty 2′ modifications. By way of non-limiting examples, the set of 2′ modifications may be located: in the targeting region; in the anti-repeat region; in the distal region; in the ligand binding moiety if the ligand binding moiety is or comprises an oligonucleotide sequence; or in the Cas association region or in combinations thereof. The modifications may be on consecutive nucleotides or there may be one or more pairs of unmodified nucleotides between modified nucleotides in regular or irregular patterns. By way of a further non-limiting example, within a gRNA any one or more of positions 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 comprises a 2′-O-alkyl group, wherein the positions are measured from the 5′ end or the 3′ end of the gRNA or the tracrRNA or the crRNA.
In some embodiments, in addition to or in the absence of 2′ modified nucleotides there are modified internucleotide linkages such as phosphorothioate linkages. Examples of modifications to the backbones of the gRNA, the ligand binding moiety (if an oligonucleotide), and the linker (if present and an oligonucleotide), include but are not limited to phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3′-alkylene phosphonates, 5′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, 5′ to 5′ or 2′ to 2′ linkage. Suitable oligonucleotides having inverted polarity comprise a single 3′ to 3′ linkage at the 3′-most internucleotide linkage, i.e., a single inverted nucleoside residue that may be abasic (the nucleobase is missing or has a hydroxyl group in place thereof). Various salts (such as, for example, potassium or sodium), mixed salts and free acid forms of the aforementioned internucleotide linkages are also included within the scope of the present invention.
Also within the scope of the present invention is the use of polynucleotide backbones that do not include a phosphorus atom therein and instead have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These modifications include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.
In some embodiments, one or more of the parts of a complex has one to sixty or one to twenty or one to ten or ten to twenty or twenty to thirty or thirty to forty or forty to fifty or fifty to sixty phosphorothioate linkages. These phosphorothioate linkages may: all be in the targeting region; all be in the Cas association region; all be in the anti-repeat region; all be in the distal region; all be in the ligand binding moiety; all be in the linker if present; be in both the targeting region and the Cas association region; be in both the Cas association region and the ligand binding moiety; be in both the Cas association region and the linker if present; be in both the targeting region and the ligand binding moiety; be in both the targeting region and the linker if present; be in both the ligand binding moiety and the linker if present; be in all three of the Cas association region, the targeting region and the ligand binding moiety; be in the Cas association region, the targeting region and the linker if present; be in the Cas association region, the ligand binding moiety and the linker if present; be in the targeting region, the ligand binding moiety and the linker if present; be in each of the Cas association region, the targeting region, the ligand binding moiety and the linker if present; be in the anti-repeat region and the distal region; be in the anti-repeat region and the Cas association region; be in the anti-repeat region and the targeting region; be in the distal region and the Cas association region; be in the distal region and targeting region; be in all regions of the gRNA except the distal region; be in all regions of the gRNA except the anti-repeat region; be in all regions of the gRNA except the Cas association region; be in all regions of the gRNA except the targeting region; be in each of the targeting region, the Cas association region, the anti-repeat region and the distal region; be in the ligand binding moiety and all regions of the gRNA except the anti-repeat region; be in the ligand binding moiety and all regions of the gRNA except the Cas association region; be in the ligand binding moiety and in all regions of the gRNA except the targeting region; and be in the ligand binding moiety and each of the targeting region, the Cas association region, the anti-repeat region, and the distal region.
Any nucleotide within a complex of the present invention may include one or more substituted sugar moieties. These nucleotides may comprise a sugar substituent group selected from: OH; H; F; O—, S—, or N-alkyl; O—, S—, or N-alkenyl; O—, S— or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. Particularly suitable are O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10. Other suitable nucleotides comprise a sugar substituent group selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. By way of a non-limiting example, a suitable modification includes 2′-methoxyethoxy (2′-O-CH2CH2OCH3, also known as 2′-O-(2-methoxyethyl) or 2′-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504) or another alkoxyalkoxy group. A further suitable modification includes 2′-dimethylaminooxyethoxy, i.e., an O(CH2)2ON(CH3)2 group, also known as 2′-DMAOE, and 2′-dimethylaminoethoxyethoxy (also known in the art as 2′-O-dimethyl-amino-ethoxy-ethyl or 2′-DMAEOE), i.e., 2′-O—CH2—O—CH2—N(CH3)2.
Other suitable sugar substituent groups include methoxy (—O—CH3), aminopropoxy (—OCH2CH2CH2NH2), allyl (—CH2—CH═CH2), —O-allyl (—O—CH2—CH═CH2) and fluoro (F). 2′-sugar substituent groups may be in the arabino (up) position or ribo (down) position. A suitable 2′-arabino modification is 2′-F. Similar modifications may also be made at other positions on the oligomeric compound, particularly the 3′ position of the sugar on the 3′ terminal nucleoside or in 2′-5′ linked oligonucleotides and the 5′ position of the 5′ terminal nucleotide.
Any nucleotide within a complex of the present invention may also include nucleobase (often referred to in the art simply as “base”) modifications or substitutions. Modified nucleobases include, but are not limited to other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (—C≡C—CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-amino-adenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Further modified nucleobases include, but are not limited to tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g. 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), pyridoindole cytidine (H-pyrido(3′,2′:4,5)pyrrolo(2,3-d)pyrimidin-2-one) and 5-methoxy uracil.
Heterocyclic base moieties may also include, but are not limited to, those in which the purine or pyrimidine base is replaced with other heterocycles, for example, 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone. Examples of other nucleobases include those disclosed in U.S. Pat. No. 3,687,808, those disclosed in The Concise Encyclopedia Of Polymer Science And Engineering, pages 858-859, Kroschwitz, J. I., ed. John Wiley & Sons, 1990, those disclosed by Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613, and those disclosed by Sanghvi, Y. S., Chapter 15, Antisense Research and Applications, pages 289-302, Crooke, S. T. and Lebleu, B., ed., CRC Press, 1993. Certain of these nucleobases are useful for increasing the binding affinity of an oligomeric compound: 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. Additionally, 5-methylcytosine substitutions may be advantageous when combined with 2′-O-methoxyethyl sugar modifications.
In some embodiments, there are two ligand binding moieties associated with a gRNA: a first ligand binding moiety and a second ligand binding moiety. Optionally, there may be two linkers: a first linker and a second linker, wherein the first ligand binding moiety is attached to the first linker and the second ligand binding moiety is attached to the second linker. In these embodiments, the first linker and the second linker may each be attached to the Cas association region; or the first linker and the second linker may each be attached to the targeting region; or one of the first linker and the second linker may be attached to the Cas association region and the other of the first linker and the second linker may be attached to the targeting region; or the first linker and the second linker may each be attached to the anti-repeat region; or the first linker and the second linker may each be attached to the distal region; or one of the first linker and the second linker may be attached to the distal region and the other of the first linker and the second linker may be attached to the anti-repeat region; or one of the first linker and the second linker may be attached to the distal region and the other of the first linker and the second linker may be attached to the Cas association region; or one of the first linker and the second linker may be attached to the distal region and the other of the first linker and the second linker may be attached to the targeting region; or one of the first linker and the second linker may be attached to the anti-repeat region and the other of the first linker and the second linker may be attached to the Cas association region; or one of the first linker and the second linker may be attached to the anti-repeat region and the other of the first linker and the second linker may be attached to the targeting region; or one of the first linker and the second linker may be attached to a loop region if present and the other of the first linker and the second linker may be attached to one of the targeting region, the Cas association region, the anti-repeat region, or the distal region.
According to another embodiment of the present invention, there is a base editing complex. The base editing complex comprises, consists essentially of, or consists of a gRNA-ligand binding complex of the present invention; and a Type V Cas protein, wherein the Cas association region and anti-repeat region of the gRNA-ligand binding complex are associated with the Type V Cas protein. Thus, the gRNA is capable of associating with the Cas protein. After association has occurred, the Cas protein will, based on the identity of the targeting sequence, find the target site.
In general, a Cas protein includes at least one RNA binding domain. The RNA binding domain interacts with the guide RNA at the Cas association region. The Type V Cas protein that is of use in the present invention is one with which the gRNA-ligand binding complex can associate in the presence of a tracrRNA that contains an anti-repeat region that is of sufficient complementarity to the Cas association region. In some embodiments, the Type V Cas protein is an endonuclease that contains a RuvC domain. This RuvC domain can be mutated such that the endonuclease activity is deactivated. In some embodiments, the protein is a nickase that contains an active or deactivated RuvC domain.
Examples of Type V Cas proteins that may be of use in connection with the present invention include, but are not limited to, Cas12b, Cas12e, CasMINI and Cas12f in active or deactivated form.
The Cas protein may be provided in purified or isolated form or can be part of a composition or a complex. Preferably, when in a composition, the protein is first purified to some extent, more preferably to a high level of purity (e.g., about 80%, 90%, 95%, or 99% or higher). Compositions in which the complexes and components of the present invention may be stored and transported may be any type of composition desired, e.g., aqueous compositions suitable for use as, or inclusion in, a composition for RNA-guided targeting that a person of ordinary skill in the art would appreciate would be of use in connection with the present invention.
In some embodiments, the Type V Cas proteins comprise a fusion protein having (a) an active, partially deactivated or deactivated Type V Cas protein, and (b) a uracil DNA glycosylase (UNG) inhibitor peptide (UGI). The UGI peptide can be fused directly to the Type V Cas protein or through a linker peptide comprised of 1 to 100 amino acid residues. In some embodiments, the UGI comprises the wild type UGI sequence from the Bacillus phage PBS2 (https://www.ncbi.nlm.nih.gov/protein/P14739): MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 22). In some embodiments, the UGI comprises a variant of SEQ ID NO: 22 that comprises a fragment of the wild type UGI peptide or an amino acid sequence homologous to SEQ ID NO: 22. In some embodiments, the UGI fragment or homologous sequence comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% homology to the wild type UGI peptide sequence (SEQ ID NO: 22).
In some embodiments, the active or deactivated Type V Cas protein comprises a fusion with two or more UGI peptides or variants. The UGI peptides, or variants of the UGI peptide, can be connected directly to another UGI peptide or Type V Cas protein or via a linker of 1 to 100 amino acid residues to another UGI peptide or Type V Cas protein.
The Cas protein or Cas protein fusion may be provided in purified or isolated form or can be part of a composition or complex.
The base editing complexes of the present invention may contain an effector that is attached to a ligand. The ligand is capable of reversibly or irreversibly associating with the ligand binding moiety. Thus, the ligand binding moiety recruits an effector, e.g. base editing enzyme that is fused to or otherwise associated with the ligand, because the ligand binding moiety is capable of retaining association with the ligand. This design may be particularly advantageous because it provides a modular design in which the nucleic acid sequence targeting function of the gRNA and effector function reside in different molecules. For example, to introduce modifications serially at the same site, one may use different effectors that are associated with the same ligand. Conversely, to introduce the same modifications at different sites, one may use the same ligand binding moiety with different gRNAs while using the same effector-ligand. Thus, this design allows one to multiplex a system without an undesirable burden of fusing effectors to either gRNAs or Cas proteins.
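By way of a non-limiting illustration, the modular design described above can be modeled as two independent lists, one of gRNAs carrying a ligand binding moiety and one of effector-ligand conjugates, which are combined only where the ligand binding moiety and the ligand form a binding pair. The following Python sketch is illustrative only; the class names, site labels and guide names are hypothetical, and the only binding pairs assumed are the MS2 aptamer with MCP and the PP7 aptamer with PCP.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GuideRNA:
    name: str
    target_site: str     # hypothetical label for a DNA target site
    binding_moiety: str  # ligand binding moiety carried by the gRNA


@dataclass(frozen=True)
class EffectorLigand:
    effector: str        # e.g., a deaminase
    ligand: str          # ligand to which the effector is attached


# Ligand binding moiety -> ligand pairs assumed for this illustration.
BINDING_PAIRS = {"MS2 aptamer": "MCP", "PP7 aptamer": "PCP"}


def enumerate_complexes(guides, effectors):
    """List (target site, effector) combinations whose moiety and ligand pair."""
    return [
        (g.target_site, e.effector)
        for g, e in product(guides, effectors)
        if BINDING_PAIRS.get(g.binding_moiety) == e.ligand
    ]


guides = [
    GuideRNA("guide_1", "site_A", "MS2 aptamer"),
    GuideRNA("guide_2", "site_B", "MS2 aptamer"),
]
effectors = [EffectorLigand("APOBEC1", "MCP"), EffectorLigand("ADAR", "PCP")]

# The same MCP-linked effector is recruited to both sites; the PCP-linked
# effector is not recruited because neither guide carries a PP7 aptamer.
print(enumerate_complexes(guides, effectors))
# [('site_A', 'APOBEC1'), ('site_B', 'APOBEC1')]
```

Changing the target site requires only a different gRNA entry, and changing the edit requires only a different effector-ligand entry, which mirrors the separation of targeting and effector functions described above.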
Examples of effectors that may be of use in connection with the present invention are deaminases such as those that have cytidine deamination or adenine deamination activity, as well as transcriptional regulators, repair enzymes, epigenetic modifiers, histone acetylases, deacetylases, methylases (of histones and nucleotides), and demethylases (of histones and nucleotides). In some embodiments, the effector is selected from the group consisting of AID, CDA, APOBEC1, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, ADA, ADAR and tRNA adenosine deaminase. Examples of effectors and the types of genetic change that they cause are provided in Table 1.
Effector protein full names:
In some embodiments, the base editing complex comprises two or more effectors. When there are two effectors they may be referred to as: a first effector and a second effector. Each effector may be attached to a different ligand binding moiety through a different ligand. Alternatively, when there are two effectors present, one is attached to a ligand and associated with the gRNA through the ligand binding moiety and another is attached directly to the Cas protein.
As noted above, the effector is bound to a ligand, e.g., by one or more covalent bonds. A non-exhaustive list of examples of ligand binding moiety-ligand pairs that may be used in various embodiments of the present invention is provided in Table 2. Both unmodified and chemically modified versions of the ligand binding moieties and ligands are within the scope of the present invention.
Some of the sequences for the above binding pairs are listed below.
In each of the aforementioned sequences, one may, for example, use the identical sequence or sequences that have one or more insertions, deletions or substitutions in one or both sequences of a binding pair. By way of a non-limiting example, for either or both members of a binding pair one may use a sequence that is at least 80%, at least 85%, at least 90%, at least 95%, or at least 98% the same as an aforementioned sequence.
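By way of a non-limiting illustration, the degree to which a variant sequence is the same as a reference sequence can be estimated from the number of insertions, deletions and substitutions that separate them. The following Python sketch is illustrative only; the present description does not prescribe a particular calculation, so the use of edit distance divided by the length of the longer sequence is an assumption, and the function names and example sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def percent_identity(reference: str, variant: str) -> float:
    """Percent identity, assumed here to be relative to the longer sequence."""
    longest = max(len(reference), len(variant))
    if longest == 0:
        raise ValueError("sequences must not both be empty")
    return 100.0 * (longest - edit_distance(reference, variant)) / longest


reference = "ACGUACGUAC"   # hypothetical 10-nucleotide sequence
substituted = "ACGUACGUAG"  # one substitution at the last position
print(percent_identity(reference, substituted))  # 90.0
```

Under this assumed measure, a single substitution or a single deletion in a ten-nucleotide sequence each gives 90%, which would satisfy the at least 80%, at least 85% and at least 90% thresholds but not the at least 95% threshold.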
In some embodiments, the base-editing complexes of the present invention are combined with additional chemistry technologies. For example, in some embodiments, a base editing complex further comprises a cysteine/selenocysteine tag. In some embodiments, the base editing complex comprises or is associated with elements for cycloaddition via click chemistry.
In another embodiment, the present invention provides methods for base editing. In these methods, one exposes a base editing complex of the present invention to double-stranded DNA or to a solution that contains dsDNA or to a cell that contains dsDNA or to a subject. The method may occur in vitro or be conducted in vivo or ex vivo and may comprise delivering the base editing complex to a subject as part of a medicament for treatment.
These methods may, for example, be used to modify an immune cell selected from a T cell (including a primary T cell), Natural Killer (NK) cell, B cell, or CD34+ hematopoietic stem progenitor cell (HSPC). The immune cell may be an engineered immune cell, such as a T cell comprising a CAR or TCR. The methods herein may thus be applied to engineer further a cell that has already been modified to include a CAR and/or TCR that is useful in therapy. By way of further example, primary immune cells, either naturally occurring within a host animal or patient, or derived from a stem cell or an induced pluripotent stem cell (iPSC), may be genetically modified using the methods and complexes provided herein. Suitable stem cells include, but are not limited to, mammalian stem cells such as human stem cells, including, but not limited to, hematopoietic, neural, embryonic, induced pluripotent stem cells (iPSC), mesenchymal, mesodermal, liver, pancreatic, muscle, and retinal stem cells. Other stem cells include, but are not limited to, mammalian stem cells such as mouse stem cells, e.g., mouse embryonic stem cells.
Provided herein are also methods for genome engineering (e.g., altering or manipulating the expression of one or more genes or one or more gene products) in prokaryotic or eukaryotic cells, in vitro, in vivo, or ex vivo. In particular, the methods provided herein may be useful for targeted base editing disruption in mammalian cells including primary human T cells, natural killer (NK) cells, CD34+ hematopoietic stem and progenitor cells (HSPCs), such as HSPCs isolated from umbilical cord blood or bone marrow and cells differentiated from them.
Also provided herein are genetically engineered cells arising from haematopoietic stem cells, such as T cells, that have been modified according to the methods described herein.
In some cases, the methods are configured to produce genetically engineered T cells arising from HSCs or iPSCs, that are suitable as “universally acceptable” cells for therapeutic application. Haemopoietic stem cells (HSCs) arise from hemangioblasts, which can give rise to HSCs, vascular smooth muscle cells and angioblasts, which differentiate into vascular endothelial cells. HSCs can give rise to common myeloid and common lymphoid progenitors from which arise T cells, Natural Killer (NK) cells, B cells, myeloblasts, erythroblasts and other cells involved in the production of cells of blood, bone marrow, spleen, lymph nodes, and thymus. Such methods can also be applied to natural killer (NK) cells, CD34+ hematopoietic stem and progenitor cells (HSPCs), such as HSPCs isolated from umbilical cord blood or bone marrow and cells differentiated from them.
In another aspect, provided herein are methods for targeting diseases for base editing correction. In some of the methods, the base editing complexes are delivered to a subject for treatment. The target sequence can be any disease-associated polynucleotide or gene. Examples of useful applications of mutation or correction of an endogenous gene sequence according to the present invention include but are not limited to: alterations of disease-associated gene mutations, alterations in sequences encoding splice sites, alterations in regulatory sequences, alterations in sequences to cause a gain-of-function mutation, and/or alterations in sequences to cause a loss-of-function mutation, and targeted alterations of sequences encoding structural characteristics of a protein.
The base editing complexes or their components may be delivered to target cells and organisms via various methods and in various formats (DNA, RNA or protein) or combinations of these formats. The base editing components may be delivered as: (a) DNA polynucleotides that encode the relevant sequence for the protein effectors or the guide RNAs; (b) synthetic RNA encoding the sequence for the protein effectors (messenger RNA) or the guide RNAs; or (c) purified protein for the effectors. When delivered in protein format, the Type V Cas protein can be assembled with the guide RNAs to form a ribonucleoprotein complex (RNP) for delivery into target cells and organisms.
For example, the components or complexes as assembled may be delivered together or separately by electroporation, by nucleofection, by transfection, via nanoparticles, via viral mediated RNA delivery, via non-viral mediated delivery, via extracellular vesicles (for example, exosome and microvesicles), via eukaryotic cell transfer (for example, by recombinant yeast) and other methods that can package molecules such that they can be delivered to a target viable cell without changes to the genomic landscape.
Other methods include, but are not limited to, non-integrative transient transfer of DNA polynucleotides that include the relevant sequence for the protein recruitment, so that the molecule can be transcribed into the desired RNA molecule and, for amino acid containing components, translated into a protein or protein fragment. This includes, without limitation, DNA-only vehicles (for example, plasmids, MiniCircles, MiniVectors, MiniStrings, Protelomerase generated DNA molecules (for example Doggybones), artificial chromosomes (for example HAC), and cosmids), delivery of DNA vehicles by nanoparticles, extracellular vesicles (for example, exosomes and microvesicles), eukaryotic cell transfer (for example, by recombinant yeast), transient viral transfer by AAV, non-integrating viral particles (for example, lentivirus and retrovirus based systems), cell penetrating peptides, and other technology that can mediate the introduction of DNA into a cell without direct integration into the genomic landscape. Another method for the introduction of the RNA components includes the use of integrative gene transfer technology for stable introduction of the machinery for RNA transcription into the genome of the target cells. This can be controlled via constitutive or promoter inducible systems to attenuate the RNA expression, and it can also be designed so that the system can be removed after the utility has been met (for example, by introducing a Cre-Lox recombination system). Such technology for stable gene transfer includes, but is not limited to, integrating viral particles (for example, lentivirus, adenovirus and retrovirus based systems), transposase-mediated transfer (for example Sleeping Beauty and Piggybac), exploitation of the non-homologous repair pathways introduced by DNA breaks (for example, utilizing CRISPR and TALEN technology) together with a surrogate DNA molecule, and other technology that encourages integration of the target DNA into a cell of interest.
The various components of the complexes of the present invention, if not synthesized enzymatically within a cell or solution, may be created chemically or, if naturally occurring, isolated and purified from naturally occurring sources. Methods for chemically and enzymatically synthesizing the various embodiments of the present invention are well known to persons of ordinary skill in the art. Similarly, methods for ligating or introducing covalent bonds between components of the present invention are also well known to persons of ordinary skill in the art.
By way of a non-limiting example, the complexes of the present invention may be used to recruit transcriptional activators such as p65 and VP64, as well as moieties that introduce epigenetic modifications or affect HDR. The complexes of the present invention can also be used for the following applications: base editing, genome editing, genome screening, generation of therapeutic cells, genome tagging, epigenome editing, karyotype engineering, chromatin imaging, transcriptome and metabolic pathway engineering, genetic circuits engineering, cell signaling sensing, cellular events recording, lineage information reconstruction, gene drive, DNA genotyping, miRNA quantification, in vivo cloning, site-directed mutagenesis, genomic diversification, and proteomic analysis in situ. In some embodiments, a cell or a population of cells is exposed to a base editing complex of the present invention and the cell or cells are introduced to a subject by infusion.
Applications also include research of human diseases such as cancer immunotherapy, antiviral therapy, bacteriophage therapy, cancer diagnosis, pathogen screening, microbiota remodeling, stem-cell reprogramming, immunogenomic engineering, vaccine development, and antibody production.
The coding sequence for the Cas may be synthetically obtained and cloned into a vector under the control of the mouse CMV promoter (mCMV) in a T2A polycistronic cassette with a red fluorescent protein-puromycin fusion. A deactivated version of the Cas and 2xUGI fusion to the deactivated Cas12e variants may also be obtained and cloned in the aforementioned vector. The coding sequence for MS2 coat protein APOBEC fusion (MCP-APOBEC) may be obtained and cloned into an expression vector under control of the mouse CMV promoter. The sequence for gRNAs containing the MS2 ligand binding moiety and unique spacer regions may be cloned into an expression vector under control of the hU6 promoter.
HEK 293T cells (ATCC, #CRL-11268) may be seeded at 20,000 cells per well in a 96-well plate one day prior to transfection. Cells may be co-transfected using DharmaFECT Duo Transfection Reagent (Horizon Discovery, #T-2010) and 200 ng Cas12e plasmid, or its deactivated variants fused or not with UGI, 50 ng MCP-APOBEC plasmid, and 50 ng gRNA plasmid. The gRNA plasmid may consist of a constant region length of 101 nucleotides, different spacer sequences targeting transcripts within PPIB or EMX1 gene targets, and have the MS2 ligand binding moiety at the 5′ terminus, the 3′ terminus, internally not at either the 5′ or 3′ terminus, or combinations therein.
Cells may be selected in puromycin containing media and harvested 48 hours post-transfection for further processing as described below and the following sequences may be used.
mRNA Preparation:
Messenger RNA (mRNA) may be prepared from DNA vectors carrying the T7 promoter and the coding sequences for Cas12e, dCas12e-UGI and MCP-APOBEC following the standard protocols for mRNA in vitro transcription.
The crRNA may be synthesized by Horizon Discovery using either 2′-acetoxy ethyl orthoester (2′-ACE) or 2′-tert-butyldimethylsilyl (2′-TBDMS) protection chemistries. RNA oligos may be 2′-deprotected/desalted and purified by either high-performance liquid chromatography (HPLC) or polyacrylamide gel electrophoresis (PAGE). Oligos may be resuspended in 10 mM Tris pH 7.5 buffer prior to electroporation.
HEK 293T cells (ATCC, #CRL-11268) may be electroporated using the Invitrogen™ Neon™ Transfection System, 10 μl Kit. A mixture of 50,000 cells, 1 μg of Cas12e or dCas12e-UGI mRNA and MCP-APOBEC mRNA, and 3 μM of synthetic crRNA and tracrRNA may be electroporated at 1150V for 20 ms and for 2 pulses. The chemically synthesized crRNA may consist of a constant region length of 23 nucleotides, different spacer sequences targeting transcripts within PPIB or EMX1 gene targets, and have the MS2 ligand binding moiety at the 5′ terminus, the 3′ terminus, internally not at either the 5′ or 3′ terminus, or combinations therein. Each sequence may contain chemical modifications at one or more bases and within one or more linkages. Cells may be plated in a 96-well plate with full serum media and harvested after 72 hours for further processing. The sequences below may be used.
For both examples 1 and 2:
Cells may be lysed in 100 μL of a buffer containing proteinase K (Thermo Scientific, #FERE00492), RNase A (Thermo Scientific, #FEREN0531), and Phusion GC buffer (Thermo Scientific, #F-518L) for 30 min at 56° C., followed by a 5 min heat inactivation at 95° C. This cell lysate may be used to generate PCR amplicons spanning the region containing the base editing site(s). Unpurified PCR amplicons between 500-1000 bp in length may be sequenced by Sanger sequencing.
Base editing efficiencies may be calculated using the Chimera analysis tool, an adaptation of the open source tool BEAT (Xu et al. 2019. BEAT: A Python Program to Quantify Base Editing from Sanger Sequencing. The CRISPR Journal 2, 223-229). Chimera determines editing efficiency by first subtracting the background noise to define the expected variability in a sample. This allows the estimation of editing efficiency without the need to normalize to control samples. Following this, Chimera filters out any outliers from the noise using the Median Absolute Deviation (MAD) method and then assesses the editing efficiency of the base editor over the span of the 20 bp input guide sequence.
The following materials and methods were used in examples 3-6.
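The outlier-filtering and background-subtraction steps can be illustrated with a short sketch. The code below is not the Chimera or BEAT implementation; it shows only a generic Median Absolute Deviation filter applied to hypothetical per-position noise values. The function names, the cutoff of 3, the 1.4826 scaling constant, and the use of the mean of the retained noise as the background estimate are all assumptions made for illustration.

```python
from statistics import median


def mad_filter(values, cutoff=3.0):
    """Return the values whose robust z-score is within `cutoff`.

    Robust z-score = |x - median| / (1.4826 * MAD), where MAD is the
    median of the absolute deviations from the median.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread estimate is available, so nothing can be flagged.
        return values
    scale = 1.4826 * mad
    return [v for v in values if abs(v - med) / scale <= cutoff]


def background_subtracted_efficiency(edited_signal, noise_values, cutoff=3.0):
    """Subtract the mean of the MAD-filtered noise from an edited-base
    signal fraction and clamp the result to the range 0..1.

    Hypothetical helper: `edited_signal` is the fraction of signal
    assigned to the edited base at one position of the guide window.
    """
    kept = mad_filter(noise_values, cutoff)
    background = sum(kept) / len(kept)
    return max(0.0, min(1.0, edited_signal - background))


if __name__ == "__main__":
    # Hypothetical noise fractions; 0.40 is an obvious outlier.
    noise = [0.01, 0.02, 0.015, 0.012, 0.018, 0.40]
    print(mad_filter(noise))
    print(background_subtracted_efficiency(0.35, noise))
```

In such a sketch the filter is applied to the noise values only, so that a single aberrant trace position does not inflate the background estimate that is subtracted from the signal at the edited base.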
gRNA Sequences
For examples 3-5 (using Cas12b base editor), gRNA sequences encoded by the sequences listed in table 3 were used. All gRNA designs were based on the A. acidoterrestris Cas12b gRNA consisting of a 91 nt constant gRNA sequence, a target specific 20 nt spacer sequence, and a 7 nt poly-T U6 termination signal. All modifications were made to the constant region of the gRNA and consist of the inclusion of the RNA aptamer hairpins. A single copy of the MS2 hairpin sequence (C5 variant) was incorporated into either the 5′ end, the 3′ end, the stem-loop, or the internal polyU (internal stretch of UUUUU within the gRNA) of the gRNA. The relevant gRNA sequences were cloned into a separate expression vector under control of the hU6 promoter.
In table 3, N denotes the 20 nt target specific spacer sequence. The constant gRNA sequence as previously described is highlighted in bold. MS2 (C5 variant) is displayed in italics whilst the extensions to the aptamer are shown in italics and underlined. The mutation introduced in the internal polyU stretch is in bold and underlined.
A. acidoterrestris
GTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGGCCACTT
TCCAGGTGGCAAAGCCCGTTGAGCTTCTCAAATCTGAGAAGTGG
CACNNNNNNNNNNNNNNNNNNNNTTTTTTT
G
GC
ACATGAGGATCACCCATGT
GC
GTCTAGAGGACAGAATTTTT
CAACGGGTGTGCCAATGGCCACTTTCCAGGTGGCAAAGCCCGTT
GAGCTTCTCAAATCTGAGAAGTGGCACNNNNNNNNNNNNNNNNN
G
GC
ACATGAGGATCACCCATGT
GC
GTCTAGAGGACAGAATT
G
TT
CAACGGGTGTGCCAATGGCCACTTTCCAGGTGGCAAAGCCCGTT
GAGCTTCTCAAATCTGAGAAGTGGCACNNNNNNNNNNNNNNN
GGTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGGCCACT
TTCCAGGTGGCAAAGCCCGTTGAGCTTCTCAAATCTGAGAAGTG
GCAC
GC
ACATGAGGATCACCCATGT
GC
NNNNNNNNNNNNNNNN
GGTCTAGAGGACAGAATT
G
TTCAACGGGTGTGCCAATGGCCACT
TTCCAGGTGGCAAAGCCCGTTGAGCTTCTCAAATCTGAGAAGTG
GCAC
GC
ACATGAGGATCACCCATGT
GC
NNNNNNNNNNNNNNNN
GGTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGG
ACATG
AGGATCACCCATGT
CCAGGTGGCAAAGCCCGTTGAGCTTCTCAA
ATCTGAGAAGTGGCACNNNNNNNNNNNNNNNNNNNNTTTTTTT
GGTCTAGAGGACAGAATT
G
TTCAACGGGTGTGCCAATGG
ACATG
AGGATCACCCATGTCCAGGTGGCAAAGCCCGTTGAGCTTCTCAA
ATCTGAGAAGTGGCACNNNNNNNNNNNNNNNNNNNNTTTTTTT
GGTCTAGAGGACAGAATT
GC
ACATGAGGATCACCCATGT
GC
TTT
CAACGGGTGTGCCAATGGCCACTTTCCAGGTGGCAAAGCCCGTT
GAGCTTCTCAAATCTGAGAAGTGGCACNNNNNNNNNNNNNNNNN
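The composition described above for the Cas12b gRNA designs (a 91 nt constant sequence, a 20 nt spacer, and a 7 nt poly-T termination signal) can be sketched as a simple string assembly. The constant and MS2 (C5 variant) sequences below are copied from table 3. The function name, the optional 5′ insert parameter, the GC-flanked form of the aptamer, and the example spacer are hypothetical and serve only to illustrate the layout; they are not a statement of how the constructs were actually built.

```python
# DNA encoding of the 91 nt constant region, as listed in table 3.
CAS12B_CONSTANT = (
    "GTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGGCCACTT"
    "TCCAGGTGGCAAAGCCCGTTGAGCTTCTCAAATCTGAGAAGTGG"
    "CAC"
)
MS2_C5 = "ACATGAGGATCACCCATGT"  # MS2 hairpin (C5 variant), from table 3
POLY_T = "T" * 7                # 7 nt poly-T U6 termination signal
SPACER_LENGTH = 20              # target specific spacer length for Cas12b


def build_grna_template(spacer, five_prime_insert=""):
    """Assemble a gRNA-encoding DNA template (hypothetical helper).

    Layout: [optional 5' insert] + constant region + spacer + poly-T.
    """
    spacer = spacer.upper()
    if len(spacer) != SPACER_LENGTH:
        raise ValueError(f"spacer must be {SPACER_LENGTH} nt long")
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer must contain only A, C, G and T")
    return five_prime_insert + CAS12B_CONSTANT + spacer + POLY_T


if __name__ == "__main__":
    example_spacer = "ACGT" * 5  # placeholder, not a real target sequence
    unmodified = build_grna_template(example_spacer)
    # Assumed 5' MS2 form: the aptamer flanked by GC extensions.
    with_ms2 = build_grna_template(example_spacer, "GC" + MS2_C5 + "GC")
    print(len(unmodified), len(with_ms2))
```

Internal placements of the aptamer (stem-loop or polyU positions) would require splitting the constant region at the relevant position, which this sketch does not attempt.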
For example 6 (using CasMINI base editor), gRNA sequences encoded by the sequences listed in table 4 were used. All gRNA designs were based on the Acidibacillus sulfuroxidans Cas12f gRNA consisting of a 162 nt constant gRNA sequence, a target specific 23 nt spacer sequence, and a 7 nt poly-T U6 termination signal. All modifications were made to the constant component of the gRNA and consist of the inclusion of the RNA aptamer hairpins and truncations of stem-loops. A single copy of the MS2 hairpin sequence (C5 variant) was incorporated into either the 5′, 5′ and a truncation of the stem loop 1, stem loop 1 extension, replacement of stem loop 2 and truncation of stem loop 1, or replacement of the repeat:antirepeat. The relevant gRNA sequences were cloned into a separate expression vector under control of the hU6 promoter.
In table 4, N denotes the 23 nt target specific spacer sequence. The constant gRNA sequence is highlighted in bold. The MS2 (C5 variant) is displayed in italics whilst the extensions to the aptamer are shown in italics and underlined. Design 1 corresponds to a 5′MS2, design 2 corresponds to 5′MS2 and a stem loop 1 truncation, design 3 corresponds to an MS2 located as an extension of stem loop 1, design 4 corresponds to a MS2 located as a replacement of stem loop 2 and a truncation of stem loop 1, design 5 corresponds to a MS2 located as a replacement of the repeat:antirepeat region.
A. sulfuroxidans
ATGGGCTTCACTGATAAAGTGGAGAACCGCTTCACCAA
AAGCTGTCCCTTAGGGGATTAGAACTTGAGTGAAGGTG
GGCTGCTTGCATCAGCCTAATGTCGAGAAGTGCTTTCT
TCGGAAAGTAACCCTCGAAACAAATTCATTTGAATGAA
GGAATGCAACNNNNNNNNNNNNNNNNNNNNNNNTTTTT
GC
ACATGAGGATCACCCATGT
GC
ATGGGCTTCACTGAT
AAAGTGGAGAACCGCTTCACCAAAAGCTGTCCCTTAGG
GGATTAGAACTTGAGTGAAGGTGGGCTGCTTGCATCAG
CCTAATGTCGAGAAGTGCTTTCTTCGGAAAGTAACCCT
CGAAACAAATTCATTTGAATGAAGGAATGCAACNNNNN
GC
ACATGAGGATCACCCATGT
GC
CGCTTCACCAAAAGC
TGTCCCTTAGGGGATTAGAACTTGAGTGAAGGTGGGCT
GCTTGCATCAGCCTAATGTCGAGAAGTGCTTTCTTCGG
AAAGTAACCCTCGAAACAAATTCATTTGAATGAAGGAA
TGCAACNNNNNNNNNNNNNNNNNNNNNNNTTTTTTT
GGG
GC
ACATGAGGATCACCCATGT
GC
AACCGCTTCACC
AAAAGCTGTCCCTTAGGGGATTAGAACTTGAGTGAAGG
TGGGCTGCTTGCATCAGCCTAATGTCGAGAAGTGCTTT
CTTCGGAAAGTAACCCTCGAAACAAATTCATTTGAATG
AAGGAATGCAACNNNNNNNNNNNNNNNNNNNNNNNTTT
ACCGCTTCAC
GC
ACATGAGGATCACCCATGT
GC
GTGAA
GGTGGGCTGCTTGCATCAGCCTAATGTCGAGAAGTGCT
TTCTTCGGAAAGTAACCCTCGAAACAAATTCATTTGAA
TGAAGGAATGCAACNNNNNNNNNNNNNNNNNNNNNNNT
GGGCTTCACTGATAAAGTGGAGAACCGCTTCACCAAAA
GCTGTCCCTTAGGGGATTAGAACTTGAGTGAAGGTGGG
CTGCTTGCATCAGCCTAATGTCGAGAAGTGCTTTCTTC
GGAAAGTAACCCTCGAAACAAA
GC
ACATGAGGATCACC
CATGT
GC
GGAATGCAACNNNNNNNNNNNNNNNNNNNNN
Aside from the gRNA, all components of the systems used were encoded on two vectors and expressed from a CMV promoter. The first vector encoded an enhanced human Apobec3A-MCP or Anolis carolinensis Apobec1a-MCP fusion protein (Deaminase vector). The second vector encoded a dCas12b (D569A, E847A, D976A) or dCasMINI (D325A; D509A) fused to two copies of UGI through its C-terminus (Cas vector). dCasMINI is a previously described version of dUn1Cas12f1. The dCas12b-UGI-UGI and dCasMINI-UGI-UGI fusion proteins were flanked by 2 copies of the SV40 NLS at the N terminus of the Cas sequence and the C terminus of UGI sequence. Additionally, the vector encoded the expression of turboRFP to allow the monitoring of transfection efficiency.
The relevant gRNA sequences were cloned into a separate expression vector under control of the hU6 promoter (gRNA expression vector).
One plasmid, which appears below as SEQ ID NO: 72 and was used in examples 3, 4, and 5, encodes a dead Cas12b fused to two UGIs: dCas12b-UGI-UGI. Below the following fonts are used:
ATCATTGAGAAGGAGACTGGGAAACAGCTGGTCATTCAGGAGTCCATC
CTGATGCTGCCTGAGGAGGTGGAGGAAGTGATCGGCAACAAGCCAGAG
TCTGACATCCTGGTGCACACCGCCTACGACGAGTCCACAGATGAGAAT
GTGATGCTGCTGACCTCTGACGCCCCCGAGTATAAGCCTTGGGCCCTG
GTCATCCAGGATTCTAACGGCGAGAATAAGATCAAGATGCTGAGCGGA
GGATCCGGAGGATCTGGAGGCAGCACCAACCTGTCTGACATCATCGAG
AAGGAGACAGGCAAGCAGCTGGTCATCCAGGAGAGCATCCTAATGCTT
CCCGAAGAAGTCGAAGAAGTGATCGGAAACAAGCCTGAGAGCGATATC
CTGGTCCATACTGCGTATGATGAAAGTACCGACGAAAACGTAATGCTA
CTCACATCCGACGCCCCAGAGTATAAGCCCTGGGCTCTAGTTATACAA
GACTCCAACGGAGAGAACAAAATCAAAATGCTG
TCTGGCGGCTCAAAA
AGAACCGCCGACGGCAGCGAATTCGAG
CCCAAGAAGAAGAGGAAAGTC
The corresponding amino acid sequence for SEQ ID NO: 72 appears below as SEQ ID NO: 73, and in that sequence the following fonts are used:
EVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGEN
KIKML
SGGSGGSGGS
TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGN
KPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML
S
GGSKRTADGSEFE
PKKKRKV
A second plasmid, which appears below as SEQ ID NO: 74 and was used in example 6, encodes a dead CasMINI fused to two UGIs: dCasMINI-UGI-UGI. Below the following fonts are used:
CGGGAGCGGCGGGAGCGGGGGGAGC
ACTAATCTGAGCGACATCATTGAG
AAGGAGACTGGGAAACAGCTGGTCATTCAGGAGTCCATCCTGATGCTGC
CTGAGGAGGTGGAGGAAGTGATCGGCAACAAGCCAGAGTCTGACATCCT
GGTGCACACCGCCTACGACGAGTCCACAGATGAGAATGTGATGCTGCTG
ACCTCTGACGCCCCCGAGTATAAGCCTTGGGCCCTGGTCATCCAGGATT
CTAACGGCGAGAATAAGATCAAGATGCTG
AGCGGAGGATCCGGAGGATC
TGGAGGCAGCACCAACCTGTCTGACATCATCGAGAAGGAGACAGGCAAG
CAGCTGGTCATCCAGGAGAGCATCCTGATGCTGCCCGAAGAAGTCGAAG
AAGTGATCGGAAACAAGCCTGAGAGCGATATCCTGGTCCATACCGCCTA
CGACGAGAGTACCGACGAAAATGTGATGCTGCTGACATCCGACGCCCCA
GAGTATAAGCCCTGGGCTCTGGTCATCCAGGATTCCAACGGAGAGAACA
AAATCAAAATGCTGTCTGGCGGCTCAAAAAGAACCGCCGACGGCAGCGA
ATTCGAG
CCCAAGAAGAAGAGGAAAGTCTAA
The corresponding amino acid sequence for SEQ ID NO: 74 appears below as SEQ ID NO: 75, and in that sequence the following fonts are used:
GSGGSGGS
TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDIL
VHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSGGS
GGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAY
DESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML
SGGSKRTADGSE
FE
PKKKRKV
A third plasmid, which appears below as SEQ ID NO: 76, encodes for a deaminase fused to MCP, Enhanced human Apobec3A-MCP. Below the following fonts are used:
CCCTGGGCGACACCACACACACCTCTCCACCTTGCCCAGCACCAGAGCT
GCTGGGAGGCCCT
ATGGCCAGCAACTTCACACAGTTTGTGCTGGTGGAT
AATGGAGGAACCGGCGACGTGACAGTGGCACCATCTAACTTTGCCAATG
GCATCGCCGAGTGGATCAGCTCCAACTCTCGGAGCCAGGCCTATAAGGT
GACCTGTAGCGTGCGGCAGTCTAGCGCCCAGAATAGAAAGTATACAATC
AAGGTGGAGGTGCCTAAGGGCGCCTGGAGATCCTACCTGAACATGGAGC
TGACCATCCCAATCTTTGCCACAAATTCTGATTGCGAGCTGATCGTGAA
GGCCATGCAGGGCCTGCTGAAGGACGGCAACCCTATCCCAAGCGCCATC
GCCGCCAATAGCGGAATCTACTGA-3′
The corresponding amino acid sequence for SEQ ID NO: 76 appears below as SEQ ID NO: 77, and in that sequence the following fonts are used:
GGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIK
VEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIA
ANSGIY
A fourth plasmid, which appears below as SEQ ID NO: 78, encodes for the deaminase fused to MCP, Anolis carolinensis Apobec1a-MCP. Below the following fonts are used:
AAGACACCCCTGGGCGACACCACACACACCTCTCCACCTTGCCCAGCAC
CAGAGCTGCTGGGAGGCCCT
ATGGCCAGCAACTTCACACAGTTTGTGCT
GGTGGATAATGGAGGAACCGGCGACGTGACAGTGGCACCATCTAACTTT
GCCAATGGCATCGCCGAGTGGATCAGCTCCAACTCTCGGAGCCAGGCCT
ATAAGGTGACCTGTAGCGTGCGGCAGTCTAGCGCCCAGAATAGAAAGTA
TACAATCAAGGTGGAGGTGCCTAAGGGCGCCTGGAGATCCTACCTGAAC
ATGGAGCTGACCATCCCAATCTTTGCCACAAATTCTGATTGCGAGCTGA
TCGTGAAGGCCATGCAGGGCCTGCTGAAGGACGGCAACCCTATCCCAAG
CGCCATCGCCGCCAATAGCGGAATCTACTGA-3′
The corresponding amino acid sequence for SEQ ID NO: 78 appears below as SEQ ID NO: 79, and in that sequence the following fonts are used:
TPLGDTTHTSPPCPAPELLGGP
MASNFTQFVLVDNGGTGDVTVAPSNFA
NGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNM
ELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY
The gRNA component of the base editing system was expressed on a separate vector with expression being driven by the RNA polymerase III U6 promoter (gRNA expression vector). The gRNA was expressed as a single unit encompassing the crRNA and tracrRNA components of A. acidoterrestris Cas12b or Acidibacillus sulfuroxidans Cas12f linked by an artificial tetra-loop as previously described. A list of gRNA target sequences for each Cas protein is shown in table 5.
HEK293 cells were grown in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 100 U ml⁻¹ penicillin/streptomycin. 24 hours prior to transfection, 10,000 cells were seeded into a single well of a 96-well plate to achieve ˜70% confluency for transfection. After 24 hours the cells were lipid transfected with 200 ng of plasmid DNA (75 ng Cas vector, 75 ng Deaminase vector, and 50 ng gRNA expression vector) with 0.7 μl of DharmaFECT DUO (Horizon Discovery) per well of a 96-well plate.
72 hours after transfection, the medium was removed, the cells were washed 1× with PBS, and 50 μl of TrypLE express enzyme (ThermoFisher Scientific) was added to each well. After cells were dissociated, 100 μl of fresh DMEM was added and 20 μl of the resuspended cells were transferred to a 96-well plate and incubated with 60 μl of DirectPCR lysis reagent (Viagen Biotech) under the following conditions: 55° C. for 45 minutes followed by 95° C. for 15 minutes. The cell lysates were then stored at −20° C.
1 μl of cell lysate obtained using the DirectPCR lysis reagent was used per PCR reaction. For NGS analysis, the Q5 high-fidelity 2× master mix (NEB) was used for amplification of sgRNA target sites. Reaction mixes were set up as follows:
The PCR reaction was performed under the thermocycling conditions identified as follows:
For Sanger sequencing analysis, regions of interest were PCR-amplified using GoTaq Hot Start polymerase (Promega). Reaction mixes were set up as follows:
Primers used are detailed in table 4.
PCR products were submitted for Sanger sequencing (Genewiz). Data was analysed by proprietary in-house software (Chimera).
In this example next-generation sequencing analysis of the Site2 amplicon region shows specific C to T transitions introduced by introduction of a Cas12b base editor in an HEK-293T cell line. Two different effectors were used, either hApobec3A (Apobec3A-MCP) or AnoApobec (Anolis carolinensis Apobec1a-MCP), which were introduced through transfection of a plasmid expressing the sequence highlighted in SEQ ID NO: 76 and SEQ ID NO: 78, respectively. Concurrently, dCas12b-UGI-UGI was introduced through transfection of a plasmid expressing the sequence highlighted in SEQ ID NO: 72. gRNAs were introduced through transfection of plasmids expressing gRNA backbones that correspond to SEQ ID NOs: 58 to 65. The target site sequences correspond to SEQ ID NO: 80. The components were delivered by DNA plasmid lipid transfection and the cells were subsequently lysed 72 hours post-transfection, after which the targeted loci were amplified by PCR and analyzed by next-generation sequencing.
The results are reported in
This example demonstrates that the Cas12b base editor is functional at multiple genomic loci and sequence contexts. Base editing analysis shows that the Cas12b base editor introduces C to T transitions in different sites in the HEK-293T cell line. Seven different regions were targeted by sgRNAs: VEGFA_sgRNA1, SEQ ID NO: 82; VEGFA_sgRNA6, SEQ ID NO: 83; FANCF_sgRNA3, SEQ ID NO: 84; site2_sgRNA2, SEQ ID NO: 93; site2_sgRNA4, SEQ ID NO: 81; B2M_sgRNA2, SEQ ID NO: 94; and B2M_sgRNA3, SEQ ID NO: 95. Two different effectors were used, either hApobec3A or AnoApobec, which were introduced through transfection of plasmids expressing the sequences that correspond to SEQ ID NO: 76 and SEQ ID NO: 78, respectively. Cas12b was introduced through transfection of a plasmid that expresses the sequence corresponding to SEQ ID NO: 72. A 5′MS2 version of the gRNA (SEQ ID NO: 59) was used for all conditions. The target site sequences correspond to SEQ ID NOs: 93-95, and 81-84. Components were delivered by DNA plasmid lipid transfection and the cells were subsequently lysed 72 hours post-transfection, after which the targeted loci were amplified by PCR and analyzed by Sanger sequencing. Spacer sequences are shown in Table 5 and primers to amplify the edited region are shown in Table 6.
This example demonstrates that the Cas12b base editor is functional in different cell lines. Sanger sequencing and Chimera analysis show that the Cas12b base editor introduces C to T transitions in the U2OS cell line. One region targeted by one sgRNA is shown, VEGFA_sgRNAL SEQ ID NO: 83. Two different effectors were used, either hApobec3A or AnoApobec, which were introduced through transfection of plasmids that express the sequences that correspond to SEQ ID NO: 76 and SEQ ID NO: 78, respectively. Cas12b was introduced through transfection of a plasmid that expresses the sequence that corresponds to SEQ ID NO: 72. A plasmid expressing a 5′MS2 version of the sgRNA (SEQ ID NO: 59) was transfected for all conditions. Components were delivered by DNA plasmid lipid transfection and the cells were subsequently lysed 72 hours post-transfection, after which the targeted loci were amplified by PCR and analyzed by Sanger sequencing.
This example demonstrates that the gRNA-ligand base editing system can be applied to Type V enzymes other than Cas12b. Here it is shown that a CasMINI base editor introduces C to T transitions in different sites in HEK-293T cells. Two different effectors were used, either hApobec3A or AnoApobec, which were introduced through transfection of plasmids that express sequences corresponding to SEQ ID NO: 76 and SEQ ID NO: 78, respectively. CasMINI was introduced through transfection of a plasmid that expresses the sequence that corresponds to SEQ ID NO: 74. gRNAs were introduced through transfection of plasmids expressing gRNA backbones that correspond to SEQ ID NOs: 66 to 71. Two different regions targeted by two different sgRNAs are shown: VEGFA_sgRNA1, SEQ ID NO: 85, and VEGFA_sgRNA2, SEQ ID NO: 86. Components were delivered by DNA plasmid lipid transfection and the cells were subsequently lysed 72 hours post-transfection, after which the targeted loci were amplified by PCR and analyzed by Sanger sequencing.
This application is a national stage application of international application serial number PCT/US2022/011294, filed Jan. 5, 2022, which claims the benefit of the filing date of U.S. Provisional Application Ser. No. 63/133,945, filed Jan. 5, 2021, the entire disclosures of which are incorporated by reference as if set forth fully herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/011294 | 1/5/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63133945 | Jan 2021 | US |