The sequence listing submitted on Nov. 17, 2023, as an .XML file entitled “10034-222US1_ST26.xml” created on Nov. 16, 2023, and having a file size of 315,191 bytes is hereby incorporated by reference pursuant to 37 C.F.R. § 1.52(e)(5).
The present disclosure relates to gene editing systems, compositions, and methods of use to target proteins for directed evolution.
Libraries have long been created through traditional mutagenesis techniques such as site-saturation mutagenesis and error-prone PCR. While these methods can introduce sufficient diversity, they require laborious cloning techniques and cannot be rapidly iterated. A faster strategy would be engineering yeast that can create the desired diversity in situ.
Given the limitations of current laboratory techniques and methods for generating libraries, there is a need to address the aforementioned problems by engineering yeast to generate diverse libraries. The compositions, systems, and methods disclosed herein address these and other needs.
The present disclosure provides CRISPR base editor systems and vectors for editing a gene. The present disclosure also provides methods of using the CRISPR base editor system.
In one aspect, disclosed herein is a gene editing system comprising a CRISPR base editor comprising a catalytically inactive nuclease, at least one guide RNA (gRNA), and a MS2 phage coat protein (MCP), wherein the MCP is operably linked to an activation-induced deaminase (AID), the at least one gRNA comprises at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence, and the gene editing system is coupled with a yeast display system to introduce a mutation into a target protein within a yeast cell.
In some embodiments, the gRNA binds a target nucleic acid encoding a target protein, or a fragment thereof. In some embodiments, the MCP binds the at least one bacteriophage aptamer of the gRNA. In some embodiments, the MCP comprises a nuclear localization signal (NLS). In some embodiments, the MCP comprises at least 90% sequence identity to SEQ ID NO: 41.
In some embodiments, the AID mutates the target nucleic acid encoding the target protein, or a fragment thereof. In some embodiments, the AID comprises SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 47, SEQ ID NO: 49, SEQ ID NO: 51, or a variant thereof.
In some embodiments, the yeast cell expresses a mutant of the target protein. In some embodiments, the catalytically inactive nuclease comprises a dead Cas9 (dCas9) or a dead Cas12 (dCas12). In some embodiments, the at least one bacteriophage aptamer comprises at least one MS2 aptamer. In some embodiments, the AID comprises a cytidine activation-induced deaminase or an adenine activation-induced deaminase. In some embodiments, the yeast display system or yeast cell comprises Saccharomyces cerevisiae (S. cerevisiae).
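As a purely illustrative sketch of the kind of edit a cytidine deaminase can introduce (deamination of cytosine yields uracil, which is read as thymine after replication), the following Python fragment converts cytosines to thymines inside an editing window. The function name `deaminate_window`, the example sequence, and the window coordinates are hypothetical and are not taken from the sequence listing. The sketch does not model gRNA targeting, PAM recognition, or editing efficiency.

```python
def deaminate_window(dna: str, start: int, end: int) -> str:
    """Return dna with every C in dna[start:end] replaced by T.

    Hypothetical helper: models only the C-to-T outcome of cytidine
    deamination on one strand, within an assumed editing window
    given as 0-based, end-exclusive coordinates.
    """
    if not 0 <= start <= end <= len(dna):
        raise ValueError("window must lie within the sequence")
    window = dna[start:end].replace("C", "T")
    return dna[:start] + window + dna[end:]


# Made-up target sequence and window, for illustration only.
target = "ATGCCGTACGGA"
edited = deaminate_window(target, 3, 8)
print(edited)
```

Cytosines outside the assumed window are left unchanged, so the edited sequence keeps the length of the original.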
In one aspect, disclosed herein is an expression vector comprising one or more nucleic acid sequences encoding a CRISPR base editor, wherein the CRISPR base editor comprises a catalytically inactive nuclease, a MS2 phage coat protein (MCP), and an activation-induced deaminase (AID), wherein expression of the CRISPR base editor is under control of a yeast-derived promoter sequence and a yeast-derived terminator sequence.
In some embodiments, the expression vector further comprises a nucleic acid sequence encoding at least one guide RNA (gRNA). In some embodiments, the at least one gRNA comprises at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence.
In some embodiments, the one or more nucleic acid sequences encode the MCP comprising at least 90% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encode the AID comprising SEQ ID NO: 42, SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 50, or a variant thereof. In some embodiments, the catalytically inactive nuclease comprises a dead Cas9 (dCas9) or a dead Cas12 (dCas12). In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprise at least 90% sequence identity to SEQ ID NO: 23.
In some embodiments, the at least one gRNA comprises SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, or a variant thereof.
In some embodiments, the at least one PAM sequence comprises SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, or a variant thereof.
In one aspect, disclosed herein is a method of treating or preventing a disease or disorder in a subject in need thereof, the method comprising identifying a target nucleic acid sequence encoding a target protein; incorporating into a yeast genome at least one expression vector comprising one or more nucleic acid sequences encoding the target nucleic acid sequence and a CRISPR base editor, wherein the CRISPR base editor comprises a catalytically inactive nuclease, at least one guide RNA (gRNA) comprising at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence, a MS2 phage coat protein (MCP), and an activation-induced deaminase (AID), and wherein expression of the CRISPR base editor is under control of a yeast-derived promoter sequence and a yeast-derived terminator sequence; inducing a mutation into the target protein, wherein the AID incorporates a mutation into the target nucleic acid sequence and the target nucleic acid is translated into the target protein, and wherein the mutation increases a function of the target protein relative to a wild-type control protein; expressing the target protein comprising the mutation in the yeast; isolating said target protein comprising the mutation; incorporating the target protein into a therapeutic composition; and administering the therapeutic composition to the subject.
The accompanying figures, which are incorporated in and constitute a part of this specification, illustrate several aspects described below.
The following description of the disclosure is provided as an enabling teaching of the disclosure in its best, currently known embodiment(s). To this end, those skilled in the relevant art will recognize and appreciate that many changes can be made to the various embodiments of the invention described herein, while still obtaining the beneficial results of the present disclosure. It will also be apparent that some of the desired benefits of the present disclosure can be obtained by selecting some of the features of the present disclosure without utilizing other features. Accordingly, those who work in the art will recognize that many modifications and adaptations to the present disclosure are possible and can even be desirable in certain circumstances and are a part of the present disclosure. Thus, the following description is provided as illustrative of the principles of the present disclosure and not in limitation thereof.
Reference will now be made in detail to the embodiments of the invention, examples of which are illustrated in the drawings and the examples. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The term “comprising” and variations thereof as used herein are used synonymously with the term “including” and variations thereof and are open, non-limiting terms. Although the terms “comprising” and “including” have been used herein to describe various embodiments, the terms “consisting essentially of” and “consisting of” can be used in place of “comprising” and “including” to provide for more specific embodiments and are also disclosed. As used in this disclosure and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.
The following definitions are provided for the full understanding of terms used in this specification.
Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. It is also understood that when a value is disclosed, “less than or equal to” the value, “greater than or equal to” the value, and possible ranges between values are also disclosed, as appropriately understood by the skilled artisan. For example, if the value “10” is disclosed, then “less than or equal to 10” as well as “greater than or equal to 10” are also disclosed. It is also understood that throughout the application, data is provided in a number of different formats, and that this data represents endpoints and starting points, and ranges for any combination of the data points. For example, if a particular data point “10” and a particular data point “15” are disclosed, it is understood that greater than, greater than or equal to, less than, less than or equal to, and equal to 10 and 15 are considered disclosed as well as between 10 and 15. It is also understood that each unit between two particular units is also disclosed. For example, if 10 and 15 are disclosed, then 11, 12, 13, and 14 are also disclosed.
“Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.
An “increase” can refer to any change that results in a greater amount of a symptom, disease, composition, condition, or activity. An increase can be any individual, median, or average increase in a condition, symptom, activity, or composition in a statistically significant amount. Thus, the increase can be a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100%, or more, increase so long as the increase is statistically significant.
A “decrease” can refer to any change that results in a smaller amount of a symptom, disease, composition, condition, or activity. A substance is also understood to decrease the genetic output of a gene when the genetic output of the gene product with the substance is less relative to the output of the gene product without the substance. Also, for example, a decrease can be a change in the symptoms of a disorder such that the symptoms are less than previously observed. A decrease can be any individual, median, or average decrease in a condition, symptom, activity, or composition in a statistically significant amount. Thus, the decrease can be a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% decrease so long as the decrease is statistically significant.
“Inhibit,” “inhibiting,” and “inhibition” mean to decrease an activity, response, condition, disease, or other biological parameter. This can include but is not limited to the complete ablation of the activity, response, condition, or disease. This may also include, for example, a 10% reduction in the activity, response, condition, or disease as compared to the native or control level. Thus, the reduction can be a 10, 20, 30, 40, 50, 60, 70, 80, 90, 100%, or any amount of reduction in between as compared to native or control levels.
By “reduce” or other forms of the word, such as “reducing” or “reduction,” is meant lowering of an event or characteristic (e.g., gene expression). It is understood that this is typically in relation to some standard or expected value; in other words, it is relative, but it is not always necessary for the standard or relative value to be referred to. For example, “reduces gene expression” means reducing or lowering the production of a gene product relative to a standard or a control.
The terms “treat,” “treating,” and grammatical variations thereof as used herein, include partially or completely delaying, alleviating, mitigating or reducing the intensity of one or more attendant symptoms of a disorder or condition and/or alleviating, mitigating or impeding one or more causes of a disorder or condition. Treatments according to the disclosure may be applied preventively, prophylactically, palliatively or remedially. Treatments are administered to a subject prior to onset (e.g., before obvious signs of disease), during early onset (e.g., upon initial signs and symptoms of disease), or after an established development of disease.
By “prevent” or other forms of the word, such as “preventing” or “prevention,” is meant to stop a particular event or characteristic, to stabilize or delay the development or progression of a particular event or characteristic, or to minimize the chances that a particular event or characteristic will occur. Prevent does not require comparison to a control as it is typically more absolute than, for example, reduce. As used herein, something could be reduced but not prevented, but something that is reduced could also be prevented. Likewise, something could be prevented but not reduced, but something that is prevented could also be reduced. It is understood that where reduce or prevent are used, unless specifically indicated otherwise, the use of the other word is also expressly disclosed.
As used herein, “enhance”, “enhanced”, “enhancement”, “enhancing”, and any grammatical variations thereof refer to an act of intensifying, increasing, or further improving the quality, value, or extent of a biological function, composition, compound, cell, or tissue.
The term “subject” refers to any individual who is the target of administration or treatment. The subject can be a vertebrate, for example, a mammal. In one aspect, the subject can be human, non-human primate, bovine, equine, porcine, canine, or feline. The subject can also be a guinea pig, rat, hamster, rabbit, mouse, or mole. Thus, the subject can be a human or veterinary patient. The term “patient” refers to a subject under the treatment of a clinician, e.g., physician.
“Composition” refers to any agent that has a beneficial biological effect. Beneficial biological effects include both therapeutic effects, e.g., treatment of a disorder or other undesirable physiological condition, and prophylactic effects, e.g., prevention of a disorder or other undesirable physiological condition. The terms also encompass pharmaceutically acceptable, pharmacologically active derivatives of beneficial agents specifically mentioned herein, including, but not limited to, a vector, polynucleotide, cells, salts, esters, amides, proagents, active metabolites, isomers, fragments, analogs, and the like. When the term “composition” is used, then, or when a particular composition is specifically identified, it is to be understood that the term includes the composition per se as well as pharmaceutically acceptable, pharmacologically active vector, polynucleotide, salts, esters, amides, proagents, conjugates, active metabolites, isomers, fragments, analogs, etc.
“Comprising” is intended to mean that the compositions, methods, etc. include the recited elements, but do not exclude others. “Consisting essentially of” when used to define compositions and methods, shall mean including the recited elements, but excluding other elements of any essential significance to the combination. Thus, a composition consisting essentially of the elements as defined herein would not exclude trace contaminants from the isolation and purification method and pharmaceutically acceptable carriers, such as phosphate buffered saline, preservatives, and the like.
“Consisting of” shall mean excluding more than trace elements of other ingredients and substantial method steps for administering the compositions provided and/or claimed in this disclosure. Embodiments defined by each of these transition terms are within the scope of this disclosure.
As used herein, “operably fused” or “operably linked” refers to two or more compositions or compounds being bound or linked together in such a way that optimizes the intended function. When bound or linked, these compositions or compounds can be linked covalently, through electrostatic interaction, through hydrogen bonding, or any combinations thereof.
Reference also is made herein to peptides, polypeptides, proteins, and compositions comprising peptides, polypeptides, and proteins. As used herein, a polypeptide and/or protein is defined as a polymer of amino acids, typically of length ≥100 amino acids (Garrett & Grisham, Biochemistry, 2nd edition, 1999, Brooks/Cole, 110). A peptide is defined as a short polymer of amino acids, of a length typically of 20 or less amino acids, and more typically of a length of 12 or less amino acids (Garrett & Grisham, Biochemistry, 2nd edition, 1999, Brooks/Cole, 110).
The peptides, polypeptides, and proteins disclosed herein may be modified to include mutations or non-amino acid moieties. Modifications may include but are not limited to carboxylation, PEGylation (e.g., N-terminal or C-terminal PEGylation via addition of polyethylene glycol), acylation (e.g., O-acylation (esters), N-acylation (amides), S-acylation (thioesters)), acetylation (e.g., the addition of an acetyl group, either at the N-terminus of the protein or at lysine residues), formylation, lipoylation (e.g., attachment of a lipoate, a C8 functional group), myristoylation (e.g., attachment of myristate, a C14 saturated acid), palmitoylation (e.g., attachment of palmitate, a C16 saturated acid), alkylation (e.g., the addition of an alkyl group, such as a methyl at a lysine or arginine residue), isoprenylation or prenylation (e.g., the addition of an isoprenoid group such as farnesol or geranylgeraniol), and glycosylation (e.g., the addition of a glycosyl group to either asparagine, hydroxylysine, serine, or threonine, resulting in a glycoprotein).
The phrases “percent identity” and “% identity,” as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods consider conservative amino acid substitutions. Such conservative substitutions generally preserve the charge and hydrophobicity at the site of substitution, thus preserving the structure (and therefore function) of the polypeptide. Percent identity for amino acid sequences may be determined as understood in the art. (See, e.g., U.S. Pat. No. 7,396,664, which is incorporated herein by reference in its entirety). A suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S. F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, Md., at its website. The BLAST software suite includes various sequence analysis programs including “blastp,” which is used to align a known amino acid sequence with other amino acid sequences from a variety of databases.
Percent identity may be measured over the length of an entire defined polypeptide sequence or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length may be used to describe a length over which percentage identity may be measured.
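The notion of percent identity over a full length or over a shorter fragment can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by some external tool (equal-length strings, with `-` marking gaps), and it is not a reimplementation of blastp. The function name `percent_identity`, the example sequences, and the choice of denominator (all alignment columns that are not gap-versus-gap) are assumptions made for illustration; other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns that hold identical residues.

    Assumes pre-aligned input: equal-length strings with '-' for gaps.
    Columns where both sequences have a gap are ignored; every other
    column counts toward the denominator (one convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Made-up 10-residue sequences differing at one position.
seq_a = "MKTAYIAKQR"
seq_b = "MKTAHIAKQR"

full_length = percent_identity(seq_a, seq_b)        # over all 10 residues
fragment = percent_identity(seq_a[:5], seq_b[:5])   # over a 5-residue fragment
print(full_length, fragment)
```

The same pair of sequences gives different values depending on the length over which identity is measured, which is why the measured length is stated alongside a percent identity.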
Fusion proteins and fusion polynucleotides are also contemplated herein. A “fusion protein” refers to a protein formed by the fusion of at least one peptide, polypeptide, protein or variant thereof as disclosed herein to at least one molecule of a heterologous peptide, polypeptide, protein or variant thereof. The heterologous protein(s) may be fused at the N-terminus, the C-terminus, or both termini. A fusion protein comprises at least a fragment or variant of the heterologous protein(s) that are fused with one another, preferably by genetic fusion (i.e., the fusion protein is generated by translation of a nucleic acid in which a polynucleotide encoding all or a portion of a first heterologous protein is joined in-frame with a polynucleotide encoding all or a portion of a second heterologous protein). The heterologous protein(s), once part of the fusion protein, may each be referred to herein as a “portion”, “region” or “moiety” of the fusion protein.
A fusion polynucleotide refers to the fusion of the nucleotide sequence of a first polynucleotide to the nucleotide sequence of a second heterologous polynucleotide (e.g., the 3′ end of a first polynucleotide to a 5′ end of the second polynucleotide). Where the first and second polynucleotides encode proteins, the fusion may be such that the encoded proteins are in-frame and result in a fusion protein. The first and second polynucleotide may be fused such that the first and second polynucleotide are operably linked (e.g., as a promoter and a gene expressed by the promoter as discussed below).
The term “variant” means a polypeptide derived from a parent polypeptide by one or more (several) alteration(s), i.e., a substitution, insertion, and/or deletion, at one or more (several) positions. A substitution means a replacement of an amino acid occupying a position with a different amino acid; a deletion means removal of an amino acid occupying a position; and an insertion means adding 1 or more, such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, preferably 1-3 amino acids immediately adjacent an amino acid occupying a position. In relation to insertions, ‘immediately adjacent’ may be to the N-side (‘upstream’) or C-side (‘downstream’) of the amino acid occupying a position (‘the named amino acid’). Therefore, for an amino acid named/numbered ‘X,’ the insertion may be at position ‘X+1’ (‘downstream’) or at position ‘X−1’ (‘upstream’).
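The three kinds of alteration described above can be sketched as simple string operations in Python. Positions are 1-based, counted from the N-terminus. The helper names (`substitute`, `delete`, `insert_upstream`, `insert_downstream`) and the parent sequence are hypothetical and serve only to illustrate the definitions.

```python
def _check(seq: str, position: int) -> None:
    """Validate a 1-based position within seq."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside the sequence")


def substitute(seq: str, position: int, residue: str) -> str:
    """Replace the amino acid at the given position with another residue."""
    _check(seq, position)
    return seq[:position - 1] + residue + seq[position:]


def delete(seq: str, position: int) -> str:
    """Remove the amino acid occupying the given position."""
    _check(seq, position)
    return seq[:position - 1] + seq[position:]


def insert_upstream(seq: str, position: int, residues: str) -> str:
    """Insert residues on the N-side of the named amino acid."""
    _check(seq, position)
    return seq[:position - 1] + residues + seq[position - 1:]


def insert_downstream(seq: str, position: int, residues: str) -> str:
    """Insert residues on the C-side of the named amino acid."""
    _check(seq, position)
    return seq[:position] + residues + seq[position:]


parent = "MKTAYIAK"  # made-up parent polypeptide
print(substitute(parent, 3, "S"))
print(delete(parent, 3))
print(insert_upstream(parent, 3, "GG"))
print(insert_downstream(parent, 3, "GG"))
```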
A “variant” of a particular polypeptide sequence may be defined as a polypeptide sequence having at least 50% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the “BLAST 2 Sequences” tool available at the National Center for Biotechnology Information's website. (See Tatiana A. Tatusova, Thomas L. Madden (1999), “Blast 2 sequences—a new tool for comparing protein and nucleotide sequences”, FEMS Microbiol Lett. 174:247-250). In some embodiments a variant polypeptide may show, for example, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length relative to a reference polypeptide.
A variant polypeptide may have substantially the same functional activity as a reference polypeptide. For example, a variant polypeptide may exhibit one or more biological activities associated with binding a ligand and/or binding DNA at a specific binding site.
A “nucleotide” is a compound consisting of a nucleoside, which consists of a nitrogenous base and a 5-carbon sugar, linked to a phosphate group forming the basic structural unit of nucleic acids, such as DNA or RNA. The four types of DNA nucleotides are adenine (A), cytosine (C), guanine (G), and thymine (T), which are bound together by phosphodiester bonds to form a nucleic acid molecule.
A “nucleic acid” is a chemical compound that serves as the primary information-carrying molecule in cells and makes up the cellular genetic material. Nucleic acids comprise nucleotides, which are the monomers made of a 5-carbon sugar (usually ribose or deoxyribose), a phosphate group, and a nitrogenous base. A nucleic acid can also be a deoxyribonucleic acid (DNA) or a ribonucleic acid (RNA). A chimeric nucleic acid comprises two or more of the same kind of nucleic acid fused together to form one compound comprising genetic material.
The terms “percent identity” and “% identity,” as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences. Percent identity for a nucleic acid sequence may be determined as understood in the art. (See, e.g., U.S. Pat. No. 7,396,664, which is incorporated herein by reference in its entirety). A suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S. F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, Md., at its website. The BLAST software suite includes various sequence analysis programs including “blastn,” which is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called “BLAST 2 Sequences” that is used for direct pairwise comparison of two nucleotide sequences. “BLAST 2 Sequences” can be accessed and used interactively at the NCBI website. The “BLAST 2 Sequences” tool can be used for both blastn and blastp (discussed above).
Percent identity may be measured over the length of an entire defined polynucleotide sequence or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length may be used to describe a length over which percentage identity may be measured.
A “full length” polynucleotide sequence is one containing at least a translation initiation codon (e.g., methionine) followed by an open reading frame and a translation termination codon. A “full length” polynucleotide sequence encodes a “full length” polypeptide sequence.
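A minimal Python sketch of a check for this structure follows. It makes several simplifying assumptions that are not part of the definition above: the input is a DNA coding-strand sequence that begins exactly at the initiation codon, the initiation codon is ATG, and the termination codons are those of the standard genetic code (TAA, TAG, TGA). The function name `is_full_length` and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def is_full_length(cds: str) -> bool:
    """Check for ATG, an uninterrupted reading frame, and a final stop codon.

    Assumes cds is a coding-strand DNA sequence that starts at the
    initiation codon; untranslated flanking sequence is not handled.
    """
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    starts_with_atg = codons[0] == "ATG"
    ends_with_stop = codons[-1] in STOP_CODONS
    no_internal_stop = not any(c in STOP_CODONS for c in codons[:-1])
    return starts_with_atg and ends_with_stop and no_internal_stop


# Made-up sequences for illustration.
print(is_full_length("ATGAAATGA"))     # start, one codon, stop
print(is_full_length("ATGAAA"))        # no termination codon
print(is_full_length("ATGTAAAAATGA"))  # premature in-frame stop
```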
A “variant,” “mutant,” or “derivative” of a particular nucleic acid sequence may be defined as a nucleic acid sequence having at least 50% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the “BLAST 2 Sequences” tool available at the National Center for Biotechnology Information's website. (See Tatiana A. Tatusova, Thomas L. Madden (1999), “Blast 2 sequences—a new tool for comparing protein and nucleotide sequences”, FEMS Microbiol Lett. 174:247-250). In some embodiments a variant polynucleotide may show, for example, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length relative to a reference polynucleotide.
A “promoter,” as used herein, refers to a sequence in DNA that mediates the initiation of transcription by an RNA polymerase. Transcriptional promoters may comprise one or more of a number of different sequence elements as follows: 1) sequence elements present at the site of transcription initiation; 2) sequence elements present upstream of the transcription initiation site; and 3) sequence elements downstream of the transcription initiation site. The individual sequence elements function as sites on the DNA where RNA polymerases, and transcription factors that facilitate positioning of RNA polymerases on the DNA, bind.
“Expression” as used herein refers to the process by which information from a gene is used in the synthesis of a functional gene product that enables it to produce a peptide/protein end product, and ultimately affect a phenotype, as the final effect.
As used herein, the term “genetically modified” refers to a living cell, tissue, or organism whose genetic material has been altered using genetic engineering techniques. The genetic modification results in an alteration that does not occur naturally by mating and/or natural recombination. Modified genes can be transferred within the same species, across species (creating transgenic organisms), and across kingdoms. New, exogenous genes can be introduced, or endogenous genes can be enhanced, altered, or knocked out.
As used herein, the term “deletion,” also called gene deletion, deficiency, or deletion mutation, refers to part of a chromosome or a sequence of DNA being left out during DNA replication. Deletions, or gene deletions, can cause any number of nucleotides to be deleted, from a single base to an entire piece of a chromosome.
Variants comprising deletions relative to a reference amino acid sequence or nucleotide sequence are contemplated herein. A “deletion” refers to a change in the amino acid or nucleotide sequence that results in the absence of one or more amino acid residues or nucleotides relative to a reference sequence. A deletion removes at least 1, 2, 3, 4, 5, 10, 20, 50, 100, or 200 amino acid residues or nucleotides. A deletion may include an internal deletion or a terminal deletion (e.g., an N-terminal truncation or a C-terminal truncation or both of a reference polypeptide or a 5′-terminal or 3′-terminal truncation or both of a reference polynucleotide).
Variants comprising a fragment of a reference amino acid sequence or nucleotide sequence are contemplated herein. A “fragment” is a portion of an amino acid sequence or a nucleotide sequence which is identical in sequence to but shorter in length than the reference sequence. A fragment may comprise up to the entire length of the reference sequence, minus at least one nucleotide/amino acid residue. For example, a fragment may comprise from 5 to 1000 contiguous nucleotides or contiguous amino acid residues of a reference polynucleotide or reference polypeptide, respectively. In some embodiments, a fragment may comprise at least 5, 10, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 150, 250, or 500 contiguous nucleotides or contiguous amino acid residues of a reference polynucleotide or reference polypeptide, respectively. Fragments may be preferentially selected from certain regions of a molecule, for example the N-terminal region and/or the C-terminal region of a polypeptide or the 5′-terminal region and/or the 3′-terminal region of a polynucleotide. The term “at least a fragment” encompasses the full-length polynucleotide or full-length polypeptide.
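The fragment definition can be expressed as a short Python predicate: a candidate is a fragment when it is a contiguous stretch of the reference and is strictly shorter than it. The function name `is_fragment` and the sequences below are hypothetical, chosen only to illustrate the definition.

```python
def is_fragment(candidate: str, reference: str) -> bool:
    """True if candidate is contiguous in reference and strictly shorter."""
    return 0 < len(candidate) < len(reference) and candidate in reference


reference = "MKTAYIAK"  # made-up reference polypeptide

print(is_fragment("KTAY", reference))      # internal contiguous stretch
print(is_fragment("MKTA", reference))      # N-terminal region
print(is_fragment("MKTAYIAK", reference))  # full length, so not a fragment
print(is_fragment("KTAH", reference))      # not identical in sequence
```

Under this reading, the phrase “at least a fragment” corresponds to accepting either a sequence for which `is_fragment` is true or the full-length reference itself.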
Variants comprising insertions or additions relative to a reference sequence are contemplated herein. The words “insertion” and “addition” refer to changes in an amino acid or nucleotide sequence resulting in the addition of one or more amino acid residues or nucleotides. An insertion or addition may refer to 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, or 200 amino acid residues or nucleotides.
As used herein, a “transcription terminator” or a “terminator” refers to a segment of a nucleic acid sequence that marks the end of a gene in genomic DNA during the transcription process, or gene expression. This sequence mediates or signals the end of transcription by providing signaling nucleotides in newly synthesized RNA transcripts that trigger an RNA polymerase to release the DNA and newly synthesized RNA.
A “genome” refers to a complete set of genes or genetic material present within a cell, tissue, or organism. A genome can be nuclear (found within the cell nucleus) or mitochondrial (found within the cell mitochondria).
As used herein, a “mutation” refers to changing the structure of a gene, resulting in a variant form that may be transmitted to later generations. A mutation is caused by the alteration of single nucleotides in DNA, or the deletion, insertion, or rearrangement of larger sections of genes. A mutation can lead to the expression of a protein that has been changed physically or functionally leading to lethality, non-lethal dysfunction effects, or no effects.
“Recombinant” used in reference to a gene refers herein to a sequence of nucleic acids that are not naturally occurring in the genome of the bacterium. The non-naturally occurring sequence may include a recombination, substitution, deletion, or addition of one or more bases with respect to the nucleic acid sequence originally present in the natural genome of the bacterium.
A “vector” is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell. Numerous vectors are known in the art including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. Thus, the term “vector” includes an autonomously replicating plasmid or a virus. The term should also be construed to include non-plasmid and non-viral compounds which facilitate transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like. Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, retroviral vectors, and the like.
“CRISPR” (Clustered Regularly Interspaced Short Palindromic Repeats) loci refers to certain genetic loci encoding components of DNA cleavage systems, for example, used by bacterial and archaeal cells to destroy foreign DNA (Horvath and Barrangou, 2010, Science 327: 167-170; WO2007025097, published 1 Mar. 2007). A CRISPR locus can consist of a CRISPR array, comprising short direct repeats (CRISPR repeats) separated by short variable DNA sequences (called spacers), which can be flanked by diverse Cas (CRISPR-associated) genes.
As used herein, an “effector” or “effector protein” is a protein that encompasses an activity including recognizing, binding to, and/or cleaving or nicking a polynucleotide target. An effector, or effector protein, may also be an endonuclease. The “effector complex” of a CRISPR system includes Cas proteins involved in crRNA and target recognition and binding. Some of the component Cas proteins may additionally comprise domains involved in target polynucleotide cleavage.
A nuclease is an enzyme capable of cleaving the phosphodiester bonds between nucleotides of nucleic acids. A nuclease can cause double-stranded or single-stranded breaks in target nucleic acids. Nucleases are commonly used in CRISPR technology to modify a host genome to express or inhibit a target gene.
The term “Cas protein” refers to a polypeptide encoded by a Cas (CRISPR-associated) gene. A Cas protein includes proteins encoded by a gene in a Cas locus and includes adaptation molecules as well as interference molecules. An interference molecule of a bacterial adaptive immunity complex includes endonucleases. A Cas endonuclease described herein comprises one or more nuclease domains. Contemplated herein are any Cas molecules that comprise a Rec3 clamp, as described below.
A Cas endonuclease may also include a multifunctional Cas endonuclease. The term “multifunctional Cas endonuclease” and “multifunctional Cas endonuclease polypeptide” are used interchangeably herein and includes reference to a single polypeptide that has Cas endonuclease functionality (comprising at least one protein domain that can act as a Cas endonuclease) and at least one other functionality, such as but not limited to, the functionality to form a complex (comprises at least a second protein domain that can form a complex with other proteins). In one aspect, the multifunctional Cas endonuclease comprises at least one additional protein domain relative (either internally, upstream (5′), downstream (3′), or both internally 5′ and 3′, or any combination thereof) to those domains typical of a Cas endonuclease.
As used herein, the term “guide polynucleotide” relates to a polynucleotide sequence that can form a complex with a Cas endonuclease, including the Cas endonuclease described herein, and enables the Cas endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site. The guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence).
The terms “single guide RNA” and “sgRNA” are used interchangeably herein and relate to a synthetic fusion of two RNA molecules, a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA).
The term “administer,” “administering”, or derivatives thereof refer to delivering a composition, substance, inhibitor, or medication to a subject or object by one or more of the following routes: oral, topical, intravenous, subcutaneous, transcutaneous, transdermal, intramuscular, intra-joint, parenteral, intra-arteriole, intradermal, intraventricular, intracranial, intraperitoneal, intralesional, intranasal, rectal, vaginal, by inhalation or via an implanted reservoir. The term “parenteral” includes subcutaneous, intravenous, intramuscular, intra-articular, intra-synovial, intrasternal, intrathecal, intrahepatic, intralesional, and intracranial injections or infusion techniques.
Clustered regularly interspaced short palindromic repeats (CRISPR) together with the CRISPR-associated system (CRISPR/Cas9) is a popular tool for genome editing. As used herein, genome editing refers to the strategies and techniques for the targeted, specific modification of the genetic information (genome) of living organisms. Genome engineering is a very active field of research because of the wide range of applications, particularly in the areas of human health. For example, genome engineering can be used to alter (e.g., correct or inhibit) a gene carrying a harmful mutation or to explore the function of a gene. The present disclosure provides CRISPR base editor systems and vectors for editing a gene. The present disclosure also provides systems, vectors, and methods of using the CRISPR base editor system.
Base editing refers to a gene editing method to make targeted changes to a nucleic acid sequence. As an approach to genome editing, base editing uses components of CRISPR such as, for example, gRNA and nucleases (i.e., Cas endonucleases) together with other enzymes to directly introduce mutations into nucleic acid sequences. The difference between traditional CRISPR techniques and base editing techniques that incorporate CRISPR components is that base editing introduces mutations without making double-stranded DNA breaks. Thus, the present disclosure provides a base editor that introduces target nucleic acid mutations with minimal errors.
The present disclosure also provides systems, vectors, and methods utilizing directed evolution techniques to generate target proteins with optimal functions. “Directed evolution” refers to a method used in protein engineering that mimics the process of natural selection to steer proteins or nucleic acids towards user-defined objectives. Directed evolution comprises subjecting a gene to iterative rounds of mutagenesis, selection, and amplification, which can be performed in vivo, for example in a eukaryotic species including, but not limited to, yeast.
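By way of non-limiting illustration, the iterative rounds of mutagenesis, selection, and amplification described above can be sketched as a toy in silico simulation. Everything in this sketch is an assumption made for illustration: the fitness function (number of positions matching an arbitrary target string), the random point mutagenesis, and the parameter values do not model the base editor, the yeast display selection, or any experimental result of the present disclosure.

```python
import random

ALPHABET = "ACGT"


def fitness(seq: str, target: str) -> int:
    """Toy fitness: number of positions at which `seq` matches `target`."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Mutagenesis: redraw each position at random with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base for base in seq
    )


def directed_evolution(start, target, rounds=20, pool_size=50, keep=5,
                       rate=0.05, seed=0):
    """Run iterative rounds of mutagenesis, selection, and amplification.

    Assumes pool_size >= keep. Returns the best sequence of the final round
    and the best fitness observed in each round.
    """
    rng = random.Random(seed)
    population = [start] * pool_size
    history = []
    for _ in range(rounds):
        # Mutagenesis: diversify the pool; unmutated parents are retained
        # so the best fitness cannot decrease from round to round.
        variants = [mutate(s, rate, rng) for s in population] + population[:keep]
        # Selection: keep the fittest variants.
        variants.sort(key=lambda s: fitness(s, target), reverse=True)
        survivors = variants[:keep]
        # Amplification: expand the survivors back to the full pool size.
        population = [survivors[i % keep] for i in range(pool_size)]
        history.append(fitness(survivors[0], target))
    return population[0], history


# Hypothetical start and target sequences, for illustration only.
best, history = directed_evolution("AAAAAAAAAA", "ACGTACGTAC")
print(best, history)
```

Retaining the unmutated parents in each round is a design choice of this sketch; it guarantees that the per-round best fitness is non-decreasing, which makes the simulated trajectory easy to inspect.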
In one aspect, disclosed herein is a gene editing system comprising a CRISPR base editor comprising a catalytically inactive nuclease, at least one guide RNA (gRNA), and a MS2 phage coat protein (MCP), wherein the MCP is operably linked to an activation-induced deaminase (AID), the at least one gRNA comprises at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence, and the gene editing system is coupled with a yeast display system to introduce a mutation into a target protein within a yeast cell. As used herein, a “bacteriophage aptamer” refers to a short, single-stranded nucleic acid sequence, including, but not limited to, DNA or RNA, derived from a bacteriophage virus. In general, aptamers interact with a desired target, such as for example the MCP protein, with high affinity and specificity. Further, a “bacteriophage” refers to a virus that infects and replicates within bacteria and archaea, but displays minimal harmful effects in humans.
As used herein, a “yeast display” refers to a protein engineering technology wherein recombinant protein(s) are expressed in a yeast organism by incorporating a constructed nucleic acid sequence into the yeast genome. Following expression of the constructed nucleic acid sequence, the recombinant protein(s) are exposed on the yeast cell wall allowing for identification and/or isolation of said recombinant protein(s).
In some embodiments, the gRNA binds a target nucleic acid encoding a target protein, or a fragment thereof. In some embodiments, the target protein comprises an antibody, or a fragment thereof. In some embodiments, the yeast display exposes an antibody, or a fragment thereof, on the cell wall surface. The term “antibody” is used in the broadest sense, and specifically covers monoclonal antibodies (including full length monoclonal antibodies), polyclonal antibodies, and multispecific antibodies (e.g., bispecific antibodies). Antibodies (Abs) and immunoglobulins (Igs) are glycoproteins having the same structural characteristics. While antibodies exhibit binding specificity to a specific target, immunoglobulins include both antibodies and other antibody-like molecules which lack target specificity. Native antibodies and immunoglobulins are usually heterotetrameric glycoproteins of about 150,000 daltons, composed of two identical light (L) chains and two identical heavy (H) chains. Each heavy chain has at one end a variable domain (VH) followed by a number of constant domains. Each light chain has a variable domain at one end (VL) and a constant domain at its other end.
The term “antibody fragment” refers to a portion of a full-length antibody, generally the target binding or variable region. Examples of antibody fragments include Fab, Fab′, F(ab′)2 and Fv fragments. The phrase “functional fragment or analog” of an antibody is a compound having qualitative biological activity in common with a full-length antibody. For example, a functional fragment or analog of an anti-IgE antibody is one which can bind to an IgE immunoglobulin in such a manner so as to prevent or substantially reduce the ability of such molecule to bind to the high affinity receptor, FcεRI. As used herein, “functional fragment” with respect to antibodies, refers to Fv, F(ab) and F(ab′)2 fragments. An “Fv” fragment is the minimum antibody fragment which contains a complete target recognition and binding site. This region consists of a dimer of one heavy and one light chain variable domain in a tight, non-covalent association (VH-VL dimer). It is in this configuration that the three CDRs of each variable domain interact to define a target binding site on the surface of the VH-VL dimer. Collectively, the six CDRs confer target binding specificity to the antibody. However, even a single variable domain (or half of an Fv comprising only three CDRs specific for a target) has the ability to recognize and bind target, although at a lower affinity than the entire binding site. “Single-chain Fv” or “sFv” antibody fragments comprise the VH and VL domains of an antibody, wherein these domains are present in a single polypeptide chain. Generally, the Fv polypeptide further comprises a polypeptide linker between the VH and VL domains which enables the sFv to form the desired structure for target binding.
The term “monoclonal antibody” as used herein refers to an antibody obtained from a substantially homogeneous population of antibodies, i.e., the individual antibodies within the population are identical except for possible naturally occurring mutations that may be present in a small subset of the antibody molecules.
In some embodiments, the target protein includes, but is not limited to an enzyme, a structural protein, a contractile protein, a hormonal protein, a storage protein, a signaling protein, a transport protein, or fragments thereof.
In some embodiments, the MCP binds the at least one bacteriophage aptamer of the gRNA. In some embodiments, the MCP comprises a nuclear localization signal (NLS). In some embodiments, the MCP is operably fused to the NLS by a linker peptide. As used herein, an “NLS” refers to a short amino acid sequence that acts as a signal fragment that mediates the transport of proteins, either native or recombinant proteins, from the cytoplasm into the nucleus. In some embodiments, the NLS is a bipartite (BP) NLS. In some embodiments, the NLS is a monopartite (MP) NLS. The NLS can be located at the amino (N) terminus, the carboxy (C) terminus, or anywhere in between the N and C termini of the native or recombinant protein. Non-limiting examples of NLS include a simian virus 40 NLS, or mutants thereof, a cMYC NLS, or mutants thereof, and a nucleoplasmin (Nuc) NLS, or mutants thereof.
In some embodiments, the MCP comprises at least 60% sequence identity to SEQ ID NO: 41. In some embodiments, the MCP comprises at least 70% sequence identity to SEQ ID NO: 41. In some embodiments, the MCP comprises at least 75% sequence identity to SEQ ID NO: 41. In some embodiments, the MCP comprises at least 80% sequence identity to SEQ ID NO: 41. In some embodiments, the MCP comprises at least 90% sequence identity to SEQ ID NO: 41. In some embodiments, the MCP comprises at least 95% sequence identity to SEQ ID NO: 41. In some embodiments, the MCP comprises at least 99% sequence identity to SEQ ID NO: 41. In some embodiments, the MCP comprises SEQ ID NO: 41.
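By way of non-limiting illustration, a percent sequence identity of the kind recited above can be computed from a pairwise alignment as sketched below. This is a minimal sketch under stated assumptions: the two sequences are assumed to be already aligned (equal length, with “-” marking gaps), identity is taken as the number of identical aligned positions divided by the number of alignment columns, and the example sequences are arbitrary placeholders rather than SEQ ID NO: 41. In practice an established alignment tool would produce the alignment, and other identity conventions exist.

```python
THRESHOLDS = (60, 70, 75, 80, 90, 95, 99)  # percent values recited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: gaps are written as '-' and both strings have equal length.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(a == b and a != "-" for a, b in columns)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """List the recited identity thresholds that `identity` satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Placeholder sequences (hypothetical, not taken from the sequence listing).
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQA")
print(identity, thresholds_met(identity))
```

A variant with nine of ten aligned positions identical to the reference therefore satisfies the “at least 60%” through “at least 90%” embodiments in this sketch, but not the 95% or 99% embodiments.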
In some embodiments, the AID mutates the target nucleic acid encoding the target protein, or a fragment thereof.
In some embodiments, the AID comprises at least 60% sequence identity to SEQ ID NO: 43. In some embodiments, the AID comprises at least 70% sequence identity to SEQ ID NO: 43. In some embodiments, the AID comprises at least 75% sequence identity to SEQ ID NO: 43. In some embodiments, the AID comprises at least 80% sequence identity to SEQ ID NO: 43. In some embodiments, the AID comprises at least 90% sequence identity to SEQ ID NO: 43. In some embodiments, the AID comprises at least 95% sequence identity to SEQ ID NO: 43. In some embodiments, the AID comprises at least 99% sequence identity to SEQ ID NO: 43. In some embodiments, the AID comprises SEQ ID NO: 43.
In some embodiments, the AID comprises at least 60% sequence identity to SEQ ID NO: 45. In some embodiments, the AID comprises at least 70% sequence identity to SEQ ID NO: 45. In some embodiments, the AID comprises at least 75% sequence identity to SEQ ID NO: 45. In some embodiments, the AID comprises at least 80% sequence identity to SEQ ID NO: 45. In some embodiments, the AID comprises at least 90% sequence identity to SEQ ID NO: 45. In some embodiments, the AID comprises at least 95% sequence identity to SEQ ID NO: 45. In some embodiments, the AID comprises at least 99% sequence identity to SEQ ID NO: 45. In some embodiments, the AID comprises SEQ ID NO: 45.
In some embodiments, the AID comprises at least 60% sequence identity to SEQ ID NO: 47. In some embodiments, the AID comprises at least 70% sequence identity to SEQ ID NO: 47. In some embodiments, the AID comprises at least 75% sequence identity to SEQ ID NO: 47. In some embodiments, the AID comprises at least 80% sequence identity to SEQ ID NO: 47. In some embodiments, the AID comprises at least 90% sequence identity to SEQ ID NO: 47. In some embodiments, the AID comprises at least 95% sequence identity to SEQ ID NO: 47. In some embodiments, the AID comprises at least 99% sequence identity to SEQ ID NO: 47. In some embodiments, the AID comprises SEQ ID NO: 47.
In some embodiments, the AID comprises at least 60% sequence identity to SEQ ID NO: 49. In some embodiments, the AID comprises at least 70% sequence identity to SEQ ID NO: 49. In some embodiments, the AID comprises at least 75% sequence identity to SEQ ID NO: 49. In some embodiments, the AID comprises at least 80% sequence identity to SEQ ID NO: 49. In some embodiments, the AID comprises at least 90% sequence identity to SEQ ID NO: 49. In some embodiments, the AID comprises at least 95% sequence identity to SEQ ID NO: 49. In some embodiments, the AID comprises at least 99% sequence identity to SEQ ID NO: 49. In some embodiments, the AID comprises SEQ ID NO: 49.
In some embodiments, the AID comprises at least 60% sequence identity to SEQ ID NO: 51. In some embodiments, the AID comprises at least 70% sequence identity to SEQ ID NO: 51. In some embodiments, the AID comprises at least 75% sequence identity to SEQ ID NO: 51. In some embodiments, the AID comprises at least 80% sequence identity to SEQ ID NO: 51. In some embodiments, the AID comprises at least 90% sequence identity to SEQ ID NO: 51. In some embodiments, the AID comprises at least 95% sequence identity to SEQ ID NO: 51. In some embodiments, the AID comprises at least 99% sequence identity to SEQ ID NO: 51. In some embodiments, the AID comprises SEQ ID NO: 51.
In some embodiments, the AID is a mutant AID. In some embodiments, the mutant AID comprises AID*Δ, AIDmono, AID*mono, AID731mono, AID731Δ, or AID dead. In some embodiments, the mutant AID comprises AID731Δ.
In some embodiments, an MCP of any preceding aspect fuses to an AID of any preceding aspect to form an MCP-AID fusion protein. In some embodiments, the CRISPR base editor system of any preceding aspect comprises one, two, three, or more MCP-AID fusion proteins.
In some embodiments, the yeast cell expresses a mutant of the target protein. The base editor of any preceding aspect introduces mutations into the target protein to improve, increase, and/or enhance protein function. Thus, the base editor system introduces at least one substitution, insertion, deletion, frameshift mutation, or any combination thereof to improve, increase, and/or enhance protein function. The base editor of any preceding aspect can also introduce mutations into a defective protein, pathogenic protein, misfolded protein, or onco-protein (cancer-related) to render said protein dysfunctional.
The structure for Cas molecules was determined when bound in complex with a gRNA and double-stranded DNA target, in an active (DNA cleavage product state) and inactive (nonproductive state) conformation. This allowed for design of enzymes with different properties that facilitate better gene editing. The Cas nucleases disclosed herein have been mutated within the catalytic domains to be inactive, such that the Cas nuclease lacks endonuclease activity. In some embodiments, the catalytically inactive nuclease comprises a dead Cas9 (dCas9) or a dead Cas12 (dCas12).
The coat protein (MCP) of the bacteriophage MS2 binds to specific stem-loop aptamers to regulate gene expression of viral genes. Herein, the MCP protein fused to the AID binds at least one MS2 aptamer, located within the gRNA, to introduce targeted mutations. In some embodiments, the at least one bacteriophage aptamer comprises at least one MS2 aptamer.
Activation-induced deaminases (AID) are enzymes that create mutations in nucleic acid sequences by deamination of cytosine (C) or adenine (A) bases. A non-limiting example of AID function is the conversion of a C:guanine (G) base pair into a uracil (U):G mismatch. During DNA replication, the host cell replication machinery reads the U base as a thymine (T), and hence the C:G base pair is converted into a T:A base pair. In some embodiments, the AID comprises a cytidine activation-induced deaminase or an adenine activation-induced deaminase.
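By way of non-limiting illustration, the net outcome of cytosine deamination followed by replication (C:G converted to T:A) can be sketched as a string operation. The function names, the example sequence, and the notion of a fixed editing window given as a half-open index range are assumptions made for this sketch only; real deaminase activity is probabilistic and context dependent, and no particular window is asserted for the disclosed base editor.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def deaminate_and_replicate(strand: str, window: tuple) -> str:
    """Return the edited strand after C -> U deamination and replication.

    Every C inside the half-open index range `window` is deaminated to U,
    which the replication machinery reads as T. The hypothetical `window`
    stands in for wherever the deaminase acts; idealized 100% conversion
    is assumed.
    """
    start, end = window
    return "".join(
        "T" if start <= i < end and base == "C" else base
        for i, base in enumerate(strand)
    )


# Hypothetical six-nucleotide target, for illustration only.
original = "ACCGTC"
edited = deaminate_and_replicate(original, (1, 3))
print(original, "->", edited)
print(reverse_complement(original), "->", reverse_complement(edited))
```

On the opposite strand the same edit appears as G-to-A changes, which is why a C:G base pair ends up as a T:A base pair after replication.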
In some embodiments, the yeast display system or yeast cell is derived from any one organism including, but not limited to, Saccharomyces cerevisiae (S. cerevisiae), Kluyveromyces lactis (K. lactis), Kluyveromyces marxianus (K. marxianus), Scheffersomyces stipitis (S. stipitis), Yarrowia lipolytica (Y. lipolytica), Hansenula polymorpha (H. polymorpha), Pichia pastoris (P. pastoris), Komagataella pastoris (K. pastoris), Ashbya gossypii (A. gossypii), Streptomyces noursei (S. noursei), Candida albicans (C. albicans), and Schizosaccharomyces pombe (S. pombe).
In one aspect, disclosed herein is an expression vector comprising one or more nucleic acid sequences encoding a CRISPR base editor, wherein the CRISPR base editor comprises a catalytically inactive nuclease, a MS2 phage coat protein (MCP), and an activation-induced deaminase (AID), wherein expression of the CRISPR base editor is under control of a yeast-derived promoter sequence and a yeast-derived terminator sequence.
In some embodiments, the expression vector comprises a plasmid or a virus or viral vector. A plasmid or a viral vector can be capable of extrachromosomal replication or, optionally, can integrate into the host genome. As used herein, the term “integrated” used in reference to an expression vector (e.g., a plasmid or viral vector) means the expression vector, or a portion thereof, is incorporated (physically inserted or ligated) into the chromosomal DNA of a host cell. As used herein, a “viral vector” refers to a virus-like particle containing genetic material which can be introduced into a eukaryotic cell without causing substantial pathogenic effects to the eukaryotic cell. A wide range of viruses or viral vectors can be used for transduction but should be compatible with the cell type the virus or viral vector are transduced into (e.g., low toxicity, capability to enter cells). Suitable viruses and viral vectors include adenovirus, lentivirus, retrovirus, among others. In some embodiments, the expression vector encoding a chimeric polypeptide is a naked DNA or is comprised in a nanoparticle (e.g., liposomal vesicle, porous silicon nanoparticle, gold-DNA conjugate particle, polyethyleneimine polymer particle, cationic peptides, etc.).
In some embodiments, the one or more nucleic acid sequences encoding the CRISPR base editor system are separated and inserted into 1, 2, 3, or more expression vectors. In some embodiments, the one or more nucleic acid sequences encoding the CRISPR base editor system is inserted into a first expression vector and at least one guide RNA (gRNA) is inserted into a second expression vector.
In some embodiments, the yeast-derived promoter comprises a native yeast promoter or a mutated yeast promoter. In some embodiments, the yeast-derived terminator comprises a native yeast terminator or a mutated yeast terminator.
In some embodiments, the expression vector further comprises a nucleic acid sequence encoding at least one gRNA. In some embodiments, the at least one gRNA comprises at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence.
In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising at least 60% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising at least 70% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising at least 75% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising at least 80% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising at least 90% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising at least 95% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising at least 99% sequence identity to SEQ ID NO: 40. In some embodiments, the one or more nucleic acid sequences encodes the MCP comprising SEQ ID NO: 40.
In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 60% sequence identity to SEQ ID NO: 42. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 70% sequence identity to SEQ ID NO: 42. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 75% sequence identity to SEQ ID NO: 42. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 80% sequence identity to SEQ ID NO: 42. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 90% sequence identity to SEQ ID NO: 42. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 95% sequence identity to SEQ ID NO: 42. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 99% sequence identity to SEQ ID NO: 42. In some embodiments, the one or more nucleic acid sequences encoding the AID comprises SEQ ID NO: 42.
In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 60% sequence identity to SEQ ID NO: 44. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 70% sequence identity to SEQ ID NO: 44. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 75% sequence identity to SEQ ID NO: 44. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 80% sequence identity to SEQ ID NO: 44. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 90% sequence identity to SEQ ID NO: 44. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 95% sequence identity to SEQ ID NO: 44. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 99% sequence identity to SEQ ID NO: 44. In some embodiments, the one or more nucleic acid sequences encoding the AID comprises SEQ ID NO: 44.
In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 60% sequence identity to SEQ ID NO: 46. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 70% sequence identity to SEQ ID NO: 46. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 75% sequence identity to SEQ ID NO: 46. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 80% sequence identity to SEQ ID NO: 46. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 90% sequence identity to SEQ ID NO: 46. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 95% sequence identity to SEQ ID NO: 46. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 99% sequence identity to SEQ ID NO: 46. In some embodiments, the one or more nucleic acid sequences encoding the AID comprises SEQ ID NO: 46.
In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 60% sequence identity to SEQ ID NO: 48. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 70% sequence identity to SEQ ID NO: 48. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 75% sequence identity to SEQ ID NO: 48. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 80% sequence identity to SEQ ID NO: 48. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 90% sequence identity to SEQ ID NO: 48. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 95% sequence identity to SEQ ID NO: 48. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 99% sequence identity to SEQ ID NO: 48. In some embodiments, the one or more nucleic acid sequences encoding the AID comprises SEQ ID NO: 48.
In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 60% sequence identity to SEQ ID NO: 50. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 70% sequence identity to SEQ ID NO: 50. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 75% sequence identity to SEQ ID NO: 50. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 80% sequence identity to SEQ ID NO: 50. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 90% sequence identity to SEQ ID NO: 50. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 95% sequence identity to SEQ ID NO: 50. In some embodiments, the one or more nucleic acid sequences encodes the AID comprising at least 99% sequence identity to SEQ ID NO: 50. In some embodiments, the one or more nucleic acid sequences encoding the AID comprises SEQ ID NO: 50.
In some embodiments, the one or more nucleic acids encoding the catalytically inactive nuclease comprises a dead Cas9 (dCas9) or a dead Cas12 (dCas12). In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises at least 60% sequence identity to SEQ ID NO: 23. In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises at least 70% sequence identity to SEQ ID NO: 23. In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises at least 75% sequence identity to SEQ ID NO: 23. In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises at least 80% sequence identity to SEQ ID NO: 23. In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises at least 90% sequence identity to SEQ ID NO: 23. In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises at least 95% sequence identity to SEQ ID NO: 23. In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises at least 99% sequence identity to SEQ ID NO: 23. In some embodiments, the one or more nucleic acid sequences encoding the catalytically inactive nuclease comprises SEQ ID NO: 23.
In some embodiments, the at least one gRNA comprises SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, or a variant thereof.
In some embodiments, the at least one gRNA comprises a m1,3tx2 gRNA. In some embodiments, the m1,3tx2 gRNA comprises at least 60% sequence identity to SEQ ID NO: 12. In some embodiments, the m1,3tx2 gRNA comprises at least 70% sequence identity to SEQ ID NO: 12.
In some embodiments, the m1,3tx2 gRNA comprises at least 75% sequence identity to SEQ ID NO: 12. In some embodiments, the m1,3tx2 gRNA comprises at least 80% sequence identity to SEQ ID NO: 12. In some embodiments, the m1,3tx2 gRNA comprises at least 90% sequence identity to SEQ ID NO: 12. In some embodiments, the m1,3tx2 gRNA comprises at least 95% sequence identity to SEQ ID NO: 12. In some embodiments, the m1,3tx2 gRNA comprises at least 99% sequence identity to SEQ ID NO: 12. In some embodiments, the m1,3tx2 gRNA comprises SEQ ID NO: 12.
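The sequence identity thresholds recited above can be illustrated with a short calculation. The following is a minimal sketch only: it assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and that identity is computed over all aligned columns, which is one common convention but is not stated in this passage. The function names are hypothetical, and no sequence from the sequence listing is reproduced.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: identity = matching columns / compared columns, where a
    column holding a gap in both sequences is skipped and a column holding
    a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold_percent: float) -> bool:
    """True if the aligned pair reaches the stated identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    # Toy 10-nt sequences differing at one position (illustrative only).
    ref = "ACGTACGTAC"
    var = "ACGTACGTAA"
    print(percent_identity(ref, var))
    print(meets_threshold(ref, var, 90), meets_threshold(ref, var, 95))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only shows how a threshold such as "at least 90%" would be checked once an alignment is available.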
In some embodiments, the at least one PAM sequence comprises SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, or a variant thereof.
In some embodiments, the expression vector of any preceding aspect is a component of a therapeutic composition further comprising a pharmaceutically acceptable carrier. As used herein, a “pharmaceutically acceptable carrier” (sometimes referred to as a “carrier”) means a carrier or excipient that is useful in preparing a pharmaceutical or therapeutic composition that is generally safe and non-toxic, and includes a carrier that is acceptable for veterinary and/or human pharmaceutical or therapeutic use. The terms “carrier” or “pharmaceutically acceptable carrier” can include, but are not limited to, phosphate buffered saline solution, water, emulsions (such as an oil/water or water/oil emulsion) and/or various types of wetting agents.
As used herein, the term “carrier” encompasses any excipient, diluent, filler, salt, buffer, stabilizer, solubilizer, lipid, or other material well known in the art for use in pharmaceutical formulations. The choice of a carrier for use in a composition will depend upon the intended route of administration for the composition. The preparation of pharmaceutically acceptable carriers and formulations containing these materials is described in, e.g., Remington's Pharmaceutical Sciences, 21st Edition, ed. University of the Sciences in Philadelphia, Lippincott, Williams & Wilkins, Philadelphia, PA, 2005. Examples of physiologically acceptable carriers include saline, glycerol, DMSO, buffers such as phosphate buffers, citrate buffer, and buffers with other organic acids; antioxidants including ascorbic acid; low molecular weight (less than about 10 residues) polypeptides; proteins, such as serum albumin, gelatin, or immunoglobulins; hydrophilic polymers such as polyvinylpyrrolidone; amino acids such as glycine, glutamine, asparagine, arginine or lysine; monosaccharides, disaccharides, and other carbohydrates including glucose, mannose, or dextrins; chelating agents such as EDTA; sugar alcohols such as mannitol or sorbitol; salt-forming counterions such as sodium; and/or nonionic surfactants such as TWEEN™ (ICI, Inc.; Bridgewater, New Jersey), polyethylene glycol (PEG), and PLURONICS™ (BASF; Florham Park, NJ). To provide for the administration of such dosages for the desired therapeutic treatment, compositions disclosed herein can advantageously comprise between about 0.1% and 99% by weight of the total of one or more of the subject compounds based on the weight of the total composition including carrier or diluent.
In one aspect, disclosed herein is a method of mutating a target protein using a CRISPR base editor system expressed in a yeast, the method comprising identifying a target nucleic acid sequence encoding a target protein; incorporating into a yeast genome at least one expression vector comprising one or more nucleic acid sequences encoding the target nucleic acid sequence and a CRISPR base editor, wherein the CRISPR base editor comprises a catalytically inactive nuclease, at least one guide RNA (gRNA) comprising at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence, a MS2 phage coat protein (MCP), and an activation-induced deaminase (AID), and wherein expression of the CRISPR base editor is under control of a yeast-derived promoter sequence and a yeast-derived terminator sequence; inducing a mutation into the target protein, wherein the AID incorporates a mutation into the target nucleic acid sequence and the target nucleic acid is translated into the target protein, and wherein the mutation increases a function of the target protein relative to a wild-type control protein; expressing the target protein comprising the mutation in the yeast; and isolating said target protein comprising the mutation.
In one aspect, disclosed herein is a method of mutating a target protein using a CRISPR base editor system expressed in a yeast, the method comprising identifying a target nucleic acid sequence encoding a target protein; incorporating into a yeast genome at least one expression vector comprising one or more nucleic acid sequences encoding the target nucleic acid sequence and a CRISPR base editor, wherein the CRISPR base editor comprises a catalytically inactive nuclease, at least one guide RNA (gRNA) comprising at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence, a MS2 phage coat protein (MCP), and an activation-induced deaminase (AID), and wherein expression of the CRISPR base editor is under control of a yeast-derived promoter sequence and a yeast-derived terminator sequence; and inducing a mutation into the target protein, wherein the AID incorporates a mutation into the target nucleic acid sequence and the target nucleic acid is translated into the target protein, and wherein the mutation increases a function of the target protein relative to a wild-type control protein.
In one aspect, disclosed herein is a method of mutating a target protein using a CRISPR base editor system expressed in a yeast, the method comprising identifying a target nucleic acid sequence encoding a target protein; incorporating into a yeast genome at least one expression vector comprising one or more nucleic acid sequences encoding the target nucleic acid sequence and a CRISPR base editor, wherein the CRISPR base editor comprises a catalytically inactive nuclease, at least one guide RNA (gRNA) comprising at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence, a MS2 phage coat protein (MCP), and an activation-induced deaminase (AID), and wherein expression of the CRISPR base editor is under control of a yeast-derived promoter sequence and a yeast-derived terminator sequence; inducing a mutation into the target protein, wherein the AID incorporates a mutation into the target nucleic acid sequence and the target nucleic acid is translated into the target protein, and wherein the mutation increases a function of the target protein relative to a wild-type control protein; and expressing the target protein comprising the mutation in the yeast.
In some embodiments, the method of mutating a protein comprises the CRISPR base editor system of any preceding aspect or the expression vector of any preceding aspect.
In some embodiments, the method of mutating a protein comprises the gRNA binding a target nucleic acid encoding a target protein, or a fragment thereof. In some embodiments, the method of mutating a protein comprises the MCP binding the at least one bacteriophage aptamer of the gRNA.
In some embodiments, the at least one bacteriophage aptamer comprises an MS2 aptamer. In some embodiments, the MCP comprises an NLS.
In some embodiments, the method of mutating a protein comprises the AID mutating the target nucleic acid encoding the target protein, or a fragment thereof. In some embodiments, the method of mutating a protein comprises an AID731Δ mutant.
In some embodiments, the method of mutating a protein comprises the gRNA and PAM sequence of any preceding aspect.
In some embodiments, the method of mutating a protein comprises a yeast cell that expresses a mutant of the target protein. In some embodiments, the mutant of the target protein comprises an antibody, an enzyme, a structural protein, a contractile protein, a hormonal protein, a storage protein, a signaling protein, a transport protein, or fragments thereof.
In some embodiments, the method of mutating a protein comprises a dead Cas9 (dCas9) or a dead Cas12 (dCas12). In some embodiments, the method of mutating a protein comprises a cytidine activation-induced deaminase or an adenine activation-induced deaminase.
In some embodiments, the method of mutating a protein comprises a yeast display for expressing the mutant of the target protein. In some embodiments, the yeast display comprises Saccharomyces cerevisiae (S. cerevisiae).
Methods of Treating and/or Preventing Diseases or Disorders
In one aspect, disclosed herein is a method of treating or preventing a disease or disorder in a subject in need thereof, the method comprising identifying a target nucleic acid sequence encoding a target protein; incorporating into a yeast genome at least one expression vector comprising one or more nucleic acid sequences encoding the target nucleic acid sequence and a CRISPR base editor, wherein the CRISPR base editor comprises a catalytically inactive nuclease, at least one guide RNA (gRNA) comprising at least one bacteriophage aptamer and at least one protospacer adjacent motif (PAM) sequence, a MS2 phage coat protein (MCP), and an activation-induced deaminase (AID), and wherein expression of the CRISPR base editor is under control of a yeast-derived promoter sequence and a yeast-derived terminator sequence; inducing a mutation into the target protein, wherein the AID incorporates a mutation into the target nucleic acid sequence and the target nucleic acid is translated into the target protein, and wherein the mutation increases a function of the target protein relative to a wild-type control protein; expressing the target protein comprising the mutation in the yeast; isolating said target protein comprising the mutation; incorporating the target protein into a therapeutic composition; and administering the therapeutic composition to the subject.
In some embodiments, the method of treating or preventing a disease or disorder comprises the CRISPR base editor system of any preceding aspect or the expression vector of any preceding aspect.
In some embodiments, the method of treating or preventing a disease or disorder comprises the gRNA binding a target nucleic acid encoding a target protein, or a fragment thereof. In some embodiments, the method of treating or preventing a disease or disorder comprises the MCP binding the at least one bacteriophage aptamer of the gRNA. In some embodiments, the at least one bacteriophage aptamer comprises an MS2 aptamer. In some embodiments, the MCP comprises an NLS.
In some embodiments, the method of treating or preventing a disease or disorder comprises the AID mutating the target nucleic acid encoding the target protein, or a fragment thereof. In some embodiments, the method of treating or preventing a disease or disorder comprises an AID731Δ mutant.
In some embodiments, the method of treating or preventing a disease or disorder comprises the gRNA and PAM sequence of any preceding aspect.
In some embodiments, the method of treating or preventing a disease or disorder comprises a yeast cell that expresses a mutant of the target protein. In some embodiments, the mutant of the target protein comprises an antibody, an enzyme, a structural protein, a contractile protein, a hormonal protein, a storage protein, a signaling protein, a transport protein, or fragments thereof.
In some embodiments, the method of treating or preventing a disease or disorder comprises a dead Cas9 (dCas9) or a dead Cas12 (dCas12). In some embodiments, the method of treating or preventing a disease or disorder comprises a cytidine activation-induced deaminase or an adenine activation-induced deaminase.
In some embodiments, the method of treating or preventing a disease or disorder comprises a yeast display for expressing the mutant of the target protein. In some embodiments, the yeast display comprises Saccharomyces cerevisiae (S. cerevisiae).
In some embodiments, the CRISPR base editor system of any preceding aspect or the expression vector of any preceding aspect is a component of a therapeutic composition further comprising a pharmaceutically acceptable carrier of any preceding aspect.
The therapeutic composition may be administered in such amounts, time, and route deemed necessary in order to achieve the desired result. The exact amount of the therapeutic composition will vary from subject to subject, depending on the species, age, and general condition of the subject, the severity of the disease, the particular therapeutic composition, its mode of administration, its mode of activity, and the like. The therapeutic composition is preferably formulated in dosage unit form for ease of administration and uniformity of dosage. It will be understood, however, that the total daily usage of the therapeutic composition will be decided by the attending physician within the scope of sound medical judgment. The specific therapeutically effective dose level for any particular subject will depend upon a variety of factors including the disease(s) being treated and the severity of the symptoms; the activity of the therapeutic composition employed; the specific therapeutic composition employed; the age, body weight, general health, sex and diet of the patient; the time of administration, route of administration, and rate of excretion of the specific therapeutic composition employed; the duration of the treatment; drugs used in combination or coincidental with the specific therapeutic composition employed; and like factors well known in the medical arts.
The therapeutic composition may be administered by any route. In some embodiments, the therapeutic composition is administered via a variety of routes, including oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, subcutaneous, intraventricular, transdermal, intradermal, rectal, intravaginal, intraperitoneal, topical (as by powders, ointments, creams, and/or drops), mucosal, nasal, buccal, enteral, sublingual; by intratracheal instillation, bronchial instillation, and/or inhalation; and/or as an oral spray, nasal spray, and/or aerosol. In general, the most appropriate route of administration will depend upon a variety of factors including the nature of the therapeutic composition (e.g., its stability in the environment of the subject's body), the condition of the subject (e.g., whether the subject is able to tolerate administration), etc.
The exact amount of therapeutic composition required to achieve a therapeutically or prophylactically effective amount will vary from subject to subject, depending on species, age, and general condition of a subject, severity of the side effects, identity of the particular compound(s), mode of administration, and the like. The amount to be administered to, for example, a child or an adolescent can be determined by a medical practitioner or person skilled in the art and can be lower or the same as that administered to an adult.
In one aspect, disclosed herein is a therapeutic composition of any preceding aspect and a pharmaceutically acceptable carrier selected from an excipient, a diluent, a salt, a buffer, a stabilizer, a lipid, an emulsion, a nanoparticle, and a cream. One or more active agents (e.g., CRISPR base editor systems) can be administered in the “native” form or, if desired, in the form of salts, esters, amides, prodrugs, or a derivative that is pharmacologically suitable. Salts, esters, amides, prodrugs, and other derivatives of the active agents can be prepared using standard procedures known to those skilled in the art of synthetic organic chemistry and described, for example, by March (1992) Advanced Organic Chemistry; Reactions, Mechanisms, and Structure, 4th Ed. N.Y. Wiley-Interscience.
In some embodiments, the therapeutic composition is administered 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or more times. In some embodiments, the therapeutic composition is administered daily. In some embodiments, the therapeutic composition is administered every day, every 2 days, every 3 days, every 4 days, every 5 days, every 6 days, every 7 days, or more. In some embodiments, the therapeutic composition is administered every week, every 2 weeks, every 3 weeks, every 4 weeks, or more. In some embodiments, the therapeutic composition is administered every month, every 2 months, every 3 months, every 4 months, every 5 months, every 6 months, every 7 months, every 8 months, every 9 months, every 10 months, every 11 months, every 12 months, or more. In some embodiments, the therapeutic composition is administered every year, every 2 years, every 3 years, every 4 years, every 5 years, or more.
A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
By way of non-limiting illustration, examples of certain embodiments of the present disclosure are given below.
The following examples are set forth below to illustrate the compositions, devices, methods, and results according to the disclosed subject matter. These examples are not intended to be inclusive of all aspects of the subject matter disclosed herein, but rather to illustrate representative methods and results. These examples are not intended to exclude equivalents and variations of the present invention which are apparent to one skilled in the art.
The yeast Saccharomyces cerevisiae is commonly used to interrogate and screen protein variants and to perform directed evolution studies to develop proteins with enhanced features. While several techniques have been described that help enable the use of yeast for directed evolution, there remains a need to increase their speed and ease of use. Herein, yDBE, a yeast diversifying base editor, is presented that functions in vivo and employs a CRISPR-dCas9-directed cytidine deaminase base editor to diversify DNA in a targeted, rapid, and high-breadth manner. To develop yDBE, the mutation rate of an initial base editor was enhanced by employing improved deaminase variants and characterizing several scaffolded guide constructs. The ability of the yDBE platform to improve the affinity of a displayed antibody scFv, by rapidly generating diversified libraries and isolating improved binders via cell sorting, is demonstrated. High-throughput sequencing analysis shows that the high-activity yDBE enables a mutation rate of 2.13×10⁻⁴ substitutions/bp/generation over a window of 100 bp. As yDBE functions entirely in vivo and can be easily programmed to diversify nearly any such window of DNA, it is a powerful tool for facilitating a variety of directed evolution experiments.
Directed evolution via DNA mutagenesis and screening of the resultant protein libraries is an essential strategy for improving protein function. The yeast Saccharomyces cerevisiae is frequently used for directed evolution experiments because it grows rapidly, has well developed genetic tools, can carry out eukaryotic posttranslational modifications, and can be engineered to display proteins or protein fragments tethered to the cell surface. The majority of directed evolution programs in yeast utilize mutant libraries created through traditional, in vitro mutagenesis techniques such as site-saturation mutagenesis or error-prone PCR. While these methods introduce sufficient diversity, they require laborious in vitro cloning procedures that slow the iterative process of directed evolution.
To circumvent these issues, a number of methods have been developed to continuously generate genetic diversity within a cell. Within yeast specifically, tet-directed DNA glycosylases (TaGTEAM), CRISPR-targeted error-prone DNA polymerases (EvolvR), a T7-polymerase-guided cytidine deaminase (TRIDENT), retrotransposon cycling with an error-prone reverse transcription step (ICE), and error-prone orthogonal DNA polymerases contained on cytoplasmic plasmids (OrthoRep and AHEAD) enable DNA mutation in vivo. These systems, with the exception of EvolvR, are unable to target multiple sequence regions and require a targeted gene to be first inserted in a predefined location. The rate of DNA diversification, as measured by substitutions per base pair per generation (s.p.b.), can vary widely in these systems. As an example, OrthoRep, which reports a mutation rate of 1×10⁻⁵ s.p.b., in some cases took up to 13 passages, or up to 90 generations, to evolve a desired resistance phenotype. The highest reported mutation rate attained by these yeast-based systems is 1×10⁻³ s.p.b. using TRIDENT.
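To give a feel for what these rates mean in practice, the expected mutational load can be estimated from the rate, the size of the region, and the number of generations. The sketch below is illustrative only: it assumes substitutions arise independently and uniformly across the region (a Poisson approximation), and the 1,000-bp region length used in the example is a hypothetical value, not one taken from the studies mentioned above.

```python
import math


def expected_substitutions(rate_spb: float, region_bp: int, generations: float) -> float:
    """Mean substitutions per cell lineage: rate x region length x generations."""
    return rate_spb * region_bp * generations


def fraction_with_mutation(rate_spb: float, region_bp: int, generations: float) -> float:
    """Probability that a lineage carries at least one substitution.

    Assumption: substitution counts are Poisson distributed, so
    P(at least one) = 1 - exp(-mean).
    """
    mean = expected_substitutions(rate_spb, region_bp, generations)
    return 1.0 - math.exp(-mean)


if __name__ == "__main__":
    # Rates quoted in the text; region length is a hypothetical example.
    for label, rate in (("1e-5 s.p.b.", 1e-5), ("1e-3 s.p.b.", 1e-3)):
        mean = expected_substitutions(rate, region_bp=1000, generations=90)
        frac = fraction_with_mutation(rate, region_bp=1000, generations=90)
        print(f"{label}: mean={mean:.2f}, fraction mutated={frac:.3f}")
```

Under these assumptions, a rate of 1×10⁻⁵ s.p.b. acting on a 1,000-bp region for 90 generations yields slightly under one expected substitution per lineage, which is consistent with the many passages such systems can require.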
CRISPR base editors mediate in situ DNA mutation in a targeted manner by employing programmable DNA binding proteins, such as dCas9, to target cytidine or adenine deaminases to specific DNA sequences. In this way, nucleotide deaminases are directed to a specific locus that bears homology to a 20-bp spacer sequence within a CRISPR guide RNA (gRNA), resulting in DNA mutations near the targeted site. The deaminase can be fused directly to the dCas9 protein or recruited through a secondary protein-protein or protein-RNA interaction, for instance, by incorporating MS2 aptamer sequences into a gRNA scaffold and taking advantage of the high affinity interaction between the MS2 phage coat protein (MCP) and MS2 aptamers. In yeast specifically, CRISPR base editors have been employed to perturb essential genes using dCas9 fusions of Petromyzon marinus CDA1 or human APOBEC3A deaminases. Cytidine deaminases transform cytosine to uracil in ssDNA, and because uracil is recognized as DNA damage, it is often repaired inaccurately, leading to permanent DNA mutations. Possible outcomes of cytosine deamination to uracil include 1) replacement with thymine as the DNA undergoes replication, 2) excision and replacement with any nucleotide through base excision repair, or 3) mismatch repair, especially when one strand of DNA is nicked, causing mutations at or near the uracil. In this way, a variety of substitutions can occur at or near the deaminated cytosine.
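The targeting logic described above, a 20-nt spacer matched to a site beside a PAM, can be sketched as a simple sequence scan. This is a hedged illustration, not the procedure used in this disclosure: it assumes the SpCas9 PAM pattern NGG located immediately 3′ of the protospacer, and the function names and the example sequence are hypothetical.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq: str, spacer_len: int = 20):
    """List (start, protospacer, pam) for every site on the given strand.

    Assumption: the PAM is NGG and sits directly 3' of the protospacer.
    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence: 20 A's followed by a TGG PAM.
    toy = "A" * 20 + "TGG" + "CC"
    for start, spacer, pam in find_protospacers(toy):
        print(start, spacer, pam)
    # Sites on the opposite strand are found by scanning the reverse complement.
    print(len(find_protospacers(reverse_complement(toy))))
```

Scanning both the sequence and its reverse complement gives candidate sites on either strand, which matters here because the text later reports strand-dependent mutation rates.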
Many CRISPR base editor engineering efforts have increased the precision and specificity of DNA mutagenesis to enable their application in gene editing technologies, for instance, for clinical application in which only one specific DNA base pair mutation is desired or allowable. To give two specific examples, a uracil glycosylase inhibitor domain or a uracil DNA glycosylase can be incorporated into a base editor to help reinforce C-to-T or C-to-G mutations, respectively. In contrast, diversifying base editors (DBEs) are designed to generate a high mutational load via a variety of substitutions in the vicinity of their target site, with applications in directed evolution. For instance, the CRISPR-X technique utilized a human activation-induced cytidine deaminase (AID)-MCP fusion protein to mutate DNA in mammalian cells and recapitulate aspects of antibody affinity maturation. AID is the catalytic deaminase enzyme that mediates somatic hypermutation of antibody sequences, i.e., their mutation, in B cells.
Antibody therapeutics have seen tremendous growth over the past decade and are used to treat a variety of diseases, including viral infections, autoimmune disorders, and cancer. Due to its ability to grow rapidly to high densities and surface-present libraries of antibody variants, S. cerevisiae has become a popular platform for therapeutic antibody interrogation. A recently described in vivo continuous evolution platform for the isolation of high affinity nanobodies in yeast, AHEAD, demonstrates the remarkable potential of combining in situ DNA diversification with yeast protein display.17 Surprisingly, to the best of one's knowledge, DBEs have not been designed or employed for use in yeast.
In this work, a yeast DBE (yDBE) was created and then improved. It was also established that the yDBE effectively mediates targeted DNA diversification of both an enzyme and an antibody fragment. Using a fluorescence shift assay of GFP enzyme variants, the initial mutagenesis capability of the yDBE was improved by (1) identifying a highly active AID variant from a panel of previously described or novel AID upmutants, (2) adjusting the number and placement of MS2 aptamers housed within the gRNA scaffold to find complementary scaffolds with high activity and unique targeting profiles, and (3) increasing the versatility of the yDBE platform to promote multi-loci targeting using rapidly assembled tRNA-gRNA cassettes. The yDBE platform was then used to improve the affinity of an anti-fluorescein scFv by over 100-fold through in situ DNA diversification coupled with yeast display. This work demonstrates the first development of a diversifying base-editor system for targeted and rapid DNA diversification in yeast. Furthermore, this is the first instance in which the human AID enzyme has been employed for CRISPR base editing in yeast. Lastly, yDBE enables a mutation rate of 2.13×10⁻⁴ s.p.b. over a window of 100 bp, approaching prior best-in-class in vivo mutagenesis studies.
Development of an initial CRISPR diversifying base editor for yeast (yDBE). A programmable, yeast-based diversifying base editor strain was developed for preliminary testing by genomically integrating codon-optimized MCP-AID*Δ and dCas9 proteins. When coupled with a gRNA encoding MS2 aptamer loops,
A fluorescence shift-based assay was used to determine if the initial yDBE platform introduces targeted mutations into the wtGFP enzyme. Compared to wtGFP, enhanced GFP (eGFP) has an S65T mutation that shifts the excitation spectra peak from 405 nm to 488 nm (
As a prior study revealed that dCas9 alone can result in mutations in a targeted locus, it was verified that AID was required to induce the S65T mutation: a modified yDBE strain harboring a catalytically inactive mutant (AIDdead) did not yield any eGFP+ cells after a 4-day induction (
Employing higher activity AID variants to enhance yDBE mutation rate. After confirming the functionality of a yDBE, its activity was improved through two main strategies: 1) improving the activity or expression of the deaminase and 2) optimizing the location of the MS2 aptamers within the gRNA scaffold. In the first strategy, several methods were explored to increase the expression of AID*Δ and thereby increase base editing activity. It was found that 1) altering the codon optimization, 2) changing the GAL2 promoter to a strong, orthologous, constitutive promoter, or 3) using an altered MCP variant either did not change or decreased the activity relative to the initial base editor (
As attempts to alter expression or fuse mutation-enhancing factors to yDBE failed to noticeably improve its function, it was next investigated whether engineered AID variants with higher catalytic activity might improve yDBE-mediated mutation rates. AID*Δ has a premature stop codon (195*) to remove its final three residues that can mediate nuclear export. All the tested variants also lack this nuclear export signal. AID*Δ, which contains three coding mutations, K10E, T82I, and E156G, was isolated from previous work that measured global mutation rates of AID variants in E. coli. Interestingly, a related variant was reported to have more than 5-fold activity relative to the K10E, T82I, E156G mutant. This variant, referred to as “Mut7.3.1,” contains 9 coding mutations, including the 3 coding mutations found in AID*Δ, though it still retained a nuclear export signal (Table 3). Separate AID engineering efforts have also generated a variant dubbed AIDmono that showed higher activity as a base editor. To determine whether use of an enhanced AID variant could improve the mutational rate, the Mut7.3.1 (AID731Δ) and AIDmono variants were fused to MCP, and mutations from AID*Δ, Mut7.3.1, and AIDmono were combined into novel variants: AID*mono and AID731mono (Table 3). The activity was tested in comparison to the initial yDBE in the context of two targeting gRNAs (18L and t22L) and one nontargeting gRNA (NT1) (
Varying MS2 aptamer placement. As a second strategy to increase the yDBE mutational rate, the ideal location and number of MS2 aptamers within the gRNA framework were investigated. Previous studies have characterized the impact of different MS2 aptamer locations in gRNAs, typically in mammalian cell hosts. As the effect of MS2 aptamer placement within gRNAs has not been characterized previously in yeast nor in the context of DBEs, a comprehensive set of gRNA/MS2 aptamer designs was analyzed using the initial yDBE that employed AID*Δ.
gRNAs that complex with the SpCas9 protein contain four loops (
In total, 11 different scaffolded gRNA constructs, dubbed M1, M3, M4, M14, M34, Mt, Mtx2, M1tx2, M3tx2, M13t, and M13tx2, were constructed, tested, and compared to the starting configuration, M13. Across two experiments, the mutational capacity bestowed by each MS2 loop configuration was assessed in the yDBE system using fluorescence shift assays, and each MS2 loop configuration was tested in the context of three gRNA spacer sequences to ensure effects were not gRNA dependent (
It was next sought to understand the mutagenic window afforded by the M13 and Mtx2 loop configurations, as a larger nucleotide range in which mutations occur is beneficial for a DBE. Previous work in mammalian cells showed that, while mutations could be detected −50 to +50 bp relative to the CRISPR PAM and the direction of transcription, the highest rate of mutation was seen from +20 to +40 bp, independent of the DNA strand being targeted. The mutational window afforded by the M13 and Mtx2 loop configurations was approximated using seven distinct gRNAs that targeted (i.e., were complementary to) sites across the breadth of the coding strand of wtGFP (
Combining AID731Δ and Mtx2 scaffold. By combining the Mtx2 scaffold with AID731Δ, a wtGFP-to-eGFP fluorescence shift rate of over 7% was achieved with gRNA 28L after 4 days of induction, a 26-fold improvement over the performance of the original CRISPR-X construct (AID*Δ and M13 gRNA scaffold) in yeast (
Improving antibody affinity via yDBE in situ DNA diversification and surface display. It was next aimed to demonstrate the capabilities of yDBE by using it to improve the affinity of an antibody. 4-4-20, a single-chain variable fragment (scFv) that binds fluorescein, was integrated into the genome of yDBE-expressing yeast strains. Because they demonstrated differential activity in gRNA-spacer selection, the M13 and Mtx2 MS2 loop configurations, each in concert with the AID731Δ variant, were used in parallel directed evolution trials.
An scFv is comprised of a VH and VL segment, each approximately 345 bp in length (
Eight days were allowed in total for yDBE-mediated in situ antibody diversification to occur, with passages every 2 days (
After the final round of sorting, cells were plated and individual colonies picked to assess their affinity through yeast display. For the yDBEs using both the M13 and Mtx2 scaffolds, it was found that several mutant scFvs had a substantial increase in affinity over wild-type 4-4-20. Sequencing of single scFvs showed mutations in each CDR of the heavy chain, near or overlapping the spacers selected (Table 1). The nucleotide substitutions all occurred at C or G residues, consistent with AID activity and high-throughput sequencing results. Interestingly, certain mutants, such as L45V, were isolated from both the M13 and Mtx2 sorts, demonstrating convergence between the two libraries despite having different gRNA spacer sequences. Three mutants were selected for further characterization: W108F, isolated from the yeast using the M13 scaffold design, and L45V and V23L, A24G, L45V, isolated from the yeast harboring the Mtx2-scaffolded yDBE, where residue numbering refers to the position within the VH. These three variants were amplified from genomic DNA and re-cloned into EBY100 to ensure that the scFv mutations were examined in isolation. Using flow cytometry, each scFv's Kd value for fluorescein was calculated by titrating a broad range of concentrations (
Herein, yDBE, a CRISPR diversifying base editor for in situ diversification of DNA in yeast, was designed and its mutational rate was enhanced. Using fluorescence shift-based assays, two major components of the base editor were improved and characterized. First, the yDBE mutagenesis rate was improved 5-fold by surveying previously described AID mutants and creating entirely new ones with enhanced activity, particularly AID731Δ. Second, the mutational capability of a variety of gRNA/MS2 scaffold architectures was assessed and two, M13 and Mtx2, were identified which support high rates of mutagenesis but have unique targeting preferences. In addition, by mutating either wtGFP or the 4-4-20 scFv using distinct gRNA regions, yDBE is demonstrated to be rapidly reprogrammable to target new DNA sequences. Using high-throughput sequencing, a variety of mutations occurring on both sides of the targeted spacer region were confirmed, with the majority of mutations occurring within a 100-nucleotide window centered near the PAM. The combined Mtx2 mutation profile created through high-throughput sequencing showed a concentration of substitutions in the gRNA-binding region, especially at the 5′ end of where the spacer binds. dCas9 is relatively tolerant to single or sometimes even double mismatches in these areas, and for this reason the base editor continues operating despite the mounting mutations.
The enhanced yDBE, employing the novel mutant AID731Δ in concert with the Mtx2 scaffold design, is estimated to have a mutation rate of 2.13×10⁻⁴ s.p.b. over a region of 100 nucleotides, which is comparable to previously described in situ mutagenesis platforms for yeast. Because of its ability to readily substitute C residues in both strands into any other nucleotide, the base editor can make a variety of mutations in many DNA sequences. There was a preference for C-to-G substitutions using yDBE, which contrasts with results in CRISPR-X, carried out in mammalian cells, that showed a preference for C-to-T substitutions. This is likely due to the preference of yeast to insert a cytosine across from an abasic site during the translesion synthesis step of base excision repair. Indeed, Target-AID, a precise base editor employed in yeast, similarly found a high ratio of C-to-G substitutions in targeted poly-C regions and identified polymerase 9 as the most likely cause. Similarly, AID* (the triple mutant of wild-type AID) overexpression in yeast causes many C-to-G mutations, which required active base excision repair proteins UNG1 and REV1. In general, a higher mutation rate was observed when using gRNAs that targeted the coding strand. Targeting this strand with dCas9 alone has been shown to be mutagenic in yeast through R-loop formation in the transcribed strand, which exposes it to background deaminase activity. It is contemplated that this R-loop formation allows more access for the MCP-AID component of yDBE, leading to higher mutation rates.
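By way of a non-limiting illustration, the strand logic described above (a C deaminated on either strand appears in a read as a substitution at a C or a G) can be sketched in Python. The function names below are hypothetical and are not taken from the scripts used in this work; the sketch only shows how a substitution observed on one strand can be expressed relative to the strand carrying the pyrimidine, so that, for example, G-to-C and C-to-G are tallied together.

```python
from collections import Counter

# Watson-Crick complements, used to re-express a substitution on the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_deaminase_consistent(ref: str) -> bool:
    """True when the reference base is C or G.

    A G in the read strand corresponds to a C on the opposite strand,
    so both are compatible with cytidine deamination.
    """
    return ref.upper() in ("C", "G")


def collapse_to_pyrimidine(ref: str, alt: str) -> str:
    """Report a substitution relative to the strand carrying C or T.

    G>C becomes C>G, G>A becomes C>T, A>G becomes T>C, and so on.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref in ("G", "A"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def tally(substitutions):
    """Count strand-collapsed substitution types from (ref, alt) pairs."""
    return Counter(collapse_to_pyrimidine(r, a) for r, a in substitutions)


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    observed = [("C", "G"), ("G", "C"), ("G", "A"), ("A", "G")]
    print(tally(observed))
```

Collapsing by strand in this way is one common convention for summarizing deaminase-driven substitutions; other groupings are equally possible.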
One limitation of the previous yDBE is the difficulty in universally mutating single, large genes >1,000 bp in length. While other systems such as OrthoRep and TRIDENT excel in this use case,¹⁶,¹⁴ they require placing genes adjacent to specific promoters, and they cannot target multiple loci or endogenous targets. Therefore, to further expand the targeting breadth of yDBE, the present disclosure implements a multiplexing gRNA expression cassette methodology by interspacing gRNAs with a tRNA. Interestingly, for both the M13 and Mtx2 gRNAs, the first gRNA of the cassette had the fewest substitutions near its target site, while the last gRNA had the most. This contradicts previous work with Cas9 gene knockout assays that found that the efficiency of the gRNAs generally decreased along the cassette. Targeting additional templates with new sets of spacers elucidates any effect array position has on gRNA efficiency. Another potential limitation of the previous yDBE is the difficulty in targeting regions low in GC content. The high-throughput sequencing performed herein determined that only 1.3% of mutations occurred at A or T bases when applying the base editor. One way to overcome this is to combine cytidine base editors with adenine base editors. Such a strategy has already been shown in CRISPR base editors in mammalian cells and plants and in a T7-RNAP-driven system (TRIDENT) in yeast.
Having been demonstrated on exogenous genes (wtGFP and the 4-4-20 scFv), the base editor is equally suitable for targeting endogenous genes. For this reason, the yDBE platform can be extended to a wide variety of directed evolution tasks. For instance, since the yDBE system is amenable to multiplexing, it aids in the evolution of more complex phenotypes, such as resistance to stress and optimization of metabolic pathways, by enabling the mutation of multiple, distant loci simultaneously. Lastly, because the yDBE system uses AID, the mutagenic component driving somatic hypermutation in B cells, it is engineered to better recapitulate the mutational profile of affinity-matured antibodies compared to other in vivo mutagenesis systems. For at least these reasons, the systems disclosed herein are able to improve antibody affinity.
Media, culture, and base strains. NEB 10-beta E. coli (New England Biolabs) were used to amplify plasmid constructs for molecular cloning. E. coli were cultured in 5 mL of LB broth (Teknova) at 37° C. overnight with agitation. LB was supplemented with 34 μg/mL chloramphenicol (Sigma Aldrich) or 100 μg/mL ampicillin (Sigma Aldrich) antibiotic for selection.
All yeast strains developed in this work are derived from strain EBY100 (Leu−, Trp−; ATCC MYA-4941), which is designed for yeast display. When harboring a plasmid, yeast were cultured in 2 mL of synthetic glucose (dextrose) or galactose-Trp media (SD-Trp, or SG-Trp), comprised of 0.74 g/L complete supplemental media-TRP (CSM-TRP, Sunrise Science), 0.67 g/L yeast nitrogen base (YNB, BD), and 20 g/L of glucose (Fisher Scientific) or galactose sugar (Sigma-Aldrich). In instances using selections for the Leu2 gene, CSM-TRP was replaced with 0.47 g/L CSM-LEU (Sunrise Science). When no selection was applied, YPD media, 10 g/L yeast extract (Thermofisher), 20 g/L peptone (Thermofisher), and 20 g/L glucose, was used. As needed, YPD was supplemented with 100 μg/mL nourseothricin (Gold Biotechnology) for NAT gene selection. For yeast display of antibody fragments, SD-Trp or SG-Trp media was further buffered to pH 6.25 by adding 5.4 g/L Na2HPO4 and 8.56 g/L NaH2PO4·H2O. Yeast were grown at 30° C. with agitation. For both yeast and E. coli, solid media plates were made with the addition of 20 g/L of agar (Fisher Scientific).
Polymerase chain reaction (PCR) was carried out using KOD Hot Start DNA polymerase (Sigma-Aldrich). Custom DNA oligomers were synthesized by Eurofins Genomics. All oligomers/primers are listed in Supplemental Table S3. Gibson Assembly was carried out using a master mix containing Taq Ligase (Enzymatics), Phusion polymerase (New England Biolabs), and T5 Exonuclease (New England Biolabs). 100 ng of linearized backbone was combined with a 2× molar excess of PCR inserts in a 5 μL volume. 15 μL of master mix was then added, and the reaction was run on a thermocycler at 50° C. for one hour.
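By way of a non-limiting illustration, the mass of insert corresponding to a given molar excess over 100 ng of linearized backbone can be computed as sketched below. The fragment lengths in the example are hypothetical, and the figure of approximately 650 g/mol per base pair is a commonly used approximation for double-stranded DNA rather than a value taken from this disclosure. Because the molar ratio depends only on relative length, that approximation cancels out of the mass calculation.

```python
# Approximate average molar mass of one base pair of dsDNA (g/mol); an assumption.
GRAMS_PER_MOL_PER_BP = 650.0


def pmol_dsdna(mass_ng: float, length_bp: int) -> float:
    """Approximate picomoles of a double-stranded DNA fragment."""
    return mass_ng * 1000.0 / (length_bp * GRAMS_PER_MOL_PER_BP)


def insert_mass_ng(backbone_ng: float, backbone_bp: int,
                   insert_bp: int, molar_ratio: float = 2.0) -> float:
    """Mass of insert (ng) giving `molar_ratio` moles of insert per mole of backbone."""
    return molar_ratio * backbone_ng * insert_bp / backbone_bp


if __name__ == "__main__":
    # Hypothetical lengths: a 6,000-bp backbone and a 1,200-bp insert.
    needed = insert_mass_ng(backbone_ng=100.0, backbone_bp=6000, insert_bp=1200)
    print(f"insert needed: {needed:.1f} ng")
    print(f"backbone: {pmol_dsdna(100.0, 6000):.4f} pmol")
```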
Golden Gate Assembly was carried out using a modification of previously described protocols. When annealing complementary oligos, compatible primers were combined at 25 μM in a 20-μL volume and held at 97° C. for 5 min, then ramped down to 20° C. over the course of 35 minutes. In a 20-μL reaction, 100 ng of base plasmid, 0.25 pmol annealed oligos (or a 2× molar excess of insert when assembling gRNA-tRNA cassettes or HR plasmids), 2 μL of T4 Ligase 10× Buffer, 0.4 μL of T4 Ligase (New England Biolabs), and 1 μL of BsaI-HFv2 (New England Biolabs) were combined. The following temperature profile was followed for the reaction: Step 1, 37° C. for 30 min; Step 2, 37° C. for 10 min; Step 3, 16° C. for 5 min; Step 4, repeat steps 2 and 3 for 30 cycles; Step 5, 37° C. for 30 min; Step 6, 60° C. for 5 min; Step 7, 80° C. for 5 min; Step 8, 4° C. hold. After assembly, the reaction mixture was dialyzed against ultrapure water and then transformed via electroporation into E. coli using standard electroporator protocols and then plated on solid media. Transformants were cultured overnight, and plasmids were extracted using a Qiaprep Spin Miniprep Kit (Qiagen). Plasmids were confirmed via both restriction enzyme digest check and Sanger sequencing.
Cloning Yeast Diversifying Base Editor (yDBE) Constructs
The amino acid sequence for MCP-AID*Δ was codon optimized for expression in yeast and synthesized by Twist Bioscience. MCP (MS2 phage coat protein) contains the N55K mutation and AID*Δ is an engineered version of human AID with the following amino acid mutations: K10E, T82I, E156G, 195*. MCP and AID*Δ are connected by a (GGGGS)₄ linker and an SV40 nuclear localization sequence. The dCas9 construct was derived from plasmid bRA77 (Addgene plasmid #100953) and includes a yeast-codon-optimized Cas9 from Streptococcus pyogenes with a triple, C-terminal, SV40 nuclear localization sequence. PCR and Gibson Assembly were used to introduce the necessary mutations (D10A, H840A) to make nuclease-dead Cas9 (dCas9).
Both dCas9 and MCP-AID*Δ were first placed into base “EMY” constructs using Gibson Assembly. Promoters and terminators were then added to each sequence and cloned with Golden Gate Assembly into a backbone that is compatible with yeast homologous recombination (HR). Base EMY plasmids containing verified yeast promoters, terminators, and backbones (both HR-ready and 2μ expression plasmid sets) were compatible with Golden Gate cloning. MCP-AID*Δ was placed under control of the S. cerevisiae GAL2 promoter, while dCas9 was placed under control of the GAL1 promoter. Both the GAL1 and GAL2 promoters are strongly induced in galactose media.
The 4-4-20 scFv fused to AGA2 was taken from plasmid pCT302 (Addgene plasmid #41845) and placed in an HR vector. 4-4-20 is expressed under control of the pGAL1 promoter. Mammalian-codon-optimized wild-type GFP (wtGFP) was created from an eGFP expression vector, pcDNA3-eGFP-LIC (Addgene plasmid #40768). Mutations L64F and T65S (reverting eGFP to wtGFP) were introduced using Gibson Assembly. Note that wtGFP still contains an H231L mutation and valine insertion at the “1a” position relative to Aequorea victoria GFP, but neither mutation affects the excitation/emission spectra. wtGFP was placed into a base EMY vector, and Golden Gate Assembly was used to place it in an HR vector along with a strong, constitutive promoter (pTDH3).
The sequence for AIDdead was synthesized as a linear fragment by Twist Bioscience, inserted into an EMY base vector using Gibson Assembly, and then placed into an HR vector using Golden Gate. Alternate codon optimizations of AID were synthesized by Twist Bioscience, amplified and cloned using a similar pipeline to AIDdead. Mutants AID731Δ, AIDmono, AID*mono, and AID731mono were created by amplifying fragments of AIDdead or AID*Δ with custom primers to introduce the desired mutations, then inserting the amplicons into an HR vector using Gibson Assembly. A complete list of mutations from the wildtype AID sequence can be found in Table 3. For strains including RFA3, the RFA3 coding sequence was amplified from yeast genomic DNA and then fused to the C-terminus of AID*Δ or dCas9 and placed in an EMY backbone using Gibson Assembly followed by Golden Gate Assembly to place into an HR backbone. A similar strategy was used to fuse AID731Δ to the C-terminus of dCas9 in an EMY backbone to create dCas9-AID731Δ. An alternate sequence for MCP, dubbed MCPz, was synthesized by Twist Bioscience and then fused to AID*Δ and directly cloned into an HR backbone using Gibson Assembly. All AID mutants were placed under the pGAL2 promoter to allow comparison to the original construct. dCas9-RFA3 and dCas9-AID731Δ were under the control of the pGAL1 promoter.
gRNA plasmid cloning. For single targeting gRNA plasmids, a Golden-Gate-compatible base plasmid was first constructed using Gibson Assembly. Four gRNA scaffolds were synthesized by Twist Bioscience: No MS2, M13, M4, and Mtx2. Each of these was cloned into a 2μ, Trp-selection plasmid, pY120, using Gibson Assembly, creating pY120g-NoMS2, pY120g-M13, pY120g-M4, and pY120g-Mtx2, respectively. Each plasmid consists of a strong, yeast, RNA polymerase III pSNR52 promoter, a blank gap region flanked by BsaI cut sites, the gRNA scaffold variant, and a tSUP4 terminator. Using these first four plasmids, all the remaining gRNA scaffold variants were made using PCR and Gibson Assembly of partial scaffold fragments (M1, M14, M3tx2, etc.; Table 6). To construct true targeting cassettes, e.g., to mutate wtGFP by targeting distinct DNA sequences within the gene as described below, the blank gap region was routinely replaced by a 20-bp spacer sequence using annealed oligos and Golden Gate Assembly. A full list of spacer sequences can be found in Table 7.
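By way of a non-limiting illustration, candidate 20-nt spacers within a target gene can be enumerated as sketched below. The sketch assumes the canonical 5′-NGG-3′ PAM of Streptococcus pyogenes Cas9 and simply lists every 20-nt sequence lying immediately 5′ of such a PAM on either strand; it is not the procedure used to select the spacers of Table 7, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spacers(seq: str, spacer_len: int = 20):
    """List (strand, pam_start, spacer) for every NGG PAM with a full-length spacer.

    `spacer` is the `spacer_len` nucleotides immediately 5' of the PAM,
    written 5'->3'. `pam_start` is the 0-based index of the PAM's first base
    in the strand that was scanned: the given sequence for "+", or its
    reverse complement for "-".
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # i is the position of the "N" in NGG; a full spacer must fit before it.
        for i in range(spacer_len, len(s) - 2):
            if s[i + 1:i + 3] == "GG":
                hits.append((strand, i, s[i - spacer_len:i]))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    toy = "A" * 20 + "TGG" + "C" * 5
    for strand, pam_start, spacer in find_spacers(toy):
        print(strand, pam_start, spacer)
```

In practice, candidate spacers would typically be filtered further, for example for off-target matches elsewhere in the genome, which this sketch does not attempt.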
For 3× gRNA-tRNA cassettes, the assembly strategy of GTR-CRISPR was generally followed. First, a base plasmid was made to attach the M13 or Mtx2 scaffold gRNA to yeast tRNAGly (GCC). Gibson Assembly was used to join the tRNA to the 3′ end of the gRNA scaffold in a pUC19 base vector, creating pUC19-M13-tRNAGly and pUC19-Mtx2-tRNAGly. At the 3′ end of the gRNA scaffold, the tRNAs were separated by a short ‘AAACAA’ nucleotide linker. Custom primers were used to perform two separate PCRs that would add the desired spacer sequences along with BsaI recognition sites that would reveal customized 4-bp gates when digested. Golden gates were checked for compatibility using a custom Python script and a dataset that measured gate fidelity in the presence of T4 Ligase. The two PCRs were combined with their matching pY120g-M13 or pY120g-Mtx2 base plasmids in a Golden Gate Assembly to produce a 3× gRNA-tRNA cassette.
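By way of a non-limiting illustration, a basic compatibility check on a set of 4-bp gates can be sketched as follows. This is not the custom script referred to above and it does not use the ligation fidelity dataset; it applies only three generic rules (each gate is four A/C/G/T bases, no gate is palindromic, and no two gates are identical or reverse complements of each other). The function names and the example gates are hypothetical.

```python
from itertools import combinations


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than A/C/G/T pass through unchanged."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_overhangs(overhangs):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    gates = [o.upper() for o in overhangs]
    for gate in gates:
        if len(gate) != 4 or set(gate) - set("ACGT"):
            problems.append(f"{gate}: not a 4-nt A/C/G/T sequence")
        elif gate == reverse_complement(gate):
            problems.append(f"{gate}: palindromic, so it can ligate to itself")
    for a, b in combinations(gates, 2):
        if a == b:
            problems.append(f"{a}: used more than once")
        elif a == reverse_complement(b):
            problems.append(f"{a}/{b}: reverse complements of each other")
    return problems


if __name__ == "__main__":
    # Hypothetical gate sets, for illustration only.
    print(check_overhangs(["AACG", "TGCC", "GGTA"]))
    print(check_overhangs(["ACGT", "AACG", "CGTT"]))
```

A fidelity-aware check, as described above, would additionally score near-matching gate pairs using measured ligation data, which this sketch omits.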
Yeast strain engineering. The Ura3 selection marker, along with the adjacent pGAL1-AGA1 construct, was removed by plating on 5-FOA to create strain EBY101 (Table 2), which was sequence confirmed following gDNA extraction and then used as a base for all fluorescence shift assay and high-throughput sequencing tests. Linear fragments to be used for integration were amplified using PCR. For simultaneous integrations, each linear fragment had 50-60 base pairs of homology to the adjacent fragments (e.g., HR1 has homology to HR2 and HR2 has homology to HR3, etc.). Linear fragments were integrated using the high-efficiency, lithium acetate transformation method, and integration loci were selected based on prior work showing sites that yield robust gene expression. For an initial strain construction step and demonstration of yDBE activity via mutation of wtGFP (shift assay described below), the wtGFP was inserted at YORWΔ22 along with the NAT (nourseothricin resistance) gene (EBY101-wtGFP). The base editor gene expression constructs (e.g., MCP-AID*Δ and dCas9) along with a Leu marker gene were integrated simultaneously at YPRCτ3 (AC001). AIDdead was integrated in a similar manner (AC002). These integrations, along with all those described below, were confirmed by PCR of extracted genomic DNA. Finally, gRNA plasmids were transformed into desired strains using the same lithium acetate transformation protocol used for integrations.
To test MCP-AID mutants, three preliminary strains were created that had an integrated wtGFP at YORWΔ22 and an integrated dCas9 and gRNA expression cassette (either 18L, t22L, or NT1 gRNA targeting sequences) at YPRCτ3 (strains AC201-203). Then, for each MCP-AID variant, linear expression constructs were amplified and integrated at the YPRCΔ15 locus with TRP selection in strains AC201-203. For brevity, the resultant strains are excluded from Table 2. For further analysis, MCP-AID731Δ and dCas9 were later integrated at YPRCτ3 (AC003), which facilitated comparisons to AC001 with a larger set of gRNAs expressed on plasmids. dCas9-RFA3 with AID731Δ or dCas9-AID731Δ were integrated at YPRCτ3 (AC004 and AC005, respectively), followed by gRNA plasmid transformation, for comparison with AC003.
For creation of strains for yeast display studies and mutation of an scFv, an AGA2-4-4-20 expression construct was inserted at YORWΔ22 along with the NAT gene in EBY100. The optimized base editor (MCP-AID731Δ and dCas9) was then integrated at YPRCτ3 (AC301). Lastly, the M13 and Mtx2 3× gRNA-tRNA plasmids were transformed into this strain. To confirm the binding profile of 4-4-20 variants, 4-4-20 mutant strains were created by first making an integration-compatible vector using Gibson Assembly, then amplifying a linear AGA2-4-4-20 (mutant) fragment using PCR and integrating the construct into otherwise unmodified strain EBY100.
wtGFP-eGFP fluorescence shift assay. For the wtGFP-eGFP fluorescence shift assay, yeast were picked from a plate into 2 mL SD-Trp at 30° C. with shaking. After overnight growth, the cells were induced by diluting the cells down to an OD of 0.25 in 2 mL SG-Trp media. Cells were then cultured for the specified time (1-8 days). For inductions longer than 2 days, cultures were passaged in fresh SG-Trp media every 2 days, with initial OD set at 0.25. After induction of yDBEs in galactose, 1×10⁷ cells were rinsed with phosphate buffered saline (PBS) and analyzed using a FACSMelody flow cytometer (BD). Flow cytometry data analysis was performed using FlowJo.
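By way of a non-limiting illustration, the volume of culture needed to start each passage at an OD of 0.25 can be computed as sketched below. The sketch assumes a simple direct dilution into fresh medium and an OD reading that scales linearly with cell density; the function name and the example OD are hypothetical.

```python
def passage_volumes(current_od: float, target_od: float = 0.25,
                    final_volume_ml: float = 2.0):
    """Return (culture_ml, fresh_medium_ml) for diluting to `target_od`.

    Uses C1 * V1 = C2 * V2, with OD standing in for cell concentration.
    """
    if current_od < target_od:
        raise ValueError("culture is already below the target OD")
    culture_ml = target_od * final_volume_ml / current_od
    return culture_ml, final_volume_ml - culture_ml


if __name__ == "__main__":
    # Hypothetical reading of OD 5.0 after two days of growth.
    culture, medium = passage_volumes(current_od=5.0)
    print(f"{culture:.3f} mL culture + {medium:.3f} mL fresh SG-Trp")
```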
High-throughput sequencing of mutated genomic yeast DNA. To induce mutations prior to high throughput sequencing, AC003 yeast with pY120g-Mtx2-28L were cultured for 8 days in SG-Trp media. The cells were diluted every 2 days in fresh media down to an OD of 0.25. Genomic DNA was collected using the Yeastar Genomic DNA Kit (Zymo Research). The wtGFP locus was amplified by PCR with primers that added sequencing adapters, and DNA concentrations were measured using the Qubit fluorimetry system. DNA was sent to Genewiz for EZ-Amplicon sequencing (PE250 MiSeq, Illumina), generating 100,000+ reads per run. The paired reads were first merged together using BBMerge and then aligned to the reference using bwa mem. Then, variant calls were compiled together using samtools mpileup. To remove background signal from the analysis, DNA from EBY101-wtGFP with plasmid pY120 (which lacks all base editor components) was also sequenced. The cells were similarly cultured for 8 days, and the amplified DNA was prepared similar to the base editor strains. Before the final substitution frequency analysis, the background signal from EBY101-wtGFP was subtracted from the signal collected from the base editor strain, and negative values were set to zero. DNA from the VH of 4-4-20 was prepared and sequenced similarly to GFP DNA.
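By way of a non-limiting illustration, the background correction described above (subtracting the control signal position by position and setting negative values to zero) can be sketched as follows. The function name and the example rates are hypothetical and are not taken from the scripts or data of this work.

```python
def subtract_background(sample_rates, background_rates):
    """Subtract per-position background rates and clamp negative results to zero."""
    if len(sample_rates) != len(background_rates):
        raise ValueError("sample and background must cover the same positions")
    return [max(s - b, 0.0) for s, b in zip(sample_rates, background_rates)]


if __name__ == "__main__":
    # Hypothetical per-position substitution rates, for illustration only.
    base_editor_strain = [0.010, 0.002, 0.000]
    control_strain = [0.001, 0.003, 0.000]
    print(subtract_background(base_editor_strain, control_strain))
```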
A custom Python script was used to calculate and visualize the average substitution and insertion/deletion rate over a user-specified window. First, the substitution rate was calculated on a per-nucleotide basis using the file generated by mpileup. The rate is the number of reads with a mismatch from the reference nucleotide divided by the number of total reads that aligned to that nucleotide, excluding insertions or deletions. These per-nucleotide rates could then be averaged across a window to give an overall substitution rate. Additional custom Python scripts were used to plot the frequency of mutations at each base (distribution plots) and map the frequency of each type of substitution (heatmaps). The number of substitutions per read was calculated and visualized using a custom Python script that processed the MD:Z tag from the bam file produced during the bwa mem alignment step. All Python scripts are available upon request.
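By way of a non-limiting illustration, the calculations described above can be sketched as follows. The sketch is not the custom script itself: it assumes that per-position mismatch and aligned-read counts have already been extracted from the mpileup output, and it counts substitutions per read from the MD:Z tag by discarding deletion entries (those introduced by "^") and counting the remaining letters. All function names are hypothetical.

```python
import re


def substitution_rate(mismatch_reads: int, aligned_reads: int) -> float:
    """Mismatching reads divided by reads aligned at one position.

    Both counts are assumed to exclude insertions and deletions.
    """
    return mismatch_reads / aligned_reads if aligned_reads else 0.0


def window_average(rates, start: int, end: int) -> float:
    """Mean per-nucleotide rate over positions start..end-1 (0-based, end exclusive)."""
    window = rates[start:end]
    if not window:
        raise ValueError("window contains no positions")
    return sum(window) / len(window)


def substitutions_in_md(md: str) -> int:
    """Count mismatched bases recorded in a SAM MD:Z value.

    In the MD string, numbers are runs of matches, bare letters are
    mismatched reference bases, and '^' followed by letters is a deletion.
    """
    without_deletions = re.sub(r"\^[A-Za-z]+", "", md)
    return sum(ch.isalpha() for ch in without_deletions)


if __name__ == "__main__":
    # Hypothetical counts and MD values, for illustration only.
    rates = [substitution_rate(m, 1000) for m in (0, 10, 20, 30)]
    print(window_average(rates, 1, 3))
    print(substitutions_in_md("10A5^AC6"))
```

Insertions are not represented in the MD:Z tag, so insertion rates would need to be obtained from the CIGAR string or the pileup instead.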
Yeast display and sorting. To induce mutations prior to staining and sorting, yeast were cultured for 8 days in SG-Trp. The cells were diluted every 2 days in fresh media down to an OD of 0.25. Cells were induced to display by first growing in buffered SD-Trp media overnight then diluting the cells to OD 0.5 in buffered SG-Trp and culturing for 24 hours.
Due to the relatively high starting affinity of 4-4-20 for fluorescein, the scFvs were screened using a competitive assay.⁶ First, 2×10⁷ cells were rinsed with PBSF (PBS with 0.1% BSA) and stained with 1 μM biotinylated fluorescein (Biotium) for 60 minutes in a volume of 200 μL. Cells were rinsed again with PBSF and then stained with aminofluorescein (Thermo Scientific), a non-fluorescent competitor. Cells were then placed on ice and rinsed with ice-cold PBSF, and scFv expression was detected using an anti-c-myc antibody conjugated to AF647 (Cell Signaling Technologies) at 8 μg/mL. The presence of remaining scFv-bound biotinylated fluorescein was visualized using streptavidin-PE (Invitrogen) at 10 μg/mL. The secondary stain was performed on ice in the dark for 30 minutes. The cells were then rinsed and sorted using a FACSMelody instrument. The sorted cells were collected in SD-Trp media and allowed to recover for 1-2 days. This process was repeated four times, each time using a more stringent gate during FACS. After the fourth sort, the cells were plated on synthetic, -TRP plates and allowed to grow for 2 days.
Single yeast colonies were picked and compared against strain EBY100-4420. Clones which had a substantial increase in antigen binding in a competitive stain relative to EBY100-4420 were selected for further characterization. From these mutants, the 4-4-20 scFv gene was extracted using a nested colony PCR and Sanger sequenced. To verify that affinity improvements were definitively and solely from mutations in the 4-4-20 sequence, the mutant sequences were copied using PCR, cloned into an HR backbone, and integrated into an unmodified strain background as described above.
Titration of antibody affinity. An antigen titration was used to measure the affinity of 4-4-20 and its variants. Cells were first cultured and induced to display scFv as described above. 1×10⁵ cells were rinsed then stained in 500 μL PBSF with antigen concentrations ranging from 0.3 pM to 30 nM of biotinylated fluorescein for 3 hours at room temperature. For antigen concentrations 0.1 nM and 0.03 nM, 1×10⁴ displaying cells were mixed with 1×10⁵ non-displaying cells to ensure both that antigen quantities were never limiting and that there were still sufficient cells to form a pellet. For the lowest two concentrations, 3 pM and 0.01 nM, 1×10⁵ cells were used but the volume was increased to 40 mL and 14 mL, respectively, to prevent limiting antigen quantities. After primary staining, cells were placed on ice, rinsed with ice-cold PBSF, and then stained with secondary reagents (streptavidin-PE and anti-c-myc-AF647) in 30 μL for 30 minutes on ice. Cells were then rinsed and analyzed on a FACSMelody flow cytometer. Data were normalized and best-fit lines were calculated using a nonlinear regression in Graphpad Prism.
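By way of a non-limiting illustration, whether antigen is in molar excess over displayed scFv under a given staining condition can be estimated as sketched below. The number of scFv copies displayed per cell is not stated in this disclosure; the default of 5×10⁴ copies per cell used here is purely an assumed, hypothetical value, and the function names are likewise hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def antigen_molecules(conc_molar: float, volume_liters: float) -> float:
    """Number of antigen molecules in the staining volume."""
    return conc_molar * volume_liters * AVOGADRO


def antigen_excess(conc_molar: float, volume_liters: float,
                   displaying_cells: float,
                   scfv_per_cell: float = 5e4) -> float:
    """Ratio of antigen molecules to displayed scFv molecules.

    `scfv_per_cell` is an assumed display level, not a measured value.
    """
    total_scfv = displaying_cells * scfv_per_cell
    return antigen_molecules(conc_molar, volume_liters) / total_scfv


if __name__ == "__main__":
    # Conditions quoted in the text: 3 pM in 40 mL and 0.01 nM in 14 mL, 1e5 cells each.
    for conc, vol_ml in ((3e-12, 40.0), (1e-11, 14.0)):
        ratio = antigen_excess(conc, vol_ml / 1000.0, displaying_cells=1e5)
        print(f"{conc:.0e} M in {vol_ml} mL: antigen/scFv ratio = {ratio:.1f}")
```

A ratio well above 1 indicates that binding does not appreciably deplete free antigen, which is the condition under which the nominal antigen concentration can be used directly in the binding fit.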
Statistics. All the fluorescence shift assays and the antigen titration were performed in biological triplicate (n=3). All reported error bars represent one standard deviation, except where otherwise noted. To calculate p-values, a multiple comparison test using Tukey's range method was done using Graphpad Prism. For comparisons of the dissociation constants generated by the best-fit curves of the antigen titrations, an extra-sum-of-squares F test was done in Graphpad Prism.
The yeast Saccharomyces cerevisiae is commonly used to screen protein variants to interrogate and improve their structure and performance. While there are many techniques to carry out directed evolution in yeast, there is still a need to improve their speed and ease of use. Herein is described an optimized and integrated CRISPR diversifying base editor for use in yeast, together with a demonstration of its ability to rapidly improve the affinity of an antibody through yeast display. The base editor mutation rate was enhanced up to 27-fold by characterizing an improved deaminase variant and by optimizing the structure of the CRISPR guide RNAs. The optimized diversifying base editor was applied to generate a library of anti-fluorescein scFv variants, and a higher affinity mutant was isolated via FACS. The diversifying base editor is a powerful tool for facilitating not only antibody affinity maturation but any directed evolution experiment, and it is able to attain a rate of in situ mutations of 1×10⁻⁴ mutations/bp/generation, roughly 10-fold higher than the previously reported highest rate of in situ mutations. In general, the names S. cerevisiae and yeast are used interchangeably.
Libraries have long been created through traditional mutagenesis techniques such as site-saturation mutagenesis and error-prone PCR. While these methods can introduce sufficient diversity, they require laborious cloning techniques and cannot be rapidly iterated. A faster strategy is engineering yeast that can create the desired diversity in situ. To this end, a number of methods have been developed to generate genetic diversity within a cell. However, they generally result in low mutation rates, untargeted mutations, an inability to quickly re-target the mutagenesis system, and/or require upstream efforts of molecular cloning with non-traditional vector systems. For example, OrthoRep, which is a method for error-prone PCR integrated within a yeast cell, took up to 13 passages, or up to 90 generations, to evolve a desired resistance phenotype. This represents a substantial time investment to introduce diversity. The present disclosure describes development of a technique that exceeds the reported OrthoRep mutation rate of 1×10⁻⁵ mutations/bp/generation by 10-fold.
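By way of a non-limiting illustration, the practical meaning of a per-base, per-generation mutation rate can be worked through as sketched below. The sketch assumes a uniform rate across the target and independent mutation events, which is a simplification; the 100-bp target length in the example and the function names are hypothetical.

```python
def expected_mutations(rate_per_bp_per_gen: float, target_bp: int,
                       generations: float) -> float:
    """Expected number of mutations in one lineage: rate x length x generations."""
    return rate_per_bp_per_gen * target_bp * generations


def prob_at_least_one(rate_per_bp_per_gen: float, target_bp: int,
                      generations: int) -> float:
    """Probability that a lineage carries one or more mutations in the target."""
    return 1.0 - (1.0 - rate_per_bp_per_gen) ** (target_bp * generations)


if __name__ == "__main__":
    # Compare the reported OrthoRep rate with a rate 10-fold higher
    # over a hypothetical 100-bp target.
    for label, rate in (("1e-5", 1e-5), ("1e-4", 1e-4)):
        for gens in (1, 10, 90):
            p = prob_at_least_one(rate, target_bp=100, generations=gens)
            print(f"rate {label}, {gens:>2} generations: P(>=1 mutation) = {p:.4f}")
```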
CRISPR base editors, which in general combine CRISPR DNA binding proteins such as Cas9 and Cas12 with cytidine or adenine deaminases, have broad applications (
The present disclosure provides a diversifying base editor system developed in yeast for use with the human activation-induced cytidine deaminase (AID). AID is the enzyme responsible for somatic hypermutation of antibody DNA regions in developing B-cell lymphocytes, the type of human immune cell that creates and produces antibodies. It stands to reason that using AID to mutagenize antibodies in yeast in the context of a diversifying CRISPR base editor leads to affinity maturation that most closely resembles that which occurs in human B cells.
Antibody therapeutics have seen tremendous growth over the past decade. This versatile drug platform has been used to treat a variety of diseases, including viral infections, autoimmune disorders, and cancer. The yeast Saccharomyces cerevisiae has become a popular platform for antibody interrogation. Yeast can be engineered to display antibody fragments on their cell surface, allowing for quick characterization of the antibody through flow cytometry and other techniques. Yeast also grow rapidly and to high densities, making them ideal for screening large libraries of antibody variants up to 10¹⁰ in size. Generating mutant libraries is useful for altering the characteristics of a candidate antibody therapeutic, most often to improve the affinity.
A dCas9 CRISPR diversifying base editor has been optimized for use in yeast that generates in situ mutations in a targeted but broad manner at high rates and that is further compatible with yeast display. The diversifying base editor was optimized in three critical ways. First, by testing a variety of AID enzymes, both previously described and novel, in a fluorescent screen, a highly active variant was identified that improved mutation rates. Second, the location of MS2 aptamer loops was optimized within the gRNA scaffold to further maximize the in situ mutation rate and increase the breadth of mutation across a small region of the genome. Third, the versatility of the platform was increased using rapidly-assembled tRNA-gRNA cassettes. This system was applied to improve the affinity of an anti-fluorescein antibody scFv using yeast display. While CRISPR base editors have been used previously in yeast, the present disclosure demonstrates for the first time that they can be applied for diversification and directed evolution, i.e., this is the first demonstration of a diversifying base editor in yeast, and it is fully compatible with high throughput screens like yeast display of antibodies. Three new methods of increasing the mutational rate and breadth are also demonstrated. These include (1) using an AID variant with high catalytic activity to increase mutation rate, (2) optimizing the placement and number of MS2 aptamer loops within the gRNA to increase both mutation rate and mutation breadth (i.e., how far away from the gRNA target site mutations can still occur), and (3) utilizing rapidly-assembled tRNA-gRNA cassettes to enable targeting multiple DNA regions with the diversifying base editor, again improving its breadth of mutation. Finally, this is the first demonstration of human AID being used for CRISPR base editing in yeast.
The enhanced diversifying base editor is fully compatible with high throughput screens like yeast display of antibodies, and is the first demonstration of human AID being used for CRISPR base editing in yeast.
This CRISPR-based diversifying base editing platform for yeast has general utility for high throughput in situ targetable mutation generation that can be applied towards the rapid and directed evolution of desired cellular phenotypes and protein characteristics. Because a single yeast strain that has been engineered to express the dCas9 protein and MCP-AID fusion protein can be easily engineered to create mutations in a user-defined manner via expression of an easily introduced gRNA, this platform can allow for evolution of any native or heterologous DNA sequence with ease. This platform can also allow for continuous in vivo directed evolution and for retargeting the diversifying base editing system towards new DNA regions of interest by introducing new targeting gRNAs at any time. This platform can also allow for simultaneous evolution of multiple DNA regions at a time by expressing several gRNAs simultaneously, as described with the gRNA-tRNA arrays.
Characterization of an initial CRISPR diversifying base editor in yeast. A base editing strain was created for preliminary testing by integrating dCas9 in the yeast genome as well as a fusion protein consisting of the MCP (MS2-binding protein) and a variant of AID called AID*Δ. dCas9 expression was controlled by a galactose-inducible promoter, pGAL1, and MCP-AID*Δ expression was similarly controlled by pGAL2. The fluorescent protein, wtGFP, was also integrated into the yeast genome at a separate site and was under the control of a constitutive promoter, pTDH3. For these two genome integration sites, YORWΔ22 and YPRCτ3 were selected because they are noncoding loci with high transgene expression. A fluorescence-based assay that could detect mutations was then used to characterize the base editor's ability to generate in situ mutations. Relative to “wild-type” GFP (wtGFP), enhanced GFP (eGFP) has an S65T mutation, which causes a shift in the excitation spectrum peak from 405 nm to 488 nm (
By targeting this locus within wtGFP with the base editor, through expression of an MS2-containing gRNA with homology to DNA near this site, the base editor creates S65T mutations in situ over time in yeast, and the fraction of cells with eGFP-like fluorescence can be used as a correlate of the base editor mutation rate. As described below, high-throughput sequencing confirmed that a higher rate of eGFP-like fluorescence correlates with a higher rate of mutation at all proximal bases, showing that this assay can also report on the breadth of mutation for a diversifying base editor.
The initial base editor (MCP-AID*Δ and dCas9) was applied and cells with a shifted excitation were produced (
Enhancing CRISPR diversifying base editor mutation rate by varying AID variant. Two main strategies were pursued to improve the mutation rate afforded by the AID-based CRISPR diversifying base editor: 1) improving the activity of the deaminase and 2) optimizing the location of the MS2 aptamers within the gRNA scaffold.
To determine whether alternative AID variants improved the mutation rate of the CRISPR base editor system in yeast, several previously described AID mutants were assessed for activity in the present platform (when fused to MCP and coexpressed with dCas9). Some of the mutations described in prior work were combined into novel variants not previously reported, namely AID*mono and AID731mono (Table 3). As a secondary strategy, AID variants were also made that included the yeast ssDNA binding protein RFA3, a subunit of the replication factor A (RPA) complex. Previous work has shown that fusing RFA3 to AID leads to an increase in the rate of genome-wide mutations. A fluorescence shift assay was performed on all variants. The best performing variant, AID731Δ, showed a roughly 5-fold increase in base editing activity relative to AID*Δ, as measured by increased eGFP fluorescence corresponding to the S65T mutation, without any discernible impact on growth.
Enhancing CRISPR diversifying base editor mutation rate and breadth by optimizing MS2 aptamer placement within the gRNA scaffold.
In addition, it was desirable to expand the mutagenic window, i.e., the breadth of mutation, to allow more thorough introduction of mutations across a wider range of DNA via the diversifying base editor. Previous work in mammalian cells estimated that, while mutations could be detected from −50 to +50 bp relative to the tip of the targeting gRNA binding region and the direction of transcription, the highest rate of mutation was seen from +20 to +40 bp. This window was independent of the strand that was being targeted. The mutational window of the base editor was approximated using a set of seven positional gRNAs that targeted the template strand of wtGFP and spanned from −81 bp to +84 bp relative to the site of the desired S65T mutation.
By combining the best gRNA scaffold (mtx2) with the best AID variant (AID731Δ), a wtGFP-to-eGFP fluorescence shift rate of over 7% after 4 days was achieved, a 26-fold improvement over the performance of the original construct.
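As a worked check on these two figures, the short sketch below back-calculates the shift rate they imply for the original construct. The result is an inference from the stated numbers (treating "over 7%" as exactly 7%), not a measured value from the disclosure, and the function name `implied_baseline` is hypothetical.

```python
def implied_baseline(final_rate, fold_improvement):
    """Back-calculate the starting rate implied by a final rate and a fold change."""
    return final_rate / fold_improvement


final_rate = 0.07   # "over 7%" after 4 days, taken here as a lower bound
fold = 26           # stated improvement over the original construct
baseline = implied_baseline(final_rate, fold)

# Print the implied original-construct shift rate as a percentage.
print(f"{baseline:.2%}")
```

Under these assumptions the original construct would have shifted well under 1% of cells over the same period.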
Enhancing CRISPR diversifying base editor mutation breadth by expressing several gRNAs in a gRNA-tRNA array and the optimized diversifying base editor's application towards improving affinity of an antibody-derived protein. 4-4-20, a mouse-derived single-chain variable fragment (scFv) that binds fluorescein, was selected as the model antibody for yeast display and in vivo targeted evolution with the enhanced diversifying base editor. The scFv was integrated into the yeast genome, and the base editor was integrated at a separate genomic locus. Finally, gRNAs expressed on plasmids were used to target the scFv. The optimal mtx2 scaffold and the AID731Δ base editor were used.
An scFv is composed of a VH and a VL segment, each approximately 340 bp in length. Given this size, it is not possible to reach a high level of mutagenesis over this length using a single gRNA. Ideally, each complementarity determining region (CDR) of the scFv would be targeted with a separate gRNA to maximize the rate of mutation in these important regions. This requires great mutational breadth from the CRISPR diversifying base editor. To provide that breadth, gRNA processing using tRNAs interspersed in an array was employed to allow simultaneous coexpression of several gRNAs. A Golden Gate assembly method, similar to that described for a tool called GTR-CRISPR, was used to create 3× and 6× gRNA-tRNA cassettes (i.e., cassettes with three or six different targeting gRNAs) that target the VH CDRs or the VH and VL CDRs, respectively, of the 4-4-20 scFv.
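Choosing one gRNA per CDR can be sketched as a simple PAM search. The example below assumes an NGG PAM (as for an S. pyogenes-type dCas9, which the text above does not specify), a 20-nt protospacer, and a mutation peak about 20 bp downstream of the PAM on the scanned strand, as reported for this editor elsewhere in this disclosure. The function names, the toy sequence, and the CDR coordinates are hypothetical, and only one strand is scanned for brevity.

```python
def find_ngg_pams(seq):
    """Return 0-based indices of NGG PAMs that have a full 20-nt protospacer 5' of them."""
    seq = seq.upper()
    # A PAM occupies positions i..i+2, where position i is any base (N)
    # and positions i+1 and i+2 are both G.
    return [i for i in range(20, len(seq) - 2) if seq[i + 1:i + 3] == "GG"]


def pick_guide(seq, cdr_start, cdr_end, peak_offset=20):
    """Pick the PAM whose expected mutation peak lies closest to the CDR midpoint.

    Returns (pam_index, protospacer), or None if no PAM is available.
    peak_offset is the assumed distance from the PAM to the mutation peak.
    """
    cdr_mid = (cdr_start + cdr_end) / 2
    pams = find_ngg_pams(seq)
    if not pams:
        return None
    best = min(pams, key=lambda i: abs(i + peak_offset - cdr_mid))
    return best, seq[best - 20:best].upper()


# Toy target with two PAMs (TGG at index 25, CGG at index 58).
toy = "A" * 25 + "TGG" + "A" * 30 + "CGG" + "A" * 20
print(find_ngg_pams(toy))
print(pick_guide(toy, cdr_start=70, cdr_end=80))
```

A real design would also scan the reverse complement and screen each candidate spacer for off-target matches in the yeast genome before assembling the gRNA-tRNA cassette.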
The base editor was induced in galactose media for eight days with the 3 or 6 gRNAs (or a single gRNA as a control) to generate a large library of scFv mutants. Cells were passaged every 2 days. Deep sequencing showed that mutations were localized around each CDR, with a peak in mutations roughly 20 base pairs downstream of the targeting gRNA PAM. Yeast displaying the scFv were then sorted for improved binding to the scFv's target, fluorescein. The cells were stained with biotinylated fluorescein and AlexaFluor647 anti-c-myc antibody. The cells were sorted 3 times, each time recovering only the top 1-2% of scFv-positive cells based on their antigen binding.
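The sort gate described above can be sketched as a two-step filter: keep cells that display the scFv, then keep the top fraction by antigen-binding signal. This is only an illustration; the tuple format, the display threshold, and the function name `sort_gate` are hypothetical, and an actual sort is performed on the cytometer rather than in software.

```python
import math


def sort_gate(cells, display_threshold, top_fraction=0.02):
    """Return the top fraction of scFv-positive cells ranked by binding signal.

    cells: list of (display_signal, binding_signal) tuples (hypothetical format),
           e.g. anti-c-myc signal and fluorescein-binding signal.
    """
    # Step 1: keep only cells that display the scFv.
    positive = [c for c in cells if c[0] > display_threshold]
    if not positive:
        return []
    # Step 2: rank by antigen binding and keep the top fraction (at least one cell).
    positive.sort(key=lambda c: c[1], reverse=True)
    n_keep = max(1, math.ceil(len(positive) * top_fraction))
    return positive[:n_keep]


# Toy population: 100 displaying cells with increasing binding, 50 non-displaying cells.
cells = [(1000.0, float(b)) for b in range(100)] + [(10.0, 500.0)] * 50
gated = sort_gate(cells, display_threshold=100.0, top_fraction=0.02)
print([binding for _, binding in gated])
```

Note that non-displaying cells are excluded even when their raw binding signal is high, which is why the display stain is applied alongside the antigen stain.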
After 4 rounds of sorting, the cells were plated and individual colonies were picked to assess their affinity through yeast display. Three clones were found that had a substantial increase in affinity over wild type. Genomic DNA from these yeast was purified, PCR amplified, and sequenced. A concentration of mutations was found in CDR2 of the heavy chain. The best mutant, in terms of increased FITC signal, had a W184L mutation near its VH CDR2.
To validate the results of the fluorescence shift assay, deep amplicon sequencing was used to characterize the mutation rate of the base editor. Critically, the S65T mutation rate correlates with the overall mutation rate. In cells with a single gRNA targeting wtGFP, the window of prominent mutation was roughly ±50 bp and centered 20 bp downstream of the PAM. Deep sequencing was also performed on the 4-4-20 scFv targeted simultaneously with 3 gRNAs. Here, the overall mutation rate within the variable heavy region was estimated to be 1×10⁻⁴ mutations/bp/generation, a 10-fold higher mutation rate than has previously been reported for an in situ evolution system. Insertions and deletions were rarely detected in the targeted DNA, making this technology well-suited for protein engineering and diversification.
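To give a sense of scale for a rate of 1×10⁻⁴ mutations/bp/generation, the sketch below estimates the expected number of mutations in a roughly 340 bp VH segment. It assumes a constant, independent per-base rate, and the generation count of 30 is an arbitrary illustrative value, not a figure from this disclosure. The function names are hypothetical.

```python
def expected_mutations(rate_per_bp_per_gen, region_bp, generations):
    """Expected number of mutations in a region, assuming a constant independent rate."""
    return rate_per_bp_per_gen * region_bp * generations


def fraction_mutated(rate_per_bp_per_gen, region_bp, generations):
    """Fraction of lineages expected to carry at least one mutation in the region."""
    return 1.0 - (1.0 - rate_per_bp_per_gen) ** (region_bp * generations)


RATE = 1e-4         # mutations/bp/generation, as stated above
VH_LENGTH = 340     # approximate VH length in bp, as stated above
GENERATIONS = 30    # hypothetical number of generations during induction

print(expected_mutations(RATE, VH_LENGTH, GENERATIONS))
print(fraction_mutated(RATE, VH_LENGTH, GENERATIONS))
```

Under these assumptions a lineage accumulates on the order of one VH mutation over the induction period, and most lineages carry at least one.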
Methods for high-throughput sequencing: For deep sequencing, the base editor was induced for 8 days targeting the wtGFP locus with a single gRNA. Genomic DNA was collected using the YeaStar Genomic DNA Kit (Zymo Research). The wtGFP locus was amplified by PCR, and DNA concentrations were normalized using the Qubit fluorimetry system. DNA was sent to Genewiz for EZ-Amplicon sequencing (PE250 MiSeq, Illumina), generating 50,000 reads. The reads were demultiplexed using fastq-multx from ea-utils and then aligned to the reference using bwa mem. Mutation rates were then calculated.
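The final step, calculating mutation rates from aligned reads, is not detailed above. One plausible approach is sketched below: count mismatches against the reference at each position and divide by coverage. The sketch assumes ungapped alignments supplied as (start, read) pairs, which is a simplified stand-in for parsed bwa mem output; the function name and input format are hypothetical, and this is not presented as the calculation actually used.

```python
from collections import Counter


def per_position_mismatch_rate(reference, alignments):
    """Compute per-position mismatch rates and a substitution spectrum.

    reference: reference sequence as a string.
    alignments: iterable of (start, read) pairs, where start is the 0-based
                reference position of the first read base (ungapped alignment).
    """
    coverage = [0] * len(reference)
    mismatches = [0] * len(reference)
    spectrum = Counter()
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            # Skip bases that fall off the reference and uncalled bases.
            if pos < 0 or pos >= len(reference) or base == "N":
                continue
            coverage[pos] += 1
            if base != reference[pos]:
                mismatches[pos] += 1
                spectrum[f"{reference[pos]}>{base}"] += 1
    rates = [m / c if c else 0.0 for m, c in zip(mismatches, coverage)]
    return rates, spectrum


reference = "ACGTACGT"
alignments = [(0, "ACGTACGT"), (0, "ATGTACGT"), (2, "GTACAT"), (4, "ACGT")]
rates, spectrum = per_position_mismatch_rate(reference, alignments)
print(rates)
print(dict(spectrum))
```

Because sequencing and PCR errors also produce mismatches, the rates from an uninduced or non-targeting control would normally be subtracted, and the result divided by the number of generations to express it per generation.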
Advantages and improvements over existing methods, devices or materials. A key advantage of this technology is that it enables the highest rates of in situ DNA mutation of all such platforms to date. The CRISPR base editor with three sgRNAs, disclosed herein, has a mutation rate of 1×10⁻⁴ mutations/bp/generation, which is 10-fold higher than previously described systems. The base editor can also make a variety of mutations (e.g., C→T, G→T). The enhanced diversifying base editor technology further facilitates rapid antibody diversification (8 days for library generation). It is contemplated that this diversifying base editor can be extended to a wide variety of tasks.
This system is an enabling technology for studies that wish to employ directed evolution via DNA mutagenesis. The key innovation herein is the combination of three elements: fusing an enhanced (731-variant) AID to the MCP viral protein that binds MS2 stem loops, using an optimized placement of MS2 stem loops in sgRNA scaffolds, and interspersing tRNAs between multiple sgRNAs. Together, these allowed a robust and targetable in situ mutation rate of 1×10⁻⁴ mutations/bp/generation, roughly 10-fold higher than the previously reported highest rate of in situ mutation.
Because this system uses AID, the deaminase that drives somatic hypermutation in B cells, it is more likely to recapitulate that process. Ultimately, it is contemplated that engineered yeast can mimic the entirety of antibody production and evolution as seen in mammalian B cells.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present disclosure without departing from the scope or spirit of the invention. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the methods disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
TGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTCCGTT
GGGCCAAGTGGCACCGAGTCGGTGC
ACATGAGGATCACCCATGTCGCTCGTGTTCCC
TGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTCCGTT
GTCTGCAGGGCCAAGTGGCACCGAGTCGGTGC
CATGAGGATCACCCATGTCTGCAGGGCCTCGGTGC
TGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTCCGTT
ATCACCCATGTCTGCAGGGCCTCGGTGC
GTCTGCAGGGCCAAGTGGCACCGAGGCCAACATGAGG
ATCACCCATGTCTGCAGGGCCTCGGTGC
TGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTCCGTT
CACATGAGGATCACCCATGTGCCACGAGCGACATGAG
GATCACCCATGTCGCTCGTGTTCCC
GTCTGCAGGGCCAAGTGGCACCGAGTCGGTGCGGGAG
CACATGAGGATCACCCATGTGCCACGAGCGACATGAG
GATCACCCATGTCGCTCGTGTTCCC
TGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTCCGTT
GGGCCAAGTGGCACCGAGTCGGTGCGGGAGCACATGA
GGATCACCCATGTGCCACGAGCGACATGAGGATCACC
CATGTCGCTCGTGTTCCC
TGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTCCGTT
GGGCCAAGTGGCACCGAGTCGGTGCGCGCACATGAGG
ATCACCCATGTGC
CAGCGGCAAGTGGCACCGAGTCGGTGC
CGGGTGCAGGGCCAAGTGGCACCGAGTCGGTGC
TGTGGCCAAGTGGCACCGAGTCGGTGC
ACCCATGTCCAGCTGCAGGGCCAAGTGGCACCGAGT
ATGTCTGCAGGGCC
AAGTGGCACCGAGTCGGTGC
CTGCAGGGCC
TAGCAAGTTGAAATAAGGCTAGTCCG
CAAGTGGCACCGAGTCGGTGC
CTGCAGGGCC
TAGCAAGTTGAAATAAGGCTAGTCCG
CAGGGCCAAGTGGCACCGAGTCGGTGC
CTGCAGGGCC
TAGCAAGTTGAAATAAGGCTAGTCCG
CAAGTGGCACCGAGTCGGTGC
CTGCAGGGCC
TAGCAAGTTGAAATAAGGCTAGTCCG
GTCCAGCTGCAGGGCCAAGTGGCACCGAGTCGGTGC
CTGCAGGGCC
TAGCAAGTTGAAATAAGGCTAGTCCG
GCAGGGCC
AAGTGGCACCGAGTCGGTGC
ATGGCTAGTAATTTTACTCAATTCGTGTTAGTGGACAAC
GGTGGTACTGGTGATGTAACAGTTGCTCCATCTAATTTT
GCCAATGGCGTGGCTGAGTGGATTTCCAGTAACTCCAGA
TCACAAGCCTACAAAGTGACATGCTCCGTTCGTCAATCC
TCCGCTCAGAAGAGAAAATATACCATAAAGGTGGAAGTC
CCAAAGGTCGCCACCCAAACCGTTGGTGGAGTAGAATTA
CCTGTAGCCGCTTGGCGTTCATACTTAAACATGGAATTA
ACAATTCCCATTTTTGCCACTAACTCAGACTGTGAATTA
ATAGTAAAAGCAATGCAAGGCTTATTAAAGGATGGAAAC
CCAATCCCTTCAGCAATTGCTGCTAATTCAGGCATTTAT
TCAGCAGGAGGTGGAGGTTCAGGCGGTGGCGGAAGTGGAG
TTGTACCAATTTAAGAACGTGAGATGGGCTAAAGGTAGAAG
GGAAACTTATCTATGTTACGTAGTGAAAAGAAGAGACTCAG
CAACTTCCTTTTCTTTAGATTTCGGTTACTTAAGAAATAAGA
ACGGCTGTCATGTTGAATTGTTGTTCTTGAGGTACATAAGTG
ACTGGGACCTAGATCCTGGAAGGTGTTATCGTGTTACATGG
TTTATCTCTTGGTCACCATGCTATGATTGTGCCAGACACGTA
GCTGATTTCTTACGTGGTAACCCAAATTTATCATTAAGAATT
TTCACCGCTAGATTGTATTTTTGCGAAGATAGGAAAGCTGA
GCCTGAAGGCTTAAGAAGATTACATAGAGCCGGAGTTCAGA
TTGCAATAATGACTTTCAAAGATTACTTTTACTGCTGGAATA
CCTTCGTCGAAAATCATGGTAGAACCTTCAAAGCTTGGGAA
GGCTTGCACGAAAACTCCGTCAGATTGAGTAGGCAATTAAG
AAGAATATTGCTACCCTTGTACGAAGTTGACGATTTACGTG
ATGCATTCAGGACATAA (SEQ ID NO: 22)
AAGCGTAAGGTGGATCCTAAGAAAAAGAGAAAGGTTTAA
ATGCAGTTACTTCGCTGTTTTTCAATATTTTCTGTTATTG
CTTCAGTTTTAGCACAGGAACTGACAACTATATGCGAGC
AAATCCCCTCACCAACTTTAGAATCGACGCCGTACTCTT
TGTCAACGACTACTATTTTGGCCAACGGGAAGGCAATGC
AAGGAGTTTTTGAATATTACAAATCAGTAACGTTTGTCA
GTAATTGCGGTTCTCACCCCTCAACAACTAGCAAAGGCA
GCCCCATAAACACACAGTATGTTTTTAAGGACAATAGCT
CGACGATTGAAGGTAGATACCCATACGACGTTCCAGACTAC
ACACCACTATCACTTCCTGTTAGTCTAGGTGATCAAGCCTCCATC
TCTTGCAGATCTAGTCAGAGCCTTGTACACAGTAATGGAAACAC
CTATTTACGTTGGTACCTGCAGAAGCCAGGCCAGTCTCCAAAGG
TCCTGATCTACAAAGTTTCCAACCGATTTTCTGGGGTCCCAGAC
AGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACACTCAAGAT
CAGCAGAGTGGAGGCTGAGGATCTGGGAGTTTATTTCTGCTCTC
AAAGTACACATGTTCCGTGGACGTTCGGTGGAGGCACCAAGCTT
GAAATTAAGTCCTCTGCTGATGATGCTAAGAAGGATGCTGCTAA
GAAGGATGATGCTAAGAAAGATGATGCTAAGAAAGATGGTGACG
TCAAACTGGATGAGACTGGAGGAGGCTTGGTGCAACCTGGGAG
GCCCATGAAACTCTCCTGTGTTGCCTCTGGATTCACTTTTAGTGA
CTACTGGATGAACTGGGTCCGCCAGTCTCCAGAGAAAGGACTG
GAGTGGGTAGCACAAATTAGAAACAAACCTTATAATTATGAAACA
TATTATTCAGATTCTGTGAAAGGCAGATTCACCATCTCAAGAGAT
GATTCCAAAAGTAGTGTCTACCTGCAAATGAACAACTTAAGAGTT
GAAGACATGGGTATCTATTACTGTACGGGTTCTTACTATGGTATG
GACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCA
GAAC
AAAAGCTTATTTCTGAAGAAGACTTGTAA (SEQ ID NO: 25)
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
CATGTCTGCAGGGCCAAGTGGCACCGAGTCGGTGC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
ATGAGGATCACCCATGTCTGCAGGGCCTCGGTGC
CATGTCTGCAGGGCCAAGTGGCACCGAGGCCAACA
TGAGGATCACCCATGTCTGCAGGGCCTCGGTGC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
CATGTCTGCAGGGCCAAGTGGCACCGAGTCGGTGC
ACATGAGGATCACCCATGTCGCTCGTGTTCCC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
AGGATCACCCATGTCGCTCGTGTTCCC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
CATGAGGATCACCCATGTGC
TCAGCGGCAAGTGGCACCGAGTCGGTGC
CACGGGTGCAGGGCCAAGTGGCACCGAGTCGGTGC
ATGTGGCCAAGTGGCACCGAGTCGGTGC
CACCCATGTCCAGCTGCAGGGCCAAGTGGCACCGA
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
GGCAAGTGGCACCGAGTCGGTGC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
GTGCAGGGCCAAGTGGCACCGAGTCGGTGC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
GCCAAGTGGCACCGAGTCGGTGC
TCTGCAGGGCCTAGCAAGTTGAAATAAGGCTAGTC
CATGTCCAGCTGCAGGGCCAAGTGGCACCGAGTCG
GTGGAGGCGGTGGTTCAGGC
CCTAAGAAAAAGA
GAAAAGTG
GCCGCAGCCGGCTCT
GSG
PKKKRKV
AAAGS
This U.S. Nonprovisional Patent Application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/384,131, filed Nov. 17, 2022, which is incorporated by reference herein in its entirety.
| Number | Date | Country |
|---|---|---|
| 63384131 | Nov 2022 | US |