MAD NUCLEASES

Information

  • Patent Application
  • 20220213458
  • Publication Number
    20220213458
  • Date Filed
    March 09, 2022
  • Date Published
    July 07, 2022
Abstract
The present disclosure provides new RNA-guided nuclease systems and engineered nickases for making rational, direct edits to nucleic acids in live cells.
Description
FIELD OF THE INVENTION

The present disclosure provides new RNA-guided nuclease systems and engineered nickases for making rational, direct edits to nucleic acids in live cells.


BACKGROUND OF THE INVENTION

In the following discussion certain articles and methods will be described for background and introductory purposes. Nothing contained herein is to be construed as an “admission” of prior art. Applicant expressly reserves the right to demonstrate, where appropriate, that the methods referenced herein do not constitute prior art under the applicable statutory provisions.


The ability to make precise, targeted changes to the genome of living cells has been a long-standing goal in biomedical research and development. Recently, various nucleases have been identified that allow manipulation of gene sequence and, hence, gene function. These nucleases include nucleic acid-guided nucleases. The range of target sequences that nucleic acid-guided nucleases can recognize, however, is constrained by the need for a specific protospacer adjacent motif (PAM) to be located near the desired target sequence. PAMs are short nucleotide sequences recognized by a gRNA/nuclease complex, where this complex directs editing of the target sequence. The precise PAM sequence and PAM length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence. Engineering nucleic acid-guided nucleases or mining for new nucleic acid-guided nucleases may provide nucleases with altered PAM preferences and/or altered activity or fidelity, all of which are changes that may increase the versatility of a nucleic acid-guided nuclease for certain editing tasks.


There is thus a need in the art of nucleic acid-guided nuclease gene editing for novel nucleases with varied PAM preferences, varied activity in cells from different organisms such as mammals and/or altered enzyme fidelity. The novel MAD nucleases described herein satisfy this need.


SUMMARY OF THE INVENTION

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Other features, details, utilities, and advantages of the claimed subject matter will be apparent from the following written Detailed Description including those aspects illustrated in the accompanying drawings and defined in the appended claims.


The present disclosure provides Type II MAD nucleases (e.g., RNA-guided nucleases or RGNs) with varied PAM preferences, and/or varied activity in mammalian cells.


Thus, in one embodiment there are provided MAD nuclease systems that perform nucleic acid-guided nuclease editing including a MAD2015 system comprising SEQ ID Nos. 1 (MAD2015 nuclease), 2 (CRISPR RNA) and 3 (trans-activating CRISPR RNA); a MAD2016 system comprising SEQ ID Nos. 4 (MAD2016 nuclease), 5 (CRISPR RNA) and 6 (trans-activating CRISPR RNA); a MAD2017 system comprising SEQ ID Nos. 7 (MAD2017 nuclease), 8 (CRISPR RNA) and 9 (trans-activating CRISPR RNA); a MAD2019 system comprising SEQ ID Nos. 10 (MAD2019 nuclease), 11 (CRISPR RNA) and 12 (trans-activating CRISPR RNA); a MAD2020 system comprising SEQ ID Nos. 13 (MAD2020 nuclease), 14 (CRISPR RNA) and 15 (trans-activating CRISPR RNA); a MAD2021 system comprising SEQ ID Nos. 16 (MAD2021 nuclease), 17 (CRISPR RNA) and 18 (trans-activating CRISPR RNA); a MAD2022 system comprising SEQ ID Nos. 19 (MAD2022 nuclease), 20 (CRISPR RNA) and 21 (trans-activating CRISPR RNA); a MAD2023 system comprising SEQ ID Nos. 22 (MAD2023 nuclease), 23 (CRISPR RNA) and 24 (trans-activating CRISPR RNA); a MAD2024 system comprising SEQ ID Nos. 25 (MAD2024 nuclease), 26 (CRISPR RNA) and 27 (trans-activating CRISPR RNA); a MAD2025 system comprising SEQ ID Nos. 28 (MAD2025 nuclease), 29 (CRISPR RNA) and 30 (trans-activating CRISPR RNA); a MAD2026 system comprising SEQ ID Nos. 31 (MAD2026 nuclease), 32 (CRISPR RNA) and 33 (trans-activating CRISPR RNA); a MAD2027 system comprising SEQ ID Nos. 34 (MAD2027 nuclease), 35 (CRISPR RNA) and 36 (trans-activating CRISPR RNA); a MAD2028 system comprising SEQ ID Nos. 37 (MAD2028 nuclease), 38 (CRISPR RNA) and 39 (trans-activating CRISPR RNA); a MAD2029 system comprising SEQ ID Nos. 40 (MAD2029 nuclease), 41 (CRISPR RNA) and 42 (trans-activating CRISPR RNA); a MAD2030 system comprising SEQ ID Nos. 43 (MAD2030 nuclease), 44 (CRISPR RNA) and 45 (trans-activating CRISPR RNA); a MAD2031 system comprising SEQ ID Nos. 46 (MAD2031 nuclease), 47 (CRISPR RNA) and 48 (trans-activating CRISPR RNA); a MAD2032 system comprising SEQ ID Nos. 49 (MAD2032 nuclease), 50 (CRISPR RNA) and 51 (trans-activating CRISPR RNA); a MAD2033 system comprising SEQ ID Nos. 52 (MAD2033 nuclease), 53 (CRISPR RNA) and 54 (trans-activating CRISPR RNA); a MAD2034 system comprising SEQ ID Nos. 55 (MAD2034 nuclease), 56 (CRISPR RNA) and 57 (trans-activating CRISPR RNA); a MAD2035 system comprising SEQ ID Nos. 58 (MAD2035 nuclease), 59 (CRISPR RNA) and 60 (trans-activating CRISPR RNA); a MAD2036 system comprising SEQ ID Nos. 61 (MAD2036 nuclease), 62 (CRISPR RNA) and 63 (trans-activating CRISPR RNA); a MAD2037 system comprising SEQ ID Nos. 64 (MAD2037 nuclease), 65 (CRISPR RNA) and 66 (trans-activating CRISPR RNA); a MAD2038 system comprising SEQ ID Nos. 67 (MAD2038 nuclease), 68 (CRISPR RNA) and 69 (trans-activating CRISPR RNA); a MAD2039 system comprising SEQ ID Nos. 70 (MAD2039 nuclease), 71 (CRISPR RNA) and 72 (trans-activating CRISPR RNA); and a MAD2040 system comprising SEQ ID Nos. 73 (MAD2040 nuclease), 74 (CRISPR RNA) and 75 (trans-activating CRISPR RNA). In some aspects, the MAD system components are delivered as sequences to be transcribed (in the case of the gRNA components) and transcribed and translated (in the case of the MAD nuclease), and in some aspects, the coding sequence for the MAD nuclease and the gRNA component sequences are on the same vector. In other aspects, the coding sequence for the MAD nuclease and the gRNA component sequences are on different vectors and in some aspects, the gRNA component sequences are located in an editing cassette which also comprises a donor DNA (e.g., homology arm). In other aspects, the MAD nuclease is delivered to the cells as a peptide or the MAD nuclease and gRNA components are delivered to the cells as a ribonucleoprotein complex.


Additionally, there are provided engineered nickases derived from the nucleases of the above-referenced systems, including MAD2016-H851A (SEQ ID NO: 178); MAD2016-N874A (SEQ ID NO: 179); MAD2032-H590A (SEQ ID NO: 180); MAD2039-H587A (SEQ ID NO: 181); and MAD2039-N610A (SEQ ID NO: 182).


These aspects and other features and advantages of the invention are described below in more detail.





BRIEF DESCRIPTION OF THE FIGURES


FIG. 1 is an exemplary workflow for creating and screening mined MAD nucleases or RGNs.



FIG. 2 is a simplified depiction of an in vitro test conducted on candidate enzymes.



FIG. 3 is a list of novel Type II MADzymes that have been identified.



FIG. 4 is a map of Type II MADzymes in cluster 59.



FIG. 5 is a map of Type II MADzymes in clusters 55, 56, 57 and 58.



FIG. 6 is a map of Type II MADzymes in cluster 141.



FIG. 7 is a reproduction of a gel showing nicked plasmid formation with different MADzyme nickases compared to corresponding MADzyme nucleases.





It should be understood that the drawings are not necessarily to scale.


DETAILED DESCRIPTION

The description set forth below in connection with the appended drawings is intended to be a description of various, illustrative embodiments of the disclosed subject matter. Specific features and functionalities are described in connection with each illustrative embodiment; however, it will be apparent to those skilled in the art that the disclosed embodiments may be practiced without each of those specific features and functionalities. Moreover, all of the functionalities described in connection with one embodiment are intended to be applicable to the additional embodiments described herein except where expressly stated or where the feature or function is incompatible with the additional embodiments. For example, where a given feature or function is expressly described in connection with one embodiment but not expressly mentioned in connection with an alternative embodiment, it should be understood that the feature or function may be deployed, utilized, or implemented in connection with the alternative embodiment unless the feature or function is incompatible with the alternative embodiment.


The practice of the techniques described herein may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, biological emulsion generation, and sequencing technology, which are within the skill of those who practice in the art. Such conventional techniques include polymer array synthesis, hybridization and ligation of polynucleotides, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Green, et al., Eds. (1999), Genome Analysis: A Laboratory Manual Series (Vols. I-IV); Weiner, Gabriel, Stephens, Eds. (2007), Genetic Variation: A Laboratory Manual; Dieffenbach, Dveksler, Eds. (2003), PCR Primer: A Laboratory Manual; Bowtell and Sambrook (2003), DNA Microarrays: A Molecular Cloning Manual; Mount (2004), Bioinformatics: Sequence and Genome Analysis; Sambrook and Russell (2006), Condensed Protocols from Molecular Cloning: A Laboratory Manual; and Sambrook and Russell (2002), Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.) W.H. Freeman, New York N.Y.; Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W. H. Freeman Pub., New York, N.Y.; Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, N.Y.; Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, eds., John Wiley & Sons 1998), all of which are herein incorporated in their entirety by reference for all purposes. 
Nuclease-specific techniques can be found in, e.g., Genome Editing and Engineering: From TALENs and CRISPRs to Molecular Surgery, Appasani and Church, 2018; and CRISPR: Methods and Protocols, Lindgren and Charpentier, 2015; both of which are herein incorporated in their entirety by reference for all purposes. Basic methods for enzyme engineering may be found in Enzyme Engineering: Methods and Protocols, Samuelson, ed., 2013; Protein Engineering, Kaumaya, ed., (2012); and Kaur and Sharma, “Directed Evolution: An Approach to Engineer Enzymes”, Crit. Rev. Biotechnology, 26:165-69 (2006).


Note that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an oligonucleotide” refers to one or more oligonucleotides. Terms such as “first,” “second,” “third,” etc., merely identify one of a number of portions, components, steps, operations, functions, and/or points of reference as disclosed herein, and likewise do not necessarily limit embodiments of the present disclosure to any particular configuration or orientation.


Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All publications mentioned herein are incorporated by reference for the purpose of describing and disclosing devices, methods and cell populations that may be used in connection with the presently described invention.


Where a range of values is provided, it is understood that each intervening value between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.


In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described in order to avoid obscuring the invention.


The term “complementary” as used herein refers to Watson-Crick base pairing between nucleotides and specifically refers to nucleotides hydrogen bonded to one another with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds. In general, a nucleic acid includes a nucleotide sequence described as having a “percent complementarity” or “percent homology” to a specified second nucleotide sequence. For example, a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10 or 10 of 10 nucleotides of a sequence are complementary to the specified second nucleotide sequence. For instance, the nucleotide sequence 3′-TCGA-5′ is 100% complementary to the nucleotide sequence 5′-AGCT-3′; and the nucleotide sequence 3′-TCGA-5′ is 100% complementary to a region of the nucleotide sequence 5′-TAGCTG-3′.
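The percent-complementarity arithmetic in the preceding definition can be illustrated with a minimal sketch. The function names are hypothetical and are not part of the disclosure; the sketch assumes ungapped, position-by-position comparison of two equal-length strands written in opposite orientations, which is a simplification of what an alignment algorithm would do.

```python
# Watson-Crick partners for DNA and RNA bases (T and U both pair with A).
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_5to3: str, other_3to5: str) -> float:
    """Percent of positions that form Watson-Crick pairs.

    strand_5to3 is written 5'->3' and other_3to5 is written 3'->5', so
    position i of one strand faces position i of the other.
    """
    if len(strand_5to3) != len(other_3to5) or not strand_5to3:
        raise ValueError("strands must be non-empty and of equal length")
    matches = sum(
        (a, b) in PAIRS
        for a, b in zip(strand_5to3.upper(), other_3to5.upper())
    )
    return 100.0 * matches / len(strand_5to3)


def is_complementary_to_region(longer_5to3: str, probe_3to5: str) -> bool:
    """True if the probe is 100% complementary to some window of the longer strand."""
    n = len(probe_3to5)
    return any(
        percent_complementarity(longer_5to3[i:i + n], probe_3to5) == 100.0
        for i in range(len(longer_5to3) - n + 1)
    )
```

Under these assumptions, `percent_complementarity("AGCT", "TCGA")` reproduces the first example in the definition, and `is_complementary_to_region("TAGCTG", "TCGA")` reproduces the second.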


The term DNA “control sequences” refers collectively to promoter sequences, polyadenylation signals, transcription termination sequences, upstream regulatory domains, origins of replication, internal ribosome entry sites, nuclear localization sequences, enhancers, and the like, which collectively provide for the replication, transcription and translation of a coding sequence in a recipient cell. Not all of these types of control sequences need to be present so long as a selected coding sequence is capable of being replicated, transcribed and—for some components—translated in an appropriate host cell.


As used herein the term “donor DNA” or “donor nucleic acid” refers to nucleic acid that is designed to introduce a DNA sequence modification (insertion, deletion, substitution) into a locus by homologous recombination using nucleic acid-guided nucleases. For homology-directed repair, the donor DNA must have sufficient homology to the regions flanking the “cut site” or site to be edited in the genomic target sequence. The length of the homology arm(s) will depend on, e.g., the type and size of the modification being made. In many instances and preferably, the donor DNA will have two regions of sequence homology (e.g., two homology arms) to the genomic target locus. Preferably, an “insert” region or “DNA sequence modification” region—the nucleic acid modification that one desires to be introduced into a genome target locus in a cell—will be located between two regions of homology. The DNA sequence modification may change one or more bases of the target genomic DNA sequence at one specific site or multiple specific sites. A change may include changing 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence. A deletion or insertion may be a deletion or insertion of 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence.


The terms “guide nucleic acid” or “guide RNA” or “gRNA” refer to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a genomic target locus, and 2) a scaffold sequence capable of interacting or complexing with a nucleic acid-guided nuclease.


“Homology” or “identity” or “similarity” refers to sequence similarity between two peptides or, more often in the context of the present disclosure, between two nucleic acid molecules. The term “homologous region” or “homology arm” refers to a region on the donor DNA with a certain degree of homology with the target genomic DNA sequence. Homology can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are homologous at that position. A degree of homology between sequences is a function of the number of matching or homologous positions shared by the sequences.


“Operably linked” refers to an arrangement of elements where the components so described are configured so as to perform their usual function. Thus, control sequences operably linked to a coding sequence are capable of effecting the transcription, and in some cases, the translation, of a coding sequence. The control sequences need not be contiguous with the coding sequence so long as they function to direct the expression of the coding sequence. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked” to the coding sequence. In fact, such sequences need not reside on the same contiguous DNA molecule (i.e. chromosome) and may still have interactions resulting in altered regulation.


A “promoter” or “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a polynucleotide or polypeptide coding sequence, yielding messenger RNA, ribosomal RNA, small nuclear or nucleolar RNA, guide RNA, or any kind of RNA transcribed by any class of RNA polymerase (I, II or III). Promoters may be constitutive or inducible and, in some embodiments (particularly many embodiments in which selection is employed), the transcription of at least one component of the nucleic acid-guided nuclease editing system is under the control of an inducible promoter.


As used herein the term “selectable marker” refers to a gene introduced into a cell, which confers a trait suitable for artificial selection. General use selectable markers are well known to those of ordinary skill in the art. Drug selectable markers such as ampicillin/carbenicillin, kanamycin, chloramphenicol, erythromycin, tetracycline, gentamicin, bleomycin, streptomycin, rhamnose, puromycin, hygromycin, blasticidin, and G418 may be employed. In other embodiments, selectable markers include, but are not limited to, human nerve growth factor receptor (detected with a MAb, such as described in U.S. Pat. No. 6,365,373); truncated human growth factor receptor (detected with MAb); mutant human dihydrofolate reductase (DHFR; fluorescent MTX substrate available); secreted alkaline phosphatase (SEAP; fluorescent substrate available); human thymidylate synthase (TS; confers resistance to anti-cancer agent fluorodeoxyuridine); human glutathione S-transferase alpha (GSTA1; conjugates glutathione to the stem cell selective alkylator busulfan; chemoprotective selectable marker in CD34+ cells); CD24 cell surface antigen in hematopoietic stem cells; human CAD gene to confer resistance to N-phosphonacetyl-L-aspartate (PALA); human multi-drug resistance-1 (MDR-1; P-glycoprotein surface protein selectable by increased drug resistance or enriched by FACS); human CD25 (IL-2α; detectable by MAb-FITC); Methylguanine-DNA methyltransferase (MGMT; selectable by carmustine); and Cytidine deaminase (CD; selectable by Ara-C). “Selective medium” as used herein refers to cell growth medium to which has been added a chemical compound or biological moiety that selects for or against selectable markers.


The terms “target genomic DNA sequence”, “target sequence”, or “genomic target locus” refer to any locus in vitro or in vivo, or in a nucleic acid (e.g., genome) of a cell or population of cells, in which a change of at least one nucleotide is desired using a nucleic acid-guided nuclease editing system. The target sequence can be a genomic locus or extrachromosomal locus.


A “vector” is any of a variety of nucleic acids that comprise a desired sequence or sequences to be delivered to and/or expressed in a cell. Vectors are typically composed of DNA, although RNA vectors are also available. Vectors include, but are not limited to, plasmids, fosmids, phagemids, virus genomes, synthetic chromosomes, and the like. As used herein, the phrase “engine vector” refers to a vector comprising a coding sequence for a nuclease to be used in the nucleic acid-guided nuclease systems and methods of the present disclosure. The engine vector may also comprise, in a bacterial system, the λ Red recombineering system or an equivalent thereto. Engine vectors also typically comprise a selectable marker. As used herein the phrase “editing vector” refers to a vector comprising a donor nucleic acid, optionally including an alteration to the target sequence that prevents nuclease binding at a PAM or spacer in the target sequence after editing has taken place, and a coding sequence for a gRNA. The editing vector may also comprise a selectable marker and/or a barcode. In some embodiments, the engine vector and editing vector may be combined; that is, the contents of the engine vector may be found on the editing vector. Further, the engine and editing vectors comprise control sequences operably linked to, e.g., the nuclease coding sequence, recombineering system coding sequences (if present), donor nucleic acid, guide nucleic acid, and selectable marker(s).


Editing in Nucleic Acid-Guided Nuclease Genome Systems

RNA-guided nucleases (RGNs) have rapidly become the foundational tools for genome engineering of prokaryotes and eukaryotes. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems are adaptive immunity systems that protect prokaryotes against mobile genetic elements (MGEs). RGNs are a major part of this defense system because they identify and destroy MGEs. RGNs can be repurposed for genome editing in various organisms by reprogramming the CRISPR RNA (crRNA) that guides the RGN to a specific target DNA. A number of different RGNs have been identified to date for various applications; however, there are various properties that make some RGNs more desirable than others for specific applications. RGNs can be used for creating specific double-strand breaks (DSBs), creating specific nicks of one strand of DNA, or guiding another moiety to a specific DNA sequence.


The ability of an RGN to specifically target any genomic sequence is perhaps the most desirable feature of RGNs; however, RGNs can only access their desired target if the target DNA also contains a short motif called a PAM (protospacer adjacent motif) that is specific for every RGN. Type V RGNs such as MAD7, AsCas12a and LbCas12a tend to access DNA targets that contain YTTN/TTTN on the 5′ end, whereas type II RGNs, such as the MADzymes disclosed herein, target DNA sequences containing a specific short motif on the 3′ end. An example well known in the art for a type II RGN is SpCas9, which requires an NGG on the 3′ end of the target DNA. Type II RGNs, unlike type V RGNs, require a transactivating RNA (tracrRNA) in addition to a crRNA for optimal function. Compared to type V RGNs, the type II RGNs create a double-strand break closer to the PAM sequence, which is highly desirable for precise genome editing applications.
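The 5′ versus 3′ PAM placement described above can be sketched as a simple sequence scan. The following is an illustrative sketch only: the function name `find_targets`, the 20-nucleotide spacer length, and the restriction to a single strand are assumptions made for the example, and only the NGG, TTTN and YTTN motifs are taken from the text.

```python
import re

# IUPAC codes used in the PAMs named in the text (N = any base, Y = C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "Y": "[CT]"}


def find_targets(seq: str, pam: str, pam_side: str, spacer_len: int = 20):
    """Return (protospacer_start, protospacer, pam) tuples on the given strand.

    pam_side is "3prime" (PAM follows the protospacer, as for type II RGNs)
    or "5prime" (PAM precedes it, as for type V RGNs). spacer_len = 20 is an
    assumption for illustration, not a value taken from the disclosure.
    """
    seq = seq.upper()
    pam_re = "".join(IUPAC[c] for c in pam.upper())
    spacer_re = "[ACGT]{%d}" % spacer_len
    if pam_side == "3prime":
        pattern = "(?=(%s)(%s))" % (spacer_re, pam_re)
        spacer_group, pam_group = 1, 2
    elif pam_side == "5prime":
        pattern = "(?=(%s)(%s))" % (pam_re, spacer_re)
        spacer_group, pam_group = 2, 1
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    # The lookahead makes matches zero-width, so overlapping sites are reported.
    return [
        (m.start(spacer_group), m.group(spacer_group), m.group(pam_group))
        for m in re.finditer(pattern, seq)
    ]
```

For example, `find_targets(locus, "NGG", "3prime")` lists candidate SpCas9-style sites on the supplied strand, while `find_targets(locus, "TTTN", "5prime")` lists candidate type V-style sites. A fuller treatment would also scan the reverse complement.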


A number of type II RGNs have been discovered so far; however, their use in widespread applications is limited by restrictive PAMs. For example, the PAM of SpCas9 occurs less frequently in AT-rich regions of the genome. New type II RGNs with new and less restrictive PAMs are beneficial for the field. Further, not all type II nucleases are active in multiple organisms. For example, a number of RGNs have been discussed in the scientific literature but only a few have been demonstrated to be active in vitro and fewer still are active in cells, particularly in mammalian cells. The present disclosure identifies multiple type II RGNs that have novel PAMs and are active in mammalian cells.


In performing nucleic acid-guided nuclease editing, the type II RGNs or MADzymes may be delivered to cells to be edited as a polypeptide; alternatively, a polynucleotide sequence encoding the MADzyme is transformed or transfected into the cells to be edited. The polynucleotide sequence encoding the MADzyme may be codon optimized for expression in particular cells, such as archaeal, prokaryotic or eukaryotic cells. Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells. Eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammals including non-human primates. The choice of the MADzyme to be employed depends on many factors, such as what type of edit is to be made in the target sequence and whether an appropriate PAM is located close to the desired target sequence. The MADzyme may be encoded by a DNA sequence on a vector (e.g., the engine vector) and be under the control of a constitutive or inducible promoter. In some embodiments, the sequence encoding the nuclease is under the control of an inducible promoter, and the inducible promoter may be separate from but the same as an inducible promoter controlling transcription of the guide nucleic acid; that is, separate inducible promoters may drive the transcription of the nuclease and guide nucleic acid sequences but the two inducible promoters may be the same type of inducible promoter (e.g., both are pL promoters). Alternatively, the inducible promoter controlling expression of the nuclease may be different from the inducible promoter controlling transcription of the guide nucleic acid; that is, e.g., the nuclease may be under the control of the pBAD inducible promoter, and the guide nucleic acid may be under the control of the pL inducible promoter.


In general, a guide nucleic acid (e.g., gRNA) complexes with a compatible nucleic acid-guided nuclease and can then hybridize with a target sequence, thereby directing the nuclease to the target sequence. With the type II MADzymes described herein, the nucleic acid-guided nuclease editing system uses two separate guide nucleic acid components that combine and function as a guide nucleic acid; that is, a CRISPR RNA (crRNA) and a transactivating CRISPR RNA (tracrRNA). The gRNA may be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid or linear construct, or the coding sequence may reside within an editing cassette; in either case, the coding sequence is under the control of a constitutive promoter or, in some embodiments, an inducible promoter as described below.


A guide nucleic acid comprises a guide polynucleotide sequence having sufficient complementarity with a target sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence. The degree of complementarity between a guide sequence and the corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences. In some embodiments, a guide sequence is about or more than about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 or 15-20 nucleotides long, or 15, 16, 17, 18, 19, or 20 nucleotides in length.


In the present methods and compositions, the components of the guide nucleic acid are provided as a sequence to be expressed from a plasmid or vector, the sequence comprising both the guide sequence and the scaffold sequence as a single transcript under the control of a promoter, and in some embodiments, an inducible promoter. In general, to generate an edit in a target sequence, the gRNA/nuclease complex binds to a target sequence as determined by the guide RNA, and the nuclease recognizes a protospacer adjacent motif (PAM) sequence adjacent to the target sequence. The target sequence can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro. For example, the target sequence can be a polynucleotide residing in the nucleus of a eukaryotic cell. A target sequence can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide, an intron, a PAM, or “junk” DNA).


The guide nucleic acid may be part of an editing cassette that encodes the donor nucleic acid. Alternatively, the guide nucleic acid may not be part of the editing cassette and instead may be encoded on the engine or editing vector backbone. For example, a sequence coding for a guide nucleic acid can be assembled or inserted into a vector backbone first, followed by insertion of the donor nucleic acid in, e.g., the editing cassette. In other cases, the donor nucleic acid in, e.g., an editing cassette can be inserted or assembled into a vector backbone first, followed by insertion of the sequence coding for the guide nucleic acid. In yet other cases, the sequence encoding the guide nucleic acid and the donor nucleic acid (inserted, for example, in an editing cassette) are simultaneously but separately inserted or assembled into a vector. In yet other embodiments, the sequence encoding the guide nucleic acid and the sequence encoding the donor nucleic acid are both included in the editing cassette.


The target sequence is associated with a PAM, which is a short nucleotide sequence recognized by the gRNA/nuclease complex. The precise PAM sequence and length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence. Engineering of the PAM-interacting domain of a nucleic acid-guided nuclease may allow for alteration of PAM specificity, improve fidelity, or decrease fidelity. In certain embodiments, the genome editing of a target sequence both introduces a desired DNA change to a target sequence, e.g., the genomic DNA of a cell, and removes, mutates, or renders inactive a protospacer adjacent motif (PAM) region in the target sequence. Rendering the PAM at the target sequence inactive precludes additional editing of the cell genome at that target sequence, e.g., upon subsequent exposure to a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid in later rounds of editing. Thus, cells having the desired target sequence edit and an altered PAM can be selected using a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid complementary to the target sequence. Cells that did not undergo the first editing event will be cut, resulting in a double-stranded DNA break, and thus will not continue to be viable. The cells containing the desired target sequence edit and PAM alteration will not be cut, as these edited cells no longer contain the necessary PAM site, and will continue to grow and propagate.
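The selection logic described above, in which an inactivated PAM protects edited cells from being cut again, can be sketched as follows. This is a toy model under stated assumptions: the function names are hypothetical, the default PAM pattern is NGG (the SpCas9 PAM cited earlier, used only as a stand-in and not the PAM of any MADzyme), the protospacer is matched exactly, and only one strand is examined.

```python
import re


def is_cleavable(locus: str, protospacer: str, pam_regex: str = "[ACGT]GG") -> bool:
    """True if the protospacer is present and immediately followed (3') by a PAM.

    The default pam_regex encodes NGG as a stand-in PAM for illustration.
    """
    locus = locus.upper()
    protospacer = protospacer.upper()
    start = locus.find(protospacer)
    while start != -1:
        pam_start = start + len(protospacer)
        # re.match anchors at the start of the slice, i.e. right after the protospacer.
        if re.match(pam_regex, locus[pam_start:]):
            return True
        start = locus.find(protospacer, start + 1)
    return False


def surviving_cells(population: dict, protospacer: str) -> list:
    """Names of cells whose locus would escape cutting in a later round."""
    return [
        name
        for name, locus in population.items()
        if not is_cleavable(locus, protospacer)
    ]
```

In this model, a cell whose locus still carries the protospacer followed by an intact PAM is treated as cut and non-viable, whereas a cell whose PAM was altered by the edit is returned by `surviving_cells`.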


As mentioned previously, the range of target sequences that nucleic acid-guided nucleases can recognize is constrained by the need for a specific PAM to be located near the desired target sequence. As a result, it often can be difficult to target edits with the precision that is necessary for genome editing. It has been found that nucleases can recognize some PAMs very well (e.g., canonical PAMs), and other PAMs less well or poorly (e.g., non-canonical PAMs). Because the mined MAD nucleases disclosed herein may recognize different PAMs, the mined MAD nucleases increase the number of target sequences that can be targeted for editing; that is, mined MAD nucleases decrease the regions of “PAM deserts” in the genome. Thus, the mined MAD nucleases expand the scope of target sequences that may be edited by increasing the number (variety) of PAM sequences recognized. Moreover, cocktails of mined MAD nucleases may be delivered to cells such that target sequences adjacent to several different PAMs may be edited in a single editing run.


Another component of the nucleic acid-guided nuclease system is the donor nucleic acid. In some embodiments, the donor nucleic acid is on the same polynucleotide (e.g., editing vector or editing cassette) as the guide nucleic acid and may be (but not necessarily) under the control of the same promoter as the guide nucleic acid (e.g., a single promoter driving the transcription of both the guide nucleic acid and the donor nucleic acid). For cassettes of this type, see U.S. Pat. Nos. 10,240,167; 10,266,849; 9,982,278; 10,351,877; 10,364,442; 10,435,715; and 10,465,207. The donor nucleic acid is designed to serve as a template for homologous recombination with a target sequence nicked or cleaved by the nucleic acid-guided nuclease as a part of the gRNA/nuclease complex. A donor nucleic acid polynucleotide may be of any suitable length, such as about or more than about 20, 25, 50, 75, 100, 150, 200, 500, or 1000 nucleotides in length. In certain preferred aspects, the donor nucleic acid can be provided as an oligonucleotide of between 20-300 nucleotides, more preferably between 50-250 nucleotides. The donor nucleic acid comprises a region that is complementary to a portion of the target sequence (e.g., a homology arm). When optimally aligned, the donor nucleic acid overlaps with (is complementary to) the target sequence by, e.g., about 20, 25, 30, 35, 40, 50, 60, 70, 80, 90 or more nucleotides. In many embodiments, the donor nucleic acid comprises two homology arms (regions complementary to the target sequence) flanking the mutation or difference between the donor nucleic acid and the target template. The donor nucleic acid comprises at least one mutation or alteration compared to the target sequence, such as an insertion, deletion, modification, or any combination thereof compared to the target sequence.
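As a minimal sketch of the donor design just described, the following Python function assembles a donor from two homology arms flanking the desired alteration. The function name `build_donor`, its parameters, and the 30-nucleotide arm length are hypothetical and chosen for illustration; the sketch works on a single strand and does not model strand orientation, PAM-altering edits, or synthesis constraints.

```python
def build_donor(target, edit_start, edit_end, replacement, arm_len=30):
    """Return left homology arm + replacement + right homology arm.

    target      : sequence surrounding the intended edit
    edit_start  : 0-based index of the first base to be replaced
    edit_end    : 0-based index one past the last base to be replaced
                  (edit_start == edit_end gives a pure insertion)
    replacement : new sequence (empty string gives a pure deletion)
    arm_len     : length of each homology arm (illustrative assumption)
    """
    if not 0 <= edit_start <= edit_end <= len(target):
        raise ValueError("edit coordinates fall outside the target sequence")
    if edit_start < arm_len or len(target) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = target[edit_start - arm_len:edit_start]
    right_arm = target[edit_end:edit_end + arm_len]
    return left_arm + replacement + right_arm
```

With a 30-nucleotide arm on each side and a 3-nucleotide substitution, the resulting donor is 63 nucleotides long, which falls within the 50-250 nucleotide range noted above.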


Often the donor nucleic acid is provided as an editing cassette, which is inserted into a vector backbone where the vector backbone may comprise a promoter driving transcription of the gRNA and the coding sequence of the gRNA, or the vector backbone may comprise a promoter driving the transcription of the gRNA but not the gRNA itself. Moreover, there may be more than one, e.g., two, three, four, or more guide nucleic acid/donor nucleic acid cassettes inserted into an engine vector, where each guide nucleic acid is under the control of separate different promoters, separate like promoters, or where all guide nucleic acid/donor nucleic acid pairs are under the control of a single promoter. In some embodiments the promoter driving transcription of the gRNA and the donor nucleic acid (or driving more than one gRNA/donor nucleic acid pair) is an inducible promoter. Inducible editing is advantageous in that isolated cells can be grown for several to many cell doublings to establish colonies before editing is initiated, which increases the likelihood that cells with edits will survive, as the double-strand cuts caused by active editing are largely toxic to the cells. This toxicity results in both cell death in the edited colonies and a lag in growth for the edited cells that do survive but must repair and recover following editing. However, once the edited cells have a chance to recover, the size of the colonies of the edited cells will eventually catch up to the size of the colonies of unedited cells. See, e.g., U.S. Pat. Nos. 10,533,152; 10,550,363; 10,532,324; 10,633,626; 10,633,627; 10,647,958; 10,760,043; 10,723,995; 10,801,008; and 10,851,339. Further, a guide nucleic acid may be efficacious in directing the edit of more than one donor nucleic acid in an editing cassette; e.g., if the desired edits are close to one another in a target sequence.


In addition to the donor nucleic acid, an editing cassette may comprise one or more primer sites. The primer sites can be used to amplify the editing cassette by using oligonucleotide primers; for example, if the primer sites flank one or more of the other components of the editing cassette.


In addition, the editing cassette may comprise a barcode. A barcode is a unique DNA sequence that corresponds to the donor DNA sequence such that the barcode can identify the edit made to the corresponding target sequence. The barcode typically comprises four or more nucleotides. In some embodiments, the editing cassettes comprise a collection of donor nucleic acids representing, e.g., gene-wide or genome-wide libraries of donor nucleic acids. The library of editing cassettes is cloned into vector backbones where, e.g., each different donor nucleic acid is associated with a different barcode.
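The one-to-one association between barcodes and donor nucleic acids in a library can be sketched in Python as follows. The function name `assign_barcodes` and the simple enumeration of barcodes in alphabetical order are assumptions made for illustration; a practical library design would likely also screen barcodes for, e.g., homopolymer runs or similarity to one another, which this sketch does not attempt.

```python
from itertools import product

def assign_barcodes(donors, barcode_len=4):
    """Map a distinct barcode of barcode_len nucleotides to each distinct donor.

    Returns a dict {barcode: donor}, so that reading a barcode identifies
    the donor (and therefore the intended edit) it was cloned with.
    """
    if barcode_len < 4:
        raise ValueError("this sketch uses barcodes of four or more nucleotides")
    unique_donors = list(dict.fromkeys(donors))  # de-duplicate, keep order
    if len(unique_donors) > 4 ** barcode_len:
        raise ValueError("not enough distinct barcodes for this library size")
    barcodes = ("".join(p) for p in product("ACGT", repeat=barcode_len))
    return dict(zip(barcodes, unique_donors))
```

A four-nucleotide barcode distinguishes at most 4^4 = 256 donors, so gene-wide or genome-wide libraries would call for longer barcodes.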


Additionally, in some embodiments, an expression vector or cassette encoding components of the nucleic acid-guided nuclease system further encodes one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the nuclease comprises NLSs at or near the amino-terminus of the MADzyme, NLSs at or near the carboxy-terminus of the MADzyme, or a combination.


The engine and editing vectors comprise control sequences operably linked to the component sequences to be transcribed. As stated above, the promoters driving transcription of one or more components of the mined MAD nuclease editing system may be inducible, and an inducible system is likely employed if selection is to be performed. A number of gene regulation control systems have been developed for the controlled expression of genes in plant, microbe, and animal cells, including mammalian cells. These include the pL promoter (induced by heat inactivation of the CI857 repressor), the pBAD promoter (induced by the addition of arabinose to the cell growth medium), and the rhamnose inducible promoter (induced by the addition of rhamnose to the cell growth medium). Other systems include the tetracycline-controlled transcriptional activation system (Tet-On/Tet-Off, Clontech, Inc. (Palo Alto, Calif.); Bujard and Gossen, PNAS, 89(12):5547-5551 (1992)), the Lac Switch Inducible system (Wyborski et al., Environ Mol Mutagen, 28(4):447-58 (1996); DuCoeur et al., Strategies 5(3):70-72 (1992); U.S. Pat. No. 4,833,080), the ecdysone-inducible gene expression system (No et al., PNAS, 93(8):3346-3351 (1996)), the cumate gene-switch system (Mullick et al., BMC Biotechnology, 6:43 (2006)), and the tamoxifen-inducible gene expression system (Zhang et al., Nucleic Acids Research, 24:543-548 (1996)), as well as others.


Typically, performing genome editing in live cells entails transforming cells with the components necessary to perform nucleic acid-guided nuclease editing. For example, the cells may be transformed simultaneously with separate engine and editing vectors; the cells may already be expressing the mined MAD nuclease (e.g., the cells may have already been transformed with an engine vector or the coding sequence for the mined MAD nuclease may be stably integrated into the cellular genome) such that only the editing vector needs to be transformed into the cells; or the cells may be transformed with a single vector comprising all components required to perform nucleic acid-guided nuclease genome editing.


A variety of delivery systems can be used to introduce (e.g., transform or transfect) nucleic acid-guided nuclease editing system components into a host cell. These delivery systems include the use of yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires, and exosomes. Alternatively, molecular trojan horse liposomes may be used to deliver nucleic acid-guided nuclease components across the blood brain barrier. Of particular interest is the use of electroporation, particularly flow-through electroporation (either as a stand-alone instrument or as a module in an automated multi-module system) as described in, e.g., U.S. Pat. Nos. 10,435,713; 10,443,074; 10,323,258; and 10,415,058.


After the cells are transformed with the components necessary to perform nucleic acid-guided nuclease editing, the cells are cultured under conditions that promote editing. For example, if constitutive promoters are used to drive transcription of the mined MAD nucleases and/or gRNA, the transformed cells need only be cultured in a typical culture medium under typical conditions (e.g., temperature, CO2 atmosphere, etc.). Alternatively, if editing is inducible (by, e.g., activating inducible promoters that control transcription of one or more of the components needed for nucleic acid-guided nuclease editing, such as, e.g., transcription of the gRNA, donor DNA, nuclease, or, in the case of bacteria, a recombineering system), the cells are subjected to inducing conditions.


EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent or imply that the experiments below are all of or the only experiments performed. It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific aspects without departing from the spirit or scope of the invention as broadly described. The present aspects are, therefore, to be considered in all respects as illustrative and not restrictive.


Example 1
Exemplary Workflow Overview

The disclosed MADzyme Type II CRISPR enzymes were identified by the method depicted in FIG. 1. FIG. 1 shows an exemplary workflow for creating and for in vitro screening of MADzymes, including those in untapped clusters. In a first step, metagenome mining was performed to identify putative RGNs of interest based on, e.g., sequence (HMMER profile) and a search for CRISPR arrays. Once putative RGNs of interest were identified in silico, candidate pools were created and each MADzyme was identified by cluster, the tracrRNA was identified, and the sgRNA structure was predicted. Final candidates were identified, then the genes were synthesized. An in vitro depletion test was performed (see FIG. 2), where a synthetic target library was constructed in which to test target depletion for each of the candidate MADzymes. After target depletion, amplicons were produced for analysis for in vivo analysis. FIG. 2 depicts the in vitro depletion test in more detail.
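One way the read-out of a target depletion test might be scored is sketched below in Python. The scoring scheme is an assumption made for illustration and is not taken from FIG. 1 or FIG. 2: it compares the frequency of each library member in an untreated control with its frequency after exposure to a candidate nuclease, and reports a log2 ratio. The function name `depletion_scores`, the count dictionaries, and the pseudocount are hypothetical.

```python
import math

def depletion_scores(control_counts, treated_counts, pseudocount=1.0):
    """Return {library_member: log2(control_fraction / treated_fraction)}.

    Positive scores indicate members that became rarer after nuclease
    treatment (i.e., were depleted). The pseudocount avoids division by
    zero for members that disappear entirely (assumption of this sketch).
    """
    members = set(control_counts) | set(treated_counts)
    c_total = sum(control_counts.values()) + pseudocount * len(members)
    t_total = sum(treated_counts.values()) + pseudocount * len(members)
    scores = {}
    for member in members:
        c_frac = (control_counts.get(member, 0) + pseudocount) / c_total
        t_frac = (treated_counts.get(member, 0) + pseudocount) / t_total
        scores[member] = math.log2(c_frac / t_frac)
    return scores
```

Under these assumptions, library members keyed by their PAM that score well above zero would be candidates for PAMs recognized by the tested nuclease, while members scoring near or below zero would not.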


Example 2
Metagenome Mining

The NCBI Metagenome database was used to search for novel, putative CRISPR nucleases using HMMER hidden Markov model searches. Hundreds of potential nucleases were identified. For each potential nuclease candidate, putative CRISPR arrays were identified and CRISPR repeats and anti-repeats were identified. Thirteen nucleases (FIG. 3) were chosen for in vitro validation and 11 active MADzymes were identified and assigned to clusters. There was less than 40% sequence identity between clusters. Cluster 59 shown in FIG. 4 presents two unique subclusters with distinct sgRNA architecture. Clusters 55-57 are shown in FIG. 5. These new MADzymes have diverse PAM preferences and distinct sgRNA structure. Cluster 141 (FIG. 6) is a distant cluster from 55, 56, 57 and 59 and shows diverse Cas protein structure and smaller-sized enzymes (e.g., approximately 200 amino acids shorter than the counterparts from the 55, 56, 57 and 59 clusters). Table 1 lists the identified MADzymes, including amino acid sequences, origin, and nucleic acid sequences of the CRISPR RNA and the trans-activating CRISPR RNA.
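For orientation, a percent-identity figure such as the 40% threshold mentioned above can be computed from a pairwise alignment as sketched below in Python. The definition used here (identical positions divided by alignment columns in which at least one sequence has a residue) is an assumption; the source does not state which definition or alignment tool was used, and other conventions give somewhat different values. The function name `percent_identity` is hypothetical, and the inputs are assumed to be already aligned and of equal length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; columns where only
    one sequence has a gap count toward the denominator as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        columns += 1
        if x == y:  # both are residues here, since gap/gap was skipped
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns
```

As a toy example, `percent_identity("MKK-YS", "MTKPYS")` counts four identical positions over six columns, or about 66.7%.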
















TABLE 1

MAD name | Cluster | Contig_id | Organism (metagenome) | Source | aa_seq | CRISPR repeat | tracrRNA

MAD name: MAD2015
Cluster: 59
Contig_id: DPZI01000013.1
Organism (metagenome): Vagococcus sp.
Source: not given
CRISPR repeat [SEQ ID NO. 2]: GTTTTAGAGCTATGCTGTTTTGAATGCTTCCAAAAC
tracrRNA [SEQ ID NO. 3]: TGTTGGTAGCATTCAAAACAACATAGCAAGTTAAAATAAGGCTTTGTCCGTTCTCAACTTTTAGTGACGCTGTTTCGGCG
aa_seq [SEQ ID NO. 1]:
MGKNYTIGLDIGTNSVGWSVVTENQQLVKKRMKIRGDS
EKKQVKKNFWGVRLFDEGETAEATRLKRTTRRRYTRRR
NRVVDLQNIFKDEINQKDSNFFNRLNESFLVVEDKKQP
KQMIFGTVEEEASYHESFPTIYHLRKELVDNKDQADIR
LVYLAMAHMIKYRGHFLIEGQLSTENTSVEEKFHLFLK
EYNSTFCKQEDGSLVNPVNEDINGEEILMGTLSRSKKA
EQIMKSFEGEKSNGVFSQFLKMIVGNQGNFKKAFNLEE
DAKIQFAKEEYDEDLTTLLSNIGDEYANVFSLAKETYE
AIELSGILSTKDKETYAKLSSSMTERYEDHEKDLASLK
SFFREHLPEKYAVMFKDVSKNGYAGYIENSNKISQEEF
YKYTKKLIGQIEGADYFIKKMEQEAFLRKQRTYDNGVI
PYQVHLSELTHIINNQKKYYPFLLEKEEEIKSILTFKI
PYYIGPLAKGNSDFAWLIRNSNDKITPSNFNEVLDIEN
SASQFIERMTNNDVYLPEEKVLPKNSMLYQKYIVFNEL
TKVRYINDRGTECNFSGEEKLQIFERFFKDSSTKVKKV
SLENYLNKEYMIESPTIKGIEDDFNASFRTYHDFIKLG
VSREMLDDIDNEEMFEDIVKILTIFEDRQMIKKQLEKY
KDVFDSDILKKMVRRHYTGWGRLSKKLLHEMKDDNSGK
TILDYLIEDDRLPKHINRNFMQLINDSNLSFKEKIEKA
QLTDGTEDIDSVVKNLIGSPAIKKGISQSLKIVEELVS
IMGYQPTSIVVEMARENQTTSKGKRQSIQRYKRLEAAI
NELGSDLLKVCPTDNHALKDDRLYLYYLQNGRDMYTGL
ELDIHNLSQYDIDHIVPRSFITDNSIDNRVLVSSKKNR
GKLDNVPSKEIVQKNKLLWMNLKKSKLMSEKKYANLIK
GETGGLTEDDKAKFLNRQLVETRQITKNVAQILDQRFN
TQKDEKGNIIREVKVITLKSALVSQFRQNFEFYKVREV
NDFHHANDAYLNAVVANTLLKVYPKLTPDFVYGEYRKG
NPFKNTKATAKKHYYSNIMENLCHETTIIDDETGEILW
DKKCIGTIKQVLNYHQVNVVKKVETQTGRFSEETLVPR
GSTKNPIALKSHLDPQKYGGFKSPTIAYTIVIEYKKGK
KDILIKELLGISIMNRGAFEKNNKEYLEKLNYKEPRVL
MVLPKYSLFELENGRRRLLASDKESQKGNQMAVPSYLN
NLLYHTNKSLSKNAKSLEYVNEHRQQFEELLEEIIDFA
NQFTLAEKNTLLIADLYESNKEADIELLASSFINLLRF
NQMGAPAEFSFFEKPIPRKRYSSTFELLKGKVIHQSIT
GLYETHQKV







MAD name: MAD2016
Cluster: 59
Contig_id: DGLK01000042.1
Organism (metagenome): Enterococcus faecalis
Source: New York City MTA subway
CRISPR repeat [SEQ ID NO. 5]: GTTTTAGAGTCATGTTGTTTAGAATGGTACCAAAAC
tracrRNA [SEQ ID NO. 6]: TCTTTTGGGACTATTCTAAACAACATAGCAAGTTAAAATAAGGTTTTAACCGTAATCAACTGTAAAGTGGCGCTGTTTCGGCGC
aa_seq [SEQ ID NO. 4]:
MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNT
EKKKIKKNFWGVRLFEEGHTAEDRRLKRTARRIISRRR
NRLRYLQAFFEEAMTDLDENFFARLQESFLVPEDKKWH
RHPIFAKLEDEVAYHETYPTIYHLRKKLADSSEQADLR
LIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFMI
IYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKS
EKVLQQFPQEKANGLFGQFLKLMVGNKADFKKVFGLEE
EAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYD
AVELSTILADSDKKSHAKLSSSMIVRFTEHQEDLKKFK
RFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQLKF
YQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVI
PHQIHLAELQAIIHRQAAYYPFLKENQEKIEQLVTFRI
PYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLD
QSATAFIERMTNFDTYLPSEKVLPKHSLLYEKFMVFNE
LTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKVKKK
DIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKC
GLTRAELDHPDNAEKLEDIIKILTIFEDRQRIRTQLST
FKGQFSAEVLKKLERKHYTGWGRLSKKLINGIYDKESG
KTILGYLIKDDGVSKHYNRNFMQLINDSQLSFKNAIQK
AQSSEHEETLSETVNELAGSPAIKKGIYQSLKIVDELV
AIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKA
MAEIGSNLLKEQPTTNEQLRDTRLFLYYMQNGKDMYTG
DELSLHRLSHYDIDHIIPQSFMKDDSLDNLVLVGSTEN
RGKSDDVPSKEVVKDMKAYWEKLYAAGLISQRKFQRLT
KGEQGGLTLEDKAHFIQRQLVETRQITKNVAGILDQRY
NANSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYH
HGQDAYLNCVVATTLLKVYPNLAPEFVYGEYPKFQTFK
ENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYL
KTIKKELNYHQMNIVKKVEVQKGGFSKESIKPKGPSNK
LIPVKNGLDPQKYGGFDSPIVAYTVLFTHEKGKKPLIK
QEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPK
YTLYEFPEGRRRLLASAKEAQKGNQMVLPEHLLTLLYH
AKQCLLPNQSESLTYVEQHQPEFQEILERVVDFAEVHT
LAKSKVQQIVKLFEANQTADVKEIAASFIQLMQFNAMG
APSTFKFFQKDIERARYTSIKEIFDATIIYQSTTGLYE
TRRKVVD







MAD name: MAD2017
Cluster: 59
Contig_id: DMKA01000006.1
Organism (metagenome): Streptococcus sp. (firmicutes)
Source: not given
CRISPR repeat [SEQ ID NO. 8]: GTTTTAGAGCTGTGCTGTTTCGAATGGTTCCAAAAC
tracrRNA [SEQ ID NO. 9]: TGTTGGAACTATTCGAAACAACACAGCGAGTTAAAATAAGGCTTTGTCCGTACACAACTTGTAAAAGGGGCACCCGATTCGGGTGCA
aa_seq [SEQ ID NO. 7]:
MKKPYSIGLDIGTNSVGWAVITDDYKVPAKKMKVLGNT
DKKYIKKNLLGALLFDSGETAEVTRLKRTARRRYTRRK
NRLRYLQEIFAKEMTKVDESFFQRLEESFLTDDDKTFD
SHPIFGNKAEEDAYHQKFPTIYHLRKYLADSQEKADLR
LVYLALAHMIKYRGHFLIEGELNAENTDVQKLFNVFVE
TYDKIVDESHLSEIEVDASSILTEKVSKSRRLENLIKQ
YPTEKKNTLFGNLIALALGLQPNFKTNFKLSEDAKLQF
SKDTYEEDLEELLGKVGDDYADLFISAKNLYDAILLSG
ILTVDDNSTKAPLSASMIKRYVEHHEDLEKLKEFIKIN
KLKLYHDIFKDKTKNGYAGYIDNGVKQDEFYKYLKTIL
TKIDDSDYFLDKIERDDFLRKQRTFDNGSIPHQIHLQE
MHSILRRQGEYYPFLKENQAKIEKILTFRIPYYVGPLA
RKDSRFAWANYHSDEPITPWNFDEVVDKEKSAEKFITR
MTLNDLYLPEEKVLPKHSHVYETFTVYNELTKIKYVNE
QGESFFFDANMKQEIFDHVFKENRKVTKAKLLSYLNNE
FEEFRINDLIGLDKDSKSFNASLGTYHDLKKILDKSFL
DDKTNEQIIEDIVLTLTLFEDRDMIHERLQKYSDFFTS
QQLKKLERRHYTGWGRLSYKLINGIRNKENNKTILDFL
IDDGHANRNFMQLINDESLSFKTIIQEAQVVGDVDDIE
AVVHDLPGSPAIKKGILQSVKIVDELVKVMGDNPDNIV
IEMARENQTTGYGRNKSNQRLKRLQDSLKEFGSDILSK
KKPSYVDSKVENSHLQNDRLFLYYIQNGKDMYTGEELD
IDRLSDYDIDHIIPQAFIKDNSIDNKVLTSSAKNRGKS
DDVPSIEIVRNRRSYWYKLYKSGLISKRKFDNLTKAER
GGLTEADKAGFIKRQLVETRQITKHVAQILDARFNTKR
DENDKVIRDVKVITLKSNLVSQFRKEFKFYKVREINDY
HHANDAYLNAVVGTALLKKYPKLTPEFVYGEYKKYDVR
KLIAKSSDDYSEMGKATAKYFFYSNLMNFFKTEVKYAD
GRVFERPDIETNADGEVVWNKQKDFDIVRKVLSYPQVN
IVKKVEAQTGGFSKESILSKGDSDKLIPRKTKKVYWNT
KKYGGFDSPTVAYSVLVVADIEKGKAKKLKTVKELVGI
SIMERSFFEENPVSFLEKKGYHNVQEDKLIKLPKYSLF
EFEGGRRRLLASATELQKGNEVMLPAHLVELLYHAHRI
DSFNSTEHLKYVSEHKKEFEKVLSCVENFSNLYVDVEK
NLSKVRAAAESMTNFSLEEISASFINLLTLTALGAPAD
FNFLGEKIPRKRYTSTKECLSATLIHQSVTGLYETRID
LSKLGEE







MAD name: MAD2019
Cluster: 59
Contig_id: DOTL01000042.1
Organism (metagenome): Streptococcus sp. (firmicutes)
Source: not given
CRISPR repeat [SEQ ID NO. 11]: GTTTTAGAGCTGTGTTGTTTCGAATGGTTCCAAAAC
tracrRNA [SEQ ID NO. 12]: GGTTTGAAACCATTCGAAACAATACAGCAAAGTTAAAATAAGGCTAGTCCGTATACAACGTGAAAACACGTGGCACCGATTCGGTGC
aa_seq [SEQ ID NO. 10]:
MTKPYSIGLDIGTNSVGWAVITDDYKVPSKKMKVLGNT
SKKYIKKNLLGALLFDSGITAEGRRLKRTARRRYTRRR
NRILYLQEIFSTEMATLDDAFFQRLDDSFLVPDDKRDS
KYPIFGNLVEEKAYHDEFPTIYHLRKYLADSTKKADLR
LVYLALAHMIKYRGHFLIEGEFNSKNNDIQKNFQDFLD
TYNAIFESDLSLENSKQLEEIVKDKISKLEKKDRILKL
FPGEKNSGIFSEFLKLIVGNQADFKKYFNLDEKASLHF
SKESYDEDLETLLGYIGDDYSDVFLKAKKLYDAILLSG
ILTVTDNGTETPLSSAMIMRYKEHEEDLGLLKAYIRNI
SLKTYNEVFNDDTKNGYAGYIDGKTNQEDFYVYLKKLL
AKFEGADYFLEKIDREDFLRKQRTFDNGSIPYQIHLQE
MRAILDKQAKFYPFLAKNKERIEKILTFRIPYYVGPLA
RGNSDFAWSIRKRNEKITPWNFEDVIDKESSAEAFINR
MTSFDLYLPEEKVLPKHSLLYETFTVYNELTKVRFIAE
GMSDYQFLDSKQKKDIVRLYFKGKRKVKVTDKDIIEYL
HAIDGYDGIELKGIEKQFNSSLSTYHDLLNIINDKEFL
DDSSNEAIIEEIIHTLTIFEDREMIKQRLSKFENIFDK
SVLKKLSRRHYTGWGKLSAKLINGIRDEKSGNTILDYL
IDDGISNRNFMQLIHDDALSFKKKIQKAQIIGDKDKDN
IKEVVKSLPGSPAIKKGILQSIKIVDELVKVMGRKPES
IVVEMARENQYTNQGKSNSQQRLKRLEESLEELGSKIL
KENIPAKLSKIDNNSLQNDRLYLYYLQNGKDMYTGDDL
DIDRLSNYDIDHIIPQAFLKDNSIDNKVLVSSASNRGK
SDDVPSLEVVKKRKTLWYQLLKSKLISQRKFDNLTKAE
RGGLSPEDKAGFIQRQLVETRQITKHVARLLDEKFNNK
KDENNRAVRTVKIITLKSTLVSQFRKDFELYKVREIND
FHHAHDAYLNAVVASALLKKYPKLEPEFVYGDYPKYNS
FRERKSATEKVYFYSNIMNIFKKSISLADGRVIERPLI
EVNEETGESVWNKESDLATVRRVLSYPQVNVVKKVEVQ
SGGFSKELVQPHGNSDKLIPRKTKKMIWDTKKYGGFDS
PIVAYSVLVMAEREKGKSKKLKPVKELVRITIMEKESF
KENTIDFLERRGLRNIQDENIILLPKFSLFELENGRRR
LLASAKELQKGNEFILPNKLVKLLYHAKNIHNTLEPEH
LEYVESHRADFGKILDVVSVFSEKYILAEAKLEKIKEI
YRKNMNTEIHEMATAFINLLTFTSIGAPATFKFFGHNI
ERKRYSSVAEILNATLIHQSVTGLYETRIDLGKLGED







MAD name: MAD2020
Cluster: 55
Contig_id: DQFW01000027.1
Organism (metagenome): Acholeplasmatales bacterium
Source: human gut
CRISPR repeat [SEQ ID NO. 14]: GTTTGCTAGTTATGTTATTTATAGTATTAAGCAAAC
tracrRNA [SEQ ID NO. 15]: TGTAAATAACATAACGAGTGCAAATAAGCGTTTCGCGAAAATTTACAGTGGCCCTGCTGTGGGGCCTTTTTTATTTATCAAA
aa_seq [SEQ ID NO. 13]:
MKNNEETLKKLRLGLDIGTNSVGYALLDENNKLIKKNG
HTFWGVRMFDEAETAKDRGSYRKSRRRLLRRKERMEIL
RSFFTKEICDIDPTFFERLDDSFYYKEDKKNKNTYNLF
TSEYTDKDFYLEYPTIYHLRKAMQEEDKKFDIRMVYLA
IAHIIKYRGNFLYPGEEFSTSEYTSIKQFFLDFNDILD
ELSNELEDNEDYSAEYFDKIENINDDFLEKLKVILMEI
KGISNKKKELLDLFNVNKKSIYNELVIPFISGSAKVNI
SSLSVIKNSKYPKTEISLGSEELEGQVEEAISVAPEIK
SVLEMIIKIKEISDFYFINKILSDSKTISESMVKMYDE
HNEDLKKLKGFFKKYAEDQYNEIFKIRDEKLANYVAYV
GFNKLRKNKVERFKHASREEFYGYLKQKLNNIKYAEAQ
EEIKYFIDKIDNNEFLLKQNSNQNGAFPMQLHLKELKT
ILNNQEKYYPFLSEGNDGYSIKEKIILTFKYKIPYYVG
PLNKESKYSWVVREDEKIYPWNFDKVVKLDETAEKFIL
RMQNKCTYLKGDNDYCLPKNSLIFSEYSCLSYLNKLSI
NGKPIDPIMKSKIFNEVFLIKKQPTKKDIIEFIKTNYN
ADALTTTEKELPEATCNMASYIKMKEIFGKDFNDNKEM
IENIIKDITIFEDKSILGNRLKELYKLNNDRIKQIKGL
NYKGYSRLSKNLLVGLQIVDNQTGEIKGNVIEVMRKTN
LNLQEILYLDGYRLIDAIDEYNRKNSLNDSYLCARDYI
AENLVISPSFKRALIQTCSIIQEIERIFHKKIDEFYVE
VTRTNKDKNKGKTTSSRYDKIKKIYSSCQELAMAYNFD
MKRLKNELESNKDNLKSDILYFYFTQLGKCMYSLEDID
ISDLTNNYHYDIDHIYPQSIIKDDSLSNRVLVDKKKNA
AKTDKFLFEAKVLNPKAQQFYKKLLSLELISKEKYRRL
TQKEISKDELEGFVNRQLVSTNQSVMGLIKLLKEYYKV
DEKNIIYSKGENVSDFRHTFDLVKSRTANNFHHANDAY
LNVVVGGILNKYYTSRRFYQFSDIARIENEGESLNPSR
IFTKRDILKANGKVIWDKKEDIKRIEKDLYHRFDITET
IRTYNPNKMYSKVTILPKGEGESAVPFQTTTPRVDVEK
YGGITSNKFSRYVIIEAHGKKGLDTILEAIPKTACGDN
NKIEKDIDNYIASLDEYQKYTSYKVVNYNIKANVVIQE
GSFKYIITGKSGNQYVLQNVQDRFFSKKAMITIKNIDK
YLNNKKLGIIMAKDNEKIIVSPARGKNNEEIFFEKTEL
VNLLKEIKTMYSKDIYSFSAIQNIVNNIDCSIDYSIDD
FIIICNNLLQILKTNERKNADLRLIHLSGNSGTLYLGK
KLKSGMKFIWQSITGYYEEILYEVK







MAD name: MAD2021
Cluster: 57
Contig_id: DEED01000018.1
Organism (metagenome): Lachnospiraceae bacterium
Source: not given
CRISPR repeat [SEQ ID NO. 17]: GTTTGAGAGCCTTGTAAAACCGTATATCTCTCAAGC
tracrRNA [SEQ ID NO. 18]: GATAATGTTTTACAAGGCGAGTTCAAATAAGGATTTATCCGAAATCGCTTGCGTGCATTGGCACCATCTATCTTTTAAGACTTTCTTTGAAAGTCTT
aa_seq [SEQ ID NO. 16]:
MSEKYFVGLDMGTSSVGWAVTDEHYHLLRRKGKDLWGA
RLFDEAETAAGRRTNRVSRRRLARQRARIGWLKELFRP
YLEEKDAGFLQRLEESRFFLEDKTVKQPYALFSDKEFT
DKDYYQKYPTIFHLRKELLESKAPHDVRLVFLAVLNMY
AHRGHFLNPELQEGTLGDIHDLLSRLDAYIQDLFEDQG
WSILENVEEQQKVLAEKNISNTVRLEKILSAIGTSPKD
KEKKPLIEIYKLICGLKGSLSLAFSGVEMNETDAQMKF
SFSDSNLEENEPEIERILGERYFEMYSILKEIHAWGLL
SEIMSDDSGKTYPYISYAKVDLYQKHHEQLRMLKKIIR
TYAPDEYHRMFRSMEDNTYSAYVGSVNSKNKKQRRGAK
STDFFKEVKRIIEKIEKEHGELPECEEILDLIARDSFL
PKQLTTANGVIPNQVYATELRQIVTNAAAYLPFLNDKD
DTGLTNAEKIVEMFKFHIPYYIGPLKNDGNGTAWVVRK
QQGTVYPWNIDEKVDMAKTRDQFILNLVRKCSYLNDET
VLPASSLLYEKFKVLNELNNLTINGQKISVELKQDIFR
DLFRATGKRVTTRKLMGYLRRKAVIDADADETCLEGFD
KTQGGFVSTLSSYHKFMEIFSTDVLTDRQREIAEGAIY
FATVYGEDKSFLKKVLRDKFSPAELSQAQIDRLSGIRF
KDWSHLSREFLLLEEADHSTGEIMTIIDRLWNTNENLM
QIIHSDEYTYKQAIEERTARLEKSLSEVSFEDIEDSYM
SAPVRRMVWQTIRILQEIEEVMGSEPARVFVEMTRSEG
EKGDKGRKDSRKKKLKELYKKCKDDDQGLLSDIEGRDE
RDFRIRKLYLYYMQKGLCMYSGHPIDFGKLFDDSYYDI
DHIYPRHYVKDDSIENNLVLVESKLNRDKKDTLLCPDI
QERMHPVWEMLHRQGFMNDEKFKRLMRKEPFSEEEFAH
FIERQLVETGQGTKEIARILNDVLGNKDENNKVIYVKA
GNVSSFRNDNKKNPEFVKCRVINDHHHAKDAYLNIVVG
NTYYTKFTLHPANFIRELRNKSHPTLEDQYNMDKLFAR
RVERNGYTAWNPDTDFQTVKQVLRKNSVLISRRSFIEH
GQIADLQLVSGRKISEVNGKGYLPIKASDIRLSGPSGT
MKYGGYNKASGAYFFLVEHELKGKLVRTIEPVYVYMMA
SIHGKEDLEKYCQEELGYIHPRICLKKIPMYSHIRING
FDYYLTGRSNDRLFICNAVQLTLSSEWSAYIKALSKAV
DEKWDAAYIEQQASRIQDSLKSEEVFISKERNDQLYKV
LLQKHLEGFFNNRINSIGTIMKEGYDSFRALPVNEQAE
TLMEILKISQLVNIGANLVSIGGKSRSGVATVSKKISD
SKSFQLISDSVTGIFQRATDLLTI







MAD name: MAD2022
Cluster: 57
Contig_id: CACYWR010000004.1
Organism (metagenome): uncultured Lachnospiraceae bacterium
Source: Cattle rumen
CRISPR repeat [SEQ ID NO. 20]: GTTTGAGAGTCTTGTTAATTCTTAAAGGTGTAAAAC
tracrRNA [SEQ ID NO. 21]: GAGAATTAACAAGACGAGTGCAAATAAGGTTTATCCGGAATCGTCAATATGACCTGCATTGTGCAGAATCTTTAAAATCATATGATTTCATATGGTTTTA
aa_seq [SEQ ID NO. 19]:
MEKEYYLGLDMGTSSVGWAVTDKEYRLLRAKGKDMWGI
REFEEAQTAVERRTHRLSKRRRARQLVRIGLLKDYFHD
EIMKIDPNFYIRLENSKYYLEDKDVRLASSNGIFDDKN
YTDKDYYEQYKTIFHLRSELIHNSQKHDVRLVYLALLN
MFKHRGHFLFEGDAYVQGNIGDIYKEFIQLLKNEYYED
ENVKLTDQIDYFKLKEILSNSEFSRTAKAEKINSLVHI
DKKNKLENTYIRLLCGLEIELKILFPEIDEKIKICFAK
GYDEKLVEITEILTDNQLQILENLKKIHDIAALDKIRK
GKEYLSDARVAEYEKHREDLALLKKIYREYMTKQDYDR
MFREGEDGSYSAYVNSYNTSKKQRRNMKHRKIDEFYGT
IRKDLKLLLKQGIQDDNIERILEEIDGNNDNKFMPKQL
SFANGVIPNSLHKAEMKAILRNAETYLPFLLETDESGL
TVSERILQLFSFHIPYYIGPVSVNSEKNNGNGWVVRRE
DGEVLPWNIEQKIDYGETSKRFIEKMVRRCTYISGEQV
LPKNSFIYEKYCVLNEINNIKIDGERITVELKQNIYND
LYLHGKRVTKKQLINYLNNRGMIEDENQVSGIDINLNN
YLGSYGKFLPIFEEKLKEDNYIKIAEDIIYLASIYGDS
KKMLKSQIKSKYGDILDDKQIKRILGLKFKDWGRISRR
FLELEGLDKETGEITTIIKAMWDYNLNFMEIIHSDAFD
FKDKIEELHANSIKPLAEIEVEDLDDMYFSAPVKRMIW
QTFKVIKEIEKVMGCPPKKVFIEMTRINDKKSKGKRTN
SRKEKFLSLYKNIHDELVDWKQLIISSDESGKLNSKKM
YLYLTQQGICMYTGRRINLEELFDDNKYDIDHIYPRHF
VKDDNLENNLVLVEKQSNSRKSDTYPIDKSIRNNSQVY
KHWKSLREGNFISKEKYDRLTGKNEFTDEQKAGFIARQ
MVETSQGTKGVADIIKQALPQSRIIYSKASNVSEFRRK
YDILKSRTVNEFHHANDAYLNIVVGNVYDTKFTSNPLN
FIKKQYNVDRKANNYNLDKMFVYDVKRGNEIAWIGWNP
KKSEDSSEMSKRGTIVTVKKMLSKNTPLMTRMSFVGHG
GIAEDNLSSHFVAKNKGYMPNGKESDVTKYGGYKKAKT
AYFFVVEHGQTNNRIRTIETLPIYRRREVEKYEDGLIK
YCEQSLSLLNPIIIYKKIKIQSLMKINGYYAYISGKSN
EVYTFRNGVNMCLSQEWINYVKKLENYIEKDRQDRMIT
YEKNIELYEIILRKYSTTILNKRLSKMDKKLINAKDRF
CILNVKEQSQVLINVFVLSRIGDNQTDLSKIGIGKQSG
QITQNKKITGCKEFKLVNQSVTGLYENEIDLLTV







MAD name: MAD2023
Cluster: 56
Contig_id: DCGJ01000048.1
Organism (metagenome): Lachnoclostridium sp.
Source: Feces of six-years old elephant
CRISPR repeat [SEQ ID NO. 23]: GTTTGAGAGTAGTGTAAATCCATAGGGGTCTCAAAC
tracrRNA [SEQ ID NO. 24]: AGACCCCTATGGATTTACATTGCGAGTTCAAATAAAAGTTTACTCAAATCGTTGGCTTGACCAACCGCACAGCGTGTGCTTAAAGATCTCTTCAGTGAGGTC
aa_seq [SEQ ID NO. 22]:
MEKNNYLLGLDIGTDSVGYAVTNDKYDILKFHGEPAWG
VTIFDEASLSTEKRSFRVSRRRLDRRQQRVLLVQELFA
SEVAKVDKDFFKRIQESNLYRSDAENQAGLFIGEDYCD
REYYGQYPTIHHLISDLMNGTSPHDVRLVYLACAWLVA
HRGHFLSNIDKDNLSGLKDFSSVYEGLMQYFSDNGYER
PWNANVDVKALGDALKKKQGVTAKTKELLALLLDSAKA
EKLPREEFPFSQDGIIKLLAGGTYKLSELFGNEEYKDF
GSVKLSMDDEKLGEIMSNIGEDYELIASLRIVSDWAVL
VDVLGESATISEAKVGIYNQHKADLEVLKKIIRKYTGK
EGYKKVFRQVDSKENYVAYSQHESDGKAPKEKGIDIAT
FSKFILNIVRLLDVEPEDKEVYEDMVARLELNSFLPKQ
VNTDNRVIPYQLYWFELHKILENASIYLPMLTEKDSNG
ISVMEKLESVFMFRIPYFVGPLNKHSKYAWLERKEGKI
YPWNFENMVDLDASEANFIKRMTNTCTYLPGQNVLPKD
SLRYHRFMVLNEINNLRINNERISVELKQKIYSELFLN
VKKVTRKRLVDFLISNGELRKGEESSLTGIDVEIKANL
APQIAFKKLMESGQLTEEDVESIIERASYAEDKARLAH
WLEAKYSKLSEIDRKYICGIKIKDFGRLSKMFLSELEG
VDKTTGEMTTILGAMWNSQLNLMELINSELYSFREAIC
AYQTDYYSTHSSSLEERMNEMYLSNAVKRPVYRTLDIV
KDVKKAFGEPKKIFVEMTRGASEEQKGKRTKSRKEQIL
ELYKQCKDEDVRILQQQLEEMGDLADNKLQGDKLFLYY
MQKGKCMYTGTPIVLEQLGSKAYDIDHIYPQAYVKDDS
ILNNRVLVLSEANGKKKDIYPIEKETRDKMHGFWTYLN
DKGMITEEKYKRLTRTTGFTEEEKWSFINRQLTETSQA
TKAVATLLGELFPNAEIVYSKARLTSEFRQEFNLLKCR
SYNDLHHAVDAYLNIVCGNVYNMKFTKRWFNINKDYSI
KTKTVFTHPVVCGGQVVWDGQEMLNKVIRNAKKNTAHF
TKYAYIRKGGFFDQMPVKAAEGLTPLKKDMPTAVYGGY
NKPSVAFLIPTRYKAGKKTEIIILSVEHLFGERFLRDE
AYAKEYAAERLKKILGKQVDEVSFPMGMRPWKINTVLS
LDGFLICISGIGSGGKCLRAQSIMQFSSDYRWTIYLKR
LERLVEKITVNAKYVYSEEFDKVSTIENIELYDLYIEK
YKATIFSKRVNSPEEIIESGRDKFVKLDVLSQARALLC
IHQTFGRIVGGCDLGLIGGKKNSAATGNFSSTISNWAK
YYKDVRIIDQSTSGLWVRKSENLLELV




MAD name: MAD2024
Cluster: 56
Contig_id: CADAKQ010000027.1
Organism (metagenome): uncultured Lachnospiraceae bacterium
Source: Cattle rumen
CRISPR repeat [SEQ ID NO. 26]: GTTTGAGAGTAGTGTAAATCCAGAGGGCTCCAAAAC
tracrRNA [SEQ ID NO. 27]: GAGCCCTCTGGATTTACACTACGAGTTCAAATAAAAATTATTTCAAATCGCCGCTATGTCGGCCGCACAGTGTGTGCATTAAGAAAAGTCCGAAAGGGC
aa_seq [SEQ ID NO. 25]:
MNFDGEYFLGLDIGTDSVGYAVTDQRYNLVKFKGEPMW
GSHLFDAANQCAERRGFRTARRRLDRRQQRVKLVDEIF
APEVAKVDPNFYIRKMESALYPEDKSNKGDLYLYFNKQ
EYDEKHYYKDYPTIHHLICALMNDEKTKFDIRLINIAI
DWLVAHRGHFLSEVGTDSVDKVLDFRKIYDEFMALFSD
EDDAVSSKPWENINPDELGKVLKIHGKNAKRNELKKLL
YGGKIPTDEDSFIDRKLLIDFIAGTSVQCNKLFRNSEY
EDDLKITISNSDEREVVLPQLEDFHADIIAKLSSMYDW
SVLSDILSGSTYISESKVKVYEQHKKDLKELKEFVRKY
APEKYNDIFRLASKETYNYTAYSYNLKSVKDEKDLPKG
KASKEDFYSYLKKTLKLDKAENYNFVNDADTRFFDDMV
ERISSGTFLPKQVNSDNRVIPYQVYYIELKKILENAKK
HYAFFEEKDEDGYSNVEKIMSVFTFRIPYYVGPLRNDD
KSPYAWIRRKADGKIYPWDFEEKVDLDASENAFIDRMT
NSCTYIPGADVLPKWSLLYTKYMVLNEINNIKVNNIGI
SVEAKQGIYNELFCKKAKVSLKAIREYLISNGFMQKDD
EMSGIDITVKSSLKSRYDFRHLLEKNELTTDDVEAIIS
RSTYAEDKARFKKWLKKEFPQLSDEDYKYVSKLKYKDF
GRLSRSLLNGLEGASKETGEIGTIMHFLWETNDNLMQL
LSDRYTFMEEINKKRQDYYIEHKLTLNEQMEELGISNA
VKRPVTRTLAVVKDVVSAIGYAPQKIFVEMARQEDEKK
KRSVTRKEQILELYKNVEEDTKELERQLKKMGDTANNE
LQSDALFLYYLQLGKCMYSGKPIDLTQIKTTKKYDIDH
IWPQSMVKDDSLLNNRVLVLSEINGDKKDVYPIDESIR
SKMHSYWKMLLDKNLITKEKYSRLTRPTPFTESEKLGF
INRQLVETRQSMKAVTQLLNNMYPDSEIVYVKAKLAAD
FKQDFKLAPKSRIINDLHHAKDTYLNVVAGNVYNERFT
KKWFNVNEKYSMKTKVLFGHDVKIGDRLIWDSKKDLQT
VKNTYEKNNIHLTRYAYCQKGGLFDQMPVKKGQGQIQL
KKGMDIDRYGGYNKATASFFIIARYLRGGKKEVSFVPV
ELMVSEKFLNDDNFAIEYITNVLTGMNTKKIENVELPL
GKRVIKIKTVLLLDGYKVWVNGKASGGTRVMLTSAESL
RMPKEYVEYLKKMENYSEKKKSNRNFMHDSENDGLSEE
KNILLYDKLLEKLDENHFKKMPGNQCETMKSGRVKFIE
LDFDVQISTLLNCIDLLKSGRTGGCDLKNIGGKSASGV
VYISANLSACKYNDVHIIDISPAGLHENISCNLMELFE







MAD name: MAD2025
Cluster: 56
Contig_id: DOQG01000053.1
Organism (metagenome): Ruminococcaceae bacterium
Source: human gut
CRISPR repeat [SEQ ID NO. 29]: GTTTGAGAGTAGTGTAAATTTATAGGGTAGTAAAAC
tracrRNA [SEQ ID NO. 30]: TTTTACTACCCTATAAATTTACACTACGAGTTCAAATAAAAATTATTTCAAATCGTACTTTTTAGTACCTTCACAAGTGTTGTGAATATTAACTCACCTTCGGGTGAG
aa_seq [SEQ ID NO. 28]:
MSFKENSKFYFGLDIGTDSVGWAVTDNLYKLYKYKNNL
MWGVSLFEAASPAEDRRNHRTARRRLDRRQQRVALLRE
LFAKEILKTDPDFFLRLKESSLYPEDRTNKNVNTYFDD
ADFKDSDYFKMYPTVHHLIKELSESDKPHDVRLVYLAC
AFIVAHRGHFLNGADENNVQEVLDFNSSYCEFTDWFKS
NDIEDNPFSESTENEFSVILRKKIGITAKEKEIKNLLF
GTTKTPDCYKDEEYPIDIDVLIKFISGGKTNLAKLFRN
PAYDELDIQTVEVGKADFADTIDLLASSMEDTDVPLLS
AVKAMYDWSLLIDVLKGQKTISDAKVCEYEQHKSDLKA
LKHIVRKYLDKAQYDEIFRTAGEKPNYVSYSYNVTDVK
LKQLPSNFKKKYSEEFCKYINSKLEKIKPEPDDEAVYN
ELIEKCNSKTLCPKQVTDENRVIPYQLYYHELSMILDK
ASAYLDFLNETEDGISVKQKILTLMKFRIPYFVGPSVK
RNETDNVWIVRKAEGRIYPWNFENMVDYDKSEDGFIRR
MTCKCTYLAGEDVLPKYSLLYSRYTVLNEINNIKVKDV
KISPELKQDIFNELFMKTSRVTVKKITELLKRKGAFSE
ENGDSLSGVDINIKSSLKSYLDFRRLLENGSLSESDVE
RIIERITVTTDKPRLISWLKTEYPALPAEDIRYISRLS
YKDYGRLSAKMLTGCYELDMDTGEIGGRSIIDLMWAEN
INLMQIMSDSYGYKSFIEEENKKYYAINPTGSIAQTLR
EMYVSPSASRAIIRTMDIVKELRKIIKRDPDKIFVEMA
RGSKPEDKGKRTSSRREQIEKLFASAKEFVSDEEISHL
RSQLGSLSDEQLRSEKYYLYFTQFGKCVYSGEAIDFSR
LGDNHCYDIDHIYPQSKVKDDSLHNKVLVKSQLNGEKS
DDYPIKEQIRNKMHPIWKNLFYRDPKNPTDKIKYERLT
RSTPFTEDELAGFIERQLVETRQSTKAVATLLKEMFPD
SKIVYVKAGQVSKFRHDFDMLKCREINDLHHAKDAYLN
VVVGNVHDVKFTSNPLNFVKNADKHYTIKIKETLKHKV
ARNGETAWNPETDFDTVKRMMSKNSVRYVRYCYKRKGE
LFKQQPKKAGNPDLAWLKKNLDPVKYGGYNSKSISCFS
LIKCTGVGVVIIPVELLCEKRYFSDDSFASEYAYSVLK
NALPAKNIAKISIDDISFPLKRRPIKINTLFEFDGYRV
NIRSKDSYSVFRISSAMAAIYSKDTSDYIKAISSYIDK
SDKGSKFKPGEAFDVLSNLKAYDEIAKKCISEPFCKIS
KLAEAGKKMEEGRNKFAELSIIEQMKTLLLLVDVLKTG
RVDKCNLKPVGGVDNFHTERMSAILKNTKYSDIRIIDQ
SPTGLYENKSDNLLEL







MAD2026
65
CADBQN010000053.1
uncultured
Cattle
MEQKDYYIGLDIGTNSVGWAVVDEGYQLCRFKKYDMWG
GTTTG
GACTACC






Firmicutes

rumen
VRLFDSAETAAERRMNRVNRRRNRRKKQRIDLLQGLFA
AGAGT
ATATGAG





bacterium

EEIAKIDRTFFVRLNESRLHPEDKSTAFRHPLFNDPNY
AGTGT
ATTACAC







TDVDYYKEYPTIYHLRKELMDSAEPHDIRLVYLALHHI
AATTT
TACACGG







LKNRGHFLIEGGFEDSKKFEPTFRQLLEVLTEELGLKM
CATAT
TTCAAAT







DGADAALAESVLKDRGMKKTEKVKRLKNVFTLNTTDMD
GGTAG
AAAGAAT







QESQKKQKAQIDAVCKFLAGSKGDFKKLVADEALNELK
TCAAA
GTTCGAA







LDTFALGTSKAEDIGLEIEKSAPQYCVVFESVKSVFDW
C
ACCGCCC







KIMTQILGDESTFSSAKVKEYEKHHENLIILRELIRKY
[SEQ
TTTGGGG







CDKETYRHFFNNVNGGYSRYIGSLKKNGKKYYVAGCTQ
ID NO.
CCCGCTT







EEFYKELKGLLKSIDQRVDPEDRPVYQRVLAETEDETF
32]
GTTGCGG







LPLLRSKANSAIPRQIHQKELDDILQNASVYLPFLNDV

ATTTACA







DEDGLSAAEKIRSIFTFRIPYYVGPLSLRHKDKGAHVW

GACTTGA







IKRKEEGYIYPWNYEKKIDREKSNEEFIRRLINQCTYL

TATCAAG







KDEKVLPKKSLLYSEFMVLNELNNLRIRGKRLSEEQVE

TCTG







LKQRIYRDLFMTKTRVTKKTLLNYLRKEDSDLTEEDLS

[SEQ ID







GFDNDFKASLSSCLELKNKVFGDRIEEDRVRKIAEDLI

NO. 33]







RWLTIYDDDKKMIKEVIRAEYPNEFTNEQLDVICRLKF









SGWGNLSEAFLCGVEGADKDTGEVFTIIEALRNTNHNL









MELLSGNYTFTEKIREHNAALSSEIKAKDYESLVRDLY









VSPACKRGIWQTIRITEEIKKIMGHEPKKIFVEMTREH









RDSGRTTSRKDQLLALYQKCEEDARDWVKEIEDREERD









FSSIKLFLYYLQQGKCMYSGEAIDLDELMSKNSRWDRD









HIYPQSKIKDDSLDNLVLVKKELNAVKDNGEIAPDIQK









RMKGFWLSLLRQGFLSKKKFDRLTRTGPFTSEELAGFI









SRQLVETSQMSKAVAELLNQLYEDSRVVYVKAGLVSQF









RQKDLGVLKSRSVNDYHHAKDAYLNVVVGDMFDRKFTS









DPARWFKKNKKVNYSINQVFRRDYEENGKLIWKGIDRG









EDGKPLFRDGLIHGGTIDLVRAIAKRNTNIRYTEYTYC









ETGQLYNLTLLPKTDTAITIPLKKELPAAKYGGFKGAG









TSYFSLIEFDDKKGHHHKQIVGVPIYVANMLEHNENAF









IEYLETVCSFRNITVLCEKIKKNALISVNGYPMRIRGE









NEILNMLKNNLQLVLSQEGEETLRHIEKYFNKKPGFEP









DKEHDGIDRDAMAALYDEMTEKLCTVYKKRPTNQGELL









KNNRGLFLNLEKRSEMAKVLSETAKMFGTTAQTTADLS









LIKGSKYAGKIVINKNTLGAAKLILIHQSVTGLFETRV









EL [SEQ ID NO. 31]







MAD2027
65
CACWRN010000001.1
uncultured
Cattle
MSKKFAGEYYLGLDIGTDSVGWAVTDNQYNVLKFNGKS
GTTTG
TTTACCA






Succini-

rumen
MWGIRLFDAAQTAAERRMFRTARRRVERRRWRLELLQE
AGAGT
TCCAGTG






clasticum


LFQNEIEKKDPDFFQRMKDSALYPEDSKTGKPFALFCD
AATGT
AGTTTAC





sp.

KDLNDKLYYKQYPTIYHLRKALLTENSKFDIRLVYLAI
AAATT
ATTACAA







HHILKHRGHFLFNGDFSNVTRFSFAFEQLQTCLCNELD
CATAG
GTTCAAA







MDFECNNVQKLSEILKDTHMSKNDKVKASVALFENSGD
GATGG
TAAAAAT







KKQLQAVIGLFCGAKKKLADVFLDETLNDTEMPSISIA
TAAAA
TTATTCA







DKPYEELRPELESILAEKCCVIDYIKAVYDWAILADML
C
ACCCGTT







DGGEYGNRTYISVARVRQYEKHHDDLKKLKKLVRRYCK
[SEQ
CTTCGGA







SEYKSFFSVAGTDNYCAYIGDDIETDDRKSVKKCKQED
ID NO.
ACCTCCA







FYKRIKGLLKKAIENGCPKDEVVEIIKDIDAQVFLPLQ
35]
CCGTGTG







VTKDNGVIPHQVHEMELKQILKNAEKYYPFLCKKDEEG

GAACATT







IVTSNKILQLFKFRIPYYVGPLNSRIGKNSWIVRRAEG

AAGGTCT







KIYPWNFEEKVDFDKSEEGFIRRMTNPCTYMAGADVLP

GCTTTGC







KYSLLYSEFMVLNELNNVRICGDKLSVEIKQTIIKDLF

AGGCC







QRTRRVTVRKLCDKLKAEGVISRNSNQKDIDIKGIDQD

[SEQ ID







LKSSMVSYVDFKNIFGKEIEKYSVQQMCERIIFLLTIH

NO. 36]







HDDKRRLQKRIRAEFTEAQITDDQLQKVLRLNYQGWGR









FSAEFLKELKGVDTETGEVFSIINALRETDDNLMQLLS









NRYTFAEELEKYNSNKRKKIEALTYDNIMEGIVASPAI









KRSAWQAISIVMELSKIMGREPKRIFVEMARGPEEKKH









TISRKNQLLELYKSVKDESRDWKTELETKTESDFRSIK









LFLYYTQMGRCMYTGEPIDLDQLANTTIYDRDHIYPQS









LTKDDSLNNLVLVKKVENANKGNGLISADIQKKMRGFW









AELKKKGLISDEKFSRLTRTTPLSDDELAGFINRQLVE









TRQSSKIVADLFHQLYPTTQVVYVKAKIVSDFRHETLD









MVKVRSLNDLHHAKDAYLNIVTGNVYYEKFSGNPLTWL









RKNPDRNYSLNQMFNYDIVKKTKEGTSYVWKKGKDGSI









AVVRRTMERNDILYTRQATENKNGGLFDQNIVSSKNKP









FIPVKKGLDVNKYGGYKGITPAYFALIEFTDKKGSRQR









LLEAVPLYLRADIDNDSNVLRDFYKNVLGLENPVVILN









RIKKNSLLKINGFLIHLRGTTGFSASQLKVQNAVEFSL









PHHMEDYVKKLENYEKHIIAERGSTKNSQIKITEWDGI









SKEKNLQLYDMFINKMENTIYKFRPANQVSNLKENREV









FNSLAVEDQCSVLNQVLMLFVCKPVTANLSLIKGSKNA









GNMALSKIISNMRSAYLIHQSVTGLFEQKIDLLKVSSQ









KD [SEQ ID NO. 34]







MAD2028
66
DHKP01000031.1

Bacillales

gut
MANKLFIGLDVGSDSVGWAATDENFHLYRLKGKTAWGA
GTTTG
GCATTGT





bacterium
meta-
RIFSEASDAKGRRGFRVAGRRLARRKERIRLLNTLFDP
AGAGC
AAGACAA






genome
LLKEKDPTFLLRLENSAIQNDDPNKPAQAVTDCLLFAN
AGTGT
CACTGCT







KQEEKGFYKRYPTIWHLRKALMDNEDCAFSDIRFLYLA
TGTCT
ACGTTCA







IHHIIKYRGNFLRDGEIKIGQFDYSVFDKLNETLSVLF
TATAT
AATAAGC







DLQSEDEDSQEGHFVGLPKSQYEAFITTANDRNLPKQT
AGCTC
ATATTGC







KKTKLLSMFEKDEESKSFLEMFCTLCAGGEFSTKKLNK
GAAAA
TACAAGG







KGEETFDDTKISFNASYDQNEPNYQEILGDAFDLVDIA
C
TTCTCCC







KAVFDYCDLSDILNGNDNLSNAFVELYDSHKSQLSALK
[SEQ
TCGGAGA







AICKQIDNQSNLKGDASVYVKLFNDPNDKSNYPAFTHN
ID NO.
ATGACCA







KTLVDKRCDIHTFDKYVIDTVLPYEPLLMGQDATNWQM
38]
TTAGGTC







LKSLAEQDRLLQTIALRSTSVIPMQLHQKELKIILKNA

ACTTAGA







ISRNVKGIAEIEEKILKLFQYKIPYYCGPLTTKSAYSN

TAGCCGG







VVFKNNEYRPLKPWDYEEAIDWDETKKKFMEGLTNKCT

TTCTTCT







YLKDKNVLPKQSILYQDFDAWNKLNNLKVNGSKPSLKE

GGCTA







LKDLFSFVSQRPKTTMKDIQRHFKSDTNSKDKDVVVSG

[SEQ ID







WNPEDYICCSSRASFGKNGVFDLNNPDSSDPKDLSKCE

NO. 39]







RMIFLKTIYADSPKDADVAILKEFPDLTNDQKSLLKTI









KCKEWSPLSKEFLELRYADKYGEIRESIINLLRSGEGN









LMQILAKYDYQERIDAYNADSFQTKSKSQIVSDLIEEM









PPKMRRPVIQAVRIVHEVVKVAKKEPDQISIEVTRENN









NKEKKQQLTKKAKSRSAQIQTFLKNLVKIDTFEEKRVD









EVLEELKKYSDRSINGKHLYLYFLQNGKDAYTGKPINI









DDVLSGNKYDTDHVIPQSKMKDDSIDNLVLVERSINQH









RSNEYPLPESIRKNPANVAFWSKLKKAGMMSEKKFNNL









TRANPLTEEELSAFVAAQINVVNRSNIVIRDVLKVLYP









NAKLIFSKAQYPSQIRKELNIPKLRDLNDTHHAVDAYL









NIVSGVSLTERYGNLSFIKAAQKNENQTDYSLNMERYI









SSLIQTKEGEKTSLGKLIDQTSRRHDFLLTYRFSYQDS









AFYNQTIYKKNAGLIPVHEKLPPERYGGYNSMSTEVNC









VVTIKGKKERRYLVGVPHLLLEKGNKVADINKEIANSV









PHKENETIAVSLKDIIQLDSMVKKDGLVYLCTTQNKDL









VKLKPFGPIFLSRESEVYLSNLNKFVEKYPNIADGNEN









YSLKTNRYGEKSIDFLQEKTGNVLKELVDLSNQKRFDY









CPMICKLRTIDYRKGVEGKTLTEQLILIRSFVGVFTRK









SEALSNGSNFRKARGLVLQDGLVLCSDSITGLYHTERK









L [SEQ ID NO. 37]







MAD2029
66
DBKT01000013.1

Bacillales

gut
MADKLFIGLDVGSESVGWAATDENFHLYRLKGKTAWGA
GTTTG
GCATTGT





bacterium
meta-
RIFSEANDAKTRRGFRVAGRRLARRKERIRLLNTLFDP
AGAGC
AAGACAA






genome
LLKKDPAFLLRLENSAIQNDDPNKPIQAIADCPLLVNK
AGTGT
CACTGCT







QEEKDYYKRYPTIWHLRKALMENDDHAFSDIRFLYLAI
TGTCT
ACGTTCA







HHIIKYRGNFLREGDIKIGQFDYSIFDKLNETLAVLFD
TATAT
AATAAGC







LQNEDGENEEGRFIGLPKSQYEAFITCANDRNLPKQPK
AGCTC
ATATTGC







KAKLLSMFEKTEESKAFLEMFCTLCSGGEFSTKKLNAK
GAAAA
TACAAGG







GEETYQDAKISFNSSYDENEGAYQEILGDFFDLVDIAK
C
TTCTCCA







AVFDYCDLSDILNGNDNLSSAFVELYDSHKSQLSALKS
[SEQ
TTGGAGA







ICKRIDNQNGFIGEKSIYVKLFNDPNDKSNYPAFTNNK
ID NO.
ATGACCA







TLVDKRCDIHTFDKYVKETILPYESSLTGRDAVNWQML
41]
TTAGGTC







KSLAEQDRLLQTIALRSTSVIPMQLHQKELKIILKNAV

GCTTAGA







SRNIKGVAEIEEKILKLFQYKIPYYCGPLTTKSDYSNV

TAGCCAG







VFKNNEYRPLKPWDYEEAIDWDGTKQKFMEGLTNKCTY

TTCTTCT







LKDKNVLPKQSVLYQDFDTWNKLNNLKVNGNKPSLEDL

GGCTA







NDLFSFVSQRSKTTMRDIQRYLKSKTNSKENDVVVSGW

[SEQ ID







NSEDYICCSSRASFNKNGIFNLNNSEVLKECERIIFLK

NO. 42]







TIYTDSPKDADAAVLKEFPDLTNNQKTLLKTIKCKEWS









PLSKEFLELRYSDKYGEIRQSIIDLLRNGEGNLMQILA









KYDYQEVIDACNAASFQTKSKSQIVSDLIEEMPPKMRR









PVIQAVRIVQEVAKVAKKEPDEISIEVTRENNDKEKKQ









QLTKKAKSRSTQIQNFLKNLVKIDASEKKQANEVLEEL









KKYSDQSINGKHLYLYFLQNGKDAYTGKPINIDDVLSG









NKYDTDHIIPQSKMKDDSIDNLVLVEREINQHRSNEYP









LPESIRKNPANVAFWRKLKKAGMMSEKKFNNLTRSNPL









TEEELGAFVAAQINVVNRSNVVIRDVLKILYPNAKLIF









SKAQYPSQIRKELNIPKLRDLNDTHHAVDAYLNIVSGV









TLTDRYGNMRFIKASQDEEKHSLNMERYISSLIQTKEG









QRTELGELIDQTSRRHDFLLTYRFSYQDSAFYKQTIYK









KNAGLIPAHDNLPPERYGGYDSMSTEVNCVATIIGKKT









TRYLVGVPHLLIKKAKDGIDVNDELIKLVPHKENEVVK









VDLNTTLQLDCTVKKDGFMYLCTSNNIALVKLKPFSPI









FLSRESEIYLSNLMKYVEKYPNISDENSEYEFKINREN









VDPIKFTEKQSIEVVQDLIIKAKQDRFSYCSMISKLRD









INAEEMIHSKSLTEQLKIIKSLIGVFTRKSEILSDKNN









FRKSRGAILQEDLFLCSDSITGLYHTERKL [SEQ ID









NO. 40]







MAD2030
66
DBLD01000015.1

Bacillales

gut
MEQNTKKLFIGLDVGTDSVGWAATDEYFNLYRLKGKTA
GTTTG
GCATTGT





bacterium
meta-
WGARLFLDAANAKDRRQHRVSGRRLARRKERIRLLNAL
AGAGC
AAGACAA






genome
FDPLLKKVDPTFLLRLESSTLQNDDPNKDQRAVSDALL
AGTGT
CACTGCA







FGNKKHEKAYYAAFPTIWHLRKALIENDDKAFSDIRYL
TGTCT
CGTTCAA







YLAIHHIIKYRGNFLRQGEIKIGEFDFSCFDKLNQFFD
TAAAT
ATAAGCA







IYFSKEDEEEVEFIGLPNENYQRFIDCAADKNLGKGKK
AGCTC
GATTGCT







KGDLLKLMSFSEDEKPFCEMFCSLCAGLAFSTKKLNKK
GAAAA
ACAAGGT







DETVFEDIKVEFNGKFDDKQEEIKSVLGDAYDLVELAK
C
TCCCGTA







FIFDYCDLKDILGASTNRLSEAFAGIYDSHKEELKALK
[SEQ
AGGGAAT







GICREIDRSLGNESKNSLYREVFNDKGIPNNYAAFIHH
ID NO.
GACCATC







ETNSSRCGIADFNNYVLQKIEPLENLLSKQNYKNWIQL
44]
TGGTCAC







KQLASQGRLLQTIAIRSTSIIPMQLHLKDLKLILANAE

ATGAATA







KRDIPGIKDIKEKILLLFQFKVPYYCGPLTDRSQYSNV

GCCCCCG







VLKAGTREKITPWNFADQVDLEETKKKFMEGLTNKCTY

GCAACGG







LKDCNVLPRQSLMFQEYDAWNKLNNLSINGNKPSPEEM

TGGCTG







NALFDFASKRRKTTMSDIKKFEKRATMSKENDVTVSGW

[SEQ ID







NENDFIDLSSFVSLSGFFDLGEIHSADYMACEEAILLK

NO. 45]







TIFTDAPQDADPIIAEKFPNLKPNQLAALKKMSCKGWA









TLSREFLTLKAVDADGEVMNETLLGLMKEGKGNLMQLL









HSSLYNFQDVIDSHNRAVFGDKSPKQIANDLIEEMPPQ









MRRPVIQALRIVREVSKVAKKQPDVISIEVTRESNDKK









KKEEWSKKATDRKKQIDLFLKNLKKTEDVKQTESELDG









QAINDIDSIRGKHLYLYFLQNGKDAYTGLPIDINDVLN









GTKYDTDHIIPQSLMKDDSIDNLVLVNREKNQHKSNEF









PLPRDIQTKANIERWRALKKAGGMSEKKFNNLTRTTPL









TEEELSAFVAAQINVVNRSNVVIRDVLKILYPNAKLIF









SKAQYPSQIRRDLEIPKLRDLNDTHHAVDAFLNIVSGV









ELTKQFGRMDVIKAAAKGDKDHSLNMTRYLERLLKKVD









ENKNETMTELGNHVFVTSQRHDFLLTYRFDYQDSAFYN









ATIYSPDKNLIPMHDGMDPERYGGYSSLNIEYNCIATI









KGKKKTTRYLLGVPHLLALKFKNDGIDITSDLIKLVPH









KGDEEVSIDWKNPIPLRITVKKDGVEYLLAPFNAQVME









LKPVSPVFLPREAAEYLARLKKAVDQKKQFIYQNSAEI









FQSKDKNNALQFGPEQSKNVALKIYALADAKKYDYCAM









ISKLRDAALRAEMLDSLSSEALFKQYNDLISLLSQLTR









RSKKISSKYFSKSRGALLQDGLKIVSKSITGLYETERN









L [SEQ ID NO. 43]







MAD2031
141
CACVOG010000001.1
uncultured
Cattle
MNYILGLDIGIASVGWAAVALDANDEPCKILDLNARIF
ATTGT
TTGTAAT






Seleno-

rumen
EAAEQPKTGASLAAPRREARGSRRRTRRRRHRMERLRH
ACCAT
AACCTAT






monadaceae


LFAREELISAENIAALFEAPADVYRLRAEGLSRRLDEG
AGCGA
TTTACCT





bacterium

EWARVLYHIAKRRGFKSNRKGAASDADEGKVLEAVKEN
GTTAA
CGCTATG







EALLKNYKTVGEMMFRDEKFQTAKRNKGGSYTFCVSRG
ATTAG
GCACAAT







MLAEEIGELFAAQREQGNPHASETFETAYSKIFADQRS
GGAAT
TTGTTAT







FDDGPDANSRSPYAGNQIEKMIGTCSLETDPPEKRAAK
TACAA
TACATGG







ASYSFMRFSLLQKINHLRLKDAKGEERPLTDEERAAVE
C
ACATTAT







ALAWKSPSLTYGAIRKALPLPDELRFTDLYYRWDKKPE
[SEQ
ACTAAAC







EIEKKKLPFAAPYHEIRKALDKREKGRIQSLTPDALDA
ID NO.
ATTTCCT







VGYAFTVFKNDAKIEAALSAAGIDGEDAVALMAAGLTF
47]
AAAAAAG







RGFGHISVKACRKLIPHLEKGMTYDKACKEAGYDLQKT

CAACGAA







GGEKTKLLSGNLDEIREIPNPVVRRAIAQTVKVVNAVI

AAACGTG







RRYGSPVAVNVELAREMGRTFQERRDMMKSMEDNNAEN

CTGGCAG







EKRKEELKGYGVVHPSGLDIVKLKLYKEQGGVCAYSLA

CAA







AMPIEKVLKDHDYAEVDHILPYSRSFDDSYANKVLVLS

[SEQ ID







KENRDKGNRTPMEYMANMPGRRHDFITWVKSAVRNPRK

NO. 48]







RDNLLLEKFGEDKEAAWKERHLTDTKYIGSFIANLLRD









HLEFAPWLNGKKKQHVLAVNGAVTDYTRKRLGIRKIRE









DGDLHHAVDAAVIATVTQGNIQKLTDYSKQIERAFVKN









RDGRYVNPDTGEVLKKDEWIVQRSRHFPEPWPGFRHEL









EARVSDHPKEMIESLRLPTYTPEEIDGLKPPFVSRMPT









RKVRGAAHLETVVSPRLKDEGMIVKKVSLDALKLTKDK









DAIENYYAPESDHLLYEALLHRLQAFGGDGEKAFAESF









HKPKADGTPGPVVKKVKIAEKSTLSVPVHHGRGLAANG









GMVRVDVFFIPEGKDRGYYLVPVYTSDVVRGELPMRAV









VQGKSYAEWKLMREEDFIFSLYPNDLVYIEHEKGVKVK









IQKKLREISTLPREKTMTSGLFYYRTMGIAVASIHIYA









PDGVYVQESLGVKTLKEFKKWTIDILGGEPHPVQKEKR









QDFASVKRDPHAAKSTSSG [SEQ ID NO. 46]







MAD2032
141
CACVWE010000020.1
uncultured
Cattle
MKYIIGLDMGITSVGFATMMLDDKDEPCRIIRMGSRIF
GTTGT
ATTGTAT






Rumino-

rumen
EAAEHPKDGSSLAAPRRINRGMRRRLRRKSHRKERIKD
AGTTC
CATACCA






coccus


LIIKNELMTADEISAIYSTGKQLSDIYQIRAEALDRKL
CCTAA
AGAACAA





sp.

NTEEFVRLLIHLSQRRGFKSNRKVDAKEKGSDAGKLLS
TTATT
TTAGGTT







AVNSNKELMIEKNYRTIGEMLYKDEKFSEYKRNKADDY
CTTGG
ACTATGA







SNTFARSEYEDEIRQIFSAQQEHGNPYATDELKESYLD
TATGG
TAAGGTA







IYLSQRSFDEGPGGSSPYGGNQIEKMIGNCTLEPEEKR
TATAA
GTATACC







AAKATFSFEYFNLLSKVNSIKIVSSSGKRALNNDERQS
T
GCAAAGC







VIRLAFAKNAISYTSLRKELNMEYSERFNISYSQSDKS
[SEQ
TCTAACA







IEEIEKKTKFTYLTAYHTFKKAYGSVFVEWSADKKNSL
ID NO.
CCTCATC







AYALTAYKNDTKIIEYLTQKGFDAAETDIALTLPSFSK
50]
TTCGGAT







WGNLSEKALNNIIPYLEQGMLYHDACTAAGYNFKADDT

GAGGTGT







DKRMYLPAHEKEAPELDDITNPVVRRAISQTIKVINAL

TATCT







IREMGESPCFVNIELARELSKNKAERSKIEKGQKENQV

[SEQ ID







RNDRIMERLRNEFGLLSPTGQDLIKLKLWEEQDGICPY

NO. 51]







SLKPIKIEKLFDVGYTDIDHIIPYSLSFDDTYNNKVLV









MSSENRQKGNRIPMQYLEGKRQDDFWLWVDNSNLSRRK









KQNLTKETLSEDDLSGFKKRNLQDTQYLSRFMMNYLKK









YLALAPNTTGRKNTIQAVNGAVTSYLRKRWGIQKVREN









GDTHHAVDAVVISCVTAGMTKRVSEYAKYKETEFQNPQ









TGEFFDVDIRTGEVINRFPLPYARFRNELLMRCSENPS









RILHEMPLPTYAADEKVAPIFVSRMPKHKVKGSAHKET









IRRAFEEDGKKYTVSKVPLTDLKLKNGEIENYYNPESD









GLLYNALKEQUAFGGDAAKAFEQPFYKPKSDGSEGPLV









KKVKLINKATLTVPVLNNTAVADNGSMVRVDVFFVEGE









GYYLVPIYVADTVKKELPNKAIIANKPYEEWKEMREEN









FVFSLYPNDLIKISSRKDMKFNLVNKESTLAPNCQSKE









ALVYYKGSDISTAAVTAINHDNTYKLRGLGVKTLLKIE









KYQVDVLGNVFKVGKEKRVRFK [SEQ ID NO. 49]







MAD2033
141
DCJP01000021.1
un-
Feces
MKNTLYGIGLDIGVASVGWAVVGLNGTGEPVGLHRLGV
GTTGT
TTATACC





cultivated
of
RIFDKAEQPKTGESLAAPRRMARGMRRRLRRKALRRAD
AGTTC
ATACCAA






Faecali

three-
VYALLERSGLSTREALAQMFEAGGLEDIYALRTRALDE
CCTAA
GAACTGT





bacterium
weeks
PVGKAEFSRILLHLAQRRGFKSNRRTASDGEDGRLLAA
CAGTT
TATGGTT





sp.
old
VNENRRRMAQGGWRTVGEMLYRHEAFALRKRNKADEYL
CTTGG
GCTATGA






elephant
STVGRDMVAEEASLLFQRQRELGCAWATPELQAEYLSI
TATGG
TAAGGTC







LLRQRSFDEGPGGNSPYGGNQVEKMVGRCTFEPDEPRA
TATAA
TTAGCAC







AKAAYSFEYFSLLQKLNHIRLAENGETRPLTQPQRQQL
T
CGTAAAG







LSLAHKTPDVSLARIRKELALPETVQFNGVRCRANETL
[SEQ
CTCTGAC







EESEKKEKFACLPAYHKMRKALDGVVKGRISSLSISQR
ID NO.
GCCTCGC







DAAATALSLYKNEDTLRAKLTEAGFQAPEIDALAGLTG
53]
TTTCAGC







FSKFGHLSLKACRKLIPHLEQGLTYDQACSAAGYDFKG

GGGGCGT







HGAGERAFTLPAAAPEMEQITSPVVRRAVAQTIKVVNG

CATCTTT







IIREMDASPAWVRIELARELSKTFGERQEMDRSMRENA

TTTGCCC







AQNERLMQELRDTFHLLSPTGQDLVKYRLWKEQDGVCA

AAAAGAC







YSLRRLDVERLFEPGYVDVDHIVPYSLSFDDRRSNKVL

ACGGATA







VLSSENRQKGNRLPLQYLQGKRREDFIVWTNSSVRDYR

TTTTT







KRQNLLREKFSGDEAEGFRQRNLQDTQHMARFLYNYIS

[SEQ ID







DHLAFAQSEALGKKRVFAVSGAVTSHLRKRWGLSKVRA

NO. 54]







DGDLHHALDAAVIACTTDGMIRRISGYYGHIEGEYLQD









ADGAGSQHARTKERFPAPWPRFRDELIVRLSEQPGEHL









LDINPAFYCEYGTEHICPVFVSRMPRRKVTGPGHKETI









KGAAAADEGLLTVRKALTELKLDKDGEIKDYYMPSSDT









LLYEALKAQLRRFGGDGKKAFAEPFYKPKADGTPGPLV









RKVKTIEKATLTVPVHGGAASNDTMVRVDVFLVPGDGY









YWVPVYVADTLKPELPNRAVVAFKPYSEWKEMREEDFI









FSLYPNDLVYVEHKSGLKFTLQNADSTLEKTWVPKASF









AYFVGGDISTAAISLRTHDNAYGLRGLGIKTLKVLKKY









QVDVLGNISPVHRETRQRFR [SEQ ID NO. 52]







MAD2034
141
CACXAV010000001.1
uncultured
Cattle
MAYGIGLDIGIASVGFATVALNEQDEPCGILRMGSRIF
GTTGT
TTATACC






Clostri-

rumen
DAAEHPKNGASLAAPRREARSARRRLRRHRHRLERIRN
AGTTC
ATACCAA






diales


LLVESCLISQDGLGSLFEGRLEDIYALRTRALDERLTD
CCTAA
GAACTGT





bacterium

AELCRVLIHLAQRRGFRSNRKADAADKEAGKLLKAVSE
CGGTT
TGGGTTA







NDRRMEENGYRTVGEMLYKDPLFAEHRRNKGEAYLSTV
CTTGG
CTACAAT







TRTAVEQEARLVLSTQREKGNAAITEDFVEKYLDILLS
TATGG
AAGGTAG







QRPFDVGPGGNSPYGGNMIEKMIGRCTFEPDELRAPKA
TATAA
TAAACCG







SYSFEYFQLLQKVNHIRLLRDGRSEPLSEEQRRAIIDL
T
AAAAGCT







ALASADVTFAKIRKALSLPDSVRFNDVYYRESAEEAEK
[SEQ
CTGACGT







KKKLGCMDAYHEMRKALDKVAKGRICAIPVEQRNAIAY
ID NO.
CTTGTTT







VLTVHKTDERILTELQNINLERSDIDQLMQMKGFSKFG
56]
GCGCAGG







HLSIKACDRIIPYLEQGMTYSDACTAAGYAFRGHEGGE

ACGTCAT







HSLYLPAQTPEMDEITSPVVRRAVSQTIKVVNALIREQ

CTTTATA







GESPTFVNIELAREMSKDFAERNDIRRENEKNAKANEA

TCAGACG







VMNELRRTFGLVNPSGQDLVKYKLFLEQGGVCPYTQRP

GATG







MEPGRLFEAGYADVDHIVPYSISFDDRYCNKVLTFASV

[SEQ ID







NRKEKGNRLPLQFLKGERRESFIVYVKANVRDYRKQRL

NO. 57]







LLKETVTEEDRKGFRDRNLQDTKHMAAFLHSYINDHLQ









FAPFQTDRKRHVTAVNGAVTAYLRKRWGIRKVRAEGDL









HHASDALVIACTTPGMIQRLSRYAELREAEYMQTEDGA









VRFDPATGEVLEKFPYPWPCFRQEWTARVSDDPQAMLQ









DMKLTDYRGLPLEQVKPVFVSRMPKHKVTGAAHKDTVK









SAKALDRGVVLVKRALTDLKLKDGEIENYYDPASDRLL









YEALKERLIAFGGDAQKAFAEPFHKPKRDGTPGPLVKK









VKLMEKSSLTVPVHDGKGVADNDSMVRIDVFFVAGEGY









YFVPIYVADTVKPELPNRAVVANKPYAEWKEMKDEDFL









FSLYPSDLMRVTQKKGIKLSLINKESTLKKEEMAQSIL









LYYVKGSISTGSITAENHDRTYAINSLGIKTLEKLEKY









QVDVLGNVSPVGKEKRLTFC [SEQ ID NO. 55]







MAD2035
141
CADATZ010000012.1
uncultured
Cattle
MLPYAIGLDIGIASVGWAVVGLDTNERPFCILGMGSRI
GTTGT
TTATACC






Chloroflexi

rumen
FDKAEQPKTGASLALPRREARSLRRRLRRHRHRNERIR
AGTCC
ATTCCAG





bacterium

NLLLREKIISESELQDLFSGTLSDIYQLRVEALDRKLD
CCTGA
AAACTAT







DKEFSRVLIHIAQRRGFKSNRKNAAASQEDGKLLSAVT
TGGTT
TATGGTC







ENQQRMNDKGYRTVSEMLLRDDKFKDHKRNKGGEYLTT
TCTGG
ACTACAA







VTRTMVEDEVHKIFSAQRTHGNLKADNQLESEYLEILL
AATGG
TAAGGTA







SQRSFDEGPGGDSPYGGSQIEKMIGKCTFFPEEKRAAK
TATAA
TTAGACC







ATYTFEYFNLLEKINHIRLVSKDNLPEPLSDFQRRSLI
T
GTAGAGC







ELAYKVENLTYDRIRKELHISPELKFNTIRYESDDLPE
[SEQ
ACTAACA







NEKKQKLNCLKAYHEIRKALDKLGKGTINTLSKEQLNT
ID NO.
CCCCATT







IGTVLSMYKTSEIIKNKMEQIPAEIVDKLDEEGINFSK
59]
TGGGGTG







FGHLSIKACELIIPGLEKGLNYNDACEEAGLNFKAHNN

TTATCTC







EEKSFLLHPTEDDYADITSPVVKRAASQTIKVINAIIR

TTTAAAC







KQGCSPTYINIEVARELSKDFYERDKINKRNEANRAEN

TGTCCAA







ERSLEQIRKEYGKSNASGLDLVKFKLYQKQDGVCAYSQ

AATTTAG







KQISFERLFEPNYVEVDHIIPYSKCFDDRESNKVLVFA

TATTGCA







KENREKGNRLPLEYLDGKKRESFIVWVNSKVKDYRKKQ

ATTATTG







NLLKESLSEEEEKQFKERNLQDTKTVSKFLMNYINDNL

A







IFSSSNKRKKHVTAVSGGVTSYMRKRWGISKVREDGDQ

[SEQ ID







HHAVDALVIVCTTDGMIQQVSKYVEYKECQYIQTDAGS

NO. 60]







LAVDPYTGEVLRSFPYPWARFHEDAVTWTEKIFVSRMP









MRKVTGPAHKETIKSPKALGEGLLIVRKPLTELKLKNG









EIENYYKPEADLLLYNGLKERLMEFGGDAKKAFAEPFP









KPGNPQKIVKKVRLTEKSTLNVPVLKGEGRADNDSMVR









VDVFLKDGKYYLVPIYVADTLKPELPNKACIAHKPYDE









WATMDDGDFLFSLYPNDLIYIKHKKGIKLTKINKNSTL









ADSIEGKEFFLFYKTMGISSAVLTCTNHDNTYYIESLG









VKTLESLEKCVVGVLGEIHKVRKEKRTGFSGN [SEQ









ID NO. 58]







MAD2036
141
CADAWQ010000026.1

Rumino-

Cattle
MLPYAIGLDIGISSVGWASVALDEEDKPCGIIGMGSRI
GTTAT
TTATACC






coccaceae

rumen
FDAAEQPKTGDSLAAPRRAARSARRRLRRRRHRNERIR
AGTTC
ATACCAA





bacterium

ALMLREGLLSEAELAALFDGRLEDICALRVRALDEAVT
CCTGT
GAACGAA







NDELARILLHLSQRRGFRSNRKTAATQEDGELLAAVSA
TCGTT
GCAGGTT







NRALMQERGYRTVAEMLLRDERYRDHRRNKGGAYIATV
CTTGG
ACTATGA







GRDMVEDEVRQIFAAQRALGSTAASETLETAYLEILLS
TATGG
TAAGGTA







QRSFDAGPGEPSPYAGGQIERMIGRCTFEPDEPRAARA
TATAA
GTATACC







TYSFEYFSLLEAVNHIRLTEAGESVPLTKEQREKLIAL
T
GCAGAGC







AHRTADLSYAKIRKELGVPESQRFNMVTYGKTDSADEA
[SEQ
TCCAACG







EKKTKLKQLRAYHQMRAAFEKAAKGSFVLLTKEQRNAV
ID NO.
CCTCGCT







GQTLSIYKTSDNIRPRLREAGLTEAEIDVAEGLSFSKF
62]
TTTGCGG







GHLSVKACDKIIPFLEQGMKYSEACVAAGYAFRGHEGQ

GGCGTTG







DKQRLLPPLDNDAKDTITSPVVLRAVSQTIKVVNAIIR

TCTCT







ERGGSPTFINIELAREMAKDFSERSQIKREQDSNRARN

[SEQ ID







ERMMERIKTEYGKSSPTGLDLVKLKLYEEQAGVCAYSL

NO. 63]







KQMSLEHLFDPNYAEIDHIIPYSISFDDGYKNKVLVLA









KENRDKGNRLPLEYLNGKRREDFIVWVNSSVRDWRKKQ









NLLKEHVTPEDEAKFKERNLQDTKTASRFLLNYIADNL









AFAPFQTERKKRVTAVNGSVTAYLRKRWGIAKVRANGD









LHHAVDALVIACTTDGLIQKVSRYACYQENRYSEAGGV









IVDSATGEVVAQFPEPWPRFRHELEARLSDDPARAVLG









LGLAHYMTGEIRPRPLFVSRMPRRKVTGAAHKETVKSP









RALDEGQLVTKTPLSALKLGKDGEIPGYYKPESDRLLY









EALKARLRQFGGDGKKAFAEPFHKPKHDGTPGPVVTKV









KLCEPATLSVPVHGGLGAANNDSMVRIDVFHVEGDGYY









FVPIYIADTLKLELPNKACVKIKKISEWKHMKPQDFMF









SLYPNDLFRIVSKKGITLNLVSKESTLPTSVNVSDTLL









YFVSAGIASACLTCRNHDNTYQIESLGIKTLEKLEKYT









VDVLGNVHRVEKEPRMSFSQKGD [SEQ ID NO.









61]







MAD2037
141
DGSQ01000028.1

Clostri-

low
MLPYGIGLDIGITSVGWATVALDENDRPYGIIGMGSRI
GTTAT
TTATACC






diales

methane
FDAAEQPKTGESLAAPRRAARSARRRLRRHRHRNERIR
AGTTC
ATACCAA





bacterium
producing
ALILRENLLSEGQLLHLYDGQLSDVYSLRVKALDERVS
CCTGA
GAACTAT






sheep
NEEFARILIHISQRRGFKSNRKGASSKEDSELLAAISA
TAGTT
GAGGTTG







NQVRMQQQGYRTVAEMYLKDPIYQEHRRNKGGNYIATV
CTTGG
CTATAAT







SRAMVEDEVHQIFTGQRACGNPAATKELEEAYVEILLS
TATGG
AAGGTAG







QRSFDDGPGDGSPYAGSQIERMIGKCQLEKEAGEPRAA
TATAA
TAAACCG







KATYSFEYFSLLAAINNISIISNGQLSPLTKEQREMLI
T
CAGAGCT







ALAHKTSELNYARIRKELGLSEAQRFNTVSYGKMEIAE
[SEQ
CTAACGC







AEKKTKFEHLKAYHKMRREFERIAKGHFASITIEQRNA
ID NO.
CTCACAT







IGDVLSKYKTDAKIRPALREAGLTELDIDAAEALNFSK
65]
TTGTGGG







FGHISIKACKKIIPWLEQGMKYSEACNAAGYNFKGHDG

GCGTTAT







QEKSHLLPPLDEESRNVITSPVALRAISQTIKVVNAII

CTCT







RERGCSPTFINIELAREMSKDFYERIEIKKEQDGNRAK









NERMMERIRTEYGKASPTGQDLVKFKLYEEQGGVCAYS

[SEQ ID







LKQMSLAHLFEPDYAEVDHIVPYSISFDDGYKNKVLVL

NO. 66]







AKENRDKGNRLPLQYLQGKRREDFIAWVNSCVRDYKKR









QRLLKESISEDDLRAFKERNLQDTKTASRFLLNYISDH









LEFTQFATERKKHVTAVNGSVTAYLRKRWGITKIRENG









DLHHAVDALVIACTTDGMIQQVSRFAQHRENQYSLAED









SRFIIDPETGEVIKEFPYPWPRFRQELEARLSSNPGLA









VRDRGFLLYMAESIPVHPLFVSRMPRRKVTGAAHKETI









KSGKAQKDGLLIVKKPLTDLKLDKEGEIANYYNPMSDR









LLYEALKKRLTAFNGDGKKAFADPFYKPKSDGTQGPLV









NKVKLCEPSTLNVSVIGGKGVAENDSMVRIDVFRVEGD









GYYFVPVYVADTVKPELPNKACVANKPYTDWKEMRESD









FLFSLYPNDLLKVTHKKALILTKAQKDSDLPDCKETKS









EMLYFVSASISTASLACRTHDNSYRINSLGIKTLEALE









KYTVDVLGEYHPVRRETRQTFTGRESSGHSGIS [SEQ









ID NO. 64]







MAD2038
141
CACWHR010000008.1

Rumino-

Cattle
MRPYGIGLDIGISSVGWAAIALDHQDSPCGILDMGARI
GTTGT
TTATACC






coccaceae

rumen
FDAAENPKDGASLAAPRREKRSQRRRLRRHRHRNERIR
AGTTC
ATACCAA





bacterium

RMLLKEGLLTEAELTGLFDGALEDIYALRTRALDEALT
CCTGA
GAACGAT







KQEFARVLLHLSQRRGFRSNRRATAAQEDGKLLDAVSE
TCGTT
CAGGTTG







NAKRMADCGYRTVGEMLCRDATFAKHKRNKGGEYLTTV
CTTGG
CTACAAT







SRAMIEDEVKLVFASQRRLGSAFASEALEQGYLDILLS
TATGG
AAGGTAG







QRSFDEGPGGNSPYGGAQIERMIGKCTFYPEEPRAARA
TATAA
TAAACCG







CYSFEYFSLLQKVNHIRLQKDGESTPLTSEQRLQLIEL
T
AAGAGCT







ANKTENLDYARIRRALQIPDAYRFNTVSYRIESDPAAA
[SEQ
CTAACGC







EKKEKFQYLRAYHTMRKAIDGASKGRFALLSQEQRDQI
ID NO.
CCCGTTT







GTVLTLYKSQERISEKLTEAGIEPCDIAALESVSGFSK
68]
CTTTACG







TGHISLRACKELIPYLEQGMNYNEACAAAGIEFHGHSG

GGGCGTT







TERTVVLHPTPDDLADITSPVVRRAVAQTVKVINAVIR

ATCTCT







RYGSPVFVNIELARELAKDFTERKKLEKDNKTNRAENE

[SEQ ID







RLMRRIREEYGKMNPTGLDLVKLRLYEEQAGVCPYSQK

NO. 69]







QMSLQRLFEPNYAEVDHIIPYSISFDDSRRNKVLVLAE









ENRNKGNRLPLQYLTGERRDNFIVWVNSSVRDYRKKQK









LLKPTVTDEDKQQFKERNLQDTKTMSRFLMNYINDHLQ









FGVSAKERKKRVTAVNGIVTSYLRKRWGITKIRGDGDL









HHAVDALVIACATDGMIRQITRYAQYRECRYMQTDTGS









AAIDEATGEVLRIFPYPWEHFRKELEARLSSDPARAVN









ALRLPFYLDSGEPLPKPLFVSRMPRRKVSGAAHKDTVK









SPKAMAEGKVIVRRALTDLKLKNGEIENYFDPGSDRLL









YDALKARLAAFGGDGAKAFREPFYKPRHDGTPGPLVKK









VKLCEPTTLNVAVHGGKGVADNDSMVRIDVFRVEGDGY









YFVPIYIADTLKPVLPNKACVAFKPYSEWRTMDDRDFI









FSLYPNDLIRVTHKSALKLSRVSKESTLPESIESKTAL









LYYVSAGISGAAVSCRNHDNSYEIKSMGIKTLEKLEKY









TVDVLGEYHKVEKERRMPFTGKRS [SEQ ID NO.









67]







MAD2039
141
CACZLL010000017.1

Rumino-

Cattle
MRPYAIGLDIGITSVGWATVALDADESPCGIIGLGSRI
GTTAT
TTATACC






coccaceae

rumen
FDAAEQPKTGESLAAPRRAARGSRRRLRRHRHRNERIR
AGTTC
ATACCAA





bacterium

SLMLEERLISQDELETLFDGRLEDIYALRVKALDEIVS
CCTGA
GAACTAT







RTDFARILLHISQRRGFKSNRKNPTTKEDGVLLAAVNE
TAGTT
TTAGGTT







NKQRMSEHGYRTVGEMFLLDETFKDHKRNKGGNYITTV
CTTGG
ACTATGA







ARDMVADEVRAIFSAQRELGASFASEEFEERYLEILLS
TATGG
TAAGGTT







QRSFDEGPGGNSPYGGSQIERMVGRCTFFPDEPRAAKA
TATAA
TAGTACA







TYSFEYFTLLQKVNHIRIVENGVASKLTDEQRRIIIEL
T
CCTTAGA







AHTTKDVSYAKIRKVLKLSDKQLFNIRYSDNSPAEDSE
[SEQ
GCTCTGA







KKEKLGIMKAYHQMRSAIDRVSKGRFAMMPRAQRNAIG
ID NO.
CGCCTCG







TALSLYKTSDKIRKYLTDAGLDEIDINSADSIGSFSKF
71]
CTTTTGC







GHISVKACDMLIPFLEQGMNYNEACAAAGLNFKGHDAG

GAGGCGT







EKSKLLHPKEEDYEDITSPVVRRAIAQTIKVINAIIRR

TATCTCT







EGCSPTFINIELAREMAKDFRERNRIKKENDDNRAKNE

TTATATT







RLLERIRTEYGKNNPTGLDLVKLRLYEEQSGVCMYSLK

GCCAAAA







QMSLEKLFEPNYAEVDHIVPYSISFDDSRKNKVLVLTE

ATGCAAA







ENRNKGNRLPLQYLKGRRREDFIVWVNNNVKDYRKRRL

TATATCG







LLKEELTAEDESGFKERNLQDTKTMSRFLLNYIADNLE

TACAATG







FAESTRGRKKKVTAVNGAVTAYMRKRWGITKIREDGDC

GTGGC







HHAVDAVVIACTTDAMIRQVSRYAQFRECEYMQTESGS

[SEQ ID







VAVDTGTGEVLRTFPYPWPDFRKELEARLANDPAKVIN

NO. 72]







DLHLPFYMSAGRPLPEPVFVSRMPRRKVTGAAHKDTIK









SARELDNGYLIVKRPLTDLKLKNGEIENYYNPQSDKCL









YDALKNALIEHGGDAKKAFAGEFRKPKRDGTPGPIVKK









VKLLEPTTMCVPVHGGKGAADNDSMVRVDVFLSGGKYY









LVPIYVADTLKPELPNKAVTRGKKYSEWLEMADEDFIF









SLYPNDLICATSKNGITLSVCRKDSTLPPTVESKSFML









YYRGTDISTGSISCITHDNAYKLRGLGVKTLEKLEKYT









VDVLGEYHKVGKEVRQPFNIKRRKACPSEML [SEQ









ID NO. 70]







MAD2040
141
DHKF01000115.1

Clostri-

Feces
MHRYAIGLDIGITSVGWAAIALDAEENPCGMLDFGSRI
GTTGT
TTATACC






diales


FTGAEHPKTGASLAAPRREARGARRRLRRHRHRNERIR
AGTTC
ATACCAA





bacterium

RLMVSGGLISQEQLESLFAGQLEDIYALRTRALDEQVA
CCTGA
GAACTGC





UBA4701

REELARIMLHLSQRRGFRSNRKGGADAEDGKLLEAVGD
TGGTT
TCAGGTT







NKRRMDEKGYRTAGEMFFKDEAFAAHKRNKGGNYIATV
CTTGG
ACTATGA







TRAMTEDEVHRIFAAQRGFGAEYANEKLEAAYLDILLS
TATGG
TAAGGTA







QRSFDEGPGGDSPYGGSQIERMIGTCAFEPDQPRAAKA
TATAA
GTAAACC







AYSFEYFSLLEKLNHIRLVSGGKSEPLTDAQRKKLIEL
T
GAAGAGC







AHKQDTLSYAKIRKELELNEAVRFNSVRYTDDATFEEQ
[SEQ
TCTAATG







EKKEKIVCMKAYHAMRKAVDKNAKGRFAYLTIPQRNEI
ID NO.
CCCCGTC







GRVLSTYKTSAKIEPALAAAGIEPCDIAALEGLSFSKF
74]
TCGCACG







GHLSIKACDKLIPFLEKAMNYNDACAAAGYDFRGHSRD

GGGCATT







GRQMYLPPLGGDCTEITSPVVRRAVSQTIKVINAIIRR

ATCTCTA







YGTSPVYVNIELAREMSKDFAERNKIKKQNDDNRSKNE

ACAGCGA







KIKEQVAEYKHGAATGLDIVKMKLFNEQGGICAYSQRQ

AAAGGCA







MSLERLFDPNYAEVDHIVPYSISFDDRYKNKVLVLTEE

AA







NRNKGNRLPLQYLTGERRDRFIVWVNNSVRDFQKRKLL

[SEQ ID







LKEALTPEEENDWKERNLQDTKFVSSFLLNYINDNLLF

NO. 75]







APSVRRKKRVTAVNGAVTDYMRKRWGISKVREDGDRHH









AVDAVVIACTNDALIQKVSRYESWHERHYMPTENGSIL









VDPATGEIKQTFPYPWAMFRKELEARLSNDPSRAVADL









KLPFYMDADAPPVKPLFVSRMPTRKVTGAAHKDTVKSA









RALADGLAIVRRPLTALKLDKDGEIAGYYNKDSDRLLY









DALKARLTEYGGNAAKAFAEPFYKPKSDGTPGPVVNKV









KLTEPTTLSVPVQDGTGIADNDSMVRIDVFRVVGDGYY









FVPVYVADTLKQELPDRAVVAFKAHSEWKVMSDGDFVF









SLYPNDLVKVTRKKDVILKRSFDNSTLPETIASNECLL









YYAGADISTGAISCVTNDNAYSIRGLGIKTLVSMEKYT









VDILGEYHPVRKEERQRFNTKR [SEQ ID NO. 73]









Example 3
Vector Cloning, MADZYME Library Construction and PCR

The MADzyme coding sequences were cloned into a pUC57 vector with a T7-promoter sequence attached to the 5′-end of the coding sequence and a T7-terminator sequence attached to the 3′-end of the coding sequence.


First, Q5 Hot Start 2× master mix reagent (NEB, Ipswich, MA) was used to amplify the MADzyme sequences cloned in the pUC57 vector. The forward primer 5′-TTGGGTAACGCCAGGGTTTT [SEQ ID NO. 172] and reverse primer 5′-TGTGTGGAATTGTGAGCGGA [SEQ ID NO. 173] amplified the MADzyme together with its flanking sequences in the pUC57 vector, including the T7-promoter and T7-terminator components at the 5′- and 3′-ends of the MADzymes, respectively. Primers at 1 μM were used in 10 μL PCR reactions in 96-well PCR plates, with 3.3 μL of boiled cell sample as template. The PCR conditions shown in Table 2 were used:
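As a quick sanity check on the reaction described above, the following minimal Python sketch computes basic properties of the two primers and the volume left over in the 10 μL reaction. The primer sequences and volumes are taken from the text; the helper names (`gc_fraction`, `reverse_complement`) are illustrative, and the assumption that the 2× master mix is used at 1× final concentration (half the reaction volume) is ours, not stated in the source.

```python
# Primer sequences copied from the text (SEQ ID NOs. 172 and 173).
FORWARD_PRIMER = "TTGGGTAACGCCAGGGTTTT"
REVERSE_PRIMER = "TGTGTGGAATTGTGAGCGGA"


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Reaction volumes from the text, in microliters.
REACTION_UL = 10.0
TEMPLATE_UL = 3.3
# Assumption: a 2x master mix is diluted to 1x, so it fills half the reaction.
MASTER_MIX_UL = REACTION_UL / 2
# Volume remaining for primers and water (rounded to avoid float noise).
REMAINING_UL = round(REACTION_UL - MASTER_MIX_UL - TEMPLATE_UL, 2)

if __name__ == "__main__":
    for name, primer in (("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)):
        print(name, len(primer), "nt, GC fraction", gc_fraction(primer))
    print("volume left for primers and water (uL):", REMAINING_UL)
```

Under these assumptions, only a small volume remains for primers and water, which is consistent with primers being added from a concentrated stock; the stock concentration itself is not given in the text.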













TABLE 2

STEP               TEMPERATURE    TIME
DENATURATION       98° C.         30 SEC
30 CYCLES          98° C.         10 SEC
                   66° C.         30 SEC
                   72° C.         3 MIN
FINAL EXTENSION    72° C.         2 MIN
HOLD               12° C.
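The program in Table 2 can be written down as data to estimate its programmed run time. This is a minimal sketch: the temperatures, durations, and cycle count come from Table 2, while the `Step` class and the labels for the three cycled steps (denature, anneal, extend) are illustrative assumptions based on the conventional roles of those temperatures. Ramp times between temperatures and the open-ended 12° C. hold are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str      # descriptive label (cycle-step labels are assumed)
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed duration in seconds


# Values transcribed from Table 2.
INITIAL = [Step("denaturation", 98, 30)]
CYCLE = [
    Step("denature", 98, 10),
    Step("anneal", 66, 30),
    Step("extend", 72, 180),
]
N_CYCLES = 30
FINAL = [Step("final extension", 72, 120)]


def programmed_seconds() -> int:
    """Sum the programmed step durations, excluding ramping and the final hold."""
    per_cycle = sum(step.seconds for step in CYCLE)
    fixed = sum(step.seconds for step in INITIAL + FINAL)
    return fixed + N_CYCLES * per_cycle


if __name__ == "__main__":
    total = programmed_seconds()
    print(f"{total} s ({total / 60:.1f} min) of programmed time")
```

The cycled steps dominate the total, and within each cycle the 3 MIN extension accounts for most of the time.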










Example 4
gRNA Construction

Several functional gRNAs associated with each MADzyme were designed by truncating the 5′ region, the 3′ region and the repeat/anti-repeat duplex (see Table 3).
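To illustrate how a truncated variant relates to a longer one, the minimal Python sketch below compares sgRNAv1 and sgRNAv2 for sgM2015 and reports the single contiguous block that is absent from the shorter variant. The two sequences are assembled from the sgM2015 entries for SEQ ID NO. 76 and SEQ ID NO. 77 in Table 3, on the assumption that the wrapped table cells have been assigned to the correct columns; the function names are illustrative, and the sketch does not assert which structural region of the gRNA the removed block corresponds to.

```python
# sgM2015 variants, assembled from the wrapped cells of Table 3 (assumed layout).
SGM2015_V1 = (
    "GTTTTAGAGCTATGC" "TGTTTTGAATGCTTC" "CAAAACGAAATGTTG" "GTAGCATTCAAAACA"
    "ACATAGCAAGTTAAA" "ATAAGGCTTTGTCCG" "TTCTCAACTTTTAGT" "GACGCTGTTTCGGCG"
)  # SEQ ID NO. 76
SGM2015_V2 = (
    "GTTTTAGAGCTATGC" "TGTTTTGAATGCTTC" "GTAGCATTCAAAACA" "ACATAGCAAGTTAAA"
    "ATAAGGCTTTGTCCG" "TTCTCAACTTTTAGT" "GACGCTGTTTCGGCG"
)  # SEQ ID NO. 77


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def internal_deletion(full: str, short: str):
    """If `short` equals `full` with one contiguous block removed, return
    (start_index, removed_block); otherwise return None.

    When several start positions would work, the rightmost one is reported.
    """
    if len(short) >= len(full):
        return None
    gap = len(full) - len(short)
    # The removed block cannot start after the shared prefix ends.
    for start in range(common_prefix_len(full, short), -1, -1):
        if full[:start] + full[start + gap:] == short:
            return start, full[start:start + gap]
    return None


if __name__ == "__main__":
    print(len(SGM2015_V1), len(SGM2015_V2))
    print(internal_deletion(SGM2015_V1, SGM2015_V2))
```

The same comparison can be applied to other variant pairs in Table 3; a result of `None` indicates that the shorter variant is not a single-block truncation of the longer one, for example when both an internal region and an end have been shortened.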














TABLE 3





gRNA 







name
sgRNAv1
sgRNAv2
sgRNAv3
sgRNAv4
sgRNAv5







sgM
GTTTTAGAGCTATGC
GTTTTAGAGCTATGC
GTTTTAGAGCTATGC
GTTTTAGAGCT
NONE


2015
TGTTTTGAATGCTTC
TGTTTTGAATGCTTC
TGTTAACAACATAGC
ATGCAAACATA




CAAAACGAAATGTTG
GTAGCATTCAAAACA
AAGTTAAAATAAGGC
GCAAGTTAAAA




GTAGCATTCAAAACA
ACATAGCAAGTTAAA
TTTGTCCGTTCTCAA
TAAGGCTTTGT




ACATAGCAAGTTAAA
ATAAGGCTTTGTCCG
CTTTTAGTGACGCTG
CCGTTCTCAAC




ATAAGGCTTTGTCCG
TTCTCAACTTTTAGT
TTTCGGCG
TTTTAGTGACG




TTCTCAACTTTTAGT
GACGCTGTTTCGGCG
[SEQ ID NO. 78]
CTGTTTCGGCG




GACGCTGTTTCGGCG
[SEQ ID NO. 77]

[SEQ ID NO.




[SEQ ID NO. 76]


79]






sgM
GTTTTAGAGTCATGT
GTTTTAGAGTCATGT
GTTTTAGAGTCATGT
NONE
NONE


2016
TGTTTAGAATGGTAC
TGTAAAAACAACATA
TGTAAAAACAACATA





CAAAACATCTTTTGG
GCAAGTTAAAATAAG
GCAAGTTAAAATAAG





GACTATTCTAAACAA
GTTTTAACCGTAATC
CGTAATCAACTGTAA





CATAGCAAGTTAAAA
AACTGTAAAGTGGCG
AGTGGCGCTGTTTCG





TAAGGTTTTAACCGT
CTGTTTCGGCGC
GCGC





AATCAACTGTAAAGT
[SEQ ID NO. 81]
[SEQ ID NO. 82]





GGCGCTGTTTCGGCG







C [SEQ ID NO.







80]









sgM
GTTTTAGAGCTGTGC
GTTTTAGAGCTGTGC
GTTTTAGAGCTGTGC
GTTTTAGAGCT
NONE


2017
TGTTTCGAATGGTTC
TGTTTCGAAAAATCG
TGTAAAAACAACACA
GTGCAAACACA




CAAAACGAAATGTTG
AAACAACACAGCGAG
GCGAGTTAAAATAAG
GCGAGTTAAAA




GAACTATTCGAAACA
TTAAAATAAGGCTTT
GCTTTGTCCGTACAC
TAAGGCTTTGT




ACACAGCGAGTTAAA
GTCCGTACACAACTT
AACTTGTAAAAGGGG
CCGTACACAAC




ATAAGGCTTTGTCCG
GTAAAAGGGGCACCC
CACCCGATTCGGGTG
TTGTAAAAGGG




TACACAACTTGTAAA
GATTCGGGTGC
C
GCACCCGATTC




AGGGGCACCCGATTC
[SEQ ID NO. 84]
[SEQ ID NO. 85]
GGGTGC




GGGTGCA


[SEQ ID NO.




[SEQ ID NO. 83]


86]






sgM
GTTTTAGAGCTGTGT
GTTTTAGAGCTGTGT
GTTTTAGAGCTGTGT
NONE
NONE


2019
TGTTTCGAATGGTTC
TGTAAAAACAATACA
TGTAAAAACAATACA





CAAAACGGTTTGAAA
GCAAAGTTAAAATAA
GCAAGTTAAAATAAG





CCATTCGAAACAATA
GGCTAGTCCGTATAC
GCTAGTCCGTATACA





CAGCAAAGTTAAAAT
AACGTGAAAACACGT
ACGTGAAAACACGTG





AAGGCTAGTCCGTAT
GGCACCGATTCGGTG
GCACCGATTCGGTGC





ACAACGTGAAAACAC
C
[SEQ ID NO. 89]





GTGGCACCGATTCGG
[SEQ ID NO. 88]






TGC [SEQ ID NO.







87]









sgM
GTTTGCTAGTTATGT
GTTTGCTAGTTATGT
GTTTGCTAGTTATGT
NONE
NONE


2020
TATTTATAGTATTAA
TATAAAAATAACATA
TATAAAAATAACATA





GCAAACTGTAAATAA
ACGAGTGCAAATAAG
ACGAGTGCAAATAAG





CATAACGAGTGCAAA
CGTTTCGCGAAAATT
CGTTTCGCGAAAATT





TAAGCGTTTCGCGAA
TACAGTGGCCCTGCT
TACAGTGGCCCTGCT





AATTTACAGTGGCCC
GTGGGGCCTTTTTTA
GTGGGGCC





TGCTGTGGGGCCTTT
TTTATCAAA
[SEQ ID NO. 92]





TTTATTTATCAAA
[SEQ ID NO. 91]






[SEQ ID NO. 90]









sgM
GTTTGAGAGCCTTGT
NONE
NONE
NONE
NONE


2021
AAAACCGTATATCTC







TCAAGCGAAAGATAA







TGTTTTACAAGGCGA







GTTCAAATAAGGATT







TATCCGAAATCGCTT







GCGTGCATTGGCACC







ATCTATCTTTTAAGA







CTTTCTTTGAAAGTC







TT [SEQ ID NO.







93]









sgM
GTTTGAGAGTCTTGT
GTTTGAGAGTCTTGT
GTTTGAGAGTCTTGT
GTTTGAGAGTC
NONE


2022
TAATTCTTAAAGGTG
AAAAACAAGACGAGT
AAAAACAAGACGAGT
TTGTTAATTCA




TAAAACGAGAATTAA
GCAAATAAGGTTTAT
GCAAATAAGGTTTAT
AAAGAATTAAC




CAAGACGAGTGCAAA
CCGGAATCGTCAATA
CCGGAATCGTCAATA
AAGACGAGTGC




TAAGGTTTATCCGGA
TGACCTGCATTGTGC
TGACCTGCATTGTGC
AAATAAGGTTT




ATCGTCAATATGACC
AGAATCTTTAAAATC
AG [SEQ ID NO.
ATCCGGAATCG




TGCATTGTGCAGAAT
ATATGATTTCATATG
96]
TCAATATGACC




CTTTAAAATCATATG
GTTTTA [SEQ ID

TGCATTGTGCA




ATTTCATATGGTTTT
NO. 95]

GAATCTTTAAA




A [SEQ ID NO.


ATCATATGATT




94]


TCATATGGTTT







TA [SEQ ID







NO. 97]






sgM
GTTTGAGAGTAGTGT
NONE
NONE
NONE
NONE


2023
AAATCCATAGGGGTC







TCAAACGAAAAGACC







CCTATGGATTTACAT







TGCGAGTTCAAATAA







AAGTTTACTCAAATC







GTTGGCTTGACCAAC







CGCACAGCGTGTGCT







TAAAGATCTCTTCAG







TGAGGTC [SEQ ID







NO. 98]









sgM
GTTTGAGAGTAGTGT
NONE
NONE
NONE
NONE


2024
AAATCCAGAGGGCTC







CAAAACGAGCCCTCT







GGATTTACACTACGA







GTTCAAATAAAAATT







ATTTCAAATCGCCGC







TATGTCGGCCGCACA







GTGTGTGCATTAAGA







AAAGTCCGAAAGGGC







[SEQ ID NO. 99]









sgM
GTTTGAGAGTAGTGT
GTTTGAGAGTAGTGT
GTTTGAGAGTAGTGT
GTTTGAGAGTA
NONE


2025
AAATTTATAGGGTAG
AAAAATACACTACGA
AAAAATACACTACGA
GTGTAAATTTA




TAAAACAAATTTTAC
GTTCAAATAAAAATT
GTTCAAATAAAAATT
TAGGAAAACCT




TACCCTATAAATTTA
ATTTCAAATCGTACT
ATTTCAAATCGTACT
ATAAATTTACA




CACTACGAGTTCAAA
TTTTAGTACCTTCAC
TTTTAGTACCTTCAC
CTACGAGTTCA




TAAAAATTATTTCAA
AAGTGTTGTGAATAT
AAGTGTTGTGAA
AATAAAAATTA




ATCGTACTTTTTAGT
TAACTCACCTTCGGG
[SEQ ID NO.
TTTCAAATCGT




ACCTTCACAAGTGTT
TGAG [SEQ ID
102]
ACTTTTTAGTA




GTGAATATTAACTCA
NO. 101]

CCTTCACAAGT




CCTTCGGGTGAG


GTTGTGAATAT




[SEQ ID NO.


TAACTCACCTT




100]


CGGGTGAG







[SEQ ID NO. 







103]






sgM
GTTTGAGAGTAGTGT
NONE
NONE
NONE
NONE


2026
AATTTCATATGGTAG







TCAAACGACTACCAT







ATGAGATTACACTAC







ACGGTTCAAATAAAG







AATGTTCGAAACCGC







CCTTTGGGGCCCGCT







TGTTGCGGATTTACA







GACTTGATATCAAGT







CTG [SEQ ID NO.







104]









sgM
GTTTGAGAGTAATGT
GTTTGAGAGTAATGT
GTTTGAGAGTAATGT
GTTTGAGAGTA
NONE


2027
AAATTCATAGGATGG
AAAAATACATTACAA
AAAAATACATTACAA
ATGTAAATTCA




TAAAACGAAATTTAC
GTTCAAATAAAAATT
GTTCAAATAAAAATT
TAAAAGTGAGT




CATCCAGTGAGTTTA
TATTCAACCCGTTCT
TATTCAACCCGTTCT
TTACATTACAA




CATTACAAGTTCAAA
TCGGAACCTCCACCG
TCGGAACCTCCACCG
GTTCAAATAAA




TAAAAATTTATTCAA
TGTGGAACATTAAGG
TGTGGA [SEQ ID
AATTTATTCAA




CCCGTTCTTCGGAAC
TCTGCTTTGCAGGCC
NO. 107]
CCCGTTCTTCG




CTCCACCGTGTGGAA
[SEQ ID NO.

GAACCTCCACC




C [SEQ ID NO.
106]

GTGTGGAACAT




105]


TAAG [SEQ







ID NO. 108]






sgM
GTTTGAGAGCAGTGT
NONE
NONE
NONE
NONE


2028
TGTCTTATATAGCTC







GAAAACGCATTGTAA







GACAACACTGCTACG







TTCAAATAAGCATAT







TGCTACAAGGTTCTC







CCTCGGAGAATGACC







ATTAGGTCACTTAGA







TAGCCGGTTCTTCTG







GCTA [SEQ ID







NO. 109]









sgM
GTTTGAGAGCAGTGT
GTTTGAGAGCAGTGT
GTTTGAGAGCAGTGT
GTTTGAGAGCA
NONE


2029
TGTCTTATATAGCTC
AAAAACACTGCTACG
AAAAACACTGCTACG
GTGTTGTCAAA




GAAAACGCATTGTAA
TTCAAATAAGCATAT
TTCAAATAAGCATAT
AGACAACACTG




GACAACACTGCTACG
TGCTACAAGGTTCTC
TGCTACAAGGTTCTC
CTACGTTCAAA




TTCAAATAAGCATAT
CATTGGAGAATGACC
CATTGGAGAATGACC
TAAGCATATTG




TGCTACAAGGTTCTC
ATTAGGTCGCTTAGA
ATTAGGTC [SEQ
CTACAAGGTTC




CATTGGAGAATGACC
TAGCCAGTTCTTCTG
ID NO. 112]
TCCATTGGAGA




ATTAGGTCGCTTAGA
GCTA [SEQ ID

ATGACCATTAG




TAGCCAGTTCTTCTG
NO. 111]

GTCGCTTAGAT




GCTA [SEQ ID


AGCCAGTTCTT




NO. 110]


CTGGCTA







[SEQ ID NO. 







113]






sgM
GTTTGAGAGCAGTGT
NONE
NONE
NONE
NONE


2030
TGTCTTAAATAGCTC







GAAAACGCATTGTAA







GACAACACTGCACGT







TCAAATAAGCAGATT







GCTACAAGGTTCCCG







TAAGGGAATGACCAT







CTGGTCACATGAATA







GCCCCCGGCAACGGT







GGCTG [SEQ ID







NO. 114]









sgM
ATTGTACCATAGCGA
NONE
NONE
NONE
NONE


2031
GTTAAATTAGGGAAT







TACAACGAAATTGTA







ATAACCTATTTTACC







TCGCTATGGCACAAT







TTGTTATTACATGGA







CATTATACTAAACAT







TTCCTAAAAAAGCAA







CGAAAAACGTGCT







[SEQ ID NO.







115]









sgM
GTTGTAGTTCCCTAA
GTTGTAGTTCCCTAA
GTTGTAGTTCCCTAA
GTTGTAGTTCC
NONE


2032
TTATTCTTGGTATGG
TTATTCTTGGTAAAA
TTATTCTTGGTAAAA
CTAATTATTCT




TATAATGAAAATTGT
ACCAAGAACAATTAG
ACCAAGAACAATTAG
TGGTATGGTAA




ATCATACCAAGAACA
GTTACTATGATAAGG
GTTACTATGATAAGG
AAATATCATAC




ATTAGGTTACTATGA
TAGTATACCGCAAAG
TAGTATACCGCAAAG
CAAGAACAATA




TAAGGTAGTATACCG
CTCTAACACCTCATC
CTCTAACACCTCATC
GGTTACTATGA




CAAAGCTCTAACACC
TTCGGATGAGGTGTT
TTCGGATGAG [SEQ
TAAGGTAGTAT




TCATCTTCGGATGAG
A [SEQ ID NO.
ID NO. 118]
ACCGCAAAGCT




GTGTTATCT [SEQ
117]

CTAACACCTCA




ID NO. 116]


TCTTCGGATGA







GGTGTTATCT







[SEQ ID NO. 







119]






sgM
GTTGTAGTTCCCTAA
GTTGTAGTTCCCTAA
GTTGTAGTTCCCTAA
GTTGTAGTTCC
NONE


2033
CAGTTCTTGGTATGG
CAGTTCTAAAAAGAA
CAGTTCTAAAAAGAA
CTAACAGTAAA




TATAATAAAAATTAT
CTGTTATGGTTGCTA
CTGTTATGGTTGCTA
AACTGTTATGG




ACCATACCAAGAACT
TGATAAGGTCTTAGC
TGATAAGGTCTTAGC
TTGCTATGATA




GTTATGGTTGCTATG
ACCGTAAAGCTCTGA
ACCGTAAAGCTCTGA
AGGTCTTAGCA




ATAAGGTCTTAGCAC
CGCCTCGCTTTCAGC
CGCCTCGCTTTCAGC
CCGTAAAGCTC




CGTAAAGCTCTGACG
GGGGCGTCA [SEQ
GGGG [SEQ ID
TGACGCCTCGC




CCTCGCTTTCAGCGG
ID NO. 121]
NO. 122]
TTTCAGCGGGG




GGCGTCATCTTTTTT


CGTCA




GCCCAAAAGACACGG


[SEQ ID NO. 




ATATTTTT [SEQ


123]




ID NO. 120]









sgM
GTTGTAGTTCCCTAA
GTTGTAGTTCCCTAA
GTTGTAGTTCCCTAA
GTTGTAGTTCC
NONE


2034
CGGTTCTTGGTATGG
CGGTACTGTTGGGTT
CGGTACTGTTGGGTT
CTAACGGTTCT




TATAATGAATTATAC
ACTACAATAAGGTAG
ACTACAATAAGGTAG
TGAAAACAAGA




CATACCAAGAACTGT
TAAACCGAAAAGCTC
TAAACCGAAAAGCTC
ACTGTTGGGTT




TGGGTTACTACAATA
TGACGTCTTGTTTGC
TGACGTCTTGTTTGC
ACTACAATAAG




AGGTAGTAAACCGAA
GCAGGACGTCATCTT
GCAGGACGTCATCTT
GTAGTAAACCG




AAGCTCTGACGTCTT
TATATCAGACGGATG
T [SEQ ID NO.
AAAAGCTCTGA




GTTTGCGCAGGACGT
[SEQ ID NO.
126]
CGTCTTGTTTG




CATCTTTATATCAGA
125]

CGCAGGACGTC




CGGATG [SEQ ID


ATCTTTATATC




NO. 124]


AGACGGATG







[SEQ ID NO. 







127]






sgM
GTTGTAGTCCCCTGA
NONE
NONE
NONE
NONE


2035
TGGTTTCTGGAATGG







TATAATGAAATTATA







CCATTCCAGAAACTA







TTATGGTCACTACAA







TAAGGTATTAGACCG







TAGAGCACTAACACC







CCATTTGGGGTGTTA







TCTCTTTAAACTGTC







CAAAATTTAGTATTG







CAATTATTGA [SEQ







ID NO. 128]









sgM
GTTATAGTTCCCTGT
NONE
NONE
NONE
NONE


2036
TCGTTCTTGGTATGG







TATAATGAAATTATA







CCATACCAAGAACGA







AGCAGGTTACTATGA







TAAGGTAGTATACCG







CAGAGCTCCAACGCC







TCGCTTTTGCGGGGC







GTTGTCTCT [SEQ







ID NO. 128]









sgM
GTTATAGTTCCCTGA
NONE
NONE
NONE
NONE


2037
TAGTTCTTGGTATGG







TATAATGAAATTATA







CCATACCAAGAACTA







TGAGGTTGCTATAAT







AAGGTAGTAAACCGC







AGAGCTCTAACGCCT







CACATTTGTGGGGCG







TTATCTCT [SEQ







ID NO. 129]









sgM
GTTGTAGTTCCCTGA
NONE
NONE
NONE
NONE


2038
TCGTTCTTGGTATGG







TATAATGAAATTATA







CCATACCAAGAACGA







TCAGGTTGCTACAAT







AAGGTAGTAAACCGA







AGAGCTCTAACGCCC







CGTTTCTTTACGGGG







CGTTATCTCT [SEQ







ID NO. 130]









sgM
GTTATAGTTCCCTGA
GTTATAGTTCCCTGA
GTTATAGTTCCCTGA
GTTATAGTTCC
GTTATAGTTC


2039
TAGTTCTTGGTATGG
TAGTTCTTGGTATGG
TAGTTCTTAACCAAG
CTGATAGTTCT
CCTGATAGTT



TATAATGAATTATAC
TATAATGAATTATAC
AACTATTTAGGTTAC
TGCAAGAACTA
CTTGCAAGAA



CATACCAAGAACTAT
CATACCAAGAACTAT
TATGATAAGGTTTAG
TTTAGGTTACT
CTATTTAGGT



TTAGGTTACTATGAT
TTAGGTTACTATGAT
TACACCTTAGAGCTC
ATGATAAGGTT
TACTATGATA



AAGGTTTAGTACACC
AAGGTTTAGTACACC
TGACGCCTCGCTTTT
TAGTACACCTT
AGGTTTAGTA



TTAGAGCTCTGACGC
TTAGAGCTCTGACGC
GCGAGGCGTTATCTC
AGAGCTCTGAC
CACCTTAGAG



CTCGCTTTTGCGAGG
CTCGCTTTTGCGAGG
T [SEQ ID NO.
GCCTCGCTTTT
CTCTGACGCC



CGTTATCTCTTTATA
CGTTATCTCT [SEQ
133]
GCGAGGCGTTA
AAAAGGCGTT



TTGCCAAAAATGCAA
ID NO. 132]

TCTCT
ATCTCT



ATATATCGTACAATG


[SEQ ID
[SEQ ID



GTGGC [SEQ ID


NO. 134] 
NO. 135]



NO. 131]









sgM
GTTGTAGTTCCCTGA
NONE
GTTGTAGTTCCCTGA
GTTGTAGTTCC
NONE


2040
TGGTTCTTGGTATGG

TGGTTCTTGAAAAAG
CTGATGGTTCT




TATAATAAATTATAC

AACTGCTCAGGTTAC
TGAAAAAGAAC




CATACCAAGAACTGC

TATGATAAGGTAGTA
TGCTCAGGTTA




TCAGGTTACTATGAT

AACCGAAGAGCTCTA
CTATGATAAGG




AAGGTAGTAAACCGA

ATGCCCCGTCTCGCA
TAGTAAACCGA




AGAGCTCTAATGCCC

CGGGGCATTATCTCT
AGAGCTCTAAT




CGTCTCGCACGGGGC

[SEQ ID NO.
GCCAAAGGGCA




ATTATCTCT [SEQ

137]
TTATCTCT




ID NO. 136]


[SEQ ID NO. 







138]









To find the optimal gRNA length, sgRNAs were designed with different lengths of spacer, repeat:anti-repeat duplex, and 3′ end of the tracrRNA. Each sgRNA was synthesized as single-stranded DNA downstream of the T7 promoter (see Table 4). The sgRNA templates were then amplified in 25 μL PCR reactions, one per sgRNA, using two primers (5′-AAACCCCTCCGTTTAGAGAG [SEQ ID NO. 174] and 5′-AAGCTAATACGACTCACTATAGGCCAGTC [SEQ ID NO. 175]) and 1 μL of the single-stranded DNA diluted to 10 μM as template, according to the conditions of Table 5.










TABLE 4

Name | Sequence
sg M2016v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCGCCGAAACAGCGCCACTTTACAGTTGATTACGGTTAAAACCTTATTTTAACTTGCTATGTTGTTTAGAATAGTCCCAAAAGATGTTTTGGTACCATTCTAAACAACATGACTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 139]
sg M2016v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCGCCGAAACAGCGCCACTTTACAGTTGATTACGGTTAAAACCTTATTTTAACTTGCTATGTTGTTTTTACAACATGACTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 140]
sg M2016v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCGCCGAAACAGCGCCACTTTACAGTTGATTACGCTTATTTTAACTTGCTATGTTGTTTTTACAACATGACTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 141]
sg M2019v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCACCGAATCGGTGCCACGTGTTTTCACGTTGTATACGGACTAGCCTTATTTTAACTTTGCTGTATTGTTTCGAATGGTTTCAAACCGTTTTGGAACCATTCGAAACAACACAGCTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 142]
sg M2019v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCACCGAATCGGTGCCACGTGTTTTCACGTTGTATACGGACTAGCCTTATTTTAACTTTGCTGTATTGTTTTTACAACACAGCTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 143]
sg M2019v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCACCGAATCGGTGCCACGTGTTTTCACGTTGTATACGGACTAGCCTTATTTTAACTTGCTGTATTGTTTTTACAACACAGCTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 144]
sg M2020v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATTTGATAAATAAAAAAGGCCCCACAGCAGGGCCACTGTAAATTTTCGCGAAACGCTTATTTGCACTCGTTATGTTATTTACAGTTTGCTTAATACTATAAATAACATAACTAGCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 145]
sg M2020v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATTTGATAAATAAAAAAGGCCCCACAGCAGGGCCACTGTAAATTTTCGCGAAACGCTTATTTGCACTCGTTATGTTATTTTTATAACATAACTAGCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 146]
sg M2020v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGGCCCCACAGCAGGGCCACTGTAAATTTTCGCGAAACGCTTATTTGCACTCGTTATGTTATTTTTATAACATAACTAGCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 147]
sg M2022v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAAAACCATATGAAATCATATGATTTTAAAGATTCTGCACAATGCAGGTCATATTGACGATTCCGGATAAACCTTATTTGCACTCGTCTTGTTAATTCTTTTGAATTAACAAGACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 148]
sg M2022v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAAAACCATATGAAATCATATGATTTTAAAGATTCTGCACAATGCAGGTCATATTGACGATTCCGGATAAACCTTATTTGCACTCGTCTTGTTTTTACAAGACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 149]
sg M2022v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACTGCACAATGCAGGTCATATTGACGATTCCGGATAAACCTTATTTGCACTCGTCTTGTTTTTACAAGACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 150]
sg M2025v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGCTCACCCGAAGGTGAGTTAATATTCACAACACTTGTGAAGGTACTAAAAAGTACGATTTGAAATAATTTTTATTTGAACTCGTAGTGTAAATTTATAGGTTTTCCTATAAATTTACACTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 151]
sg M2025v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACTCACCCGAAGGTGAGTTAATATTCACAACACTTGTGAAGGTACTAAAAAGTACGATTTGAAATAATTTTTATTTGAACTCGTAGTGTATTTTTACACTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 152]
sg M2025v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATTCACAACACTTGTGAAGGTACTAAAAAGTACGATTTGAAATAATTTTTATTTGAACTCGTAGTGTATTTTTACACTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 153]
sg M2027v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGGCCTGCAAAGCAGACCTTAATGTTCCACACGGTGGAGGTTCCGAAGAACGGGTTGAATAAATTTTTATTTGAACTTGTAATGTAAACTCACTTTTATGAATTTACATTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 154]
sg M2027v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGGCCTGCAAAGCAGACCTTAATGTTCCACACGGTGGAGGTTCCGAAGAACGGGTTGAATAAATTTTTATTTGAACTTGTAATGTATTTTTACATTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 155]
sg M2027v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATCCACACGGTGGAGGTTCCGAAGAACGGGTTGAATAAATTTTTATTTGAACTTGTAATGTATTTTTACATTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 156]
sg M2029v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAGCCAGAAGAACTGGCTATCTAAGCGACCTAATGGTCATTCTCCAATGGAGAACCTTGTAGCAATATGCTTATTTGAACGTAGCAGTGTTGTCTTTTGACAACACTGCTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 157]
sg M2029v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAGCCAGAAGAACTGGCTATCTAAGCGACCTAATGGTCATTCTCCAATGGAGAACCTTGTAGCAATATGCTTATTTGAACGTAGCAGTGTTTTTACACTGCTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 158]
sg M2029v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGACCTAATGGTCATTCTCCAATGGAGAACCTTGTAGCAATATGCTTATTTGAACGTAGCAGTGTTTTTACACTGCTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 159]
sg M2032v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGATAACACCTCATCCGAAGATGAGGTGTTAGAGCTTTGCGGTATACTACCTTATCATAGTAACCTAATTGTTCTTGGTATGATATTTTTACCATACCAAGAATAATTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 160]
sg M2032v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAACACCTCATCCGAAGATGAGGTGTTAGAGCTTTGCGGTATACTACCTTATCATAGTAACCTAATTGTTCTTGGTTTTTACCAAGAATAATTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 161]
sg M2032v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACTCATCCGAAGATGAGGTGTTAGAGCTTTGCGGTATACTACCTTATCATAGTAACCTAATTGTTCTTGGTTTTTACCAAGAATAATTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 162]
sg M2033v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATGACGCCCCGCTGAAAGCGAGGCGTCAGAGCTTTACGGTGCTAAGACCTTATCATAGCAACCATAACAGTTTTTACTGTTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 163]
sg M2033v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATGACGCCCCGCTGAAAGCGAGGCGTCAGAGCTTTACGGTGCTAAGACCTTATCATAGCAACCATAACAGTTCTTTTTAGAACTGTTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 164]
sg M2033v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACCCCGCTGAAAGCGAGGCGTCAGAGCTTTACGGTGCTAAGACCTTATCATAGCAACCATAACAGTTCTTTTTAGAACTGTTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 165]
sg M2034v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACATCCGTCTGATATAAAGATGACGTCCTGCGCAAACAAGACGTCAGAGCTTTTCGGTTTACTACCTTATTGTAGTAACCCAACAGTTCTTGTTTTCAAGAACCGTTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 166]
sg M2034v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACATCCGTCTGATATAAAGATGACGTCCTGCGCAAACAAGACGTCAGAGCTTTTCGGTTTACTACCTTATTGTAGTAACCCAACAGTACCGTTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 167]
sg M2034v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAAAGATGACGTCCTGCGCAAACAAGACGTCAGAGCTTTTCGGTTTACTACCTTATTGTAGTAACCCAACAGTACCGTTAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 168]
sg M2039v1 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGAGATAACGCCTCGCAAAAGCGAGGCGTCAGAGCTCTAAGGTGTACTAAACCTTATCATAGTAACCTAAATAGTTCTTGCAAGAACTATCAGGGAACTATAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 169]
sg M2039v2 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGAGATAACGCCTTTTGGCGTCAGAGCTCTAAGGTGTACTAAACCTTATCATAGTAACCTAAATAGTTCTTGCAAGAACTATCAGGGAACTATAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 170]
sg M2039v3 | AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGAGATAACGCCTCGCAAAAGCGAGGCGTCAGAGCTCTAAGGTGTACTAAACCTTATCATAGTAACCTAAATAGTTCTTGGTTAAGAACTATCAGGGAACTATAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 171]
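
As an illustrative check of how the two amplification primers relate to the Table 4 templates, the following minimal Python sketch uses the SEQ ID NO. 141 template (sg M2016v3). The helper names are hypothetical and not part of the disclosure. The sketch assumes that the listed oligonucleotides are the antisense strand, so that the second primer anneals to the 3′ end of the template through the T7 promoter region, and that T7 transcription starts at the G immediately following TATA (the conventional start site).

```python
# Illustrative sketch only; helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


FWD_PRIMER = "AAACCCCTCCGTTTAGAGAG"           # SEQ ID NO. 174
REV_PRIMER = "AAGCTAATACGACTCACTATAGGCCAGTC"  # SEQ ID NO. 175
TARGET = "CCAGTCAGTAATGTTACTGG"               # SEQ ID NO. 177

# sg M2016v3 template, SEQ ID NO. 141 (three wrapped rows of Table 4 joined).
TEMPLATE = (
    "AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCGCCGAAACAGCGCCACTTTACAGTTGATTACGCT"
    "TATTTTAACTTGCTATGTTGTTTTTACAACATGACTCTAAAACCCAGTAACATTACTGACTGGCCTATAGT"
    "GAGTCGTATTA"
)


def annealing_length(primer: str, template: str) -> int:
    """Length of the primer's 3' portion that is complementary to the
    3' end of the template (0 if the primer does not anneal there)."""
    for k in range(len(primer), 0, -1):
        if template.endswith(revcomp(primer[-k:])):
            return k
    return 0


def t7_transcript(template: str) -> str:
    """Predicted RNA, assuming transcription starts right after TATA
    on the strand complementary to the listed template."""
    sense = revcomp(template)
    start = sense.find("TAATACGACTCACTATA")
    if start < 0:
        raise ValueError("no T7 promoter found")
    return sense[start + 17:].replace("T", "U")


if __name__ == "__main__":
    print(TEMPLATE.startswith(FWD_PRIMER))
    print(annealing_length(REV_PRIMER, TEMPLATE))
    print(t7_transcript(TEMPLATE)[:22])
```

Under these assumptions, the first primer matches the 5′ end of the listed template, the second primer anneals through its 3′ portion and leaves a short unpaired 5′ tail, and the predicted transcript begins with GG followed by the 20-nucleotide spacer.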




















TABLE 5

STEP | TEMPERATURE | TIME
DENATURATION | 98° C. | 30 SEC
12 CYCLES | 98° C. | 10 SEC
 | 66° C. | 30 SEC
 | 72° C. | 2 MIN
FINAL EXTENSION | 72° C. | 2 MIN
HOLD | 12° C. |
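
For planning purposes, the Table 5 program can be written down as data and its programmed hold time summed. The following is a minimal sketch with hypothetical class and function names; it ignores ramp time between temperatures and the open-ended 12° C. hold, so the real run is somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Stage:
    name: str
    steps: tuple
    repeats: int = 1


# Table 5 conditions (the final 12 C hold is open-ended and is omitted).
TABLE_5 = (
    Stage("denaturation", (Step(98, 30),)),
    Stage("cycling", (Step(98, 10), Step(66, 30), Step(72, 120)), repeats=12),
    Stage("final extension", (Step(72, 120),)),
)


def programmed_seconds(program) -> int:
    """Sum of programmed hold times, excluding ramping between steps."""
    return sum(
        stage.repeats * sum(step.seconds for step in stage.steps)
        for stage in program
    )


if __name__ == "__main__":
    total = programmed_seconds(TABLE_5)
    print(f"{total} s ({total / 60:.1f} min) excluding ramp time")
```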










The target library was designed based on an assumption that the eight randomized NNNNNNNN [SEQ ID NO. 176] PAMs of these nucleases reside on the 3′ end of the target sequence (5′-CCAGTCAGTAATGTTACTGG [SEQ ID NO. 177]).
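
The size of such a library follows directly from the design: eight randomized positions give 4^8 = 65,536 PAM variants. The sketch below enumerates them under the assumption that the PAM immediately follows the target sequence on its 3′ side; any flanking sequence present in the actual library construct is not described here and is omitted. The function name is hypothetical.

```python
from itertools import product

TARGET = "CCAGTCAGTAATGTTACTGG"  # SEQ ID NO. 177
PAM_LEN = 8                      # NNNNNNNN, SEQ ID NO. 176


def pam_library(target: str = TARGET, pam_len: int = PAM_LEN):
    """Yield the target followed by every possible PAM of length pam_len."""
    for pam in product("ACGT", repeat=pam_len):
        yield target + "".join(pam)


if __name__ == "__main__":
    library = list(pam_library())
    print(len(library))
    print(library[0])
```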


Example 5
In Vitro Transcription and Translation for Production of MAD Nucleases and gRNAs

The MADZYMEs were tested for activity by in vitro transcription and translation (txtl). Both the gRNA plasmid and nuclease plasmid were included in each txtl reaction. A PURExpress® In Vitro Protein Synthesis Kit (NEB, Ipswich, Mass.) was used to produce MADzymes from the PCR-amplified MADZYME library and also to produce the gRNA libraries. In each well in a 96-well plate, the reagents listed in Table 6 were mixed to start the production of MADzymes and gRNAs:












TABLE 6

No. | REAGENTS | VOLUME (μl)
1 | SolA (NEB kit) | 10
2 | SolB (NEB kit) | 7.5
3 | PCR amplified gRNA | 0.4
4 | Murine RNase inhibitor (NEB) | 0.5
5 | Water | 3.0
6 | PCR amplified T7 MADZYMEs | 3.6










A master mix containing all reagents except the PCR-amplified T7-MADZYMEs was prepared on ice in a volume sufficient for all of the 96-well plates in the assay. After 21 μL of the master mix was distributed into each well of the 96-well plates, 4 μL of the mixture of PCR-amplified MADZYMEs and gRNA under the control of the T7 promoter was added. The 96-well plates were sealed and incubated for 4 hrs at 37° C. in a thermal cycler. The plates were then kept at room temperature until the target pool was added to perform the target depletion reaction.
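
The per-well volumes can be scaled to a plate with a short calculation. The sketch below is illustrative only: it reads the 21 μL master mix as SolA, SolB, RNase inhibitor and water, and the 4 μL addition as the two PCR-amplified templates, because those groupings match the Table 6 volumes. The 10% pipetting overage and the function names are assumptions, not part of the protocol.

```python
# Per-well volumes in microliters, taken from Table 6.
PER_WELL_UL = {
    "SolA (NEB kit)": 10.0,
    "SolB (NEB kit)": 7.5,
    "PCR amplified gRNA": 0.4,
    "Murine RNase inhibitor (NEB)": 0.5,
    "Water": 3.0,
    "PCR amplified T7 MADZYMEs": 3.6,
}

# Assumption: these two templates make up the 4 uL added after the master mix.
TEMPLATES = ("PCR amplified gRNA", "PCR amplified T7 MADZYMEs")


def per_well_split():
    """Return (master mix volume, template volume) per well in microliters."""
    mix = sum(v for k, v in PER_WELL_UL.items() if k not in TEMPLATES)
    templates = sum(PER_WELL_UL[k] for k in TEMPLATES)
    return mix, templates


def master_mix(n_wells: int, overage: float = 0.10) -> dict:
    """Volume of each master mix reagent for n_wells, with extra for pipetting loss."""
    scale = n_wells * (1 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in PER_WELL_UL.items()
        if name not in TEMPLATES
    }


if __name__ == "__main__":
    print(per_well_split())
    for reagent, volume in master_mix(96).items():
        print(f"{reagent}: {volume} uL")
```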


After the 4-hour incubation to allow production of the MADzymes and gRNAs, 4 μL of the target library pool (10 ng/μL) was added to 10 μL aliquots of the in vitro transcription/translation reaction mixture and allowed to deplete for 30 min, 3 hrs or overnight at 37° C. and 48° C. The target depletion reaction mixtures were diluted into PCR-grade water containing RNase A and incubated for 5 min at room temperature. Proteinase K was then added and the mixtures were incubated for 5 min at 55° C. The RNase A/Proteinase K-treated samples were purified with DNA purification kits, and the purified DNA samples were then amplified and sequenced. The PCR conditions are shown in Table 7:













TABLE 7

STEP | TEMPERATURE | TIME
DENATURATION | 98° C. | 30 SEC
4 CYCLES | 98° C. | 10 SEC
 | 66° C. | 30 SEC
 | 72° C. | 20 SEC
12 CYCLES | 98° C. | 10 SEC
 | 72° C. | 20 SEC
FINAL EXTENSION | 72° C. | 2 MINUTES
HOLD | 12° C. |
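
The analysis of the sequenced depletion samples is not detailed in this example. One common way to score such data, shown here purely as a hedged sketch and not as the method actually used, is to compare the frequency of each PAM in an untreated control library with its frequency after the depletion reaction. The function names, the pseudocount, and the use of a control sample are all assumptions; the sketch also assumes reads are oriented so that the 8-nucleotide PAM directly follows the target sequence.

```python
import math
from collections import Counter

TARGET = "CCAGTCAGTAATGTTACTGG"  # SEQ ID NO. 177
PAM_LEN = 8


def extract_pam(read: str):
    """Return the 8 nt following the target in a read, or None if absent."""
    i = read.find(TARGET)
    if i < 0:
        return None
    pam = read[i + len(TARGET): i + len(TARGET) + PAM_LEN]
    if len(pam) == PAM_LEN and set(pam) <= set("ACGT"):
        return pam
    return None


def pam_counts(reads) -> Counter:
    """Count PAM occurrences across reads that contain the target."""
    return Counter(p for p in map(extract_pam, reads) if p is not None)


def depletion_scores(control: Counter, treated: Counter, pseudocount: float = 1.0) -> dict:
    """log2(control frequency / treated frequency) per PAM.
    Positive scores indicate PAMs depleted by the nuclease."""
    pams = set(control) | set(treated)
    c_total = sum(control.values()) + pseudocount * len(pams)
    t_total = sum(treated.values()) + pseudocount * len(pams)
    return {
        pam: math.log2(
            ((control[pam] + pseudocount) / c_total)
            / ((treated[pam] + pseudocount) / t_total)
        )
        for pam in pams
    }
```

A PAM that supports cleavage becomes rare in the treated sample and receives a positive score, while PAMs that are not recognized score near or below zero.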










Example 6
Measurement of Nicked Plasmid with Nickase RNP Complexes

Proteins were produced in vitro using a PURExpress® In Vitro Protein Synthesis Kit (NEB, Ipswich, Mass.). Guide RNAs that target the target plasmid were also produced under a T7 promoter in the same mixture. The MADzyme nickase or nuclease and the guide RNA formed complexes (RNP complexes) as they were produced in the in vitro transcription and translation reagent. Supercoiled plasmid target was diluted into the digestion buffer, then the RNP complex was added to the same digestion buffer to initiate the plasmid digestion. After incubation at 37° C. to allow digestion of the plasmid, the resulting mixtures were treated with RNase and Proteinase K; the target plasmid was then purified with a PCR cleanup kit and run on a TAE-agarose gel to observe the formation of nicked or double-strand-cut plasmid. The results are shown in FIG. 7. Table 8 lists the identified MADzyme nickases, including the variations from the nuclease sequence in Table 1 and the amino acid sequence.











TABLE 8

MADzyme Nickase Name: MAD2016-H851A
SEQ ID NO: 178
Amino Acid Sequence:
MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKNFWGVRLFEEGHTAEDRR
LKRTARRIISRRRNRLRYLQAFFEEAMTDLDENFFARLQESFLVPEDKKWHRHPIFAKLEDEV
AYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFM
IIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMV
GNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADS
DKKSHAKLSSSMIVRFTEHQEDLKKFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQL
KFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFL
KENQEKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERM
TNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKV
KKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKI
LTIFEDRQRIRTQLSTFKGQFSAEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILGYLIK
DDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIV
DELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRD
TRLFLYYMQNGKDMYTGDELSLHRLSHYDIDAIIPQSFMKDDSLDNLVLVGSTENRGKSDDVP
SKEVVKDMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGI
LDQRYNANSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYP
NLAPEFVYGEYPKFQTFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKEL
NYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPIVAYTVLFTHEKG
KKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKE
AQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLTYVEQHQPEFQEILERVVDFAEVHTLAKSKV
QQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQ
STTGLYETRRKVVD

MADzyme Nickase Name: MAD2016-N874A
SEQ ID NO: 179
Amino Acid Sequence:
MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKNFWGVRLFEEGHTAEDRR
LKRTARRIISRRRNRLRYLQAFFEEAMTDLDENFFARLQESFLVPEDKKWHRHPIFAKLEDEV
AYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFM
IIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMV
GNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADS
DKKSHAKLSSSMIVRFTEHQEDLKKFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQL
KFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFL
KENQEKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERM
TNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKV
KKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKI
LTIFEDRQRIRTQLSTFKGQFSAEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILGYLIK
DDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIV
DELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRD
TRLFLYYMQNGKDMYTGDELSLHRLSHYDIDHIIPQSFMKDDSLDNLVLVGSTEARGKSDDVP
SKEVVKDMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGI
LDQRYNANSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYP
NLAPEFVYGEYPKFQTFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKEL
NYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPIVAYTVLFTHEKG
KKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKE
AQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLTYVEQHQPEFQEILERVVDFAEVHTLAKSKV
QQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQ
STTGLYETRRKVVD

MADzyme Nickase Name: MAD2032-H590A
SEQ ID NO: 180
Amino Acid Sequence:
MKYIIGLDMGITSVGFATMMLDDKDEPCRIIRMGSRIFEAAEHPKDGSSLAAPRRINRGMRRR
LRRKSHRKERIKDLIIKNELMTADEISAIYSTGKQLSDIYQIRAEALDRKLNTEEFVRLLIHL
SQRRGFKSNRKVDAKEKGSDAGKLLSAVNSNKELMIEKNYRTIGEMLYKDEKFSEYKRNKADD
YSNTFARSEYEDEIRQIFSAQQEHGNPYATDELKESYLDIYLSQRSFDEGPGGSSPYGGNQIE
KMIGNCTLEPEEKRAAKATFSFEYFNLLSKVNSIKIVSSSGKRALNNDERQSVIRLAFAKNAI
SYTSLRKELNMEYSERFNISYSQSDKSIEEIEKKTKFTYLTAYHTFKKAYGSVFVEWSADKKN
SLAYALTAYKNDTKIIEYLTQKGFDAAETDIALTLPSFSKWGNLSEKALNNIIPYLEQGMLYH
DACTAAGYNFKADDTDKRMYLPAHEKEAPELDDITNPVVRRAISQTIKVINALIREMGESPCF
VNIELARELSKNKAERSKIEKGQKENQVRNDRIMERLRNEFGLLSPTGQDLIKLKLWEEQDGI
CPYSLKPIKIEKLFDVGYTDIDAIIPYSLSFDDTYNNKVLVMSSENRQKGNRIPMQYLEGKRQ
DDFWLWVDNSNLSRRKKQNLTKETLSEDDLSGFKKRNLQDTQYLSRFMMNYLKKYLALAPNTT
GRKNTIQAVNGAVTSYLRKRWGIQKVRENGDTHHAVDAVVISCVTAGMTKRVSEYAKYKETEF
QNPQTGEFFDVDIRTGEVINRFPLPYARFRNELLMRCSENPSRILHEMPLPTYAADEKVAPIF
VSRMPKHKVKGSAHKETIRRAFEEDGKKYTVSKVPLTDLKLKNGEIENYYNPESDGLLYNALK
EQUAFGGDAAKAFEQPFYKPKSDGSEGPLVKKVKLINKATLTVPVLNNTAVADNGSMVRVDVF
FVEGEGYYLVPIYVADTVKKELPNKAIIANKPYEEWKEMREENFVFSLYPNDLIKISSRKDMK
FNLVNKESTLAPNCQSKEALVYYKGSDISTAAVTAINHDNTYKLRGLGVKTLLKIEKYQVDVL
GNVFKVGKEKRVRFK

MADzyme Nickase Name: MAD2039-H587A
SEQ ID NO: 181
Amino Acid Sequence:
MRPYAIGLDIGITSVGWATVALDADESPCGIIGLGSRIFDAAEQPKTGESLAAPRRAARGSRR
RLRRHRHRNERIRSLMLEERLISQDELETLFDGRLEDIYALRVKALDEIVSRTDFARILLHIS
QRRGFKSNRKNPTTKEDGVLLAAVNENKQRMSEHGYRTVGEMFLLDETFKDHKRNKGGNYITT
VARDMVADEVRAIFSAQRELGASFASEEFEERYLEILLSQRSFDEGPGGNSPYGGSQIERMVG
RCTFFPDEPRAAKATYSFEYFTLLQKVNHIRIVENGVASKLTDEQRRIIIELAHTTKDVSYAK
IRKVLKLSDKQLFNIRYSDNSPAEDSEKKEKLGIMKAYHQMRSAIDRVSKGRFAMMPRAQRNA
IGTALSLYKTSDKIRKYLTDAGLDEIDINSADSIGSFSKFGHISVKACDMLIPFLEQGMNYNE
ACAAAGLNFKGHDAGEKSKLLHPKEEDYEDITSPVVRRAIAQTIKVINAIIRREGCSPTFINI
ELAREMAKDFRERNRIKKENDDNRAKNERLLERIRTEYGKNNPTGLDLVKLRLYEEQSGVCMY
SLKQMSLEKLFEPNYAEVDAIVPYSISFDDSRKNKVLVLTEENRNKGNRLPLQYLKGRRREDF
IVWVNNNVKDYRKRRLLLKEELTAEDESGFKERNLQDTKTMSRFLLNYIADNLEFAESTRGRK
KKVTAVNGAVTAYMRKRWGITKIREDGDCHHAVDAVVIACTTDAMIRQVSRYAQFRECEYMQT
ESGSVAVDTGTGEVLRTFPYPWPDFRKELEARLANDPAKVINDLHLPFYMSAGRPLPEPVFVS
RMPRRKVTGAAHKDTIKSARELDNGYLIVKRPLTDLKLKNGEIENYYNPQSDKCLYDALKNAL
IEHGGDAKKAFAGEFRKPKRDGTPGPIVKKVKLLEPTTMCVPVHGGKGAADNDSMVRVDVFLS
GGKYYLVPIYVADTLKPELPNKAVTRGKKYSEWLEMADEDFIFSLYPNDLICATSKNGITLSV
CRKDSTLPPTVESKSFMLYYRGTDISTGSISCITHDNAYKLRGLGVKTLEKLEKYTVDVLGEY
HKVGKEVRQPFNIKRRKACPSEML

MADzyme Nickase Name: MAD2039-N610A
SEQ ID NO: 182
Amino Acid Sequence:
MRPYAIGLDIGITSVGWATVALDADESPCGIIGLGSRIFDAAEQPKTGESLAAPRRAARGSRR
RLRRHRHRNERIRSLMLEERLISQDELETLFDGRLEDIYALRVKALDEIVSRTDFARILLHIS
QRRGFKSNRKNPTTKEDGVLLAAVNENKQRMSEHGYRTVGEMFLLDETFKDHKRNKGGNYITT
VARDMVADEVRAIFSAQRELGASFASEEFEERYLEILLSQRSFDEGPGGNSPYGGSQIERMVG
RCTFFPDEPRAAKATYSFEYFTLLQKVNHIRIVENGVASKLTDEQRRIIIELAHTTKDVSYAK
IRKVLKLSDKQLFNIRYSDNSPAEDSEKKEKLGIMKAYHQMRSAIDRVSKGRFAMMPRAQRNA
IGTALSLYKTSDKIRKYLTDAGLDEIDINSADSIGSFSKFGHISVKACDMLIPFLEQGMNYNE
ACAAAGLNFKGHDAGEKSKLLHPKEEDYEDITSPVVRRAIAQTIKVINAIIRREGCSPTFINI
ELAREMAKDFRERNRIKKENDDNRAKNERLLERIRTEYGKNNPTGLDLVKLRLYEEQSGVCMY
SLKQMSLEKLFEPNYAEVDHIVPYSISFDDSRKNKVLVLTEENRNKGNRLPLQYLKGRRREDF
IVWVNNNVKDYRKRRLLLKEELTAEDESGFKERNLQDTKTMSRFLLNYIADNLEFAESTRGRK
KKVTAVNGAVTAYMRKRWGITKIREDGDCHHAVDAVVIACTTDAMIRQVSRYAQFRECEYMQT
ESGSVAVDTGTGEVLRTFPYPWPDFRKELEARLANDPAKVINDLHLPFYMSAGRPLPEPVFVS
RMPRRKVTGAAHKDTIKSARELDNGYLIVKRPLTDLKLKNGEIENYYNPQSDKCLYDALKNAL
IEHGGDAKKAFAGEFRKPKRDGTPGPIVKKVKLLEPTTMCVPVHGGKGAADNDSMVRVDVFLS
GGKYYLVPIYVADTLKPELPAKAVTRGKKYSEWLEMADEDFIFSLYPNDLICATSKNGITLSV
CRKDSTLPPTVESKSFMLYYRGTDISTGSISCITHDNAYKLRGLGVKTLEKLEKYTVDVLGEY
HKVGKEVRQPFNIKRRKACPSEML
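
The nickase names follow the usual substitution notation: original residue, 1-based position, replacement residue. As an illustration, the sketch below applies H851A and N874A to a 63-residue window of MAD2016 covering positions 820 to 882. The parent window is an inference, reconstructed by reverting the single alanine in each of the two MAD2016 rows of Table 8, and the numbering assumes that each sequence row in the table holds 63 residues. The function name is hypothetical.

```python
import re


def apply_substitution(window: str, mutation: str, window_start: int = 1) -> str:
    """Apply a point substitution such as 'H851A' to a sequence window.

    window_start is the 1-based position of the first residue in the window.
    Raises ValueError if the expected original residue is not found there.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - window_start
    if not 0 <= index < len(window):
        raise ValueError(f"position {position} is outside the window")
    if window[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {window[index]}"
        )
    return window[:index] + replacement + window[index + 1:]


# Inferred MAD2016 parent window, residues 820-882 (assumption, see lead-in).
PARENT_WINDOW = "TRLFLYYMQNGKDMYTGDELSLHRLSHYDIDHIIPQSFMKDDSLDNLVLVGSTENRGKSDDVP"
WINDOW_START = 820

if __name__ == "__main__":
    print(apply_substitution(PARENT_WINDOW, "H851A", WINDOW_START))
    print(apply_substitution(PARENT_WINDOW, "N874A", WINDOW_START))
```

The residue check guards against numbering mistakes: if the window offset or the mutation name is off by even one position, the function raises an error instead of silently altering the wrong residue.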









While this invention is satisfied by embodiments in many different forms, as described in detail in connection with preferred embodiments of the invention, it is understood that the present disclosure is to be considered as exemplary of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated and described herein. Numerous variations may be made by persons skilled in the art without departure from the spirit of the invention. The scope of the invention will be measured by the appended claims and their equivalents. The abstract and the title are not to be construed as limiting the scope of the present invention, as their purpose is to enable the appropriate authorities, as well as the general public, to quickly determine the general nature of the invention. In the claims that follow, unless the term “means” is used, none of the features or elements recited therein should be construed as means-plus-function limitations pursuant to 35 U.S.C. § 112, ¶6.

Claims
  • 1. A system for CRISPR editing of live cells comprising a MAD2015 nuclease having a sequence SEQ ID NO: 1, a CRISPR repeat RNA having a sequence SEQ ID NO: 2, and a tracr RNA having a sequence SEQ ID NO: 3; a MAD2016 nuclease having a sequence SEQ ID NO: 4, a CRISPR repeat RNA having a sequence SEQ ID NO: 5, and a tracr RNA having a sequence SEQ ID NO: 6; a MAD2017 nuclease having a sequence SEQ ID NO: 7, a CRISPR repeat RNA having a sequence SEQ ID NO: 8, and a tracr RNA having a sequence SEQ ID NO: 9; a MAD2019 nuclease having a sequence SEQ ID NO: 10, a CRISPR repeat RNA having a sequence SEQ ID NO: 11, and a tracr RNA having a sequence SEQ ID NO: 12; a MAD2020 nuclease having a sequence SEQ ID NO: 13, a CRISPR repeat RNA having a sequence SEQ ID NO: 14, and a tracr RNA having a sequence SEQ ID NO: 15; a MAD2021 nuclease having a sequence SEQ ID NO: 16, a CRISPR repeat RNA having a sequence SEQ ID NO: 17, and a tracr RNA having a sequence SEQ ID NO: 18; or a MAD2022 nuclease having a sequence SEQ ID NO: 19, a CRISPR repeat RNA having a sequence SEQ ID NO: 20, and a tracr RNA having a sequence SEQ ID NO: 21.
  • 2. The system for CRISPR editing of live cells of claim 1, comprising a MAD2015 nuclease having a sequence SEQ ID NO: 1, a CRISPR repeat RNA having a sequence SEQ ID NO: 2, and a tracr RNA having a sequence SEQ ID NO: 3.
  • 3. The system for CRISPR editing of live cells of claim 1, comprising a MAD2016 nuclease having a sequence SEQ ID NO: 4, a CRISPR repeat RNA having a sequence SEQ ID NO: 5, and a tracr RNA having a sequence SEQ ID NO: 6.
  • 4. The system for CRISPR editing of live cells of claim 1, comprising a MAD2017 nuclease having a sequence SEQ ID NO: 7, a CRISPR repeat RNA having a sequence SEQ ID NO: 8, and a tracr RNA having a sequence SEQ ID NO: 9.
  • 5. The system for CRISPR editing of live cells of claim 1, comprising a MAD2019 nuclease having a sequence SEQ ID NO: 10, a CRISPR repeat RNA having a sequence SEQ ID NO: 11, and a tracr RNA having a sequence SEQ ID NO: 12.
  • 6. The system for CRISPR editing of live cells of claim 1, comprising a MAD2020 nuclease having a sequence SEQ ID NO: 13, a CRISPR repeat RNA having a sequence SEQ ID NO: 14, and a tracr RNA having a sequence SEQ ID NO: 15.
  • 7. The system for CRISPR editing of live cells of claim 1, comprising a MAD2021 nuclease having a sequence SEQ ID NO: 16, a CRISPR repeat RNA having a sequence SEQ ID NO: 17, and a tracr RNA having a sequence SEQ ID NO: 18.
  • 8. The system for CRISPR editing of live cells of claim 1, comprising a MAD2022 nuclease having a sequence SEQ ID NO: 19, a CRISPR repeat RNA having a sequence SEQ ID NO: 20, and a tracr RNA having a sequence SEQ ID NO: 21.
RELATED CASES

This application is a continuation of U.S. Ser. No. 17/463,498, filed 31 Aug. 2021, now allowed; which claims priority to U.S. Ser. No. 63/133,502, filed 4 Jan. 2021, entitled “MAD NUCLEASES”, which is incorporated herein in its entirety. Submitted with the present application is an electronically filed sequence listing via EFS-Web as an ASCII formatted sequence listing, entitled “INSC083US2_SEQLIST_20220309”, created Mar. 9, 2022, and 359,000 bytes in size. The sequence listing is part of the specification filed Mar. 9, 2022 and is incorporated by reference in its entirety.

Provisional Applications (1)
Number Date Country
63133502 Jan 2021 US
Continuations (1)
Number Date Country
Parent 17463498 Aug 2021 US
Child 17691018 US