METHODS AND COMPOSITIONS FOR THE MAKING AND USING OF GUIDE NUCLEIC ACIDS

Information

  • Patent Application
  • 20210207130
  • Publication Number
    20210207130
  • Date Filed
    August 17, 2020
  • Date Published
    July 08, 2021
Abstract
Provided herein are methods and compositions to make guide nucleic acids (gNAs), nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs from any source nucleic acid. Also provided herein are methods and compositions to use the resulting gNAs, nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs in a variety of applications.
Description
INCORPORATION BY REFERENCE OF SEQUENCE LISTING

The present application is being filed with a Sequence Listing in electronic format. The ASCII text file of the sequence listing, named “155949-00086_Sequence_Listing.TXT”, which is 797 kb in size, was created on Aug. 17, 2020, and is electronically submitted via EFS-Web herewith; its content is incorporated herein by reference in its entirety.


BACKGROUND

Human clinical DNA samples and sample libraries such as cDNA libraries derived from RNA contain highly abundant sequences that have little informative value and increase the cost of sequencing. While methods have been developed to deplete these unwanted sequences (e.g., via hybridization capture), these methods are often time-consuming and can be inefficient.


Although guide nucleic acid (gNA)-mediated nuclease systems (such as guide RNA (gRNA)-mediated Cas systems) can efficiently deplete any target DNA, targeted depletion of very high numbers of unique DNA molecules is not feasible. For example, a sequencing library derived from human blood may contain >99% human genomic DNA. Using a gRNA-mediated Cas9 system-based method to deplete this genomic DNA to detect an infectious agent circulating in the human blood would require extremely high numbers of gRNAs (about 10-100 million gRNAs), in order to ensure that a gRNA will be present every 30-50 base pairs (bp), and that no target DNA will be missed. Very large numbers of gRNAs can be predicted computationally and then synthesized chemically, but only at a prohibitively high cost.
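
The guide count quoted above follows from simple arithmetic. The following is a minimal sketch, assuming a target of roughly 3 billion bp (an approximate human haploid genome size that is not stated in the text); the function name `guides_needed` is hypothetical.

```python
def guides_needed(target_bp: int, spacing_bp: int) -> int:
    """Guides required to place one guide every `spacing_bp` bases (rounded up)."""
    return -(-target_bp // spacing_bp)  # integer ceiling division

# Assumption: ~3e9 bp approximates one human haploid genome.
HUMAN_GENOME_BP = 3_000_000_000

for spacing in (30, 50):
    print(f"one guide every {spacing} bp: {guides_needed(HUMAN_GENOME_BP, spacing):,} guides")
```

Under this assumption, spacings of 30 bp and 50 bp give 100 million and 60 million guides respectively, at the upper end of the range cited above.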


Therefore, there is a need in the art to provide a cost-effective method of converting any DNA into a gNA (e.g., gRNA) library to enable, for example, genome-wide depletion of unwanted DNA sequences from those of interest, without prior knowledge about their sequences. Provided herein are methods and compositions that address this need.


SUMMARY

Provided herein are compositions and methods to generate gNAs and collections of gNAs from any source nucleic acid. For example, gRNAs and collections of gRNAs can be generated from source DNA, such as genomic DNA. Such gNAs and collections of the same are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest, genome-wide labeling, genome-wide editing, genome-wide functional screens, and genome-wide regulation.


In one aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size. In another aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the second segment varies from 15-250 bp across the collection of nucleic acids. In some embodiments, at least 10% of the second segments in the collection are greater than 21 bp. In some embodiments, the size of the second segment is not 20 bp. In some embodiments, the size of the second segment is not 21 bp. In some embodiments, the collection of nucleic acids is a collection of DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. 
In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least 10² unique nucleic acid molecules. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment encodes for a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaeal species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and Cas9 nickase. 
In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence. In some embodiments, a plurality of third segments of the collection encode for a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the third segments of the collection encode for a second nucleic acid-guided nuclease system protein binding sequence. In some embodiments, the third segments of the collection encode for a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
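
The relationship between a targeting sequence and its PAM can be illustrated in silico. The following is a minimal sketch, assuming plain uppercase A/C/G/T input and a caller-chosen targeting length; the names `reverse_complement` and `targets_5prime_of_pam` are hypothetical and are not part of the methods described herein.

```python
PAMS = {"AGG", "CGG", "TGG"}  # PAM sequences named in the embodiments above
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def targets_5prime_of_pam(seq: str, length: int):
    """List (strand, start, target) for each PAM preceded by `length` bases.

    Positions on the '-' strand are given in reverse-complement coordinates.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(length, len(s) - 2):
            if s[i:i + 3] in PAMS:
                hits.append((strand, i - length, s[i - length:i]))
    return hits

# Hypothetical 23-nt example: a 20-nt sequence of interest followed by TGG.
example = "ACGTACGTACGTACGTACGT" + "TGG"
print(targets_5prime_of_pam(example, 20))
```

Varying `length` between calls reflects the embodiments in which the second segment ranges in size across the collection.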


In another aspect, the invention described herein provides for a collection of guide RNAs (gRNAs), comprising: a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the first segment varies from 15-250 bp across the collection of gRNAs. In some embodiments, at least 10% of the first segments in the collection are greater than 21 bp. In some embodiments, the size of the first segment is not 20 bp. In some embodiments, the size of the first segment is not 21 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the first segments is RNA encoded by sequences selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least (unique gRNAs. In some embodiments, the gRNAs comprise cytosine, guanine, and adenine. In some embodiments, a subset of the gRNAs further comprises thymine. In some embodiments, a subset of the gRNAs further comprises uracil. In some embodiments, the first segment is at least 80% complementary to a target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG. 
In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment comprises the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the second segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaeal species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and Cas9 nickase. In some embodiments, the second segment comprises a Cas9-binding sequence. In some embodiments, at least 10% of the gRNAs in the collection vary in their 5′ terminal-end sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced every 10,000 bp or less across the genome of an organism. In some embodiments, a plurality of second segments of the collection comprise a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the second segments of the collection comprise a second nucleic acid-guided nuclease system protein binding sequence. 
In some embodiments, the second segments of the collection comprise a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins. In some embodiments, a plurality of the gRNAs of the collection are attached to a substrate. In some embodiments, a plurality of the gRNAs of the collection comprise a label. In some particular embodiments, a plurality of the gRNAs of the collection comprise different labels.
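
The 80% complementarity threshold recited above can be checked with a short calculation. The following is a minimal sketch, assuming equal-length sequences, antiparallel pairing, and Watson-Crick pairs only; the function name `percent_complementary` is hypothetical.

```python
# Allowed (guide base, target DNA base) pairs. T is included for the
# embodiments in which a gRNA comprises thymine.
PAIRS = {("A", "T"), ("U", "A"), ("T", "A"), ("G", "C"), ("C", "G")}

def percent_complementary(guide: str, target_dna: str) -> float:
    """Percent of guide positions that pair with the antiparallel target strand."""
    if not guide or len(guide) != len(target_dna):
        raise ValueError("sequences must be non-empty and of equal length")
    antiparallel = target_dna[::-1]  # read the target strand 3' to 5'
    matches = sum((g, t) in PAIRS for g, t in zip(guide, antiparallel))
    return 100.0 * matches / len(guide)

# Hypothetical 5-nt example with a single mismatch.
print(percent_complementary("ACGUA", "AACGT") >= 80.0)
```

Gaps and wobble pairs are not modeled here; a fuller treatment would need an alignment step.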


In another aspect, the invention described herein provides a nucleic acid comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the targeting sequence is greater than 30 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the nucleic acid is DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. In some embodiments, the targeting sequence is directed at abundant or repetitive DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the target genomic sequence of interest is 5′ upstream of a PAM sequence. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. 
In some embodiments, the third segment encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaeal species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and Cas9 nickase. In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence.
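
The three-segment arrangement described in this aspect can be sketched as a string assembly. The following is a minimal illustration, assuming the commonly cited minimal T7 promoter sequence (not given in the text) and a hypothetical helper `build_template`; the third segment is derived from SEQ ID NO: 1 by replacing U with T.

```python
# RNA sequence of SEQ ID NO: 1 as recited above.
SEQ_ID_NO_1_RNA = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
    "CCGAGUCGGUGCUUUUUUU"
)
# Assumption: commonly cited minimal T7 promoter; not stated in the text.
T7_PROMOTER = "TAATACGACTCACTATAG"

def rna_to_dna(rna: str) -> str:
    """DNA sense-strand sequence that encodes the given RNA."""
    return rna.replace("U", "T")

def build_template(targeting_dna: str, promoter: str = T7_PROMOTER,
                   binding_rna: str = SEQ_ID_NO_1_RNA) -> str:
    """Concatenate regulatory region, targeting sequence and binding-sequence DNA."""
    if len(targeting_dna) <= 30:
        raise ValueError("this aspect recites a targeting sequence greater than 30 bp")
    return promoter + targeting_dna + rna_to_dna(binding_rna)

# Hypothetical 32-nt targeting sequence, for illustration only.
print(build_template("ACGTTGCAACGTTGCAACGTTGCAACGTTGCA"))
```

Only the sense strand is modeled; a double stranded embodiment would also carry the reverse complement.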


In another aspect, the invention described herein provides a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the gRNA comprises an adenine, a guanine, and a cytosine. In some embodiments, the gRNA further comprises a thymine. In some embodiments, the gRNA further comprises a uracil. In some embodiments, the size of the first RNA segment is between 30 and 250 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the first segment is at least 80% complementary to the target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the sequence of the second segment comprises GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaeal species. 
In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and Cas9 nickase. In some embodiments, the second segment is a Cas9-binding sequence.


In another aspect, the invention provides a complex comprising a nucleic acid-guided nuclease system protein and a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence.


In another aspect, the invention described herein provides a method for depleting and partitioning targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample, comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collections of gRNAs provided herein; and (ii) nucleic acid-guided nuclease system proteins. In some embodiments, the nucleic acid-guided nuclease system proteins are CRISPR/Cas system proteins. In some embodiments, the CRISPR/Cas system proteins are Cas9 proteins.


In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing double-stranded DNA molecules, each comprising a sequence of interest 5′ to a PAM sequence, and its reverse complementary sequence on the opposite strand; (b) performing an enzymatic digestion reaction on the double stranded DNA molecules, wherein cleavages are generated at the PAM sequence and/or its reverse complementary sequence on the opposite strand, but never completely remove the PAM sequence and/or its reverse complementary sequence on the opposite strand from the double stranded DNA; (c) ligating adapters comprising a recognition sequence to the resulting DNA molecules of step b; (d) contacting the DNA molecules of step c with a restriction enzyme that recognizes the recognition sequence of step c, whereby generating DNA fragments comprising blunt-ended double strand breaks immediately 5′ to the PAM sequence, whereby removing the PAM sequence and the adapter containing the enzyme recognition site; and (e) ligating the resulting double stranded DNA fragments of step d with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, whereby generating a plurality of DNA fragments, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas nucleic acid-guided nuclease system protein. In some embodiments, the starting DNA molecules of the collection further comprise a regulatory sequence upstream of the sequence of interest 5′ to the PAM sequence. In some embodiments, the regulatory sequence comprises a promoter. In some embodiments, the promoter comprises a T7, Sp6, or T3 sequence. 
In some embodiments, the double stranded DNA molecules are genomic DNA, intact DNA, or sheared DNA. In some embodiments, the genomic DNA is human, mouse, avian, fish, plant, insect, bacterial, or viral. In some embodiments, the DNA segments encoding a targeting sequence are at least 22 bp. In some embodiments, the DNA segments encoding a targeting sequence are 15-250 bp in size range. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, step (b) further comprises (1) contacting the DNA molecules with an enzyme capable of creating a nick in a single strand at a CCD site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest followed by an HGG sequence, wherein the DNA molecules are nicked at the CCD sites; and (2) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest followed by an HGG sequence wherein residual nucleotides from HGG and/or CCD sequences is (are) left behind. In some embodiments, step (d) further comprises PCR amplification of the adaptor-ligated DNA fragments from step (c) before cutting with the restriction enzyme recognizing the recognition sequence of step (c), wherein after PCR, the recognition sequence is positioned 3′ of the PAM sequence, and a regulatory sequence is positioned at the 5′ distal end of the PAM sequence. In some embodiments, the enzymatic reaction of step (b) comprises the use of a Nt.CviPII enzyme, and a T7 Endonuclease I enzyme. In some embodiments, step (c) further comprises a blunt-end reaction with a T4 DNA Polymerase, if the adapter to be ligated does not comprise an overhang. 
In some embodiments, the adapter of step (c) is either (1) double stranded, comprising a restriction enzyme recognition sequence in one strand, and a regulatory sequence in the other strand, if the adapter is Y-shaped and comprises an overhang; or (2) has a palindromic enzyme recognition sequence in both strands, if the adapter is not Y-shaped. In some embodiments, the restriction enzyme of step (d) is MlyI. In some embodiments, the restriction enzyme of step (d) is BaeI. In some embodiments, step (d) further comprises contacting the DNA molecules with an XhoI enzyme. In some embodiments, in step (e) the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the targeted sequences of interest are spaced every 10,000 bp or less across the genome of an organism.
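
The CCD and HGG sites referred to in step (b) can be located in silico. The following is a minimal sketch, assuming the IUPAC codes D (A, G, or T) and H (A, C, or T) and uppercase input; the function names are hypothetical, and the sketch only finds sites, without modeling the enzymatic nicking or cleavage.

```python
import re

def ccd_sites(seq: str) -> list:
    """0-based start positions of CCD sites (D = A, G or T) on the given strand."""
    # The lookahead lets overlapping sites be reported.
    return [m.start() for m in re.finditer(r"(?=CC[AGT])", seq)]

def hgg_sites(seq: str) -> list:
    """0-based start positions of HGG sites (H = A, C or T) on the given strand.

    An HGG site on this strand corresponds to a CCD site on the opposite strand.
    """
    return [m.start() for m in re.finditer(r"(?=[ACT]GG)", seq)]

# Hypothetical 13-nt example with one site of each kind.
example = "AACCATTTAGGAA"
print(ccd_sites(example), hgg_sites(example))
```

Whether two nearby sites actually yield a double stranded break depends on the enzymes and reaction conditions, which this sketch does not address.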


In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing a plurality of double stranded DNA molecules, each comprising a sequence of interest, an NGG site, and its complement CCN site; (b) contacting the molecules with an enzyme capable of creating a nick in a single strand at a CCN site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest 5′ to the NGG site, wherein the DNA molecules are nicked at the CCD sites; (c) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest, wherein the fragments comprise a terminal overhang; (d) contacting the double stranded DNA fragments with an enzyme without 5′ to 3′ exonuclease activity to blunt end the double stranded DNA fragments, whereby generating a plurality of blunt ended double stranded fragments, each comprising a sequence of interest; (e) contacting the blunt ended double stranded fragments of step d with an enzyme that cleaves the terminal NGG site; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system-protein binding sequence, whereby generating a plurality of DNA fragments, each comprising a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the plurality of double stranded DNA molecules have a regulatory sequence 5′ upstream of the NGG sites. In some embodiments, the regulatory sequence comprises a T7, SP6, or T3 sequence. 
In some embodiments, the NGG site comprises AGG, CGG, or TGG, and the CCN site comprises CCT, CCG, or CCA. In some embodiments, the plurality of double stranded DNA molecules, each comprising a sequence of interest, comprise sheared fragments of genomic DNA. In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, avian, bacterial or viral. In some embodiments, the plurality of double stranded DNA molecules in step (a) are at least 500 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, the enzyme in step d is a T4 DNA Polymerase. In some embodiments, in step f the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, step e additionally comprises ligating adaptors carrying a MlyI recognition site and digesting with MlyI enzyme. In some embodiments, the sequence of interest is spaced every 10,000 bp or less across the genome.
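
The pairing of each NGG site with its CCN counterpart is a reverse-complement relationship, which can be verified directly. A minimal sketch follows; the helper name `reverse_complement` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# Each NGG site named above, with the CCN site read on the opposite strand.
for ngg in ("AGG", "CGG", "TGG"):
    print(ngg, "<->", reverse_complement(ngg))
```

The three results are CCT, CCG, and CCA, in the same order as the sites are listed above.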


In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence and a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing genomic DNA comprising a plurality of sequences of interest, comprising NGG and CCN sites; (b) contacting the genomic DNA with an enzyme capable of creating nicks in the genomic DNA, whereby generating nicked genomic DNA, nicked at CCN sites; (c) contacting the nicked genomic DNA with an endonuclease, whereby generating double stranded DNA fragments, with an overhang; (d) ligating the DNA with overhangs from step c to a Y-shaped adapter, thereby introducing a restriction enzyme recognition sequence only at 3′ of the NGG site and a regulatory sequence 5′ of the sequence of interest; (e) contacting the product from step d with an enzyme that cleaves away the NGG site together with the adaptor carrying the enzyme recognition sequence; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, whereby generating a plurality of DNA fragments, each comprising a sequence of interest ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the NGG site comprises AGG, CGG, or TGG, and CCN site comprises CCT, CCG, or CCA. In some embodiments, the regulatory sequence comprises a promoter sequence. In some embodiments, the promoter sequence comprises a T7, SP6, or T3 sequence. In some embodiments, the DNA fragments are sheared fragments of genomic DNA.


In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, or viral. In some embodiments, the fragments are at least 200 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, step d further comprises PCR amplification of the adaptor-ligated DNA. In some embodiments, in step f, the DNA encoding a nucleic acid-guided nuclease system protein-binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the enzyme removing the NGG site in step e is MlyI. In some embodiments, the target of interest of the collection is spaced every 10,000 bp or less across the genome.
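
The spacing criterion of 10,000 bp or less can be checked against a list of target positions. The following is a minimal sketch, assuming 0-based positions on a single linear sequence and treating the distance from each end of the sequence to the nearest target as a gap (an interpretation not stated in the text); the function names are hypothetical.

```python
def max_gap(site_positions, genome_length: int) -> int:
    """Largest stretch without a target site, counting both sequence ends."""
    points = [0] + sorted(site_positions) + [genome_length]
    return max(b - a for a, b in zip(points, points[1:]))

def meets_spacing(site_positions, genome_length: int, limit: int = 10_000) -> bool:
    """True if no gap between neighboring target sites exceeds `limit` bp."""
    return max_gap(site_positions, genome_length) <= limit

# Hypothetical positions on a 25,000 bp sequence.
print(meets_spacing([5_000, 12_000, 21_000], 25_000))
print(meets_spacing([5_000, 20_000], 25_000))
```

For a genome with several chromosomes, the check would be run per chromosome.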


In another aspect, the invention provides kits and/or reagents useful for performing a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, as described in the embodiments herein.


In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.


In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a CRISPR/Cas system protein-binding sequence.


In another aspect, the invention described herein provides a kit comprising a collection of guide RNAs comprising a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size.


In another aspect, the invention described herein provides a method of making a collection of guide nucleic acids, comprising: a. obtaining abundant cells in a source sample; b. collecting nucleic acids from said abundant cells; and c. preparing a collection of guide nucleic acids (gNAs) from said nucleic acids. In some embodiments, said abundant cells comprise cells from one or more most abundant bacterial species in said source sample. In some embodiments, said abundant cells comprise cells from more than one species. In some embodiments, said abundant cells comprise human cells. In some embodiments, said abundant cells comprise animal cells. In some embodiments, said abundant cells comprise plant cells. In some embodiments, said abundant cells comprise bacterial cells. In some embodiments, the method further comprises contacting nucleic acid-guided nucleases with said library of gNAs to form nucleic acid-guided nuclease-gNA complexes. In some embodiments, the method further comprises using said nucleic acid-guided nuclease-gNA complexes to cleave target nucleic acids at target sites, wherein said gNAs are complementary to said target sites. In some embodiments, said target nucleic acids are from said source sample. In some embodiments, a species of said target nucleic acids is the same as a species of said source sample. In some embodiments, said species of said target nucleic acids and said species of said source sample is human. In some embodiments, said species of said target nucleic acids and said species of said source sample is animal. In some embodiments, said species of said target nucleic acids and said species of said source sample is plant.


In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing double-stranded breaks at proximal nicks; and c. repairing overhangs of said double-stranded breaks, thereby producing a double-stranded fragment comprising (i) a targeting sequence and (ii) said nicking enzyme recognition site. In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing a nick; and c. synthesizing a new strand from said nick, thereby producing a single-stranded fragment of said source DNA comprising a targeting sequence. In some embodiments, the method further comprises producing a double-stranded fragment comprising said targeting sequence from said single-stranded fragment. In some embodiments, said producing said double-stranded fragment comprises random priming and extension. In some embodiments, said random priming is conducted with a primer comprising a random n-mer region and a promoter region. In some embodiments, said random n-mer region is a random hexamer region. In some embodiments, said random n-mer region is a random octamer region. In some embodiments, said promoter region is a T7 promoter region. In some embodiments, the method further comprises ligating a nuclease recognition site nucleic acid comprising a nuclease recognition site to said double-stranded fragment. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to the length of said nicking enzyme recognition sites. 
In some embodiments, said nuclease recognition site is an MlyI recognition site. In some embodiments, said nuclease recognition site is a BaeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease, thereby removing said nicking enzyme recognition site from said double-stranded fragment. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence. In some embodiments, said length of said targeting sequence is 20 base pairs. In some embodiments, said nuclease recognition site is an MmeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence plus a length of said nicking enzyme recognition sites. In some embodiments, said length of said targeting sequence plus a length of said nicking enzyme recognition sites is 23 base pairs. In some embodiments, said nuclease recognition site is an EcoP15I recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. 
In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence.
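By way of non-limiting illustration only, the cut-distance arithmetic recited in the embodiments above can be sketched in Python. The function name and the case labels are hypothetical, and the 3 bp nicking enzyme recognition site length is an assumption inferred from the recited values (23 bp minus 20 bp); the sketch does not assert the actual cut distance of any particular nuclease.

```python
def required_cut_distance(targeting_len, nicking_site_len, case):
    """Distance in bp from the nuclease recognition site to the cut position.

    `case` is a hypothetical label for the three situations in the text:
      "nicking_site"        -> distance equals the nicking enzyme recognition site length
      "targeting"           -> distance equals the targeting sequence length
      "targeting_plus_site" -> distance equals both lengths added together
    """
    if targeting_len <= 0 or nicking_site_len <= 0:
        raise ValueError("lengths must be positive")
    distances = {
        "nicking_site": nicking_site_len,
        "targeting": targeting_len,
        "targeting_plus_site": targeting_len + nicking_site_len,
    }
    if case not in distances:
        raise ValueError("unknown case: %r" % (case,))
    return distances[case]


# 20 bp targeting sequence; the 3 bp site length is an assumption (23 - 20).
print(required_cut_distance(20, 3, "targeting_plus_site"))
```

A nuclease would then be chosen so that its cut distance matches the returned value for the desired case.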


In another aspect, the invention described herein provides a kit comprising all essential reagents and instructions for carrying out the methods of aspects of the invention described herein.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates an exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.



FIG. 2 illustrates another exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.



FIG. 3 illustrates an exemplary scheme for nicking of DNA and subsequent treatment with polymerase to generate blunt ends.



FIG. 4 illustrates an exemplary scheme for sequential production of a library of gNAs using three adapters.



FIG. 5 illustrates an exemplary scheme for sequential production of a library of gNAs using one adapter and one oligo.



FIG. 6 illustrates an exemplary scheme for generation of a large pool of DNA fragments with blunt ends using Nicking Enzyme Mediated DNA Amplification (NEMDA).



FIG. 7 illustrates an exemplary scheme for generation of a large pool of gNAs using Nicking Enzyme Mediated DNA Amplification (NEMDA).





DETAILED DESCRIPTION OF THE INVENTION

There is a need in the art for a scalable, low-cost approach to generate large numbers of diverse guide nucleic acids (gNAs) (e.g., gRNAs, gDNAs) for a variety of downstream applications.


Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.


Numeric ranges are inclusive of the numbers defining the range.


For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. In the event that any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control.


As used herein, the singular forms “a”, “an”, and “the” include plural references unless indicated otherwise.


It is understood that aspects and embodiments of the invention described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.


The term “about” as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se.


The term “nucleic acid,” as used herein, refers to a molecule comprising one or more nucleic acid subunits. A nucleic acid can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), and modified versions of the same. A nucleic acid comprises deoxyribonucleic acid (DNA), ribonucleic acid (RNA), combinations, or derivatives thereof. A nucleic acid may be single-stranded and/or double-stranded.


The nucleic acids comprise “nucleotides”, which, as used herein, is intended to include those moieties that contain purine and pyrimidine bases, and modified versions of the same. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the term “nucleotide” or “polynucleotide” includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides, nucleotides or polynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.


The terms “nucleic acids” and “polynucleotides” are used interchangeably herein. Polynucleotide is used to describe a nucleic acid polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced enzymatically or synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. Naturally-occurring nucleotides include guanine, cytosine, adenine and thymine (G, C, A and T, respectively). DNA and RNA have deoxyribose and ribose sugar backbones, respectively, whereas PNA's backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. In PNA, various purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds. A locked nucleic acid (LNA), often referred to as inaccessible RNA, is a modified RNA nucleotide. The ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2′ oxygen and 4′ carbon. The bridge “locks” the ribose in the 3′-endo (North) conformation, which is often found in the A-form duplexes. LNA nucleotides can be mixed with DNA or RNA residues in the oligonucleotide whenever desired. The term “unstructured nucleic acid,” or “UNA,” refers to a nucleic acid containing non-natural nucleotides that bind to each other with reduced stability. 
For example, an unstructured nucleic acid may contain a G′ residue and a C′ residue, where these residues correspond to non-naturally occurring forms, i.e., analogs, of G and C that base pair with each other with reduced stability, but retain an ability to base pair with naturally occurring C and G residues, respectively. Unstructured nucleic acid is described in US20050233340, which is incorporated by reference herein for disclosure of UNA.


The term “oligonucleotide” as used herein denotes a single-stranded multimer of nucleotides.


Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.


The term “cleaving,” as used herein, refers to a reaction that breaks the phosphodiester bonds between two adjacent nucleotides in both strands of a double-stranded DNA molecule, thereby resulting in a double-stranded break in the DNA molecule.


The term “nicking” as used herein, refers to a reaction that breaks the phosphodiester bond between two adjacent nucleotides in only one strand of a double-stranded DNA molecule, thereby resulting in a break in one strand of the DNA molecule.


The term “cleavage site,” as used herein, refers to the site at which a double-stranded DNA molecule has been cleaved.


The “nucleic acid-guided nuclease-gNA complex” refers to a complex comprising a nucleic acid-guided nuclease protein and a guide nucleic acid (gNA, for example a gRNA or a gDNA). For example, the “Cas9-gRNA complex” refers to a complex comprising a Cas9 protein and a guide RNA (gRNA). The nucleic acid-guided nuclease may be any type of nucleic acid-guided nuclease, including but not limited to a wild type nucleic acid-guided nuclease, a catalytically dead nucleic acid-guided nuclease, or a nucleic acid-guided nuclease-nickase.


The term “nucleic acid-guided nuclease-associated guide NA” refers to a guide nucleic acid (guide NA). The nucleic acid-guided nuclease-associated guide NA may exist as an isolated nucleic acid, or as part of a nucleic acid-guided nuclease-gNA complex, for example a Cas9-gRNA complex.


The terms “capture” and “enrichment” are used interchangeably herein, and refer to the process of selectively isolating a nucleic acid region containing: sequences of interest, targeted sites of interest, sequences not of interest, or targeted sites not of interest.


The term “hybridization” refers to the process by which a strand of nucleic acid joins with a complementary strand through base pairing as known in the art. A nucleic acid is considered to be “selectively hybridizable” to a reference nucleic acid sequence if the two sequences specifically hybridize to one another under moderate to high stringency hybridization and wash conditions. Moderate and high stringency hybridization conditions are known (see, e.g., Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons 1995 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, 2001 Cold Spring Harbor, N.Y.). One example of high stringency conditions includes hybridization at about 42° C. in 50% formamide, 5×SSC, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml denatured carrier DNA followed by washing two times in 2×SSC and 0.5% SDS at room temperature and two additional times in 0.1×SSC and 0.5% SDS at 42° C.


The term “duplex,” or “duplexed,” as used herein, describes two complementary polynucleotides that are base-paired, i.e., hybridized together.


The term “amplifying” as used herein refers to generating one or more copies of a target nucleic acid, using the target nucleic acid as a template.


The term “genomic region,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant. In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example.


The term “genomic sequence,” as used herein, refers to a sequence that occurs in a genome. Because RNAs are transcribed from a genome, this term encompasses sequences that exist in the nuclear genome of an organism, as well as sequences that are present in a cDNA copy of an RNA (e.g., an mRNA) transcribed from such a genome.


The term “genomic fragment,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant. A genomic fragment may be an entire chromosome, or a fragment of a chromosome. A genomic fragment may be adapter ligated (in which case it has an adapter ligated to one or both ends of the fragment, or to at least the 5′ end of a molecule), or may not be adapter ligated.


In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example. Such an oligonucleotide may be employed in an assay that uses a sample containing a test genome, where the test genome contains a binding site for the oligonucleotide.


The term “ligating,” as used herein, refers to the enzymatically catalyzed joining of the terminal nucleotide at the 5′ end of a first DNA molecule to the terminal nucleotide at the 3′ end of a second DNA molecule.


If two nucleic acids are “complementary,” each base of one of the nucleic acids base pairs with corresponding nucleotides in the other nucleic acid. The term “complementary” and “perfectly complementary” are used synonymously herein.
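As a minimal, non-limiting sketch of the definition above, the following Python code tests whether two DNA sequences are perfectly complementary. It assumes both sequences are written 5′ to 3′ and contain only A, C, G and T; the function names are hypothetical and are introduced here for illustration only.

```python
# Watson-Crick partners for DNA bases (assumption: unmodified A, C, G, T only).
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def is_perfectly_complementary(first, second):
    """True if every base of `first` pairs with the corresponding base of
    `second` when the two strands are aligned antiparallel."""
    return len(first) == len(second) and reverse_complement(first) == second.upper()


print(is_perfectly_complementary("ATGC", "GCAT"))
print(is_perfectly_complementary("ATGC", "GCAA"))
```

The first call compares a sequence with its own reverse complement; the second introduces a single mismatch, which is enough to fail the test under this definition.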


The term “separating,” as used herein, refers to physical separation of two elements (e.g., by size or affinity, etc.) as well as degradation of one element, leaving the other intact. For example, size exclusion can be employed to separate nucleic acids, including cleaved targeted sequences.


In a cell, DNA usually exists in a double-stranded form, and as such, has two complementary strands of nucleic acid referred to herein as the “top” and “bottom” strands. In certain cases, complementary strands of a chromosomal region may be referred to as “plus” and “minus” strands, the “first” and “second” strands, the “coding” and “noncoding” strands, the “Watson” and “Crick” strands or the “sense” and “antisense” strands. The assignment of a strand as being a top or bottom strand is arbitrary and does not imply any particular orientation, function or structure. Until they become covalently linked, the first and second strands are distinct molecules. For ease of description, the “top” and “bottom” strands of a double-stranded nucleic acid in which the top and bottom strands have been covalently linked will still be described as the “top” and “bottom” strands. In other words, for the purposes of this disclosure, the top and bottom strands of a double-stranded DNA do not need to be separated molecules. The nucleotide sequences of the first strand of several exemplary mammalian chromosomal regions (e.g., BACs, assemblies, chromosomes, etc.) are known, and may be found in NCBI's Genbank database, for example.


The term “top strand,” as used herein, refers to either strand of a nucleic acid but not both strands of a nucleic acid. When an oligonucleotide or a primer binds or anneals “only to a top strand,” it binds to only one strand but not the other. The term “bottom strand,” as used herein, refers to the strand that is complementary to the “top strand.” When an oligonucleotide binds or anneals “only to one strand,” it binds to only one strand, e.g., the first or second strand, but not the other strand. If an oligonucleotide binds or anneals to both strands of a double-stranded DNA, the oligonucleotide may have two regions, a first region that hybridizes with the top strand of the double-stranded DNA, and a second region that hybridizes with the bottom strand of the double-stranded DNA.


The term “double-stranded DNA molecule” refers to both double-stranded DNA molecules in which the top and bottom strands are not covalently linked, as well as double-stranded DNA molecules in which the top and bottom strands are covalently linked. The top and bottom strands of a double-stranded DNA are base paired with one another by Watson-Crick interactions.


The term “denaturing,” as used herein, refers to the separation of at least a portion of the base pairs of a nucleic acid duplex by placing the duplex in suitable denaturing conditions. Denaturing conditions are well known in the art. In one embodiment, in order to denature a nucleic acid duplex, the duplex may be exposed to a temperature that is above the Tm of the duplex, thereby releasing one strand of the duplex from the other. In certain embodiments, a nucleic acid may be denatured by exposing it to a temperature of at least 90° C. for a suitable amount of time (e.g., at least 30 seconds, up to 30 mins). In certain embodiments, fully denaturing conditions may be used to completely separate the base pairs of the duplex. In other embodiments, partially denaturing conditions (e.g., with a lower temperature than fully denaturing conditions) may be used to separate the base pairs of certain parts of the duplex (e.g., regions enriched for A-T base pairs may separate while regions enriched for G-C base pairs may remain paired). Nucleic acid may also be denatured chemically (e.g., using urea or NaOH).


The term “genotyping” as used herein, refers to any type of analysis of a nucleic acid sequence, and includes sequencing, polymorphism (SNP) analysis, and analysis to identify rearrangements.


The term “sequencing,” as used herein, refers to a method by which the identity of consecutive nucleotides of a polynucleotide is obtained.


The term “next-generation sequencing” refers to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms, for example, those currently employed by Illumina, Life Technologies, and Roche, etc. Next-generation sequencing methods may also include nanopore sequencing methods or electronic-detection based methods such as Ion Torrent technology commercialized by Life Technologies.


The term “complementary DNA” or cDNA refers to a double-stranded DNA sample that was produced from an RNA sample by reverse transcription of RNA (using primers such as random hexamers or oligo-dT primers) followed by second-strand synthesis by digestion of the RNA with RNaseH and synthesis by DNA polymerase.


The term “RNA promoter adapter” is an adapter that contains a promoter for a bacteriophage RNA polymerase, e.g., the RNA polymerase from bacteriophage T3, T7, SP6 or the like.


Other definitions of terms may appear throughout the specification.


For any of the structural and functional characteristics described herein, methods of determining these characteristics are known in the art.


Guide Nucleic Acids (gNAs)


Provided herein are guide nucleic acids (gNAs) derivable from any nucleic acid source. The gNAs can be guide RNAs (gRNAs) or guide DNAs (gDNAs). The nucleic acid source can be DNA or RNA. Provided herein are methods to generate gNAs from any source nucleic acid, including DNA from a single organism, or mixtures of DNA from multiple organisms, or mixtures of DNA from multiple species, or DNA from clinical samples, or DNA from forensic samples, or DNA from environmental samples, or DNA from metagenomic DNA samples (for example a sample that contains more than one species of organism). Examples of any source DNA include, but are not limited to any genome, any genome fragment, cDNA, synthetic DNA, or a DNA collection (e.g. a SNP collection, DNA libraries). The gNAs provided herein can be used for genome-wide applications.


In some embodiments, the gNAs are derived from genomic sequences (e.g., genomic DNA). In some embodiments, the gNAs are derived from mammalian genomic sequences. In some embodiments, the gNAs are derived from eukaryotic genomic sequences. In some embodiments, the gNAs are derived from prokaryotic genomic sequences. In some embodiments, the gNAs are derived from viral genomic sequences. In some embodiments, the gNAs are derived from bacterial genomic sequences. In some embodiments, the gNAs are derived from plant genomic sequences. In some embodiments, the gNAs are derived from microbial genomic sequences. In some embodiments, the gNAs are derived from genomic sequences from a parasite, for example a eukaryotic parasite.


In some embodiments, the gNAs are derived from repetitive DNA. In some embodiments, the gNAs are derived from abundant DNA. In some embodiments, the gNAs are derived from mitochondrial DNA. In some embodiments, the gNAs are derived from ribosomal DNA. In some embodiments, the gNAs are derived from centromeric DNA. In some embodiments, the gNAs are derived from DNA comprising Alu elements (Alu DNA). In some embodiments, the gNAs are derived from DNA comprising long interspersed nuclear elements (LINE DNA). In some embodiments, the gNAs are derived from DNA comprising short interspersed nuclear elements (SINE DNA). In some embodiments the abundant DNA comprises ribosomal DNA. In some embodiments, the abundant DNA comprises host DNA (e.g., host genomic DNA or all host DNA). In an example, the gNAs can be derived from host DNA (e.g., human, animal, plant) for the depletion of host DNA to allow for easier analysis of other DNA that is present (e.g., bacterial, viral, or other metagenomic DNA). In another example, the gNAs can be derived from the one or more most abundant types (e.g., species) in a mixed sample, such as the one or more most abundant bacteria species in a metagenomic sample. The one or more most abundant types (e.g., species) can comprise the two, three, four, five, six, seven, eight, nine, ten, or more than ten most abundant types (e.g., species). The most abundant types can be the most abundant kingdoms, phyla or divisions, classes, orders, families, genuses, species, or other classifications. The most abundant types can be the most abundant cell types, such as epithelial cells, bone cells, muscle cells, blood cells, adipose cells, or other cell types. The most abundant types can be non-cancerous cells. The most abundant types can be cancerous cells. The most abundant types can be animal, human, plant, fungal, bacterial, or viral. 
gNAs can be derived from both a host and the one or more most abundant non-host types (e.g., species) in a sample, such as from both human DNA and the DNA of the one or more most abundant bacterial species. In some embodiments, the abundant DNA comprises DNA from the more abundant or most abundant cells in a sample. For example, for a specific sample, the highly abundant cells can be extracted and their DNA can be used to produce gNAs; these gNAs can be used to produce a depletion library and applied to the original sample to enable or enhance sequencing or detection of low-abundance targets.
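As a non-limiting illustration of selecting the one or more most abundant types in a mixed sample, the following Python sketch ranks types by how often they occur. It assumes, purely for illustration, that each nucleic acid (or cell) in the sample has already been assigned a label such as a species name; the function name, the labels and the counts are hypothetical and are not data from any experiment.

```python
from collections import Counter


def most_abundant_types(labels, n=1):
    """Return the n most frequent labels, most abundant first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [label for label, _count in Counter(labels).most_common(n)]


# Hypothetical labels for a mixed sample (illustrative values only).
sample_labels = ["human"] * 6 + ["E. coli"] * 3 + ["S. aureus"] * 1
print(most_abundant_types(sample_labels, n=2))
```

Source nucleic acids from the returned types could then serve as the starting material for preparing gNAs, while the remaining types are left as the low-abundance targets of interest.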


In some embodiments, the gNAs are derived from DNA comprising short tandem repeats (STRs).


In some embodiments, the gNAs are derived from a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is an RNA genome.


In some embodiments, the gNAs are derived from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or virus; from an animal parasite; from a pathogen.


In some embodiments, the gNAs are derived from any mammalian organism. In one embodiment the mammal is a human. In another embodiment the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, a mammalian organism is a domestic pet, for example a cat, a dog, a gerbil, a mouse, a rat. In another embodiment the mammal is a type of a monkey.


In some embodiments, the gNAs are derived from any bird or avian organism. An avian organism includes but is not limited to chicken, turkey, duck and goose.


In some embodiments, the gNAs are derived from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.


In some embodiments, the gNAs are derived from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.


In some embodiments, the gNAs are derived from a virus.


In some embodiments, the gNAs are derived from a species of fungi.


In some embodiments, the gNAs are derived from a species of algae.


In some embodiments, the gNAs are derived from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.


In some embodiments, the gNAs are derived from a nucleic acid target. Contemplated targets include, but are not limited to, pathogens; single nucleotide polymorphisms (SNPs), insertions, deletions, tandem repeats, or translocations; human SNPs or STRs; potential toxins; or animals, fungi, and plants. In some embodiments, the gNAs are derived from pathogens, and are pathogen-specific gNAs.


In some embodiments, a guide NA of the invention comprises a first NA segment comprising a targeting sequence, wherein the targeting sequence is 15-250 bp; and a second NA segment comprising a nucleic acid guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. In some embodiments, the targeting sequence is greater than 21 bp, greater than 22 bp, greater than 23 bp, greater than 24 bp, greater than 25 bp, greater than 26 bp, greater than 27 bp, greater than 28 bp, greater than 29 bp, greater than 30 bp, greater than 40 bp, greater than 50 bp, greater than 60 bp, greater than 70 bp, greater than 80 bp, greater than 90 bp, greater than 100 bp, greater than 110 bp, greater than 120 bp, greater than 130 bp, greater than 140 bp, or even greater than 150 bp. In an exemplary embodiment, the targeting sequence is greater than 30 bp. In some embodiments, the targeting sequences of the present invention range in size from 30-50 bp. In some embodiments, targeting sequences of the present invention range in size from 30-75 bp. In some embodiments, targeting sequences of the present invention range in size from 30-100 bp. For example, a targeting sequence can be at least 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 160 bp, 170 bp, 180 bp, 190 bp, 200 bp, 210 bp, 220 bp, 230 bp, 240 bp, or 250 bp. In specific embodiments, the targeting sequence is at least 22 bp. In specific embodiments, the targeting sequence is at least 30 bp.


In some embodiments, target-specific gNAs can comprise a nucleic acid sequence that is complementary to a region on the opposite strand of the targeted nucleic acid sequence 5′ to a PAM sequence, which can be recognized by a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein. In some embodiments the targeted nucleic acid sequence is immediately 5′ to a PAM sequence. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 15-250 bp. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, or 100 bp.
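As a hedged, non-limiting sketch of locating sequences immediately 5′ to a PAM, the following Python code scans one strand of a DNA sequence and reports each candidate of a chosen length that directly precedes a PAM. The PAM pattern NGG (the PAM commonly associated with Streptococcus pyogenes Cas9) is an assumption made for illustration and is not recited in this passage; the function name and the example sequence are hypothetical. Only the strand as written is scanned; a fuller treatment would also scan the reverse complement.

```python
import re


def sequences_5prime_of_pam(dna, length, pam_regex="[ACGT]GG"):
    """Return (start, sequence, pam) for each `length`-bp stretch that lies
    immediately 5' of a PAM on the strand as written (5' to 3')."""
    dna = dna.upper()
    hits = []
    # A lookahead is used so that overlapping PAM occurrences are all found.
    for match in re.finditer("(?=(%s))" % pam_regex, dna):
        pam_start = match.start()
        if pam_start >= length:
            start = pam_start - length
            hits.append((start, dna[start:pam_start], match.group(1)))
    return hits


# Hypothetical 27 bp sequence: 24 bp followed by an AGG PAM.
example = "ACGT" * 6 + "AGG"
print(sequences_5prime_of_pam(example, length=22))
```

The `length` argument is left to the caller so that any of the sizes recited above (e.g., 22 bp or 30 bp) can be used.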


In some particular embodiments, the targeting sequence is not 20 bp. In some particular embodiments, the targeting sequence is not 21 bp.


In some embodiments, the gNAs comprise any purines or pyrimidines (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, uracil, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, cytosine and uracil (and/or modified versions of the same).


In some embodiments, the gNAs comprise a label, are attached to a label, or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.


In some embodiments, the gNAs are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array. Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylindrical, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.


Nucleic Acids Encoding gNAs


Also provided herein are nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that a gNA results from the reverse transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the reverse transcription of a gNA. In some embodiments, by encoding, it is meant that a gNA results from the amplification of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the amplification of a gNA.


In some embodiments the nucleic acid encoding for a gNA comprises a first segment comprising a regulatory region; a second segment comprising targeting sequence, wherein the second segment can range from 15 bp-250 bp; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.
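As a rough illustration of the three-segment layout described above, the following Python sketch joins a regulatory region, a targeting sequence, and a segment encoding a protein-binding sequence in 5′ to 3′ order. The function name `build_gna_template`, the placeholder third segment, and the use of a commonly cited minimal T7 promoter sequence are assumptions made for illustration only; they are not taken from the disclosure.

```python
# Illustrative sketch only; names and example sequences are assumptions.
T7_PROMOTER = "TAATACGACTCACTATAG"  # commonly cited minimal T7 promoter


def build_gna_template(targeting, protein_binding, promoter=T7_PROMOTER,
                       min_len=15, max_len=250):
    """Join promoter, targeting sequence, and protein-binding segment (5'->3')."""
    targeting = targeting.upper()
    protein_binding = protein_binding.upper()
    if set(targeting) - set("ACGT"):
        raise ValueError("targeting sequence must contain only A, C, G, T")
    if not min_len <= len(targeting) <= max_len:
        raise ValueError(f"targeting sequence must be {min_len}-{max_len} bp")
    return promoter + targeting + protein_binding


# Hypothetical inputs: a 22 bp targeting sequence and a placeholder third segment.
template = build_gna_template("ACGTTGCAAGCTTAGGCATCGA", "GTTTTAGAGC")
```

The length check mirrors the 15 bp-250 bp range stated for the second segment; a real protein-binding segment would replace the placeholder.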


In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.


In some embodiments, the nucleic acids encoding for gNAs comprise RNA.


In some embodiments the nucleic acids encoding for gNAs comprise DNA and RNA.


In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.


Collections of gNAs


Provided herein are collections (interchangeably referred to as libraries) of gNAs.


As used herein, a collection of gNAs denotes a mixture of gNAs containing at least 10² unique gNAs. In some embodiments a collection of gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique gNAs. In some embodiments a collection of gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ gNAs.
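The distinction between unique gNAs and total gNAs can be made concrete with a short Python sketch. The helper names `collection_counts` and `is_collection` are hypothetical, and the default threshold of one hundred (written `10**2`) reflects the minimum number of unique gNAs stated in the definition above.

```python
# Illustrative sketch only; helper names are assumptions.
def collection_counts(gnas):
    """Return (total, unique) counts for a list of gNA sequences."""
    seqs = [s.upper() for s in gnas]
    return len(seqs), len(set(seqs))


def is_collection(gnas, min_unique=10**2):
    """True if the mixture holds at least `min_unique` distinct sequences."""
    _, unique = collection_counts(gnas)
    return unique >= min_unique
```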


In some embodiments, a collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the gNAs in the collection vary in size. In some embodiments, the first and second segments are in 5′- to 3′-order.


In some embodiments, the size of the first segment varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 21 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 25 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 30 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 15-50 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 30-100 bp.
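The size-distribution thresholds in the preceding paragraphs can be checked against a list of first-segment lengths with a few lines of Python. This is a minimal sketch: the function names and the example lengths are hypothetical, and the size ranges are treated as inclusive, which is an assumption.

```python
# Illustrative sketch only; function names and example data are assumptions.
def fraction_longer_than(lengths, threshold):
    """Fraction of segments strictly longer than `threshold` bp."""
    return sum(1 for n in lengths if n > threshold) / len(lengths)


def fraction_in_range(lengths, low, high):
    """Fraction of segments whose length falls in [low, high] bp (inclusive)."""
    return sum(1 for n in lengths if low <= n <= high) / len(lengths)


# Hypothetical first-segment lengths (bp) for a six-member collection.
example_lengths = [20, 22, 26, 31, 45, 100]
share_over_30 = fraction_longer_than(example_lengths, 30)    # 3 of 6
share_30_to_100 = fraction_in_range(example_lengths, 30, 100)  # 3 of 6
```

In this example, half of the first segments exceed 30 bp, so the collection would satisfy the "at least 50%" embodiment for that threshold but not the "at least 55%" one.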


In some particular embodiments, the size of the first segment is not 20 bp.


In some particular embodiments, the size of the first segment is not 21 bp.


In some embodiments, the gNAs and/or the targeting sequence of the gNAs in the collection of gRNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in sequence of the 5′ end of the targeting sequence, across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence, across the members of the collection.


In some embodiments, the 3′ end of the gNA targeting sequence can be any purine or pyrimidine (and/or modified versions of the same). In some embodiments, the 3′ end of the gNA targeting sequence is an adenine. In some embodiments, the 3′ end of the gNA targeting sequence is a guanine. In some embodiments, the 3′ end of the gNA targeting sequence is a cytosine. In some embodiments, the 3′ end of the gNA targeting sequence is a uracil. In some embodiments, the 3′ end of the gNA targeting sequence is a thymine. In some embodiments, the 3′ end of the gNA targeting sequence is not cytosine.


In some embodiments, the collection of gNAs comprises targeting sequences which can base-pair with the targeted DNA, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.
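One way to read "spaced at least every N bp" is that no stretch of the genome longer than N bp lacks a targeted site. Under that assumed reading, the following Python sketch computes the largest gap between targeted positions on a linear sequence. The function names, the 0-based position convention, and the example coordinates are hypothetical.

```python
# Illustrative sketch only; interpretation and names are assumptions.
def max_spacing(positions, genome_length):
    """Largest untargeted stretch (bp), including both sequence ends."""
    points = sorted(set(positions))
    if not points:
        return genome_length
    gaps = [points[0]]                                   # start to first site
    gaps += [b - a for a, b in zip(points, points[1:])]  # between sites
    gaps.append(genome_length - points[-1])              # last site to end
    return max(gaps)


def covers_every(positions, genome_length, spacing):
    """True if a targeted site occurs at least every `spacing` bp."""
    return max_spacing(positions, genome_length) <= spacing


# Hypothetical target start positions on a 100 bp sequence.
largest_gap = max_spacing([10, 40, 90], 100)
```

Here the largest gap is 50 bp (between positions 40 and 90), so the example would meet an "at least every 50 bp" criterion but not an "at least every 40 bp" one.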


In some embodiments, the collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the gNAs in the collection can have a variety of second NA segments with various specificities for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example a collection of gNAs as provided herein, can comprise members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and also comprises members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments a collection of gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.


In some embodiments, a plurality of the gNA members of the collection are attached to a label, comprise a label or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.


In some embodiments, a plurality of the gNA members of the collection are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array. Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.


Collections of Nucleic Acids Encoding gNAs

Provided herein are collections (interchangeably referred to as libraries) of nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA.


As used herein, a collection of nucleic acids encoding for gNAs denotes a mixture of nucleic acids containing at least 10² unique nucleic acids. In some embodiments a collection of nucleic acids encoding for gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique nucleic acids encoding for gNAs. In some embodiments a collection of nucleic acids encoding for gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ nucleic acids encoding for gNAs.


In some embodiments, a collection of nucleic acids encoding for gNAs comprises a first segment comprising a regulatory region; a second segment comprising a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.


In some embodiments, the first, second, and third segments are in 5′- to 3′-order.


In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.


In some embodiments, the nucleic acids encoding for gNAs comprise RNA.


In some embodiments the nucleic acids encoding for gNAs comprise DNA and RNA.


In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.


In some embodiments, the size of the second segments (targeting sequence) in the collection varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 21 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 25 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 30 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 15-50 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 30-100 bp.


In some particular embodiments, the size of the second segment is not 20 bp.


In some particular embodiments, the size of the second segment is not 21 bp.


In some embodiments, the gNAs and/or the targeting sequence of the gNAs in the collection of gNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in sequence of the 5′ end of the targeting sequence, across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence, across the members of the collection.


In some embodiments, the collection of nucleic acids comprises targeting sequences, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.


In some embodiments, the collection of nucleic acids encoding for gNAs comprises a third segment encoding for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the segments in the collection vary in their specificity for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example, a collection of nucleic acids encoding for gNAs as provided herein can comprise members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and also comprises members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.


Sequences of Interest

Provided herein are gNAs and collections of gNAs, derived from any source DNA (for example from genomic DNA, cDNA, artificial DNA, DNA libraries), that can be used to target sequences of interest in a sample for a variety of applications including, but not limited to, enrichment, depletion, capture, partitioning, labeling, regulation, and editing. The gNAs comprise a targeting sequence, directed at sequences of interest.


In some embodiments, the sequences of interest are genomic sequences (genomic DNA). In some embodiments, the sequences of interest are mammalian genomic sequences. In some embodiments, the sequences of interest are eukaryotic genomic sequences. In some embodiments, the sequences of interest are prokaryotic genomic sequences. In some embodiments, the sequences of interest are viral genomic sequences. In some embodiments, the sequences of interest are bacterial genomic sequences. In some embodiments, the sequences of interest are plant genomic sequences. In some embodiments, the sequences of interest are microbial genomic sequences. In some embodiments, the sequences of interest are genomic sequences from a parasite, for example a eukaryotic parasite. In some embodiments, the sequences of interest are host genomic sequences (e.g., the host organism of a microbiome, a parasite, or a pathogen). In some embodiments, the sequences of interest are abundant genomic sequences, such as sequences from the genome or genomes of the most abundant species in a sample.


In some embodiments, the sequences of interest comprise repetitive DNA. In some embodiments, the sequences of interest comprise abundant DNA. In some embodiments, the sequences of interest comprise mitochondrial DNA. In some embodiments, the sequences of interest comprise ribosomal DNA. In some embodiments, the sequences of interest comprise centromeric DNA. In some embodiments, the sequences of interest comprise DNA comprising Alu elements (Alu DNA). In some embodiments, the sequences of interest comprise long interspersed nuclear elements (LINE DNA). In some embodiments, the sequences of interest comprise short interspersed nuclear elements (SINE DNA). In some embodiments, the abundant DNA comprises ribosomal DNA.


In some embodiments, the sequences of interest comprise single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), cancer genes, inserts, deletions, structural variations, exons, genetic mutations, or regulatory regions.


In some embodiments, the sequences of interest can be a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is a RNA genome.


In some embodiments, the sequences of interest are from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or a virus; from an animal parasite; or from a pathogen.


In some embodiments, the sequences of interest are from any mammalian organism. In one embodiment the mammal is a human. In another embodiment the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, a mammalian organism is a domestic pet, for example a cat, a dog, a gerbil, a mouse, a rat. In another embodiment the mammal is a type of a monkey.


In some embodiments, the sequences of interest are from any bird or avian organism. An avian organism includes but is not limited to chicken, turkey, duck and goose.


In some embodiments, the sequences of interest are from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.


In some embodiments, the sequences of interest are from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.


In some embodiments, the sequences of interest are from a virus.


In some embodiments, the sequences of interest are from a species of fungi.


In some embodiments, the sequences of interest are from a species of algae.


In some embodiments, the sequences of interest are from any mammalian parasite.


In some embodiments, the sequences of interest are obtained from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.


In some embodiments, the sequences of interest are from a pathogen.


Targeting Sequences

As used herein, a targeting sequence is one that directs the gNA to the sequences of interest in a sample. For example, a targeting sequence targets a particular sequence of interest, for example the targeting sequence targets a genomic sequence of interest.


Provided herein are gNAs and collections of gNAs that comprise a segment that comprises a targeting sequence. Also provided herein, are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs that comprise a segment encoding for a targeting sequence.


In some embodiments, the targeting sequence comprises DNA.


In some embodiments, the targeting sequence comprises RNA.


In some embodiments, the targeting sequence comprises RNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest, except that the RNA comprises uracils instead of thymines. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
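The relationship between a sequence of interest, a PAM, and an RNA targeting sequence can be sketched in Python as follows. The sketch scans one strand of a DNA sequence for the PAMs named above (AGG, CGG, TGG), takes a fixed number of nucleotides immediately 5′ to each PAM, and writes them with uracil in place of thymine. The function names, the fixed-length extraction, the ungapped position-by-position identity measure, and the example sequence are all assumptions for illustration; they are not the method of the disclosure.

```python
# Illustrative sketch only; names, inputs, and the identity measure are assumptions.
PAMS = ("AGG", "CGG", "TGG")


def targeting_sequences(dna, length, pams=PAMS, as_rna=True):
    """Return the `length` nt found immediately 5' of each PAM on this strand."""
    dna = dna.upper()
    found = []
    for i in range(length, len(dna) - 2):
        if dna[i:i + 3] in pams:
            seq = dna[i - length:i]
            found.append(seq.replace("T", "U") if as_rna else seq)
    return found


def percent_identity(a, b):
    """Ungapped identity (%) of two equal-length sequences, treating U as T."""
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical 15 nt sequence of interest with one AGG PAM at index 10.
guides = targeting_sequences("TTACGTACGTAGGCC", length=8)
```

For this input the only PAM is the AGG starting at index 10, so the single result is the 8 nt upstream of it, written as RNA.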


In some embodiments, the targeting sequence comprises DNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest.


In some embodiments, the targeting sequence comprises RNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
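Percent complementarity to the opposite strand can be illustrated with a reverse-complement comparison. In this Python sketch both inputs are written 5′ to 3′, pairing is assumed to be antiparallel Watson-Crick only (no wobble pairs, no gaps), and the function names and example sequences are hypothetical.

```python
# Illustrative sketch only; names and pairing rules are assumptions.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA sequence, returned as DNA letters."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_complementary(guide, strand):
    """Percent of guide positions that pair with `strand` (both given 5'->3')."""
    guide = guide.upper().replace("U", "T")
    perfect = reverse_complement(strand)  # what a fully complementary guide reads
    if len(guide) != len(perfect) or not guide:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(g == p for g, p in zip(guide, perfect))
    return 100.0 * matches / len(guide)


# Hypothetical protospacer (the sequence 5' to a PAM) and its opposite strand.
protospacer = "AAGGCTTA"
opposite = reverse_complement(protospacer)
```

An RNA copy of the protospacer is fully complementary to the opposite strand, which is why identity to the sequence 5′ to the PAM and complementarity to the strand opposite describe the same targeting sequence.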


In some embodiments, the targeting sequence comprises DNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.


In some embodiments, a DNA encoding for a targeting sequence of a gRNA shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.


In some embodiments, a DNA encoding for a targeting sequence of a gRNA is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence and is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to a sequence 5′ to a PAM sequence on a sequence of interest. In some embodiments, the PAM sequence is AGG, CGG, or TGG.


Nucleic Acid-Guided Nuclease System Proteins

Provided herein are gNAs and collections of gNAs comprising a segment that comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. Also provided herein, are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs that comprise a segment encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. A nucleic acid-guided nuclease system can be an RNA-guided nuclease system. A nucleic acid-guided nuclease system can be a DNA-guided nuclease system.


Methods of the present disclosure can utilize nucleic acid-guided nucleases. As used herein, a “nucleic acid-guided nuclease” is any nuclease that cleaves DNA, RNA or DNA/RNA hybrids, and which uses one or more guide nucleic acids (gNAs) to confer specificity. Nucleic acid-guided nucleases include CRISPR/Cas system proteins as well as non-CRISPR/Cas system proteins.


The nucleic acid-guided nucleases provided herein can be DNA guided DNA nucleases; DNA guided RNA nucleases; RNA guided DNA nucleases; or RNA guided RNA nucleases. The nucleases can be endonucleases. The nucleases can be exonucleases. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-DNA endonuclease. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-RNA endonuclease.


A nucleic acid-guided nuclease system protein-binding sequence is a nucleic acid sequence that binds any protein member of a nucleic acid-guided nuclease system. For example, a CRISPR/Cas system protein-binding sequence is a nucleic acid sequence that binds any protein member of a CRISPR/Cas system.


In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of CAS Class I Type I, CAS Class I Type III, CAS Class I Type IV, CAS Class II Type II, and CAS Class II Type V. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems. In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, Csf1, C2c2, and NgAgo.


In some embodiments, nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) can be from any bacterial or archaeal species.


In some embodiments, the nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) are from, or are derived from nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.


In some embodiments, examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins can be naturally occurring or engineered versions.


In some embodiments, naturally occurring nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cmr5. Engineered versions of such proteins can also be employed.


In some embodiments, engineered examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include catalytically dead nucleic acid-guided nuclease system proteins. The term “catalytically dead” generally refers to a nucleic acid-guided nuclease system protein that has inactivated nucleases (e.g., HNH and RuvC nucleases). Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the target nucleic acid (e.g., double-stranded DNA). In some embodiments, the nucleic acid-guided nuclease system catalytically dead protein is a catalytically dead CRISPR/Cas system protein, such as catalytically dead Cas9 (dCas9). Accordingly, the dCas9 allows separation of the mixture into unbound nucleic acids and dCas9-bound fragments. In one embodiment, a dCas9/gRNA complex binds to targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed. In another embodiment, the dCas9 can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site. Naturally occurring catalytically dead nucleic acid-guided nuclease system proteins can also be employed.


In some embodiments, engineered examples of nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins also include nucleic acid-guided nickases (e.g., Cas nickases). A nucleic acid-guided nickase refers to a modified version of a nucleic acid-guided nuclease system protein, containing a single inactive catalytic domain. In one embodiment, the nucleic acid-guided nickase is a Cas nickase, such as Cas9 nickase. A Cas9 nickase may contain a single inactive catalytic domain, for example, either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nickases bound to 2 gNAs that target opposite strands will create a double-strand break in a target double-stranded DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA (e.g., Cas9/gRNA) complexes be specifically bound at a site before a double-strand break is formed. Naturally occurring nickase nucleic acid-guided nuclease system proteins can also be employed.


In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins also include nucleic acid-guided nuclease system fusion proteins. For example, a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein may be fused to another protein, for example an activator, a repressor, a nuclease, a fluorescent molecule, a radioactive tag, or a transposase.


In some embodiments, the nucleic acid-guided nuclease system protein-binding sequence comprises a gNA (e.g., gRNA) stem-loop sequence.


In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4).
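By way of non-limiting illustration, the strand relationship recited above can be checked computationally. The following Python sketch is not part of the disclosed methods; the function name `reverse_complement` and the variable names are assumptions introduced for illustration only, and the sequences are those recited above with line-wrap spaces removed.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'>3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

# Stem-loop-encoding strands as recited in the text (illustrative variable names).
SEQ_ID_NO_3 = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG"
               "AGTCGGTGCTTTTTTT")
SEQ_ID_NO_4 = ("AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTAT"
               "TTCTAGCTCTAAAAC")

# True when the two strands can pair as one double-stranded DNA.
strands_pair = reverse_complement(SEQ_ID_NO_3) == SEQ_ID_NO_4
print(strands_pair)
```

The same check can be applied to any other pair of strands recited herein as reverse complements of each other.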


In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.


In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).


In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6).


In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.


In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2).
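As a non-limiting illustration of how a single-stranded template relates to its transcript, the following Python sketch derives the RNA that is complementary to a DNA template strand and compares it with the recited RNA stem-loop sequence. The function name `transcribe_template` is a hypothetical helper, and the sketch models base pairing only; it does not model promoter recognition, initiation, or termination.

```python
# Template DNA base -> complementary RNA base.
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")

def transcribe_template(template_dna: str) -> str:
    """Return the RNA (5'>3') complementary to a DNA template strand (5'>3')."""
    return template_dna.upper().translate(TEMPLATE_TO_RNA)[::-1]

# Template strand and RNA stem-loop as recited in the text (illustrative names).
SEQ_ID_NO_6 = ("GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCT"
               "ATGCTGTTTCCAGCATAGCTCTAAAAC")
SEQ_ID_NO_2 = ("GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGA"
               "AAAAGUGGCACCGAGUCGGUGCUUUUUUUC")

# True when the template encodes the recited stem-loop RNA.
template_encodes_stem_loop = transcribe_template(SEQ_ID_NO_6) == SEQ_ID_NO_2
print(template_encodes_stem_loop)
```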


In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a single transcribed component, which upon transcription yields a NA (e.g., RNA) stem-loop sequence. In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded, and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the resulting gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).
In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded, and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the yielded gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the third segment comprises two sub-segments, which encode for a crRNA and a tracrRNA upon transcription. In some embodiments, the crRNA does not comprise the N20 plus the extra sequence which can hybridize with tracrRNA. In some embodiments, the crRNA comprises the extra sequence which can hybridize with tracrRNA. In some embodiments, the two sub-segments are independently transcribed. In some embodiments, the two sub-segments are transcribed as a single unit. In some embodiments, the DNA encoding the crRNA comprises NtargetGTTTTAGAGCTATGCTGTTTTG (SEQ ID NO: 7), where Ntarget represents the targeting sequence. In some embodiments, the DNA encoding the tracrRNA comprises the sequence GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT (SEQ ID NO: 8).


In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a DNA sequence, which upon transcription yields a gRNA stem-loop sequence capable of binding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein. In one embodiment, the DNA sequence can be double-stranded. In some embodiments, the third segment double stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the third segment double stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In one embodiment, the DNA sequence can be single-stranded. In some embodiments, the third segment single stranded DNA comprises the following DNA sequence (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.
In some embodiments, the third segment single stranded DNA comprises the following DNA sequence (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template. In some embodiments, the third segment comprises a DNA sequence which, upon transcription, yields a first RNA sequence that is capable of forming a hybrid with a second RNA sequence, and which hybrid is capable of CRISPR/Cas system protein binding. In some embodiments, the third segment is double-stranded DNA comprising the DNA sequence on one strand: (5′>3′, GTTTTAGAGCTATGCTGTTTTG) (SEQ ID NO: 9) and its reverse complementary DNA sequence on the other strand: (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the third segment is single-stranded DNA comprising the DNA sequence of (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the second segment and the third segment together encode for a crRNA sequence. In some embodiments, the second RNA sequence that is capable of forming a hybrid with the first RNA sequence encoded by the third segment of the nucleic acid encoding a gRNA is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence (5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 11). In some embodiments, the tracrRNA is encoded by a double-stranded DNA comprising the sequence of (5′>3′, GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 8), and optionally fused with a regulatory sequence at its 5′ end. In some embodiments, the regulatory sequence can be bound by a transcription factor. In some embodiments, the regulatory sequence is a promoter. In some embodiments, the regulatory sequence is a T7 promoter, comprising the sequence of (5′>3′, GCCTCGAGCTAATACGACTCACTATAGAG) (SEQ ID NO: 12).
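The three-segment arrangement described above (regulatory region, targeting sequence, protein-binding sequence) can be pictured as a simple concatenation. The following Python sketch is a non-limiting illustration under stated assumptions: the function name `build_gna_template` and the example targeting sequence are hypothetical, the T7 promoter is SEQ ID NO: 12, and the stem-loop-encoding strand is SEQ ID NO: 3 as recited herein.

```python
T7_PROMOTER = "GCCTCGAGCTAATACGACTCACTATAGAG"  # SEQ ID NO: 12 (first segment)
STEM_LOOP_DNA = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG"
                 "AGTCGGTGCTTTTTTT")            # SEQ ID NO: 3 (third segment)

def build_gna_template(targeting_sequence: str) -> str:
    """Join regulatory region, targeting sequence, and stem-loop, written 5'>3'."""
    target = targeting_sequence.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("targeting sequence must be a non-empty A/C/G/T string")
    return T7_PROMOTER + target + STEM_LOOP_DNA

# Hypothetical 20-nt targeting sequence, used only to exercise the function.
EXAMPLE_TARGET = "ACGTACGTACGTACGTACGT"
construct = build_gna_template(EXAMPLE_TARGET)
print(len(construct))  # 29 + 20 + 83 nucleotides
```

The sketch covers only the single-component (stem-loop) arrangement; the two-component crRNA/tracrRNA arrangement would use SEQ ID NO: 9 in place of SEQ ID NO: 3 as the third segment.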


In some embodiments, provided herein is a nucleic acid encoding for a gNA comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment encodes for a RNA sequence that, upon post-transcriptional cleavage, yields a first RNA segment and a second RNA segment. In some embodiments, the first RNA segment comprises a crRNA and the second RNA segment comprises a tracrRNA, which can form a hybrid and together, provide for nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the third segment further comprises a spacer in between the transcriptional unit for the first RNA segment and the second RNA segment, which spacer comprises an enzyme cleavage site.


In some embodiments, provided herein is a gNA (e.g., gRNA) comprising a first NA segment comprising a targeting sequence and a second NA segment comprising a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the size of the first segment is greater than 30 bp. In some embodiments, the second segment comprises a single segment, which comprises the gRNA stem-loop sequence. In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1). In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the second segment comprises two sub-segments: a first RNA sub-segment (crRNA) that forms a hybrid with a second RNA sub-segment (tracrRNA), which together act to direct nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the sequence of the second sub-segment comprises GUUUUAGAGCUAUGCUGUUUUG. In some embodiments, the first RNA segment and the second RNA segment together form a crRNA sequence. In some embodiments, the other RNA that will form a hybrid with the second RNA segment is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence of 5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 11).
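By way of non-limiting illustration, the capacity of the crRNA repeat portion to hybridize with the tracrRNA can be examined by looking for a stretch of perfect Watson-Crick complementarity. The Python sketch below is an assumption-laden simplification: the helper names are hypothetical, only perfect A-U and G-C pairing is considered (G-U wobble pairs and secondary structure are ignored), and the sequences are those recited above.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def rna_reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string written 5'>3'."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]

def longest_paired_tail(cr_repeat: str, tracr: str) -> int:
    """Length of the longest 3' tail of cr_repeat that is perfectly complementary to part of tracr."""
    for k in range(len(cr_repeat), 0, -1):
        if rna_reverse_complement(cr_repeat[-k:]) in tracr:
            return k
    return 0

CR_REPEAT = "GUUUUAGAGCUAUGCUGUUUUG"  # repeat portion recited above
TRACR_RNA = ("GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG"
             "GCACCGAGUCGGUGCUUUUUUU")  # SEQ ID NO: 11

paired = longest_paired_tail(CR_REPEAT, TRACR_RNA)
print(paired, rna_reverse_complement(CR_REPEAT[-paired:]))
```

Under these assumptions the 3′ portion of the repeat finds a contiguous perfectly complementary stretch in the tracrRNA, consistent with the two molecules forming a hybrid.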


CRISPR/Cas System Nucleic Acid-Guided Nucleases

In some embodiments, CRISPR/Cas system proteins are used in the embodiments provided herein. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems.


In some embodiments, CRISPR/Cas system proteins can be from any bacterial or archaeal species.


In some embodiments, the CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.


In some embodiments, the CRISPR/Cas system proteins are from, or are derived from, CRISPR/Cas system proteins from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.


In some embodiments, examples of CRISPR/Cas system proteins can be naturally occurring or engineered versions.


In some embodiments, naturally occurring CRISPR/Cas system proteins can belong to CAS Class I Type I, III, or IV, or CAS Class II Type II or V, and can include Cas9, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and Cpf1.


In an exemplary embodiment, the CRISPR/Cas system protein comprises Cas9.


A “CRISPR/Cas system protein-gNA complex” refers to a complex comprising a CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.


A CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type CRISPR/Cas system protein. The CRISPR/Cas system protein may have all the functions of a wild type CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
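As a non-limiting illustration of an identity threshold, the following Python sketch computes percent identity between two sequences that have already been aligned to equal length. It is a minimal sketch under stated assumptions: the function names are hypothetical, gaps are written as "-", identity is counted over all alignment columns (other conventions exist), and the short peptide strings are invented for the example and are not Cas protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)

def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """True when percent identity is at least the stated threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Invented five-residue strings: four of five columns match.
print(percent_identity("MKTAY", "MKSAY"))
print(meets_identity("MKTAY", "MKSAY", threshold=90.0))
```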


The term “CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a CRISPR/Cas system protein-gNA complex.


Cas9

In some embodiments, the CRISPR/Cas System protein nucleic acid-guided nuclease is or comprises Cas9. The Cas9 of the present invention can be isolated, recombinantly produced, or synthetic.


Examples of Cas9 proteins that can be used in the embodiments herein can be found in F. A. Ran, L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem, X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, and F. Zhang; “In vivo genome editing using Staphylococcus aureus Cas9” Nature 520, 186-191 (9 Apr. 2015) doi:10.1038/nature14299, which is incorporated herein by reference.


In some embodiments, the Cas9 is a Type II CRISPR system derived from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.


In some embodiments, the Cas9 is a Type II CRISPR system derived from S. pyogenes and the PAM sequence is NGG located on the immediate 3′ end of the target specific guide sequence. The PAM sequences of Type II CRISPR systems from exemplary bacterial species can also include: Streptococcus pyogenes (NGG), Staphylococcus aureus (NNGRRT), Neisseria meningitidis (NNNNGATT), Streptococcus thermophilus (NNAGAA) and Treponema denticola (NAAAAC), all of which are usable without deviating from the present invention.
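By way of non-limiting illustration, the PAM motifs listed above can be located in a DNA sequence by expanding their degenerate (IUPAC) letters into a pattern. The Python sketch below makes several assumptions that are not part of the disclosure: the function names are hypothetical, only the strand as written is scanned, and a fixed 20-nucleotide target immediately 5′ of the PAM is reported.

```python
import re

# IUPAC letters used by the PAMs recited above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

def pam_regex(pam: str) -> str:
    """Expand a degenerate PAM such as NNGRRT into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())

def find_targets(seq: str, pam: str, guide_len: int = 20):
    """Return (target, PAM) pairs for every PAM with guide_len bases on its 5' side."""
    seq = seq.upper()
    # A lookahead lets overlapping PAM occurrences all be reported.
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    hits = []
    for match in pattern.finditer(seq):
        pam_start = match.start()
        if pam_start >= guide_len:
            hits.append((seq[pam_start - guide_len:pam_start], match.group(1)))
    return hits

# Invented test sequences: 20 identical bases followed by one PAM.
print(find_targets("A" * 20 + "TGG", "NGG"))
print(find_targets("C" * 20 + "ACGAAT", "NNGRRT"))
```

Scanning the reverse complement of the same sequence with the same function would report sites on the opposite strand.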


In one exemplary embodiment, Cas9 sequence can be obtained, for example, from the pX330 plasmid (available from Addgene), re-amplified by PCR then cloned into pET30 (from EMD biosciences) to express in bacteria and purify the recombinant 6His tagged protein.


A “Cas9-gNA complex” refers to a complex comprising a Cas9 protein and a guide NA. A Cas9 protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type Cas9 protein, e.g., to the Streptococcus pyogenes Cas9 protein. The Cas9 protein may have all the functions of a wild type Cas9 protein, or only one or some of the functions, including binding activity and nuclease activity.


The term “Cas9-associated guide NA” refers to a guide NA as described above. The Cas9-associated guide NA may exist isolated, or as part of a Cas9-gNA complex.


Non-CRISPR/Cas System Nucleic Acid-Guided Nucleases

In some embodiments, non-CRISPR/Cas system proteins are used in the embodiments provided herein.


In some embodiments, the non-CRISPR/Cas system proteins can be from any bacterial or archaeal species.


In some embodiments, the non-CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.


In some embodiments, the non-CRISPR/Cas system proteins are from, or are derived from, Aquifex aeolicus, Thermus thermophilus, Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, Natronobacterium gregoryi, or Corynebacterium diphtheriae.


In some embodiments, the non-CRISPR/Cas system proteins can be naturally occurring or engineered versions.


In some embodiments, a naturally occurring non-CRISPR/Cas system protein is NgAgo (Argonaute from Natronobacterium gregoryi).


A “non-CRISPR/Cas system protein-gNA complex” refers to a complex comprising a non-CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.


A non-CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type non-CRISPR/Cas system protein. The non-CRISPR/Cas system protein may have all the functions of a wild type non-CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.


The term “non-CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The non-CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a non-CRISPR/Cas system protein-gNA complex.


Catalytically Dead Nucleic Acid-Guided Nucleases

In some embodiments, engineered examples of nucleic acid-guided nucleases include catalytically dead nucleic acid-guided nucleases (CRISPR/Cas system nucleic acid-guided nucleases or non-CRISPR/Cas system nucleic acid-guided nucleases). The term “catalytically dead” generally refers to a nucleic acid-guided nuclease that has inactivated nucleases, for example inactivated HNH and RuvC nucleases. Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the nucleic acid.


Accordingly, the catalytically dead nucleic acid-guided nuclease allows separation of the mixture into unbound nucleic acids and catalytically dead nucleic acid-guided nuclease-bound fragments. In one exemplary embodiment, a dCas9/gRNA complex binds to the targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed.


In another embodiment, the catalytically dead nucleic acid-guided nuclease can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site.


In some embodiments, the catalytically dead nucleic acid-guided nuclease is dCas9, dCpf1, dCas3, dCas8a-c, dCas10, dCse1, dCsy1, dCsn2, dCas4, dCsm2, dCmr5, dCsf1, dC2c2, or dNgAgo.


In one exemplary embodiment the catalytically dead nucleic acid-guided nuclease protein is a dCas9.


Nucleic Acid-Guided Nuclease Nickases

In some embodiments, engineered examples of nucleic acid-guided nucleases include nucleic acid-guided nuclease nickases (referred to interchangeably as nickase nucleic acid-guided nucleases).


In some embodiments, engineered examples of nucleic acid-guided nucleases include CRISPR/Cas system nickases or non-CRISPR/Cas system nickases, containing a single inactive catalytic domain.


In some embodiments, the nucleic acid-guided nuclease nickase is a Cas9 nickase, Cpf1 nickase, Cas3 nickase, Cas8a-c nickase, Cas10 nickase, Cse1 nickase, Csy1 nickase, Csn2 nickase, Cas4 nickase, Csm2 nickase, Cmr5 nickase, Csf1 nickase, C2c2 nickase, or an NgAgo nickase.


In one embodiment, the nucleic acid-guided nuclease nickase is a Cas9 nickase.


In some embodiments, a nucleic acid-guided nuclease nickase can be used to bind to a target sequence. With only one active nuclease domain, the nucleic acid-guided nuclease nickase cuts only one strand of a target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nuclease nickases bound to 2 gNAs that target opposite strands can create a double-strand break in the nucleic acid. This “dual nickase” strategy increases the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA complexes be specifically bound at a site before a double-strand break is formed.


In exemplary embodiments, a Cas9 nickase can be used to bind to a target sequence. The term “Cas9 nickase” refers to a modified version of the Cas9 protein, containing a single inactive catalytic domain, i.e., either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide RNA-hybridized strand or the non-hybridized strand may be cleaved. Cas9 nickases bound to 2 gRNAs that target opposite strands will create a double-strand break in the DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both Cas9/gRNA complexes be specifically bound at a site before a double-strand break is formed.
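As a non-limiting illustration of the “dual nickase” strategy, candidate site pairs on opposite strands can be enumerated in silico. The Python sketch below rests on assumptions that are not taken from the disclosure: an NGG PAM is used, a PAM on the opposite strand is detected as CCN in the strand as written, the opposite-strand PAM is required to lie upstream of the same-strand PAM, and the maximum spacing `max_gap` is an arbitrary illustrative parameter.

```python
import re

def pam_positions(seq: str):
    """Return start indices of NGG (strand as written) and CCN (opposite-strand NGG)."""
    seq = seq.upper()
    forward = [m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq)]
    reverse = [m.start() for m in re.finditer(r"(?=CC[ACGT])", seq)]
    return forward, reverse

def nickase_pairs(seq: str, max_gap: int = 100):
    """Pair each opposite-strand PAM with downstream same-strand PAMs within max_gap."""
    forward, reverse = pam_positions(seq)
    return [(r, f) for r in reverse for f in forward if 0 < f - r <= max_gap]

# Invented sequence: one CCN site, a 30-base spacer, then one NGG site.
example = "CCA" + "T" * 30 + "AGG"
print(nickase_pairs(example))
print(nickase_pairs(example, max_gap=10))
```

A practical design would also check that each site has room for a full targeting sequence and would apply the spacing and orientation rules appropriate to the nickase used.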


Capture of DNA can be carried out using a nucleic acid-guided nuclease nickase. In one exemplary embodiment, a nucleic acid-guided nuclease nickase cuts a single strand of double stranded nucleic acid, wherein the double stranded region comprises methylated nucleotides.


Dissociable and Thermostable Nucleic Acid-Guided Nucleases

In some embodiments, thermostable nucleic acid-guided nucleases are used in the methods provided herein (thermostable CRISPR/Cas system nucleic acid-guided nucleases or thermostable non-CRISPR/Cas system nucleic acid-guided nucleases). In such embodiments, the reaction temperature is elevated, inducing dissociation of the protein; the reaction temperature is then lowered, allowing for the generation of additional cleaved target sequences. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, at least 55% activity, at least 60% activity, at least 65% activity, at least 70% activity, at least 75% activity, at least 80% activity, at least 85% activity, at least 90% activity, at least 95% activity, at least 96% activity, at least 97% activity, at least 98% activity, at least 99% activity, or 100% activity, when maintained at a temperature of at least 75° C. for at least 1 minute. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained for at least 1 minute at least at 75° C., at least at 80° C., at least at 85° C., at least at 90° C., at least at 91° C., at least at 92° C., at least at 93° C., at least at 94° C., at least at 95° C., at least at 96° C., at least at 97° C., at least at 98° C., at least at 99° C., or at least at 100° C. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained at least at 75° C. for at least 1 minute, 2 minutes, 3 minutes, 4 minutes, or 5 minutes. In some embodiments, a thermostable nucleic acid-guided nuclease maintains at least 50% activity when the temperature is elevated and then lowered to 25° C.-50° C. In some embodiments, the temperature is lowered to 25° C., to 30° C., to 35° C., to 40° C., to 45° C., or to 50° C. In one exemplary embodiment, a thermostable enzyme retains at least 90% activity after 1 min at 95° C.


In some embodiments, the thermostable nucleic acid-guided nuclease is thermostable Cas9, thermostable Cpf1, thermostable Cas3, thermostable Cas8a-c, thermostable Cas10, thermostable Cse1, thermostable Csy1, thermostable Csn2, thermostable Cas4, thermostable Csm2, thermostable Cmr5, thermostable Csf1, thermostable C2c2, or thermostable NgAgo.


In some embodiments, the thermostable CRISPR/Cas system protein is thermostable Cas9.


Thermostable nucleic acid-guided nucleases can be isolated, for example, after being identified by sequence homology in the genomes of the thermophilic microorganisms Streptococcus thermophilus and Pyrococcus furiosus. Nucleic acid-guided nuclease genes can then be cloned into an expression vector. In one exemplary embodiment, a thermostable Cas9 protein is isolated.


In another embodiment, a thermostable nucleic acid-guided nuclease can be obtained by in vitro evolution of a non-thermostable nucleic acid-guided nuclease. The sequence of a nucleic acid-guided nuclease can be mutagenized to improve its thermostability.


Methods of Making Collections of gNAs


Provided herein are methods that enable the generation of a large number of diverse gNAs (e.g., gRNAs), or collections of gNAs, from any source nucleic acid (e.g., DNA). Methods provided herein can employ enzymatic methods including but not limited to digestion, ligation, extension, overhang filling, transcription, reverse transcription, and amplification.


Generally, the method can comprise providing a nucleic acid (e.g., DNA); employing a first enzyme (or combinations of first enzymes) that cuts at a part of the PAM sequence in the nucleic acid, in a way that a residual nucleotide sequence from the PAM sequence is left; ligating an adapter that positions a typeIIS restriction enzyme site (a typeIIS enzyme being one that cuts outside yet near its recognition motif) at a distance to eliminate the PAM sequence; employing a second typeIIS enzyme (or combination of second enzymes) to eliminate the PAM sequence together with the adapter; and fusing a sequence that can be recognized by protein members of the nucleic acid-guided nuclease (e.g., CRISPR/Cas) system, for example, a gRNA stem-loop sequence. In some embodiments, the first enzymatic reaction cuts part of the PAM sequence in a way that a residual nucleotide sequence from the PAM sequence is left, and the nucleotide sequence immediately 5′ to the PAM sequence can be any purine or pyrimidine, not just those with a cytosine 5′ to the PAM sequence, for example, not just those that are C/NGG or C/TAG, etc.
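The intended outcome of the steps above, namely a collection of targeting sequences that each lay immediately 5′ of a PAM in the source nucleic acid, can be pictured with an in silico analogue. The following Python sketch is not the enzymatic protocol and does not model any restriction enzyme, adapter, or ligation step; it assumes an NGG PAM and a fixed 20-nucleotide targeting sequence, and the function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def targeting_sequences(strand: str, guide_len: int = 20):
    """Yield each guide_len-mer lying immediately 5' of an NGG PAM on one strand."""
    for match in re.finditer(r"(?=[ACGT]GG)", strand):
        if match.start() >= guide_len:
            yield strand[match.start() - guide_len:match.start()]

def guide_collection(source_dna: str, guide_len: int = 20) -> set:
    """Collect distinct targeting sequences from both strands of the source DNA."""
    top = source_dna.upper()
    guides = set(targeting_sequences(top, guide_len))
    guides.update(targeting_sequences(reverse_complement(top), guide_len))
    return guides

# Invented source fragment containing a single NGG site on the strand as written.
source = "ACGT" * 5 + "TGG" + "AAAAA"
print(guide_collection(source))
```

Each member of such a collection would then be joined to a protein-binding sequence, for example a gRNA stem-loop sequence, as described above.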


Table 1 shows exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.









TABLE 1
Exemplary strategies for preparing a collection of guide nucleic acids.

CRISPR/Cas System Species | PAM Sequence | First Enzyme/Components | Strategy | 3′ Adapter sequence with type IIS enzyme site (provided with only one strand sequence 5′→3′)
Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence, nicks the other strand with T7 endonuclease I, blunt with T4 DNA polymerase; ligate to adapter; cut with MlyI to remove PAM and adapter; ligate gRNA stem-loop sequence at 3′ end | ggGACTCggatccctatagtc (SEQ ID NO: 4421)
Staphylococcus aureus (SA); SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut, blunt with T4 DNA polymerase; ligate to adapter SA; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422)
Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut, blunt with T4 DNA polymerase; ligate to adapter NM; cut with EcoRI to eliminate unwanted DNA and EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | TCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4428)
Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut, blunt with T4 DNA polymerase; ligate to adapter ST; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttacggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4429)
Treponema denticola (TD) | NAAAAC | Cly7489II | Cut, blunt with T4 DNA polymerase; ligate to adapter TD; cut with EcoP15I to remove PAM and adapter | tttagcggccgcctgecgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4430)
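The PAM sequences in Table 1 are written with IUPAC degenerate codes (N, R, W). As a minimal sketch, assuming only the standard IUPAC meanings of those codes, the PAMs can be turned into regular expressions for checking whether a PAM occurs at a given position in a source sequence. The dictionary keys reuse the species abbreviations from the table; the function names `pam_regex` and `has_pam_at` are hypothetical helpers, not part of the described methods.

```python
import re

# Standard IUPAC nucleotide codes needed for the PAMs listed in Table 1.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "W": "[AT]",    # weak (A or T)
    "N": "[ACGT]",  # any base
}

# PAM sequences as listed in Table 1, keyed by the table's species abbreviations.
PAMS = {
    "SP": "NGG",
    "SA": "NNGRRT",
    "NM": "NNNNGATT",
    "ST": "NNAGAAW",
    "TD": "NAAAAC",
}


def pam_regex(pam):
    """Compile an IUPAC-coded PAM into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam))


def has_pam_at(seq, pos, pam):
    """Return True if the PAM matches seq starting at 0-based index pos."""
    return pam_regex(pam).match(seq, pos) is not None
```

For example, `has_pam_at("ACGTTGG", 4, PAMS["SP"])` checks whether the three bases starting at index 4 form an NGG PAM.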









Table 2 shows additional exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.









TABLE 2
Additional exemplary strategies for preparing a collection of guide nucleic acids.

CRISPR/Cas System Species | PAM Sequence | First Enzyme/Component | Exemplary Strategy | Adapter oligo sequence (with inosine overhangs, all in 5′→3′ direction)
Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence, nicks the other strand with T7 endonuclease I; ligate to adapter; cut with MlyI to remove PAM and 3′ adapter; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: ggggGACTCggatccctatagtgatacaaagacgatgacgacaagcg (SEQ ID NO: 4404); Adapter oligo 2: gcctcgagc*t*a*atacgactcactatagggatccaagtccc (* denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405)
Staphylococcus aureus (SA); SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut; ligate to adapter SA; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: IttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcaggcggccgctaaaa (SEQ ID NO: 4423)
Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut; ligate to adapter NM; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: attTCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4424); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcGA (SEQ ID NO: 4425)
Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut; ligate to adapter ST; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: gcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4426); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcIG (SEQ ID NO: 4427)









Exemplary applications of the compositions and methods described herein are provided in FIG. 1, FIG. 2, FIG. 3, FIG. 4, FIG. 5, FIG. 6, and FIG. 7. The figures depict non-limiting exemplary embodiments of the present invention that include a method of constructing a gNA library (e.g., gRNA library) from input nucleic acids (e.g., DNA), such as genomic DNA (e.g., human genomic DNA).


In FIG. 1, the starting material can be fragmented genomic DNA (e.g., human) or other source DNA. These fragments are blunt-ended before constructing the library 101. T7 promoter adapters are ligated to the blunt-ended DNA fragments 102, which are then PCR amplified. Nt.CviPII is then used to generate a nick on one strand of the PCR product immediately 5′ to the CCD sequence 103. T7 Endonuclease I cleaves on the opposite strand 1, 2, or 3 bp 5′ of the nick 104. The resulting DNA fragments are blunt-ended with T4 DNA Polymerase, leaving an HGG sequence at the end of the DNA fragment 105. The resulting DNA is cleaned and recovered on beads. An adapter carrying an MlyI recognition site is ligated to the blunt-ended DNA fragment immediately 3′ of the HGG sequence 106. MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 107. The resulting DNA fragments are cleaned and recovered again on beads. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome 108. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.
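At the sequence level, the FIG. 1 workflow ends with a construct of the form T7 promoter, target sequence, gRNA stem-loop, with the residual HGG removed. The following is a minimal string-level sketch of that end result only, not of the enzymology. It assumes the fragment is written 5′→3′ on the strand where the target is followed by HGG, it uses the consensus T7 promoter core (which also appears within adapter oligo 2 for SP in Table 2), and the stem-loop passed in is an illustrative placeholder rather than the sequence used in the source. The function names are hypothetical.

```python
import re

# Consensus T7 promoter core; the stem-loop is supplied by the caller
# because the source's stem-loop sequence is not reproduced here.
T7_PROMOTER = "TAATACGACTCACTATAG"


def trim_hgg(fragment):
    """Remove a 3'-terminal HGG (H = A, C or T), mimicking the outcome of
    the MlyI step. Returns None if the fragment does not end in HGG."""
    if re.search(r"[ACT]GG$", fragment):
        return fragment[:-3]
    return None


def build_template(fragment, promoter, stem_loop):
    """Return promoter + trimmed fragment + stem-loop, or None when the
    fragment does not end in HGG."""
    trimmed = trim_hgg(fragment)
    if trimmed is None:
        return None
    return promoter + trimmed + stem_loop
```

Calling `build_template("ACGTACGTAGG", T7_PROMOTER, "GTTTTAGAGCTA")` returns the promoter, then `ACGTACGT`, then the placeholder stem-loop, which is the layout that in vitro transcription would act on in this simplified picture.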


In FIG. 2, the starting material can be intact genomic DNA (e.g., human) or other source DNA 201. Nt.CviPII and T7 Endonuclease I are used to generate nicks on each strand of the human genomic DNA, resulting in smaller DNA fragments 202. DNA fragments of 200-600 bp are size selected on beads, then ligated with Y-shaped adapters carrying a GG overhang on the 5′ end. One strand of the Y-shaped adapter contains an MlyI recognition site, whereas the other strand contains a mutated MlyI site and a T7 promoter sequence 203. Because of these features, after PCR amplification, the T7 promoter sequence is at the distal end of the HGG sequence, and the MlyI sequence is at the rear end of HGG 204. Digestion with MlyI generates a cleavage immediately 5′ of the HGG sequence 205. MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 206. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.


In FIG. 3, the source DNA (e.g., genomic DNA) can be nicked 301, for example with a nicking enzyme. In some cases, the nicking enzyme can have a recognition site that is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD (where D represents a base other than C). Nicks can be proximal, surrounding a region containing the sequence (represented by the thicker line) which will be used to yield the guide RNA N20 sequence. When nicks are proximal, a double stranded break can occur and lead to 5′ or 3′ overhangs 302. These overhangs can be repaired, for example with a polymerase (e.g., T4 polymerase). In some cases, such as with 5′ strands, repair can comprise synthesizing a complementary strand. In some cases, such as with 3′ strands, repair can comprise removing overhangs. Repair can result in a blunt end including the N20 guide sequence and a sequence complementary to the nick recognition sequence (e.g., HGG, where H represents a base other than G).
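The relationship between the CCD nicking site and the HGG sequence left next to N20 follows from base pairing: the reverse complement of CCD (D = A, G or T) is HGG (H = T, C or A). A minimal sketch of this, assuming plain uppercase A/C/G/T input and using hypothetical helper names, is:

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def ccd_sites(seq):
    """0-based start indices of every CCD (D = A, G or T) on the given
    strand; the lookahead lets overlapping sites be reported."""
    return [m.start() for m in re.finditer(r"(?=CC[AGT])", seq)]


def hgg_sites(seq):
    """0-based start indices of every HGG (H = A, C or T) on the given strand."""
    return [m.start() for m in re.finditer(r"(?=[ACT]GG)", seq)]
```

Every HGG found on one strand at index `i` corresponds to a CCD on the opposite strand at index `len(seq) - i - 3` of the reverse complement, which is why nicking at CCD on one strand leaves HGG adjacent to the candidate N20 on the other.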


In FIG. 4, continuing for example from the end of FIG. 3, different combinations of adapters can be ligated to the DNA to allow for the desired cleaving. Adapters with a recognition site for a nuclease enzyme that cuts 3 base pairs from the site (e.g., MlyI) can be ligated 401, and digestion at that site can be used to remove a left-over sequence, such as an HGG sequence 402. Adapters with a recognition site for a nuclease that cuts 20 base pairs from the site (e.g., MmeI) can also be ligated 403. These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BsaXI). The first enzyme can be used to cut 20 nucleotides down, thereby keeping the N20 sequence 404. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 405. Then, the nuclease corresponding to the second recognition site (e.g., BsaXI) can be used to remove the adapter for the site that cuts 20 nucleotides away (e.g., MmeI) 406. Finally, the guide RNA stem-loop sequence adapter can be ligated to the N20 sequence 407 to prepare for guide RNA production.


Alternatively, the protocol shown in FIG. 5 can follow the end of a protocol such as that shown in FIG. 3. Adapters with a recognition site for a nuclease enzyme that cleaves 25 nucleotides from the site (e.g., EcoP15I) can be ligated to the DNA 501. These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BaeI) and any other left-over sequence, such as HGG. The enzyme corresponding to the first recognition site (e.g., EcoP15I) can then be used to cleave after the N20 sequence 502. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 503. The enzyme corresponding to the second recognition site (e.g., BaeI) can then be used to remove the recognition sites and any residual sequence (e.g., HGG) 504. Finally, the guide RNA stem-loop sequence adapter can be ligated (e.g., by single strand ligation) to the N20 sequence 505.


As an alternative to protocols such as that shown in FIG. 3, the protocol shown in FIG. 6 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 601. In some cases, the nick recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand 602. Because of the DNA synthesis, the nick can be sealed and made available to be nicked again 603. Subsequent cycles of nicking and synthesis can be used to yield large amounts of target sequences 604. These single stranded copies of target sequences can be made double stranded, for example by random priming and extension. These double stranded nucleic acids comprising N20 sequences can then be further processed by methods disclosed herein, such as those shown in FIG. 4 or FIG. 5.


As another alternative to protocols such as that shown in FIG. 3 or FIG. 6, the protocol shown in FIG. 7 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 701. In some cases, the nicking enzyme recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand (e.g., nicking endonuclease-mediated strand-displacement DNA amplification (NEMDA)). The reaction parameters can be adjusted to control the size of the single stranded DNA produced. For example, the nickase:polymerase ratio (e.g., CviPII:Bst large fragment polymerase ratio) can be adjusted. Reaction temperature can also be adjusted. Next, an oligonucleotide can be added 704 which has (in the 5′→3′ direction) a promoter (e.g., T7 promoter) 702 followed by a random n-mer (e.g., random 6-mer, random 8-mer) 703. The random n-mer region can bind to a region of the single stranded DNA generated previously. For example, binding can be conducted by denaturing at high temperature followed by rapid cool down, which can allow the random n-mer region to bind to the single stranded DNA generated by NEMDA. In some cases, the DNA is denatured at 98° C. for 7 minutes then cooled down rapidly to 10° C. Extension and/or amplification can be used to produce double-stranded DNA. Blunt ends can be produced, for example enzymatically (e.g., by treatment with DNA polymerase I at 20° C.). This can result in one end ending at the promoter (e.g., T7 promoter) and the other end ending at any nicking enzyme recognition sites (e.g., any CCD sites). These fragments can then be purified, for example by size selection (e.g., by gel purification, capillary electrophoresis, or other fragment separation techniques). In some cases, the target fragments are about 50 base pairs in length (adapter sequence (e.g., T7 adapter)+target N20 sequence+nicking enzyme recognition site or complement (e.g., HGG)). Fragments can then be ligated to an adapter comprising a nuclease recognition site for a nuclease that cuts an appropriate distance away to remove the nicking enzyme recognition site 705. For example, for a three-nucleotide long nicking enzyme recognition site (e.g., CCD for CviPII), BaeI can be used. The appropriate nuclease (e.g., BaeI) can then be used to remove the nuclease recognition site and the nicking enzyme recognition site 706. The remaining nucleic acid sequence (e.g., the N20 site) can then be ligated to the final stem-loop sequence for the guide RNA 707. Amplification (e.g., PCR) can be conducted. Guide RNAs can be produced.


In some embodiments, a collection of gNAs (e.g., gRNAs) targeting human mitochondrial DNA (mtDNA) is created that comprises the nucleic acid-guided nuclease (e.g., Cas9) target sequence and can be used for directing nucleic acid-guided nuclease (e.g., Cas9) proteins. In some embodiments, the targeting sequences of this collection of gNAs (e.g., gRNAs) are encoded by DNA sequences comprising at least the 20 nt sequence provided in the second column from the right of Table 3 (if the NGG sequence is on the positive strand) and Table 4 (if the NGG sequence is on the negative strand). In some embodiments, a collection of gRNA nucleic acids, as provided herein, with specificity for human mitochondrial DNA, comprises a plurality of members, wherein the members comprise a plurality of targeting sequences provided in the second column from the right of Table 3 and/or the second column from the right of Table 4.
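Rows of the kind listed in Table 3 can be reproduced computationally by scanning the (+) strand for every 20 nt window that is immediately followed by NGG. The sketch below is illustrative only and is not the procedure used to generate the table. It assumes 1-based, inclusive coordinates, a convention inferred from the table (each end position is the start position plus 22), and the `offset` parameter and function name are hypothetical.

```python
import re


def plus_strand_targets(seq, offset=0):
    """Return (start, end, site23, target20) for every 20 nt target that is
    followed by NGG on the given strand.

    Coordinates are 1-based and inclusive; offset is the number of bases
    that precede seq[0] in the full reference sequence.
    """
    hits = []
    # The lookahead allows overlapping sites to be reported.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        i = m.start()
        hits.append((offset + i + 1, offset + i + 23, seq[i:i + 23], m.group(1)))
    return hits
```

Applied to the 24 nt stretch `ATCACCCTATTAACCACTCACGGG` with `offset=12`, the function returns two overlapping entries with coordinates 13 to 35 and 14 to 36, matching the layout of the first two rows of Table 3.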









TABLE 3







gRNA target sequence for human mtDNA carrying NGG sequence on the (+) strand.













Chr start position (+ strand) | Chr end position (+ strand) | nt sequence on the (+) strand containing gRNA target sequence followed by NGG | SEQ ID NO: | 20 nt gRNA target sequence (will encode the gRNA targeting sequence) | SEQ ID NO:















13
35
ATCACCCTATTAACCAC
13
ATCACCCTATTAACCA
436




TCACGG

CTCA






14
36
TCACCCTATTAACCACT
14
TCACCCTATTAACCAC
437




CACGGG

TCAC






32
54
ACGGGAGCTCTCCATGC
15
ACGGGAGCTCTCCATG
438




ATTTGG

CATT






45
67
ATGCATTTGGTATTTTC
16
ATGCATTTGGTATTTT
439




GTCTGG

CGTC






46
68
TGCATTTGGTATTTTCGT
17
TGCATTTGGTATTTTC
440




CTGGG

GTCT






47
69
GCATTTGGTATTTTCGT
18
GCATTTGGTATTTTCG
441




CTGGGG

TCTG






48
70
CATTTGGTATTTTCGTCT
19
CATTTGGTATTTTCGTC
442




GGGGG

TGG






49
71
ATTTGGTATTTTCGTCTG
20
ATTTGGTATTTTCGTCT
443




GGGGG

GGG






79
101
GCGATAGCATTGCGAGA
21
GCGATAGCATTGCGAG
444




CGCTGG

ACGC






85
107
GCATTGCGAGACGCTGG
22
GCATTGCGAGACGCTG
445




AGCCGG

GAGC






163
185
GCACCTACGTTCAATAT
23
GCACCTACGTTCAATA
446




TACAGG

TTAC






207
229
GTTAATTAATTAATGCT
24
GTTAATTAATTAATGC
447




TGTAGG

TTGT






301
323
AACCCCCCCTCCCCCGC
25
AACCCCCCCTCCCCCG
448




TTCTGG

CTTC






388
410
AGATTTCAAATTTTATC
26
AGATTTCAAATTTTAT
449




TTTTGG

CTTT






391
413
TTTCAAATTTTATCTTTT
27
TTTCAAATTTTATCTTT
450




GGCGG

TGG






604
626
ATACACTGAAAATGTTT
28
ATACACTGAAAATGTT
451




AGACGG

TAGA






605
627
TACACTGAAAATGTTTA
29
TACACTGAAAATGTTT
452




GACGGG

AGAC






631
653
ACATCACCCCATAAACA
30
ACATCACCCCATAAAC
453




AATAGG

AAAT






636
658
ACCCCATAAACAAATAG
31
ACCCCATAAACAAATA
454




GTTTGG

GGTT






727
749
TCTAAATCACCACGATC
32
TCTAAATCACCACGAT
455




AAAAGG

CAAA






788
810
TTAGCCTAGCCACACCC
33
TTAGCCTAGCCACACC
456




CCACGG

CCCA






789
811
TAGCCTAGCCACACCCC
34
TAGCCTAGCCACACCC
457




CACGGG

CCAC






851
873
AACTAAGCTATACTAAC
35
AACTAAGCTATACTAA
458




CCCAGG

CCCC






852
874
ACTAAGCTATACTAACC
36
ACTAAGCTATACTAAC
459




CCAGGG

CCCA






856
878
AGCTATACTAACCCCAG
37
AGCTATACTAACCCCA
460




GGTTGG

GGGT






880
902
CAATTTCGTGCCAGCCA
38
CAATTTCGTGCCAGCC
461




CCGCGG

ACCG






912
934
TAACCCAAGTCAATAGA
39
TAACCCAAGTCAATAG
462




AGCCGG

AAGC






1009
1031
CACAAAATAGACTACG
40
CACAAAATAGACTACG
463




AAAGTGG

AAAG






1051
1073
ACAATAGCTAAGACCCA
41
ACAATAGCTAAGACCC
464




AACTGG

AAAC






1052
1074
CAATAGCTAAGACCCAA
42
CAATAGCTAAGACCCA
465




ACTGGG

AACT






1148
1170
AGCCACAGCTTAAAACT
43
AGCCACAGCTTAAAAC
466




CAAAGG

TCAA






1154
1176
AGCTTAAAACTCAAAGG
44
AGCTTAAAACTCAAAG
467




ACCTGG

GACC






1157
1179
TTAAAACTCAAAGGACC
45
TTAAAACTCAAAGGAC
468




TGGCGG

CTGG






1178
1200
GGTGCTTCATATCCCTC
46
GGTGCTTCATATCCCT
469




TAGAGG

CTAG






1267
1289
TCTTCAGCAAACCCTGA
47
TCTTCAGCAAACCCTG
470




TGAAGG

ATGA






1306
1328
AGTACCCACGTAAAGAC
48
AGTACCCACGTAAAGA
471




GTTAGG

CGTT






1312
1334
CACGTAAAGACGTTAGG
49
CACGTAAAGACGTTAG
472




TCAAGG

GTCA






1326
1348
AGGTCAAGGTGTAGCCC
50
AGGTCAAGGTGTAGCC
473




ATGAGG

CATG






1329
1351
TCAAGGTGTAGCCCATG
51
TCAAGGTGTAGCCCAT
474




AGGTGG

GAGG






1339
1361
GCCCATGAGGTGGCAA
52
GCCCATGAGGTGGCAA
475




GAAATGG

GAAA






1340
1362
CCCATGAGGTGGCAAG
53
CCCATGAGGTGGCAAG
476




AAATGGG

AAAT






1389
1411
GATAGCCCTTATGAAAC
54
GATAGCCCTTATGAAA
477




TTAAGG

CTTA






1390
1412
ATAGCCCTTATGAAACT
55
ATAGCCCTTATGAAAC
478




TAAGGG

TTAA






1397
1419
TTATGAAACTTAAGGGT
56
TTATGAAACTTAAGGG
479




CGAAGG

TCGA






1400
1422
TGAAACTTAAGGGTCGA
57
TGAAACTTAAGGGTCG
480




AGGTGG

AAGG






1441
1463
AGTAGAGTGCTTAGTTG
58
AGTAGAGTGCTTAGTT
481




AACAGG

GAAC






1442
1464
GTAGAGTGCTTAGTTGA
59
GTAGAGTGCTTAGTTG
482




ACAGGG

AACA






1494
1516
CCTCCTCAAGTATACTT
60
CCTCCTCAAGTATACT
483




CAAAGG

TCAA






1530
1552
ACCCCTACGCATTTATA
61
ACCCCTACGCATTTAT
484




TAGAGG

ATAG






1548
1570
AGAGGAGACAAGTCGT
62
AGAGGAGACAAGTCG
485




AACATGG

TAACA






1560
1582
TCGTAACATGGTAAGTG
63
TCGTAACATGGTAAGT
486




TACTGG

GTAC






1573
1595
AGTGTACTGGAAAGTGC
64
AGTGTACTGGAAAGTG
487




ACTTGG

CACT






1620
1642
AAAGCACCCAACTTACA
65
AAAGCACCCAACTTAC
488




CTTAGG

ACTT






1726
1748
CATTTACCCAAATAAAG
66
CATTTACCCAAATAAA
489




TATAGG

GTAT






1746
1768
AGGCGATAGAAATTGA
67
AGGCGATAGAAATTG
490




AACCTGG

AAACC






1770
1792
GCAATAGATATAGTACC
68
GCAATAGATATAGTAC
491




GCAAGG

CGCA






1771
1793
CAATAGATATAGTACCG
69
CAATAGATATAGTACC
492




CAAGGG

GCAA






1809
1831
TAACCAAGCATAATATA
70
TAACCAAGCATAATAT
493




GCAAGG

AGCA






1862
1884
TAACTAGAAATAACTTT
71
TAACTAGAAATAACTT
494




GCAAGG

TGCA






1947
1969
CCGTCTATGTAGCAAAA
72
CCGTCTATGTAGCAAA
495




TAGTGG

ATAG






1948
1970
CGTCTATGTAGCAAAAT
73
CGTCTATGTAGCAAAA
496




AGTGGG

TAGT






1960
1982
AAAATAGTGGGAAGAT
74
AAAATAGTGGGAAGA
497




TTATAGG

TTTAT






1966
1988
GTGGGAAGATTTATAGG
75
GTGGGAAGATTTATAG
498




TAGAGG

GTAG






1987
2009
GGCGACAAACCTACCG
76
GGCGACAAACCTACCG
499




AGCCTGG

AGCC






1997
2019
CTACCGAGCCTGGTGAT
77
CTACCGAGCCTGGTGA
500




AGCTGG

TAGC






2086
2108
ATTTAACTGTTAGTCCA
78
ATTTAACTGTTAGTCC
501




AAGAGG

AAAG






2099
2121
TCCAAAGAGGAACAGC
79
TCCAAAGAGGAACAG
502




TCTTTGG

CTCTT






2107
2129
GGAACAGCTCTTTGGAC
80
GGAACAGCTCTTTGGA
503




ACTAGG

CACT






2152
2174
AAAAATTTAACACCCAT
81
AAAAATTTAACACCCA
504




AGTAGG

TAGT






2247
2269
CTGAACTCCTCACACCC
82
CTGAACTCCTCACACC
505




AATTGG

CAAT






2414
2436
CCTCACTGTCAACCCAA
83
CCTCACTGTCAACCCA
506




CACAGG

ACAC






2427
2449
CCAACACAGGCATGCTC
84
CCAACACAGGCATGCT
507




ATAAGG

CATA






2432
2454
ACAGGCATGCTCATAAG
85
ACAGGCATGCTCATAA
508




GAAAGG

GGAA






2449
2471
GAAAGGTTAAAAAAAG
86
GAAAGGTTAAAAAAA
509




TAAAAGG

GTAAA






2456
2478
TAAAAAAAGTAAAAGG
87
TAAAAAAAGTAAAAG
510




AACTCGG

GAACT






2515
2537
TCTAGCATCACCAGTAT
88
TCTAGCATCACCAGTA
511




TAGAGG

TTAG






2546
2568
GCCCAGTGACACATGTT
89
GCCCAGTGACACATGT
512




TAACGG

TTAA






2552
2574
TGACACATGTTTAACGG
90
TGACACATGTTTAACG
513




CCGCGG

GCCG






2571
2593
GCGGTACCCTAACCGTG
91
GCGGTACCCTAACCGT
514




CAAAGG

GCAA






2599
2621
TAATCACTTGTTCCTTA
92
TAATCACTTGTTCCTT
515




AATAGG

AAAT






2600
2622
AATCACTTGTTCCTTAA
93
AATCACTTGTTCCTTA
516




ATAGGG

AATA






2614
2636
TAAATAGGGACCTGTAT
94
TAAATAGGGACCTGTA
517




GAATGG

TGAA






2624
2646
CCTGTATGAATGGCTCC
95
CCTGTATGAATGGCTC
518




ACGAGG

CACG






2625
2647
CTGTATGAATGGCTCCA
96
CTGTATGAATGGCTCC
519




CGAGGG

ACGA






2676
2698
AAATTGACCTGCCCGTG
97
AAATTGACCTGCCCGT
520




AAGAGG

GAAG






2679
2701
TTGACCTGCCCGTGAAG
98
TTGACCTGCCCGTGAA
521




AGGCGG

GAGG






2680
2702
TGACCTGCCCGTGAAGA
99
TGACCTGCCCGTGAAG
522




GGCGGG

AGGC






2711
2733
AGCAAGACGAGAAGAC
100
AGCAAGACGAGAAGA
523




CCTATGG

CCCTA






2755
2777
ACAGTACCTAACAAACC
101
ACAGTACCTAACAAAC
524




CACAGG

CCAC






2789
2811
CAAACCTGCATTAAAAA
102
CAAACCTGCATTAAAA
525




TTTCGG

ATTT






2793
2815
CCTGCATTAAAAATTTC
103
CCTGCATTAAAAATTT
526




GGTTGG

CGGT






2794
2816
CTGCATTAAAAATTTCG
104
CTGCATTAAAAATTTC
527




GTTGGG

GGTT






2795
2817
TGCATTAAAAATTTCGG
105
TGCATTAAAAATTTCG
528




TTGGGG

GTTG






2804
2826
AATTTCGGTTGGGGCGA
106
AATTTCGGTTGGGGCG
529




CCTCGG

ACCT






2895
2917
TGATCCAATAACTTGAC
107
TGATCCAATAACTTGA
530




CAACGG

CCAA






2911
2933
CCAACGGAACAAGTTAC
108
CCAACGGAACAAGTTA
531




CCTAGG

CCCT






2912
2934
CAACGGAACAAGTTACC
109
CAACGGAACAAGTTAC
532




CTAGGG

CCTA






2954
2976
CTAGAGTCCATATCAAC
110
CTAGAGTCCATATCAA
533




AATAGG

CAAT






2955
2977
TAGAGTCCATATCAACA
111
TAGAGTCCATATCAAC
534




ATAGGG

AATA






2974
2996
AGGGTTTACGACCTCGA
112
AGGGTTTACGACCTCG
535




TGTTGG

ATGT






2980
3002
TACGACCTCGATGTTGG
113
TACGACCTCGATGTTG
536




ATCAGG

GATC






2992
3014
GTTGGATCAGGACATCC
114
GTTGGATCAGGACATC
537




CGATGG

CCGA






3010
3032
GATGGTGCAGCCGCTAT
115
GATGGTGCAGCCGCTA
538




TAAAGG

TTAA






3058
3080
TACGTGATCTGAGTTCA
116
TACGTGATCTGAGTTC
539




GACCGG

AGAC






3069
3091
AGTTCAGACCGGAGTAA
117
AGTTCAGACCGGAGTA
540




TCCAGG

ATCC






3073
3095
CAGACCGGAGTAATCCA
118
CAGACCGGAGTAATCC
541




GGTCGG

AGGT






3110
3132
CAAATTCCTCCCTGTAC
119
CAAATTCCTCCCTGTA
542




GAAAGG

CGAA






3125
3147
ACGAAAGGACAAGAGA
120
ACGAAAGGACAAGAG
543




AATAAGG

AAATA






3203
3225
ACCCACACCCACCCAAG
121
ACCCACACCCACCCAA
544




AACAGG

GAAC






3204
3226
CCCACACCCACCCAAGA
122
CCCACACCCACCCAAG
545




ACAGGG

AACA






3217
3239
AAGAACAGGGTTTGTTA
123
AAGAACAGGGTTTGTT
546




AGATGG

AAGA






3227
3249
TTTGTTAAGATGGCAGA
124
TTTGTTAAGATGGCAG
547




GCCCGG

AGCC






3262
3284
ACTTAAAACTTTACAGT
125
ACTTAAAACTTTACAG
548




CAGAGG

TCAG






3294
3316
TCTTCTTAACAACATAC
126
TCTTCTTAACAACATA
549




CCATGG

CCCA






3336
3358
TGTACCCATTCTAATCG
127
TGTACCCATTCTAATC
550




CAATGG

GCAA






3370
3392
CTTACCGAACGAAAAAT
128
CTTACCGAACGAAAAA
551




TCTAGG

TTCT






3391
3413
GGCTATATACAACTACG
129
GGCTATATACAACTAC
552




CAAAGG

GCAA






3406
3428
CGCAAAGGCCCCAACGT
130
CGCAAAGGCCCCAAC
553




TGTAGG

GTTGT






3415
3437
CCCAACGTTGTAGGCCC
131
CCCAACGTTGTAGGCC
554




CTACGG

CCTA






3416
3438
CCAACGTTGTAGGCCCC
132
CCAACGTTGTAGGCCC
555




TACGGG

CTAC






3570
3592
CCTCCCCATACCCAACC
133
CCTCCCCATACCCAAC
556




CCCTGG

CCCC






3586
3608
CCCCTGGTCAACCTCAA
134
CCCCTGGTCAACCTCA
557




CCTAGG

ACCT






3643
3665
GTTTACTCAATCCTCTG
135
GTTTACTCAATCCTCT
558




ATCAGG

GATC






3644
3666
TTTACTCAATCCTCTGA
136
TTTACTCAATCCTCTG
559




TCAGGG

ATCA






3676
3698
AACTCAAACTACGCCCT
137
AACTCAAACTACGCCC
560




GATCGG

TGAT






3757
3779
CTATCAACATTACTAAT
138
CTATCAACATTACTAA
561




AAGTGG

TAAG






3828
3850
ACTCCTGCCATCATGAC
139
ACTCCTGCCATCATGA
562




CCTTGG

CCCT






3892
3914
ACCCCCTTCGACCTTGC
140
ACCCCCTTCGACCTTG
563




CGAAGG

CCGA






3893
3915
CCCCCTTCGACCTTGCC
141
CCCCCTTCGACCTTGC
564




GAAGGG

CGAA






3894
3916
CCCCTTCGACCTTGCCG
142
CCCCTTCGACCTTGCC
565




AAGGGG

GAAG






3913
3935
GGGGAGTCCGAACTAGT
143
GGGGAGTCCGAACTA
566




CTCAGG

GTCTC






3937
3959
TTCAACATCGAATACGC
144
TTCAACATCGAATACG
567




CGCAGG

CCGC






4015
4037
CTCACCACTACAATCTT
145
CTCACCACTACAATCT
568




CCTAGG

TCCT






4287
4309
ACTTTGATAGAGTAAAT
146
ACTTTGATAGAGTAAA
569




AATAGG

TAAT






4311
4333
GCTTAAACCCCCTTATT
147
GCTTAAACCCCCTTAT
570




TCTAGG

TTCT






4386
4408
TCACACCCCATCCTAAA
148
TCACACCCCATCCTAA
571




GTAAGG

AGTA






4406
4428
AGGTCAGCTAAATAAGC
149
AGGTCAGCTAAATAAG
572




TATCGG

CTAT






4407
4429
GGTCAGCTAAATAAGCT
150
GGTCAGCTAAATAAGC
573




ATCGGG

TATC






4428
4450
GGCCCATACCCCGAAAA
151
GGCCCATACCCCGAAA
574




TGTTGG

ATGT






4460
4482
TCCCGTACTAATTAATC
152
TCCCGTACTAATTAAT
575




CCCTGG

CCCC






4494
4516
ATCTACTCTACCATCTTT
153
ATCTACTCTACCATCT
576




GCAGG

TTGC






4542
4564
CACTGATTTTTTACCTG
154
CACTGATTTTTTACCT
577




AGTAGG

GAGT






4692
4714
CTCTTCAACAATATACT
155
CTCTTCAACAATATAC
578




CTCCGG

TCTC






4767
4789
ATAGCTATAGCAATAAA
156
ATAGCTATAGCAATAA
579




ACTAGG

AACT






4799
4821
CTTTCACTTCTGAGTCC
157
CTTTCACTTCTGAGTC
580




CAGAGG

CCAG






4809
4831
TGAGTCCCAGAGGTTAC
158
TGAGTCCCAGAGGTTA
581




CCAAGG

CCCA






4827
4849
CAAGGCACCCCTCTGAC
159 
CAAGGCACCCCTCTGA
582




ATCCGG

CATC






4941
4963
TCAATCTTATCCATCAT
160
TCAATCTTATCCATCA
583




AGCAGG

TAGC






4950
4972
TCCATCATAGCAGGCAG
161
TCCATCATAGCAGGCA
584




TTGAGG

GTTG






4953
4975
ATCATAGCAGGCAGTTG
162
ATCATAGCAGGCAGTT
585




AGGTGG

GAGG






5010
5032
TACTCCTCAATTACCCA
163
TACTCCTCAATTACCC
586




CATAGG

ACAT






5202
5224
CCATCCACCCTCCTCTC
164
CCATCCACCCTCCTCT
587




CCTAGG

CCCT






5205
5227
TCCACCCTCCTCTCCCT
165
TCCACCCTCCTCTCCCT
588




AGGAGG

AGG






5223
5245
GGAGGCCTGCCCCCGCT
166
GGAGGCCTGCCCCCGC
589




AACCGG

TAAC






5239
5261
TAACCGGCTTTTTGCCC
167
TAACCGGCTTTTTGCC
590




AAATGG

CAAA






5240
5262
AACCGGCTTTTTGCCCA
168
AACCGGCTTTTTGCCC
591




AATGGG

AAAT






5500
5522
TAATAATCTTATAGAAA
169
TAATAATCTTATAGAA
592




TTTAGG

ATTT






5569
5591
CTTAATTTCTGTAACAG
170
CTTAATTTCTGTAACA
593




CTAAGG

GCTA






5646
5668
CTAAGCCCTTACTAGAC
171
CTAAGCCCTTACTAGA
594




CAATGG

CCAA






5647
5669
TAAGCCCTTACTAGACC
172
TAAGCCCTTACTAGAC
595




AATGGG

CAAT






5697
5719
AGCTAAGCACCCTAATC
173
AGCTAAGCACCCTAAT
596




AACTGG

CAAC






5723
5745
CAATCTACTTCTCCCGC
174
CAATCTACTTCTCCCG
597




CGCCGG

CCGC






5724
5746
AATCTACTTCTCCCGCC
175
AATCTACTTCTCCCGC
598




GCCGGG

CGCC






5732
5754
TCTCCCGCCGCCGGGAA
176
TCTCCCGCCGCCGGGA
599




AAAAGG

AAAA






5735
5757
CCCGCCGCCGGGAAAA
177
CCCGCCGCCGGGAAA
600




AAGGCGG

AAAGG






5736
5758
CCGCCGCCGGGAAAAA
178
CCGCCGCCGGGAAAA
601




AGGCGGG

AAGGC






5747
5769
AAAAAAGGCGGGAGAA
179
AAAAAAGGCGGGAGA
602




GCCCCGG

AGCCC






5751
5773
AAGGCGGGAGAAGCCC
180
AAGGCGGGAGAAGCC
603




CGGCAGG

CCGGC






5800
5822
ATTCAATATGAAAATCA
181
ATTCAATATGAAAATC
604




CCTCGG

ACCT






5806
5828
TATGAAAATCACCTCGG
182
TATGAAAATCACCTCG
605




AGCTGG

GAGC






5816
5838
ACCTCGGAGCTGGTAAA
183
ACCTCGGAGCTGGTAA
606




AAGAGG

AAAG






5928
5950
TCTACAAACCACAAAGA
184
TCTACAAACCACAAAG
607




CATTGG

ACAT






5949
5971
GGAACACTATACCTATT
185
GGAACACTATACCTAT
608




ATTCGG

TATT






5961
5983
CTATTATTCGGCGCATG
186
CTATTATTCGGCGCAT
609




AGCTGG

GAGC






5970
5992
GGCGCATGAGCTGGAGT
187
GGCGCATGAGCTGGA
610




CCTAGG

GTCCT






6005
6027
CCTCCTTATTCGAGCCG
188
CCTCCTTATTCGAGCC
611




AGCTGG

GAGC






6006
6028
CTCCTTATTCGAGCCGA
189
CTCCTTATTCGAGCCG
612




GCTGGG

AGCT






6027
6049
GGCCAGCCAGGCAACCT
190
GGCCAGCCAGGCAAC
613




TCTAGG

CTTCT






6108
6130
ATAGTAATACCCATCAT
191
ATAGTAATACCCATCA
614




AATCGG

TAAT






6111
6133
GTAATACCCATCATAAT
192
GTAATACCCATCATAA
615




CGGAGG

TCGG






6117
6139
CCCATCATAATCGGAGG
193
CCCATCATAATCGGAG
616




CTTTGG

GCTT






6144
6166
TGACTAGTTCCCCTAAT
194
TGACTAGTTCCCCTAA
617




AATCGG

TAAT






6158
6180
AATAATCGGTGCCCCCG
195
AATAATCGGTGCCCCC
618




ATATGG

GATA






6236
6258
CCTGCTCGCATCTGCTA
196
CCTGCTCGCATCTGCT
619




TAGTGG

ATAG






6239
6261
GCTCGCATCTGCTATAG
197
GCTCGCATCTGCTATA
620




TGGAGG

GTGG






6243
6265
GCATCTGCTATAGTGGA
198
GCATCTGCTATAGTGG
621




GGCCGG

AGGC






6249
6271
GCTATAGTGGAGGCCGG
199
GCTATAGTGGAGGCCG
622




AGCAGG

GAGC






6255
6277
GTGGAGGCCGGAGCAG
200
GTGGAGGCCGGAGCA
623




GAACAGG

GGAAC






6282
6304
ACAGTCTACCCTCCCTT
201
ACAGTCTACCCTCCCT
624




AGCAGG

TAGC






6283
6305
CAGTCTACCCTCCCTTA
202
CAGTCTACCCTCCCTT
625




GCAGGG

AGCA






6300
6322
GCAGGGAACTACTCCCA
203
GCAGGGAACTACTCCC
626




CCCTGG

ACCC






6342
6364
ATCTTCTCCTTACACCT
204
ATCTTCTCCTTACACCT
627




AGCAGG

AGC






6360
6382
GCAGGTGTCTCCTCTAT
205
GCAGGTGTCTCCTCTA
628




CTTAGG

TCTT






6361
6383
CAGGTGTCTCCTCTATC
206
CAGGTGTCTCCTCTAT
629




TTAGGG

CTTA






6362
6384
AGGTGTCTCCTCTATCT
207
AGGTGTCTCCTCTATC
630




TAGGGG

TTAG






6495
6517
TCTCTCCCAGTCCTAGC
208
TCTCTCCCAGTCCTAG
631




TGCTGG

CTGC






6552
6574
ACCACCTTCTTCGACCC
209
ACCACCTTCTTCGACC
632




CGCCGG

CCGC






6555
6577
ACCTTCTTCGACCCCGC
210
ACCTTCTTCGACCCCG
633




CGGAGG

CCGG






6558
6580
TTCTTCGACCCCGCCGG
211
TTCTTCGACCCCGCCG
634




AGGAGG

GAGG






6597
6619
CAACACCTATTCTGATT
212
CAACACCTATTCTGAT
635




TTTCGG

TTTT






6630
6652
GTTTATATTCTTATCCTA
213
GTTTATATTCTTATCCT
636




CCAGG

ACC






6636
6658
ATTCTTATCCTACCAGG
214
ATTCTTATCCTACCAG
637




CTTCGG

GCTT






6669
6691
CATATTGTAACTTACTA
215
CATATTGTAACTTACT
638




CTCCGG

ACTC






6687
6709
TCCGGAAAAAAAGAAC
216
TCCGGAAAAAAAGAA
639




CATTTGG

CCATT






6696
6718
AAAGAACCATTTGGATA
217
AAAGAACCATTTGGAT
640




CATAGG

ACAT






6701
6723
ACCATTTGGATACATAG
218
ACCATTTGGATACATA
641




GTATGG

GGTA






6723
6745
GTCTGAGCTATGATATC
219
GTCTGAGCTATGATAT
642




AATTGG

CAAT






6732
6754
ATGATATCAATTGGCTT
220
ATGATATCAATTGGCT
643




CCTAGG

TCCT






6733
6755
TGATATCAATTGGCTTC
221
TGATATCAATTGGCTT
644




CTAGGG

CCTA






6768
6790
GCACACCATATATTTAC
222
GCACACCATATATTTA
645




AGTAGG

CAGT






6831
6853
ATAATCATCGCTATCCC
223
ATAATCATCGCTATCC
646




CACCGG

CCAC






6867
6889
AGCTGACTCGCCACACT
224
AGCTGACTCGCCACAC
647




CCACGG

TCCA






6909
6931
GCTGCAGTGCTCTGAGC
225
GCTGCAGTGCTCTGAG
648




CCTAGG

CCCT






6933
6955
TTCATCTTTCTTTTCACC
226
TTCATCTTTCTTTTCAC
649




GTAGG

CGT






6936
6958
ATCTTTCTTTTCACCGTA
227
ATCTTTCTTTTCACCGT
650




GGTGG

AGG






6945
6967
TTCACCGTAGGTGGCCT
228
TTCACCGTAGGTGGCC
651




GACTGG

TGAC






7032
7054
TTCCACTATGTCCTATC
229
TTCCACTATGTCCTAT
652




AATAGG

CAAT






7053
7075
GGAGCTGTATTTGCCAT
230
GGAGCTGTATTTGCCA
653




CATAGG

TCAT






7056
7078
GCTGTATTTGCCATCAT
231
GCTGTATTTGCCATCA
654




AGGAGG

TAGG






7086
7108
CACTGATTTCCCCTATT
232
CACTGATTTCCCCTAT
655




CTCAGG

TCTC






7140
7162
CATTTCACTATCATATT
233
CATTTCACTATCATAT
656




CATCGG

TCAT






7176
7198
TTCTTCCCACAACACTT
234
TTCTTCCCACAACACT
657




TCTCGG

TTCT






7185
7207
CAACACTTTCTCGGCCT
235
CAACACTTTCTCGGCC
658




ATCCGG

TATC






7205
7227
CGGAATGCCCCGACGTT
236
CGGAATGCCCCGACGT
659




ACTCGG

TACT






7251
7273
TGAAACATCCTATCATC
237
TGAAACATCCTATCAT
660




TGTAGG

CTGT






7358
7380
AGAAGAACCCTCCATAA
238
AGAAGAACCCTCCATA
661




ACCTGG

AACC






7371
7393
ATAAACCTGGAGTGACT
239
ATAAACCTGGAGTGAC
662




ATATGG

TATA






7432
7454
ACATAAAATCTAGACAA
240
ACATAAAATCTAGACA
663




AAAAGG

AAAA






7436
7458
AAAATCTAGACAAAAA
241
AAAATCTAGACAAAA
664




AGGAAGG

AAGGA






7457
7479
GGAATCGAACCCCCCAA
242
GGAATCGAACCCCCCA
665




AGCTGG

AAGC






7476
7498
CTGGTTTCAAGCCAACC
243
CTGGTTTCAAGCCAAC
666




CCATGG

CCCA






7499
7521
CCTCCATGACTTTTTCA
244
CCTCCATGACTTTTTC
667




AAAAGG

AAAA






7544
7566
CTTTGTCAAAGTTAAAT
245
CTTTGTCAAAGTTAAA
668




TATAGG

TTAT






7567
7589
CTAAATCCTATATATCT
246
CTAAATCCTATATATC
669




TAATGG

TTAA






7586
7608
ATGGCACATGCAGCGCA
247
ATGGCACATGCAGCGC
670




AGTAGG

AAGT






7741
7763
TACTAACATCTCAGACG
248
TACTAACATCTCAGAC
671




CTCAGG

GCTC






7831
7853
CATCCTTTACATAACAG
249
CATCCTTTACATAACA
672




ACGAGG

GACG






7865
7887
TCCCTTACCATCAAATC
250
TCCCTTACCATCAAAT
673




AATTGG

CAAT






7875
7897
TCAAATCAATTGGCCAC
251
TCAAATCAATTGGCCA
674




CAATGG

CCAA






7904
7926
ACCTACGAGTACACCGA
252
ACCTACGAGTACACCG
675




CTACGG

ACTA






7907
7929
TACGAGTACACCGACTA
253
TACGAGTACACCGACT
676




CGGCGG

ACGG






7955
7977
CCCCCATTATTCCTAGA
254
CCCCCATTATTCCTAG
677




ACCAGG

AACC






8069
8091
TCATGAGCTGTCCCCAC
255
TCATGAGCTGTCCCCA
678




ATTAGG

CATT






8093
8115
TTAAAAACAGATGCAAT
256
TTAAAAACAGATGCAA
679




TCCCGG

TTCC






8131
8153
CACTTTCACCGCTACAC
257
CACTTTCACCGCTACA
680




GACCGG

CGAC






8132
8154
ACTTTCACCGCTACACG
258
ACTTTCACCGCTACAC
681




ACCGGG

GACC






8133
8155
CTTTCACCGCTACACGA
259
CTTTCACCGCTACACG
682




CCGGGG

ACCG






8134
8156
TTTCACCGCTACACGAC
260
TTTCACCGCTACACGA
683




CGGGGG

CCGG






8144
8166
ACACGACCGGGGGTAT
261
ACACGACCGGGGGTAT
684




ACTACGG

ACTA






8165
8187
GGTCAATGCTCTGAAAT
262
GGTCAATGCTCTGAAA
685




CTGTGG

TCTG






8228
8250
CCCCTAAAAATCTTTGA
263
CCCCTAAAAATCTTTG
686




AATAGG

AAAT






8229
8251
CCCTAAAAATCTTTGAA
264
CCCTAAAAATCTTTGA
687




ATAGGG

AATA






8370
8392
CCCAACTAAATACTACC
265
CCCAACTAAATACTAC
688




GTATGG

CGTA






8551
8573
TTCATTGCCCCCACAAT
266
TTCATTGCCCCCACAA
689




CCTAGG

TCCT






8698
8720
ATAACCATACACAACAC
267
ATAACCATACACAACA
690




TAAAGG

CTAA






8761
8783
ATTGCCACAACTAACCT
268
ATTGCCACAACTAACC
691




CCTCGG

TCCT






8817
8839
ACTATCTATAAACCTAG
269
ACTATCTATAAACCTA
692




CCATGG

GCCA






8835
8857
CATGGCCATCCCCTTAT
270
CATGGCCATCCCCTTA
693




GAGCGG

TGAG






8836
8858
ATGGCCATCCCCTTATG
271
ATGGCCATCCCCTTAT
694




AGCGGG

GAGC






8851
8873
TGAGCGGGCACAGTGAT
272
TGAGCGGGCACAGTG
695




TATAGG

ATTAT






8899
8921
CTAGCCCACTTCTTACC
273
CTAGCCCACTTCTTAC
696




ACAAGG

CACA






8973
8995
ACTCATTCAACCAATAG
274
ACTCATTCAACCAATA
697




CCCTGG

GCCC






9004
9026
CTAACCGCTAACATTAC
275
CTAACCGCTAACATTA
698




TGCAGG

CTGC






9028
9050
CACCTACTCATGCACCT
276
CACCTACTCATGCACC
699




AATTGG

TAAT






9243
9265
CCCAGCCCATGACCCCT
277
CCCAGCCCATGACCCC
700




AACAGG

TAAC






9244
9266
CCAGCCCATGACCCCTA
278
CCAGCCCATGACCCCT
701




ACAGGG

AACA






9245
9267
CAGCCCATGACCCCTAA
279
CAGCCCATGACCCCTA
702




CAGGGG

ACAG






9273
9295
TCAGCCCTCCTAATGAC
280
TCAGCCCTCCTAATGA
703




CTCCGG

CCTC






9321
9343
TCCATAACGCTCCTCAT
281
TCCATAACGCTCCTCA
704




ACTAGG

TACT






9358
9380
CACTAACCATATACCAA
282
CACTAACCATATACCA
705




TGATGG

ATGA






9390
9412
ACACGAGAAAGCACAT
283
ACACGAGAAAGCACA
706




ACCAAGG

TACCA






9417
9439
CACACACCACCTGTCCA
284
CACACACCACCTGTCC
707




AAAAGG

AAAA






9429
9451
GTCCAAAAAGGCCTTCG
285
GTCCAAAAAGGCCTTC
708




ATACGG

GATA






9430
9452
TCCAAAAAGGCCTTCGA
286
TCCAAAAAGGCCTTCG
709




TACGGG

ATAC






9471
9493
TCAGAAGTTTTTTTCTTC
287
TCAGAAGTTTTTTTCTT
710




GCAGG

CGC






9522
9544
CTAGCCCCTACCCCCCA
288
CTAGCCCCTACCCCCC
711




ATTAGG

AATT






9525
9547
GCCCCTACCCCCCAATT
289
GCCCCTACCCCCCAAT
712




AGGAGG

TAGG






9526
9548
CCCCTACCCCCCAATTA
290
CCCCTACCCCCCAATT
713




GGAGGG

AGGA






9532
9554
CCCCCCAATTAGGAGGG
291
CCCCCCAATTAGGAGG
714




CACTGG

GCAC






9543
9565
GGAGGGCACTGGCCCCC
292
GGAGGGCACTGGCCCC
715




AACAGG

CAAC






9606
9628
ACATCCGTATTACTCGC
293
ACATCCGTATTACTCG
716




ATCAGG

CATC






9692
9714
ACTGCTTATTACAATTT
294
ACTGCTTATTACAATT
717




TACTGG

TTAC






9693
9715
CTGCTTATTACAATTTT
295
CTGCTTATTACAATTTT
718




ACTGGG

ACT






9756
9778
TCTCCCTTCACCATTTCC
296
TCTCCCTTCACCATTTC
719




GACGG

CGA






9765
9787
ACCATTTCCGACGGCAT
297
ACCATTTCCGACGGCA
720




CTACGG

TCTA






9789
9811
TCAACATTTTTTGTAGC
298
TCAACATTTTTTGTAG
721




CACAGG

CCAC






9798
9820
TTTGTAGCCACAGGCTT
299
TTTGTAGCCACAGGCT
722




CCACGG

TCCA






9816
9838
CACGGACTTCACGTCAT
300
CACGGACTTCACGTCA
723




TATTGG

TTAT






9885
9907
TTTACATCCAAACATCA
301
TTTACATCCAAACATC
724




CTTTGG

ACTT






9910
9932
TCGAAGCCGCCGCCTGA
302
TCGAAGCCGCCGCCTG
725




TACTGG

ATAC






9926
9948
ATACTGGCATTTTGTAG
303
ATACTGGCATTTTGTA
726




ATGTGG

GATG






9963
9985
TATGTCTCCATCTATTG
304
TATGTCTCCATCTATT
727




ATGAGG

GATG






9964
9986
ATGTCTCCATCTATTGA
305
ATGTCTCCATCTATTG
728




TGAGGG

ATGA






10122
10144
TTTTGACTACCACAACT
306
TTTTGACTACCACAAC
729




CAACGG

TCAA






10155
10177
AAATCCACCCCTTACGA
307
AAATCCACCCCTTACG
730




GTGCGG

AGTG






10343
10365
CATCATCCTAGCCCTAA
308
CATCATCCTAGCCCTA
731




GTCTGG

AGTC






10365
10387
GCCTATGAGTGACTACA
309
GCCTATGAGTGACTAC
732




AAAAGG

AAAA






10385
10407
AGGATTAGACTGAACCG
310
AGGATTAGACTGAACC
733




AATTGG

GAAT






10500
10522
GCATTTACCATCTCACT
311
GCATTTACCATCTCAC
734




TCTAGG

TTCT






10551
10573
TCCTCCCTACTATGCCT
312
TCCTCCCTACTATGCC
735




AGAAGG

TAGA






10664
10686
CTTTGCCGCCTGCGAAG
313
CTTTGCCGCCTGCGAA
736




CAGCGG

GCAG






10667
10689
TGCCGCCTGCGAAGCAG
314
TGCCGCCTGCGAAGCA
737




CGGTGG

GCGG






10668
10690
GCCGCCTGCGAAGCAGC
315
GCCGCCTGCGAAGCAG
738




GGTGGG

CGGT






10704
10726
GTCTCAATCTCCAACAC
316
GTCTCAATCTCCAACA
739




ATATGG

CATA






10972
10994
ACTCCTACCCCTCACAA
317
ACTCCTACCCCTCACA
740




TCATGG

ATCA






11128
11150
AACCACACTTATCCCCA
318
AACCACACTTATCCCC
741




CCTTGG

ACCT






11147
11169
TTGGCTATCATCACCCG
319
TTGGCTATCATCACCC
742




ATGAGG

GATG






11174
11196
CAGCCAGAACGCCTGA
320
CAGCCAGAACGCCTGA
743




ACGCAGG

ACGC






11204
11226
TTCCTATTCTACACCCT
321
TTCCTATTCTACACCCT
744




AGTAGG

AGT






11252
11274
ATTTACACTCACAACAC
322
ATTTACACTCACAACA
745




CCTAGG

CCCT






11369
11391
ATAGTAAAGATACCTCT
323
ATAGTAAAGATACCTC
746




TTACGG

TTTA






11417
11439
CATGTCGAAGCCCCCAT
324
CATGTCGAAGCCCCCA
747




CGCTGG

TCGC






11418
11440
ATGTCGAAGCCCCCATC
325
ATGTCGAAGCCCCCAT
748




GCTGGG

CGCT






11453
11475
GCCGCAGTACTCTTAAA
326
GCCGCAGTACTCTTAA
749




ACTAGG

AACT






11456
11478
GCAGTACTCTTAAAACT
327
GCAGTACTCTTAAAAC
750




AGGCGG

TAGG






11462
11484
CTCTTAAAACTAGGCGG
328
CTCTTAAAACTAGGCG
751




CTATGG

GCTA






11540
11562
TTCCTTGTACTATCCCTA
329
TTCCTTGTACTATCCCT
752




TGAGG

ATG






11669
11691
CAAACCCCCTGAAGCTT
330
CAAACCCCCTGAAGCT
753




CACCGG

TCAC






11696
11718
GTCATTCTCATAATCGC
331
GTCATTCTCATAATCG
754




CCACGG

CCCA






11697
11719
TCATTCTCATAATCGCC
332
TCATTCTCATAATCGC
755




CACGGG

CCAC






11777
11799
CGCATCATAATCCTCTC
333
CGCATCATAATCCTCT
756




TCAAGG

CTCA






11866
11888
ACCCCCCACTATTAACC
334
ACCCCCCACTATTAAC
757




TACTGG

CTAC






11867
11889
CCCCCCACTATTAACCT
335
CCCCCCACTATTAACC
758




ACTGGG

TACT






11927
11949
AATATCACTCTCCTACT
336
AATATCACTCTCCTAC
759




TACAGG

TTAC






11985
12007
ACATATTTACCACAACA
337
ACATATTTACCACAAC
760




CAATGG

ACAA






11986
12008
CATATTTACCACAACAC
338
CATATTTACCACAACA
761




AATGGG

CAAT






11987
12009
ATATTTACCACAACACA
339
ATATTTACCACAACAC
762




ATGGGG

AATG






12104
12126
CTCAACCCCGACATCAT
340
CTCAACCCCGACATCA
763




TACCGG

TTAC






12105
12127
TCAACCCCGACATCATT
341
TCAACCCCGACATCAT
764




ACCGGG

TACC






12164
12186
GATTGTGAATCTGACAA
342
GATTGTGAATCTGACA
765




CAGAGG

ACAG






12235
12257
TGCCCCCATGTCTAACA
343
TGCCCCCATGTCTAAC
766




ACATGG

AACA






12254
12276
ATGGCTTTCTCAACTTTT
344
ATGGCTTTCTCAACTT
767




AAAGG

TTAA






12272
12294
AAAGGATAACAGCTATC
345
AAAGGATAACAGCTAT
768




CATTGG

CCAT






12279
12301
AACAGCTATCCATTGGT
346
AACAGCTATCCATTGG
769




CTTAGG

TCTT






12294
12316
GTCTTAGGCCCCAAAAA
347
GTCTTAGGCCCCAAAA
770




TTTTGG

ATTT






12608
12630
CTGTAGCATTGTTCGTT
348
CTGTAGCATTGTTCGT
771




ACATGG

TACA






12742
12764
AACCTATTCCAACTGTT
349
AACCTATTCCAACTGT
772




CATCGG

TCAT






12750
12772
CCAACTGTTCATCGGCT
350
CCAACTGTTCATCGGC
773




GAGAGG

TGAG






12751
12773
CAACTGTTCATCGGCTG
351
CAACTGTTCATCGGCT
774




AGAGGG

GAGA






12757
12779
TTCATCGGCTGAGAGGG
352
TTCATCGGCTGAGAGG
775




CGTAGG

GCGT






12847
12869
GCAATCCTATACAACCG
353
GCAATCCTATACAACC
776




TATCGG

GTAT






12856
12878
TACAACCGTATCGGCGA
354
TACAACCGTATCGGCG
777




TATCGG

ATAT






12958
12980
CCAAGCCTCACCCCACT
355
CCAAGCCTCACCCCAC
778




ACTAGG

TACT






12979
13001
GGCCTCCTCCTAGCAGC
356
GGCCTCCTCCTAGCAG
779




AGCAGG

CAGC






12997
13019
GCAGGCAAATCAGCCC
357
GCAGGCAAATCAGCCC
780




AATTAGG

AATT






13030
13052
TGACTCCCCTCAGCCAT
358
TGACTCCCCTCAGCCA
781




AGAAGG

TAGA






13081
13103
TCAAGCACTATAGTTGT
359
TCAAGCACTATAGTTG
782




AGCAGG

TAGC






13156
13178
CAAACTCTAACACTATG
360
CAAACTCTAACACTAT
783




CTTAGG

GCTT






13246
13268
TTCTCCACTTCAAGTCA
361
TTCTCCACTTCAAGTC
784




ACTAGG

AACT






13267
13289
GGACTCATAATAGTTAC
362
GGACTCATAATAGTTA
785




AATCGG

CAAT






13345
13367
GCCATACTATTTATGTG
363
GCCATACTATTTATGT
786




CTCCGG

GCTC






13346
13368
CCATACTATTTATGTGC
364
CCATACTATTTATGTG
787




TCCGGG

CTCC






13393
13415
GAACAAGATATTCGAA
365
GAACAAGATATTCGAA
788




AAATAGG

AAAT






13396
13418
CAAGATATTCGAAAAAT
366
CAAGATATTCGAAAAA
789




AGGAGG

TAGG






13441
13463
ACTTCAACCTCCCTCAC
367
ACTTCAACCTCCCTCA
790




CATTGG

CCAT






13459
13481
ATTGGCAGCCTAGCATT
368
ATTGGCAGCCTAGCAT
791




AGCAGG

TAGC






13477
13499
GCAGGAATACCTTTCCT
369
GCAGGAATACCTTTCC
792




CACAGG

TCAC






13612
13634
ATAATTCTTCTCACCCT
370
ATAATTCTTCTCACCC
793




AACAGG

TAAC






13686
13708
ACTAAACCCCATTAAAC
371
ACTAAACCCCATTAAA
794




GCCTGG

CGCC






13693
13715
CCCATTAAACGCCTGGC
372
CCCATTAAACGCCTGG
795




AGCCGG

CAGC






13708
13730
GCAGCCGGAAGCCTATT
373
GCAGCCGGAAGCCTAT
796




CGCAGG

TCGC






13804
13826
GCCCTCGCTGTCACTTT
374
GCCCTCGCTGTCACTT
797




CCTAGG

TCCT






13894
13916
TTTTATTTCTCCAACATA
375
TTTTATTTCTCCAACAT
798




CTCGG

ACT






13936
13958
CACCGCACAATCCCCTA
376
CACCGCACAATCCCCT
799




TCTAGG

ATCT






14059
14081
ATCATCACCTCAACCCA
377
ATCATCACCTCAACCC
800




AAAAGG

AAAA






14237
14259
TACAAAGCCCCCGCACC
378
TACAAAGCCCCCGCAC
801




AATAGG

CAAT






14417
14439
ACCCCTGACCCCCATGC
379
ACCCCTGACCCCCATG
802




CTCAGG

CCTC






14579
14601
AATACTAAACCCCCATA
380
AATACTAAACCCCCAT
803




AATAGG

AAAT






14585
14607
AAACCCCCATAAATAGG
381
AAACCCCCATAAATAG
804




AGAAGG

GAGA






14664
14686
CATACATCATTATTCTC
382
CATACATCATTATTCT
805




GCACGG

CGCA






14825
14847
ATCTCCGCATGATGAAA
383
ATCTCCGCATGATGAA
806




CTTCGG

ACTT






14837
14859
TGAAACTTCGGCTCACT
384
TGAAACTTCGGCTCAC
807




CCTTGG

TCCT






14867
14889
CTGATCCTCCAAATCAC
385
CTGATCCTCCAAATCA
808




CACAGG

CCAC






14951
14973
ATCACTCGAGACGTAAA
386
ATCACTCGAGACGTAA
809




TTATGG

ATTA






14981
15003
ATCCGCTACCTTCACGC
387
ATCCGCTACCTTCACG
810




CAATGG

CCAA






15020
15042
ATCTGCCTCTTCCTACA
388
ATCTGCCTCTTCCTAC
811




CATCGG

ACAT






15021
15043
TCTGCCTCTTCCTACAC
389
TCTGCCTCTTCCTACA
812




ATCGGG

CATC






15026
15048
CTCTTCCTACACATCGG
390
CTCTTCCTACACATCG
813




GCGAGG

GGCG






15038
15060
ATCGGGCGAGGCCTATA
391
ATCGGGCGAGGCCTAT
814




TTACGG

ATTA






15071
15093
TACTCAGAAACCTGAAA
392
TACTCAGAAACCTGAA
815




CATCGG

ACAT






15113
15135
ACTATAGCAACAGCCTT
393
ACTATAGCAACAGCCT
816




CATAGG

TCAT






15131
15153
ATAGGCTATGTCCTCCC
394
ATAGGCTATGTCCTCC
817




GTGAGG

CGTG






15149
15171
TGAGGCCAAATATCATT
395
TGAGGCCAAATATCAT
818




CTGAGG

TCTG






15150
15172
GAGGCCAAATATCATTC
396
GAGGCCAAATATCATT
819




TGAGGG

CTGA






15151
15173
AGGCCAAATATCATTCT
397
AGGCCAAATATCATTC
820




GAGGGG

TGAG






15194
15216
CTATCCGCCATCCCATA
398
CTATCCGCCATCCCAT
821




CATTGG

ACAT






15195
15217
TATCCGCCATCCCATAC
399
TATCCGCCATCCCATA
822




ATTGGG

CATT






15221
15243
GACCTAGTTCAATGAAT
400
GACCTAGTTCAATGAA
823




CTGAGG

TCTG






15224
15246
CTAGTTCAATGAATCTG
401
CTAGTTCAATGAATCT
824




AGGAGG

GAGG






15334
15356
CCTCCTATTCTTGCACG
402
CCTCCTATTCTTGCAC
825




AAACGG

GAAA






15335
15357
CTCCTATTCTTGCACGA
403
CTCCTATTCTTGCACG
826




AACGGG

AAAC






15353
15375
ACGGGATCAAACAACC
404
ACGGGATCAAACAAC
827




CCCTAGG

CCCCT






15416
15438
TACACAATCAAAGACGC
405
TACACAATCAAAGACG
828




CCTCGG

CCCT






15476
15498
CTATTCTCACCAGACCT
406
CTATTCTCACCAGACC
829




CCTAGG

TCCT






15590
15612
CGATCCGTCCCTAACAA
407
CGATCCGTCCCTAACA
830




ACTAGG

AACT






15593
15615
TCCGTCCCTAACAAACT
408
TCCGTCCCTAACAAAC
831




AGGAGG

TAGG






15740
15762
CTCCTCATTCTAACCTG
409
CTCCTCATTCTAACCT
832




AATCGG

GAAT






15743
15765
CTCATTCTAACCTGAAT
410
CTCATTCTAACCTGAA
833




CGGAGG

TCGG






15776
15798
AGCTACCCTTTTACCAT
411
AGCTACCCTTTTACCA
834




CATTGG

TCAT






15861
15883
TTGAAAACAAAATACTC
412
TTGAAAACAAAATACT
835




AAATGG

CAAA






15862
15884
TGAAAACAAAATACTCA
413
TGAAAACAAAATACTC
836




AATGGG

AAAT






15906
15928
AATACACCAGTCTTGTA
414
AATACACCAGTCTTGT
837




AACCGG

AAAC






15928
15950
GAGATGAAAACCTTTTT
415
GAGATGAAAACCTTTT
838




CCAAGG

TCCA






16012
16034
AACTATTCTCTGTTCTTT
416
AACTATTCTCTGTTCTT
839




CATGG

TCA






16013
16035
ACTATTCTCTGTTCTTTC
417
ACTATTCTCTGTTCTTT
840




ATGGG

CAT






16014
16036
CTATTCTCTGTTCTTTCA
418
CTATTCTCTGTTCTTTC
841




TGGGG

ATG






16026
16048
CTTTCATGGGGAAGCAG
419
CTTTCATGGGGAAGCA
842




ATTTGG

GATT






16027
16049
TTTCATGGGGAAGCAGA
420
TTTCATGGGGAAGCAG
843




TTTGGG

ATTT






16108
16130
CAGCCACCATGAATATT
421
CAGCCACCATGAATAT
844




GTACGG

TGTA






16252
16274
AAAGCCACCCCTCACCC
422
AAAGCCACCCCTCACC
845




ACTAGG

CACT






16348
16370
CAAATCCCTTCTCGTCC
423
CAAATCCCTTCTCGTC
846




CCATGG

CCCA






16367
16389
ATGGATGACCCCCCTCA
424
ATGGATGACCCCCCTC
847




GATAGG

AGAT






16368
16390
TGGATGACCCCCCTCAG
425
TGGATGACCCCCCTCA
848




ATAGGG

GATA






16369
16391
GGATGACCCCCCTCAGA
426
GGATGACCCCCCTCAG
849




TAGGGG

ATAG






16434
16456
GAGTGCTACTCTCCTCG
427
GAGTGCTACTCTCCTC
850




CTCCGG

GCTC






16435
16457
AGTGCTACTCTCCTCGC
428
AGTGCTACTCTCCTCG
851




TCCGGG

CTCC






16449
16471
CGCTCCGGGCCCATAAC
429
CGCTCCGGGCCCATAA
852




ACTTGG

CACT






16450
16472
GCTCCGGGCCCATAACA
430
GCTCCGGGCCCATAAC
853




CTTGGG

ACTT






16451
16473
CTCCGGGCCCATAACAC
431
CTCCGGGCCCATAACA
854




TTGGGG

CTTG






16452
16474
TCCGGGCCCATAACACT
432
TCCGGGCCCATAACAC
855




TGGGGG

TTGG






16482
16504
AGTGAACTGTATCCGAC
433
AGTGAACTGTATCCGA
856




ATCTGG

CATC






16495
16517
CGACATCTGGTTCCTAC
434
CGACATCTGGTTCCTA
857




TTCAGG

CTTC






16496
16518
GACATCTGGTTCCTACT
435
GACATCTGGTTCCTAC
858




TCAGGG

TTCA
















TABLE 4

gRNA target sequences for human mtDNA carrying an NGG sequence on the (−) strand.

Columns, in order: Chr start position (+ strand); Chr end position (+ strand); nt sequence on the (+) strand containing CCN sequence followed by the reverse complementary sequence of gRNA target sequence; SEQ ID NO:; 20 nt gRNA target sequence (will encode the gRNA targeting sequence); SEQ ID NO:















17
39
CCCTATTAACCACTCAC
859
GCTCCCGTGAGTGGTT
2628




GGGAGC

AATA






18
40
CCTATTAACCACTCACG
860
AGCTCCCGTGAGTGGT
2629




GGAGCT

TAAT






26
48
CCACTCACGGGAGCTCT
861
GCATGGAGAGCTCCCG
2630




CCATGC

TGAG






43
65
CCATGCATTTGGTATTT
862
AGACGAAAATACCAA
2631




TCGTCT

ATGCA






104
126
CCGGAGCACCCTATGTC
863
TACTGCGACATAGGGT
2632




GCAGTA

GCTC






112
134
CCCTATGTCGCAGTATC
864
AAGACAGATACTGCG
2633




TGTCTT

ACATA






113
135
CCTATGTCGCAGTATCT
865
AAAGACAGATACTGC
2634




GTCTTT

GACAT






140
162
CCTGCCTCATCCTATTA
866
GATAAATAATAGGATG
2635




TTTATC

AGGC






144
166
CCTCATCCTATTATTTAT
867
GTGCGATAAATAATAG
2636




CGCAC

GATG






150
172
CCTATTATTTATCGCAC
868
ACGTAGGTGCGATAAA
2637




CTACGT

TAAT






166
188
CCTACGTTCAATATTAC
869
TCGCCTGTAATATTGA
2638




AGGCGA

ACGT






261
283
CCACTTTCCACACAGAC
870
TATGATGTCTGTGTGG
2639




ATCATA

AAAG






268
290
CCACACAGACATCATAA
871
TTTTTGTTATGATGTCT
2640




CAAAAA

GTG






298
320
CCAAACCCCCCCTCCCC
872
GAAGCGGGGGAGGGG
2641




CGCTTC

GGGTT






304
326
CCCCCCTCCCCCGCTTC
873
TGGCCAGAAGCGGGG
2642




TGGCCA

GAGGG






305
327
CCCCCTCCCCCGCTTCT
874
GTGGCCAGAAGCGGG
2643




GGCCAC

GGAGG






306
328
CCCCTCCCCCGCTTCTG
875
TGTGGCCAGAAGCGG
2644




GCCACA

GGGAG






307
329
CCCTCCCCCGCTTCTGG
876
CTGTGGCCAGAAGCGG
2645




CCACAG

GGGA






308
330
CCTCCCCCGCTTCTGGC
877
GCTGTGGCCAGAAGCG
2646




CACAGC

GGGG






311
333
CCCCCGCTTCTGGCCAC
878
AGTGCTGTGGCCAGAA
2647




AGCACT

GCGG






312
334
CCCCGCTTCTGGCCACA
879
AAGTGCTGTGGCCAGA
2648




GCACTT

AGCG






313
335
CCCGCTTCTGGCCACAG
880
TAAGTGCTGTGGCCAG
2649




CACTTA

AAGC






314
336
CCGCTTCTGGCCACAGC
881
TTAAGTGCTGTGGCCA
2650




ACTTAA

GAAG






324
346
CCACAGCACTTAAACAC
882
AGAGATGTGTTTAAGT
2651




ATCTCT

GCTG






348
370
CCAAACCCCAAAAACA
883
GGTTCTTTGTTTTTGGG
2652




AAGAACC

GTT






353
375
CCCCAAAAACAAAGAA
884
GTTAGGGTTCTTTGTTT
2653




CCCTAAC

TTG






354
376
CCCAAAAACAAAGAAC
885
TGTTAGGGTTCTTTGTT
2654




CCTAACA

TTT






355
377
CCAAAAACAAAGAACC
886
GTGTTAGGGTTCTTTG
2655




CTAACAC

TTTT






369
391
CCCTAACACCAGCCTAA
887
ATCTGGTTAGGCTGGT
2656




CCAGAT

GTTA






370
392
CCTAACACCAGCCTAAC
888
AATCTGGTTAGGCTGG
2657




CAGATT

TGTT






377
399
CCAGCCTAACCAGATTT
889
AATTTGAAATCTGGTT
2658




CAAATT

AGGC






381
403
CCTAACCAGATTTCAAA
890
ATAAAATTTGAAATCT
2659




TTTTAT

GGTT






386
408
CCAGATTTCAAATTTTA
891
AAAAGATAAAATTTGA
2660




TCTTTT

AATC






433
455
CCCCCCAACTAACACAT
892
AAAATAATGTGTTAGT
2661




TATTTT

TGGG






434
456
CCCCCAACTAACACATT
893
GAAAATAATGTGTTAG
2662




ATTTTC

TTGG






435
457
CCCCAACTAACACATTA
894
GGAAAATAATGTGTTA
2663




TTTTCC

GTTG






436
458
CCCAACTAACACATTAT
895
GGGAAAATAATGTGTT
2664




TTTCCC

AGTT






437
459
CCAACTAACACATTATT
896
GGGGAAAATAATGTGT
2665




TTCCCC

TAGT






456
478
CCCCTCCCACTCCCATA
897
TAGTAGTATGGGAGTG
2666




CTACTA

GGAG






457
479
CCCTCCCACTCCCATAC
898
TTAGTAGTATGGGAGT
2667




TACTAA

GGGA






458
480
CCTCCCACTCCCATACT
899
ATTAGTAGTATGGGAG
2668




ACTAAT

TGGG






461
483
CCCACTCCCATACTACT
900
GAGATTAGTAGTATGG
2669




AATCTC

GAGT






462
484
CCACTCCCATACTACTA
901
TGAGATTAGTAGTATG
2670




ATCTCA

GGAG






467
489
CCCATACTACTAATCTC
902
ATTGATGAGATTAGTA
2671




ATCAAT

GTAT






468
490
CCATACTACTAATCTCA
903
TATTGATGAGATTAGT
2672




TCAATA

AGTA






494
516
CCCCCGCCCATCCTACC
904
GTGCTGGGTAGGATGG
2673




CAGCAC

GCGG






495
517
CCCCGCCCATCCTACCC
905
TGTGCTGGGTAGGATG
2674




AGCACA

GGCG






496
518
CCCGCCCATCCTACCCA
906
GTGTGCTGGGTAGGAT
2675




GCACAC

GGGC






497
519
CCGCCCATCCTACCCAG
907
TGTGTGCTGGGTAGGA
2676




CACACA

TGGG






500
522
CCCATCCTACCCAGCAC
908
GTGTGTGTGCTGGGTA
2677




ACACAC

GGAT






501
523
CCATCCTACCCAGCACA
909
TGTGTGTGTGCTGGGT
2678




CACACA

AGGA






505
527
CCTACCCAGCACACACA
910
GCGGTGTGTGTGTGCT
2679




CACCGC

GGGT






509
531
CCCAGCACACACACACC
911
AGCAGCGGTGTGTGTG
2680




GCTGCT

TGCT






510
532
CCAGCACACACACACCG
912
TAGCAGCGGTGTGTGT
2681




CTGCTA

GTGC






524
546
CCGCTGCTAACCCCATA
913
TCGGGGTATGGGGTTA
2682




CCCCGA

GCAG






534
556
CCCCATACCCCGAACCA
914
TTTGGTTGGTTCGGGG
2683




ACCAAA

TATG






535
557
CCCATACCCCGAACCAA
915
GTTTGGTTGGTTCGGG
2684




CCAAAC

GTAT






536
558
CCATACCCCGAACCAAC
916
GGTTTGGTTGGTTCGG
2685




CAAACC

GGTA



541
563
CCCCGAACCAACCAAAC
917
TTTGGGGTTTGGTTGG
2686







CCCAAA

TTCG



542
564
CCCGAACCAACCAAACC
918
CTTTGGGGTTTGGTTG
2687




CCAAAG

GTTC






543
565
CCGAACCAACCAAACCC
919
TCTTTGGGGTTTGGTT
2688




CAAAGA

GGTT






548
570
CCAACCAAACCCCAAA
920
GGGTGTCTTTGGGGTT
2689




GACACCC

TGGT






552
574
CCAAACCCCAAAGACA
921
TGGGGGGTGTCTTTGG
2690




CCCCCCA

GGTT






557
579
CCCCAAAGACACCCCCC
922
AACTGTGGGGGGTGTC
2691




ACAGTT

TTTG






558
580
CCCAAAGACACCCCCCA
923
AAACTGTGGGGGGTGT
2692




CAGTTT

CTTT






559
581
CCAAAGACACCCCCCAC
924
TAAACTGTGGGGGGTG
2693




AGTTTA

TCTT






568
590
CCCCCCACAGTTTATGT
925
TAAGCTACATAAACTG
2694




AGCTTA

TGGG






569
591
CCCCCACAGTTTATGTA
926
GTAAGCTACATAAACT
2695




GCTTAC

GTGG






570
592
CCCCACAGTTTATGTAG
927
GGTAAGCTACATAAAC
2696




CTTACC

TGTG






571
593
CCCACAGTTTATGTAGC
928
AGGTAAGCTACATAAA
2697




TTACCT

CTGT






572
594
CCACAGTTTATGTAGCT
929
GAGGTAAGCTACATAA
2698




TACCTC

ACTG






591
613
CCTCCTCAAAGCAATAC
930
TTCAGTGTATTGCTTT
2699




ACTGAA

GAGG






594
616
CCTCAAAGCAATACACT
931
ATTTTCAGTGTATTGC
2700




GAAAAT

TTTG






637
659
CCCCATAAACAAATAGG
932
ACCAAACCTATTTGTT
2701




TTTGGT

TATG






638
660
CCCATAAACAAATAGGT
933
GACCAAACCTATTTGT
2702




TTGGTC

TTAT






639
661
CCATAAACAAATAGGTT
934
GGACCAAACCTATTTG
2703




TGGTCC

TTTA






660
682
CCTAGCCTTTCTATTAG
935
TAAGAGCTAATAGAA
2704




CTCTTA

AGGCT






665
687
CCTTTCTATTAGCTCTTA
936
CTTACTAAGAGCTAAT
2705




GTAAG

AGAA






705
727
CCCCGTTCCAGTGAGTT
937
AGGGTGAACTCACTGG
2706




CACCCT

AACG






706
728
CCCGTTCCAGTGAGTTC
938
GAGGGTGAACTCACTG
2707




ACCCTC

GAAC






707
729
CCGTTCCAGTGAGTTCA
939
AGAGGGTGAACTCACT
2708




CCCTCT

GGAA






712
734
CCAGTGAGTTCACCCTC
940
GATTTAGAGGGTGAAC
2709




TAAATC

TCAC






724
746
CCCTCTAAATCACCACG
941
TTTGATCGTGGTGATT
2710




ATCAAA

TAGA






725
747
CCTCTAAATCACCACGA
942
TTTTGATCGTGGTGAT
2711




TCAAAA

TTAG






736
758
CCACGATCAAAAGGAA
943
ATGCTTGTTCCTTTTGA
2712




CAAGCAT

TCG






792
814
CCTAGCCACACCCCCAC
944
TTTCCCGTGGGGGTGT
2713




GGGAAA

GGCT






797
819
CCACACCCCCACGGGAA
945
TGCTGTTTCCCGTGGG
2714




ACAGCA

GGTG






802
824
CCCCCACGGGAAACAG
946
ATCACTGCTGTTTCCC
2715




CAGTGAT

GTGG






803
825
CCCCACGGGAAACAGC
947
AATCACTGCTGTTTCC
2716




AGTGATT

CGTG






804
826
CCCACGGGAAACAGCA
948
TAATCACTGCTGTTTC
2717




GTGATTA

CCGT






805
827
CCACGGGAAACAGCAG
949
TTAATCACTGCTGTTT
2718




TGATTAA

CCCG






828
850
CCTTTAGCAATAAACGA
950
AAACTTTCGTTTATTG
2719




AAGTTT

CTAA






867
889
CCCCAGGGTTGGTCAAT
951
CACGAAATTGACCAAC
2720




TTCGTG

CCTG






868
890
CCCAGGGTTGGTCAATT
952
GCACGAAATTGACCAA
2721




TCGTGC

CCCT






869
891
CCAGGGTTGGTCAATTT
953
GGCACGAAATTGACCA
2722




CGTGCC

ACCC






890
912
CCAGCCACCGCGGTCAC
954
AATCGTGTGACCGCGG
2723




ACGATT

TGGC






894
916
CCACCGCGGTCACACGA
955
GGTTAATCGTGTGACC
2724




TTAACC

GCGG






897
919
CCGCGGTCACACGATTA
956
TTGGGTTAATCGTGTG
2725




ACCCAA

ACCG






915
937
CCCAAGTCAATAGAAGC
957
ACGCCGGCTTCTATTG
2726




CGGCGT

ACTT






916
938
CCAAGTCAATAGAAGCC
958
TACGCCGGCTTCTATT
2727




GGCGTA

GACT






931
953
CCGGCGTAAAGAGTGTT
959
ATCTAAAACACTCTTT
2728




TTAGAT

ACGC






956
978
CCCCCTCCCCAATAAAG
960
TTTTAGCTTTATTGGG
2729




CTAAAA

GAGG






957
979
CCCCTCCCCAATAAAGC
961
GTTTTAGCTTTATTGG
2730




TAAAAC

GGAG






958
980
CCCTCCCCAATAAAGCT
962
AGTTTTAGCTTTATTG
2731




AAAACT

GGGA






959
981
CCTCCCCAATAAAGCTA
963
GAGTTTTAGCTTTATT
2732




AAACTC

GGGG






962
984
CCCCAATAAAGCTAAAA
964
GGTGAGTTTTAGCTTT
2733




CTCACC

ATTG






963
985
CCCAATAAAGCTAAAAC
965
AGGTGAGTTTTAGCTT
2734




TCACCT

TATT






964
986
CCAATAAAGCTAAAACT
966
CAGGTGAGTTTTAGCT
2735




CACCTG

TTAT






983
1005
CCTGAGTTGTAAAAAAC
967
ACTGGAGTTTTTTACA
2736




TCCAGT

ACTC






1001
1023
CCAGTTGACACAAAATA
968
GTAGTCTATTTTGTGT
2737




GACTAC

CAAC






1064
1086
CCCAAACTGGGATTAGA
969
GGGGTATCTAATCCCA
2738




TACCCC

GTTT






1065
1087
CCAAACTGGGATTAGAT
970
TGGGGTATCTAATCCC
2739




ACCCCA

AGTT






1083
1105
CCCCACTATGCTTAGCC
971
GTTTAGGGCTAAGCAT
2740




CTAAAC

AGTG






1084
1106
CCCACTATGCTTAGCCC
972
GGTTTAGGGCTAAGCA
2741




TAAACC

TAGT






1085
1107
CCACTATGCTTAGCCCT
973
AGGTTTAGGGCTAAGC
2742




AAACCT

ATAG






1098
1120
CCCTAAACCTCAACAGT
974
GATTTAACTGTTGAGG
2743




TAAATC

TTTA






1099
1121
CCTAAACCTCAACAGTT
975
TGATTTAACTGTTGAG 
2744




AAATCA

GTTT






1105
1127
CCTCAACAGTTAAATCA
976
TTTTGTTGATTTAACTG
2745




ACAAAA

TTG






1135
1157
CCAGAACACTACGAGCC
977
AGCTGTGGCTCGTAGT
2746




ACAGCT

GTTC






1150
1172
CCACAGCTTAAAACTCA
978
GTCCTTTGAGTTTTAA
2747




AAGGAC

GCTG






1172
1194
CCTGGCGGTGCTTCATA
979
GAGGGATATGAAGCA
2748




TCCCTC

CCGCC






1190
1212
CCCTCTAGAGGAGCCTG
980
ACAGAACAGGCTCCTC
2749




TTCTGT

TAGA






1191
1213
CCTCTAGAGGAGCCTGT
981
TACAGAACAGGCTCCT
2750




TCTGTA

CTAG






1203
1225
CCTGTTCTGTAATCGAT
982
GGGTTTATCGATTACA
2751




AAACCC

GAAC






1223
1245
CCCCGATCAACCTCACC
983
AGAGGTGGTGAGGTTG
2752




ACCTCT

ATCG






1224
1246
CCCGATCAACCTCACCA
984
AAGAGGTGGTGAGGTT
2753




CCTCTT

GATC






1225
1247
CCGATCAACCTCACCAC
985
CAAGAGGTGGTGAGG
2754




CTCTTG

TTGAT






1233
1255
CCTCACCACCTCTTGCT
986
AGGCTGAGCAAGAGG
2755




CAGCCT

TGGTG






1238
1260
CCACCTCTTGCTCAGCC
987
TATATAGGCTGAGCAA
2756




TATATA

GAGG






1241
1263
CCTCTTGCTCAGCCTAT
988
CGGTATATAGGCTGAG
2757




ATACCG

CAAG






1253
1275
CCTATATACCGCCATCT
989
TGCTGAAGATGGCGGT
2758




TCAGCA

ATAT






1261
1283
CCGCCATCTTCAGCAAA
990
TCAGGGTTTGCTGAAG
2759




CCCTGA

ATGG






1264
1286
CCATCTTCAGCAAACCC
991
TCATCAGGGTTTGCTG
2760




TGATGA

AAGA






1278
1300
CCCTGATGAAGGCTACA
992
TTACTTTGTAGCCTTC
2761




AAGTAA

ATCA






1279
1301
CCTGATGAAGGCTACAA
993
CTTACTTTGTAGCCTTC
2762




AGTAAG

ATC






1310
1332
CCCACGTAAAGACGTTA
994
TTGACCTAACGTCTTT
2763




GGTCAA

ACGT






1311
1333
CCACGTAAAGACGTTAG
995
CTTGACCTAACGTCTT
2764




GTCAAG

TACG






1340
1362
CCCATGAGGTGGCAAG
996
CCCATTTCTTGCCACC
2765




AAATGGG

TCAT






1341
1363
CCATGAGGTGGCAAGA
997
GCCCATTTCTTGCCAC
2766




AATGGGC

CTCA






1375
1397
CCCCAGAAAACTACGAT
998
AGGGCTATCGTAGTTT
2767




AGCCCT

TCTG






1376
1398
CCCAGAAAACTACGATA
999
AAGGGCTATCGTAGTT
2768




GCCCTT

TTCT






1377
1399
CCAGAAAACTACGATA
1000
TAAGGGCTATCGTAGT
2769




GCCCTTA

TTTC






1394
1416
CCCTTATGAAACTTAAG
1001
TCGACCCTTAAGTTTC
2770




GGTCGA

ATAA






1395
1417
CCTTATGAAACTTAAGG
1002
TTCGACCCTTAAGTTT
2771




GTCGAA

CATA






1465
1487
CCCTGAAGCGCGTACAC
1003
GGCGGTGTGTACGCGC
2772




ACCGCC

TTCA






1466
1488
CCTGAAGCGCGTACACA
1004
GGGCGGTGTGTACGCG
2773




CCGCCC

CTTC






1483
1505
CCGCCCGTCACCCTCCT
1005
TACTTGAGGAGGGTGA
2774




CAAGTA

CGGG






1486
1508
CCCGTCACCCTCCTCAA
1006
GTATACTTGAGGAGGG
2775




GTATAC

TGAC






1487
1509
CCGTCACCCTCCTCAAG
1007
AGTATACTTGAGGAGG
2776




TATACT

GTGA






1493
1515
CCCTCCTCAAGTATACT
1008
CTTTGAAGTATACTTG
2777




TCAAAG

AGGA






1494
1516
CCTCCTCAAGTATACTT
1009
CCTTTGAAGTATACTT
2778




CAAAGG

GAGG






1497
1519
CCTCAAGTATACTTCAA
1010
TGTCCTTTGAAGTATA
2779




AGGACA

CTTG






1531
1553
CCCCTACGCATTTATAT
1011
TCCTCTATATAAATGC
2780




AGAGGA

GTAG






1532
1554
CCCTACGCATTTATATA
1012
CTCCTCTATATAAATG
2781




GAGGAG

CGTA






1533
1555
CCTACGCATTTATATAG
1013
TCTCCTCTATATAAAT
2782




AGGAGA

GCGT






1601
1623
CCAGAGTGTAGCTTAAC
1014
CTTTGTGTTAAGCTAC
2783




ACAAAG

ACTC






1626
1648
CCCAACTTACACTTAGG
1015
AAATCTCCTAAGTGTA
2784




AGATTT

AGTT






1627
1649
CCAACTTACACTTAGGA
1016
GAAATCTCCTAAGTGT
2785




GATTTC

AAGT






1662
1684
CCGCTCTGAGCTAAACC
1017
GGGCTAGGTTTAGCTC
2786




TAGCCC

AGAG






1677
1699
CCTAGCCCCAAACCCAC
1018
GGTGGAGTGGGTTTGG
2787




TCCACC

GGCT






1682
1704
CCCCAAACCCACTCCAC
1019
AGTAAGGTGGAGTGG
2788




CTTACT

GTTTG






1683
1705
CCCAAACCCACTCCACC
1020
TAGTAAGGTGGAGTGG
2789




TTACTA

GTTT






1684
1706
CCAAACCCACTCCACCT
1021
GTAGTAAGGTGGAGTG
2790




TACTAC

GGTT






1689
1711
CCCACTCCACCTTACTA
1022
GTCTGGTAGTAAGGTG
2791




CCAGAC

GAGT






1690
1712
CCACTCCACCTTACTAC
1023
TGTCTGGTAGTAAGGT
2792




CAGACA

GGAG






1695
1717
CCACCTTACTACCAGAC
1024
AAGGTTGTCTGGTAGT
2793




AACCTT

AAGG






1698
1720
CCTTACTACCAGACAAC
1025
GCTAAGGTTGTCTGGT
2794




CTTAGC

AGTA






1706
1728
CCAGACAACCTTAGCC
1026
ATGGTTTGGCTAAGGT
2795




AACCAT

TGTC






1714
1736
CCTTAGCCAAACCATTT
1027
TTGGGTAAATGGTTTG
2796




ACCCAA

GCTA






1720
1742
CCAAACCATTTACCCAA
1028
CTTTATTTGGGTAAAT
2797




ATAAAG

GGTT






1725
1747
CCATTTACCCAAATAAA
1029
CTATACTTTATTTGGG
2798




GTATAG

TAAA






1732
1754
CCCAAATAAAGTATAGG
1030
CTATCGCCTATACTTT
2799




CGATAG

ATTT






1733
1755
CCAAATAAAGTATAGGC
1031
TCTATCGCCTATACTTT
2800




GATAGA

ATT






1764
1786
CCTGGCGCAATAGATAT
1032
GGTACTATATCTATTG
2801




AGTACC

CGCC






1785
1807
CCGCAAGGGAAAGATG
1033
AATTTTTCATCTTTCCC
2802




AAAAATT

TTG






1812
1834
CCAAGCATAATATAGCA
1034
AGTCCTTGCTATATTA
2803




AGGACT

TGCT






1837
1859
CCCCTATACCTTCTGCA
1035
TCATTATGCAGAAGGT
2804




TAATGA

ATAG






1838
1860
CCCTATACCTTCTGCAT
1036
TTCATTATGCAGAAGG
2805




AATGAA

TATA






1839
1861
CCTATACCTTCTGCATA
1037
ATTCATTATGCAGAAG
2806




ATGAAT

GTAT






1845
1867
CCTTCTGCATAATGAAT
1038
TAGTTAATTCATTATG
2807




TAACTA

CAGA






1889
1911
CCAAAGCTAAGACCCCC
1039
GGTTTCGGGGGTCTTA
2808




GAAACC

GCTT






1901
1923
CCCCCGAAACCAGACG
1040
GGTAGCTCGTCTGGTT
2809




AGCTACC

TCGG






1902
1924
CCCCGAAACCAGACGA
1041
AGGTAGCTCGTCTGGT
2810




GCTACCT

TTCG






1903
1925
CCCGAAACCAGACGAG
1042
TAGGTAGCTCGTCTGG
2811




CTACCTA

TTTC






1904
1926
CCGAAACCAGACGAGC
1043
TTAGGTAGCTCGTCTG
2812




TACCTAA

GTTT






1910
1932
CCAGACGAGCTACCTAA
1044
CTGTTCTTAGGTAGCT
2813




GAACAG

CGTC






1922
1944
CCTAAGAACAGCTAAA
1045
GTGCTCTTTTAGCTGTT
2814




AGAGCAC

CTT






1946
1968
CCCGTCTATGTAGCAAA
1046
CACTATTTTGCTACAT
2815




ATAGTG

AGAC






1947
1969
CCGTCTATGTAGCAAAA
1047
CCACTATTTTGCTACA
2816




TAGTGG

TAGA






1996
2018
CCTACCGAGCCTGGTGA
1048
CAGCTATCACCAGGCT
2817




TAGCTG

CGGT






2000
2022
CCGAGCCTGGTGATAGC
1049
CAACCAGCTATCACCA
2818




TGGTTG

GGCT






2005
2027
CCTGGTGATAGCTGGTT
1050
TTGGACAACCAGCTAT
2819




GTCCAA

CACC






2024
2046
CCAAGATAGAATCTTAG
1051
GTTGAACTAAGATTCT
2820




TTCAAC

ATCT






2057
2079
CCCACAGAACCCTCTAA
1052
GGGGATTTAGAGGGTT
2821




ATCCCC

CTGT






2058
2080
CCACAGAACCCTCTAAA
1053
AGGGGATTTAGAGGGT
2822




TCCCCT

TCTG






2066
2088
CCCTCTAAATCCCCTTG
1054
AATTTACAAGGGGATT
2823




TAAATT

TAGA






2067
2089
CCTCTAAATCCCCTTGT
1055
AAATTTACAAGGGGAT
2824




AAATTT

TTAG






2076
2098
CCCCTTGTAAATTTAAC
1056
CTAACAGTTAAATTTA
2825




TGTTAG

CAAG






2077
2099
CCCTTGTAAATTTAACT
1057
ACTAACAGTTAAATTT
2826




GTTAGT

ACAA






2078
2100
CCTTGTAAATTTAACTG
1058
GACTAACAGTTAAATT
2827




TTAGTC

TACA






2100
2122
CCAAAGAGGAACAGCT
1059
TCCAAAGAGCTGTTCC
2828




CTTTGGA

TCTT






2136
2158
CCTTGTAGAGAGAGTAA
1060
AATTTTTTACTCTCTCT
2829




AAAATT

ACA






2164
2186
CCCATAGTAGGCCTAAA
1061
GCTGCTTTTAGGCCTA
2830




AGCAGC

CTAT






2165
2187
CCATAGTAGGCCTAAAA
1062
GGCTGCTTTTAGGCCT
2831




GCAGCC

ACTA






2175
2197
CCTAAAAGCAGCCACCA
1063
CTTAATTGGTGGCTGC
2832




ATTAAG

TTTT






2186
2208
CCACCAATTAAGAAAGC
1064
TTGAACGCTTTCTTAA
2833




GTTCAA

TTGG






2189
2211
CCAATTAAGAAAGCGTT
1065
AGCTTGAACGCTTTCT
2834




CAAGCT

TAAT






2217
2239
CCCACTACCTAAAAAAT
1066
TTTGGGATTTTTTAGG
2835




CCCAAA

TAGT






2218
2240
CCACTACCTAAAAAATC
1067
GTTTGGGATTTTTTAG
2836




CCAAAC

GTAG






2224
2246
CCTAAAAAATCCCAAAC
1068
TTATATGTTTGGGATT
2837




ATATAA

TTTT






2234
2256
CCCAAACATATAACTGA
1069
AGGAGTTCAGTTATAT
2838




ACTCCT

GTTT






2235
2257
CCAAACATATAACTGAA
1070
GAGGAGTTCAGTTATA
2839




CTCCTC

TGTT






2254
2276
CCTCACACCCAATTGGA
1071
GATTGGTCCAATTGGG
2840




CCAATC

TGTG






2261
2283
CCCAATTGGACCAATCT
1072
GGTGATAGATTGGTCC
2841




ATCACC

AATT






2262
2284
CCAATTGGACCAATCTA
1073
GGGTGATAGATTGGTC
2842




TCACCC

CAAT






2271
2293
CCAATCTATCACCCTAT
1074
TCTTCTATAGGGTGAT
2843




AGAAGA

AGAT






2282
2304
CCCTATAGAAGAACTAA
1075
CTAACATTAGTTCTTC
2844




TGTTAG

TATA






2283
2305
CCTATAGAAGAACTAAT
1076
ACTAACATTAGTTCTT
2845




GTTAGT

CTAT






2328
2350
CCTCCGCATAAGCCTGC
1077
TCTGACGCAGGCTTAT
2846




GTCAGA

GCGG






2331
2353
CCGCATAAGCCTGCGTC
1078
TAATCTGACGCAGGCT
2847




AGATTA

TATG






2340
2362
CCTGCGTCAGATTAAAA
1079
TCAGTGTTTTAATCTG
2848




CACTGA

ACGC






2378
2400
CCCAATATCTACAATCA
1080
GTTGGTTGATTGTAGA
2849




ACCAAC

TATT






2379
2401
CCAATATCTACAATCAA
1081
TGTTGGTTGATTGTAG
2850




CCAACA

ATAT






2396
2418
CCAACAAGTCATTATTA
1082
TGAGGGTAATAATGAC
2851




CCCTCA

TTGT






2413
2435
CCCTCACTGTCAACCCA
1083
CTGTGTTGGGTTGACA
2852




ACACAG

GTGA






2414
2436
CCTCACTGTCAACCCAA
1084
CCTGTGTTGGGTTGAC
2853




CACAGG

AGTG






2426
2448
CCCAACACAGGCATGCT
1085
CTTATGAGCATGCCTG
2854




CATAAG

TGTT






2427
2449
CCAACACAGGCATGCTC
1086
CCTTATGAGCATGCCT
2855




ATAAGG

GTGT






2488
2510
CCCCGCCTGTTTACCAA
1087
ATGTTTTTGGTAAACA
2856




AAACAT

GGCG






2489
2511
CCCGCCTGTTTACCAAA
1088
GATGTTTTTGGTAAAC
2857




AACATC

AGGC






2490
2512
CCGCCTGTTTACCAAAA
1089
TGATGTTTTTGGTAAA
2858




ACATCA

CAGG






2493
2515
CCTGTTTACCAAAAACA
1090
AGGTGATGTTTTTGGT
2859




TCACCT

AAAC






2501
2523
CCAAAAACATCACCTCT
1091
GATGCTAGAGGTGATG
2860




AGCATC

TTTT






2513
2535
CCTCTAGCATCACCAGT
1092
TCTAATACTGGTGATG
2861




ATTAGA

CTAG






2525
2547
CCAGTATTAGAGGCACC
1093
GCAGGCGGTGCCTCTA
2862




GCCTGC

ATAC






2540
2562
CCGCCTGCCCAGTGACA
1094
AACATGTGTCACTGGG
2863




CATGTT

CAGG






2543
2565
CCTGCCCAGTGACACAT
1095
TTAAACATGTGTCACT
2864




GTTTAA

GGGC






2547
2569
CCCAGTGACACATGTTT
1096
GCCGTTAAACATGTGT
2865




AACGGC

CACT






2548
2570
CCAGTGACACATGTTTA
1097
GGCCGTTAAACATGTG
2866




ACGGCC

TCAC






2569
2591
CCGCGGTACCCTAACCG
1098
TTTGCACGGTTAGGGT
2867




TGCAAA

ACCG






2577
2599
CCCTAACCGTGCAAAGG
1099
ATGCTACCTTTGCACG
2868




TAGCAT

GTTA






2578
2600
CCTAACCGTGCAAAGGT
1100
TATGCTACCTTTGCAC
2869




AGCATA

GGTT






2583
2605
CCGTGCAAAGGTAGCAT
1101
GTGATTATGCTACCTT
2870




AATCAC

TGCA






2611
2633
CCTTAAATAGGGACCTG
1102
TTCATACAGGTCCCTA
2871




TATGAA

TTTA






2624
2646
CCTGTATGAATGGCTCC
1103
CCTCGTGGAGCCATTC
2872




ACGAGG

ATAC






2639
2661
CCACGAGGGTTCAGCTG
1104
AAGAGACAGCTGAAC
2873




TCTCTT

CCTCG






2670
2692
CCAGTGAAATTGACCTG
1105
CACGGGCAGGTCAATT
2874




CCCGTG

TCAC






2683
2705
CCTGCCCGTGAAGAGGC
1106
ATGCCCGCCTCTTCAC
2875




GGGCAT

GGGC






2687
2709
CCCGTGAAGAGGCGGG
1107
TGTTATGCCCGCCTCT
2876




CATAACA

TCAC






2688
2710
CCGTGAAGAGGCGGGC
1108
GTGTTATGCCCGCCTC
2877




ATAACAC

TTCA






2726
2748
CCCTATGGAGCTTTAAT
1109
TAATAAATTAAAGCTC
2878




TTATTA

CATA






2727
2749
CCTATGGAGCTTTAATT
1110
TTAATAAATTAAAGCT
2879




TATTAA

CCAT






2761
2783
CCTAACAAACCCACAGG
1111
TTAGGACCTGTGGGTT
2880




TCCTAA

TGTT






2770
2792
CCCACAGGTCCTAAACT
1112
TTTGGTAGTTTAGGAC
2881




ACCAAA

CTGT






2771
2793
CCACAGGTCCTAAACTA
1113
GTTTGGTAGTTTAGGA
2882




CCAAAC

CCTG






2779
2801
CCTAAACTACCAAACCT
1114
TAATGCAGGTTTGGTA
2883




GCATTA

GTTT






2788
2810
CCAAACCTGCATTAAAA
1115
CGAAATTTTTAATGCA
2884




ATTTCG

GGTT






2793
2815
CCTGCATTAAAAATTTC
1116
CCAACCGAAATTTTTA
2885




GGTTGG

ATGC






2821
2843
CCTCGGAGCAGAACCCA
1117
GGAGGTTGGGTTCTGC
2886




ACCTCC

TCCG






2834
2856
CCCAACCTCCGAGCAGT
1118
GCATGTACTGCTCGGA
2887




ACATGC

GGTT






2835
2857
CCAACCTCCGAGCAGTA
1119
AGCATGTACTGCTCGG
2888




CATGCT

AGGT






2839
2861
CCTCCGAGCAGTACATG
1120
TCTTAGCATGTACTGC
2889




CTAAGA

TCGG






2842
2864
CCGAGCAGTACATGCTA
1121
AAGTCTTAGCATGTAC
2890




AGACTT

TGCT






2867
2889
CCAGTCAAAGCGAACTA
1122
GTATAGTAGTTCGCTT
2891




CTATAC

TGAC






2899
2921
CCAATAACTTGACCAAC
1123
TGTTCCGTTGGTCAAG
2892




GGAACA

TTAT






2911
2933
CCAACGGAACAAGTTAC
1124
CCTAGGGTAACTTGTT
2893




CCTAGG

CCGT






2927
2949
CCCTAGGGATAACAGCG
1125
GGATTGCGCTGTTATC
2894




CAATCC

CCTA






2928
2950
CCTAGGGATAACAGCGC
1126
AGGATTGCGCTGTTAT
2895




AATCCT

CCCT






2948
2970
CCTATTCTAGAGTCCAT
1127
GTTGATATGGACTCTA
2896




ATCAAC

GAAT






2961
2983
CCATATCAACAATAGGG
1128
CGTAAACCCTATTGTT
2897




TTTACG

GATA






2985
3007
CCTCGATGTTGGATCAG
1129
GATGTCCTGATCCAAC
2898




GACATC

ATCG






3007
3029
CCCGATGGTGCAGCCGC
1130
TTAATAGCGGCTGCAC
2899




TATTAA

CATC






3008
3030
CCGATGGTGCAGCCGCT
1131
TTTAATAGCGGCTGCA
2900




ATTAAA

CCAT






3020
3042
CCGCTATTAAAGGTTCG
1132
AACAAACGAACCTTTA
2901




TTTGTT

ATAG






3056
3078
CCTACGTGATCTGAGTT
1133
GGTCTGAACTCAGATC
2902




CAGACC

ACGT






3077
3099
CCGGAGTAATCCAGGTC
1134
GAAACCGACCTGGATT
2903




GGTTTC

ACTC






3087
3109
CCAGGTCGGTTTCTATC
1135
AANGTAGATAGAAAC
2904




TACNTT

CGACC






3116
3138
CCTCCCTGTACGAAAGG
1136
TCTTGTCCTTTCGTACA
2905




ACAAGA

GGG






3119
3141
CCCTGTACGAAAGGACA
1137
TTCTCTTGTCCTTTCGT
2906




AGAGAA

ACA






3120
3142
CCTGTACGAAAGGACA
1138
TTTCTCTTGTCCTTTCG
2907




AGAGAAA

TAC






3148
3170
CCTACTTCACAAAGCGC
1139
GGGAAGGCGCTTTGTG
2908




CTTCCC

AAGT






3164
3186
CCTTCCCCCGTAAATGA
1140
ATGATATCATTTACGG
2909




TATCAT

GGGA






3168
3190
CCCCCGTAAATGATATC
1141
TGAGATGATATCATTT
2910




ATCTCA

ACGG






3169
3191
CCCCGTAAATGATATCA
1142
TTGAGATGATATCATT
2911




TCTCAA

TACG






3170
3192
CCCGTAAATGATATCAT
1143
GTTGAGATGATATCAT
2912




CTCAAC

TTAC






3171
3193
CCGTAAATGATATCATC
1144
AGTTGAGATGATATCA
2913




TCAACT

TTTA






3204
3226
CCCACACCCACCCAAGA
1145
CCCTGTTCTTGGGTGG
2914




ACAGGG

GTGT






3205
3227
CCACACCCACCCAAGAA
1146
ACCCTGTTCTTGGGTG
2915




CAGGGT

GGTG






3210
3232
CCCACCCAAGAACAGG
1147
AACAAACCCTGTTCTT
2916




GTTTGTT

GGGT






3211
3233
CCACCCAAGAACAGGG
1148
TAACAAACCCTGTTCT
2917




TTTGTTA

TGGG






3214
3236
CCCAAGAACAGGGTTTG
1149
TCTTAACAAACCCTGT
2918




TTAAGA

TCTT






3215
3237
CCAAGAACAGGGTTTGT
1150
ATCTTAACAAACCCTG
2919




TAAGAT

TTCT






3245
3267
CCCGGTAATCGCATAAA
1151
TTAAGTTTTATGCGAT
2920




ACTTAA

TACC






3246
3268
CCGGTAATCGCATAAAA
1152
TTTAAGTTTTATGCGA
2921




CTTAAA

TTAC






3292
3314
CCTCTTCTTAACAACAT
1153
ATGGGTATGTTGTTAA
2922




ACCCAT

GAAG






3310
3332
CCCATGGCCAACCTCCT
1154
AGGAGTAGGAGGTTG
2923




ACTCCT

GCCAT






3311
3333
CCATGGCCAACCTCCTA
1155
GAGGAGTAGGAGGTT
2924




CTCCTC

GGCCA






3317
3339
CCAACCTCCTACTCCTC
1156
TACAATGAGGAGTAG
2925




ATTGTA

GAGGT






3321
3343
CCTCCTACTCCTCATTGT
1157
TGGGTACAATGAGGA
2926




ACCCA

GTAGG






3324
3346
CCTACTCCTCATTGTAC
1158
GAATGGGTACAATGA
2927




CCATTC

GGAGT






3330
3352
CCTCATTGTACCCATTC
1159
CGATTAGAATGGGTAC
2928




TAATCG

AATG






3340
3362
CCCATTCTAATCGCAAT
1160
AATGCCATTGCGATTA
2929




GGCATT

GAAT






3341
3363
CCATTCTAATCGCAATG
1161
GAATGCCATTGCGATT
2930




GCATTC

AGAA






3363
3385
CCTAATGCTTACCGAAC
1162
TTTTTCGTTCGGTAAG
2931




GAAAAA

CATT






3374
3396
CCGAACGAAAAATTCTA
1163
ATAGCCTAGAATTTTT
2932




GGCTAT

CGTT






3414
3436
CCCCAACGTTGTAGGCC
1164
CGTAGGGGCCTACAAC
2933




CCTACG

GTTG






3415
3437
CCCAACGTTGTAGGCCC
1165
CCGTAGGGGCCTACAA
2934




CTACGG

CGTT






3416
3438
CCAACGTTGTAGGCCCC
1166
CCCGTAGGGGCCTACA
2935




TACGGG

ACGT






3429
3451
CCCCTACGGGCTACTAC
1167
AGGGTTGTAGTAGCCC
2936




AACCCT

GTAG






3430
3452
CCCTACGGGCTACTACA
1168
AAGGGTTGTAGTAGCC
2937




ACCCTT

CGTA






3431
3453
CCTACGGGCTACTACAA
1169
GAAGGGTTGTAGTAGC
2938




CCCTTC

CCGT






3448
3470
CCCTTCGCTGACGCCAT
1170
AGTTTTATGGCGTCAG
2939




AAAACT

CGAA






3449
3471
CCTTCGCTGACGCCATA
1171
GAGTTTTATGGCGTCA
2940




AAACTC

GCGA






3461
3483
CCATAAAACTCTTCACC
1172
CTCTTTGGTGAAGAGT
2941




AAAGAG

TTTA






3476
3498
CCAAAGAGCCCCTAAA
1173
GGCGGGTTTTAGGGGC
2942




ACCCGCC

TCTT






3484
3506
CCCCTAAAACCCGCCAC
1174
GTAGATGTGGCGGGTT
2943




ATCTAC

TTAG






3485
3507
CCCTAAAACCCGCCACA
1175
GGTAGATGTGGCGGGT
2944




TCTACC

TTTA






3486
3508
CCTAAAACCCGCCACAT
1176
TGGTAGATGTGGCGGG
2945




CTACCA

TTTT






3493
3515
CCCGCCACATCTACCAT
1177
AGGGTGATGGTAGATG
2946




CACCCT

TGGC






3494
3516
CCGCCACATCTACCATC
1178
GAGGGTGATGGTAGAT
2947




ACCCTC

GTGG






3497
3519
CCACATCTACCATCACC
1179
GTAGAGGGTGATGGTA
2948




CTCTAC

GATG






3506
3528
CCATCACCCTCTACATC
1180
GGCGGTGATGTAGAG
2949




ACCGCC

GGTGA






3512
3534
CCCTCTACATCACCGCC
1181
GGTCGGGGCGGTGATG
2950




CCGACC

TAGA






3513
3535
CCTCTACATCACCGCCC
1182
AGGTCGGGGCGGTGAT
2951




CGACCT

GTAG






3524
3546
CCGCCCCGACCTTAGCT
1183
GGTGAGAGCTAAGGTC
2952




CTCACC

GGGG






3527
3549
CCCCGACCTTAGCTCTC
1184
GATGGTGAGAGCTAA
2953




ACCATC

GGTCG






3528
3550
CCCGACCTTAGCTCTCA
1185
CGATGGTGAGAGCTAA
2954




CCATCG

GGTC






3529
3551
CCGACCTTAGCTCTCAC
1186
GCGATGGTGAGAGCTA
2955




CATCGC

AGGT






3533
3555
CCTTAGCTCTCACCATC
1187
AAGAGCGATGGTGAG
2956




GCTCTT

AGCTA






3545
3567
CCATCGCTCTTCTACTA
1188
GGTTCATAGTAGAAGA
2957




TGAACC

GCGA






3566
3588
CCCCCCTCCCCATACCC
1189
GGGGTTGGGTATGGGG
2958




AACCCC

AGGG






3567
3589
CCCCCTCCCCATACCCA
1190
GGGGGTTGGGTATGGG
2959




ACCCCC

GAGG






3568
3590
CCCCTCCCCATACCCAA
1191
AGGGGGTTGGGTATGG
2960




CCCCCT

GGAG






3569
3591
CCCTCCCCATACCCAAC
1192
CAGGGGGTTGGGTATG
2961




CCCCTG

GGGA






3570
3592
CCTCCCCATACCCAACC
1193
CCAGGGGGTTGGGTAT
2962




CCCTGG

GGGG






3573
3595
CCCCATACCCAACCCCC
1194
TGACCAGGGGGTTGGG
2963




TGGTCA

TATG






3574
3596
CCCATACCCAACCCCCT
1195
TTGACCAGGGGGTTGG
2964




GGTCAA

GTAT






3575
3597
CCATACCCAACCCCCTG
1196
GTTGACCAGGGGGTTG
2965




GTCAAC

GGTA






3580
3602
CCCAACCCCCTGGTCAA
1197
TTGAGGTTGACCAGGG
2966




CCTCAA

GGTT






3581
3603
CCAACCCCCTGGTCAAC
1198
GTTGAGGTTGACCAGG
2967




CTCAAC

GGGT






3585
3607
CCCCCTGGTCAACCTCA
1199
CTAGGTTGAGGTTGAC
2968




ACCTAG

CAGG






3586
3608
CCCCTGGTCAACCTCAA
1200
CCTAGGTTGAGGTTGA
2969




CCTAGG

CCAG






3587
3609
CCCTGGTCAACCTCAAC
1201
GCCTAGGTTGAGGTTG
2970




CTAGGC

ACCA






3588
3610
CCTGGTCAACCTCAACC
1202
GGCCTAGGTTGAGGTT
2971




TAGGCC

GACC






3597
3619
CCTCAACCTAGGCCTCC
1203
TAAATAGGAGGCCTAG
2972




TATTTA

GTTG






3603
3625
CCTAGGCCTCCTATTTA
1204
CTAGAATAAATAGGA
2973




TTCTAG

GGCCT






3609
3631
CCTCCTATTTATTCTAGC
1205
AGGTGGCTAGAATAA
2974




CACCT

ATAGG






3612
3634
CCTATTTATTCTAGCCA
1206
TAGAGGTGGCTAGAAT
2975




CCTCTA

AAAT






3626
3648
CCACCTCTAGCCTAGCC
1207
GTAAACGGCTAGGCTA
2976




GTTTAC

GAGG






3629
3651
CCTCTAGCCTAGCCGTT
1208
TGAGTAAACGGCTAGG
2977




TACTCA

CTAG






3636
3658
CCTAGCCGTTTACTCAA
1209
AGAGGATTGAGTAAA
2978




TCCTCT

CGGCT






3641
3663
CCGTTTACTCAATCCTC
1210
TGATCAGAGGATTGAG
2979




TGATCA

TAAA






3654
3676
CCTCTGATCAGGGTGAG
1211
TTGATGCTCACCCTGA
2980




CATCAA

TCAG






3689
3711
CCCTGATCGGCGCACTG
1212
TGCTCGCAGTGCGCCG
2981




CGAGCA

ATCA






3690
3712
CCTGATCGGCGCACTGC
1213
CTGCTCGCAGTGCGCC
2982




GAGCAG

GATC






3716
3738
CCCAAACAATCTCATAT
1214
GACTTCATATGAGATT
2983




GAAGTC

GTTT






3717
3739
CCAAACAATCTCATATG
1215
TGACTTCATATGAGAT
2984




AAGTCA

TGTT






3740
3762
CCCTAGCCATCATTCTA
1216
TGATAGTAGAATGATG
2985




CTATCA

GCTA






3741
3763
CCTAGCCATCATTCTAC
1217
TTGATAGTAGAATGAT
2986




TATCAA

GGCT






3746
3768
CCATCATTCTACTATCA
1218
TAATGTTGATAGTAGA
2987




ACATTA

ATGA






3782
3804
CCTTTAACCTCTCCACC
1219
GATAAGGGTGGAGAG
2988




CTTATC

GTTAA






3789
3811
CCTCTCCACCCTTATCA
1220
GTGTTGTGATAAGGGT
2989




CAACAC

GGAG






3794
3816
CCACCCTTATCACAACA
1221
TTCTTGTGTTGTGATA
2990




CAAGAA

AGGG






3797
3819
CCCTTATCACAACACAA
1222
GTGTTCTTGTGTTGTG
2991




GAACAC

ATAA






3798
3820
CCTTATCACAACACAAG
1223
GGTGTTCTTGTGTTGT
2992




AACACC

GATA






3819
3841
CCTCTGATTACTCCTGC
1224
ATGATGGCAGGAGTA
2993




CATCAT

ATCAG






3831
3853
CCTGCCATCATGACCCT
1225
TGGCCAAGGGTCATGAT
2994




TGGCCA

GGC






3835
3857
CCATCATGACCCTTGGC
1226
ATTATGGCCAAGGGTC
2995




CATAAT

ATGA






3844
3866
CCCTTGGCCATAATATG
1227
ATAAATCATATTATGG
2996




ATTTAT

CCAA






3845
3867
CCTTGGCCATAATATGA
1228
GATAAATCATATTATG
2997




TTTATC

GCCA






3851
3873
CCATAATATGATTTATC
1229
TGTGGAGATAAATCAT
2998




TCCACA

ATTA






3869
3891
CCACACTAGCAGAGACC
1230
TCGGTTGGTCTCTGCT
2999




AACCGA

AGTG






3884
3906
CCAACCGAACCCCCTTC
1231
AAGGTCGAAGGGGGT
3000




GACCTT

TCGGT






3888
3910
CCGAACCCCCTTCGACC
1232
CGGCAAGGTCGAAGG
3001




TTGCCG

GGGTT






3893
3915
CCCCCTTCGACCTTGCC
1233
CCCTTCGGCAAGGTCG
3002




GAAGGG

AAGG






3894
3916
CCCCTTCGACCTTGCCG
1234
CCCCTTCGGCAAGGTC
3003




AAGGGG

GAAG






3895
3917
CCCTTCGACCTTGCCGA
1235
TCCCCTTCGGCAAGGT
3004




AGGGGA

CGAA






3896
3918
CCTTCGACCTTGCCGAA
1236
CTCCCCTTCGGCAAGG
3005




GGGGAG

TCGA






3903
3925
CCTTGCCGAAGGGGAGT
1237
GTTCGGACTCCCCTTC
3006




CCGAAC

GGCA






3908
3930
CCGAAGGGGAGTCCGA
1238
GACTAGTTCGGACTCC
3007




ACTAGTC

CCTT






3920
3942
CCGAACTAGTCTCAGGC
1239
GTTGAAGCCTGAGACT
3008




TTCAAC

AGTT






3953
3975
CCGCAGGCCCCTTCGCC
1240
GAATAGGGCGAAGGG
3009




CTATTC

GCCTG






3960
3982
CCCCTTCGCCCTATTCTT
1241
CTATGAAGAATAGGGC
3010




CATAG

GAAG






3961
3983
CCCTTCGCCCTATTCTTC
1242
GCTATGAAGAATAGG
3011




ATAGC

GCGAA






3962
3984
CCTTCGCCCTATTCTTCA
1243
GGCTATGAAGAATAG
3012




TAGCC

GGCGA






3968
3990
CCCTATTCTTCATAGCC
1244
GTATTCGGCTATGAAG
3013




GAATAC

AATA






3969
3991
CCTATTCTTCATAGCCG
1245
TGTATTCGGCTATGAA
3014




AATACA

GAAT






3983
4005
CCGAATACACAAACATT
1246
TATAATAATGTTTGTG
3015




ATTATA

TATT






4013
4035
CCCTCACCACTACAATC
1247
TAGGAAGATTGTAGTG
3016




TTCCTA

GTGA






4014
4036
CCTCACCACTACAATCT
1248
CTAGGAAGATTGTAGT
3017




TCCTAG

GGTG






4019
4041
CCACTACAATCTTCCTA
1249
TGTTCCTAGGAAGATT
3018




GGAACA

GTAG






4032
4054
CCTAGGAACAACATATG
1250
GTGCGTCATATGTTGT
3019




ACGCAC

TCCT






4058
4080
CCCCTGAACTCTACACA
1251
ATATGTTGTGTAGAGT
3020




ACATAT

TCAG






4059
4081
CCCTGAACTCTACACAA
1252
AATATGTTGTGTAGAG
3021




CATATT

TTCA






4060
4082
CCTGAACTCTACACAAC
1253
AAATATGTTGTGTAGA
3022




ATATTT

GTTC






4088
4110
CCAAGACCCTACTTCTA
1254
GGAGGTTAGAAGTAG
3023




ACCTCC

GGTCT






4094
4116
CCCTACTTCTAACCTCC
1255
GAACAGGGAGGTTAG
3024




CTGTTC

AAGTA






4095
4117
CCTACTTCTAACCTCCC
1256
AGAACAGGGAGGTTA
3025




TGTTCT

GAAGT






4106
4128
CCTCCCTGTTCTTATGA
1257
TCGAATTCATAAGAAC
3026




ATTCGA

AGGG






4109
4131
CCCTGTTCTTATGAATT
1258
TGTTCGAATTCATAAG
3027




CGAACA

AACA






4110
4132
CCTGTTCTTATGAATTC
1259
CTGTTCGAATTCATAA
3028




GAACAG

GAAC






4137
4159
CCCCCGATTCCGCTACG
1260
GTTGGTCGTAGCGGAA
3029




ACCAAC

TCGG






4138
4160
CCCCGATTCCGCTACGA
1261
AGTTGGTCGTAGCGGA
3030




CCAACT

ATCG






4139
4161
CCCGATTCCGCTACGAC
1262
GAGTTGGTCGTAGCGG
3031




CAACTC

AATC






4140
4162
CCGATTCCGCTACGACC
1263
TGAGTTGGTCGTAGCG
3032




AACTCA

GAAT






4146
4168
CCGCTACGACCAACTCA
1264
GGTGTATGAGTTGGTC
3033




TACACC

GTAG






4155
4177
CCAACTCATACACCTCC
1265
TTCATAGGAGGTGTAT
3034




TATGAA

GAGT






4167
4189
CCTCCTATGAAAAAACT
1266
GTAGGAAGTTTTTTCA
3035




TCCTAC

TAGG






4170
4192
CCTATGAAAAAACTTCC
1267
GTGGTAGGAAGTTTTT
3036




TACCAC

TCAT






4185
4207
CCTACCACTCACCCTAG
1268
GTAATGCTAGGGTGAG
3037




CATTAC

TGGT






4189
4211
CCACTCACCCTAGCATT
1269
ATAAGTAATGCTAGGG
3038




ACTTAT

TGAG






4196
4218
CCCTAGCATTACTTATA
1270
ATATCATATAAGTAAT
3039




TGATAT

GCTA






4197
4219
CCTAGCATTACTTATAT
1271
CATATCATATAAGTAA
3040




GATATG

TGCT






4223
4245
CCATACCCATTACAATC
1272
GCTGGAGATTGTAATG
3041




TCCAGC

GGTA






4228
4250
CCCATTACAATCTCCAG
1273
GGAATGCTGGAGATTG
3042




CATTCC

TAAT






4229
4251
CCATTACAATCTCCAGC
1274
GGGAATGCTGGAGATT
3043




ATTCCC

GTAA






4241
4263
CCAGCATTCCCCCTCAA
1275
TTAGGTTTGAGGGGGA
3044




ACCTAA

ATGC






4249
4271
CCCCCTCAAACCTAAGA
1276
CATATTTCTTAGGTTT
3045




AATATG

GAGG






4250
4272
CCCCTCAAACCTAAGAA
1277
ACATATTTCTTAGGTT
3046




ATATGT

TGAG






4251
4273
CCCTCAAACCTAAGAAA
1278
GACATATTTCTTAGGT
3047




TATGTC

TTGA






4252
4274
CCTCAAACCTAAGAAAT
1279
AGACATATTTCTTAGG
3048




ATGTCT

TTTG






4259
4281
CCTAAGAAATATGTCTG
1280
TTTTATCAGACATATT
3049




ATAAAA

TCTT






4318
4340
CCCCCTTATTTCTAGGA
1281
TCATAGTCCTAGAAAT
3050




CTATGA

AAGG






4319
4341
CCCCTTATTTCTAGGAC
1282
CTCATAGTCCTAGAAA
3051




TATGAG

TAAG






4320
4342
CCCTTATTTCTAGGACT
1283
TCTCATAGTCCTAGAA
3052




ATGAGA

ATAA






4321
4343
CCTTATTTCTAGGACTA
1284
TTCTCATAGTCCTAGA
3053




TGAGAA

AATA






4349
4371
CCCATCCCTGAGAATCC
1285
AATTTTGGATTCTCAG
3054




AAAATT

GGAT






4350
4372
CCATCCCTGAGAATCCA
1286
GAATTTTGGATTCTCA
3055




AAATTC

GGGA






4354
4376
CCCTGAGAATCCAAAAT
1287
CGGAGAATTTTGGATT
3056




TCTCCG

CTCA






4355
4377
CCTGAGAATCCAAAATT
1288
ACGGAGAATTTTGGAT
3057




CTCCGT

TCTC






4364
4386
CCAAAATTCTCCGTGCC
1289
ATAGGTGGCACGGAG
3058




ACCTAT

AATTT






4374
4396
CCGTGCCACCTATCACA
1290
ATGGGGTGTGATAGGT
3059




CCCCAT

GGCA






4379
4401
CCACCTATCACACCCCA
1291
TTAGGATGGGGTGTGA
3060




TCCTAA

TAGG






4382
4404
CCTATCACACCCCATCC
1292
ACTTTAGGATGGGGTG
3061




TAAAGT

TGAT






4391
4413
CCCCATCCTAAAGTAAG
1293
GCTGACCTTACTTTAG
3062




GTCAGC

GATG






4392
4414
CCCATCCTAAAGTAAGG
1294
AGCTGACCTTACTTTA
3063




TCAGCT

GGAT






4393
4415
CCATCCTAAAGTAAGGT
1295
TAGCTGACCTTACTTT
3064




CAGCTA

AGGA






4397
4419
CCTAAAGTAAGGTCAGC
1296
TATTTAGCTGACCTTA
3065




TAAATA

CTTT






4430
4452
CCCATACCCCGAAAATG
1297
AACCAACATTTTCGGG
3066




TTGGTT

GTAT






4431
4453
CCATACCCCGAAAATGT
1298
TAACCAACATTTTCGG
3067




TGGTTA

GGTA






4436
4458
CCCCGAAAATGTTGGTT
1299
GGGTATAACCAACATT
3068




ATACCC

TTCG






4437
4459
CCCGAAAATGTTGGTTA
1300
AGGGTATAACCAACAT
3069




TACCCT

TTTC






4438
4460
CCGAAAATGTTGGTTAT
1301
AAGGGTATAACCAAC
3070




ACCCTT

ATTTT






4456
4478
CCCTTCCCGTACTAATT
1302
GGGATTAATTAGTACG
3071




AATCCC

GGAA






4457
4479
CCTTCCCGTACTAATTA
1303
GGGGATTAATTAGTAC
3072




ATCCCC

GGGA






4461
4483
CCCGTACTAATTAATCC
1304
GCCAGGGGATTAATTA
3073




CCTGGC

GTAC






4462
4484
CCGTACTAATTAATCCC
1305
GGCCAGGGGATTAATT
3074




CTGGCC

AGTA






4476
4498
CCCCTGGCCCAACCCGT
1306
TAGATGACGGGTTGGG
3075




CATCTA

CCAG






4477
4499
CCCTGGCCCAACCCGTC
1307
GTAGATGACGGGTTGG
3076




ATCTAC

GCCA






4478
4500
CCTGGCCCAACCCGTCA
1308
AGTAGATGACGGGTTG
3077




TCTACT

GGCC






4483
4505
CCCAACCCGTCATCTAC
1309
GGTAGAGTAGATGAC
3078




TCTACC

GGGTT






4484
4506
CCAACCCGTCATCTACT
1310
TGGTAGAGTAGATGAC
3079




CTACCA

GGGT






4488
4510
CCCGTCATCTACTCTAC
1311
AAGATGGTAGAGTAG
3080




CATCTT

ATGAC






4489
4511
CCGTCATCTACTCTACC
1312
AAAGATGGTAGAGTA
3081




ATCTTT

GATGA






4504
4526
CCATCTTTGCAGGCACA
1313
GATGAGTGTGCCTGCA
3082




CTCATC

AAGA






4555
4577
CCTGAGTAGGCCTAGAA
1314
GTTTATTTCTAGGCCT
3083




ATAAAC

ACTC






4565
4587
CCTAGAAATAAACATGC
1315
AAGCTAGCATGTTTAT
3084




TAGCTT

TTCT






4593
4615
CCAGTTCTAACCAAAAA
1316
TTTATTTTTTTGGTTAG
3085




AATAAA

AAC






4603
4625
CCAAAAAAATAAACCCT
1317
GGAACGAGGGTTTATT
3086




CGTTCC

TTTT






4616
4638
CCCTCGTTCCACAGAAG
1318
TGGCAGCTTCTGTGGA
3087




CTGCCA

ACGA






4617
4639
CCTCGTTCCACAGAAGC
1319
ATGGCAGCTTCTGTGG
3088




TGCCAT

AACG






4624
4646
CCACAGAAGCTGCCATC
1320
ATACTTGATGGCAGCT
3089




AAGTAT

TCTG






4636
4658
CCATCAAGTATTTCCTC
1321
TTGCGTGAGGAAATAC
3090




ACGCAA

TTGA






4649
4671
CCTCACGCAAGCAACCG
1322
TGGATGCGGTTGCTTG
3091




CATCCA

CGTG






4663
4685
CCGCATCCATAATCCTT
1323
TATTAGAAGGATTATG
3092




CTAATA

GATG






4669
4691
CCATAATCCTTCTAATA
1324
GATAGCTATTAGAAGG
3093




GCTATC

ATTA






4676
4698
CCTTCTAATAGCTATCC
1325
TGAAGAGGATAGCTAT
3094




TCTTCA

TAGA






4691
4713
CCTCTTCAACAATATAC
1326
CGGAGAGTATATTGTT
3095




TCTCCG

GAAG






4711
4733
CCGGACAATGAACCATA
1327
ATTGGTTATGGTTCAT
3096




ACCAAT

TGTC






4723
4745
CCATAACCAATACTACC
1328
TTGATTGGTAGTATTG
3097




AATCAA

GTTA






4729
4751
CCAATACTACCAATCAA
1329
TGAGTATTGATTGGTA
3098




TACTCA

GTAT






4738
4760
CCAATCAATACTCATCA
1330
TATTAATGATGAGTAT
3099




TTAATA

TGAT






4795
4817
CCCCCTTTCACTTCTGA
1331
TGGGACTCAGAAGTGA
3100




GTCCCA

AAGG






4796
4818
CCCCTTTCACTTCTGAG
1332
CTGGGACTCAGAAGTG
3101




TCCCAG

AAAG






4797
4819
CCCTTTCACTTCTGAGT
1333
TCTGGGACTCAGAAGT
3102




CCCAGA

GAAA






4798
4820
CCTTTCACTTCTGAGTC
1334
CTCTGGGACTCAGAAG
3103




CCAGAG

TGAA






4814
4836
CCCAGAGGTTACCCAAG
1335
GGGTGCCTTGGGTAAC
3104




GCACCC

CTCT






4815
4837
CCAGAGGTTACCCAAGG
1336
GGGGTGCCTTGGGTAA
3105




CACCCC

CCTC






4825
4847
CCCAAGGCACCCCTCTG
1337
GGATGTCAGAGGGGT
3106




ACATCC

GCCTT






4826
4848
CCAAGGCACCCCTCTGA
1338
CGGATGTCAGAGGGGT
3107




CATCCG

GCCT






4834
4856
CCCCTCTGACATCCGGC
1339
AAGCAGGCCGGATGTC
3108




CTGCTT

AGAG






4835
4857
CCCTCTGACATCCGGCC
1340
GAAGCAGGCCGGATG
3109




TGCTTC

TCAGA






4836
4858
CCTCTGACATCCGGCCT
1341
AGAAGCAGGCCGGAT
3110




GCTTCT

GTCAG






4846
4868
CCGGCCTGCTTCTTCTC
1342
TCATGTGAGAAGAAGC
3111




ACATGA

AGGC






4850
4872
CCTGCTTCTTCTCACAT
1343
TTTGTCATGTGAGAAG
3112




GACAAA

AAGC






4879
4901
CCCCCATCTCAATCATA
1344
TTGGTATATGATTGAG
3113




TACCAA

ATGG






4880
4902
CCCCATCTCAATCATAT
1345
TTTGGTATATGATTGA
3114




ACCAAA

GATG






4881
4903
CCCATCTCAATCATATA
1346
ATTTGGTATATGATTG
3115




CCAAAT

AGAT






4882
4904
CCATCTCAATCATATAC
1347
GATTTGGTATATGATT
3116




CAAATC

GAGA






4898
4920
CCAAATCTCTCCCTCAC
1348
CGTTTAGTGAGGGAGA
3117




TAAACG

GATT






4908
4930
CCCTCACTAAACGTAAG
1349
AGAAGGCTTACGTTTA
3118




CCTTCT

GTGA






4909
4931
CCTCACTAAACGTAAGC
1350
GAGAAGGCTTACGTTT
3119




CTTCTC

AGTG






4925
4947
CCTTCTCCTCACTCTCTC
1351
AGATTGAGAGAGTGA
3120




AATCT

GGAGA






4931
4953
CCTCACTCTCTCAATCTT
1352
TGGATAAGATTGAGAG
3121




ATCCA

AGTG






4951
4973
CCATCATAGCAGGCAGT
1353
ACCTCAACTGCCTGCT
3122




TGAGGT

ATGA






4982
5004
CCAAACCCAGCTACGCA
1354
AGATTTTGCGTAGCTG
3123




AAATCT

GGTT






4987
5009
CCCAGCTACGCAAAATC
1355
TGCTAAGATTTTGCGT
3124




TTAGCA

AGCT






4988
5010
CCAGCTACGCAAAATCT
1356
ATGCTAAGATTTTGCG
3125




TAGCAT

TAGC






5014
5036
CCTCAATTACCCACATA
1357
TCATCCTATGTGGGTA
3126




GGATGA

ATTG






5023
5045
CCCACATAGGATGAATA
1358
TGCTATTATTCATCCT
3127




ATAGCA

ATGT






5024
5046
CCACATAGGATGAATAA
1359
CTGCTATTATTCATCCT
3128




TAGCAG

ATG






5052
5074
CCGTACAACCCTAACAT
1360
ATGGTTATGTTAGGGT
3129




AACCAT

TGTA






5060
5082
CCCTAACATAACCATTC
1361
AATTAAGAATGGTTAT
3130




TTAATT

GTTA






5061
5083
CCTAACATAACCATTCT
1362
AAATTAAGAATGGTTA
3131




TAATTT

TGTT






5071
5093
CCATTCTTAATTTAACT
1363
ATAAATAGTTAAATTA
3132




ATTTAT

AGAA






5099
5121
CCTAACTACTACCGCAT
1364
GTAGGAATGCGGTAGT
3133




TCCTAC

AGTT






5110
5132
CCGCATTCCTACTACTC
1365
TAAGTTGAGTAGTAGG
3134




AACTTA

AATG






5117
5139
CCTACTACTCAACTTAA
1366
TGGAGTTTAAGTTGAG
3135




ACTCCA

TAGT






5137
5159
CCAGCACCACGACCCTA
1367
TAGTAGTAGGGTCGTG
3136




CTACTA

GTGC






5143
5165
CCACGACCCTACTACTA
1368
GCGAGATAGTAGTAG
3137




TCTCGC

GGTCG






5149
5171
CCCTACTACTATCTCGC
1369
TCAGGTGCGAGATAGT
3138




ACCTGA

AGTA






5150
5172
CCTACTACTATCTCGCA
1370
TTCAGGTGCGAGATAG
3139




CCTGAA

TAGT






5167
5189
CCTGAAACAAGCTAACA
1371
TAGTCATGTTAGCTTG
3140




TGACTA

TTTC






5193
5215
CCCTTAATTCCATCCAC
1372
AGGAGGGTGGATGGA
3141




CCTCCT

ATTAA






5194
5216
CCTTAATTCCATCCACC
1373
GAGGAGGGTGGATGG
3142




CTCCTC

AATTA



5202
5224
CCATCCACCCTCCTCTC
1374
CCTAGGGAGAGGAGG
3143




CCTAGG

GTGGA






5206
5228
CCACCCTCCTCTCCCTA
1375
GCCTCCTAGGGAGAGG
3144




GGAGGC

AGGG






5209
5231
CCCTCCTCTCCCTAGGA
1376
CAGGCCTCCTAGGGAG
3145




GGCCTG

AGGA






5210
5232
CCTCCTCTCCCTAGGAG
1377
GCAGGCCTCCTAGGGA
3146




GCCTGC

GAGG






5213
5235
CCTCTCCCTAGGAGGCC
1378
GGGGCAGGCCTCCTAG
3147




TGCCCC

GGAG






5218
5240
CCCTAGGAGGCCTGCCC
1379
TAGCGGGGGCAGGCCT
3148




CCGCTA

CCTA






5219
5241
CCTAGGAGGCCTGCCCC
1380
TTAGCGGGGGCAGGCC
3149




CGCTAA

TCCT






5228
5250
CCTGCCCCCGCTAACCG
1381
AAAAGCCGGTTAGCG
3150




GCTTTT

GGGGC






5232
5254
CCCCCGCTAACCGGCTT
1382
GGCAAAAAGCCGGTT
3151




TTTGCC

AGCGG






5233
5255
CCCCGCTAACCGGCTTT
1383
GGGCAAAAAGCCGGT
3152




TTGCCC

TAGCG






5234
5256
CCCGCTAACCGGCTTTT
1384
TGGGCAAAAAGCCGG
3153




TGCCCA

TTAGC






5235
5257
CCGCTAACCGGCTTTTT
1385
TTGGGCAAAAAGCCG
3154




GCCCAA

GTTAG






5242
5264
CCGGCTTTTTGCCCAAA
1386
GGCCCATTTGGGCAAA
3155




TGGGCC

AAGC






5253
5275
CCCAAATGGGCCATTAT
1387
TCTTCGATAATGGCCC
3156




CGAAGA

ATTT






5254
5276
CCAAATGGGCCATTATC
1388
TTCTTCGATAATGGCC
3157




GAAGAA

CATT






5263
5285
CCATTATCGAAGAATTC
1389
TTTTGTGAATTCTTCG
3158




ACAAAA

ATAA






5294
5316
CCTCATCATCCCCACCA
1390
CTATGATGGTGGGGAT
3159




TCATAG

GATG






5303
5325
CCCCACCATCATAGCCA
1391
TGATGGTGGCTATGAT
3160




CCATCA

GGTG






5304
5326
CCCACCATCATAGCCAC
1392
GTGATGGTGGCTATGA
3161




CATCAC

TGGT






5305
5327
CCACCATCATAGCCACC
1393
GGTGATGGTGGCTATG
3162




ATCACC

ATGG






5308
5330
CCATCATAGCCACCATC
1394
GAGGGTGATGGTGGCT
3163




ACCCTC

ATGA






5317
5339
CCACCATCACCCTCCTT
1395
GAGGTTAAGGAGGGT
3164




AACCTC

GATGG






5320
5342
CCATCACCCTCCTTAAC
1396
GTAGAGGTTAAGGAG
3165




CTCTAC

GGTGA






5326
5348
CCCTCCTTAACCTCTAC
1397
GTAGAAGTAGAGGTTA
3166




TTCTAC

AGGA






5327
5349
CCTCCTTAACCTCTACTT
1398
GGTAGAAGTAGAGGTT
3167




CTACC

AAGG






5330
5352
CCTTAACCTCTACTTCT
1399
GTAGGTAGAAGTAGA
3168




ACCTAC

GGTTA






5336
5358
CCTCTACTTCTACCTAC
1400
TTAGGCGTAGGTAGAA
3169




GCCTAA

GTAG






5348
5370
CCTACGCCTAATCTACT
1401
AGGTGGAGTAGATTAG
3170




CCACCT

GCGT






5354
5376
CCTAATCTACTCCACCT
1402
TGATTGAGGTGGAGTA
3171




CAATCA

GATT






5365
5387
CCACCTCAATCACACTA
1403
GGGGAGTAGTGTGATT
3172




CTCCCC

GAGG






5368
5390
CCTCAATCACACTACTC
1404
TATGGGGAGTAGTGTG
3173




CCCATA

ATTG






5384
5406
CCCCATATCTAACAACG
1405
TTTTTACGTTGTTAGAT
3174




TAAAAA

ATG






5385
5407
CCCATATCTAACAACGT
1406
ATTTTTACGTTGTTAG
3175




AAAAAT

ATAT






5386
5408
CCATATCTAACAACGTA
1407
TATTTTTACGTTGTTAG
3176




AAAATA

ATA






5433
5455
CCCACCCCATTCCTCCC
1408
AGTGTGGGGAGGAAT
3177




CACACT

GGGGT






5434
5456
CCACCCCATTCCTCCCC
1409
GAGTGTGGGGAGGAA
3178




ACACTC

TGGGG






5437
5459
CCCCATTCCTCCCCACA
1410
GATGAGTGTGGGGAG
3179




CTCATC

GAATG






5438
5460
CCCATTCCTCCCCACAC
1411
CGATGAGTGTGGGGA
3180




TCATCG

GGAAT






5439
5461
CCATTCCTCCCCACACT
1412
GCGATGAGTGTGGGG
3181




CATCGC

AGGAA






5444
5466
CCTCCCCACACTCATCG
1413
TAAGGGCGATGAGTGT
3182




CCCTTA

GGGG






5447
5469
CCCCACACTCATCGCCC
1414
TGGTAAGGGCGATGA
3183




TTACCA

GTGTG






5448
5470
CCCACACTCATCGCCCT
1415
GTGGTAAGGGCGATG
3184




TACCAC

AGTGT






5449
5471
CCACACTCATCGCCCTT
1416
CGTGGTAAGGGCGATG
3185




ACCACG

AGTG






5461
5483
CCCTTACCACGCTACTC
1417
AGGTAGGAGTAGCGT
3186




CTACCT

GGTAA






5462
5484
CCTTACCACGCTACTCC
1418
TAGGTAGGAGTAGCGT
3187




TACCTA

GGTA






5467
5489
CCACGCTACTCCTACCT
1419
GGAGATAGGTAGGAG
3188




ATCTCC

TAGCG






5477
5499
CCTACCTATCTCCCCTTT
1420
GTATAAAAGGGGAGA
3189




TATAC

TAGGT






5481
5503
CCTATCTCCCCTTTTATA
1421
ATTAGTATAAAAGGGG
3190




CTAAT

AGAT






5488
5510
CCCCTTTTATACTAATA
1422
TAAGATTATTAGTATA
3191




ATCTTA

AAAG






5489
5511
CCCTTTTATACTAATAA
1423
ATAAGATTATTAGTAT
3192




TCTTAT

AAAA






5490
5512
CCTTTTATACTAATAAT
1424
TATAAGATTATTAGTA
3193




CTTATA

TAAA






5534
5556
CCAAGAGCCTTCAAAGC
1425
CTGAGGGCTTTGAAGG
3194




CCTCAG

CTCT






5541
5563
CCTTCAAAGCCCTCAGT
1426
CAACTTACTGAGGGCT
3195




AAGTTG

TTGA






5550
5572
CCCTCAGTAAGTTGCAA
1427
TAAGTATTGCAACTTA
3196




TACTTA

CTGA






5551
5573
CCTCAGTAAGTTGCAAT
1428
TTAAGTATTGCAACTT
3197




ACTTAA

ACTG






5601
5623
CCCCACTCTGCATCAAC
1429
CGTTCAGTTGATGCAG
3198




TGAACG

AGTG






5602
5624
CCCACTCTGCATCAACT
1430
GCGTTCAGTTGATGCA
3199




GAACGC

GAGT






5603
5625
CCACTCTGCATCAACTG
1431
TGCGTTCAGTTGATGC
3200




AACGCA

AGAG






5632
5654
CCACTTTAATTAAGCTA
1432
AGGGCTTAGCTTAATT
3201




AGCCCT

AAAG






5651
5673
CCCTTACTAGACCAATG
1433
AAGTCCCATTGGTCTA
3202




GGACTT

GTAA






5652
5674
CCTTACTAGACCAATGG
1434
TAAGTCCCATTGGTCT
3203




GACTTA

AGTA






5667
5689
CCAATGGGACTTAAACC
1435
TTTGTGGGTTTAAGTC
3204




CACAAA

CCAT






5677
5699
CCCACAAACACTTAGTT
1436
GCTGTTAACTAAGTGT
3205




AACAGC

TTGT






5678
5700
CCACAAACACTTAGTTA
1437
AGCTGTTAACTAAGTG
3206




ACAGCT

TTTG






5706
5728
CCCTAATCAACTGGCTT
1438
AGATTGAAGCCAGTTG
3207




CAATCT

ATTA






5707
5729
CCTAATCAACTGGCTTC
1439
TAGATTGAAGCCAGTT
3208




AATCTA

GATT






5735
5757
CCCGCCGCCGGGAAAA
1440
CCGCCTTTTTTCCCGG
3209




AAGGCGG

CGGC






5736
5758
CCGCCGCCGGGAAAAA
1441
CCCGCCTTTTTTCCCG
3210




AGGCGGG

GCGG






5739
5761
CCGCCGGGAAAAAAGG
1442
TCTCCCGCCTTTTTTCC
3211




CGGGAGA

CGG






5742
5764
CCGGGAAAAAAGGCGG
1443
GCTTCTCCCGCCTTTTT
3212




GAGAAGC

TCC






5764
5786
CCCCGGCAGGTTTGAAG
1444
AAGCAGCTTCAAACCT
3213




CTGCTT

GCCG






5765
5787
CCCGGCAGGTTTGAAGC
1445
GAAGCAGCTTCAAACC
3214




TGCTTC

TGCC






5766
5788
CCGGCAGGTTTGAAGCT
1446
AGAAGCAGCTTCAAAC
3215




GCTTCT

CTGC






5817
5839
CCTCGGAGCTGGTAAAA
1447
GCCTCTTTTTACCAGC
3216




AGAGGC

TCCG






5839
5861
CCTAACCCCTGTCTTTA
1448
TAAATCTAAAGACAGG
3217




GATTTA

GGTT






5844
5866
CCCCTGTCTTTAGATTT
1449
GACTGTAAATCTAAAG
3218




ACAGTC

ACAG






5845
5867
CCCTGTCTTTAGATTTA
1450
GGACTGTAAATCTAAA
3219




CAGTCC

GACA






5846
5868
CCTGTCTTTAGATTTAC
1451
TGGACTGTAAATCTAA
3220




AGTCCA

AGAC






5866
5888
CCAATGCTTCACTCAGC
1452
AAAATGGCTGAGTGA
3221




CATTTT

AGCAT






5882
5904
CCATTTTACCTCACCCC
1453
TCAGTGGGGGTGAGGT
3222




CACTGA

AAAA






5890
5912
CCTCACCCCCACTGATG
1454
GGCGAACATCAGTGG
3223




TTCGCC

GGGTG






5895
5917
CCCCCACTGATGTTCGC
1455
CGGTCGGCGAACATCA
3224




CGACCG

GTGG






5896
5918
CCCCACTGATGTTCGCC
1456
ACGGTCGGCGAACATC
3225




GACCGT

AGTG






5897
5919
CCCACTGATGTTCGCCG
1457
AACGGTCGGCGAACAT
3226




ACCGTT

CAGT






5898
5920
CCACTGATGTTCGCCGA
1458
CAACGGTCGGCGAAC
3227




CCGTTG

ATCAG






5911
5933
CCGACCGTTGACTATTC
1459
TGTAGAGAATAGTCAA
3228




TCTACA

CGGT






5915
5937
CCGTTGACTATTCTCTA
1460
GGTTTGTAGAGAATAG
3229




CAAACC

TCAA






5936
5958
CCACAAAGACATTGGA
1461
ATAGTGTTCCAATGTC
3230




ACACTAT

TTTG






5960
5982
CCTATTATTCGGCGCAT
1462
CAGCTCATGCGCCGAA
3231




GAGCTG

TAAT






5987
6009
CCTAGGCACAGCTCTAA
1463
GGAGGCTTAGAGCTGT
3232




GCCTCC

GCCT






6005
6027
CCTCCTTATTCGAGCCG
1464
CCAGCTCGGCTCGAAT
3233




AGCTGG

AAGG






6008
6030
CCTTATTCGAGCCGAGC
1465
GGCCCAGCTCGGCTCG
3234




TGGGCC

AATA






6019
6041
CCGAGCTGGGCCAGCCA
1466
GTTGCCTGGCTGGCCC
3235




GGCAAC

AGCT






6029
6051
CCAGCCAGGCAACCTTC
1467
TACCTAGAAGGTTGCC
3236




TAGGTA

TGGC






6033
6055
CCAGGCAACCTTCTAGG
1468
TCGTTACCTAGAAGGT
3237




TAACGA

TGCC






6041
6063
CCTTCTAGGTAACGACC
1469
AGATGTGGTCGTTACC
3238




ACATCT

TAGA






6056
6078
CCACATCTACAACGTTA
1470
TGACGATAACGTTGTA
3239




TCGTCA

GATG






6082
6104
CCCATGCATTTGTAATA
1471
GAAGATTATTACAAAT
3240




ATCTTC

GCAT






6083
6105
CCATGCATTTGTAATAA
1472
AGAAGATTATTACAAA
3241




TCTTCT

TGCA






6117
6139
CCCATCATAATCGGAGG
1473
CCAAAGCCTCCGATTA
3242




CTTTGG

TGAT






6118
6140
CCATCATAATCGGAGGC
1474
GCCAAAGCCTCCGATT
3243




TTTGGC

ATGA






6153
6175
CCCCTAATAATCGGTGC
1475
TCGGGGGCACCGATTA
3244




CCCCGA

TTAG






6154
6176
CCCTAATAATCGGTGCC
1476
ATCGGGGGCACCGATT
3245




CCCGAT

ATTA






6155
6177
CCTAATAATCGGTGCCC
1477
TATCGGGGGCACCGAT
3246




CCGATA

TATT






6169
6191
CCCCCGATATGGCGTTT
1478
GCGGGGAAACGCCAT
3247




CCCCGC

ATCGG






6170
6192
CCCCGATATGGCGTTTC
1479
TGCGGGGAAACGCCAT
3248




CCCGCA

ATCG






6171
6193
CCCGATATGGCGTTTCC
1480
ATGCGGGGAAACGCC
3249




CCGCAT

ATATC






6172
6194
CCGATATGGCGTTTCCC
1481
TATGCGGGGAAACGCC
3250




CGCATA

ATAT






6186
6208
CCCCGCATAAACAACAT
1482
AAGCTTATGTTGTTTA
3251




AAGCTT

TGCG






6187
6209
CCCGCATAAACAACATA
1483
GAAGCTTATGTTGTTT
3252




AGCTTC

ATGC






6188
6210
CCGCATAAACAACATAA
1484
AGAAGCTTATGTTGTT
3253




GCTTCT

TATG






6219
6241
CCTCCCTCTCTCCTACTC
1485
AGCAGGAGTAGGAGA
3254




CTGCT

GAGGG






6222
6244
CCCTCTCTCCTACTCCTG
1486
GCGAGCAGGAGTAGG
3255




CTCGC

AGAGA






6223
6245
CCTCTCTCCTACTCCTGC
1487
TGCGAGCAGGAGTAG
3256




TCGCA

GAGAG






6230
6252
CCTACTCCTGCTCGCAT
1488
TAGCAGATGCGAGCA
3257




CTGCTA

GGAGT






6236
6258
CCTGCTCGCATCTGCTA
1489
CCACTATAGCAGATGC
3258




TAGTGG

GAGC






6262
6284
CCGGAGCAGGAACAGG
1490
TGTTCAACCTGTTCCT
3259




TTGAACA

GCTC






6290
6312
CCCTCCCTTAGCAGGGA
1491
AGTAGTTCCCTGCTAA
3260




ACTACT

GGGA






6291
6313
CCTCCCTTAGCAGGGAA
1492
GAGTAGTTCCCTGCTA
3261




CTACTC

AGGG






6294
6316
CCCTTAGCAGGGAACTA
1493
TGGGAGTAGTTCCCTG
3262




CTCCCA

CTAA






6295
6317
CCTTAGCAGGGAACTAC
1494
GTGGGAGTAGTTCCCT
3263




TCCCAC

GCTA






6313
6335
CCCACCCTGGAGCCTCC
1495
GTCTACGGAGGCTCCA
3264




GTAGAC

GGGT






6314
6336
CCACCCTGGAGCCTCCG
1496
GGTCTACGGAGGCTCC
3265




TAGACC

AGGG






6317
6339
CCCTGGAGCCTCCGTAG
1497
TTAGGTCTACGGAGGC
3266




ACCTAA

TCCA






6318
6340
CCTGGAGCCTCCGTAGA
1498
GTTAGGTCTACGGAGG
3267




CCTAAC

CTCC






6325
6347
CCTCCGTAGACCTAACC
1499
GAAGATGGTTAGGTCT
3268




ATCTTC

ACGG






6328
6350
CCGTAGACCTAACCATC
1500
GGAGAAGATGGTTAG
3269




TTCTCC

GTCTA






6335
6357
CCTAACCATCTTCTCCTT
1501
GGTGTAAGGAGAAGA
3270




ACACC

TGGTT






6340
6362
CCATCTTCTCCTTACAC
1502
TGCTAGGTGTAAGGAG
3271




CTAGCA

AAGA






6349
6371
CCTTACACCTAGCAGGT
1503
GGAGACACCTGCTAGG
3272




GTCTCC

TGTA






6356
6378
CCTAGCAGGTGTCTCCT
1504
AGATAGAGGAGACAC
3273




CTATCT

CTGCT






6370
6392
CCTCTATCTTAGGGGCC
1505
ATTGATGGCCCCTAAG
3274




ATCAAT

ATAG






6385
6407
CCATCAATTTCATCACA
1506
AATTGTTGTGATGAAA
3275




ACAATT

TTGA






6420
6442
CCCCCTGCCATAACCCA
1507
TGGTATTGGGTTATGG
3276




ATACCA

CAGG






6421
6443
CCCCTGCCATAACCCAA
1508
TTGGTATTGGGTTATG
3277




TACCAA

GCAG






6422
6444
CCCTGCCATAACCCAAT
1509
TTTGGTATTGGGTTAT
3278




ACCAAA

GGCA






6423
6445
CCTGCCATAACCCAATA
1510
GTTTGGTATTGGGTTA
3279




CCAAAC

TGGC






6427
6449
CCATAACCCAATACCAA
1511
GGGCGTTTGGTATTGG
3280




ACGCCC

GTTA






6433
6455
CCCAATACCAAACGCCC
1512
GAAGAGGGGCGTTTG
3281




CTCTTC

GTATT






6434
6456
CCAATACCAAACGCCCC
1513
CGAAGAGGGGCGTTTG
3282




TCTTCG

GTAT






6440
6462
CCAAACGCCCCTCTTCG
1514
ATCAGACGAAGAGGG
3283




TCTGAT

GCGTT






6447
6469
CCCCTCTTCGTCTGATC
1515
AGGACGGATCAGACG
3284




CGTCCT

AAGAG






6448
6470
CCCTCTTCGTCTGATCC
1516
TAGGACGGATCAGAC
3285




GTCCTA

GAAGA






6449
6471
CCTCTTCGTCTGATCCG
1517
TTAGGACGGATCAGAC
3286




TCCTAA

GAAG






6463
6485
CCGTCCTAATCACAGCA
1518
TAGGACTGCTGTGATT
3287




GTCCTA

AGGA






6467
6489
CCTAATCACAGCAGTCC
1519
GAAGTAGGACTGCTGT
3288




TACTTC

GATT






6482
6504
CCTACTTCTCCTATCTCT
1520
CTGGGAGAGATAGGA
3289




CCCAG

GAAGT






6491
6513
CCTATCTCTCCCAGTCC
1521
CAGCTAGGACTGGGA
3290




TAGCTG

GAGAT






6500
6522
CCCAGTCCTAGCTGCTG
1522
TGATGCCAGCAGCTAG
3291




GCATCA

GACT






6501
6523
CCAGTCCTAGCTGCTGG
1523
GTGATGCCAGCAGCTA
3292




CATCAC

GGAC






6506
6528
CCTAGCTGCTGGCATCA
1524
GTATAGTGATGCCAGC
3293




CTATAC

AGCT






6539
6561
CCGCAACCTCAACACCA
1525
AGAAGGTGGTGTTGAG
3294




CCTTCT

GTTG






6545
6567
CCTCAACACCACCTTCT
1526
GGTCGAAGAAGGTGG
3295




TCGACC

TGTTG






6553
6575
CCACCTTCTTCGACCCC
1527
TCCGGCGGGGTCGAAG
3296




GCCGGA

AAGG






6556
6578
CCTTCTTCGACCCCGCC
1528
TCCTCCGGCGGGGTCG
3297




GGAGGA

AAGA






6566
6588
CCCCGCCGGAGGAGGA
1529
TGGGGTCTCCTCCTCC
3298




GACCCCA

GGCG






6567
6589
CCCGCCGGAGGAGGAG
1530
ATGGGGTCTCCTCCTC
3299




ACCCCAT

CGGC






6568
6590
CCGCCGGAGGAGGAGA
1531
AATGGGGTCTCCTCCT
3300




CCCCATT

CCGG






6571
6593
CCGGAGGAGGAGACCC
1532
TAGAATGGGGTCTCCT
3301




CATTCTA

CCTC






6584
6606
CCCCATTCTATACCAAC
1533
ATAGGTGTTGGTATAG
3302




ACCTAT

AATG






6585
6607
CCCATTCTATACCAACA
1534
AATAGGTGTTGGTATA
3303




CCTATT

GAAT






6586
6608
CCATTCTATACCAACAC
1535
GAATAGGTGTTGGTAT
3304




CTATTC

AGAA






6596
6618
CCAACACCTATTCTGAT
1536
CGAAAAATCAGAATA
3305




TTTTCG

GGTGT






6602
6624
CCTATTCTGATTTTTCGG
1537
GGTGACCGAAAAATC
3306




TCACC

AGAAT






6623
6645
CCCTGAAGTTTATATTC
1538
GGATAAGAATATAAA
3307




TTATCC

CTTCA






6624
6646
CCTGAAGTTTATATTCT
1539
AGGATAAGAATATAA
3308




TATCCT

ACTTC






6644
6666
CCTACCAGGCTTCGGAA
1540
AGATTATTCCGAAGCC
3309




TAATCT

TGGT






6648
6670
CCAGGCTTCGGAATAAT
1541
TGGGAGATTATTCCGA
3310




CTCCCA

AGCC






6667
6689
CCCATATTGTAACTTAC
1542
GGAGTAGTAAGTTACA
3311




TACTCC

ATAT






6668
6690
CCATATTGTAACTTACT
1543
CGGAGTAGTAAGTTAC
3312




ACTCCG

AATA






6688
6710
CCGGAAAAAAAGAACC
1544
TCCAAATGGTTCTTTTT
3313




ATTTGGA

TTC






6702
6724
CCATTTGGATACATAGG
1545
ACCATACCTATGTATC
3314




TATGGT

CAAA






6749
6771
CCTAGGGTTTATCGTGT
1546
GTGCTCACACGATAAA
3315




GAGCAC

CCCT






6773
6795
CCATATATTTACAGTAG
1547
CTATTCCTACTGTAAA
3316




GAATAG

TATA






6820
6842
CCTCCGCTACCATAATC
1548
AGCGATGATTATGGTA
3317




ATCGCT

GCGG






6823
6845
CCGCTACCATAATCATC
1549
GATAGCGATGATTATG
3318




GCTATC

GTAG






6829
6851
CCATAATCATCGCTATC
1550
GGTGGGGATAGCGAT
3319




CCCACC

GATTA






6845
6867
CCCCACCGGCGTCAAAG
1551
TAAATACTTTGACGCC
3320




TATTTA

GGTG






6846
6868
CCCACCGGCGTCAAAGT
1552
CTAAATACTTTGACGC
3321




ATTTAG

CGGT






6847
6869
CCACCGGCGTCAAAGTA
1553
GCTAAATACTTTGACG
3322




TTTAGC

CCGG






6850
6872
CCGGCGTCAAAGTATTT
1554
TCAGCTAAATACTTTG
3323




AGCTGA

ACGC






6877
6899
CCACACTCCACGGAAGC
1555
CATATTGCTTCCGTGG
3324




AATATG

AGTG






6884
6906
CCACGGAAGCAATATG
1556
ATCATTTCATATTGCTT
3325




AAATGAT

CCG






6925
6947
CCCTAGGATTCATCTTT
1557
GAAAAGAAAGATGAA
3326




CTTTTC

TCCTA






6926
6948
CCTAGGATTCATCTTTC
1558
TGAAAAGAAAGATGA
3327




TTTTCA

ATCCT






6949
6971
CCGTAGGTGGCCTGACT
1559
AATGCCAGTCAGGCCA
3328




GGCATT

CCTA






6959
6981
CCTGACTGGCATTGTAT
1560
TTGCTAATACAATGCC
3329




TAGCAA

AGTC






7027
7049
CCCACTTCCACTATGTC
1561
TGATAGGACATAGTGG
3330




CTATCA

AAGT






7028
7050
CCACTTCCACTATGTCC
1562
TTGATAGGACATAGTG
3331




TATCAA

GAAG






7034
7056
CCACTATGTCCTATCAA
1563
CTCCTATTGATAGGAC
3332




TAGGAG

ATAG






7043
7065
CCTATCAATAGGAGCTG
1564
CAAATACAGCTCCTAT
3333




TATTTG

TGAT






7066
7088
CCATCATAGGAGGCTTC
1565
GTGAATGAAGCCTCCT
3334




ATTCAC

ATGA






7095
7117
CCCCTATTCTCAGGCTA
1566
AGGGTGTAGCCTGAGA
3335




CACCCT

ATAG






7096
7118
CCCTATTCTCAGGCTAC
1567
TAGGGTGTAGCCTGAG
3336




ACCCTA

AATA






7097
7119
CCTATTCTCAGGCTACA
1568
CTAGGGTGTAGCCTGA
3337




CCCTAG

GAAT






7114
7136
CCCTAGACCAAACCTAC
1569
TTTGGCGTAGGTTTGG
3338




GCCAAA

TCTA






7115
7137
CCTAGACCAAACCTACG
1570
TTTTGGCGTAGGTTTG
3339




CCAAAA

GTCT






7121
7143
CCAAACCTACGCCAAAA
1571
AATGGATTTTGGCGTA
3340




TCCATT

GGTT






7126
7148
CCTACGCCAAAATCCAT
1572
AGTGAAATGGATTTTG
3341




TTCACT

GCGT






7132
7154
CCAAAATCCATTTCACT
1573
TATGATAGTGAAATGG
3342




ATCATA

ATTT






7139
7161
CCATTTCACTATCATAT
1574
CGATGAATATGATAGT
3343




TCATCG

GAAA






7181
7203
CCCACAACACTTTCTCG
1575
ATAGGCCGAGAAAGT
3344




GCCTAT

GTTGT






7182
7204
CCACAACACTTTCTCGG
1576
GATAGGCCGAGAAAG
3345




CCTATC

TGTTG






7199
7221
CCTATCCGGAATGCCCC
1577
AACGTCGGGGCATTCC
3346




GACGTT

GGAT






7204
7226
CCGGAATGCCCCGACGT
1578
CGAGTAACGTCGGGGC
3347




TACTCG

ATTC






7212
7234
CCCCGACGTTACTCGGA
1579
GGGTAGTCCGAGTAAC
3348




CTACCC

GTCG






7213
7235
CCCGACGTTACTCGGAC
1580
GGGGTAGTCCGAGTAA
3349




TACCCC

CGTC






7214
7236
CCGACGTTACTCGGACT
1581
CGGGGTAGTCCGAGTA
3350




ACCCCG

ACGT






7232
7254
CCCCGATGCATACACCA
1582
TTCATGTGGTGTATGC
3351




CATGAA

ATCG






7233
7255
CCCGATGCATACACCAC
1583
TTTCATGTGGTGTATG
3352




ATGAAA

CATC






7234
7256
CCGATGCATACACCACA
1584
GTTTCATGTGGTGTAT
3353




TGAAAC

GCAT






7246
7268
CCACATGAAACATCCTA
1585
AGATGATAGGATGTTT
3354




TCATCT

CATG






7259
7281
CCTATCATCTGTAGGCT
1586
TGAATGAGCCTACAGA
3355




CATTCA

TGAT






7327
7349
CCTTCGCTTCGAAGCGA
1587
GACTTTTCGCTTCGAA
3356




AAAGTC

GCGA






7349
7371
CCTAATAGTAGAAGAAC
1588
TGGAGGGTTCTTCTAC
3357




CCTCCA

TATT






7365
7387
CCCTCCATAAACCTGGA
1589
AGTCACTCCAGGTTTA
3358




GTGACT

TGGA






7366
7388
CCTCCATAAACCTGGAG
1590
TAGTCACTCCAGGTTT
3359




TGACTA

ATGG






7369
7391
CCATAAACCTGGAGTGA
1591
ATATAGTCACTCCAGG
3360




CTATAT

TTTA






7376
7398
CCTGGAGTGACTATATG
1592
GGCATCCATATAGTCA
3361




GATGCC

CTCC






7397
7419
CCCCCCACCCTACCACA
1593
CGAATGTGTGGTAGGG
3362




CATTCG

TGGG






7398
7420
CCCCCACCCTACCACAC
1594
TCGAATGTGTGGTAGG
3363




ATTCGA

GTGG






7399
7421
CCCCACCCTACCACACA
1595
TTCGAATGTGTGGTAG
3364




TTCGAA

GGTG






7400
7422
CCCACCCTACCACACAT
1596
CTTCGAATGTGTGGTA
3365




TCGAAG

GGGT






7401
7423
CCACCCTACCACACATT
1597
TCTTCGAATGTGTGGT
3366




CGAAGA

AGGG






7404
7426
CCCTACCACACATTCGA
1598
GGTTCTTCGAATGTGT
3367




AGAACC

GGTA






7405
7427
CCTACCACACATTCGAA
1599
GGGTTCTTCGAATGTG
3368




GAACCC

TGGT






7409
7431
CCACACATTCGAAGAAC
1600
ATACGGGTTCTTCGAA
3369




CCGTAT

TGTG






7425
7447
CCCGTATACATAAAATC
1601
TGTCTAGATTTTATGT
3370




TAGACA

ATAC






7426
7448
CCGTATACATAAAATCT
1602
TTGTCTAGATTTTATGT
3371




AGACAA

ATA






7466
7488
CCCCCCAAAGCTGGTTT
1603
GGCTTGAAACCAGCTT
3372




CAAGCC

TGGG






7467
7489
CCCCCAAAGCTGGTTTC
1604
TGGCTTGAAACCAGCT
3373




AAGCCA

TTGG






7468
7490
CCCCAAAGCTGGTTTCA
1605
TTGGCTTGAAACCAGC
3374




AGCCAA

TTTG






7469
7491
CCCAAAGCTGGTTTCAA
1606
GTTGGCTTGAAACCAG
3375




GCCAAC

CTTT






7470
7492
CCAAAGCTGGTTTCAAG
1607
GGTTGGCTTGAAACCA
3376




CCAACC

GCTT






7487
7509
CCAACCCCATGGCCTCC
1608
AGTCATGGAGGCCATG
3377




ATGACT

GGGT






7491
7513
CCCCATGGCCTCCATGA
1609
AAAAAGTCATGGAGG
3378




CTTTTT

CCATG






7492
7514
CCCATGGCCTCCATGAC
1610
GAAAAAGTCATGGAG
3379




TTTTTC

GCCAT






7493
7515
CCATGGCCTCCATGACT
1611
TGAAAAAGTCATGGA
3380




TTTTCA

GGCCA






7499
7521
CCTCCATGACTTTTTCA
1612
CCTTTTTGAAAAAGTC
3381




AAAAGG

ATGG






7502
7524
CCATGACTTTTTCAAAA
1613
ATACCTTTTTGAAAAA
3382




AGGTAT

GTCA






7533
7555
CCATTTCATAACTTTGT
1614
ACTTTGACAAAGTTAT
3383




CAAAGT

GAAA






7573
7595
CCTATATATCTTAATGG
1615
CATGTGCCATTAAGAT
3384




CACATG

ATAT






7626
7648
CCCCTATCATAGAAGAG
1616
GATAAGCTCTTCTATG
3385




CTTATC

ATAG






7627
7649
CCCTATCATAGAAGAGC
1617
TGATAAGCTCTTCTAT
3386




TTATCA

GATA






7628
7650
CCTATCATAGAAGAGCT
1618
GTGATAAGCTCTTCTA
3387




TATCAC

TGAT






7650
7672
CCTTTCATGATCACGCC
1619
TATGAGGGCGTGATCA
3388




CTCATA

TGAA






7665
7687
CCCTCATAATCATTTTC
1620
GATAAGGAAAATGATT
3389




CTTATC

ATGA






7666
7688
CCTCATAATCATTTTCCT
1621
AGATAAGGAAAATGA
3390




TATCT

TTATG






7681
7703
CCTTATCTGCTTCCTAGT
1622
ACAGGACTAGGAAGC
3391




CCTGT

AGATA






7693
7715
CCTAGTCCTGTATGCCC
1623
GGAAAAGGGCATACA
3392




TTTTCC

GGACT






7699
7721
CCTGTATGCCCTTTTCCT
1624
GTGTTAGGAAAAGGG
3393




AACAC

CATAC






7707
7729
CCCTTTTCCTAACACTC
1625
TGTTGTGAGTGTTAGG
3394




ACAACA

AAAA






7708
7730
CCTTTTCCTAACACTCA
1626
TTGTTGTGAGTGTTAG
3395




CAACAA

GAAA






7714
7736
CCTAACACTCACAACAA
1627
TTAGTTTTGTTGTGAG
3396




AACTAA

TGTT






7773
7795
CCGTCTGAACTATCCTG
1628
GGCGGGCAGGATAGTT
3397




CCCGCC

CAGA






7786
7808
CCTGCCCGCCATCATCC
1629
GGACTAGGATGATGGC
3398




TAGTCC

GGGC






7790
7812
CCCGCCATCATCCTAGT
1630
ATGAGGACTAGGATG
3399




CCTCAT

ATGGC






7791
7813
CCGCCATCATCCTAGTC
1631
GATGAGGACTAGGAT
3400




CTCATC

GATGG






7794
7816
CCATCATCCTAGTCCTC
1632
GGCGATGAGGACTAG
3401




ATCGCC

GATGA






7801
7823
CCTAGTCCTCATCGCCC
1633
ATGGGAGGGCGATGA
3402




TCCCAT

GGACT






7807
7829
CCTCATCGCCCTCCCAT
1634
GTAGGGATGGGAGGG
3403




CCCTAC

CGATG






7815
7837
CCCTCCCATCCCTACGC
1635
AAGGATGCGTAGGGA
3404




ATCCTT

TGGGA






7816
7838
CCTCCCATCCCTACGCA
1636
AAAGGATGCGTAGGG
3405




TCCTTT

ATGGG






7819
7841
CCCATCCCTACGCATCC
1637
TGTAAAGGATGCGTAG
3406




TTTACA

GGAT






7820
7842
CCATCCCTACGCATCCT
1638
ATGTAAAGGATGCGTA
3407




TTACAT

GGGA






7824
7846
CCCTACGCATCCTTTAC
1639
TGTTATGTAAAGGATG
3408




ATAACA

CGTA






7825
7847
CCTACGCATCCTTTACA
1640
CTGTTATGTAAAGGAT
3409




TAACAG

GCGT






7834
7856
CCTTTACATAACAGACG
1641
TGACCTCGTCTGTTAT
3410




AGGTCA

GTAA






7862
7884
CCCTCCCTTACCATCAA
1642
ATTGATTTGATGGTAA
3411




ATCAAT

GGGA






7863
7885
CCTCCCTTACCATCAAA
1643
AATTGATTTGATGGTA
3412




TCAATT

AGGG






7866
7888
CCCTTACCATCAAATCA
1644
GCCAATTGATTTGATG
3413




ATTGGC

GTAA






7867
7889
CCTTACCATCAAATCAA
1645
GGCCAATTGATTTGAT
3414




TTGGCC

GGTA






7872
7894
CCATCAAATCAATTGGC
1646
TTGGTGGCCAATTGAT
3415




CACCAA

TTGA






7888
7910
CCACCAATGGTACTGAA
1647
CGTAGGTTCAGTACCA
3416




CCTACG

TTGG






7891
7913
CCAATGGTACTGAACCT
1648
ACTCGTAGGTTCAGTA
3417




ACGAGT

CCAT






7905
7927
CCTACGAGTACACCGAC
1649
GCCGTAGTCGGTGTAC
3418




TACGGC

TCGT






7917
7939
CCGACTACGGCGGACTA
1650
GAAGATTAGTCCGCCG
3419




ATCTTC

TAGT






7944
7966
CCTACATACTTCCCCCA
1651
GAATAATGGGGGAAG
3420




TTATTC

TATGT






7955
7977
CCCCCATTATTCCTAGA
1652
CCTGGTTCTAGGAATA
3421




ACCAGG

ATGG






7956
7978
CCCCATTATTCCTAGAA
1653
GCCTGGTTCTAGGAAT
3422




CCAGGC

AATG






7957
7979
CCCATTATTCCTAGAAC
1654
CGCCTGGTTCTAGGAA
3423




CAGGCG

TAAT






7958
7980
CCATTATTCCTAGAACC
1655
TCGCCTGGTTCTAGGA
3424




AGGCGA

ATAA






7966
7988
CCTAGAACCAGGCGACC
1656
GTCGCAGGTCGCCTGG
3425




TGCGAC

TTCT






7973
7995
CCAGGCGACCTGCGACT
1657
TCAAGGAGTCGCAGGT
3426




CCTTGA

CGCC






7981
8003
CCTGCGACTCCTTGACG
1658
TGTCAACGTCAAGGAG
3427




TTGACA

TCGC






7990
8012
CCTTGACGTTGACAATC
1659
CTACTCGATTGTCAAC
3428




GAGTAG

GTCA






8017
8039
CCCGATTGAAGCCCCCA
1660
TACGAATGGGGGCTTC
3429




TTCGTA

AATC






8018
8040
CCGATTGAAGCCCCCAT
1661
ATACGAATGGGGGCTT
3430




TCGTAT

CAAT






8028
8050
CCCCCATTCGTATAATA
1662
TGTAATTATTATACGA
3431




ATTACA

ATGG






8029
8051
CCCCATTCGTATAATAA
1663
ATGTAATTATTATACG
3432




TTACAT

AATG






8030
8052
CCCATTCGTATAATAAT
1664
GATGTAATTATTATAC
3433




TACATC

GAAT






8031
8053
CCATTCGTATAATAATT
1665
TGATGTAATTATTATA
3434




ACATCA

CGAA






8080
8102
CCCCACATTAGGCTTAA
1666
CTGTTTTTAAGCCTAA
3435




AAACAG

TGTG






8081
8103
CCCACATTAGGCTTAAA
1667
TCTGTTTTTAAGCCTA
3436




AACAGA

ATGT






8082
8104
CCACATTAGGCTTAAAA
1668
ATCTGTTTTTAAGCCT
3437




ACAGAT

AATG






8111
8133
CCCGGACGTCTAAACCA
1669
GTGGTTTGGTTTAGAC
3438




AACCAC

GTCC






8112
8134
CCGGACGTCTAAACCAA
1670
AGTGGTTTGGTTTAGA
3439




ACCACT

CGTC






8125
8147
CCAAACCACTTTCACCG
1671
GTGTAGCGGTGAAAGT
3440




CTACAC

GGTT






8130
8152
CCACTTTCACCGCTACA
1672
CGGTCGTGTAGCGGTG
3441




CGACCG

AAAG






8139
8161
CCGCTACACGACCGGGG
1673
GTATACCCCCGGTCGT
3442




GTATAC

GTAG






8150
8172
CCGGGGGTATACTACGG
1674
CATTGACCGTAGTATA
3443




TCAATG

CCCC






8194
8216
CCACAGTTTCATGCCCA
1675
GGACGATGGGCATGA
3444




TCGTCC

AACTG






8207
8229
CCCATCGTCCTAGAATT
1676
GGAATTAATTCTAGGA
3445




AATTCC

CGAT






8208
8230
CCATCGTCCTAGAATTA
1677
GGGAATTAATTCTAGG
3446




ATTCCC

ACGA






8215
8237
CCTAGAATTAATTCCCC
1678
TTTTTAGGGGAATTAA
3447




TAAAAA

TTCT






8228
8250
CCCCTAAAAATCTTTGA
1679
CCTATTTCAAAGATTT
3448




AATAGG

TTAG






8229
8251
CCCTAAAAATCTTTGAA
1680
CCCTATTTCAAAGATT
3449




ATAGGG

TTTA






8230
8252
CCTAAAAATCTTTGAAA
1681
GCCCTATTTCAAAGAT
3450




TAGGGC

TTTT






8252
8274
CCCGTATTTACCCTATA
1682
GGGTGCTATAGGGTAA
3451




GCACCC

ATAC






8253
8275
CCGTATTTACCCTATAG
1683
GGGGTGCTATAGGGTA
3452




CACCCC

AATA






8262
8284
CCCTATAGCACCCCCTC
1684
GGGGTAGAGGGGGTG
3453




TACCCC

CTATA






8263
8285
CCTATAGCACCCCCTCT
1685
GGGGGTAGAGGGGGT
3454




ACCCCC

GCTAT






8272
8294
CCCCCTCTACCCCCTCT
1686
GGCTCTAGAGGGGGTA
3455




AGAGCC

GAGG






8273
8295
CCCCTCTACCCCCTCTA
1687
GGGCTCTAGAGGGGGT
3456




GAGCCC

AGAG






8274
8296
CCCTCTACCCCCTCTAG
1688
TGGGCTCTAGAGGGGG
3457




AGCCCA

TAGA






8275
8297
CCTCTACCCCCTCTAGA
1689
GTGGGCTCTAGAGGGG
3458




GCCCAC

GTAG






8281
8303
CCCCCTCTAGAGCCCAC
1690
TTTACAGTGGGCTCTA
3459




TGTAAA

GAGG






8282
8304
CCCCTCTAGAGCCCACT
1691
CTTTACAGTGGGCTCT
3460




GTAAAG

AGAG






8283
8305
CCCTCTAGAGCCCACTG
1692
GCTTTACAGTGGGCTC
3461




TAAAGC

TAGA






8284
8306
CCTCTAGAGCCCACTGT
1693
AGCTTTACAGTGGGCT
3462




AAAGCT

CTAG






8293
8315
CCCACTGTAAAGCTAAC
1694
TGCTAAGTTAGCTTTA
3463




TTAGCA

CAGT






8294
8316
CCACTGTAAAGCTAACT
1695
ATGCTAAGTTAGCTTT
3464




TAGCAT

ACAG






8320
8342
CCTTTTAAGTTAAAGAT
1696
CTCTTAATCTTTAACTT
3465




TAAGAG

AAA






8345
8367
CCAACACCTCTTTACAG
1697
ATTTCACTGTAAAGAG
3466




TGAAAT

GTGT






8351
8373
CCTCTTTACAGTGAAAT
1698
TGGGGCATTTCACTGT
3467




GCCCCA

AAAG






8369
8391
CCCCAACTAAATACTAC
1699
CATACGGTAGTATTTA
3468




CGTATG

GTTG






8370
8392
CCCAACTAAATACTACC
1700
CCATACGGTAGTATTT
3469




GTATGG

AGTT






8371
8393
CCAACTAAATACTACCG
1701
GCCATACGGTAGTATT
3470




TATGGC

TAGT






8385
8407
CCGTATGGCCCACCATA
1702
GGTAATTATGGTGGGC
3471




ATTACC

CATA






8393
8415
CCCACCATAATTACCCC
1703
AGTATGGGGGTAATTA
3472




CATACT

TGGT






8394
8416
CCACCATAATTACCCCC
1704
GAGTATGGGGGTAATT
3473




ATACTC

ATGG






8397
8419
CCATAATTACCCCCATA
1705
AAGGAGTATGGGGGT
3474




CTCCTT

AATTA






8406
8428
CCCCCATACTCCTTACA
1706
GAATAGTGTAAGGAGT
3475




CTATTC

ATGG






8407
8429
CCCCATACTCCTTACAC
1707
GGAATAGTGTAAGGA
3476




TATTCC

GTATG






8408
8430
CCCATACTCCTTACACT
1708
AGGAATAGTGTAAGG
3477




ATTCCT

AGTAT






8409
8431
CCATACTCCTTACACTA
1709
GAGGAATAGTGTAAG
3478




TTCCTC

GAGTA






8416
8438
CCTTACACTATTCCTCA
1710
GGGTGATGAGGAATA
3479




TCACCC

GTGTA






8428
8450
CCTCATCACCCAACTAA
1711
ATATTTTTAGTTGGGT
3480




AAATAT

GATG






8436
8458
CCCAACTAAAAATATTA
1712
TGTGTTTAATATTTTTA
3481




AACACA

GTT






8437
8459
CCAACTAAAAATATTAA
1713
TTGTGTTTAATATTTTT
3482




ACACAA

AGT






8464
8486
CCACCTACCTCCCTCAC
1714
GCTTTGGTGAGGGAGG
3483




CAAAGC

TAGG






8467
8489
CCTACCTCCCTCACCAA
1715
TGGGCTTTGGTGAGGG
3484




AGCCCA

AGGT






8471
8493
CCTCCCTCACCAAAGCC
1716
TTTATGGGCTTTGGTG
3485




CATAAA

AGGG






8474
8496
CCCTCACCAAAGCCCAT
1717
ATTTTTATGGGCTTTG
3486




AAAAAT

GTGA






8475
8497
CCTCACCAAAGCCCATA
1718
TATTTTTATGGGCTTTG
3487




AAAATA

GTG






8480
8502
CCAAAGCCCATAAAAAT
1719
TTTTTTATTTTTATGGG
3488




AAAAAA

CTT






8486
8508
CCCATAAAAATAAAAA
1720
TTATAATTTTTTATTTT
3489




ATTATAA

TAT






8487
8509
CCATAAAAATAAAAAA
1721
GTTATAATTTTTTATTT
3490




TTATAAC

TTA






8513
8535
CCCTGAGAACCAAAATG
1722
TTCGTTCATTTTGGTTC
3491




AACGAA

TCA






8514
8536
CCTGAGAACCAAAATG
1723
TTTCGTTCATTTTGGTT
3492




AACGAAA

CTC






8522
8544
CCAAAATGAACGAAAA
1724
GAACAGATTTTCGTTC
3493




TCTGTTC

ATTT






8558
8580
CCCCCACAATCCTAGGC
1725
GGGTAGGCCTAGGATT
3494




CTACCC

GTGG






8559
8581
CCCCACAATCCTAGGCC
1726
CGGGTAGGCCTAGGAT
3495




TACCCG

TGTG






8560
8582
CCCACAATCCTAGGCCT
1727
GCGGGTAGGCCTAGG
3496




ACCCGC

ATTGT






8561
8583
CCACAATCCTAGGCCTA
1728
GGCGGGTAGGCCTAG
3497




CCCGCC

GATTG






8568
8590
CCTAGGCCTACCCGCCG
1729
GTACTGCGGCGGGTAG
3498




CAGTAC

GCCT






8574
8596
CCTACCCGCCGCAGTAC
1730
TGATCAGTACTGCGGC
3499




TGATCA

GGGT






8578
8600
CCCGCCGCAGTACTGAT
1731
AGAATGATCAGTACTG
3500




CATTCT

CGGC






8579
8601
CCGCCGCAGTACTGATC
1732
TAGAATGATCAGTACT
3501




ATTCTA

GCGG






8582
8604
CCGCAGTACTGATCATT
1733
AAATAGAATGATCAGT
3502




CTATTT

ACTG






8605
8627
CCCCCTCTATTGATCCC
1734
GAGGTGGGGATCAAT
3503




CACCTC

AGAGG






8606
8628
CCCCTCTATTGATCCCC
1735
GGAGGTGGGGATCAA
3504




ACCTCC

TAGAG






8607
8629
CCCTCTATTGATCCCCA
1736
TGGAGGTGGGGATCA
3505




CCTCCA

ATAGA






8608
8630
CCTCTATTGATCCCCAC
1737
TTGGAGGTGGGGATCA
3506




CTCCAA

ATAG






8619
8641
CCCCACCTCCAAATATC
1738
TGATGAGATATTTGGA
3507




TCATCA

GGTG






8620
8642
CCCACCTCCAAATATCT
1739
TTGATGAGATATTTGG
3508




CATCAA

AGGT






8621
8643
CCACCTCCAAATATCTC
1740
GTTGATGAGATATTTG
3509




ATCAAC

GAGG






8624
8646
CCTCCAAATATCTCATC
1741
GTTGTTGATGAGATAT
3510




AACAAC

TTGG






8627
8649
CCAAATATCTCATCAAC
1742
TCGGTTGTTGATGAGA
3511




AACCGA

TATT






8646
8668
CCGACTAATCACCACCC
1743
ATTGTTGGGTGGTGAT
3512




AACAAT

TAGT






8657
8679
CCACCCAACAATGACTA
1744
TTTGATTAGTCATTGTT
3513




ATCAAA

GGG






8660
8682
CCCAACAATGACTAATC
1745
TAGTTTGATTAGTCAT
3514




AAACTA

TGTT






8661
8683
CCAACAATGACTAATCA
1746
TTAGTTTGATTAGTCA
3515




AACTAA

TTGT






8684
8706
CCTCAAAACAAATGATA
1747
TATGGTTATCATTTGTT
3516




ACCATA

TTG






8702
8724
CCATACACAACACTAAA
1748
TCGTCCTTTAGTGTTGT
3517




GGACGA

GTA






8726
8748
CCTGATCTCTTATACTA
1749
GGATACTAGTATAAGA
3518




GTATCC

GATC






8747
8769
CCTTAATCATTTTTATTG
1750
TGTGGCAATAAAAATG
3519




CCACA

ATTA






8765
8787
CCACAACTAACCTCCTC
1751
GAGTCCGAGGAGGTTA
3520




GGACTC

GTTG






8775
8797
CCTCCTCGGACTCCTGC
1752
AGTGAGGCAGGAGTC
3521




CTCACT

CGAGG






8778
8800
CCTCGGACTCCTGCCTC
1753
ATGAGTGAGGCAGGA
3522




ACTCAT

GTCCG






8787
8809
CCTGCCTCACTCATTTA
1754
TTGGTGTAAATGAGTG
3523




CACCAA

AGGC






8791
8813
CCTCACTCATTTACACC
1755
GTGGTTGGTGTAAATG
3524




AACCAC

AGTG






8806
8828
CCAACCACCCAACTATC
1756
TTTATAGATAGTTGGG
3525




TATAAA

TGGT






8810
8832
CCACCCAACTATCTATA
1757
TAGGTTTATAGATAGT
3526




AACCTA

TGGG






8813
8835
CCCAACTATCTATAAAC
1758
GGCTAGGTTTATAGAT
3527




CTAGCC

AGTT






8814
8836
CCAACTATCTATAAACC
1759
TGGCTAGGTTTATAGA
3528




TAGCCA

TAGT






8829
8851
CCTAGCCATGGCCATCC
1760
ATAAGGGGATGGCCAT
3529




CCTTAT

GGCT






8834
8856
CCATGGCCATCCCCTTA
1761
CGCTCATAAGGGGATG
3530




TGAGCG

GCCA






8840
8862
CCATCCCCTTATGAGCG
1762
TGTGCCCGCTCATAAG
3531




GGCACA

GGGA






8844
8866
CCCCTTATGAGCGGGCA
1763
TCACTGTGCCCGCTCA
3532




CAGTGA

TAAG






8845
8867
CCCTTATGAGCGGGCAC
1764
ATCACTGTGCCCGCTC
3533




AGTGAT

ATAA






8846
8868
CCTTATGAGCGGGCACA
1765
AATCACTGTGCCCGCT
3534




GTGATT

CATA






8897
8919
CCCTAGCCCACTTCTTA
1766
TTGTGGTAAGAAGTGG
3535




CCACAA

GCTA






8898
8920
CCTAGCCCACTTCTTAC
1767
CTTGTGGTAAGAAGTG
3536




CACAAG

GGCT






8903
8925
CCCACTTCTTACCACAA
1768
TGTGCCTTGTGGTAAG
3537




GGCACA

AAGT






8904
8926
CCACTTCTTACCACAAG
1769
GTGTGCCTTGTGGTAA
3538




GCACAC

GAAG






8914
8936
CCACAAGGCACACCTAC
1770
AGGGGTGTAGGTGTGC
3539




ACCCCT

CTTG






8926
8948
CCTACACCCCTTATCCC
1771
AGTATGGGGATAAGG
3540




CATACT

GGTGT






8932
8954
CCCCTTATCCCCATACT
1772
ATAACTAGTATGGGGA
3541




AGTTAT

TAAG






8933
8955
CCCTTATCCCCATACTA
1773
AATAACTAGTATGGGG
3542




GTTATT

ATAA






8934
8956
CCTTATCCCCATACTAG
1774
TAATAACTAGTATGGG
3543




TTATTA

GATA






8940
8962
CCCCATACTAGTTATTA
1775
TTTCGATAATAACTAG
3544




TCGAAA

TATG






8941
8963
CCCATACTAGTTATTAT
1776
GTTTCGATAATAACTA
3545




CGAAAC

GTAT






8942
8964
CCATACTAGTTATTATC
1777
GGTTTCGATAATAACT
3546




GAAACC

AGTA






8963
8985
CCATCAGCCTACTCATT
1778
TGGTTGAATGAGTAGG
3547




CAACCA

CTGA






8970
8992
CCTACTCATTCAACCAA
1779
GGGCTATTGGTTGAAT
3548




TAGCCC

GAGT






8983
9005
CCAATAGCCCTGGCCGT
1780
AGGCGTACGGCCAGG
3549




ACGCCT

GCTAT






8990
9012
CCCTGGCCGTACGCCTA
1781
AGCGGTTAGGCGTACG
3550




ACCGCT

GCCA






8991
9013
CCTGGCCGTACGCCTAA
1782
TAGCGGTTAGGCGTAC
3551




CCGCTA

GGCC






8996
9018
CCGTACGCCTAACCGCT
1783
AATGTTAGCGGTTAGG
3552




AACATT

CGTA






9003
9025
CCTAACCGCTAACATTA
1784
CTGCAGTAATGTTAGC
3553




CTGCAG

GGTT






9008
9030
CCGCTAACATTACTGCA
1785
GTGGCCTGCAGTAATG
3554




GGCCAC

TTAG






9027
9049
CCACCTACTCATGCACC
1786
CAATTAGGTGCATGAG
3555




TAATTG

TAGG






9030
9052
CCTACTCATGCACCTAA
1787
TTCCAATTAGGTGCAT
3556




TTGGAA

GAGT






9042
9064
CCTAATTGGAAGCGCCA
1788
CTAGGGTGGCGCTTCC
3557




CCCTAG

AATT






9056
9078
CCACCCTAGCAATATCA
1789
AATGGTTGATATTGCT
3558




ACCATT

AGGG






9059
9081
CCCTAGCAATATCAACC
1790
GTTAATGGTTGATATT
3559




ATTAAC

GCTA






9060
9082
CCTAGCAATATCAACCA
1791
GGTTAATGGTTGATAT
3560




TTAACC

TGCT






9074
9096
CCATTAACCTTCCCTCT
1792
AAGTGTAGAGGGAAG
3561




ACACTT

GTTAA






9081
9103
CCTTCCCTCTACACTTAT
1793
AGATGATAAGTGTAGA
3562




CATCT

GGGA






9085
9107
CCCTCTACACTTATCAT
1794
GTGAAGATGATAAGTG
3563




CTTCAC

TAGA






9086
9108
CCTCTACACTTATCATC
1795
TGTGAAGATGATAAGT
3564




TTCACA

GTAG






9129
9151
CCTAGAAATCGCTGTCG
1796
TTAAGGCGACAGCGAT
3565




CCTTAA

TTCT






9146
9168
CCTTAATCCAAGCCTAC
1797
GAAAACGTAGGCTTGG
3566




GTTTTC

ATTA






9153
9175
CCAAGCCTACGTTTTCA
1798
GAAGTGTGAAAACGT
3567




CACTTC

AGGCT






9158
9180
CCTACGTTTTCACACTT
1799
TACTAGAAGTGTGAAA
3568




CTAGTA

ACGT






9183
9205
CCTCTACCTGCACGACA
1800
ATGTGTTGTCGTGCAG
3569




ACACAT

GTAG






9189
9211
CCTGCACGACAACACAT
1801
GTCATTATGTGTTGTC
3570




AATGAC

GTGC






9211
9233
CCCACCAATCACATGCC
1802
ATGATAGGCATGTGAT
3571




TATCAT

TGGT






9212
9234
CCACCAATCACATGCCT
1803
TATGATAGGCATGTGA
3572




ATCATA

TTGG






9215
9237
CCAATCACATGCCTATC
1804
CTATATGATAGGCATG
3573




ATATAG

TGAT






9226
9248
CCTATCATATAGTAAAA
1805
GCTGGGTTTTACTATA
3574




CCCAGC

TGAT






9243
9265
CCCAGCCCATGACCCCT
1806
CCTGTTAGGGGTCATG
3575




AACAGG

GGCT






9244
9266
CCAGCCCATGACCCCTA
1807
CCCTGTTAGGGGTCAT
3576




ACAGGG

GGGC






9248
9270
CCCATGACCCCTAACAG
1808
GGGCCCCTGTTAGGGG
3577




GGGCCC

TCAT






9249
9271
CCATGACCCCTAACAGG
1809
AGGGCCCCTGTTAGGG
3578




GGCCCT

GTCA






9255
9277
CCCCTAACAGGGGCCCT
1810
GCTGAGAGGGCCCCTG
3579




CTCAGC

TTAG






9256
9278
CCCTAACAGGGGCCCTC
1811
GGCTGAGAGGGCCCCT
3580




TCAGCC

GTTA






9257
9279
CCTAACAGGGGCCCTCT
1812
GGGCTGAGAGGGCCC
3581




CAGCCC

CTGTT






9268
9290
CCCTCTCAGCCCTCCTA
1813
GGTCATTAGGAGGGCT
3582




ATGACC

GAGA






9269
9291
CCTCTCAGCCCTCCTAA
1814
AGGTCATTAGGAGGGC
3583




TGACCT

TGAG






9277
9299
CCCTCCTAATGACCTCC
1815
TAGGCCGGAGGTCATT
3584




GGCCTA

AGGA






9278
9300
CCTCCTAATGACCTCCG
1816
CTAGGCCGGAGGTCAT
3585




GCCTAG

TAGG






9281
9303
CCTAATGACCTCCGGCC
1817
TGGCTAGGCCGGAGGT
3586




TAGCCA

CATT






9289
9311
CCTCCGGCCTAGCCATG
1818
AAATCACATGGCTAGG
3587




TGATTT

CCGG






9292
9314
CCGGCCTAGCCATGTGA
1819
GTGAAATCACATGGCT
3588




TTTCAC

AGGC






9296
9318
CCTAGCCATGTGATTTC
1820
GGAAGTGAAATCACAT
3589




ACTTCC

GGCT






9301
9323
CCATGTGATTTCACTTC
1821
GGAGTGGAAGTGAAA
3590




CACTCC

TCACA






9317
9339
CCACTCCATAACGCTCC
1822
GTATGAGGAGCGTTAT
3591




TCATAC

GGAG






9322
9344
CCATAACGCTCCTCATA
1823
GCCTAGTATGAGGAGC
3592




CTAGGC

GTTA






9332
9354
CCTCATACTAGGCCTAC
1824
TGGTTAGTAGGCCTAG
3593




TAACCA

TATG






9344
9366
CCTACTAACCAACACAC
1825
TGGTTAGTGTGTTGGT
3594




TAACCA

TAGT






9352
9374
CCAACACACTAACCATA
1826
TTGGTATATGGTTAGT
3595




TACCAA

GTGT






9364
9386
CCATATACCAATGATGG
1827
ATCGCGCCATCATTGG
3596




CGCGAT

TATA






9371
9393
CCAATGATGGCGCGATG
1828
GTGTTACATCGCGCCA
3597




TAACAC

TCAT






9407
9429
CCAAGGCCACCACACAC
1829
CAGGTGGTGTGTGGTG
3598




CACCTG

GCCT






9413
9435
CCACCACACACCACCTG
1830
TTTGGACAGGTGGTGT
3599




TCCAAA

GTGG






9416
9438
CCACACACCACCTGTCC
1831
CTTTTTGGACAGGTGG
3600




AAAAAG

TGTG






9423
9445
CCACCTGTCCAAAAAGG
1832
CGAAGGCCTTTTTGGA
3601




CCTTCG

CAGG






9426
9448
CCTGTCCAAAAAGGCCT
1833
TATCGAAGGCCTTTTT
3602




TCGATA

GGAC






9431
9453
CCAAAAAGGCCTTCGAT
1834
TCCCGTATCGAAGGCC
3603




ACGGGA

TTTT






9440
9462
CCTTCGATACGGGATAA
1835
ATAGGATTATCCCGTA
3604




TCCTAT

TCGA






9458
9480
CCTATTTATTACCTCAG
1836
AAACTTCTGAGGTAAT
3605




AAGTTT

AAAT






9469
9491
CCTCAGAAGTTTTTTTCT
1837
TGCGAAGAAAAAAAC
3606




TCGCA

TTCTG






9505
9527
CCTTTTACCACTCCAGC
1838
GGCTAGGCTGGAGTGG
3607




CTAGCC

TAAA






9512
9534
CCACTCCAGCCTAGCCC
1839
GGGTAGGGGCTAGGCT
3608




CTACCC

GGAG






9517
9539
CCAGCCTAGCCCCTACC
1840
TTGGGGGGTAGGGGCT
3609




CCCCAA

AGGC






9521
9543
CCTAGCCCCTACCCCCC
1841
CTAATTGGGGGGTAGG
3610




AATTAG

GGCT






9526
9548
CCCCTACCCCCCAATTA
1842
CCCTCCTAATTGGGGG
3611




GGAGGG

GTAG






9527
9549
CCCTACCCCCCAATTAG
1843
GCCCTCCTAATTGGGG
3612




GAGGGC

GGTA






9528
9550
CCTACCCCCCAATTAGG
1844
TGCCCTCCTAATTGGG
3613




AGGGCA

GGGT






9532
9554
CCCCCCAATTAGGAGGG
1845
CCAGTGCCCTCCTAAT
3614




CACTGG

TGGG






9533
9555
CCCCCAATTAGGAGGGC
1846
GCCAGTGCCCTCCTAA
3615




ACTGGC

TTGG






9534
9556
CCCCAATTAGGAGGGCA
1847
GGCCAGTGCCCTCCTA
3616




CTGGCC

ATTG






9535
9557
CCCAATTAGGAGGGCAC
1848
GGGCCAGTGCCCTCCT
3617




TGGCCC

AATT






9536
9558
CCAATTAGGAGGGCACT
1849
GGGGCCAGTGCCCTCC
3618




GGCCCC

TAAT






9555
9577
CCCCCAACAGGCATCAC
1850
AGCGGGGTGATGCCTG
3619




CCCGCT

TTGG






9556
9578
CCCCAACAGGCATCACC
1851
TAGCGGGGTGATGCCT
3620




CCGCTA

GTTG






9557
9579
CCCAACAGGCATCACCC
1852
TTAGCGGGGTGATGCC
3621




CGCTAA

TGTT






9558
9580
CCAACAGGCATCACCCC
1853
TTTAGCGGGGTGATGC
3622




GCTAAA

CTGT






9571
9593
CCCCGCTAAATCCCCTA
1854
GACTTCTAGGGGATTT
3623




GAAGTC

AGCG






9572
9594
CCCGCTAAATCCCCTAG
1855
GGACTTCTAGGGGATT
3624




AAGTCC

TAGC






9573
9595
CCGCTAAATCCCCTAGA
1856
GGGACTTCTAGGGGAT
3625




AGTCCC

TTAG






9582
9604
CCCCTAGAAGTCCCACT
1857
TTTAGGAGTGGGACTT
3626




CCTAAA

CTAG






9583
9605
CCCTAGAAGTCCCACTC
1858
GTTTAGGAGTGGGACT
3627




CTAAAC

TCTA






9584
9606
CCTAGAAGTCCCACTCC
1859
TGTTTAGGAGTGGGAC
3628




TAAACA

TTCT






9593
9615
CCCACTCCTAAACACAT
1860
ATACGGATGTGTTTAG
3629




CCGTAT

GAGT






9594
9616
CCACTCCTAAACACATC
1861
AATACGGATGTGTTTA
3630




CGTATT

GGAG






9599
9621
CCTAAACACATCCGTAT
1862
CGAGTAATACGGATGT
3631




TACTCG

GTTT






9610
9632
CCGTATTACTCGCATCA
1863
TACTCCTGATGCGAGT
3632




GGAGTA

AATA






9640
9662
CCTGAGCTCACCATAGT
1864
TATTAGACTATGGTGA
3633




CTAATA

GCTC






9650
9672
CCATAGTCTAATAGAAA
1865
GGTTGTTTTCTATTAG
3634




ACAACC

ACTA






9671
9693
CCGAAACCAAATAATTC
1866
GTGCTTGAATTATTTG
3635




AAGCAC

GTTT






9677
9699
CCAAATAATTCAAGCAC
1867
TAAGCAGTGCTTGAAT
3636




TGCTTA

TATT






9727
9749
CCCTCCTACAAGCCTCA
1868
GTACTCTGAGGCTTGT
3637




GAGTAC

AGGA






9728
9750
CCTCCTACAAGCCTCAG
1869
AGTACTCTGAGGCTTG
3638




AGTACT

TAGG






9731
9753
CCTACAAGCCTCAGAGT
1870
CGAAGTACTCTGAGGC
3639




ACTTCG

TTGT






9739
9761
CCTCAGAGTACTTCGAG
1871
GGGAGACTCGAAGTA
3640




TCTCCC

CTCTG






9759
9781
CCCTTCACCATTTCCGA
1872
ATGCCGTCGGAAATGG
3641




CGGCAT

TGAA






9760
9782
CCTTCACCATTTCCGAC
1873
GATGCCGTCGGAAATG
3642




GGCATC

GTGA






9766
9788
CCATTTCCGACGGCATC
1874
GCCGTAGATGCCGTCG
3643




TACGGC

GAAA






9772
9794
CCGACGGCATCTACGGC
1875
TGTTGAGCCGTAGATG
3644




TCAACA

CCGT






9805
9827
CCACAGGCTTCCACGGA
1876
GTGAAGTCCGTGGAAG
3645




CTTCAC

CCTG






9815
9837
CCACGGACTTCACGTCA
1877
CAATAATGACGTGAAG
3646




TTATTG

TCCG






9848
9870
CCTCACTATCTGCTTCA
1878
GGCGGATGAAGCAGA
3647




TCCGCC

TAGTG






9866
9888
CCGCCAACTAATATTTC
1879
TAAAGTGAAATATTAG
3648




ACTTTA

TTGG






9869
9891
CCAACTAATATTTCACT
1880
ATGTAAAGTGAAATAT
3649




TTACAT

TAGT






9892
9914
CCAAACATCACTTTGGC
1881
TTCGAAGCCAAAGTGA
3650




TTCGAA

TGTT






9916
9938
CCGCCGCCTGATACTGG
1882
AAAATGCCAGTATCAG
3651




CATTTT

GCGG






9919
9941
CCGCCTGATACTGGCAT
1883
TACAAAATGCCAGTAT
3652




TTTGTA

CAGG






9922
9944
CCTGATACTGGCATTTT
1884
ATCTACAAAATGCCAG
3653




GTAGAT

TATC






9970
9992
CCATCTATTGATGAGGG
1885
GTAAGACCCTCATCAA
3654




TCTTAC

TAGA






10012
10034
CCGTTAACTTCCAATTA
1886
ACTAGTTAATTGGAAG
3655




ACTAGT

TTAA






10022
10044
CCAATTAACTAGTTTTG
1887
TGTTGTCAAAACTAGT
3656




ACAACA

TAAT






10069
10091
CCTTAATTTTAATAATC
1888
GGTGTTGATTATTAAA
3657




AACACC

ATTA






10090
10112
CCCTCCTAGCCTTACTA
1889
TATTAGTAGTAAGGCT
3658




CTAATA

AGGA






10091
10113
CCTCCTAGCCTTACTAC
1890
TTATTAGTAGTAAGGC
3659




TAATAA

TAGG






10094
10116
CCTAGCCTTACTACTAA
1891
TAATTATTAGTAGTAA
3660




TAATTA

GGCT






10099
10121
CCTTACTACTAATAATT
1892
TGTAATAATTATTAGT
3661




ATTACA

AGTA






10131
10153
CCACAACTCAACGGCTA
1893
TCTATGTAGCCGTTGA
3662




CATAGA

GTTG






10159
10181
CCACCCCTTACGAGTGC
1894
GAAGCCGCACTCGTAA
3663




GGCTTC

GGGG






10162
10184
CCCCTTACGAGTGCGGC
1895
GTCGAAGCCGCACTCG
3664




TTCGAC

TAAG






10163
10185
CCCTTACGAGTGCGGCT
1896
GGTCGAAGCCGCACTC
3665




TCGACC

GTAA






10164
10186
CCTTACGAGTGCGGCTT
1897
GGGTCGAAGCCGCACT
3666




CGACCC

CGTA






10184
10206
CCCTATATCCCCCGCCC
1898
GGACGCGGGCGGGGG
3667




GCGTCC

ATATA






10185
10207
CCTATATCCCCCGCCCG
1899
GGGACGCGGGCGGGG
3668




CGTCCC

GATAT






10192
10214
CCCCCGCCCGCGTCCCT
1900
GGAGAAAGGGACGCG
3669




TTCTCC

GGCGG






10193
10215
CCCCGCCCGCGTCCCTT
1901
TGGAGAAAGGGACGC
3670




TCTCCA

GGGCG






10194
10216
CCCGCCCGCGTCCCTTT
1902
ATGGAGAAAGGGACG
3671




CTCCAT

CGGGC






10195
10217
CCGCCCGCGTCCCTTTC
1903
TATGGAGAAAGGGAC
3672




TCCATA

GCGGG






10198
10220
CCCGCGTCCCTTTCTCC
1904
TTTTATGGAGAAAGGG
3673




ATAAAA

ACGC






10199
10221
CCGCGTCCCTTTCTCCA
1905
ATTTTATGGAGAAAGG
3674




TAAAAT

GACG






10205
10227
CCCTTTCTCCATAAAAT
1906
AGAAGAATTTTATGGA
3675




TCTTCT

GAAA






10206
10228
CCTTTCTCCATAAAATT
1907
AAGAAGAATTTTATGG
3676




CTTCTT

AGAA






10213
10235
CCATAAAATTCTTCTTA
1908
AGCTACTAAGAAGAAT
3677




GTAGCT

TTTA






10240
10262
CCTTCTTATTATTTGATC
1909
TTCTAGATCAAATAAT
3678




TAGAA

AAGA






10267
10289
CCCTCCTTTTACCCCTAC
1910
TCATGGTAGGGGTAAA
3679




CATGA

AGGA






10268
10290
CCTCCTTTTACCCCTACC
1911
CTCATGGTAGGGGTAA
3680




ATGAG

AAGG






10271
10293
CCTTTTACCCCTACCAT
1912
GGGCTCATGGTAGGGG
3681




GAGCCC

TAAA






10278
10300
CCCCTACCATGAGCCCT
1913
GTTTGTAGGGCTCATG
3682




ACAAAC

GTAG






10279
10301
CCCTACCATGAGCCCTA
1914
TGTTTGTAGGGCTCAT
3683




CAAACA

GGTA






10280
10302
CCTACCATGAGCCCTAC
1915
TTGTTTGTAGGGCTCA
3684




AAACAA

TGGT






10284
10306
CCATGAGCCCTACAAAC
1916
TTAGTTGTTTGTAGGG
3685




AACTAA

CTCA






10291
10313
CCCTACAAACAACTAAC
1917
TGGCAGGTTAGTTGTT
3686




CTGCCA

TGTA






10292
10314
CCTACAAACAACTAACC
1918
GTGGCAGGTTAGTTGT
3687




TGCCAC

TTGT






10307
10329
CCTGCCACTAATAGTTA
1919
ATGACATAACTATTAG
3688




TGTCAT

TGGC






10311
10333
CCACTAATAGTTATGTC
1920
AGGGATGACATAACTA
3689




ATCCCT

TTAG






10330
10352
CCCTCTTATTAATCATC
1921
TAGGATGATGATTAAT
3690




ATCCTA

AAGA






10331
10353
CCTCTTATTAATCATCA
1922
CTAGGATGATGATTAA
3691




TCCTAG

TAAG






10349
10371
CCTAGCCCTAAGTCTGG
1923
CATAGGCCAGACTTAG
3692




CCTATG

GGCT






10354
10376
CCCTAAGTCTGGCCTAT
1924
TCACTCATAGGCCAGA
3693




GAGTGA

CTTA






10355
10377
CCTAAGTCTGGCCTATG
1925
GTCACTCATAGGCCAG
3694




AGTGAC

ACTT






10366
10388
CCTATGAGTGACTACAA
1926
TCCTTTTTGTAGTCACT
3695




AAAGGA

CAT






10399
10421
CCGAATTGGTATATAGT
1927
GTTTAAACTATATACC
3696




TTAAAC

AATT






10466
10488
CCAAATGCCCCTCATTT
1928
TTATGTAAATGAGGGG
3697




ACATAA

CATT






10473
10495
CCCCTCATTTACATAAA
1929
ATAATATTTATGTAAA
3698




TATTAT

TGAG






10474
10496
CCCTCATTTACATAAAT
1930
TATAATATTTATGTAA
3699




ATTATA

ATGA






10475
10497
CCTCATTTACATAAATA
1931
GTATAATATTTATGTA
3700




TTATAC

AATG






10507
10529
CCATCTCACTTCTAGGA
1932
TAGTATTCCTAGAAGT
3701




ATACTA

GAGA






10544
10566
CCTCATATCCTCCCTAC
1933
GGCATAGTAGGGAGG
3702




TATGCC

ATATG






10552
10574
CCTCCCTACTATGCCTA
1934
TCCTTCTAGGCATAGT
3703




GAAGGA

AGGG






10555
10577
CCCTACTATGCCTAGAA
1935
TATTCCTTCTAGGCAT
3704




GGAATA

AGTA






10556
10578
CCTACTATGCCTAGAAG
1936
TTATTCCTTCTAGGCA
3705




GAATAA

TAGT






10565
10587
CCTAGAAGGAATAATAC
1937
GCGATAGTATTATTCC
3706




TATCGC

TTCT






10612
10634
CCCTCAACACCCACTCC
1938
TAAGAGGGAGTGGGT
3707




CTCTTA

GTTGA






10613
10635
CCTCAACACCCACTCCC
1939
CTAAGAGGGAGTGGG
3708




TCTTAG

TGTTG






10621
10643
CCCACTCCCTCTTAGCC
1940
AATATTGGCTAAGAGG
3709




AATATT

GAGT






10622
10644
CCACTCCCTCTTAGCCA
1941
CAATATTGGCTAAGAG
3710




ATATTG

GGAG






10627
10649
CCCTCTTAGCCAATATT
1942
AGGCACAATATTGGCT
3711




GTGCCT

AAGA






10628
10650
CCTCTTAGCCAATATTG
1943
TAGGCACAATATTGGC
3712




TGCCTA

TAAG






10636
10658
CCAATATTGTGCCTATT
1944
TATGGCAATAGGCACA
3713




GCCATA

ATAT






10647
10669
CCTATTGCCATACTAGT
1945
GCAAAGACTAGTATGG
3714




CTTTGC

CAAT






10654
10676
CCATACTAGTCTTTGCC
1946
GCAGGCGGCAAAGAC
3715




GCCTGC

TAGTA






10669
10691
CCGCCTGCGAAGCAGCG
1947
GCCCACCGCTGCTTCG
3716




GTGGGC

CAGG






10672
10694
CCTGCGAAGCAGCGGTG
1948
TAGGCCCACCGCTGCT
3717




GGCCTA

TCGC






10691
10713
CCTAGCCCTACTAGTCT
1949
AGATTGAGACTAGTAG
3718




CAATCT

GGCT






10696
10718
CCCTACTAGTCTCAATC
1950
GTTGGAGATTGAGACT
3719




TCCAAC

AGTA






10697
10719
CCTACTAGTCTCAATCT
1951
TGTTGGAGATTGAGAC
3720




CCAACA

TAGT






10714
10736
CCAACACATATGGCCTA
1952
GTAGTCTAGGCCATAT
3721




GACTAC

GTGT






10727
10749
CCTAGACTACGTACATA
1953
TTAGGTTATGTACGTA
3722




ACCTAA

GTCT






10745
10767
CCTAAACCTACTCCAAT
1954
TTTAGCATTGGAGTAG
3723




GCTAAA

GTTT






10751
10773
CCTACTCCAATGCTAAA
1955
ATTAGTTTTAGCATTG
3724




ACTAAT

GAGT






10757
10779
CCAATGCTAAAACTAAT
1956
GGGACGATTAGTTTTA
3725




CGTCCC

GCAT






10777
10799
CCCAACAATTATATTAC
1957
GTGGTAGTAATATAAT
3726




TACCAC

TGTT






10778
10800
CCAACAATTATATTACT
1958
AGTGGTAGTAATATAA
3727




ACCACT

TTGT






10796
10818
CCACTGACATGACTTTC
1959
TTTTTGGAAAGTCATG
3728




CAAAAA

TCAG






10812
10834
CCAAAAAACACATAATT
1960
GATTCAAATTATGTGT
3729




TGAATC

TTTT






10842
10864
CCACCCACAGCCTAATT
1961
GCTAATAATTAGGCTG
3730




ATTAGC

TGGG






10845
10867
CCCACAGCCTAATTATT
1962
GATGCTAATAATTAGG
3731




AGCATC

CTGT






10846
10868
CCACAGCCTAATTATTA
1963
TGATGCTAATAATTAG
3732




GCATCA

GCTG






10852
10874
CCTAATTATTAGCATCA
1964
GAGGGATGATGCTAAT
3733




TCCCTC

AATT






10870
10892
CCCTCTACTATTTTTTAA
1965
TTTGGTTAAAAAATAG
3734




CCAAA

TAGA






10871
10893
CCTCTACTATTTTTTAAC
1966
ATTTGGTTAAAAAATA
3735




CAAAT

GTAG






10888
10910
CCAAATCAACAACAACC
1967
TAAATAGGTTGTTGTT
3736




TATTTA

GATT






10903
10925
CCTATTTAGCTGTTCCC
1968
AGGTTGGGGAACAGCT
3737




CAACCT

AAAT






10917
10939
CCCCAACCTTTTCCTCC
1969
GGGGTCGGAGGAAAA
3738




GACCCC

GGTTG






10918
10940
CCCAACCTTTTCCTCCG
1970
GGGGGTCGGAGGAAA
3739




ACCCCC

AGGTT






10919
10941
CCAACCTTTTCCTCCGA
1971
AGGGGGTCGGAGGAA
3740




CCCCCT

AAGGT






10923
10945
CCTTTTCCTCCGACCCC
1972
TGTTAGGGGGTCGGAG
3741




CTAACA

GAAA






10929
10951
CCTCCGACCCCCTAACA
1973
GGGGGTTGTTAGGGGG
3742




ACCCCC

TCGG






10932
10954
CCGACCCCCTAACAACC
1974
GAGGGGGGTTGTTAGG
3743




CCCCTC

GGGT






10936
10958
CCCCCTAACAACCCCCC
1975
TTAGGAGGGGGGTTGT
3744




TCCTAA

TAGG






10937
10959
CCCCTAACAACCCCCCT
1976
ATTAGGAGGGGGGTTG
3745




CCTAAT

TTAG






10938
10960
CCCTAACAACCCCCCTC
1977
TATTAGGAGGGGGGTT
3746




CTAATA

GTTA






10939
10961
CCTAACAACCCCCCTCC
1978
GTATTAGGAGGGGGGT
3747




TAATAC

TGTT






10947
10969
CCCCCCTCCTAATACTA
1979
GGTAGTTAGTATTAGG
3748




ACTACC

AGGG






10948
10970
CCCCCTCCTAATACTAA
1980
AGGTAGTTAGTATTAG
3749




CTACCT

GAGG






10949
10971
CCCCTCCTAATACTAAC
1981
CAGGTAGTTAGTATTA
3750




TACCTG

GGAG






10950
10972
CCCTCCTAATACTAACT
1982
TCAGGTAGTTAGTATT
3751




ACCTGA

AGGA






10951
10973
CCTCCTAATACTAACTA
1983
GTCAGGTAGTTAGTAT
3752




CCTGAC

TAGG






10954
10976
CCTAATACTAACTACCT
1984
GGAGTCAGGTAGTTAG
3753




GACTCC

TATT






10968
10990
CCTGACTCCTACCCCTC
1985
GATTGTGAGGGGTAGG
3754




ACAATC

AGTC






10975
10997
CCTACCCCTCACAATCA
1986
TTGCCATGATTGTGAG
3755




TGGCAA

GGGT






10979
11001
CCCCTCACAATCATGGC
1987
TGGCTTGCCATGATTG
3756




AAGCCA

TGAG






10980
11002
CCCTCACAATCATGGCA
1988
TTGGCTTGCCATGATT
3757




AGCCAA

GTGA






10981
11003
CCTCACAATCATGGCAA
1989
GTTGGCTTGCCATGAT
3758




GCCAAC

TGTG






10999
11021
CCAACGCCACTTATCCA
1990
GTTCACTGGATAAGTG
3759




GTGAAC

GCGT






11005
11027
CCACTTATCCAGTGAAC
1991
ATAGTGGTTCACTGGA
3760




CACTAT

TAAG






11013
11035
CCAGTGAACCACTATCA
1992
TTTTCGTGATAGTGGT
3761




CGAAAA

TCAC






11021
11043
CCACTATCACGAAAAAA
1993
TAGAGTTTTTTTCGTG
3762




ACTCTA

ATAG






11044
11066
CCTCTCTATACTAATCT
1994
GTAGGGAGATTAGTAT
3763




CCCTAC

AGAG






11061
11083
CCCTACAAATCTCCTTA
1995
TATAATTAAGGAGATT
3764




ATTATA

TGTA






11062
11084
CCTACAAATCTCCTTAA
1996
TTATAATTAAGGAGAT
3765




TTATAA

TTGT






11073
11095
CCTTAATTATAACATTC
1997
GGCTGTGAATGTTATA
3766




ACAGCC

ATTA






11094
11116
CCACAGAACTAATCATA
1998
ATAAAATATGATTAGT
3767




TTTTAT

TCTG






11130
11152
CCACACTTATCCCCACC
1999
AGCCAAGGTGGGGAT
3768




TTGGCT

AAGTG






11140
11162
CCCCACCTTGGCTATCA
2000
GGGTGATGATAGCCAA
3769




TCACCC

GGTG






11141
11163
CCCACCTTGGCTATCAT
2001
CGGGTGATGATAGCCA
3770




CACCCG

AGGT






11142
11164
CCACCTTGGCTATCATC
2002
TCGGGTGATGATAGCC
3771




ACCCGA

AAGG






11145
11167
CCTTGGCTATCATCACC
2003
TCATCGGGTGATGATA
3772




CGATGA

GCCA






11160
11182
CCCGATGAGGCAACCA
2004
TTCTGGCTGGTTGCCT
3773




GCCAGAA

CATC






11161
11183
CCGATGAGGCAACCAG
2005
GTTCTGGCTGGTTGCC
3774




CCAGAAC

TCAT






11173
11195
CCAGCCAGAACGCCTGA
2006
CTGCGTTCAGGCGTTC
3775




ACGCAG

TGGC






11177
11199
CCAGAACGCCTGAACGC
2007
GTGCCTGCGTTCAGGC
3776




AGGCAC

GTTC






11185
11207
CCTGAACGCAGGCACAT
2008
GGAAGTATGTGCCTGC
3777




ACTTCC

GTTC






11206
11228
CCTATTCTACACCCTAG
2009
AGCCTACTAGGGTGTA
3778




TAGGCT

GAAT






11217
11239
CCCTAGTAGGCTCCCTT
2010
TAGGGGAAGGGAGCC
3779




CCCCTA

TACTA






11218
11240
CCTAGTAGGCTCCCTTC
2011
GTAGGGGAAGGGAGC
3780




CCCTAC

CTACT






11229
11251
CCCTTCCCCTACTCATC
2012
TAGTGCGATGAGTAGG
3781




GCACTA

GGAA






11230
11252
CCTTCCCCTACTCATCG
2013
TTAGTGCGATGAGTAG
3782




CACTAA

GGGA






11234
11256
CCCCTACTCATCGCACT
2014
TAAATTAGTGCGATGA
3783




AATTTA

GTAG






11235
11257
CCCTACTCATCGCACTA
2015
GTAAATTAGTGCGATG
3784




ATTTAC

AGTA






11236
11258
CCTACTCATCGCACTAA
2016
TGTAAATTAGTGCGAT
3785




TTTACA

GAGT






11268
11290
CCCTAGGCTCACTAAAC
2017
TAGAATGTTTAGTGAG
3786




ATTCTA

CCTA






11269
11291
CCTAGGCTCACTAAACA
2018
GTAGAATGTTTAGTGA
3787




TTCTAC

GCCT






11307
11329
CCCAAGAACTATCAAAC
2019
TCAGGAGTTTGATAGT
3788




TCCTGA

TCTT






11308
11330
CCAAGAACTATCAAACT
2020
CTCAGGAGTTTGATAG
3789




CCTGAG

TTCT






11325
11347
CCTGAGCCAACAACTTA
2021
TCATATTAAGTTGTTG
3790




ATATGA

GCTC






11331
11353
CCAACAACTTAATATGA
2022
AGCTAGTCATATTAAG
3791




CTAGCT

TTGT






11381
11403
CCTCTTTACGGACTCCA
2023
CATAAGTGGAGTCCGT
3792




CTTATG

AAAG






11395
11417
CCACTTATGACTCCCTA
2024
GGGCTTTAGGGAGTCA
3793




AAGCCC

TAAG






11407
11429
CCCTAAAGCCCATGTCG
2025
GGGCTTCGACATGGGC
3794




AAGCCC

TTTA






11408
11430
CCTAAAGCCCATGTCGA
2026
GGGGCTTCGACATGGG
3795




AGCCCC

CTTT






11415
11437
CCCATGTCGAAGCCCCC
2027
AGCGATGGGGGCTTCG
3796




ATCGCT

ACAT






11416
11438
CCATGTCGAAGCCCCCA
2028
CAGCGATGGGGGCTTC
3797




TCGCTG

GACA






11427
11449
CCCCCATCGCTGGGTCA
2029
TACTATTGACCCAGCG
3798




ATAGTA

ATGG






11428
11450
CCCCATCGCTGGGTCAA
2030
GTACTATTGACCCAGC
3799




TAGTAC

GATG






11429
11451
CCCATCGCTGGGTCAAT
2031
AGTACTATTGACCCAG
3800




AGTACT

CGAT






11430
11452
CCATCGCTGGGTCAATA
2032
AAGTACTATTGACCCA
3801




GTACTT

GCGA






11454
11476
CCGCAGTACTCTTAAAA
2033
GCCTAGTTTTAAGAGT
3802




CTAGGC

ACTG






11494
11516
CCTCACACTCATTCTCA
2034
GGGGGTTGAGAATGA
3803




ACCCCC

GTGTG






11512
11534
CCCCCTGACAAAACACA
2035
AGGCTATGTGTTTTGT
3804




TAGCCT

CAGG






11513
11535
CCCCTGACAAAACACAT
2036
TAGGCTATGTGTTTTG
3805




AGCCTA

TCAG






11514
11536
CCCTGACAAAACACATA
2037
GTAGGCTATGTGTTTT
3806




GCCTAC

GTCA






11515
11537
CCTGACAAAACACATAG
2038
GGTAGGCTATGTGTTT
3807




CCTACC

TGTC






11532
11554
CCTACCCCTTCCTTGTA
2039
GGATAGTACAAGGAA
3808




CTATCC

GGGGT






11536
11558
CCCCTTCCTTGTACTATC
2040
ATAGGGATAGTACAA
3809




CCTAT

GGAAG






11537
11559
CCCTTCCTTGTACTATCC
2041
CATAGGGATAGTACAA
3810




CTATG

GGAA






11538
11560
CCTTCCTTGTACTATCCC
2042
TCATAGGGATAGTACA
3811




TATGA

AGGA






11542
11564
CCTTGTACTATCCCTAT
2043
TGCCTCATAGGGATAG
3812




GAGGCA

TACA






11553
11575
CCCTATGAGGCATAATT
2044
TGTTATAATTATGCCT
3813




ATAACA

CATA






11554
11576
CCTATGAGGCATAATTA
2045
TTGTTATAATTATGCC
3814




TAACAA

TCAT






11580
11602
CCATCTGCCTACGACAA
2046
GTCTGTTTGTCGTAGG
3815




ACAGAC

CAGA






11587
11609
CCTACGACAAACAGACC
2047
ATTTTAGGTCTGTTTGT
3816




TAAAAT

CGT






11602
11624
CCTAAAATCGCTCATTG
2048
AGTATGCAATGAGCGA
3817




CATACT

TTTT






11635
11657
CCACATAGCCCTCGTAG
2049
CTGTTACTACGAGGGC
3818




TAACAG

TATG






11643
11665
CCCTCGTAGTAACAGCC
2050
GAGAATGGCTGTTACT
3819




ATTCTC

ACGA






11644
11666
CCTCGTAGTAACAGCCA
2051
TGAGAATGGCTGTTAC
3820




TTCTCA

TACG






11658
11680
CCATTCTCATCCAAACC
2052
TCAGGGGGTTTGGATG
3821




CCCTGA

AGAA






11668
11690
CCAAACCCCCTGAAGCT
2053
CGGTGAAGCTTCAGGG
3822




TCACCG

GGTT






11673
11695
CCCCCTGAAGCTTCACC
2054
TGCGCCGGTGAAGCTT
3823




GGCGCA

CAGG






11674
11696
CCCCTGAAGCTTCACCG
2055
CTGCGCCGGTGAAGCT
3824




GCGCAG

TCAG






11675
11697
CCCTGAAGCTTCACCGG
2056
ACTGCGCCGGTGAAGC
3825




CGCAGT

TTCA






11676
11698
CCTGAAGCTTCACCGGC
2057
GACTGCGCCGGTGAAG
3826




GCAGTC

CTTC






11688
11710
CCGGCGCAGTCATTCTC
2058
GATTATGAGAATGACT
3827




ATAATC

GCGC






11712
11734
CCCACGGGCTTACATCC
2059
TAATGAGGATGTAAGC
3828




TCATTA

CCGT






11713
11735
CCACGGGCTTACATCCT
2060
GTAATGAGGATGTAAG
3829




CATTAC

CCCG






11727
11749
CCTCATTACTATTCTGC
2061
TGCTAGGCAGAATAGT
3830




CTAGCA

AATG






11743
11765
CCTAGCAAACTCAAACT
2062
GTTCGTAGTTTGAGTT
3831




ACGAAC

TGCT






11788
11810
CCTCTCTCAAGGACTTC
2063
GAGTTTGAAGTCCTTG
3832




AAACTC

AGAG






11815
11837
CCCACTAATAGCTTTTT
2064
GTCATCAAAAAGCTAT
3833




GATGAC

TAGT






11816
11838
CCACTAATAGCTTTTTG
2065
AGTCATCAAAAAGCTA
3834




ATGACT

TTAG






11848
11870
CCTCGCTAACCTCGCCT
2066
GGGGTAAGGCGAGGT
3835




TACCCC

TAGCG






11857
11879
CCTCGCCTTACCCCCCA
2067
TAATAGTGGGGGGTAA
3836




CTATTA

GGCG






11862
11884
CCTTACCCCCCACTATT
2068
TAGGTTAATAGTGGGG
3837




AACCTA

GGTA






11867
11889
CCCCCCACTATTAACCT
2069
CCCAGTAGGTTAATAG
3838




ACTGGG

TGGG






11868
11890
CCCCCACTATTAACCTA
2070
TCCCAGTAGGTTAATA
3839




CTGGGA

GTGG






11869
11891
CCCCACTATTAACCTAC
2071
CTCCCAGTAGGTTAAT
3840




TGGGAG

AGTG






11870
11892
CCCACTATTAACCTACT
2072
TCTCCCAGTAGGTTAA
3841




GGGAGA

TAGT






11871
11893
CCACTATTAACCTACTG
2073
TTCTCCCAGTAGGTTA
3842




GGAGAA

ATAG






11881
11903
CCTACTGGGAGAACTCT
2074
GCACAGAGAGTTCTCC
3843




CTGTGC

CAGT






11910
11932
CCACGTTCTCCTGATCA
2075
GATATTTGATCAGGAG
3844




AATATC

AACG






11919
11941
CCTGATCAAATATCACT
2076
TAGGAGAGTGATATTT
3845




CTCCTA

GATC






11938
11960
CCTACTTACAGGACTCA
2077
GTATGTTGAGTCCTGT
3846




ACATAC

AAGT






11970
11992
CCCTATACTCCCTCTAC
2078
AAATATGTAGAGGGA
3847




ATATTT

GTATA






11971
11993
CCTATACTCCCTCTACA
2079
TAAATATGTAGAGGGA
3848




TATTTA

GTAT






11979
12001
CCCTCTACATATTTACC
2080
TGTTGTGGTAAATATG
3849




ACAACA

TAGA






11980
12002
CCTCTACATATTTACCA
2081
GTGTTGTGGTAAATAT
3850




CAACAC

GTAG






11994
12016
CCACAACACAATGGGG
2082
GAGTGAGCCCCATTGT
3851




CTCACTC

GTTG






12018
12040
CCCACCACATTAACAAC
2083
TTTTATGTTGTTAATGT
3852




ATAAAA

GGT






12019
12041
CCACCACATTAACAACA
2084
GTTTTATGTTGTTAAT
3853




TAAAAC

GTGG






12022
12044
CCACATTAACAACATAA
2085
AGGGTTTTATGTTGTT
3854




AACCCT

AATG






12041
12063
CCCTCATTCACACGAGA
2086
GTGTTTTCTCGTGTGA
3855




AAACAC

ATGA






12042
12064
CCTCATTCACACGAGAA
2087
GGTGTTTTCTCGTGTG
3856




AACACC

AATG






12063
12085
CCCTCATGTTCATACAC
2088
GGATAGGTGTATGAAC
3857




CTATCC

ATGA






12064
12086
CCTCATGTTCATACACC
2089
GGGATAGGTGTATGAA
3858




TATCCC

CATG






12079
12101
CCTATCCCCCATTCTCCT
2090
ATAGGAGGAGAATGG
3859




CCTAT

GGGAT






12084
12106
CCCCCATTCTCCTCCTAT
2091
GAGGGATAGGAGGAG
3860




CCCTC

AATGG






12085
12107
CCCCATTCTCCTCCTATC
2092
TGAGGGATAGGAGGA
3861




CCTCA

GAATG






12086
12108
CCCATTCTCCTCCTATCC
2093
TTGAGGGATAGGAGG
3862




CTCAA

AGAAT






12087
12109
CCATTCTCCTCCTATCCC
2094
GTTGAGGGATAGGAG
3863




TCAAC

GAGAA






12094
12116
CCTCCTATCCCTCAACC
2095
TGTCGGGGTTGAGGGA
3864




CCGACA

TAGG






12097
12119
CCTATCCCTCAACCCCG
2096
TGATGTCGGGGTTGAG
3865




ACATCA

GGAT






12102
12124
CCCTCAACCCCGACATC
2097
GGTAATGATGTCGGGG
3866




ATTACC

TTGA






12103
12125
CCTCAACCCCGACATCA
2098
CGGTAATGATGTCGGG
3867




TTACCG

GTTG






12109
12131
CCCCGACATCATTACCG
2099
AAAACCCGGTAATGAT
3868




GGTTTT

GTCG






12110
12132
CCCGACATCATTACCGG
2100
GAAAACCCGGTAATG
3869




GTTTTC

ATGTC






12111
12133
CCGACATCATTACCGGG
2101
GGAAAACCCGGTAAT
3870




TTTTCC

GATGT






12123
12145
CCGGGTTTTCCTCTTGT
2102
ATATTTACAAGAGGAA
3871




AAATAT

AACC






12132
12154
CCTCTTGTAAATATAGT
2103
GGTTAAACTATATTTA
3872




TTAACC

CAAG






12153
12175
CCAAAACATCAGATTGT
2104
AGATTCACAATCTGAT
3873




GAATCT

GTTT






12194
12216
CCCCTTATTTACCGAGA
2105
GAGCTTTCTCGGTAAA
3874




AAGCTC

TAAG






12195
12217
CCCTTATTTACCGAGAA
2106
TGAGCTTTCTCGGTAA
3875




AGCTCA

ATAA






12196
12218
CCTTATTTACCGAGAAA
2107
GTGAGCTTTCTCGGTA
3876




GCTCAC

AATA






12205
12227
CCGAGAAAGCTCACAA
2108
GCAGTTCTTGTGAGCT
3877




GAACTGC

TTCT






12237
12259
CCCCCATGTCTAACAAC
2109
AGCCATGTTGTTAGAC
3878




ATGGCT

ATGG






12238
12260
CCCCATGTCTAACAACA
2110
AAGCCATGTTGTTAGA
3879




TGGCTT

CATG






12239
12261
CCCATGTCTAACAACAT
2111
AAAGCCATGTTGTTAG
3880




GGCTTT

ACAT






12240
12262
CCATGTCTAACAACATG
2112
GAAAGCCATGTTGTTA
3881




GCTTTC

GACA






12288
12310
CCATTGGTCTTAGGCCC
2113
TTTTTGGGGCCTAAGA
3882




CAAAAA

CCAA






12302
12324
CCCCAAAAATTTTGGTG
2114
GAGTTGCACCAAAATT
3883




CAACTC

TTTG






12303
12325
CCCAAAAATTTTGGTGC
2115
GGAGTTGCACCAAAAT
3884




AACTCC

TTTT






12304
12326
CCAAAAATTTTGGTGCA
2116
TGGAGTTGCACCAAAA
3885




ACTCCA

TTTT






12324
12346
CCAAATAAAAGTAATA
2117
GCATGGTTATTACTTT
3886




ACCATGC

TATT






12341
12363
CCATGCACACTACTATA
2118
GGTGGTTATAGTAGTG
3887




ACCACC

TGCA






12359
12381
CCACCCTAACCCTGACT
2119
TAGGGAAGTCAGGGTT
3888




TCCCTA

AGGG






12362
12384
CCCTAACCCTGACTTCC
2120
AATTAGGGAAGTCAG
3889




CTAATT

GGTTA






12363
12385
CCTAACCCTGACTTCCC
2121
GAATTAGGGAAGTCA
3890




TAATTC

GGGTT






12368
12390
CCCTGACTTCCCTAATT
2122
GGGGGGAATTAGGGA
3891




CCCCCC

AGTCA






12369
12391
CCTGACTTCCCTAATTC
2123
TGGGGGGAATTAGGG
3892




CCCCCA

AAGTC






12377
12399
CCCTAATTCCCCCCATC
2124
GGTAAGGATGGGGGG
3893




CTTACC

AATTA






12378
12400
CCTAATTCCCCCCATCC
2125
TGGTAAGGATGGGGG
3894




TTACCA

GAATT






12385
12407
CCCCCCATCCTTACCAC
2126
ACGAGGGTGGTAAGG
3895




CCTCGT

ATGGG






12386
12408
CCCCCATCCTTACCACC
2127
AACGAGGGTGGTAAG
3896




CTCGTT

GATGG






12387
12409
CCCCATCCTTACCACCC
2128
TAACGAGGGTGGTAA
3897




TCGTTA

GGATG






12388
12410
CCCATCCTTACCACCCT
2129
TTAACGAGGGTGGTAA
3898




CGTTAA

GGAT






12389
12411
CCATCCTTACCACCCTC
2130
GTTAACGAGGGTGGTA
3899




GTTAAC

AGGA






12393
12415
CCTTACCACCCTCGTTA
2131
TAGGGTTAACGAGGGT
3900




ACCCTA

GGTA






12398
12420
CCACCCTCGTTAACCCT
2132
TTTGTTAGGGTTAACG
3901




AACAAA

AGGG






12401
12423
CCCTCGTTAACCCTAAC
2133
TTTTTTGTTAGGGTTA
3902




AAAAAA

ACGA






12402
12424
CCTCGTTAACCCTAACA
2134
TTTTTTTGTTAGGGTTA
3903




AAAAAA

ACG






12411
12433
CCCTAACAAAAAAAACT
2135
GGTATGAGTTTTTTTT
3904




CATACC

GTTA






12412
12434
CCTAACAAAAAAAACTC
2136
GGGTATGAGTTTTTTT
3905




ATACCC

TGTT






12432
12454
CCCCCATTATGTAAAAT
2137
CAATGGATTTTACATA
3906




CCATTG

ATGG






12433
12455
CCCCATTATGTAAAATC
2138
ACAATGGATTTTACAT
3907




CATTGT

AATG






12434
12456
CCCATTATGTAAAATCC
2139
GACAATGGATTTTACA
3908




ATTGTC

TAAT






12435
12457
CCATTATGTAAAATCCA
2140
CGACAATGGATTTTAC
3909




TTGTCG

ATAA






12449
12471
CCATTGTCGCATCCACC
2141
AATAAAGGTGGATGC
3910




TTTATT

GACAA






12461
12483
CCACCTTTATTATCAGT
2142
GAAGAGACTGATAAT
3911




CTCTTC

AAAGG






12464
12486
CCTTTATTATCAGTCTCT
2143
GGGGAAGAGACTGAT
3912




TCCCC

AATAA






12483
12505
CCCCACAACAATATTCA
2144
GGCACATGAATATTGT
3913




TGTGCC

TGTG






12484
12506
CCCACAACAATATTCAT
2145
AGGCACATGAATATTG
3914




GTGCCT

TTGT






12485
12507
CCACAACAATATTCATG
2146
TAGGCACATGAATATT
3915




TGCCTA

GTTG






12504
12526
CCTAGACCAAGAAGTTA
2147
AGATAATAACTTCTTG
3916




TTATCT

GTCT






12510
12532
CCAAGAAGTTATTATCT
2148
AGTTCGAGATAATAAC
3917




CGAACT

TTCT






12542
12564
CCACAACCCAAACAACC
2149
GAGCTGGGTTGTTTGG
3918




CAGCTC

GTTG






12548
12570
CCCAAACAACCCAGCTC
2150
TAGGGAGAGCTGGGTT
3919




TCCCTA

GTTT






12549
12571
CCAAACAACCCAGCTCT
2151
TTAGGGAGAGCTGGGT
3920




CCCTAA

TGTT






12557
12579
CCCAGCTCTCCCTAAGC
2152
TTTGAAGCTTAGGGAG
3921




TTCAAA

AGCT






12558
12580
CCAGCTCTCCCTAAGCT
2153
GTTTGAAGCTTAGGGA
3922




TCAAAC

GAGC






12566
12588
CCCTAAGCTTCAAACTA
2154
GTAGTCTAGTTTGAAG
3923




GACTAC

CTTA






12567
12589
CCTAAGCTTCAAACTAG
2155
AGTAGTCTAGTTTGAA
3924




ACTACT

GCTT






12593
12615
CCATAATATTCATCCCT
2156
TGCTACAGGGATGAAT
3925




GTAGCA

ATTA






12606
12628
CCCTGTAGCATTGTTCG
2157
ATGTAACGAACAATGC
3926




TTACAT

TACA






12607
12629
CCTGTAGCATTGTTCGT
2158
CATGTAACGAACAATG
3927




TACATG

CTAC






12632
12654
CCATCATAGAATTCTCA
2159
TCACAGTGAGAATTCT
3928




CTGTGA

ATGA






12669
12691
CCCAAACATTAATCAGT
2160
TGAAGAACTGATTAAT
3929




TCTTCA

GTTT






12670
12692
CCAAACATTAATCAGTT
2161
TTGAAGAACTGATTAA
3930




CTTCAA

TGTT






12708
12730
CCTAATTACCATACTAA
2162
CTAAGATTAGTATGGT
3931




TCTTAG

AATT






12716
12738
CCATACTAATCTTAGTT
2163
AGCGGTAACTAAGATT
3932




ACCGCT

AGTA






12734
12756
CCGCTAACAACCTATTC
2164
CAGTTGGAATAGGTTG
3933




CAACTG

TTAG






12744
12766
CCTATTCCAACTGTTCA
2165
AGCCGATGAACAGTTG
3934




TCGGCT

GAAT






12750
12772
CCAACTGTTCATCGGCT
2166
CCTCTCAGCCGATGAA
3935




GAGAGG

CAGT






12788
12810
CCTTCTTGCTCATCAGTT
2167
TCATCAACTGATGAGC
3936




GATGA

AAGA






12815
12837
CCCGAGCAGATGCCAAC
2168
TGCTGTGTTGGCATCT
3937




ACAGCA

GCTC






12816
12838
CCGAGCAGATGCCAAC
2169
CTGCTGTGTTGGCATC
3938




ACAGCAG

TGCT






12827
12849
CCAACACAGCAGCCATT
2170
TGCTTGAATGGCTGCT
3939




CAAGCA

GTGT






12839
12861
CCATTCAAGCAATCCTA
2171
GTTGTATAGGATTGCT
3940




TACAAC

TGAA






12852
12874
CCTATACAACCGTATCG
2172
TATCGCCGATACGGTT
3941




GCGATA

GTAT






12861
12883
CCGTATCGGCGATATCG
2173
TGAAACCGATATCGCC
3942




GTTTCA

GATA






12885
12907
CCTCGCCTTAGCATGAT
2174
GGATAAATCATGCTAA
3943




TTATCC

GGCG






12890
12912
CCTTAGCATGATTTATC
2175
GTGTAGGATAAATCAT
3944




CTACAC

GCTA






12906
12928
CCTACACTCCAACTCAT
2176
GGTCTCATGAGTTGGA
3945




GAGACC

GTGT






12914
12936
CCAACTCATGAGACCCA
2177
TTGTTGTGGGTCTCAT
3946




CAACAA

GAGT






12927
12949
CCCACAACAAATAGCCC
2178
TTAGAAGGGCTATTTG
3947




TTCTAA

TTGT






12928
12950
CCACAACAAATAGCCCT
2179
TTTAGAAGGGCTATTT
3948




TCTAAA

GTTG






12941
12963
CCCTTCTAAACGCTAAT
2180
GCTTGGATTAGCGTTT
3949




CCAAGC

AGAA






12942
12964
CCTTCTAAACGCTAATC
2181
GGCTTGGATTAGCGTT
3950




CAAGCC

TAGA






12958
12980
CCAAGCCTCACCCCACT
2182
CCTAGTAGTGGGGTGA
3951




ACTAGG

GGCT






12963
12985
CCTCACCCCACTACTAG
2183
GGAGGCCTAGTAGTGG
3952




GCCTCC

GGTG






12968
12990
CCCCACTACTAGGCCTC
2184
TAGGAGGAGGCCTAGT
3953




CTCCTA

AGTG






12969
12991
CCCACTACTAGGCCTCC
2185
CTAGGAGGAGGCCTA
3954




TCCTAG

GTAGT






12970
12992
CCACTACTAGGCCTCCT
2186
GCTAGGAGGAGGCCT
3955




CCTAGC

AGTAG






12981
13003
CCTCCTCCTAGCAGCAG
2187
TGCCTGCTGCTGCTAG
3956




CAGGCA

GAGG






12984
13006
CCTCCTAGCAGCAGCAG
2188
ATTTGCCTGCTGCTGC
3957




GCAAAT

TAGG






12987
13009
CCTAGCAGCAGCAGGC
2189
CTGATTTGCCTGCTGC
3958




AAATCAG

TGCT






13010
13032
CCCAATTAGGTCTCCAC
2190
TCAGGGGTGGAGACCT
3959




CCCTGA

AATT






13011
13033
CCAATTAGGTCTCCACC
2191
GTCAGGGGTGGAGAC
3960




CCTGAC

CTAAT






13023
13045
CCACCCCTGACTCCCCT
2192
TGGCTGAGGGGAGTCA
3961




CAGCCA

GGGG






13026
13048
CCCCTGACTCCCCTCAG
2193
CTATGGCTGAGGGGAG
3962




CCATAG

TCAG






13027
13049
CCCTGACTCCCCTCAGC
2194
TCTATGGCTGAGGGGA
3963




CATAGA

GTCA






13028
13050
CCTGACTCCCCTCAGCC
2195
TTCTATGGCTGAGGGG
3964




ATAGAA

AGTC






13035
13057
CCCCTCAGCCATAGAAG
2196
TGGGGCCTTCTATGGC
3965




GCCCCA

TGAG






13036
13058
CCCTCAGCCATAGAAGG
2197
GTGGGGCCTTCTATGG
3966




CCCCAC

CTGA






13037
13059
CCTCAGCCATAGAAGGC
2198
GGTGGGGCCTTCTATG
3967




CCCACC

GCTG






13043
13065
CCATAGAAGGCCCCACC
2199
GACTGGGGTGGGGCCT
3968




CCAGTC

TCTA






13053
13075
CCCCACCCCAGTCTCAG
2200
GTAGGGCTGAGACTGG
3969




CCCTAC

GGTG






13054
13076
CCCACCCCAGTCTCAGC
2201
AGTAGGGCTGAGACTG
3970




CCTACT

GGGT






13055
13077
CCACCCCAGTCTCAGCC
2202
GAGTAGGGCTGAGACT
3971




CTACTC

GGGG






13058
13080
CCCCAGTCTCAGCCCTA
2203
GTGGAGTAGGGCTGA
3972




CTCCAC

GACTG






13059
13081
CCCAGTCTCAGCCCTAC
2204
AGTGGAGTAGGGCTG
3973




TCCACT

AGACT






13060
13082
CCAGTCTCAGCCCTACT
2205
GAGTGGAGTAGGGCT
3974




CCACTC

GAGAC






13070
13092
CCCTACTCCACTCAAGC
2206
TATAGTGCTTGAGTGG
3975




ACTATA

AGTA






13071
13093
CCTACTCCACTCAAGCA
2207
CTATAGTGCTTGAGTG
3976




CTATAG

GAGT






13077
13099
CCACTCAAGCACTATAG
2208
CTACAACTATAGTGCT
3977




TTGTAG

TGAG






13119
13141
CCGCTTCCACCCCCTAG
2209
TTTCTGCTAGGGGGTG
3978




CAGAAA

GAAG






13125
13147
CCACCCCCTAGCAGAAA
2210
GGCTATTTTCTGCTAG
3979




ATAGCC

GGGG






13128
13150
CCCCCTAGCAGAAAATA
2211
GTGGGCTATTTTCTGC
3980




GCCCAC

TAGG






13129
13151
CCCCTAGCAGAAAATAG
2212
AGTGGGCTATTTTCTG
3981




CCCACT

CTAG






13130
13152
CCCTAGCAGAAAATAGC
2213
TAGTGGGCTATTTTCT
3982




CCACTA

GCTA






13131
13153
CCTAGCAGAAAATAGCC
2214
TTAGTGGGCTATTTTC
3983




CACTAA

TGCT






13146
13168
CCCACTAATCCAAACTC
2215
GTGTTAGAGTTTGGAT
3984




TAACAC

TAGT






13147
13169
CCACTAATCCAAACTCT
2216
AGTGTTAGAGTTTGGA
3985




AACACT

TTAG






13155
13177
CCAAACTCTAACACTAT
2217
CTAAGCATAGTGTTAG
3986




GCTTAG

AGTT






13187
13209
CCACTCTGTTCGCAGCA
2218
GCAGACTGCTGCGAAC
3987




GTCTGC

AGAG






13211
13233
CCCTTACACAAAATGAC
2219
TTTGATGTCATTTTGTG
3988




ATCAAA

TAA






13212
13234
CCTTACACAAAATGACA
2220
TTTTGATGTCATTTTGT
3989




TCAAAA

GTA






13244
13266
CCTTCTCCACTTCAAGT
2221
TAGTTGACTTGAAGTG
3990




CAACTA

GAGA






13250
13272
CCACTTCAAGTCAACTA
2222
GAGTCCTAGTTGACTT
3991




GGACTC

GAAG






13296
13318
CCAACCACACCTAGCAT
2223
GCAGGAATGCTAGGTG
3992




TCCTGC

TGGT






13300
13322
CCACACCTAGCATTCCT
2224
ATGTGCAGGAATGCTA
3993




GCACAT

GGTG






13305
13327
CCTAGCATTCCTGCACA
2225
TACAGATGTGCAGGAA
3994




TCTGTA

TGCT






13314
13336
CCTGCACATCTGTACCC
2226
AGGCGTGGGTACAGAT
3995




ACGCCT

GTGC






13328
13350
CCCACGCCTTCTTCAAA
2227
TATGGCTTTGAAGAAG
3996




GCCATA

GCGT






13329
13351
CCACGCCTTCTTCAAAG
2228
GTATGGCTTTGAAGAA
3997




CCATAC

GGCG






13334
13356
CCTTCTTCAAAGCCATA
2229
AAATAGTATGGCTTTG
3998




CTATTT

AAGA






13346
13368
CCATACTATTTATGTGC
2230
CCCGGAGCACATAAAT
3999




TCCGGG

AGTA






13364
13386
CCGGGTCCATCATCCAC
2231
AAGGTTGTGGATGATG
4000




AACCTT

GACC






13370
13392
CCATCATCCACAACCTT
2232
ATTGTTAAGGTTGTGG
4001




AACAAT

ATGA






13377
13399
CCACAACCTTAACAATG
2233
CTTGTTCATTGTTAAG
4002




AACAAG

GTTG






13383
13405
CCTTAACAATGAACAAG
2234
GAATATCTTGTTCATT
4003




ATATTC

GTTA






13430
13452
CCATACCTCTCACTTCA
2235
GGAGGTTGAAGTGAG
4004




ACCTCC

AGGTA






13435
13457
CCTCTCACTTCAACCTC
2236
GTGAGGGAGGTTGAA
4005




CCTCAC

GTGAG






13448
13470
CCTCCCTCACCATTGGC
2237
TAGGCTGCCAATGGTG
4006




AGCCTA

AGGG






13451
13473
CCCTCACCATTGGCAGC
2238
TGCTAGGCTGCCAATG
4007




CTAGCA

GTGA






13452
13474
CCTCACCATTGGCAGCC
2239
ATGCTAGGCTGCCAAT
4008




TAGCAT

GGTG






13457
13479
CCATTGGCAGCCTAGCA
2240
TGCTAATGCTAGGCTG
4009




TTAGCA

CCAA






13467
13489
CCTAGCATTAGCAGGAA
2241
AAGGTATTCCTGCTAA
4010




TACCTT

TGCT






13486
13508
CCTTTCCTCACAGGTTT
2242
GAGTAGAAACCTGTGA
4011




CTACTC

GGAA






13491
13513
CCTCACAGGTTTCTACT
2243
CTTTGGAGTAGAAACC
4012




CCAAAG

TGTG






13508
13530
CCAAAGACCACATCATC
2244
GGTTTCGATGATGTGG
4013




GAAACC

TCTT






13515
13537
CCACATCATCGAAACCG
2245
TGTTTGCGGTTTCGAT
4014




CAAACA

GATG






13529
13551
CCGCAAACATATCATAC
2246
GTTTGTGTATGATATG
4015




ACAAAC

TTTG






13553
13575
CCTGAGCCCTATCTATT
2247
GAGAGTAATAGATAG
4016




ACTCTC

GGCTC






13559
13581
CCCTATCTATTACTCTC
2248
AGCGATGAGAGTAAT
4017




ATCGCT

AGATA






13560
13582
CCTATCTATTACTCTCAT
2249
TAGCGATGAGAGTAAT
4018




CGCTA

AGAT






13583
13605
CCTCCCTGACAAGCGCC
2250
GCTATAGGCGCTTGTC
4019




TATAGC

AGGG






13586
13608
CCCTGACAAGCGCCTAT
2251
AGTGCTATAGGCGCTT
4020




AGCACT

GTCA






13587
13609
CCTGACAAGCGCCTATA
2252
GAGTGCTATAGGCGCT
4021




GCACTC

TGTC






13598
13620
CCTATAGCACTCGAATA
2253
AAGAATTATTCGAGTG
4022




ATTCTT

CTAT






13625
13647
CCCTAACAGGTCAACCT
2254
GAAGCGAGGTTGACCT
4023




CGCTTC

GTTA






13626
13648
CCTAACAGGTCAACCTC
2255
GGAAGCGAGGTTGAC
4024




GCTTCC

CTGTT






13639
13661
CCTCGCTTCCCCACCCT
2256
TTAGTAAGGGTGGGGA
4025




TACTAA

AGCG






13647
13669
CCCCACCCTTACTAACA
2257
CGTTAATGTTAGTAAG
4026




TTAACG

GGTG






13648
13670
CCCACCCTTACTAACAT
2258
TCGTTAATGTTAGTAA
4027




TAACGA

GGGT






13649
13671
CCACCCTTACTAACATT
2259
TTCGTTAATGTTAGTA
4028




AACGAA

AGGG






13652
13674
CCCTTACTAACATTAAC
2260
ATTTTCGTTAATGTTA
4029




GAAAAT

GTAA






13653
13675
CCTTACTAACATTAACG
2261
TATTTTCGTTAATGTTA
4030




AAAATA

GTA






13677
13699
CCCCACCCTACTAAACC
2262
TAATGGGGTTTAGTAG
4031




CCATTA

GGTG






13678
13700
CCCACCCTACTAAACCC
2263
TTAATGGGGTTTAGTA
4032




CATTAA

GGGT






13679
13701
CCACCCTACTAAACCCC
2264
TTTAATGGGGTTTAGT
4033




ATTAAA

AGGG






13682
13704
CCCTACTAAACCCCATT
2265
GCGTTTAATGGGGTTT
4034




AAACGC

AGTA






13683
13705
CCTACTAAACCCCATTA
2266
GGCGTTTAATGGGGTT
4035




AACGCC

TAGT






13692
13714
CCCCATTAAACGCCTGG
2267
CGGCTGCCAGGCGTTT
4036




CAGCCG

AATG






13693
13715
CCCATTAAACGCCTGGC
2268
CCGGCTGCCAGGCGTT
4037




AGCCGG

TAAT






13694
13716
CCATTAAACGCCTGGCA
2269
TCCGGCTGCCAGGCGT
4038




GCCGGA

TTAA






13704
13726
CCTGGCAGCCGGAAGCC
2270
CGAATAGGCTTCCGGC
4039




TATTCG

TGCC






13712
13734
CCGGAAGCCTATTCGCA
2271
AAATCCTGCGAATAGG
4040




GGATTT

CTTC






13719
13741
CCTATTCGCAGGATTTC
2272
TAATGAGAAATCCTGC
4041




TCATTA

GAAT






13754
13776
CCCCCGCATCCCCCTTC
2273
TGTTTGGAAGGGGGAT
4042




CAAACA

GCGG






13755
13777
CCCCGCATCCCCCTTCC
2274
TTGTTTGGAAGGGGGA
4043




AAACAA

TGCG






13756
13778
CCCGCATCCCCCTTCCA
2275
GTTGTTTGGAAGGGGG
4044




AACAAC

ATGC






13757
13779
CCGCATCCCCCTTCCAA
2276
TGTTGTTTGGAAGGGG
4045




ACAACA

GATG






13763
13785
CCCCCTTCCAAACAACA
2277
GGGGATTGTTGTTTGG
4046




ATCCCC

AAGG






13764
13786
CCCCTTCCAAACAACAA
2278
GGGGGATTGTTGTTTG
4047




TCCCCC

GAAG






13765
13787
CCCTTCCAAACAACAAT
2279
AGGGGGATTGTTGTTT
4048




CCCCCT

GGAA






13766
13788
CCTTCCAAACAACAATC
2280
GAGGGGGATTGTTGTT
4049




CCCCTC

TGGA






13770
13792
CCAAACAACAATCCCCC
2281
GGTAGAGGGGGATTGT
4050




TCTACC

TGTT






13782
13804
CCCCCTCTACCTAAAAC
2282
CTGTGAGTTTTAGGTA
4051




TCACAG

GAGG






13783
13805
CCCCTCTACCTAAAACT
2283
GCTGTGAGTTTTAGGT
4052




CACAGC

AGAG






13784
13806
CCCTCTACCTAAAACTC
2284
GGCTGTGAGTTTTAGG
4053




ACAGCC

TAGA






13785
13807
CCTCTACCTAAAACTCA
2285
GGGCTGTGAGTTTTAG
4054




CAGCCC

GTAG






13791
13813
CCTAAAACTCACAGCCC
2286
CAGCGAGGGCTGTGA
4055




TCGCTG

GTTTT






13805
13827
CCCTCGCTGTCACTTTC
2287
TCCTAGGAAAGTGACA
4056




CTAGGA

GCGA






13806
13828
CCTCGCTGTCACTTTCCT
2288
GTCCTAGGAAAGTGAC
4057




AGGAC

AGCG






13821
13843
CCTAGGACTTCTAACAG
2289
CTAGGGCTGTTAGAAG
4058




CCCTAG

TCCT






13838
13860
CCCTAGACCTCAACTAC
2290
GGTTAGGTAGTTGAGG
4059




CTAACC

TCTA






13839
13861
CCTAGACCTCAACTACC
2291
TGGTTAGGTAGTTGAG
4060




TAACCA

GTCT






13845
13867
CCTCAACTACCTAACCA
2292
GTTTGTTGGTTAGGTA
4061




ACAAAC

GTTG






13854
13876
CCTAACCAACAAACTTA
2293
TTATTTTAAGTTTGTTG
4062




AAATAA

GTT






13859
13881
CCAACAAACTTAAAATA
2294
GGATTTTATTTTAAGT
4063




AAATCC

TTGT






13880
13902
CCCCACTATGCACATTT
2295
GAAATAAAATGTGCAT
4064




TATTTC

AGTG






13881
13903
CCCACTATGCACATTTT
2296
AGAAATAAAATGTGC
4065




ATTTCT

ATAGT






13882
13904
CCACTATGCACATTTTA
2297
GAGAAATAAAATGTG
4066




TTTCTC

CATAG






13904
13926
CCAACATACTCGGATTC
2298
AGGGTAGAATCCGAGT
4067




TACCCT

ATGT






13923
13945
CCCTAGCATCACACACC
2299
TTGTGCGGTGTGTGAT
4068




GCACAA

GCTA






13924
13946
CCTAGCATCACACACCG
2300
ATTGTGCGGTGTGTGA
4069




CACAAT

TGCT






13938
13960
CCGCACAATCCCCTATC
2301
GGCCTAGATAGGGGAT
4070




TAGGCC

TGTG






13947
13969
CCCCTATCTAGGCCTTC
2302
TCGTAAGAAGGCCTAG
4071




TTACGA

ATAG






13948
13970
CCCTATCTAGGCCTTCT
2303
CTCGTAAGAAGGCCTA
4072




TACGAG

GATA






13949
13971
CCTATCTAGGCCTTCTT
2304
GCTCGTAAGAAGGCCT
4073




ACGAGC

AGAT






13959
13981
CCTTCTTACGAGCCAAA
2305
GCAGGTTTTGGCTCGT
4074




ACCTGC

AAGA






13971
13993
CCAAAACCTGCCCCTAC
2306
GGAGGAGTAGGGGCA
4075




TCCTCC

GGTTT






13977
13999
CCTGCCCCTACTCCTCC
2307
GGTCTAGGAGGAGTA
4076




TAGACC

GGGGC






13981
14003
CCCCTACTCCTCCTAGA
2308
GTTAGGTCTAGGAGGA
4077




CCTAAC

GTAG






13982
14004
CCCTACTCCTCCTAGAC
2309
GGTTAGGTCTAGGAGG
4078




CTAACC

AGTA






13983
14005
CCTACTCCTCCTAGACC
2310
AGGTTAGGTCTAGGAG
4079




TAACCT

GAGT






13989
14011
CCTCCTAGACCTAACCT
2311
CTAGTCAGGTTAGGTC
4080




GACTAG

TAGG






13992
14014
CCTAGACCTAACCTGAC
2312
TTTCTAGTCAGGTTAG
4081




TAGAAA

GTCT






13998
14020
CCTAACCTGACTAGAAA
2313
ATAGCTTTTCTAGTCA
4082




AGCTAT

GGTT






14003
14025
CCTGACTAGAAAAGCTA
2314
AGGTAATAGCTTTTCT
4083




TTACCT

AGTC






14023
14045
CCTAAAACAATTTCACA
2315
TGGTGCTGTGAAATTG
4084




GCACCA

TTTT






14043
14065
CCAAATCTCCACCTCCA
2316
TGATGATGGAGGTGGA
4085




TCATCA

GATT






14051
14073
CCACCTCCATCATCACC
2317
GGTTGAGGTGATGATG
4086




TCAACC

GAGG






14054
14076
CCTCCATCATCACCTCA
2318
TTGGGTTGAGGTGATG
4087




ACCCAA

ATGG






14057
14079
CCATCATCACCTCAACC
2319
TTTTTGGGTTGAGGTG
4088




CAAAAA

ATGA






14066
14088
CCTCAACCCAAAAAGGC
2320
AATTATGCCTTTTTGG
4089




ATAATT

GTTG






14072
14094
CCCAAAAAGGCATAATT
2321
AAGTTTAATTATGCCT
4090




AAACTT

TTTT






14073
14095
CCAAAAAGGCATAATTA
2322
AAAGTTTAATTATGCC
4091




AACTTT

TTTT






14100
14122
CCTCTCTTTCTTCTTCCC
2323
TGAGTGGGAAGAAGA
4092




ACTCA

AAGAG






14115
14137
CCCACTCATCCTAACCC
2324
GGAGTAGGGTTAGGAT
4093




TACTCC

GAGT






14116
14138
CCACTCATCCTAACCCT
2325
AGGAGTAGGGTTAGG
4094




ACTCCT

ATGAG






14124
14146
CCTAACCCTACTCCTAA
2326
ATGTGATTAGGAGTAG
4095




TCACAT

GGTT






14129
14151
CCCTACTCCTAATCACA
2327
AGGTTATGTGATTAGG
4096




TAACCT

AGTA






14130
14152
CCTACTCCTAATCACAT
2328
TAGGTTATGTGATTAG
4097




AACCTA

GAGT






14136
14158
CCTAATCACATAACCTA
2329
GGGGAATAGGTTATGT
4098




TTCCCC

GATT






14149
14171
CCTATTCCCCCGAGCAA
2330
TTGAGATTGCTCGGGG
4099




TCTCAA

GAAT






14155
14177
CCCCCGAGCAATCTCAA
2331
TTGTAATTGAGATTGC
4100




TTACAA

TCGG






14156
14178
CCCCGAGCAATCTCAAT
2332
ATTGTAATTGAGATTG
4101




TACAAT

CTCG






14157
14179
CCCGAGCAATCTCAATT
2333
TATTGTAATTGAGATT
4102




ACAATA

GCTC






14158
14180
CCGAGCAATCTCAATTA
2334
ATATTGTAATTGAGAT
4103




CAATAT

TGCT






14186
14208
CCAACAAACAATGTTCA
2335
ACTGGTTGAACATTGT
4104




ACCAGT

TTGT






14204
14226
CCAGTAACTACTACTAA
2336
CGTTGATTAGTAGTAG
4105




TCAACG

TTAC






14227
14249
CCCATAATCATACAAAG
2337
CGGGGGCTTTGTATGA
4106




CCCCCG

TTAT






14228
14250
CCATAATCATACAAAGC
2338
GCGGGGGCTTTGTATG
4107




CCCCGC

ATTA






14244
14266
CCCCCGCACCAATAGGA
2339
GGAGGATCCTATTGGT
4108




TCCTCC

GCGG






14245
14267
CCCCGCACCAATAGGAT
2340
GGGAGGATCCTATTGG
4109




CCTCCC

TGCG






14246
14268
CCCGCACCAATAGGATC
2341
CGGGAGGATCCTATTG
4110




CTCCCG

GTGC






14247
14269
CCGCACCAATAGGATCC
2342
TCGGGAGGATCCTATT
4111




TCCCGA

GGTG






14252
14274
CCAATAGGATCCTCCCG
2343
TTGATTCGGGAGGATC
4112




AATCAA

CTAT






14262
14284
CCTCCCGAATCAACCCT
2344
GGGGTCAGGGTTGATT
4113




GACCCC

CGGG






14265
14287
CCCGAATCAACCCTGAC
2345
AGAGGGGTCAGGGTT
4114




CCCTCT

GATTC






14266
14288
CCGAATCAACCCTGACC
2346
GAGAGGGGTCAGGGT
4115




CCTCTC

TGATT






14275
14297
CCCTGACCCCTCTCCTT
2347
TTTATGAAGGAGAGGG
4116




CATAAA

GTCA






14276
14298
CCTGACCCCTCTCCTTC
2348
ATTTATGAAGGAGAGG
4117




ATAAAT

GGTC






14281
14303
CCCCTCTCCTTCATAAA
2349
GAATAATTTATGAAGG
4118




TTATTC

AGAG






14282
14304
CCCTCTCCTTCATAAAT
2350
TGAATAATTTATGAAG
4119




TATTCA

GAGA






14283
14305
CCTCTCCTTCATAAATT
2351
CTGAATAATTTATGAA
4120




ATTCAG

GGAG






14288
14310
CCTTCATAAATTATTCA
2352
GGAAGCTGAATAATTT
4121




GCTTCC

ATGA






14309
14331
CCTACACTATTAAAGTT
2353
GTGGTAAACTTTAATA
4122




TACCAC

GTGT






14328
14350
CCACAACCACCACCCCA
2354
GTATGATGGGGTGGTG
4123




TCATAC

GTTG






14334
14356
CCACCACCCCATCATAC
2355
GAAAGAGTATGATGG
4124




TCTTTC

GGTGG






14337
14359
CCACCCCATCATACTCT
2356
GGTGAAAGAGTATGAT
4125




TTCACC

GGGG






14340
14362
CCCCATCATACTCTTTC
2357
GTGGGTGAAAGAGTAT
4126




ACCCAC

GATG






14341
14363
CCCATCATACTCTTTCA
2358
TGTGGGTGAAAGAGTA
4127




CCCACA

TGAT






14342
14364
CCATCATACTCTTTCAC
2359
CTGTGGGTGAAAGAGT
4128




CCACAG

ATGA






14358
14380
CCCACAGCACCAATCCT
2360
GGAGGTAGGATTGGTG
4129




ACCTCC

CTGT






14359
14381
CCACAGCACCAATCCTA
2361
TGGAGGTAGGATTGGT
4130




CCTCCA

GCTG






14367
14389
CCAATCCTACCTCCATC
2362
GTTAGCGATGGAGGTA
4131




GCTAAC

GGAT






14372
14394
CCTACCTCCATCGCTAA
2363
GTGGGGTTAGCGATGG
4132




CCCCAC

AGGT






14376
14398
CCTCCATCGCTAACCCC
2364
TTTAGTGGGGTTAGCG
4133




ACTAAA

ATGG






14379
14401
CCATCGCTAACCCCACT
2365
TGTTTTAGTGGGGTTA
4134




AAAACA

GCGA






14389
14411
CCCCACTAAAACACTCA
2366
TCTTGGTGAGTGTTTT
4135




CCAAGA

AGTG






14390
14412
CCCACTAAAACACTCAC
2367
GTCTTGGTGAGTGTTT
4136




CAAGAC

TAGT






14391
14413
CCACTAAAACACTCACC
2368
GGTCTTGGTGAGTGTT
4137




AAGACC

TTAG






14406
14428
CCAAGACCTCAACCCCT
2369
GGGGTCAGGGGTTGA
4138




GACCCC

GGTCT






14412
14434
CCTCAACCCCTGACCCC
2370
GGCATGGGGGTCAGG
4139




CATGCC

GGTTG






14418
14440
CCCCTGACCCCCATGCC
2371
TCCTGAGGCATGGGGG
4140




TCAGGA

TCAG






14419
14441
CCCTGACCCCCATGCCT
2372
ATCCTGAGGCATGGGG
4141




CAGGAT

GTCA






14420
14442
CCTGACCCCCATGCCTC
2373
TATCCTGAGGCATGGG
4142




AGGATA

GGTC






14425
14447
CCCCCATGCCTCAGGAT
2374
AGGAGTATCCTGAGGC
4143




ACTCCT

ATGG






14426
14448
CCCCATGCCTCAGGATA
2375
GAGGAGTATCCTGAGG
4144




CTCCTC

CATG






14427
14449
CCCATGCCTCAGGATAC
2376
TGAGGAGTATCCTGAG
4145




TCCTCA

GCAT






14428
14450
CCATGCCTCAGGATACT
2377
TTGAGGAGTATCCTGA
4146




CCTCAA

GGCA






14433
14455
CCTCAGGATACTCCTCA
2378
GGCTATTGAGGAGTAT
4147




ATAGCC

CCTG






14445
14467
CCTCAATAGCCATCGCT
2379
TACTACAGCGATGGCT
4148




GTAGTA

ATTG






14454
14476
CCATCGCTGTAGTATAT
2380
CTTTGGATATACTACA
4149




CCAAAG

GCGA






14471
14493
CCAAAGACAACCATCAT
2381
GGGGGAATGATGGTTG
4150




TCCCCC

TCTT






14481
14503
CCATCATTCCCCCTAAA
2382
AATTTATTTAGGGGGA
4151




TAAATT

ATGA






14489
14511
CCCCCTAAATAAATTAA
2383
GTTTTTTTAATTTATTT
4152




AAAAAC

AGG






14490
14512
CCCCTAAATAAATTAAA
2384
AGTTTTTTTAATTTATT
4153




AAAACT

TAG






14491
14513
CCCTAAATAAATTAAAA
2385
TAGTTTTTTTAATTTAT
4154




AAACTA

TTA






14492
14514
CCTAAATAAATTAAAAA
2386
ATAGTTTTTTTAATTTA
4155




AACTAT

TTT






14519
14541
CCCATATAACCTCCCCC
2387
AATTTTGGGGGAGGTT
4156




AAAATT

ATAT






14520
14542
CCATATAACCTCCCCCA
2388
GAATTTTGGGGGAGGT
4157




AAATTC

TATA






14528
14550
CCTCCCCCAAAATTCAG
2389
ATTATTCTGAATTTTG
4158




AATAAT

GGGG






14531
14553
CCCCCAAAATTCAGAAT
2390
GTTATTATTCTGAATTT
4159




AATAAC

TGG






14532
14554
CCCCAAAATTCAGAATA
2391
TGTTATTATTCTGAATT
4160




ATAACA

TTG






14533
14555
CCCAAAATTCAGAATAA
2392
GTGTTATTATTCTGAA
4161




TAACAC

TTTT






14534
14556
CCAAAATTCAGAATAAT
2393
TGTGTTATTATTCTGA
4162




AACACA

ATTT






14557
14579
CCCGACCACACCGCTAA
2394
TGATTGTTAGCGGTGT
4163




CAATCA

GGTC






14558
14580
CCGACCACACCGCTAAC
2395
TTGATTGTTAGCGGTG
4164




AATCAA

TGGT






14562
14584
CCACACCGCTAACAATC
2396
AGTATTGATTGTTAGC
4165




AATACT

GGTG






14567
14589
CCGCTAACAATCAATAC
2397
GGTTTAGTATTGATTG
4166




TAAACC

TTAG






14588
14610
CCCCCATAAATAGGAGA
2398
AAGCCTTCTCCTATTT
4167




AGGCTT

ATGG






14589
14611
CCCCATAAATAGGAGA
2399
TAAGCCTTCTCCTATTT
4168




AGGCTTA

ATG






14590
14612
CCCATAAATAGGAGAA
2400
CTAAGCCTTCTCCTAT
4169




GGCTTAG

TTAT






14591
14613
CCATAAATAGGAGAAG
2401
TCTAAGCCTTCTCCTA
4170




GCTTAGA

TTTA






14620
14642
CCCCACAAACCCCATTA
2402
GTTTAGTAATGGGGTT
4171




CTAAAC

TGTG






14621
14643
CCCACAAACCCCATTAC
2403
GGTTTAGTAATGGGGT
4172




TAAACC

TTGT






14622
14644
CCACAAACCCCATTACT
2404
GGGTTTAGTAATGGGG
4173




AAACCC

TTTG






14629
14651
CCCCATTACTAAACCCA
2405
TGAGTGTGGGTTTAGT
4174




CACTCA

AATG






14630
14652
CCCATTACTAAACCCAC
2406
TTGAGTGTGGGTTTAG
4175




ACTCAA

TAAT






14631
14653
CCATTACTAAACCCACA
2407
GTTGAGTGTGGGTTTA
4176




CTCAAC

GTAA






14642
14664
CCCACACTCAACAGAAA
2408
GCTTTGTTTCTGTTGA
4177




CAAAGC

GTGT






14643
14665
CCACACTCAACAGAAAC
2409
TGCTTTGTTTCTGTTGA
4178




AAAGCA

GTG






14694
14716
CCACGACCAATGATATG
2410
GTTTTTCATATCATTG
4179




AAAAAC

GTCG






14700
14722
CCAATGATATGAAAAAC
2411
ACGATGGTTTTTCATA
4180




CATCGT

TCAT






14716
14738
CCATCGTTGTATTTCAA
2412
TTGTAGTTGAAATACA
4181




CTACAA

ACGA






14744
14766
CCAATGACCCCAATACG
2413
GTTTTGCGTATTGGGG
4182




CAAAAC

TCAT






14751
14773
CCCCAATACGCAAAACT
2414
GGGGTTAGTTTTGCGT
4183




AACCCC

ATTG






14752
14774
CCCAATACGCAAAACTA
2415
GGGGGTTAGTTTTGCG
4184




ACCCCC

TATT






14753
14775
CCAATACGCAAAACTAA
2416
AGGGGGTTAGTTTTGC
4185




CCCCCT

GTAT






14770
14792
CCCCCTAATAAAATTAA
2417
GGTTAATTAATTTTAT
4186




TTAACC

TAGG






14771
14793
CCCCTAATAAAATTAAT
2418
TGGTTAATTAATTTTA
4187




TAACCA

TTAG






14772
14794
CCCTAATAAAATTAATT
2419
GTGGTTAATTAATTTT
4188




AACCAC

ATTA






14773
14795
CCTAATAAAATTAATTA
2420
AGTGGTTAATTAATTT
4189




ACCACT

TATT






14791
14813
CCACTCATTCATCGACC
2421
TGGGGAGGTCGATGA
4190




TCCCCA

ATGAG






14806
14828
CCTCCCCACCCCATCCA
2422
AGATGTTGGATGGGGT
4191




ACATCT

GGGG






14809
14831
CCCCACCCCATCCAACA
2423
CGGAGATGTTGGATGG
4192




TCTCCG

GGTG






14810
14832
CCCACCCCATCCAACAT
2424
GCGGAGATGTTGGATG
4193




CTCCGC

GGGT






14811
14833
CCACCCCATCCAACATC
2425
TGCGGAGATGTTGGAT
4194




TCCGCA

GGGG






14814
14836
CCCCATCCAACATCTCC
2426
TCATGCGGAGATGTTG
4195




GCATGA

GATG






14815
14837
CCCATCCAACATCTCCG
2427
ATCATGCGGAGATGTT
4196




CATGAT

GGAT






14816
14838
CCATCCAACATCTCCGC
2428
CATCATGCGGAGATGT
4197




ATGATG

TGGA






14820
14842
CCAACATCTCCGCATGA
2429
GTTTCATCATGCGGAG
4198




TGAAAC

ATGT






14829
14851
CCGCATGATGAAACTTC
2430
TGAGCCGAAGTTTCAT
4199




GGCTCA

CATG






14854
14876
CCTTGGCGCCTGCCTGA
2431
GGAGGATCAGGCAGG
4200




TCCTCC

CGCCA






14862
14884
CCTGCCTGATCCTCCAA
2432
GGTGATTTGGAGGATC
4201




ATCACC

AGGC






14866
14888
CCTGATCCTCCAAATCA
2433
CTGTGGTGATTTGGAG
4202




CCACAG

GATC






14872
14894
CCTCCAAATCACCACAG
2434
ATAGTCCTGTGGTGAT
4203




GACTAT

TTGG






14875
14897
CCAAATCACCACAGGAC
2435
GGAATAGTCCTGTGGT
4204




TATTCC

GATT






14883
14905
CCACAGGACTATTCCTA
2436
CATGGCTAGGAATAGT
4205




GCCATG

CCTG






14896
14918
CCTAGCCATGCACTACT
2437
CTGGTGAGTAGTGCAT
4206




CACCAG

GGCT






14901
14923
CCATGCACTACTCACCA
2438
GGCGTCTGGTGAGTAG
4207




GACGCC

TGCA






14915
14937
CCAGACGCCTCAACCGC
2439
GAAAAGGCGGTTGAG
4208




CTTTTC

GCGTC






14922
14944
CCTCAACCGCCTTTTCA
2440
GATTGATGAAAAGGC
4209




TCAATC

GGTTG






14928
14950
CCGCCTTTTCATCAATC
2441
GTGGGCGATTGATGAA
4210




GCCCAC

AAGG






14931
14953
CCTTTTCATCAATCGCC
2442
GATGTGGGCGATTGAT
4211




CACATC

GAAA






14946
14968
CCCACATCACTCGAGAC
2443
ATTTACGTCTCGAGTG
4212




GTAAAT

ATGT






14947
14969
CCACATCACTCGAGACG
2444
AATTTACGTCTCGAGT
4213




TAAATT

GATG






14983
15005
CCGCTACCTTCACGCCA
2445
CGCCATTGGCGTGAAG
4214




ATGGCG

GTAG






14989
15011
CCTTCACGCCAATGGCG
2446
TTGAGGCGCCATTGGC
4215




CCTCAA

GTGA






14997
15019
CCAATGGCGCCTCAATA
2447
AAAGAATATTGAGGC
4216




TTCTTT

GCCAT






15006
15028
CCTCAATATTCTTTATCT
2448
GAGGCAGATAAAGAA
4217




GCCTC

TATTG






15025
15047
CCTCTTCCTACACATCG
2449
CTCGCCCGATGTGTAG
4218




GGCGAG

GAAG






15031
15053
CCTACACATCGGGCGAG
2450
ATAGGCCTCGCCCGAT
4219




GCCTAT

GTGT






15049
15071
CCTATATTACGGATCAT
2451
AGAGAAATGATCCGTA
4220




TTCTCT

ATAT






15081
15103
CCTGAAACTTCGGCATT
2452
GAGGATAATGCCGATG
4221




ATCCTC

TTTC






15100
15122
CCTCCTGCTTGCAACTA
2453
TTGCTATAGTTGCAAG
4222




TAGCAA

CAGG






15103
15125
CCTGCTTGCAACTATAG
2454
CTGTTGCTATAGTTGC
4223




CAACAG

AAGC






15126
15148
CCTTCATAGGCTATGTC
2455
CGGGAGGACATAGCCT
4224




CTCCCG

ATGA






15142
15164
CCTCCCGTGAGGCCAAA
2456
ATGATATTTGGCCTCA
4225




TATCAT

CGGG






15145
15167
CCCGTGAGGCCAAATAT
2457
AGAATGATATTTGGCC
4226




CATTCT

TCAC






15146
15168
CCGTGAGGCCAAATATC
2458
CAGAATGATATTTGGC
4227




ATTCTG

CTCA






15154
15176
CCAAATATCATTCTGAG
2459
TGGCCCCTCAGAATGA
4228




GGGCCA

TATT






15174
15196
CCACAGTAATTACAAAC
2460
TAGTAAGTTTGTAATT
4229




TTACTA

ACTG






15198
15220
CCGCCATCCCATACATT
2461
TGTCCCAATGTATGGG
4230




GGGACA

ATGG






15201
15223
CCATCCCATACATTGGG
2462
GTCTGTCCCAATGTAT
4231




ACAGAC

GGGA






15205
15227
CCCATACATTGGGACAG
2463
CTAGGTCTGTCCCAAT
4232




ACCTAG

GTAT






15206
15228
CCATACATTGGGACAGA
2464
ACTAGGTCTGTCCCAA
4233




CCTAGT

TGTA






15223
15245
CCTAGTTCAATGAATCT
2465
CTCCTCAGATTCATTG
4234




GAGGAG

AACT






15263
15285
CCCACCCTCACACGATT
2466
GTAAAGAATCGTGTGA
4235




CTTTAC

GGGT






15264
15286
CCACCCTCACACGATTC
2467
GGTAAAGAATCGTGTG
4236




TTTACC

AGGG






15267
15289
CCCTCACACGATTCTTT
2468
AAAGGTAAAGAATCG
4237




ACCTTT

TGTGA






15268
15290
CCTCACACGATTCTTTA
2469
GAAAGGTAAAGAATC
4238




CCTTTC

GTGTG






15285
15307
CCTTTCACTTCATCTTGC
2470
GAAGGGCAAGATGAA
4239




CCTTC

GTGAA






15302
15324
CCCTTCATTATTGCAGC
2471
GCTAGGGCTGCAATAA
4240




CCTAGC

TGAA






15303
15325
CCTTCATTATTGCAGCC
2472
TGCTAGGGCTGCAATA
4241




CTAGCA

ATGA






15318
15340
CCCTAGCAACACTCCAC
2473
TAGGAGGTGGAGTGTT
4242




CTCCTA

GCTA






15319
15341
CCTAGCAACACTCCACC
2474
ATAGGAGGTGGAGTGT
4243




TCCTAT

TGCT






15331
15353
CCACCTCCTATTCTTGC
2475
TTTCGTGCAAGAATAG
4244




ACGAAA

GAGG






15334
15356
CCTCCTATTCTTGCACG
2476
CCGTTTCGTGCAAGAA
4245




AAACGG

TAGG






15337
15359
CCTATTCTTGCACGAAA
2477
ATCCCGTTTCGTGCAA
4246




CGGGAT

GAAT






15367
15389
CCCCCTAGGAATCACCT
2478
AATGGGAGGTGATTCC
4247




CCCATT

TAGG






15368
15390
CCCCTAGGAATCACCTC
2479
GAATGGGAGGTGATTC
4248




CCATTC

CTAG






15369
15391
CCCTAGGAATCACCTCC
2480
GGAATGGGAGGTGATT
4249




CATTCC

CCTA






15370
15392
CCTAGGAATCACCTCCC
2481
CGGAATGGGAGGTGA
4250




ATTCCG

TTCCT






15381
15403
CCTCCCATTCCGATAAA
2482
GGTGATTTTATCGGAA
4251




ATCACC

TGGG






15384
15406
CCCATTCCGATAAAATC
2483
GAAGGTGATTTTATCG
4252




ACCTTC

GAAT






15385
15407
CCATTCCGATAAAATCA
2484
GGAAGGTGATTTTATC
4253




CCTTCC

GGAA






15390
15412
CCGATAAAATCACCTTC
2485
AGGGTGGAAGGTGATT
4254




CACCCT

TTAT






15402
15424
CCTTCCACCCTTACTAC
2486
GATTGTGTAGTAAGGG
4255




ACAATC

TGGA






15406
15428
CCACCCTTACTACACAA
2487
CTTTGATTGTGTAGTA
4256




TCAAAG

AGGG






15409
15431
CCCTTACTACACAATCA
2488
CGTCTTTGATTGTGTA
4257




AAGACG

GTAA






15410
15432
CCTTACTACACAATCAA
2489
GCGTCTTTGATTGTGT
4258




AGACGC

AGTA






15432
15454
CCCTCGGCTTACTTCTCT
2490
AAGGAAGAGAAGTAA
4259




TCCTT

GCCGA






15433
15455
CCTCGGCTTACTTCTCTT
2491
GAAGGAAGAGAAGTA
4260




CCTTC

AGCCG






15451
15473
CCTTCTCTCCTTAATGA
2492
TTAATGTCATTAAGGA
4261




CATTAA

GAGA






15459
15481
CCTTAATGACATTAACA
2493
GAATAGTGTTAATGTC
4262




CTATTC

ATTA






15485
15507
CCAGACCTCCTAGGCGA
2494
TCTGGGTCGCCTAGGA
4263




CCCAGA

GGTC






15490
15512
CCTCCTAGGCGACCCAG
2495
AATTGTCTGGGTCGCC
4264




ACAATT

TAGG






15493
15515
CCTAGGCGACCCAGACA
2496
TATAATTGTCTGGGTC
4265




ATTATA

GCCT






15502
15524
CCCAGACAATTATACCC
2497
TGGCTAGGGTATAATT
4266




TAGCCA

GTCT






15503
15525
CCAGACAATTATACCCT
2498
TTGGCTAGGGTATAAT
4267




AGCCAA

TGTC






15516
15538
CCCTAGCCAACCCCTTA
2499
GGTGTTTAAGGGGTTG
4268




AACACC

GCTA






15517
15539
CCTAGCCAACCCCTTAA
2500
GGGTGTTTAAGGGGTT
4269




ACACCC

GGCT






15522
15544
CCAACCCCTTAAACACC
2501
GGGAGGGGTGTTTAAG
4270




CCTCCC

GGGT






15526
15548
CCCCTTAAACACCCCTC
2502
TGTGGGGAGGGGTGTT
4271




CCCACA

TAAG






15527
15549
CCCTTAAACACCCCTCC
2503
ATGTGGGGAGGGGTGT
4272




CCACAT

TTAA






15528
15550
CCTTAAACACCCCTCCC
2504
GATGTGGGGAGGGGT
4273




CACATC

GTTTA






15537
15559
CCCCTCCCCACATCAAG
2505
TTCGGGCTTGATGTGG
4274




CCCGAA

GGAG






15538
15560
CCCTCCCCACATCAAGC
2506
ATTCGGGCTTGATGTG
4275




CCGAAT

GGGA






15539
15561
CCTCCCCACATCAAGCC
2507
CATTCGGGCTTGATGT
4276




CGAATG

GGGG






15542
15564
CCCCACATCAAGCCCGA
2508
TATCATTCGGGCTTGA
4277




ATGATA

TGTG






15543
15565
CCCACATCAAGCCCGAA
2509
ATATCATTCGGGCTTG
4278




TGATAT

ATGT






15544
15566
CCACATCAAGCCCGAAT
2510
AATATCATTCGGGCTT
4279




GATATT

GATG






15554
15576
CCCGAATGATATTTCCT
2511
GCGAATAGGAAATATC
4280




ATTCGC

ATTC






15555
15577
CCGAATGATATTTCCTA
2512
GGCGAATAGGAAATA
4281




TTCGCC

TCATT






15568
15590
CCTATTCGCCTACACAA
2513
GGAGAATTGTGTAGGC
4282




TTCTCC

GAAT






15576
15598
CCTACACAATTCTCCGA
2514
GACGGATCGGAGAATT
4283




TCCGTC

GTGT






15589
15611
CCGATCCGTCCCTAACA
2515
CTAGTTTGTTAGGGAC
4284




AACTAG

GGAT






15594
15616
CCGTCCCTAACAAACTA
2516
GCCTCCTAGTTTGTTA
4285




GGAGGC

GGGA






15598
15620
CCCTAACAAACTAGGAG
2517
GGACGCCTCCTAGTTT
4286




GCGTCC

GTTA






15599
15621
CCTAACAAACTAGGAG
2518
AGGACGCCTCCTAGTT
4287




GCGTCCT

TGTT






15619
15641
CCTTGCCCTATTACTAT
2519
GGATGGATAGTAATAG
4288




CCATCC

GGCA






15624
15646
CCCTATTACTATCCATC
2520
GATGAGGATGGATAGT
4289




CTCATC

AATA






15625
15647
CCTATTACTATCCATCC
2521
GGATGAGGATGGATA
4290




TCATCC

GTAAT






15636
15658
CCATCCTCATCCTAGCA
2522
GATTATTGCTAGGATG
4291




ATAATC

AGGA






15640
15662
CCTCATCCTAGCAATAA
2523
TGGGGATTATTGCTAG
4292




TCCCCA

GATG






15646
15668
CCTAGCAATAATCCCCA
2524
GGAGGATGGGGATTAT
4293




TCCTCC

TGCT






15658
15680
CCCCATCCTCCATATAT
2525
GTTTGGATATATGGAG
4294




CCAAAC

GATG






15659
15681
CCCATCCTCCATATATC
2526
TGTTTGGATATATGGA
4295




CAAACA

GGAT






15660
15682
CCATCCTCCATATATCC
2527
TTGTTTGGATATATGG
4296




AAACAA

AGGA






15664
15686
CCTCCATATATCCAAAC
2528
TTTGTTGTTTGGATAT
4297




AACAAA

ATGG






15667
15689
CCATATATCCAAACAAC
2529
TGCTTTGTTGTTTGGAT
4298




AAAGCA

ATA






15675
15697
CCAAACAACAAAGCAT
2530
AAATATTATGCTTTGT
4299




AATATTT

TGTT






15700
15722
CCCACTAAGCCAATCAC
2531
AATAAAGTGATTGGCT
4300




TTTATT

TAGT






15701
15723
CCACTAAGCCAATCACT
2532
CAATAAAGTGATTGGC
4301




TTATTG

TTAG






15709
15731
CCAATCACTTTATTGAC
2533
CTAGGAGTCAATAAAG
4302




TCCTAG

TGAT






15727
15749
CCTAGCCGCAGACCTCC
2534
GAATGAGGAGGTCTGC
4303




TCATTC

GGCT






15732
15754
CCGCAGACCTCCTCATT
2535
GGTTAGAATGAGGAG
4304




CTAACC

GTCTG






15739
15761
CCTCCTCATTCTAACCT
2536
CGATTCAGGTTAGAAT
4305




GAATCG

GAGG






15742
15764
CCTCATTCTAACCTGAA
2537
CTCCGATTCAGGTTAG
4306




TCGGAG

AATG






15753
15775
CCTGAATCGGAGGACA
2538
TACTGGTTGTCCTCCG
4307




ACCAGTA

ATTC






15770
15792
CCAGTAAGCTACCCTTT
2539
ATGGTAAAAGGGTAG
4308




TACCAT

CTTAC






15781
15803
CCCTTTTACCATCATTG
2540
CTTGTCCAATGATGGT
4309




GACAAG

AAAA






15782
15804
CCTTTTACCATCATTGG
2541
ACTTGTCCAATGATGG
4310




ACAAGT

TAAA






15789
15811
CCATCATTGGACAAGTA
2542
GGATGCTACTTGTCCA
4311




GCATCC

ATGA






15810
15832
CCGTACTATACTTCACA
2543
GATTGTTGTGAAGTAT
4312




ACAATC

AGTA






15832
15854
CCTAATCCTAATACCAA
2544
AGATAGTTGGTATTAG
4313




CTATCT

GATT






15838
15860
CCTAATACCAACTATCT
2545
TTAGGGAGATAGTTGG
4314




CCCTAA

TATT






15845
15867
CCAACTATCTCCCTAAT
2546
TTTTCAATTAGGGAGA
4315




TGAAAA

TAGT






15855
15877
CCCTAATTGAAAACAAA
2547
GAGTATTTTGTTTTCA
4316




ATACTC

ATTA






15856
15878
CCTAATTGAAAACAAAA
2548
TGAGTATTTTGTTTTCA
4317




TACTCA

ATT






15885
15907
CCTGTCCTTGTAGTATA
2549
TTAGTTTATACTACAA
4318




AACTAA

GGAC






15890
15912
CCTTGTAGTATAAACTA
2550
GTGTATTAGTTTATAC
4319




ATACAC

TACA






15912
15934
CCAGTCTTGTAAACCGG
2551
TCATCTCCGGTTTACA
4320




AGATGA

AGAC






15925
15947
CCGGAGATGAAAACCTT
2552
TGGAAAAAGGTTTTCA
4321




TTTCCA

TCTC






15938
15960
CCTTTTTCCAAGGACAA
2553
TCTGATTTGTCCTTGG
4322




ATCAGA

AAAA






15945
15967
CCAAGGACAAATCAGA
2554
CTTTTTCTCTGATTTGT
4323




GAAAAAG

CCT






15977
15999
CCACCATTAGCACCCAA
2555
TTAGCTTTGGGTGCTA
4324




AGCTAA

ATGG






15980
16002
CCATTAGCACCCAAAGC
2556
ATCTTAGCTTTGGGTG
4325




TAAGAT

CTAA






15989
16011
CCCAAAGCTAAGATTCT
2557
TAAATTAGAATCTTAG
4326




AATTTA

CTTT






15990
16012
CCAAAGCTAAGATTCTA
2558
TTAAATTAGAATCTTA
4327




ATTTAA

GCTT






16052
16074
CCACCCAAGTATTGACT
2559
TGGGTGAGTCAATACT
4328




CACCCA

TGGG






16055
16077
CCCAAGTATTGACTCAC
2560
TGATGGGTGAGTCAAT
4329




CCATCA

ACTT






16056
16078
CCAAGTATTGACTCACC
2561
TTGATGGGTGAGTCAA
4330




CATCAA

TACT






16071
16093
CCCATCAACAACCGCTA
2562
AATACATAGCGGTTGT
4331




TGTATT

TGAT






16072
16094
CCATCAACAACCGCTAT
2563
AAATACATAGCGGTTG
4332




GTATTT

TTGA






16082
16104
CCGCTATGTATTTCGTA
2564
GTAATGTACGAAATAC
4333




CATTAC

ATAG






16107
16129
CCAGCCACCATGAATAT
2565
CGTACAATATTCATGG
4334




TGTACG

TGGC






16111
16133
CCACCATGAATATTGTA
2566
GTACCGTACAATATTC
4335




CGGTAC

ATGG






16114
16136
CCATGAATATTGTACGG
2567
ATGGTACCGTACAATA
4336




TACCAT

TTCA






16133
16155
CCATAAATACTTGACCA
2568
TACAGGTGGTCAAGTA
4337




CCTGTA

TTTA






16147
16169
CCACCTGTAGTACATAA
2569
GGGTTTTTATGTACTA
4338




AAACCC

CAGG






16150
16172
CCTGTAGTACATAAAAA
2570
ATTGGGTTTTTATGTA
4339




CCCAAT

CTAC






16167
16189
CCCAATCCACATCAAAA
2571
AGGGGGTTTTGATGTG
4340




CCCCCT

GATT






16168
16190
CCAATCCACATCAAAAC
2572
GAGGGGGTTTTGATGT
4341




CCCCTC

GGAT






16173
16195
CCACATCAAAACCCCCT
2573
ATGGGGAGGGGGTTTT
4342




CCCCAT

GATG






16184
16206
CCCCCTCCCCATGCTTA
2574
TGCTTGTAAGCATGGG
4343




CAAGCA

GAGG






16185
16207
CCCCTCCCCATGCTTAC
2575
TTGCTTGTAAGCATGG
4344




AAGCAA

GGAG






16186
16208
CCCTCCCCATGCTTACA
2576
CTTGCTTGTAAGCATG
4345




AGCAAG

GGGA






16187
16209
CCTCCCCATGCTTACAA
2577
ACTTGCTTGTAAGCAT
4346




GCAAGT

GGGG






16190
16212
CCCCATGCTTACAAGCA
2578
TGTACTTGCTTGTAAG
4347




AGTACA

CATG






16191
16213
CCCATGCTTACAAGCAA
2579
CTGTACTTGCTTGTAA
4348




GTACAG

GCAT






16192
16214
CCATGCTTACAAGCAAG
2580
GCTGTACTTGCTTGTA
4349




TACAGC

AGCA






16221
16243
CCCTCAACTATCACACA
2581
AGTTGATGTGTGATAG
4350




TCAACT

TTGA






16222
16244
CCTCAACTATCACACAT
2582
CAGTTGATGTGTGATA
4351




CAACTG

GTTG






16250
16272
CCAAAGCCACCCCTCAC
2583
TAGTGGGTGAGGGGTG
4352




CCACTA

GCTT






16256
16278
CCACCCCTCACCCACTA
2584
GTATCCTAGTGGGTGA
4353




GGATAC

GGGG






16259
16281
CCCCTCACCCACTAGGA
2585
TTGGTATCCTAGTGGG
4354




TACCAA

TGAG






16260
16282
CCCTCACCCACTAGGAT
2586
GTTGGTATCCTAGTGG
4355




ACCAAC

GTGA






16261
16283
CCTCACCCACTAGGATA
2587
TGTTGGTATCCTAGTG
4356




CCAACA

GGTG






16266
16288
CCCACTAGGATACCAAC
2588
AGGTTTGTTGGTATCC
4357




AAACCT

TAGT






16267
16289
CCACTAGGATACCAACA
2589
TAGGTTTGTTGGTATC
4358




AACCTA

CTAG






16278
16300
CCAACAAACCTACCCAC
2590
TTAAGGGTGGGTAGGT
4359




CCTTAA

TTGT






16286
16308
CCTACCCACCCTTAACA
2591
ATGTACTGTTAAGGGT
4360




GTACAT

GGGT






16290
16312
CCCACCCTTAACAGTAC
2592
TACTATGTACTGTTAA
4361




ATAGTA

GGGT






16291
16313
CCACCCTTAACAGTACA
2593
GTACTATGTACTGTTA
4362




TAGTAC

AGGG






16294
16316
CCCTTAACAGTACATAG
2594
TATGTACTATGTACTG
4363




TACATA

TTAA






16295
16317
CCTTAACAGTACATAGT
2595
TTATGTACTATGTACT
4364




ACATAA

GTTA






16320
16342
CCATTTACCGTACATAG
2596
AATGTGCTATGTACGG
4365




CACATT

TAAA






16327
16349
CCGTACATAGCACATTA
2597
TGACTGTAATGTGCTA
4366




CAGTCA

TGTA






16353
16375
CCCTTCTCGTCCCCATG
2598
GTCATCCATGGGGACG
4367




GATGAC

AGAA






16354
16376
CCTTCTCGTCCCCATGG
2599
GGTCATCCATGGGGAC
4368




ATGACC

GAGA






16363
16385
CCCCATGGATGACCCCC
2600
TCTGAGGGGGGTCATC
4369




CTCAGA

CATG






16364
16386
CCCATGGATGACCCCCC
2601
ATCTGAGGGGGGTCAT
4370




TCAGAT

CCAT






16365
16387
CCATGGATGACCCCCCT
2602
TATCTGAGGGGGGTCA
4371




CAGATA

TCCA






16375
16397
CCCCCCTCAGATAGGGG
2603
AAGGGACCCCTATCTG
4372




TCCCTT

AGGG






16376
16398
CCCCCTCAGATAGGGGT
2604
CAAGGGACCCCTATCT
4373




CCCTTG

GAGG






16377
16399
CCCCTCAGATAGGGGTC
2605
TCAAGGGACCCCTATC
4374




CCTTGA

TGAG






16378
16400
CCCTCAGATAGGGGTCC
2606
GTCAAGGGACCCCTAT
4375




CTTGAC

CTGA






16379
16401
CCTCAGATAGGGGTCCC
2607
GGTCAAGGGACCCCTA
4376




TTGACC

TCTG






16393
16415
CCCTTGACCACCATCCT
2608
TCACGGAGGATGGTGG
4377




CCGTGA

TCAA






16394
16416
CCTTGACCACCATCCTC
2609
TTCACGGAGGATGGTG
4378




CGTGAA

GTCA






16400
16422
CCACCATCCTCCGTGAA
2610
ATTGATTTCACGGAGG
4379




ATCAAT

ATGG






16403
16425
CCATCCTCCGTGAAATC
2611
GATATTGATTTCACGG
4380




AATATC

AGGA






16407
16429
CCTCCGTGAAATCAATA
2612
GCGGGATATTGATTTC
4381




TCCCGC

ACGG






16410
16432
CCGTGAAATCAATATCC
2613
TGTGCGGGATATTGAT
4382




CGCACA

TTCA






16425
16447
CCCGCACAAGAGTGCTA
2614
GGAGAGTAGCACTCTT
4383




CTCTCC

GTGC






16426
16448
CCGCACAAGAGTGCTAC
2615
AGGAGAGTAGCACTCT
4384




TCTCCT

TGTG






16446
16468
CCTCGCTCCGGGCCCAT
2616
AGTGTTATGGGCCCGG
4385




AACACT

AGCG






16453
16475
CCGGGCCCATAACACTT
2617
ACCCCCAAGTGTTATG
4386




GGGGGT

GGCC






16458
16480
CCCATAACACTTGGGGG
2618
TAGCTACCCCCAAGTG
4387




TAGCTA

TTAT






16459
16481
CCATAACACTTGGGGGT
2619
TTAGCTACCCCCAAGT
4388




AGCTAA

GTTA






16494
16516
CCGACATCTGGTTCCTA
2620
CTGAAGTAGGAACCA
4389




CTTCAG

GATGT






16507
16529
CCTACTTCAGGGTCATA
2621
AGGCTTTATGACCCTG
4390




AAGCCT

AAGT






16527
16549
CCTAAATAGCCCACACG
2622
GGGGAACGTGTGGGCT
4391




TTCCCC

ATTT






16536
16558
CCCACACGTTCCCCTTA
2623
CTTATTTAAGGGGAAC
4392




AATAAG

GTGT






16537
16559
CCACACGTTCCCCTTAA
2624
TCTTATTTAAGGGGAA
4393




ATAAGA

CGTG






16546
16568
CCCCTTAAATAAGACAT
2625
ATCGTGATGTCTTATT
4394




CACGAT

TAAG






16547
16569
CCCTTAAATAAGACATC
2626
CATCGTGATGTCTTAT
4395




ACGATG

TTAA






16548
16570
CCTTAAATAAGACATCA
2627
CCATCGTGATGTCTTA
4396




CGATGG

TTTA
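
The relationship between the paired sequences in the rows above can be checked programmatically. The following is a minimal Python sketch and is not part of the original disclosure. It assumes, as an inference from the rows themselves rather than from any column heading in this portion of the table, that the first sequence of a row (17 nt plus the 6 nt continuation printed later in the row) is a 23 nt site beginning with CCN, that the second sequence (with its continuation) is the reverse complement of the 20 nt following that CCN, and that the end coordinate equals the start coordinate plus 22. The function names are illustrative only.

```python
# Illustrative consistency check for one table row (assumed column meanings).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def guide_from_site(site: str) -> str:
    """Derive the 20 nt sequence from a 23 nt site of the form CCN + 20 nt."""
    if len(site) != 23 or not site.startswith("CC"):
        raise ValueError("expected a 23 nt site beginning with CC")
    return reverse_complement(site[3:])


# Row with start 13693 and end 13715, joined from its two printed pieces.
start, end = 13693, 13715
site = "CCCATTAAACGCCTGGC" + "AGCCGG"
listed = "CCGGCTGCCAGGCGTT" + "TAAT"

derived = guide_from_site(site)
row_is_consistent = (derived == listed) and (end - start == 22)
print(derived, row_is_consistent)
```

A row for which `row_is_consistent` is false would be a candidate for a transcription error in either sequence or in a coordinate.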









Applications

The gNAs (e.g., gRNAs) and collections of gNAs (e.g., gRNAs) provided herein are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest; genome-wide labeling; genome-wide editing; genome-wide functional screens; and genome-wide regulation.


In one embodiment, the gNAs are selective for host nucleic acids in a biological sample from a host, but are not selective for non-host nucleic acids in the sample from the host. In one embodiment, the gNAs are selective for non-host nucleic acids from a biological sample from a host, but are not selective for the host nucleic acids in the sample. In one embodiment, the gNAs are selective for both host nucleic acids and a subset of the non-host nucleic acids in a biological sample from a host. For example, where a complex biological sample comprises host nucleic acids and nucleic acids from more than one non-host organism, the gNAs may be selective for more than one of the non-host species. In such embodiments, the gNAs are used to serially deplete or partition the sequences that are not of interest. For example, saliva from a human contains human DNA, as well as the DNA of more than one bacterial species, but may also contain the genomic material of an unknown pathogenic organism. In such an embodiment, gNAs directed at the human DNA and the known bacteria can be used to serially deplete the human DNA and the DNA of the known bacteria, thus resulting in a sample comprising the genomic material of the unknown pathogenic organism.


In an exemplary embodiment, the gNAs are selective for human host DNA obtained from a biological sample from the host, but do not hybridize with DNA from an unknown pathogen(s) also obtained from the sample.


In some embodiments, the gNAs are useful in a method of depleting and partitioning targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample, the method comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collections of gNAs described herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins.


In some embodiments, the gNAs are useful for a method of depletion and partitioning of targeted sequences in a sample comprising: providing nucleic acids extracted from a sample, wherein the extracted nucleic acids comprise sequences of interest and targeted sequences for one of depletion and partitioning; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the nucleic acids in the sample.


In some embodiments, the gNAs are useful for a method of enriching a sample for non-host nucleic acids comprising: providing a sample comprising host nucleic acids and non-host nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein comprising targeting sequences directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the host nucleic acids in the sample, thereby depleting the sample of host nucleic acids and allowing for the enrichment of non-host nucleic acids.
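
The enrichment of non-host nucleic acids by depletion of host nucleic acids can be illustrated in silico. The following Python sketch is offered only as a simplified model and not as the method of any embodiment. It assumes that a nucleic acid is cleaved when it contains, on either strand, a 20 nt targeted sequence immediately followed by an NGG motif (a Cas9-style PAM, which is an assumption of this sketch), and it treats every cleaved molecule as removed from the sample. The function names, the sample labels, and the toy sequences are hypothetical.

```python
# Toy model: host-directed cleavage followed by measurement of enrichment.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_cleaved(seq: str, targets: set) -> bool:
    """True if either strand carries a 20 nt target followed by NGG."""
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 22):
            if strand[i:i + 20] in targets and strand[i + 21:i + 23] == "GG":
                return True
    return False


def deplete(sample: list, targets: set) -> list:
    """Keep only the (label, sequence) pairs that are not cleaved."""
    return [(label, seq) for label, seq in sample if not is_cleaved(seq, targets)]


def non_host_fraction(sample: list) -> float:
    """Fraction of molecules in the sample that are not labeled 'host'."""
    if not sample:
        return 0.0
    return sum(1 for label, _ in sample if label != "host") / len(sample)


host_targets = {"CCGGCTGCCAGGCGTTTAAT"}
sample = [
    ("host", "AT" + "CCCATTAAACGCCTGGCAGCCGG" + "TA"),
    ("non-host", "GATTACAGATTACAGATTACAGATTACA"),
]

remaining = deplete(sample, host_targets)
print(non_host_fraction(sample), non_host_fraction(remaining))
```

In this toy sample the host molecule carries the targeted site on its reverse strand, so it is removed and the non-host fraction rises from one half to one.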


In some embodiments, the gNAs are useful for a method of serially depleting targeted nucleic acids in a sample comprising: providing a biological sample from a host comprising host nucleic acids and non-host nucleic acids, wherein the non-host nucleic acids comprise nucleic acids from at least one known non-host organism and nucleic acids from an unknown non-host organism; providing a plurality of complexes comprising (i) a collection of gNAs provided herein, directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins; mixing the nucleic acids from the biological sample with the gNA-nucleic acid-guided nuclease system protein complexes (e.g., gRNA-CRISPR/Cas system protein complexes) configured to hybridize to targeted sequences in the host nucleic acids, wherein at least a portion of the complexes hybridizes to the targeted sequences in the host nucleic acids, and wherein at least a portion of the host nucleic acids is cleaved; mixing the remaining nucleic acids from the biological sample with gNA-nucleic acid-guided nuclease system protein complexes configured to hybridize to targeted sequences in the nucleic acids from the at least one known non-host organism, wherein at least a portion of the complexes hybridizes to the targeted sequences in the nucleic acids from the at least one known non-host organism, and wherein at least a portion of those non-host nucleic acids is cleaved; and isolating the remaining nucleic acids from the unknown non-host organism and preparing them for further analysis.
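
The order of operations in the serial depletion described above can be sketched as follows. This Python example is a deliberately coarse illustration under stated assumptions: cleavage is modeled as an exact match of a targeted sequence anywhere in a fragment, strand orientation and PAM requirements are ignored, and every cleaved fragment is treated as removed. The function `serial_deplete`, the labels, and the short toy sequences are hypothetical and are not taken from the disclosure.

```python
# Toy model of serial depletion: host first, then known non-host organisms.
def serial_deplete(fragments: list, rounds: list) -> tuple:
    """Apply each (name, targets) round in order.

    fragments: list of (label, sequence) pairs.
    rounds:    list of (round name, set of targeted sequences) pairs.
    Returns the fragments left after all rounds and a per-round removal log.
    """
    remaining = list(fragments)
    log = []
    for name, targets in rounds:
        kept = [f for f in remaining if not any(t in f[1] for t in targets)]
        log.append((name, len(remaining) - len(kept)))
        remaining = kept
    return remaining, log


fragments = [
    ("human", "AAAACCCCGGGGTTTT"),
    ("human", "TTTTCCCCGGGGAAAA"),
    ("known bacterium", "ACGTACGTAGCTAGCT"),
    ("unknown organism", "GATTACAGATTACAGA"),
]
rounds = [
    ("host", {"CCCCGGGG"}),
    ("known non-host", {"ACGTAGCT"}),
]

remaining, log = serial_deplete(fragments, rounds)
print(remaining)
print(log)
```

Only the fragment from the unknown organism survives both rounds in this example, which corresponds to the material that would be isolated for further analysis.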


In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted functional screens in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein, such that gNA-directed nucleic acid-guided nuclease system protein editing can be achieved at sequences across the entire genome or in a specific region of the genome. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as DNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as mRNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as protein. In one exemplary embodiment, the nucleic acid-guided nuclease system protein is Cas9.


In some embodiments, the gNAs generated herein are used for the selective capture and/or enrichment of nucleic acid sequences of interest. For example, in some embodiments, the gNAs generated herein are used for capturing target nucleic acid sequences comprising: providing a sample comprising a plurality of nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins. Once the sequences of interest are captured, they can be further ligated to create, for example, a sequencing library.


In some embodiments, the gNAs generated herein are used for introducing labeled nucleotides at targeted sites of interest comprising: (a) providing a sample comprising a plurality of nucleic acid fragments; (b) contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-nickases (e.g. Cas9-nickases), wherein the gNAs are complementary to targeted sites of interest in the nucleic acid fragments, thereby generating a plurality of nicked nucleic acid fragments at the targeted sites of interest; and (c) contacting the plurality of nicked nucleic acid fragments with an enzyme capable of initiating nucleic acid synthesis at a nicked site, and labeled nucleotides, thereby generating a plurality of nucleic acid fragments comprising labeled nucleotides in the targeted sites of interest.


In some embodiments, the gNAs generated herein are used for capturing target nucleic acid sequences of interest comprising: (a) providing a sample comprising a plurality of adapter-ligated nucleic acids, wherein the nucleic acids are ligated to a first adapter at one end and are ligated to a second adapter at the other end; and (b) contacting the sample with a collection of gNAs which comprise a plurality of dead nucleic acid-guided nuclease-gNA complexes (e.g., dCas9-gRNA complexes), wherein the dead nucleic acid-guided nuclease (e.g., dCas9) is fused to a transposase, wherein the gNAs are complementary to targeted sites of interest contained in a subset of the nucleic acids, and wherein the dead nucleic acid-guided nuclease-gNA transposase complexes (e.g., dCas9-gRNA transposase complexes) are loaded with a plurality of third adapters, to generate a plurality of nucleic acids fragments comprising either a first or second adapter at one end and a third adapter at the other end. In one embodiment the method further comprises amplifying the product of step (b) using first or second adapter and third adapter-specific PCR.


In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted activation or repression in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a catalytically dead nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein fused to an activator or repressor domain (catalytically dead nucleic acid-guided nuclease system protein-fusion protein), in a way that gNA-directed catalytically dead nucleic acid-guided nuclease system protein-mediated activation or repression can be achieved at sequences across the entire genome or to a specific region of the genome. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as DNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as mRNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as protein. In some embodiments, the collection of gNAs or nucleic acids encoding for gNAs exhibit specificity for more than one nucleic acid-guided nuclease system protein. In one exemplary embodiment, the catalytically dead nucleic acid-guided nuclease system protein is dCas9.


In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for Cas9 and one or more CRISPR/Cas system proteins selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for various catalytically dead CRISPR/Cas system proteins fused to different fluorophores, for example for use in the labeling and/or visualization of different genomes or portions of genomes, for use in the labeling and/or visualization of different chromosomal regions, or for use in the labeling and/or visualization of the integration of viral genes/genomes into a genome.


In some embodiments, the collection of gNAs (or nucleic acids encoding for gNAs) have specificity for different nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, and target different sequences of interest, for example from different species. For example, a first subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a first species can be first mixed with a first nucleic acid-guided nuclease system protein member (or an engineered version); and a second subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a second species can be mixed with a second different nucleic acid-guided nuclease system protein member (or an engineered version). In one embodiment, the nucleic acid-guided nuclease system proteins can be a catalytically dead version (for example dCas9) fused with different fluorophores, so that different targeted sequences of interest, e.g. different species genomes, or different chromosomes of one species, can be labeled by different fluorescent labels. For example, different chromosomal regions can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of genetic translocations. For example, different viral genomes can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of integration of different viral genomes into the host genome. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused with either an activation or a repression domain, so that different targeted sequences of interest, e.g. different chromosomes of a genome, can be differentially regulated. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused to different protein domains which can be recognized by different antibodies, so that different targeted sequences of interest, e.g. different DNA sequences within a sample mixture, can be differentially isolated.


Exemplary Compositions of the Invention

In one embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase nucleic acid-guided nuclease-gNA complex, and labeled nucleotides. In one exemplary embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase Cas9-gRNA complex, and labeled nucleotides. In such embodiments, the nucleic acid may comprise DNA. The nucleotides can be labeled, for example with biotin. The nucleotides can be part of an antibody-conjugate pair.


In one embodiment, provided herein is a composition comprising a nucleic acid fragment and a catalytically dead nucleic acid-guided nuclease-gNA complex, wherein the catalytically dead nucleic acid-guided nuclease is fused to a transposase. In one exemplary embodiment, provided herein is a composition comprising a DNA fragment and a dCas9-gRNA complex, wherein the dCas9 is fused to a transposase.


In one embodiment, provided herein is a composition comprising a nucleic acid fragment comprising methylated nucleotides, a nickase nucleic acid-guided nuclease-gNA complex, and unmethylated nucleotides. In an exemplary embodiment, provided herein is a composition comprising a DNA fragment comprising methylated nucleotides, a nickase Cas9-gRNA complex, and unmethylated nucleotides.


In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-DNA endonuclease. In an exemplary embodiment, the nucleic acid-guided-DNA endonuclease is NgAgo.


In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-RNA endonuclease.


In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-DNA endonuclease.


In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-RNA endonuclease. In one embodiment, the nucleic acid-guided-RNA endonuclease comprises C2c2.


Kits and Articles of Manufacture

The present application provides kits comprising any one or more of the compositions described herein, including but not limited to adapters, gNAs (e.g., gRNAs), gNA collections (e.g., gRNA collections), nucleic acid molecules encoding the gNA collections, and the like.


In one exemplary embodiment, the kit comprises a collection of DNA molecules capable of being transcribed into a library of gRNAs, wherein the gRNAs are targeted to human genomic or other sources of DNA sequences.


In one embodiment, the kit comprises a collection of gNAs wherein the gNAs are targeted to human genomic or other sources of DNA sequences.


In some embodiments, provided herein are kits comprising any of the collection of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits comprising any of the collection of gNAs, as described herein.


The present application also provides all essential reagents and instructions for carrying out the methods of making the gNAs and the collection of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits that comprise all essential reagents and instructions for carrying out the methods of making individual gNAs and collections of gNAs as described herein.


Also provided herein is computer software monitoring the information before and after contacting a sample with a gNA collection produced herein. In one exemplary embodiment, the software can compute and report the abundance of non-target sequence in the sample before and after providing the gNA collection to ensure no off-target targeting occurs, and the software can check the efficacy of targeted depletion/enrichment/capture/partitioning/labeling/regulation/editing by comparing the abundance of the target sequence before and after providing the gNA collection to the sample.
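The before/after comparison described above can be sketched as follows. This is a minimal illustration only, assuming that each sequencing read has already been assigned a label (for example by alignment); the function name `depletion_report`, the labels, and the read counts are hypothetical and are not taken from the present application.

```python
from collections import Counter

def depletion_report(before, after, target_label):
    """Compare target and non-target read counts before and after gNA treatment.

    `before` and `after` are lists of per-read labels (hypothetical input format).
    """
    cb, ca = Counter(before), Counter(after)
    tb, ta = cb[target_label], ca[target_label]
    nb, na = sum(cb.values()) - tb, sum(ca.values()) - ta
    return {
        "target_fraction_before": tb / (tb + nb),
        "target_fraction_after": ta / (ta + na),
        # Fraction of target reads surviving treatment (lower = better depletion)
        "target_retained": ta / tb if tb else None,
        # Fraction of non-target reads surviving (near 1.0 suggests little off-target loss)
        "non_target_retained": na / nb if nb else None,
    }

# Hypothetical read assignments, for illustration only
before = ["host"] * 990 + ["pathogen"] * 10
after = ["host"] * 90 + ["pathogen"] * 10
report = depletion_report(before, after, "host")
```

In practice, counts would need to be normalized (for example with a spike-in control) before retained fractions could be compared across libraries of different sequencing depth.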


The following examples are included for illustrative purposes and are not intended to limit the scope of the invention.


Examples
Example 1: Construction of a gRNA Library from a T7 Promoter Human DNA Library
T7 Promoter Library Construction

Human genomic DNA (400 ng) was fragmented using an S2 Covaris sonicator (Covaris) for 8 cycles, to yield fragments of 200-300 bp in length. Fragmented DNA was repaired using the NEBNext End Repair Module (NEB) and incubated at 25° C. for 30 min, then heat inactivated at 75° C. for 20 min. To make T7 promoter adapters, oligos T7-1 (5′GCCTCGAGC*T*A*ATACGACTCACTATAGAG3′, * denotes a phosphorothioate backbone linkage)(SEQ ID NO: 4397) and T7-2 (sequence 5′Phos-CTCTATAGTGAGTCGTATTA3′) (SEQ ID NO: 4398) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. T7 promoter blunt adapters (15 pmol total) were then added to the blunt-ended human genomic DNA fragments, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((2) in FIG. 1). Ligations were amplified with 2 μM oligo T7-1, using Hi-Fidelity 2× Master Mix (NEB) for 10 cycles of PCR (98° C. for 20 s, 63° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6×AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
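As an illustrative check of the adapter design above, the following sketch tests whether oligo T7-2 is the reverse complement of the 3′ end of oligo T7-1, which is what allows the two oligos to anneal into an adapter with one blunt, ligatable end. The sequences are copied from the text with the phosphorothioate marks omitted; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def anneals_to_3prime_end(long_oligo, short_oligo):
    """True if short_oligo is the reverse complement of the 3' end of long_oligo."""
    return long_oligo.upper().endswith(reverse_complement(short_oligo))

T7_1 = "GCCTCGAGCTAATACGACTCACTATAGAG"  # SEQ ID NO: 4397, '*' marks omitted
T7_2 = "CTCTATAGTGAGTCGTATTA"           # SEQ ID NO: 4398

blunt_end_ok = anneals_to_3prime_end(T7_1, T7_2)
# Unpaired 5' portion of T7-1 left single stranded after annealing
five_prime_overhang = len(T7_1) - len(T7_2)
```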


Digestion of DNA

PCR amplified T7 promoter DNA (2 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 10 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C. ((3) in FIG. 1), then heat inactivated at 75° C. for 20 min. An additional 10 μL of NEB buffer 2 with 1 μL of T7 Endonuclease I (NEB) was added to the reaction, and incubated at 37° C. for 20 min ((4) in FIG. 1). Enzymatic digestion of DNA was verified by agarose gel electrophoresis. Digested DNA was recovered by adding 0.6×AxyPrep beads (Axygen), according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.


Ligation of Adapters and Removal of HGG

DNA was then blunted using T4 DNA Polymerase (NEB) for 20 min at 25° C., followed by heat inactivation at 75° C. for 20 min ((5) in FIG. 1).


To make MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4399) and MlyI-2 (sequence 5′>3′, TCACTATAGGGATCCGAGTCCC) (SEQ ID NO: 4400) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. MlyI adapters (15 pmol total) were then added to T4 DNA Polymerase-blunted DNA, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 1). Ligations were heat inactivated at 75° C. for 20 min, then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., so that HGG motifs are eliminated ((7) in FIG. 1). Digests were then cleaned using 0.8×AxyPrep beads (Axygen), and DNA was resuspended in 10 μL of 10 mM Tris-Cl pH 8.


To make StlgR adapters, oligos stlgR (sequence 5′>3′, 5′Phos-GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTGGATCCGATGC) (SEQ ID NO: 4401) and stlgRev (sequence 5′>3′, GGATCCAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4402) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 60° C. StlgR adapters (5 pmol total) were added to HGG-removed DNA fragments, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((8) in FIG. 1). Ligations were then incubated with Hi-Fidelity 2× Master Mix (NEB), using 2 μM of both oligos T7-1 and gRU (sequence 5′>3′, AAAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4403), and amplified using 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6×AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.


In Vitro Transcription

The T7/gRU amplified library of PCR products was then used as template for in vitro transcription, using the HiScribe T7 In Vitro Transcription Kit (NEB). 500-1000 ng of template was incubated overnight at 37° C. according to the manufacturer's instructions. To transcribe the guide libraries into gRNAs, the following in vitro transcription reaction mixture was assembled: 10 μL of purified library (˜500 ng), 6.5 μL of H2O, 2.25 μL of ATP, 2.25 μL of CTP, 2.25 μL of GTP, 2.25 μL of UTP, 2.25 μL of 10× reaction buffer (NEB) and 2.25 μL of T7 RNA Polymerase mix. The reaction was incubated at 37° C. for 24 hr, then purified using the RNA cleanup kit (Life Technologies), eluted with 100 μL of RNase-free water, quantified and stored at −20° C. until use.
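For bookkeeping, the component volumes listed above can be tallied as in the following sketch, which computes the total reaction volume and the resulting final concentration of the 10× reaction buffer. The dictionary keys are descriptive labels chosen for illustration; only the volumes come from the text.

```python
# Component volumes in microliters, as listed in the reaction mixture above
ivt_mix_ul = {
    "purified library (~500 ng)": 10.0,
    "H2O": 6.5,
    "ATP": 2.25,
    "CTP": 2.25,
    "GTP": 2.25,
    "UTP": 2.25,
    "10x reaction buffer": 2.25,
    "T7 RNA Polymerase mix": 2.25,
}

total_ul = sum(ivt_mix_ul.values())

# Final buffer strength: stock strength (10x) times volume fraction in the reaction
buffer_final_x = 10 * ivt_mix_ul["10x reaction buffer"] / total_ul

print(f"total volume: {total_ul} uL, buffer: {buffer_final_x}x")
```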


Example 2: Construction of gRNA Library from Intact Human Genomic DNA
Digestion of DNA

Human genomic DNA ((1) in FIG. 2; 20 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 40 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C., then heat inactivated at 75° C. for 20 min. An additional 40 μL of NEB buffer 2 and 1 μL of T7 Endonuclease I (NEB) were added to the reaction, with 20 min incubation at 37° C. (e.g., (2) in FIG. 2). Fragmentation of genomic DNA was verified with a small aliquot by agarose gel electrophoresis. DNA fragments between 200 and 600 bp were recovered by adding 0.3×AxyPrep beads (Axygen), incubating at 25° C. for 5 min, capturing beads on a magnetic stand and transferring the supernatant to a new tube. DNA fragments below 600 bp do not bind to beads at this bead/DNA ratio and remain in the supernatant. 0.7×AxyPrep beads (Axygen) were then added to the supernatant (this will bind all DNA molecules longer than 200 bp) and allowed to bind for 5 min. Beads were captured on a magnetic stand, washed twice with 80% ethanol, and air dried. DNA was then resuspended in 15 μL of 10 mM Tris-HCl pH 8. DNA concentration was determined using a Qubit assay (Life Technologies).
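The two-step bead selection described above can be pictured with the following idealized sketch. It treats the 600 bp and 200 bp thresholds stated in the text as sharp cutoffs, which is a simplifying assumption (real bead binding is gradual), and the function name and example fragment lengths are hypothetical.

```python
def double_sided_selection(fragment_lengths_bp, upper_cutoff_bp=600, lower_cutoff_bp=200):
    """Idealized model of the 0.3x / 0.7x two-step bead size selection.

    Step 1 (0.3x beads): fragments below the upper cutoff stay in the
    supernatant, which is kept; bound (larger) fragments are discarded.
    Step 2 (0.7x beads): fragments longer than the lower cutoff bind the
    beads and are kept; shorter fragments are washed away.
    """
    supernatant = [n for n in fragment_lengths_bp if n < upper_cutoff_bp]
    return [n for n in supernatant if n > lower_cutoff_bp]

# Hypothetical fragment lengths in base pairs
selected = double_sided_selection([50, 150, 250, 400, 599, 600, 1200])
```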


Ligation of Adapters

To make T7/MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4404) and T7-7 (sequence 5′>3′, GCCTCGAGC*T*A*ATACGACTCACTATAGGGATCCAAGTCCC, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. The purified, Nt.CviPII/T7 Endonuclease I digested DNA (100 ng) was then ligated to 15 pmol of T7/MlyI adapters using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((3) in FIG. 2). Ligations were then amplified by 10 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s) using Hi-Fidelity 2× Master Mix (NEB), and 2 μM of both oligos T7-17 (GCCTCGAGC*T*A*ATACGACTCACTATAGGG * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4406) and Flag (sequence 5′>3′, CGCTTGTCGTCATCGTCTTTGTA) (SEQ ID NO: 4407). PCR amplification increases the yield of DNA and, given the nature of the Y-shaped adapters we used, always resulted in T7 promoter being added distal to the HGG site and MlyI site being added next to the HGG motif ((4) in FIG. 2).


PCR products were then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., and heat inactivated at 75° C. for 20 min ((5) in FIG. 2). Following that, 5 pmol of adapter StlgR (in Example 1) was ligated using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 2). Ligations were then amplified by PCR using Hi-Fidelity 2× Master Mix (NEB), 2 μM of both oligos T7-7 and gRU (in Example 1) and 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6×AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.


Samples were then used as templates for in vitro transcription reaction as described in Example 1.


Example 3: Direct Cutting with CviPII

30 μg of human genomic DNA was digested with 2 units of NtCviPII (New England Biolabs) for 1 hour at 37° C., followed by heat inactivation at 75° C. for 20 minutes. The size of the fragments was verified to be 200-1,000 base pairs using a fragment analyzer instrument (Advanced Analytical). The 5′ or 3′ protruding ends (as shown, for example, in FIG. 3) were converted to blunt ends by adding 100 units of T4 DNA polymerase (New England Biolabs), 100 μM dNTPs and incubating at 12° C. for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer. The DNA was then ligated to MlyI adapter (see, for example, Example 4) or BaeI/EcoP15I adapters (see, for example, Example 5).
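To illustrate the relationship between the CCD nicking motif and its complementary HGG motif (H = A, C, or T), the following sketch scans one strand of a sequence for HGG motifs and reports the nucleotides immediately 5′ of each motif as candidate targeting sequences. The 20 nt length is an assumption made for illustration, as are the function name and the example sequence; the opposite strand would be scanned in the same way after reverse complementing.

```python
import re

def candidate_targets(seq, length=20):
    """List (position, upstream_sequence, motif) for each HGG motif on this strand.

    Only motifs with at least `length` nt of upstream sequence are reported.
    """
    seq = seq.upper()
    hits = []
    # Lookahead so that overlapping motifs are all found
    for m in re.finditer(r"(?=([ACT]GG))", seq):
        start = m.start()
        if start >= length:
            hits.append((start, seq[start - length:start], m.group(1)))
    return hits

# Hypothetical sequence: 20 nt followed by an AGG motif
example = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
hits = candidate_targets(example)
```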


Example 4: Use of MlyI Adapter

Adapter MlyI was made by combining 2 μmoles of MlyI-Ad1 and MlyI-Ad2 in 40 μL water. Adapter BsaXI/MmeI was made by combining 2 μmoles oligo BsMm-Ad1 and 2 μmoles oligo BsMm-Ad2 in 40 μL water. T7 adapter was made by combining 1.5 μmoles of T7-Ad1 and T7-Ad2 oligos in 100 μL water. Stem-loop adapter was made by combining 1.5 μmoles of gR-top and gR-bot oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.









TABLE 5
Oligonucleotides used with MlyI Adapter.

SEQ ID NO | Oligo name | Sequence (5′>3′) | Modification
4408 | MlyI-Ad1 | gagatcagcttctgcattgatgccagcagcccgagtcag | none
4409 | MlyI-Ad2 | ctgactcgggctgctgtacaaagacgatgacgacaagcgtta | 5′phosphate
4410 | BsMm-Ad1 | gagatcagcttctgcattgatgcGGAGCCGCAGTACACTATCCAAC | none
4411 | BsMm-Ad2 | GTTGGATAGTGTACTGCGGCTCCtacaaagacgatgacgacaagcg | 5′phosphate
4412 | T7-Ad1 | gcctcgagctaatacgactcactatagagNN | none
4398 | T7-Ad2 | Ctctatagtgagtcgtatta | 5′phosphate
4413 | gR-top | ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgctttttt | 5′phosphate
4414 | gR-bot | aaaaaagcaccgactcggtgccactttttcaagttgataacggactagccttattttaacttgctatttctagctctaaaac | none









The DNA containing the CCD blunt ends (from earlier section) was then ligated to 50 pmoles of adapter MlyI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). These steps eliminate small (<100 nucleotides) DNA and MlyI adapter dimers.


Purified DNA was then digested by adding 20 units of MlyI (New England Biolabs) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was recovered from the digest by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 30 μL buffer 4.
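The effect of the MlyI digestion can be pictured with the following sketch. It assumes that MlyI makes a blunt cut 5 nt 3′ of its GAGTC recognition site, so that with the two adapter nucleotides that follow GAGTC in MlyI-Ad1, the cut falls 3 nt into the insert and removes the CCD motif together with the adapter. The insert sequence and the function name are hypothetical.

```python
def mlyi_trim(ligated, site="GAGTC", offset=5):
    """Return the top-strand sequence remaining 3' of the first MlyI cut.

    Assumes a blunt cut `offset` nt downstream of the recognition site;
    the sequence is returned unchanged if no site is present.
    """
    s = ligated.upper()
    i = s.find(site)
    if i < 0:
        return s
    return s[i + len(site) + offset:]

adapter_top = "gagatcagcttctgcattgatgccagcagcccgagtcag"  # MlyI-Ad1, Table 5
ccd_motif = "CCA"                                        # one possible CCD end
insert_body = "TTGACCATGATTACGCCAAG"                     # hypothetical insert

remaining = mlyi_trim(adapter_top + ccd_motif + insert_body)
```

Under these assumptions, `remaining` contains only the insert body, with both the adapter and the 3 nt motif removed.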


The purified DNA was then ligated to 50 pmoles of adapter BsaXI/MmeI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). DNA was then digested by addition of 20 units MmeI (New England Biolabs) and 40 pmol/μL SAM (S-adenosyl methionine) at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7 adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4, then digested with 20 units of BsaXI for 1 hour at 37° C. The guide RNA stem-loop sequences were added by adding 15 pmoles stem-loop adapter and using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 min. DNA was then recovered using a PCR cleanup kit (Zymo), eluted in 20 μL elution buffer and PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad1 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate guide RNAs.
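The role of the MmeI digestion in setting the length of the targeting sequence can be sketched as follows. The sketch assumes that MmeI recognizes TCCRAC (R = A or G) and cuts the top strand 20 nt 3′ of the site, and that the site sits at the very 3′ end of BsMm-Ad1 so that all 20 nt come from the insert. The insert sequence and the function name are hypothetical.

```python
import re

def mmei_tag(ligated, top_offset=20):
    """Return the `top_offset` nt that follow the first MmeI site, or None.

    Models only the top-strand cut position; the 2 nt 3' overhang left by
    the staggered bottom-strand cut is not represented.
    """
    s = ligated.upper()
    m = re.search(r"TCC[AG]AC", s)
    if m is None:
        return None
    return s[m.end():m.end() + top_offset]

bsmm_ad1 = "gagatcagcttctgcattgatgcGGAGCCGCAGTACACTATCCAAC"  # Table 5
insert = "TTGACCATGATTACGCCAAGCTTGCATGCC"                     # hypothetical

tag = mmei_tag(bsmm_ad1 + insert)
```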


Example 5: Use of BaeI/EcoP15I Adapter

Adapter BaeI/EcoP15I was made by combining 2 μmoles of BE Ad1 and BE Ad2 in 40 μL water. T7-E adapter was made by combining 1.5 μmoles of T7-Ad3 and T7-Ad4 oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.









TABLE 6
Oligonucleotides used with BaeI/EcoP15I Adapter.

SEQ ID NO: | Oligo name | Sequence (5′>3′) | Modification
4415 | BE Ad1 | ActgctgacACAAgtatcTTTTTTTTTTgtttaaacTTTTTTTTTTgatacACAAgtcagcagA | 5′phosphate
4416 | BE Ad2 | TagctgacTTGTgtatcAAAAAAAAAAgtttaaacAAAAAAAAAAgatacTTGTgtcagcagT | 5′phosphate
12 | T7-Ad3 | gcctcgagctaatacgactcactatagag | none
4417 | T7-Ad4 | NNctctatagtgagtcgtatta | 5′phosphate
4418 | stlgR | ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgctttttt | 5′adenylation









The DNA containing the CCD blunt ends (from earlier section) was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.


DNA was then digested by addition of 20 units EcoP15I (New England Biolabs) and 1 mM ATP at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7-E adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4.


Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.


Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase, 20 pmol stlgR oligo, in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl2, 1 mM DTT, 2.5 mM MnCl2, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour followed by heat inactivation at 90° C. for 5 min. DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.


Example 6: NEMDA Method

NEMDA (Nicking Endonuclease Mediated DNA Amplification) was performed using 50 ng of human genomic DNA. The DNA was incubated in 100 μL thermo polymerase buffer (20 mM Tris-HCl, 10 mM (NH4)2SO4, 10 mM KCl, 6 mM MgSO4, 0.1% Triton® X-100, pH 8.8) supplemented with 0.3 mM dNTPs, 40 units of Bst large fragment DNA polymerase, and 0.1 units of NtCviPII (New England Biolabs) at 55° C. for 45 min, followed by 65° C. for 30 min and finally 80° C. for 20 min in a thermal cycler.


The DNA was then diluted with 300 μL of buffer 4 supplemented with 200 pmoles of T7-RND8 oligo (sequence 5′>3′ gcctcgagctaatacgactcactatagagnnnnnnnn) (SEQ ID NO: 4420) and boiled at 98° C. for 10 min followed by rapid cooling to 10° C. for 5 min. The reaction was then supplemented with 40 units of E. coli DNA polymerase I and 0.1 mM dNTPs (New England Biolabs) and incubated at room temperature for 20 min followed by heat inactivation at 75° C. for 20 min. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 30 μL elution buffer.


DNA was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.


Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.


Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase, 20 pmol stlgR oligo, in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl2, 1 mM DTT, 2.5 mM MnCl2, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour followed by heat inactivation at 90° C. for 5 min. DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 (sequence 5′>3′ gcctcgagctaatacgactcactatagag) (SEQ ID NO: 12) and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. for 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.

Claims
  • 1. A collection of nucleic acids, the nucleic acids in the collection comprising: a second segment encoding a targeting sequence; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein the collection of nucleic acids comprises at least 10^5 unique nucleic acid molecules.
  • 2. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein.
  • 3. The collection of claim 1, wherein the size of the second segment varies from 15-250 bp across the collection of nucleic acids.
  • 4. The collection of claim 1, wherein at least 10% of the second segments in the collection are greater than 21 bp.
  • 5. The collection of claim 1, wherein the size of the second segment is not 20 bp and is not 21 bp.
  • 6. (canceled)
  • 7. The collection of claim 1, wherein the collection of nucleic acids is a collection of DNA.
  • 8. The collection of claim 7, wherein the second segment is single stranded DNA.
  • 9. The collection of claim 7, wherein the third segment is single stranded DNA.
  • 10. The collection of claim 7, wherein the third segment is double stranded DNA.
  • 11. The collection of claim 1, further comprising a first segment comprising a regulatory region, wherein the regulatory region is a region capable of binding a transcription factor.
  • 12. The collection of claim 1, further comprising a first segment comprising a regulatory region, wherein the regulatory region comprises a promoter.
  • 13. The collection of claim 12, wherein the promoter is selected from the group consisting of T7, SP6, and T3.
  • 14. The collection of claim 1, wherein the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome.
  • 15. The collection of claim 1, wherein the targeting sequence is directed at repetitive or abundant DNA.
  • 16. The collection of claim 1, wherein the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA.
  • 17. The collection of claim 1, wherein the sequence of the second segments is selected from Table 3 and/or Table 4.
  • 18. (canceled)
  • 19. The collection of claim 1, wherein the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence.
  • 20. The collection of claim 1, wherein the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism.
  • 21. The collection of claim 20, wherein the PAM sequence is AGG, CGG, or TGG.
  • 22. The collection of claim 20, wherein the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • 23. The collection of claim 1, wherein the third segment comprises DNA encoding a gRNA stem-loop sequence.
  • 24. The collection of claim 1, wherein the sequence of the third segment encodes for a RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for a RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2).
  • 25. The collection of claim 1, wherein the sequence of the third segment encodes for a crRNA and a tracrRNA.
  • 26. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is from a bacterial species.
  • 27. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is from an archaea species.
  • 28. The collection of claim 2, wherein the CRISPR/Cas system protein is a Type I, Type II, or Type III protein.
  • 29. The collection of claim 2, wherein the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9, and Cas9 nickase.
  • 30. The collection of claim 1, wherein the third segment comprises DNA encoding a Cas9-binding sequence.
  • 31. The collection of claim 1, wherein a plurality of third segments of the collection encode for a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the third segments of the collection encode for a second nucleic acid-guided nuclease system protein binding sequence.
  • 32. The collection of claim 1, wherein the third segments of the collection encode for a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
  • 33.-192. (canceled)
  • 193. A kit comprising the collection of nucleic acids of claim 1.
  • 194.-235. (canceled)
  • 236. The collection of claim 1, wherein at least 10% of the nucleic acids in the collection vary in size.
  • 237. The collection of claim 1, further comprising a first segment comprising a regulatory region.
  • 238. A collection of guide RNAs generated by transcribing the collection of claim 1.
CROSS-REFERENCE

This application is a continuation of U.S. application Ser. No. 15/742,862, filed Jan. 8, 2018, which is a U.S. National Stage Application under 35 U.S.C. § 371 of International Application No. PCT/US2016/065420, filed Dec. 7, 2016, which claims the benefit of U.S. Provisional Application No. 62/264,262, filed Dec. 7, 2015, and of U.S. Provisional Application No. 62/298,963, filed Feb. 23, 2016, each of which is hereby incorporated by reference in its entirety.

Provisional Applications (2)
Number     Date       Country
62298963   Feb 2016   US
62264262   Dec 2015   US

Continuations (1)
Relation   Number     Date       Country
Parent     15742862   Jan 2018   US
Child      16995761              US