METHODS AND COMPOSITIONS FOR THE MAKING AND USING OF GUIDE NUCLEIC ACIDS

Abstract
Provided herein are methods and compositions to make guide nucleic acids (gNAs), nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs from any source nucleic acid. Also provided herein are methods and compositions to use the resulting gNAs, nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs in a variety of applications.
Description
INCORPORATION BY REFERENCE OF SEQUENCE LISTING

The present application is being filed with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled ARCB-003/02US_SeqList.txt, created Nov. 19, 2018, which is 816,892 bytes in size. The information in electronic format of the Sequence Listing is incorporated by reference in its entirety.


BACKGROUND

Human clinical DNA samples and sample libraries such as cDNA libraries derived from RNA contain highly abundant sequences that have little informative value and increase the cost of sequencing. While methods have been developed to deplete these unwanted sequences (e.g., via hybridization capture), these methods are often time-consuming and can be inefficient.


Although guide nucleic acid (gNA)-mediated nuclease systems (such as guide RNA (gRNA)-mediated Cas systems) can efficiently deplete any target DNA, targeted depletion of very high numbers of unique DNA molecules is not feasible. For example, a sequencing library derived from human blood may contain >99% human genomic DNA. Using a gRNA-mediated Cas9 system-based method to deplete this genomic DNA to detect an infectious agent circulating in the human blood would require extremely high numbers of gRNAs (about 10-100 million gRNAs), in order to ensure that a gRNA will be present every 30-50 base pairs (bp), and that no target DNA will be missed. Very large numbers of gRNAs can be predicted computationally and then synthesized chemically, but only at a prohibitive cost.
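
The scale described above can be checked with simple arithmetic. The following is a minimal sketch, assuming a haploid human genome of roughly 3.1 billion bp (a round figure chosen for illustration and not stated in this document); the function name is hypothetical.

```python
# Rough estimate of how many gRNAs are needed to tile a genome.
# ASSUMPTION: ~3.1e9 bp haploid human genome (round figure for illustration).
HUMAN_GENOME_BP = 3_100_000_000


def guides_needed(genome_bp: int, spacing_bp: int) -> int:
    """Number of guides required to place one guide every `spacing_bp` bases."""
    if spacing_bp <= 0:
        raise ValueError("spacing_bp must be positive")
    # Ceiling division so that a partial final window still gets a guide.
    return -(-genome_bp // spacing_bp)


for spacing in (30, 50):
    count = guides_needed(HUMAN_GENOME_BP, spacing)
    print(f"one guide every {spacing} bp -> {count:,} guides")
```

Under that assumption the estimate falls in the tens of millions to roughly one hundred million guides, which is the order of magnitude given in the text.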


Therefore, there is a need in the art to provide a cost-effective method of converting any DNA into a gNA (e.g., gRNA) library to enable, for example, genome-wide depletion of unwanted DNA sequences from those of interest, without prior knowledge about their sequences. Provided herein are methods and compositions that address this need.


SUMMARY

Provided herein are compositions and methods to generate gNAs and collections of gNAs from any source nucleic acid. For example, gRNAs and collections of gRNAs can be generated from source DNA, such as genomic DNA. Such gNAs and collections of the same are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest, genome-wide labeling, genome-wide editing, genome-wide functional screens, and genome-wide regulation.


In one aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size. In another aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the second segment varies from 15-250 bp across the collection of nucleic acids. In some embodiments, at least 10% of the second segments in the collection are greater than 21 bp. In some embodiments, the size of the second segment is not 20 bp. In some embodiments, the size of the second segment is not 21 bp. In some embodiments, the collection of nucleic acids is a collection of DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. 
In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least 10^2 unique nucleic acid molecules. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment encodes for a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. 
In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence. In some embodiments, a plurality of third segments of the collection encode for a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the third segments of the collection encode for a second nucleic acid-guided nuclease system protein binding sequence. In some embodiments, the third segments of the collection encode for a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
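
The three-segment layout described in this aspect can be illustrated with a short sketch. It assumes a commonly used minimal T7 promoter sequence as the regulatory region and uses the DNA form of SEQ ID NO: 1 as the third segment; the targeting sequences, the function names, and the way size variation is measured (fraction of members whose length differs from the most common length) are all hypothetical choices made for illustration, not definitions taken from this document.

```python
from collections import Counter

# ASSUMPTION: a commonly used minimal T7 promoter serves as the regulatory region.
T7_PROMOTER = "TAATACGACTCACTATAG"

# SEQ ID NO: 1 as given in the text (RNA). The DNA segment that encodes it
# is written here with T in place of U.
SEQ_ID_NO_1_RNA = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
    "CCGAGUCGGUGCUUUUUUU"
)
BINDING_SEGMENT_DNA = SEQ_ID_NO_1_RNA.replace("U", "T")


def build_template(targeting_dna: str) -> str:
    """Concatenate regulatory region, targeting sequence, and binding segment."""
    targeting_dna = targeting_dna.upper()
    if not targeting_dna or set(targeting_dna) - set("ACGT"):
        raise ValueError("targeting sequence must be non-empty DNA (A, C, G, T)")
    return T7_PROMOTER + targeting_dna + BINDING_SEGMENT_DNA


def fraction_off_modal_size(templates: list[str]) -> float:
    """Fraction of templates whose length differs from the most common length."""
    if not templates:
        raise ValueError("collection is empty")
    lengths = Counter(len(t) for t in templates)
    modal_count = lengths.most_common(1)[0][1]
    return 1 - modal_count / len(templates)


# Hypothetical targeting sequences of 20, 25, 20 and 40 bp.
targets = ["ACGT" * 5, "ACGTA" * 5, "TTGCA" * 4, "ACGT" * 10]
collection = [build_template(t) for t in targets]
print(fraction_off_modal_size(collection))
```

In this toy collection half of the members differ in length from the most common size, so it would satisfy a 10% size-variation criterion under the reading assumed here.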


In another aspect, the invention described herein provides for a collection of guide RNAs (gRNAs), comprising: a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the first segment varies from 15-250 bp across the collection of gRNAs. In some embodiments, at least 10% of the first segments in the collection are greater than 21 bp. In some embodiments, the size of the first segment is not 20 bp. In some embodiments, the size of the first segment is not 21 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the first segments is RNA encoded by sequences selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least 10^2 unique gRNAs. In some embodiments, the gRNAs comprise cytosine, guanine, and adenine. In some embodiments, a subset of the gRNAs further comprises thymine. In some embodiments, a subset of the gRNAs further comprises uracil. In some embodiments, the first segment is at least 80% complementary to a target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG. 
In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment comprises the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the second segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the second segment comprises a Cas9-binding sequence. In some embodiments, at least 10% of the gRNAs in the collection vary in their 5′ terminal-end sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced every 10,000 bp or less across the genome of an organism. In some embodiments, a plurality of second segments of the collection comprise a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the second segments of the collection comprise a second nucleic acid-guided nuclease system protein binding sequence. 
In some embodiments, the second segments of the collection comprise a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins. In some embodiments, a plurality of the gRNAs of the collection are attached to a substrate. In some embodiments, a plurality of the gRNAs of the collection comprise a label. In some particular embodiments, a plurality of the gRNAs of the collection comprise different labels.
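
The relationship between a targeting sequence and its PAM (a sequence of nucleotides located 5′ to AGG, CGG, or TGG) can be illustrated computationally. The sketch below is an in-silico scan and is not the enzymatic method of this disclosure; the function names and the fixed `length` parameter are hypothetical, and coordinates on the "-" strand are reported relative to the reverse-complemented sequence.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(seq: str, length: int):
    """List (strand, start, protospacer, pam) for every AGG/CGG/TGG PAM
    that has at least `length` bases available on its 5' side."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping PAMs are all reported.
        for match in re.finditer(r"(?=([ACT]GG))", s):
            pam_start = match.start()
            if pam_start >= length:
                start = pam_start - length
                hits.append((strand, start, s[start:pam_start], match.group(1)))
    return hits


# Hypothetical 23 bp input: 20 bases followed by a TGG PAM.
print(find_protospacers("A" * 20 + "TGG", length=20))
```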


In another aspect, the invention described herein provides a nucleic acid comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the targeting sequence is greater than 30 bp; and a third segment encoding a nucleic acid encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the nucleic acid is DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. In some embodiments, the targeting sequence is directed at abundant or repetitive DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the target genomic sequence of interest is 5′ upstream of a PAM sequence. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. 
In some embodiments, the third segment encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence.


In another aspect, the invention described herein provides a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the gRNA comprises an adenine, a guanine, and a cytosine. In some embodiments, the gRNA further comprises a thymine. In some embodiments, the gRNA further comprises a uracil. In some embodiments, the size of the first RNA segment is between 30 and 250 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the first segment is at least 80% complementary to the target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the sequence of the second segment comprises GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. 
In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the second segment is a Cas9-binding sequence.
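
The "at least 80% complementary" criterion can be made concrete with a small calculation. The sketch below is one possible way to score complementarity, assuming equal-length sequences paired in antiparallel orientation with no gaps; the function name and the example sequences are hypothetical.

```python
# Watson-Crick partner (in DNA) for each base that may occur in a guide.
# T is included because some guides in the text may comprise thymine.
_PARTNER = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def percent_complementary(guide: str, target_strand: str) -> float:
    """Percent of guide positions that base-pair with the DNA strand they
    hybridize to. Both sequences are written 5'->3' and must be equal in
    length; pairing is antiparallel, so guide[i] faces target_strand[-1 - i]."""
    guide, target_strand = guide.upper(), target_strand.upper()
    if len(guide) != len(target_strand) or not guide:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        _PARTNER[g] == t for g, t in zip(guide, reversed(target_strand))
    )
    return 100.0 * paired / len(guide)


target = "TTGCAATGCA"                                # hypothetical DNA strand
print(percent_complementary("UGCAUUGCAA", target))   # fully complementary guide
print(percent_complementary("UGCAUUGCCC", target))   # two mismatches at the 3' end
```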


In another aspect, the invention provides a complex comprising a nucleic acid-guided nuclease system protein and a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence.


In another aspect, the invention described herein provides a method for depleting and partitioning of targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collection of gRNAs provided herein; and (ii) nucleic acid-guided nuclease system proteins. In some embodiments, the nucleic acid-guided nuclease system proteins are CRISPR/Cas system proteins. In some embodiments, the CRISPR/Cas system proteins are Cas9 proteins.


In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing double-stranded DNA molecules, each comprising a sequence of interest 5′ to a PAM sequence, and its reverse complementary sequence on the opposite strand; (b) performing an enzymatic digestion reaction on the double stranded DNA molecules, wherein cleavages are generated at the PAM sequence and/or its reverse complementary sequence on the opposite strand, but never completely remove the PAM sequence and/or its reverse complementary sequence on the opposite strand from the double stranded DNA; (c) ligating adapters comprising a recognition sequence to the resulting DNA molecules of step b; (d) contacting the DNA molecules of step c with a restriction enzyme that recognizes the recognition sequence of step c, thereby generating DNA fragments comprising blunt-ended double strand breaks immediately 5′ to the PAM sequence, thereby removing the PAM sequence and the adapter containing the enzyme recognition site; and (e) ligating the resulting double stranded DNA fragments of step d with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, thereby generating a plurality of DNA fragments, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas nucleic acid-guided nuclease system protein. In some embodiments, the starting DNA molecules of the collection further comprise a regulatory sequence upstream of the sequence of interest 5′ to the PAM sequence. In some embodiments, the regulatory sequence comprises a promoter. In some embodiments, the promoter comprises a T7, Sp6, or T3 sequence. 
In some embodiments, the double stranded DNA molecules are genomic DNA, intact DNA, or sheared DNA. In some embodiments, the genomic DNA is human, mouse, avian, fish, plant, insect, bacterial, or viral. In some embodiments, the DNA segments encoding a targeting sequence are at least 22 bp. In some embodiments, the DNA segments encoding a targeting sequence are 15-250 bp in size range. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, step (b) further comprises (1) contacting the DNA molecules with an enzyme capable of creating a nick in a single strand at a CCD site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest followed by an HGG sequence, wherein the DNA molecules are nicked at the CCD sites; and (2) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest followed by an HGG sequence wherein residual nucleotides from HGG and/or CCD sequences is (are) left behind. In some embodiments, step (d) further comprises PCR amplification of the adaptor-ligated DNA fragments from step (c) before cutting with the restriction enzyme recognizing the recognition sequence of step (c), wherein after PCR, the recognition sequence is positioned 3′ of the PAM sequence, and a regulatory sequence is positioned at the 5′ distal end of the PAM sequence. In some embodiments, the enzymatic reaction of step (b) comprises the use of a Nt.CviPII enzyme, and a T7 Endonuclease I enzyme. In some embodiments, step (c) further comprises a blunt-end reaction with a T4 DNA Polymerase, if the adapter to be ligated does not comprise an overhang. 
In some embodiments, the adapter of step (c) is either (1) double stranded, comprising a restriction enzyme recognition sequence in one strand, and a regulatory sequence in the other strand, if the adapter is Y-shaped and comprises an overhang; or (2) has a palindromic enzyme recognition sequence in both strands, if the adapter is not Y-shaped. In some embodiments, the restriction enzyme of step (d) is MlyI. In some embodiments, the restriction enzyme of step (d) is BaeI. In some embodiments, step (d) further comprises contacting the DNA molecules with an XhoI enzyme. In some embodiments, in step (e) the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the targeted sequences of interest are spaced every 10,000 bp or less across the genome of an organism.
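
Step (d) of this method, in the MlyI embodiment, can be pictured on a single strand. The sketch below is a simplified string model, assuming that MlyI recognizes GAGTC and makes a blunt cut 5 nucleotides away from that site, that the adapter carries exactly one MlyI site in reverse orientation relative to the top strand, and that two adapter nucleotides sit between a 3 bp PAM remnant and the site. The fragment, adapter sequences, and function name are hypothetical, and a real fragment containing an internal MlyI site would also be cut.

```python
# Top-strand model of removing a PAM remnant plus adapter with MlyI.
MLYI_SITE_RC = "GACTC"   # reverse complement of GAGTC, as read on the top strand
MLYI_CUT_OFFSET = 5      # assumed blunt cut 5 nt away from the recognition site


def trim_with_mlyi(top_strand: str) -> str:
    """Return the part of the top strand that lies 5' of the blunt cut.

    The last reverse-oriented site in the molecule is assumed to be the
    one contributed by the adapter."""
    site_start = top_strand.rfind(MLYI_SITE_RC)
    if site_start < MLYI_CUT_OFFSET:
        raise ValueError("no usable reverse-oriented MlyI site found")
    return top_strand[: site_start - MLYI_CUT_OFFSET]


# Hypothetical molecule: 22 bp sequence of interest, TGG PAM, then adapter.
target = "ACGTTGCAACGTTGCAACGTAA"
pam = "TGG"
adapter = "AT" + MLYI_SITE_RC + "CTAGC"   # 2 nt spacer + site + arbitrary tail
ligated = target + pam + adapter

print(trim_with_mlyi(ligated) == target)
```

With a 3 bp PAM and a 2 nt spacer, the 5 nt cut offset lands immediately 5′ to the PAM, so the PAM and the adapter are removed together and only the sequence of interest remains.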


In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing a plurality of double stranded DNA molecules, each comprising a sequence of interest, an NGG site, and its complement CCN site; (b) contacting the molecules with an enzyme capable of creating a nick in a single strand at a CCN site, thereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest 5′ to the NGG site, wherein the DNA molecules are nicked at the CCD sites; (c) contacting the nicked double stranded DNA molecules with an endonuclease, thereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest, wherein the fragments comprise a terminal overhang; (d) contacting the double stranded DNA fragments with an enzyme without 5′ to 3′ exonuclease activity to blunt end the double stranded DNA fragments, thereby generating a plurality of blunt ended double stranded fragments, each comprising a sequence of interest; (e) contacting the blunt ended double stranded fragments of step d with an enzyme that cleaves the terminal NGG site; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system-protein binding sequence, thereby generating a plurality of DNA fragments, each comprising a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the plurality of double stranded DNA molecules have a regulatory sequence 5′ upstream of the NGG sites. In some embodiments, the regulatory sequence comprises a T7, SP6, or T3 sequence. 
In some embodiments, the NGG site comprises AGG, CGG, or TGG, and the CCN site comprises CCT, CCG, or CCA. In some embodiments, the plurality of double stranded DNA molecules, each comprising a sequence of interest, comprise sheared fragments of genomic DNA. In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, avian, bacterial or viral. In some embodiments, the plurality of double stranded DNA molecules in step (a) are at least 500 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, the enzyme in step d is a T4 DNA Polymerase. In some embodiments, in step f the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, step e additionally comprises ligating adaptors carrying a MlyI recognition site and digesting with MlyI enzyme. In some embodiments, the sequence of interest is spaced every 10,000 bp or less across the genome.
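
The pairing of NGG sites with CCN sites stated above follows from reverse complementation, which the short sketch below verifies. It also locates CCD sites (CC followed by A, G, or T) on a strand as an in-silico stand-in for finding candidate nicking sites; the function names and the example sequence are hypothetical, and no claim is made here about where within a site the enzyme nicks.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ccd_sites(seq: str) -> list[int]:
    """0-based start positions of CCD sites (D = A, G, or T) on this strand.
    A lookahead is used so that overlapping sites are all reported."""
    return [m.start() for m in re.finditer(r"(?=CC[AGT])", seq.upper())]


# Each NGG site listed in the text and the CCN site on the opposite strand.
for ngg in ("AGG", "CGG", "TGG"):
    print(ngg, "<->", reverse_complement(ngg))

# Hypothetical fragment: CCT at position 2, CCC (not a CCD site), CCA at 8.
print(find_ccd_sites("AACCTGGCCCA"))
```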


In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence and a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing genomic DNA comprising a plurality of sequences of interest, comprising NGG and CCN sites; (b) contacting the genomic DNA with an enzyme capable of creating nicks in the genomic DNA, whereby generating nicked genomic DNA, nicked at CCN sites; (c) contacting the nicked genomic DNA with an endonuclease, whereby generating double stranded DNA fragments, with an overhang; (d) ligating the DNA with overhangs from step c to a Y-shaped adapter, thereby introducing a restriction enzyme recognition sequence only at 3′ of the NGG site and a regulatory sequence 5′ of the sequence of interest; (e) contacting the product from step d with an enzyme that cleaves away the NGG site together with the adaptor carrying the enzyme recognition sequence; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, whereby generating a plurality of DNA fragments, each comprising a sequence of interest ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the NGG site comprises AGG, CGG, or TGG, and CCN site comprises CCT, CCG, or CCA. In some embodiments, the regulatory sequence comprises a promoter sequence. In some embodiments, the promoter sequence comprises a T7, SP6, or T3 sequence. In some embodiments, the DNA fragments are sheared fragments of genomic DNA.


In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, or viral. In some embodiments, the fragments are at least 200 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, step d further comprises PCR amplification of the adaptor-ligated DNA. In some embodiments, in step f, the DNA encoding a nucleic acid-guided nuclease system protein-binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the enzyme removing the NGG site in step e is MlyI. In some embodiments, the target of interest of the collection is spaced every 10,000 bp or less across the genome.
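
The spacing criterion (targets every 10,000 bp or less across the genome) can be checked for a given set of target coordinates. The sketch below assumes a linear genome and treats both chromosome ends as boundaries when measuring gaps; that treatment, the function names, and the example coordinates are illustrative assumptions.

```python
def max_gap(positions: list[int], genome_length: int) -> int:
    """Largest stretch without a target site, counting the distance from
    each end of a linear genome to the nearest site as a gap."""
    if not positions:
        return genome_length
    points = [0] + sorted(positions) + [genome_length]
    return max(b - a for a, b in zip(points, points[1:]))


def meets_spacing(positions: list[int], genome_length: int,
                  max_spacing: int = 10_000) -> bool:
    """True if no gap between consecutive target sites exceeds `max_spacing`."""
    return max_gap(positions, genome_length) <= max_spacing


# Hypothetical target coordinates on a 30 kb sequence.
print(meets_spacing([5_000, 12_000, 21_000], 30_000))  # largest gap is 9,000 bp
print(meets_spacing([5_000, 20_000], 30_000))          # largest gap is 15,000 bp
```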


In another aspect, the invention provides kits and/or reagents useful for performing a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, as described in the embodiments herein.


In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.


In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a CRISPR/Cas system protein-binding sequence.


In another aspect, the invention described herein provides a kit comprising a collection of guide RNAs comprising a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size.


In another aspect, the invention described herein provides a method of making a collection of guide nucleic acids, comprising: a. obtaining abundant cells in a source sample; b. collecting nucleic acids from said abundant cells; and c. preparing a collection of guide nucleic acids (gNAs) from said nucleic acids. In some embodiments, said abundant cells comprise cells from one or more most abundant bacterial species in said source sample. In some embodiments, said abundant cells comprise cells from more than one species. In some embodiments, said abundant cells comprise human cells. In some embodiments, said abundant cells comprise animal cells. In some embodiments, said abundant cells comprise plant cells. In some embodiments, said abundant cells comprise bacterial cells. In some embodiments, the method further comprises contacting nucleic acid-guided nucleases with said library of gNAs to form nucleic acid-guided nuclease-gNA complexes. In some embodiments, the method further comprises using said nucleic acid-guided nuclease-gNA complexes to cleave target nucleic acids at target sites, wherein said gNAs are complementary to said target sites. In some embodiments, said target nucleic acids are from said source sample. In some embodiments, a species of said target nucleic acids is the same as a species of said source sample. In some embodiments, said species of said target nucleic acids and said species of said source sample is human. In some embodiments, said species of said target nucleic acids and said species of said source sample is animal. In some embodiments, said species of said target nucleic acids and said species of said source sample is plant.


In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing double-stranded breaks at proximal nicks; and c. repairing overhangs of said double-stranded breaks, thereby producing a double-stranded fragment comprising (i) a targeting sequence and (ii) said nicking enzyme recognition site. In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing a nick; and c. synthesizing a new strand from said nick, thereby producing a single-stranded fragment of said source DNA comprising a targeting sequence. In some embodiments, the method further comprises producing a double-stranded fragment comprising said targeting sequence from said single-stranded fragment. In some embodiments, said producing said double-stranded fragment comprises random priming and extension. In some embodiments, said random priming is conducted with a primer comprising a random n-mer region and a promoter region. In some embodiments, said random n-mer region is a random hexamer region. In some embodiments, said random n-mer region is a random octamer region. In some embodiments, said promoter region is a T7 promoter region. In some embodiments, the method further comprises ligating a nuclease recognition site nucleic acid comprising a nuclease recognition site to said double-stranded fragment. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to the length of said nicking enzyme recognition sites. 
In some embodiments, said nuclease recognition site is an MlyI recognition site. In some embodiments, said nuclease recognition site is a BaeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease, thereby removing said nicking enzyme recognition site from said double-stranded fragment. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence. In some embodiments, said length of said targeting sequence is 20 base pairs. In some embodiments, said nuclease recognition site is an MmeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence plus a length of said nicking enzyme recognition sites. In some embodiments, said length of said targeting sequence plus a length of said nicking enzyme recognition sites is 23 base pairs. In some embodiments, said nuclease recognition site is an EcoP15I recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. 
In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence.
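
The distance-based embodiments above (a nuclease that cuts 20 base pairs from its recognition site, or 23 base pairs when a 3 base pair nicking enzyme recognition site is included) can be illustrated with a minimal sketch. The sketch is an assumption-laden simplification and not the method of the invention: it models a fragment as a single top-strand string, treats every cut as blunt, and measures the distance from the end of the recognition site. The function name cut_at_distance, the adapter sequence, the recognition site string, and the 3 base pair nicking site are all hypothetical placeholders; the cutting geometry of any real enzyme should be taken from its supplier's documentation.

```python
from typing import Tuple


def cut_at_distance(fragment: str, site: str, distance: int) -> Tuple[str, str]:
    """Split `fragment` at `distance` bases past the end of the first `site`.

    Returns (kept, released): `kept` runs from the start of the fragment
    through the cut position, `released` is everything after the cut.
    """
    start = fragment.find(site)
    if start == -1:
        raise ValueError("recognition site not found in fragment")
    cut = start + len(site) + distance
    if cut > len(fragment):
        raise ValueError("cut position falls outside the fragment")
    return fragment[:cut], fragment[cut:]


# Hypothetical construct: adapter ending in a placeholder recognition site,
# a 20-base targeting sequence, a 3-base nicking site, and trailing sequence.
adapter = "GGTCCAAC"
targeting = "ACGTACGTACGTACGTACGT"
nick_site = "CCA"
fragment = adapter + targeting + nick_site + "TTTT"

# Cutting at 20 bases keeps only the targeting sequence next to the adapter.
kept_20, released_20 = cut_at_distance(fragment, "TCCAAC", 20)

# Cutting at 23 bases keeps the targeting sequence plus the nicking site.
kept_23, released_23 = cut_at_distance(fragment, "TCCAAC", 23)
```

In this toy model, the 20-base cut leaves the adapter joined to the targeting sequence alone, while the 23-base cut also retains the 3-base nicking site.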


In another aspect, the invention described herein provides a kit comprising all essential reagents and instructions for carrying out the methods of aspects of the invention described herein.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates an exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.



FIG. 2 illustrates another exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.



FIG. 3 illustrates an exemplary scheme for nicking of DNA and subsequent treatment with polymerase to generate blunt ends.



FIG. 4 illustrates an exemplary scheme for sequential production of a library of gNAs using three adapters.



FIG. 5 illustrates an exemplary scheme for sequential production of a library of gNAs using one adapter and one oligo.



FIG. 6 illustrates an exemplary scheme for generation of a large pool of DNA fragments with blunt ends using Nicking Enzyme Mediated DNA Amplification (NEMDA).



FIG. 7 illustrates an exemplary scheme for generation of a large pool of gNAs using Nicking Enzyme Mediated DNA Amplification (NEMDA).





DETAILED DESCRIPTION OF THE INVENTION

There is a need in the art for a scalable, low-cost approach to generate large numbers of diverse guide nucleic acids (gNAs) (e.g., gRNAs, gDNAs) for a variety of downstream applications.


Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.


Numeric ranges are inclusive of the numbers defining the range.


For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. In the event that any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control.


As used herein, the singular forms “a”, “an”, and “the” include plural references unless indicated otherwise.


It is understood that aspects and embodiments of the invention described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.


The term “about” as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se.


The term “nucleic acid,” as used herein, refers to a molecule comprising one or more nucleic acid subunits. A nucleic acid can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), and modified versions of the same. A nucleic acid comprises deoxyribonucleic acid (DNA), ribonucleic acid (RNA), combinations, or derivatives thereof. A nucleic acid may be single-stranded and/or double-stranded.


The nucleic acids comprise “nucleotides”, which, as used herein, is intended to include those moieties that contain purine and pyrimidine bases, and modified versions of the same. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the term “nucleotide” or “polynucleotide” includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides, nucleotides or polynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.


The terms “nucleic acids” and “polynucleotides” are used interchangeably herein. Polynucleotide is used to describe a nucleic acid polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced enzymatically or synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. Naturally-occurring nucleotides include guanine, cytosine, adenine and thymine (G, C, A and T, respectively). DNA and RNA have deoxyribose and ribose sugar backbones, respectively, whereas PNA's backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. In PNA, various purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds. A locked nucleic acid (LNA), often referred to as inaccessible RNA, is a modified RNA nucleotide. The ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2′ oxygen and 4′ carbon. The bridge “locks” the ribose in the 3′-endo (North) conformation, which is often found in the A-form duplexes. LNA nucleotides can be mixed with DNA or RNA residues in the oligonucleotide whenever desired. The term “unstructured nucleic acid,” or “UNA,” refers to a nucleic acid containing non-natural nucleotides that bind to each other with reduced stability. 
For example, an unstructured nucleic acid may contain a G′ residue and a C′ residue, where these residues correspond to non-naturally occurring forms, i.e., analogs, of G and C that base pair with each other with reduced stability, but retain an ability to base pair with naturally occurring C and G residues, respectively. Unstructured nucleic acid is described in US20050233340, which is incorporated by reference herein for disclosure of UNA.


The term “oligonucleotide” as used herein denotes a single-stranded multimer of nucleotides.


Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation.


The term “cleaving,” as used herein, refers to a reaction that breaks the phosphodiester bonds between two adjacent nucleotides in both strands of a double-stranded DNA molecule, thereby resulting in a double-stranded break in the DNA molecule.


The term “nicking” as used herein, refers to a reaction that breaks the phosphodiester bond between two adjacent nucleotides in only one strand of a double-stranded DNA molecule, thereby resulting in a break in one strand of the DNA molecule.


The term “cleavage site,” as used herein, refers to the site at which a double-stranded DNA molecule has been cleaved.


The “nucleic acid-guided nuclease-gNA complex” refers to a complex comprising a nucleic acid-guided nuclease protein and a guide nucleic acid (gNA, for example a gRNA or a gDNA). For example, the “Cas9-gRNA complex” refers to a complex comprising a Cas9 protein and a guide RNA (gRNA). The nucleic acid-guided nuclease may be any type of nucleic acid-guided nuclease, including but not limited to a wild type nucleic acid-guided nuclease, a catalytically dead nucleic acid-guided nuclease, or a nucleic acid-guided nuclease-nickase.


The term “nucleic acid-guided nuclease-associated guide NA” refers to a guide nucleic acid (guide NA). The nucleic acid-guided nuclease-associated guide NA may exist as an isolated nucleic acid, or as part of a nucleic acid-guided nuclease-gNA complex, for example a Cas9-gRNA complex.


The terms “capture” and “enrichment” are used interchangeably herein, and refer to the process of selectively isolating a nucleic acid region containing: sequences of interest, targeted sites of interest, sequences not of interest, or targeted sites not of interest.


The term “hybridization” refers to the process by which a strand of nucleic acid joins with a complementary strand through base pairing as known in the art. A nucleic acid is considered to be “selectively hybridizable” to a reference nucleic acid sequence if the two sequences specifically hybridize to one another under moderate to high stringency hybridization and wash conditions. Moderate and high stringency hybridization conditions are known (see, e.g., Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons 1995 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, 2001 Cold Spring Harbor, N.Y.). One example of high stringency conditions includes hybridization at about 42° C. in 50% formamide, 5×SSC, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml denatured carrier DNA followed by washing two times in 2×SSC and 0.5% SDS at room temperature and two additional times in 0.1×SSC and 0.5% SDS at 42° C.


The term “duplex,” or “duplexed,” as used herein, describes two complementary polynucleotides that are base-paired, i.e., hybridized together.


The term “amplifying” as used herein refers to generating one or more copies of a target nucleic acid, using the target nucleic acid as a template.


The term “genomic region,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish, insect, or plant. In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example.


The term “genomic sequence,” as used herein, refers to a sequence that occurs in a genome. Because RNAs are transcribed from a genome, this term encompasses sequences that exist in the nuclear genome of an organism, as well as sequences that are present in a cDNA copy of an RNA (e.g., an mRNA) transcribed from such a genome.


The term “genomic fragment,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish, insect, or plant. A genomic fragment may be an entire chromosome, or a fragment of a chromosome. A genomic fragment may be adapter ligated (in which case it has an adapter ligated to one or both ends of the fragment, or to at least the 5′ end of a molecule), or may not be adapter ligated.


In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example. Such an oligonucleotide may be employed in an assay that uses a sample containing a test genome, where the test genome contains a binding site for the oligonucleotide.


The term “ligating,” as used herein, refers to the enzymatically catalyzed joining of the terminal nucleotide at the 5′ end of a first DNA molecule to the terminal nucleotide at the 3′ end of a second DNA molecule.


If two nucleic acids are “complementary,” each base of one of the nucleic acids base pairs with corresponding nucleotides in the other nucleic acid. The term “complementary” and “perfectly complementary” are used synonymously herein.
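
The definition of “complementary” above can be expressed as a short check. The following is a minimal sketch under stated assumptions: both sequences are DNA written 5′ to 3′, only Watson-Crick pairs (A with T, G with C) count, and the two strands have equal length. The function names reverse_complement and is_perfectly_complementary are hypothetical and introduced only for illustration.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def is_perfectly_complementary(a: str, b: str) -> bool:
    """True if every base of `a` pairs with the corresponding base of `b`.

    Both inputs are read 5' to 3', so the strands are compared antiparallel.
    """
    return len(a) == len(b) and a.upper() == reverse_complement(b)


example_match = is_perfectly_complementary("ATGC", "GCAT")
example_mismatch = is_perfectly_complementary("ATGC", "GCAA")
```

Here example_match is True because each base of the first strand pairs with the corresponding base of the second, and example_mismatch is False because of a single non-pairing position.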


The term “separating,” as used herein, refers to physical separation of two elements (e.g., by size or affinity, etc.) as well as degradation of one element, leaving the other intact. For example, size exclusion can be employed to separate nucleic acids, including cleaved targeted sequences.


In a cell, DNA usually exists in a double-stranded form, and as such, has two complementary strands of nucleic acid referred to herein as the “top” and “bottom” strands. In certain cases, complementary strands of a chromosomal region may be referred to as “plus” and “minus” strands, the “first” and “second” strands, the “coding” and “noncoding” strands, the “Watson” and “Crick” strands or the “sense” and “antisense” strands. The assignment of a strand as being a top or bottom strand is arbitrary and does not imply any particular orientation, function or structure. Until they become covalently linked, the first and second strands are distinct molecules. For ease of description, the “top” and “bottom” strands of a double-stranded nucleic acid in which the top and bottom strands have been covalently linked will still be described as the “top” and “bottom” strands. In other words, for the purposes of this disclosure, the top and bottom strands of a double-stranded DNA do not need to be separated molecules. The nucleotide sequences of the first strand of several exemplary mammalian chromosomal regions (e.g., BACs, assemblies, chromosomes, etc.) are known, and may be found in NCBI's Genbank database, for example.


The term “top strand,” as used herein, refers to either strand of a nucleic acid but not both strands of a nucleic acid. When an oligonucleotide or a primer binds or anneals “only to a top strand,” it binds to only one strand but not the other. The term “bottom strand,” as used herein, refers to the strand that is complementary to the “top strand.” When an oligonucleotide binds or anneals “only to one strand,” it binds to only one strand, e.g., the first or second strand, but not the other strand. If an oligonucleotide binds or anneals to both strands of a double-stranded DNA, the oligonucleotide may have two regions, a first region that hybridizes with the top strand of the double-stranded DNA, and a second region that hybridizes with the bottom strand of the double-stranded DNA.


The term “double-stranded DNA molecule” refers to both double-stranded DNA molecules in which the top and bottom strands are not covalently linked, as well as double-stranded DNA molecules in which the top and bottom strands are covalently linked. The top and bottom strands of a double-stranded DNA are base paired with one another by Watson-Crick interactions.


The term “denaturing,” as used herein, refers to the separation of at least a portion of the base pairs of a nucleic acid duplex by placing the duplex in suitable denaturing conditions. Denaturing conditions are well known in the art. In one embodiment, in order to denature a nucleic acid duplex, the duplex may be exposed to a temperature that is above the Tm of the duplex, thereby releasing one strand of the duplex from the other. In certain embodiments, a nucleic acid may be denatured by exposing it to a temperature of at least 90° C. for a suitable amount of time (e.g., at least 30 seconds, up to 30 mins). In certain embodiments, fully denaturing conditions may be used to completely separate the base pairs of the duplex. In other embodiments, partially denaturing conditions (e.g., with a lower temperature than fully denaturing conditions) may be used to separate the base pairs of certain parts of the duplex (e.g., regions enriched for A-T base pairs may separate while regions enriched for G-C base pairs may remain paired). Nucleic acid may also be denatured chemically (e.g., using urea or NaOH).
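
The observation that A-T rich regions separate before G-C rich regions can be illustrated with a rough melting temperature estimate. The sketch below uses the Wallace rule of thumb (2° C. per A or T, 4° C. per G or C), which is a swapped-in approximation for short oligonucleotides and is not taken from this disclosure; it ignores salt concentration, duplex length effects, and nearest-neighbor interactions. The function name wallace_tm is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Estimate the melting temperature (degrees C) of a short DNA duplex.

    Uses the Wallace rule: 2 degrees per A/T base plus 4 degrees per G/C base.
    Intended only as a rough guide for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


at_rich_tm = wallace_tm("ATATATATATATAT")  # 14 A/T bases
gc_rich_tm = wallace_tm("GCGCGCGCGCGCGC")  # 14 G/C bases
```

Under this approximation the G-C rich 14-mer has twice the estimated melting temperature of the A-T rich 14-mer, consistent with partially denaturing conditions opening A-T rich regions first.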


The term “genotyping,” as used herein, refers to any type of analysis of a nucleic acid sequence, and includes sequencing, polymorphism (SNP) analysis, and analysis to identify rearrangements.


The term “sequencing,” as used herein, refers to a method by which the identity of consecutive nucleotides of a polynucleotide is obtained.


The term “next-generation sequencing” refers to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms, for example, those currently employed by Illumina, Life Technologies, and Roche, etc. Next-generation sequencing methods may also include nanopore sequencing methods or electronic-detection based methods such as Ion Torrent technology commercialized by Life Technologies.


The term “complementary DNA” or cDNA refers to a double-stranded DNA sample that was produced from an RNA sample by reverse transcription of RNA (using primers such as random hexamers or oligo-dT primers) followed by second-strand synthesis by digestion of the RNA with RNaseH and synthesis by DNA polymerase.


The term “RNA promoter adapter” is an adapter that contains a promoter for a bacteriophage RNA polymerase, e.g., the RNA polymerase from bacteriophage T3, T7, SP6 or the like.


Other definitions of terms may appear throughout the specification.


For any of the structural and functional characteristics described herein, methods of determining these characteristics are known in the art.


Guide Nucleic Acids (gNAs)


Provided herein are guide nucleic acids (gNAs) derivable from any nucleic acid source. The gNAs can be guide RNAs (gRNAs) or guide DNAs (gDNAs). The nucleic acid source can be DNA or RNA. Provided herein are methods to generate gNAs from any source nucleic acid, including DNA from a single organism, or mixtures of DNA from multiple organisms, or mixtures of DNA from multiple species, or DNA from clinical samples, or DNA from forensic samples, or DNA from environmental samples, or DNA from metagenomic DNA samples (for example a sample that contains more than one species of organism). Examples of source DNA include, but are not limited to, any genome, any genome fragment, cDNA, synthetic DNA, or a DNA collection (e.g., a SNP collection or DNA libraries). The gNAs provided herein can be used for genome-wide applications.


In some embodiments, the gNAs are derived from genomic sequences (e.g., genomic DNA). In some embodiments, the gNAs are derived from mammalian genomic sequences. In some embodiments, the gNAs are derived from eukaryotic genomic sequences. In some embodiments, the gNAs are derived from prokaryotic genomic sequences. In some embodiments, the gNAs are derived from viral genomic sequences. In some embodiments, the gNAs are derived from bacterial genomic sequences. In some embodiments, the gNAs are derived from plant genomic sequences. In some embodiments, the gNAs are derived from microbial genomic sequences. In some embodiments, the gNAs are derived from genomic sequences from a parasite, for example a eukaryotic parasite.


In some embodiments, the gNAs are derived from repetitive DNA. In some embodiments, the gNAs are derived from abundant DNA. In some embodiments, the gNAs are derived from mitochondrial DNA. In some embodiments, the gNAs are derived from ribosomal DNA. In some embodiments, the gNAs are derived from centromeric DNA. In some embodiments, the gNAs are derived from DNA comprising Alu elements (Alu DNA). In some embodiments, the gNAs are derived from DNA comprising long interspersed nuclear elements (LINE DNA). In some embodiments, the gNAs are derived from DNA comprising short interspersed nuclear elements (SINE DNA). In some embodiments the abundant DNA comprises ribosomal DNA. In some embodiments, the abundant DNA comprises host DNA (e.g., host genomic DNA or all host DNA). In an example, the gNAs can be derived from host DNA (e.g., human, animal, plant) for the depletion of host DNA to allow for easier analysis of other DNA that is present (e.g., bacterial, viral, or other metagenomic DNA). In another example, the gNAs can be derived from the one or more most abundant types (e.g., species) in a mixed sample, such as the one or more most abundant bacteria species in a metagenomic sample. The one or more most abundant types (e.g., species) can comprise the two, three, four, five, six, seven, eight, nine, ten, or more than ten most abundant types (e.g., species). The most abundant types can be the most abundant kingdoms, phyla or divisions, classes, orders, families, genuses, species, or other classifications. The most abundant types can be the most abundant cell types, such as epithelial cells, bone cells, muscle cells, blood cells, adipose cells, or other cell types. The most abundant types can be non-cancerous cells. The most abundant types can be cancerous cells. The most abundant types can be animal, human, plant, fungal, bacterial, or viral. 
gNAs can be derived from both a host and the one or more most abundant non-host types (e.g., species) in a sample, such as from both human DNA and the DNA of the one or more most abundant bacterial species. In some embodiments, the abundant DNA comprises DNA from the more abundant or most abundant cells in a sample. For example, for a specific sample, the highly abundant cells can be extracted and their DNA can be used to produce gNAs; these gNAs can be used to produce a depletion library that is applied to the original sample to enable or enhance sequencing or detection of low-abundance targets.
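
Selecting the one or more most abundant types in a mixed sample can be sketched as a simple ranking. The example below is illustrative only and assumes that each sequencing read (or cell) has already been assigned a type label by some upstream classification step that is outside the scope of the sketch; the function name most_abundant_types and the example labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def most_abundant_types(labels: Iterable[str], k: int) -> List[str]:
    """Return the `k` most frequent labels, most abundant first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter(labels)
    return [name for name, _ in counts.most_common(k)]


# Hypothetical per-read labels for a mixed sample.
labels = ["human"] * 6 + ["E. coli"] * 3 + ["S. aureus"] * 1

# The two most abundant types would be candidate sources for a depletion library.
top_two = most_abundant_types(labels, 2)
```

In this toy sample the host and the single most abundant bacterial species are selected, matching the embodiment in which gNAs are derived from both.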


In some embodiments, the gNAs are derived from DNA comprising short tandem repeats (STRs).


In some embodiments, the gNAs are derived from a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is an RNA genome.


In some embodiments, the gNAs are derived from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or virus; from an animal parasite; from a pathogen.


In some embodiments, the gNAs are derived from any mammalian organism. In one embodiment the mammal is a human. In another embodiment the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, the mammal is a domestic pet, for example a cat, a dog, a gerbil, a mouse, or a rat. In another embodiment the mammal is a type of monkey.


In some embodiments, the gNAs are derived from any bird or avian organism. An avian organism includes but is not limited to chicken, turkey, duck and goose.


In some embodiments, the gNAs are derived from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.


In some embodiments, the gNAs are derived from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.


In some embodiments, the gNAs are derived from a virus.


In some embodiments, the gNAs are derived from a species of fungi.


In some embodiments, the gNAs are derived from a species of algae.


In some embodiments, the gNAs are derived from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.


In some embodiments, the gNAs are derived from a nucleic acid target. Contemplated targets include, but are not limited to, pathogens; single nucleotide polymorphisms (SNPs), insertions, deletions, tandem repeats, or translocations; human SNPs or STRs; potential toxins; or animals, fungi, and plants. In some embodiments, the gNAs are derived from pathogens, and are pathogen-specific gNAs.


In some embodiments, a guide NA of the invention comprises a first NA segment comprising a targeting sequence, wherein the targeting sequence is 15-250 bp; and a second NA segment comprising a nucleic acid guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. In some embodiments, the targeting sequence is greater than 21 bp, greater than 22 bp, greater than 23 bp, greater than 24 bp, greater than 25 bp, greater than 26 bp, greater than 27 bp, greater than 28 bp, greater than 29 bp, greater than 30 bp, greater than 40 bp, greater than 50 bp, greater than 60 bp, greater than 70 bp, greater than 80 bp, greater than 90 bp, greater than 100 bp, greater than 110 bp, greater than 120 bp, greater than 130 bp, greater than 140 bp, or even greater than 150 bp. In an exemplary embodiment, the targeting sequence is greater than 30 bp. In some embodiments, the targeting sequences of the present invention range in size from 30-50 bp. In some embodiments, targeting sequences of the present invention range in size from 30-75 bp. In some embodiments, targeting sequences of the present invention range in size from 30-100 bp. For example, a targeting sequence can be at least 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 160 bp, 170 bp, 180 bp, 190 bp, 200 bp, 210 bp, 220 bp, 230 bp, 240 bp, or 250 bp. In specific embodiments, the targeting sequence is at least 22 bp. In specific embodiments, the targeting sequence is at least 30 bp.


In some embodiments, target-specific gNAs can comprise a nucleic acid sequence that is complementary to a region on the opposite strand of the targeted nucleic acid sequence 5′ to a PAM sequence, which can be recognized by a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein. In some embodiments the targeted nucleic acid sequence is immediately 5′ to a PAM sequence. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 15-250 bp. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, or 100 bp.
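
The relationship between a targeted sequence and an adjacent PAM can be sketched as a scan over one strand. The example below is a simplified illustration and makes several assumptions not fixed by the text above: the PAM is taken to be NGG (the motif commonly associated with Streptococcus pyogenes Cas9), only the strand supplied is scanned, and the targeted sequence has a fixed length of 22 bases. The function name targets_5prime_of_pam is hypothetical.

```python
from typing import List, Tuple


def targets_5prime_of_pam(seq: str, length: int = 22) -> List[Tuple[int, str]]:
    """Find sequences of `length` bases lying immediately 5' to an NGG motif.

    Returns (start_index, targeted_sequence) pairs for the strand given.
    """
    seq = seq.upper()
    hits = []
    # pam_start is the index of the "N" in NGG; the targeted sequence
    # occupies the `length` bases directly before it.
    for pam_start in range(length, len(seq) - 2):
        if seq[pam_start + 1:pam_start + 3] == "GG":
            hits.append((pam_start - length, seq[pam_start - length:pam_start]))
    return hits


# Toy sequence: 22 bases, then a TGG PAM, then trailing bases.
toy = "A" * 22 + "TGG" + "C" * 5
hits = targets_5prime_of_pam(toy)
```

For the toy sequence the scan reports a single 22-base targeted sequence starting at position 0. A fuller treatment would also scan the reverse complement strand.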


In some particular embodiments, the targeting sequence is not 20 bp. In some particular embodiments, the targeting sequence is not 21 bp.


In some embodiments, the gNAs comprise any purines or pyrimidines (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, uracil, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, cytosine and uracil (and/or modified versions of the same).


In some embodiments, the gNAs comprise a label, are attached to a label, or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.


In some embodiments, the gNAs are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array. Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.


Nucleic Acids Encoding gNAs


Also provided herein are nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that a gNA results from the reverse transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the reverse transcription of a gNA. In some embodiments, by encoding, it is meant that a gNA results from the amplification of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the amplification of a gNA.


In some embodiments, the nucleic acid encoding for a gNA comprises a first segment comprising a regulatory region; a second segment comprising a targeting sequence, wherein the second segment can range from 15 bp-250 bp; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.
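
The three-segment arrangement described above can be sketched as a simple concatenation with a length check on the second segment. This is an illustrative sketch only. The default promoter string is a commonly used minimal T7 promoter sequence and is an assumption, since no promoter sequence is given in the text above; the protein-binding segment in the example is an obvious placeholder and not a real stem-loop sequence; and the function name assemble_template is hypothetical.

```python
# Commonly used minimal T7 promoter sequence (assumption; not from the text).
T7_PROMOTER = "TAATACGACTCACTATAG"


def assemble_template(targeting: str, binding: str,
                      promoter: str = T7_PROMOTER) -> str:
    """Join promoter, targeting sequence, and protein-binding segment 5' to 3'.

    The targeting sequence must fall within the 15-250 base range.
    """
    if not 15 <= len(targeting) <= 250:
        raise ValueError("targeting sequence must be 15-250 bases long")
    return promoter + targeting + binding


# Placeholder segments for illustration; not real guide or stem-loop sequences.
example_targeting = "ACGT" * 6          # 24 bases
example_binding = "NNNNNNNNNN"          # stands in for a protein-binding sequence
template = assemble_template(example_targeting, example_binding)
```

The resulting string reads promoter, then targeting sequence, then protein-binding segment, which mirrors the first, second, and third segments of the encoding nucleic acid.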


In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.


In some embodiments, the nucleic acids encoding for gNAs comprise RNA.


In some embodiments the nucleic acids encoding for gNAs comprise DNA and RNA.


In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.


Collections of gNAs


Provided herein are collections (interchangeably referred to as libraries) of gNAs.


As used herein, a collection of gNAs denotes a mixture of gNAs containing at least 10² unique gNAs. In some embodiments a collection of gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique gNAs. In some embodiments a collection of gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ gNAs.
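The distinction between unique gNAs and the total number of gNAs can be illustrated with a short Python sketch. The function names and the toy mixture are hypothetical, and treating two gNAs as identical when their sequences match exactly is an assumption made for this example.

```python
def collection_counts(gnas):
    """Return (number of unique gNA sequences, total number of gNAs)."""
    normalized = [s.upper() for s in gnas]
    return len(set(normalized)), len(normalized)


def is_collection(gnas, min_unique=10**2):
    """True if the mixture holds at least `min_unique` unique gNAs."""
    unique, _ = collection_counts(gnas)
    return unique >= min_unique


# Hypothetical toy mixture: 150 distinct 10-nt sequences, each present twice.
bases = "ACGU"
toy = ["".join(bases[(i >> (2 * k)) & 3] for k in range(10)) for i in range(150)] * 2
unique, total = collection_counts(toy)
```

Here the toy mixture holds 150 unique gNAs out of 300 in total, so it meets the threshold of one hundred unique members.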


In some embodiments, a collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the gNAs in the collection vary in size. In some embodiments, the first and second segments are in 5′- to 3′-order.


In some embodiments, the size of the first segment varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 21 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 25 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 30 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 15-50 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 30-100 bp.
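The size criteria recited above can be checked for a given collection with a few lines of Python. This is only a sketch: the function name fraction_where and the list of first-segment lengths are hypothetical, and the range criteria are treated as inclusive of both endpoints, which is an assumption.

```python
def fraction_where(lengths, predicate):
    """Fraction of segment lengths (in bp) that satisfy `predicate`."""
    if not lengths:
        raise ValueError("the collection must not be empty")
    return sum(1 for n in lengths if predicate(n)) / len(lengths)


# Hypothetical first-segment lengths (bp) for a ten-member toy collection.
lengths = [18, 20, 22, 26, 31, 45, 60, 100, 120, 250]

frac_gt_21 = fraction_where(lengths, lambda n: n > 21)
frac_gt_25 = fraction_where(lengths, lambda n: n > 25)
frac_gt_30 = fraction_where(lengths, lambda n: n > 30)
frac_15_50 = fraction_where(lengths, lambda n: 15 <= n <= 50)
frac_30_100 = fraction_where(lengths, lambda n: 30 <= n <= 100)
```

For this toy collection, 80% of the first segments are greater than 21 bp, 70% are greater than 25 bp, 60% are greater than 30 bp, 60% are 15-50 bp, and 40% are 30-100 bp.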


In some particular embodiments, the size of the first segment is not 20 bp.


In some particular embodiments, the size of the first segment is not 21 bp.


In some embodiments, the gNAs and/or the targeting sequences of the gNAs in the collection of gNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in the sequence of the 5′ end of the targeting sequence across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence across the members of the collection.


In some embodiments, the 3′ end of the gNA targeting sequence can be any purine or pyrimidine (and/or modified versions of the same). In some embodiments, the 3′ end of the gNA targeting sequence is an adenine. In some embodiments, the 3′ end of the gNA targeting sequence is a guanine. In some embodiments, the 3′ end of the gNA targeting sequence is a cytosine. In some embodiments, the 3′ end of the gNA targeting sequence is a uracil. In some embodiments, the 3′ end of the gNA targeting sequence is a thymine. In some embodiments, the 3′ end of the gNA targeting sequence is not cytosine.


In some embodiments, the collection of gNAs comprises targeting sequences which can base-pair with the targeted DNA, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.
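One way to read the spacing criterion is that no stretch of the genome of interest longer than N bp lacks a target site. The Python sketch below implements that reading, which is an assumption, using hypothetical function names and hypothetical target-site coordinates.

```python
def max_spacing(positions, genome_length):
    """Largest gap (bp) between consecutive target sites, counting the
    stretches from the genome start to the first site and from the last
    site to the genome end."""
    if not positions:
        return genome_length
    bounds = [0] + sorted(set(positions)) + [genome_length]
    return max(b - a for a, b in zip(bounds, bounds[1:]))


def spaced_at_least_every(positions, genome_length, n_bp):
    """True if a target site occurs at least every `n_bp` bp."""
    return max_spacing(positions, genome_length) <= n_bp


# Hypothetical target-site start coordinates on a 150-bp toy genome.
sites = [10, 40, 75, 120]
largest_gap = max_spacing(sites, 150)
```

With these coordinates the largest gap is 45 bp, so the toy collection satisfies "at least every 50 bp" but not "at least every 40 bp".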


In some embodiments, the collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the gNAs in the collection can have a variety of second NA segments with various specificities for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example, a collection of gNAs as provided herein can comprise members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and can also comprise members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments, a collection of gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.


In some embodiments, a plurality of the gNA members of the collection are attached to a label, comprise a label, or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.


In some embodiments, a plurality of the gNA members of the collection are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array. Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.


Collections of Nucleic Acids Encoding gNAs


Provided herein are collections (interchangeably referred to as libraries) of nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA.


As used herein, a collection of nucleic acids encoding for gNAs denotes a mixture of nucleic acids containing at least 10² unique nucleic acids. In some embodiments a collection of nucleic acids encoding for gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique nucleic acids encoding for gNAs. In some embodiments a collection of nucleic acids encoding for gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ nucleic acids encoding for gNAs.


In some embodiments, a collection of nucleic acids encoding for gNAs comprises a first segment comprising a regulatory region; a second segment comprising a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.


In some embodiments, the first, second, and third segments are in 5′- to 3′-order.


In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.


In some embodiments, the nucleic acids encoding for gNAs comprise RNA.


In some embodiments the nucleic acids encoding for gNAs comprise DNA and RNA.


In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.


In some embodiments, the size of the second segments (targeting sequence) in the collection varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 21 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 25 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 30 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 15-50 bp.


In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 30-100 bp.


In some particular embodiments, the size of the second segment is not 20 bp.


In some particular embodiments, the size of the second segment is not 21 bp.


In some embodiments, the gNAs and/or the targeting sequences of the gNAs in the collection of gNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in the sequence of the 5′ end of the targeting sequence across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence across the members of the collection.


In some embodiments, the collection of nucleic acids comprises targeting sequences, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.


In some embodiments, the collection of nucleic acids encoding for gNAs comprises a third segment encoding for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the third segments in the collection vary in their specificity for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example, a collection of nucleic acids encoding for gNAs as provided herein can comprise members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and can also comprise members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.


Sequences of Interest

Provided herein are gNAs and collections of gNAs, derived from any source DNA (for example from genomic DNA, cDNA, artificial DNA, DNA libraries), that can be used to target sequences of interest in a sample for a variety of applications including, but not limited to, enrichment, depletion, capture, partitioning, labeling, regulation, and editing. The gNAs comprise a targeting sequence, directed at sequences of interest.


In some embodiments, the sequences of interest are genomic sequences (genomic DNA). In some embodiments, the sequences of interest are mammalian genomic sequences. In some embodiments, the sequences of interest are eukaryotic genomic sequences. In some embodiments, the sequences of interest are prokaryotic genomic sequences. In some embodiments, the sequences of interest are viral genomic sequences. In some embodiments, the sequences of interest are bacterial genomic sequences. In some embodiments, the sequences of interest are plant genomic sequences. In some embodiments, the sequences of interest are microbial genomic sequences. In some embodiments, the sequences of interest are genomic sequences from a parasite, for example a eukaryotic parasite. In some embodiments, the sequences of interest are host genomic sequences (e.g., the host organism of a microbiome, a parasite, or a pathogen). In some embodiments, the sequences of interest are abundant genomic sequences, such as sequences from the genome or genomes of the most abundant species in a sample.


In some embodiments, the sequences of interest comprise repetitive DNA. In some embodiments, the sequences of interest comprise abundant DNA. In some embodiments, the sequences of interest comprise mitochondrial DNA. In some embodiments, the sequences of interest comprise ribosomal DNA. In some embodiments, the sequences of interest comprise centromeric DNA. In some embodiments, the sequences of interest comprise DNA comprising Alu elements (Alu DNA). In some embodiments, the sequences of interest comprise long interspersed nuclear elements (LINE DNA). In some embodiments, the sequences of interest comprise short interspersed nuclear elements (SINE DNA). In some embodiments, the abundant DNA comprises ribosomal DNA.


In some embodiments, the sequences of interest comprise single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), cancer genes, inserts, deletions, structural variations, exons, genetic mutations, or regulatory regions.


In some embodiments, the sequences of interest can be a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is an RNA genome.


In some embodiments, the sequences of interest are from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or a virus; from an animal parasite; or from a pathogen.


In some embodiments, the sequences of interest are from any mammalian organism. In one embodiment the mammal is a human. In another embodiment the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, the mammal is a domestic pet, for example a cat, a dog, a gerbil, a mouse, or a rat. In another embodiment the mammal is a type of monkey.


In some embodiments, the sequences of interest are from any bird or avian organism. An avian organism includes but is not limited to chicken, turkey, duck and goose.


In some embodiments, the sequences of interest are from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.


In some embodiments, the sequences of interest are from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.


In some embodiments, the sequences of interest are from a virus.


In some embodiments, the sequences of interest are from a species of fungi.


In some embodiments, the sequences of interest are from a species of algae.


In some embodiments, the sequences of interest are from any mammalian parasite.


In some embodiments, the sequences of interest are obtained from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.


In some embodiments, the sequences of interest are from a pathogen.


Targeting Sequences

As used herein, a targeting sequence is one that directs the gNA to the sequences of interest in a sample. For example, a targeting sequence targets a particular sequence of interest, for example the targeting sequence targets a genomic sequence of interest.


Provided herein are gNAs and collections of gNAs that comprise a segment that comprises a targeting sequence. Also provided herein, are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs that comprise a segment encoding for a targeting sequence.


In some embodiments, the targeting sequence comprises DNA.


In some embodiments, the targeting sequence comprises RNA.


In some embodiments, the targeting sequence comprises RNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest, except that the RNA comprises uracils instead of thymines. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
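The relationship between a sequence of interest, its PAM, and an RNA targeting sequence can be sketched in Python as follows. The function names, the example sequence, and the choice of a 20-nt targeting length are hypothetical; the sketch scans only the strand supplied and uses the three PAM sequences named above.

```python
PAMS = ("AGG", "CGG", "TGG")  # PAM sequences named in the text


def find_targeting_rnas(dna, length, pams=PAMS):
    """Return (pam_start, rna) for each stretch of `length` nt that lies
    immediately 5' to a PAM on the supplied strand."""
    dna = dna.upper()
    hits = []
    for i in range(length, len(dna) - 2):
        if dna[i:i + 3] in pams:
            # Same sequence as the DNA, with uracil in place of thymine.
            hits.append((i, dna[i - length:i].replace("T", "U")))
    return hits


def percent_identity(rna, dna):
    """Percent of matching positions, treating U in the RNA as T."""
    if len(rna) != len(dna) or not rna:
        raise ValueError("sequences must be non-empty and of equal length")
    rna_as_dna = rna.upper().replace("U", "T")
    matches = sum(a == b for a, b in zip(rna_as_dna, dna.upper()))
    return 100.0 * matches / len(rna)


protospacer = "GACGTTACCTGATCAAGCTT"  # hypothetical 20-nt sequence of interest
sample = protospacer + "AGG" + "CA"   # followed by an AGG PAM
hits = find_targeting_rnas(sample, length=20)
```

In this toy input the single hit is the RNA GACGUUACCUGAUCAAGCUU, which shares 100% identity with the 20 nt located 5′ to the PAM; a targeting sequence differing at one of the 20 positions would share 95% identity.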


In some embodiments, the targeting sequence comprises DNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest.


In some embodiments, the targeting sequence comprises RNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.


In some embodiments, the targeting sequence comprises DNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
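Percent complementarity to the opposite strand can be illustrated with a short Python sketch. The function name and example sequences are hypothetical, and the sketch assumes ungapped, antiparallel Watson-Crick pairing between two sequences of equal length, both written 5′ to 3′.

```python
# Watson-Crick pairs; U is allowed so that RNA targeting sequences can be scored.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementary(targeting, opposite_strand):
    """Percent of positions at which `targeting` pairs with `opposite_strand`.

    Both sequences are written 5'->3', so the opposite strand is read in
    reverse to align the two antiparallel strands."""
    if len(targeting) != len(opposite_strand) or not targeting:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        (a, b) in PAIRS
        for a, b in zip(targeting.upper(), reversed(opposite_strand.upper()))
    )
    return 100.0 * paired / len(targeting)


# Hypothetical example: the sequence 5' to a PAM is GACGTTACCT, so the
# opposite strand, written 5'->3', is its reverse complement AGGTAACGTC.
full_match = percent_complementary("GACGUUACCU", "AGGTAACGTC")
one_mismatch = percent_complementary("GACGUUACCU", "AGGTAACGTA")
```

The first call yields 100% complementarity and the second, with a single mismatched position out of ten, yields 90%.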


In some embodiments, a DNA encoding for a targeting sequence of a gRNA shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.


In some embodiments, a DNA encoding for a targeting sequence of a gRNA is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence and is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to a sequence 5′ to a PAM sequence on a sequence of interest. In some embodiments, the PAM sequence is AGG, CGG, or TGG.


Nucleic Acid-Guided Nuclease System Proteins

Provided herein are gNAs and collections of gNAs comprising a segment that comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. Also provided herein, are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs that comprise a segment encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. A nucleic acid-guided nuclease system can be an RNA-guided nuclease system. A nucleic acid-guided nuclease system can be a DNA-guided nuclease system.


Methods of the present disclosure can utilize nucleic acid-guided nucleases. As used herein, a “nucleic acid-guided nuclease” is any nuclease that cleaves DNA, RNA or DNA/RNA hybrids, and which uses one or more guide nucleic acids (gNAs) to confer specificity. Nucleic acid-guided nucleases include CRISPR/Cas system proteins as well as non-CRISPR/Cas system proteins.


The nucleic acid-guided nucleases provided herein can be DNA guided DNA nucleases; DNA guided RNA nucleases; RNA guided DNA nucleases; or RNA guided RNA nucleases. The nucleases can be endonucleases. The nucleases can be exonucleases. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-DNA endonuclease. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-RNA endonuclease.


A nucleic acid-guided nuclease system protein-binding sequence is a nucleic acid sequence that binds any protein member of a nucleic acid-guided nuclease system. For example, a CRISPR/Cas system protein-binding sequence is a nucleic acid sequence that binds any protein member of a CRISPR/Cas system.


In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of CAS Class I Type I, CAS Class I Type III, CAS Class I Type IV, CAS Class II Type II, and CAS Class II Type V. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems. In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, Csf1, C2c2, and NgAgo.


In some embodiments, nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) can be from any bacterial or archaeal species.


In some embodiments, the nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) are from, or are derived from nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.


In some embodiments, examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins can be naturally occurring or engineered versions.


In some embodiments, naturally occurring nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. Engineered versions of such proteins can also be employed.


In some embodiments, engineered examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include catalytically dead nucleic acid-guided nuclease system proteins. The term “catalytically dead” generally refers to a nucleic acid-guided nuclease system protein that has inactivated nucleases (e.g., HNH and RuvC nucleases). Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the target nucleic acid (e.g., double-stranded DNA). In some embodiments, the nucleic acid-guided nuclease system catalytically dead protein is a catalytically dead CRISPR/Cas system protein, such as catalytically dead Cas9 (dCas9). Accordingly, the dCas9 allows separation of the mixture into unbound nucleic acids and dCas9-bound fragments. In one embodiment, a dCas9/gRNA complex binds to targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed. In another embodiment, the dCas9 can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site. Naturally occurring catalytically dead nucleic acid-guided nuclease system proteins can also be employed.


In some embodiments, engineered examples of nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins also include nucleic acid-guided nickases (e.g., Cas nickases). A nucleic acid-guided nickase refers to a modified version of a nucleic acid-guided nuclease system protein, containing a single inactive catalytic domain. In one embodiment, the nucleic acid-guided nickase is a Cas nickase, such as Cas9 nickase. A Cas9 nickase may contain a single inactive catalytic domain, for example, either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nickases bound to 2 gNAs that target opposite strands will create a double-strand break in a target double-stranded DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA (e.g., Cas9/gRNA) complexes be specifically bound at a site before a double-strand break is formed. Naturally occurring nickase nucleic acid-guided nuclease system proteins can also be employed.


In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins also include nucleic acid-guided nuclease system fusion proteins. For example, a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein may be fused to another protein, for example an activator, a repressor, a nuclease, a fluorescent molecule, a radioactive tag, or a transposase.


In some embodiments, the nucleic acid-guided nuclease system protein-binding sequence comprises a gNA (e.g., gRNA) stem-loop sequence.


In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4).


In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.


In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).
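The relationship among SEQ ID NOs: 3, 4, and 1 can be checked computationally. The following Python sketch is illustrative only and is not part of any described method; the function names are hypothetical, and the sequences are those listed above with internal whitespace removed. It assumes that SEQ ID NO: 4 is read as the transcription template, so that the transcript is its reverse complement with U in place of T.

```python
# Illustrative sketch only; helper names are hypothetical.
SEQ_ID_NO_3 = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG"
               "AGTCGGTGCTTTTTTT")
SEQ_ID_NO_4 = ("AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTAT"
               "TTCTAGCTCTAAAAC")
SEQ_ID_NO_1 = ("GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
               "CCGAGUCGGUGCUUUUUUU")

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, written 5'>3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def transcribe_from_template(template_dna):
    """Return the RNA (5'>3') complementary to a DNA template strand."""
    return reverse_complement(template_dna).replace("T", "U")


if __name__ == "__main__":
    # The two DNA strands should be reverse complements of each other.
    print(reverse_complement(SEQ_ID_NO_3) == SEQ_ID_NO_4)
    # Transcribing the template strand should give the stem-loop RNA.
    print(transcribe_from_template(SEQ_ID_NO_4) == SEQ_ID_NO_1)
```

The same two helpers can be applied to SEQ ID NOs: 5, 6, and 2 in the same way.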


In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6).


In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.


In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2).


In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a single transcribed component, which upon transcription yields a NA (e.g., RNA) stem-loop sequence. In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded, and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the resulting gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).
In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded, and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the resulting gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the third segment comprises two sub-segments, which encode for a crRNA and a tracrRNA upon transcription. In some embodiments, the crRNA does not comprise the N20 plus the extra sequence which can hybridize with tracrRNA. In some embodiments, the crRNA comprises the extra sequence which can hybridize with tracrRNA. In some embodiments, the two sub-segments are independently transcribed. In some embodiments, the two sub-segments are transcribed as a single unit. In some embodiments, the DNA encoding the crRNA comprises NtargetGTTTTAGAGCTATGCTGTTTTG (SEQ ID NO: 7), where Ntarget represents the targeting sequence. In some embodiments, the DNA encoding the tracrRNA comprises the sequence GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT (SEQ ID NO: 8).


In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a DNA sequence, which upon transcription yields a gRNA stem-loop sequence capable of binding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein. In one embodiment, the DNA sequence can be double-stranded. In some embodiments, the third segment double-stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the third segment double-stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In one embodiment, the DNA sequence can be single-stranded. In some embodiments, the third segment single-stranded DNA comprises the following DNA sequence (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.
In some embodiments, the third segment single-stranded DNA comprises the following DNA sequence (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template. In some embodiments, the third segment comprises a DNA sequence which, upon transcription, yields a first RNA sequence that is capable of forming a hybrid with a second RNA sequence, and which hybrid is capable of CRISPR/Cas system protein binding. In some embodiments, the third segment is double-stranded DNA comprising the DNA sequence on one strand: (5′>3′, GTTTTAGAGCTATGCTGTTTTG) (SEQ ID NO: 9) and its reverse-complementary DNA sequence on the other strand: (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the third segment is single-stranded DNA comprising the DNA sequence of (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the second segment and the third segment together encode for a crRNA sequence. In some embodiments, the second RNA sequence that is capable of forming a hybrid with the first RNA sequence encoded by the third segment of the nucleic acid encoding a gRNA is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence (5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 11). In some embodiments, the tracrRNA is encoded by a double-stranded DNA comprising the sequence of (5′>3′, GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 8), optionally fused with a regulatory sequence at its 5′ end. In some embodiments, the regulatory sequence can be bound by a transcription factor. In some embodiments, the regulatory sequence is a promoter. In some embodiments, the regulatory sequence is a T7 promoter, comprising the sequence of (5′>3′, GCCTCGAGCTAATACGACTCACTATAGAG) (SEQ ID NO: 12).


In some embodiments, provided herein is a nucleic acid encoding for a gNA comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment encodes for a RNA sequence that, upon post-transcriptional cleavage, yields a first RNA segment and a second RNA segment. In some embodiments, the first RNA segment comprises a crRNA and the second RNA segment comprises a tracrRNA, which can form a hybrid and together, provide for nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the third segment further comprises a spacer in between the transcriptional unit for the first RNA segment and the second RNA segment, which spacer comprises an enzyme cleavage site.


In some embodiments, provided herein is a gNA (e.g., gRNA) comprising a first NA segment comprising a targeting sequence and a second NA segment comprising a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the size of the first segment is greater than 30 bp. In some embodiments, the second segment comprises a single segment, which comprises the gRNA stem-loop sequence. In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1). In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the second segment comprises two sub-segments: a first RNA sub-segment (crRNA) that forms a hybrid with a second RNA sub-segment (tracrRNA), which together act to direct nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the sequence of the second sub-segment comprises GUUUUAGAGCUAUGCUGUUUUG. In some embodiments, the first RNA segment and the second RNA segment together form a crRNA sequence. In some embodiments, the other RNA that will form a hybrid with the second RNA segment is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence (5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 11).
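The two-segment organization of a gRNA described above can be sketched in Python as a simple string concatenation. This is an illustrative sketch under stated assumptions, not a described method: the function names and the example targeting sequence are hypothetical, the targeting sequence is assumed to be supplied as DNA letters in the 5′>3′ orientation of the desired RNA, and the stem-loop and crRNA repeat sequences are taken from SEQ ID NO: 1 and the 22-nucleotide sequence GUUUUAGAGCUAUGCUGUUUUG listed above.

```python
# Illustrative sketch only; names and the example target are hypothetical.
STEM_LOOP_RNA = ("GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
                 "CCGAGUCGGUGCUUUUUUU")          # SEQ ID NO: 1
CRRNA_REPEAT_RNA = "GUUUUAGAGCUAUGCUGUUUUG"      # hybridizes with the tracrRNA


def _to_rna(targeting_dna):
    """Validate a DNA targeting sequence and rewrite it with RNA letters."""
    seq = targeting_dna.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("targeting sequence must be non-empty and use only A, C, G, T")
    return seq.replace("T", "U")


def make_single_guide(targeting_dna):
    """First segment (targeting sequence) followed by the stem-loop segment."""
    return _to_rna(targeting_dna) + STEM_LOOP_RNA


def make_crrna(targeting_dna):
    """Targeting sequence followed by the repeat that pairs with the tracrRNA."""
    return _to_rna(targeting_dna) + CRRNA_REPEAT_RNA


if __name__ == "__main__":
    example_target = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt target
    print(make_single_guide(example_target))
    print(make_crrna(example_target))
```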


CRISPR/Cas System Nucleic Acid-Guided Nucleases

In some embodiments, CRISPR/Cas system proteins are used in the embodiments provided herein. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems.


In some embodiments, CRISPR/Cas system proteins can be from any bacterial or archaeal species.


In some embodiments, the CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.


In some embodiments, the CRISPR/Cas system proteins are from, or are derived from, CRISPR/Cas system proteins from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.


In some embodiments, examples of CRISPR/Cas system proteins can be naturally occurring or engineered versions.


In some embodiments, naturally occurring CRISPR/Cas system proteins can belong to CAS Class I Type I, III, or IV, or CAS Class II Type II or V, and can include Cas9, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and Cpf1.


In an exemplary embodiment, the CRISPR/Cas system protein comprises Cas9.


A “CRISPR/Cas system protein-gNA complex” refers to a complex comprising a CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.


A CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type CRISPR/Cas system protein. The CRISPR/Cas system protein may have all the functions of a wild type CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
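As an illustration of how a percent-identity threshold such as the one above might be evaluated, the following Python sketch computes identity over two sequences that have already been aligned. The text does not specify an alignment method or an identity convention; this sketch assumes a pre-computed pairwise alignment with "-" as the gap character and divides matches by the number of alignment columns, which is only one of several conventions in use. The function names and example sequences are hypothetical.

```python
# Illustrative sketch only; assumes the two sequences are already aligned.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold_pct=60.0):
    """True if the aligned pair is at least threshold_pct identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical aligned fragments: 5 identical columns out of 7.
    print(percent_identity("MKRT-LV", "MKKTALV"))
    print(meets_identity_threshold("MKRT-LV", "MKKTALV", 60.0))
```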


The term “CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a CRISPR/Cas system protein-gNA complex.


Cas9

In some embodiments, the CRISPR/Cas System protein nucleic acid-guided nuclease is or comprises Cas9. The Cas9 of the present invention can be isolated, recombinantly produced, or synthetic.


Examples of Cas9 proteins that can be used in the embodiments herein can be found in F. A. Ran, L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem, X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, and F. Zhang; “In vivo genome editing using Staphylococcus aureus Cas9,” Nature 520, 186-191 (9 Apr. 2015) doi:10.1038/nature14299, which is incorporated herein by reference.


In some embodiments, the Cas9 is from a Type II CRISPR system derived from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.


In some embodiments, the Cas9 is from a Type II CRISPR system derived from S. pyogenes and the PAM sequence is NGG, located immediately 3′ of the target-specific guide sequence. The PAM sequences of Type II CRISPR systems from exemplary bacterial species can also include: Streptococcus pyogenes (NGG), Staphylococcus aureus (NNGRRT), Neisseria meningitidis (NNNNGATT), Streptococcus thermophilus (NNAGAA), and Treponema denticola (NAAAAC), all of which are usable without deviating from the present invention.
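To illustrate how the PAM sequences listed above constrain candidate target sites, the following Python sketch scans one strand of a DNA sequence for a PAM written with IUPAC ambiguity codes and reports the sequence immediately 5′ of each PAM. It is a minimal sketch under stated assumptions: only the given strand is scanned (the reverse complement would need a second pass), the 20-nucleotide guide length is an assumed default, and the function and variable names are hypothetical.

```python
import re

# IUPAC codes needed for the PAM motifs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

PAM_BY_SPECIES = {
    "Streptococcus pyogenes": "NGG",
    "Staphylococcus aureus": "NNGRRT",
    "Neisseria meningitidis": "NNNNGATT",
    "Streptococcus thermophilus": "NNAGAA",
    "Treponema denticola": "NAAAAC",
}


def find_target_sites(dna, pam, guide_len=20):
    """Return (pam_start, upstream_sequence) for every PAM match on this strand.

    A lookahead is used so that overlapping PAM matches are all reported.
    Matches too close to the 5' end to have a full-length upstream
    sequence are skipped.
    """
    dna = dna.upper()
    pattern = re.compile("(?=(" + "".join(IUPAC[base] for base in pam) + "))")
    sites = []
    for match in pattern.finditer(dna):
        pam_start = match.start()
        if pam_start >= guide_len:
            sites.append((pam_start, dna[pam_start - guide_len:pam_start]))
    return sites


if __name__ == "__main__":
    # Hypothetical sequence: a 20-nt stretch followed by a TGG PAM.
    example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
    print(find_target_sites(example, PAM_BY_SPECIES["Streptococcus pyogenes"]))
```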


In one exemplary embodiment, a Cas9 sequence can be obtained, for example, from the pX330 plasmid (available from Addgene), re-amplified by PCR, and then cloned into pET30 (from EMD Biosciences) for expression in bacteria and purification of the recombinant 6His-tagged protein.


A “Cas9-gNA complex” refers to a complex comprising a Cas9 protein and a guide NA. A Cas9 protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type Cas9 protein, e.g., to the Streptococcus pyogenes Cas9 protein. The Cas9 protein may have all the functions of a wild type Cas9 protein, or only one or some of the functions, including binding activity and nuclease activity.


The term “Cas9-associated guide NA” refers to a guide NA as described above. The Cas9-associated guide NA may exist isolated, or as part of a Cas9-gNA complex.


Non-CRISPR/Cas System Nucleic Acid-Guided Nucleases

In some embodiments, non-CRISPR/Cas system proteins are used in the embodiments provided herein.


In some embodiments, the non-CRISPR/Cas system proteins can be from any bacterial or archaeal species.


In some embodiments, the non-CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.


In some embodiments, the non-CRISPR/Cas system proteins are from, or are derived from, Aquifex aeolicus, Thermus thermophilus, Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, Natronobacterium gregoryi, or Corynebacterium diphtheriae.


In some embodiments, the non-CRISPR/Cas system proteins can be naturally occurring or engineered versions.


In some embodiments, a naturally occurring non-CRISPR/Cas system protein is NgAgo (Argonaute from Natronobacterium gregoryi).


A “non-CRISPR/Cas system protein-gNA complex” refers to a complex comprising a non-CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.


A non-CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type non-CRISPR/Cas system protein. The non-CRISPR/Cas system protein may have all the functions of a wild type non-CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.


The term “non-CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The non-CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a non-CRISPR/Cas system protein-gNA complex.


Catalytically Dead Nucleic Acid-Guided Nucleases

In some embodiments, engineered examples of nucleic acid-guided nucleases include catalytically dead nucleic acid-guided nucleases (CRISPR/Cas system nucleic acid-guided nucleases or non-CRISPR/Cas system nucleic acid-guided nucleases). The term “catalytically dead” generally refers to a nucleic acid-guided nuclease that has inactivated nucleases, for example inactivated HNH and RuvC nucleases. Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the nucleic acid.


Accordingly, the catalytically dead nucleic acid-guided nuclease allows separation of a mixture into unbound nucleic acids and catalytically dead nucleic acid-guided nuclease-bound fragments. In one exemplary embodiment, a dCas9/gRNA complex binds to the targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed.


In another embodiment, the catalytically dead nucleic acid-guided nuclease can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site.


In some embodiments, the catalytically dead nucleic acid-guided nuclease is dCas9, dCpf1, dCas3, dCas8a-c, dCas10, dCse1, dCsy1, dCsn2, dCas4, dCsm2, dCmr5, dCsf1, dC2c2, or dNgAgo.


In one exemplary embodiment the catalytically dead nucleic acid-guided nuclease protein is a dCas9.


Nucleic Acid-Guided Nuclease Nickases

In some embodiments, engineered examples of nucleic acid-guided nucleases include nucleic acid-guided nuclease nickases (referred to interchangeably as nickase nucleic acid-guided nucleases).


In some embodiments, engineered examples of nucleic acid-guided nucleases include CRISPR/Cas system nickases or non-CRISPR/Cas system nickases, containing a single inactive catalytic domain.


In some embodiments, the nucleic acid-guided nuclease nickase is a Cas9 nickase, Cpf1 nickase, Cas3 nickase, Cas8a-c nickase, Cas10 nickase, Cse1 nickase, Csy1 nickase, Csn2 nickase, Cas4 nickase, Csm2 nickase, Cmr5 nickase, Csf1 nickase, C2c2 nickase, or a NgAgo nickase.


In one embodiment, the nucleic acid-guided nuclease nickase is a Cas9 nickase.


In some embodiments, a nucleic acid-guided nuclease nickase can be used to bind to a target sequence. With only one active nuclease domain, the nucleic acid-guided nuclease nickase cuts only one strand of a target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nuclease nickases bound to 2 gNAs that target opposite strands can create a double-strand break in the nucleic acid. This “dual nickase” strategy increases the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA complexes be specifically bound at a site before a double-strand break is formed.


In exemplary embodiments, a Cas9 nickase can be used to bind to a target sequence. The term “Cas9 nickase” refers to a modified version of the Cas9 protein, containing a single inactive catalytic domain, i.e., either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide RNA-hybridized strand or the non-hybridized strand may be cleaved. Cas9 nickases bound to 2 gRNAs that target opposite strands will create a double-strand break in the DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both Cas9/gRNA complexes be specifically bound at a site before a double-strand break is formed.
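The logic of the “dual nickase” strategy can be modeled in a few lines of Python. This is an illustrative sketch only: the class and function names are hypothetical, the strands are simply labeled "top" and "bottom", and the distance between the two nicks is ignored. It relies on the generally reported behavior of Cas9, in which the HNH domain cleaves the guide-hybridized strand and the RuvC domain cleaves the non-hybridized strand, so inactivating one domain leaves the other strand's cut in place.

```python
from dataclasses import dataclass

# Strand cut by a Cas9 nickase, keyed by the domain that has been inactivated.
STRAND_CUT_BY_INACTIVE_DOMAIN = {
    "RuvC": "hybridized",      # HNH stays active and cuts the guide-hybridized strand
    "HNH": "non-hybridized",   # RuvC stays active and cuts the other strand
}


@dataclass(frozen=True)
class BoundNickase:
    inactive_domain: str     # "RuvC" or "HNH"
    guide_binds_strand: str  # "top" or "bottom"


def nicked_strand(nickase):
    """Return which strand ("top" or "bottom") this bound nickase nicks."""
    cut = STRAND_CUT_BY_INACTIVE_DOMAIN[nickase.inactive_domain]
    if cut == "hybridized":
        return nickase.guide_binds_strand
    return "bottom" if nickase.guide_binds_strand == "top" else "top"


def makes_double_strand_break(first, second):
    """A double-strand break requires one nick on each strand."""
    return {nicked_strand(first), nicked_strand(second)} == {"top", "bottom"}


if __name__ == "__main__":
    a = BoundNickase("RuvC", "top")
    b = BoundNickase("RuvC", "bottom")
    print(makes_double_strand_break(a, b))  # guides target opposite strands
    print(makes_double_strand_break(a, a))  # both nicks on the same strand
```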


Capture of DNA can be carried out using a nucleic acid-guided nuclease nickase. In one exemplary embodiment, a nucleic acid-guided nuclease nickase cuts a single strand of double stranded nucleic acid, wherein the double stranded region comprises methylated nucleotides.


Dissociable and Thermostable Nucleic Acid-Guided Nucleases

In some embodiments, thermostable nucleic acid-guided nucleases are used in the methods provided herein (thermostable CRISPR/Cas system nucleic acid-guided nucleases or thermostable non-CRISPR/Cas system nucleic acid-guided nucleases). In such embodiments, the reaction temperature is elevated, inducing dissociation of the protein; the reaction temperature is then lowered, allowing for the generation of additional cleaved target sequences. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, at least 55% activity, at least 60% activity, at least 65% activity, at least 70% activity, at least 75% activity, at least 80% activity, at least 85% activity, at least 90% activity, at least 95% activity, at least 96% activity, at least 97% activity, at least 98% activity, at least 99% activity, or 100% activity, when maintained at least at 75° C. for at least 1 minute. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained for at least 1 minute at least at 75° C., at least at 80° C., at least at 85° C., at least at 90° C., at least at 91° C., at least at 92° C., at least at 93° C., at least at 94° C., at least at 95° C., at least at 96° C., at least at 97° C., at least at 98° C., at least at 99° C., or at least at 100° C. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained at least at 75° C. for at least 1 minute, 2 minutes, 3 minutes, 4 minutes, or 5 minutes. In some embodiments, a thermostable nucleic acid-guided nuclease maintains at least 50% activity when the temperature is elevated and then lowered to 25° C.-50° C. In some embodiments, the temperature is lowered to 25° C., to 30° C., to 35° C., to 40° C., to 45° C., or to 50° C. In one exemplary embodiment, a thermostable enzyme retains at least 90% activity after 1 min at 95° C.
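The thermostability criteria above are threshold tests on three measured quantities, which the following Python sketch makes explicit. It is illustrative only: the function name and parameters are hypothetical, the defaults reflect the least stringent embodiment described (at least 50% activity retained after at least 1 minute at 75° C. or higher), and how the retained activity is measured is outside the scope of the sketch.

```python
# Illustrative sketch only; names and defaults are assumptions for the example.
def is_thermostable(retained_activity_pct, hold_temp_c, hold_minutes,
                    min_activity_pct=50.0, min_temp_c=75.0, min_minutes=1.0):
    """True if the enzyme retained enough activity after a sufficiently hot, long hold."""
    return (hold_temp_c >= min_temp_c
            and hold_minutes >= min_minutes
            and retained_activity_pct >= min_activity_pct)


if __name__ == "__main__":
    # Exemplary embodiment: at least 90% activity after 1 min at 95 degrees C.
    print(is_thermostable(92.0, 95.0, 1.0, min_activity_pct=90.0, min_temp_c=95.0))
    # An enzyme that loses most of its activity at 95 degrees C does not qualify.
    print(is_thermostable(40.0, 95.0, 1.0))
```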


In some embodiments, the thermostable nucleic acid-guided nuclease is thermostable Cas9, thermostable Cpf1, thermostable Cas3, thermostable Cas8a-c, thermostable Cas10, thermostable Cse1, thermostable Csy1, thermostable Csn2, thermostable Cas4, thermostable Csm2, thermostable Cmr5, thermostable Csf1, thermostable C2c2, or thermostable NgAgo.


In some embodiments, the thermostable CRISPR/Cas system protein is thermostable Cas9.


Thermostable nucleic acid-guided nucleases can be isolated, for example, after being identified by sequence homology in the genomes of thermophilic microorganisms such as Streptococcus thermophilus and Pyrococcus furiosus. Nucleic acid-guided nuclease genes can then be cloned into an expression vector. In one exemplary embodiment, a thermostable Cas9 protein is isolated.


In another embodiment, a thermostable nucleic acid-guided nuclease can be obtained by in vitro evolution of a non-thermostable nucleic acid-guided nuclease. The sequence of a nucleic acid-guided nuclease can be mutagenized to improve its thermostability.


Methods of Making Collections of gNAs


Provided herein are methods that enable the generation of a large number of diverse gNAs (e.g., gRNAs), as collections of gNAs, from any source nucleic acid (e.g., DNA). Methods provided herein can employ enzymatic methods including but not limited to digestion, ligation, extension, overhang filling, transcription, reverse transcription, and amplification.


Generally, the method can comprise providing a nucleic acid (e.g., DNA); employing a first enzyme (or combination of first enzymes) that cuts at a part of the PAM sequence in the nucleic acid, in a way that a residual nucleotide sequence from the PAM sequence is left; ligating an adapter that positions a type IIS restriction enzyme site (a type IIS enzyme cuts outside of, yet near, its recognition motif) at a distance suitable to eliminate the PAM sequence; employing a second, type IIS enzyme (or combination of second enzymes) to eliminate the PAM sequence together with the adapter; and fusing a sequence that can be recognized by protein members of the nucleic acid-guided nuclease (e.g., CRISPR/Cas) system, for example, a gRNA stem-loop sequence. In some embodiments, the first enzymatic reaction cuts part of the PAM sequence in a way that a residual nucleotide sequence from the PAM sequence is left, and the nucleotide immediately 5′ to the PAM sequence can be any purine or pyrimidine, not just a cytosine, for example, not just those that are C/NGG or C/TAG, etc.


Table 1 shows exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.









TABLE 1
Exemplary strategies for preparing a collection of guide nucleic acids.

CRISPR/Cas System Species | PAM Sequence | First Enzyme/Components | Strategy | 3′ Adapter sequence with type IIS enzyme site (provided with only one strand sequence 5′ > 3′)
Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence, nicks the other strand with T7 endonuclease I, blunt with T4 DNA polymerase; ligate to adapter; cut with MlyI to remove PAM and adapter; ligate gRNA stem-loop sequence at 3′ end | ggGACTCggatccctatagtc (SEQ ID NO: 4421)
Staphylococcus aureus (SA); SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut, blunt with T4 DNA polymerase; ligate to adapter SA; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422)
Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut, blunt with T4 DNA polymerase; ligate to adapter NM; cut with EcoRI to eliminate unwanted DNA and EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | TCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4428)
Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut, blunt with T4 DNA polymerase; ligate to adapter ST; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4429)
Treponema denticola (TD) | NAAAAC | Cly7489II | Cut, blunt with T4 DNA polymerase; ligate to adapter TD; cut with EcoP15I to remove PAM and adapter | tttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4430)









Table 2 shows additional exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.









TABLE 2

Additional exemplary strategies for preparing a collection of guide nucleic acids.

| CRISPR/Cas System Species | PAM Sequence | First Enzyme/Component | Exemplary Strategy | Adapter oligo sequence (with Inosine overhangs, all in 5′ > 3′ direction) |
| --- | --- | --- | --- | --- |
| Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence, nicks the other strand with T7 endonuclease I; ligate to adapter; cut with MlyI to remove PAM and 3′ adapter; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: ggggGACTCggatccctatagtgatacaaagacgatgacgacaagcg (SEQ ID NO: 4404); Adapter oligo 2: gcctcgagc*t*a*atacgactcactatagggatccaagtccc (* denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405) |
| Staphylococcus aureus (SA); SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut; ligate to adapter SA; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: IttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcaggcggccgctaaaa (SEQ ID NO: 4423) |
| Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut; ligate to adapter NM; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: attTCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4424); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcGA (SEQ ID NO: 4425) |
| Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut; ligate to adapter ST; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: gcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4426); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcIG (SEQ ID NO: 4427) |









Exemplary applications of the compositions and methods described herein are provided in FIG. 1, FIG. 2, FIG. 3, FIG. 4, FIG. 5, FIG. 6, and FIG. 7. The figures depict non-limiting exemplary embodiments of the present invention that include a method of constructing a gNA library (e.g., gRNA library) from input nucleic acids (e.g., DNA), such as genomic DNA (e.g., human genomic DNA).


In FIG. 1, the starting material can be fragmented genomic DNA (e.g., human) or other source DNA. These fragments are blunt-ended before constructing the library 101. T7 promoter adapters are ligated to the blunt-ended DNA fragments 102, which are then PCR amplified. Nt.CviPII is then used to generate a nick on one strand of the PCR product immediately 5′ to the CCD sequence 103. T7 Endonuclease I cleaves on the opposite strand 1, 2, or 3 bp 5′ of the nick 104. The resulting DNA fragments are blunt-ended with T4 DNA Polymerase, leaving an HGG sequence at the end of the DNA fragment 105. The resulting DNA is cleaned and recovered on beads. An adapter carrying a MlyI recognition site is ligated to the blunt-ended DNA fragment immediately 3′ of the HGG sequence 106. MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 107. The resulting DNA fragments are cleaned and recovered again on beads. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome 108. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.


In FIG. 2, the starting material can be intact genomic DNA (e.g., human) or other source DNA 201. Nt.CviPII and T7 Endonuclease I are used to generate nicks on each strand of the human genomic DNA, resulting in smaller DNA fragments 202. DNA fragments of 200-600 bp are size selected on beads, then ligated with Y-shaped adapters carrying a GG overhang on the 5′ end. One strand of the Y-shaped adapter contains a MlyI recognition site, whereas the other strand contains a mutated MlyI site and a T7 promoter sequence 203. Because of these features, after PCR amplification, the T7 promoter sequence is at the distal end of the HGG sequence, and the MlyI sequence is at the rear end of HGG 204. Digestion with MlyI generates a cleavage immediately 5′ of the HGG sequence 205. This blunt-end cleavage removes HGG together with the adapter sequence 206. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.


In FIG. 3, the source DNA (e.g., genomic DNA) can be nicked 301, for example with a nicking enzyme. In some cases, the nicking enzyme can have a recognition site that is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD (where D represents a base other than C). Nicks can be proximal, surrounding a region containing the sequence (represented by the thicker line) which will be used to yield the guide RNA N20 sequence. When nicks are proximal, a double stranded break can occur and lead to 5′ or 3′ overhangs 302. These overhangs can be repaired, for example with a polymerase (e.g., T4 polymerase). In some cases, such as with 5′ strands, repair can comprise synthesizing a complementary strand. In some cases, such as with 3′ strands, repair can comprise removing overhangs. Repair can result in a blunt end including the N20 guide sequence and a sequence complementary to the nick recognition sequence (e.g., HGG, where H represents a base other than G).
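The complementarity between CCD and HGG described above can be sketched in Python as follows. This is a minimal illustration, not part of the disclosed protocol: the helper names are hypothetical, only the recognition motifs are modeled, and the exact nick position within each site is not represented.

```python
import re

def ccd_sites(seq):
    """0-based starts of CCD (D = A, G or T) on the strand supplied."""
    return [m.start() for m in re.finditer(r"(?=CC[AGT])", seq.upper())]

def hgg_sites(seq):
    """0-based starts of HGG (H = A, C or T) on the strand supplied.

    Each HGG on one strand is the complement of a CCD on the other strand.
    """
    return [m.start() for m in re.finditer(r"(?=[ACT]GG)", seq.upper())]

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

An HGG found at position i on one strand of a sequence of length L corresponds to a CCD at position L - 3 - i on the reverse complement, which is why a CCD nicking enzyme can leave HGG at the end of the repaired fragment.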


In FIG. 4, continuing for example from the end of FIG. 3, different combinations of adapters can be ligated to the DNA to allow for the desired cleaving. Adapters with a recognition site for a nuclease enzyme that cuts 3 base pairs from the site (e.g., MlyI) can be ligated 401, and digestion at that site can be used to remove a leftover sequence, such as an HGG sequence 402. Adapters with a recognition site for a nuclease that cuts 20 base pairs from the site (e.g., MmeI) can then be ligated 403. These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BsaXI). The first enzyme can be used to cut 20 nucleotides down, thereby keeping the N20 sequence 404. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 405. Then, the nuclease corresponding to the second recognition site (e.g., BsaXI) can be used to remove the adapter for the site that cuts 20 nucleotides away (e.g., MmeI) 406. Finally, the guide RNA stem-loop sequence adapter can be ligated to the N20 sequence 407 to prepare for guide RNA production.
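The adapter-directed removal of a residual HGG sequence can be sketched as simple coordinate arithmetic. The sketch below assumes, as is commonly documented for MlyI, that the enzyme recognizes GAGTC and makes a blunt cut five bases beyond the site; it uses the Table 1 adapter ggGACTCggatccctatagtc, in which the site (read here as its reverse complement GACTC) sits two bases from the adapter end, so the cut falls three bases into the insert. The function `ligate_and_trim` and its parameters are hypothetical, and any recognition sites inside the insert itself are ignored.

```python
MLYI_SITE_RC = "GACTC"  # reverse complement of GAGTC, as written in the adapter
MLYI_REACH = 5          # assumption: blunt cut 5 bases beyond the site

def ligate_and_trim(fragment, adapter, site_rc=MLYI_SITE_RC, reach=MLYI_REACH):
    """Ligate adapter 3' of fragment, then cut `reach` bases upstream of the site.

    Returns the part of the top strand that remains on the insert side.
    Only the recognition site inside the adapter is considered.
    """
    fragment = fragment.upper()
    adapter = adapter.upper()
    product = fragment + adapter
    site_start = len(fragment) + adapter.index(site_rc)
    cut = site_start - reach
    if cut < 0:
        raise ValueError("cut position falls outside the ligated product")
    return product[:cut]
```

With a fragment ending in N20 followed by CGG and the Table 1 adapter, the site starts two bases into the adapter, so the cut removes the adapter plus the three-base HGG and leaves the N20 sequence.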


Alternatively, the protocol shown in FIG. 5 can follow the end of a protocol such as that shown in FIG. 3. Adapters with a recognition site for a nuclease enzyme that cleaves 25 nucleotides from the site (e.g., EcoP15I) can be ligated to the DNA 501. These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BaeI) and any other left-over sequence, such as HGG. The enzyme corresponding to the first recognition site (e.g., EcoP15I) can then be used to cleave after the N20 sequence 502. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 503. The enzyme corresponding to the second recognition site (e.g., BaeI) can then be used to remove the recognition sites and any residual sequence (e.g., HGG) 504. Finally, the guide RNA stem-loop sequence adapter can be ligated (e.g., by single strand ligation) to the N20 sequence 505.


As an alternative to protocols such as that shown in FIG. 3, the protocol shown in FIG. 6 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 601. In some cases, the nick recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand 602. Because of the DNA synthesis, the nick can be sealed and made available to be nicked again 603. Subsequent cycles of nicking and synthesis can be used to yield large amounts of target sequences 604. These single stranded copies of target sequences can be made double stranded, for example by random priming and extension. These double stranded nucleic acids comprising N20 sequences can then be further processed by methods disclosed herein, such as those shown in FIG. 4 or FIG. 5.


As another alternative to protocols such as that shown in FIG. 3 or FIG. 6, the protocol shown in FIG. 7 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 701. In some cases, the nicking enzyme recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand (e.g., nicking endonuclease-mediated strand-displacement DNA amplification (NEMDA)). The reaction parameters can be adjusted to control the size of the single stranded DNA produced. For example, the nickase:polymerase ratio (e.g., CviPII:Bst large fragment polymerase ratio) can be adjusted. Reaction temperature can also be adjusted. Next, an oligonucleotide can be added 704 which has (in the 5′>3′ direction) a promoter (e.g., T7 promoter) 702 followed by a random n-mer (e.g., random 6-mer, random 8-mer) 703. The random n-mer region can bind to a region of the single stranded DNA generated previously. For example, binding can be conducted by denaturing at high temperature followed by rapid cool down, which can allow the random n-mer region to bind to the single stranded DNA generated by NEMDA. In some cases, the DNA is denatured at 98° C. for 7 minutes then cooled down rapidly to 10° C. Extension and/or amplification can be used to produce double-stranded DNA. Blunt ends can be produced, for example enzymatically (e.g., by treatment with DNA polymerase I at 20° C.). This can result in one end ending at the promoter (e.g., T7 promoter) and the other end ending at any nicking enzyme recognition sites (e.g., any CCD sites). These fragments can then be purified, for example by size selection (e.g., by gel purification, capillary electrophoresis, or other fragment separation techniques).
In some cases, the target fragments are about 50 base pairs in length (adapter sequence (e.g., T7 adapter)+target N20 sequence+nicking enzyme recognition site or complement (e.g., HGG)). Fragments can then be ligated to an adapter comprising a nuclease recognition site for a nuclease that cuts an appropriate distance away to remove the nicking enzyme recognition site 705. For example, for a three-nucleotide long nicking enzyme recognition site (e.g., CCD for CviPII), BaeI can be used. The appropriate nuclease (e.g., BaeI) can then be used to remove the nuclease recognition site and the nicking enzyme recognition site 706. The remaining nucleic acid sequence (e.g., the N20 site) can then be ligated to the final stem-loop sequence for the guide RNA 707. Amplification (e.g., PCR) can be conducted. Guide RNAs can be produced.
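The size-selection arithmetic for the target fragments can be sketched as follows. This is a hypothetical illustration only: the text gives the fragment length as about 50 base pairs, so the adapter length of 27 used in the usage note is an assumed value chosen to make the sum come to 50, and the tolerance window is likewise an assumption.

```python
def expected_fragment_length(adapter_len, guide_len=20, site_len=3):
    """Adapter + N20 target + nicking-site remnant (e.g., HGG), in base pairs."""
    return adapter_len + guide_len + site_len

def size_select(fragments, target_len, tolerance):
    """Keep fragments whose length is within `tolerance` of `target_len`."""
    return [f for f in fragments if abs(len(f) - target_len) <= tolerance]
```

For instance, `expected_fragment_length(27)` gives 50, and `size_select(fragments, 50, 5)` would retain fragments of 45 to 55 bases, mimicking an in silico analogue of gel or capillary size selection.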


In some embodiments, a collection of gNAs (e.g., gRNAs) targeting human mitochondrial DNA (mtDNA) is created that can be used for directing nucleic acid-guided nuclease (e.g., Cas9) proteins, comprising the nucleic acid-guided nuclease (e.g., Cas9) target sequence. In some embodiments, the targeting sequences of this collection of gNAs (e.g., gRNAs) are encoded by DNA sequences comprising at least the 20 nt sequence provided in the second column from the right of Table 3 (if the NGG sequence is on the positive strand) and Table 4 (if the NGG sequence is on the negative strand). In some embodiments, a collection of gRNA nucleic acids, as provided herein, with specificity for human mitochondrial DNA, comprises a plurality of members, wherein the members comprise a plurality of targeting sequences provided in the second column from the right of Table 3 and/or the second column from the right of Table 4.
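The relationship between each 23 nt site (20 nt target followed by NGG) and its coordinates on the (+) strand can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name `plus_strand_targets` is hypothetical, coordinates are taken to be 1-based and inclusive of the PAM, and only unambiguous A/C/G/T bases are matched.

```python
import re

def plus_strand_targets(seq, guide_len=20):
    """Yield (start, end, site, guide) for each guide_len-mer followed by NGG.

    start and end are 1-based, inclusive coordinates of guide + PAM on the
    strand supplied; site is guide + PAM; guide is the targeting sequence.
    """
    seq = seq.upper()
    # Lookahead so that overlapping sites (e.g., at ...CGGG) are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    for m in pattern.finditer(seq):
        guide, pam = m.group(1), m.group(2)
        start = m.start() + 1
        end = m.start() + guide_len + 3
        yield start, end, guide + pam, guide
```

Sites on the (−) strand could be found in the same way by passing the reverse complement of the sequence, in which case the reported coordinates refer to the reverse-complemented strand and would need to be converted.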









TABLE 3

gRNA target sequence for human mtDNA carrying NGG sequence on the (+) strand.

| Chr start position (+ strand) | Chr end position (+ strand) | nt sequence on the (+) strand containing gRNA target sequence followed by NGG | SEQ ID NO | 20 nt gRNA target sequence (will encode the gRNA targeting sequence) | SEQ ID NO |
| --- | --- | --- | --- | --- | --- |
| 13 | 35 | ATCACCCTATTAACCACTCACGG | 13 | ATCACCCTATTAACCACTCA | 436 |
| 14 | 36 | TCACCCTATTAACCACTCACGGG | 14 | TCACCCTATTAACCACTCAC | 437 |
| 32 | 54 | ACGGGAGCTCTCCATGCATTTGG | 15 | ACGGGAGCTCTCCATGCATT | 438 |
| 45 | 67 | ATGCATTTGGTATTTTCGTCTGG | 16 | ATGCATTTGGTATTTTCGTC | 439 |
| 46 | 68 | TGCATTTGGTATTTTCGTCTGGG | 17 | TGCATTTGGTATTTTCGTCT | 440 |
| 47 | 69 | GCATTTGGTATTTTCGTCTGGGG | 18 | GCATTTGGTATTTTCGTCTG | 441 |
| 48 | 70 | CATTTGGTATTTTCGTCTGGGGG | 19 | CATTTGGTATTTTCGTCTGG | 442 |
| 49 | 71 | ATTTGGTATTTTCGTCTGGGGGG | 20 | ATTTGGTATTTTCGTCTGGG | 443 |
| 79 | 101 | GCGATAGCATTGCGAGACGCTGG | 21 | GCGATAGCATTGCGAGACGC | 444 |
| 85 | 107 | GCATTGCGAGACGCTGGAGCCGG | 22 | GCATTGCGAGACGCTGGAGC | 445 |
| 163 | 185 | GCACCTACGTTCAATATTACAGG | 23 | GCACCTACGTTCAATATTAC | 446 |
| 207 | 229 | GTTAATTAATTAATGCTTGTAGG | 24 | GTTAATTAATTAATGCTTGT | 447 |
| 301 | 323 | AACCCCCCCTCCCCCGCTTCTGG | 25 | AACCCCCCCTCCCCCGCTTC | 448 |
| 388 | 410 | AGATTTCAAATTTTATCTTTTGG | 26 | AGATTTCAAATTTTATCTTT | 449 |
| 391 | 413 | TTTCAAATTTTATCTTTTGGCGG | 27 | TTTCAAATTTTATCTTTTGG | 450 |
| 604 | 626 | ATACACTGAAAATGTTTAGACGG | 28 | ATACACTGAAAATGTTTAGA | 451 |
| 605 | 627 | TACACTGAAAATGTTTAGACGGG | 29 | TACACTGAAAATGTTTAGAC | 452 |
| 631 | 653 | ACATCACCCCATAAACAAATAGG | 30 | ACATCACCCCATAAACAAAT | 453 |
| 636 | 658 | ACCCCATAAACAAATAGGTTTGG | 31 | ACCCCATAAACAAATAGGTT | 454 |
| 727 | 749 | TCTAAATCACCACGATCAAAAGG | 32 | TCTAAATCACCACGATCAAA | 455 |
| 788 | 810 | TTAGCCTAGCCACACCCCCACGG | 33 | TTAGCCTAGCCACACCCCCA | 456 |
| 789 | 811 | TAGCCTAGCCACACCCCCACGGG | 34 | TAGCCTAGCCACACCCCCAC | 457 |
| 851 | 873 | AACTAAGCTATACTAACCCCAGG | 35 | AACTAAGCTATACTAACCCC | 458 |
| 852 | 874 | ACTAAGCTATACTAACCCCAGGG | 36 | ACTAAGCTATACTAACCCCA | 459 |
| 856 | 878 | AGCTATACTAACCCCAGGGTTGG | 37 | AGCTATACTAACCCCAGGGT | 460 |
| 880 | 902 | CAATTTCGTGCCAGCCACCGCGG | 38 | CAATTTCGTGCCAGCCACCG | 461 |
| 912 | 934 | TAACCCAAGTCAATAGAAGCCGG | 39 | TAACCCAAGTCAATAGAAGC | 462 |
| 1009 | 1031 | CACAAAATAGACTACGAAAGTGG | 40 | CACAAAATAGACTACGAAAG | 463 |
| 1051 | 1073 | ACAATAGCTAAGACCCAAACTGG | 41 | ACAATAGCTAAGACCCAAAC | 464 |
| 1052 | 1074 | CAATAGCTAAGACCCAAACTGGG | 42 | CAATAGCTAAGACCCAAACT | 465 |
| 1148 | 1170 | AGCCACAGCTTAAAACTCAAAGG | 43 | AGCCACAGCTTAAAACTCAA | 466 |
| 1154 | 1176 | AGCTTAAAACTCAAAGGACCTGG | 44 | AGCTTAAAACTCAAAGGACC | 467 |
| 1157 | 1179 | TTAAAACTCAAAGGACCTGGCGG | 45 | TTAAAACTCAAAGGACCTGG | 468 |
| 1178 | 1200 | GGTGCTTCATATCCCTCTAGAGG | 46 | GGTGCTTCATATCCCTCTAG | 469 |
| 1267 | 1289 | TCTTCAGCAAACCCTGATGAAGG | 47 | TCTTCAGCAAACCCTGATGA | 470 |
| 1306 | 1328 | AGTACCCACGTAAAGACGTTAGG | 48 | AGTACCCACGTAAAGACGTT | 471 |
| 1312 | 1334 | CACGTAAAGACGTTAGGTCAAGG | 49 | CACGTAAAGACGTTAGGTCA | 472 |
| 1326 | 1348 | AGGTCAAGGTGTAGCCCATGAGG | 50 | AGGTCAAGGTGTAGCCCATG | 473 |
| 1329 | 1351 | TCAAGGTGTAGCCCATGAGGTGG | 51 | TCAAGGTGTAGCCCATGAGG | 474 |
| 1339 | 1361 | GCCCATGAGGTGGCAAGAAATGG | 52 | GCCCATGAGGTGGCAAGAAA | 475 |
| 1340 | 1362 | CCCATGAGGTGGCAAGAAATGGG | 53 | CCCATGAGGTGGCAAGAAAT | 476 |
| 1389 | 1411 | GATAGCCCTTATGAAACTTAAGG | 54 | GATAGCCCTTATGAAACTTA | 477 |
| 1390 | 1412 | ATAGCCCTTATGAAACTTAAGGG | 55 | ATAGCCCTTATGAAACTTAA | 478 |
| 1397 | 1419 | TTATGAAACTTAAGGGTCGAAGG | 56 | TTATGAAACTTAAGGGTCGA | 479 |
| 1400 | 1422 | TGAAACTTAAGGGTCGAAGGTGG | 57 | TGAAACTTAAGGGTCGAAGG | 480 |
| 1441 | 1463 | AGTAGAGTGCTTAGTTGAACAGG | 58 | AGTAGAGTGCTTAGTTGAAC | 481 |
| 1442 | 1464 | GTAGAGTGCTTAGTTGAACAGGG | 59 | GTAGAGTGCTTAGTTGAACA | 482 |
| 1494 | 1516 | CCTCCTCAAGTATACTTCAAAGG | 60 | CCTCCTCAAGTATACTTCAA | 483 |
| 1530 | 1552 | ACCCCTACGCATTTATATAGAGG | 61 | ACCCCTACGCATTTATATAG | 484 |
| 1548 | 1570 | AGAGGAGACAAGTCGTAACATGG | 62 | AGAGGAGACAAGTCGTAACA | 485 |
| 1560 | 1582 | TCGTAACATGGTAAGTGTACTGG | 63 | TCGTAACATGGTAAGTGTAC | 486 |
| 1573 | 1595 | AGTGTACTGGAAAGTGCACTTGG | 64 | AGTGTACTGGAAAGTGCACT | 487 |
| 1620 | 1642 | AAAGCACCCAACTTACACTTAGG | 65 | AAAGCACCCAACTTACACTT | 488 |
| 1726 | 1748 | CATTTACCCAAATAAAGTATAGG | 66 | CATTTACCCAAATAAAGTAT | 489 |
| 1746 | 1768 | AGGCGATAGAAATTGAAACCTGG | 67 | AGGCGATAGAAATTGAAACC | 490 |
| 1770 | 1792 | GCAATAGATATAGTACCGCAAGG | 68 | GCAATAGATATAGTACCGCA | 491 |
| 1771 | 1793 | CAATAGATATAGTACCGCAAGGG | 69 | CAATAGATATAGTACCGCAA | 492 |
| 1809 | 1831 | TAACCAAGCATAATATAGCAAGG | 70 | TAACCAAGCATAATATAGCA | 493 |
| 1862 | 1884 | TAACTAGAAATAACTTTGCAAGG | 71 | TAACTAGAAATAACTTTGCA | 494 |
| 1947 | 1969 | CCGTCTATGTAGCAAAATAGTGG | 72 | CCGTCTATGTAGCAAAATAG | 495 |
| 1948 | 1970 | CGTCTATGTAGCAAAATAGTGGG | 73 | CGTCTATGTAGCAAAATAGT | 496 |
| 1960 | 1982 | AAAATAGTGGGAAGATTTATAGG | 74 | AAAATAGTGGGAAGATTTAT | 497 |
| 1966 | 1988 | GTGGGAAGATTTATAGGTAGAGG | 75 | GTGGGAAGATTTATAGGTAG | 498 |
| 1987 | 2009 | GGCGACAAACCTACCGAGCCTGG | 76 | GGCGACAAACCTACCGAGCC | 499 |
| 1997 | 2019 | CTACCGAGCCTGGTGATAGCTGG | 77 | CTACCGAGCCTGGTGATAGC | 500 |
| 2086 | 2108 | ATTTAACTGTTAGTCCAAAGAGG | 78 | ATTTAACTGTTAGTCCAAAG | 501 |
| 2099 | 2121 | TCCAAAGAGGAACAGCTCTTTGG | 79 | TCCAAAGAGGAACAGCTCTT | 502 |
| 2107 | 2129 | GGAACAGCTCTTTGGACACTAGG | 80 | GGAACAGCTCTTTGGACACT | 503 |
| 2152 | 2174 | AAAAATTTAACACCCATAGTAGG | 81 | AAAAATTTAACACCCATAGT | 504 |
| 2247 | 2269 | CTGAACTCCTCACACCCAATTGG | 82 | CTGAACTCCTCACACCCAAT | 505 |
| 2414 | 2436 | CCTCACTGTCAACCCAACACAGG | 83 | CCTCACTGTCAACCCAACAC | 506 |
| 2427 | 2449 | CCAACACAGGCATGCTCATAAGG | 84 | CCAACACAGGCATGCTCATA | 507 |
| 2432 | 2454 | ACAGGCATGCTCATAAGGAAAGG | 85 | ACAGGCATGCTCATAAGGAA | 508 |
| 2449 | 2471 | GAAAGGTTAAAAAAAGTAAAAGG | 86 | GAAAGGTTAAAAAAAGTAAA | 509 |
| 2456 | 2478 | TAAAAAAAGTAAAAGGAACTCGG | 87 | TAAAAAAAGTAAAAGGAACT | 510 |
| 2515 | 2537 | TCTAGCATCACCAGTATTAGAGG | 88 | TCTAGCATCACCAGTATTAG | 511 |
| 2546 | 2568 | GCCCAGTGACACATGTTTAACGG | 89 | GCCCAGTGACACATGTTTAA | 512 |
| 2552 | 2574 | TGACACATGTTTAACGGCCGCGG | 90 | TGACACATGTTTAACGGCCG | 513 |
| 2571 | 2593 | GCGGTACCCTAACCGTGCAAAGG | 91 | GCGGTACCCTAACCGTGCAA | 514 |
| 2599 | 2621 | TAATCACTTGTTCCTTAAATAGG | 92 | TAATCACTTGTTCCTTAAAT | 515 |
| 2600 | 2622 | AATCACTTGTTCCTTAAATAGGG | 93 | AATCACTTGTTCCTTAAATA | 516 |
| 2614 | 2636 | TAAATAGGGACCTGTATGAATGG | 94 | TAAATAGGGACCTGTATGAA | 517 |
| 2624 | 2646 | CCTGTATGAATGGCTCCACGAGG | 95 | CCTGTATGAATGGCTCCACG | 518 |
| 2625 | 2647 | CTGTATGAATGGCTCCACGAGGG | 96 | CTGTATGAATGGCTCCACGA | 519 |
| 2676 | 2698 | AAATTGACCTGCCCGTGAAGAGG | 97 | AAATTGACCTGCCCGTGAAG | 520 |
| 2679 | 2701 | TTGACCTGCCCGTGAAGAGGCGG | 98 | TTGACCTGCCCGTGAAGAGG | 521 |
| 2680 | 2702 | TGACCTGCCCGTGAAGAGGCGGG | 99 | TGACCTGCCCGTGAAGAGGC | 522 |
| 2711 | 2733 | AGCAAGACGAGAAGACCCTATGG | 100 | AGCAAGACGAGAAGACCCTA | 523 |
| 2755 | 2777 | ACAGTACCTAACAAACCCACAGG | 101 | ACAGTACCTAACAAACCCAC | 524 |
| 2789 | 2811 | CAAACCTGCATTAAAAATTTCGG | 102 | CAAACCTGCATTAAAAATTT | 525 |
| 2793 | 2815 | CCTGCATTAAAAATTTCGGTTGG | 103 | CCTGCATTAAAAATTTCGGT | 526 |
| 2794 | 2816 | CTGCATTAAAAATTTCGGTTGGG | 104 | CTGCATTAAAAATTTCGGTT | 527 |
| 2795 | 2817 | TGCATTAAAAATTTCGGTTGGGG | 105 | TGCATTAAAAATTTCGGTTG | 528 |
| 2804 | 2826 | AATTTCGGTTGGGGCGACCTCGG | 106 | AATTTCGGTTGGGGCGACCT | 529 |
| 2895 | 2917 | TGATCCAATAACTTGACCAACGG | 107 | TGATCCAATAACTTGACCAA | 530 |
| 2911 | 2933 | CCAACGGAACAAGTTACCCTAGG | 108 | CCAACGGAACAAGTTACCCT | 531 |
| 2912 | 2934 | CAACGGAACAAGTTACCCTAGGG | 109 | CAACGGAACAAGTTACCCTA | 532 |
| 2954 | 2976 | CTAGAGTCCATATCAACAATAGG | 110 | CTAGAGTCCATATCAACAAT | 533 |
| 2955 | 2977 | TAGAGTCCATATCAACAATAGGG | 111 | TAGAGTCCATATCAACAATA | 534 |
| 2974 | 2996 | AGGGTTTACGACCTCGATGTTGG | 112 | AGGGTTTACGACCTCGATGT | 535 |
| 2980 | 3002 | TACGACCTCGATGTTGGATCAGG | 113 | TACGACCTCGATGTTGGATC | 536 |
| 2992 | 3014 | GTTGGATCAGGACATCCCGATGG | 114 | GTTGGATCAGGACATCCCGA | 537 |
| 3010 | 3032 | GATGGTGCAGCCGCTATTAAAGG | 115 | GATGGTGCAGCCGCTATTAA | 538 |
| 3058 | 3080 | TACGTGATCTGAGTTCAGACCGG | 116 | TACGTGATCTGAGTTCAGAC | 539 |
| 3069 | 3091 | AGTTCAGACCGGAGTAATCCAGG | 117 | AGTTCAGACCGGAGTAATCC | 540 |
| 3073 | 3095 | CAGACCGGAGTAATCCAGGTCGG | 118 | CAGACCGGAGTAATCCAGGT | 541 |
| 3110 | 3132 | CAAATTCCTCCCTGTACGAAAGG | 119 | CAAATTCCTCCCTGTACGAA | 542 |
| 3125 | 3147 | ACGAAAGGACAAGAGAAATAAGG | 120 | ACGAAAGGACAAGAGAAATA | 543 |
| 3203 | 3225 | ACCCACACCCACCCAAGAACAGG | 121 | ACCCACACCCACCCAAGAAC | 544 |
| 3204 | 3226 | CCCACACCCACCCAAGAACAGGG | 122 | CCCACACCCACCCAAGAACA | 545 |
| 3217 | 3239 | AAGAACAGGGTTTGTTAAGATGG | 123 | AAGAACAGGGTTTGTTAAGA | 546 |
| 3227 | 3249 | TTTGTTAAGATGGCAGAGCCCGG | 124 | TTTGTTAAGATGGCAGAGCC | 547 |
| 3262 | 3284 | ACTTAAAACTTTACAGTCAGAGG | 125 | ACTTAAAACTTTACAGTCAG | 548 |
| 3294 | 3316 | TCTTCTTAACAACATACCCATGG | 126 | TCTTCTTAACAACATACCCA | 549 |
| 3336 | 3358 | TGTACCCATTCTAATCGCAATGG | 127 | TGTACCCATTCTAATCGCAA | 550 |
| 3370 | 3392 | CTTACCGAACGAAAAATTCTAGG | 128 | CTTACCGAACGAAAAATTCT | 551 |
| 3391 | 3413 | GGCTATATACAACTACGCAAAGG | 129 | GGCTATATACAACTACGCAA | 552 |
| 3406 | 3428 | CGCAAAGGCCCCAACGTTGTAGG | 130 | CGCAAAGGCCCCAACGTTGT | 553 |
| 3415 | 3437 | CCCAACGTTGTAGGCCCCTACGG | 131 | CCCAACGTTGTAGGCCCCTA | 554 |
| 3416 | 3438 | CCAACGTTGTAGGCCCCTACGGG | 132 | CCAACGTTGTAGGCCCCTAC | 555 |
| 3570 | 3592 | CCTCCCCATACCCAACCCCCTGG | 133 | CCTCCCCATACCCAACCCCC | 556 |
| 3586 | 3608 | CCCCTGGTCAACCTCAACCTAGG | 134 | CCCCTGGTCAACCTCAACCT | 557 |
| 3643 | 3665 | GTTTACTCAATCCTCTGATCAGG | 135 | GTTTACTCAATCCTCTGATC | 558 |
| 3644 | 3666 | TTTACTCAATCCTCTGATCAGGG | 136 | TTTACTCAATCCTCTGATCA | 559 |
| 3676 | 3698 | AACTCAAACTACGCCCTGATCGG | 137 | AACTCAAACTACGCCCTGAT | 560 |
| 3757 | 3779 | CTATCAACATTACTAATAAGTGG | 138 | CTATCAACATTACTAATAAG | 561 |
| 3828 | 3850 | ACTCCTGCCATCATGACCCTTGG | 139 | ACTCCTGCCATCATGACCCT | 562 |
| 3892 | 3914 | ACCCCCTTCGACCTTGCCGAAGG | 140 | ACCCCCTTCGACCTTGCCGA | 563 |
| 3893 | 3915 | CCCCCTTCGACCTTGCCGAAGGG | 141 | CCCCCTTCGACCTTGCCGAA | 564 |
| 3894 | 3916 | CCCCTTCGACCTTGCCGAAGGGG | 142 | CCCCTTCGACCTTGCCGAAG | 565 |
| 3913 | 3935 | GGGGAGTCCGAACTAGTCTCAGG | 143 | GGGGAGTCCGAACTAGTCTC | 566 |
| 3937 | 3959 | TTCAACATCGAATACGCCGCAGG | 144 | TTCAACATCGAATACGCCGC | 567 |
| 4015 | 4037 | CTCACCACTACAATCTTCCTAGG | 145 | CTCACCACTACAATCTTCCT | 568 |
| 4287 | 4309 | ACTTTGATAGAGTAAATAATAGG | 146 | ACTTTGATAGAGTAAATAAT | 569 |
| 4311 | 4333 | GCTTAAACCCCCTTATTTCTAGG | 147 | GCTTAAACCCCCTTATTTCT | 570 |
| 4386 | 4408 | TCACACCCCATCCTAAAGTAAGG | 148 | TCACACCCCATCCTAAAGTA | 571 |
| 4406 | 4428 | AGGTCAGCTAAATAAGCTATCGG | 149 | AGGTCAGCTAAATAAGCTAT | 572 |
| 4407 | 4429 | GGTCAGCTAAATAAGCTATCGGG | 150 | GGTCAGCTAAATAAGCTATC | 573 |
| 4428 | 4450 | GGCCCATACCCCGAAAATGTTGG | 151 | GGCCCATACCCCGAAAATGT | 574 |
| 4460 | 4482 | TCCCGTACTAATTAATCCCCTGG | 152 | TCCCGTACTAATTAATCCCC | 575 |
| 4494 | 4516 | ATCTACTCTACCATCTTTGCAGG | 153 | ATCTACTCTACCATCTTTGC | 576 |
| 4542 | 4564 | CACTGATTTTTTACCTGAGTAGG | 154 | CACTGATTTTTTACCTGAGT | 577 |
| 4692 | 4714 | CTCTTCAACAATATACTCTCCGG | 155 | CTCTTCAACAATATACTCTC | 578 |
| 4767 | 4789 | ATAGCTATAGCAATAAAACTAGG | 156 | ATAGCTATAGCAATAAAACT | 579 |
| 4799 | 4821 | CTTTCACTTCTGAGTCCCAGAGG | 157 | CTTTCACTTCTGAGTCCCAG | 580 |
| 4809 | 4831 | TGAGTCCCAGAGGTTACCCAAGG | 158 | TGAGTCCCAGAGGTTACCCA | 581 |
| 4827 | 4849 | CAAGGCACCCCTCTGACATCCGG | 159 | CAAGGCACCCCTCTGACATC | 582 |
| 4941 | 4963 | TCAATCTTATCCATCATAGCAGG | 160 | TCAATCTTATCCATCATAGC | 583 |
| 4950 | 4972 | TCCATCATAGCAGGCAGTTGAGG | 161 | TCCATCATAGCAGGCAGTTG | 584 |
| 4953 | 4975 | ATCATAGCAGGCAGTTGAGGTGG | 162 | ATCATAGCAGGCAGTTGAGG | 585 |
| 5010 | 5032 | TACTCCTCAATTACCCACATAGG | 163 | TACTCCTCAATTACCCACAT | 586 |
| 5202 | 5224 | CCATCCACCCTCCTCTCCCTAGG | 164 | CCATCCACCCTCCTCTCCCT | 587 |
| 5205 | 5227 | TCCACCCTCCTCTCCCTAGGAGG | 165 | TCCACCCTCCTCTCCCTAGG | 588 |
| 5223 | 5245 | GGAGGCCTGCCCCCGCTAACCGG | 166 | GGAGGCCTGCCCCCGCTAAC | 589 |
| 5239 | 5261 | TAACCGGCTTTTTGCCCAAATGG | 167 | TAACCGGCTTTTTGCCCAAA | 590 |
| 5240 | 5262 | AACCGGCTTTTTGCCCAAATGGG | 168 | AACCGGCTTTTTGCCCAAAT | 591 |
| 5500 | 5522 | TAATAATCTTATAGAAATTTAGG | 169 | TAATAATCTTATAGAAATTT | 592 |
| 5569 | 5591 | CTTAATTTCTGTAACAGCTAAGG | 170 | CTTAATTTCTGTAACAGCTA | 593 |
| 5646 | 5668 | CTAAGCCCTTACTAGACCAATGG | 171 | CTAAGCCCTTACTAGACCAA | 594 |
| 5647 | 5669 | TAAGCCCTTACTAGACCAATGGG | 172 | TAAGCCCTTACTAGACCAAT | 595 |
| 5697 | 5719 | AGCTAAGCACCCTAATCAACTGG | 173 | AGCTAAGCACCCTAATCAAC | 596 |
| 5723 | 5745 | CAATCTACTTCTCCCGCCGCCGG | 174 | CAATCTACTTCTCCCGCCGC | 597 |
| 5724 | 5746 | AATCTACTTCTCCCGCCGCCGGG | 175 | AATCTACTTCTCCCGCCGCC | 598 |
| 5732 | 5754 | TCTCCCGCCGCCGGGAAAAAAGG | 176 | TCTCCCGCCGCCGGGAAAAA | 599 |
| 5735 | 5757 | CCCGCCGCCGGGAAAAAAGGCGG | 177 | CCCGCCGCCGGGAAAAAAGG | 600 |
| 5736 | 5758 | CCGCCGCCGGGAAAAAAGGCGGG | 178 | CCGCCGCCGGGAAAAAAGGC | 601 |
| 5747 | 5769 | AAAAAAGGCGGGAGAAGCCCCGG | 179 | AAAAAAGGCGGGAGAAGCCC | 602 |
| 5751 | 5773 | AAGGCGGGAGAAGCCCCGGCAGG | 180 | AAGGCGGGAGAAGCCCCGGC | 603 |
| 5800 | 5822 | ATTCAATATGAAAATCACCTCGG | 181 | ATTCAATATGAAAATCACCT | 604 |
| 5806 | 5828 | TATGAAAATCACCTCGGAGCTGG | 182 | TATGAAAATCACCTCGGAGC | 605 |
| 5816 | 5838 | ACCTCGGAGCTGGTAAAAAGAGG | 183 | ACCTCGGAGCTGGTAAAAAG | 606 |
| 5928 | 5950 | TCTACAAACCACAAAGACATTGG | 184 | TCTACAAACCACAAAGACAT | 607 |
| 5949 | 5971 | GGAACACTATACCTATTATTCGG | 185 | GGAACACTATACCTATTATT | 608 |
| 5961 | 5983 | CTATTATTCGGCGCATGAGCTGG | 186 | CTATTATTCGGCGCATGAGC | 609 |
| 5970 | 5992 | GGCGCATGAGCTGGAGTCCTAGG | 187 | GGCGCATGAGCTGGAGTCCT | 610 |
| 6005 | 6027 | CCTCCTTATTCGAGCCGAGCTGG | 188 | CCTCCTTATTCGAGCCGAGC | 611 |
| 6006 | 6028 | CTCCTTATTCGAGCCGAGCTGGG | 189 | CTCCTTATTCGAGCCGAGCT | 612 |
| 6027 | 6049 | GGCCAGCCAGGCAACCTTCTAGG | 190 | GGCCAGCCAGGCAACCTTCT | 613 |
| 6108 | 6130 | ATAGTAATACCCATCATAATCGG | 191 | ATAGTAATACCCATCATAAT | 614 |
| 6111 | 6133 | GTAATACCCATCATAATCGGAGG | 192 | GTAATACCCATCATAATCGG | 615 |
| 6117 | 6139 | CCCATCATAATCGGAGGCTTTGG | 193 | CCCATCATAATCGGAGGCTT | 616 |
| 6144 | 6166 | TGACTAGTTCCCCTAATAATCGG | 194 | TGACTAGTTCCCCTAATAAT | 617 |
| 6158 | 6180 | AATAATCGGTGCCCCCGATATGG | 195 | AATAATCGGTGCCCCCGATA | 618 |
| 6236 | 6258 | CCTGCTCGCATCTGCTATAGTGG | 196 | CCTGCTCGCATCTGCTATAG | 619 |
| 6239 | 6261 | GCTCGCATCTGCTATAGTGGAGG | 197 | GCTCGCATCTGCTATAGTGG | 620 |
| 6243 | 6265 | GCATCTGCTATAGTGGAGGCCGG | 198 | GCATCTGCTATAGTGGAGGC | 621 |
| 6249 | 6271 | GCTATAGTGGAGGCCGGAGCAGG | 199 | GCTATAGTGGAGGCCGGAGC | 622 |
| 6255 | 6277 | GTGGAGGCCGGAGCAGGAACAGG | 200 | GTGGAGGCCGGAGCAGGAAC | 623 |
| 6282 | 6304 | ACAGTCTACCCTCCCTTAGCAGG | 201 | ACAGTCTACCCTCCCTTAGC | 624 |
| 6283 | 6305 | CAGTCTACCCTCCCTTAGCAGGG | 202 | CAGTCTACCCTCCCTTAGCA | 625 |
| 6300 | 6322 | GCAGGGAACTACTCCCACCCTGG | 203 | GCAGGGAACTACTCCCACCC | 626 |
| 6342 | 6364 | ATCTTCTCCTTACACCTAGCAGG | 204 | ATCTTCTCCTTACACCTAGC | 627 |
| 6360 | 6382 | GCAGGTGTCTCCTCTATCTTAGG | 205 | GCAGGTGTCTCCTCTATCTT | 628 |
| 6361 | 6383 | CAGGTGTCTCCTCTATCTTAGGG | 206 | CAGGTGTCTCCTCTATCTTA | 629 |
| 6362 | 6384 | AGGTGTCTCCTCTATCTTAGGGG | 207 | AGGTGTCTCCTCTATCTTAG | 630 |
| 6495 | 6517 | TCTCTCCCAGTCCTAGCTGCTGG | 208 | TCTCTCCCAGTCCTAGCTGC | 631 |
| 6552 | 6574 | ACCACCTTCTTCGACCCCGCCGG | 209 | ACCACCTTCTTCGACCCCGC | 632 |
| 6555 | 6577 | ACCTTCTTCGACCCCGCCGGAGG | 210 | ACCTTCTTCGACCCCGCCGG | 633 |
| 6558 | 6580 | TTCTTCGACCCCGCCGGAGGAGG | 211 | TTCTTCGACCCCGCCGGAGG | 634 |
| 6597 | 6619 | CAACACCTATTCTGATTTTTCGG | 212 | CAACACCTATTCTGATTTTT | 635 |
| 6630 | 6652 | GTTTATATTCTTATCCTACCAGG | 213 | GTTTATATTCTTATCCTACC | 636 |
| 6636 | 6658 | ATTCTTATCCTACCAGGCTTCGG | 214 | ATTCTTATCCTACCAGGCTT | 637 |
| 6669 | 6691 | CATATTGTAACTTACTACTCCGG | 215 | CATATTGTAACTTACTACTC | 638 |
| 6687 | 6709 | TCCGGAAAAAAAGAACCATTTGG | 216 | TCCGGAAAAAAAGAACCATT | 639 |
| 6696 | 6718 | AAAGAACCATTTGGATACATAGG | 217 | AAAGAACCATTTGGATACAT | 640 |
| 6701 | 6723 | ACCATTTGGATACATAGGTATGG | 218 | ACCATTTGGATACATAGGTA | 641 |
| 6723 | 6745 | GTCTGAGCTATGATATCAATTGG | 219 | GTCTGAGCTATGATATCAAT | 642 |
| 6732 | 6754 | ATGATATCAATTGGCTTCCTAGG | 220 | ATGATATCAATTGGCTTCCT | 643 |
| 6733 | 6755 | TGATATCAATTGGCTTCCTAGGG | 221 | TGATATCAATTGGCTTCCTA | 644 |
| 6768 | 6790 | GCACACCATATATTTACAGTAGG | 222 | GCACACCATATATTTACAGT | 645 |
| 6831 | 6853 | ATAATCATCGCTATCCCCACCGG | 223 | ATAATCATCGCTATCCCCAC | 646 |
| 6867 | 6889 | AGCTGACTCGCCACACTCCACGG | 224 | AGCTGACTCGCCACACTCCA | 647 |
| 6909 | 6931 | GCTGCAGTGCTCTGAGCCCTAGG | 225 | GCTGCAGTGCTCTGAGCCCT | 648 |
| 6933 | 6955 | TTCATCTTTCTTTTCACCGTAGG | 226 | TTCATCTTTCTTTTCACCGT | 649 |
| 6936 | 6958 | ATCTTTCTTTTCACCGTAGGTGG | 227 | ATCTTTCTTTTCACCGTAGG | 650 |
| 6945 | 6967 | TTCACCGTAGGTGGCCTGACTGG | 228 | TTCACCGTAGGTGGCCTGAC | 651 |
| 7032 | 7054 | TTCCACTATGTCCTATCAATAGG | 229 | TTCCACTATGTCCTATCAAT | 652 |
| 7053 | 7075 | GGAGCTGTATTTGCCATCATAGG | 230 | GGAGCTGTATTTGCCATCAT | 653 |
| 7056 | 7078 | GCTGTATTTGCCATCATAGGAGG | 231 | GCTGTATTTGCCATCATAGG | 654 |
| 7086 | 7108 | CACTGATTTCCCCTATTCTCAGG | 232 | CACTGATTTCCCCTATTCTC | 655 |
| 7140 | 7162 | CATTTCACTATCATATTCATCGG | 233 | CATTTCACTATCATATTCAT | 656 |
| 7176 | 7198 | TTCTTCCCACAACACTTTCTCGG | 234 | TTCTTCCCACAACACTTTCT | 657 |
| 7185 | 7207 | CAACACTTTCTCGGCCTATCCGG | 235 | CAACACTTTCTCGGCCTATC | 658 |
| 7205 | 7227 | CGGAATGCCCCGACGTTACTCGG | 236 | CGGAATGCCCCGACGTTACT | 659 |
| 7251 | 7273 | TGAAACATCCTATCATCTGTAGG | 237 | TGAAACATCCTATCATCTGT | 660 |
| 7358 | 7380 | AGAAGAACCCTCCATAAACCTGG | 238 | AGAAGAACCCTCCATAAACC | 661 |
| 7371 | 7393 | ATAAACCTGGAGTGACTATATGG | 239 | ATAAACCTGGAGTGACTATA | 662 |
| 7432 | 7454 | ACATAAAATCTAGACAAAAAAGG | 240 | ACATAAAATCTAGACAAAAA | 663 |
| 7436 | 7458 | AAAATCTAGACAAAAAAGGAAGG | 241 | AAAATCTAGACAAAAAAGGA | 664 |
| 7457 | 7479 | GGAATCGAACCCCCCAAAGCTGG | 242 | GGAATCGAACCCCCCAAAGC | 665 |
| 7476 | 7498 | CTGGTTTCAAGCCAACCCCATGG | 243 | CTGGTTTCAAGCCAACCCCA | 666 |
| 7499 | 7521 | CCTCCATGACTTTTTCAAAAAGG | 244 | CCTCCATGACTTTTTCAAAA | 667 |
| 7544 | 7566 | CTTTGTCAAAGTTAAATTATAGG | 245 | CTTTGTCAAAGTTAAATTAT | 668 |
| 7567 | 7589 | CTAAATCCTATATATCTTAATGG | 246 | CTAAATCCTATATATCTTAA | 669 |
| 7586 | 7608 | ATGGCACATGCAGCGCAAGTAGG | 247 | ATGGCACATGCAGCGCAAGT | 670 |
| 7741 | 7763 | TACTAACATCTCAGACGCTCAGG | 248 | TACTAACATCTCAGACGCTC | 671 |
| 7831 | 7853 | CATCCTTTACATAACAGACGAGG | 249 | CATCCTTTACATAACAGACG | 672 |
| 7865 | 7887 | TCCCTTACCATCAAATCAATTGG | 250 | TCCCTTACCATCAAATCAAT | 673 |
| 7875 | 7897 | TCAAATCAATTGGCCACCAATGG | 251 | TCAAATCAATTGGCCACCAA | 674 |
| 7904 | 7926 | ACCTACGAGTACACCGACTACGG | 252 | ACCTACGAGTACACCGACTA | 675 |
| 7907 | 7929 | TACGAGTACACCGACTACGGCGG | 253 | TACGAGTACACCGACTACGG | 676 |
| 7955 | 7977 | CCCCCATTATTCCTAGAACCAGG | 254 | CCCCCATTATTCCTAGAACC | 677 |
| 8069 | 8091 | TCATGAGCTGTCCCCACATTAGG | 255 | TCATGAGCTGTCCCCACATT | 678 |
| 8093 | 8115 | TTAAAAACAGATGCAATTCCCGG | 256 | TTAAAAACAGATGCAATTCC | 679 |
| 8131 | 8153 | CACTTTCACCGCTACACGACCGG | 257 | CACTTTCACCGCTACACGAC | 680 |
| 8132 | 8154 | ACTTTCACCGCTACACGACCGGG | 258 | ACTTTCACCGCTACACGACC | 681 |
| 8133 | 8155 | CTTTCACCGCTACACGACCGGGG | 259 | CTTTCACCGCTACACGACCG | 682 |
| 8134 | 8156 | TTTCACCGCTACACGACCGGGGG | 260 | TTTCACCGCTACACGACCGG | 683 |
| 8144 | 8166 | ACACGACCGGGGGTATACTACGG | 261 | ACACGACCGGGGGTATACTA | 684 |
| 8165 | 8187 | GGTCAATGCTCTGAAATCTGTGG | 262 | GGTCAATGCTCTGAAATCTG | 685 |
| 8228 | 8250 | CCCCTAAAAATCTTTGAAATAGG | 263 | CCCCTAAAAATCTTTGAAAT | 686 |
| 8229 | 8251 | CCCTAAAAATCTTTGAAATAGGG | 264 | CCCTAAAAATCTTTGAAATA | 687 |
| 8370 | 8392 | CCCAACTAAATACTACCGTATGG | 265 | CCCAACTAAATACTACCGTA | 688 |
| 8551 | 8573 | TTCATTGCCCCCACAATCCTAGG | 266 | TTCATTGCCCCCACAATCCT | 689 |
| 8698 | 8720 | ATAACCATACACAACACTAAAGG | 267 | ATAACCATACACAACACTAA | 690 |
| 8761 | 8783 | ATTGCCACAACTAACCTCCTCGG | 268 | ATTGCCACAACTAACCTCCT | 691 |
| 8817 | 8839 | ACTATCTATAAACCTAGCCATGG | 269 | ACTATCTATAAACCTAGCCA | 692 |
| 8835 | 8857 | CATGGCCATCCCCTTATGAGCGG | 270 | CATGGCCATCCCCTTATGAG | 693 |
| 8836 | 8858 | ATGGCCATCCCCTTATGAGCGGG | 271 | ATGGCCATCCCCTTATGAGC | 694 |
| 8851 | 8873 | TGAGCGGGCACAGTGATTATAGG | 272 | TGAGCGGGCACAGTGATTAT | 695 |
| 8899 | 8921 | CTAGCCCACTTCTTACCACAAGG | 273 | CTAGCCCACTTCTTACCACA | 696 |
| 8973 | 8995 | ACTCATTCAACCAATAGCCCTGG | 274 | ACTCATTCAACCAATAGCCC | 697 |






 9004
 9026
CTAACCGCTAACATTAC
275
CTAACCGCTAACATTA
698




TGCAGG

CTGC






 9028
 9050
CACCTACTCATGCACCT
276
CACCTACTCATGCACC
699




AATTGG

TAAT






 9243
 9265
CCCAGCCCATGACCCCT
277
CCCAGCCCATGACCCC
700




AACAGG

TAAC






 9244
 9266
CCAGCCCATGACCCCTA
278
CCAGCCCATGACCCCT
701




ACAGGG

AACA






 9245
 9267
CAGCCCATGACCCCTAA
279
CAGCCCATGACCCCTA
702




CAGGGG

ACAG






 9273
 9295
TCAGCCCTCCTAATGAC
280
TCAGCCCTCCTAATGA
703




CTCCGG

CCTC






 9321
 9343
TCCATAACGCTCCTCAT
281
TCCATAACGCTCCTCA
704




ACTAGG

TACT






 9358
 9380
CACTAACCATATACCAA
282
CACTAACCATATACCA
705




TGATGG

ATGA






 9390
 9412
ACACGAGAAAGCACAT
283
ACACGAGAAAGCACA
706




ACCAAGG

TACCA






 9417
 9439
CACACACCACCTGTCCA
284
CACACACCACCTGTCC
707




AAAAGG

AAAA






 9429
 9451
GTCCAAAAAGGCCTTCG
285
GTCCAAAAAGGCCTTC
708




ATACGG

GATA






 9430
 9452
TCCAAAAAGGCCTTCGA
286
TCCAAAAAGGCCTTCG
709




TACGGG

ATAC






 9471
 9493
TCAGAAGTTTTTTTCTTC
287
TCAGAAGTTTTTTTCTT
710




GCAGG

CGC






 9522
 9544
CTAGCCCCTACCCCCCA
288
CTAGCCCCTACCCCCC
711




ATTAGG

AATT






 9525
 9547
GCCCCTACCCCCCAATT
289
GCCCCTACCCCCCAAT
712




AGGAGG

TAGG






 9526
 9548
CCCCTACCCCCCAATTA
290
CCCCTACCCCCCAATT
713




GGAGGG

AGGA






 9532
 9554
CCCCCCAATTAGGAGGG
291
CCCCCCAATTAGGAGG
714




CACTGG

GCAC






 9543
 9565
GGAGGGCACTGGCCCCC
292
GGAGGGCACTGGCCCC
715




AACAGG

CAAC






 9606
 9628
ACATCCGTATTACTCGC
293
ACATCCGTATTACTCG
716




ATCAGG

CATC






 9692
 9714
ACTGCTTATTACAATTT
294
ACTGCTTATTACAATT
717




TACTGG

TTAC






 9693
 9715
CTGCTTATTACAATTTT
295
CTGCTTATTACAATTTT
718




ACTGGG

ACT






 9756
 9778
TCTCCCTTCACCATTTCC
296
TCTCCCTTCACCATTTC
719




GACGG

CGA






 9765
 9787
ACCATTTCCGACGGCAT
297
ACCATTTCCGACGGCA
720




CTACGG

TCTA






 9789
 9811
TCAACATTTTTTGTAGC
298
TCAACATTTTTTGTAG
721




CACAGG

CCAC






 9798
 9820
TTTGTAGCCACAGGCTT
299
TTTGTAGCCACAGGCT
722




CCACGG

TCCA






 9816
 9838
CACGGACTTCACGTCAT
300
CACGGACTTCACGTCA
723




TATTGG

TTAT






 9885
 9907
TTTACATCCAAACATCA
301
TTTACATCCAAACATC
724




CTTTGG

ACTT






 9910
 9932
TCGAAGCCGCCGCCTGA
302
TCGAAGCCGCCGCCTG
725




TACTGG

ATAC






 9926
 9948
ATACTGGCATTTTGTAG
303
ATACTGGCATTTTGTA
726




ATGTGG

GATG






 9963
 9985
TATGTCTCCATCTATTG
304
TATGTCTCCATCTATT
727




ATGAGG

GATG






 9964
 9986
ATGTCTCCATCTATTGA
305
ATGTCTCCATCTATTG
728




TGAGGG

ATGA






10122
10144
TTTTGACTACCACAACT
306
TTTTGACTACCACAAC
729




CAACGG

TCAA






10155
10177
AAATCCACCCCTTACGA
307
AAATCCACCCCTTACG
730




GTGCGG

AGTG






10343
10365
CATCATCCTAGCCCTAA
308
CATCATCCTAGCCCTA
731




GTCTGG

AGTC






10365
10387
GCCTATGAGTGACTACA
309
GCCTATGAGTGACTAC
732




AAAAGG

AAAA






10385
10407
AGGATTAGACTGAACCG
310
AGGATTAGACTGAACC
733




AATTGG

GAAT






10500
10522
GCATTTACCATCTCACT
311
GCATTTACCATCTCAC
734




TCTAGG

TTCT






10551
10573
TCCTCCCTACTATGCCT
312
TCCTCCCTACTATGCC
735




AGAAGG

TAGA






10664
10686
CTTTGCCGCCTGCGAAG
313
CTTTGCCGCCTGCGAA
736




CAGCGG

GCAG






10667
10689
TGCCGCCTGCGAAGCAG
314
TGCCGCCTGCGAAGCA
737




CGGTGG

GCGG






10668
10690
GCCGCCTGCGAAGCAGC
315
GCCGCCTGCGAAGCAG
738




GGTGGG

CGGT






10704
10726
GTCTCAATCTCCAACAC
316
GTCTCAATCTCCAACA
739




ATATGG

CATA






10972
10994
ACTCCTACCCCTCACAA
317
ACTCCTACCCCTCACA
740




TCATGG

ATCA






11128
11150
AACCACACTTATCCCCA
318
AACCACACTTATCCCC
741




CCTTGG

ACCT






11147
11169
TTGGCTATCATCACCCG
319
TTGGCTATCATCACCC
742




ATGAGG

GATG






11174
11196
CAGCCAGAACGCCTGA
320
CAGCCAGAACGCCTGA
743




ACGCAGG

ACGC






11204
11226
TTCCTATTCTACACCCT
321
TTCCTATTCTACACCCT
744




AGTAGG

AGT






11252
11274
ATTTACACTCACAACAC
322
ATTTACACTCACAACA
745




CCTAGG

CCCT






11369
11391
ATAGTAAAGATACCTCT
323
ATAGTAAAGATACCTC
746




TTACGG

TTTA






11417
11439
CATGTCGAAGCCCCCAT
324
CATGTCGAAGCCCCCA
747




CGCTGG

TCGC






11418
11440
ATGTCGAAGCCCCCATC
325
ATGTCGAAGCCCCCAT
748




GCTGGG

CGCT






11453
11475
GCCGCAGTACTCTTAAA
326
GCCGCAGTACTCTTAA
749




ACTAGG

AACT






11456
11478
GCAGTACTCTTAAAACT
327
GCAGTACTCTTAAAAC
750




AGGCGG

TAGG






11462
11484
CTCTTAAAACTAGGCGG
328
CTCTTAAAACTAGGCG
751




CTATGG

GCTA






11540
11562
TTCCTTGTACTATCCCTA
329
TTCCTTGTACTATCCCT
752




TGAGG

ATG






11669
11691
CAAACCCCCTGAAGCTT
330
CAAACCCCCTGAAGCT
753




CACCGG

TCAC






11696
11718
GTCATTCTCATAATCGC
331
GTCATTCTCATAATCG
754




CCACGG

CCCA






11697
11719
TCATTCTCATAATCGCC
332
TCATTCTCATAATCGC
755




CACGGG

CCAC






11777
11799
CGCATCATAATCCTCTC
333
CGCATCATAATCCTCT
756




TCAAGG

CTCA






11866
11888
ACCCCCCACTATTAACC
334
ACCCCCCACTATTAAC
757




TACTGG

CTAC






11867
11889
CCCCCCACTATTAACCT
335
CCCCCCACTATTAACC
758




ACTGGG

TACT






11927
11949
AATATCACTCTCCTACT
336
AATATCACTCTCCTAC
759




TACAGG

TTAC






11985
12007
ACATATTTACCACAACA
337
ACATATTTACCACAAC
760




CAATGG

ACAA






11986
12008
CATATTTACCACAACAC
338
CATATTTACCACAACA
761




AATGGG

CAAT






11987
12009
ATATTTACCACAACACA
339
ATATTTACCACAACAC
762




ATGGGG

AATG






12104
12126
CTCAACCCCGACATCAT
340
CTCAACCCCGACATCA
763




TACCGG

TTAC






12105
12127
TCAACCCCGACATCATT
341
TCAACCCCGACATCAT
764




ACCGGG

TACC






12164
12186
GATTGTGAATCTGACAA
342
GATTGTGAATCTGACA
765




CAGAGG

ACAG






12235
12257
TGCCCCCATGTCTAACA
343
TGCCCCCATGTCTAAC
766




ACATGG

AACA






12254
12276
ATGGCTTTCTCAACTTTT
344
ATGGCTTTCTCAACTT
767




AAAGG

TTAA






12272
12294
AAAGGATAACAGCTATC
345
AAAGGATAACAGCTAT
768




CATTGG

CCAT






12279
12301
AACAGCTATCCATTGGT
346
AACAGCTATCCATTGG
769




CTTAGG

TCTT






12294
12316
GTCTTAGGCCCCAAAAA
347
GTCTTAGGCCCCAAAA
770




TTTTGG

ATTT






12608
12630
CTGTAGCATTGTTCGTT
348
CTGTAGCATTGTTCGT
771




ACATGG

TACA






12742
12764
AACCTATTCCAACTGTT
349
AACCTATTCCAACTGT
772




CATCGG

TCAT






12750
12772
CCAACTGTTCATCGGCT
350
CCAACTGTTCATCGGC
773




GAGAGG

TGAG






12751
12773
CAACTGTTCATCGGCTG
351
CAACTGTTCATCGGCT
774




AGAGGG

GAGA






12757
12779
TTCATCGGCTGAGAGGG
352
TTCATCGGCTGAGAGG
775




CGTAGG

GCGT






12847
12869
GCAATCCTATACAACCG
353
GCAATCCTATACAACC
776




TATCGG

GTAT






12856
12878
TACAACCGTATCGGCGA
354
TACAACCGTATCGGCG
777




TATCGG

ATAT






12958
12980
CCAAGCCTCACCCCACT
355
CCAAGCCTCACCCCAC
778




ACTAGG

TACT






12979
13001
GGCCTCCTCCTAGCAGC
356
GGCCTCCTCCTAGCAG
779




AGCAGG

CAGC






12997
13019
GCAGGCAAATCAGCCC
357
GCAGGCAAATCAGCCC
780




AATTAGG

AATT






13030
13052
TGACTCCCCTCAGCCAT
358
TGACTCCCCTCAGCCA
781




AGAAGG

TAGA






13081
13103
TCAAGCACTATAGTTGT
359
TCAAGCACTATAGTTG
782




AGCAGG

TAGC






13156
13178
CAAACTCTAACACTATG
360
CAAACTCTAACACTAT
783




CTTAGG

GCTT






13246
13268
TTCTCCACTTCAAGTCA
361
TTCTCCACTTCAAGTC
784




ACTAGG

AACT






13267
13289
GGACTCATAATAGTTAC
362
GGACTCATAATAGTTA
785




AATCGG

CAAT






13345
13367
GCCATACTATTTATGTG
363
GCCATACTATTTATGT
786




CTCCGG

GCTC






13346
13368
CCATACTATTTATGTGC
364
CCATACTATTTATGTG
787




TCCGGG

CTCC






13393
13415
GAACAAGATATTCGAA
365
GAACAAGATATTCGAA
788




AAATAGG

AAAT






13396
13418
CAAGATATTCGAAAAAT
366
CAAGATATTCGAAAAA
789




AGGAGG

TAGG






13441
13463
ACTTCAACCTCCCTCAC
367
ACTTCAACCTCCCTCA
790




CATTGG

CCAT






13459
13481
ATTGGCAGCCTAGCATT
368
ATTGGCAGCCTAGCAT
791




AGCAGG

TAGC






13477
13499
GCAGGAATACCTTTCCT
369
GCAGGAATACCTTTCC
792




CACAGG

TCAC






13612
13634
ATAATTCTTCTCACCCT
370
ATAATTCTTCTCACCC
793




AACAGG

TAAC






13686
13708
ACTAAACCCCATTAAAC
371
ACTAAACCCCATTAAA
794




GCCTGG

CGCC






13693
13715
CCCATTAAACGCCTGGC
372
CCCATTAAACGCCTGG
795




AGCCGG

CAGC






13708
13730
GCAGCCGGAAGCCTATT
373
GCAGCCGGAAGCCTAT
796




CGCAGG

TCGC






13804
13826
GCCCTCGCTGTCACTTT
374
GCCCTCGCTGTCACTT
797




CCTAGG

TCCT






13894
13916
TTTTATTTCTCCAACATA
375
TTTTATTTCTCCAACAT
798




CTCGG

ACT






13936
13958
CACCGCACAATCCCCTA
376
CACCGCACAATCCCCT
799




TCTAGG

ATCT






14059
14081
ATCATCACCTCAACCCA
377
ATCATCACCTCAACCC
800




AAAAGG

AAAA






14237
14259
TACAAAGCCCCCGCACC
378
TACAAAGCCCCCGCAC
801




AATAGG

CAAT






14417
14439
ACCCCTGACCCCCATGC
379
ACCCCTGACCCCCATG
802




CTCAGG

CCTC






14579
14601
AATACTAAACCCCCATA
380
AATACTAAACCCCCAT
803




AATAGG

AAAT






14585
14607
AAACCCCCATAAATAGG
381
AAACCCCCATAAATAG
804




AGAAGG

GAGA






14664
14686
CATACATCATTATTCTC
382
CATACATCATTATTCT
805




GCACGG

CGCA






14825
14847
ATCTCCGCATGATGAAA
383
ATCTCCGCATGATGAA
806




CTTCGG

ACTT






14837
14859
TGAAACTTCGGCTCACT
384
TGAAACTTCGGCTCAC
807




CCTTGG

TCCT






14867
14889
CTGATCCTCCAAATCAC
385
CTGATCCTCCAAATCA
808




CACAGG

CCAC






14951
14973
ATCACTCGAGACGTAAA
386
ATCACTCGAGACGTAA
809




TTATGG

ATTA






14981
15003
ATCCGCTACCTTCACGC
387
ATCCGCTACCTTCACG
810




CAATGG

CCAA






15020
15042
ATCTGCCTCTTCCTACA
388
ATCTGCCTCTTCCTAC
811




CATCGG

ACAT






15021
15043
TCTGCCTCTTCCTACAC
389
TCTGCCTCTTCCTACA
812




ATCGGG

CATC






15026
15048
CTCTTCCTACACATCGG
390
CTCTTCCTACACATCG
813




GCGAGG

GGCG






15038
15060
ATCGGGCGAGGCCTATA
391
ATCGGGCGAGGCCTAT
814




TTACGG

ATTA






15071
15093
TACTCAGAAACCTGAAA
392
TACTCAGAAACCTGAA
815




CATCGG

ACAT






15113
15135
ACTATAGCAACAGCCTT
393
ACTATAGCAACAGCCT
816




CATAGG

TCAT






15131
15153
ATAGGCTATGTCCTCCC
394
ATAGGCTATGTCCTCC
817




GTGAGG

CGTG






15149
15171
TGAGGCCAAATATCATT
395
TGAGGCCAAATATCAT
818




CTGAGG

TCTG






15150
15172
GAGGCCAAATATCATTC
396
GAGGCCAAATATCATT
819




TGAGGG

CTGA






15151
15173
AGGCCAAATATCATTCT
397
AGGCCAAATATCATTC
820




GAGGGG

TGAG






15194
15216
CTATCCGCCATCCCATA
398
CTATCCGCCATCCCAT
821




CATTGG

ACAT






15195
15217
TATCCGCCATCCCATAC
399
TATCCGCCATCCCATA
822




ATTGGG

CATT






15221
15243
GACCTAGTTCAATGAAT
400
GACCTAGTTCAATGAA
823




CTGAGG

TCTG






15224
15246
CTAGTTCAATGAATCTG
401
CTAGTTCAATGAATCT
824




AGGAGG

GAGG






15334
15356
CCTCCTATTCTTGCACG
402
CCTCCTATTCTTGCAC
825




AAACGG

GAAA






15335
15357
CTCCTATTCTTGCACGA
403
CTCCTATTCTTGCACG
826




AACGGG

AAAC






15353
15375
ACGGGATCAAACAACC
404
ACGGGATCAAACAAC
827




CCCTAGG

CCCCT






15416
15438
TACACAATCAAAGACGC
405
TACACAATCAAAGACG
828




CCTCGG

CCCT






15476
15498
CTATTCTCACCAGACCT
406
CTATTCTCACCAGACC
829




CCTAGG

TCCT






15590
15612
CGATCCGTCCCTAACAA
407
CGATCCGTCCCTAACA
830




ACTAGG

AACT






15593
15615
TCCGTCCCTAACAAACT
408
TCCGTCCCTAACAAAC
831




AGGAGG

TAGG






15740
15762
CTCCTCATTCTAACCTG
409
CTCCTCATTCTAACCT
832




AATCGG

GAAT






15743
15765
CTCATTCTAACCTGAAT
410
CTCATTCTAACCTGAA
833




CGGAGG

TCGG






15776
15798
AGCTACCCTTTTACCAT
411
AGCTACCCTTTTACCA
834




CATTGG

TCAT






15861
15883
TTGAAAACAAAATACTC
412
TTGAAAACAAAATACT
835




AAATGG

CAAA






15862
15884
TGAAAACAAAATACTCA
413
TGAAAACAAAATACTC
836




AATGGG

AAAT






15906
15928
AATACACCAGTCTTGTA
414
AATACACCAGTCTTGT
837




AACCGG

AAAC






15928
15950
GAGATGAAAACCTTTTT
415
GAGATGAAAACCTTTT
838




CCAAGG

TCCA






16012
16034
AACTATTCTCTGTTCTTT
416
AACTATTCTCTGTTCTT
839




CATGG

TCA






16013
16035
ACTATTCTCTGTTCTTTC
417
ACTATTCTCTGTTCTTT
840




ATGGG

CAT






16014
16036
CTATTCTCTGTTCTTTCA
418
CTATTCTCTGTTCTTTC
841




TGGGG

ATG






16026
16048
CTTTCATGGGGAAGCAG
419
CTTTCATGGGGAAGCA
842




ATTTGG

GATT






16027
16049
TTTCATGGGGAAGCAGA
420
TTTCATGGGGAAGCAG
843




TTTGGG

ATTT






16108
16130
CAGCCACCATGAATATT
421
CAGCCACCATGAATAT
844




GTACGG

TGTA






16252
16274
AAAGCCACCCCTCACCC
422
AAAGCCACCCCTCACC
845




ACTAGG

CACT






16348
16370
CAAATCCCTTCTCGTCC
423
CAAATCCCTTCTCGTC
846




CCATGG

CCCA






16367
16389
ATGGATGACCCCCCTCA
424
ATGGATGACCCCCCTC
847




GATAGG

AGAT






16368
16390
TGGATGACCCCCCTCAG
425
TGGATGACCCCCCTCA
848




ATAGGG

GATA






16369
16391
GGATGACCCCCCTCAGA
426
GGATGACCCCCCTCAG
849




TAGGGG

ATAG






16434
16456
GAGTGCTACTCTCCTCG
427
GAGTGCTACTCTCCTC
850




CTCCGG

GCTC






16435
16457
AGTGCTACTCTCCTCGC
428
AGTGCTACTCTCCTCG
851




TCCGGG

CTCC






16449
16471
CGCTCCGGGCCCATAAC
429
CGCTCCGGGCCCATAA
852




ACTTGG

CACT






16450
16472
GCTCCGGGCCCATAACA
430
GCTCCGGGCCCATAAC
853




CTTGGG

ACTT






16451
16473
CTCCGGGCCCATAACAC
431
CTCCGGGCCCATAACA
854




TTGGGG

CTTG






16452
16474
TCCGGGCCCATAACACT
432
TCCGGGCCCATAACAC
855




TGGGGG

TTGG






16482
16504
AGTGAACTGTATCCGAC
433
AGTGAACTGTATCCGA
856




ATCTGG

CATC






16495
16517
CGACATCTGGTTCCTAC
434
CGACATCTGGTTCCTA
857




TTCAGG

CTTC






16496
16518
GACATCTGGTTCCTACT
435
GACATCTGGTTCCTAC
858




TCAGGG

TTCA
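
The column relationships in these tables can be checked mechanically. In the rows above, each 23 nt (+) strand sequence ends in NGG and the 20 nt gRNA target sequence is its first 20 nt. In Table 4, each 23 nt (+) strand sequence begins with CCN and the 20 nt gRNA target sequence is the first 20 nt of its reverse complement. The following is a minimal sketch of that derivation, not the procedure used to generate the tables. The function names and the `pam_on_plus` flag are hypothetical and chosen for illustration; only the example sequences are taken from the tables.

```python
# Illustrative sketch only: helper names are assumptions, not part of the disclosure.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def target_from_plus_strand(site: str, pam_on_plus: bool) -> str:
    """Return the 20 nt gRNA target sequence for a 23 nt (+) strand site.

    pam_on_plus=True  : site is 20 nt target + NGG (PAM on the (+) strand).
    pam_on_plus=False : site is CCN + 20 nt, i.e. the NGG PAM lies on the
                        (-) strand, so the target is read from the
                        reverse complement.
    """
    if len(site) != 23:
        raise ValueError("expected a 23 nt site (20 nt target + 3 nt PAM)")
    if pam_on_plus:
        if site[21:] != "GG":
            raise ValueError("site does not end in NGG")
        return site[:20]
    if site[:2] != "CC":
        raise ValueError("site does not begin with CCN")
    return reverse_complement(site)[:20]


if __name__ == "__main__":
    # Row at positions 6867-6889 (NGG on the (+) strand).
    print(target_from_plus_strand("AGCTGACTCGCCACACTCCACGG", True))
    # Row at positions 17-39 of Table 4 (NGG on the (-) strand).
    print(target_from_plus_strand("CCCTATTAACCACTCACGGGAGC", False))
```

Under these assumptions, the two example calls reproduce the 20 nt gRNA target sequences listed for those rows (AGCTGACTCGCCACACTCCA and GCTCCCGTGAGTGGTTAATA). In every row the chromosome end position equals the start position plus 22, consistent with a 23 nt site.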
















TABLE 4

gRNA target sequence for human mtDNA carrying NGG sequence on the (−) strand.

Columns, in order:
Chr start position (+ strand)
Chr end position (+ strand)
nt sequence on the (+) strand containing CCN sequence followed by the reverse complementary sequence of gRNA target sequence
SEQ ID NO
20 nt gRNA target sequence (will encode the gRNA targeting sequence)
SEQ ID NO





   17
   39
CCCTATTAACCACTCAC
 859
GCTCCCGTGAGTGGTT
2628




GGGAGC

AATA






   18
   40
CCTATTAACCACTCACG
 860
AGCTCCCGTGAGTGGT
2629




GGAGCT

TAAT






   26
   48
CCACTCACGGGAGCTCT
 861
GCATGGAGAGCTCCCG
2630




CCATGC

TGAG






   43
   65
CCATGCATTTGGTATTT
 862
AGACGAAAATACCAA
2631




TCGTCT

ATGCA






  104
  126
CCGGAGCACCCTATGTC
 863
TACTGCGACATAGGGT
2632




GCAGTA

GCTC






  112
  134
CCCTATGTCGCAGTATC
 864
AAGACAGATACTGCG
2633




TGTCTT

ACATA






  113
  135
CCTATGTCGCAGTATCT
 865
AAAGACAGATACTGC
2634




GTCTTT

GACAT






  140
  162
CCTGCCTCATCCTATTA
 866
GATAAATAATAGGATG
2635




TTTATC

AGGC






  144
  166
CCTCATCCTATTATTTAT
 867
GTGCGATAAATAATAG
2636




CGCAC

GATG






  150
  172
CCTATTATTTATCGCAC
 868
ACGTAGGTGCGATAAA
2637




CTACGT

TAAT






  166
  188
CCTACGTTCAATATTAC
 869
TCGCCTGTAATATTGA
2638




AGGCGA

ACGT






  261
  283
CCACTTTCCACACAGAC
 870
TATGATGTCTGTGTGG
2639




ATCATA

AAAG






  268
  290
CCACACAGACATCATAA
 871
TTTTTGTTATGATGTCT
2640




CAAAAA

GTG






  298
  320
CCAAACCCCCCCTCCCC
 872
GAAGCGGGGGAGGGG
2641




CGCTTC

GGGTT






  304
  326
CCCCCCTCCCCCGCTTC
 873
TGGCCAGAAGCGGGG
2642




TGGCCA

GAGGG






  305
  327
CCCCCTCCCCCGCTTCT
 874
GTGGCCAGAAGCGGG
2643




GGCCAC

GGAGG






  306
  328
CCCCTCCCCCGCTTCTG
 875
TGTGGCCAGAAGCGG
2644




GCCACA

GGGAG






  307
  329
CCCTCCCCCGCTTCTGG
 876
CTGTGGCCAGAAGCGG
2645




CCACAG

GGGA






  308
  330
CCTCCCCCGCTTCTGGC
 877
GCTGTGGCCAGAAGCG
2646




CACAGC

GGGG






  311
  333
CCCCCGCTTCTGGCCAC
 878
AGTGCTGTGGCCAGAA
2647




AGCACT

GCGG






  312
  334
CCCCGCTTCTGGCCACA
 879
AAGTGCTGTGGCCAGA
2648




GCACTT

AGCG






  313
  335
CCCGCTTCTGGCCACAG
 880
TAAGTGCTGTGGCCAG
2649




CACTTA

AAGC






  314
  336
CCGCTTCTGGCCACAGC
 881
TTAAGTGCTGTGGCCA
2650




ACTTAA

GAAG






  324
  346
CCACAGCACTTAAACAC
 882
AGAGATGTGTTTAAGT
2651




ATCTCT

GCTG






  348
  370
CCAAACCCCAAAAACA
 883
GGTTCTTTGTTTTTGGG
2652




AAGAACC

GTT






  353
  375
CCCCAAAAACAAAGAA
 884
GTTAGGGTTCTTTGTTT
2653




CCCTAAC

TTG






  354
  376
CCCAAAAACAAAGAAC
 885
TGTTAGGGTTCTTTGTT
2654




CCTAACA

TTT






  355
  377
CCAAAAACAAAGAACC
 886
GTGTTAGGGTTCTTTG
2655




CTAACAC

TTTT






  369
  391
CCCTAACACCAGCCTAA
 887
ATCTGGTTAGGCTGGT
2656




CCAGAT

GTTA






  370
  392
CCTAACACCAGCCTAAC
 888
AATCTGGTTAGGCTGG
2657




CAGATT

TGTT






  377
  399
CCAGCCTAACCAGATTT
 889
AATTTGAAATCTGGTT
2658




CAAATT

AGGC






  381
  403
CCTAACCAGATTTCAAA
 890
ATAAAATTTGAAATCT
2659




TTTTAT

GGTT






  386
  408
CCAGATTTCAAATTTTA
 891
AAAAGATAAAATTTGA
2660




TCTTTT

AATC






  433
  455
CCCCCCAACTAACACAT
 892
AAAATAATGTGTTAGT
2661




TATTTT

TGGG






  434
  456
CCCCCAACTAACACATT
 893
GAAAATAATGTGTTAG
2662




ATTTTC

TTGG






  435
  457
CCCCAACTAACACATTA
 894
GGAAAATAATGTGTTA
2663




TTTTCC

GTTG






  436
  458
CCCAACTAACACATTAT
 895
GGGAAAATAATGTGTT
2664




TTTCCC

AGTT






  437
  459
CCAACTAACACATTATT
 896
GGGGAAAATAATGTGT
2665




TTCCCC

TAGT






  456
  478
CCCCTCCCACTCCCATA
 897
TAGTAGTATGGGAGTG
2666




CTACTA

GGAG






  457
  479
CCCTCCCACTCCCATAC
 898
TTAGTAGTATGGGAGT
2667




TACTAA

GGGA






  458
  480
CCTCCCACTCCCATACT
 899
ATTAGTAGTATGGGAG
2668




ACTAAT

TGGG






  461
  483
CCCACTCCCATACTACT
 900
GAGATTAGTAGTATGG
2669




AATCTC

GAGT






  462
  484
CCACTCCCATACTACTA
 901
TGAGATTAGTAGTATG
2670




ATCTCA

GGAG






  467
  489
CCCATACTACTAATCTC
 902
ATTGATGAGATTAGTA
2671




ATCAAT

GTAT






  468
  490
CCATACTACTAATCTCA
 903
TATTGATGAGATTAGT
2672




TCAATA

AGTA






  494
  516
CCCCCGCCCATCCTACC
 904
GTGCTGGGTAGGATGG
2673




CAGCAC

GCGG






  495
  517
CCCCGCCCATCCTACCC
 905
TGTGCTGGGTAGGATG
2674




AGCACA

GGCG






  496
  518
CCCGCCCATCCTACCCA
 906
GTGTGCTGGGTAGGAT
2675




GCACAC

GGGC






  497
  519
CCGCCCATCCTACCCAG
 907
TGTGTGCTGGGTAGGA
2676




CACACA

TGGG






  500
  522
CCCATCCTACCCAGCAC
 908
GTGTGTGTGCTGGGTA
2677




ACACAC

GGAT






  501
  523
CCATCCTACCCAGCACA
 909
TGTGTGTGTGCTGGGT
2678




CACACA

AGGA






  505
  527
CCTACCCAGCACACACA
 910
GCGGTGTGTGTGTGCT
2679




CACCGC

GGGT






  509
  531
CCCAGCACACACACACC
 911
AGCAGCGGTGTGTGTG
2680




GCTGCT

TGCT






  510
  532
CCAGCACACACACACCG
 912
TAGCAGCGGTGTGTGT
2681




CTGCTA

GTGC






  524
  546
CCGCTGCTAACCCCATA
 913
TCGGGGTATGGGGTTA
2682




CCCCGA

GCAG






  534
  556
CCCCATACCCCGAACCA
 914
TTTGGTTGGTTCGGGG
2683




ACCAAA

TATG






  535
  557
CCCATACCCCGAACCAA
 915
GTTTGGTTGGTTCGGG
2684




CCAAAC

GTAT






  536
  558
CCATACCCCGAACCAAC
 916
GGTTTGGTTGGTTCGG
2685




CAAACC

GGTA






  541
  563
CCCCGAACCAACCAAAC
 917
TTTGGGGTTTGGTTGG
2686




CCCAAA

TTCG






  542
  564
CCCGAACCAACCAAACC
 918
CTTTGGGGTTTGGTTG
2687




CCAAAG

GTTC






  543
  565
CCGAACCAACCAAACCC
 919
TCTTTGGGGTTTGGTT
2688




CAAAGA

GGTT






  548
  570
CCAACCAAACCCCAAA
 920
GGGTGTCTTTGGGGTT
2689




GACACCC

TGGT






  552
  574
CCAAACCCCAAAGACA
 921
TGGGGGGTGTCTTTGG
2690




CCCCCCA

GGTT






  557
  579
CCCCAAAGACACCCCCC
 922
AACTGTGGGGGGTGTC
2691




ACAGTT

TTTG






  558
  580
CCCAAAGACACCCCCCA
 923
AAACTGTGGGGGGTGT
2692




CAGTTT

CTTT






  559
  581
CCAAAGACACCCCCCAC
 924
TAAACTGTGGGGGGTG
2693




AGTTTA

TCTT






  568
  590
CCCCCCACAGTTTATGT
 925
TAAGCTACATAAACTG
2694




AGCTTA

TGGG






  569
  591
CCCCCACAGTTTATGTA
 926
GTAAGCTACATAAACT
2695




GCTTAC

GTGG






  570
  592
CCCCACAGTTTATGTAG
 927
GGTAAGCTACATAAAC
2696




CTTACC

TGTG






  571
  593
CCCACAGTTTATGTAGC
 928
AGGTAAGCTACATAAA
2697




TTACCT

CTGT






  572
  594
CCACAGTTTATGTAGCT
 929
GAGGTAAGCTACATAA
2698




TACCTC

ACTG






  591
  613
CCTCCTCAAAGCAATAC
 930
TTCAGTGTATTGCTTT
2699




ACTGAA

GAGG






  594
  616
CCTCAAAGCAATACACT
 931
ATTTTCAGTGTATTGC
2700




GAAAAT

TTTG






  637
  659
CCCCATAAACAAATAGG
 932
ACCAAACCTATTTGTT
2701




TTTGGT

TATG






  638
  660
CCCATAAACAAATAGGT
 933
GACCAAACCTATTTGT
2702




TTGGTC

TTAT






  639
  661
CCATAAACAAATAGGTT
 934
GGACCAAACCTATTTG
2703




TGGTCC

TTTA






  660
  682
CCTAGCCTTTCTATTAG
 935
TAAGAGCTAATAGAA
2704




CTCTTA

AGGCT






  665
  687
CCTTTCTATTAGCTCTTA
 936
CTTACTAAGAGCTAAT
2705




GTAAG

AGAA






  705
  727
CCCCGTTCCAGTGAGTT
 937
AGGGTGAACTCACTGG
2706




CACCCT

AACG






  706
  728
CCCGTTCCAGTGAGTTC
 938
GAGGGTGAACTCACTG
2707




ACCCTC

GAAC






  707
  729
CCGTTCCAGTGAGTTCA
 939
AGAGGGTGAACTCACT
2708




CCCTCT

GGAA






  712
  734
CCAGTGAGTTCACCCTC
 940
GATTTAGAGGGTGAAC
2709




TAAATC

TCAC






  724
  746
CCCTCTAAATCACCACG
 941
TTTGATCGTGGTGATT
2710




ATCAAA

TAGA






  725
  747
CCTCTAAATCACCACGA
 942
TTTTGATCGTGGTGAT
2711




TCAAAA

TTAG






  736
  758
CCACGATCAAAAGGAA
 943
ATGCTTGTTCCTTTTGA
2712




CAAGCAT

TCG






  792
  814
CCTAGCCACACCCCCAC
 944
TTTCCCGTGGGGGTGT
2713




GGGAAA

GGCT






  797
  819
CCACACCCCCACGGGAA
 945
TGCTGTTTCCCGTGGG
2714




ACAGCA

GGTG






  802
  824
CCCCCACGGGAAACAG
 946
ATCACTGCTGTTTCCC
2715




CAGTGAT

GTGG






  803
  825
CCCCACGGGAAACAGC
 947
AATCACTGCTGTTTCC
2716




AGTGATT

CGTG






  804
  826
CCCACGGGAAACAGCA
 948
TAATCACTGCTGTTTC
2717




GTGATTA

CCGT






  805
  827
CCACGGGAAACAGCAG
 949
TTAATCACTGCTGTTT
2718




TGATTAA

CCCG






  828
  850
CCTTTAGCAATAAACGA
 950
AAACTTTCGTTTATTG
2719




AAGTTT

CTAA






  867
  889
CCCCAGGGTTGGTCAAT
 951
CACGAAATTGACCAAC
2720




TTCGTG

CCTG






  868
  890
CCCAGGGTTGGTCAATT
 952
GCACGAAATTGACCAA
2721




TCGTGC

CCCT






  869
  891
CCAGGGTTGGTCAATTT
 953
GGCACGAAATTGACCA
2722




CGTGCC

ACCC






  890
  912
CCAGCCACCGCGGTCAC
 954
AATCGTGTGACCGCGG
2723




ACGATT

TGGC






  894
  916
CCACCGCGGTCACACGA
 955
GGTTAATCGTGTGACC
2724




TTAACC

GCGG






  897
  919
CCGCGGTCACACGATTA
 956
TTGGGTTAATCGTGTG
2725




ACCCAA

ACCG






  915
  937
CCCAAGTCAATAGAAGC
 957
ACGCCGGCTTCTATTG
2726




CGGCGT

ACTT






  916
  938
CCAAGTCAATAGAAGCC
 958
TACGCCGGCTTCTATT
2727




GGCGTA

GACT






  931
  953
CCGGCGTAAAGAGTGTT
 959
ATCTAAAACACTCTTT
2728




TTAGAT

ACGC






  956
  978
CCCCCTCCCCAATAAAG
 960
TTTTAGCTTTATTGGG
2729




CTAAAA

GAGG






  957
  979
CCCCTCCCCAATAAAGC
 961
GTTTTAGCTTTATTGG
2730




TAAAAC

GGAG






  958
  980
CCCTCCCCAATAAAGCT
 962
AGTTTTAGCTTTATTG
2731




AAAACT

GGGA






  959
  981
CCTCCCCAATAAAGCTA
 963
GAGTTTTAGCTTTATT
2732




AAACTC

GGGG






  962
  984
CCCCAATAAAGCTAAAA
 964
GGTGAGTTTTAGCTTT
2733




CTCACC

ATTG






  963
  985
CCCAATAAAGCTAAAAC
 965
AGGTGAGTTTTAGCTT
2734




TCACCT

TATT






  964
  986
CCAATAAAGCTAAAACT
 966
CAGGTGAGTTTTAGCT
2735




CACCTG

TTAT






  983
 1005
CCTGAGTTGTAAAAAAC
 967
ACTGGAGTTTTTTACA
2736




TCCAGT

ACTC






 1001
 1023
CCAGTTGACACAAAATA
 968
GTAGTCTATTTTGTGT
2737




GACTAC

CAAC






 1064
 1086
CCCAAACTGGGATTAGA
 969
GGGGTATCTAATCCCA
2738




TACCCC

GTTT






 1065
 1087
CCAAACTGGGATTAGAT
 970
TGGGGTATCTAATCCC
2739




ACCCCA

AGTT






 1083
 1105
CCCCACTATGCTTAGCC
 971
GTTTAGGGCTAAGCAT
2740




CTAAAC

AGTG






 1084
 1106
CCCACTATGCTTAGCCC
 972
GGTTTAGGGCTAAGCA
2741




TAAACC

TAGT






 1085
 1107
CCACTATGCTTAGCCCT
 973
AGGTTTAGGGCTAAGC
2742




AAACCT

ATAG






 1098
 1120
CCCTAAACCTCAACAGT
 974
GATTTAACTGTTGAGG
2743




TAAATC

TTTA






 1099
 1121
CCTAAACCTCAACAGTT
 975
TGATTTAACTGTTGAG
2744




AAATCA

GTTT






 1105
 1127
CCTCAACAGTTAAATCA
 976
TTTTGTTGATTTAACTG
2745




ACAAAA

TTG






 1135
 1157
CCAGAACACTACGAGCC
 977
AGCTGTGGCTCGTAGT
2746




ACAGCT

GTTC






 1150
 1172
CCACAGCTTAAAACTCA
 978
GTCCTTTGAGTTTTAA
2747




AAGGAC

GCTG






 1172
 1194
CCTGGCGGTGCTTCATA
 979
GAGGGATATGAAGCA
2748




TCCCTC

CCGCC






 1190
 1212
CCCTCTAGAGGAGCCTG
 980
ACAGAACAGGCTCCTC
2749




TTCTGT

TAGA






 1191
 1213
CCTCTAGAGGAGCCTGT
 981
TACAGAACAGGCTCCT
2750




TCTGTA

CTAG






 1203
 1225
CCTGTTCTGTAATCGAT
 982
GGGTTTATCGATTACA
2751




AAACCC

GAAC






 1223
 1245
CCCCGATCAACCTCACC
 983
AGAGGTGGTGAGGTTG
2752




ACCTCT

ATCG






 1224
 1246
CCCGATCAACCTCACCA
 984
AAGAGGTGGTGAGGTT
2753




CCTCTT

GATC






 1225
 1247
CCGATCAACCTCACCAC
 985
CAAGAGGTGGTGAGG
2754




CTCTTG

TTGAT






 1233
 1255
CCTCACCACCTCTTGCT
 986
AGGCTGAGCAAGAGG
2755




CAGCCT

TGGTG






 1238
 1260
CCACCTCTTGCTCAGCC
 987
TATATAGGCTGAGCAA
2756




TATATA

GAGG






 1241
 1263
CCTCTTGCTCAGCCTAT
 988
CGGTATATAGGCTGAG
2757




ATACCG

CAAG






 1253
 1275
CCTATATACCGCCATCT
 989
TGCTGAAGATGGCGGT
2758




TCAGCA

ATAT






 1261
 1283
CCGCCATCTTCAGCAAA
 990
TCAGGGTTTGCTGAAG
2759




CCCTGA

ATGG






 1264
 1286
CCATCTTCAGCAAACCC
 991
TCATCAGGGTTTGCTG
2760




TGATGA

AAGA






 1278
 1300
CCCTGATGAAGGCTACA
 992
TTACTTTGTAGCCTTC
2761




AAGTAA

ATCA






 1279
 1301
CCTGATGAAGGCTACAA
 993
CTTACTTTGTAGCCTTC
2762




AGTAAG

ATC






 1310
 1332
CCCACGTAAAGACGTTA
 994
TTGACCTAACGTCTTT
2763




GGTCAA

ACGT






 1311
 1333
CCACGTAAAGACGTTAG
 995
CTTGACCTAACGTCTT
2764




GTCAAG

TACG






 1340
 1362
CCCATGAGGTGGCAAG
 996
CCCATTTCTTGCCACC
2765




AAATGGG

TCAT






 1341
 1363
CCATGAGGTGGCAAGA
 997
GCCCATTTCTTGCCAC
2766




AATGGGC

CTCA






 1375
 1397
CCCCAGAAAACTACGAT
 998
AGGGCTATCGTAGTTT
2767




AGCCCT

TCTG






 1376
 1398
CCCAGAAAACTACGATA
 999
AAGGGCTATCGTAGTT
2768




GCCCTT

TTCT






 1377
 1399
CCAGAAAACTACGATA
1000
TAAGGGCTATCGTAGT
2769




GCCCTTA

TTTC






 1394
 1416
CCCTTATGAAACTTAAG
1001
TCGACCCTTAAGTTTC
2770




GGTCGA

ATAA






 1395
 1417
CCTTATGAAACTTAAGG
1002
TTCGACCCTTAAGTTT
2771




GTCGAA

CATA






 1465
 1487
CCCTGAAGCGCGTACAC
1003
GGCGGTGTGTACGCGC
2772




ACCGCC

TTCA






 1466
 1488
CCTGAAGCGCGTACACA
1004
GGGCGGTGTGTACGCG
2773




CCGCCC

CTTC






 1483
 1505
CCGCCCGTCACCCTCCT
1005
TACTTGAGGAGGGTGA
2774




CAAGTA

CGGG






 1486
 1508
CCCGTCACCCTCCTCAA
1006
GTATACTTGAGGAGGG
2775




GTATAC

TGAC






 1487
 1509
CCGTCACCCTCCTCAAG
1007
AGTATACTTGAGGAGG
2776




TATACT

GTGA






 1493
 1515
CCCTCCTCAAGTATACT
1008
CTTTGAAGTATACTTG
2777




TCAAAG

AGGA






 1494
 1516
CCTCCTCAAGTATACTT
1009
CCTTTGAAGTATACTT
2778




CAAAGG

GAGG






 1497
 1519
CCTCAAGTATACTTCAA
1010
TGTCCTTTGAAGTATA
2779




AGGACA

CTTG






 1531
 1553
CCCCTACGCATTTATAT
1011
TCCTCTATATAAATGC
2780




AGAGGA

GTAG






 1532
 1554
CCCTACGCATTTATATA
1012
CTCCTCTATATAAATG
2781




GAGGAG

CGTA






 1533
 1555
CCTACGCATTTATATAG
1013
TCTCCTCTATATAAAT
2782




AGGAGA

GCGT






 1601
 1623
CCAGAGTGTAGCTTAAC
1014
CTTTGTGTTAAGCTAC
2783




ACAAAG

ACTC






 1626
 1648
CCCAACTTACACTTAGG
1015
AAATCTCCTAAGTGTA
2784




AGATTT

AGTT






 1627
 1649
CCAACTTACACTTAGGA
1016
GAAATCTCCTAAGTGT
2785




GATTTC

AAGT






 1662
 1684
CCGCTCTGAGCTAAACC
1017
GGGCTAGGTTTAGCTC
2786




TAGCCC

AGAG






 1677
 1699
CCTAGCCCCAAACCCAC
1018
GGTGGAGTGGGTTTGG
2787




TCCACC

GGCT






 1682
 1704
CCCCAAACCCACTCCAC
1019
AGTAAGGTGGAGTGG
2788




CTTACT

GTTTG






 1683
 1705
CCCAAACCCACTCCACC
1020
TAGTAAGGTGGAGTGG
2789




TTACTA

GTTT






 1684
 1706
CCAAACCCACTCCACCT
1021
GTAGTAAGGTGGAGTG
2790




TACTAC

GGTT






 1689
 1711
CCCACTCCACCTTACTA
1022
GTCTGGTAGTAAGGTG
2791




CCAGAC

GAGT






 1690
 1712
CCACTCCACCTTACTAC
1023
TGTCTGGTAGTAAGGT
2792




CAGACA

GGAG






 1695
 1717
CCACCTTACTACCAGAC
1024
AAGGTTGTCTGGTAGT
2793




AACCTT

AAGG






 1698
 1720
CCTTACTACCAGACAAC
1025
GCTAAGGTTGTCTGGT
2794




CTTAGC

AGTA






 1706
 1728
CCAGACAACCTTAGCCA
1026
ATGGTTTGGCTAAGGT
2795




AACCAT

TGTC






 1714
 1736
CCTTAGCCAAACCATTT
1027
TTGGGTAAATGGTTTG
2796




ACCCAA

GCTA






 1720
 1742
CCAAACCATTTACCCAA
1028
CTTTATTTGGGTAAAT
2797




ATAAAG

GGTT






 1725
 1747
CCATTTACCCAAATAAA
1029
CTATACTTTATTTGGG
2798




GTATAG

TAAA






 1732
 1754
CCCAAATAAAGTATAGG
1030
CTATCGCCTATACTTT
2799




CGATAG

ATTT






 1733
 1755
CCAAATAAAGTATAGGC
1031
TCTATCGCCTATACTTT
2800




GATAGA

ATT






 1764
 1786
CCTGGCGCAATAGATAT
1032
GGTACTATATCTATTG
2801




AGTACC

CGCC






 1785
 1807
CCGCAAGGGAAAGATG
1033
AATTTTTCATCTTTCCC
2802




AAAAATT

TTG






 1812
 1834
CCAAGCATAATATAGCA
1034
AGTCCTTGCTATATTA
2803




AGGACT

TGCT






 1837
 1859
CCCCTATACCTTCTGCA
1035
TCATTATGCAGAAGGT
2804




TAATGA

ATAG






 1838
 1860
CCCTATACCTTCTGCAT
1036
TTCATTATGCAGAAGG
2805




AATGAA

TATA






 1839
 1861
CCTATACCTTCTGCATA
1037
ATTCATTATGCAGAAG
2806




ATGAAT

GTAT






 1845
 1867
CCTTCTGCATAATGAAT
1038
TAGTTAATTCATTATG
2807




TAACTA

CAGA






 1889
 1911
CCAAAGCTAAGACCCCC
1039
GGTTTCGGGGGTCTTA
2808




GAAACC

GCTT






 1901
 1923
CCCCCGAAACCAGACG
1040
GGTAGCTCGTCTGGTT
2809




AGCTACC

TCGG






 1902
 1924
CCCCGAAACCAGACGA
1041
AGGTAGCTCGTCTGGT
2810




GCTACCT

TTCG






 1903
 1925
CCCGAAACCAGACGAG
1042
TAGGTAGCTCGTCTGG
2811




CTACCTA

TTTC






 1904
 1926
CCGAAACCAGACGAGC
1043
TTAGGTAGCTCGTCTG
2812




TACCTAA

GTTT






 1910
 1932
CCAGACGAGCTACCTAA
1044
CTGTTCTTAGGTAGCT
2813




GAACAG

CGTC






 1922
 1944
CCTAAGAACAGCTAAA
1045
GTGCTCTTTTAGCTGTT
2814




AGAGCAC

CTT






 1946
 1968
CCCGTCTATGTAGCAAA
1046
CACTATTTTGCTACAT
2815




ATAGTG

AGAC






1947 1969 CCGTCTATGTAGCAAAATAGTGG 1047 CCACTATTTTGCTACATAGA 2816
1996 2018 CCTACCGAGCCTGGTGATAGCTG 1048 CAGCTATCACCAGGCTCGGT 2817
2000 2022 CCGAGCCTGGTGATAGCTGGTTG 1049 CAACCAGCTATCACCAGGCT 2818
2005 2027 CCTGGTGATAGCTGGTTGTCCAA 1050 TTGGACAACCAGCTATCACC 2819
2024 2046 CCAAGATAGAATCTTAGTTCAAC 1051 GTTGAACTAAGATTCTATCT 2820
2057 2079 CCCACAGAACCCTCTAAATCCCC 1052 GGGGATTTAGAGGGTTCTGT 2821
2058 2080 CCACAGAACCCTCTAAATCCCCT 1053 AGGGGATTTAGAGGGTTCTG 2822
2066 2088 CCCTCTAAATCCCCTTGTAAATT 1054 AATTTACAAGGGGATTTAGA 2823
2067 2089 CCTCTAAATCCCCTTGTAAATTT 1055 AAATTTACAAGGGGATTTAG 2824
2076 2098 CCCCTTGTAAATTTAACTGTTAG 1056 CTAACAGTTAAATTTACAAG 2825
2077 2099 CCCTTGTAAATTTAACTGTTAGT 1057 ACTAACAGTTAAATTTACAA 2826
2078 2100 CCTTGTAAATTTAACTGTTAGTC 1058 GACTAACAGTTAAATTTACA 2827
2100 2122 CCAAAGAGGAACAGCTCTTTGGA 1059 TCCAAAGAGCTGTTCCTCTT 2828
2136 2158 CCTTGTAGAGAGAGTAAAAAATT 1060 AATTTTTTACTCTCTCTACA 2829
2164 2186 CCCATAGTAGGCCTAAAAGCAGC 1061 GCTGCTTTTAGGCCTACTAT 2830
2165 2187 CCATAGTAGGCCTAAAAGCAGCC 1062 GGCTGCTTTTAGGCCTACTA 2831
2175 2197 CCTAAAAGCAGCCACCAATTAAG 1063 CTTAATTGGTGGCTGCTTTT 2832
2186 2208 CCACCAATTAAGAAAGCGTTCAA 1064 TTGAACGCTTTCTTAATTGG 2833
2189 2211 CCAATTAAGAAAGCGTTCAAGCT 1065 AGCTTGAACGCTTTCTTAAT 2834
2217 2239 CCCACTACCTAAAAAATCCCAAA 1066 TTTGGGATTTTTTAGGTAGT 2835
2218 2240 CCACTACCTAAAAAATCCCAAAC 1067 GTTTGGGATTTTTTAGGTAG 2836
2224 2246 CCTAAAAAATCCCAAACATATAA 1068 TTATATGTTTGGGATTTTTT 2837
2234 2256 CCCAAACATATAACTGAACTCCT 1069 AGGAGTTCAGTTATATGTTT 2838
2235 2257 CCAAACATATAACTGAACTCCTC 1070 GAGGAGTTCAGTTATATGTT 2839
2254 2276 CCTCACACCCAATTGGACCAATC 1071 GATTGGTCCAATTGGGTGTG 2840
2261 2283 CCCAATTGGACCAATCTATCACC 1072 GGTGATAGATTGGTCCAATT 2841
2262 2284 CCAATTGGACCAATCTATCACCC 1073 GGGTGATAGATTGGTCCAAT 2842
2271 2293 CCAATCTATCACCCTATAGAAGA 1074 TCTTCTATAGGGTGATAGAT 2843
2282 2304 CCCTATAGAAGAACTAATGTTAG 1075 CTAACATTAGTTCTTCTATA 2844
2283 2305 CCTATAGAAGAACTAATGTTAGT 1076 ACTAACATTAGTTCTTCTAT 2845
2328 2350 CCTCCGCATAAGCCTGCGTCAGA 1077 TCTGACGCAGGCTTATGCGG 2846
2331 2353 CCGCATAAGCCTGCGTCAGATTA 1078 TAATCTGACGCAGGCTTATG 2847
2340 2362 CCTGCGTCAGATTAAAACACTGA 1079 TCAGTGTTTTAATCTGACGC 2848
2378 2400 CCCAATATCTACAATCAACCAAC 1080 GTTGGTTGATTGTAGATATT 2849
2379 2401 CCAATATCTACAATCAACCAACA 1081 TGTTGGTTGATTGTAGATAT 2850
2396 2418 CCAACAAGTCATTATTACCCTCA 1082 TGAGGGTAATAATGACTTGT 2851
2413 2435 CCCTCACTGTCAACCCAACACAG 1083 CTGTGTTGGGTTGACAGTGA 2852
2414 2436 CCTCACTGTCAACCCAACACAGG 1084 CCTGTGTTGGGTTGACAGTG 2853
2426 2448 CCCAACACAGGCATGCTCATAAG 1085 CTTATGAGCATGCCTGTGTT 2854
2427 2449 CCAACACAGGCATGCTCATAAGG 1086 CCTTATGAGCATGCCTGTGT 2855
2488 2510 CCCCGCCTGTTTACCAAAAACAT 1087 ATGTTTTTGGTAAACAGGCG 2856
2489 2511 CCCGCCTGTTTACCAAAAACATC 1088 GATGTTTTTGGTAAACAGGC 2857
2490 2512 CCGCCTGTTTACCAAAAACATCA 1089 TGATGTTTTTGGTAAACAGG 2858
2493 2515 CCTGTTTACCAAAAACATCACCT 1090 AGGTGATGTTTTTGGTAAAC 2859
2501 2523 CCAAAAACATCACCTCTAGCATC 1091 GATGCTAGAGGTGATGTTTT 2860
2513 2535 CCTCTAGCATCACCAGTATTAGA 1092 TCTAATACTGGTGATGCTAG 2861
2525 2547 CCAGTATTAGAGGCACCGCCTGC 1093 GCAGGCGGTGCCTCTAATAC 2862
2540 2562 CCGCCTGCCCAGTGACACATGTT 1094 AACATGTGTCACTGGGCAGG 2863
2543 2565 CCTGCCCAGTGACACATGTTTAA 1095 TTAAACATGTGTCACTGGGC 2864
2547 2569 CCCAGTGACACATGTTTAACGGC 1096 GCCGTTAAACATGTGTCACT 2865






2548 2570 CCAGTGACACATGTTTAACGGCC 1097 GGCCGTTAAACATGTGTCAC 2866
2569 2591 CCGCGGTACCCTAACCGTGCAAA 1098 TTTGCACGGTTAGGGTACCG 2867
2577 2599 CCCTAACCGTGCAAAGGTAGCAT 1099 ATGCTACCTTTGCACGGTTA 2868
2578 2600 CCTAACCGTGCAAAGGTAGCATA 1100 TATGCTACCTTTGCACGGTT 2869
2583 2605 CCGTGCAAAGGTAGCATAATCAC 1101 GTGATTATGCTACCTTTGCA 2870
2611 2633 CCTTAAATAGGGACCTGTATGAA 1102 TTCATACAGGTCCCTATTTA 2871
2624 2646 CCTGTATGAATGGCTCCACGAGG 1103 CCTCGTGGAGCCATTCATAC 2872
2639 2661 CCACGAGGGTTCAGCTGTCTCTT 1104 AAGAGACAGCTGAACCCTCG 2873
2670 2692 CCAGTGAAATTGACCTGCCCGTG 1105 CACGGGCAGGTCAATTTCAC 2874
2683 2705 CCTGCCCGTGAAGAGGCGGGCAT 1106 ATGCCCGCCTCTTCACGGGC 2875
2687 2709 CCCGTGAAGAGGCGGGCATAACA 1107 TGTTATGCCCGCCTCTTCAC 2876
2688 2710 CCGTGAAGAGGCGGGCATAACAC 1108 GTGTTATGCCCGCCTCTTCA 2877
2726 2748 CCCTATGGAGCTTTAATTTATTA 1109 TAATAAATTAAAGCTCCATA 2878
2727 2749 CCTATGGAGCTTTAATTTATTAA 1110 TTAATAAATTAAAGCTCCAT 2879
2761 2783 CCTAACAAACCCACAGGTCCTAA 1111 TTAGGACCTGTGGGTTTGTT 2880
2770 2792 CCCACAGGTCCTAAACTACCAAA 1112 TTTGGTAGTTTAGGACCTGT 2881
2771 2793 CCACAGGTCCTAAACTACCAAAC 1113 GTTTGGTAGTTTAGGACCTG 2882
2779 2801 CCTAAACTACCAAACCTGCATTA 1114 TAATGCAGGTTTGGTAGTTT 2883
2788 2810 CCAAACCTGCATTAAAAATTTCG 1115 CGAAATTTTTAATGCAGGTT 2884
2793 2815 CCTGCATTAAAAATTTCGGTTGG 1116 CCAACCGAAATTTTTAATGC 2885
2821 2843 CCTCGGAGCAGAACCCAACCTCC 1117 GGAGGTTGGGTTCTGCTCCG 2886
2834 2856 CCCAACCTCCGAGCAGTACATGC 1118 GCATGTACTGCTCGGAGGTT 2887
2835 2857 CCAACCTCCGAGCAGTACATGCT 1119 AGCATGTACTGCTCGGAGGT 2888
2839 2861 CCTCCGAGCAGTACATGCTAAGA 1120 TCTTAGCATGTACTGCTCGG 2889
2842 2864 CCGAGCAGTACATGCTAAGACTT 1121 AAGTCTTAGCATGTACTGCT 2890
2867 2889 CCAGTCAAAGCGAACTACTATAC 1122 GTATAGTAGTTCGCTTTGAC 2891
2899 2921 CCAATAACTTGACCAACGGAACA 1123 TGTTCCGTTGGTCAAGTTAT 2892
2911 2933 CCAACGGAACAAGTTACCCTAGG 1124 CCTAGGGTAACTTGTTCCGT 2893
2927 2949 CCCTAGGGATAACAGCGCAATCC 1125 GGATTGCGCTGTTATCCCTA 2894
2928 2950 CCTAGGGATAACAGCGCAATCCT 1126 AGGATTGCGCTGTTATCCCT 2895
2948 2970 CCTATTCTAGAGTCCATATCAAC 1127 GTTGATATGGACTCTAGAAT 2896
2961 2983 CCATATCAACAATAGGGTTTACG 1128 CGTAAACCCTATTGTTGATA 2897
2985 3007 CCTCGATGTTGGATCAGGACATC 1129 GATGTCCTGATCCAACATCG 2898
3007 3029 CCCGATGGTGCAGCCGCTATTAA 1130 TTAATAGCGGCTGCACCATC 2899
3008 3030 CCGATGGTGCAGCCGCTATTAAA 1131 TTTAATAGCGGCTGCACCAT 2900
3020 3042 CCGCTATTAAAGGTTCGTTTGTT 1132 AACAAACGAACCTTTAATAG 2901
3056 3078 CCTACGTGATCTGAGTTCAGACC 1133 GGTCTGAACTCAGATCACGT 2902
3077 3099 CCGGAGTAATCCAGGTCGGTTTC 1134 GAAACCGACCTGGATTACTC 2903
3087 3109 CCAGGTCGGTTTCTATCTACNTT 1135 AANGTAGATAGAAACCGACC 2904
3116 3138 CCTCCCTGTACGAAAGGACAAGA 1136 TCTTGTCCTTTCGTACAGGG 2905
3119 3141 CCCTGTACGAAAGGACAAGAGAA 1137 TTCTCTTGTCCTTTCGTACA 2906
3120 3142 CCTGTACGAAAGGACAAGAGAAA 1138 TTTCTCTTGTCCTTTCGTAC 2907
3148 3170 CCTACTTCACAAAGCGCCTTCCC 1139 GGGAAGGCGCTTTGTGAAGT 2908
3164 3186 CCTTCCCCCGTAAATGATATCAT 1140 ATGATATCATTTACGGGGGA 2909
3168 3190 CCCCCGTAAATGATATCATCTCA 1141 TGAGATGATATCATTTACGG 2910
3169 3191 CCCCGTAAATGATATCATCTCAA 1142 TTGAGATGATATCATTTACG 2911
3170 3192 CCCGTAAATGATATCATCTCAAC 1143 GTTGAGATGATATCATTTAC 2912
3171 3193 CCGTAAATGATATCATCTCAACT 1144 AGTTGAGATGATATCATTTA 2913
3204 3226 CCCACACCCACCCAAGAACAGGG 1145 CCCTGTTCTTGGGTGGGTGT 2914
3205 3227 CCACACCCACCCAAGAACAGGGT 1146 ACCCTGTTCTTGGGTGGGTG 2915






3210 3232 CCCACCCAAGAACAGGGTTTGTT 1147 AACAAACCCTGTTCTTGGGT 2916
3211 3233 CCACCCAAGAACAGGGTTTGTTA 1148 TAACAAACCCTGTTCTTGGG 2917
3214 3236 CCCAAGAACAGGGTTTGTTAAGA 1149 TCTTAACAAACCCTGTTCTT 2918
3215 3237 CCAAGAACAGGGTTTGTTAAGAT 1150 ATCTTAACAAACCCTGTTCT 2919
3245 3267 CCCGGTAATCGCATAAAACTTAA 1151 TTAAGTTTTATGCGATTACC 2920
3246 3268 CCGGTAATCGCATAAAACTTAAA 1152 TTAAGTTTTATGCGATTAC 2921
3292 3314 CCTCTTCTTAACAACATACCCAT 1153 ATGGGTATGTTGTTAAGAAG 2922
3310 3332 CCCATGGCCAACCTCCTACTCCT 1154 AGGAGTAGGAGGTTGGCCAT 2923
3311 3333 CCATGGCCAACCTCCTACTCCTC 1155 GAGGAGTAGGAGGTTGGCCA 2924
3317 3339 CCAACCTCCTACTCCTCATTGTA 1156 TACAATGAGGAGTAGGAGGT 2925
3321 3343 CCTCCTACTCCTCATTGTACCCA 1157 TGGGTACAATGAGGAGTAGG 2926
3324 3346 CCTACTCCTCATTGTACCCATTC 1158 GAATGGGTACAATGAGGAGT 2927
3330 3352 CCTCATTGTACCCATTCTAATCG 1159 CGATTAGAATGGGTACAATG 2928
3340 3362 CCCATTCTAATCGCAATGGCATT 1160 AATGCCATTGCGATTAGAAT 2929
3341 3363 CCATTCTAATCGCAATGGCATTC 1161 GAATGCCATTGCGATTAGAA 2930
3363 3385 CCTAATGCTTACCGAACGAAAAA 1162 TTTTTCGTTCGGTAAGCATT 2931
3374 3396 CCGAACGAAAAATTCTAGGCTAT 1163 ATAGCCTAGAATTTTTCGTT 2932
3414 3436 CCCCAACGTTGTAGGCCCCTACG 1164 CGTAGGGGCCTACAACGTTG 2933
3415 3437 CCCAACGTTGTAGGCCCCTACGG 1165 CCGTAGGGGCCTACAACGTT 2934
3416 3438 CCAACGTTGTAGGCCCCTACGGG 1166 CCCGTAGGGGCCTACAACGT 2935
3429 3451 CCCCTACGGGCTACTACAACCCT 1167 AGGGTTGTAGTAGCCCGTAG 2936
3430 3452 CCCTACGGGCTACTACAACCCTT 1168 AAGGGTTGTAGTAGCCCGTA 2937
3431 3453 CCTACGGGCTACTACAACCCTTC 1169 GAAGGGTTGTAGTAGCCCGT 2938
3448 3470 CCCTTCGCTGACGCCATAAAACT 1170 AGTTTTATGGCGTCAGCGAA 2939
3449 3471 CCTTCGCTGACGCCATAAAACTC 1171 GAGTTTTATGGCGTCAGCGA 2940
3461 3483 CCATAAAACTCTTCACCAAAGAG 1172 CTCTTTGGTGAAGAGTTTTA 2941
3476 3498 CCAAAGAGCCCCTAAAACCCGCC 1173 GGCGGGTTTTAGGGGCTCTT 2942
3484 3506 CCCCTAAAACCCGCCACATCTAC 1174 GTAGATGTGGCGGGTTTTAG 2943
3485 3507 CCCTAAAACCCGCCACATCTACC 1175 GGTAGATGTGGCGGGTTTTA 2944
3486 3508 CCTAAAACCCGCCACATCTACCA 1176 TGGTAGATGTGGCGGGTTTT 2945
3493 3515 CCCGCCACATCTACCATCACCCT 1177 AGGGTGATGGTAGATGTGGC 2946
3494 3516 CCGCCACATCTACCATCACCCTC 1178 GAGGGTGATGGTAGATGTGG 2947
3497 3519 CCACATCTACCATCACCCTCTAC 1179 GTAGAGGGTGATGGTAGATG 2948
3506 3528 CCATCACCCTCTACATCACCGCC 1180 GGCGGTGATGTAGAGGGTGA 2949
3512 3534 CCCTCTACATCACCGCCCCGACC 1181 GGTCGGGGCGGTGATGTAGA 2950
3513 3535 CCTCTACATCACCGCCCCGACCT 1182 AGGTCGGGGCGGTGATGTAG 2951
3524 3546 CCGCCCCGACCTTAGCTCTCACC 1183 GGTGAGAGCTAAGGTCGGGG 2952
3527 3549 CCCCGACCTTAGCTCTCACCATC 1184 GATGGTGAGAGCTAAGGTCG 2953
3528 3550 CCCGACCTTAGCTCTCACCATCG 1185 CGATGGTGAGAGCTAAGGTC 2954
3529 3551 CCGACCTTAGCTCTCACCATCGC 1186 GCGATGGTGAGAGCTAAGGT 2955
3533 3555 CCTTAGCTCTCACCATCGCTCTT 1187 AAGAGCGATGGTGAGAGCTA 2956
3545 3567 CCATCGCTCTTCTACTATGAACC 1188 GGTTCATAGTAGAAGAGCGA 2957
3566 3588 CCCCCCTCCCCATACCCAACCCC 1189 GGGGTTGGGTATGGGGAGGG 2958
3567 3589 CCCCCTCCCCATACCCAACCCCC 1190 GGGGGTTGGGTATGGGGAGG 2959
3568 3590 CCCCTCCCCATACCCAACCCCCT 1191 AGGGGGTTGGGTATGGGGAG 2960
3569 3591 CCCTCCCCATACCCAACCCCCTG 1192 CAGGGGGTTGGGTATGGGGA 2961
3570 3592 CCTCCCCATACCCAACCCCCTGG 1193 CCAGGGGGTTGGGTATGGGG 2962
3573 3595 CCCCATACCCAACCCCCTGGTCA 1194 TGACCAGGGGGTTGGGTATG 2963
3574 3596 CCCATACCCAACCCCCTGGTCAA 1195 TTGACCAGGGGGTTGGGTAT 2964
3575 3597 CCATACCCAACCCCCTGGTCAAC 1196 GTTGACCAGGGGGTTGGGTA 2965






3580 3602 CCCAACCCCCTGGTCAACCTCAA 1197 TTGAGGTTGACCAGGGGGTT 2966
3581 3603 CCAACCCCCTGGTCAACCTCAAC 1198 GTTGAGGTTGACCAGGGGGT 2967
3585 3607 CCCCCTGGTCAACCTCAACCTAG 1199 CTAGGTTGAGGTTGACCAGG 2968
3586 3608 CCCCTGGTCAACCTCAACCTAGG 1200 CCTAGGTTGAGGTTGACCAG 2969
3587 3609 CCCTGGTCAACCTCAACCTAGGC 1201 GCCTAGGTTGAGGTTGACCA 2970
3588 3610 CCTGGTCAACCTCAACCTAGGCC 1202 GGCCTAGGTTGAGGTTGACC 2971
3597 3619 CCTCAACCTAGGCCTCCTATTTA 1203 TAAATAGGAGGCCTAGGTTG 2972
3603 3625 CCTAGGCCTCCTATTTATTCTAG 1204 CTAGAATAAATAGGAGGCCT 2973
3609 3631 CCTCCTATTTATTCTAGCCACCT 1205 AGGTGGCTAGAATAAATAGG 2974
3612 3634 CCTATTTATTCTAGCCACCTCTA 1206 TAGAGGTGGCTAGAATAAAT 2975
3626 3648 CCACCTCTAGCCTAGCCGTTTAC 1207 GTAAACGGCTAGGCTAGAGG 2976
3629 3651 CCTCTAGCCTAGCCGTTTACTCA 1208 TGAGTAAACGGCTAGGCTAG 2977
3636 3658 CCTAGCCGTTTACTCAATCCTCT 1209 AGAGGATTGAGTAAACGGCT 2978
3641 3663 CCGTTTACTCAATCCTCTGATCA 1210 TGATCAGAGGATTGAGTAAA 2979
3654 3676 CCTCTGATCAGGGTGAGCATCAA 1211 TTGATGCTCACCCTGATCAG 2980
3689 3711 CCCTGATCGGCGCACTGCGAGCA 1212 TGCTCGCAGTGCGCCGATCA 2981
3690 3712 CCTGATCGGCGCACTGCGAGCAG 1213 CTGCTCGCAGTGCGCCGATC 2982
3716 3738 CCCAAACAATCTCATATGAAGTC 1214 GACTTCATATGAGATTGTTT 2983
3717 3739 CCAAACAATCTCATATGAAGTCA 1215 TGACTTCATATGAGATTGTT 2984
3740 3762 CCCTAGCCATCATTCTACTATCA 1216 TGATAGTAGAATGATGGCTA 2985
3741 3763 CCTAGCCATCATTCTACTATCAA 1217 TTGATAGTAGAATGATGGCT 2986
3746 3768 CCATCATTCTACTATCAACATTA 1218 TAATGTTGATAGTAGAATGA 2987
3782 3804 CCTTTAACCTCTCCACCCTTATC 1219 GATAAGGGTGGAGAGGTTAA 2988
3789 3811 CCTCTCCACCCTTATCACAACAC 1220 GTGTTGTGATAAGGGTGGAG 2989
3794 3816 CCACCCTTATCACAACACAAGAA 1221 TTCTTGTGTTGTGATAAGGG 2990
3797 3819 CCCTTATCACAACACAAGAACAC 1222 GTGTTCTTGTGTTGTGATAA 2991
3798 3820 CCTTATCACAACACAAGAACACC 1223 GGTGTTCTTGTGTTGTGATA 2992
3819 3841 CCTCTGATTACTCCTGCCATCAT 1224 ATGATGGCAGGAGTAATCAG 2993
3831 3853 CCTGCCATCATGACCCTTGGCCA 1225 TGGCCAAGGGTCATGATGGC 2994
3835 3857 CCATCATGACCCTTGGCCATAAT 1226 ATTATGGCCAAGGGTCATGA 2995
3844 3866 CCCTTGGCCATAATATGATTTAT 1227 ATAAATCATATTATGGCCAA 2996
3845 3867 CCTTGGCCATAATATGATTTATC 1228 GATAAATCATATTATGGCCA 2997
3851 3873 CCATAATATGATTTATCTCCACA 1229 TGTGGAGATAAATCATATTA 2998
3869 3891 CCACACTAGCAGAGACCAACCGA 1230 TCGGTTGGTCTCTGCTAGTG 2999
3884 3906 CCAACCGAACCCCCTTCGACCTT 1231 AAGGTCGAAGGGGGTTCGGT 3000
3888 3910 CCGAACCCCCTTCGACCTTGCCG 1232 CGGCAAGGTCGAAGGGGGTT 3001
3893 3915 CCCCCTTCGACCTTGCCGAAGGG 1233 CCCTTCGGCAAGGTCGAAGG 3002
3894 3916 CCCCTTCGACCTTGCCGAAGGGG 1234 CCCCTTCGGCAAGGTCGAAG 3003
3895 3917 CCCTTCGACCTTGCCGAAGGGGA 1235 TCCCCTTCGGCAAGGTCGAA 3004
3896 3918 CCTTCGACCTTGCCGAAGGGGAG 1236 CTCCCCTTCGGCAAGGTCGA 3005
3903 3925 CCTTGCCGAAGGGGAGTCCGAAC 1237 GTTCGGACTCCCCTTCGGCA 3006
3908 3930 CCGAAGGGGAGTCCGAACTAGTC 1238 GACTAGTTCGGACTCCCCTT 3007
3920 3942 CCGAACTAGTCTCAGGCTTCAAC 1239 GTTGAAGCCTGAGACTAGTT 3008
3953 3975 CCGCAGGCCCCTTCGCCCTATTC 1240 GAATAGGGCGAAGGGGCCTG 3009
3960 3982 CCCCTTCGCCCTATTCTTCATAG 1241 CTATGAAGAATAGGGCGAAG 3010
3961 3983 CCCTTCGCCCTATTCTTCATAGC 1242 GCTATGAAGAATAGGGCGAA 3011
3962 3984 CCTTCGCCCTATTCTTCATAGCC 1243 GGCTATGAAGAATAGGGCGA 3012
3968 3990 CCCTATTCTTCATAGCCGAATAC 1244 GTATTCGGCTATGAAGAATA 3013
3969 3991 CCTATTCTTCATAGCCGAATACA 1245 TGTATTCGGCTATGAAGAAT 3014
3983 4005 CCGAATACACAAACATTATTATA 1246 TATAATAATGTTTGTGTATT 3015






4013 4035 CCCTCACCACTACAATCTTCCTA 1247 TAGGAAGATTGTAGTGGTGA 3016
4014 4036 CCTCACCACTACAATCTTCCTAG 1248 CTAGGAAGATTGTAGTGGTG 3017
4019 4041 CCACTACAATCTTCCTAGGAACA 1249 TGTTCCTAGGAAGATTGTAG 3018
4032 4054 CCTAGGAACAACATATGACGCAC 1250 GTGCGTCATATGTTGTTCCT 3019
4058 4080 CCCCTGAACTCTACACAACATAT 1251 ATATGTTGTGTAGAGTTCAG 3020
4059 4081 CCCTGAACTCTACACAACATATT 1252 AATATGTTGTGTAGAGTTCA 3021
4060 4082 CCTGAACTCTACACAACATATTT 1253 AAATATGTTGTGTAGAGTTC 3022
4088 4110 CCAAGACCCTACTTCTAACCTCC 1254 GGAGGTTAGAAGTAGGGTCT 3023
4094 4116 CCCTACTTCTAACCTCCCTGTTC 1255 GAACAGGGAGGTTAGAAGTA 3024
4095 4117 CCTACTTCTAACCTCCCTGTTCT 1256 AGAACAGGGAGGTTAGAAGT 3025
4106 4128 CCTCCCTGTTCTTATGAATTCGA 1257 TCGAATTCATAAGAACAGGG 3026
4109 4131 CCCTGTTCTTATGAATTCGAACA 1258 TGTTCGAATTCATAAGAACA 3027
4110 4132 CCTGTTCTTATGAATTCGAACAG 1259 CTGTTCGAATTCATAAGAAC 3028
4137 4159 CCCCCGATTCCGCTACGACCAAC 1260 GTTGGTCGTAGCGGAATCGG 3029
4138 4160 CCCCGATTCCGCTACGACCAACT 1261 AGTTGGTCGTAGCGGAATCG 3030
4139 4161 CCCGATTCCGCTACGACCAACTC 1262 GAGTTGGTCGTAGCGGAATC 3031
4140 4162 CCGATTCCGCTACGACCAACTCA 1263 TGAGTTGGTCGTAGCGGAAT 3032
4146 4168 CCGCTACGACCAACTCATACACC 1264 GGTGTATGAGTTGGTCGTAG 3033
4155 4177 CCAACTCATACACCTCCTATGAA 1265 TTCATAGGAGGTGTATGAGT 3034
4167 4189 CCTCCTATGAAAAAACTTCCTAC 1266 GTAGGAAGTTTTTTCATAGG 3035
4170 4192 CCTATGAAAAAACTTCCTACCAC 1267 GTGGTAGGAAGTTTTTTCAT 3036
4185 4207 CCTACCACTCACCCTAGCATTAC 1268 GTAATGCTAGGGTGAGTGGT 3037
4189 4211 CCACTCACCCTAGCATTACTTAT 1269 ATAAGTAATGCTAGGGTGAG 3038
4196 4218 CCCTAGCATTACTTATATGATAT 1270 ATATCATATAAGTAATGCTA 3039
4197 4219 CCTAGCATTACTTATATGATATG 1271 CATATCATATAAGTAATGCT 3040
4223 4245 CCATACCCATTACAATCTCCAGC 1272 GCTGGAGATTGTAATGGGTA 3041
4228 4250 CCCATTACAATCTCCAGCATTCC 1273 GGAATGCTGGAGATTGTAAT 3042
4229 4251 CCATTACAATCTCCAGCATTCCC 1274 GGGAATGCTGGAGATTGTAA 3043
4241 4263 CCAGCATTCCCCCTCAAACCTAA 1275 TTAGGTTTGAGGGGGAATGC 3044
4249 4271 CCCCCTCAAACCTAAGAAATATG 1276 CATATTTCTTAGGTTTGAGG 3045
4250 4272 CCCCTCAAACCTAAGAAATATGT 1277 ACATATTTCTTAGGTTTGAG 3046
4251 4273 CCCTCAAACCTAAGAAATATGTC 1278 GACATATTTCTTAGGTTTGA 3047
4252 4274 CCTCAAACCTAAGAAATATGTCT 1279 AGACATATTTCTTAGGTTTG 3048
4259 4281 CCTAAGAAATATGTCTGATAAAA 1280 TTTTATCAGACATATTTCTT 3049
4318 4340 CCCCCTTATTTCTAGGACTATGA 1281 TCATAGTCCTAGAAATAAGG 3050
4319 4341 CCCCTTATTTCTAGGACTATGAG 1282 CTCATAGTCCTAGAAATAAG 3051
4320 4342 CCCTTATTTCTAGGACTATGAGA 1283 TCTCATAGTCCTAGAAATAA 3052
4321 4343 CCTTATTTCTAGGACTATGAGAA 1284 TTCTCATAGTCCTAGAAATA 3053
4349 4371 CCCATCCCTGAGAATCCAAAATT 1285 AATTTTGGATTCTCAGGGAT 3054
4350 4372 CCATCCCTGAGAATCCAAAATTC 1286 GAATTTTGGATTCTCAGGGA 3055
4354 4376 CCCTGAGAATCCAAAATTCTCCG 1287 CGGAGAATTTTGGATTCTCA 3056
4355 4377 CCTGAGAATCCAAAATTCTCCGT 1288 ACGGAGAATTTTGGATTCTC 3057
4364 4386 CCAAAATTCTCCGTGCCACCTAT 1289 ATAGGTGGCACGGAGAATTT 3058
4374 4396 CCGTGCCACCTATCACACCCCAT 1290 ATGGGGTGTGATAGGTGGCA 3059
4379 4401 CCACCTATCACACCCCATCCTAA 1291 TTAGGATGGGGTGTGATAGG 3060
4382 4404 CCTATCACACCCCATCCTAAAGT 1292 ACTTTAGGATGGGGTGTGAT 3061
4391 4413 CCCCATCCTAAAGTAAGGTCAGC 1293 GCTGACCTTACTTTAGGATG 3062
4392 4414 CCCATCCTAAAGTAAGGTCAGCT 1294 AGCTGACCTTACTTTAGGAT 3063
4393 4415 CCATCCTAAAGTAAGGTCAGCTA 1295 TAGCTGACCTTACTTTAGGA 3064
4397 4419 CCTAAAGTAAGGTCAGCTAAATA 1296 TATTTAGCTGACCTTACTTT 3065






4430 4452 CCCATACCCCGAAAATGTTGGTT 1297 AACCAACATTTTCGGGGTAT 3066
4431 4453 CCATACCCCGAAAATGTTGGTTA 1298 TAACCAACATTTTCGGGGTA 3067
4436 4458 CCCCGAAAATGTTGGTTATACCC 1299 GGGTATAACCAACATTTTCG 3068
4437 4459 CCCGAAAATGTTGGTTATACCCT 1300 AGGGTATAACCAACATTTTC 3069
4438 4460 CCGAAAATGTTGGTTATACCCTT 1301 AAGGGTATAACCAACATTTT 3070
4456 4478 CCCTTCCCGTACTAATTAATCCC 1302 GGGATTAATTAGTACGGGAA 3071
4457 4479 CCTTCCCGTACTAATTAATCCCC 1303 GGGGATTAATTAGTACGGGA 3072
4461 4483 CCCGTACTAATTAATCCCCTGGC 1304 GCCAGGGGATTAATTAGTAC 3073
4462 4484 CCGTACTAATTAATCCCCTGGCC 1305 GGCCAGGGGATTAATTAGTA 3074
4476 4498 CCCCTGGCCCAACCCGTCATCTA 1306 TAGATGACGGGTTGGGCCAG 3075
4477 4499 CCCTGGCCCAACCCGTCATCTAC 1307 GTAGATGACGGGTTGGGCCA 3076
4478 4500 CCTGGCCCAACCCGTCATCTACT 1308 AGTAGATGACGGGTTGGGCC 3077
4483 4505 CCCAACCCGTCATCTACTCTACC 1309 GGTAGAGTAGATGACGGGTT 3078
4484 4506 CCAACCCGTCATCTACTCTACCA 1310 TGGTAGAGTAGATGACGGGT 3079
4488 4510 CCCGTCATCTACTCTACCATCTT 1311 AAGATGGTAGAGTAGATGAC 3080
4489 4511 CCGTCATCTACTCTACCATCTTT 1312 AAAGATGGTAGAGTAGATGA 3081
4504 4526 CCATCTTTGCAGGCACACTCATC 1313 GATGAGTGTGCCTGCAAAGA 3082
4555 4577 CCTGAGTAGGCCTAGAAATAAAC 1314 GTTTATTTCTAGGCCTACTC 3083
4565 4587 CCTAGAAATAAACATGCTAGCTT 1315 AAGCTAGCATGTTTATTTCT 3084
4593 4615 CCAGTTCTAACCAAAAAAATAAA 1316 TTTATTTTTTTGGTTAGAAC 3085
4603 4625 CCAAAAAAATAAACCCTCGTTCC 1317 GGAACGAGGGTTTATTTTTT 3086
4616 4638 CCCTCGTTCCACAGAAGCTGCCA 1318 TGGCAGCTTCTGTGGAACGA 3087
4617 4639 CCTCGTTCCACAGAAGCTGCCAT 1319 ATGGCAGCTTCTGTGGAACG 3088
4624 4646 CCACAGAAGCTGCCATCAAGTAT 1320 ATACTTGATGGCAGCTTCTG 3089
4636 4658 CCATCAAGTATTTCCTCACGCAA 1321 TTGCGTGAGGAAATACTTGA 3090
4649 4671 CCTCACGCAAGCAACCGCATCCA 1322 TGGATGCGGTTGCTTGCGTG 3091
4663 4685 CCGCATCCATAATCCTTCTAATA 1323 TATTAGAAGGATTATGGATG 3092
4669 4691 CCATAATCCTTCTAATAGCTATC 1324 GATAGCTATTAGAAGGATTA 3093
4676 4698 CCTTCTAATAGCTATCCTCTTCA 1325 TGAAGAGGATAGCTATTAGA 3094
4691 4713 CCTCTTCAACAATATACTCTCCG 1326 CGGAGAGTATATTGTTGAAG 3095
4711 4733 CCGGACAATGAACCATAACCAAT 1327 ATTGGTTATGGTTCATTGTC 3096
4723 4745 CCATAACCAATACTACCAATCAA 1328 TTGATTGGTAGTATTGGTTA 3097
4729 4751 CCAATACTACCAATCAATACTCA 1329 TGAGTATTGATTGGTAGTAT 3098
4738 4760 CCAATCAATACTCATCATTAATA 1330 TATTAATGATGAGTATTGAT 3099
4795 4817 CCCCCTTTCACTTCTGAGTCCCA 1331 TGGGACTCAGAAGTGAAAGG 3100
4796 4818 CCCCTTTCACTTCTGAGTCCCAG 1332 CTGGGACTCAGAAGTGAAAG 3101
4797 4819 CCCTTTCACTTCTGAGTCCCAGA 1333 TCTGGGACTCAGAAGTGAAA 3102
4798 4820 CCTTTCACTTCTGAGTCCCAGAG 1334 CTCTGGGACTCAGAAGTGAA 3103
4814 4836 CCCAGAGGTTACCCAAGGCACCC 1335 GGGTGCCTTGGGTAACCTCT 3104
4815 4837 CCAGAGGTTACCCAAGGCACCCC 1336 GGGGTGCCTTGGGTAACCTC 3105
4825 4847 CCCAAGGCACCCCTCTGACATCC 1337 GGATGTCAGAGGGGTGCCTT 3106
4826 4848 CCAAGGCACCCCTCTGACATCCG 1338 CGGATGTCAGAGGGGTGCCT 3107
4834 4856 CCCCTCTGACATCCGGCCTGCTT 1339 AAGCAGGCCGGATGTCAGAG 3108
4835 4857 CCCTCTGACATCCGGCCTGCTTC 1340 GAAGCAGGCCGGATGTCAGA 3109
4836 4858 CCTCTGACATCCGGCCTGCTTCT 1341 AGAAGCAGGCCGGATGTCAG 3110
4846 4868 CCGGCCTGCTTCTTCTCACATGA 1342 TCATGTGAGAAGAAGCAGGC 3111
4850 4872 CCTGCTTCTTCTCACATGACAAA 1343 TTTGTCATGTGAGAAGAAGC 3112
4879 4901 CCCCCATCTCAATCATATACCAA 1344 TTGGTATATGATTGAGATGG 3113
4880 4902 CCCCATCTCAATCATATACCAAA 1345 TTTGGTATATGATTGAGATG 3114
4881 4903 CCCATCTCAATCATATACCAAAT 1346 ATTTGGTATATGATTGAGAT 3115






4882 4904 CCATCTCAATCATATACCAAATC 1347 GATTTGGTATATGATTGAGA 3116
4898 4920 CCAAATCTCTCCCTCACTAAACG 1348 CGTTTAGTGAGGGAGAGATT 3117
4908 4930 CCCTCACTAAACGTAAGCCTTCT 1349 AGAAGGCTTACGTTTAGTGA 3118
4909 4931 CCTCACTAAACGTAAGCCTTCTC 1350 GAGAAGGCTTACGTTTAGTG 3119
4925 4947 CCTTCTCCTCACTCTCTCAATCT 1351 AGATTGAGAGAGTGAGGAGA 3120
4931 4953 CCTCACTCTCTCAATCTTATCCA 1352 TGGATAAGATTGAGAGAGTG 3121
4951 4973 CCATCATAGCAGGCAGTTGAGGT 1353 ACCTCAACTGCCTGCTATGA 3122
4982 5004 CCAAACCCAGCTACGCAAAATCT 1354 AGATTTTGCGTAGCTGGGTT 3123
4987 5009 CCCAGCTACGCAAAATCTTAGCA 1355 TGCTAAGATTTTGCGTAGCT 3124
4988 5010 CCAGCTACGCAAAATCTTAGCAT 1356 ATGCTAAGATTTTGCGTAGC 3125
5014 5036 CCTCAATTACCCACATAGGATGA 1357 TCATCCTATGTGGGTAATTG 3126
5023 5045 CCCACATAGGATGAATAATAGCA 1358 TGCTATTATTCATCCTATGT 3127
5024 5046 CCACATAGGATGAATAATAGCAG 1359 CTGCTATTATTCATCCTATG 3128
5052 5074 CCGTACAACCCTAACATAACCAT 1360 ATGGTTATGTTAGGGTTGTA 3129
5060 5082 CCCTAACATAACCATTCTTAATT 1361 AATTAAGAATGGTTATGTTA 3130
5061 5083 CCTAACATAACCATTCTTAATTT 1362 AAATTAAGAATGGTTATGTT 3131
5071 5093 CCATTCTTAATTTAACTATTTAT 1363 ATAAATAGTTAAATTAAGAA 3132
5099 5121 CCTAACTACTACCGCATTCCTAC 1364 GTAGGAATGCGGTAGTAGTT 3133
5110 5132 CCGCATTCCTACTACTCAACTTA 1365 TAAGTTGAGTAGTAGGAATG 3134
5117 5139 CCTACTACTCAACTTAAACTCCA 1366 TGGAGTTTAAGTTGAGTAGT 3135
5137 5159 CCAGCACCACGACCCTACTACTA 1367 TAGTAGTAGGGTCGTGGTGC 3136
5143 5165 CCACGACCCTACTACTATCTCGC 1368 GCGAGATAGTAGTAGGGTCG 3137
5149 5171 CCCTACTACTATCTCGCACCTGA 1369 TCAGGTGCGAGATAGTAGTA 3138
5150 5172 CCTACTACTATCTCGCACCTGAA 1370 TTCAGGTGCGAGATAGTAGT 3139
5167 5189 CCTGAAACAAGCTAACATGACTA 1371 TAGTCATGTTAGCTTGTTTC 3140
5193 5215 CCCTTAATTCCATCCACCCTCCT 1372 AGGAGGGTGGATGGAATTAA 3141
5194 5216 CCTTAATTCCATCCACCCTCCTC 1373 GAGGAGGGTGGATGGAATTA 3142
5202 5224 CCATCCACCCTCCTCTCCCTAGG 1374 CCTAGGGAGAGGAGGGTGGA 3143
5206 5228 CCACCCTCCTCTCCCTAGGAGGC 1375 GCCTCCTAGGGAGAGGAGGG 3144
5209 5231 CCCTCCTCTCCCTAGGAGGCCTG 1376 CAGGCCTCCTAGGGAGAGGA 3145
5210 5232 CCTCCTCTCCCTAGGAGGCCTGC 1377 GCAGGCCTCCTAGGGAGAGG 3146
5213 5235 CCTCTCCCTAGGAGGCCTGCCCC 1378 GGGGCAGGCCTCCTAGGGAG 3147
5218 5240 CCCTAGGAGGCCTGCCCCCGCTA 1379 TAGCGGGGGCAGGCCTCCTA 3148
5219 5241 CCTAGGAGGCCTGCCCCCGCTAA 1380 TTAGCGGGGGCAGGCCTCCT 3149
5228 5250 CCTGCCCCCGCTAACCGGCTTTT 1381 AAAAGCCGGTTAGCGGGGGC 3150
5232 5254 CCCCCGCTAACCGGCTTTTTGCC 1382 GGCAAAAAGCCGGTTAGCGG 3151
5233 5255 CCCCGCTAACCGGCTTTTTGCCC 1383 GGGCAAAAAGCCGGTTAGCG 3152
5234 5256 CCCGCTAACCGGCTTTTTGCCCA 1384 TGGGCAAAAAGCCGGTTAGC 3153
5235 5257 CCGCTAACCGGCTTTTTGCCCAA 1385 TTGGGCAAAAAGCCGGTTAG 3154
5242 5264 CCGGCTTTTTGCCCAAATGGGCC 1386 GGCCCATTTGGGCAAAAAGC 3155
5253 5275 CCCAAATGGGCCATTATCGAAGA 1387 TCTTCGATAATGGCCCATTT 3156
5254 5276 CCAAATGGGCCATTATCGAAGAA 1388 TTCTTCGATAATGGCCCATT 3157
5263 5285 CCATTATCGAAGAATTCACAAAA 1389 TTTTGTGAATTCTTCGATAA 3158
5294 5316 CCTCATCATCCCCACCATCATAG 1390 CTATGATGGTGGGGATGATG 3159
5303 5325 CCCCACCATCATAGCCACCATCA 1391 TGATGGTGGCTATGATGGTG 3160
5304 5326 CCCACCATCATAGCCACCATCAC 1392 GTGATGGTGGCTATGATGGT 3161
5305 5327 CCACCATCATAGCCACCATCACC 1393 GGTGATGGTGGCTATGATGG 3162
5308 5330 CCATCATAGCCACCATCACCCTC 1394 GAGGGTGATGGTGGCTATGA 3163
5317 5339 CCACCATCACCCTCCTTAACCTC 1395 GAGGTTAAGGAGGGTGATGG 3164
5320 5342 CCATCACCCTCCTTAACCTCTAC 1396 GTAGAGGTTAAGGAGGGTGA 3165






5326 5348 CCCTCCTTAACCTCTACTTCTAC 1397 GTAGAAGTAGAGGTTAAGGA 3166
5327 5349 CCTCCTTAACCTCTACTTCTACC 1398 GGTAGAAGTAGAGGTTAAGG 3167
5330 5352 CCTTAACCTCTACTTCTACCTAC 1399 GTAGGTAGAAGTAGAGGTTA 3168
5336 5358 CCTCTACTTCTACCTACGCCTAA 1400 TTAGGCGTAGGTAGAAGTAG 3169
5348 5370 CCTACGCCTAATCTACTCCACCT 1401 AGGTGGAGTAGATTAGGCGT 3170
5354 5376 CCTAATCTACTCCACCTCAATCA 1402 TGATTGAGGTGGAGTAGATT 3171
5365 5387 CCACCTCAATCACACTACTCCCC 1403 GGGGAGTAGTGTGATTGAGG 3172
5368 5390 CCTCAATCACACTACTCCCCATA 1404 TATGGGGAGTAGTGTGATTG 3173
5384 5406 CCCCATATCTAACAACGTAAAAA 1405 TTTTTACGTTGTTAGATATG 3174
5385 5407 CCCATATCTAACAACGTAAAAAT 1406 ATTTTTACGTTGTTAGATAT 3175
5386 5408 CCATATCTAACAACGTAAAAATA 1407 TATTTTTACGTTGTTAGATA 3176
5433 5455 CCCACCCCATTCCTCCCCACACT 1408 AGTGTGGGGAGGAATGGGGT 3177
5434 5456 CCACCCCATTCCTCCCCACACTC 1409 GAGTGTGGGGAGGAATGGGG 3178
5437 5459 CCCCATTCCTCCCCACACTCATC 1410 GATGAGTGTGGGGAGGAATG 3179
5438 5460 CCCATTCCTCCCCACACTCATCG 1411 CGATGAGTGTGGGGAGGAAT 3180
5439 5461 CCATTCCTCCCCACACTCATCGC 1412 GCGATGAGTGTGGGGAGGAA 3181
5444 5466 CCTCCCCACACTCATCGCCCTTA 1413 TAAGGGCGATGAGTGTGGGG 3182
5447 5469 CCCCACACTCATCGCCCTTACCA 1414 TGGTAAGGGCGATGAGTGTG 3183
5448 5470 CCCACACTCATCGCCCTTACCAC 1415 GTGGTAAGGGCGATGAGTGT 3184
5449 5471 CCACACTCATCGCCCTTACCACG 1416 CGTGGTAAGGGCGATGAGTG 3185
5461 5483 CCCTTACCACGCTACTCCTACCT 1417 AGGTAGGAGTAGCGTGGTAA 3186
5462 5484 CCTTACCACGCTACTCCTACCTA 1418 TAGGTAGGAGTAGCGTGGTA 3187
5467 5489 CCACGCTACTCCTACCTATCTCC 1419 GGAGATAGGTAGGAGTAGCG 3188
5477 5499 CCTACCTATCTCCCCTTTTATAC 1420 GTATAAAAGGGGAGATAGGT 3189
5481 5503 CCTATCTCCCCTTTTATACTAAT 1421 ATTAGTATAAAAGGGGAGAT 3190
5488 5510 CCCCTTTTATACTAATAATCTTA 1422 TAAGATTATTAGTATAAAAG 3191
5489 5511 CCCTTTTATACTAATAATCTTAT 1423 ATAAGATTATTAGTATAAAA 3192
5490 5512 CCTTTTATACTAATAATCTTATA 1424 TATAAGATTATTAGTATAAA 3193
5534 5556 CCAAGAGCCTTCAAAGCCCTCAG 1425 CTGAGGGCTTTGAAGGCTCT 3194
5541 5563 CCTTCAAAGCCCTCAGTAAGTTG 1426 CAACTTACTGAGGGCTTTGA 3195
5550 5572 CCCTCAGTAAGTTGCAATACTTA 1427 TAAGTATTGCAACTTACTGA 3196
5551 5573 CCTCAGTAAGTTGCAATACTTAA 1428 TTAAGTATTGCAACTTACTG 3197
5601 5623 CCCCACTCTGCATCAACTGAACG 1429 CGTTCAGTTGATGCAGAGTG 3198
5602 5624 CCCACTCTGCATCAACTGAACGC 1430 GCGTTCAGTTGATGCAGAGT 3199
5603 5625 CCACTCTGCATCAACTGAACGCA 1431 TGCGTTCAGTTGATGCAGAG 3200
5632 5654 CCACTTTAATTAAGCTAAGCCCT 1432 AGGGCTTAGCTTAATTAAAG 3201
5651 5673 CCCTTACTAGACCAATGGGACTT 1433 AAGTCCCATTGGTCTAGTAA 3202
5652 5674 CCTTACTAGACCAATGGGACTTA 1434 TAAGTCCCATTGGTCTAGTA 3203
5662 5684 CCAATGGGACTTAAACCCACAAA 1435 TTTGTGGGTTTAAGTCCCAT 3204
5677 5699 CCCACAAACACTTAGTTAACAGC 1436 GCTGTTAACTAAGTGTTTGT 3205
5678 5700 CCACAAACACTTAGTTAACAGCT 1437 AGCTGTTAACTAAGTGTTTG 3206
5706 5728 CCCTAATCAACTGGCTTCAATCT 1438 AGATTGAAGCCAGTTGATTA 3207
5707 5729 CCTAATCAACTGGCTTCAATCTA 1439 TAGATTGAAGCCAGTTGATT 3208
5735 5757 CCCGCCGCCGGGAAAAAAGGCGG 1440 CCGCCTTTTTTCCCGGCGGC 3209
5736 5758 CCGCCGCCGGGAAAAAAGGCGGG 1441 CCCGCCTTTTTTCCCGGCGG 3210
5739 5761 CCGCCGGGAAAAAAGGCGGGAGA 1442 TCTCCCGCCTTTTTTCCCGG 3211
5742 5764 CCGGGAAAAAAGGCGGGAGAAGC 1443 GCTTCTCCCGCCTTTTTTCC 3212
5764 5786 CCCCGGCAGGTTTGAAGCTGCTT 1444 AAGCAGCTTCAAACCTGCCG 3213
5765 5787 CCCGGCAGGTTTGAAGCTGCTTC 1445 GAAGCAGCTTCAAACCTGCC 3214
5766 5788 CCGGCAGGTTTGAAGCTGCTTCT 1446 AGAAGCAGCTTCAAACCTGC 3215






5817 5839 CCTCGGAGCTGGTAAAAAGAGGC 1447 GCCTCTTTTTACCAGCTCCG 3216
5839 5861 CCTAACCCCTGTCTTTAGATTTA 1448 TAAATCTAAAGACAGGGGTT 3217
5844 5866 CCCCTGTCTTTAGATTTACAGTC 1449 GACTGTAAATCTAAAGACAG 3218
5845 5867 CCCTGTCTTTAGATTTACAGTCC 1450 GGACTGTAAATCTAAAGACA 3219
5846 5868 CCTGTCTTTAGATTTACAGTCCA 1451 TGGACTGTAAATCTAAAGAC 3220
5866 5888 CCAATGCTTCACTCAGCCATTTT 1452 AAAATGGCTGAGTGAAGCAT 3221
5882 5904 CCATTTTACCTCACCCCCACTGA 1453 TCAGTGGGGGTGAGGTAAAA 3222
5890 5912 CCTCACCCCCACTGATGTTCGCC 1454 GGCGAACATCAGTGGGGGTG 3223
5895 5917 CCCCCACTGATGTTCGCCGACCG 1455 CGGTCGGCGAACATCAGTGG 3224
5896 5918 CCCCACTGATGTTCGCCGACCGT 1456 ACGGTCGGCGAACATCAGTG 3225
5897 5919 CCCACTGATGTTCGCCGACCGTT 1457 AACGGTCGGCGAACATCAGT 3226
5898 5920 CCACTGATGTTCGCCGACCGTTG 1458 CAACGGTCGGCGAACATCAG 3227
5911 5933 CCGACCGTTGACTATTCTCTACA 1459 TGTAGAGAATAGTCAACGGT 3228
5915 5937 CCGTTGACTATTCTCTACAAACC 1460 GGTTTGTAGAGAATAGTCAA 3229
5936 5958 CCACAAAGACATTGGAACACTAT 1461 ATAGTGTTCCAATGTCTTTG 3230
5960 5982 CCTATTATTCGGCGCATGAGCTG 1462 CAGCTCATGCGCCGAATAAT 3231
5987 6009 CCTAGGCACAGCTCTAAGCCTCC 1463 GGAGGCTTAGAGCTGTGCCT 3232
6005 6027 CCTCCTTATTCGAGCCGAGCTGG 1464 CCAGCTCGGCTCGAATAAGG 3233
6008 6030 CCTTATTCGAGCCGAGCTGGGCC 1465 GGCCCAGCTCGGCTCGAATA 3234
6019 6041 CCGAGCTGGGCCAGCCAGGCAAC 1466 GTTGCCTGGCTGGCCCAGCT 3235
6029 6051 CCAGCCAGGCAACCTTCTAGGTA 1467 TACCTAGAAGGTTGCCTGGC 3236
6033 6055 CCAGGCAACCTTCTAGGTAACGA 1468 TCGTTACCTAGAAGGTTGCC 3237
6041 6063 CCTTCTAGGTAACGACCACATCT 1469 AGATGTGGTCGTTACCTAGA 3238
6056 6078 CCACATCTACAACGTTATCGTCA 1470 TGACGATAACGTTGTAGATG 3239
6082 6104 CCCATGCATTTGTAATAATCTTC 1471 GAAGATTATTACAAATGCAT 3240
6083 6105 CCATGCATTTGTAATAATCTTCT 1472 AGAAGATTATTACAAATGCA 3241
6117 6139 CCCATCATAATCGGAGGCTTTGG 1473 CCAAAGCCTCCGATTATGAT 3242
6118 6140 CCATCATAATCGGAGGCTTTGGC 1474 GCCAAAGCCTCCGATTATGA 3243
6153 6175 CCCCTAATAATCGGTGCCCCCGA 1475 TCGGGGGCACCGATTATTAG 3244
6154 6176 CCCTAATAATCGGTGCCCCCGAT 1476 ATCGGGGGCACCGATTATTA 3245
6155 6177 CCTAATAATCGGTGCCCCCGATA 1477 TATCGGGGGCACCGATTATT 3246
6169 6191 CCCCCGATATGGCGTTTCCCCGC 1478 GCGGGGAAACGCCATATCGG 3247
6170 6192 CCCCGATATGGCGTTTCCCCGCA 1479 TGCGGGGAAACGCCATATCG 3248
6171 6193 CCCGATATGGCGTTTCCCCGCAT 1480 ATGCGGGGAAACGCCATATC 3249
6172 6194 CCGATATGGCGTTTCCCCGCATA 1481 TATGCGGGGAAACGCCATAT 3250
6186 6208 CCCCGCATAAACAACATAAGCTT 1482 AAGCTTATGTTGTTTATGCG 3251
6187 6209 CCCGCATAAACAACATAAGCTTC 1483 GAAGCTTATGTTGTTTATGC 3252
6188 6210 CCGCATAAACAACATAAGCTTCT 1484 AGAAGCTTATGTTGTTTATG 3253
6219 6241 CCTCCCTCTCTCCTACTCCTGCT 1485 AGCAGGAGTAGGAGAGAGGG 3254
6222 6244 CCCTCTCTCCTACTCCTGCTCGC 1486 GCGAGCAGGAGTAGGAGAGA 3255
6223 6245 CCTCTCTCCTACTCCTGCTCGCA 1487 TGCGAGCAGGAGTAGGAGAG 3256
6230 6252 CCTACTCCTGCTCGCATCTGCTA 1488 TAGCAGATGCGAGCAGGAGT 3257
6236 6258 CCTGCTCGCATCTGCTATAGTGG 1489 CCACTATAGCAGATGCGAGC 3258
6262 6284 CCGGAGCAGGAACAGGTTGAACA 1490 TGTTCAACCTGTTCCTGCTC 3259
6290 6312 CCCTCCCTTAGCAGGGAACTACT 1491 AGTAGTTCCCTGCTAAGGGA 3260
6291 6313 CCTCCCTTAGCAGGGAACTACTC 1492 GAGTAGTTCCCTGCTAAGGG 3261
6294 6316 CCCTTAGCAGGGAACTACTCCCA 1493 TGGGAGTAGTTCCCTGCTAA 3262
6295 6317 CCTTAGCAGGGAACTACTCCCAC 1494 GTGGGAGTAGTTCCCTGCTA 3263
6313 6335 CCCACCCTGGAGCCTCCGTAGAC 1495 GTCTACGGAGGCTCCAGGGT 3264
6314 6336 CCACCCTGGAGCCTCCGTAGACC 1496 GGTCTACGGAGGCTCCAGGG 3265






 6317
 6339
CCCTGGAGCCTCCGTAG
1497
TTAGGTCTACGGAGGC
3266




ACCTAA

TCCA






 6318
 6340
CCTGGAGCCTCCGTAGA
1498
GTTAGGTCTACGGAGG
3267




CCTAAC

CTCC






 6325
 6347
CCTCCGTAGACCTAACC
1499
GAAGATGGTTAGGTCT
3268




ATCTTC

ACGG






 6328
 6350
CCGTAGACCTAACCATC
1500
GGAGAAGATGGTTAG
3269




TTCTCC

GTCTA






 6335
 6357
CCTAACCATCTTCTCCTT
1501
GGTGTAAGGAGAAGA
3270




ACACC

TGGTT






 6340
 6362
CCATCTTCTCCTTACAC
1502
TGCTAGGTGTAAGGAG
3271




CTAGCA

AAGA






 6349
 6371
CCTTACACCTAGCAGGT
1503
GGAGACACCTGCTAGG
3272




GTCTCC

TGTA






 6356
 6378
CCTAGCAGGTGTCTCCT
1504
AGATAGAGGAGACAC
3273




CTATCT

CTGCT






 6370
 6392
CCTCTATCTTAGGGGCC
1505
ATTGATGGCCCCTAAG
3274




ATCAAT

ATAG






 6385
 6407
CCATCAATTTCATCACA
1506
AATTGTTGTGATGAAA
3275




ACAATT

TTGA






 6420
 6442
CCCCCTGCCATAACCCA
1507
TGGTATTGGGTTATGG
3276




ATACCA

CAGG






 6421
 6443
CCCCTGCCATAACCCAA
1508
TTGGTATTGGGTTATG
3277




TACCAA

GCAG






 6422
 6444
CCCTGCCATAACCCAAT
1509
TTTGGTATTGGGTTAT
3278




ACCAAA

GGCA






 6423
 6445
CCTGCCATAACCCAATA
1510
GTTTGGTATTGGGTTA
3279




CCAAAC

TGGC






 6427
 6449
CCATAACCCAATACCAA
1511
GGGCGTTTGGTATTGG
3280




ACGCCC

GTTA






 6433
 6455
CCCAATACCAAACGCCC
1512
GAAGAGGGGCGTTTG
3281




CTCTTC

GTATT






 6434
 6456
CCAATACCAAACGCCCC
1513
CGAAGAGGGGCGTTTG
3282




TCTTCG

GTAT






 6440
 6462
CCAAACGCCCCTCTTCG
1514
ATCAGACGAAGAGGG
3283




TCTGAT

GCGTT






 6447
 6469
CCCCTCTTCGTCTGATC
1515
AGGACGGATCAGACG
3284




CGTCCT

AAGAG






 6448
 6470
CCCTCTTCGTCTGATCC
1516
TAGGACGGATCAGAC
3285




GTCCTA

GAAGA






 6449
 6471
CCTCTTCGTCTGATCCG
1517
TTAGGACGGATCAGAC
3286




TCCTAA

GAAG






 6463
 6485
CCGTCCTAATCACAGCA
1518
TAGGACTGCTGTGATT
3287




GTCCTA

AGGA






 6467
 6489
CCTAATCACAGCAGTCC
1519
GAAGTAGGACTGCTGT
3288




TACTTC

GATT






 6482
 6504
CCTACTTCTCCTATCTCT
1520
CTGGGAGAGATAGGA
3289




CCCAG

GAAGT






 6491
 6513
CCTATCTCTCCCAGTCC
1521
CAGCTAGGACTGGGA
3290




TAGCTG

GAGAT






 6500
 6522
CCCAGTCCTAGCTGCTG
1522
TGATGCCAGCAGCTAG
3291




GCATCA

GACT






 6501
 6523
CCAGTCCTAGCTGCTGG
1523
GTGATGCCAGCAGCTA
3292




CATCAC

GGAC






 6506
 6528
CCTAGCTGCTGGCATCA
1524
GTATAGTGATGCCAGC
3293




CTATAC

AGCT






 6539
 6561
CCGCAACCTCAACACCA
1525
AGAAGGTGGTGTTGAG
3294




CCTTCT

GTTG






 6545
 6567
CCTCAACACCACCTTCT
1526
GGTCGAAGAAGGTGG
3295




TCGACC

TGTTG






 6553
 6575
CCACCTTCTTCGACCCC
1527
TCCGGCGGGGTCGAAG
3296




GCCGGA

AAGG






 6556
 6578
CCTTCTTCGACCCCGCC
1528
TCCTCCGGCGGGGTCG
3297




GGAGGA

AAGA






 6566
 6588
CCCCGCCGGAGGAGGA
1529
TGGGGTCTCCTCCTCC
3298




GACCCCA

GGCG






 6567
 6589
CCCGCCGGAGGAGGAG
1530
ATGGGGTCTCCTCCTC
3299




ACCCCAT

CGGC






 6568
 6590
CCGCCGGAGGAGGAGA
1531
AATGGGGTCTCCTCCT
3300




CCCCATT

CCGG






 6571
 6593
CCGGAGGAGGAGACCC
1532
TAGAATGGGGTCTCCT
3301




CATTCTA

CCTC






 6584
 6606
CCCCATTCTATACCAAC
1533
ATAGGTGTTGGTATAG
3302




ACCTAT

AATG






 6585
 6607
CCCATTCTATACCAACA
1534
AATAGGTGTTGGTATA
3303




CCTATT

GAAT






 6586
 6608
CCATTCTATACCAACAC
1535
GAATAGGTGTTGGTAT
3304




CTATTC

AGAA






 6596
 6618
CCAACACCTATTCTGAT
1536
CGAAAAATCAGAATA
3305




TTTTCG

GGTGT






 6602
 6624
CCTATTCTGATTTTTCGG
1537
GGTGACCGAAAAATC
3306




TCACC

AGAAT






 6623
 6645
CCCTGAAGTTTATATTC
1538
GGATAAGAATATAAA
3307




TTATCC

CTTCA






 6624
 6646
CCTGAAGTTTATATTCT
1539
AGGATAAGAATATAA
3308




TATCCT

ACTTC






 6644
 6666
CCTACCAGGCTTCGGAA
1540
AGATTATTCCGAAGCC
3309




TAATCT

TGGT






 6648
 6670
CCAGGCTTCGGAATAAT
1541
TGGGAGATTATTCCGA
3310




CTCCCA

AGCC






 6667
 6689
CCCATATTGTAACTTAC
1542
GGAGTAGTAAGTTACA
3311




TACTCC

ATAT






 6668
 6690
CCATATTGTAACTTACT
1543
CGGAGTAGTAAGTTAC
3312




ACTCCG

AATA






 6688
 6710
CCGGAAAAAAAGAACC
1544
TCCAAATGGTTCTTTTT
3313




ATTTGGA

TTC






 6702
 6724
CCATTTGGATACATAGG
1545
ACCATACCTATGTATC
3314




TATGGT

CAAA






 6749
 6771
CCTAGGGTTTATCGTGT
1546
GTGCTCACACGATAAA
3315




GAGCAC

CCCT






 6773
 6795
CCATATATTTACAGTAG
1547
CTATTCCTACTGTAAA
3316




GAATAG

TATA






 6820
 6842
CCTCCGCTACCATAATC
1548
AGCGATGATTATGGTA
3317




ATCGCT

GCGG






 6823
 6845
CCGCTACCATAATCATC
1549
GATAGCGATGATTATG
3318




GCTATC

GTAG






 6829
 6851
CCATAATCATCGCTATC
1550
GGTGGGGATAGCGAT
3319




CCCACC

GATTA






 6845
 6867
CCCCACCGGCGTCAAAG
1551
TAAATACTTTGACGCC
3320




TATTTA

GGTG






 6846
 6868
CCCACCGGCGTCAAAGT
1552
CTAAATACTTTGACGC
3321




ATTTAG

CGGT






 6847
 6869
CCACCGGCGTCAAAGTA
1553
GCTAAATACTTTGACG
3322




TTTAGC

CCGG






 6850
 6872
CCGGCGTCAAAGTATTT
1554
TCAGCTAAATACTTTG
3323




AGCTGA

ACGC






 6877
 6899
CCACACTCCACGGAAGC
1555
CATATTGCTTCCGTGG
3324




AATATG

AGTG






 6884
 6906
CCACGGAAGCAATATG
1556
ATCATTTCATATTGCTT
3325




AAATGAT

CCG






 6925
 6947
CCCTAGGATTCATCTTT
1557
GAAAAGAAAGATGAA
3326




CTTTTC

TCCTA






 6926
 6948
CCTAGGATTCATCTTTC
1558
TGAAAAGAAAGATGA
3327




TTTTCA

ATCCT






 6949
 6971
CCGTAGGTGGCCTGACT
1559
AATGCCAGTCAGGCCA
3328




GGCATT

CCTA






 6959
 6981
CCTGACTGGCATTGTAT
1560
TTGCTAATACAATGCC
3329




TAGCAA

AGTC






 7027
 7049
CCCACTTCCACTATGTC
1561
TGATAGGACATAGTGG
3330




CTATCA

AAGT






 7028
 7050
CCACTTCCACTATGTCC
1562
TTGATAGGACATAGTG
3331




TATCAA

GAAG






 7034
 7056
CCACTATGTCCTATCAA
1563
CTCCTATTGATAGGAC
3332




TAGGAG

ATAG






 7043
 7065
CCTATCAATAGGAGCTG
1564
CAAATACAGCTCCTAT
3333




TATTTG

TGAT






 7066
 7088
CCATCATAGGAGGCTTC
1565
GTGAATGAAGCCTCCT
3334




ATTCAC

ATGA






 7095
 7117
CCCCTATTCTCAGGCTA
1566
AGGGTGTAGCCTGAGA
3335




CACCCT

ATAG






 7096
 7118
CCCTATTCTCAGGCTAC
1567
TAGGGTGTAGCCTGAG
3336




ACCCTA

AATA






 7097
 7119
CCTATTCTCAGGCTACA
1568
CTAGGGTGTAGCCTGA
3337




CCCTAG

GAAT






 7114
 7136
CCCTAGACCAAACCTAC
1569
TTTGGCGTAGGTTTGG
3338




GCCAAA

TCTA






 7115
 7137
CCTAGACCAAACCTACG
1570
TTTTGGCGTAGGTTTG
3339




CCAAAA

GTCT






 7121
 7143
CCAAACCTACGCCAAAA
1571
AATGGATTTTGGCGTA
3340




TCCATT

GGTT






 7126
 7148
CCTACGCCAAAATCCAT
1572
AGTGAAATGGATTTTG
3341




TTCACT

GCGT






 7132
 7154
CCAAAATCCATTTCACT
1573
TATGATAGTGAAATGG
3342




ATCATA

ATTT






 7139
 7161
CCATTTCACTATCATAT
1574
CGATGAATATGATAGT
3343




TCATCG

GAAA






 7181
 7203
CCCACAACACTTTCTCG
1575
ATAGGCCGAGAAAGT
3344




GCCTAT

GTTGT






 7182
 7204
CCACAACACTTTCTCGG
1576
GATAGGCCGAGAAAG
3345




CCTATC

TGTTG






 7199
 7221
CCTATCCGGAATGCCCC
1577
AACGTCGGGGCATTCC
3346




GACGTT

GGAT






 7204
 7226
CCGGAATGCCCCGACGT
1578
CGAGTAACGTCGGGGC
3347




TACTCG

ATTC






 7212
 7234
CCCCGACGTTACTCGGA
1579
GGGTAGTCCGAGTAAC
3348




CTACCC

GTCG






 7213
 7235
CCCGACGTTACTCGGAC
1580
GGGGTAGTCCGAGTAA
3349




TACCCC

CGTC






 7214
 7236
CCGACGTTACTCGGACT
1581
CGGGGTAGTCCGAGTA
3350




ACCCCG

ACGT






 7232
 7254
CCCCGATGCATACACCA
1582
TTCATGTGGTGTATGC
3351




CATGAA

ATCG






 7233
 7255
CCCGATGCATACACCAC
1583
TTTCATGTGGTGTATG
3352




ATGAAA

CATC






 7234
 7256
CCGATGCATACACCACA
1584
GTTTCATGTGGTGTAT
3353




TGAAAC

GCAT






 7246
 7268
CCACATGAAACATCCTA
1585
AGATGATAGGATGTTT
3354




TCATCT

CATG






 7259
 7281
CCTATCATCTGTAGGCT
1586
TGAATGAGCCTACAGA
3355




CATTCA

TGAT






 7327
 7349
CCTTCGCTTCGAAGCGA
1587
GACTTTTCGCTTCGAA
3356




AAAGTC

GCGA






 7349
 7371
CCTAATAGTAGAAGAAC
1588
TGGAGGGTTCTTCTAC
3357




CCTCCA

TATT






 7365
 7387
CCCTCCATAAACCTGGA
1589
AGTCACTCCAGGTTTA
3358




GTGACT

TGGA






 7366
 7388
CCTCCATAAACCTGGAG
1590
TAGTCACTCCAGGTTT
3359




TGACTA

ATGG






 7369
 7391
CCATAAACCTGGAGTGA
1591
ATATAGTCACTCCAGG
3360




CTATAT

TTTA






 7376
 7398
CCTGGAGTGACTATATG
1592
GGCATCCATATAGTCA
3361




GATGCC

CTCC






 7397
 7419
CCCCCCACCCTACCACA
1593
CGAATGTGTGGTAGGG
3362




CATTCG

TGGG






 7398
 7420
CCCCCACCCTACCACAC
1594
TCGAATGTGTGGTAGG
3363




ATTCGA

GTGG






 7399
 7421
CCCCACCCTACCACACA
1595
TTCGAATGTGTGGTAG
3364




TTCGAA

GGTG






 7400
 7422
CCCACCCTACCACACAT
1596
CTTCGAATGTGTGGTA
3365




TCGAAG

GGGT






 7401
 7423
CCACCCTACCACACATT
1597
TCTTCGAATGTGTGGT
3366




CGAAGA

AGGG






 7404
 7426
CCCTACCACACATTCGA
1598
GGTTCTTCGAATGTGT
3367




AGAACC

GGTA






 7405
 7427
CCTACCACACATTCGAA
1599
GGGTTCTTCGAATGTG
3368




GAACCC

TGGT






 7409
 7431
CCACACATTCGAAGAAC
1600
ATACGGGTTCTTCGAA
3369




CCGTAT

TGTG






 7425
 7447
CCCGTATACATAAAATC
1601
TGTCTAGATTTTATGT
3370




TAGACA

ATAC






 7426
 7448
CCGTATACATAAAATCT
1602
TTGTCTAGATTTTATGT
3371




AGACAA

ATA






 7466
 7488
CCCCCCAAAGCTGGTTT
1603
GGCTTGAAACCAGCTT
3372




CAAGCC

TGGG






 7467
 7489
CCCCCAAAGCTGGTTTC
1604
TGGCTTGAAACCAGCT
3373




AAGCCA

TTGG






 7468
 7490
CCCCAAAGCTGGTTTCA
1605
TTGGCTTGAAACCAGC
3374




AGCCAA

TTTG






 7469
 7491
CCCAAAGCTGGTTTCAA
1606
GTTGGCTTGAAACCAG
3375




GCCAAC

CTTT






 7470
 7492
CCAAAGCTGGTTTCAAG
1607
GGTTGGCTTGAAACCA
3376




CCAACC

GCTT






 7487
 7509
CCAACCCCATGGCCTCC
1608
AGTCATGGAGGCCATG
3377




ATGACT

GGGT






 7491
 7513
CCCCATGGCCTCCATGA
1609
AAAAAGTCATGGAGG
3378




CTTTTT

CCATG






 7492
 7514
CCCATGGCCTCCATGAC
1610
GAAAAAGTCATGGAG
3379




TTTTTC

GCCAT






 7493
 7515
CCATGGCCTCCATGACT
1611
TGAAAAAGTCATGGA
3380




TTTTCA

GGCCA






 7499
 7521
CCTCCATGACTTTTTCA
1612
CCTTTTTGAAAAAGTC
3381




AAAAGG

ATGG






 7502
 7524
CCATGACTTTTTCAAAA
1613
ATACCTTTTTGAAAAA
3382




AGGTAT

GTCA






 7533
 7555
CCATTTCATAACTTTGT
1614
ACTTTGACAAAGTTAT
3383




CAAAGT

GAAA






 7573
 7595
CCTATATATCTTAATGG
1615
CATGTGCCATTAAGAT
3384




CACATG

ATAT






 7626
 7648
CCCCTATCATAGAAGAG
1616
GATAAGCTCTTCTATG
3385




CTTATC

ATAG






 7627
 7649
CCCTATCATAGAAGAGC
1617
TGATAAGCTCTTCTAT
3386




TTATCA

GATA






 7628
 7650
CCTATCATAGAAGAGCT
1618
GTGATAAGCTCTTCTA
3387




TATCAC

TGAT






 7650
 7672
CCTTTCATGATCACGCC
1619
TATGAGGGCGTGATCA
3388




CTCATA

TGAA






 7665
 7687
CCCTCATAATCATTTTC
1620
GATAAGGAAAATGATT
3389




CTTATC

ATGA






 7666
 7688
CCTCATAATCATTTTCCT
1621
AGATAAGGAAAATGA
3390




TATCT

TTATG






 7681
 7703
CCTTATCTGCTTCCTAGT
1622
ACAGGACTAGGAAGC
3391




CCTGT

AGATA






 7693
 7715
CCTAGTCCTGTATGCCC
1623
GGAAAAGGGCATACA
3392




TTTTCC

GGACT






 7699
 7721
CCTGTATGCCCTTTTCCT
1624
GTGTTAGGAAAAGGG
3393




AACAC

CATAC






 7707
 7729
CCCTTTTCCTAACACTC
1625
TGTTGTGAGTGTTAGG
3394




ACAACA

AAAA






 7708
 7730
CCTTTTCCTAACACTCA
1626
TTGTTGTGAGTGTTAG
3395




CAACAA

GAAA






 7714
 7736
CCTAACACTCACAACAA
1627
TTAGTTTTGTTGTGAG
3396




AACTAA

TGTT






 7773
 7795
CCGTCTGAACTATCCTG
1628
GGCGGGCAGGATAGTT
3397




CCCGCC

CAGA






 7786
 7808
CCTGCCCGCCATCATCC
1629
GGACTAGGATGATGGC
3398




TAGTCC

GGGC






 7790
 7812
CCCGCCATCATCCTAGT
1630
ATGAGGACTAGGATG
3399




CCTCAT

ATGGC






 7791
 7813
CCGCCATCATCCTAGTC
1631
GATGAGGACTAGGAT
3400




CTCATC

GATGG






 7794
 7816
CCATCATCCTAGTCCTC
1632
GGCGATGAGGACTAG
3401




ATCGCC

GATGA






 7801
 7823
CCTAGTCCTCATCGCCC
1633
ATGGGAGGGCGATGA
3402




TCCCAT

GGACT






 7807
 7829
CCTCATCGCCCTCCCAT
1634
GTAGGGATGGGAGGG
3403




CCCTAC

CGATG






 7815
 7837
CCCTCCCATCCCTACGC
1635
AAGGATGCGTAGGGA
3404




ATCCTT

TGGGA






 7816
 7838
CCTCCCATCCCTACGCA
1636
AAAGGATGCGTAGGG
3405




TCCTTT

ATGGG






 7819
 7841
CCCATCCCTACGCATCC
1637
TGTAAAGGATGCGTAG
3406




TTTACA

GGAT






 7820
 7842
CCATCCCTACGCATCCT
1638
ATGTAAAGGATGCGTA
3407




TTACAT

GGGA






 7824
 7846
CCCTACGCATCCTTTAC
1639
TGTTATGTAAAGGATG
3408




ATAACA

CGTA






 7825
 7847
CCTACGCATCCTTTACA
1640
CTGTTATGTAAAGGAT
3409




TAACAG

GCGT






 7834
 7856
CCTTTACATAACAGACG
1641
TGACCTCGTCTGTTAT
3410




AGGTCA

GTAA






 7862
 7884
CCCTCCCTTACCATCAA
1642
ATTGATTTGATGGTAA
3411




ATCAAT

GGGA






 7863
 7885
CCTCCCTTACCATCAAA
1643
AATTGATTTGATGGTA
3412




TCAATT

AGGG






 7866
 7888
CCCTTACCATCAAATCA
1644
GCCAATTGATTTGATG
3413




ATTGGC

GTAA






 7867
 7889
CCTTACCATCAAATCAA
1645
GGCCAATTGATTTGAT
3414




TTGGCC

GGTA






 7872
 7894
CCATCAAATCAATTGGC
1646
TTGGTGGCCAATTGAT
3415




CACCAA

TTGA






 7888
 7910
CCACCAATGGTACTGAA
1647
CGTAGGTTCAGTACCA
3416




CCTACG

TTGG






 7891
 7913
CCAATGGTACTGAACCT
1648
ACTCGTAGGTTCAGTA
3417




ACGAGT

CCAT






 7905
 7927
CCTACGAGTACACCGAC
1649
GCCGTAGTCGGTGTAC
3418




TACGGC

TCGT






 7917
 7939
CCGACTACGGCGGACTA
1650
GAAGATTAGTCCGCCG
3419




ATCTTC

TAGT






 7944
 7966
CCTACATACTTCCCCCA
1651
GAATAATGGGGGAAG
3420




TTATTC

TATGT






 7955
 7977
CCCCCATTATTCCTAGA
1652
CCTGGTTCTAGGAATA
3421




ACCAGG

ATGG






 7956
 7978
CCCCATTATTCCTAGAA
1653
GCCTGGTTCTAGGAAT
3422




CCAGGC

AATG






 7957
 7979
CCCATTATTCCTAGAAC
1654
CGCCTGGTTCTAGGAA
3423




CAGGCG

TAAT






 7958
 7980
CCATTATTCCTAGAACC
1655
TCGCCTGGTTCTAGGA
3424




AGGCGA

ATAA






 7966
 7988
CCTAGAACCAGGCGACC
1656
GTCGCAGGTCGCCTGG
3425




TGCGAC

TTCT






 7973
 7995
CCAGGCGACCTGCGACT
1657
TCAAGGAGTCGCAGGT
3426




CCTTGA

CGCC






 7981
 8003
CCTGCGACTCCTTGACG
1658
TGTCAACGTCAAGGAG
3427




TTGACA

TCGC






 7990
 8012
CCTTGACGTTGACAATC
1659
CTACTCGATTGTCAAC
3428




GAGTAG

GTCA






 8017
 8039
CCCGATTGAAGCCCCCA
1660
TACGAATGGGGGCTTC
3429




TTCGTA

AATC






 8018
 8040
CCGATTGAAGCCCCCAT
1661
ATACGAATGGGGGCTT
3430




TCGTAT

CAAT






 8028
 8050
CCCCCATTCGTATAATA
1662
TGTAATTATTATACGA
3431




ATTACA

ATGG






 8029
 8051
CCCCATTCGTATAATAA
1663
ATGTAATTATTATACG
3432




TTACAT

AATG






 8030
 8052
CCCATTCGTATAATAAT
1664
GATGTAATTATTATAC
3433




TACATC

GAAT






 8031
 8053
CCATTCGTATAATAATT
1665
TGATGTAATTATTATA
3434




ACATCA

CGAA






 8080
 8102
CCCCACATTAGGCTTAA
1666
CTGTTTTTAAGCCTAA
3435




AAACAG

TGTG






 8081
 8103
CCCACATTAGGCTTAAA
1667
TCTGTTTTTAAGCCTA
3436




AACAGA

ATGT






 8082
 8104
CCACATTAGGCTTAAAA
1668
ATCTGTTTTTAAGCCT
3437




ACAGAT

AATG






 8111
 8133
CCCGGACGTCTAAACCA
1669
GTGGTTTGGTTTAGAC
3438




AACCAC

GTCC






 8112
 8134
CCGGACGTCTAAACCAA
1670
AGTGGTTTGGTTTAGA
3439




ACCACT

CGTC






 8125
 8147
CCAAACCACTTTCACCG
1671
GTGTAGCGGTGAAAGT
3440




CTACAC

GGTT






 8130
 8152
CCACTTTCACCGCTACA
1672
CGGTCGTGTAGCGGTG
3441




CGACCG

AAAG






 8139
 8161
CCGCTACACGACCGGGG
1673
GTATACCCCCGGTCGT
3442




GTATAC

GTAG






 8150
 8172
CCGGGGGTATACTACGG
1674
CATTGACCGTAGTATA
3443




TCAATG

CCCC






 8194
 8216
CCACAGTTTCATGCCCA
1675
GGACGATGGGCATGA
3444




TCGTCC

AACTG






 8207
 8229
CCCATCGTCCTAGAATT
1676
GGAATTAATTCTAGGA
3445




AATTCC

CGAT






 8208
 8230
CCATCGTCCTAGAATTA
1677
GGGAATTAATTCTAGG
3446




ATTCCC

ACGA






 8215
 8237
CCTAGAATTAATTCCCC
1678
TTTTTAGGGGAATTAA
3447




TAAAAA

TTCT






 8228
 8250
CCCCTAAAAATCTTTGA
1679
CCTATTTCAAAGATTT
3448




AATAGG

TTAG






 8229
 8251
CCCTAAAAATCTTTGAA
1680
CCCTATTTCAAAGATT
3449




ATAGGG

TTTA






 8230
 8252
CCTAAAAATCTTTGAAA
1681
GCCCTATTTCAAAGAT
3450




TAGGGC

TTTT






 8252
 8274
CCCGTATTTACCCTATA
1682
GGGTGCTATAGGGTAA
3451




GCACCC

ATAC






 8253
 8275
CCGTATTTACCCTATAG
1683
GGGGTGCTATAGGGTA
3452




CACCCC

AATA






 8262
 8284
CCCTATAGCACCCCCTC
1684
GGGGTAGAGGGGGTG
3453




TACCCC

CTATA






 8263
 8285
CCTATAGCACCCCCTCT
1685
GGGGGTAGAGGGGGT
3454




ACCCCC

GCTAT






 8272
 8294
CCCCCTCTACCCCCTCT
1686
GGCTCTAGAGGGGGTA
3455




AGAGCC

GAGG






 8273
 8295
CCCCTCTACCCCCTCTA
1687
GGGCTCTAGAGGGGGT
3456




GAGCCC

AGAG






 8274
 8296
CCCTCTACCCCCTCTAG
1688
TGGGCTCTAGAGGGGG
3457




AGCCCA

TAGA






 8275
 8297
CCTCTACCCCCTCTAGA
1689
GTGGGCTCTAGAGGGG
3458




GCCCAC

GTAG






 8281
 8303
CCCCCTCTAGAGCCCAC
1690
TTTACAGTGGGCTCTA
3459




TGTAAA

GAGG






 8282
 8304
CCCCTCTAGAGCCCACT
1691
CTTTACAGTGGGCTCT
3460




GTAAAG

AGAG






 8283
 8305
CCCTCTAGAGCCCACTG
1692
GCTTTACAGTGGGCTC
3461




TAAAGC

TAGA






 8284
 8306
CCTCTAGAGCCCACTGT
1693
AGCTTTACAGTGGGCT
3462




AAAGCT

CTAG






 8293
 8315
CCCACTGTAAAGCTAAC
1694
TGCTAAGTTAGCTTTA
3463




TTAGCA

CAGT






 8294
 8316
CCACTGTAAAGCTAACT
1695
ATGCTAAGTTAGCTTT
3464




TAGCAT

ACAG






 8320
 8342
CCTTTTAAGTTAAAGAT
1696
CTCTTAATCTTTAACTT
3465




TAAGAG

AAA






 8345
 8367
CCAACACCTCTTTACAG
1697
ATTTCACTGTAAAGAG
3466




TGAAAT

GTGT






 8351
 8373
CCTCTTTACAGTGAAAT
1698
TGGGGCATTTCACTGT
3467




GCCCCA

AAAG






 8369
 8391
CCCCAACTAAATACTAC
1699
CATACGGTAGTATTTA
3468




CGTATG

GTTG






 8370
 8392
CCCAACTAAATACTACC
1700
CCATACGGTAGTATTT
3469




GTATGG

AGTT






 8371
 8393
CCAACTAAATACTACCG
1701
GCCATACGGTAGTATT
3470




TATGGC

TAGT






 8385
 8407
CCGTATGGCCCACCATA
1702
GGTAATTATGGTGGGC
3471




ATTACC

CATA






 8393
 8415
CCCACCATAATTACCCC
1703
AGTATGGGGGTAATTA
3472




CATACT

TGGT






 8394
 8416
CCACCATAATTACCCCC
1704
GAGTATGGGGGTAATT
3473




ATACTC

ATGG






 8397
 8419
CCATAATTACCCCCATA
1705
AAGGAGTATGGGGGT
3474




CTCCTT

AATTA






 8406
 8428
CCCCCATACTCCTTACA
1706
GAATAGTGTAAGGAGT
3475




CTATTC

ATGG






 8407
 8429
CCCCATACTCCTTACAC
1707
GGAATAGTGTAAGGA
3476




TATTCC

GTATG






 8408
 8430
CCCATACTCCTTACACT
1708
AGGAATAGTGTAAGG
3477




ATTCCT

AGTAT






 8409
 8431
CCATACTCCTTACACTA
1709
GAGGAATAGTGTAAG
3478




TTCCTC

GAGTA






 8416
 8438
CCTTACACTATTCCTCA
1710
GGGTGATGAGGAATA
3479




TCACCC

GTGTA






 8428
 8450
CCTCATCACCCAACTAA
1711
ATATTTTTAGTTGGGT
3480




AAATAT

GATG






 8436
 8458
CCCAACTAAAAATATTA
1712
TGTGTTTAATATTTTTA
3481




AACACA

GTT






 8437
 8459
CCAACTAAAAATATTAA
1713
TTGTGTTTAATATTTTT
3482




ACACAA

AGT






 8464
 8486
CCACCTACCTCCCTCAC
1714
GCTTTGGTGAGGGAGG
3483




CAAAGC

TAGG






 8467
 8489
CCTACCTCCCTCACCAA
1715
TGGGCTTTGGTGAGGG
3484




AGCCCA

AGGT






 8471
 8493
CCTCCCTCACCAAAGCC
1716
TTTATGGGCTTTGGTG
3485




CATAAA

AGGG






 8474
 8496
CCCTCACCAAAGCCCAT
1717
ATTTTTATGGGCTTTG
3486




AAAAAT

GTGA






 8475
 8497
CCTCACCAAAGCCCATA
1718
TATTTTTATGGGCTTTG
3487




AAAATA

GTG






 8480
 8502
CCAAAGCCCATAAAAAT
1719
TTTTTTATTTTTATGGG
3488




AAAAAA

CTT






 8486
 8508
CCCATAAAAATAAAAA
1720
TTATAATTTTTTATTTT
3489




ATTATAA

TAT






 8487
 8509
CCATAAAAATAAAAAA
1721
GTTATAATTTTTTATTT
3490




TTATAAC

TTA






 8513
 8535
CCCTGAGAACCAAAATG
1722
TTCGTTCATTTTGGTTC
3491




AACGAA

TCA






 8514
 8536
CCTGAGAACCAAAATG
1723
TTTCGTTCATTTTGGTT
3492




AACGAAA

CTC






 8522
 8544
CCAAAATGAACGAAAA
1724
GAACAGATTTTCGTTC
3493




TCTGTTC

ATTT






 8558
 8580
CCCCCACAATCCTAGGC
1725
GGGTAGGCCTAGGATT
3494




CTACCC

GTGG






 8559
 8581
CCCCACAATCCTAGGCC
1726
CGGGTAGGCCTAGGAT
3495




TACCCG

TGTG






 8560
 8582
CCCACAATCCTAGGCCT
1727
GCGGGTAGGCCTAGG
3496




ACCCGC

ATTGT






 8561
 8583
CCACAATCCTAGGCCTA
1728
GGCGGGTAGGCCTAG
3497




CCCGCC

GATTG






 8568
 8590
CCTAGGCCTACCCGCCG
1729
GTACTGCGGCGGGTAG
3498




CAGTAC

GCCT






 8574
 8596
CCTACCCGCCGCAGTAC
1730
TGATCAGTACTGCGGC
3499




TGATCA

GGGT






 8578
 8600
CCCGCCGCAGTACTGAT
1731
AGAATGATCAGTACTG
3500




CATTCT

CGGC






 8579
 8601
CCGCCGCAGTACTGATC
1732
TAGAATGATCAGTACT
3501




ATTCTA

GCGG






 8582
 8604
CCGCAGTACTGATCATT
1733
AAATAGAATGATCAGT
3502




CTATTT

ACTG






 8605
 8627
CCCCCTCTATTGATCCC
1734
GAGGTGGGGATCAAT
3503




CACCTC

AGAGG






 8606
 8628
CCCCTCTATTGATCCCC
1735
GGAGGTGGGGATCAA
3504




ACCTCC

TAGAG






 8607
 8629
CCCTCTATTGATCCCCA
1736
TGGAGGTGGGGATCA
3505




CCTCCA

ATAGA






 8608
 8630
CCTCTATTGATCCCCAC
1737
TTGGAGGTGGGGATCA
3506




CTCCAA

ATAG






 8619
 8641
CCCCACCTCCAAATATC
1738
TGATGAGATATTTGGA
3507




TCATCA

GGTG






 8620
 8642
CCCACCTCCAAATATCT
1739
TTGATGAGATATTTGG
3508




CATCAA

AGGT






 8621
 8643
CCACCTCCAAATATCTC
1740
GTTGATGAGATATTTG
3509




ATCAAC

GAGG






 8624
 8646
CCTCCAAATATCTCATC
1741
GTTGTTGATGAGATAT
3510




AACAAC

TTGG






 8627
 8649
CCAAATATCTCATCAAC
1742
TCGGTTGTTGATGAGA
3511




AACCGA

TATT






 8646
 8668
CCGACTAATCACCACCC
1743
ATTGTTGGGTGGTGAT
3512




AACAAT

TAGT






 8657
 8679
CCACCCAACAATGACTA
1744
TTTGATTAGTCATTGTT
3513




ATCAAA

GGG






 8660
 8682
CCCAACAATGACTAATC
1745
TAGTTTGATTAGTCAT
3514




AAACTA

TGTT






 8661
 8683
CCAACAATGACTAATCA
1746
TTAGTTTGATTAGTCA
3515




AACTAA

TTGT






 8684
 8706
CCTCAAAACAAATGATA
1747
TATGGTTATCATTTGTT
3516




ACCATA

TTG






 8702
 8724
CCATACACAACACTAAA
1748
TCGTCCTTTAGTGTTGT
3517




GGACGA

GTA






 8726
 8748
CCTGATCTCTTATACTA
1749
GGATACTAGTATAAGA
3518




GTATCC

GATC






 8747
 8769
CCTTAATCATTTTTATTG
1750
TGTGGCAATAAAAATG
3519




CCACA

ATTA






 8765
 8787
CCACAACTAACCTCCTC
1751
GAGTCCGAGGAGGTTA
3520




GGACTC

GTTG






 8775
 8797
CCTCCTCGGACTCCTGC
1752
AGTGAGGCAGGAGTC
3521




CTCACT

CGAGG






 8778
 8800
CCTCGGACTCCTGCCTC
1753
ATGAGTGAGGCAGGA
3522




ACTCAT

GTCCG






 8787
 8809
CCTGCCTCACTCATTTA
1754
TTGGTGTAAATGAGTG
3523




CACCAA

AGGC






 8791
 8813
CCTCACTCATTTACACC
1755
GTGGTTGGTGTAAATG
3524




AACCAC

AGTG






 8806
 8828
CCAACCACCCAACTATC
1756
TTTATAGATAGTTGGG
3525




TATAAA

TGGT






 8810
 8832
CCACCCAACTATCTATA
1757
TAGGTTTATAGATAGT
3526




AACCTA

TGGG






 8813
 8835
CCCAACTATCTATAAAC
1758
GGCTAGGTTTATAGAT
3527




CTAGCC

AGTT






 8814
 8836
CCAACTATCTATAAACC
1759
TGGCTAGGTTTATAGA
3528




TAGCCA

TAGT






 8829
 8851
CCTAGCCATGGCCATCC
1760
ATAAGGGGATGGCCAT
3529




CCTTAT

GGCT






 8834
 8856
CCATGGCCATCCCCTTA
1761
CGCTCATAAGGGGATG
3530




TGAGCG

GCCA






 8840
 8862
CCATCCCCTTATGAGCG
1762
TGTGCCCGCTCATAAG
3531




GGCACA

GGGA






 8844
 8866
CCCCTTATGAGCGGGCA
1763
TCACTGTGCCCGCTCA
3532




CAGTGA

TAAG






 8845
 8867
CCCTTATGAGCGGGCAC
1764
ATCACTGTGCCCGCTC
3533




AGTGAT

ATAA






 8846
 8868
CCTTATGAGCGGGCACA
1765
AATCACTGTGCCCGCT
3534




GTGATT

CATA






 8897
 8919
CCCTAGCCCACTTCTTA
1766
TTGTGGTAAGAAGTGG
3535




CCACAA

GCTA






 8898
 8920
CCTAGCCCACTTCTTAC
1767
CTTGTGGTAAGAAGTG
3536




CACAAG

GGCT






 8903
 8925
CCCACTTCTTACCACAA
1768
TGTGCCTTGTGGTAAG
3537




GGCACA

AAGT






 8904
 8926
CCACTTCTTACCACAAG
1769
GTGTGCCTTGTGGTAA
3538




GCACAC

GAAG






 8914
 8936
CCACAAGGCACACCTAC
1770
AGGGGTGTAGGTGTGC
3539




ACCCCT

CTTG






 8926
 8948
CCTACACCCCTTATCCC
1771
AGTATGGGGATAAGG
3540




CATACT

GGTGT






 8932
 8954
CCCCTTATCCCCATACT
1772
ATAACTAGTATGGGGA
3541




AGTTAT

TAAG






 8933
 8955
CCCTTATCCCCATACTA
1773
AATAACTAGTATGGGG
3542




GTTATT

ATAA






 8934
 8956
CCTTATCCCCATACTAG
1774
TAATAACTAGTATGGG
3543




TTATTA

GATA






 8940
 8962
CCCCATACTAGTTATTA
1775
TTTCGATAATAACTAG
3544




TCGAAA

TATG






 8941
 8963
CCCATACTAGTTATTAT
1776
GTTTCGATAATAACTA
3545




CGAAAC

GTAT






 8942
 8964
CCATACTAGTTATTATC
1777
GGTTTCGATAATAACT
3546




GAAACC

AGTA






 8963
 8985
CCATCAGCCTACTCATT
1778
TGGTTGAATGAGTAGG
3547




CAACCA

CTGA






 8970
 8992
CCTACTCATTCAACCAA
1779
GGGCTATTGGTTGAAT
3548




TAGCCC

GAGT






 8983
 9005
CCAATAGCCCTGGCCGT
1780
AGGCGTACGGCCAGG
3549




ACGCCT

GCTAT






 8990
 9012
CCCTGGCCGTACGCCTA
1781
AGCGGTTAGGCGTACG
3550




ACCGCT

GCCA






 8991
 9013
CCTGGCCGTACGCCTAA
1782
TAGCGGTTAGGCGTAC
3551




CCGCTA

GGCC






 8996
 9018
CCGTACGCCTAACCGCT
1783
AATGTTAGCGGTTAGG
3552




AACATT

CGTA






 9003
 9025
CCTAACCGCTAACATTA
1784
CTGCAGTAATGTTAGC
3553




CTGCAG

GGTT






 9008
 9030
CCGCTAACATTACTGCA
1785
GTGGCCTGCAGTAATG
3554




GGCCAC

TTAG






 9027
 9049
CCACCTACTCATGCACC
1786
CAATTAGGTGCATGAG
3555




TAATTG

TAGG






 9030
 9052
CCTACTCATGCACCTAA
1787
TTCCAATTAGGTGCAT
3556




TTGGAA

GAGT






 9042
 9064
CCTAATTGGAAGCGCCA
1788
CTAGGGTGGCGCTTCC
3557




CCCTAG

AATT






 9056
 9078
CCACCCTAGCAATATCA
1789
AATGGTTGATATTGCT
3558




ACCATT

AGGG






 9059
 9081
CCCTAGCAATATCAACC
1790
GTTAATGGTTGATATT
3559




ATTAAC

GCTA






 9060
 9082
CCTAGCAATATCAACCA
1791
GGTTAATGGTTGATAT
3560




TTAACC

TGCT






 9074
 9096
CCATTAACCTTCCCTCT
1792
AAGTGTAGAGGGAAG
3561




ACACTT

GTTAA






 9081
 9103
CCTTCCCTCTACACTTAT
1793
AGATGATAAGTGTAGA
3562




CATCT

GGGA






 9085
 9107
CCCTCTACACTTATCAT
1794
GTGAAGATGATAAGTG
3563




CTTCAC

TAGA






 9086
 9108
CCTCTACACTTATCATC
1795
TGTGAAGATGATAAGT
3564




TTCACA

GTAG






 9129
 9151
CCTAGAAATCGCTGTCG
1796
TTAAGGCGACAGCGAT
3565




CCTTAA

TTCT






 9146
 9168
CCTTAATCCAAGCCTAC
1797
GAAAACGTAGGCTTGG
3566




GTTTTC

ATTA






 9153
 9175
CCAAGCCTACGTTTTCA
1798
GAAGTGTGAAAACGT
3567




CACTTC

AGGCT






 9158
 9180
CCTACGTTTTCACACTT
1799
TACTAGAAGTGTGAAA
3568




CTAGTA

ACGT






 9183
 9205
CCTCTACCTGCACGACA
1800
ATGTGTTGTCGTGCAG
3569




ACACAT

GTAG






 9189
 9211
CCTGCACGACAACACAT
1801
GTCATTATGTGTTGTC
3570




AATGAC

GTGC






 9211
 9233
CCCACCAATCACATGCC
1802
ATGATAGGCATGTGAT
3571




TATCAT

TGGT






 9212
 9234
CCACCAATCACATGCCT
1803
TATGATAGGCATGTGA
3572




ATCATA

TTGG






 9215
 9237
CCAATCACATGCCTATC
1804
CTATATGATAGGCATG
3573




ATATAG

TGAT






 9226
 9248
CCTATCATATAGTAAAA
1805
GCTGGGTTTTACTATA
3574




CCCAGC

TGAT






 9243
 9265
CCCAGCCCATGACCCCT
1806
CCTGTTAGGGGTCATG
3575




AACAGG

GGCT






 9244
 9266
CCAGCCCATGACCCCTA
1807
CCCTGTTAGGGGTCAT
3576




ACAGGG

GGGC






 9248
 9270
CCCATGACCCCTAACAG
1808
GGGCCCCTGTTAGGGG
3577




GGGCCC

TCAT






 9249
 9271
CCATGACCCCTAACAGG
1809
AGGGCCCCTGTTAGGG
3578




GGCCCT

GTCA






 9255
 9277
CCCCTAACAGGGGCCCT
1810
GCTGAGAGGGCCCCTG
3579




CTCAGC

TTAG






 9256
 9278
CCCTAACAGGGGCCCTC
1811
GGCTGAGAGGGCCCCT
3580




TCAGCC

GTTA






 9257
 9279
CCTAACAGGGGCCCTCT
1812
GGGCTGAGAGGGCCC
3581




CAGCCC

CTGTT






 9268
 9290
CCCTCTCAGCCCTCCTA
1813
GGTCATTAGGAGGGCT
3582




ATGACC

GAGA






 9269
 9291
CCTCTCAGCCCTCCTAA
1814
AGGTCATTAGGAGGGC
3583




TGACCT

TGAG






 9277
 9299
CCCTCCTAATGACCTCC
1815
TAGGCCGGAGGTCATT
3584




GGCCTA

AGGA






 9278
 9300
CCTCCTAATGACCTCCG
1816
CTAGGCCGGAGGTCAT
3585




GCCTAG

TAGG






 9281
 9303
CCTAATGACCTCCGGCC
1817
TGGCTAGGCCGGAGGT
3586




TAGCCA

CATT






 9289
 9311
CCTCCGGCCTAGCCATG
1818
AAATCACATGGCTAGG
3587




TGATTT

CCGG






 9292
 9314
CCGGCCTAGCCATGTGA
1819
GTGAAATCACATGGCT
3588




TTTCAC

AGGC






 9296
 9318
CCTAGCCATGTGATTTC
1820
GGAAGTGAAATCACAT
3589




ACTTCC

GGCT






 9301
 9323
CCATGTGATTTCACTTC
1821
GGAGTGGAAGTGAAA
3590




CACTCC

TCACA






 9317
 9339
CCACTCCATAACGCTCC
1822
GTATGAGGAGCGTTAT
3591




TCATAC

GGAG






 9322
 9344
CCATAACGCTCCTCATA
1823
GCCTAGTATGAGGAGC
3592




CTAGGC

GTTA






 9332
 9354
CCTCATACTAGGCCTAC
1824
TGGTTAGTAGGCCTAG
3593




TAACCA

TATG






 9344
 9366
CCTACTAACCAACACAC
1825
TGGTTAGTGTGTTGGT
3594




TAACCA

TAGT






 9352
 9374
CCAACACACTAACCATA
1826
TTGGTATATGGTTAGT
3595




TACCAA

GTGT






 9364
 9386
CCATATACCAATGATGG
1827
ATCGCGCCATCATTGG
3596




CGCGAT

TATA






 9371
 9393
CCAATGATGGCGCGATG
1828
GTGTTACATCGCGCCA
3597




TAACAC

TCAT






 9407
 9429
CCAAGGCCACCACACAC
1829
CAGGTGGTGTGTGGTG
3598




CACCTG

GCCT






 9413
 9435
CCACCACACACCACCTG
1830
TTTGGACAGGTGGTGT
3599




TCCAAA

GTGG






 9416
 9438
CCACACACCACCTGTCC
1831
CTTTTTGGACAGGTGG
3600




AAAAAG

TGTG






 9423
 9445
CCACCTGTCCAAAAAGG
1832
CGAAGGCCTTTTTGGA
3601




CCTTCG

CAGG






 9426
 9448
CCTGTCCAAAAAGGCCT
1833
TATCGAAGGCCTTTTT
3602




TCGATA

GGAC






 9431
 9453
CCAAAAAGGCCTTCGAT
1834
TCCCGTATCGAAGGCC
3603




ACGGGA

TTTT






 9440
 9462
CCTTCGATACGGGATAA
1835
ATAGGATTATCCCGTA
3604




TCCTAT

TCGA






 9458
 9480
CCTATTTATTACCTCAG
1836
AAACTTCTGAGGTAAT
3605




AAGTTT

AAAT






 9469
 9491
CCTCAGAAGTTTTTTTCT
1837
TGCGAAGAAAAAAAC
3606




TCGCA

TTCTG






 9505
 9527
CCTTTTACCACTCCAGC
1838
GGCTAGGCTGGAGTGG
3607




CTAGCC

TAAA






 9512
 9534
CCACTCCAGCCTAGCCC
1839
GGGTAGGGGCTAGGCT
3608




CTACCC

GGAG






 9517
 9539
CCAGCCTAGCCCCTACC
1840
TTGGGGGGTAGGGGCT
3609




CCCCAA

AGGC






 9521
 9543
CCTAGCCCCTACCCCCC
1841
CTAATTGGGGGGTAGG
3610




AATTAG

GGCT






 9526
 9548
CCCCTACCCCCCAATTA
1842
CCCTCCTAATTGGGGG
3611




GGAGGG

GTAG






 9527
 9549
CCCTACCCCCCAATTAG
1843
GCCCTCCTAATTGGGG
3612




GAGGGC

GGTA






 9528
 9550
CCTACCCCCCAATTAGG
1844
TGCCCTCCTAATTGGG
3613




AGGGCA

GGGT






 9532
 9554
CCCCCCAATTAGGAGGG
1845
CCAGTGCCCTCCTAAT
3614




CACTGG

TGGG






 9533
 9555
CCCCCAATTAGGAGGGC
1846
GCCAGTGCCCTCCTAA
3615




ACTGGC

TTGG






 9534
 9556
CCCCAATTAGGAGGGCA
1847
GGCCAGTGCCCTCCTA
3616




CTGGCC

ATTG






 9535
 9557
CCCAATTAGGAGGGCAC
1848
GGGCCAGTGCCCTCCT
3617




TGGCCC

AATT






 9536
 9558
CCAATTAGGAGGGCACT
1849
GGGGCCAGTGCCCTCC
3618




GGCCCC

TAAT






 9555
 9577
CCCCCAACAGGCATCAC
1850
AGCGGGGTGATGCCTG
3619




CCCGCT

TTGG






 9556
 9578
CCCCAACAGGCATCACC
1851
TAGCGGGGTGATGCCT
3620




CCGCTA

GTTG






 9557
 9579
CCCAACAGGCATCACCC
1852
TTAGCGGGGTGATGCC
3621




CGCTAA

TGTT






 9558
 9580
CCAACAGGCATCACCCC
1853
TTTAGCGGGGTGATGC
3622




GCTAAA

CTGT






 9571
 9593
CCCCGCTAAATCCCCTA
1854
GACTTCTAGGGGATTT
3623




GAAGTC

AGCG






 9572
 9594
CCCGCTAAATCCCCTAG
1855
GGACTTCTAGGGGATT
3624




AAGTCC

TAGC






 9573
 9595
CCGCTAAATCCCCTAGA
1856
GGGACTTCTAGGGGAT
3625




AGTCCC

TTAG






 9582
 9604
CCCCTAGAAGTCCCACT
1857
TTTAGGAGTGGGACTT
3626




CCTAAA

CTAG






 9583
 9605
CCCTAGAAGTCCCACTC
1858
GTTTAGGAGTGGGACT
3627




CTAAAC

TCTA






 9584
 9606
CCTAGAAGTCCCACTCC
1859
TGTTTAGGAGTGGGAC
3628




TAAACA

TTCT






 9593
 9615
CCCACTCCTAAACACAT
1860
ATACGGATGTGTTTAG
3629




CCGTAT

GAGT






 9594
 9616
CCACTCCTAAACACATC
1861
AATACGGATGTGTTTA
3630




CGTATT

GGAG






 9599
 9621
CCTAAACACATCCGTAT
1862
CGAGTAATACGGATGT
3631




TACTCG

GTTT






 9610
 9632
CCGTATTACTCGCATCA
1863
TACTCCTGATGCGAGT
3632




GGAGTA

AATA






 9640
 9662
CCTGAGCTCACCATAGT
1864
TATTAGACTATGGTGA
3633




CTAATA

GCTC






 9650
 9672
CCATAGTCTAATAGAAA
1865
GGTTGTTTTCTATTAG
3634




ACAACC

ACTA






 9671
 9693
CCGAAACCAAATAATTC
1866
GTGCTTGAATTATTTG
3635




AAGCAC

GTTT






 9677
 9699
CCAAATAATTCAAGCAC
1867
TAAGCAGTGCTTGAAT
3636




TGCTTA

TATT






 9727
 9749
CCCTCCTACAAGCCTCA
1868
GTACTCTGAGGCTTGT
3637




GAGTAC

AGGA






 9728
 9750
CCTCCTACAAGCCTCAG
1869
AGTACTCTGAGGCTTG
3638




AGTACT

TAGG






 9731
 9753
CCTACAAGCCTCAGAGT
1870
CGAAGTACTCTGAGGC
3639




ACTTCG

TTGT






 9739
 9761
CCTCAGAGTACTTCGAG
1871
GGGAGACTCGAAGTA
3640




TCTCCC

CTCTG






 9759
 9781
CCCTTCACCATTTCCGA
1872
ATGCCGTCGGAAATGG
3641




CGGCAT

TGAA






 9760
 9782
CCTTCACCATTTCCGAC
1873
GATGCCGTCGGAAATG
3642




GGCATC

GTGA






 9766
 9788
CCATTTCCGACGGCATC
1874
GCCGTAGATGCCGTCG
3643




TACGGC

GAAA






 9772
 9794
CCGACGGCATCTACGGC
1875
TGTTGAGCCGTAGATG
3644




TCAACA

CCGT






 9805
 9827
CCACAGGCTTCCACGGA
1876
GTGAAGTCCGTGGAAG
3645




CTTCAC

CCTG






 9815
 9837
CCACGGACTTCACGTCA
1877
CAATAATGACGTGAAG
3646




TTATTG

TCCG






 9848
 9870
CCTCACTATCTGCTTCA
1878
GGCGGATGAAGCAGA
3647




TCCGCC

TAGTG






 9866
 9888
CCGCCAACTAATATTTC
1879
TAAAGTGAAATATTAG
3648




ACTTTA

TTGG






 9869
 9891
CCAACTAATATTTCACT
1880
ATGTAAAGTGAAATAT
3649




TTACAT

TAGT






 9892
 9914
CCAAACATCACTTTGGC
1881
TTCGAAGCCAAAGTGA
3650




TTCGAA

TGTT






 9916
 9938
CCGCCGCCTGATACTGG
1882
AAAATGCCAGTATCAG
3651




CATTTT

GCGG






 9919
 9941
CCGCCTGATACTGGCAT
1883
TACAAAATGCCAGTAT
3652




TTTGTA

CAGG






 9922
 9944
CCTGATACTGGCATTTT
1884
ATCTACAAAATGCCAG
3653




GTAGAT

TATC






 9970
 9992
CCATCTATTGATGAGGG
1885
GTAAGACCCTCATCAA
3654




TCTTAC

TAGA






10012
10034
CCGTTAACTTCCAATTA
1886
ACTAGTTAATTGGAAG
3655




ACTAGT

TTAA






10022
10044
CCAATTAACTAGTTTTG
1887
TGTTGTCAAAACTAGT
3656




ACAACA

TAAT






10069
10091
CCTTAATTTTAATAATC
1888
GGTGTTGATTATTAAA
3657




AACACC

ATTA






10090
10112
CCCTCCTAGCCTTACTA
1889
TATTAGTAGTAAGGCT
3658




CTAATA

AGGA






10091
10113
CCTCCTAGCCTTACTAC
1890
TTATTAGTAGTAAGGC
3659




TAATAA

TAGG






10094
10116
CCTAGCCTTACTACTAA
1891
TAATTATTAGTAGTAA
3660




TAATTA

GGCT






10099
10121
CCTTACTACTAATAATT
1892
TGTAATAATTATTAGT
3661




ATTACA

AGTA






10131
10153
CCACAACTCAACGGCTA
1893
TCTATGTAGCCGTTGA
3662




CATAGA

GTTG






10159
10181
CCACCCCTTACGAGTGC
1894
GAAGCCGCACTCGTAA
3663




GGCTTC

GGGG






10162
10184
CCCCTTACGAGTGCGGC
1895
GTCGAAGCCGCACTCG
3664




TTCGAC

TAAG






10163
10185
CCCTTACGAGTGCGGCT
1896
GGTCGAAGCCGCACTC
3665




TCGACC

GTAA






10164
10186
CCTTACGAGTGCGGCTT
1897
GGGTCGAAGCCGCACT
3666




CGACCC

CGTA






10184
10206
CCCTATATCCCCCGCCC
1898
GGACGCGGGCGGGGG
3667




GCGTCC

ATATA






10185
10207
CCTATATCCCCCGCCCG
1899
GGGACGCGGGCGGGG
3668




CGTCCC

GATAT






10192
10214
CCCCCGCCCGCGTCCCT
1900
GGAGAAAGGGACGCG
3669




TTCTCC

GGCGG






10193
10215
CCCCGCCCGCGTCCCTT
1901
TGGAGAAAGGGACGC
3670




TCTCCA

GGGCG






10194
10216
CCCGCCCGCGTCCCTTT
1902
ATGGAGAAAGGGACG
3671




CTCCAT

CGGGC






10195
10217
CCGCCCGCGTCCCTTTC
1903
TATGGAGAAAGGGAC
3672




TCCATA

GCGGG






10198
10220
CCCGCGTCCCTTTCTCC
1904
TTTTATGGAGAAAGGG
3673




ATAAAA

ACGC






10199
10221
CCGCGTCCCTTTCTCCA
1905
ATTTTATGGAGAAAGG
3674




TAAAAT

GACG






10205
10227
CCCTTTCTCCATAAAAT
1906
AGAAGAATTTTATGGA
3675




TCTTCT

GAAA






10206
10228
CCTTTCTCCATAAAATT
1907
AAGAAGAATTTTATGG
3676




CTTCTT

AGAA






10213
10235
CCATAAAATTCTTCTTA
1908
AGCTACTAAGAAGAAT
3677




GTAGCT

TTTA






10240
10262
CCTTCTTATTATTTGATC
1909
TTCTAGATCAAATAAT
3678




TAGAA

AAGA






10267
10289
CCCTCCTTTTACCCCTAC
1910
TCATGGTAGGGGTAAA
3679




CATGA

AGGA






10268
10290
CCTCCTTTTACCCCTACC
1911
CTCATGGTAGGGGTAA
3680




ATGAG

AAGG






10271
10293
CCTTTTACCCCTACCAT
1912
GGGCTCATGGTAGGGG
3681




GAGCCC

TAAA






10278
10300
CCCCTACCATGAGCCCT
1913
GTTTGTAGGGCTCATG
3682




ACAAAC

GTAG






10279
10301
CCCTACCATGAGCCCTA
1914
TGTTTGTAGGGCTCAT
3683




CAAACA

GGTA






10280
10302
CCTACCATGAGCCCTAC
1915
TTGTTTGTAGGGCTCA
3684




AAACAA

TGGT






10284
10306
CCATGAGCCCTACAAAC
1916
TTAGTTGTTTGTAGGG
3685




AACTAA

CTCA






10291
10313
CCCTACAAACAACTAAC
1917
TGGCAGGTTAGTTGTT
3686




CTGCCA

TGTA






10292
10314
CCTACAAACAACTAACC
1918
GTGGCAGGTTAGTTGT
3687




TGCCAC

TTGT






10307
10329
CCTGCCACTAATAGTTA
1919
ATGACATAACTATTAG
3688




TGTCAT

TGGC






10311
10333
CCACTAATAGTTATGTC
1920
AGGGATGACATAACTA
3689




ATCCCT

TTAG






10330
10352
CCCTCTTATTAATCATC
1921
TAGGATGATGATTAAT
3690




ATCCTA

AAGA






10331
10353
CCTCTTATTAATCATCA
1922
CTAGGATGATGATTAA
3691




TCCTAG

TAAG






10349
10371
CCTAGCCCTAAGTCTGG
1923
CATAGGCCAGACTTAG
3692




CCTATG

GGCT






10354
10376
CCCTAAGTCTGGCCTAT
1924
TCACTCATAGGCCAGA
3693




GAGTGA

CTTA






10355
10377
CCTAAGTCTGGCCTATG
1925
GTCACTCATAGGCCAG
3694




AGTGAC

ACTT






10366
10388
CCTATGAGTGACTACAA
1926
TCCTTTTTGTAGTCACT
3695




AAAGGA

CAT






10399
10421
CCGAATTGGTATATAGT
1927
GTTTAAACTATATACC
3696




TTAAAC

AATT






10466
10488
CCAAATGCCCCTCATTT
1928
TTATGTAAATGAGGGG
3697




ACATAA

CATT






10473
10495
CCCCTCATTTACATAAA
1929
ATAATATTTATGTAAA
3698




TATTAT

TGAG






10474
10496
CCCTCATTTACATAAAT
1930
TATAATATTTATGTAA
3699




ATTATA

ATGA






10475
10497
CCTCATTTACATAAATA
1931
GTATAATATTTATGTA
3700




TTATAC

AATG






10507
10529
CCATCTCACTTCTAGGA
1932
TAGTATTCCTAGAAGT
3701




ATACTA

GAGA






10544
10566
CCTCATATCCTCCCTAC
1933
GGCATAGTAGGGAGG
3702




TATGCC

ATATG






10552
10574
CCTCCCTACTATGCCTA
1934
TCCTTCTAGGCATAGT
3703




GAAGGA

AGGG






10555
10577
CCCTACTATGCCTAGAA
1935
TATTCCTTCTAGGCAT
3704




GGAATA

AGTA






10556
10578
CCTACTATGCCTAGAAG
1936
TTATTCCTTCTAGGCA
3705




GAATAA

TAGT






10565
10587
CCTAGAAGGAATAATAC
1937
GCGATAGTATTATTCC
3706




TATCGC

TTCT






10612
10634
CCCTCAACACCCACTCC
1938
TAAGAGGGAGTGGGT
3707




CTCTTA

GTTGA






10613
10635
CCTCAACACCCACTCCC
1939
CTAAGAGGGAGTGGG
3708




TCTTAG

TGTTG






10621
10643
CCCACTCCCTCTTAGCC
1940
AATATTGGCTAAGAGG
3709




AATATT

GAGT






10622
10644
CCACTCCCTCTTAGCCA
1941
CAATATTGGCTAAGAG
3710




ATATTG

GGAG






10627
10649
CCCTCTTAGCCAATATT
1942
AGGCACAATATTGGCT
3711




GTGCCT

AAGA






10628
10650
CCTCTTAGCCAATATTG
1943
TAGGCACAATATTGGC
3712




TGCCTA

TAAG






10636
10658
CCAATATTGTGCCTATT
1944
TATGGCAATAGGCACA
3713




GCCATA

ATAT






10647
10669
CCTATTGCCATACTAGT
1945
GCAAAGACTAGTATGG
3714




CTTTGC

CAAT






10654
10676
CCATACTAGTCTTTGCC
1946
GCAGGCGGCAAAGAC
3715




GCCTGC

TAGTA






10669
10691
CCGCCTGCGAAGCAGCG
1947
GCCCACCGCTGCTTCG
3716




GTGGGC

CAGG






10672
10694
CCTGCGAAGCAGCGGTG
1948
TAGGCCCACCGCTGCT
3717




GGCCTA

TCGC






10691
10713
CCTAGCCCTACTAGTCT
1949
AGATTGAGACTAGTAG
3718




CAATCT

GGCT






10696
10718
CCCTACTAGTCTCAATC
1950
GTTGGAGATTGAGACT
3719




TCCAAC

AGTA






10697
10719
CCTACTAGTCTCAATCT
1951
TGTTGGAGATTGAGAC
3720




CCAACA

TAGT






10714
10736
CCAACACATATGGCCTA
1952
GTAGTCTAGGCCATAT
3721




GACTAC

GTGT






10727
10749
CCTAGACTACGTACATA
1953
TTAGGTTATGTACGTA
3722




ACCTAA

GTCT






10745
10767
CCTAAACCTACTCCAAT
1954
TTTAGCATTGGAGTAG
3723




GCTAAA

GTTT






10751
10773
CCTACTCCAATGCTAAA
1955
ATTAGTTTTAGCATTG
3724




ACTAAT

GAGT






10757
10779
CCAATGCTAAAACTAAT
1956
GGGACGATTAGTTTTA
3725




CGTCCC

GCAT






10777
10799
CCCAACAATTATATTAC
1957
GTGGTAGTAATATAAT
3726




TACCAC

TGTT






10778
10800
CCAACAATTATATTACT
1958
AGTGGTAGTAATATAA
3727




ACCACT

TTGT






10796
10818
CCACTGACATGACTTTC
1959
TTTTTGGAAAGTCATG
3728




CAAAAA

TCAG






10812
10834
CCAAAAAACACATAATT
1960
GATTCAAATTATGTGT
3729




TGAATC

TTTT






10842
10864
CCACCCACAGCCTAATT
1961
GCTAATAATTAGGCTG
3730




ATTAGC

TGGG






10845
10867
CCCACAGCCTAATTATT
1962
GATGCTAATAATTAGG
3731




AGCATC

CTGT






10846
10868
CCACAGCCTAATTATTA
1963
TGATGCTAATAATTAG
3732




GCATCA

GCTG






10852
10874
CCTAATTATTAGCATCA
1964
GAGGGATGATGCTAAT
3733




TCCCTC

AATT






10870
10892
CCCTCTACTATTTTTTAA
1965
TTTGGTTAAAAAATAG
3734




CCAAA

TAGA






10871
10893
CCTCTACTATTTTTTAAC
1966
ATTTGGTTAAAAAATA
3735




CAAAT

GTAG






10888
10910
CCAAATCAACAACAACC
1967
TAAATAGGTTGTTGTT
3736




TATTTA

GATT






10903
10925
CCTATTTAGCTGTTCCC
1968
AGGTTGGGGAACAGCT
3737




CAACCT

AAAT






10917
10939
CCCCAACCTTTTCCTCC
1969
GGGGTCGGAGGAAAA
3738




GACCCC

GGTTG






10918
10940
CCCAACCTTTTCCTCCG
1970
GGGGGTCGGAGGAAA
3739




ACCCCC

AGGTT






10919
10941
CCAACCTTTTCCTCCGA
1971
AGGGGGTCGGAGGAA
3740




CCCCCT

AAGGT






10923
10945
CCTTTTCCTCCGACCCC
1972
TGTTAGGGGGTCGGAG
3741




CTAACA

GAAA






10929
10951
CCTCCGACCCCCTAACA
1973
GGGGGTTGTTAGGGGG
3742




ACCCCC

TCGG






10932
10954
CCGACCCCCTAACAACC
1974
GAGGGGGGTTGTTAGG
3743




CCCCTC

GGGT






10936
10958
CCCCCTAACAACCCCCC
1975
TTAGGAGGGGGGTTGT
3744




TCCTAA

TAGG






10937
10959
CCCCTAACAACCCCCCT
1976
ATTAGGAGGGGGGTTG
3745




CCTAAT

TTAG






10938
10960
CCCTAACAACCCCCCTC
1977
TATTAGGAGGGGGGTT
3746




CTAATA

GTTA






10939
10961
CCTAACAACCCCCCTCC
1978
GTATTAGGAGGGGGGT
3747




TAATAC

TGTT






10947
10969
CCCCCCTCCTAATACTA
1979
GGTAGTTAGTATTAGG
3748




ACTACC

AGGG






10948
10970
CCCCCTCCTAATACTAA
1980
AGGTAGTTAGTATTAG
3749




CTACCT

GAGG






10949
10971
CCCCTCCTAATACTAAC
1981
CAGGTAGTTAGTATTA
3750




TACCTG

GGAG






10950
10972
CCCTCCTAATACTAACT
1982
TCAGGTAGTTAGTATT
3751




ACCTGA

AGGA






10951
10973
CCTCCTAATACTAACTA
1983
GTCAGGTAGTTAGTAT
3752




CCTGAC

TAGG






10954
10976
CCTAATACTAACTACCT
1984
GGAGTCAGGTAGTTAG
3753




GACTCC

TATT






10968
10990
CCTGACTCCTACCCCTC
1985
GATTGTGAGGGGTAGG
3754




ACAATC

AGTC






10975
10997
CCTACCCCTCACAATCA
1986
TTGCCATGATTGTGAG
3755




TGGCAA

GGGT






10979
11001
CCCCTCACAATCATGGC
1987
TGGCTTGCCATGATTG
3756




AAGCCA

TGAG






10980
11002
CCCTCACAATCATGGCA
1988
TTGGCTTGCCATGATT
3757




AGCCAA

GTGA






10981
11003
CCTCACAATCATGGCAA
1989
GTTGGCTTGCCATGAT
3758




GCCAAC

TGTG






10999
11021
CCAACGCCACTTATCCA
1990
GTTCACTGGATAAGTG
3759




GTGAAC

GCGT






11005
11027
CCACTTATCCAGTGAAC
1991
ATAGTGGTTCACTGGA
3760




CACTAT

TAAG






11013
11035
CCAGTGAACCACTATCA
1992
TTTTCGTGATAGTGGT
3761




CGAAAA

TCAC






11021
11043
CCACTATCACGAAAAAA
1993
TAGAGTTTTTTTCGTG
3762




ACTCTA

ATAG






11044
11066
CCTCTCTATACTAATCT
1994
GTAGGGAGATTAGTAT
3763




CCCTAC

AGAG






11061
11083
CCCTACAAATCTCCTTA
1995
TATAATTAAGGAGATT
3764




ATTATA

TGTA






11062
11084
CCTACAAATCTCCTTAA
1996
TTATAATTAAGGAGAT
3765




TTATAA

TTGT






11073
11095
CCTTAATTATAACATTC
1997
GGCTGTGAATGTTATA
3766




ACAGCC

ATTA






11094
11116
CCACAGAACTAATCATA
1998
ATAAAATATGATTAGT
3767




TTTTAT

TCTG






11130
11152
CCACACTTATCCCCACC
1999
AGCCAAGGTGGGGAT
3768




TTGGCT

AAGTG






11140
11162
CCCCACCTTGGCTATCA
2000
GGGTGATGATAGCCAA
3769




TCACCC

GGTG






11141
11163
CCCACCTTGGCTATCAT
2001
CGGGTGATGATAGCCA
3770




CACCCG

AGGT






11142
11164
CCACCTTGGCTATCATC
2002
TCGGGTGATGATAGCC
3771




ACCCGA

AAGG






11145
11167
CCTTGGCTATCATCACC
2003
TCATCGGGTGATGATA
3772




CGATGA

GCCA






11160
11182
CCCGATGAGGCAACCA
2004
TTCTGGCTGGTTGCCT
3773




GCCAGAA

CATC






11161
11183
CCGATGAGGCAACCAG
2005
GTTCTGGCTGGTTGCC
3774




CCAGAAC

TCAT






11173
11195
CCAGCCAGAACGCCTGA
2006
CTGCGTTCAGGCGTTC
3775




ACGCAG

TGGC






11177
11199
CCAGAACGCCTGAACGC
2007
GTGCCTGCGTTCAGGC
3776




AGGCAC

GTTC






11185
11207
CCTGAACGCAGGCACAT
2008
GGAAGTATGTGCCTGC
3777




ACTTCC

GTTC






11206
11228
CCTATTCTACACCCTAG
2009
AGCCTACTAGGGTGTA
3778




TAGGCT

GAAT






11217
11239
CCCTAGTAGGCTCCCTT
2010
TAGGGGAAGGGAGCC
3779




CCCCTA

TACTA






11218
11240
CCTAGTAGGCTCCCTTC
2011
GTAGGGGAAGGGAGC
3780




CCCTAC

CTACT






11229
11251
CCCTTCCCCTACTCATC
2012
TAGTGCGATGAGTAGG
3781




GCACTA

GGAA






11230
11252
CCTTCCCCTACTCATCG
2013
TTAGTGCGATGAGTAG
3782




CACTAA

GGGA






11234
11256
CCCCTACTCATCGCACT
2014
TAAATTAGTGCGATGA
3783




AATTTA

GTAG






11235
11257
CCCTACTCATCGCACTA
2015
GTAAATTAGTGCGATG
3784




ATTTAC

AGTA






11236
11258
CCTACTCATCGCACTAA
2016
TGTAAATTAGTGCGAT
3785




TTTACA

GAGT






11268
11290
CCCTAGGCTCACTAAAC
2017
TAGAATGTTTAGTGAG
3786




ATTCTA

CCTA






11269
11291
CCTAGGCTCACTAAACA
2018
GTAGAATGTTTAGTGA
3787




TTCTAC

GCCT






11307
11329
CCCAAGAACTATCAAAC
2019
TCAGGAGTTTGATAGT
3788




TCCTGA

TCTT






11308
11330
CCAAGAACTATCAAACT
2020
CTCAGGAGTTTGATAG
3789




CCTGAG

TTCT






11325
11347
CCTGAGCCAACAACTTA
2021
TCATATTAAGTTGTTG
3790




ATATGA

GCTC






11331
11353
CCAACAACTTAATATGA
2022
AGCTAGTCATATTAAG
3791




CTAGCT

TTGT






11381
11403
CCTCTTTACGGACTCCA
2023
CATAAGTGGAGTCCGT
3792




CTTATG

AAAG






11395
11417
CCACTTATGACTCCCTA
2024
GGGCTTTAGGGAGTCA
3793




AAGCCC

TAAG






11407
11429
CCCTAAAGCCCATGTCG
2025
GGGCTTCGACATGGGC
3794




AAGCCC

TTTA






11408
11430
CCTAAAGCCCATGTCGA
2026
GGGGCTTCGACATGGG
3795




AGCCCC

CTTT






11415
11437
CCCATGTCGAAGCCCCC
2027
AGCGATGGGGGCTTCG
3796




ATCGCT

ACAT






11416
11438
CCATGTCGAAGCCCCCA
2028
CAGCGATGGGGGCTTC
3797




TCGCTG

GACA






11427
11449
CCCCCATCGCTGGGTCA
2029
TACTATTGACCCAGCG
3798




ATAGTA

ATGG






11428
11450
CCCCATCGCTGGGTCAA
2030
GTACTATTGACCCAGC
3799




TAGTAC

GATG






11429
11451
CCCATCGCTGGGTCAAT
2031
AGTACTATTGACCCAG
3800




AGTACT

CGAT






11430
11452
CCATCGCTGGGTCAATA
2032
AAGTACTATTGACCCA
3801




GTACTT

GCGA






11454
11476
CCGCAGTACTCTTAAAA
2033
GCCTAGTTTTAAGAGT
3802




CTAGGC

ACTG






11494
11516
CCTCACACTCATTCTCA
2034
GGGGGTTGAGAATGA
3803




ACCCCC

GTGTG






11512
11534
CCCCCTGACAAAACACA
2035
AGGCTATGTGTTTTGT
3804




TAGCCT

CAGG






11513
11535
CCCCTGACAAAACACAT
2036
TAGGCTATGTGTTTTG
3805




AGCCTA

TCAG






11514
11536
CCCTGACAAAACACATA
2037
GTAGGCTATGTGTTTT
3806




GCCTAC

GTCA






11515
11537
CCTGACAAAACACATAG
2038
GGTAGGCTATGTGTTT
3807




CCTACC

TGTC






11532
11554
CCTACCCCTTCCTTGTA
2039
GGATAGTACAAGGAA
3808




CTATCC

GGGGT






11536
11558
CCCCTTCCTTGTACTATC
2040
ATAGGGATAGTACAA
3809




CCTAT

GGAAG






11537
11559
CCCTTCCTTGTACTATCC
2041
CATAGGGATAGTACAA
3810




CTATG

GGAA






11538
11560
CCTTCCTTGTACTATCCC
2042
TCATAGGGATAGTACA
3811




TATGA

AGGA






11542
11564
CCTTGTACTATCCCTAT
2043
TGCCTCATAGGGATAG
3812




GAGGCA

TACA






11553
11575
CCCTATGAGGCATAATT
2044
TGTTATAATTATGCCT
3813




ATAACA

CATA






11554
11576
CCTATGAGGCATAATTA
2045
TTGTTATAATTATGCC
3814




TAACAA

TCAT






11580
11602
CCATCTGCCTACGACAA
2046
GTCTGTTTGTCGTAGG
3815




ACAGAC

CAGA






11587
11609
CCTACGACAAACAGACC
2047
ATTTTAGGTCTGTTTGT
3816




TAAAAT

CGT






11602
11624
CCTAAAATCGCTCATTG
2048
AGTATGCAATGAGCGA
3817




CATACT

TTTT






11635
11657
CCACATAGCCCTCGTAG
2049
CTGTTACTACGAGGGC
3818




TAACAG

TATG






11643
11665
CCCTCGTAGTAACAGCC
2050
GAGAATGGCTGTTACT
3819




ATTCTC

ACGA






11644
11666
CCTCGTAGTAACAGCCA
2051
TGAGAATGGCTGTTAC
3820




TTCTCA

TACG






11658
11680
CCATTCTCATCCAAACC
2052
TCAGGGGGTTTGGATG
3821




CCCTGA

AGAA






11668
11690
CCAAACCCCCTGAAGCT
2053
CGGTGAAGCTTCAGGG
3822




TCACCG

GGTT






11673
11695
CCCCCTGAAGCTTCACC
2054
TGCGCCGGTGAAGCTT
3823




GGCGCA

CAGG






11674
11696
CCCCTGAAGCTTCACCG
2055
CTGCGCCGGTGAAGCT
3824




GCGCAG

TCAG






11675
11697
CCCTGAAGCTTCACCGG
2056
ACTGCGCCGGTGAAGC
3825




CGCAGT

TTCA






11676
11698
CCTGAAGCTTCACCGGC
2057
GACTGCGCCGGTGAAG
3826




GCAGTC

CTTC






11688
11710
CCGGCGCAGTCATTCTC
2058
GATTATGAGAATGACT
3827




ATAATC

GCGC






11712
11734
CCCACGGGCTTACATCC
2059
TAATGAGGATGTAAGC
3828




TCATTA

CCGT






11713
11735
CCACGGGCTTACATCCT
2060
GTAATGAGGATGTAAG
3829




CATTAC

CCCG






11727
11749
CCTCATTACTATTCTGC
2061
TGCTAGGCAGAATAGT
3830




CTAGCA

AATG






11743
11765
CCTAGCAAACTCAAACT
2062
GTTCGTAGTTTGAGTT
3831




ACGAAC

TGCT






11788
11810
CCTCTCTCAAGGACTTC
2063
GAGTTTGAAGTCCTTG
3832




AAACTC

AGAG






11815
11837
CCCACTAATAGCTTTTT
2064
GTCATCAAAAAGCTAT
3833




GATGAC

TAGT






11816
11838
CCACTAATAGCTTTTTG
2065
AGTCATCAAAAAGCTA
3834




ATGACT

TTAG






11848
11870
CCTCGCTAACCTCGCCT
2066
GGGGTAAGGCGAGGT
3835




TACCCC

TAGCG






11857
11879
CCTCGCCTTACCCCCCA
2067
TAATAGTGGGGGGTAA
3836




CTATTA

GGCG






11862
11884
CCTTACCCCCCACTATT
2068
TAGGTTAATAGTGGGG
3837




AACCTA

GGTA






11867
11889
CCCCCCACTATTAACCT
2069
CCCAGTAGGTTAATAG
3838




ACTGGG

TGGG






11868
11890
CCCCCACTATTAACCTA
2070
TCCCAGTAGGTTAATA
3839




CTGGGA

GTGG






11869
11891
CCCCACTATTAACCTAC
2071
CTCCCAGTAGGTTAAT
3840




TGGGAG

AGTG






11870
11892
CCCACTATTAACCTACT
2072
TCTCCCAGTAGGTTAA
3841




GGGAGA

TAGT






11871
11893
CCACTATTAACCTACTG
2073
TTCTCCCAGTAGGTTA
3842




GGAGAA

ATAG






11881
11903
CCTACTGGGAGAACTCT
2074
GCACAGAGAGTTCTCC
3843




CTGTGC

CAGT






11910
11932
CCACGTTCTCCTGATCA
2075
GATATTTGATCAGGAG
3844




AATATC

AACG






11919
11941
CCTGATCAAATATCACT
2076
TAGGAGAGTGATATTT
3845




CTCCTA

GATC






11938
11960
CCTACTTACAGGACTCA
2077
GTATGTTGAGTCCTGT
3846




ACATAC

AAGT






11970
11992
CCCTATACTCCCTCTAC
2078
AAATATGTAGAGGGA
3847




ATATTT

GTATA






11971
11993
CCTATACTCCCTCTACA
2079
TAAATATGTAGAGGGA
3848




TATTTA

GTAT






11979
12001
CCCTCTACATATTTACC
2080
TGTTGTGGTAAATATG
3849




ACAACA

TAGA






11980
12002
CCTCTACATATTTACCA
2081
GTGTTGTGGTAAATAT
3850




CAACAC

GTAG






11994
12016
CCACAACACAATGGGG
2082
GAGTGAGCCCCATTGT
3851




CTCACTC

GTTG






12018
12040
CCCACCACATTAACAAC
2083
TTTTATGTTGTTAATGT
3852




ATAAAA

GGT






12019
12041
CCACCACATTAACAACA
2084
GTTTTATGTTGTTAAT
3853




TAAAAC

GTGG






12022
12044
CCACATTAACAACATAA
2085
AGGGTTTTATGTTGTT
3854




AACCCT

AATG






12041
12063
CCCTCATTCACACGAGA
2086
GTGTTTTCTCGTGTGA
3855




AAACAC

ATGA






12042
12064
CCTCATTCACACGAGAA
2087
GGTGTTTTCTCGTGTG
3856




AACACC

AATG






12063
12085
CCCTCATGTTCATACAC
2088
GGATAGGTGTATGAAC
3857




CTATCC

ATGA






12064
12086
CCTCATGTTCATACACC
2089
GGGATAGGTGTATGAA
3858




TATCCC

CATG






12079
12101
CCTATCCCCCATTCTCCT
2090
ATAGGAGGAGAATGG
3859




CCTAT

GGGAT






12084
12106
CCCCCATTCTCCTCCTAT
2091
GAGGGATAGGAGGAG
3860




CCCTC

AATGG






12085
12107
CCCCATTCTCCTCCTATC
2092
TGAGGGATAGGAGGA
3861




CCTCA

GAATG






12086
12108
CCCATTCTCCTCCTATCC
2093
TTGAGGGATAGGAGG
3862




CTCAA

AGAAT






12087
12109
CCATTCTCCTCCTATCCC
2094
GTTGAGGGATAGGAG
3863




TCAAC

GAGAA






12094
12116
CCTCCTATCCCTCAACC
2095
TGTCGGGGTTGAGGGA
3864




CCGACA

TAGG






12097
12119
CCTATCCCTCAACCCCG
2096
TGATGTCGGGGTTGAG
3865




ACATCA

GGAT






12102
12124
CCCTCAACCCCGACATC
2097
GGTAATGATGTCGGGG
3866




ATTACC

TTGA






12103
12125
CCTCAACCCCGACATCA
2098
CGGTAATGATGTCGGG
3867




TTACCG

GTTG






12109
12131
CCCCGACATCATTACCG
2099
AAAACCCGGTAATGAT
3868




GGTTTT

GTCG






12110
12132
CCCGACATCATTACCGG
2100
GAAAACCCGGTAATG
3869




GTTTTC

ATGTC






12111
12133
CCGACATCATTACCGGG
2101
GGAAAACCCGGTAAT
3870




TTTTCC

GATGT






12123
12145
CCGGGTTTTCCTCTTGT
2102
ATATTTACAAGAGGAA
3871




AAATAT

AACC






12132
12154
CCTCTTGTAAATATAGT
2103
GGTTAAACTATATTTA
3872




TTAACC

CAAG






12153
12175
CCAAAACATCAGATTGT
2104
AGATTCACAATCTGAT
3873




GAATCT

GTTT






12194
12216
CCCCTTATTTACCGAGA
2105
GAGCTTTCTCGGTAAA
3874




AAGCTC

TAAG






12195
12217
CCCTTATTTACCGAGAA
2106
TGAGCTTTCTCGGTAA
3875




AGCTCA

ATAA






12196
12218
CCTTATTTACCGAGAAA
2107
GTGAGCTTTCTCGGTA
3876




GCTCAC

AATA






12205
12227
CCGAGAAAGCTCACAA
2108
GCAGTTCTTGTGAGCT
3877




GAACTGC

TTCT






12237
12259
CCCCCATGTCTAACAAC
2109
AGCCATGTTGTTAGAC
3878




ATGGCT

ATGG






12238
12260
CCCCATGTCTAACAACA
2110
AAGCCATGTTGTTAGA
3879




TGGCTT

CATG






12239
12261
CCCATGTCTAACAACAT
2111
AAAGCCATGTTGTTAG
3880




GGCTTT

ACAT






12240
12262
CCATGTCTAACAACATG
2112
GAAAGCCATGTTGTTA
3881




GCTTTC

GACA






12288
12310
CCATTGGTCTTAGGCCC
2113
TTTTTGGGGCCTAAGA
3882




CAAAAA

CCAA






12302
12324
CCCCAAAAATTTTGGTG
2114
GAGTTGCACCAAAATT
3883




CAACTC

TTTG






12303
12325
CCCAAAAATTTTGGTGC
2115
GGAGTTGCACCAAAAT
3884




AACTCC

TTTT






12304
12326
CCAAAAATTTTGGTGCA
2116
TGGAGTTGCACCAAAA
3885




ACTCCA

TTTT






12324
12346
CCAAATAAAAGTAATA
2117
GCATGGTTATTACTTT
3886




ACCATGC

TATT






12341
12363
CCATGCACACTACTATA
2118
GGTGGTTATAGTAGTG
3887




ACCACC

TGCA






12359
12381
CCACCCTAACCCTGACT
2119
TAGGGAAGTCAGGGTT
3888




TCCCTA

AGGG






12362
12384
CCCTAACCCTGACTTCC
2120
AATTAGGGAAGTCAG
3889




CTAATT

GGTTA






12363
12385
CCTAACCCTGACTTCCC
2121
GAATTAGGGAAGTCA
3890




TAATTC

GGGTT






12368
12390
CCCTGACTTCCCTAATT
2122
GGGGGGAATTAGGGA
3891




CCCCCC

AGTCA






12369
12391
CCTGACTTCCCTAATTC
2123
TGGGGGGAATTAGGG
3892




CCCCCA

AAGTC






12377
12399
CCCTAATTCCCCCCATC
2124
GGTAAGGATGGGGGG
3893




CTTACC

AATTA






12378
12400
CCTAATTCCCCCCATCC
2125
TGGTAAGGATGGGGG
3894




TTACCA

GAATT






12385
12407
CCCCCCATCCTTACCAC
2126
ACGAGGGTGGTAAGG
3895




CCTCGT

ATGGG






12386
12408
CCCCCATCCTTACCACC
2127
AACGAGGGTGGTAAG
3896




CTCGTT

GATGG






12387
12409
CCCCATCCTTACCACCC
2128
TAACGAGGGTGGTAA
3897




TCGTTA

GGATG






12388
12410
CCCATCCTTACCACCCT
2129
TTAACGAGGGTGGTAA
3898




CGTTAA

GGAT






12389
12411
CCATCCTTACCACCCTC
2130
GTTAACGAGGGTGGTA
3899




GTTAAC

AGGA






12393
12415
CCTTACCACCCTCGTTA
2131
TAGGGTTAACGAGGGT
3900




ACCCTA

GGTA






12398
12420
CCACCCTCGTTAACCCT
2132
TTTGTTAGGGTTAACG
3901




AACAAA

AGGG






12401
12423
CCCTCGTTAACCCTAAC
2133
TTTTTTGTTAGGGTTA
3902




AAAAAA

ACGA






12402
12424
CCTCGTTAACCCTAACA
2134
TTTTTTTGTTAGGGTTA
3903




AAAAAA

ACG






12411
12433
CCCTAACAAAAAAAACT
2135
GGTATGAGTTTTTTTT
3904




CATACC

GTTA






12412
12434
CCTAACAAAAAAAACTC
2136
GGGTATGAGTTTTTTT
3905




ATACCC

TGTT






12432
12454
CCCCCATTATGTAAAAT
2137
CAATGGATTTTACATA
3906




CCATTG

ATGG






12433
12455
CCCCATTATGTAAAATC
2138
ACAATGGATTTTACAT
3907




CATTGT

AATG






12434
12456
CCCATTATGTAAAATCC
2139
GACAATGGATTTTACA
3908




ATTGTC

TAAT






12435
12457
CCATTATGTAAAATCCA
2140
CGACAATGGATTTTAC
3909




TTGTCG

ATAA






12449
12471
CCATTGTCGCATCCACC
2141
AATAAAGGTGGATGC
3910




TTTATT

GACAA






12461
12483
CCACCTTTATTATCAGT
2142
GAAGAGACTGATAAT
3911




CTCTTC

AAAGG






12464
12486
CCTTTATTATCAGTCTCT
2143
GGGGAAGAGACTGAT
3912




TCCCC

AATAA






12483
12505
CCCCACAACAATATTCA
2144
GGCACATGAATATTGT
3913




TGTGCC

TGTG






12484
12506
CCCACAACAATATTCAT
2145
AGGCACATGAATATTG
3914




GTGCCT

TTGT






12485
12507
CCACAACAATATTCATG
2146
TAGGCACATGAATATT
3915




TGCCTA

GTTG






12504
12526
CCTAGACCAAGAAGTTA
2147
AGATAATAACTTCTTG
3916




TTATCT

GTCT






12510
12532
CCAAGAAGTTATTATCT
2148
AGTTCGAGATAATAAC
3917




CGAACT

TTCT






12542
12564
CCACAACCCAAACAACC
2149
GAGCTGGGTTGTTTGG
3918




CAGCTC

GTTG






12548
12570
CCCAAACAACCCAGCTC
2150
TAGGGAGAGCTGGGTT
3919




TCCCTA

GTTT






12549
12571
CCAAACAACCCAGCTCT
2151
TTAGGGAGAGCTGGGT
3920




CCCTAA

TGTT






12557
12579
CCCAGCTCTCCCTAAGC
2152
TTTGAAGCTTAGGGAG
3921




TTCAAA

AGCT






12558
12580
CCAGCTCTCCCTAAGCT
2153
GTTTGAAGCTTAGGGA
3922




TCAAAC

GAGC






12566
12588
CCCTAAGCTTCAAACTA
2154
GTAGTCTAGTTTGAAG
3923




GACTAC

CTTA






12567
12589
CCTAAGCTTCAAACTAG
2155
AGTAGTCTAGTTTGAA
3924




ACTACT

GCTT






12593
12615
CCATAATATTCATCCCT
2156
TGCTACAGGGATGAAT
3925




GTAGCA

ATTA






12606
12628
CCCTGTAGCATTGTTCG
2157
ATGTAACGAACAATGC
3926




TTACAT

TACA






12607
12629
CCTGTAGCATTGTTCGT
2158
CATGTAACGAACAATG
3927




TACATG

CTAC






12632
12654
CCATCATAGAATTCTCA
2159
TCACAGTGAGAATTCT
3928




CTGTGA

ATGA






12669
12691
CCCAAACATTAATCAGT
2160
TGAAGAACTGATTAAT
3929




TCTTCA

GTTT






12670
12692
CCAAACATTAATCAGTT
2161
TTGAAGAACTGATTAA
3930




CTTCAA

TGTT






12708
12730
CCTAATTACCATACTAA
2162
CTAAGATTAGTATGGT
3931




TCTTAG

AATT






12716
12738
CCATACTAATCTTAGTT
2163
AGCGGTAACTAAGATT
3932




ACCGCT

AGTA






12734
12756
CCGCTAACAACCTATTC
2164
CAGTTGGAATAGGTTG
3933




CAACTG

TTAG






12744
12766
CCTATTCCAACTGTTCA
2165
AGCCGATGAACAGTTG
3934




TCGGCT

GAAT






12750
12772
CCAACTGTTCATCGGCT
2166
CCTCTCAGCCGATGAA
3935




GAGAGG

CAGT






12788
12810
CCTTCTTGCTCATCAGTT
2167
TCATCAACTGATGAGC
3936




GATGA

AAGA






12815
12837
CCCGAGCAGATGCCAAC
2168
TGCTGTGTTGGCATCT
3937




ACAGCA

GCTC






12816
12838
CCGAGCAGATGCCAAC
2169
CTGCTGTGTTGGCATC
3938




ACAGCAG

TGCT






12827
12849
CCAACACAGCAGCCATT
2170
TGCTTGAATGGCTGCT
3939




CAAGCA

GTGT






12839
12861
CCATTCAAGCAATCCTA
2171
GTTGTATAGGATTGCT
3940




TACAAC

TGAA






12852
12874
CCTATACAACCGTATCG
2172
TATCGCCGATACGGTT
3941




GCGATA

GTAT






12861
12883
CCGTATCGGCGATATCG
2173
TGAAACCGATATCGCC
3942




GTTTCA

GATA






12885
12907
CCTCGCCTTAGCATGAT
2174
GGATAAATCATGCTAA
3943




TTATCC

GGCG






12890
12912
CCTTAGCATGATTTATC
2175
GTGTAGGATAAATCAT
3944




CTACAC

GCTA






12906
12928
CCTACACTCCAACTCAT
2176
GGTCTCATGAGTTGGA
3945




GAGACC

GTGT






12914
12936
CCAACTCATGAGACCCA
2177
TTGTTGTGGGTCTCAT
3946




CAACAA

GAGT






12927
12949
CCCACAACAAATAGCCC
2178
TTAGAAGGGCTATTTG
3947




TTCTAA

TTGT






12928
12950
CCACAACAAATAGCCCT
2179
TTTAGAAGGGCTATTT
3948




TCTAAA

GTTG






12941
12963
CCCTTCTAAACGCTAAT
2180
GCTTGGATTAGCGTTT
3949




CCAAGC

AGAA






12942
12964
CCTTCTAAACGCTAATC
2181
GGCTTGGATTAGCGTT
3950




CAAGCC

TAGA






12958
12980
CCAAGCCTCACCCCACT
2182
CCTAGTAGTGGGGTGA
3951




ACTAGG

GGCT






12963
12985
CCTCACCCCACTACTAG
2183
GGAGGCCTAGTAGTGG
3952




GCCTCC

GGTG






12968
12990
CCCCACTACTAGGCCTC
2184
TAGGAGGAGGCCTAGT
3953




CTCCTA

AGTG






12969
12991
CCCACTACTAGGCCTCC
2185
CTAGGAGGAGGCCTA
3954




TCCTAG

GTAGT






12970
12992
CCACTACTAGGCCTCCT
2186
GCTAGGAGGAGGCCT
3955




CCTAGC

AGTAG






12981
13003
CCTCCTCCTAGCAGCAG
2187
TGCCTGCTGCTGCTAG
3956




CAGGCA

GAGG






12984
13006
CCTCCTAGCAGCAGCAG
2188
ATTTGCCTGCTGCTGC
3957




GCAAAT

TAGG






12987
13009
CCTAGCAGCAGCAGGC
2189
CTGATTTGCCTGCTGC
3958




AAATCAG

TGCT






13010
13032
CCCAATTAGGTCTCCAC
2190
TCAGGGGTGGAGACCT
3959




CCCTGA

AATT






13011
13033
CCAATTAGGTCTCCACC
2191
GTCAGGGGTGGAGAC
3960




CCTGAC

CTAAT






13023
13045
CCACCCCTGACTCCCCT
2192
TGGCTGAGGGGAGTCA
3961




CAGCCA

GGGG






13026
13048
CCCCTGACTCCCCTCAG
2193
CTATGGCTGAGGGGAG
3962




CCATAG

TCAG






13027
13049
CCCTGACTCCCCTCAGC
2194
TCTATGGCTGAGGGGA
3963




CATAGA

GTCA






13028
13050
CCTGACTCCCCTCAGCC
2195
TTCTATGGCTGAGGGG
3964




ATAGAA

AGTC






13035
13057
CCCCTCAGCCATAGAAG
2196
TGGGGCCTTCTATGGC
3965




GCCCCA

TGAG






13036
13058
CCCTCAGCCATAGAAGG
2197
GTGGGGCCTTCTATGG
3966




CCCCAC

CTGA






13037
13059
CCTCAGCCATAGAAGGC
2198
GGTGGGGCCTTCTATG
3967




CCCACC

GCTG






13043
13065
CCATAGAAGGCCCCACC
2199
GACTGGGGTGGGGCCT
3968




CCAGTC

TCTA






13053
13075
CCCCACCCCAGTCTCAG
2200
GTAGGGCTGAGACTGG
3969




CCCTAC

GGTG






13054
13076
CCCACCCCAGTCTCAGC
2201
AGTAGGGCTGAGACTG
3970




CCTACT

GGGT






13055
13077
CCACCCCAGTCTCAGCC
2202
GAGTAGGGCTGAGACT
3971




CTACTC

GGGG






13058
13080
CCCCAGTCTCAGCCCTA
2203
GTGGAGTAGGGCTGA
3972




CTCCAC

GACTG






13059
13081
CCCAGTCTCAGCCCTAC
2204
AGTGGAGTAGGGCTG
3973




TCCACT

AGACT






13060
13082
CCAGTCTCAGCCCTACT
2205
GAGTGGAGTAGGGCT
3974




CCACTC

GAGAC






13070
13092
CCCTACTCCACTCAAGC
2206
TATAGTGCTTGAGTGG
3975




ACTATA

AGTA






13071
13093
CCTACTCCACTCAAGCA
2207
CTATAGTGCTTGAGTG
3976




CTATAG

GAGT






13077
13099
CCACTCAAGCACTATAG
2208
CTACAACTATAGTGCT
3977




TTGTAG

TGAG






13119
13141
CCGCTTCCACCCCCTAG
2209
TTTCTGCTAGGGGGTG
3978




CAGAAA

GAAG






13125
13147
CCACCCCCTAGCAGAAA
2210
GGCTATTTTCTGCTAG
3979




ATAGCC

GGGG






13128
13150
CCCCCTAGCAGAAAATA
2211
GTGGGCTATTTTCTGC
3980




GCCCAC

TAGG






13129
13151
CCCCTAGCAGAAAATAG
2212
AGTGGGCTATTTTCTG
3981




CCCACT

CTAG






13130
13152
CCCTAGCAGAAAATAGC
2213
TAGTGGGCTATTTTCT
3982




CCACTA

GCTA






13131
13153
CCTAGCAGAAAATAGCC
2214
TTAGTGGGCTATTTTC
3983




CACTAA

TGCT






13146
13168
CCCACTAATCCAAACTC
2215
GTGTTAGAGTTTGGAT
3984




TAACAC

TAGT






13147
13169
CCACTAATCCAAACTCT
2216
AGTGTTAGAGTTTGGA
3985




AACACT

TTAG






13155
13177
CCAAACTCTAACACTAT
2217
CTAAGCATAGTGTTAG
3986




GCTTAG

AGTT






13187
13209
CCACTCTGTTCGCAGCA
2218
GCAGACTGCTGCGAAC
3987




GTCTGC

AGAG






13211
13233
CCCTTACACAAAATGAC
2219
TTTGATGTCATTTTGTG
3988




ATCAAA

TAA






13212
13234
CCTTACACAAAATGACA
2220
TTTTGATGTCATTTTGT
3989




TCAAAA

GTA






13244
13266
CCTTCTCCACTTCAAGT
2221
TAGTTGACTTGAAGTG
3990




CAACTA

GAGA






13250
13272
CCACTTCAAGTCAACTA
2222
GAGTCCTAGTTGACTT
3991




GGACTC

GAAG






13296
13318
CCAACCACACCTAGCAT
2223
GCAGGAATGCTAGGTG
3992




TCCTGC

TGGT






13300
13322
CCACACCTAGCATTCCT
2224
ATGTGCAGGAATGCTA
3993




GCACAT

GGTG






13305
13327
CCTAGCATTCCTGCACA
2225
TACAGATGTGCAGGAA
3994




TCTGTA

TGCT






13314
13336
CCTGCACATCTGTACCC
2226
AGGCGTGGGTACAGAT
3995




ACGCCT

GTGC






13328
13350
CCCACGCCTTCTTCAAA
2227
TATGGCTTTGAAGAAG
3996




GCCATA

GCGT






13329
13351
CCACGCCTTCTTCAAAG
2228
GTATGGCTTTGAAGAA
3997




CCATAC

GGCG






13334
13356
CCTTCTTCAAAGCCATA
2229
AAATAGTATGGCTTTG
3998




CTATTT

AAGA






13346
13368
CCATACTATTTATGTGC
2230
CCCGGAGCACATAAAT
3999




TCCGGG

AGTA






13364
13386
CCGGGTCCATCATCCAC
2231
AAGGTTGTGGATGATG
4000




AACCTT

GACC






13370
13392
CCATCATCCACAACCTT
2232
ATTGTTAAGGTTGTGG
4001




AACAAT

ATGA






13377
13399
CCACAACCTTAACAATG
2233
CTTGTTCATTGTTAAG
4002




AACAAG

GTTG






13383
13405
CCTTAACAATGAACAAG
2234
GAATATCTTGTTCATT
4003




ATATTC

GTTA






13430
13452
CCATACCTCTCACTTCA
2235
GGAGGTTGAAGTGAG
4004




ACCTCC

AGGTA






13435
13457
CCTCTCACTTCAACCTC
2236
GTGAGGGAGGTTGAA
4005




CCTCAC

GTGAG






13448
13470
CCTCCCTCACCATTGGC
2237
TAGGCTGCCAATGGTG
4006




AGCCTA

AGGG






13451
13473
CCCTCACCATTGGCAGC
2238
TGCTAGGCTGCCAATG
4007




CTAGCA

GTGA






13452
13474
CCTCACCATTGGCAGCC
2239
ATGCTAGGCTGCCAAT
4008




TAGCAT

GGTG






13457
13479
CCATTGGCAGCCTAGCA
2240
TGCTAATGCTAGGCTG
4009




TTAGCA

CCAA






13467
13489
CCTAGCATTAGCAGGAA
2241
AAGGTATTCCTGCTAA
4010




TACCTT

TGCT






13486
13508
CCTTTCCTCACAGGTTT
2242
GAGTAGAAACCTGTGA
4011




CTACTC

GGAA






13491
13513
CCTCACAGGTTTCTACT
2243
CTTTGGAGTAGAAACC
4012




CCAAAG

TGTG






13508
13530
CCAAAGACCACATCATC
2244
GGTTTCGATGATGTGG
4013




GAAACC

TCTT






13515
13537
CCACATCATCGAAACCG
2245
TGTTTGCGGTTTCGAT
4014




CAAACA

GATG






13529
13551
CCGCAAACATATCATAC
2246
GTTTGTGTATGATATG
4015




ACAAAC

TTTG






13553
13575
CCTGAGCCCTATCTATT
2247
GAGAGTAATAGATAG
4016




ACTCTC

GGCTC






13559
13581
CCCTATCTATTACTCTC
2248
AGCGATGAGAGTAAT
4017




ATCGCT

AGATA






13560
13582
CCTATCTATTACTCTCAT
2249
TAGCGATGAGAGTAAT
4018




CGCTA

AGAT






13583
13605
CCTCCCTGACAAGCGCC
2250
GCTATAGGCGCTTGTC
4019




TATAGC

AGGG






13586
13608
CCCTGACAAGCGCCTAT
2251
AGTGCTATAGGCGCTT
4020




AGCACT

GTCA






13587
13609
CCTGACAAGCGCCTATA
2252
GAGTGCTATAGGCGCT
4021




GCACTC

TGTC






13598
13620
CCTATAGCACTCGAATA
2253
AAGAATTATTCGAGTG
4022




ATTCTT

CTAT






13625
13647
CCCTAACAGGTCAACCT
2254
GAAGCGAGGTTGACCT
4023




CGCTTC

GTTA






13626
13648
CCTAACAGGTCAACCTC
2255
GGAAGCGAGGTTGAC
4024




GCTTCC

CTGTT






13639
13661
CCTCGCTTCCCCACCCT
2256
TTAGTAAGGGTGGGGA
4025




TACTAA

AGCG






13647
13669
CCCCACCCTTACTAACA
2257
CGTTAATGTTAGTAAG
4026




TTAACG

GGTG






13648
13670
CCCACCCTTACTAACAT
2258
TCGTTAATGTTAGTAA
4027




TAACGA

GGGT






13649
13671
CCACCCTTACTAACATT
2259
TTCGTTAATGTTAGTA
4028




AACGAA

AGGG






13652
13674
CCCTTACTAACATTAAC
2260
ATTTTCGTTAATGTTA
4029




GAAAAT

GTAA






13653
13675
CCTTACTAACATTAACG
2261
TATTTTCGTTAATGTTA
4030




AAAATA

GTA






13677
13699
CCCCACCCTACTAAACC
2262
TAATGGGGTTTAGTAG
4031




CCATTA

GGTG






13678
13700
CCCACCCTACTAAACCC
2263
TTAATGGGGTTTAGTA
4032




CATTAA

GGGT






13679
13701
CCACCCTACTAAACCCC
2264
TTTAATGGGGTTTAGT
4033




ATTAAA

AGGG






13682
13704
CCCTACTAAACCCCATT
2265
GCGTTTAATGGGGTTT
4034




AAACGC

AGTA






13683
13705
CCTACTAAACCCCATTA
2266
GGCGTTTAATGGGGTT
4035




AACGCC

TAGT






13692
13714
CCCCATTAAACGCCTGG
2267
CGGCTGCCAGGCGTTT
4036




CAGCCG

AATG






13693
13715
CCCATTAAACGCCTGGC
2268
CCGGCTGCCAGGCGTT
4037




AGCCGG

TAAT






13694
13716
CCATTAAACGCCTGGCA
2269
TCCGGCTGCCAGGCGT
4038




GCCGGA

TTAA






13704
13726
CCTGGCAGCCGGAAGCC
2270
CGAATAGGCTTCCGGC
4039




TATTCG

TGCC






13712
13734
CCGGAAGCCTATTCGCA
2271
AAATCCTGCGAATAGG
4040




GGATTT

CTTC






13719
13741
CCTATTCGCAGGATTTC
2272
TAATGAGAAATCCTGC
4041




TCATTA

GAAT






13754
13776
CCCCCGCATCCCCCTTC
2273
TGTTTGGAAGGGGGAT
4042




CAAACA

GCGG






13755
13777
CCCCGCATCCCCCTTCC
2274
TTGTTTGGAAGGGGGA
4043




AAACAA

TGCG






13756
13778
CCCGCATCCCCCTTCCA
2275
GTTGTTTGGAAGGGGG
4044




AACAAC

ATGC






13757
13779
CCGCATCCCCCTTCCAA
2276
TGTTGTTTGGAAGGGG
4045




ACAACA

GATG






13763
13785
CCCCCTTCCAAACAACA
2277
GGGGATTGTTGTTTGG
4046




ATCCCC

AAGG






13764
13786
CCCCTTCCAAACAACAA
2278
GGGGGATTGTTGTTTG
4047




TCCCCC

GAAG






13765
13787
CCCTTCCAAACAACAAT
2279
AGGGGGATTGTTGTTT
4048




CCCCCT

GGAA






13766
13788
CCTTCCAAACAACAATC
2280
GAGGGGGATTGTTGTT
4049




CCCCTC

TGGA






13770
13792
CCAAACAACAATCCCCC
2281
GGTAGAGGGGGATTGT
4050




TCTACC

TGTT






13782
13804
CCCCCTCTACCTAAAAC
2282
CTGTGAGTTTTAGGTA
4051




TCACAG

GAGG






13783
13805
CCCCTCTACCTAAAACT
2283
GCTGTGAGTTTTAGGT
4052




CACAGC

AGAG






13784
13806
CCCTCTACCTAAAACTC
2284
GGCTGTGAGTTTTAGG
4053




ACAGCC

TAGA






13785
13807
CCTCTACCTAAAACTCA
2285
GGGCTGTGAGTTTTAG
4054




CAGCCC

GTAG






13791
13813
CCTAAAACTCACAGCCC
2286
CAGCGAGGGCTGTGA
4055




TCGCTG

GTTTT






13805
13827
CCCTCGCTGTCACTTTC
2287
TCCTAGGAAAGTGACA
4056




CTAGGA

GCGA






13806
13828
CCTCGCTGTCACTTTCCT
2288
GTCCTAGGAAAGTGAC
4057




AGGAC

AGCG






13821
13843
CCTAGGACTTCTAACAG
2289
CTAGGGCTGTTAGAAG
4058




CCCTAG

TCCT






13838
13860
CCCTAGACCTCAACTAC
2290
GGTTAGGTAGTTGAGG
4059




CTAACC

TCTA






13839
13861
CCTAGACCTCAACTACC
2291
TGGTTAGGTAGTTGAG
4060




TAACCA

GTCT






13845
13867
CCTCAACTACCTAACCA
2292
GTTTGTTGGTTAGGTA
4061




ACAAAC

GTTG






13854
13876
CCTAACCAACAAACTTA
2293
TTATTTTAAGTTTGTTG
4062




AAATAA

GTT






13859
13881
CCAACAAACTTAAAATA
2294
GGATTTTATTTTAAGT
4063




AAATCC

TTGT






13880
13902
CCCCACTATGCACATTT
2295
GAAATAAAATGTGCAT
4064




TATTTC

AGTG






13881
13903
CCCACTATGCACATTTT
2296
AGAAATAAAATGTGC
4065




ATTTCT

ATAGT






13882
13904
CCACTATGCACATTTTA
2297
GAGAAATAAAATGTG
4066




TTTCTC

CATAG






13904
13926
CCAACATACTCGGATTC
2298
AGGGTAGAATCCGAGT
4067




TACCCT

ATGT






13923
13945
CCCTAGCATCACACACC
2299
TTGTGCGGTGTGTGAT
4068




GCACAA

GCTA






13924
13946
CCTAGCATCACACACCG
2300
ATTGTGCGGTGTGTGA
4069




CACAAT

TGCT






13938
13960
CCGCACAATCCCCTATC
2301
GGCCTAGATAGGGGAT
4070




TAGGCC

TGTG






13947
13969
CCCCTATCTAGGCCTTC
2302
TCGTAAGAAGGCCTAG
4071




TTACGA

ATAG






13948
13970
CCCTATCTAGGCCTTCT
2303
CTCGTAAGAAGGCCTA
4072




TACGAG

GATA






13949
13971
CCTATCTAGGCCTTCTT
2304
GCTCGTAAGAAGGCCT
4073




ACGAGC

AGAT






13959
13981
CCTTCTTACGAGCCAAA
2305
GCAGGTTTTGGCTCGT
4074




ACCTGC

AAGA






13971
13993
CCAAAACCTGCCCCTAC
2306
GGAGGAGTAGGGGCA
4075




TCCTCC

GGTTT






13977
13999
CCTGCCCCTACTCCTCC
2307
GGTCTAGGAGGAGTA
4076




TAGACC

GGGGC






13981
14003
CCCCTACTCCTCCTAGA
2308
GTTAGGTCTAGGAGGA
4077




CCTAAC

GTAG






13982
14004
CCCTACTCCTCCTAGAC
2309
GGTTAGGTCTAGGAGG
4078




CTAACC

AGTA






13983
14005
CCTACTCCTCCTAGACC
2310
AGGTTAGGTCTAGGAG
4079




TAACCT

GAGT






13989
14011
CCTCCTAGACCTAACCT
2311
CTAGTCAGGTTAGGTC
4080




GACTAG

TAGG






13992
14014
CCTAGACCTAACCTGAC
2312
TTTCTAGTCAGGTTAG
4081




TAGAAA

GTCT






13998
14020
CCTAACCTGACTAGAAA
2313
ATAGCTTTTCTAGTCA
4082




AGCTAT

GGTT






14003
14025
CCTGACTAGAAAAGCTA
2314
AGGTAATAGCTTTTCT
4083




TTACCT

AGTC






14023
14045
CCTAAAACAATTTCACA
2315
TGGTGCTGTGAAATTG
4084




GCACCA

TTTT






14043
14065
CCAAATCTCCACCTCCA
2316
TGATGATGGAGGTGGA
4085




TCATCA

GATT






14051
14073
CCACCTCCATCATCACC
2317
GGTTGAGGTGATGATG
4086




TCAACC

GAGG






14054
14076
CCTCCATCATCACCTCA
2318
TTGGGTTGAGGTGATG
4087




ACCCAA

ATGG






14057
14079
CCATCATCACCTCAACC
2319
TTTTTGGGTTGAGGTG
4088




CAAAAA

ATGA






14066
14088
CCTCAACCCAAAAAGGC
2320
AATTATGCCTTTTTGG
4089




ATAATT

GTTG






14072
14094
CCCAAAAAGGCATAATT
2321
AAGTTTAATTATGCCT
4090




AAACTT

TTTT






14073
14095
CCAAAAAGGCATAATTA
2322
AAAGTTTAATTATGCC
4091




AACTTT

TTTT






14100
14122
CCTCTCTTTCTTCTTCCC
2323
TGAGTGGGAAGAAGA
4092




ACTCA

AAGAG






14115
14137
CCCACTCATCCTAACCC
2324
GGAGTAGGGTTAGGAT
4093




TACTCC

GAGT






14116
14138
CCACTCATCCTAACCCT
2325
AGGAGTAGGGTTAGG
4094




ACTCCT

ATGAG






14124
14146
CCTAACCCTACTCCTAA
2326
ATGTGATTAGGAGTAG
4095




TCACAT

GGTT






14129
14151
CCCTACTCCTAATCACA
2327
AGGTTATGTGATTAGG
4096




TAACCT

AGTA






14130
14152
CCTACTCCTAATCACAT
2328
TAGGTTATGTGATTAG
4097




AACCTA

GAGT






14136
14158
CCTAATCACATAACCTA
2329
GGGGAATAGGTTATGT
4098




TTCCCC

GATT






14149
14171
CCTATTCCCCCGAGCAA
2330
TTGAGATTGCTCGGGG
4099




TCTCAA

GAAT






14155
14177
CCCCCGAGCAATCTCAA
2331
TTGTAATTGAGATTGC
4100




TTACAA

TCGG






14156
14178
CCCCGAGCAATCTCAAT
2332
ATTGTAATTGAGATTG
4101




TACAAT

CTCG






14157
14179
CCCGAGCAATCTCAATT
2333
TATTGTAATTGAGATT
4102




ACAATA

GCTC






14158
14180
CCGAGCAATCTCAATTA
2334
ATATTGTAATTGAGAT
4103




CAATAT

TGCT






14186
14208
CCAACAAACAATGTTCA
2335
ACTGGTTGAACATTGT
4104




ACCAGT

TTGT






14204
14226
CCAGTAACTACTACTAA
2336
CGTTGATTAGTAGTAG
4105




TCAACG

TTAC






14227
14249
CCCATAATCATACAAAG
2337
CGGGGGCTTTGTATGA
4106




CCCCCG

TTAT






14228
14250
CCATAATCATACAAAGC
2338
GCGGGGGCTTTGTATG
4107




CCCCGC

ATTA






14244
14266
CCCCCGCACCAATAGGA
2339
GGAGGATCCTATTGGT
4108




TCCTCC

GCGG






14245
14267
CCCCGCACCAATAGGAT
2340
GGGAGGATCCTATTGG
4109




CCTCCC

TGCG






14246
14268
CCCGCACCAATAGGATC
2341
CGGGAGGATCCTATTG
4110




CTCCCG

GTGC






14247
14269
CCGCACCAATAGGATCC
2342
TCGGGAGGATCCTATT
4111




TCCCGA

GGTG






14252
14274
CCAATAGGATCCTCCCG
2343
TTGATTCGGGAGGATC
4112




AATCAA

CTAT






14262
14284
CCTCCCGAATCAACCCT
2344
GGGGTCAGGGTTGATT
4113




GACCCC

CGGG






14265
14287
CCCGAATCAACCCTGAC
2345
AGAGGGGTCAGGGTT
4114




CCCTCT

GATTC






14266
14288
CCGAATCAACCCTGACC
2346
GAGAGGGGTCAGGGT
4115




CCTCTC

TGATT






14275
14297
CCCTGACCCCTCTCCTT
2347
TTTATGAAGGAGAGGG
4116




CATAAA

GTCA






14276
14298
CCTGACCCCTCTCCTTC
2348
ATTTATGAAGGAGAGG
4117




ATAAAT

GGTC






14281
14303
CCCCTCTCCTTCATAAA
2349
GAATAATTTATGAAGG
4118




TTATTC

AGAG






14282
14304
CCCTCTCCTTCATAAAT
2350
TGAATAATTTATGAAG
4119




TATTCA

GAGA






14283
14305
CCTCTCCTTCATAAATT
2351
CTGAATAATTTATGAA
4120




ATTCAG

GGAG






14288
14310
CCTTCATAAATTATTCA
2352
GGAAGCTGAATAATTT
4121




GCTTCC

ATGA






14309
14331
CCTACACTATTAAAGTT
2353
GTGGTAAACTTTAATA
4122




TACCAC

GTGT






14328
14350
CCACAACCACCACCCCA
2354
GTATGATGGGGTGGTG
4123




TCATAC

GTTG






14334
14356
CCACCACCCCATCATAC
2355
GAAAGAGTATGATGG
4124




TCTTTC

GGTGG






14337
14359
CCACCCCATCATACTCT
2356
GGTGAAAGAGTATGAT
4125




TTCACC

GGGG






14340
14362
CCCCATCATACTCTTTC
2357
GTGGGTGAAAGAGTAT
4126




ACCCAC

GATG






14341
14363
CCCATCATACTCTTTCA
2358
TGTGGGTGAAAGAGTA
4127




CCCACA

TGAT






14342
14364
CCATCATACTCTTTCAC
2359
CTGTGGGTGAAAGAGT
4128




CCACAG

ATGA






14358
14380
CCCACAGCACCAATCCT
2360
GGAGGTAGGATTGGTG
4129




ACCTCC

CTGT






14359
14381
CCACAGCACCAATCCTA
2361
TGGAGGTAGGATTGGT
4130




CCTCCA

GCTG






14367
14389
CCAATCCTACCTCCATC
2362
GTTAGCGATGGAGGTA
4131




GCTAAC

GGAT






14372
14394
CCTACCTCCATCGCTAA
2363
GTGGGGTTAGCGATGG
4132




CCCCAC

AGGT






14376
14398
CCTCCATCGCTAACCCC
2364
TTTAGTGGGGTTAGCG
4133




ACTAAA

ATGG






14379
14401
CCATCGCTAACCCCACT
2365
TGTTTTAGTGGGGTTA
4134




AAAACA

GCGA






14389
14411
CCCCACTAAAACACTCA
2366
TCTTGGTGAGTGTTTT
4135




CCAAGA

AGTG






14390
14412
CCCACTAAAACACTCAC
2367
GTCTTGGTGAGTGTTT
4136




CAAGAC

TAGT






14391
14413
CCACTAAAACACTCACC
2368
GGTCTTGGTGAGTGTT
4137




AAGACC

TTAG






14406
14428
CCAAGACCTCAACCCCT
2369
GGGGTCAGGGGTTGA
4138




GACCCC

GGTCT






14412
14434
CCTCAACCCCTGACCCC
2370
GGCATGGGGGTCAGG
4139




CATGCC

GGTTG






14418
14440
CCCCTGACCCCCATGCC
2371
TCCTGAGGCATGGGGG
4140




TCAGGA

TCAG






14419
14441
CCCTGACCCCCATGCCT
2372
ATCCTGAGGCATGGGG
4141




CAGGAT

GTCA






14420
14442
CCTGACCCCCATGCCTC
2373
TATCCTGAGGCATGGG
4142




AGGATA

GGTC






14425
14447
CCCCCATGCCTCAGGAT
2374
AGGAGTATCCTGAGGC
4143




ACTCCT

ATGG






14426
14448
CCCCATGCCTCAGGATA
2375
GAGGAGTATCCTGAGG
4144




CTCCTC

CATG






14427
14449
CCCATGCCTCAGGATAC
2376
TGAGGAGTATCCTGAG
4145




TCCTCA

GCAT






14428
14450
CCATGCCTCAGGATACT
2377
TTGAGGAGTATCCTGA
4146




CCTCAA

GGCA






14433
14455
CCTCAGGATACTCCTCA
2378
GGCTATTGAGGAGTAT
4147




ATAGCC

CCTG






14445
14467
CCTCAATAGCCATCGCT
2379
TACTACAGCGATGGCT
4148




GTAGTA

ATTG






14454
14476
CCATCGCTGTAGTATAT
2380
CTTTGGATATACTACA
4149




CCAAAG

GCGA






14471
14493
CCAAAGACAACCATCAT
2381
GGGGGAATGATGGTTG
4150




TCCCCC

TCTT






14481
14503
CCATCATTCCCCCTAAA
2382
AATTTATTTAGGGGGA
4151




TAAATT

ATGA






14489
14511
CCCCCTAAATAAATTAA
2383
GTTTTTTTAATTTATTT
4152




AAAAAC

AGG






14490
14512
CCCCTAAATAAATTAAA
2384
AGTTTTTTTAATTTATT
4153




AAAACT

TAG






14491
14513
CCCTAAATAAATTAAAA
2385
TAGTTTTTTTAATTTAT
4154




AAACTA

TTA






14492
14514
CCTAAATAAATTAAAAA
2386
ATAGTTTTTTTAATTTA
4155




AACTAT

TTT






14519
14541
CCCATATAACCTCCCCC
2387
AATTTTGGGGGAGGTT
4156




AAAATT

ATAT






14520
14542
CCATATAACCTCCCCCA
2388
GAATTTTGGGGGAGGT
4157




AAATTC

TATA






14528
14550
CCTCCCCCAAAATTCAG
2389
ATTATTCTGAATTTTG
4158




AATAAT

GGGG






14531
14553
CCCCCAAAATTCAGAAT
2390
GTTATTATTCTGAATTT
4159




AATAAC

TGG






14532
14554
CCCCAAAATTCAGAATA
2391
TGTTATTATTCTGAATT
4160




ATAACA

TTG






14533
14555
CCCAAAATTCAGAATAA
2392
GTGTTATTATTCTGAA
4161




TAACAC

TTTT






14534
14556
CCAAAATTCAGAATAAT
2393
TGTGTTATTATTCTGA
4162




AACACA

ATTT






14557
14579
CCCGACCACACCGCTAA
2394
TGATTGTTAGCGGTGT
4163




CAATCA

GGTC






14558
14580
CCGACCACACCGCTAAC
2395
TTGATTGTTAGCGGTG
4164




AATCAA

TGGT






14562
14584
CCACACCGCTAACAATC
2396
AGTATTGATTGTTAGC
4165




AATACT

GGTG






14567
14589
CCGCTAACAATCAATAC
2397
GGTTTAGTATTGATTG
4166




TAAACC

TTAG






14588
14610
CCCCCATAAATAGGAGA
2398
AAGCCTTCTCCTATTT
4167




AGGCTT

ATGG






14589
14611
CCCCATAAATAGGAGA
2399
TAAGCCTTCTCCTATTT
4168




AGGCTTA

ATG






14590
14612
CCCATAAATAGGAGAA
2400
CTAAGCCTTCTCCTAT
4169




GGCTTAG

TTAT






14591
14613
CCATAAATAGGAGAAG
2401
TCTAAGCCTTCTCCTA
4170




GCTTAGA

TTTA






14620
14642
CCCCACAAACCCCATTA
2402
GTTTAGTAATGGGGTT
4171




CTAAAC

TGTG






14621
14643
CCCACAAACCCCATTAC
2403
GGTTTAGTAATGGGGT
4172




TAAACC

TTGT






14622
14644
CCACAAACCCCATTACT
2404
GGGTTTAGTAATGGGG
4173




AAACCC

TTTG






14629
14651
CCCCATTACTAAACCCA
2405
TGAGTGTGGGTTTAGT
4174




CACTCA

AATG






14630
14652
CCCATTACTAAACCCAC
2406
TTGAGTGTGGGTTTAG
4175




ACTCAA

TAAT






14631
14653
CCATTACTAAACCCACA
2407
GTTGAGTGTGGGTTTA
4176




CTCAAC

GTAA






14642
14664
CCCACACTCAACAGAAA
2408
GCTTTGTTTCTGTTGA
4177




CAAAGC

GTGT






14643
14665
CCACACTCAACAGAAAC
2409
TGCTTTGTTTCTGTTGA
4178




AAAGCA

GTG






14694
14716
CCACGACCAATGATATG
2410
GTTTTTCATATCATTG
4179




AAAAAC

GTCG






14700
14722
CCAATGATATGAAAAAC
2411
ACGATGGTTTTTCATA
4180




CATCGT

TCAT






14716
14738
CCATCGTTGTATTTCAA
2412
TTGTAGTTGAAATACA
4181




CTACAA

ACGA






14744
14766
CCAATGACCCCAATACG
2413
GTTTTGCGTATTGGGG
4182




CAAAAC

TCAT






14751
14773
CCCCAATACGCAAAACT
2414
GGGGTTAGTTTTGCGT
4183




AACCCC

ATTG






14752
14774
CCCAATACGCAAAACTA
2415
GGGGGTTAGTTTTGCG
4184




ACCCCC

TATT






14753
14775
CCAATACGCAAAACTAA
2416
AGGGGGTTAGTTTTGC
4185




CCCCCT

GTAT






14770
14792
CCCCCTAATAAAATTAA
2417
GGTTAATTAATTTTAT
4186




TTAACC

TAGG






14771
14793
CCCCTAATAAAATTAAT
2418
TGGTTAATTAATTTTA
4187




TAACCA

TTAG






14772
14794
CCCTAATAAAATTAATT
2419
GTGGTTAATTAATTTT
4188




AACCAC

ATTA






14773
14795
CCTAATAAAATTAATTA
2420
AGTGGTTAATTAATTT
4189




ACCACT

TATT






14791
14813
CCACTCATTCATCGACC
2421
TGGGGAGGTCGATGA
4190




TCCCCA

ATGAG






14806
14828
CCTCCCCACCCCATCCA
2422
AGATGTTGGATGGGGT
4191




ACATCT

GGGG






14809
14831
CCCCACCCCATCCAACA
2423
CGGAGATGTTGGATGG
4192




TCTCCG

GGTG






14810
14832
CCCACCCCATCCAACAT
2424
GCGGAGATGTTGGATG
4193




CTCCGC

GGGT






14811
14833
CCACCCCATCCAACATC
2425
TGCGGAGATGTTGGAT
4194




TCCGCA

GGGG






14814
14836
CCCCATCCAACATCTCC
2426
TCATGCGGAGATGTTG
4195




GCATGA

GATG






14815
14837
CCCATCCAACATCTCCG
2427
ATCATGCGGAGATGTT
4196




CATGAT

GGAT






14816
14838
CCATCCAACATCTCCGC
2428
CATCATGCGGAGATGT
4197




ATGATG

TGGA






14820
14842
CCAACATCTCCGCATGA
2429
GTTTCATCATGCGGAG
4198




TGAAAC

ATGT






14829
14851
CCGCATGATGAAACTTC
2430
TGAGCCGAAGTTTCAT
4199




GGCTCA

CATG






14854
14876
CCTTGGCGCCTGCCTGA
2431
GGAGGATCAGGCAGG
4200




TCCTCC

CGCCA






14862
14884
CCTGCCTGATCCTCCAA
2432
GGTGATTTGGAGGATC
4201




ATCACC

AGGC






14866
14888
CCTGATCCTCCAAATCA
2433
CTGTGGTGATTTGGAG
4202




CCACAG

GATC






14872
14894
CCTCCAAATCACCACAG
2434
ATAGTCCTGTGGTGAT
4203




GACTAT

TTGG






14875
14897
CCAAATCACCACAGGAC
2435
GGAATAGTCCTGTGGT
4204




TATTCC

GATT






14883
14905
CCACAGGACTATTCCTA
2436
CATGGCTAGGAATAGT
4205




GCCATG

CCTG






14896
14918
CCTAGCCATGCACTACT
2437
CTGGTGAGTAGTGCAT
4206




CACCAG

GGCT






14901
14923
CCATGCACTACTCACCA
2438
GGCGTCTGGTGAGTAG
4207




GACGCC

TGCA






14915
14937
CCAGACGCCTCAACCGC
2439
GAAAAGGCGGTTGAG
4208




CTTTTC

GCGTC






14922
14944
CCTCAACCGCCTTTTCA
2440
GATTGATGAAAAGGC
4209




TCAATC

GGTTG






14928
14950
CCGCCTTTTCATCAATC
2441
GTGGGCGATTGATGAA
4210




GCCCAC

AAGG






14931
14953
CCTTTTCATCAATCGCC
2442
GATGTGGGCGATTGAT
4211




CACATC

GAAA






14946
14968
CCCACATCACTCGAGAC
2443
ATTTACGTCTCGAGTG
4212




GTAAAT

ATGT






14947
14969
CCACATCACTCGAGACG
2444
AATTTACGTCTCGAGT
4213




TAAATT

GATG






14983
15005
CCGCTACCTTCACGCCA
2445
CGCCATTGGCGTGAAG
4214




ATGGCG

GTAG






14989
15011
CCTTCACGCCAATGGCG
2446
TTGAGGCGCCATTGGC
4215




CCTCAA

GTGA






14997
15019
CCAATGGCGCCTCAATA
2447
AAAGAATATTGAGGC
4216




TTCTTT

GCCAT






15006
15028
CCTCAATATTCTTTATCT
2448
GAGGCAGATAAAGAA
4217




GCCTC

TATTG






15025
15047
CCTCTTCCTACACATCG
2449
CTCGCCCGATGTGTAG
4218




GGCGAG

GAAG






15031
15053
CCTACACATCGGGCGAG
2450
ATAGGCCTCGCCCGAT
4219




GCCTAT

GTGT






15049
15071
CCTATATTACGGATCAT
2451
AGAGAAATGATCCGTA
4220




TTCTCT

ATAT






15081
15103
CCTGAAACATCGGCATT
2452
GAGGATAATGCCGATG
4221




ATCCTC

TTTC






15100
15122
CCTCCTGCTTGCAACTA
2453
TTGCTATAGTTGCAAG
4222




TAGCAA

CAGG






15103
15125
CCTGCTTGCAACTATAG
2454
CTGTTGCTATAGTTGC
4223




CAACAG

AAGC






15126
15148
CCTTCATAGGCTATGTC
2455
CGGGAGGACATAGCCT
4224




CTCCCG

ATGA






15142
15164
CCTCCCGTGAGGCCAAA
2456
ATGATATTTGGCCTCA
4225




TATCAT

CGGG






15145
15167
CCCGTGAGGCCAAATAT
2457
AGAATGATATTTGGCC
4226




CATTCT

TCAC






15146
15168
CCGTGAGGCCAAATATC
2458
CAGAATGATATTTGGC
4227




ATTCTG

CTCA






15154
15176
CCAAATATCATTCTGAG
2459
TGGCCCCTCAGAATGA
4228




GGGCCA

TATT






15174
15196
CCACAGTAATTACAAAC
2460
TAGTAAGTTTGTAATT
4229




TTACTA

ACTG






15198
15220
CCGCCATCCCATACATT
2461
TGTCCCAATGTATGGG
4230




GGGACA

ATGG






15201
15223
CCATCCCATACATTGGG
2462
GTCTGTCCCAATGTAT
4231




ACAGAC

GGGA






15205
15227
CCCATACATTGGGACAG
2463
CTAGGTCTGTCCCAAT
4232




ACCTAG

GTAT






15206
15228
CCATACATTGGGACAGA
2464
ACTAGGTCTGTCCCAA
4233




CCTAGT

TGTA






15223
15245
CCTAGTTCAATGAATCT
2465
CTCCTCAGATTCATTG
4234




GAGGAG

AACT






15263
15285
CCCACCCTCACACGATT
2466
GTAAAGAATCGTGTGA
4235




CTTTAC

GGGT






15264
15286
CCACCCTCACACGATTC
2467
GGTAAAGAATCGTGTG
4236




TTTACC

AGGG






15267
15289
CCCTCACACGATTCTTT
2468
AAAGGTAAAGAATCG
4237




ACCTTT

TGTGA






15268
15290
CCTCACACGATTCTTTA
2469
GAAAGGTAAAGAATC
4238




CCTTTC

GTGTG






15285
15307
CCTTTCACTTCATCTTGC
2470
GAAGGGCAAGATGAA
4239




CCTTC

GTGAA






15302
15324
CCCTTCATTATTGCAGC
2471
GCTAGGGCTGCAATAA
4240




CCTAGC

TGAA






15303
15325
CCTTCATTATTGCAGCC
2472
TGCTAGGGCTGCAATA
4241




CTAGCA

ATGA






15318
15340
CCCTAGCAACACTCCAC
2473
TAGGAGGTGGAGTGTT
4242




CTCCTA

GCTA






15319
15341
CCTAGCAACACTCCACC
2474
ATAGGAGGTGGAGTGT
4243




TCCTAT

TGCT






15331
15353
CCACCTCCTATTCTTGC
2475
TTTCGTGCAAGAATAG
4244




ACGAAA

GAGG






15334
15356
CCTCCTATTCTTGCACG
2476
CCGTTTCGTGCAAGAA
4245




AAACGG

TAGG






15337
15359
CCTATTCTTGCACGAAA
2477
ATCCCGTTTCGTGCAA
4246




CGGGAT

GAAT






15367
15389
CCCCCTAGGAATCACCT
2478
AATGGGAGGTGATTCC
4247




CCCATT

TAGG






15368
15390
CCCCTAGGAATCACCTC
2479
GAATGGGAGGTGATTC
4248




CCATTC

CTAG






15369
15391
CCCTAGGAATCACCTCC
2480
GGAATGGGAGGTGATT
4249




CATTCC

CCTA






15370
15392
CCTAGGAATCACCTCCC
2481
CGGAATGGGAGGTGA
4250




ATTCCG

TTCCT






15381
15403
CCTCCCATTCCGATAAA
2482
GGTGATTTTATCGGAA
4251




ATCACC

TGGG






15384
15406
CCCATTCCGATAAAATC
2483
GAAGGTGATTTTATCG
4252




ACCTTC

GAAT






15385
15407
CCATTCCGATAAAATCA
2484
GGAAGGTGATTTTATC
4253




CCTTCC

GGAA






15390
15412
CCGATAAAATCACCTTC
2485
AGGGTGGAAGGTGATT
4254




CACCCT

TTAT






15402
15424
CCTTCCACCCTTACTAC
2486
GATTGTGTAGTAAGGG
4255




ACAATC

TGGA






15406
15428
CCACCCTTACTACACAA
2487
CTTTGATTGTGTAGTA
4256




TCAAAG

AGGG






15409
15431
CCCTTACTACACAATCA
2488
CGTCTTTGATTGTGTA
4257




AAGACG

GTAA






15410
15432
CCTTACTACACAATCAA
2489
GCGTCTTTGATTGTGT
4258




AGACGC

AGTA






15432
15454
CCCTCGGCTTACTTCTCT
2490
AAGGAAGAGAAGTAA
4259




TCCTT

GCCGA






15433
15455
CCTCGGCTTACTTCTCTT
2491
GAAGGAAGAGAAGTA
4260




CCTTC

AGCCG






15451
15473
CCTTCTCTCCTTAATGA
2492
TTAATGTCATTAAGGA
4261




CATTAA

GAGA






15459
15481
CCTTAATGACATTAACA
2493
GAATAGTGTTAATGTC
4262




CTATTC

ATTA






15485
15507
CCAGACCTCCTAGGCGA
2494
TCTGGGTCGCCTAGGA
4263




CCCAGA

GGTC






15490
15512
CCTCCTAGGCGACCCAG
2495
AATTGTCTGGGTCGCC
4264




ACAATT

TAGG






15493
15515
CCTAGGCGACCCAGACA
2496
TATAATTGTCTGGGTC
4265




ATTATA

GCCT






15502
15524
CCCAGACAATTATACCC
2497
TGGCTAGGGTATAATT
4266




TAGCCA

GTCT






15503
15525
CCAGACAATTATACCCT
2498
TTGGCTAGGGTATAAT
4267




AGCCAA

TGTC






15516
15538
CCCTAGCCAACCCCTTA
2499
GGTGTTTAAGGGGTTG
4268




AACACC

GCTA






15517
15539
CCTAGCCAACCCCTTAA
2500
GGGTGTTTAAGGGGTT
4269




ACACCC

GGCT






15522
15544
CCAACCCCTTAAACACC
2501
GGGAGGGGTGTTTAAG
4270




CCTCCC

GGGT






15526
15548
CCCCTTAAACACCCCTC
2502
TGTGGGGAGGGGTGTT
4271




CCCACA

TAAG






15527
15549
CCCTTAAACACCCCTCC
2503
ATGTGGGGAGGGGTGT
4272




CCACAT

TTAA






15528
15550
CCTTAAACACCCCTCCC
2504
GATGTGGGGAGGGGT
4273




CACATC

GTTTA






15537
15559
CCCCTCCCCACATCAAG
2505
TTCGGGCTTGATGTGG
4274




CCCGAA

GGAG






15538
15560
CCCTCCCCACATCAAGC
2506
ATTCGGGCTTGATGTG
4275




CCGAAT

GGGA






15539
15561
CCTCCCCACATCAAGCC
2507
CATTCGGGCTTGATGT
4276




CGAATG

GGGG






15542
15564
CCCCACATCAAGCCCGA
2508
TATCATTCGGGCTTGA
4277




ATGATA

TGTG






15543
15565
CCCACATCAAGCCCGAA
2509
ATATCATTCGGGCTTG
4278




TGATAT

ATGT






15544
15566
CCACATCAAGCCCGAAT
2510
AATATCATTCGGGCTT
4279




GATATT

GATG






15554
15576
CCCGAATGATATTTCCT
2511
GCGAATAGGAAATATC
4280




ATTCGC

ATTC






15555
15577
CCGAATGATATTTCCTA
2512
GGCGAATAGGAAATA
4281




TTCGCC

TCATT






15568
15590
CCTATTCGCCTACACAA
2513
GGAGAATTGTGTAGGC
4282




TTCTCC

GAAT






15576
15598
CCTACACAATTCTCCGA
2514
GACGGATCGGAGAATT
4283




TCCGTC

GTGT






15589
15611
CCGATCCGTCCCTAACA
2515
CTAGTTTGTTAGGGAC
4284




AACTAG

GGAT






15594
15616
CCGTCCCTAACAAACTA
2516
GCCTCCTAGTTTGTTA
4285




GGAGGC

GGGA






15598
15620
CCCTAACAAACTAGGAG
2517
GGACGCCTCCTAGTTT
4286




GCGTCC

GTTA






15599
15621
CCTAACAAACTAGGAG
2518
AGGACGCCTCCTAGTT
4287




GCGTCCT

TGTT






15619
15641
CCTTGCCCTATTACTAT
2519
GGATGGATAGTAATAG
4288




CCATCC

GGCA






15624
15646
CCCTATTACTATCCATC
2520
GATGAGGATGGATAGT
4289




CTCATC

AATA






15625
15647
CCTATTACTATCCATCC
2521
GGATGAGGATGGATA
4290




TCATCC

GTAAT






15636
15658
CCATCCTCATCCTAGCA
2522
GATTATTGCTAGGATG
4291




ATAATC

AGGA






15640
15662
CCTCATCCTAGCAATAA
2523
TGGGGATTATTGCTAG
4292




TCCCCA

GATG






15646
15668
CCTAGCAATAATCCCCA
2524
GGAGGATGGGGATTAT
4293




TCCTCC

TGCT






15658
15680
CCCCATCCTCCATATAT
2525
GTTTGGATATATGGAG
4294




CCAAAC

GATG






15659
15681
CCCATCCTCCATATATC
2526
TGTTTGGATATATGGA
4295




CAAACA

GGAT






15660
15682
CCATCCTCCATATATCC
2527
TTGTTTGGATATATGG
4296




AAACAA

AGGA






15664
15686
CCTCCATATATCCAAAC
2528
TTTGTTGTTTGGATAT
4297




AACAAA

ATGG






15667
15689
CCATATATCCAAACAAC
2529
TGCTTTGTTGTTTGGAT
4298




AAAGCA

ATA






15675
15697
CCAAACAACAAAGCAT
2530
AAATATTATGCTTTGT
4299




AATATTT

TGTT






15700
15722
CCCACTAAGCCAATCAC
2531
AATAAAGTGATTGGCT
4300




TTTATT

TAGT






15701
15723
CCACTAAGCCAATCACT
2532
CAATAAAGTGATTGGC
4301




TTATTG

TTAG






15709
15731
CCAATCACTTTATTGAC
2533
CTAGGAGTCAATAAAG
4302




TCCTAG

TGAT






15727
15749
CCTAGCCGCAGACCTCC
2534
GAATGAGGAGGTCTGC
4303




TCATTC

GGCT






15732
15754
CCGCAGACCTCCTCATT
2535
GGTTAGAATGAGGAG
4304




CTAACC

GTCTG






15739
15761
CCTCCTCATTCTAACCT
2536
CGATTCAGGTTAGAAT
4305




GAATCG

GAGG






15742
15764
CCTCATTCTAACCTGAA
2537
CTCCGATTCAGGTTAG
4306




TCGGAG

AATG






15753
15775
CCTGAATCGGAGGACA
2538
TACTGGTTGTCCTCCG
4307




ACCAGTA

ATTC






15770
15792
CCAGTAAGCTACCCTTT
2539
ATGGTAAAAGGGTAG
4308




TACCAT

CTTAC






15781
15803
CCCTTTTACCATCATTG
2540
CTTGTCCAATGATGGT
4309




GACAAG

AAAA






15782
15804
CCTTTTACCATCATTGG
2541
ACTTGTCCAATGATGG
4310




ACAAGT

TAAA






15789
15811
CCATCATTGGACAAGTA
2542
GGATGCTACTTGTCCA
4311




GCATCC

ATGA






15810
15832
CCGTACTATACTTCACA
2543
GATTGTTGTGAAGTAT
4312




ACAATC

AGTA






15832
15854
CCTAATCCTAATACCAA
2544
AGATAGTTGGTATTAG
4313




CTATCT

GATT






15838
15860
CCTAATACCAACTATCT
2545
TTAGGGAGATAGTTGG
4314




CCCTAA

TATT






15845
15867
CCAACTATCTCCCTAAT
2546
TTTTCAATTAGGGAGA
4315




TGAAAA

TAGT






15855
15877
CCCTAATTGAAAACAAA
2547
GAGTATTTTGTTTTCA
4316




ATACTC

ATTA






15856
15878
CCTAATTGAAAACAAAA
2548
TGAGTATTTTGTTTTCA
4317




TACTCA

ATT






15885
15907
CCTGTCCTTGTAGTATA
2549
TTAGTTTATACTACAA
4318




AACTAA

GGAC






15890
15912
CCTTGTAGTATAAACTA
2550
GTGTATTAGTTTATAC
4319




ATACAC

TACA






15912
15934
CCAGTCTTGTAAACCGG
2551
TCATCTCCGGTTTACA
4320




AGATGA

AGAC






15925
15947
CCGGAGATGAAAACCTT
2552
TGGAAAAAGGTTTTCA
4321




TTTCCA

TCTC






15938
15960
CCTTTTTCCAAGGACAA
2553
TCTGATTTGTCCTTGG
4322




ATCAGA

AAAA






15945
15967
CCAAGGACAAATCAGA
2554
CTTTTTCTCTGATTTGT
4323




GAAAAAG

CCT






15977
15999
CCACCATTAGCACCCAA
2555
TTAGCTTTGGGTGCTA
4324




AGCTAA

ATGG






15980
16002
CCATTAGCACCCAAAGC
2556
ATCTTAGCTTTGGGTG
4325




TAAGAT

CTAA






15989
16011
CCCAAAGCTAAGATTCT
2557
TAAATTAGAATCTTAG
4326




AATTTA

CTTT






15990
16012
CCAAAGCTAAGATTCTA
2558
TTAAATTAGAATCTTA
4327




ATTTAA

GCTT






16052
16074
CCACCCAAGTATTGACT
2559
TGGGTGAGTCAATACT
4328




CACCCA

TGGG






16055
16077
CCCAAGTATTGACTCAC
2560
TGATGGGTGAGTCAAT
4329




CCATCA

ACTT






16056
16078
CCAAGTATTGACTCACC
2561
TTGATGGGTGAGTCAA
4330




CATCAA

TACT






16071
16093
CCCATCAACAACCGCTA
2562
AATACATAGCGGTTGT
4331




TGTATT

TGAT






16072
16094
CCATCAACAACCGCTAT
2563
AAATACATAGCGGTTG
4332




GTATTT

TTGA






16082
16104
CCGCTATGTATTTCGTA
2564
GTAATGTACGAAATAC
4333




CATTAC

ATAG






16107
16129
CCAGCCACCATGAATAT
2565
CGTACAATATTCATGG
4334




TGTACG

TGGC






16111
16133
CCACCATGAATATTGTA
2566
GTACCGTACAATATTC
4335




CGGTAC

ATGG






16114
16136
CCATGAATATTGTACGG
2567
ATGGTACCGTACAATA
4336




TACCAT

TTCA






16133
16155
CCATAAATACTTGACCA
2568
TACAGGTGGTCAAGTA
4337




CCTGTA

TTTA






16147
16169
CCACCTGTAGTACATAA
2569
GGGTTTTTATGTACTA
4338




AAACCC

CAGG






16150
16172
CCTGTAGTACATAAAAA
2570
ATTGGGTTTTTATGTA
4339




CCCAAT

CTAC






16167
16189
CCCAATCCACATCAAAA
2571
AGGGGGTTTTGATGTG
4340




CCCCCT

GATT






16168
16190
CCAATCCACATCAAAAC
2572
GAGGGGGTTTTGATGT
4341




CCCCTC

GGAT






16173
16195
CCACATCAAAACCCCCT
2573
ATGGGGAGGGGGTTTT
4342




CCCCAT

GATG






16184
16206
CCCCCTCCCCATGCTTA
2574
TGCTTGTAAGCATGGG
4343




CAAGCA

GAGG






16185
16207
CCCCTCCCCATGCTTAC
2575
TTGCTTGTAAGCATGG
4344




AAGCAA

GGAG






16186
16208
CCCTCCCCATGCTTACA
2576
CTTGCTTGTAAGCATG
4345




AGCAAG

GGGA






16187
16209
CCTCCCCATGCTTACAA
2577
ACTTGCTTGTAAGCAT
4346




GCAAGT

GGGG






16190
16212
CCCCATGCTTACAAGCA
2578
TGTACTTGCTTGTAAG
4347




AGTACA

CATG






16191
16213
CCCATGCTTACAAGCAA
2579
CTGTACTTGCTTGTAA
4348




GTACAG

GCAT






16192
16214
CCATGCTTACAAGCAAG
2580
GCTGTACTTGCTTGTA
4349




TACAGC

AGCA






16221
16243
CCCTCAACTATCACACA
2581
AGTTGATGTGTGATAG
4350




TCAACT

TTGA






16222
16244
CCTCAACTATCACACAT
2582
CAGTTGATGTGTGATA
4351




CAACTG

GTTG






16250
16272
CCAAAGCCACCCCTCAC
2583
TAGTGGGTGAGGGGTG
4352




CCACTA

GCTT






16256
16278
CCACCCCTCACCCACTA
2584
GTATCCTAGTGGGTGA
4353




GGATAC

GGGG






16259
16281
CCCCTCACCCACTAGGA
2585
TTGGTATCCTAGTGGG
4354




TACCAA

TGAG






16260
16282
CCCTCACCCACTAGGAT
2586
GTTGGTATCCTAGTGG
4355




ACCAAC

GTGA






16261
16283
CCTCACCCACTAGGATA
2587
TGTTGGTATCCTAGTG
4356




CCAACA

GGTG






16266
16288
CCCACTAGGATACCAAC
2588
AGGTTTGTTGGTATCC
4357




AAACCT

TAGT






16267
16289
CCACTAGGATACCAACA
2589
TAGGTTTGTTGGTATC
4358




AACCTA

CTAG






16278
16300
CCAACAAACCTACCCAC
2590
TTAAGGGTGGGTAGGT
4359




CCTTAA

TTGT






16286
16308
CCTACCCACCCTTAACA
2591
ATGTACTGTTAAGGGT
4360




GTACAT

GGGT






16290
16312
CCCACCCTTAACAGTAC
2592
TACTATGTACTGTTAA
4361




ATAGTA

GGGT






16291
16313
CCACCCTTAACAGTACA
2593
GTACTATGTACTGTTA
4362




TAGTAC

AGGG






16294
16316
CCCTTAACAGTACATAG
2594
TATGTACTATGTACTG
4363




TACATA

TTAA






16295
16317
CCTTAACAGTACATAGT
2595
TTATGTACTATGTACT
4364




ACATAA

GTTA






16320
16342
CCATTTACCGTACATAG
2596
AATGTGCTATGTACGG
4365




CACATT

TAAA






16327
16349
CCGTACATAGCACATTA
2597
TGACTGTAATGTGCTA
4366




CAGTCA

TGTA






16353
16375
CCCTTCTCGTCCCCATG
2598
GTCATCCATGGGGACG
4367




GATGAC

AGAA






16354
16376
CCTTCTCGTCCCCATGG
2599
GGTCATCCATGGGGAC
4368




ATGACC

GAGA






16363
16385
CCCCATGGATGACCCCC
2600
TCTGAGGGGGGTCATC
4369




CTCAGA

CATG






16364
16386
CCCATGGATGACCCCCC
2601
ATCTGAGGGGGGTCAT
4370




TCAGAT

CCAT






16365
16387
CCATGGATGACCCCCCT
2602
TATCTGAGGGGGGTCA
4371




CAGATA

TCCA






16375
16397
CCCCCCTCAGATAGGGG
2603
AAGGGACCCCTATCTG
4372




TCCCTT

AGGG






16376
16398
CCCCCTCAGATAGGGGT
2604
CAAGGGACCCCTATCT
4373




CCCTTG

GAGG






16377
16399
CCCCTCAGATAGGGGTC
2605
TCAAGGGACCCCTATC
4374




CCTTGA

TGAG






16378
16400
CCCTCAGATAGGGGTCC
2606
GTCAAGGGACCCCTAT
4375




CTTGAC

CTGA






16379
16401
CCTCAGATAGGGGTCCC
2607
GGTCAAGGGACCCCTA
4376




TTGACC

TCTG






16393
16415
CCCTTGACCACCATCCT
2608
TCACGGAGGATGGTGG
4377




CCGTGA

TCAA






16394
16416
CCTTGACCACCATCCTC
2609
TTCACGGAGGATGGTG
4378




CGTGAA

GTCA






16400
16422
CCACCATCCTCCGTGAA
2610
ATTGATTTCACGGAGG
4379




ATCAAT

ATGG






16403
16425
CCATCCTCCGTGAAATC
2611
GATATTGATTTCACGG
4380




AATATC

AGGA






16407
16429
CCTCCGTGAAATCAATA
2612
GCGGGATATTGATTTC
4381




TCCCGC

ACGG






16410
16432
CCGTGAAATCAATATCC
2613
TGTGCGGGATATTGAT
4382




CGCACA

TTCA






16425
16447
CCCGCACAAGAGTGCTA
2614
GGAGAGTAGCACTCTT
4383




CTCTCC

GTGC






16426
16448
CCGCACAAGAGTGCTAC
2615
AGGAGAGTAGCACTCT
4384




TCTCCT

TGTG






16446
16468
CCTCGCTCCGGGCCCAT
2616
AGTGTTATGGGCCCGG
4385




AACACT

AGCG






16453
16475
CCGGGCCCATAACACTT
2617
ACCCCCAAGTGTTATG
4386




GGGGGT

GGCC






16458
16480
CCCATAACACTTGGGGG
2618
TAGCTACCCCCAAGTG
4387




TAGCTA

TTAT






16459
16481
CCATAACACTTGGGGGT
2619
TTAGCTACCCCCAAGT
4388




AGCTAA

GTTA






16494
16516
CCGACATCTGGTTCCTA
2620
CTGAAGTAGGAACCA
4389




CTTCAG

GATGT






16507
16529
CCTACTTCAGGGTCATA
2621
AGGCTTTATGACCCTG
4390




AAGCCT

AAGT






16527
16549
CCTAAATAGCCCACACG
2622
GGGGAACGTGTGGGCT
4391




TTCCCC

ATTT






16536
16558
CCCACACGTTCCCCTTA
2623
CTTATTTAAGGGGAAC
4392




AATAAG

GTGT






16537
16559
CCACACGTTCCCCTTAA
2624
TCTTATTTAAGGGGAA
4393




ATAAGA

CGTG






16546
16568
CCCCTTAAATAAGACAT
2625
ATCGTGATGTCTTATT
4394




CACGAT

TAAG






16547
16569
CCCTTAAATAAGACATC
2626
CATCGTGATGTCTTAT
4395




ACGATG

TTAA






16548
16570
CCTTAAATAAGACATCA
2627
CCATCGTGATGTCTTA
4396




CGATGG

TTTA









Applications

The gNAs (e.g., gRNAs) and collections of gNAs (e.g., gRNAs) provided herein are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest; genome-wide labeling; genome-wide editing; genome-wide function screens; and genome-wide regulation.


In one embodiment, the gNAs are selective for host nucleic acids in a biological sample from a host, but are not selective for non-host nucleic acids in the sample. In one embodiment, the gNAs are selective for non-host nucleic acids in a biological sample from a host, but are not selective for the host nucleic acids in the sample. In one embodiment, the gNAs are selective for both host nucleic acids and a subset of the non-host nucleic acids in a biological sample from a host. For example, where a complex biological sample comprises host nucleic acids and nucleic acids from more than one non-host organism, the gRNAs may be selective for more than one of the non-host species. In such embodiments, the gNAs are used to serially deplete or partition the sequences that are not of interest. For example, saliva from a human contains human DNA, as well as the DNA of more than one bacterial species, but may also contain the genomic material of an unknown pathogenic organism. In such an embodiment, gNAs directed at the human DNA and the known bacteria can be used to serially deplete the human DNA and the DNA of the known bacteria, thus resulting in a sample comprising the genomic material of the unknown pathogenic organism.


In an exemplary embodiment, the gNAs are selective for human host DNA obtained from a biological sample from the host, but do not hybridize with DNA from an unknown pathogen(s) also obtained from the sample.


In some embodiments, the gNAs are useful in methods of depleting and partitioning targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample, the methods comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collections of gNAs described herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins.


In some embodiments, the gNAs are useful for a method of depletion and partitioning of targeted sequences in a sample, the method comprising: providing nucleic acids extracted from a sample, wherein the extracted nucleic acids comprise sequences of interest and targeted sequences for one of depletion and partitioning; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the nucleic acids in the sample.


In some embodiments, the gNAs are useful for a method of enriching a sample for non-host nucleic acids, the method comprising: providing a sample comprising host nucleic acids and non-host nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein comprising targeting sequences directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the host nucleic acids in the sample, thereby depleting the sample of host nucleic acids and allowing for the enrichment of non-host nucleic acids.
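The effect of host depletion on the composition of a sample can be illustrated with simple arithmetic. The following Python sketch is illustrative only and is not part of the methods described herein. It assumes that cleaved host molecules drop out of the analyzable pool and that non-host molecules are untouched. The function name and the example cleavage efficiencies are hypothetical and are not measured values.

```python
def non_host_fraction_after_depletion(host_fraction: float,
                                      cleavage_efficiency: float) -> float:
    """Return the non-host fraction of a sample after host depletion.

    host_fraction: fraction of molecules that are host-derived (0 to 1).
    cleavage_efficiency: assumed fraction of host molecules that are
        cleaved and therefore lost from the analyzable pool (0 to 1).
    """
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    if not 0.0 <= cleavage_efficiency <= 1.0:
        raise ValueError("cleavage_efficiency must be between 0 and 1")

    non_host = 1.0 - host_fraction
    surviving_host = host_fraction * (1.0 - cleavage_efficiency)
    total = non_host + surviving_host
    if total == 0.0:
        raise ValueError("no molecules remain after depletion")
    return non_host / total


if __name__ == "__main__":
    # A sample that is 99% host, at several hypothetical efficiencies.
    for efficiency in (0.0, 0.9, 0.99):
        fraction = non_host_fraction_after_depletion(0.99, efficiency)
        print(f"efficiency={efficiency:.2f} non-host fraction={fraction:.3f}")
```

Under these assumptions, a sample that starts at 1% non-host nucleic acids rises to roughly 9% non-host when 90% of host molecules are cleaved, which shows why even incomplete depletion can substantially enrich the sequences of interest.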


In some embodiments, the gNAs are useful for a method for serially depleting targeted nucleic acids in a sample comprising: providing a biological sample from a host comprising host nucleic acids and non-host nucleic acids, wherein the non-host nucleic acids comprise nucleic acids from at least one known non-host organism and nucleic acids from an unknown non-host organism; providing a plurality of complexes comprising (i) a collection of gNAs provided herein, directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins; mixing the nucleic acids from the biological sample with the gNA-nucleic acid-guided nuclease system protein complexes (e.g., gRNA-CRISPR/Cas system protein complexes) configured to hybridize to targeted sequences in the host nucleic acids, wherein at least a portion of the complexes hybridizes to the targeted sequences in the host nucleic acids, and wherein at least a portion of the host nucleic acids are cleaved; mixing the remaining nucleic acids from the biological sample with the gNA-nucleic acid-guided nuclease system protein complexes configured to hybridize to targeted sequences in the at least one known non-host nucleic acids, wherein at least a portion of the complexes hybridizes to the targeted sequences in the at least one non-host nucleic acids, and wherein at least a portion of the non-host nucleic acids are cleaved; and isolating the remaining nucleic acids from the unknown non-host organism and preparing for further analysis.
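The serial depletion described above can be illustrated with a minimal in silico sketch. It is a toy model under stated assumptions, not the claimed method: cleavage is treated as complete, a read counts as targeted whenever it contains an exact copy of any guide target string, and PAM requirements are ignored. The function names and the example reads and guide targets are hypothetical.

```python
def deplete(reads, guide_targets):
    """Keep only the reads that contain none of the guide target sequences.

    Idealized assumption: any read containing a target is fully removed.
    """
    return [r for r in reads if not any(t in r for t in guide_targets)]


def serial_deplete(reads, guide_sets):
    """Apply one depletion round per guide set, in the order given."""
    remaining = list(reads)
    for guide_targets in guide_sets:
        remaining = deplete(remaining, guide_targets)
    return remaining


# Hypothetical toy data: one host read, one known non-host read, one unknown read.
host_guides = ["ACGTACGT"]
known_non_host_guides = ["TTGGCCAA"]
reads = ["GGACGTACGTCC", "AATTGGCCAATT", "CCCCAAAATTTT"]

remaining = serial_deplete(reads, [host_guides, known_non_host_guides])
```

In this toy run the first round removes the host read, the second removes the known non-host read, and only the read matching neither guide set is left for further analysis.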


In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted functional screens in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein, in a way that gNA-directed nucleic acid-guided nuclease system protein editing can be achieved to sequences across the entire genome or to a specific region of the genome. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as DNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as mRNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as protein. In one exemplary embodiment, the nucleic acid-guided nuclease system protein is Cas9.


In some embodiments, the gNAs generated herein are used for the selective capture and/or enrichment of nucleic acid sequences of interest. For example, in some embodiments, the gNAs generated herein are used for capturing target nucleic acid sequences comprising: providing a sample comprising a plurality of nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins. Once the sequences of interest are captured, they can be further ligated to create, for example, a sequencing library.


In some embodiments, the gNAs generated herein are used for introducing labeled nucleotides at targeted sites of interest comprising: (a) providing a sample comprising a plurality of nucleic acid fragments; (b) contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-nickases (e.g. Cas9-nickases), wherein the gNAs are complementary to targeted sites of interest in the nucleic acid fragments, thereby generating a plurality of nicked nucleic acid fragments at the targeted sites of interest; and (c) contacting the plurality of nicked nucleic acid fragments with an enzyme capable of initiating nucleic acid synthesis at a nicked site, and labeled nucleotides, thereby generating a plurality of nucleic acid fragments comprising labeled nucleotides in the targeted sites of interest.


In some embodiments, the gNAs generated herein are used for capturing target nucleic acid sequences of interest comprising: (a) providing a sample comprising a plurality of adapter-ligated nucleic acids, wherein the nucleic acids are ligated to a first adapter at one end and are ligated to a second adapter at the other end; and (b) contacting the sample with a collection of gNAs which comprise a plurality of dead nucleic acid-guided nuclease-gNA complexes (e.g., dCas9-gRNA complexes), wherein the dead nucleic acid-guided nuclease (e.g., dCas9) is fused to a transposase, wherein the gNAs are complementary to targeted sites of interest contained in a subset of the nucleic acids, and wherein the dead nucleic acid-guided nuclease-gNA transposase complexes (e.g., dCas9-gRNA transposase complexes) are loaded with a plurality of third adapters, to generate a plurality of nucleic acid fragments comprising either a first or second adapter at one end and a third adapter at the other end. In one embodiment, the method further comprises amplifying the product of step (b) using first or second adapter and third adapter-specific PCR.


In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted activation or repression in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a catalytically dead nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein fused to an activator or repressor domain (catalytically dead nucleic acid-guided nuclease system protein-fusion protein), in a way that gNA-directed catalytically dead nucleic acid-guided nuclease system protein-mediated activation or repression can be achieved at sequences across the entire genome or to a specific region of the genome. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as DNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as mRNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as protein. In some embodiments, the collection of gNAs or nucleic acids encoding for gNAs exhibit specificity for more than one nucleic acid-guided nuclease system protein. In one exemplary embodiment, the catalytically dead nucleic acid-guided nuclease system protein is dCas9.


In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for Cas9 and one or more CRISPR/Cas system proteins selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for various catalytically dead CRISPR/Cas system proteins fused to different fluorophores, for example for use in the labeling and/or visualization of different genomes or portions of genomes, for use in the labeling and/or visualization of different chromosomal regions, or for use in the labeling and/or visualization of the integration of viral genes/genomes into a genome.


In some embodiments, the collection of gNAs (or nucleic acids encoding for gNAs) have specificity for different nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, and target different sequences of interest, for example from different species. For example, a first subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a first species can be first mixed with a first nucleic acid-guided nuclease system protein member (or an engineered version); and a second subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a second species can be mixed with a second different nucleic acid-guided nuclease system protein member (or an engineered version). In one embodiment, the nucleic acid-guided nuclease system proteins can be a catalytically dead version (for example dCas9) fused with different fluorophores, so that different targeted sequences of interest, e.g., the genomes of different species, or different chromosomes of one species, can be labeled by different fluorescent labels. For example, different chromosomal regions can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of genetic translocations. For example, different viral genomes can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of integration of different viral genomes into the host genome. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused with either an activation or a repression domain, so that different targeted sequences of interest, e.g., different chromosomes of a genome, can be differentially regulated. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused with different protein domains that can be recognized by different antibodies, so that different targeted sequences of interest, e.g., different DNA sequences within a sample mixture, can be differentially isolated.


Exemplary Compositions of the Invention

In one embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase nucleic acid-guided nuclease-gNA complex, and labeled nucleotides. In one exemplary embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase Cas9-gRNA complex, and labeled nucleotides. In such embodiments, the nucleic acid may comprise DNA. The nucleotides can be labeled, for example with biotin. The nucleotides can be part of an antibody-conjugate pair.


In one embodiment, provided herein is a composition comprising a nucleic acid fragment and a catalytically dead nucleic acid-guided nuclease-gNA complex, wherein the catalytically dead nucleic acid-guided nuclease is fused to a transposase. In one exemplary embodiment, provided herein is a composition comprising a DNA fragment and a dCas9-gRNA complex, wherein the dCas9 is fused to a transposase.


In one embodiment, provided herein is a composition comprising a nucleic acid fragment comprising methylated nucleotides, a nickase nucleic acid-guided nuclease-gNA complex, and unmethylated nucleotides. In an exemplary embodiment, provided herein is a composition comprising a DNA fragment comprising methylated nucleotides, a nickase Cas9-gRNA complex, and unmethylated nucleotides.


In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-DNA endonuclease. In an exemplary embodiment, the nucleic acid-guided-DNA endonuclease is NgAgo.


In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-RNA endonuclease.


In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-DNA endonuclease.


In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-RNA endonuclease. In one embodiment, the nucleic acid-guided-RNA endonuclease comprises C2c2.


Kits and Articles of Manufacture

The present application provides kits comprising any one or more of the compositions described herein, including, but not limited to, adapters, gNAs (e.g., gRNAs), gNA collections (e.g., gRNA collections), nucleic acid molecules encoding the gNA collections, and the like.


In one exemplary embodiment, the kit comprises a collection of DNA molecules capable of being transcribed into a library of gRNAs, wherein the gRNAs are targeted to human genomic or other sources of DNA sequences.


In one embodiment, the kit comprises a collection of gNAs wherein the gNAs are targeted to human genomic or other sources of DNA sequences.


In some embodiments, provided herein are kits comprising any of the collection of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits comprising any of the collection of gNAs, as described herein.


The present application also provides all essential reagents and instructions for carrying out the methods of making the gNAs and the collection of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits that comprise all essential reagents and instructions for carrying out the methods of making individual gNAs and collections of gNAs as described herein.


Also provided herein is computer software for monitoring the information before and after contacting a sample with a gNA collection produced herein. In one exemplary embodiment, the software can compute and report the abundance of non-target sequence in the sample before and after providing the gNA collection to ensure no off-target targeting occurs, and the software can check the efficacy of targeted depletion/enrichment/capture/partitioning/labeling/regulation/editing by comparing the abundance of the target sequence before and after providing the gNA collection to the sample.
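The before-and-after comparison described above can be sketched as follows. This is a minimal illustration, not the software of the application. It assumes read counts for target and non-target sequence are already available and are comparable between the two measurements (for example, normalized to a spike-in control). The function names, the dictionary keys, and the 5% off-target threshold are hypothetical.

```python
def fraction_removed(count_before, count_after):
    """Fraction of a sequence class lost between the two measurements."""
    if count_before <= 0:
        raise ValueError("count_before must be positive")
    return 1.0 - count_after / count_before


def depletion_report(before, after, max_off_target_loss=0.05):
    """Summarize on-target efficacy and off-target loss.

    `before` and `after` map 'target' and 'non_target' to comparable counts.
    `max_off_target_loss` is a hypothetical acceptance threshold.
    """
    target_removed = fraction_removed(before["target"], after["target"])
    non_target_removed = fraction_removed(before["non_target"], after["non_target"])
    return {
        "target_removed": target_removed,
        "non_target_removed": non_target_removed,
        "off_target_ok": non_target_removed <= max_off_target_loss,
    }


# Hypothetical counts before and after contacting the sample with a gNA collection.
report = depletion_report(
    before={"target": 9900, "non_target": 100},
    after={"target": 990, "non_target": 98},
)
```

With these made-up counts, the report shows that 90% of the target sequence was removed while 2% of the non-target sequence was lost, which passes the assumed threshold.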


The following examples are included for illustrative purposes and are not intended to limit the scope of the invention.


EXAMPLES
Example 1: Construction of a gRNA Library from a T7 Promoter Human DNA Library
T7 Promoter Library Construction

Human genomic DNA (400 ng) was fragmented using an S2 Covaris sonicator (Covaris) for 8 cycles, to yield fragments of 200-300 bp in length. Fragmented DNA was repaired using the NEBNext End Repair Module (NEB) and incubated at 25° C. for 30 min, then heat inactivated at 75° C. for 20 min. To make T7 promoter adapters, oligos T7-1 (5′GCCTCGAGC*T*A*ATACGACTCACTATAGAG3′, * denotes a phosphorothioate backbone linkage)(SEQ ID NO: 4397) and T7-2 (sequence 5′Phos-CTCTATAGTGAGTCGTATTA3′) (SEQ ID NO: 4398) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. T7 promoter blunt adapters (15 pmol total) were then added to the blunt-ended human genomic DNA fragments, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((2) in FIG. 1). Ligations were amplified with 2 μM oligo T7-1, using Hi-Fidelity 2× Master Mix (NEB) for 10 cycles of PCR (98° C. for 20 s, 63° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
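As a quick check of the adapter design above, the following sketch tests whether oligo T7-2 pairs with the 3′ end of oligo T7-1 and reports the single-stranded tail left after annealing. The sequences are taken from the text with the phosphorothioate marks (*) omitted; the helper names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def annealed_overhang(long_oligo, short_oligo):
    """Return the 5' single-stranded tail of long_oligo after short_oligo
    anneals to its 3' end, or None if the oligos do not pair there."""
    paired = reverse_complement(short_oligo)
    if long_oligo.upper().endswith(paired):
        return long_oligo.upper()[: len(long_oligo) - len(paired)]
    return None


T7_1 = "GCCTCGAGCTAATACGACTCACTATAGAG"  # SEQ ID NO: 4397, '*' marks removed
T7_2 = "CTCTATAGTGAGTCGTATTA"           # SEQ ID NO: 4398

overhang = annealed_overhang(T7_1, T7_2)
```

For these two oligos the duplex is blunt at the promoter-proximal end that ligates to the fragments, and the remaining tail is GCCTCGAGC.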


Digestion of DNA

PCR amplified T7 promoter DNA (2 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 10 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C. ((3) in FIG. 1), then heat inactivated at 75° C. for 20 min. An additional 10 μL of NEB buffer 2 with 1 μL of T7 Endonuclease I (NEB) was added to the reaction, and incubated at 37° C. for 20 min ((4) in FIG. 1). Enzymatic digestion of DNA was verified by agarose gel electrophoresis. Digested DNA was recovered by adding 0.6× AxyPrep beads (Axygen), according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
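To illustrate where a CCD-directed nicking enzyme can act, the sketch below lists CCD motifs on one strand of a sequence, together with HGG motifs, which mark CCD sites on the opposite strand. D stands for A, G, or T and H for A, C, or T (IUPAC codes). The exact nick position within the site is not modeled, and the function name and example sequence are hypothetical.

```python
import re

CCD = "CC[AGT]"  # CCD motif on the strand as written
HGG = "[ACT]GG"  # reverse complement of CCD, i.e. a CCD site on the other strand


def motif_positions(seq, pattern):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


example = "ATCCAGTTAGGCCT"  # hypothetical sequence
ccd_sites = motif_positions(example, CCD)
hgg_sites = motif_positions(example, HGG)
```

The lookahead in the regular expression lets overlapping motifs be reported, which matters in C-rich or G-rich stretches.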


Ligation of Adapters and Removal of HGG

DNA was then blunted using T4 DNA Polymerase (NEB) for 20 min at 25° C., followed by heat inactivation at 75° C. for 20 min ((5) in FIG. 1).


To make MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4399) and MlyI-2 (sequence 5′>3′, TCACTATAGGGATCCGAGTCCC) (SEQ ID NO: 4400) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. MlyI adapters (15 pmol total) were then added to T4 DNA Polymerase-blunted DNA, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 1). Ligations were heat inactivated at 75° C. for 20 min, then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., so that HGG motifs are eliminated ((7) in FIG. 1). Digests were then cleaned using 0.8× AxyPrep beads (Axygen), and DNA was resuspended in 10 μL of 10 mM Tris-Cl pH 8.


To make StlgR adapters, oligos stlgR (sequence 5′>3′, 5′Phos-GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTGGATCCGATGC) (SEQ ID NO: 4401) and stlgRev (sequence 5′>3′, GGATCCAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4402) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 60° C. StlgR adapters (5 pmol total) were added to HGG-removed DNA fragments, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((8) in FIG. 1). Ligations were then incubated with Hi-Fidelity 2× Master Mix (NEB), using 2 μM of both oligos T7-1 and gRU (sequence 5′>3′, AAAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4403), and amplified using 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.


In Vitro Transcription

The T7/gRU amplified library of PCR products was then used as template for in vitro transcription, using the HiScribe T7 In Vitro Transcription Kit (NEB). 500-1000 ng of template was incubated overnight at 37° C. according to the manufacturer's instructions. To transcribe the guide libraries into gRNAs, the following in vitro transcription reaction mixture was assembled: 10 μL of purified library (˜500 ng), 6.5 μL of H2O, 2.25 μL of ATP, 2.25 μL of CTP, 2.25 μL of GTP, 2.25 μL of UTP, 2.25 μL of 10× reaction buffer (NEB) and 2.25 μL of T7 RNA Polymerase mix. The reaction was incubated at 37° C. for 24 hr, then purified using the RNA cleanup kit (Life Technologies), eluted with 100 μL of RNase-free water, quantified and stored at −20° C. until use.
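The reaction mixture listed above can be tallied with a few lines of arithmetic. The sketch below sums the stated component volumes and scales them for several reactions; the function names and the optional pipetting excess are hypothetical conveniences, not part of the protocol.

```python
# Component volumes in microliters, as listed in the text.
IVT_MIX_UL = {
    "purified library": 10.0,
    "H2O": 6.5,
    "ATP": 2.25,
    "CTP": 2.25,
    "GTP": 2.25,
    "UTP": 2.25,
    "10x reaction buffer": 2.25,
    "T7 RNA Polymerase mix": 2.25,
}


def total_volume_ul(mix):
    """Total reaction volume in microliters."""
    return sum(mix.values())


def scale_mix(mix, n_reactions, excess=0.1):
    """Scale each component for n reactions plus a fractional pipetting excess.

    The default 10% excess is an assumed convenience value.
    """
    factor = n_reactions * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in mix.items()}


total = total_volume_ul(IVT_MIX_UL)  # 10 + 6.5 + 6 x 2.25
four_reactions = scale_mix(IVT_MIX_UL, 4, excess=0.0)
```

The listed components add up to a 30 μL reaction.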


Example 2: Construction of gRNA Library from Intact Human Genomic DNA
Digestion of DNA

Human genomic DNA ((1) in FIG. 2; 20 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 40 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C., then heat inactivated at 75° C. for 20 min. An additional 40 μL of NEB buffer 2 and 1 μL of T7 Endonuclease I (NEB) were added to the reaction, with 20 min incubation at 37° C. (e.g., (2) in FIG. 2). Fragmentation of genomic DNA was verified with a small aliquot by agarose gel electrophoresis. DNA fragments between 200 and 600 bp were recovered by adding 0.3× AxyPrep beads (Axygen), incubating at 25° C. for 5 min, capturing beads on a magnetic stand and transferring the supernatant to a new tube. DNA fragments below 600 bp do not bind to beads at this bead/DNA ratio and remain in the supernatant. 0.7× AxyPrep beads (Axygen) were then added to the supernatant (this will bind all DNA molecules longer than 200 bp) and allowed to bind for 5 min. Beads were captured on a magnetic stand, washed twice with 80% ethanol, and air dried. DNA was then resuspended in 15 μL of 10 mM Tris-HCl pH 8. DNA concentration was determined using a Qubit assay (Life Technologies).
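The two-step bead selection above can be modeled, in a highly idealized way, as two hard length cutoffs. The sketch below assumes sharp thresholds at the 600 bp and 200 bp values quoted in the text; real bead binding is gradual, so this only illustrates the logic of keeping the supernatant first and the bead-bound fraction second. The function name and the example lengths are hypothetical.

```python
def two_sided_size_selection(fragment_lengths_bp, upper_bp=600, lower_bp=200):
    """Idealized model of the double-sided bead cleanup.

    Step 1 (0.3x beads): fragments below `upper_bp` stay in the supernatant.
    Step 2 (0.7x beads): fragments longer than `lower_bp` bind and are kept.
    """
    supernatant = [n for n in fragment_lengths_bp if n < upper_bp]
    return [n for n in supernatant if n > lower_bp]


lengths = [50, 150, 250, 400, 599, 600, 1200]  # hypothetical fragment sizes
selected = two_sided_size_selection(lengths)
```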


Ligation of Adapters

To make T7/MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4404) and T7-7 (sequence 5′>3′, GCCTCGAGC*T*A*ATACGACTCACTATAGGGATCCAAGTCCC, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. The purified, Nt.CviPII/T7 Endonuclease I digested DNA (100 ng) was then ligated to 15 pmol of T7/MlyI adapters using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((3) in FIG. 2). Ligations were then amplified by 10 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s) using Hi-Fidelity 2× Master Mix (NEB), and 2 μM of both oligos T7-17 (GCCTCGAGC*T*A*ATACGACTCACTATAGGG * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4406) and Flag (sequence 5′>3′, CGCTTGTCGTCATCGTCTTTGTA) (SEQ ID NO: 4407). PCR amplification increases the yield of DNA and, given the nature of the Y-shaped adapters we used, always resulted in T7 promoter being added distal to the HGG site and MlyI site being added next to the HGG motif ((4) in FIG. 2).


PCR products were then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., and heat inactivated at 75° C. for 20 min ((5) in FIG. 2). Following that, 5 pmol of adapter StlgR (in Example 1) was ligated using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 2). Ligations were then amplified by PCR using Hi-Fidelity 2× Master Mix (NEB), 2 μM of both oligos T7-7 and gRU (in Example 1) and 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.


Samples were then used as templates for in vitro transcription reaction as described in Example 1.


Example 3: Direct Cutting with CviPII

30 μg of human genomic DNA was digested with 2 units of Nt.CviPII (New England Biolabs) for 1 hour at 37° C., followed by heat inactivation at 75° C. for 20 minutes. The size of the fragments was verified to be 200-1,000 base pairs using a fragment analyzer instrument (Advanced Analytical). The 5′ or 3′ protruding ends (as shown, for example, in FIG. 3) were converted to blunt ends by adding 100 units of T4 DNA polymerase (New England Biolabs), 100 μM dNTPs and incubating at 12° C. for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer. The DNA was then ligated to MlyI adapter (see, for example, Example 4) or BaeI/EcoP15I adapters (see, for example, Example 5).


Example 4: Use of MlyI Adapter

Adapter MlyI was made by combining 2 μmoles of MlyI-Ad1 and MlyI-Ad2 in 40 μL water. Adapter BsaXI/MmeI was made by combining 2 μmoles oligo BsMm-Ad1 and 2 μmoles oligo BsMm-Ad2 in 40 μL water. T7 adapter was made by combining 1.5 μmoles of T7-Ad1 and T7-Ad2 oligos in 100 μL water. Stem-loop adapter was made by combining 1.5 μmoles of gR-top and gR-bot oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.









TABLE 5
Oligonucleotides used with MlyI Adapter.

| SEQ ID NO | Oligo name | Sequence (5′ > 3′) | Modification |
|---|---|---|---|
| 4408 | MlyI-Ad1 | gagatcagcttctgcattgatgccagcagcccgagtcag | none |
| 4409 | MlyI-Ad2 | ctgactcgggctgctgtacaaagacgatgacgacaagcgtta | 5′phosphate |
| 4410 | BsMm-Ad1 | gagatcagcttctgcattgatgcGGAGCCGCAGTACACTATCCAAC | none |
| 4411 | BsMm-Ad2 | GTTGGATAGTGTACTGCGGCTCCtacaaagacgatgacgacaagcg | 5′phosphate |
| 4412 | T7-Ad1 | gcctcgagctaatacgactcactatagagNN | none |
| 4398 | T7-Ad2 | Ctctatagtgagtcgtatta | 5′phosphate |
| 4413 | gR-top | ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgatcggtgctttttt | 5′phosphate |
| 4414 | gR-bot | aaaaaagcaccgactcggtgccactttttcaagttgataacggactagccttattttaacttgctatttctagctctaaaac | none |










The DNA containing the CCD blunt ends (from earlier section) was then ligated to 50 pmoles of adapter MlyI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). These steps eliminate small (<100 nucleotides) DNA and MlyI adapter dimers.


Purified DNA was then digested by adding 20 units of MlyI (New England Biolabs) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was recovered from the digest by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 30 μL buffer 4.
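The effect of this digestion can be sketched in silico. The example below assumes that MlyI recognizes GAGTC and makes a blunt cut 5 nucleotides downstream of that site, and that the top strand of a ligated molecule reads MlyI-Ad1 (Table 5) followed directly by the insert. Only the top strand and the adapter-proximal site are modeled; the insert sequence and function name are hypothetical.

```python
MLYI_SITE = "GAGTC"
MLYI_OFFSET = 5  # assumed blunt cut 5 nt downstream of the recognition site

MLYI_AD1 = "gagatcagcttctgcattgatgccagcagcccgagtcag"  # Table 5, SEQ ID NO: 4408


def mlyi_trim(top_strand):
    """Return the sequence downstream of the first MlyI cut.

    If no GAGTC site is found, the sequence is returned unchanged.
    """
    seq = top_strand.upper()
    site = seq.find(MLYI_SITE)
    if site == -1:
        return seq
    cut = site + len(MLYI_SITE) + MLYI_OFFSET
    return seq[cut:]


# Hypothetical insert that starts with a CCD motif (here CCA).
insert = "CCA" + "ACGTTGCAACGTTGCAACGT"
trimmed = mlyi_trim(MLYI_AD1 + insert)
```

Because GAGTC sits two nucleotides from the 3′ end of MlyI-Ad1, the modeled cut falls three nucleotides into the insert, which removes the adapter together with the three-nucleotide CCD motif, consistent with the description above.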


The purified DNA was then ligated to 50 pmoles of adapter BsaXI/MmeI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). DNA was then digested by addition of 20 units MmeI (New England Biolabs) and 40 pmol/μL SAM (S-adenosyl methionine) at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7 adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4, then digested with 20 units of BsaXI for 1 hour at 37° C. The guide RNA stem-loop sequences were added by adding 15 pmoles stem-loop adapter and using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 min. DNA was then recovered using a PCR cleanup kit (Zymo), eluted in 20 μL elution buffer and PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad1 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate guide RNAs.


Example 5: Use of BaeI/EcoP15I Adapter

Adapter BaeI/EcoP15I was made by combining 2 μmoles of BE Ad1 and BE Ad2 in 40 μL water. T7-E adapter was made by combining 1.5 μmoles of T7-Ad3 and T7-Ad4 oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.









TABLE 6
Oligonucleotides used with BaeI/EcoP15I Adapter.

| SEQ ID NO | Oligo name | Sequence (5′ > 3′) | Modification |
|---|---|---|---|
| 4416 | BE Ad1 | ActgctgacACAAgtatcTTTTTTTTTTgtttaaacTTTTTTTTTTgatacACAAgtcagcagA | 5′phosphate |
| 4416 | BE Ad2 | TctgctgacTTGTgtatcAAAAAAAAAAgtttaaacAAAAAAAAAAgatacTTGTgtcagcagT | 5′phosphate |
| 12 | T7-Ad3 | gcctcgagctaatacgactcactatagag | none |
| 4417 | T7-Ad4 | NNctctatagtgagtcgtatta | 5′phosphate |
| 4418 | stlgR | ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgctttttt | 5′adenylation |










The DNA containing the CCD blunt ends (from earlier section) was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.


DNA was then digested by addition of 20 units EcoP15I (New England Biolabs) and 1 mM ATP at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7-E adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4.


Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.


Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase, 20 pmol stlgR oligo, in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl2, 1 mM DTT, 2.5 mM MnCl2, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour followed by heat inactivation at 90° C. for 5 min. DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.


Example 6: NEMDA Method

NEMDA (Nicking Endonuclease Mediated DNA Amplification) was performed using 50 ng of human genomic DNA. The DNA was incubated in 100 μL thermo polymerase buffer (20 mM Tris-HCl, 10 mM (NH4)2SO4, 10 mM KCl, 6 mM MgSO4, 0.1% Triton® X-100, pH 8.8) supplemented with 0.3 mM dNTPs, 40 units of Bst large fragment DNA polymerase, and 0.1 units of NtCviPII (New England Biolabs) at 55° C. for 45 min, followed by 65° C. for 30 min and finally 80° C. for 20 min in a thermal cycler.


The DNA was then diluted with 300 μL of buffer 4 supplemented with 200 pmoles of T7-RND8 oligo (sequence 5′>3′ gcctcgagctaatacgactcactatagagnnnnnnnn) (SEQ ID NO: 4420) and boiled at 98° C. for 10 min followed by rapid cooling to 10° C. for 5 min. The reaction was then supplemented with 40 units of E. coli DNA polymerase I and 0.1 mM dNTPs (New England Biolabs) and incubated at room temperature for 20 min followed by heat inactivation at 75° C. for 20 min. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 30 μL elution buffer.
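The T7-RND8 oligo ends in eight degenerate positions, so it represents a pool of primers rather than a single sequence. The sketch below counts the possible members of that pool and draws one at random. It assumes each n position is an equal mix of a, c, g, and t; the function names are hypothetical.

```python
import random

T7_PREFIX = "gcctcgagctaatacgactcactatagag"
T7_RND8 = T7_PREFIX + "n" * 8  # SEQ ID NO: 4420, eight degenerate 3' positions


def degenerate_diversity(oligo):
    """Number of distinct sequences encoded by an oligo with 'n' positions."""
    return 4 ** oligo.lower().count("n")


def sample_primer(oligo, rng):
    """Draw one concrete primer, replacing each 'n' with a random base."""
    return "".join(rng.choice("acgt") if b == "n" else b for b in oligo.lower())


diversity = degenerate_diversity(T7_RND8)  # 4 ** 8
primer = sample_primer(T7_RND8, random.Random(0))
```

Every sampled primer keeps the fixed T7 promoter portion at its 5′ end and varies only in the last eight nucleotides.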


DNA was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.


Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.


Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units of ligase and 20 pmol of stlgR oligo in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl2, 1 mM DTT, 2.5 mM MnCl2, pH 7 at 25° C.) and incubating at 65° C. for 1 hour, followed by heat inactivation at 90° C. for 5 min. The DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 (sequence 5′>3′ gcctcgagctaatacgactcactatagag) (SEQ ID NO: 12) and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings: 98° C. for 3 min, then 30 cycles of 98° C. for 20 sec, 60° C. for 30 sec, and 72° C. for 20 sec. The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.
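The primer pair and cycling settings above can be sanity-checked with a short Python sketch. It is illustrative only and rests on two assumptions that are not stated in this example: that the commonly cited minimal T7 promoter sequence is TAATACGACTCACTATAG, and that the gRU primer is meant to anneal to the 3′ end of the protein-binding sequence recited as SEQ ID NO: 1 in claim 163 (only the last 18 bases of that sequence are used here). The cycling time ignores ramp times, and the function names are hypothetical.

```python
# Primer sequences as written in the text.
T7_AD3 = "gcctcgagctaatacgactcactatagag"
GRU = "AAAAAAGCACCGACTCGGTG"

# Assumption: commonly cited minimal T7 promoter (not given in the text).
T7_PROMOTER_CORE = "TAATACGACTCACTATAG"
# Last 18 bases of SEQ ID NO: 1 as recited in claim 163 (RNA alphabet).
SCAFFOLD_3P_END_RNA = "GUGGCACCGAGUCGGUGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def gru_matches_scaffold_end() -> bool:
    """Check that gRU pairs with the 3' end of the scaffold-encoding DNA."""
    scaffold_dna = SCAFFOLD_3P_END_RNA.replace("U", "T")
    # Sequence of the strand that gRU base-pairs with; the trailing T run
    # comes from the primer's 5' A stretch and lies past the scaffold end.
    annealing_site = reverse_complement(GRU).rstrip("T")
    return scaffold_dna.endswith(annealing_site)


def cycling_seconds(initial_s: int, steps_s: list, cycles: int) -> int:
    """Total programmed hold time, excluding temperature ramps."""
    return initial_s + cycles * sum(steps_s)


if __name__ == "__main__":
    print("T7 promoter in T7-Ad3:", T7_PROMOTER_CORE.lower() in T7_AD3)
    print("gRU matches scaffold 3' end:", gru_matches_scaffold_end())
    total = cycling_seconds(3 * 60, [20, 30, 20], 30)
    print("programmed PCR time (min):", total / 60)
```

If these assumptions hold, the amplicon carries a T7 promoter at one end and the scaffold-encoding sequence at the other, which is consistent with its use as a template for in vitro transcription.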

Claims
  • 1-144. (canceled)
  • 145. A method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: a. providing double-stranded DNA molecules, each comprising a sequence of interest 5′ to a PAM sequence, and its reverse complementary sequence on the opposite strand; b. performing an enzymatic digestion reaction on the double stranded DNA molecules, wherein cleavages are generated at the PAM sequence and/or its reverse complementary sequence on the opposite strand, but never completely remove the PAM sequence and/or its reverse complementary sequence on the opposite strand from the double stranded DNA; c. ligating adapters comprising a recognition sequence to the resulting DNA molecules of step (b); d. contacting the DNA molecules of step (c) with a restriction enzyme that recognizes the recognition sequence of step (c), whereby generating DNA fragments comprising blunt-ended double strand breaks immediately 5′ to the PAM sequence, whereby removing the PAM sequence and the adapter containing the enzyme recognition site; and e. ligating the resulting double stranded DNA fragments of step (d) with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, whereby generating a plurality of DNA fragments, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence.
  • 146. The method of claim 145, wherein the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein.
  • 147. The method of claim 145, wherein the starting DNA molecules of the collection further comprise a regulatory sequence upstream of the sequence of interest 5′ to the PAM sequence.
  • 148. The method of claim 146, wherein the regulatory sequence comprises a promoter.
  • 149. The method of claim 148, wherein the promoter comprises a T7, Sp6, or T3 sequence.
  • 150. The method of claim 145, wherein the double stranded DNA molecules are genomic DNA, intact DNA, or sheared DNA.
  • 151. The method of claim 150, wherein the genomic DNA is human, mouse, avian, fish, plant, insect, bacterial, or viral.
  • 152. The method of claim 145, wherein the DNA segments encoding a targeting sequence are at least 22 bp.
  • 153. The method of claim 145, wherein the DNA segments encoding a targeting sequence are 15-250 bp in size range.
  • 154. The method of claim 145, wherein the PAM sequence is AGG, CGG, or TGG.
  • 155. The method of claim 145, wherein the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • 156. The method of claim 145, wherein step (b) further comprises (1) contacting the DNA molecules with an enzyme capable of creating a nick in a single strand at a CCD site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest followed by an HGG sequence, wherein the DNA molecules are nicked at the CCD sites; and (2) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest followed by an HGG sequence wherein residual nucleotides from HGG and/or CCD sequences is (are) left behind.
  • 157. The method of claim 145, wherein step (d) further comprises PCR amplification of the adaptor-ligated DNA fragments from step (c) before cutting with the restriction enzyme recognizing the recognition sequence of step (c), wherein after PCR, the recognition sequence is positioned 3′ of the PAM sequence, and a regulatory sequence is positioned at the 5′ distal end of the PAM sequence.
  • 158. The method of claim 145, wherein the enzymatic reaction of step (b) comprises the use of a Nt.CviPII enzyme, and a T7 Endonuclease I enzyme.
  • 159. The method of claim 145, wherein step (c) further comprises a blunt-end reaction with a T4 DNA Polymerase, if the adapter to be ligated does not comprise an overhang.
  • 160. The method of claim 145, wherein the adapter of step (c) is either (1) double stranded, comprising a restriction enzyme recognition sequence in one strand, and a regulatory sequence in the other strand, if the adapter is Y-shaped and comprises an overhang; or (2) has a palindromic enzyme recognition sequence in both strands, if the adapter is not Y-shaped.
  • 161. The method of claim 145, wherein the restriction enzyme of step (d) is MlyI.
  • 162. The method of claim 145, wherein step (d) further comprises contacting the DNA molecules with an XhoI enzyme.
  • 163. The method of claim 145, wherein in step (e) the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes for a RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC (SEQ ID NO: 1) or encodes for a RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2).
  • 164. The method of claim 145, wherein the targeted sequences of interest are spaced every 10,000 bp or less across the genome of an organism.
  • 165-234. (canceled)
  • 235. A kit comprising all essential reagents and instructions for carrying out the method of claim 145.
CROSS-REFERENCE

This is a U.S. National Stage Application under 35 U.S.C. § 371 of International Application No. PCT/US2016/065420, filed on Dec. 7, 2016, which claims the benefit of U.S. Provisional Application No. 62/264,262, filed Dec. 7, 2015, and of U.S. Provisional Application No. 62/298,963, filed Feb. 23, 2016, each of which is hereby incorporated by reference in its entirety.

PCT Information
Filing Document    Filing Date    Country    Kind
PCT/US16/65420     12/7/2016      WO         00

Provisional Applications (2)
Number      Date        Country
62298963    Feb 2016    US
62264262    Dec 2015    US