The instant application contains a Sequence Listing which has been submitted in XML format via Patent Center and is hereby incorporated by reference in its entirety. Said XML copy, created on Apr. 24, 2024, is named 046528-7097US2 Sequence Listing.xml and is 602,112 bytes in size.
Single DNA molecules when stretched out can provide a wide window to genomic data. Although commercial devices to stretch single DNA molecules exist, the length of linearized DNA achieved is still short for efficient large-scale genome assembly via sequence mapping. There is a need in the art for new devices and methods that are useful for immobilizing and linearizing oligonucleotides and/or for the interrogation of immobilized oligonucleotides. This disclosure addresses that need.
Further, restriction mapping has been applied in human genomics for physical mapping of genome fragments based on restriction enzyme cutting and was used extensively during the Human Genome Project to guide genome assembly. However, traditional restriction mapping is highly labor-intensive and requires large amounts of sample. More importantly, a traditional restriction map provides a “fingerprint” of the genomic DNA, not an ordered sequence of restriction sites. Therefore, there is a need in the art for DNA mapping methodologies that overcome the drawbacks of the currently practiced mapping techniques. The present invention addresses this need.
In one aspect, the invention provides a method of immobilizing and linearizing an oligonucleotide, wherein the method comprises providing a micropatterned substrate, wherein the micropatterned substrate comprises at least one binding region having a first width; and at least one non-binding region having a second width; contacting the micropatterned substrate with a solution comprising at least one oligonucleotide molecule, wherein one end of at least one oligonucleotide molecule attaches to the binding region of the micropatterned substrate; and combing the at least one oligonucleotide molecule such that the at least one oligonucleotide molecule extends from the binding region into at least a portion of an adjacent non-binding region; thereby immobilizing and linearizing the at least one oligonucleotide molecule.
In another aspect, the invention provides a method of optically mapping DNA, wherein the method comprises providing a micropatterned substrate as described elsewhere herein; contacting the micropatterned substrate with a solution comprising at least one molecule of DNA, wherein one end of at least one molecule of DNA attaches to the binding region of the micropatterned substrate; and combing the at least one molecule of DNA such that the at least one molecule of DNA extends from the binding region into at least a portion of an adjacent non-binding region; and optically mapping the at least one molecule of DNA.
In yet another aspect, the invention provides a method of on surface DNA sequencing library generation, wherein the method comprises providing a micropatterned substrate, as described elsewhere herein; contacting the micropatterned substrate with a solution comprising at least one molecule of DNA, the at least one molecule of DNA comprising a T7 promoter; wherein one end of at least one molecule of DNA attaches to the binding region of the micropatterned substrate; combing the at least one molecule of DNA such that the at least one molecule of DNA extends from the binding region into at least a portion of an adjacent non-binding region; and generating a DNA sequencing library.
In yet another aspect, the invention provides a method of DNA sequencing library generation, wherein the method comprises providing a micropatterned substrate, as described elsewhere herein; contacting the micropatterned substrate with a solution comprising at least one molecule of DNA, wherein one end of at least one molecule of DNA attaches to the binding region of the micropatterned substrate; combing the at least one molecule of DNA such that the at least one molecule of DNA extends from the binding region into at least a portion of an adjacent non-binding region; amplifying the at least one molecule of DNA using at least one isothermal amplification method, thereby forming an amplified product; eluting the amplified product from the device; and generating a DNA sequencing library using the eluted amplified product.
In yet another aspect, the invention provides a method of on surface DNA sequencing library generation, wherein the method comprises providing a micropatterned substrate, as described elsewhere herein; contacting the micropatterned substrate with a solution comprising at least one molecule of DNA, wherein one end of at least one molecule of DNA attaches to the binding region of the micropatterned substrate; combing the at least one molecule of DNA such that the at least one molecule of DNA extends from the binding region into at least a portion of an adjacent non-binding region; performing a tagmentation reaction on the at least one molecule of DNA, thereby generating at least one tagmented product; amplifying the at least one tagmented product, thereby forming an amplified product; and generating a DNA sequencing library using the amplified product.
In yet another aspect, the invention provides a method of on surface DNA sequencing, wherein the method comprises: providing a micropatterned substrate, as described elsewhere herein; contacting the micropatterned substrate with a solution comprising at least one molecule of DNA, wherein one end of at least one molecule of DNA attaches to the binding region of the micropatterned substrate; combing the at least one molecule of DNA such that the at least one molecule of DNA extends from the binding region into at least a portion of an adjacent non-binding region; and sequencing the at least one molecule of DNA.
In certain embodiments, the binding regions and the non-binding regions alternate across at least a portion of the substrate.
In certain embodiments, the first width is 10 to 40 μm and the second width is 10 to 170 μm.
In certain embodiments, the combing comprises generating a receding meniscus.
In certain embodiments, the micropatterned substrate comprises a silica wafer.
In certain embodiments, the binding region comprises at least one selected from the group consisting of octenyl, octadecyl, docosenyl, SU-8, polymethylmethacrylate, polydimethylsiloxane, and polystyrene.
In certain embodiments, the non-binding region comprises at least one selected from the group consisting of polyethylene glycol (PEG), polyvinylpyrrolidone, and their derivatives.
In certain embodiments, the methods described herein further comprise coating the micropatterned substrate with a hydrogel.
In certain embodiments, the optical mapping of the at least one molecule of DNA comprises contacting the at least one molecule of DNA with at least one nicking endonuclease; incorporating at least one fluorescent dye-terminator into the at least one molecule of DNA; staining the at least one molecule of DNA; and imaging the at least one molecule of DNA.
In certain embodiments, the nicking endonuclease is selected from the group consisting of Nt.BspQI, Nb.BbvCI, Nt.BbvCI, Nb.BssSI, and Cas9 nickase.
In certain embodiments, optically mapping the at least one molecule of DNA comprises contacting the at least one molecule of DNA with at least one guide RNA sequence complementary to at least a portion of the at least one molecule of DNA and an inactive CRISPR-Cas9; and imaging the at least one molecule of DNA.
In certain embodiments, the imaging comprises fluorescence microscopy. In certain embodiments, the imaging comprises epifluorescence or total internal reflection fluorescence microscopy (TIRF).
In certain embodiments, the isothermal amplification method is selected from the group consisting of strand displacement at nicks or PNA-displaced sites.
In certain embodiments, sequencing the at least one molecule of DNA comprises one or more techniques selected from the group consisting of: direct DNA sequencing by DNA polymerase with reversible DNA terminators; generating RNA from the at least one molecule of DNA using RNA polymerase and sequencing using T7 reverse transcriptase with reversible RNA terminators; amplifying the at least one DNA molecule on the substrate, and sequencing with reversible DNA terminators, by DNA ligation reaction with DNA ligase; and sequencing-by-hybridization using fluorescently labeled short oligonucleotides.
In certain embodiments, the method is performed in a flow cell.
In yet another aspect, the invention provides a method for mapping a genome, wherein the method is capable of resolving a single nucleotide polymorphism (SNP), the method comprising introducing to the genome a CRISPR/Cas9 system comprising at least one single-guide RNA (sgRNA) specific for a target sequence or a plurality of target sequences across the genome and a Cas9 D10A, wherein the CRISPR/Cas9 system nick labels the target sequence, and the target sequence or genome is analyzed.
The following detailed description of preferred embodiments of the invention will be better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there are shown in the drawings embodiments which are presently preferred. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities of the embodiments shown in the drawings.
Described herein is a microfabricated surface that not only combs DNA molecules efficiently but also provides for sequence-specific enzymatic fluorescent DNA labelling. By modifying a glass surface with two contrasting functionalities, such that DNA binds selectively to one of the two regions, DNA extension can be controlled, which is known to be critical for sequence recognition by an enzyme. Moreover, the surface modification provides enzymatic access to the DNA backbone and minimizes nonspecific fluorescent dye adsorption. These enhancements make the designed surface suitable for large-scale and high-resolution single DNA molecule studies.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred materials and methods are described herein. In describing and claiming the present invention, the following terminology will be used.
It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
“About” as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass non-limiting variations of ±20% or ±10%, ±5%, ±1%, or ±0.1% from the specified value, as such variations are appropriate.
As used herein, the term “inactive CRISPR-Cas9” or “dCas9” means a mutant Cas9 enzyme that is devoid of endonuclease activity, limiting its function to programmable RNA-guided sequence-specific binding to DNA.
As used herein, the term “fluorescence microscopy” means optical microscopy that employs the phenomenon of fluorescence to form an image of the object. The fluorescing object is excited by light of shorter wavelength, and the emitted light of longer wavelength is collected to form an image.
The term “total internal reflection fluorescence microscopy” or “TIRF” means a fluorescence microscopy technique that uses a special illumination technique to generate evanescent light waves at the fluorescent sample interface. This results in high axial resolution, usually 200 nm or less, suitable to screen out high fluorescence background.
The term “fluorescent dye-terminator” means a fluorophore-tagged reversible-terminating nucleotide. A reversible-terminating nucleotide or a reversible terminator is a modified deoxynucleotide analog that reversibly terminates primer extension by a polymerase. Upon mild chemical treatment or photocleavage, the termination function is reversed and primer extension may resume.
Ranges: throughout this disclosure, various aspects of the invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
In one aspect, the invention provides a method of immobilizing and linearizing an oligonucleotide, the method comprising providing a micropatterned substrate, the micropatterned substrate comprising at least one binding region having a first width and at least one non-binding region having a second width; wherein the binding regions and non-binding regions alternate across at least a portion of the substrate; contacting the micropatterned substrate with a solution comprising a plurality of oligonucleotides, wherein one end of at least one oligonucleotide molecule attaches to a binding region of the micropatterned substrate; and combing the at least one oligonucleotide molecule such that the at least one oligonucleotide molecule extends from the binding region into at least a portion of an adjacent non-binding region; thereby immobilizing and linearizing the at least one oligonucleotide molecule.
In various embodiments, the first width is about 10 to about 40 μm and the second width is about 10 to about 170 μm. In various embodiments, the first width is about 10 μm and the second width is about 40 μm. In various embodiments, the first width is about 10 μm and the second width is about 15 μm. In various embodiments, the first width is about 10 μm and the second width is about 170 μm.
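By way of a non-limiting illustration, the length of DNA that a given second width can accommodate may be estimated from the contour length of the molecule. The following sketch assumes a B-form rise of about 0.34 nm per base pair; the function name and the `stretch_factor` parameter are hypothetical and are not part of the disclosed method.

```python
# Illustrative estimate only. The rise per base pair is an assumed B-form value,
# and stretch_factor is a hypothetical correction for over- or under-stretching.
RISE_NM_PER_BP = 0.34


def max_kb_in_width(width_um: float, stretch_factor: float = 1.0) -> float:
    """Approximate DNA length (kb) whose extended contour spans width_um micrometres."""
    width_nm = width_um * 1000.0
    base_pairs = width_nm / (RISE_NM_PER_BP * stretch_factor)
    return base_pairs / 1000.0


# Second widths recited above: about 15, 40, and 170 micrometres.
for second_width_um in (15, 40, 170):
    kb = max_kb_in_width(second_width_um)
    print(f"{second_width_um} um non-binding width spans roughly {kb:.0f} kb")
```

Under these assumptions, wider non-binding regions accommodate proportionally longer linearized molecules; the actual extension obtained by combing may differ.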
The materials from which the micropatterned substrate is made are not particularly limited. A person of ordinary skill in the art in possession of this disclosure is able to select an appropriate substrate onto which the binding and non-binding regions are placed. In various embodiments, the micropatterned substrate comprises a silica or a silicon wafer.
In various embodiments, the binding region comprises a material to which DNA and other oligonucleotides attach with high affinity. The attachment may be covalent or non-covalent. In various embodiments, the binding region comprises at least one selected from the group consisting of octenyl, octadecyl, and docosenyl. In various embodiments the binding region comprises octenyl. In various embodiments the binding region comprises a hydrophobic polymer coating. In various embodiments the hydrophobic polymer coating is selected from the group consisting of SU-8, polymethylmethacrylate, polydimethylsiloxane, and polystyrene. Long-chain aliphatic functional groups such as hexyl and undecyl (or their vinyl-terminated derivatives, hexenyl and undecenyl) are known to immobilize DNA molecules and therefore may be used as hydrophobic polymers to form the binding region in various embodiments of the invention. Multiple hydrophobic polymers are also known to do the same. In various embodiments, the hydrophobic polymer is selected from the group consisting of cyclic olefin copolymers, polydimethylsiloxane, poly(methyl methacrylate) and polystyrene.
In various embodiments, the non-binding region comprises a material to which DNA and other oligonucleotides do not attach or do not attach with high affinity. In various embodiments, the non-binding region comprises at least one selected from the group consisting of polyethylene glycol (PEG) and polyvinylpyrrolidone. In various embodiments the non-binding region comprises PEG or a PEG derivative including but not limited to Tween, e.g. Tween-20, or Triton X-100.
One example of a method for producing the micropatterned substrate is illustrated in
Various techniques for DNA combing are known in the art and all of them are contemplated in combination with the present invention. In various embodiments, the combing comprises generating a receding meniscus.
In various embodiments, the method further comprises coating the micropatterned substrate with a hydrogel after the DNA combing step is performed. In various embodiments, the hydrogel comprises polyacrylamide. In various embodiments, the hydrogel comprises agarose, paraformaldehyde or PEG-acrylate.
In various embodiments of this aspect and the aspects described below, the method is performed in a flow cell. Various configurations of flow cell are available and can be selected by a person of ordinary skill in the art.
In one aspect, the invention provides a method of optically mapping DNA, the method comprising: providing a micropatterned substrate, the micropatterned substrate comprising: at least one binding region having a first width; and at least one non-binding region having a second width; wherein the binding regions and the non-binding regions alternate across at least a portion of the substrate; contacting the micropatterned substrate with a solution comprising at least one molecule of DNA, wherein one end of at least one molecule of DNA attaches to the binding region of the micropatterned substrate; and combing the at least one molecule of DNA such that the at least one molecule of DNA extends from the binding region into at least a portion of an adjacent non-binding region; and optically mapping the at least one molecule of DNA.
Various methods of optically mapping DNA are known to one of ordinary skill in the art and all are contemplated for use in combination with the present invention. In various embodiments, optical mapping of DNA is performed by using nicking endonucleases and DNA polymerase to insert various fluorescent dye-terminators into the molecule or molecules of DNA under interrogation. In various embodiments, optically mapping the at least one molecule of DNA comprises contacting the at least one molecule of DNA with at least one nicking endonuclease; incorporating at least one fluorescent dye-terminator into the at least one molecule of DNA; staining the at least one molecule of DNA; and imaging the at least one molecule of DNA.
In various embodiments, various nicking endonucleases are employed depending on the sequence of the DNA molecule under interrogation. In various embodiments, the nicking endonuclease is selected from the group consisting of Nt.BspQI, Nb.BbvCI, Cas9 nickase, and Nb.BssSI.
In various embodiments, incorporating fluorescent dye-terminators into the at least one DNA molecule is performed by contacting the at least one DNA molecule with a solution comprising one or more fluorescent dye-terminators and at least one DNA polymerase. A person of skill in the art is able to select a suitable polymerase based on the specifics of the method as described herein.
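As a non-limiting illustration of how a nicking endonuclease may be matched to the DNA under interrogation, candidate recognition sites can be located in silico before labeling. The sketch below is a simplified example; the recognition sequences shown are assumptions included for illustration and should be confirmed against the enzyme supplier's documentation, and the function names are hypothetical.

```python
# Recognition sequences are assumptions for illustration only; verify them
# against the enzyme supplier's documentation before use.
RECOGNITION_SITES = {
    "Nt.BspQI": "GCTCTTC",
    "Nb.BbvCI": "CCTCAGC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each motif occurrence on either strand.

    Positions are reported on the top strand. A palindromic motif would be
    reported once per strand; the motifs above are not palindromic.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


example = "AAGCTCTTCAAAGAAGAGCTT"  # made-up sequence for demonstration
print(find_sites(example, RECOGNITION_SITES["Nt.BspQI"]))
```

The spacing between predicted sites gives an expected label pattern against which imaged molecules may be compared. This sketch reports site positions only and does not model the exact nick position or labeling efficiency.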
In various embodiments, optically mapping comprises contacting the DNA with a solution comprising inactive CRISPR-Cas9 (dCas9) and a suitable guide RNA based on the sequence of the DNA to be interrogated such that the guide RNA/dCas9 complex binds to the DNA. The bound complex is then detected. In various embodiments, optically mapping the at least one molecule of DNA comprises contacting the at least one molecule of DNA with at least one guide RNA sequence complementary to at least a portion of the at least one molecule of DNA and an inactive CRISPR-Cas9; and imaging the at least one molecule of DNA.
In various embodiments, imaging comprises any technique that allows the detection and location of the labeled DNA molecules. In various embodiments, imaging comprises fluorescence microscopy. In various embodiments, imaging comprises epifluorescence or total internal reflection fluorescence microscopy (TIRF). In various embodiments, the method further comprises various steps of data processing to interpret data obtained during the imaging step. Various software is available commercially and a person of ordinary skill in possession of this disclosure is able to select a suitable technique from the relevant literature or to generate their own methodology.
In another aspect, the invention provides a method of on-surface DNA sequencing library generation, the method comprising immobilizing and linearizing DNA as described above, and contacting the at least one molecule of DNA with at least one RNA polymerase, thereby generating at least one molecule of RNA. Following RNA generation, the library may be generated by contacting the at least one molecule of RNA with at least one reverse transcriptase, thereby converting the at least one molecule of RNA to cDNA, followed by eluting the cDNA from the device. The eluted cDNA is used to generate a DNA sequencing library. In some aspects, the at least one molecule of DNA comprises a T7 promoter to facilitate RNA generation.
In another aspect, the invention provides a method of on-surface DNA sequencing library generation, the method comprising immobilizing and linearizing DNA as described above, and amplifying the at least one molecule of DNA using at least one isothermal amplification method, thereby forming an amplified product; and eluting the amplified product from the device. The eluted amplified product is converted to a DNA sequencing library. In various embodiments, the isothermal amplification method is selected from the group consisting of strand displacement at nicks or PNA-displaced sites.
In another aspect, the invention provides a method of on-surface DNA sequencing library generation, the method comprising immobilizing and linearizing the DNA as described above and performing a tagmentation reaction on the at least one molecule of DNA, thereby generating at least one tagmented product; amplifying the at least one tagmented product, thereby forming an amplified product. The amplified product is eluted from the device and is used to generate a DNA sequencing library. In various embodiments, the DNA sequencing library is generated by contacting the amplified product with at least one RNA polymerase, thereby generating at least one molecule of RNA; and contacting the at least one molecule of RNA with at least one reverse transcriptase, thereby converting the at least one molecule of RNA to cDNA and generating a DNA sequencing library.
The methods of generating DNA sequencing libraries described herein may be directed to the entire genome or to targeted regions. In various embodiments the DNA molecules are chosen based on target specific labeling using a CRISPR-Cas9 labeling system before performing the above steps.
In another aspect, the invention provides a method of on-surface DNA sequencing, the method comprising immobilizing and linearizing DNA as described and sequencing the at least one molecule of DNA. In various embodiments, sequencing the at least one molecule of DNA comprises one or more techniques selected from the group consisting of direct DNA sequencing by DNA polymerase with reversible DNA terminators; generating RNA from the at least one molecule of DNA using RNA polymerase and sequencing using T7 reverse transcriptase with reversible RNA terminators; amplifying the at least one DNA molecule on the substrate, and sequencing with reversible DNA terminators, by DNA ligation reaction with DNA ligase; and sequencing-by-hybridization using fluorescently labeled short oligonucleotides.
In yet another aspect, the invention provides a method for mapping a genome, wherein the method is capable of resolving a single nucleotide polymorphism (SNP), the method comprising introducing to the genome a CRISPR/Cas9 system comprising at least one single-guide RNA (sgRNA) specific for a target sequence or a plurality of target sequences across the genome and a Cas9 D10A, wherein the CRISPR/Cas9 system nick labels the target sequence, and the target sequence or genome is analyzed.
In certain embodiments, the analyzing is by nucleotide sequencing and/or imaging.
In certain embodiments, the genome is a human genome or a microbial genome. In certain embodiments, the method is capable of distinguishing a microbe from another closely-related microbe.
In certain embodiments, the SNP is in a protospacer adjacent motif (PAM SNP) sequence. In certain embodiments, the at least one sgRNA targets a PAM and/or a PAM SNP.
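As a non-limiting illustration of a PAM SNP, a single-base change can create or abolish a PAM so that nick labeling occurs on only one allele. The sketch below assumes the commonly used NGG PAM and examines the given strand only; the function names, the example sequence, and the choice of PAM are assumptions for illustration and are not drawn from the disclosure.

```python
import re


def pam_positions(seq: str) -> set[int]:
    """0-based start positions of NGG PAMs on the given strand (overlaps allowed)."""
    return {m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq.upper())}


def pam_snp_effect(ref: str, snp_pos: int, alt_base: str) -> dict[str, set[int]]:
    """Report PAMs lost or gained when ref[snp_pos] is replaced by alt_base."""
    alt = ref[:snp_pos] + alt_base + ref[snp_pos + 1:]
    ref_pams = pam_positions(ref)
    alt_pams = pam_positions(alt)
    return {"lost": ref_pams - alt_pams, "gained": alt_pams - ref_pams}


# Made-up example: a G-to-A change at position 7 removes the only NGG on this strand.
print(pam_snp_effect("ACGTTAGGCT", snp_pos=7, alt_base="A"))
```

An sgRNA whose target relies on a PAM reported as "lost" would be expected to label the reference allele but not the alternate allele, which is the basis for resolving the SNP by mapping. A fuller treatment would also scan the reverse complement.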
In certain embodiments, the method is capable of mapping a genomic region that spans a length of at least 1 kb, 10 kb, 100 kb, 300 kb, or 500 kb in the genome.
In yet another aspect, the invention provides a method of defining a long distance haplotype in a genome, the method comprising administering to the genome a CRISPR/Cas9 system comprising a Cas9 D10A and a plurality of single-guide RNAs (sgRNAs) specific for a plurality of loci of a genomic region or a plurality of target regions across the genome, wherein the CRISPR/Cas9 system nick labels the plurality of loci of the genomic region or the plurality of target regions across the genome, and the target sequence or genome is analyzed thereby defining the long distance haplotype in the genome.
In certain embodiments, the genome is a human genome or a microbial genome.
In certain embodiments, the plurality of sgRNA comprises at least one sgRNA that targets a PAM or a PAM SNP.
In yet another aspect, the invention provides a method for customized mapping of a whole genome, the method comprising, nick labeling the genome with a CRISPR/Cas9 system and analyzing the nucleotide sequence, wherein the CRISPR/Cas9 system comprises a Cas9 D10A and a plurality of sgRNAs designed by a method comprising:
In certain embodiments, the microbe is distinguished at the strain level.
The invention is further described in detail by reference to the following experimental examples. These examples are provided for purposes of illustration only, and are not intended to be limiting unless otherwise specified. Thus, the invention should in no way be construed as being limited to the following examples, but rather, should be construed to encompass any and all variations which become evident as a result of the teaching provided herein.
Without further description, it is believed that one of ordinary skill in the art can, using the preceding description and the following illustrative examples, make and utilize the present invention and practice the claimed methods. The following working examples therefore, specifically point out the preferred embodiments of the present invention, and are not to be construed as limiting in any way the remainder of the disclosure.
Glass coverslips (22×22 mm, VWR 48366-067) were used as substrates to covalently graft octenyl, PEG, and 1-amino-undecane (AU) functional groups via silanization reaction with 7-octenyltrimethoxysilane (OTMS) (Gelest, SIO6709.0), 2-[methoxy (polyethyleneoxy) 6-9 propyl] trimethoxysilane (PTMS) (Gelest, SIM6492.7), and 11-aminoundecyltriethoxysilane (AUTS) (Gelest, SIA0630.0) respectively. Briefly, surface groups of cleaned substrates were activated by treatment with either highly corrosive “piranha” solution or air plasma etching (Femto science, CUTE, 200W 1-3 min). Activation exposed silanol groups on the glass surface, and under low-humidity conditions (<10% RH) reacted with the silane solution producing clear coatings of the respective functional groups. Reaction temperatures were between 21 and 23° C.
Micropatterning was performed in a class 10,000 cleanroom using positive photolithography. The fabrication process flow is shown in
Photomasks were designed using a CAD program and ordered from CAD/Art Services, Inc (Bandon, OR). A single pattern contained repetitive regions of inked and transparent bands with definite line widths and spacing. For example, one pattern consisted of 10 μm-wide inked lines with 40 μm spacing, which we term ‘10-40’. Similarly, 10-10, 10-15, 20-90 and 40-170 patterns were also designed. The objective was to maximize the area of the PEG region containing combed DNA for fluorescence visualization, without any loss in DNA combing density.
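For illustration, the trade-off among the patterns named above can be tabulated from the pattern names alone. The sketch below assumes that the first number in each name is the line width and the second is the spacing, and that the spacing corresponds to the PEG region, consistent with the first and second widths recited earlier; the function name and returned keys are hypothetical.

```python
def pattern_geometry(name: str) -> dict[str, float]:
    """Derive pitch and area fractions from a 'line-spacing' pattern name in micrometres."""
    line_um, spacing_um = (float(part) for part in name.split("-"))
    pitch_um = line_um + spacing_um
    return {
        "line_um": line_um,
        "spacing_um": spacing_um,
        "pitch_um": pitch_um,
        # Fraction of the patterned area taken up by the spacing (assumed PEG) bands.
        "spacing_fraction": spacing_um / pitch_um,
        # Number of line/spacing repeats per millimetre of substrate.
        "repeats_per_mm": 1000.0 / pitch_um,
    }


for name in ("10-10", "10-15", "10-40", "20-90", "40-170"):
    g = pattern_geometry(name)
    print(f"{name}: pitch {g['pitch_um']:.0f} um, "
          f"spacing fraction {g['spacing_fraction']:.2f}, "
          f"{g['repeats_per_mm']:.1f} repeats/mm")
```

A larger spacing fraction leaves more area for visualizing combed DNA, while fewer repeats per millimetre means fewer binding lines from which molecules can extend; this sketch does not predict combing density, which was determined experimentally.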
Mammalian cells were embedded in gel plugs and High Molecular Weight DNA was purified as described in a commercial large DNA purification kit (BioRad #170-3592). Plugs were incubated with lysis buffer and proteinase K for four hours at 50° C. The plugs were washed and then solubilized with GELase (Epicentre). The purified DNA was subjected to 2.5 hours of drop-dialysis. It was quantified using Quant-iT dsDNA Assay Kit (Life Technology), and the quality was assessed using pulsed-field gel electrophoresis.
Briefly, DNA samples were prepared for molecular combing in 50 mM MES, 100 mM NaCl, pH 5.5-6.0 at concentrations ranging from 0.1 to 0.6 ng/μL. The substrate was first immersed into DNA solution for a two-to-twenty-minute dwell time to allow the partially denatured tail ends to interact with the substrate. It was then withdrawn at a rate of 100 μm/s using a translational stage (Thorlabs MTS25-Z8).
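As a rough illustration of the time scale of one combing cycle, the dwell time and withdrawal rate given above can be combined with an immersion depth. The immersion depth is not stated in this example; the sketch below assumes, purely for illustration, that the full 22 mm coverslip edge is immersed, and the function name is hypothetical.

```python
def combing_cycle_seconds(immersed_mm: float, dwell_min: float,
                          rate_um_per_s: float = 100.0) -> float:
    """Dwell time plus the time to withdraw immersed_mm of substrate at a constant rate."""
    withdrawal_s = immersed_mm * 1000.0 / rate_um_per_s
    return dwell_min * 60.0 + withdrawal_s


# Assumed 22 mm immersion with the shortest and longest dwell times described above.
for dwell in (2, 20):
    total = combing_cycle_seconds(immersed_mm=22, dwell_min=dwell)
    print(f"dwell {dwell} min: about {total / 60:.1f} min per substrate")
```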
An SU-8 mold with channel widths ranging from 1 to 18 mm and heights ranging from 10 to 180 μm was fabricated. After casting PDMS, individual channels were cut out and fluid ports were bored with a biopsy punch. The face of the imprinted PDMS block was then air plasma treated and adhered to the functionalized substrate to create a liquid-tight flow cell. DNA was adsorbed and linearized using flow cells. Briefly, 2-4 μL of YOYO-1-stained (100 nM) λ bacteriophage DNA in TE buffer (pH 8.0) was added into the flow cell port. The shear force exerted by the flowing buffer solution linearized the DNA as it adsorbed onto the positively-charged AU surface.
Polyacrylamide gel was used to maintain a stable aqueous environment around the DNA backbone. After combing the DNA onto the micropatterned substrate, a low-adhesion PVC tape (18733, Semiconductor Equipment Corp) that was cut to specific dimensions (those of the desired ‘microliter-well’) was transferred onto the micropatterned substrate. This tape acted as a stencil delimiting the casting area of the polyacrylamide gel. Polyacrylamide gel was prepared (4-10%) and pipetted at one end of the microliter-well. A glass slide that was coated with the PVC tape was used to spread the gel droplet throughout the stenciled microliter-well area. After 5 min of casting time, the slide and micropatterned substrate were gently separated from each other. The polyacrylamide layer was then hydrated immediately with 1× CutSmart buffer, before preparing for the next step in device assembly.
Polyacrylamide gel overlay: The linearized DNA is susceptible to damage under the effect of flow forces. A polyacrylamide gel overlay helps prevent this damage. However, a gel layer would impede diffusion of the reagents unless it is made as a thin film with a thickness of 10 μm or below. Reaction times with the current prototype, which uses a 75 μm gel, are in the range of 1-1.5 h; this will be reduced to <1 min if a 1 μm-thick gel overlay is used. However, fabricating films of such low thickness was challenging, possibly due to insufficient diffusion during gel polymerization. We devised a way to fabricate thin polyacrylamide gel films by the addition of a methacrylate functional group (or equivalent) to the PEG sections of the device so as to seed gel formation. Addition of a participating chemical group to the surface has resulted in films of lower thicknesses than without the participating group.
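The dependence of reaction time on gel thickness can be illustrated with a simple scaling estimate. The sketch below assumes that the reaction is limited purely by reagent diffusion through the gel, so that time scales with the square of the thickness; this is a simplifying assumption rather than a measured relationship, and the function name is hypothetical.

```python
def scaled_reaction_time_s(t_ref_s: float, thickness_ref_um: float,
                           thickness_new_um: float) -> float:
    """Scale a reference reaction time assuming diffusion-limited kinetics (t ~ L^2)."""
    return t_ref_s * (thickness_new_um / thickness_ref_um) ** 2


# Reference values from the prototype described above: 1 to 1.5 h with a 75 um gel.
for hours in (1.0, 1.5):
    t_new = scaled_reaction_time_s(hours * 3600.0, thickness_ref_um=75.0,
                                   thickness_new_um=1.0)
    print(f"{hours} h at 75 um scales to about {t_new:.2f} s at 1 um")
```

Under this assumption, the scaled times fall well below one minute, consistent with the reduction stated above; enzyme kinetics and other non-diffusive steps would set a practical lower bound that this estimate ignores.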
Polyacrylamide gel casting device: The gel was cast by using a spacer whose height can be controlled. A specially designed device was constructed to enable thin film fabrication on the micropatterned substrate. This device consists of a PDMS-coated glass slide, defined photoresist spacer films, and inlet and outlet ports for addition of the pre-polymer gel mixture. PDMS was coated on a glass slide to form a strong, durable hydrophobic coating. We have used SU-8 photoresist to form the spacer and defined its height by the viscosity and spin speed during its coating on the PDMS-coated glass slide. PDMS, being too hydrophobic for SU-8 spread, breaks the SU-8 film after spin coating. For this, we optimized an SU-8 coating protocol with extended soft-bake times (5-15 min) on hotplate at lower temperatures (than the recommended 95° C.), followed by soft-bake in a gravity oven (15-20 min) at 95° C.
Temperature and microfluidics control: The gel-overlaid micropatterned substrate is mated with an optimized microfluidic channel array, made of a machinable polymer such as PMMA, PDMS, and others. The assembled device is placed in a compact heat control instrument that uses a thermoelectric element to maintain optimal reaction temperature throughout the sequencing reaction. The heat control instrument would be capable of maintaining reaction temperatures in the range of 37-65° C. The primary performance aspect for the instrument is temperature stability. Using temperature probes local to the reaction volume, we will optimize the control parameters. In a variation, these temperature probes may be embedded into the microchannel array to provide a closed-loop control.
Enzymatic reactions were performed in two formats: (1) PDMS reaction wells assembled atop micropatterned substrate, and (2) PDMS-PMMA composite assembly on top of the substrate with a cast PA gel.
PDMS slabs, that were cast in plastic dishes, were cut into approximately 12×20 mm blocks. PDMS was adhered to the functionalized substrate by either double-sided tape or plasma activation. PDMS adhered using double sided tape was first mated to a strip of double-sided tape and then an array of reaction wells was created using a 4 mm biopsy punch. PDMS adhered with plasma activation first had an array of wells punched out, followed by a 2-minute plasma treatment (Harrick Plasma, PDC-32G). DNA was combed onto functionalized substrates, allowed to dry at room temperature for 5 minutes, and the prepared PDMS well blocks were carefully positioned onto the targeted combing region. Each well was used for a unique experimental reaction condition. This microwell-format was used for reaction without a protecting hydrogel layer.
A PMMA sheet was laser cut to form the top and bottom layers of the device assembly, as well as to generate molds for PDMS gaskets that will surround the gel region of the microwell-plate. PDMS was cast into these molds and the resulting gaskets were mated to the PMMA top layer and placed over the gel-coated substrate such that the gaskets surrounded the gel area without any contact. This assembly was then clamped to the PMMA bottom layer. The mouths of the microliter-wells are sealed with a tape, creating a tightly-sealed compartment for carrying out reactions.
T7 phage DNA (500 ng) was added into a combing reservoir containing 50 mM MES, 100 mM NaCl, pH 6.0 buffer and homogenized for 1 hour before combing onto 10-10 and 10-15 micropatterned substrates. Reaction wells were assembled as described above. Combed DNA molecules were rehydrated with rehydration buffer (0.1% BSA, 20 μM NTPs, 1 mM DTT, 5 mM MgCl2, 50 mM Tris, pH 7.8) for 2 minutes. T7 RNA polymerase (RNAP) reaction buffer from New England Biolabs diluted to 1× concentration (40 mM Tris-HCl, 5 mM MgCl2, 1 mM DTT, pH 7.8) was then added to prime the same well for an additional minute. The master mix for the transcription reaction was prepared in a 0.6 ml microcentrifuge tube prior to pipetting into the well. The reaction mix contained 2.5 U of T7 RNAP, 10 μM Cy3-UTP, 200 μM NTPs, 100 μM DTT, 1 U/μL RiboGuard RNase inhibitor (Lucigen), and 1× T7 RNAP reaction buffer. The mixture was gently pipetted into the well and the device was incubated in a humidified oven at 37° C. for 1 h. The well was evacuated and washed with 1× RNAP reaction buffer. The DNA backbone was stained with YOYO-1.
Human DNA (500 ng) was suspended in a combing reservoir containing 50 mM MES, 100 mM NaCl, pH 6.0 buffer and allowed to homogenize overnight before combing onto micropatterned substrates. After the assembly of PDMS reaction wells, combed DNA molecules were rehydrated with NEBuffer 3.1 for up to 15 minutes and then evacuated. Nt.BspQI (5 U) diluted in NEBuffer 3 (New England Biolabs) was added to the reaction well and incubated at 37° C. in a humidified oven for an hour. This creates the nick sites for polymerase extension. The reaction mix was then evacuated and the well was washed twice with NEBuffer 2.0, following which up to 5 U of either Taq DNA polymerase or DNA polymerase I (New England Biolabs) and dye-nucleotide mix (25-133 nM each of ATTO-532-dUTP, dATP, dGTP, dCTP) were added and allowed to incubate at 37° C. for an hour inside a humidified oven to incorporate fluorescent dUTPs. After washing away the free dyes, the DNA backbone was stained with YOYO-1 iodide (Life Technologies, Y3601). For some observations, labeled DNA was not stained before visualization on the microscope.
To ensure observation of full-length λ-DNA molecules within the PEG section, DNA was concatemerized by heat-treating in 10 mM Tris-HCl buffer, pH 7.8, for 10 min at 65° C. followed by 1 h incubation at 37° C. After this, DNA was suspended in a reservoir for combing onto a 10-40 substrate. PA gel was cast onto the surface of two microliter-wells and a device was assembled as described earlier. A nicking mix with 20 U of Nb.BbvCI (New England Biolabs) in 1× CutSmart buffer (New England Biolabs) was added onto the gel surface of one of the microliter-wells. In the control well, 1× CutSmart buffer was added. The device was incubated at 37° C. for 2 h, after which both wells were evacuated and washed with 1× CutSmart buffer. Next, a labeling mix with 10 U of Klenow Fragment (3′→5′ exo-) (New England Biolabs), ATTO-532-dUTP (266 nM), and dATP/dGTP/dCTP (each 133 nM) in NEBuffer 2 (New England Biolabs) was added to both wells. The labeling reaction was performed at 37° C. for 2 h, following which the wells were evacuated and washed thoroughly with 1× NEBuffer 2 before imaging. After acquiring a few images, 100 nM YOYO-1 solution was added to the wells to stain the DNA backbone for re-imaging.
Imaging was performed on a custom-built, semi-automated inverted fluorescence microscopy system. It includes a Rapid Automated Modular Microscope and Modular Infinity Microscope system (ASI) with an XYZ motorized stage (ASI, MS-2000), CRISP autofocus system (ASI), and high-speed filter wheel (Finger Lakes Instrumentation, HS-625) combined with a 100× oil-immersion objective (Olympus, UPlanSApo, NA=1.40). Diode-pumped solid-state laser light sources with 473 nm and 532 nm wavelengths (LASEVER, LSR473ML-100, LSR532ML-200), controlled through μManager (Open Imaging) using a custom-made TTL control system, were used. Images were acquired with an iXon EMCCD (Andor, DU-888E-C00-#BV) or ORCA-Flash4.0 V2 CMOS (Hamamatsu, C11440).
Data collected from the imaging system is processed on a computing cluster in ImageJ using previously developed computational methods and algorithms together with manual curation. Images were first processed to remove background signal and normalize signal intensity. Once processed, images were analyzed semi-automatically using the Ridge Detection ImageJ plug-in.
In this method, we create nicks in the linearized DNA either by use of an enzyme or using physical means such as heat. The created 3′ ends will be extended using a strand-displacing polymerase to generate flap strands. This generated single strand DNA flap would be sequenced using sequencing-by-ligation or sequencing by hybridization. In a variation, these flap sequences can be quickly detected using the robust Hybridization Chain Reaction, providing a simpler means to map DNA sequences.
Combinations: The above methods of manipulation may be combined and applied to the linearized DNA.
Two Haemophilus influenzae strains with complete genome sequences were used: the standard lab strain Rd KW20 (RR722, NC_000907) and a marked derivative of clinical isolate 86-028NP (RR3131, NC_007416.2, carrying novobiocin and nalidixic acid resistance alleles, NovR and NalR)(25,31,32). Bacterial culture followed standard protocols; cells were grown to stationary phase (OD600nm=1.2) in supplemented brain-heart infusion (10 μg/ml hemin, 2 μg/ml NAD) shaking at 37° C., and then cells were harvested by centrifugation at 4,000 rpm for 5 minutes before DNA extractions (33,34). Purification of ultra-high MW DNA fragments followed the Bionano Prep Cell Culture DNA Isolation Protocol. Briefly, cells were: (a) resuspended in cell buffer (˜5×10^9 CFU/ml); (b) embedded in 2% low-melt agarose (BioRad) plugs to minimize shearing forces; (c) lysed using Bionano cell lysis buffer supplemented with 167 μl Proteinase K (Qiagen) rocking overnight at 50° C.; (d) treated with RNase by adding 50 μl of RNase A solution (Qiagen) and incubating the plugs for 1 hour at 37° C.; and (e) washed in TE buffer with intermittent mixing. Finally, DNA was purified from low-melt agarose plugs by drop dialysis. Plugs were melted at 72° C., then incubated with 2 μl agarase (Thermo Fisher Scientific) for 45 minutes. Melted plugs were dialyzed into TE buffer using 0.1 μm Millipore membrane filters for 45 minutes at a ratio of 15 ml buffer per ˜200 μl sample. DNA was allowed to homogenize overnight at room temperature before fluorometric quantification using the Qubit dsDNA BR kit (Thermo Fisher Scientific).
sgRNA oligos: sgRNAs were encoded on 55 nt DNA oligos with a 5′ T7 promoter sequence (5′-TTCTAATACGACTCACTATAG-3′) (SEQ ID NO: 446), followed by the target 20mer sequence, complementary to the target gDNA sequence, and finally an overlap sequence (5′-GTTTTAGAGCTAGA-3′) (SEQ ID NO: 447). Individually synthesized sgRNA oligos were then pooled into an equimolar mixture. sgRNA complementary oligo: An 80 nt long oligo was designed with the 3′ end complementary to the overlap sequence and the remainder encoding the Cas9 binding sequence (5′-AAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC-3′) (SEQ ID NO: 448). All oligos were obtained from Integrated DNA Technologies. The sgRNA oligo mix was hybridized to the sgRNA complementary oligo (at 10 μM each) in 1× NEBuffer 2 (New England BioLabs, NEB) with 2 mM dNTPs at 90° C. for 15 sec followed by 43° C. for 5 min. To complete dsDNA synthesis, the hybridization mixture was incubated at 37° C. for 1 hr with 5 U of Klenow Fragment (3′→5′ exo-) (NEB). To degrade any remaining linear ssDNA, the dsDNA was then treated with Exonuclease I in 1× Exonuclease I reaction buffer (NEB) for 1 hr at 37° C. Finally, dsDNA was purified using the QIAquick Nucleotide Removal Kit (Qiagen) and eluted in 30 μl elution buffer. Quality and concentration were assessed using agarose gel electrophoresis and the Synergy H1 Hybrid Multi-Mode Reader (BioTek).
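The oligo layout described above can be checked with a short script. The following is a minimal sketch only: the three constant sequences are taken from the text, while the function names are illustrative and the 20mer is simply inserted as supplied (its strand orientation relative to the gDNA is left to the designer).

```python
T7_PROMOTER = "TTCTAATACGACTCACTATAG"   # SEQ ID NO: 446
OVERLAP = "GTTTTAGAGCTAGA"              # SEQ ID NO: 447
COMPLEMENTARY_OLIGO = (                 # SEQ ID NO: 448
    "AAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAA"
    "CTTGCTATTTCTAGCTCTAAAAC"
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_sgrna_oligo(target_20mer: str) -> str:
    """Concatenate T7 promoter, 20mer target and overlap (illustrative helper)."""
    target = target_20mer.upper()
    if len(target) != 20 or set(target) - set("ACGT"):
        raise ValueError("target must be 20 nt of A/C/G/T")
    return T7_PROMOTER + target + OVERLAP


# Example 20mer: the locus 1 recognition sequence (SEQ ID NO: 427).
oligo = build_sgrna_oligo("AAAAATTGCTGCATCTTCTT")
print(len(oligo))                                                   # 55
print(len(COMPLEMENTARY_OLIGO))                                     # 80
print(COMPLEMENTARY_OLIGO.endswith(reverse_complement(OVERLAP)))    # True
```

The last check confirms that the 3′ end of the 80 nt oligo is the reverse complement of the overlap sequence, which is what allows the two oligos to anneal and be filled in by the polymerase.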
sgRNA was synthesized using the HiScribe T7 High Yield RNA Synthesis Kit (NEB) following the Standard RNA Synthesis protocol. In summary, 1 μg dsDNA was incubated with 1× reaction buffer, 10 mM NTPs and T7 RNA polymerase enzyme mix at 37° C. for 2 hrs followed by DNase I treatment at 37° C. for 15 min to remove dsDNA from the reaction. sgRNA was then purified using RNA Clean & Concentrator Kits (Zymo Research). The concentration of the purified sgRNA was assessed using the Synergy H1 Hybrid Multi-Mode Reader (BioTek).
For DNA nicking using the 48 and 162 sgRNA mix (Table 3 and Table 4), 1.25 μM of the synthesized sgRNA was first incubated with 5 μM of Cas9 D10A (NEB) in 1× NEBuffer 3.1 (NEB) at 37° C. for 15 min to form a sgRNA-Cas9 complex. 300 ng of the DNA sample was then added to the sgRNA-Cas9 complex mixture and incubated at 37° C. for 60 min. For DNA nicking with both Cas9 and Nt.BspQI, 2.5 μM gRNA was first incubated with 100 ng of Cas9 D10A in 1× NEBuffer 3.1 at 37° C. for 15 min. After that, 300 ng of DNA and 5 U of Nt.BspQI (NEB) were added to the sample mixture and incubated at 37° C. for 2 hours. The nicked DNA samples were then labeled using 5 U Taq DNA Polymerase (NEB), 1× ThermoPol buffer (NEB), and 266 nM free nucleotide mix (dATP, dCTP, dGTP (NEB) and Atto-532-dUTP (Jena Bioscience)) at 72° C. for 60 min. The labeled sample was then treated with Proteinase K at 56° C. for 30 min and 1 μM IrysPrep stop solution (BioNano Genomics) was added to the reaction.
Labeled DNA samples were stained and prepared for loading on an Irys Chip (BioNano Genomics) following manufacturer instructions. The stained samples were then loaded, linearized, and imaged inside the nanochannels following the established protocol. Each Irys Chip contains two nanochannel devices, which can generate data from >60 Gb of long chromosomal DNA fragments (>150 kb). The image analysis was done using BioNano Genomics commercial software (IrysView 2.5) for segmenting and detecting DNA backbone YOYO-1 staining, similar to early optical mapping methods, and localizing the green labels by fitting the point-spread functions.
Briefly, the assembler is a custom implementation of the overlap-layout-consensus paradigm with a maximum likelihood model. An overlap graph was generated based on the pairwise comparison of all input molecules. Redundant and spurious edges were removed. The assembler output the longest path in the graph, and consensus maps were derived. Consensus maps were further refined by mapping single-molecule maps to the consensus maps and recalculating label positions. Refined consensus maps were extended by mapping single molecules to the ends of the consensus and calculating label positions beyond the initial maps. After the merging of overlapping maps, a final set of consensus maps was output and used for subsequent analysis. RefAligner works similarly but compares molecules directly to an in silico nicked reference instead of first forming contigs. These maps were then opened in IrysView visualization software from BioNano Genomics.
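As an illustration of what an in silico nicked reference contains, the following minimal sketch lists the positions of a nickase recognition motif on both strands of a reference sequence and the distances between neighboring sites. It is not the RefAligner implementation; the function names are hypothetical, the motif GCTCTTC used in the example is assumed here to be the Nt.BspQI recognition sequence, and positions are reported as motif start coordinates rather than exact nick positions.

```python
from typing import List


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif_sites(sequence: str, motif: str) -> List[int]:
    """Return sorted 0-based start positions of the motif on either strand."""
    seq = sequence.upper()
    sites = set()
    # Searching the forward strand for the motif and for its reverse
    # complement covers occurrences on both strands.
    for pattern in {motif.upper(), reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def label_intervals(sites: List[int]) -> List[int]:
    """Distances between consecutive sites: the pattern compared during mapping."""
    return [b - a for a, b in zip(sites, sites[1:])]


# Toy reference with one forward-strand and one reverse-strand site.
toy_reference = "AAGCTCTTCAAAAGAAGAGCTT"
sites = find_motif_sites(toy_reference, "GCTCTTC")
print(sites, label_intervals(sites))
```

In practice the reference would be a full genome sequence, and the resulting interval pattern would be compared with the label spacing measured on single molecules.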
The results of the experiments are now described.
The micropatterned surface is dual-functionalized with two repetitive functional areas. One area is functionalized with octenyl, which is hydrophobic and adsorbs the tail-ends of DNA molecules. The other area is functionalized with polyethylene glycol (PEG), a passivating group which does not attract DNA and prevents the attachment of free stain and labeled nucleotide molecules. With this micropatterned surface, DNA molecules bind in an end-selective manner to the hydrophobic octenyl surface only, and then linearize uniformly across the PEG regions under the receding meniscus during dynamic combing. DNA molecules can be stretched in an orderly fashion with less potential for formation of both intermolecular intersections and intramolecular loops.
For DNA combing to work on this micropatterned surface, the DNA ends need to be attached preferentially to the octenyl-functionalized surface. Dynamic molecular combing (coverslip withdrawn from a reservoir) is the most widely used method of generating such receding meniscus among others including gravity, dragging, capillary flow, gas pressure, wicking with filter paper, and evaporation. The DNA adsorption and linearization on octenyl-functionalized and AU-functionalized surfaces were first compared. Parallel, linear individual molecules adsorbed to octenyl surface in an orientation perpendicular to the receding meniscus, while on AU surface, DNA molecules were found to be adsorbed in a globular form. This is consistent with the fact that a coiled DNA molecule was expected to adsorb at multiple points along its backbone through the electrostatic attraction between the negatively-charged DNA backbone and the weakly cationic AU layer. In order to linearize the DNA molecules on an AU-functionalized surface, a concurrent shearing flow was necessary to generate linearization at an adequate rate compatible with the adsorption kinetics between DNA and alkylamines. Clearly, preferential attachment of DNA ends to octenyl-functionalized surface is critical for the dynamic combing.
A micropatterned octenyl/PEG surface was designed in part to alleviate the complications of DNA combing such as DNA aggregation and the high fluorescent background of silanized substrates.
To characterize the silanization process, the contact angles on octenyl and PEG regions were measured using Surface Analyst 3001 (BTG Labs). After glass substrates were grafted with OTMS, activation of the surface using air-plasma (200 W, 3 min) yielded a contact angle of 73° (reaction time, 4 h). Piranha-activation followed by overnight silanization resulted in substrates with marginally higher hydrophobicity (contact angle, 76°) but this was accompanied by reduced reproducibility. Contact angle measured on a 10-10 micropatterned substrate after PEG-grafting (PTMS, 32.5 nM) was found to be 26° in the PEG-only region and 45° in the patterned region. Different contact angles confirm the presence of contrasting surface functional groups, octenyl and PEG. Interestingly, the contact angle on the 10-10 substrate, which has an even distribution of the two modifications, was in between the contact angles on octenyl and PEG-coated surfaces.
Photolithography soft-bake temperature as well as PEG-silane concentration were found to affect DNA attachment. To assess the impact, photolithography was performed on two octenyl-coated glass substrates with different soft bake (without post-exposure bake) temperatures, 95 and 115° C. PR was stripped, and substrates were cleaned thoroughly before combing T7 DNA followed by visualization. For the substrate baked at 115° C., DNA density was observed to be lower in the previously PR-covered region than in the PR-stripped region. However, the 95° C.-baked substrate had similar DNA densities on both, previously resist-covered and resist-stripped regions. This interaction between unexposed PR (SC1813) and octenyl functional group (or any silane) at 115° C. has not been reported previously.
To ascertain that the PR thin film shielded underlying octenyl layer from plasma treatment, T7 DNA was combed on micropatterned substrates that were plasma-treated, PR-stripped and cleaned. DNA combing density on the octenyl region remained unaffected compared to that observed on substrates that were not treated with plasma. Moreover, there was no DNA attached to the activated glass surface indicating a high degree of hydrophilicity.
The optimum PEG-silane concentration was found to be 32.5 nM. At higher concentrations (>240 nM), DNA combing density was found to decrease dramatically, likely due to a parallel reaction with unreacted methoxy groups (or hydroxyls) in the octenyl region. Higher DNA concentration (3×) in the combing reservoir did not improve DNA density significantly.
A micropatterned glass substrate with 10 μm wide octenyl and 15 μm wide PEG sections (10-15) was combed with λ bacteriophage DNA. The substrate was immersed into λ-DNA solution for an extended incubation time (compared to an unpatterned octenyl substrate) of 15 minutes, after which it was withdrawn at 0.1 mm/s, dip-stained with a reservoir containing YOYO-1, and imaged (
To evaluate the stretching factor (s.f.) of DNA on OTMS-PEG substrate, λ-DNA was combed on a 10-40 substrate as well as on an unpatterned OTMS substrate. DNA backbone length measurements on the unpatterned substrate yielded a peak at 21 μm (
The resulting mean s.f. on PEG section was found to be ˜84%. This clearly reflects the overall reduction in s.f. due to PEG surface modification. By increasing density of the grafted PEG, we may potentially be able to under-stretch the DNA further. In general, the micropatterned substrates produced marginally higher stretching uniformity compared to unpatterned OTMS substrates, with standard deviations of 3 μm and 4.1 μm, respectively. Additionally, individual molecules were observed to be less aggregated on OTMS-PEG substrates compared to OTMS substrates.
Further investigated was the linearization of long human DNA (hgDNA) molecules onto OTMS-PEG substrates. Typical resulting images are shown in
Table 1: Molecular size distribution of human DNA combed on 10-40 and 40-170 micropatterned OTMS-PEG substrate. Nested length distributions obtained from the dataset used for
The OTMS-PEG substrates, when viewed on an epifluorescence microscope at high intensity illumination (473 nm, 100-150 mW; 532 nm, 150-500 mW), presented barely enough autofluorescence to enable distinction between the PEG and octenyl sections. As noted above, YOYO-1 dye molecules adsorb more to octenyl sections relative to PEG sections. To further verify reduction in adsorption of fluorescent dyes in the PEG sections, a micropatterned 10-20 substrate was incubated with a solution containing ATTO-532-dUTP (100 nM). After washing out the free dye-nucleotides from the surface, the fluorescence intensity in the octenyl section was found to be about fifteen times higher than in the PEG section. One can easily observe more distinctive bright spots in octenyl sections (
RNA transcription of T7 DNA on micropatterned surface was then tested. An evaporating oil, 1-dodecanol, was used to obtain non-overstretched DNA molecules (close to 100% of T7 DNA contour length). It was observed that dodecanol residue after combing, did not evaporate over time at room temperature or when oven-dried (65° C.) for 4 min. Moreover, reusing the same DNA reservoir with a floating dodecanol layer was not practical. By manipulating the common interface between DNA solution, combing substrate and air (triple-phase contact line) via surface modification, a high density combing of non-overstretched T7 DNA was achieved. After DNA combing, the transcription reaction on a 10-15 OTMS-PEG substrate could be performed. The results showed T7 RNAP successfully interacted with DNA molecules and was able to locate promoter sites to initiate transcription (
To test if two successive enzymatic reactions may be performed on the micropatterned substrate, nick-labeling was performed on hgDNA molecules linearized on a 10-40 substrate (
Taken together, the above experiments show that the PEG sections not only significantly reduce the random adsorption of free fluorescent dyes but are also amenable to enzymatic reactions.
To demonstrate on-surface DNA mapping via fluorescent nucleotide incorporation, λ-DNA was used as a model genome and nick-labeled at the seven BbvCI sites (
Hence, to begin analysis, labeled λ-DNA molecules were identified by delineating a rectangle 60 px in height (corresponding to end-to-end distance between farthest BbvCI sites, 27.8 kbp) with an arbitrary width, to act as a reading frame, in randomly selecting molecules with at least 4 labels within the boundaries of the rectangle. These molecules are shown in
Each peak in
This is the first report of on-surface fluorescent labeling and mapping of long DNA molecules that has the potential for adaptation to high-throughput whole genome mapping, with the flexibility to perform multiple cyclic enzymatic reactions on fixed DNA. Going further, whole genome as well as targeted single DNA interrogation should be possible on this platform, such as multi-color mapping and base-by-base sequencing. As noted in the previous section, stretching of λ-DNA was less uniform than in nanochannel arrays, but the flexibility to perform multiple labeling steps on fixed DNA is highly significant and can open up new ways to analyze DNA sequence.
Clonal human DNA template, RP11-1116M14, was used to perform proof-in-principle experiments. For this, DNA was combed on a micropatterned (˜8 μm-wide ‘DNA binding’, ˜42 μm-wide ‘DNA-passivating’) glass coverslip. Circular holes were punched through a PDMS slab which was bonded to the coverslip. The device now contained 5-6 microliter reactor wells that can be operated independently. The chip is then mounted atop the microscope stage for image capture. In some instances, reaction and wash buffers have been introduced using a syringe pump setup and a modified PDMS layer (flow cell), while the chip was held on the microscope throughout the experiment. EnGen® Spy dCas9 (SNAP-tag®) was purchased from New England Biolabs. Fluorophore-tagged tracrRNA (Atto-550 and Alexa Fluor 647N) was purchased from Integrated DNA Technologies. Multiple probes were designed in-house, to target RP11-1116M14 as well as human genomic data. After validating the design from a reference map incorporating tolerance factors known to affect dCas9 targeting, crRNAs were ordered from GE and Integrated DNA Technologies.
After complexing the crRNA (probe containing the target sequence) with tracrRNA (universal sequence tagged with fluorophore) to result in guide-RNA complex (gRNA), dCas9 is added to complex with the gRNA. Further, this solution was added to the designated well containing combed DNA molecules to perform labeling. Imaging was either performed with or without the evacuation of the well, as well as before or after DNA backbone staining with YOYO-1.
In the experiment to demonstrate two cycles of labeling on a single DNA molecule observed in real-time, protease purchased from Qiagen was used to break down the dCas9-gRNA complex from the first labeling step. After this, protease solution was evacuated and washed multiple times before introducing the second dCas9-gRNA complex.
Reference maps were generated using Basic Local Alignment Search Tool (BLAST) and SAMtools for the analysis of experimental data.
The length of the template DNA (bacterial artificial chromosome, BAC), RP11-1116M14 (M14 in short), is around 160 kbp including regions of bacterial genome that it was cloned with. This translates to 54.4 μm when stretched to true length (100% stretched). In the combing experiments with the device, a stretching factor of nearly 1 is achieved, validated using λ-phage DNA (48.5 kbp). The width of the DNA-passivating region was chosen (42 μm) to allow for maximum length of DNA template to be probed by the labeling chemistry.
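The length conversion used above can be reproduced with a short calculation. This is a minimal sketch assuming the standard B-form rise of 0.34 nm per base pair, which is the value that yields 54.4 μm for 160 kbp; the function names are illustrative.

```python
RISE_NM_PER_BP = 0.34  # assumed B-form DNA rise per base pair


def contour_length_um(length_bp: float) -> float:
    """Fully stretched (100%) length of a DNA molecule in micrometers."""
    return length_bp * RISE_NM_PER_BP / 1000.0


def stretching_factor(measured_um: float, length_bp: float) -> float:
    """Ratio of measured end-to-end length to the contour length."""
    return measured_um / contour_length_um(length_bp)


def max_bp_in_window(window_um: float, factor: float = 1.0) -> float:
    """Base pairs spanned by a passivating region of the given width."""
    return window_um * 1000.0 / (RISE_NM_PER_BP * factor)


print(f"M14 BAC (160 kbp): {contour_length_um(160_000):.1f} um")
print(f"lambda DNA (48.5 kbp): {contour_length_um(48_500):.2f} um")
print(f"42 um window at s.f. 1: about {max_bp_in_window(42.0) / 1000:.1f} kbp")
```

At a stretching factor of 1, the 42 μm passivating region spans roughly 124 kbp, so most, but not all, of the 160 kbp template lies within the region accessible to the labeling chemistry.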
To map single M14 molecules, two repeating motifs were targeted, one of which (ALU-1) is relatively more frequent and results in denser clusters of target sequences than the other (22q-Whole). The ALU-1 probe has been designed to target the Alu element, the most abundant repetitive element comprising around 11% of the human genome. The reference map generated with the ALU-1 probe is shown in
Around 50 mm2 area of the glass surface area was scanned (4-5 wells) to obtain images for each labeled-DNA species. Images were captured in the TIRF configuration, by manually scanning for single DNA molecules stretched fully across the DNA-passivating region.
The obtained images were analyzed using ImageJ by aligning single molecules against the respective reference map (
The images of molecules aligned against the reference maps are shown in the panels below. Each molecule is indexed and compared against the reference assuming 100% stretching, i.e. no over-stretching or under-stretching, although not all molecules will be 100% stretched.
Long (>1 Mbp) backbone-stained single DNA molecules are linearized using the proposed microchannel device in an aqueous buffer. The excess backbone-staining dye is washed away with fresh buffer, and the location of DNA molecules on device surface is registered using an automated XYZ stage and microprocessor.
An equimolar mixture of individual (A, T, G, C) fluorophore-tagged reversible terminator nucleotides is prepared in an aqueous buffer with added DNA polymerase enzyme. In the first cycle, the master mix is introduced into the microchannel to initiate single base incorporation at sequence-specific nicked sites (enzymatic) or randomly generated single strand breaks (enzymatic or heat). After the incorporation step, the excess master mix is cleared out and the channel is washed using a wash buffer. At the registered positions of DNA molecules, fluorescence signal is collected on all four imaging channels, with a base call made based on the fluorophore detected. Subsequently, a second cycle of single base incorporation is carried out, washed, and imaged. This process will continue until desired or until read errors begin to increase. Typically, using this chemistry on bound-DNA templates, read length is 300 bp and above.
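The per-cycle base call described above can be sketched as follows. This is an illustrative sketch only, not the disclosed base-calling method: the channel-to-base assignment, the signal threshold and the ratio threshold are all hypothetical parameters, and each cycle is represented as four background-corrected intensities measured at one registered position.

```python
from typing import Sequence

# Hypothetical assignment of the four imaging channels to bases.
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_base(intensities: Sequence[float],
              min_signal: float,
              min_ratio: float = 2.0) -> str:
    """Call the base of the brightest channel, or 'N' if the call is ambiguous."""
    if len(intensities) != 4:
        raise ValueError("expected one intensity per imaging channel")
    ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    top, second = intensities[ranked[0]], intensities[ranked[1]]
    # Reject weak signals and cycles where two channels are similarly bright.
    if top < min_signal or (second > 0 and top / second < min_ratio):
        return "N"
    return CHANNEL_TO_BASE[ranked[0]]


def call_read(cycles: Sequence[Sequence[float]], min_signal: float) -> str:
    """Concatenate per-cycle base calls for one registered position."""
    return "".join(call_base(c, min_signal) for c in cycles)


# Three cycles of made-up intensities at one registered position.
example_cycles = [(900, 50, 40, 30), (20, 30, 850, 60), (10, 15, 12, 11)]
print(call_read(example_cycles, min_signal=100))
```

An increasing fraction of ambiguous calls over successive cycles is one simple indicator that could be used to decide when to stop, in line with continuing "until read errors begin to increase."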
The above method is used to sequence DNA at regions along a single long molecule. The additional co-locational information of sequenced regions enables accurate (high confidence) mapping/assembly of the sequenced fragments. This method of measurement is not only unique but also provides valuable genetic data in disease diagnosis.
In one instance, DNA sequencing is initiated at specific sites across the long molecules simultaneously using nickase enzymes. After single nucleotide incorporation and detection, this step is repeated multiple times to sequence the hotspots on individual DNA molecules.
In another instance, the DNA sequencing is initiated at several random sites across the long DNA molecules simultaneously, either by nucleases, heat or UV exposure. After single nucleotide incorporation and detection, this step is repeated multiple times to sequence DNA. At the end, the DNA backbone is stained with an intercalating dye, and visualized under a multichannel fluorescence microscope. This will define the linkage between the sequencing reads.
The main strategy for long-range optical mapping is based on measuring the distances between the short sequence motifs recognized by nicking endonucleases (6-8 bp) on single long DNA molecules. The key information is the pattern of distances between motifs. Current labeling strategies can only detect single-base differences at polymorphisms that happen to coincide with nickase motifs, which has limited the potential applications of optical mapping. For example, the H. influenzae strains RR722 and RR3131 share a 100 kb region (819-916 kb of RR722, NC_000907, and 884-981 kb of RR3131, NC_007416) with 99% sequence similarity. The Nt.BspQI sequence motif maps for the two strains are almost identical for this region, except for one extra nick of the RR3131 genome, due to an adenine single-nucleotide difference from RR722, thus the nicking enzyme labels the RR3131's allele but not RR722's allele (
A strategy was devised to use multiplexed CRISPR-Cas9 labeling to distinguish single-nucleotide variants affecting 3′-NGG PAM sites since the editing system has a strong requirement for the PAM immediately following the 20 bp recognition sequences. Genetic variation impacting PAM sites (i.e. if one of the G bases of a PAM in one genome is variant in another) is expected to strongly impact labeling, even if they share the 20 bp recognition sequence. Thus, it is predicted that strong differential labeling at gRNA-guided PAM variants could reliably differentiate the single base difference between two genomes over long distances.
To demonstrate single-base resolution of multiplexed CRISPR-Cas9 labels at variation affecting PAM sites, gRNAs targeting three distinct 20mer recognition sequences were designed such that, for each one, one of the two H. influenzae strains lacked a 3′-NGG PAM signal due to single nucleotide variation (Table 2). Labeling by both Nt.BspQI and CRISPR-Cas9 was performed in a single tube reaction, and the results of optical mapping are shown in
Single-base variation away from either G in the PAM nearly eliminated the corresponding labeling. At “locus 1” (NTHI0914-hypothetical protein of RR3131 and HI_0755-conserved hypothetical protein of RR722), the two strains share the same 20 bp recognition sequence (5′-AAAAATTGCTGCATCTTCTT-3′(SEQ ID NO: 427) as the gRNA, but RR3131 has a 3′-TGG PAM sequence, while RR722 has a TGA sequence instead. CRISPR-Cas9-mediated optical mapping clearly shows high-efficiency labeling at position 885289 in RR3131 (˜90% labeling), whereas RR722 molecules totally lacked labels (0%) at position 819899 (red arrow at “locus 1” in
In summary, labeling efficiency was over 90% for gRNAs with an NGG PAM sequence, whereas almost none of the molecules were labeled if there was an alternative allele in the PAM sequence. This is in contrast to the variable labeling efficiencies seen for different mismatches in the 20 nt recognition sequences in the sgRNA experiments below. These results suggest that customized optical mapping using gRNAs to target many of these polymorphisms (or “PAM SNPs”) could be an effective means to define long-distance haplotype structure in human genomes. It could also be applicable to other sample types, particularly mixed microbial specimens. The new DLE labeling strategy (6 bp motif) from Bionano Genomics provides 50% more labeling sites than Nt.BspQI labeling (7 bp motif) in the human genome and potentially other genomes, which may resolve some haplotype features. However, a density of 1 SNP per megabase is not enough to construct whole-genome haplotypes based on SNPs, considering the average DNA length of 300 kb.
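The PAM dependence described for locus 1 can be illustrated with a minimal Python sketch. It only checks whether a 23-nt site ends in an NGG PAM; it does not model Cas9 binding or labeling efficiency. The function name `has_ngg_pam` and the dictionary `alleles` are illustrative assumptions, while the sequences are those given above for locus 1.

```python
import re

def has_ngg_pam(site: str) -> bool:
    """Return True if a 23-nt site (20-nt recognition sequence + 3-nt PAM) ends in NGG."""
    if len(site) != 23:
        raise ValueError("expected a 20-nt recognition sequence plus a 3-nt PAM")
    return re.fullmatch(r"[ACGT]GG", site[20:].upper()) is not None

# Locus 1 from the text: shared 20-nt recognition sequence, different PAMs.
PROTOSPACER = "AAAAATTGCTGCATCTTCTT"
alleles = {
    "RR3131": PROTOSPACER + "TGG",  # intact NGG PAM
    "RR722": PROTOSPACER + "TGA",   # single-base change in the PAM
}

for strain, site in alleles.items():
    status = "PAM present" if has_ngg_pam(site) else "PAM absent"
    print(strain, status)
```

Under this simple rule only the RR3131 allele qualifies as a labeling site, which is consistent with the differential labeling reported for locus 1.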
An in silico analysis of whole genomes from the 1000 Genomes Project (36,37) was performed to determine the potential number and distribution of heterozygous PAM SNPs in the human genome. Out of 161 million NGG sites in hg38, there are on average 220,000 heterozygous PAM SNPs in a single diploid human genome. In addition, there are on average 40,000 heterozygous indels (>4 bp) within potential CRISPR-Cas9 recognition sequences (20 bp+NGG); gRNAs spanning >2 bp heterozygous indels within the 20 bp recognition sequence preferentially target the matching allele. Together, the genomic density of these sites is ideal for generating long-distance haplotypes using CRISPR-Cas9 labeling of PAM sites, given that single molecules in these experiments are longer than 100 kb.
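As a rough check on the density argument, the counts above can be converted into an average spacing and an expected number of sites per molecule. The sketch below assumes a haploid genome length of about 3.1 Gb, which is not stated in the text, and treats the sites as uniformly distributed, which real genomes are not. All names are illustrative.

```python
# Assumption (not from the text): approximate haploid human genome length in bp.
GENOME_BP = 3.1e9

HET_PAM_SNPS = 220_000  # average heterozygous PAM SNPs per genome (from the text)
HET_INDELS = 40_000     # average heterozygous indels in recognition sequences (from the text)
MOLECULE_BP = 100_000   # lower bound on single-molecule length (from the text)

def mean_spacing_bp(n_sites: int, genome_bp: float = GENOME_BP) -> float:
    """Average distance between sites if they were spread uniformly."""
    return genome_bp / n_sites

def expected_sites_per_molecule(n_sites: int,
                                molecule_bp: float = MOLECULE_BP,
                                genome_bp: float = GENOME_BP) -> float:
    """Expected number of sites covered by one molecule of the given length."""
    return n_sites * molecule_bp / genome_bp

total_sites = HET_PAM_SNPS + HET_INDELS
print(round(mean_spacing_bp(total_sites)))
print(round(expected_sites_per_molecule(total_sites), 1))
```

Under these assumptions the combined sites fall roughly every 12 kb, so a 100 kb molecule would span about eight of them on average. Not every site would necessarily be a usable gRNA target.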
The previously described method to synthesize multiple sgRNAs in a single tube reaction was adapted.
In the second customized mapping strategy, the mapping patterns were customized across a genome by selecting sets of specific single-guide RNAs (sgRNAs) for features of interest. This is particularly useful for designing different patterns to differentiate similar genomes or conserved sequences between strains or haplotypes. In designing the patterns, it is critical to avoid evenly distributed sgRNAs, because only long molecules that span the entire pattern can then be uniquely aligned. To test this, two custom optical mapping patterns were first designed using two different H. influenzae strains, the lab strain Rd KW20 (RR722) and a marked derivative of the clinical isolate 86-028NP (RR3131), as model systems.
48 sgRNAs were designed to target a 300 kb region of RR722 (0-350 kb of NC_000907), which shares high sequence similarity with RR3131 strain (0-315 kb NC_007416). Each sgRNA was designed to have a single perfect match of 20 bases upstream of PAM NGGs based on the Rd reference genome (cr 1). These 48 sgRNAs are evenly distributed across the 300 kb region of RR722 (RR722 reference map in
A single mixture of the 48 sgRNAs was then generated and used to label and map the targeted regions in both the RR722 and RR3131 genomes. The individual molecules are indicated as thin lines that are aligned to blue references in
CRISPR-Cas9 tagging is prone to off-target labeling. It is important to reduce off-target labeling as much as possible, especially when using custom-target mapping to map sequences with high similarity. The 48 sgRNAs (20-base recognition sequences) were aligned against the RR3131 reference. Of the 48 sgRNAs, 15 have imperfect matches to the RR3131 genome, and some of these result in off-target labeling in RR3131. In
7 of these 15 sgRNAs show several partial matches (<8 bases) across the 300 kb region, but without a PAM NGG next to the best match, so these sites could not be labeled. These 7 sgRNAs are designated as “N/A” in Table 4 and are not likely to contribute to off-target labeling. 6 of the remaining 8 sgRNAs were found to match the RR3131 reference at off-target loci with a PAM motif and a single mismatch in the 20-base recognition sequence. These 6 contribute to the off-target labeling and are designated as “off-target” in Table 3. The final 2 sgRNAs of the 15 did not produce a label in RR3131 and are listed as “No label”. Of the two, the sgRNA at 219206 of RR722 ((SEQ ID NO: 442) TTGTTTTACGATATAATACGNGG) also shows a single-base mismatch to the RR3131 strain, but did not result in off-target labeling. The sgRNA at 323878 of RR722 ((SEQ ID NO: 444) TAATCAAGCATTAGATAGCTNGG) has several mismatches close to the 5′ end and also did not result in off-target labeling.
All six sgRNAs that caused high-frequency off-target labeling had a single mismatch to the target sequences of RR3131. Five of the six had the single mismatch close to the 5′ end, distal from the PAM sequence; the exception is the sgRNA at 86065 of RR722 ((SEQ ID NO: 434) GTTACATTACACACAAACTINGG), whose single mismatch is at the 3rd base upstream of the PAM. For example, the sgRNA at 21722 of RR722 ((SEQ ID NO: 430) GCTTTTTAGGATATCGTCCCNGG) is designed to target the RR722 genome at coordinate 21722, but it also matches a syntenic position in RR3131 (at coordinate 21698) with a single mismatch (G/A) at the 9th base from the 5′ end. The off-target labeling of the RR3131 chromosome around 21698 was likely caused by this sgRNA. For the same reason, the sgRNA at 59529 of RR722 ((SEQ ID NO: 432) GCGGTATCCACCCCCACTGCNGG) likely generated the off-target labeling on RR3131 around 60913, with a single mismatch at the 3rd base. Notably, off-target labeling on RR3131 is more efficient with the sgRNA designed for the RR722 59529 locus than with the sgRNA designed for the RR722 21722 locus, which may reflect that its mismatch is closer to the 5′ end.
Overall, these results are consistent with the observation that the last 8-10 seed bases of sgRNA upstream of the PAM are more important for reducing the off-target labeling (38-41), and that multiple mismatches also reduce off-target labeling.
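The relationship between mismatch position and the PAM-proximal seed can be sketched in Python as follows. The guide is the 20-base recognition sequence of the sgRNA at 21722 of RR722; the off-target site is a hypothetical reconstruction that assumes the G/A mismatch means G in the guide and A in RR3131. The function names and the 8-base seed length default are illustrative choices.

```python
from __future__ import annotations

def mismatch_positions(guide: str, site: str) -> list[int]:
    """Return 1-based positions, counted from the 5' end, where guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [i + 1 for i, (g, s) in enumerate(zip(guide, site)) if g != s]

def in_seed(pos_from_5p: int, guide_len: int = 20, seed_len: int = 8) -> bool:
    """True if the position lies within the seed_len bases adjacent to the PAM."""
    return pos_from_5p > guide_len - seed_len

GUIDE_21722 = "GCTTTTTAGGATATCGTCCC"
# Hypothetical off-target site: the guide with G->A at the 9th base from the 5' end.
OFF_TARGET = GUIDE_21722[:8] + "A" + GUIDE_21722[9:]

for pos in mismatch_positions(GUIDE_21722, OFF_TARGET):
    print(pos, "seed" if in_seed(pos) else "outside seed")
```

With a 20-base guide and an 8-base seed, a mismatch at the 9th base from the 5′ end falls outside the seed, whereas a mismatch at the 3rd base upstream of the PAM (position 18 from the 5′ end) falls inside it.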
TGAAGGGATAAATATTGCGATGG
GCGTAAAGCATTAGATAGCTTGG
Based on the target labeling results and the reports that the 8 seed bases immediately upstream of the PAM sequence (NGG) have higher discrimination, the design pipeline was optimized to select a set of sgRNAs spanning the full RR722 genome in a series of four stepwise filters: (a) All possible sgRNAs with a single perfect match to the RR722 reference (all 20mers followed by a 3′ PAM NGG that occur only once in RR722) were first collected; 40870 such possible sgRNAs were available. (b) From those, only the sgRNAs whose 8-base seed sequences proximal to the PAM had single perfect hits to the reference were retained. If an 8-base seed had multiple perfect hits to the reference, it was discarded, since such seeds had a high chance of contributing to off-target labeling. The remaining sgRNAs (15339) all had a single perfect hit of 20 bases and a single perfect hit of the 8-base seed sequence. (c) Since all 8-base seed sequences have multiple hits with a single mismatch, a third filter was then applied to minimize the number of hits in the 8-base seed sequences with single mismatches to RR722. This resulted in 1,507 sgRNAs with <5 singly mismatched hits in all 8-base seed sequences. (d) From this dataset, off-target nicks were further minimized by keeping the sgRNAs with one more mismatch in the first 12 bases from the 5′ end (415 remain). The sgRNA design flow chart is summarized in
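Filters (a) and (b) can be sketched in Python on a toy sequence. This is a simplified illustration, not the pipeline used here: it scans the forward strand only, counts a hit only when the k-mer is immediately followed by NGG (an assumption about how hits were counted), and omits the mismatch-based filters (c) and (d). The function names and the toy sequence are hypothetical.

```python
import re

def candidate_sites(genome: str):
    """Yield (start, 20-mer) for each 20-mer followed by an NGG PAM (forward strand only)."""
    # A lookahead keeps overlapping sites from being skipped.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", genome):
        yield m.start(), m.group(1)

def pam_adjacent_hits(genome: str, kmer: str) -> int:
    """Count perfect occurrences of kmer that are immediately followed by NGG."""
    return len(re.findall(f"(?={kmer}[ACGT]GG)", genome))

def design(genome: str, seed_len: int = 8):
    """Apply filter (a), unique 20-mer, then filter (b), unique PAM-proximal seed."""
    genome = genome.upper()
    sites = list(candidate_sites(genome))
    step_a = [(p, g) for p, g in sites if pam_adjacent_hits(genome, g) == 1]
    step_b = [(p, g) for p, g in step_a
              if pam_adjacent_hits(genome, g[-seed_len:]) == 1]
    return step_a, step_b

# Toy sequence: G1 has a unique seed; the seed of G2 recurs next to a second PAM.
G1 = "ACTCATCAGTCACTGATCAC"
G2 = "TCATGCATACTAGACTAGCT"
TOY = G1 + "TGG" + G2 + "AGG" + "TTTT" + G2[-8:] + "CGG"

step_a, step_b = design(TOY)
print(len(step_a), len(step_b))
```

A genome-scale version would also need to scan the reverse strand and would likely use an indexed aligner rather than regular expressions.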
This set of 162 sgRNAs was synthesized in a single-tube reaction and used to label RR722 chromosomal DNA. The resulting samples were run on the optical mapping setup described in the methods section. In total, 0.5 Gb of data with an average molecule length of 244 kb was collected.
Here it is shown for the first time that individual alleles can be differentiated at any locus across the whole genome using CRISPR-Cas9 fluorescent labeling. This could be an effective means to define long-distance haplotype structure in target regions of complex genomes, such as the human genome. The approach provides several advantages over long-read sequencing techniques, including Oxford Nanopore sequencing and PacBio SMRT sequencing. First, the average DNA length is 300 kb, which is more than an order of magnitude longer than the read length of long-read sequencing techniques. In turn, it can span much longer haplotype structures without computational assembly. Second, no target enrichment is needed to scan the whole genome and define long-distance haplotype structure in target regions, while the cost remains low at about $500 per genome. By contrast, target enrichment of a single 300 kb region for long-read sequencing is still very challenging, as a 300 kb region accounts for only about one ten-thousandth of the genome, and a large amount of input material is needed to generate enough starting material to create a sequencing library. Without enrichment, the cost of haplotyping a large number of samples is prohibitive. Third, the cost can be further reduced by generating multiple sets of sgRNAs to haplotype multiple regions.
Traditionally, genome mapping is based on measuring distances between short (6-8 bp) sequence motifs across the genome, which are interrogated either by restriction enzyme cutting or by fluorescent tagging with a nickase or methyltransferase (reference). However, the distribution of motifs is fixed for any given genome. Here it is also shown for the first time that one can customize the mapping patterns by designing a custom set of multiple sgRNAs to fluorescently tag any 20 bp sequences with the CRISPR-Cas9 genome editing system. This will greatly expand the applications of genome mapping to targeting specific features of interest, clinically relevant structural variants, repetitive regions, and other regions inaccessible to sequence motif labeling. Moreover, one added benefit is that multiple sgRNAs provide more sequence information than sequence motif mapping: multiple different 20mers versus the same 6-8mer. This will greatly increase the accuracy of pinpointing the breakpoints of structural variants and other specific features. In silico mapping of the human genome was performed by targeting repetitive elements such as Alu and LINE-1 repeats. It was estimated that one sgRNA from Alu and one sgRNA from LINE-1 will result in 90% coverage of the human genome. This coverage is similar to that of the existing optical mapping schemes with Nt.BspQI and DLE labeling offered by Bionano Genomics. Off-target hits are considerably more complicated in the human genome due to the larger genome size and long stretches of repeats.
The custom-designed genomic labeling strategies described here could find wide application in analyzing complex genomes such as the human genome, including determining long-range haplotype structure, higher-precision breakpoint calling for complex structural variants, and improved resolution of complex repeat arrays. These strategies may also find applications in microbial comparative or community analyses, since one can design gRNAs to identify characteristic markers on large genomic fragments of different microorganisms (e.g., pathogenic species) and virulence genes (e.g., antibiotic resistance genes and alleles).
Table 4 shows a set of 48 sgRNAs designed based on RR722 reference sequences. sgRNA sequences are shown below. #N/A indicates that an sgRNA does not have a hit in RR3131. The 55-mer oligos were ordered and used in sgRNA synthesis, with the promoter sequence underlined and the overlap sequence in bold.
TTCTAATACGACTCACTATAGGCAATCAAAGATGCAGC
TTCTAATACGACTCACTATAGTGTATGCACTGCACAGAA
TTCTAATACGACTCACTATAGTTTTCTTCAATATGAAGC
TTCTAATACGACTCACTATAGGCTTTTTAGGATATCGTC
TTCTAATACGACTCACTATAGCGAATTTCTTTATATAAG
TTCTAATACGACTCACTATAGGGCGATGTGCTACATATG
TTCTAATACGACTCACTATAGTTACCCGTTTCTACTGCA
TTCTAATACGACTCACTATAGATTATTATTGTGGGATTA
TTCTAATACGACTCACTATAGGCGGTATCCACCCCCACT
TTCTAATACGACTCACTATAGTAGCCTAGGCTTAGAGA
TTCTAATACGACTCACTATAGGTGTGACATTTTGCGCTA
TTCTAATACGACTCACTATAGGTTACATTACACACAAAC
TTCTAATACGACTCACTATAGGGGGCGTAAATTCTTAAC
TTCTAATACGACTCACTATAGGCATATTGTTTCACCTGA
TTCTAATACGACTCACTATAGACAACGTCATCTCGGTTA
TTCTAATACGACTCACTATAGGAATTAAAAGAACCGAT
TTCTAATACGACTCACTATAGCGTAAAGTTTTACTTTGC
TTCTAATACGACTCACTATAGGATCTTATAAAGATAAGA
TTCTAATACGACTCACTATAGTTTTTAATCGGCGGAATT
TTCTAATACGACTCACTATAGACAACCCGCAATCTTGCC
TTCTAATACGACTCACTATAGAATATTATCGGTTGGTTA
TTCTAATACGACTCACTATAGACTACAGGTATGAATCAG
TTCTAATACGACTCACTATAGTCTCTGATTTAGTTAAACT
TTCTAATACGACTCACTATAGTGAGAAAAAAGATTTGCT
TTCTAATACGACTCACTATAGGTTAAACCTACAGTGCCG
TTCTAATACGACTCACTATAGGCTTCTCGATTTCACCAA
TTCTAATACGACTCACTATAGTGGATAGTCGCACACCTT
TTCTAATACGACTCACTATAGGCGAGTTTTTATGAGTAA
TTCTAATACGACTCACTATAGGCGACGATGACGCTAAC
TTCTAATACGACTCACTATAGTCTTCAATAGGACTGAAC
TTCTAATACGACTCACTATAGTTGTTTTACGATATAATA
TTCTAATACGACTCACTATAGTAGGTACTGTAAGAGAT
TTCTAATACGACTCACTATAGTAACGTATTAGATGCCAC
TTCTAATACGACTCACTATAGAATGGGTCGGAAAGTAC
TTCTAATACGACTCACTATAGGTTAAGTTTAGTCATCGG
TTCTAATACGACTCACTATAGCGAAGGGATAAATATTG
TTCTAATACGACTCACTATAGATTTTCATTGTATAGATG
TTCTAATACGACTCACTATAGCAGCCGTGGAAATCCTTC
TTCTAATACGACTCACTATAGTAGCACTTAAAAGAGGA
TTCTAATACGACTCACTATAGTTACTCAAATAGTGCGTT
TTCTAATACGACTCACTATAGGCCTGATGTGGATTCTAT
TTCTAATACGACTCACTATAGGCTCTGCCAATAATTTCT
TTCTAATACGACTCACTATAGTAATCAAGCATTAGATAG
TTCTAATACGACTCACTATAGTTTTGCATAATTCGGGGA
TTCTAATACGACTCACTATAGGCGAGTTTACTTTGAAAT
TTCTAATACGACTCACTATAGTATTGGATGATTTTGACA
TTCTAATACGACTCACTATAGATTAAAACGAATCCGAGT
TTCTAATACGACTCACTATAGTTACTCTTGGATTAGTGG
TTCTAATACGACTCACTATAGCAAAGCGCACCACGACTGACGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGACTGAACCTTGCAGTACCTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTTGTGTACTCAGCCCGACCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAGTAGCCGTTGCAGGGACACGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGATTGGAAAAAAACAGGCCACGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGTAGTGGATACAACCTCGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGAATAAACATCACCTGTACACGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCGCAAAAATTTTCGGCGGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCAATGGCTAATTGGGCTCGGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTTTATGATAAAAGGACTCGCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATGTAGCTCGGTTCGACTCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAGAAAGTGGGGCGGGAGCCTGT
TTTAGAGCTAGA
TTCTAATACGACTCACTATAGAATACAGGTACTGCCCCGCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTAGCTCAGTTGGTAGAGCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGCACCAATTCCGCCCGCCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTACAAACCAATGCCGTCGAGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCAAAGCAACGACCAACAGCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGATTGTAGAAGTACCGAGAGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCGATTAATGGCAGTGGACACGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGATACAATGTTGAAGCGCCTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGCGGCGATTGTTTCCTTCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGCGGGTACAGAAGAGGCTCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGCGGCGGGTAAAATCCCGGGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGCTTTTTGCCCCCTCCTCTCGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGTGGTTATTTTATCTTCCCCGGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGCCGCCGCCACTGCCTCCCTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTATCCAAAGGCTCTCACTCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCAGTGAAATTAGCGGCAGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGCAATACGCTCACTACGCGCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCGTAATATTTGACGAGACTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTGGCGATTCTATCGGGCCTGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTAACCAGTTACGCGAGAGCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGAAATCGTCGATACAGACCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTGTATTGGGACTGGACTCCAGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAGTTATTTTTCCCCGATCCTGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGATCTAATGCACCACTAGGACGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTCGGAGACGAGTGCCTCGCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGTCAAAAGTGTTCGCGGGCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTGTTCGTGCCGTGGGAGGCGGTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGAAAACTTACGTTGTCTTCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTGTTCTGGTAAAGAGACCTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTGTCGGTTGGTAACCTACCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTTCAATTTATTGACCTCCGGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGCCGCCATTTTATCCCCCGGCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCCAACCATTAATCCGTCTCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATGGGAAGAAAACTGACGGAGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGACTTTCCATACGGAGGGCGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTCAACTCACTGGGGGACGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGAGCACAATGGGCTTGGACCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGAGTGACATTCCGCACTCGTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGTGCGTTACCTTACCCTCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTCTACACGTTGATAGGTGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTACATTACACCAGTCCCCGGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTCCATTACTGGTATGGTCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGATTTAGAAAACGGCGCGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGGAACCAACGCACGGAACCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCCCGCTCGTTTTGACCTACGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGCTGATGTGTTACTCCATCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTTGTTACTTTTAGTCCCGTGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGGCCCAAAATGCACGGACTAGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGATGCGGATATTCTCGTCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGACAAAGCTGAAAACGGCCAGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCCGGAGATGACGCCCCTCCGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCTCGAGATGTTTCAGGAGAGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTGCAACGGTAATGACGGGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTTTGCAGAAATTGCTCTGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTATATAACTGGCTACCGACGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAAATCTTTGGTTCTCTCGCCGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGCGGTCACTTTGCGACCTCAGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTGGTAGCATTGTTCCGTCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGAGATCAAATGGTGGGTCCTGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGGTGGCGTACTTACTCGCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGCGAACCAAGTAGAGCTCCAGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGGTTCATTCATTCCGGTTCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATGGTTAAAGGTCCGGGTCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTTAAAAAATCAACTCGGATCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGCGCAACGTTGCGTACGTCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGATTAACTTGGTGGACCCAGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTGAAATCTTATCTCACTCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCTGCATTAAAATCACGTGTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGACTTGATCCACAACCCAGTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGACTTTTGTAAAAGACCGACCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGCTGCGGCAATTGTCGCCGGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCTGAGGTTTTAACTCTCGTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTACACCAATTAAGCCACCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGAAAAATGGTCCCCCCTACGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTCGTGGTATTTCAGGCCCTGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTTCCAATTCCACGACGCGGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAAAACATTCTTACCGTCTCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAGTTCTTTTGTCGGAGGGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTGGGGGACAAACCCCGGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTGGCTATCAGCTTCTCGGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATACACTAGAAAGCCTAGTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTTGGCATAATTCCCAGCTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAAAGCGAAATCTGGTCACCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTTAATGTTGTATTAGGGACGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGACTCAAGCTGTTCGCCTACCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAACAGCACCAGTGAGGACGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTGAACAGCAAATGGGTAGGGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCATCTGCAATCACGGCGCCAGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTGCATATCAGTTGGGAACCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAAGAAGATGCAAAACGTCCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTTATTTCTAAAGCACCTCGCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGAACCTCTTGGGGGTCAGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCATTGACCATTGCCGCAGCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTCAGAAGTGAAGGGGCTGCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCTGGCTGATTTTCAGGGGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCTGGTTTACTCGGTCAGGTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCGATAACAAAACGACCAGTCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTCATTACAAGGGGTCGTCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGAACGCGTAGCTGCTCCTCTGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCAATATTCGTCATACTCGGGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATCGTAATAAAAACGACGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCAAGTGATTCGAAGTATCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGTATCAGCAAACTGAGTCCAGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGTTCCTATTGGACGAATCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGTTACATTATTCCCGGTCTGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGACGAATTCGACCAGAACCGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTTCTCTAATTCATAGGCCCCGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGATTTGCCGTGTCCTGGCCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGATAAATATCAGACATGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGGCAAACAATCGTCTCGTCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATCGATATGCCTCCGGGCACGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGGGAATTGAGTGCCAGCGCGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGAATGTATGGTTGCCCTGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATCACTATCGTGCGTACCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGTGCCTAATTGAAAGGAGGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGTGATTTTAGATTGGGTGCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAGGCATTGGATTCGGGCCAGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGAATACGTGTTCTGGAAACCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGTTTTTAAAGCGGCACGGACGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGAACATAAAGAGAAAGACCCTGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGAAGCCGAACCATTCGAGGCGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGTATTTATCAAACCGGGCAGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGAATGAATAAAGCGCTCTCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGACTCAGCAATTACGCCCCGGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCCCGTGAAGTGGCAGAGGTCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCCAATCCATTCTGTCAGCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGCACCGAGTATGTCAGACCGCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTGCTTGGAAAGTTCGAGACAGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGAAAATGAAGAACGCGCGGGGT
TTTAGAGCTAGA
TTCTAATACGACTCACTATAGTAAATCTTCAAACTGCGGACGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGAAGCAAAAGCACTTCCGCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTATATGAAAAATCATGTCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGTGCTAGTGACTTCGGGGCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTAGTGAATTAGATAGGGTACGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTATTGCTGGTGCAGGGGGGGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCAATTGTGCCACCACGTCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGTGGCGTAAGTGGAACGGGTCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTCTGCATATCTGCCCTCCCTGTTTT
AGAGCTAGA
TTCTAATACGACTCACTATAGCAATTGATATTCGCCCCCCGGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATTCAGCTGTGGCAGGACAGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGAGTGCCGGATAACGTCCGGGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTGCTGATGTTCAAGGCTCCTGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGGTCAAATCAGGTGAGCTCACGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGACGACCATGGTTGCCGCCCCGTTT
TAGAGCTAGA
TTCTAATACGACTCACTATAGATGTCAAAGGTAGCCCGCCGGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGGCGACATCCGCCATAGGCCCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGTTATGGGGGAGAGCGAGGTCGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCATCAGTTACGAGGGCGCGTGTT
TTAGAGCTAGA
TTCTAATACGACTCACTATAGCCTTCAACTTCACCCGGGCGGTTT
TAGAGCTAGA
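The two-line oligo entries above appear to share a common layout: a 21-nt prefix containing the T7 promoter, a 20-nt target sequence, and a 14-nt overlap, for 55 nt in total. The following Python sketch assembles an oligo under that reading. The prefix and overlap constants are inferred from the listed entries rather than stated in the text, and the names `PREFIX`, `OVERLAP`, and `build_oligo` are illustrative.

```python
# Sketch: assemble a 55-mer sgRNA template oligo from a 20-nt target sequence.
# PREFIX and OVERLAP are inferred from the oligos listed above (assumption).
PREFIX = "TTCTAATACGACTCACTATAG"  # 21 nt, contains the T7 promoter
OVERLAP = "GTTTTAGAGCTAGA"        # 14 nt, shared 3' end of the listed oligos

def build_oligo(target: str) -> str:
    """Return PREFIX + 20-nt target + OVERLAP, validating the target first."""
    target = target.upper()
    if len(target) != 20 or set(target) - set("ACGT"):
        raise ValueError("target must be exactly 20 nt of A/C/G/T")
    return PREFIX + target + OVERLAP

# Target taken from the first two-line entry in the list above.
oligo = build_oligo("CAAAGCGCACCACGACTGAC")
print(oligo, len(oligo))
```

Rebuilding the first entry this way reproduces its two printed lines joined together, which supports the inferred layout for that entry.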
The disclosures of each and every patent, patent application, and publication cited herein are hereby incorporated herein by reference in their entirety.
While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims are intended to be construed to include all such embodiments and equivalent variations.
The present application is a continuation of U.S. patent application Ser. No. 16/945,638, filed Jul. 31, 2020, which claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/881,776, filed Aug. 1, 2019. The contents of each of these applications are hereby incorporated by reference in their entireties herein.
This invention was made with government support under Grant No. R01-HG005946 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
Number | Date | Country
---|---|---
62881776 | Aug 2019 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 16945638 | Jul 2020 | US
Child | 18644807 | | US