The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jun. 11, 2018, is named 38558_718_832_SL.txt and is 9,877 bytes in size.
Determining the spatial distribution of biological molecules, including, for example, nucleic acids, carbohydrates and proteins, can be of interest in life sciences research, molecular diagnostics, forensic science, personalized medicine and other applications. In addition to the gene expression profile of a particular cell or tissue, spatial information about biological molecules within the cell or tissue may provide other valuable information. For example, the spatial distribution of biomolecules in a tissue or a cell may govern many biological processes, ranging from organ development to the formation of cell polarity. Advances in elucidating the spatial distribution of genes may be used in gene expression profiling of cancer cells when monitoring cancer therapies.
While recent advances in nucleic acid sequencing technologies have greatly improved the routine detection of deoxyribonucleic acid (DNA) sequences, resolving the precise sequences of large biomolecules is still a major challenge. A spatial expression pattern describes where a gene is expressed, such as which body tissue, which germ layer during development, or which cell type expresses the gene. Methods for studying spatial gene expression are an important tool for verifying predicted regulatory interactions and for predicting properties of missing components in a regulatory network.
The present disclosure provides methods, devices and systems for the design and manufacture of spatially encoded nucleic acid arrays, which can be used for a variety of molecular detection applications, including detecting the distribution of mutations within tissue sections.
An aspect of the present disclosure provides a method for detecting spatial distribution of a plurality of target molecules within a biological sample, comprising: (a) providing a substrate comprising a plurality of distinct locations, each of the plurality of distinct locations comprising two or more coordinates; (b) attaching a plurality of zipcodes to the plurality of distinct locations; (c) contacting a biological sample comprising a plurality of target molecules with the substrate, thereby generating a plurality of report molecules, each report molecule comprising: (i) a fragment of a first target molecule of the plurality of target molecules, or a barcode indicating the presence of the first target molecule; and (ii) a first zipcode of the plurality of zipcodes, wherein the first zipcode encodes coordinates of contact on the substrate for the fragment of the first target molecule or the barcode; and (d) sequencing the plurality of report molecules and determining the coordinates of contact, thereby determining the spatial distribution of the plurality of target molecules within the biological sample.
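Step (d) amounts to looking up each sequenced zipcode to recover the coordinates where it was attached, then tallying targets per location. The Python sketch below assumes each read has already been split into a zipcode and a target identifier (fragment or barcode); the table `ZIPCODE_TO_COORD`, its sequences, and the function `spatial_counts` are hypothetical and only illustrate the bookkeeping, not a prescribed analysis pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical lookup table: zipcode sequence -> (x, y) coordinates of the
# distinct location on the substrate where that zipcode was attached in step (b).
ZIPCODE_TO_COORD = {
    "ACGTAC": (0, 0),
    "TGCATG": (0, 1),
    "CAGTCA": (1, 0),
}


def spatial_counts(report_molecules, zipcode_to_coord):
    """Tally target identities per coordinate of contact.

    Each report molecule is modelled as a (zipcode, target_id) pair, i.e. the
    zipcode and the target fragment or barcode were already parsed from the read.
    Molecules whose zipcode is not in the table are counted as unassigned.
    """
    counts = defaultdict(Counter)
    unassigned = 0
    for zipcode, target_id in report_molecules:
        coord = zipcode_to_coord.get(zipcode)
        if coord is None:
            unassigned += 1
            continue
        counts[coord][target_id] += 1
    return dict(counts), unassigned


reads = [("ACGTAC", "GENE_A"), ("ACGTAC", "GENE_A"),
         ("TGCATG", "GENE_B"), ("NNNNNN", "GENE_A")]
profile, dropped = spatial_counts(reads, ZIPCODE_TO_COORD)
print(profile[(0, 0)]["GENE_A"], dropped)  # 2 1
```

Exact-match lookup keeps the sketch short; a practical decoder would likely tolerate sequencing errors by assigning each read to the nearest zipcode within the designed edit distance.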
In some embodiments, the plurality of target molecules are deoxyribonucleic acids (DNA), ribonucleic acids (RNA), complementary deoxyribonucleic acids (cDNA), proteins, carbohydrates, lipids, natural products, antigens, metabolites, peptides, aptamers, cells, or binding partners thereof. In some embodiments, the binding partners are antibodies, aptamers, or synthetic antibody mimics. In some embodiments, each binding partner comprises another barcode encoding the target molecule that the binding partner binds to or recognizes. In some embodiments, each zipcode comprises (i) a bottom adapter attached to a distinct location, (ii) a coordinate zipcode attached to the bottom adapter; and (iii) a top adapter attached to the coordinate zipcode. In some embodiments, the coordinate zipcode encodes the two or more coordinates of the distinct location to which the coordinate zipcode is attached. In some embodiments, each zipcode comprises (i) a bottom adapter attached to a distinct location, (ii) a lower zipcode attached to the bottom adapter; (iii) a separator sequence attached to the lower zipcode; (iv) an upper zipcode attached to the separator sequence; and (v) a top adapter attached to the upper zipcode. In some embodiments, the lower zipcode encodes a first coordinate and the upper zipcode encodes a second coordinate, wherein the two or more coordinates of the distinct location comprise the first coordinate and the second coordinate. In some embodiments, the report molecule is a deoxyribonucleic acid (DNA) or a derivative thereof. In some embodiments, the biological sample is a tissue section, a derivative of the tissue section, a transfer of the tissue section, or a derivative of the transfer of the tissue section.
Another aspect of the present disclosure provides a method for detecting spatial distribution of a plurality of target molecules in a biological sample, comprising: (a) providing a substrate having a plurality of distinct locations, each distinct location comprising a first coordinate and a second coordinate; (b) attaching a plurality of zipcodes to each distinct location, thereby encoding the first and second coordinates by the plurality of zipcodes attached to each distinct location; (c) contacting a biological sample comprising a plurality of target molecules with a plurality of binding partners, wherein at least a fraction of the plurality of binding partners bind to or recognize at least a fraction of the plurality of target molecules to form a plurality of first tagged complexes; (d) placing the plurality of first tagged complexes on the substrate, thereby allowing the binding partners in the plurality of first tagged complexes to bind to or recognize at least a fraction of the plurality of zipcodes to form a plurality of second tagged complexes; (e) generating a plurality of report molecules based on the plurality of second tagged complexes, wherein each report molecule encodes a selected binding partner and a selected zipcode in one of the plurality of second tagged complexes; and (f) sequencing the plurality of report molecules and determining the first and second coordinates and the binding partner for each report molecule, thereby determining the spatial distribution of the plurality of target molecules within the biological sample.
In some embodiments, the plurality of target molecules are deoxyribonucleic acids (DNA), ribonucleic acids (RNA), proteins, complementary deoxyribonucleic acids (cDNA), carbohydrates, lipids, natural products, antigens, metabolites, peptides, aptamers, or cells. In some embodiments, the plurality of binding partners are antibodies, aptamers, or synthetic antibody mimics. In some embodiments, each of the plurality of binding partners comprises a barcode encoding the target molecule it binds to or recognizes. In some embodiments, in (d), forming the plurality of second tagged complexes comprises ligation by a ligase, gap filling, annealing, or hybridization. In some embodiments, each zipcode comprises (i) a first bottom adapter attached to the distinct location, (ii) a first coordinate zipcode attached to the first bottom adapter; and (iii) a first top adapter attached to the first coordinate zipcode. In some embodiments, the first top adapter is a primer that enables tagging of the binding partners. In some embodiments, the first bottom adapter is a sequencing adaptor for a sequencing library. In some embodiments, the first coordinate zipcode encodes the first and second coordinates. In some embodiments, each zipcode comprises (i) a first bottom adapter attached to the distinct location, (ii) a first lower zipcode attached to the first bottom adapter; (iii) a first separator sequence attached to the first lower zipcode; (iv) a first upper zipcode attached to the first separator sequence; and (v) a first top adapter attached to the first upper zipcode. In some embodiments, the first lower zipcode encodes the first coordinate and the first upper zipcode encodes the second coordinate. In some embodiments, the first separator sequence comprises a sequence selected from GGG, CCC and TT.
In some embodiments, the first lower zipcode comprises from 5 to 24 bases. In some embodiments, the first lower zipcode comprises no more than 16 bases. In some embodiments, the first upper zipcode comprises from 5 to 24 bases. In some embodiments, the first upper zipcode comprises no more than 16 bases. In some embodiments, different zipcodes attached to different distinct locations have an edit distance of 4. In some embodiments, different zipcodes attached to different distinct locations have a long-range minimum edit distance of 5. In some embodiments, the biological sample is a tissue section or a transfer of a tissue section. In some embodiments, the report molecule is a deoxyribonucleic acid (DNA).
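The two-part layout (bottom adapter, lower zipcode, separator, upper zipcode, top adapter) can be illustrated with a short Python sketch. All adapter sequences and codebooks below are hypothetical placeholders, the separator is taken to be TT, and every code within a codebook is assumed to have the same length so that a read can be split at fixed offsets.

```python
# Hypothetical adapter sequences (placeholders, not taken from the disclosure).
BOTTOM_ADAPTER = "GTCAGTCAGTCA"
TOP_ADAPTER = "CAGTCAGTCAGT"
SEPARATOR = "TT"  # one of GGG, CCC or TT

# Hypothetical per-axis codebooks: list index i -> zipcode for coordinate value i.
LOWER_CODES = ["ACGTCA", "TGCAGT", "CATGAC"]  # encodes the first coordinate (x)
UPPER_CODES = ["GTACGA", "CTGACT", "AGCTTG"]  # encodes the second coordinate (y)


def build_zipcode(x, y):
    """Assemble bottom adapter + lower + separator + upper + top adapter."""
    lower, upper = LOWER_CODES[x], UPPER_CODES[y]
    for code in (lower, upper):
        if not 5 <= len(code) <= 24:
            raise ValueError("zipcode length outside 5-24 bases")
    return BOTTOM_ADAPTER + lower + SEPARATOR + upper + TOP_ADAPTER


def parse_zipcode(seq):
    """Recover (x, y) by slicing at fixed offsets (all codes equal length)."""
    lower_start = len(BOTTOM_ADAPTER)
    sep_start = lower_start + len(LOWER_CODES[0])
    upper_start = sep_start + len(SEPARATOR)
    if seq[sep_start:upper_start] != SEPARATOR:
        raise ValueError("separator not found at expected position")
    lower = seq[lower_start:sep_start]
    upper = seq[upper_start:upper_start + len(UPPER_CODES[0])]
    # list.index raises ValueError if the code is not in the codebook.
    return LOWER_CODES.index(lower), UPPER_CODES.index(upper)


oligo = build_zipcode(2, 1)
print(parse_zipcode(oligo))  # (2, 1)
```

Because each coordinate is encoded separately, N lower codes and M upper codes are enough to address N × M locations.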
Still another aspect of the present disclosure provides a method for detecting spatial distribution of biomolecule expression, comprising: (a) providing a substrate having a plurality of distinct locations, each distinct location comprising a first coordinate and a second coordinate; (b) attaching a plurality of zipcodes to each distinct location, thereby encoding the first and second coordinates by the zipcodes; (c) contacting a biological sample comprising a plurality of biomolecules with the plurality of zipcodes, thereby attaching at least a fraction of the plurality of zipcodes to at least a fraction of the plurality of biomolecules or fragments thereof, or to at least a fraction of copies of the plurality of biomolecules or fragments thereof, and generating a plurality of tagged molecules; and (d) sequencing the plurality of tagged molecules and determining the first and second coordinates for at least the fraction of the plurality of biomolecules, thereby determining the spatial distribution of the plurality of biomolecules within the biological sample.
In some embodiments, the plurality of biomolecules are deoxyribonucleic acids (DNA). In some embodiments, the plurality of biomolecules are complementary deoxyribonucleic acids (cDNA) of ribonucleic acids (RNA). In some embodiments, the RNA is messenger RNA (mRNA). In some embodiments, the method further comprises, prior to (c), reverse transcribing the mRNA to complementary DNA (cDNA). In some embodiments, each zipcode comprises (i) a first bottom adapter attached to the distinct location, (ii) a first coordinate zipcode attached to the first bottom adapter; and (iii) a first top adapter attached to the first coordinate zipcode. In some embodiments, the first top adapter is a primer that enables tagging of the biomolecules. In some embodiments, the first bottom adapter is a sequencing adaptor for a sequencing library. In some embodiments, the first coordinate zipcode encodes the first and second coordinates. In some embodiments, each zipcode comprises (i) a first bottom adapter attached to the distinct location, (ii) a first lower zipcode attached to the first bottom adapter; (iii) a first separator sequence attached to the first lower zipcode; (iv) a first upper zipcode attached to the first separator sequence; and (v) a first top adapter attached to the first upper zipcode. In some embodiments, the first lower zipcode encodes the first coordinate and the first upper zipcode encodes the second coordinate. In some embodiments, the first separator sequence comprises a sequence selected from GGG, CCC and TT. In some embodiments, the first lower zipcode comprises from 5 to 24 bases. In some embodiments, the first lower zipcode comprises no more than 16 bases. In some embodiments, the first upper zipcode comprises from 5 to 24 bases. In some embodiments, the first upper zipcode comprises no more than 16 bases. In some embodiments, different zipcodes attached to different distinct locations have an edit distance of 4.
In some embodiments, different zipcodes attached to different distinct locations have a long-range minimum edit distance of 5. In some embodiments, the attaching in (c) comprises ligating or annealing.
Another aspect of the present disclosure provides a zipcode array, comprising: (a) a first location; and (b) a plurality of first zipcodes attached to the first location, wherein each first zipcode comprises (i) a first bottom adapter attached to the first location, (ii) a first lower zipcode attached to the first bottom adapter; (iii) a first separator sequence attached to the first lower zipcode; (iv) a first upper zipcode attached to the first separator sequence; and (v) a first top adapter attached to the first upper zipcode.
In some embodiments, the first location comprises a first coordinate and a second coordinate. In some embodiments, the first lower zipcode encodes the first coordinate and the first upper zipcode encodes the second coordinate. In some embodiments, the first separator sequence comprises a sequence selected from GGG, CCC and TT. In some embodiments, the first lower zipcode comprises from 5 to 24 bases. In some embodiments, the first lower zipcode comprises no more than 16 bases. In some embodiments, the first upper zipcode comprises from 5 to 24 bases. In some embodiments, the first upper zipcode comprises no more than 16 bases. In some embodiments, the zipcode array further comprises (c) a second location; and (d) a plurality of second zipcodes attached to the second location, wherein each second zipcode comprises (i) a second bottom adapter attached to the second location, (ii) a second lower zipcode attached to the second bottom adapter; (iii) a second separator sequence attached to the second lower zipcode; (iv) a second upper zipcode attached to the second separator sequence; and (v) a second top adapter attached to the second upper zipcode. In some embodiments, the second location comprises a third coordinate and a fourth coordinate. In some embodiments, the second lower zipcode encodes the third coordinate and the second upper zipcode encodes the fourth coordinate. In some embodiments, the second separator sequence comprises a sequence selected from GGG, CCC and TT. In some embodiments, the second lower zipcode comprises from 5 to 24 bases. In some embodiments, the second lower zipcode comprises no more than 16 bases. In some embodiments, the second upper zipcode comprises from 5 to 24 bases. In some embodiments, the second upper zipcode comprises no more than 16 bases.
In some embodiments, the first and second locations are adjacent, and the pair of first and second lower zipcodes and the pair of first and second upper zipcodes each have an edit distance of 4. In some embodiments, the first and second locations are not adjacent, and the first and second lower zipcodes have a long-range minimum edit distance of at least 5. In some embodiments, the first and second locations are not adjacent, and the first and second upper zipcodes have a long-range minimum edit distance of 5. In some embodiments, the first location is no more than 5 μm in length. In some embodiments, the first location is no more than 2 μm in length. In some embodiments, the zipcode array further comprises more than 1 million first locations, each of which is different from the others. In some embodiments, the first bottom adaptor is a sequencing adaptor. In some embodiments, the first top adaptor is a primer.
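One way to audit a per-axis codebook against these distance rules is sketched below in Python. It treats codes for neighbouring coordinate values as "adjacent", reads 4 and 5 as minimum required distances, and uses plain Levenshtein distance. These interpretations, the function names, and the toy sequences (chosen only so the distances are easy to verify by hand, not to satisfy the other design rules) are assumptions.

```python
from itertools import combinations


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def check_codebook(codes, adjacent_min=4, long_range_min=5):
    """Return (i, j, distance) for every pair that violates its threshold.

    codes[i] encodes coordinate value i; indices differing by 1 are treated
    as adjacent, all other pairs as long-range.
    """
    violations = []
    for i, j in combinations(range(len(codes)), 2):
        d = levenshtein(codes[i], codes[j])
        required = adjacent_min if j - i == 1 else long_range_min
        if d < required:
            violations.append((i, j, d))
    return violations


print(check_codebook(["AAAAAAAA", "CCCCAAAA", "GGGGGAAA"]))  # []
print(check_codebook(["AAAAAA", "AAAAAT"]))                  # [(0, 1, 1)]
```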
Still another aspect of the present disclosure provides a zipcode array, comprising: (a) a first location; (b) a second location; (c) a plurality of first zipcodes attached to the first location, wherein each first zipcode comprises (i) a first bottom adapter attached to the first location, (ii) a first coordinate zipcode attached to the first bottom adapter; and (iii) a first top adapter attached to the first coordinate zipcode; and (d) a plurality of second zipcodes attached to the second location, wherein each second zipcode comprises (i) a second bottom adapter attached to the second location, (ii) a second coordinate zipcode attached to the second bottom adapter; and (iii) a second top adapter attached to the second coordinate zipcode.
In some embodiments, the first location comprises a first coordinate and a second coordinate. In some embodiments, the first coordinate zipcode encodes the first coordinate and the second coordinate. In some embodiments, the first coordinate zipcode comprises from 6 to 48 bases. In some embodiments, the first coordinate zipcode comprises no more than 32 bases. In some embodiments, the second location comprises a third coordinate and a fourth coordinate. In some embodiments, the second coordinate zipcode encodes the third coordinate and the fourth coordinate. In some embodiments, the second coordinate zipcode comprises from 6 to 48 bases. In some embodiments, the second coordinate zipcode comprises no more than 32 bases. In some embodiments, the first location is no more than 5 μm in length. In some embodiments, the first location is no more than 2 μm in length. In some embodiments, the zipcode array further comprises more than 1 million locations including the first and second locations, wherein each location of the more than 1 million locations is distinguishable from another. In some embodiments, the first bottom adaptor is a sequencing adaptor. In some embodiments, the first top adaptor is a primer.
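A quick information-theoretic check relates the number of locations to zipcode length: a sequence of L bases over the four-letter alphabet can take at most 4^L distinct values. The Python sketch below (function name hypothetical) computes that lower bound. It ignores the GC-content, homopolymer and edit-distance constraints, so practical zipcodes would need to be longer than this bound.

```python
def min_bases(n_values, alphabet=4):
    """Smallest length L with alphabet**L >= n_values (integer arithmetic only)."""
    length = 0
    while alphabet ** length < n_values:
        length += 1
    return length


print(min_bases(1_000_000))  # 10: one coordinate zipcode for ~1 million locations
print(min_bases(1000))       # 5: one axis of a 1000 x 1000 grid
```

For about 1 million locations, a single coordinate zipcode needs at least 10 bases (4^10 = 1,048,576), whereas a 1000 × 1000 grid encoded per axis needs at least 5 bases for each coordinate (4^5 = 1,024). For scale, a 1000 × 1000 grid of 2 μm locations spans 2 mm × 2 mm.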
Another aspect of the present disclosure provides a method for detecting spatial distribution of a plurality of ribonucleic acid molecules in a biological sample, comprising: (a) contacting a first surface comprising a plurality of first oligonucleotides with a biological sample comprising a plurality of ribonucleic acid molecules; (b) extending a fraction of the plurality of first oligonucleotides by a reverse transcriptase using the plurality of ribonucleic acid molecules as templates, thereby generating a plurality of second oligonucleotides, each of the plurality of second oligonucleotides comprising a fragment of complementary DNA (cDNA) of one of the plurality of ribonucleic acid molecules; (c) contacting a zipcode array comprising a plurality of zipcode oligonucleotides with the plurality of second oligonucleotides in the presence of a polymerase, thereby extending the plurality of second oligonucleotides and generating a plurality of third oligonucleotides, each of the plurality of third oligonucleotides comprising one of the plurality of second oligonucleotides and a complementary sequence of one of the plurality of zipcode oligonucleotides; (d) separating the first surface comprising the plurality of third oligonucleotides from the zipcode array; and (e) sequencing the plurality of third oligonucleotides, thereby determining the spatial distribution of the plurality of ribonucleic acid molecules within the biological sample.
In some embodiments, the extending in (c) further comprises a template switching reaction. In some embodiments, the method further comprises, in (b), after reverse transcription, denaturing the second oligonucleotides from the ribonucleic acid templates to which they are hybridized. In some embodiments, the first surface is a gel matrix. In some embodiments, the zipcode array comprises a plurality of distinct locations, and each distinct location comprises a first coordinate and a second coordinate. In some embodiments, a plurality of first zipcode oligonucleotides attached to a first distinct location of the plurality of distinct locations encode the first coordinate of the first distinct location and the second coordinate of the first distinct location. In some embodiments, each first zipcode oligonucleotide comprises (i) a bottom adapter attached to the first distinct location, (ii) a coordinate zipcode attached to the bottom adapter; and (iii) a top adapter attached to the coordinate zipcode. In some embodiments, each first zipcode oligonucleotide comprises (i) a bottom adapter attached to the first distinct location, (ii) a lower zipcode attached to the bottom adapter; (iii) a separator sequence attached to the lower zipcode; (iv) an upper zipcode attached to the separator sequence; and (v) a top adapter attached to the upper zipcode. In some embodiments, the lower zipcode encodes the first coordinate of the first distinct location and the upper zipcode encodes the second coordinate of the first distinct location. In some embodiments, the biological sample is a tissue section, a derivative of the tissue section, a transfer of the tissue section, or a derivative of the transfer of the tissue section.
In some embodiments, at least two second oligonucleotides of the plurality of second oligonucleotides comprise different fragments of complementary DNA (cDNA) sequence(s). In some embodiments, the orientation of the plurality of zipcode oligonucleotides on the zipcode array is from 5′ to 3′.
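At the sequence level, step (c) appends to each second oligonucleotide a sequence complementary to a zipcode oligonucleotide, and the disclosure treats a zipcode and its complementary copy as encoding the same location. The Python sketch below models this under simplifying assumptions: the whole zipcode oligonucleotide is copied, its reverse complement ends up at the 3′ end of the product, and adapters, priming position and template switching are ignored. All sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_on_zipcode(second_oligo, zipcode_oligo):
    """Simplified model of polymerase extension templated by a zipcode oligo:
    the second oligonucleotide gains the zipcode's reverse complement at its 3' end."""
    return second_oligo + reverse_complement(zipcode_oligo)


def decode_tail(third_oligo, zipcode_to_coord, zip_len):
    """Map the 3'-terminal copy back to the zipcode, then to its coordinates."""
    tail = third_oligo[-zip_len:]
    return zipcode_to_coord.get(reverse_complement(tail))


table = {"GATTACAG": (3, 7)}                       # hypothetical zipcode -> (x, y)
third = extend_on_zipcode("AAACCCGGGT", "GATTACAG")  # hypothetical cDNA fragment
print(third)                         # AAACCCGGGTCTGTAATC
print(decode_tail(third, table, 8))  # (3, 7)
```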
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “figure” and “FIG.” herein).
While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
As used herein, the singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
As used herein, the term “about” generally refers to the indicated numerical value±10%.
As used herein, open terms, for example, “comprise”, “contain”, “include”, “including”, “have”, “having” and the like, mean “comprising” unless otherwise indicated.
As used herein, the terms “embedding” and “a string of synthetic steps” generally refer to a series of active and inactive steps designed for forming an individual polymer on the substrate and can be used interchangeably. For example, in cases where light-directed synthetic methods are employed, “embedding” refers to a series of exposure and non-exposure steps.
As used herein, the term “edit distance” generally refers to the minimum number of changes (such as insertions, deletions, substitutions and translocations) needed to convert one polymer into another. For example, the edit distance between sequences AGCGCTTAGCCTAGAGCTCTAG (SEQ ID NO: 1) and GCGCTTAGCTTAGAGCTCTATTG (SEQ ID NO: 2) is 4.
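The edit distance in this example can be reproduced with the standard Levenshtein dynamic program, sketched below in Python (function name hypothetical). This sketch counts only insertions, deletions and substitutions; it does not model translocations, which the definition above also mentions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


seq1 = "AGCGCTTAGCCTAGAGCTCTAG"   # SEQ ID NO: 1
seq2 = "GCGCTTAGCTTAGAGCTCTATTG"  # SEQ ID NO: 2
print(edit_distance(seq1, seq2))  # 4
```

One optimal script deletes the leading A, substitutes C with T after GCGCTTAGC, and inserts TT before the final G, for a total of four edits.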
As used herein, the term “polymer” generally refers to any kind of natural or non-natural large molecule composed of multiple subunits. Polymers may comprise homopolymers, which contain a single type of repeating subunit, and copolymers, which contain a mixture of repeating subunits. In some cases, polymers are biological polymers that are composed of a variety of different but structurally related subunits, for example, polynucleotides such as DNA composed of a plurality of nucleotide subunits.
As used herein, the term “substrate” generally refers to a substance, structure, surface, material, means, or composition, which comprises a nonbiological, synthetic, nonliving, planar, spherical or flat surface. The substrate may include, for example and without limitation, semiconductors, synthetic metals, synthetic semiconductors, insulators and dopants; metals, alloys, elements, compounds and minerals; synthetic, cleaved, etched, lithographed, printed, machined and microfabricated slides, devices, structures and surfaces; industrial polymers, plastics, membranes; silicon, silicates, glass, metals and ceramics; wood, paper, cardboard, cotton, wool, cloth, woven and nonwoven fibers, materials and fabrics; nanostructures and microstructures. The substrate may comprise an immobilization matrix such as, but not limited to, an insolubilized substance, solid phase, surface, layer, coating, woven or nonwoven fiber, matrix, crystal, membrane, insoluble polymer, plastic, glass, biological or biocompatible or bioerodible or biodegradable polymer or matrix, microparticle or nanoparticle. Other examples include, without limitation, monolayers, bilayers, commercial membranes, resins, matrices, fibers, separation media, chromatography supports, polymers, plastics, glass, mica, gold, beads, microspheres, nanospheres, silicon, gallium arsenide, organic and inorganic metals, semiconductors, insulators, microstructures and nanostructures. Microstructures and nanostructures may include, without limitation, microminiaturized, nanometer-scale and supramolecular probes, tips, bars, pegs, plugs, rods, sleeves, wires, filaments, and tubes.
As used herein, the term “biological sample” generally refers to any sample containing biological material(s) or molecule(s), or any sample containing derivatives of the biological material(s) or molecule(s). Examples or sources of biological samples may include any primary, intermediate, semi-processed, or processed samples, e.g., blood, serum, plasma, urine, saliva, spinal fluid, cerebrospinal fluid, milk, or any other biological fluid, skin cells, cell or tissue samples, biopsied cells or tissue, sputum, mucus, hair, stool, semen, buccal samples, nasal swab samples, or homogenized animal or plant tissues, as well as cells, bacteria, viruses, yeast, and mycoplasma, optionally isolated or purified, cell lysate, nuclear extract, nucleic acid extract, protein extract, cytoplasmic extract, etc. Biological samples can also include, e.g., environmental samples or food samples to be tested for microorganisms. Examples of biological samples may also include any composition or material containing biomolecule(s), either naturally existing or synthesized, e.g., DNA, RNA, nucleic acid, polynucleotide, oligonucleotide, amino acid, peptide, polypeptide, biological analytes, drugs, therapeutic agents, hormones, cytokines, etc. The biological samples can be provided fresh, such as blood samples obtained from a finger stick or a heel stick and directly applied to a sample node. The biological samples can be provided in a container or via a carrier. In some cases, a biological sample is pretreated or partially treated, e.g., with a lysing agent, such as a detergent (e.g., SDS or Sarcosyl), a precipitating agent, such as perchloric acid, acetone or an alcohol, a chaotrope, such as guanidinium chloride, or some other agent. In some cases, a biological sample is absorbed to, or stored or maintained in, a sample holder, e.g., dry storage of a biological sample in a sample holder.
As used herein, the term “subunit” generally refers to a subdivision of a larger molecule or a single molecule that assembles (or “coassembles”) with other molecules to form a larger molecular complex such as a polymer. Non-limiting examples of subunits include monomers, simple carbohydrates or monosaccharide moieties, fatty acids, amino acids, and nucleotides.
As used herein, the term “nucleic acid” generally refers to a polymer comprising one or more nucleic acid subunits or nucleotides. A nucleic acid may include one or more subunits selected from adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide can include A, C, G, T or U, or variants thereof. A nucleotide can include any subunit that can be incorporated into a growing nucleic acid strand. Such a subunit can be an A, C, G, T, or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or variant thereof) or a pyrimidine (i.e., C, T or U, or variant thereof). A subunit can enable individual nucleic acid bases or groups of bases (e.g., AA, TA, AT, GC, CG, CT, TC, GT, TG, AC, CA, or uracil-counterparts thereof) to be resolved. In some examples, a nucleic acid is deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), or derivatives thereof. A nucleic acid may be single-stranded or double-stranded.
As used herein, the terms “adjacent” and “adjacent to” include “next to,” “adjoining,” and “abutting.” In one example, a first location is adjacent to a second location when the first location is in direct contact and shares a common border with the second location, with no space between the two locations. In some cases, adjacent locations do not include diagonally adjacent locations.
As used herein, the term “biomolecule” generally refers to any molecule that is present in living organisms, or a derivative thereof. Biomolecules include proteins, antibodies, peptides, enzymes, carbohydrates, lipids, nucleic acids, oligonucleotides, aptamers, primary metabolites, secondary metabolites, and natural products.
The term “nucleotide,” as used herein, generally refers to a molecule that can serve as the monomer, or subunit, of a nucleic acid, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). A nucleotide can be a deoxynucleotide triphosphate (dNTP) or an analog thereof, e.g., a molecule having a plurality of phosphates in a phosphate chain, such as 2, 3, 4, 5, 6, 7, 8, 9, or 10 phosphates. A nucleotide can generally include adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide can include any subunit that can be incorporated into a growing nucleic acid strand. Such a subunit can be an A, C, G, T, or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or variant thereof) or a pyrimidine (i.e., C, T or U, or variant thereof). A subunit can enable individual nucleic acid bases or groups of bases (e.g., AA, TA, AT, GC, CG, CT, TC, GT, TG, AC, CA, or uracil-counterparts thereof) to be resolved. A nucleotide may be labeled or unlabeled. A labeled nucleotide may yield a detectable signal, such as an optical, electrostatic or electrochemical signal.
As used herein, the term “zipcode” generally refers to a known, determinable, and/or decodable sequence, such as, for example, a nucleic acid sequence (DNA sequence or RNA sequence), a protein sequence, or a polymer sequence (including synthetic polymers, carbohydrates, lipids, etc.), that allows the identification of a specific location of the sequence, e.g., the nucleic acid, in one-, two- or multi-dimensional space. A zipcode can encode the decodable sequence's own location. For example, each zipcode may be a nucleic acid present in many copies in a spatially defined location, such as a square feature of any size from about 10 nm to about 1 cm, including, for example, no larger than 0.1 μm, no larger than 0.2 μm, no larger than 0.5 μm, no larger than 1 μm, no larger than 2 μm, no larger than 5 μm, no larger than 10 μm, no larger than 20 μm, no larger than 30 μm, no larger than 40 μm, no larger than 50 μm, no larger than 100 μm, no larger than 200 μm, no larger than 500 μm, no larger than 1 mm, no larger than 2 mm, and no larger than 5 mm. Zipcode arrays can be used to detect the distribution of ribonucleic acid (RNA), protein, deoxyribonucleic acid (DNA) or other molecules in two- or three-dimensional space. These biomolecules can be detected in tissues, cells, organisms or non-living systems. If a nucleic acid sequence is a zipcode, the complementary sequence of the nucleic acid sequence can also be a zipcode. In this disclosure, a zipcode and its complementary copy can encode the same position or location on the zipcode array.
The zipcodes can be designed for accurate sequencing performance, e.g., a GC content between 40% and 60%, no homopolymer runs longer than two bases, no self-complementary stretches longer than three bases, and sequences that are not present in a human genome reference. Zipcodes can be of sufficient length and can comprise sufficiently different sequences to allow the identification of each nucleic acid (e.g., oligonucleotide) or peptide based on the zipcode(s) with which it is associated.
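The design rules above can be screened programmatically. The following Python sketch is illustrative only: the helper names are hypothetical, reading "self-complementary stretch longer than 3" as "a 4-mer whose reverse complement also occurs in the same zipcode" is an assumption, and the human-genome exclusion check is omitted because it requires an external reference index.

```python
# Minimal sketch of a zipcode design-rule screen (hypothetical helper names).
# Assumption: a "self-complementary stretch longer than 3" is any 4-mer whose
# reverse complement also occurs in the same zipcode. The human-genome
# exclusion check is omitted (it needs an external reference index).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def has_self_complementary_stretch(seq: str, k: int = 4) -> bool:
    """True if the reverse complement of some k-mer also appears in seq."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return any(reverse_complement(kmer) in kmers for kmer in kmers)


def passes_design_rules(seq: str) -> bool:
    """Apply the GC, homopolymer and self-complementarity rules (non-empty seq)."""
    return (
        0.40 <= gc_fraction(seq) <= 0.60
        and longest_homopolymer(seq) <= 2
        and not has_self_complementary_stretch(seq, k=4)
    )


print(passes_design_rules("AACCAACCA"))  # True
print(passes_design_rules("ACGTACGT"))   # False: ACGT is its own reverse complement
```

In practice such a screen would be applied to a large pool of random candidates before the spatial (brown code and edit distance) constraints are imposed.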
As used herein, the term “Y-adapter” generally refers to an adapter with two DNA strands that are partly non-complementary to each other, thereby forming a fork of two single-stranded DNA arms. The non-complementary arms of the Y-adapter can contain different elements such as identifiers, sequencing adapters, primer binding sites, etc. At the top end of the Y-shape, one arm of the Y differs from the other arm. The bottom end of the Y-shape is double-stranded (i.e., it contains complementary strands). As used herein, the terms “Y-adapter” and “Y-shaped adapter” are used interchangeably.
The attachment of the adapters to DNA fragments may be effected by ligating the Y-adapters to one or both 5′- or 3′-ends of the DNA fragments and then optionally carrying out an initial primer extension reaction, in which extension products complementary to the immobilized oligonucleotides are formed. This step may comprise an amplification step for multiplying the adapter-fragment constructs. The forked or Y-adapters can be ligated to both ends of the DNA fragments by a DNA ligase. Only the double-stranded bottom end of the Y-adapter is able to ligate to the DNA fragments.
For use in the present disclosure, the Y-adapter may be ligated to both ends of the double-stranded DNA fragments, wherein one strand of the adapter DNA is ligated to one 5′-end of the DNA fragment and the other strand thereof may be ligated to the respective 3′-end of the DNA fragment, and this may happen on both sides of the DNA fragment. The sequence of the Y-adapter can be determined by considering various factors, including but not limited to, the type of DNA sequencing technology or system used to sequence the DNA fragment library, and the primers used for PCR after or during the construction of the DNA fragment library.
As used herein, the term “transposome” generally refers to a complex that comprises an integration enzyme such as an integrase or transposase, and a nucleic acid comprising an integration recognition site, such as a transposase recognition site. In some examples, the transposase can form a functional complex with a transposase recognition site that is capable of catalyzing a transposition reaction. The transposase may bind to the transposase recognition site and insert the transposase recognition site into a target nucleic acid in a process sometimes termed “tagmentation.” In some examples, one strand of the transposase recognition site may be transferred into the target nucleic acid. In some examples, a transposome may comprise a dimeric transposase comprising two subunits, and two non-contiguous transposon sequences. In some examples, a transposome may comprise a dimeric transposase comprising two subunits, and a contiguous transposon sequence.
Transposases may include, but are not limited to, Mu, Tn10, Tn5, and hyperactive Tn5. See Goryshin and Reznikoff, J. Biol. Chem., 273:7367 (1998). Some examples can include the use of a hyperactive Tn5 transposase and a Tn5-type transposase recognition site. See Goryshin and Reznikoff, J. Biol. Chem., 273:7367 (1998). Some examples can include a MuA transposase and a Mu transposase recognition site comprising R1 and R2 end sequences. See Mizuuchi, K., Cell, 35: 785, 1983; Savilahti, H., et al., EMBO J., 14: 4893, 1995. For example, a transposase recognition site that forms a complex with a hyperactive Tn5 transposase (e.g., EZ-Tn5™ Transposase, Epicentre Biotechnologies, Madison, Wis.) may comprise the following 19-bp transferred strand (mosaic end or “ME”) and non-transferred strand: 5′ AGATGTGTATAAGAGACAG 3′ (SEQ ID NO: 3) and 5′ CTGTCTCTTATACACATCT 3′ (SEQ ID NO: 4), respectively.
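As a quick consistency check on the two mosaic-end strands listed above, the non-transferred strand (SEQ ID NO: 4) is the reverse complement of the transferred strand (SEQ ID NO: 3), so the pair anneals into the 19-bp double-stranded ME. The short Python sketch below verifies this; the constant and function names are illustrative only.

```python
# Both strands are written 5'->3', as in the text above.
ME_TRANSFERRED = "AGATGTGTATAAGAGACAG"      # SEQ ID NO: 3
ME_NON_TRANSFERRED = "CTGTCTCTTATACACATCT"  # SEQ ID NO: 4


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T DNA sequence (5'->3')."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# The two strands are 19 nt long and fully complementary in antiparallel
# orientation, i.e. they form the double-stranded 19-bp mosaic end.
assert len(ME_TRANSFERRED) == 19
assert reverse_complement(ME_TRANSFERRED) == ME_NON_TRANSFERRED
print("ME strands are reverse complements")
```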
Another aspect of the present disclosure provides a method for synthesizing an array of polymers on a substrate. The array of polymers may comprise at least 100, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 150,000, 200,000, 250,000, 300,000, 350,000, 400,000, 450,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, 10,000,000, 20,000,000, 30,000,000, 40,000,000, 50,000,000, 60,000,000, 70,000,000, 80,000,000, 90,000,000, 100,000,000, 200,000,000, 300,000,000, 400,000,000, 500,000,000, 600,000,000, 700,000,000, 800,000,000, 900,000,000, 1,000,000,000, 2,000,000,000, 3,000,000,000, 4,000,000,000, 5,000,000,000, or more unique polymeric molecules. First, a substrate suitable for polymer synthesis may be provided. The substrate may comprise a plurality of distinct locations. Each of the locations may comprise at least one site that is capable of attaching a subunit of the polymers onto the substrate. Each location may be adjacent to at least one, two, three, four, five, or six other locations. Each location may or may not have the same size, shape, or area. In some cases, a certain percentage of the locations have the same or a different size, shape, and/or area; for example, greater than or equal to 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% of the locations may have the same size, shape and/or area.
Next, a set of masks may be provided. Each mask of the set may be used for defining a different subset of distinct locations on the substrate. Each mask may comprise a plurality of openings, which define a pattern of active regions and inactive regions on the substrate. During polymer synthesis, subunits can be added onto the locations within the active regions.
The openings may take various shapes, regular or irregular, such as square, rectangular, triangular, diamond, hexagonal, and circular. Each mask may have its own design of openings, which defines a distinct pattern of active and inactive regions on the substrate. The openings may or may not be aligned in a single direction. Each opening may cover an integer number of distinct locations on the substrate. For each mask, the openings may or may not be of the same shape. For each distinct location on the substrate, the set of masks collectively may define a unique string of synthetic steps or embedding (i.e., a sequence of subunits to be introduced onto the substrate) used to form the polymers in that location. Each mask may be used for at least one synthetic step for forming the polymers. In some cases, the set of masks is designed such that each pair of strings of synthetic steps (or embeddings) used to form the polymers at two adjacent locations differs by no more than a maximum number of synthetic steps, for example, by at most 500, 400, 300, 200, 100, 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 20, 18, 16, 14, 12, 10, 9, 8, 7, 6, 5, 4, 3, or 2 synthetic steps. In some cases, two strings of synthetic steps used to form polymers at two adjacent locations differ from each other by one and only one synthetic step. For example, each pair of embeddings used to synthesize neighboring polymers in two adjacent locations differs by one and only one exposure/non-exposure step.
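The relationship between a mask set and the per-location embeddings can be sketched in Python. This is a simplified model under stated assumptions: each mask is represented as a hypothetical grid of booleans with one entry per distinct location (True for an opening, i.e., an active region), one mask is used per synthetic step, and adjacency means horizontal or vertical neighbors.

```python
from typing import List

# Hypothetical model: each mask is a rows x cols grid of booleans,
# True = opening (active region), False = blocked (inactive region).
Mask = List[List[bool]]


def embeddings_from_masks(masks: List[Mask]) -> List[List[str]]:
    """For every location, the string of exposure (1) / non-exposure (0) steps."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        ["".join("1" if mask[r][c] else "0" for mask in masks) for c in range(cols)]
        for r in range(rows)
    ]


def hamming(a: str, b: str) -> int:
    """Number of synthetic steps at which two equal-length embeddings differ."""
    return sum(x != y for x, y in zip(a, b))


def max_adjacent_difference(emb: List[List[str]]) -> int:
    """Largest step difference between horizontally or vertically adjacent locations."""
    rows, cols = len(emb), len(emb[0])
    worst = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                worst = max(worst, hamming(emb[r][c], emb[r][c + 1]))
            if r + 1 < rows:
                worst = max(worst, hamming(emb[r][c], emb[r + 1][c]))
    return worst


# Two masks over a 1 x 4 strip of locations.
masks = [
    [[False, False, True, True]],
    [[False, True, True, False]],
]
emb = embeddings_from_masks(masks)
print(emb)                           # [['00', '01', '11', '10']]
print(max_adjacent_difference(emb))  # 1
```

Here every pair of adjacent locations differs by exactly one exposure/non-exposure step, which is the one-and-only-one-step case described above.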
For each mask, a certain percentage (e.g., 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95% or more) or all of the openings may have the same length and/or width. In some cases, the length of the openings may be the same as that of the substrate. In some cases, the length of the openings may be less than that of the substrate such that one mask is only capable of masking a portion of the substrate. In cases where all of the openings have the same length, their widths may vary and one or more of the openings may or may not have the same width. For example, the width of the openings may be greater than or equal to about 1 nm, 10 nm, 50 nm, 100 nm, 250 nm, 500 nm, 750 nm, 1 μm, 2 μm, 3 μm, 4 μm, 5 μm, 6 μm, 7 μm, 8 μm, 9 μm, 10 μm, 20 μm, 40 μm, 60 μm, 80 μm, 100 μm, 200 μm, 300 μm, 400 μm, 500 μm, 600 μm, 700 μm, 800 μm, 900 μm, 1,000 μm, or more. In some cases, the width of the openings may be smaller than or equal to about 50 mm, 10 mm, 1,000 μm, 900 μm, 800 μm, 700 μm, 600 μm, 500 μm, 400 μm, 300 μm, 200 μm, 100 μm, 90 μm, 80 μm, 70 μm, 60 μm, 50 μm, 40 μm, 30 μm, 20 μm, 10 μm, 8 μm, 6 μm, 4 μm, 2 μm, 1 μm, or less. In some cases, the width of the openings may be between any two of the values described herein, for example, 12 μm.
The length of the openings may vary. In some cases, each of the openings has a length of greater than or equal to about 1 μm, 10 μm, 25 μm, 50 μm, 75 μm, 100 μm, 200 μm, 400 μm, 600 μm, 800 μm, 1,000 μm, 2,000 μm, 3,000 μm, 3,500 μm, 4,000 μm, 4,500 μm, 5,000 μm, 5,500 μm, 6,000 μm, 7,000 μm, 8,000 μm, 9,000 μm, 10,000 μm, or more. In some cases, the length of the openings may be smaller than or equal to about 50,000 μm, 25,000 μm, 10,000 μm, 8,000 μm, 7,000 μm, 6,500 μm, 6,000 μm, 5,500 μm, 5,000 μm, 4,500 μm, 4,000 μm, 3,000 μm, 2,000 μm, 1,000 μm, 800 μm, 600 μm, 400 μm, 200 μm, 100 μm or less. In some cases, the length of the openings may be between any two of the values described herein, for example, 4,900 μm.
To synthesize polymers having multiple segments, more than one set of masks may be provided, and each set of masks may be used for synthesizing, for example, a specific segment of the polymers. For example, a first set of masks having openings of the same length but different widths may be used for forming a first segment of the polymers, and a second set of masks having openings of the same width but different lengths may be used for forming a second segment of the polymers. The openings of the first set and the second set of masks may be aligned in a first direction and a second direction, respectively, and the first and second directions can be orthogonal to each other. In some cases, the same set of masks used for the first segment synthesis may be used to form the second segments of the polymers by rotating the masks by 90 degrees. A third set of masks (or a separate mask) may be used in some situations for forming a third segment of the polymers (e.g., a known sequence commonly shared by all the polymers); such mask(s) may be designed to expose all the locations to the polymer synthesis.
The mask can be formed of various materials, such as glass, silicon-based (e.g., silicon nitride, silica), polymeric, semiconductor, or metallic materials. In some cases, the mask is a lithographic mask (or photomask). The thickness of the mask may vary. In some cases, the mask may have a thickness of greater than or equal to 1 μm, 10 μm, 50 μm, 100 μm, 250 μm, 500 μm, 750 μm, 1 millimeter (mm), 2 mm, 3 mm, 4 mm, 5 mm, 6 mm, 7 mm, 8 mm, 9 mm, 10 mm, 15 mm, 20 mm, 25 mm, 30 mm, 35 mm, 40 mm, 45 mm, 50 mm, or more. In some cases, the mask may have a thickness of less than or equal to about 500 mm, 250 mm, 100 mm, 50 mm, 40 mm, 30 mm, 20 mm, 10 mm, 8 mm, 6 mm, 4 mm, 2 mm, 1 mm, 900 μm, 800 μm, 700 μm, 600 μm, 500 μm, 400 μm, 300 μm, 200 μm, 100 μm, or less. In some cases, the thickness of the mask may be between any two of the values described herein, e.g., about 7.5 mm.
In one aspect, methods are provided to detect the distribution of a biomolecule in a two dimensional space. In some embodiments, the biomolecule may be made to react with the nucleic acid zipcodes. Zipcodes that have reacted with the biomolecule may then be sequenced or otherwise detected. Because the zipcodes encode their own locations, by detecting zipcodes, the biomolecule's spatial distribution can then be determined accordingly. Therefore, it is desirable to obtain high resolution zipcode arrays that can be decoded with high accuracy.
Turning now to
Next, computer-executable logic may be provided and used to (i) select a mask to overlay the substrate; and (ii) select one or more subunits to be introduced onto each location on the substrate using the mask. The computer-executable logic that selects the mask and the one or more subunits is configured to generate the polymer array(s). Each polymer synthesized on (and thus immobilized at) a distinct location on the substrate may have a unique sequence (or string of subunits). Each polymer immobilized at a distinct location may differ in sequence from another polymer immobilized at an adjacent distinct location by no more than a maximum number of subunits, for example, by at most 500, 400, 300, 200, 100, 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 20, 18, 16, 14, 12, 10, 9, 8, 7, 6, 5, 4, 3, or 2, including substitutions, insertions, deletions, and/or translocations of single subunits. Subsequently, polymer synthesis may be performed using the selected masks and strings of subunits.
Various techniques can be used for synthesizing the polymers on the substrate, for example, chemical synthesis, electrochemical synthesis, or photoelectrochemical synthesis. In some cases, light-directed synthesis is employed. A light source capable of performing the light-directed synthesis of the polymeric molecules on the substrate may be provided. The light source may provide various forms of radiation, such as visible light, ultraviolet (UV) light, infrared (IR) light, extreme ultraviolet (EUV) radiation, X-rays, electrons, and ions. The light source can provide a single wavelength, e.g., from a laser, or a band of wavelengths. In some cases, the light beam provided by the light source may be in the ultraviolet to near-ultraviolet wavelength range. A mask may be provided and positioned along an optical path between the light source and the substrate.
For example, in step 1 of
As described above and elsewhere herein, multiple synthetic steps may be included in the whole polymer synthetic process, and in some cases, for each individual step, there is one and only one mask that is selected and placed along the optical path between the substrate and the light source. In some cases, to synthesize polymeric molecules with pre-defined sequences of subunits, a set of masks can be used and the combination of the masks determines a set of strings of synthetic steps (a series of exposure and non-exposure steps) for all of the locations on the substrate. An example multi-step synthetic route of polymer arrays is shown in steps of
As provided herein, a computer system may be utilized to generate a mask design file for producing physical masks for use in the synthetic reactions. The computer system may comprise a computer-readable medium, which may comprise code that, upon execution by one or more computer processors, implements a method for generating the mask design file. In some cases, a mask set may be designed such that all pairs of strings of synthetic steps for forming polymers in adjacent zipcodes differ from each other by at most 500, 400, 300, 200, 100, 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 20, 18, 16, 14, 12, 10, 9, 8, 7, 6, 5, 4, 3, or 2 synthetic steps. For example, one zipcode may differ from any adjacent zipcode by one and only one synthetic step.
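One simple way to lay out a mask set for a one-dimensional strip of locations, assumed here purely for illustration, is to assign the locations successive binary-reflected Gray code embeddings, so that adjacent locations differ by exactly one synthetic step, and to derive mask k from bit k of every embedding. The text-row "design file" format below is hypothetical and is not the disclosure's mask file format. A plain Gray code minimizes the number of masks but does not by itself enforce the equal-length or long-range edit-distance constraints discussed for zipcodes.

```python
from typing import List


def gray_embeddings(n_locations: int) -> List[str]:
    """Binary-reflected Gray code: consecutive embeddings differ in exactly one step."""
    n_steps = max(1, (n_locations - 1).bit_length())
    return [format(i ^ (i >> 1), f"0{n_steps}b") for i in range(n_locations)]


def stripe_mask_rows(n_columns: int) -> List[str]:
    """One text row per mask: '#' = opening over that column, '.' = blocked.

    Each opening is assumed to span the full substrate length, so a single
    row of characters describes the whole mask (hypothetical file format).
    """
    codes = gray_embeddings(n_columns)
    return ["".join("#" if code[k] == "1" else "." for code in codes)
            for k in range(len(codes[0]))]


for row in stripe_mask_rows(8):
    print(row)
```

For eight columns this yields three stripe masks whose openings all share the same length but differ in width and position, in line with the opening geometry described above.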
During the manufacturing of the zipcode arrays, synthetic errors can occur so that the synthesized polymer or DNA sequences on the substrate are different from the desired sequences. In one aspect of the present disclosure, methods are provided to address expected errors during manufacturing and sequencing (determination of zipcodes) steps.
In some cases, misalignment of the masks relative to the substrate may occur and cause errors during synthesis, resulting in a mismatch between the actual and desired polymeric sequences. In most instances, such misalignment causes errors at those synthetic steps in which neighboring embeddings differ from each other, because only at those steps does a mask edge fall between the neighboring locations.
As used herein, the term “brown codes” generally refers to zipcode sequences that differ from their neighboring (adjacent) zipcodes by exactly one synthesis step (one bit), which can result in a one-base (or one-character) difference between the zipcode sequences. Brown codes are of roughly the same length.
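The one-bit/one-base relationship can be illustrated with a Python sketch. It assumes, hypothetically, that synthetic step i offers the base cycle[i % 4] from a repeating A, C, G, T cycle and that a "1" in the embedding means the location is exposed so that base is added. Under that model, flipping one bit adds or removes exactly one base, so neighboring sequences are one insertion/deletion apart and differ in length by one.

```python
# Hypothetical model: synthetic step i offers base cycle[i % len(cycle)];
# bit "1" = location exposed at that step, so the base is appended.


def embedding_to_sequence(embedding: str, cycle: str = "ACGT") -> str:
    """Sequence synthesized at a location with the given exposure pattern."""
    return "".join(cycle[i % len(cycle)] for i, bit in enumerate(embedding) if bit == "1")


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


a = "10110110"
b = "10100110"  # neighbouring embedding: differs from a only at step 3
print(sum(x != y for x, y in zip(a, b)))  # 1 (one bit)
sa, sb = embedding_to_sequence(a), embedding_to_sequence(b)
print(sa, sb)                             # AGTCG AGCG
print(levenshtein(sa, sb))                # 1 (one base)
```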
To reduce the risk of errors caused by the misalignment, a set of embeddings may be generated to minimize the overall number of differences between neighboring embeddings. For example, the neighboring embeddings may differ from each other by exactly one change. An example set of embeddings and resulting polymeric sequences are shown in
DNA zipcodes may be error correcting. This can be accomplished, for example, by ensuring that different zipcodes on the zipcode array have edit distances of at least 3, such as, for example, an edit distance of 4, as shown in
In some cases, it may be desirable to synthesize a plurality of polymers that have (1) roughly equal lengths, and (2) a high long-range minimum edit distance. The long-range minimum edit distance, as used herein, dictates that, for some given D, if two polymers are ≥D locations apart on the substrate, their edit distance must be at least a specified minimum value. The synthesized polymers can be short or long. In cases where short sequences are needed, the synthetic route may comprise (i) generating embeddings of all possible lengths that meet the two abovementioned constraints, i.e., all resulting polymers have substantially the same length and a high long-range minimum edit distance, and (ii) using the shortest length that yields enough polymers for synthesis. An example method is illustrated in
In some embodiments, starting with a random embedding, each subsequent embedding is chosen randomly under the “brown code” constraint using a depth-first search. The resulting DNA zipcodes may then be checked for their long-range minimum edit distance.
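The search described above can be sketched as an iterative depth-first search with backtracking. In this Python sketch, the constraint that zipcodes have roughly the same length is approximated by keeping the number of 1s in each embedding within an assumed window [min_ones, max_ones] (each 1 adds one base); the base cycle, parameter values and function names are hypothetical, and the brute-force pairwise check is only practical for small sets.

```python
import random
from typing import List, Optional


def embedding_to_sequence(embedding: str, cycle: str = "ACGT") -> str:
    """Bases added at exposed steps (bit '1'); step i offers cycle[i % len(cycle)]."""
    return "".join(cycle[i % len(cycle)] for i, bit in enumerate(embedding) if bit == "1")


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def _neighbours(code: str, min_ones: int, max_ones: int, rng: random.Random) -> List[str]:
    """One-bit flips of code whose number of 1s stays in [min_ones, max_ones], shuffled."""
    out = []
    for i, bit in enumerate(code):
        flipped = code[:i] + ("0" if bit == "1" else "1") + code[i + 1:]
        if min_ones <= flipped.count("1") <= max_ones:
            out.append(flipped)
    rng.shuffle(out)
    return out


def random_brown_walk(n_codes: int, n_bits: int, min_ones: int, max_ones: int,
                      seed: int = 0) -> Optional[List[str]]:
    """Depth-first search for n_codes distinct embeddings, each one bit away
    from the previous; returns None if no such walk exists."""
    rng = random.Random(seed)
    ones = rng.randint(min_ones, max_ones)
    bits = ["1"] * ones + ["0"] * (n_bits - ones)
    rng.shuffle(bits)
    path = ["".join(bits)]
    used = {path[0]}
    stack = [_neighbours(path[0], min_ones, max_ones, rng)]  # candidates per path entry
    while stack and len(path) < n_codes:
        candidates = stack[-1]
        while candidates and candidates[-1] in used:
            candidates.pop()
        if not candidates:  # dead end: backtrack one step
            stack.pop()
            used.discard(path.pop())
            continue
        nxt = candidates.pop()
        path.append(nxt)
        used.add(nxt)
        stack.append(_neighbours(nxt, min_ones, max_ones, rng))
    return path if len(path) == n_codes else None


def long_range_min_edit_distance(seqs: List[str], d: int) -> Optional[int]:
    """Minimum edit distance over all pairs of zipcodes at least d positions apart."""
    dists = [levenshtein(seqs[i], seqs[j])
             for i in range(len(seqs)) for j in range(i + d, len(seqs))]
    return min(dists) if dists else None


walk = random_brown_walk(n_codes=20, n_bits=12, min_ones=5, max_ones=7, seed=1)
zipcodes = [embedding_to_sequence(code) for code in walk]
print(long_range_min_edit_distance(zipcodes, d=4))
```

If the printed value falls below the required minimum, one option consistent with the text is to rerun the search with a different seed or a longer embedding length and keep the shortest length that succeeds.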
In some cases, two or more (e.g., 2, 3, 4, 5, 6, 7, 9, or 10) of the generated embeddings are concatenated to form a new set of embeddings that can be used for synthesizing polymers having multiple segments. Additionally or alternatively, a common known embedding that is much shorter than the concatenated embeddings (e.g., a string of 0's and 1's of length less than 2, 3, 4, 5, 6, 7, 8, 9, or 10) and distinguishable from them may be inserted between neighboring concatenated embeddings to separate them, and each of the concatenated embeddings may correspond to a segment of the polymers. For example, as shown in
The upper or lower segments of zipcodes may be 4-24 bases in length, 8-20 bases in length, 12-16 bases in length, or no more than 16 bases in length. A single zipcode may also be used to determine the spatial information of a biomolecule.
Optionally, two-dimensional zipcodes can be encoded by two sequence segments: one for the x coordinate and the second for the y coordinate. In some cases, a separator such as the sequence “GGG” may be inserted between the x and y sequences to aid decoding. In some cases, two 1D brown codes, each encoding the x- or y-coordinate, can be concatenated by a common string (e.g., FF or CCC or GGG) to generate a 2D brown code, as shown in
The same 36 masks were used for synthesizing both the lower and upper zipcodes (the masks being rotated by 90 degrees for the upper zipcodes). The chip has 5,000 different sub-zipcodes for each of the x and y coordinates, which yields 25 million (5,000 × 5,000) 2 μm zipcodes on a 10 mm×10 mm chip. The zipcodes can be linked to top and bottom adaptor sequences, which are added to facilitate biochemical reactions on the surface. For example, in one format of the Yosemite chip (available from Centrillion Technologies, Inc., Palo Alto, Calif.), the bottom adaptor is a sequencing adaptor for preparing a sequencing library and the top adaptor is a primer for cDNA synthesis to capture RNA molecules. The probes can be in 5′ to 3′ orientation or 3′ to 5′ orientation. Synthesis can be in 5′ to 3′ orientation or 3′ to 5′ orientation. In some cases, the probes are synthesized in 3′ to 5′ orientation and then flipped to result in 5′ to 3′ (from surface) orientation.
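The two-segment layout and the chip dimensions above can be illustrated with a short Python sketch. The separator "GGG" is taken from the text; the toy codes are hypothetical and were chosen so that an x segment ending in G creates an extra GGG at the junction, which is why the decoder tries every separator position rather than splitting on the first one. The arithmetic confirms that 5,000 sub-zipcodes per axis give 25 million zipcodes and that 5,000 features of 2 μm span 10 mm.

```python
from typing import Dict, List, Optional, Tuple

SEPARATOR = "GGG"  # one of the separator strings mentioned in the text


def zipcode_2d(x_codes: List[str], y_codes: List[str], x: int, y: int,
               sep: str = SEPARATOR) -> str:
    """Concatenate the x segment, the separator and the y segment."""
    return x_codes[x] + sep + y_codes[y]


def decode_2d(zipcode: str, x_codes: List[str], y_codes: List[str],
              sep: str = SEPARATOR) -> Optional[Tuple[int, int]]:
    """Exact decoding: try every occurrence of the separator as the split point."""
    x_index: Dict[str, int] = {code: i for i, code in enumerate(x_codes)}
    y_index: Dict[str, int] = {code: i for i, code in enumerate(y_codes)}
    start = zipcode.find(sep)
    while start != -1:
        x_part, y_part = zipcode[:start], zipcode[start + len(sep):]
        if x_part in x_index and y_part in y_index:
            return x_index[x_part], y_index[y_part]
        start = zipcode.find(sep, start + 1)
    return None


# The same 1D code list serves both axes, mirroring the rotated-mask design.
codes = ["CAG", "TCA", "ACAC"]
zc = zipcode_2d(codes, codes, 0, 1)
print(zc, decode_2d(zc, codes, codes))  # CAGGGGTCA (0, 1)

# Scale of the chip described above.
n_per_axis, feature_um = 5_000, 2
print(n_per_axis ** 2)                 # 25000000 zipcodes
print(n_per_axis * feature_um / 1000)  # 10.0 mm per side
```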
Zipcodes, once sequenced, can be decoded for their positional information (x and y location) using software that compares the designed zipcodes with the putative zipcodes identified. Because sequencing and synthesis errors can occur, the decoding software may use approximate string search to determine the zipcode match and the resulting positional information. Centrillion's PostMark™ zipcode decoding software, available from Centrillion Technologies, Inc. (Palo Alto, Calif.), can be used to decode the Yosemite zipcode array.
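The internals of the PostMark™ software are not described here. As a generic illustration of approximate string matching, the following Python sketch assigns a sequenced zipcode to the closest designed zipcode by edit distance and rejects matches that are too distant or ambiguous; the function names, the max_dist threshold and the brute-force search are assumptions, and a production decoder would likely use an index rather than comparing against every designed zipcode.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def decode_zipcode(read_zipcode: str, designed: Dict[str, Tuple[int, int]],
                   max_dist: int = 1) -> Optional[Tuple[int, int]]:
    """Return the (x, y) of the closest designed zipcode, or None if the best
    match is farther than max_dist or is tied between several zipcodes."""
    best_dist: Optional[int] = None
    best_xy: List[Tuple[int, int]] = []
    for code, xy in designed.items():
        dist = levenshtein(read_zipcode, code)
        if best_dist is None or dist < best_dist:
            best_dist, best_xy = dist, [xy]
        elif dist == best_dist:
            best_xy.append(xy)
    if best_dist is None or best_dist > max_dist or len(best_xy) != 1:
        return None
    return best_xy[0]


designed = {"ACACAC": (0, 0), "TGTGTG": (0, 1), "CACGTT": (1, 0)}
print(decode_zipcode("ACACTC", designed))  # (0, 0): one substitution away
print(decode_zipcode("GGGGGG", designed))  # None: no designed zipcode within 1 edit
```

Because the designed zipcodes are kept at least 3 edits apart, a single sequencing or synthesis error still leaves the read closer to its true zipcode than to any other.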
In one embodiment, extra zipcodes may be designed but not used in the actual chip synthesis. During decoding, these designed-but-unused codes are also compared with putative zipcodes from sequencing reactions. A match to one of the unused zipcodes can indicate a zipcode decoding error. Therefore, these extra zipcodes can be used to assess the stringency of the decoding algorithm.
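A hypothetical way to turn decoy matches into a number is sketched below in Python. It assumes that erroneous assignments land on used and unused zipcodes roughly in proportion to how many of each were designed, so each decoy hit is scaled by len(used)/len(unused); this target-decoy style estimate is an illustrative assumption, not a method stated in the disclosure.

```python
from typing import Iterable, Optional, Set


def decoy_error_rate(assignments: Iterable[Optional[str]], used: Set[str],
                     unused: Set[str]) -> float:
    """Estimate the fraction of reads assigned to used zipcodes that are wrong.

    Assumes wrong assignments fall on used and unused (decoy) zipcodes in
    proportion to their numbers, so decoy hits are scaled by len(used)/len(unused).
    """
    real_hits = decoy_hits = 0
    for code in assignments:  # None = read not assigned to any zipcode
        if code in used:
            real_hits += 1
        elif code in unused:
            decoy_hits += 1
    if real_hits == 0:
        return 0.0
    return decoy_hits * len(used) / len(unused) / real_hits


used = {"ACACAC", "TGTGTG", "CACGTT", "GTCAGT"}
unused = {"CTCTCA", "AGAGTC"}
assignments = ["ACACAC"] * 50 + ["TGTGTG"] * 50 + ["CTCTCA", "AGAGTC"] + [None] * 5
print(decoy_error_rate(assignments, used, unused))  # 0.04
```

Tightening the decoder (e.g., a smaller edit-distance threshold) should lower this estimate at the cost of fewer assigned reads.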
The following examples illustrate the application of a Yosemite zipcode array.
In this example, the Yosemite chip described above was prepared.
Generating Spatial Oligo Array Hydrogel
The printing oligonucleotide can be, for example:
The printing oligo (IDT) consists of a 5′ acrydite group that attaches the oligo to the hydrogel, a Uracil-Specific Excision Reagent (USER) enzyme site, a C3 spacer to reduce unwanted background reverse transcription noise, a T7 promoter, and a sequencing adapter sequence. Printing uses an oligo array chip (“Yosemite” Zipcode array, Centrillion Technologies, Inc., Palo Alto, Calif.) as a template and extends the oligo so that it contains a spatial zipcode (for example, a 26-mer, a 27-mer, a 28-mer, a 29-mer, a 30-mer, etc.) after the sequencing adapter sequence of the printing oligo, followed by a poly(T) tail designed to capture mRNA in the tissue specimen.
The extended oligonucleotide on the printed hydrogel can be, for example:
A 6% acrylamide gel containing 50 μM of printing oligo is cast on a silanized glass slide. 10 μL of the printing oligo mixture is covered with a 15-mm diameter circular cover slip and allowed to polymerize at 25° C. for 30 mins. The cover slip is carefully lifted and excess unpolymerized oligo is rinsed away with MilliQ water. A 1 cm×1 cm oligo array chip is attached to a 1 in×1 in acrylic base with double-sided tape. Two 15 mm×15 mm frame seals (Bio-Rad) are stacked on top of each other and attached to the acrylic base to surround the oligo array chip. 100 μL of printing solution with 1× Thermopol Buffer (NEB), 0.2 μg/μL BSA, 200 μM dNTP mixture, and 0.32 U/μL Bst DNA polymerase, large fragment (NEB) is added on top of the oligo array chip. The glass slide with the oligo hydrogel is stacked on top of the chip with the gel side facing the chip. A 1 in×1 in PDMS cushion is added on top of the glass slide and the whole cassette is held together by a 2″ binder clip. The whole setup is incubated in a humidified container at 55° C. for 3 hrs. The two surfaces are separated by removing the binder clip and PDMS cushion and immersing the oligo hydrogel and oligo array chip, still stuck together, in 95° C. MilliQ water for 15 mins. The oligo hydrogel is released from the chip by carefully lifting the glass slide away from the chip.
In this example, the Yosemite chip described in Example 1 above is used to detect the spatial distribution of RNA molecules.
Generating Spatial Oligo Array Hydrogel
A procedure that is the same as or similar to the one disclosed in Example 1 is used in Example 2.
Capturing mRNA from Tissue Section with Printed Oligo Hydrogel
Purchased fresh frozen sections of mouse olfactory bulb or mouse embryo E13 were thawed at room temperature for 5 mins and fixed for 10 mins in 4% formaldehyde (diluted 1:9 from 36.5-38.0% stock solution (Sigma-Aldrich) in 1× phosphate buffered saline (PBS)). After fixation, the sections were rinsed with 1×PBS and 500 μL of pre-warmed 0.1% pepsin (Sigma-Aldrich) dissolved in 0.1M HCl was added on top of the tissue section. The section was incubated in a humidified chamber at 37° C. for 6-10 mins (olfactory bulb: 6 mins; embryo: 10 mins). The solution was removed and the section was rinsed with 1×PBS. Excess liquid was dabbed dry with Kimwipe®. 20 μL of reverse transcription solution was added onto the printed oligo hydrogel and the permeabilized section was stacked on top of the gel, carefully avoiding air bubbles. The reverse transcription solution contained 1× First Strand buffer (Invitrogen), 5 mM dithiothreitol (DTT) (Invitrogen), 500 μM dNTP mix, 50 ng/μL Actinomycin D (Sigma-Aldrich), 1% DMSO (NEB), 20 U/μL Superscript III (Invitrogen) and 2 U/μL RNaseOUT (Invitrogen). The section and oligo hydrogel were incubated in a humidified chamber at 42° C. overnight (15-16 hrs).
cDNA Library Preparation and Sequencing
The tissue section and oligo hydrogel were removed from the humidified chamber and immersed in 0.1×SSC buffer for 2 mins. The sections were lifted from the oligo hydrogel slide and excess liquid was dabbed dry with Kimwipe®. The oligo hydrogel was then scraped off the glass slide into a 0.2 mL polymerase chain reaction (PCR) tube. 20 μL of oligo release solution containing 1.1× Second Strand Buffer (Invitrogen), 250 μM dNTP mix, and 0.1 U/μL USER Enzyme (NEB) was added and incubated at 37° C. for 2 hrs. 5 μL of second strand solution containing 3× First Strand Buffer, 3.7 U/μL DNA polymerase I (NEB), 0.18 U/μL RNaseH (NEB), 20 U/μL T4 DNA ligase, and 0.5 mM ATP was further added to the hydrogel and incubated at 16° C. for 2 hrs. 2 μL of T4 DNA polymerase was then added and the reaction was incubated at 16° C. for another 20 mins. The reaction was stopped by adding 25 mM EDTA and the supernatant was transferred to a new tube. The sample was purified using Agencourt AMPure XP beads (Beckman Coulter) with a bead-to-sample ratio of 0.75:1 and eluted in nuclease-free water (IDT). The sample was mixed with In Vitro Transcription solution containing 1×T7 Reaction Buffer (Ambion), 7.5 mM of each NTP (Ambion), 1×T7 Enzyme Mix (Ambion) and 1 U/μL SUPERaseIN (Ambion). The sample was incubated at 37° C. for 15-16 hrs.
The sample was purified using Agencourt AMPure XP beads with a bead-to-sample ratio of 0.75:1 and eluted in nuclease-free water (IDT). 0.5 μM of sequencing ligation adapter (IDT) was added to the sample, heated at 70° C. for 2 mins and immediately placed on ice. Adapter ligation solution containing 1×T4 RNA Ligase Reaction Buffer (NEB), 20 U/μL T4 RNA Ligase 2, truncated (NEB), and 4 U/μL RNase Inhibitor, Murine (NEB) was added to the sample and incubated at 25° C. for 1 hr. The sample was purified using Agencourt AMPure XP beads with a bead-to-sample ratio of 0.75:1 and eluted in nuclease-free water (IDT). 1 μM of RT primer (IDT) and 0.5 mM dNTP mixture were added to the sample, heated at 65° C. for 5 mins and then immediately placed on ice. Reverse transcription solution containing 1× First Strand Buffer, 5 mM DTT, 10 U/μL Superscript III and 2 U RNaseOUT was added to the sample and the reaction was incubated at 50° C. for 1 hr. The sample was purified using Agencourt AMPure XP beads with a bead-to-sample ratio of 0.75:1 and eluted in nuclease-free water (IDT).
To determine how many PCR cycles are needed to amplify the sample, 1/5 of the sample volume is added to a qPCR mixture containing 1×KAPA HiFi Reaction Buffer, 0.3 mM dNTP mix, 0.5 μM sequencing adapter 1, 0.5 μM sequencing adapter 2, 1×EVA Green (Biotium) and 0.5 U/reaction KAPA HiFi DNA Polymerase. After the number of cycles is determined for the sample, the remaining sample is amplified under the same conditions as the qPCR reaction. The amplified sample (sequencing library) was purified using Agencourt AMPure XP beads with a bead-to-sample ratio of 0.75:1 and eluted in nuclease-free water (IDT). The concentration of the library was quantified using KAPA Library Quantification Kits (KAPA Biosystems) per the manufacturer's protocol. Libraries were diluted to 2 nM and sequenced on the Illumina MiSeq platform using paired-end sequencing per the manufacturer's protocol.
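The text does not state how the cycle number is read from the qPCR trace. A common heuristic in library preparation, assumed here, is to use the cycle at which the baseline-subtracted fluorescence reaches about one-third of its maximum, which stops amplification before the plateau. A minimal Python sketch of that rule, with a hypothetical function name and an invented example curve, is:

```python
from typing import Sequence


def cycles_to_fraction_of_max(fluorescence: Sequence[float], fraction: float = 1 / 3) -> int:
    """First (1-based) qPCR cycle whose baseline-subtracted signal reaches
    `fraction` of the maximum baseline-subtracted signal."""
    baseline = min(fluorescence)
    span = max(fluorescence) - baseline
    if span <= 0:
        raise ValueError("no amplification detected")
    for cycle, value in enumerate(fluorescence, start=1):
        if value - baseline >= fraction * span:
            return cycle
    raise RuntimeError("unreachable: the maximum always meets the threshold")


curve = [0.10, 0.10, 0.12, 0.20, 0.50, 1.20, 2.00, 2.40, 2.50, 2.50]
print(cycles_to_fraction_of_max(curve))  # 6
```

The fraction is a tunable parameter; whichever value is chosen, the same cycle number is then applied to the remaining sample, as described above.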
Sequencing ligation adapter:
RT Primer:
Sequencing Adapter 1:
Sequencing Adapter 2:
Sequencing Alignment and Zipcode Decoding
The read containing the gene information was first aligned to the mouse genome using the Spliced Transcripts Alignment to a Reference (STAR) software. STAR is free, open-source software distributed under the GPLv3 license and can be downloaded from <http://code.google.com/p/rna-star/>. The aligned reads were extracted, and the corresponding mate read containing the zipcode was decoded for its positional x, y coordinate information. The number of reads aligned to each gene was counted with htseq-count. The reads that aligned to a coding transcript were extracted, and whenever a read aligned to a coding transcript and its corresponding mate contained a decoded zipcode, the information was written to a new file containing the combined gene expression and positional information.
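The final merge step can be sketched in Python as a join on read name. This sketch assumes the gene assignment for each read and the decoded (x, y) position of its mate have already been extracted into dictionaries keyed by read name (the BAM and htseq-count parsing is not shown); the function name and the example read and gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def spatial_gene_counts(gene_by_read: Dict[str, str],
                        xy_by_read: Dict[str, Tuple[int, int]]) -> Counter:
    """Count read pairs per (gene, x, y), keeping only pairs whose gene read
    was assigned to a gene and whose mate carried a decoded zipcode."""
    counts: Counter = Counter()
    for read_name, gene in gene_by_read.items():
        xy = xy_by_read.get(read_name)
        if xy is not None:
            counts[(gene, xy[0], xy[1])] += 1
    return counts


gene_by_read = {"read1": "Gapdh", "read2": "Actb", "read3": "Gapdh"}
xy_by_read = {"read1": (3, 4), "read3": (3, 4), "read4": (0, 0)}
print(spatial_gene_counts(gene_by_read, xy_by_read))  # Counter({('Gapdh', 3, 4): 2})
```

Pairs lacking either a gene assignment (read4) or a decoded zipcode (read2) are dropped, matching the "whenever both are present" rule above.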
Results:
In another aspect, zipcode arrays are used to detect the two-dimensional distribution of any molecules that can be detected by a binding partner such as an antibody, an aptamer, or a synthetic antibody mimic (SyAM) (P. J. McEnaney, et al., “Chemically Synthesized Molecules with the Targeting and Effector Functions of Antibodies.” Journal of the American Chemical Society, 2014 Dec. 31; 136(52):18034-43. DOI: 10.1021/ja509513c, which is incorporated herein by reference).
The binding partner can be labeled with an oligonucleotide barcode or oligonucleotide tag. Antibodies can be readily conjugated with oligonucleotide barcode sequences (see, for example, “Antibody-Oligonucleotide Conjugate Preparation and applications,” [online]. Retrieved from <http://www.solulink.com/products/white-papers/antibody-oligo-conjugate-preparation.pdf>, which is incorporated herein by reference). The oligonucleotide tag can be selected based upon its hybridization specificity to a selected target sequence and its uniqueness (each tag may represent one antibody or one binding partner type/identity). The binding partners can be contacted with, and allowed to react with, for example, a tissue section or a transfer thereof. The stained tissue section can then be placed on top of a zipcode array or a copy of the zipcode array. The zipcodes can be made to react with the binding partner tags. For example, the tag oligonucleotides can contain a common sequence similar to the poly-A tail of mRNAs. By binding to the common sequence of the oligonucleotide tags, the zipcode can serve as a primer for an extension reaction that copies the tag sequences. In some cases, after the extension reaction to copy the tag sequences, some zipcode sequences can be linked with the binding partner (antibody) tag sequences. There are many methods to link tag sequences with the zipcode, some of which can be found elsewhere in this disclosure. In some cases, the tag oligonucleotides and the zipcodes can be ligated directly or through an intermediate. In some cases, the tag oligonucleotides can be used as templates in an extension reaction such that copies of the tag oligonucleotides can be added to the sequences to be analyzed.
Afterwards, the zipcodes can be sequenced to analyze the spatial distribution of the binding partners' tags. This can be performed after an amplification reaction. The zipcode may provide the location of the binding partners and the tag nucleotide sequences may provide the identity of the binding partners. Many binding partners can be used simultaneously to detect many different molecules. The method can be used to detect protein molecules, lipids, antigens, natural products, metabolites, and other biological molecules.
Turning now to
Therefore, when biomolecule 928 binds detector 926 to form the first complex, the first complex may be placed on the zipcode array 900. Nonbinding detectors 926 may be washed away. Owing to the close proximity of binding sequence pairs 914A and 918A, and 914B and 918B, the binding partner 916A may bind to zipcode 902A and the binding partner 916B may bind to zipcode 902B. Then a sequencing reaction may be employed to produce a reporting sequence comprising both coordinate zipcode 910 and barcode 920. This reporting sequence may then report the spatial distribution of biomolecule 928 in the biological sample.
By analyzing the frequency of tags at specific locations, it is possible to plot the distribution of the target molecules of interest.
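One way to turn decoded reads into a plottable map is sketched below. It assumes reads have already been decoded into `(x, y, tag)` tuples on an integer grid, and it interprets “frequency” as the fraction of reads at a location that carry a given tag; raw counts per location would be an equally reasonable choice. The function name `tag_frequency_grid` is hypothetical.

```python
from collections import defaultdict


def tag_frequency_grid(hits, target, width, height):
    """Fraction of decoded reads at each (x, y) that carry `target`.

    `hits` is an iterable of (x, y, tag) tuples; locations with no reads stay 0.0.
    """
    totals = defaultdict(int)
    target_counts = defaultdict(int)
    for x, y, tag in hits:
        totals[(x, y)] += 1
        if tag == target:
            target_counts[(x, y)] += 1
    # Row index is y, column index is x, so grid[y][x] is one array location.
    grid = [[0.0] * width for _ in range(height)]
    for (x, y), n in totals.items():
        grid[y][x] = target_counts[(x, y)] / n
    return grid


hits = [(0, 0, "A"), (0, 0, "A"), (0, 0, "B"), (1, 0, "B"), (1, 1, "A")]
grid = tag_frequency_grid(hits, "A", width=2, height=2)
for row in grid:
    print(row)
```

The resulting nested list can be passed to any image-style plotting routine (for example, `matplotlib.pyplot.imshow`) to visualize the distribution.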
In this example, the zipcode arrays are used to spatially analyze tumor tissues.
Nucleic acids other than RNA can be analyzed spatially as well. In some examples, the method, kit and system of the present disclosure can be used to provide spatial profiling of the genome and epigenome in tissues, e.g., tumor tissues. In other examples, the method, kit and system of the present disclosure can be used to provide megabase sequencing analysis of very long nucleic acid sequences by scaffolding short reads obtained from the very long nucleic acid sequences and/or relying on long range sequencing contiguity.
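The idea of relying on long-range contiguity can be illustrated with a toy sketch: if short fragments captured from a single stretched molecule are tagged with zipcodes, their decoded coordinates can be projected onto the stretching direction to recover their order. The function name, the `(sequence, (x, y))` input format, and the assumption that the stretching direction is known are all hypothetical simplifications; real scaffolding would also need to separate fragments from different molecules and handle noise.

```python
def order_fragments(fragments, direction=(1.0, 0.0)):
    """Order fragments from one stretched molecule along the stretch axis.

    `fragments` is a list of (sequence, (x, y)) pairs, where (x, y) are the
    decoded zipcode coordinates; `direction` is the assumed stretching direction.
    """
    dx, dy = direction
    # Sort by the scalar projection of each fragment's position onto the axis.
    ranked = sorted(fragments, key=lambda f: f[1][0] * dx + f[1][1] * dy)
    return [seq for seq, _ in ranked]


fragments = [("GGGTTA", (5, 0)), ("AACCTG", (1, 0)), ("CTTAGC", (3, 1))]
print(order_fragments(fragments))
```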
For example, a hyperactive Tn5 transposase (not shown) may be bound to immobilized oligonucleotides on the substrate surface. The oligonucleotide may contain 19 bp of the “mosaic end-recognition sequence” (shown as “ME” in
Further steps, such as, for example, denaturing, polymerase-catalyzed elongation, etc., may generate DNA fragments with adapters and zipcodes. See Panel B of
Panel C of
Capturing mRNA from Tissue Section with Oligo Hydrogel
Oligo Gel Oligonucleotide:
The oligo (IDT) consists of a 5′ acrydite group that attaches the oligo to the hydrogel, a USER enzyme site, a spacer to reduce unwanted background reverse transcription noise, a sequencing adapter sequence, a 9-mer semi-randomized unique molecular identifier (UMI) and a poly-T capture region (T20VN).
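Given the capture oligo layout above, a read that starts immediately after the sequencing adapter would contain the UMI, then T20, then VN, then the cDNA. The sketch below assumes exactly that read layout (the actual layout depends on which sequencing primer is used) and accepts any base at each UMI position, even though the real semi-randomized UMI may restrict some positions. The pattern name `CAPTURE_READ` and the function are hypothetical.

```python
import re

# Hypothetical read layout: the read is assumed to start at the first base
# after the sequencing adapter, i.e. 9-nt UMI, then T20, then V (A/C/G), then N.
CAPTURE_READ = re.compile(r"^([ACGTN]{9})T{20}[ACG][ACGTN](.*)$")


def parse_capture_read(read):
    """Return (umi, cDNA insert) or None if the read lacks the expected layout."""
    match = CAPTURE_READ.match(read)
    if match is None:
        return None
    return match.group(1), match.group(2)


read = "ACGTACGTT" + "T" * 20 + "GC" + "AAAGGGCCT"
print(parse_capture_read(read))
```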
A 6% acrylamide gel containing 1 μM of the oligo is cast on a silanized glass slide: 5 μL of the printing oligo mixture is covered with a 22×22 mm square cover slip and allowed to polymerize at 25° C. for 30 mins. The cover slip is then carefully lifted and excess non-polymerized oligo is rinsed away with MilliQ water.
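For orientation, the gel thickness and oligo load implied by these volumes can be estimated as follows. This assumes the 5 μL drop spreads evenly under the full 22×22 mm cover slip, which is an idealization; actual spreading may differ.

```latex
\begin{aligned}
V &= 5~\mu\mathrm{L} = 5~\mathrm{mm}^3, \qquad A = 22~\mathrm{mm} \times 22~\mathrm{mm} = 484~\mathrm{mm}^2,\\
h &\approx V/A = 5~\mathrm{mm}^3 / 484~\mathrm{mm}^2 \approx 0.0103~\mathrm{mm} \approx 10~\mu\mathrm{m},\\
n_{\mathrm{oligo}} &= 1~\mu\mathrm{M} \times 5~\mu\mathrm{L} = 5~\mathrm{pmol}.
\end{aligned}
```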
Purchased fresh frozen sections of mouse olfactory bulb or E13 mouse embryo were thawed at room temperature for 5 mins and fixed for 10 mins in 4% formaldehyde (diluted 1:9 from 36.5-38.0% stock solution (Sigma-Aldrich) in 1×PBS). After fixation, the sections were rinsed with 1×PBS, and 500 μL of pre-warmed 0.1% pepsin (Sigma-Aldrich) dissolved in 0.1 M HCl was added on top of the tissue section. The section was incubated in a humidified chamber at 37° C. for 6-10 mins (olfactory bulb: 6 mins; embryo: 10 mins). The pepsin solution was then removed and the section was rinsed with 1×PBS. Excess liquid was dabbed dry with a Kimwipe®. 10 μL of reverse transcription solution was added onto the oligo hydrogel, and the permeabilized section was stacked on top of the gel, carefully avoiding air bubbles. The reverse transcription solution contained 1× First Strand buffer (Invitrogen), 5 mM DTT (Invitrogen), 500 μM dNTP mix, 50 ng/μL Actinomycin D (Sigma-Aldrich), 1% DMSO (NEB), 20 U/μL Maxima H Minus Reverse Transcriptase (ThermoFisher Scientific), 1 μM iso-TS adapter and 2 U/μL RNaseOUT (Invitrogen). The tissue section and oligo hydrogel were incubated in a humidified chamber at 42° C. overnight (15-16 hrs).
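The final concentrations listed for the reverse transcription solution can be converted into pipetting volumes with C1V1 = C2V2. The protocol does not state stock concentrations, so the stock values below (5× buffer, 100 mM DTT, 10 mM dNTPs, 1 mg/mL Actinomycin D, 100% DMSO, 200 U/μL reverse transcriptase, 100 μM adapter, 40 U/μL RNaseOUT) are assumptions to be checked against the actual reagent labels; the helper `stock_volumes` is hypothetical.

```python
def stock_volumes(final_volume_ul, components):
    """C1*V1 = C2*V2: volume (uL) of each stock needed to reach its final
    concentration; the remainder is made up with water."""
    volumes = {name: final_volume_ul * final / stock
               for name, (stock, final) in components.items()}
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for this final volume")
    volumes["water"] = water
    return volumes


# (stock, final) in matching units; stock values are ASSUMPTIONS, check labels.
RT_MIX = {
    "First Strand buffer (x)": (5, 1),
    "DTT (mM)": (100, 5),
    "dNTP mix (uM)": (10_000, 500),
    "Actinomycin D (ng/uL)": (1_000, 50),
    "DMSO (%)": (100, 1),
    "Maxima H Minus RT (U/uL)": (200, 20),
    "iso-TS adapter (uM)": (100, 1),
    "RNaseOUT (U/uL)": (40, 2),
}

for name, vol in stock_volumes(10, RT_MIX).items():
    print(f"{name:28s} {vol:5.2f} uL")
```

Volumes as small as 0.1 μL are impractical to pipette, so in practice a master mix for several reactions would likely be scaled up by multiplying `final_volume_ul`.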
The template switching reaction can be enhanced the next day by removing the tissue section, adding reverse transcription solution onto the oligo gel, and performing a second round of reverse transcription at 42° C. for 1 hr.
Iso-Template Switching (TS) Adapter:
Adding Spatial Information onto the cDNA Library Oligo Gel
The oligo array chip is affixed to a microscope slide with double-sided tape. 35 μL of printing solution containing 1× Thermopol Buffer (NEB), 0.2 μg/μL BSA, 200 μM dNTP mix, and 0.32 U/μL Bst DNA polymerase, large fragment (NEB) is added on top of the oligo array chip. The glass slide with the oligo hydrogel containing cDNA is preheated at 94° C. for 3 mins in a slide PCR machine and immediately cooled on an ice block. It is then stacked on top of the chip with the gel side facing the chip, and the whole cassette is held together by a 2″ binder clip. The whole setup is incubated in a humidified container at 55° C. for 2 hrs. The two surfaces are separated by removing the binder clip and immersing the oligo hydrogel and oligo array chip, still stuck together, in 0.3×SSC at 95° C. for 20 mins. The oligo hydrogel is released from the chip by carefully pushing the glass slide away from the chip.
The cDNA library attached to the hydrogel:
Spatial Info Tagged cDNA Library Preparation and Sequencing
To determine how many PCR cycles are needed to amplify the sample, 1/5 volume of the sample is added to a qPCR mixture containing 1×KAPA HiFi Reaction Buffer, 0.3 mM dNTP mix, 0.5 μM sequencing adapter 1, 0.5 μM sequencing adapter 2, 1× EvaGreen (Biotium) and 0.5 U/reaction KAPA HiFi DNA Polymerase. After the number of cycles is determined for the sample, the remaining sample is amplified under the same conditions as the qPCR reaction. The amplified sample (sequencing library) was purified using Agencourt AMPure XP beads at a bead-to-sample ratio of 0.75:1 and eluted in nuclease-free water (IDT). The concentration of the library was quantified using KAPA Library Quantification Kits (KAPA Biosystems) per the manufacturer's protocol. Libraries were diluted to 2 nM and sequenced on the Illumina MiSeq platform using paired-end sequencing per the manufacturer's protocol.
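The protocol does not say how the cycle number is read from the qPCR trace. A common heuristic, assumed here rather than stated in this protocol, is to pick the cycle at which background-subtracted fluorescence reaches roughly a quarter to a third of its plateau. The sketch below implements that heuristic together with the C1V1 = C2V2 dilution to the 2 nM loading concentration; the trace values, the 8 nM library concentration, and the 20 μL final volume are made-up illustrations.

```python
def cycles_to_fraction(fluorescence, fraction=0.25):
    """First cycle (1-based) whose background-subtracted signal reaches
    `fraction` of the maximum signal (heuristic; fraction is an assumption)."""
    background = min(fluorescence)
    signal = [f - background for f in fluorescence]
    threshold = fraction * max(signal)
    for cycle, value in enumerate(signal, start=1):
        if value >= threshold:
            return cycle
    raise ValueError("trace never reaches the threshold")


def dilution_volume_ul(stock_nM, target_nM=2.0, final_ul=20.0):
    """Volume of library stock (uL) to bring to `final_ul` at `target_nM`."""
    if stock_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    return final_ul * target_nM / stock_nM


trace = [0.1, 0.1, 0.1, 0.12, 0.2, 0.5, 1.2, 2.5, 3.6, 4.0, 4.1]
print(cycles_to_fraction(trace))          # cycle at ~1/4 of the plateau
print(dilution_volume_ul(stock_nM=8.0))   # uL of 8 nM library per 20 uL total
```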
Sequencing Adapter 1:
Sequencing Adapter 2:
Sequencing Alignment and Zipcode Decoding
The read containing the gene information was first aligned to the mouse genome using STAR. For each aligned read, the mate read containing the zipcode was decoded to obtain its x, y positional coordinates. The number of reads aligned to each gene was counted with htseq-count. Whenever one read of a pair aligned to a coding transcript and its mate contained a decoded zipcode, the gene and coordinate information were written to a new file that combines gene expression and positional information.
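The joining step described above can be sketched on already-parsed data. This minimal sketch assumes the STAR/htseq-count output has been reduced to a read-ID-to-gene mapping and the zipcode reads to a read-ID-to-zipcode mapping (parsing BAM files, for example with pysam, is omitted). Collapsing reads by UMI is an added assumption based on the UMI in the capture oligo, not a step stated above; all names and data are hypothetical.

```python
from collections import defaultdict


def combine(gene_by_read, zipcode_by_read, umi_by_read, zipcode_to_xy):
    """Join the gene read and the zipcode read of each pair by read ID and
    count unique UMIs per (gene, x, y)."""
    umis = defaultdict(set)
    for read_id, gene in gene_by_read.items():
        xy = zipcode_to_xy.get(zipcode_by_read.get(read_id))
        if xy is None:
            continue  # zipcode missing or not decodable: drop the pair
        umis[(gene, *xy)].add(umi_by_read.get(read_id))
    return {key: len(s) for key, s in umis.items()}


gene_by_read = {"r1": "geneA", "r2": "geneA", "r3": "geneB", "r4": "geneB"}
zipcode_by_read = {"r1": "AAAA", "r2": "AAAA", "r3": "CCCC"}  # r4 undecoded
umi_by_read = {"r1": "UMI1", "r2": "UMI1", "r3": "UMI2", "r4": "UMI3"}
zipcode_to_xy = {"AAAA": (3, 4), "CCCC": (5, 6)}

table = combine(gene_by_read, zipcode_by_read, umi_by_read, zipcode_to_xy)
print(table)
```

Here r1 and r2 share a UMI and a location, so they count as one molecule; r4 is dropped because its zipcode could not be decoded.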
Then the zipcode-containing cDNA sequences may be subjected to PCR with Adapter B and Adapter C′ to produce cDNA library for sequencing analysis.
Proteins and other biomolecules can be analyzed with labeled antibodies against the molecules of interest. The labels may contain a Tag sequence (comprising a barcode sequence) indicating the identities of the target molecules. Linker sequences connecting the tags to zipcodes and/or sequencing adapters can be added.
The tag design used in
rcFC1:25-rcSP1:33-Zipcode:33-36-rcSC1:20-Tag:9-SP2:34-Index:6-FC2:24
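The tag design string above can be read as a list of segments written name:length (in nucleotides), with “33-36” marking the variable-length zipcode. Assuming that reading (the abbreviations themselves are not expanded here), a short parser recovers the segment list and the total construct length. The regular expression deliberately allows an optional “-max” after a length so that the range is not confused with the “-” that separates segments.

```python
import re

DESIGN = "rcFC1:25-rcSP1:33-Zipcode:33-36-rcSC1:20-Tag:9-SP2:34-Index:6-FC2:24"
# name:len or name:min-max; segments are separated by "-".
SEGMENT = re.compile(r"([A-Za-z0-9]+):(\d+)(?:-(\d+))?")


def parse_design(design):
    """Return (name, min_len, max_len) for each segment in order."""
    return [(name, int(lo), int(hi or lo)) for name, lo, hi in SEGMENT.findall(design)]


def length_range(segments):
    """Minimum and maximum total construct length in nucleotides."""
    return sum(s[1] for s in segments), sum(s[2] for s in segments)


segs = parse_design(DESIGN)
print([name for name, _, _ in segs])
print(length_range(segs))
```

Under this reading, the full construct would be 184 to 187 nt long, depending on the zipcode length.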
Hybridization
Extension
Restriction Digest and Ligation
PCR
A. Hybridize desired oligos. For Jacaranda oligos, make a 100 μM 50/50 mixture of the desired oligos (i.e., 20 μL JACANCHOR Y3′P (Ref 163486340) and 20 μL Jacaranda Anchor Y oligo (Ref 162665174), for a total of 40 μL), then place on a thermocycler with the following steps (1 run; no cycles):
B. Silanize iron-oxide wafers with the following steps:
Casting the Gel onto Diced Wafers
Stretching Human Genomic DNA
Restriction Digest and Ligation
Restriction Digest and Ligation without Stretching
Printing Jacaranda Chips onto Gel
Adding Flow Cell Adapters Via PCR
Gel Extraction
Turning now to
The top center panel of
A “Y-oligonucleotide” (single-stranded, shown as “Y-oligo” on the right side of the top center panel) may be allowed to hybridize with the immobilized zipcode sequence. The Y-oligonucleotide may comprise the flow cell adapter sequence Fc1 and a sequencing primer binding site (Seq1). Further, the Y-oligonucleotide may comprise a sequence complementary to a top portion of the common bottom sequence of the zipcode, allowing the Y-oligonucleotide to hybridize to the common bottom sequence of the zipcode sequences. A polymerase may then extend the hybridized Y-oligonucleotide using the zipcode sequence as a template, starting from the free 3′-end of the Y-oligonucleotide hybridized to the common bottom sequence and proceeding through the zipcode and top sequence parts of the zipcode sequence. In some cases, after polymerization, blunt double-stranded molecules with 5′ phosphates may be formed on the surface of the zipcode array. Each double-stranded molecule may comprise a copy of a zipcode sequence and the newly synthesized Y-oligonucleotide-based product of the polymerization reaction.
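The sequence logic of this primer extension can be checked with a small string model. It assumes the immobilized strand is written 5′→3′ as top + zipcode + bottom, that the 3′-terminal `anchor_len` nucleotides of the Y-oligonucleotide pair perfectly with the bottom sequence, and that the polymerase copies the template all the way to its 5′ end. The sequences are made up, and mismatches, secondary structure and surface chemistry are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template, primer, anchor_len):
    """Extend `primer` (5'->3') along `template` (5'->3').

    The primer's 3'-terminal `anchor_len` nt must pair perfectly with the
    template; extension then copies everything 5' of that site.
    """
    site = template.find(revcomp(primer[-anchor_len:]))
    if site < 0:
        raise ValueError("primer 3' end does not hybridize to template")
    return primer + revcomp(template[:site])


top, zipcode, bottom = "GGGTTT", "ACGTAC", "CCATGA"
template = top + zipcode + bottom
y_oligo = "AAGGCC" + revcomp("CCAT")   # 5' tail + anchor pairing with bottom
product = extend_primer(template, y_oligo, anchor_len=4)
print(product)
```

The product carries the Y-oligonucleotide sequence followed by copies of the zipcode and top sequences, consistent with the description above.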
The top right panel of
The middle left panel of
The bottom panel of
Turning now to
The top middle panel shows an example design of a single-stranded Y-adapter segment that is conjugated (i.e., covalently bound) to a 5′ acrydite moiety (shown as “acrydite” in the top center panel of
Digestion and ligation of stretched genomic DNA, using the vanishing and appearing restriction sites methodology disclosed in U.S. Patent Publication No. 2017/0044600, may generate Y-adapters covalently attached to genomic DNA fragments, as shown in the top right panel of
A wash with NaOH followed by a heat denaturation step may separate the ligated strands (see the top right panel) from each other and prepare them for hybridization to a zipcode array, such as, for example, the one described in the top middle panel of
Then, a zipcode array, such as, for example, the one described in the top middle panel of
After the extension is complete, separation of the acrylamide gel matrix from the zipcode array may result in immobilized oligonucleotides on the acrylamide gel matrix, as shown in the middle right panel of
Liberation of the DNA fragments library from the acrylamide gel matrix using Fc1-Seq1/Fc2-Seq2 primers may give a library with the general structure shown in the bottom panel of
Turning now to
The description above shows various examples and embodiments. A zipcode array can be used to analyze a variety of molecules, including DNA, RNA and protein molecules, in 2D formats. The zipcode array can be used to analyze molecules in 3D formats as well. For example, a tissue sample may be sliced vertically into a stack of sheets, and each sheet may be associated with an index number denoting its position relative to the other sheets. Each sheet may then be analyzed for a variety of molecules, including DNA, RNA and protein molecules, in 2D formats. In this case, the zipcode arrays used may comprise zipcode sequences that encode, in addition to the x and y coordinates, a z coordinate, so that the z coordinate of each sheet can be used to assemble 3D information on the variety of molecules in the tissue.
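Assembling per-sheet 2D results into 3D coordinates can be sketched as follows, assuming sheets of uniform thickness and a square zipcode grid. The 2 μm pitch and 10 μm sheet thickness are illustrative assumptions, and here the z index is supplied per sheet, whereas in the zipcode-encoded variant described above it would instead be decoded from the zipcode itself.

```python
def assemble_3d(hits_by_section, pitch_um, thickness_um):
    """Convert per-section 2D hits into 3D points.

    hits_by_section maps a section (sheet) index z to a list of
    (x, y, molecule) tuples with x, y in zipcode grid units.
    Returns (x_um, y_um, z_um, molecule) tuples ordered by section.
    """
    points = []
    for z_index in sorted(hits_by_section):
        for x, y, molecule in hits_by_section[z_index]:
            points.append((x * pitch_um, y * pitch_um, z_index * thickness_um, molecule))
    return points


sections = {1: [(0, 0, "geneB")], 0: [(1, 2, "geneA")]}
print(assemble_3d(sections, pitch_um=2.0, thickness_um=10.0))
```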
The methods, kits and devices described in this disclosure may analyze a variety of molecules in biological samples or material samples. In genome analysis, the 2D information may be used to decode the arrangement of subsequences of a long DNA sequence. In other cases, the zipcode arrays may be used to decode positional information of molecules at cellular resolution. For example, zipcode arrays may have 2 μm resolution and contain more than 25 million zipcodes. Higher resolution can be achieved by reducing the feature size to the nanometer range using higher-resolution oligonucleotide array synthesis methods.
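To put the example figures in perspective, assuming one zipcode per square 2 μm × 2 μm feature on a square grid, 25 million zipcodes would cover about one square centimeter:

```latex
\begin{aligned}
N &= 25\times 10^{6} = (5000)^2~\text{features},\\
L &= 5000 \times 2~\mu\mathrm{m} = 10^{4}~\mu\mathrm{m} = 10~\mathrm{mm},\\
A &= N \times (2~\mu\mathrm{m})^2 = 10^{8}~\mu\mathrm{m}^2 = 1~\mathrm{cm}^2.
\end{aligned}
```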
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
This application is a National Stage Entry of PCT/US2018/034086, filed May 23, 2018, which claims the benefit of U.S. Provisional Patent Application No. 62/509,764, filed on May 23, 2017, U.S. Provisional Patent Application No. 62/509,765, filed on May 23, 2017, U.S. Provisional Patent Application No. 62/509,766, filed on May 23, 2017, U.S. Provisional Patent Application No. 62/510,353, filed on May 24, 2017, U.S. Provisional Patent Application No. 62/510,356, filed on May 24, 2017, U.S. Provisional Patent Application No. 62/510,358, filed on May 24, 2017, and U.S. Provisional Patent Application No. 62/568,200, filed on Oct. 4, 2017, each of which is entirely incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/034086 | 5/23/2018 | WO | 00 |
Number | Date | Country |
---|---|---|
62509764 | May 2017 | US |
62509765 | May 2017 | US |
62509766 | May 2017 | US |
62510358 | May 2017 | US |
62510356 | May 2017 | US |
62510353 | May 2017 | US |
62568200 | Oct 2017 | US |