Spatial metagenomic characterization of microbial biogeography

Information

  • Patent Grant
  • 11525164
  • Patent Number
    11,525,164
  • Date Filed
    Wednesday, March 27, 2019
  • Date Issued
    Tuesday, December 13, 2022
Abstract
The present disclosure provides for a method of determining microbial identities and/or abundances in a biological sample. The method may comprise: (a) immobilizing the biological sample in a matrix; (b) fracturing/breaking the matrix (that comprises the biological sample) into clusters; and (c) determining identities and/or abundances of microbes in the clusters.
Description
SEQUENCE LISTING

This application contains a Sequence Listing which is being submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jul. 13, 2022, is named 38837-202_SEQUENCE-LISTING_ST25.txt and is 122,158 bytes in size.


FIELD OF THE INVENTION

The present invention provides for a method to determine the micron-scale spatial co-localization of genomic material within a 3-dimensional sample by microdroplet encapsulation and high-throughput sequencing of fractionations of microclusters from the sample.


BACKGROUND OF THE INVENTION

The local spatial organization of the gut microbiome influences a variety of ecological properties, including colonization (see Lee, S. M. et al. Bacterial colonization factors control specificity and stability of the gut microbiota. Nature 1-6 (2013). doi:10.1038/nature12447; Pereira, F. C. & Berry, D. Microbial nutrient niches in the gut. Environ Microbiol 19, 1366-1378 (2017); Donaldson, G. P. et al. Gut microbiota utilize immunoglobulin A for mucosal colonization. Science 360, 795-800 (2018); Whitaker, W. R., Shepherd, E. S. & Sonnenburg, J. L. Tunable Expression Tools Enable Single-Cell Strain Distinction in the Gut Microbiome. Cell 169, 538-546.e12 (2017)), metabolism (see Nagara, Y., Takada, T., Nagata, Y., Kado, S. & Kushiro, A. Microscale spatial analysis provides evidence for adhesive monopolization of dietary nutrients by specific intestinal bacteria. PLoS ONE 12, e0175497 (2017)), host-microbe and inter-microbial interactions (see Wexler, A. G. et al. Human symbionts inject and neutralize antibacterial toxins to persist in the gut. Proc. Natl. Acad. Sci. U.S.A. 201525637-6 (2016). doi:10.1073/pnas.1525637113) and community stability (see Reichenbach, T., Mobilia, M. & Frey, E. Mobility promotes and jeopardizes biodiversity in rock-paper-scissors games. Nature 448, 1046-1049 (2007); Coyte, K. Z., Schluter, J. & Foster, K. R. The ecology of the microbiome: Networks, competition, and stability. Science 350, 663-666 (2015)). However, current microbiome profiling approaches such as metagenomic sequencing require homogenization of the input material and thus the physical destruction of any underlying spatial information. While imaging techniques could reveal useful spatial information, they rely on hybridization by short DNA probes of limited spectral diversity, yielding data with low taxonomic resolution and often requiring extensive empirical optimization (see Valm, A. M., Welch, J. L. M. & Borisy, G. G. CLASI-FISH: Principles of combinatorial labeling and spectral imaging. Systematic and Applied Microbiology 35, 496-502 (2012); Amann, R. & Fuchs, B. M. Single-cell identification in microbial communities by improved fluorescence in situ hybridization techniques. Nature Reviews Microbiology 6, 339-348 (2008)). Bacteria are also densely packed in communities, limiting identification and analysis of individual cells (see Mark Welch, J. L., Hasegawa, Y., McNulty, N. P., Gordon, J. I. & Borisy, G. G. Spatial organization of a model 15-member human gut microbiota established in gnotobiotic mice. Proc. Natl. Acad. Sci. U.S.A. 21, 201711596-E9114 (2017)). Imaging approaches can profile simple synthetic communities composed of a small number of cultivable species (see Geva-Zatorsky, N. et al. (2015); Whitaker, W. R., Shepherd, E. S. & Sonnenburg, J. L. (2017)), but imaging techniques are challenging to scale to complex and diverse natural microbiomes. A direct and unbiased method for high-taxonomic resolution and micron-scale dissection of natural microbial biogeography is critically needed to mechanistically elucidate the role of the gut microbiome in health and disease.


In macroecology, plot sampling is used to study the spatial organization of large ecosystems, which are otherwise impractical to fully characterize. By surveying many smaller plots from a larger region, one can delineate local distributions of species and statistically infer fundamental properties of global community organization and function. The methods of the present invention provide a multiplexed sequencing technique that analyzes microbial cells in their native geographical context to statistically reconstruct the local spatial organization of the microbiome. Microbial colocalization can be shown in a variety of biological samples, including soil, gut, and biofilm samples. The methods of the present invention can determine which microbes are spatially associated with which other microbes and can comprise the following steps: (1) taking an intact sample and preserving its spatial structure via in-situ perfusion and polymerization of a chemical matrix, (2) processing that matrix by chemical or enzymatic steps, (3) fractionating the matrix into smaller microparticles, (4) capturing each microparticle in an emulsion droplet with unique molecular barcodes, (5) amplifying genetic material from the microparticle in each droplet by PCR, and (6) breaking up the droplets and pooling the amplified material for next-generation sequencing measurements.


SUMMARY

The present disclosure provides for a method of determining the compositions/identities and/or abundances of organisms (e.g., microbes such as microbial identities and/or abundances) in a biological sample. The method may comprise: (a) immobilizing the biological sample in a matrix; (b) fracturing/breaking the matrix (that comprises the biological sample) into clusters; and (c) determining identities and/or abundances of microbes in the clusters.


The clusters (each cluster of the clusters) may comprise co-localized cells.


In step (c), the identities and/or abundances of organisms (e.g., microbes) may be determined by sequencing DNAs (e.g., genomic DNAs) and/or RNAs.


In step (c), the identities and/or abundances of organisms (e.g., microbes) may be determined by analyzing proteins, polypeptides, carbohydrates, and/or metabolites.


The matrix may be a gel matrix.


In step (a), the biological sample may be immobilized via perfusion and polymerization of the matrix.


The matrix may comprise a polymer, such as an acrylamide polymer.


The matrix may comprise a plurality of 16S ribosomal RNA (16S rRNA) (gene) amplification primers. The plurality of 16S rRNA amplification primers may be covalently linked to the matrix. The plurality of 16S rRNA (gene) amplification primers may be linked to the matrix through photocleavable linkers, such as acrydite linkers.


The method may further comprise step (d) processing the matrix by chemical or enzymatic means after step (a) or step (b). For example, step (d) may comprise lysing cells. The method may further comprise step (e) passing the clusters through a filter for size selection. After step (e), the clusters may have a median diameter ranging from about 1 μm to about 100 μm, from about 10 μm to about 50 μm, from about 1 μm to about 20 μm, from about 1 μm to about 50 μm, from about 10 μm to about 40 μm, from about 10 μm to about 80 μm, about 1 μm, about 5 μm, about 10 μm, about 20 μm, about 30 μm, about 40 μm, about 50 μm, about 60 μm, about 70 μm, about 80 μm, about 90 μm, about 100 μm, about 120 μm, about 150 μm, about 170 μm, about 200 μm, about 300 μm, about 400 μm, about 500 μm, about 600 μm, about 700 μm, about 800 μm, or about 900 μm.


The clusters may be microparticles.


In step (b), the matrix may be fractured through cryo-fracturing such as cryo-bead beating.


In step (c), identities and/or abundances of organisms (e.g., microbes) may be determined through droplet-based encapsulation.


The droplet-based encapsulation may be through co-encapsulating the clusters with beads in droplets (e.g., emulsion droplets), wherein each droplet comprises (consists essentially of, or consists of) a cluster and a bead, each bead comprising a unique molecular barcode.


The beads may comprise a plurality of 16S rRNA (gene) amplification primers. The plurality of 16S rRNA (gene) amplification primers linked to each bead may comprise a unique (and/or identical) molecular barcode.


The plurality of 16S rRNA (gene) amplification primers may be covalently linked to the beads.


The plurality of 16S rRNA (gene) amplification primers may be linked to the beads through photocleavable linkers, such as acrydite linkers.


The beads may comprise a polymer, such as an acrylamide polymer.


The droplet-based encapsulation may be through capturing the clusters in emulsion droplets comprising molecular barcodes, each emulsion droplet comprising identical molecular barcodes.


The (emulsion) droplets may have a diameter ranging from about 35 μm to about 45 μm, from about 1 μm to about 100 μm, from about 10 μm to about 50 μm, from about 1 μm to about 20 μm, from about 1 μm to about 50 μm, from about 10 μm to about 40 μm, or from about 10 μm to about 80 μm.


The method may further comprise step (f) cleaving the plurality of 16S rRNA (gene) amplification primers from the matrix and/or the beads.


The method may further comprise step (g) degrading the matrix. The matrix may be degraded through exposure to reducing conditions.


The method may further comprise step (h) polymerase chain reaction (PCR) amplification.


The sequencing/analysis may be deep sequencing or any sequencing or other techniques discussed herein or understood by a skilled artisan.


The biological sample may be obtained from a mammal. The biological sample may be obtained from a nervous system, a pulmonary system, a peripheral vascular system, a cardiovascular system, and/or a gastrointestinal system of a mammal. The biological sample may be obtained from the brain, a lung, a bronchus, an alveolus, an artery, a vein, a heart, an esophagus, a stomach, a small intestine, a large intestine, or combinations thereof.


The biological sample may be obtained from a tumor or may be a tumor sample.


The biological sample may be a soil sample, a gut sample, and/or a biofilm sample.


The biological sample may be an environmental sample.





BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.



FIGS. 1a-1c: Metagenomic Plot-sampling by sequencing (MaP-seq) and quality control. a) Schematic of the MaP-seq technique for micron-scale plot-sampling of microbiome samples. b) MaP-seq profiling of a mixture of clusters prepared from homogenized fecal bacteria or E. coli. The number of reads (k, thousands of reads) for each barcode (of 399 total) belonging to either the E. coli OTU or fecal OTUs is displayed as a scatter plot; blue dots: E. coli cluster, red dots: fecal cluster, purple dots: mixed cluster. c) Correlation between OTU relative abundance (RA) measurements obtained by standard bulk 16S sequencing of the same homogenized fecal community and MaP-seq OTU RA measurements averaged across individual homogenized fecal clusters (162 total, clusters with <10% E. coli reads); all RA values are plotted on a log10 scale, only OTUs with greater than 0.01% RA are displayed, and r indicates the Pearson correlation.



FIGS. 2a-2d: Spatial organization of the mouse distal colon microbiota. a) MaP-seq profiling of ~30 μm median diameter distal colon clusters. Raw relative abundance (RA) data from MaP-seq is displayed as a heatmap; columns represent individual clusters (of 1,406), and rows represent abundant and prevalent OTUs (>2% RA in >10% of all clusters; 24 of 246 detected OTUs) aggregated from two technical replicate datasets of the same sample. Shading denotes the RA of individual OTUs in each cluster (linear scale); OTUs are sorted by decreasing prevalence (proportion of clusters in which the OTU is >2% RA), and clusters are clustered by Euclidean distance. The prevalence of each OTU across clusters is displayed to the right as a bar plot, and each bar is colored by the OTU's assigned taxonomy at the family level (legend in d). b) Correlation between OTU RA measurements obtained by standard bulk 16S sequencing of the same sample and OTU prevalence across clusters as calculated in a); n.d. indicates not detected at >2% RA in any cluster, and only OTUs with greater than 0.01% RA as measured by bulk 16S sequencing are displayed. c) Histogram of the number of OTUs per cluster (OTUs >2% RA), shown for homogenized fecal clusters, which serve as a mixed control (red outline, 162 total), and distal colon clusters (grey, 1,406 total) of the same size. Dotted lines indicate the median value for each group. d) For each abundant and prevalent OTU pair (OTUi,j), spatial associations were calculated; shading indicates log2 odds ratio, x denotes statistically significant association (Fisher's exact test, p<0.05, FDR=0.05); colored boxes represent OTU taxonomy at the family level.



FIGS. 3a-3e: Spatial association detection and technical reproducibility. a) Expanded view of FIG. 2a; abundant and prevalent OTUs (>2% abundance in >1% of clusters) are displayed; the cluster map is organized as in FIG. 2a, RA denotes relative abundance. The column indicators on top indicate the technical replicate each cluster originated from (red, replicate 1; black, replicate 2). b) A volcano plot visualization of data from FIG. 2d; red dotted line indicates threshold for statistically significant associations (Fisher's exact test, p<0.05). c) Correlation between association detection utilizing raw or subsampled reads. Reads were subsampled for all clusters to the minimum read cutoff (717 reads) and association detection was performed as before. The resulting odds ratios of pairwise associations were highly correlated to those calculated from the raw reads (Pearson r=0.96). These results suggest that variable read counts for each cluster do not significantly alter detected associations (i.e. due to use of a 2% abundance threshold). d) Dependence of association detection on cluster sampling depth. The full dataset was subsampled, and the same association detection was performed. The number of significant associations detected is plotted; the line indicates the mean and error band indicates the standard deviation of ten iterations of subsampling. The number of significant associations detected linearly increases with the number of clusters sampled, implying even deeper cluster data collection could enable characterization of weaker associations between less abundant taxa. e) Technical reproducibility of association detection between the two technical replicate datasets of the same sample. Association detection was performed on each technical replicate, and the calculated odds ratio of association is plotted for all pairwise associations. For associations detected as significant in at least one of the two replicates, the sign of association is the same between both replicates for the majority of cases (64/74 associations). For associations detected as significant in both replicates (15 associations) the sign is the same in all cases. These results indicate good correspondence of detected microscopic spatial associations between technical replicates.



FIGS. 4a-4h: Quality control of the MaP-seq technique by cluster mixing experiment. a) Schematic of the cluster mixing experiment; clusters containing either E. coli or homogenized fecal bacteria are prepared, mixed, and processed through the MaP-seq pipeline. b) Example of resulting distribution of read counts per identified unique barcode for the mixing experiment. A conservative threshold cutoff for considering real clusters is set as the total number of reads divided by 2,500 (i.e., the number of clusters that were utilized as input during microfluidic encapsulation, and assuming an equal read distribution for each cluster). The calculated read cutoff (1,440 reads) is indicated by the red line, which results in 399 clusters for downstream analysis. The Y-axis is set to a maximum of 500 clusters for visualization purposes. c) Resulting raw data for the mixing experiment displayed as a cluster map; columns indicate the 399 clusters passing the read cutoff and rows indicate prevalent and abundant OTUs (OTUs present at >2% relative abundance in >1% of all clusters). The E. coli OTU is the first row, while other rows represent fecal bacterial OTUs. The plot is arranged as in FIG. 2a, RA denotes relative abundance. d) An alternative visualization of FIG. 1b, plotting the fraction of reads in each cluster mapping to the spike-in E. coli OTU; most clusters show either entirely spike-in mapping reads or no spike-in mapping reads as expected. A small number of clusters show low levels of residual contamination; for this reason, a conservative relative abundance cutoff is used throughout downstream analysis (>2% relative abundance) to classify an OTU as present within a given cluster. e) Detection sensitivity of MaP-seq; the relative abundance of OTUs is compared to the proportion of clusters an OTU is detected in (with >2% relative abundance cutoff). Higher abundance OTUs display higher detection sensitivity as expected. f) Detection of significant pairwise associations in the cluster mixing experiment. The two communities contain defined spatial associations; the fecal bacteria are expected to be positively associated with each other, whereas the fecal bacteria should be negatively associated with E. coli. Association analysis was conducted in the same manner as FIG. 1d; the fecal bacteria are found to be strongly associated with each other and negatively associated with E. coli as expected. The associations are much stronger than observed in the murine gut (i.e. note that the color map scale spans a larger range in this plot). g-h) To confirm technical reproducibility across different experiments and particle sizes, the cluster mixing experiment was repeated but with particles of ~20 μm median size. Fecal bacteria constituted one community and Sporosarcina pasteurii, an environmental taxon, constituted a second community. g) is analyzed as in FIG. 1b and h) as in FIG. 1c. This revealed low mixing rates (1.65% mixed), negligible contamination (<0.003% of reads) and good correlation to bulk 16S sequencing (Pearson correlation r=0.72), confirming technical reproducibility of the technique across different experiments and particle sizes.



FIGS. 5a-5d: Survey of spatial organization across the mouse gastrointestinal tract. a) Top: absolute abundance within gastrointestinal compartments calculated from spike-in sequencing (arbitrary units, normalized to the maximum value) and number of OTUs (i.e. alpha diversity, number of OTUs >0.1% relative abundance). Bottom: absolute abundance of abundant OTUs (>1% of maximum OTU absolute abundance in any sample) is shown below as a heatmap (log10 scale); OTUs are clustered by Bray-Curtis dissimilarity. b) Histogram of the number of OTUs per cluster (OTUs >2% RA). The number of clusters aggregated from two technical replicates is indicated (si6 n=386, cec n=405, co2 n=259), and dotted line indicates median value. c) tSNE visualization of clusters utilizing Bray-Curtis dissimilarity of OTU relative abundances (subsampled to 314 reads across all clusters, number of clusters indicated above). On the left, each cluster is colored by site of origin; on the right each cluster is colored by the relative abundance of the six most abundant families within each cluster (linear scale). d) Pairwise spatial associations for abundant and prevalent OTUs visualized as a circular graph; the number of clusters utilized is subsampled to the lowest number across the three samples (259 clusters). Nodes indicate OTUs, sizing is proportional to the prevalence of OTUs across clusters and color represents OTU taxonomy at the family level, dotted edges denote all possible associations and shaded edges denote statistically significant associations (p<0.05, FDR=0.05).



FIGS. 6a-6b: MaP-seq profiling of colonic samples at a smaller size scale. a) Colonic clusters of ~7 μm diameter were profiled in parallel. A histogram is shown with the number of OTUs per cluster compared to the ~20 μm clusters profiled in FIG. 3b. The smaller size scale contains a significantly lower number of OTUs per cluster as expected (Mann-Whitney U test, p<10^-6). The number of clusters aggregated across two technical replicates is indicated, and the dotted line indicates the median value. b) Pairwise spatial associations for prevalent and abundant OTUs visualized as a force directed graph. Nodes indicate OTUs, sizes are proportional to prevalence of OTUs across clusters, and coloring represents taxonomy at the family level. Edges represent statistically significant associations (Fisher's exact test, p<0.05, FDR=0.05). ~20 μm colonic clusters display the same data as shown in FIG. 3d. The full dataset for each sample is utilized in calculation of pairwise associations. Robust positive co-associations among the Bacteroidales taxa are recapitulated at both size scales.



FIGS. 7a-7h: Analysis of taxa with altered spatial structuring in the cecum. a) OTU clustering or self-aggregation in the murine cecum; for prevalent OTUs (>2% RA in >10% of all clusters) the proportion of times an OTU is observed as the majority of the cluster (>50% relative abundance) is plotted. Grey dotted line indicates the average clustering value, and black dotted line indicates two times the average clustering value. b) FISH imaging of a cecum section from the same sample profiled by MaP-seq; green is Erec482 probe targeting Lachnospiraceae, blue is Lab148 probe targeting Lactobacillaceae, and magenta is Ato291 probe targeting Coriobacteriales. c-f) Four representative regions showing Erec482 targeted Lachnospiraceae displaying self-aggregating clusters. The source of each of the four regions is indicated by a yellow outline in b). g-h) Two representative regions showing areas with no Lachnospiraceae self-aggregation. The source of the two regions is indicated by a red outline in b).



FIGS. 8a-8h: Erec482-stained bacterial aggregations appear to exclude other bacteria and additional imaging controls. a) The same region shown in FIG. 512c is displayed, but the four channels are displayed independently. b) To investigate if other bacteria not targeted by the utilized FISH probes (Lab148 and Ato291 probes) may be present in the apparent Erec482 targeted Lachnospiraceae clusters, DAPI counterstaining (targeted to cell gDNA) was also investigated. A bacterial aggregation is displayed from the image in a); the region is indicated by a yellow outline. Apparent Erec482 aggregations display a single bacterial morphology under DAPI staining, and the DAPI staining co-localizes with Erec482 probe fluorescence. These results imply that the apparent Erec482 Lachnospiraceae clusters exclude other bacteria in the cecum. c) A representative region not displaying Erec482 targeted Lachnospiraceae clusters; a variety of cell morphologies are observable with DAPI staining and Erec482, Lab148 and Ato291 stained bacteria are present. The region displayed is indicated by a red outline in a). d-f) To validate the Erec482-stained structures, we performed two-color FISH utilizing the Erec482 probe (this time with a Cy3 fluorophore) and a Eub338 probe targeted to all bacteria. d) shows the Eub338 probe, e) shows the Erec482 probe, demonstrating that similar aggregations as observed previously (i.e. see inset zoom of specific structures, yellow outline) are co-stained in both channels, indicating they are bacteria. f) shows a different section not stained with a Cy3 probe but with same exposure settings, indicating that the Erec482 staining is specific and not due to autofluorescence. g-h) Additional controls showing Eub338 and Non338 (scrambled control probe) FISH with same exposure settings. g) shows Eub338 probe, h) shows Non338 probe. Lumenal bacteria are bound by the Eub338 and not Non338 probe validating the FISH staining conditions.



FIGS. 9a-9d: Spatial organization in the colon after dietary perturbation. a) Absolute abundance of dominant OTUs (>1% of maximum OTU absolute abundance in any sample) in the distal colon of co-housed mice fed a low fat, plant-polysaccharide diet (LF) or high fat diet (HF) for 10 days is shown as a heatmap (log10 scale). Labels on right indicate LF enriched, HF enriched and shared OTUs. b) Top: histogram of the number of OTUs per cluster (OTUs >2% RA). Bottom: histogram of the number of distinct families per cluster (families >2% RA). For both plots, green indicates LF clusters and orange indicates HF clusters, dotted line indicates median value, and the number of clusters aggregated from two technical replicates is indicated (LF co2 n=495, HF co2 n=938). c) Histogram of net relatedness index (NRI) calculated for each cluster containing at least two OTUs, green indicates LF clusters and orange indicates HF clusters. d) tSNE visualization of clusters utilizing Bray-Curtis dissimilarity of OTU relative abundances (subsampled to 121 reads across all clusters). Left, cluster colored by site of origin; LF (green), HF (orange), number of clusters indicated above. In addition, a biological replicate from an adjacent colonic segment of the same LF mouse is shown (LF(rep), dark green, n=359 clusters). Red arrows indicate examples of cluster configurations observed in both diet conditions. Right, each cluster is colored by the relative abundance of the eight most abundant families within each cluster (linear scale).



FIGS. 10a-10d: Additional information for tSNE analysis of dietary perturbation clusters. a) Same figure as FIG. 4d for reference. b) Clusters from each source (LF, LF(rep), HF) plotted separately on the same tSNE manifold for visualization purposes. c) Clusters are shaded by the number of OTUs per cluster (OTUs >2% RA in the subsampled dataset utilized for tSNE analysis). d) Clusters are shaded by the log10 relative abundance of individual OTUs within each cluster. Red arrows on Bacteroidaceae OTU 6 and Porphyromonadaceae OTU 5 plots indicate the same regions in FIG. 4d where clusters dominated by each of these taxa respectively are observed in both diets. The 24 OTUs with the highest average relative abundance across all clusters are displayed.



FIGS. 11a-11d: Barcoded bead quality control. a) Schematic of the microfluidic droplet generation device utilized to fabricate barcoded beads. b) Image of resulting barcoded gel beads visualized by phase contrast and hybridized with a FISH probe targeted to the terminal 16S 515f primer region present in fully extended primer product (bead_515f_cy5, see Table 4). c) Quantification of the removal of primer synthesis intermediates by ExoI cleanup; the mean fluorescence intensity of beads was quantified (using Nikon Elements AR) when hybridized by a FISH probe targeted to the 515f site present on fully extended primer product (bead_515f_cy5) or a FISH probe targeted to the pe1 primer extension site (bead_pe1_cy5, see Table 4) present in all synthesis intermediates. Before cleanup the amount of pe1 sites on beads is higher than that of 515f sites, while after cleanup the amounts of pe1 and 515f sites on beads are roughly equal, implying removal of un-extended primer intermediates (which contain pe1 sites, but not the terminal 515f site). d) Photorelease of amplification primer from beads; beads were subjected to no UV exposure or UV exposure for 10 minutes and supernatant was collected and analyzed via Agilent Bioanalyzer dsDNA HS assay; peaks at ~40 s and ~110 s are gel migration markers. A short primer product is observed to be released in a UV exposure dependent fashion.



FIGS. 12a-12b: Barcoded bead synthesis schematic. a) Beads are synthesized via a three-step split-and-pool synthesis approach, resulting in 96^3 or 884,736 possible unique barcodes. The three sets of primers are denoted primer extension sets 1-3 (i.e. pe1, pe2, and pe3). b) Extension strategy utilized for bead synthesis. A primer is linked to the gel bead via an acrydite linker and also contains a photocleavable linker group. Barcoded primers are hybridized to this linked primer and serve as an extension template for adding barcodes to the bead-linked primers. After each round, the extension template primer is stripped, and the next round of extension is performed. The sequence of the final primer product is indicated at the bottom.



FIGS. 13a-13e: Cluster generation and quality control. a) Schematic of cluster generation process. A tissue section is fixed and embedded in a gel matrix by in situ acrylamide perfusion and polymerization. Shown is a murine intestinal section within a set gel as an example (excess gel is untrimmed at this step); a PCR tube placed to the right for scale. The gel-embedded sample is then subjected to cryofracturing, lysis preparation steps, and finally size-selection by passing clusters through nylon mesh filters of various sizes. b) Microscopy of four resulting clusters generated from murine colonic samples (size-selected for “large” clusters) visualized with phase-contrast or stained with SYBR Green I targeting genomic DNA; individual cells fixed in their original spatial orientation can be observed as punctate dots within the clusters. c) Resulting size distributions of clusters after size-selection to three size scales (small, medium and large); size-selected clusters were stained with SYBR Green I and imaged, clusters were identified by a fluorescence threshold, and the equivalent diameter of identified clusters was calculated using Nikon Elements AR. d) Photorelease of reverse amplification primer from clusters; clusters were subjected to no UV exposure or UV exposure for 10 minutes and supernatant was collected and analyzed via Agilent Bioanalyzer dsDNA HS assay; peaks at ˜40 s and ˜110 s are gel migration markers. A short primer product is observed to be released in a UV exposure dependent fashion. e) Degradation of cluster polyacrylamide gel matrices by exposure to reducing conditions; clusters were incubated in PCR encapsulation mix with and without 1 mM DTT (i.e., final concentration of DTT in droplets) for 2 hours; without DTT clusters remain stable and retain their structure; with DTT reducing conditions, the gel matrix degrades resulting in dispersion of individual cells observable as stained puncta.



FIGS. 14a-14c: Microfluidic encapsulation of barcoded beads and clusters. a) Schematic of the microfluidic droplet generation device utilized to co-encapsulate barcoded beads and clusters. Beads are packed single file to enable loading that beats Poisson encapsulation statistics expected by random loading. b) Image of the microfluidic device during operation. c) Resulting emulsion after encapsulation; beads can be observed as a faint sphere within droplets; orange arrows indicate three example droplets (of many in the field of view) with a single barcoded bead (but no clusters). One droplet with a single barcoded bead (red arrow) and a single cluster (blue arrow) can be observed in this field of view.



FIGS. 15a-15b: Preliminary results of spatial changes in small intestinal microbiome in wild-type (WT) and ciprofloxacin (Cipro)-treated mice. a) Bulk abundance and composition in the murine small intestine. b) Spatial co-occurrence network of murine microbiome in WT and Cipro conditions. Each node corresponds to a significant OTU. Each edge corresponds to co-occurrence of two OTUs, with colors denoting increasing likelihood of co-occurrence.



FIGS. 16a-16b: a) Antibiotics-FMT study design. b) Comparison of fecal microbiome of wild-type C57BL6/J mice from two suppliers, Taconic and Jackson Labs.





DETAILED DESCRIPTION

The methods and systems of the present disclosure provide Metagenomic Plot-sampling by sequencing (MaP-seq), a multiplexed sequencing technique that analyzes microbial cells in their native geographical context to statistically reconstruct the local spatial organization of the microbiome (FIG. 1a). To perform MaP-seq, an input sample is first physically fixed by immobilizing the microbiota via perfusion and in situ polymerization of an acrylamide polymer matrix that also contains a covalently linked reverse 16S rRNA amplification primer. The embedded sample is then fractured via cryo-bead beating, subjected to cell lysis, and passed through nylon mesh filters for size selection to yield cell clusters or particles of desired and tunable physical sizes (i.e., by utilizing different mesh filter sizes). Resulting clusters contain genomic DNA immobilized in their original arrangement, preserving local spatial information. Next, a microfluidic device is used to co-encapsulate these clusters with gel beads, each containing uniquely barcoded forward 16S rRNA amplification primers. Primers are photocleaved from the beads and clusters, genomic DNA is released from clusters by triggered degradation of the polymer matrix within droplets, and PCR amplification of the 16S V4 region is performed. Droplets are then broken apart, and the resulting library is subjected to deep sequencing. Sequencing reads are filtered and grouped by their unique barcodes, which yields the identity and abundance of bacterial operational taxonomic units (OTUs) within individual cell clusters.
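
By way of non-limiting illustration, the final step described above (grouping reads by barcode to obtain per-cluster OTU identities and abundances) may be sketched as follows. The function names, the `min_reads` threshold, and the input format of (barcode, OTU) pairs are assumptions made for illustration only and are not taken from the present disclosure; barcode extraction and OTU assignment are assumed to have been performed upstream.

```python
from collections import Counter, defaultdict


def otu_table_by_barcode(reads, min_reads=1):
    """Group (barcode, otu_id) pairs into per-barcode OTU read counts.

    Each barcode is treated as one cluster. Barcodes with fewer than
    `min_reads` total reads are dropped (hypothetical filter).
    """
    table = defaultdict(Counter)
    for barcode, otu in reads:
        table[barcode][otu] += 1
    return {
        bc: dict(counts)
        for bc, counts in table.items()
        if sum(counts.values()) >= min_reads
    }


def relative_abundance(counts):
    """Convert raw OTU read counts for one cluster into fractions."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


# Example with made-up barcodes and OTU labels.
example_reads = [
    ("AAC", "OTU1"), ("AAC", "OTU2"), ("AAC", "OTU1"), ("GTT", "OTU3"),
]
example_table = otu_table_by_barcode(example_reads)
```

In this sketch each key of `example_table` stands for one cluster, and `relative_abundance` may be applied to any of its values to obtain within-cluster proportions.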


The present disclosure provides for a method of determining the compositions/identities and/or abundances of organisms (e.g., microbes such as microbial identities and/or abundances) in a biological sample. The method may comprise: (a) immobilizing the biological sample in a matrix; (b) fracturing/breaking the matrix (that comprises the biological sample) into clusters; and (c) determining identities and/or abundances of microbes in the clusters.


The clusters (each cluster of the clusters) may comprise co-localized cells.
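
As a hedged illustration of how co-localization might be summarized once the OTUs present in each cluster are known, the following sketch counts, for every pair of OTUs, the number of clusters in which both are observed. This is a raw count only; it is not the statistical procedure of the present disclosure, and assessing whether a pair co-occurs more often than expected would require comparison against a null model that is not shown here. The function name and input format are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(cluster_otus):
    """Count clusters in which each unordered OTU pair is jointly present.

    `cluster_otus` is an iterable of collections of OTU identifiers,
    one collection per cluster. Pairs are keyed in sorted order.
    """
    pair_counts = Counter()
    for otus in cluster_otus:
        for a, b in combinations(sorted(set(otus)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Example with made-up OTU labels.
example_pairs = cooccurrence_counts([{"A", "B"}, {"A", "B", "C"}, {"C"}])
```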


In step (c), the identities and/or abundances of organisms (e.g., microbes) may be determined by sequencing DNAs (e.g., genomic DNAs) and/or RNAs.


In step (c), the identities and/or abundances of organisms (e.g., microbes) may be determined by analyzing proteins, polypeptides, carbohydrates, and/or metabolites.


The matrix may be a gel matrix.


In step (a), the biological sample may be immobilized via perfusion and polymerization of the matrix.


The matrix may comprise a polymer, such as an acrylamide polymer.


The matrix may comprise a plurality of 16S ribosomal RNA (16S rRNA) (gene) amplification primers. The plurality of 16S rRNA amplification primers may be covalently linked to the matrix. The plurality of 16S rRNA (gene) amplification primers may be linked to the matrix through photocleavable linkers, such as acrydite linkers.


The method may further comprise step (d) processing the matrix by chemical or enzymatic means after step (a) or step (b). For example, step (d) may comprise lysing cells. The method may further comprise step (e) passing the clusters through a filter for size selection. After step (e), the clusters may have a median diameter ranging from about 1 μm to about 100 μm, from about 10 μm to about 50 μm, from about 1 μm to about 20 μm, from about 1 μm to about 50 μm, from about 10 μm to about 40 μm, from about 10 μm to about 80 μm, about 1 μm, about 5 μm, about 10 μm, about 20 μm, about 30 μm, about 40 μm, about 50 μm, about 60 μm, about 70 μm, about 80 μm, about 90 μm, about 100 μm, about 120 μm, about 150 μm, about 170 μm, about 200 μm, about 300 μm, about 400 μm, about 500 μm, about 600 μm, about 700 μm, about 800 μm, or about 900 μm.


The clusters may be microparticles.


In step (b), the matrix may be fractured through cryo-fracturing such as cryo-bead beating.


In step (c), identities and/or abundances of organisms (e.g., microbes) may be determined through droplet-based encapsulation.


The droplet-based encapsulation may be through co-encapsulating the clusters with beads in droplets (e.g., emulsion droplets), wherein each droplet comprises (consists essentially of, or consists of) a cluster and a bead, each bead comprising a unique molecular barcode.
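
For context, when particles are loaded into droplets at random, droplet occupancy is commonly modeled with a Poisson distribution. The sketch below, offered under that modeling assumption and with illustrative loading rates that are not taken from the present disclosure, estimates the fraction of droplets expected to contain exactly one bead and exactly one cluster when the two are loaded independently.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k particles in a droplet at mean occupancy lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def single_pair_fraction(lam_bead, lam_cluster):
    """Fraction of droplets with exactly one bead and exactly one cluster.

    Assumes beads and clusters are loaded independently and at random.
    """
    return poisson_pmf(1, lam_bead) * poisson_pmf(1, lam_cluster)


# Illustrative (assumed) mean occupancies: 1 bead and 0.1 cluster per droplet.
random_loading = single_pair_fraction(1.0, 0.1)
# If bead loading were close to one bead per droplet (e.g., ordered packing),
# the usable fraction would approach the single-cluster probability alone.
ordered_bead_limit = poisson_pmf(1, 0.1)
```

Under these assumed rates, the difference between `random_loading` and `ordered_bead_limit` indicates how much ordered bead loading could improve on purely random co-encapsulation.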


The beads may comprise a plurality of 16S rRNA (gene) amplification primers. The plurality of 16S rRNA (gene) amplification primers linked to each bead may comprise a unique (and/or identical) molecular barcode.


The plurality of 16S rRNA (gene) amplification primers may be covalently linked to the beads.


The plurality of 16S rRNA (gene) amplification primers may be linked to the beads through photocleavable linkers, such as acrydite linkers.


The beads may comprise a polymer, such as an acrylamide polymer.


The droplet-based encapsulation may be through capturing the clusters in emulsion droplets comprising molecular barcodes, each emulsion droplet comprising identical molecular barcodes.


The (emulsion) droplets may have a diameter ranging from about 35 μm to about 45 μm, from about 1 μm to about 100 μm, from about 10 μm to about 50 μm, from about 1 μm to about 20 μm, from about 1 μm to about 50 μm, from about 10 μm to about 40 μm, or from about 10 μm to about 80 μm.


The method may further comprise step (f) cleaving the plurality of 16S rRNA (gene) amplification primers from the matrix and/or the beads.


The method may further comprise step (g) degrading the matrix. The matrix may be degraded through exposure to reducing conditions.


The method may further comprise step (h) polymerase chain reaction (PCR) amplification.


The sequencing/analysis may be deep sequencing, or any sequencing or other techniques discussed herein or understood by a skilled artisan.


The biological sample may be obtained from a mammal. The biological sample may be obtained from a nervous system, a pulmonary system, a peripheral vascular system, a cardiovascular system, and/or a gastrointestinal system of a mammal. The biological sample may be obtained from the brain, a lung, a bronchus, an alveolus, an artery, a vein, a heart, an esophagus, a stomach, a small intestine, a large intestine, or combinations thereof.


The biological sample may be obtained from a tumor or may be a tumor sample.


The biological sample may be a soil sample, a gut sample, and/or a biofilm sample.


The biological sample may be an environmental sample.


The present nucleic acids (e.g., primers such as 16S rRNA amplification primers) may or may not comprise barcode elements (e.g., a unique molecular barcode for each bead). Barcode elements may be used as identifiers for a cluster and may indicate the presence of one or more specific sequences in a cluster (e.g., DNA or RNA). Members of a set of barcode elements have a sufficiently unique nucleic acid sequence such that each barcode element is readily distinguishable from the other barcode elements of the set. Barcode elements may be of any length of nucleotides, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides in length. Detecting barcode elements and determining the nucleic acid sequence of a barcode element or plurality of barcode elements are used to determine the presence of an associated DNA or RNA element. Barcode elements can be detected by any method known in the art, including sequencing or microarray methods.
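
One common way to check that the members of a barcode set are readily distinguishable is to compute the minimum pairwise Hamming distance across the set. The following is a minimal sketch under the assumption that all barcodes have equal length; the function names and example sequences are hypothetical and no particular distance threshold is prescribed by the present disclosure.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Example with made-up 4-nt barcodes.
example_min = min_pairwise_distance(["AAAA", "AATT", "TTTT"])
```

A larger minimum distance means more sequencing errors are needed before one barcode can be mistaken for another.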


In one embodiment, barcoded primers are constructed via a split-and-pool primer extension strategy with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more barcode extension rounds. Klein, A. M. et al. Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells. Cell 161, 1187-1201 (2015). Bose, S. et al. Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biology 1-16 (2015). doi:10.1186/s13059-015-0684-3.
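
The combinatorial effect of split-and-pool extension can be illustrated with a short simulation: in each round every bead receives one segment drawn from that round's pool, so the number of possible barcodes is the product of the pool sizes. The sketch below is an assumption-laden illustration (random, independent assignment per bead per round; made-up segment sequences) and does not reproduce the primer chemistry of the cited protocols.

```python
import random


def barcode_space(round_pools):
    """Number of distinct barcodes obtainable from the given per-round pools."""
    size = 1
    for pool in round_pools:
        size *= len(pool)
    return size


def split_and_pool(n_beads, round_pools, seed=0):
    """Simulate barcode assembly: one random segment is appended per round."""
    rng = random.Random(seed)
    barcodes = [""] * n_beads
    for pool in round_pools:
        # "Split": each bead lands in one well and is extended by that
        # well's segment; "pool": all beads are recombined for the next round.
        barcodes = [bc + rng.choice(pool) for bc in barcodes]
    return barcodes


# Example with made-up 2-nt segments and two extension rounds.
example_pools = [["AA", "CC", "GG"], ["TT", "AC"]]
example_barcodes = split_and_pool(10, example_pools)
```

In practice the barcode space would be chosen to be much larger than the number of beads used, so that two beads rarely share the same barcode.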


Microbial identities and/or abundances, or specific changes in microbiome or microbiota discussed herein can be detected using various methods, including, without limitation, quantitative PCR or high-throughput sequencing methods which detect over- and under-represented genes in the total bacterial population (e.g., 454-sequencing for community analysis; screening of microbial 16S ribosomal RNAs (16S rRNA), etc.), or transcriptomic or proteomic studies that identify lost or gained microbial transcripts or proteins within total bacterial populations. See, e.g., U.S. Patent Publication No. 2010/0074872; Eckburg et al., Science, 2005, 308:1635-8; Costello et al., Science, 2009, 326:1694-7; Grice et al., Science, 2009, 324:1190-2; Li et al., Nature, 2010, 464: 59-65; Bjursell et al., Journal of Biological Chemistry, 2006, 281:36269-36279; Mahowald et al., PNAS, 2009, 14:5859-5864; Wikoff et al., PNAS, 2009, 10:3698-3703.


The composition/identities and abundance of the established microbiota can be studied by sequencing the 16S ribosomal RNA (or 16S rRNA) gene of a sample. 16S rRNA is a component of the 30S small subunit of prokaryotic ribosomes.


In additional embodiments, the determining step involves screening bacterial 16S rRNA genes using PCR.


The DNA library may be a genomic DNA or metagenomic library. A metagenomic library is a collection of the genomic DNAs of a mixture of organisms, such as a mixture of microbes.


The present method may or may not comprise a step of processing the matrix by chemical or enzymatic means after or before any suitable step, including, but not limited to, cell lysis, addition of a detergent or surfactant, addition of protease, addition of RNase, alcohol precipitation (e.g., ethanol precipitation, or isopropanol precipitation), salt precipitation, organic extraction (e.g., phenol-chloroform extraction), solid phase extraction, silica gel membrane extraction, CsCl gradient purification.


Photocleavable linkers may be cleaved by UV light. Photocleavable linkers may be a photocleavable oligonucleotide. Photocleavable linkers may be o-nitrobenzyl derivatives (Zhao et al. 2012: o-nitrobenzyl alcohol derivatives). U.S. Patent Publication No. 20080227742.


Sequencing


DNA may be amplified via polymerase chain reaction (PCR) before being sequenced.


The present method may comprise a step of analyzing DNA or RNA by sequencing or by microarray analysis. It should be appreciated that any suitable means of determining DNA sequence may be used in the present method.


The DNA may be sequenced using vector-based primers; or a specific gene is sought by using specific primers. PCR and sequencing techniques are well known in the art; reagents and equipment are readily available commercially.


Non-limiting examples of sequencing methods include Sanger sequencing or chain termination sequencing, Maxam-Gilbert sequencing, capillary array DNA sequencing, thermal cycle sequencing (Sears et al., Biotechniques, 13:626-633 (1992)), solid-phase sequencing (Zimmerman et al., Methods Mol. Cell Biol., 3:39-42 (1992)), sequencing with mass spectrometry such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS; Fu et al., Nat. Biotechnol., 16:381-384 (1998)), sequencing by hybridization (Chee et al., Science, 274:610-614 (1996); Drmanac et al., Science, 260:1649-1652 (1993); Drmanac et al., Nat. Biotechnol., 16:54-58 (1998)), NGS (next-generation sequencing) (Chen et al., Genome Res. 18:1143-1149 (2008); Srivatsan et al. PLoS Genet. 4:e1000139 (2008)), Polony sequencing (Porreca et al., Curr. Protoc. Mol. Biol. Chp. 7; 7.8 (2006)), ion semiconductor sequencing (Elliott et al., J. Biomol Tech. 1:24-30 (2010)), DNA nanoball sequencing (Kaji et al., Chem Soc Rev 39:948-56 (2010)), single-molecule real-time sequencing (Flusberg et al., Nat. Methods 7:461-5 (2010)), sequencing by synthesis (e.g., Illumina/Solexa sequencing), sequencing by ligation, nanopore DNA sequencing (Wanunu, Phys Life Rev 9:125-58 (2012)), massively parallel signature sequencing (MPSS), pyrosequencing, SOLiD sequencing (McKernan et al. 2009 Genome Res 19:1527-1541; Shearer et al. 2010 Proc Natl Acad Sci USA 107:21104-21109), shotgun sequencing, Heliscope single molecule sequencing, and single molecule real time (SMRT) sequencing. U.S. Patent Publication No. 20140329705.


High-throughput sequencing, next-generation sequencing (NGS), and/or deep-sequencing technologies include, but are not limited to, Illumina/Solexa sequencing technology (Bentley et al. 2008 Nature 456:53-59), Roche/454 (Margulies et al. 2005 Nature 437:376-380), Pacbio (Flusberg et al. 2010 Nature methods 7:461-465; Korlach et al. 2010 Methods in enzymology 472:431-455; Schadt et al. 2010 Nature reviews. Genetics 11:647-657; Schadt et al. 2010 Human molecular genetics 19:R227-240; Eid et al. 2009 Science 323:133-138; Imelfort and Edwards, 2009 Briefings in bioinformatics 10:609-618), Ion Torrent (Rothberg et al. 2011 Nature 475:348-352) and more. For example, Polony technology utilizes a single step to generate billions of “distinct clones” for sequencing. As another example, ion-sensitive field-effect transistor (ISFET) sequencing technology provides a non-optically based sequencing technique. U.S. Patent Publication No. 20140329712.


Several methods of DNA analysis are encompassed in the present disclosure. As used herein “deep sequencing” indicates that the depth of the process is many times larger than the length of the sequence under study. Deep sequencing is encompassed in next generation sequencing methods which include but are not limited to single molecule real-time sequencing (Pacific Biosciences), ion semiconductor sequencing (Ion Torrent), pyrosequencing (454), sequencing by synthesis (Illumina), sequencing by ligation (SOLiD sequencing) and chain termination (Sanger sequencing).
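
The notion of depth used above can be made concrete with the usual mean-coverage estimate, i.e., the total number of sequenced bases divided by the length of the sequence under study. The sketch below uses that standard estimate; the function name and example numbers are illustrative assumptions rather than values from the present disclosure.

```python
def mean_depth(n_reads, read_length, target_length):
    """Average number of times each target base is covered by reads."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return n_reads * read_length / target_length


# Illustrative numbers: 1,000 reads of 250 nt over a 250-nt amplicon.
example_depth = mean_depth(1000, 250, 250)
```

For an amplicon whose length matches the read length, as in this example, the mean depth simply equals the number of reads.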


Sequencing reads may be first subjected to quality control to identify overrepresented sequences and low-quality ends. The start and/or end of a read may or may not be trimmed. Sequences mapping to the genome may be removed and excluded from further analysis. As used herein, the term “read” refers to the sequence of a DNA fragment obtained after sequencing. In certain embodiments, the reads are paired-end reads, where the DNA fragment is sequenced from both ends of the molecule.
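
As a non-limiting sketch of the optional end-trimming step, the following function removes bases whose quality score falls below a threshold from the start and end of a read. The threshold of 20 and the representation of qualities as a list of integer Phred scores are assumptions made for illustration; dedicated trimming tools typically apply more elaborate (e.g., sliding-window) criteria.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases with quality below `min_q` from both ends of a read.

    `seq` is the base string and `quals` the matching list of integer
    quality scores. Returns the trimmed (sequence, qualities) pair.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and qualities must have equal length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Example with made-up qualities: one poor base at the start, two at the end.
example_trimmed = trim_low_quality_ends("ACGTAC", [5, 30, 30, 30, 10, 2])
```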


The level of the DNA or RNA (e.g., mRNA) molecules may be determined/detected using routine methods known to those of ordinary skill in the art. The level of the nucleic acid molecule may be determined/detected by nucleic acid hybridization using a nucleic acid probe, or by nucleic acid amplification using one or more nucleic acid primers.


Nucleic acid hybridization can be performed using Southern blots, Northern blots, nucleic acid microarrays, etc.


Nucleic acid microarray technology, which is also known as DNA chip technology, gene chip technology, and solid-phase nucleic acid array technology, may be based on, but not limited to, obtaining an array of identified nucleic acid probes on a fixed substrate, labeling target molecules with reporter molecules (e.g., radioactive, chemiluminescent, or fluorescent tags such as fluorescein, Cy3-dUTP, or Cy5-dUTP, etc.), hybridizing target nucleic acids to the probes, and evaluating target-probe hybridization. Jackson et al. (1996) Nature Biotechnology, 14: 1685-1691. Chee et al. (1996) Science, 274: 610-614.


The sensitivity of the assays may be enhanced through use of a nucleic acid amplification system that multiplies the target nucleic acid being detected.


Nucleic acid amplification assays include, but are not limited to, the polymerase chain reaction (PCR), reverse transcription polymerase chain reaction (RT-PCR), real-time RT-PCR, quantitative RT-PCR, etc.


Measuring or detecting the amount or level of mRNA in a sample can be performed in any manner known to one skilled in the art, and such techniques are well known and can be readily employed. A variety of methods for detecting mRNAs have been described and may include Northern blotting, microarrays, real-time PCR, RT-PCR, targeted RT-PCR, in situ hybridization, deep-sequencing, single-molecule direct RNA sequencing (RNAseq), bioluminescent methods, bioluminescent protein reassembly, BRET (bioluminescence resonance energy transfer)-based methods, fluorescence correlation spectroscopy and surface-enhanced Raman spectroscopy (Cissell, K. A. and Deo, S. K. (2009) Anal. Bioanal. Chem., 394:1109-1116).


The methods of the present invention may include the step of reverse transcribing RNA when assaying the level or amount of an mRNA.


Sequencing reads (e.g., the quality-corrected reads) may be mapped onto the genome of the microbe using any alignment algorithms known in the art. Non-limiting examples of such mapping algorithms include Bowtie; Bowtie2 (Langmead et al. 2009; Langmead et al., Fast gapped-read alignment with Bowtie 2. Nature methods 9(4), 357-9 (2012)); Burrows-Wheeler Aligner (BWA, see, Li et al., Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26(5), 589-95 (2010)); SOAP2 (Li et al., SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 25(15), 1966-7 (2009)); GATK; SMRA; PINDEL; SNAP (Zaharia et al., Faster and More Accurate Sequence Alignment with SNAP, arXiv:1111.5572 (2011)); TMAP1-4; SMALT; and Masai (Siragusa et al., Fast and sensitive read mapping with approximate seeds and multiple backtracking. CoRR abs/1208.4238 (2012)). A recent overview of the alignment algorithms can be found in Li et al., A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics 2010, 11(5), 473-483. U.S. Patent Publication Nos. 20140214334, 20140108323 and 20140315726.


Mathematical algorithms that can be used for alignment also include the algorithm of Myers and Miller (1988) CABIOS 4:11-17; the local alignment algorithm of Smith et al. (1981) Adv. Appl. Math. 2:482; the global alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443-453; the search-for-local alignment method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85:2444-2448; the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877. Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine optimum alignment. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys Inc., 9685 Scranton Road, San Diego, Calif., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. (1988) Gene 73:237-244; Higgins et al. (1989) CABIOS 5:151-153; Corpet et al. (1988) Nucleic Acids Res. 16:10881-90; Huang et al. (1992) CABIOS 8:155-65; and Pearson et al. (1994) Meth. Mol. Biol. 24:307-331. The ALIGN program is based on the algorithm of Myers and Miller (1988) supra. A PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used with the ALIGN program when comparing amino acid sequences. The BLAST programs of Altschul et al. (1990) J. Mol. Biol. 215:403 are based on the algorithm of Karlin and Altschul (1990) supra. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. 
Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. In another embodiment, GSNAP (Thomas D. Wu, Serban Nacu, “Fast and SNP-tolerant detection of complex variants and splicing in short reads,” Bioinformatics 26(7):873-81 (2010)) can also be used.
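
To illustrate the local alignment approach of Smith et al. referenced above, the following is a minimal dynamic-programming sketch that returns only the best local alignment score. It assumes a simple linear gap penalty and illustrative scoring values (match +2, mismatch -1, gap -2); these parameters are not taken from the present disclosure, and the sketch is not a substitute for the optimized implementations listed above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the score matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Example with made-up sequences sharing the local segment "ACG".
example_score = smith_waterman_score("TTACGTTT", "GGACGGG")
```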


Algorithms and parameters for alignment can be adjusted depending on the type of bacteria selected, the type of target sequence being characterized, etc.


Mapped reads may be post-processed by removing PCR duplicates (multiple, identical reads), etc.
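
A minimal sketch of collapsing identical reads is shown below. It treats exact sequence identity as the duplicate criterion, which is an assumption for illustration; position-aware duplicate marking on mapped reads works differently, and for amplicon data identical reads may reflect genuine abundance, so whether to collapse them depends on the analysis.

```python
from collections import Counter


def collapse_duplicates(reads):
    """Return the unique reads in first-seen order and their counts."""
    counts = Counter(reads)
    return list(counts), counts


# Example with made-up reads.
example_unique, example_counts = collapse_duplicates(["ACGT", "ACGT", "TTGA"])
```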


Organisms


The organism may be a eukaryotic organism, including human and non-human eukaryotic organisms. The organism may be a multicellular eukaryotic organism. The organism may be an arthropod such as an insect. The organism also may be a plant or a fungus. The organism may be prokaryotic.


In one embodiment, the cell is a mammalian cell, such as a human cell. Human cells may include human embryonic kidney cells (e.g., HEK293T cells), human dermal fibroblasts, human cancer cells, etc.


In another embodiment, the cell is a yeast cell. The organism may be a yeast. In yet another embodiment, the cell is a bacterial cell. The organism may be bacteria.


Molecular Biology


In accordance with the present invention, there may be numerous tools and techniques within the skill of the art, such as those commonly used in molecular immunology, cellular immunology, pharmacology, and microbiology. See, e.g., Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual. 3rd ed. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, N.Y.; Ausubel et al. eds. (2005) Current Protocols in Molecular Biology. John Wiley and Sons, Inc.: Hoboken, N.J.; Bonifacino et al. eds. (2005) Current Protocols in Cell Biology. John Wiley and Sons, Inc.: Hoboken, N.J.; Coligan et al. eds. (2005) Current Protocols in Immunology, John Wiley and Sons, Inc.: Hoboken, N.J.; Coico et al. eds. (2005) Current Protocols in Microbiology, John Wiley and Sons, Inc.: Hoboken, N.J.; Coligan et al. eds. (2005) Current Protocols in Protein Science, John Wiley and Sons, Inc.: Hoboken, N.J.; and Enna et al. eds. (2005) Current Protocols in Pharmacology, John Wiley and Sons, Inc.: Hoboken, N.J.


The terms used in this specification generally have their ordinary meanings in the art, within the context of this invention and the specific context where each term is used. Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the methods of the invention and how to use them. Moreover, it will be appreciated that the same thing can be said in more than one way. Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, nor is any special significance to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of the other synonyms. The use of examples anywhere in the specification, including examples of any terms discussed herein, is illustrative only, and in no way limits the scope and meaning of the invention or any exemplified term. Likewise, the invention is not limited to its preferred embodiments.


As used herein, the term “isolated” and the like means that the referenced material is free of components found in the natural environment in which the material is normally found. In particular, isolated biological material is free of cellular components. In the case of nucleic acid molecules, an isolated nucleic acid includes a PCR product, an isolated mRNA, a cDNA, an isolated genomic DNA, or a restriction fragment. In another embodiment, an isolated nucleic acid is preferably excised from the chromosome in which it may be found. Isolated nucleic acid molecules can be inserted into plasmids, cosmids, artificial chromosomes, and the like. Thus, in a specific embodiment, a recombinant nucleic acid is an isolated nucleic acid. An isolated protein may be associated with other proteins or nucleic acids, or both, with which it associates in the cell, or with cellular membranes if it is a membrane-associated protein. An isolated material may be, but need not be, purified.


The term “purified” and the like as used herein refers to material that has been isolated under conditions that reduce or eliminate unrelated materials, i.e., contaminants. For example, a purified protein is preferably substantially free of other proteins or nucleic acids with which it is associated in a cell; a purified nucleic acid molecule is preferably substantially free of proteins or other unrelated nucleic acid molecules with which it can be found within a cell. As used herein, the term “substantially free” is used operationally, in the context of analytical testing of the material. Preferably, purified material substantially free of contaminants is at least 50% pure; more preferably, at least 90% pure, and more preferably still at least 99% pure. Purity can be evaluated by chromatography, gel electrophoresis, immunoassay, composition analysis, biological assay, and other methods known in the art.


The terms “expression profile” and “gene expression profile” refer to any description or measurement of one or more of the genes that are expressed by a cell, tissue, or organism under or in response to a particular condition. Expression profiles can identify genes that are up-regulated, down-regulated, or unaffected under particular conditions. Gene expression can be detected at the nucleic acid level or at the protein level. The expression profiling at the nucleic acid level can be accomplished using any available technology to measure gene transcript levels. For example, the method could employ in situ hybridization, Northern hybridization or hybridization to a nucleic acid microarray, such as an oligonucleotide microarray, or a cDNA microarray. Alternatively, the method could employ reverse transcriptase-polymerase chain reaction (RT-PCR) such as fluorescent dye-based quantitative real time PCR (TaqMan® PCR). In the Examples section provided below, nucleic acid expression profiles were obtained using Affymetrix GeneChip® oligonucleotide microarrays. The expression profiling at the protein level can be accomplished using any available technology to measure protein levels, e.g., using peptide-specific capture agent arrays.


The terms “gene signature” and “signature genes” will be used interchangeably herein and mean the particular transcripts that have been found to be differentially expressed in some prostate cancer patients.


The terms “gene”, “gene transcript”, and “transcript” are used interchangeably in the application. The term “gene”, also called a “structural gene” means a DNA sequence that codes for or corresponds to a particular sequence of amino acids which comprise all or part of one or more proteins or enzymes, and may or may not include regulatory DNA sequences, such as promoter sequences, which determine for example the conditions under which the gene is expressed. Some genes, which are not structural genes, may be transcribed from DNA to RNA, but are not translated into an amino acid sequence. Other genes may function as regulators of structural genes or as regulators of DNA transcription. “Transcript” or “gene transcript” is a sequence of RNA produced by transcription of a particular gene. Thus, the expression of the gene can be measured via the transcript.


The term “genomic DNA” as used herein means all DNA from a subject including coding and non-coding DNA, and DNA contained in introns and exons.


The term “nucleic acid hybridization” refers to anti-parallel hydrogen bonding between two single-stranded nucleic acids, in which A pairs with T (or U if an RNA nucleic acid) and C pairs with G. Nucleic acid molecules are “hybridizable” to each other when at least one strand of one nucleic acid molecule can form hydrogen bonds with the complementary bases of another nucleic acid molecule under defined stringency conditions. Stringency of hybridization is determined, e.g., by (i) the temperature at which hybridization and/or washing is performed, and (ii) the ionic strength and (iii) concentration of denaturants such as formamide of the hybridization and washing solutions, as well as other parameters. Hybridization requires that the two strands contain substantially complementary sequences. Depending on the stringency of hybridization, however, some degree of mismatches may be tolerated. Under “low stringency” conditions, a greater percentage of mismatches are tolerable (i.e., will not prevent formation of an anti-parallel hybrid).
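
The anti-parallel base-pairing rule described above can be illustrated with a short sketch that lays one strand anti-parallel to another and counts mismatched positions. The function names are hypothetical, equal-length strands are assumed, and a simple mismatch count is used only as a stand-in for stringency; actual hybridization behavior also depends on temperature, ionic strength, and denaturant concentration as noted above.

```python
# Maps each base to its Watson-Crick partner, reported in the DNA alphabet
# (U is treated like T).
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatches_in_duplex(strand, other):
    """Count mismatches when `other` is paired anti-parallel to `strand`."""
    if len(strand) != len(other):
        raise ValueError("strands must have equal length")
    expected = reverse_complement(other)
    observed = strand.upper().replace("U", "T")
    return sum(x != y for x, y in zip(observed, expected))


# Example with made-up 4-nt strands.
perfect = mismatches_in_duplex("AAGC", "GCTT")
one_off = mismatches_in_duplex("AAGC", "GCTA")
```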


The terms “vector”, “cloning vector” and “expression vector” mean the vehicle by which a DNA or RNA sequence (e.g. a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g. transcription and translation) of the introduced sequence. Vectors include, but are not limited to, plasmids, phages, and viruses.


Vectors typically comprise the DNA of a transmissible agent, into which foreign DNA is inserted. A common way to insert one segment of DNA into another segment of DNA involves the use of enzymes called restriction enzymes that cleave DNA at specific sites (specific groups of nucleotides) called restriction sites. A “cassette” refers to a DNA coding sequence or segment of DNA which codes for an expression product that can be inserted into a vector at defined restriction sites. The cassette restriction sites are designed to ensure insertion of the cassette in the proper reading frame. Generally, foreign DNA is inserted at one or more restriction sites of the vector DNA, and then is carried by the vector into a host cell along with the transmissible vector DNA. A segment or sequence of DNA having inserted or added DNA, such as an expression vector, can also be called a “DNA construct” or “gene construct.” A common type of vector is a “plasmid”, which generally is a self-contained molecule of double-stranded DNA, usually of bacterial origin, that can readily accept additional (foreign) DNA and which can readily be introduced into a suitable host cell. A plasmid vector often contains coding DNA and promoter DNA and has one or more restriction sites suitable for inserting foreign DNA. Coding DNA is a DNA sequence that encodes a particular amino acid sequence for a particular protein or enzyme. Promoter DNA is a DNA sequence which initiates, regulates, or otherwise mediates or controls the expression of the coding DNA. Promoter DNA and coding DNA may be from the same gene or from different genes, and may be from the same or different organisms. A large number of vectors, including plasmid and fungal vectors, have been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts. 
Non-limiting examples include pKK plasmids (Clontech), pUC plasmids, pET plasmids (Novagen, Inc., Madison, Wis.), pRSET or pREP plasmids (Invitrogen, San Diego, Calif.), or pMAL plasmids (New England Biolabs, Beverly, Mass.), and many appropriate host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art. Recombinant cloning vectors will often include one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g. antibiotic resistance, and one or more expression cassettes.


A “polynucleotide” or “nucleotide sequence” is a series of nucleotide bases (also called “nucleotides”) in a nucleic acid, such as DNA and RNA, and means any chain of two or more nucleotides. A nucleotide sequence typically carries genetic information, including the information used by cellular machinery to make proteins and enzymes. These terms include double or single stranded genomic and cDNA, RNA, any synthetic and genetically manipulated polynucleotide, and both sense and anti-sense polynucleotides. This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids, as well as “peptide nucleic acids” (PNA) formed by conjugating bases to an amino acid backbone. This also includes nucleic acids containing modified bases, for example thio-uracil, thio-guanine and fluoro-uracil.


“Nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. The nucleic acids herein may be flanked by natural regulatory (expression control) sequences, or may be associated with heterologous sequences, including promoters, internal ribosome entry sites (IRES) and other ribosome binding site sequences, enhancers, response elements, suppressors, signal sequences, polyadenylation sequences, introns, 5′- and 3′-non-coding regions, and the like. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. The nucleic acids may also be modified by many means known in the art. Non-limiting examples of such modifications include methylation, “caps”, substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoroamidates, and carbamates) and with charged linkages (e.g., phosphorothioates, and phosphorodithioates). Polynucleotides may contain one or more additional covalently linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, and poly-L-lysine), intercalators (e.g., acridine, and psoralen), chelators (e.g., metals, radioactive metals, iron, and oxidative metals), and alkylators. The polynucleotides may be derivatized by formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Modifications of the ribose-phosphate backbone may be done to facilitate the addition of labels, or to increase the stability and half-life of such molecules in physiological environments. 
Nucleic acid analogs can find use in the methods of the invention as well as mixtures of naturally occurring nucleic acids and analogs. Furthermore, the polynucleotides herein may also be modified with a label capable of providing a detectable signal, either directly or indirectly. Exemplary labels include radioisotopes, fluorescent molecules, and biotin.


The term “polypeptide” as used herein means a compound of two or more amino acids linked by a peptide bond. “Polypeptide” is used herein interchangeably with the term “protein.”


The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system, i.e., the degree of precision required for a particular purpose, such as a pharmaceutical formulation. For example, “about” can mean within 1 or more than 1 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated, the term “about” meaning within an acceptable error range for the particular value should be assumed.


Tissue Fixation/Immobilization of Sample


In one embodiment, a tissue section is fixed and embedded in a gel matrix by in situ perfusion and acrylamide polymerization. Other methods of tissue fixation include using methyl methacrylate and glycol methacrylate, also referred to as Technovit® (emsdiasum.com/microscopy/technical/datasheet/14654 immunohistochemistry.aspx, retrieved, Mar. 26, 2019; see also, Hasegawa et al. Preservation of three-dimensional spatial structure in the gut microbiome, biorxiv.org/content/biorxiv/early/2017/08/11/175224.full.pdf, retrieved, Mar. 26, 2019). Tissues can also be fixed using a combination of sodium acrylate, a monomer used to produce superabsorbent materials, along with the comonomer acrylamide and the crosslinker N,N′-methylenebisacrylamide, such as that used with expansion microscopy. Chen et al. Expansion microscopy. Science 347(6221):543-548 (2015). Other techniques for tissue fixation include nanoporous hydrogel-fixation, also referred to as CLARITY. Chung et al. Structural and molecular interrogation of intact biological systems. Nature 497:332-337 (2013).


Metagenomic Plot Sampling by Sequencing (MaP-Seq)


MaP-seq was applied to the mouse colonic microbiome. The methods and systems of the present invention could be applied to any structural, anatomic system, including, but not limited to, the brain (central nervous system), the pulmonary system (the lungs, bronchi and alveoli), the genitourinary tract, including, but not limited to, the kidneys, ureters, bladder, urethra, ovaries, testicles, prostate, penis and vagina, the peripheral vascular and cardiovascular systems, including, but not limited to, the arteries (coronary, pulmonary, aorta, femoral, carotid, basilar), veins (pulmonary, vena cava, femoral), heart (left ventricle, right ventricle, left atrium, right atrium), the gastrointestinal system such as the esophagus, stomach (including, but not limited to, the fundus and pyloric valve), the liver, gallbladder, small intestines (ileum and jejunum), large intestines (colon), the eye and the skin. The methods and systems of the present invention could be applied to any mammalian or non-mammalian species, including, but not limited to, rats, mice, canines, felines, cows, sheep, horses, goats, birds, humans (cadaver material), reptiles and fish.


The methods and systems of the present invention could also be applied to any three-dimensional structure such as a solid tumor of any organ, including, but not limited to, bladder, bone, colon, esophagus, salivary glands, kidney, lung, Central Nervous System, Neuroendocrine System, ovaries, prostate, testicles, soft tissue and skin.


The methods and systems of the present invention could also be applied to biofilms.


We generated and characterized cell clusters (˜30 μm median diameter) from a segment of the distal colon (including both epithelium and digesta) of a mouse fed a plant-polysaccharide diet, yielding 1,406 clusters passing strict quality filtering across two technical replicates (FIG. 2a, FIG. 3a; Methods). Other cell cluster sizes are encompassed by the methods and systems of the invention, including ˜10 μm, ˜20 μm, ˜25 μm, ˜35 μm, ˜40 μm, ˜50 μm, ˜60 μm, ˜70 μm, ˜80 μm, ˜90 μm or ˜100 μm. Additional sizes range from ˜100 μm to ˜500 μm. 236 total OTUs were identified with their prevalence across clusters highly correlating to bulk abundance obtained by standard 16S sequencing, implying that more abundant taxa are also physically dispersed over more space (FIG. 2b, Pearson correlation r=0.90). The spatial distribution of taxa across clusters appeared mixed (median 9 OTUs per cluster), but some clusters contained only a few OTUs, indicating spatial aggregation or clumping in a fraction of the community (FIG. 2c). Moreover, this observed distribution of OTUs per cluster was significantly lower than that of clusters of the same size generated from homogenized fecal bacteria, which serve as a control for a well-mixed community (Mann-Whitney U test, p<10^-26). These results suggest that at the scale of tens of microns, individual taxa in the gut microbiome are neither fully mixed nor highly structured, but rather are heterogeneously distributed in mixed patches.


We next explored whether these observed spatial distributions reflect specific associations between individual bacterial taxa that may result from processes such as positive or negative interspecies interactions (e.g., cooperative metabolism (see Rakoff-Nahoum, S., Coyne, M. J. & Comstock, L. E. An Ecological Network of Polysaccharide Utilization among Human Intestinal Symbionts. Current Biology 24, 40-49 (2014)); contact-dependent killing (see Wexler, A. G. et al. (2016))) or local habitat filtering (see Nagara, Y., Takada, T., Nagata, Y., Kado, S. & Kushiro, A. (2017)). Across abundant and prevalent OTUs (>2% abundance in >10% of clusters, n=24), we assessed whether their pairwise co-occurrences were detected more or less frequently than expected in comparison to a null model of independent, random assortment of OTUs (Methods, Fisher's exact test, p<0.05, FDR=0.05). Application of this strategy to the cluster mixing control experiment confirmed our ability to accurately detect positive and negative spatial associations that are expected (FIG. 4f). Out of 276 possible pairwise combinations of taxa in the murine colon, we detected 75 statistically significant associations between diverse taxa, the majority of which were positive (72/75) but relatively weak in magnitude (FIG. 2d, FIG. 3b-c). The strongest co-occurrence was a positive association between abundant Bacteroidaceae and Porphyromonadaceae taxa from the Bacteroidales order (odds ratio 3.9, p<10^-23). In addition, a small number of negative associations were observed, which could reflect antagonistic processes such as production of inhibitory factors or competitive exclusion.
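By way of non-limiting illustration, the pairwise co-occurrence test described above may be sketched as follows. The sketch assumes each cluster is reduced to the set of OTUs detected in it, builds a 2x2 presence/absence table for one pair of OTUs, and evaluates it with a two-sided Fisher's exact test written directly from the hypergeometric distribution. The function names, the toy clusters and the OTU labels are hypothetical and are not taken from the Methods.

```python
from math import comb

def cooccurrence_table(clusters, otu_a, otu_b):
    """Count clusters with both OTUs (a), only A (b), only B (c), or neither (d)."""
    a = b = c = d = 0
    for members in clusters:
        in_a, in_b = otu_a in members, otu_b in members
        if in_a and in_b:
            a += 1
        elif in_a:
            b += 1
        elif in_b:
            c += 1
        else:
            d += 1
    return a, b, c, d

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    (with the observed margins) that are no more likely than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when an off-diagonal cell is empty."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)

# Toy data (hypothetical): eight clusters, each a set of detected OTUs.
clusters = [
    {"OTU_A", "OTU_B"}, {"OTU_A", "OTU_B"}, {"OTU_A", "OTU_B"},
    {"OTU_A"}, {"OTU_B"}, {"OTU_C"}, {"OTU_C"}, set(),
]
table = cooccurrence_table(clusters, "OTU_A", "OTU_B")
p_value = fisher_exact_two_sided(*table)
ratio = odds_ratio(*table)

# 24 OTUs give comb(24, 2) = 276 unordered pairs, matching the count in the text.
n_pairs = comb(24, 2)
```

An odds ratio above 1 corresponds to a positive association (the pair co-occurs more often than expected under independent assortment), and a ratio below 1 to a negative association.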


The number of detected associations increased as more of the dataset was sampled, implying that detection of weaker relationships between less abundant taxa can be improved by analyzing more clusters (FIG. 3d). Nonetheless, the detected associations showed good correspondence between technical replicates (FIG. 3e). Importantly, despite high inter-host microbiome variability, the nature of the associations (i.e., sign, magnitude, and number) and some strong associations could be recapitulated in MaP-seq profiling of a second co-housed mouse, such as the co-occurrence of Bacteroidales taxa. This characterization implies that individual taxa in the colon are organized in distinct and reproducible spatial relationships.


To further investigate how the spatial organization of the microbiota is influenced by their environmental context, we applied spatial metagenomics along the gastrointestinal (GI) tract. The mammalian GI tract is composed of distinct anatomical regions with different pH levels, oxygen concentrations, host-derived antimicrobials and transit times that together influence the local microbiota assemblage (see Donaldson, G. P., Lee, S. M. & Mazmanian, S. K. Gut biogeography of the bacterial microbiota. 1-13 (2015). doi:10.1038/nrmicro3552). We first performed an adapted 16S community profiling approach along the murine GI tract that could also infer absolute OTU abundances (see Ji, B. W. et al. Quantifying spatiotemporal dynamics and noise in absolute microbiota abundances using replicate sampling. biorxiv.org doi:10.1101/310649 (2018)) (FIG. 5a; Methods). This new mouse cohort (2 co-housed mice) shared only ˜20% of OTUs with the previous group, illustrating the significant inter-animal microbiome heterogeneity inherent to such studies. This further highlights challenges for other spatial profiling techniques such as 16S FISH imaging where probes must be designed in advance, in comparison to MaP-seq, which can be applied to measure diverse bacteria without advance specification. Analysis of microbiota in absolute abundance across the intestine revealed increased bacterial density (˜16 fold higher) and species richness in the large intestine compared to the small intestine, with the cecum harboring the highest bacterial density and number of OTUs. We chose three separate GI regions that exhibited distinct microbiota assemblages for characterization by MaP-seq: the ileum (si6), cecum (cec) and distal colon (co2). Given the high degree of species mixing previously observed at ˜30 μm, we used smaller sized clusters (˜20 μm median diameter) to capture higher-resolution spatial associations. Cluster sizes can include 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 400, 600, 800 or 1000 μm.


The distribution of OTUs per cluster was compared with the spatial organization of taxa in the three regions (FIG. 5a). ˜20 μm clusters displayed lower numbers of OTUs per cluster than ˜30 μm clusters (median 3-4 OTUs per cluster). The ileum possessed significantly fewer OTUs per cluster than the cecum or distal colon (Mann-Whitney U test, p<10^-18 and p<10^-14 respectively). By comparison, the cecum and colon displayed similar OTU distributions, while the cecum harbored more clusters with a large number of OTUs.


To understand how the local spatial organization of the microbiome may vary within and across different gut compartments, we visualized the cell cluster data across the three gut regions using t-distributed Stochastic Neighbor Embedding (tSNE, utilizing Bray-Curtis distance of OTU relative abundance within clusters), as well as the abundance of prevalent bacterial families in cell clusters across the resulting manifold (FIG. 5c; Methods, FIG. 3c). While some cell clusters from the ileum, cecum and distal colon separately projected into distinct groups, other clusters from each site projected more broadly across the manifold. Interestingly, a subset of cell clusters from the cecum projected into a dense group and were compositionally dominated by Lachnospiraceae, which were generally not present in clusters from the ileum or distal colon. When cell clusters from a second co-housed mouse were added to the tSNE analysis, they were distributed in a similar manner to clusters from the first mouse across the manifold and displayed a similar cecum-specific Lachnospiraceae group.
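As a non-limiting sketch of the distance underlying the tSNE embedding described above, the Bray-Curtis dissimilarity between two clusters can be computed from their OTU relative-abundance profiles as the sum of absolute differences divided by the sum of all abundances. The profiles and names below are hypothetical; the resulting square matrix is the kind of input that a tSNE implementation accepting precomputed distances could take, although the embedding parameters actually used are not specified here.

```python
def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts OTU -> abundance)."""
    otus = set(p) | set(q)
    numerator = sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in otus)
    denominator = sum(p.get(o, 0.0) + q.get(o, 0.0) for o in otus)
    return numerator / denominator

def distance_matrix(profiles):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = bray_curtis(profiles[i], profiles[j])
    return matrix

# Hypothetical relative-abundance profiles for three clusters.
profiles = [
    {"OTU_A": 1.0},
    {"OTU_A": 0.5, "OTU_B": 0.5},
    {"OTU_C": 1.0},
]
distances = distance_matrix(profiles)
```

Clusters with identical composition have a dissimilarity of 0, and clusters sharing no OTUs have a dissimilarity of 1.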


Next, we explored whether these different spatial distributions reflect distinct spatial co-associations between taxa at each GI site (FIG. 5d). The ileum harbored a network of positive and negative associations between the few taxa present. On the other hand, the cecum exhibited a dense network of positively co-associated taxa, primarily between abundant Lachnospiraceae, Ruminococcaceae, and Porphyromonadaceae. Similar to the cecum, the distal colon displayed only positive associations, including strong groupings between three abundant Porphyromonadaceae (OTUs 5, 8, 9). Profiling the colon at an even smaller size-scale (˜7 μm) confirmed strong positive associations between these three taxa (FIG. 6), indicating that this spatial clustering occurs robustly at short, local length-scales.


We further investigated whether MaP-seq could identify individual taxa with unique or altered spatial patterns. While the cecum harbored the densest community and the highest degree of species mixing of the three sites (FIG. 5a-b), we hypothesized that specific taxa may self-aggregate to a higher degree than others, for example by uniquely utilizing a specific metabolite (see Nagara, Y., Takada, T., Nagata, Y., Kado, S. & Kushiro, A. (2017)). Assessing the aggregation of abundant taxa revealed a Lachnospiraceae (OTU 7; putatively of the genus Dorea, 60% confidence by RDP) that clustered two-fold greater than the average clustering metric value of all taxa (FIG. 7a). To validate this finding with an orthogonal approach, we performed 16S FISH on GI sections from the same murine sample using previously validated probes that targeted Lachnospiraceae (Erec482) as well as two other abundant taxa for which FISH probes were available but were predicted not to cluster at a similar degree (Coriobacteriaceae: Ato291, Lactobacillaceae: Lab148; Methods). Strikingly, imaging confirmed that while Lachnospiraceae were distributed across the cecum, they also formed large clustered aggregates that appeared to exclude other bacteria (FIGS. 7-8). Importantly, this result highlights that individual taxa in the gut can organize in unique and spatially varying micron-scale structures that can be revealed by using MaP-seq.


Having established the local spatial organization across the GI tract of mice fed a standard plant-polysaccharide diet, we next sought to understand the extent to which diet might influence spatial structuring. Diet is known to play a major role in shaping the variation of gut microbiota across individuals (see Carmody, R. N. et al. Diet Dominates Host Genotype in Shaping the Murine Gut Microbiota. Cell Host & Microbe 17, 72-84 (2015); Sonnenburg, E. D. et al. Diet-induced extinctions in the gut microbiota compound over generations. Nature 529, 212-215 (2016)). While diet shifts can rapidly alter microbiota composition within days (see David, L. A. et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559-563 (2014)), the detailed ecological mechanisms underlying these community-scale changes are not well understood. We thus took co-housed mice and split them into two cohorts where one was maintained on the plant-polysaccharide-based diet (LF, same as in the previous cohorts) and one was switched to a high fat, high sugar diet (HF, commonly utilized in dietary-induced obesity studies) to assess microbiota changes associated with these two diets representing distinct macronutrient profiles. After 10 days on the two diets, a considerable loss of species richness in the cecum and colon was observed in HF-fed mice compared to LF-fed mice (FIG. 9).


To determine if a dietary shift could alter the spatial organization of the microbiota, which could contribute to the observed loss of species diversity, we performed MaP-seq on distal colon samples from mice fed the LF or HF diet. We found that the distribution of unique OTUs per ˜20 μm cluster was similar between both diets (FIG. 9b, top). This implies that species distributions at the local ˜20 μm scale are governed by factors that are either common to or not affected by the two diets, for example spatial autocorrelation of bacterial growth. However, assessing diversity at the higher taxonomic family-rank revealed significantly higher diversity in HF clusters (Mann-Whitney U test, p<10^-22; FIG. 9b, bottom), indicating that while both LF and HF clusters contained similar numbers of OTUs, taxa within individual HF clusters were more phylogenetically diverse. Furthermore, positive co-associations were more frequently observed between diverse taxa in HF diet than in LF diet, which in contrast had co-associations mostly between Porphyromonadaceae or Lachnospiraceae.
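The distinction drawn above between OTU-level and family-level diversity can be illustrated with a short, non-limiting sketch. It assumes that diversity is counted as the number of distinct OTUs, or distinct families, detected in a cluster; the exact diversity measure, the OTU labels and the family assignments below are assumptions made for illustration only.

```python
def richness(members, family_of=None):
    """Number of distinct OTUs in a cluster, or of distinct families
    when a mapping from OTU to family is supplied."""
    if family_of is None:
        return len(set(members))
    return len({family_of[otu] for otu in members})

# Hypothetical OTU-to-family assignments.
family_of = {
    "otu_p1": "Porphyromonadaceae",
    "otu_p2": "Porphyromonadaceae",
    "otu_p3": "Porphyromonadaceae",
    "otu_b1": "Bacteroidaceae",
    "otu_l1": "Lachnospiraceae",
}

# Two toy clusters with the same number of OTUs but different family composition.
lf_like_cluster = {"otu_p1", "otu_p2", "otu_p3"}
hf_like_cluster = {"otu_p1", "otu_b1", "otu_l1"}

lf_otus, lf_families = richness(lf_like_cluster), richness(lf_like_cluster, family_of)
hf_otus, hf_families = richness(hf_like_cluster), richness(hf_like_cluster, family_of)
```

Both toy clusters contain three OTUs, yet the second spans three families while the first spans one, which is the pattern the family-rank comparison is designed to expose.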


Next, to compare the taxa spatial organization across the two diets, we visualized clusters using tSNE as before (FIG. 9d, FIG. 10). Cell clusters from the two diets each formed highly distinct groups with minimal overlap, indicating that the spatial organization in the distal colon was significantly altered by the dietary shift. Despite this overall separation, we observed examples of cluster configurations that were shared between the two diets. For example, HF clusters were observed in a predominantly LF region marked by high abundance of a Porphyromonadaceae taxon (OTU 5), and LF clusters were observed in a predominantly HF region marked by high abundance of a Bacteroidaceae taxon (OTU 6) (FIG. 10d).


The following are examples of the present invention and are not to be construed as limiting.


EXAMPLES
Example 1 Spatial Metagenomic Characterization of Microbial Biogeography in the Gut

Spatial structuring promotes biodiversity and is important to the maintenance of natural ecological systems1,2. Many microbial communities, including the mammalian gut microbiome, display intricate spatial organization3-9. Mapping spatial distributions of bacterial species enables the detailed delineation of fundamental ecological processes and interactions that underlie community-wide behaviors10-12. However, current approaches have a limited capacity to measure the spatial organization of natural microbiomes with hundreds of species13-17. Here, we describe spatial metagenomics, a framework to dissect the organization of a microbiome at micron-scale spatial resolution and metagenomic depth through nucleic acid “plot sampling”. Intact microbiome samples are immobilized within a gel matrix and subjected to cryo-fracturing to generate clusters of co-localized cells, and the identities and abundances of taxa present in these clusters are determined via droplet-based encapsulation and deep sequencing. Analysis of thousands of microbiome clusters from the mouse intestine across three distinct regions revealed heterogeneous microbial distributions with positive and negative co-associations between specific taxa. While the murine intestinal microbiome mostly exhibited regionally distinct spatial organizations, robust associations between Bacteroidales taxa were observed across gut compartments. Analysis of a dietary perturbation revealed phylogenetically clustered regions suggesting local habitat filtering that may be important to maintenance of diversity observed on plant-polysaccharide diets, and enabled identification of spatial niches that may be shared across distinct diets. Spatial metagenomics constitutes a powerful new culture-independent technique to mechanistically study microbial biogeography in complex habitats.


To perform MaP-seq, an input sample is first physically fixed by immobilizing the microbiota via perfusion and in situ polymerization of an acrylamide polymer matrix that also contains a covalently linked reverse 16S rRNA amplification primer. The embedded sample is then fractured via cryo-bead beating, subjected to cell lysis, and passed through nylon mesh filters for size selection to yield cell clusters or particles of desired and tunable physical sizes (i.e. by utilizing different mesh filter sizes). Resulting clusters contain genomic DNA immobilized in their original arrangement, preserving local spatial information. Next, a microfluidic device is used to co-encapsulate these clusters with gel beads, each containing uniquely barcoded forward 16S rRNA amplification primers. Primers are photocleaved from the beads and clusters, genomic DNA is released from clusters by triggered degradation of the polymer matrix within droplets, and PCR amplification of the 16S V4 region is performed. Droplets are then broken apart, and the resulting library is subjected to deep sequencing. Sequencing reads are filtered and grouped by their unique barcodes, which yield the identity and abundance of bacterial operational taxonomic units (OTUs) within individual cell clusters.
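The final step described above, grouping sequencing reads by barcode to obtain the identity and abundance of OTUs in each cluster, may be sketched as follows. This is a minimal illustration that assumes each filtered read has already been reduced to a (barcode, OTU) pair; the barcode strings, OTU labels and function names are hypothetical.

```python
from collections import Counter, defaultdict

def group_reads_by_barcode(reads):
    """Build one OTU count table per barcode from (barcode, otu) read assignments."""
    clusters = defaultdict(Counter)
    for barcode, otu in reads:
        clusters[barcode][otu] += 1
    return clusters

def relative_abundance(counts):
    """Convert raw read counts for one cluster into relative abundances."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}

# Hypothetical read assignments: each barcode stands for one encapsulated cluster.
reads = [
    ("BC001", "OTU_5"), ("BC001", "OTU_5"), ("BC001", "OTU_8"),
    ("BC002", "OTU_1"), ("BC001", "OTU_5"), ("BC002", "OTU_1"),
]
clusters = group_reads_by_barcode(reads)
profiles = {barcode: relative_abundance(c) for barcode, c in clusters.items()}
```

Each entry of `profiles` then represents one cell cluster, which is the unit of analysis for the co-occurrence and embedding steps described elsewhere herein.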


To rigorously test the feasibility of this spatial metagenomics approach, we first generated separate cluster communities from either homogenized mouse fecal bacteria or E. coli (Methods) and profiled them with MaP-seq. The resulting data revealed that the majority of detected barcodes mapped uniquely to their respective initial communities with minimal mixing (FIG. 1b, 4.3% mixed) and negligible contamination introduced during sample processing (<0.2% of reads). In addition, the average abundance of taxa across individual fecal clusters obtained by enzymatic lysis and droplet PCR displayed good correlation with standard mechanical cell lysis and bulk 16S PCR measurements (FIG. 1c, Pearson correlation r=0.76). A replicate community mixing experiment with new particles of a smaller size confirmed technical performance of the approach. Together, these results indicate that MaP-seq accurately measures bacterial identity and abundance within individual spatially constrained cell clusters.
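A non-limiting sketch of how the mixing rate in such a control experiment could be tallied is given below. It assumes every OTU can be attributed to one of the two source communities and calls a barcode "mixed" when more than one community contributes at least a minimum fraction of its reads. The 10% threshold, the labels and the function names are illustrative assumptions and are not the criteria stated in the Methods.

```python
def classify_barcode(counts, community_of, min_fraction=0.1):
    """Return the single source community of a barcode, or "mixed" when
    more than one community contributes at least min_fraction of the reads."""
    total = sum(counts.values())
    per_community = {}
    for otu, n in counts.items():
        community = community_of[otu]
        per_community[community] = per_community.get(community, 0) + n
    present = sorted(c for c, n in per_community.items() if n / total >= min_fraction)
    return present[0] if len(present) == 1 else "mixed"

# Hypothetical mapping of OTUs to the two input communities.
community_of = {"E_coli": "ecoli", "OTU_1": "fecal", "OTU_2": "fecal"}

# Hypothetical per-barcode read counts.
barcodes = {
    "BC1": {"E_coli": 50},
    "BC2": {"OTU_1": 30, "OTU_2": 10},
    "BC3": {"E_coli": 20, "OTU_1": 20},
    "BC4": {"OTU_2": 99, "E_coli": 1},
}
labels = {bc: classify_barcode(c, community_of) for bc, c in barcodes.items()}
mixed_fraction = sum(1 for v in labels.values() if v == "mixed") / len(labels)
```

In this toy input, the barcode with a single stray read stays assigned to its dominant community, so low-level contamination is kept separate from genuine co-encapsulation.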


To explore the utility of spatial metagenomics to map the natural biogeography of microbiota in complex communities, we applied MaP-seq to the mouse colonic microbiome. We generated and characterized cell clusters (˜30 μm median diameter) from a segment of the distal colon (including both epithelium and digesta) of a mouse fed a plant-polysaccharide diet, yielding 1,406 clusters passing strict quality filtering across two technical replicates (FIG. 2a, Methods). 236 total OTUs were identified with their prevalence across clusters highly correlating to bulk abundance obtained by standard 16S sequencing, implying that more abundant taxa are also physically dispersed over more space (FIG. 2b, Pearson correlation r=0.90). The spatial distribution of taxa across clusters appeared mixed (median 9 OTUs per cluster), but some clusters contained only a few OTUs indicating spatial aggregation or clumping in a fraction of the community (FIG. 2c). Moreover, this observed distribution of OTUs per cluster was significantly lower than clusters of the same size generated from homogenized fecal bacteria, which serve as a control for a well-mixed community (Mann-Whitney U test, p<10^-26). These results suggest that at the scale of tens of microns, individual taxa in the gut microbiome are neither fully mixed nor highly structured, but rather are heterogeneously distributed in mixed patches. Peristaltic mixing across the gut likely acts to decrease strong spatial segregation between taxa, but nevertheless the weak but significant spatial structuring observed could play an important role in the maintenance of high microbial diversity observed in the healthy gut1,22.
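The comparison of OTU prevalence across clusters with bulk abundance can be sketched as follows, by way of non-limiting example. The sketch assumes prevalence is the fraction of clusters in which an OTU is detected and computes a plain Pearson correlation against bulk relative abundance; whether the values were transformed (for example, log-scaled) before correlation is not assumed here. All data values and names are hypothetical.

```python
from math import sqrt

def prevalence(clusters):
    """Fraction of clusters in which each OTU is detected."""
    n = len(clusters)
    counts = {}
    for members in clusters:
        for otu in set(members):
            counts[otu] = counts.get(otu, 0) + 1
    return {otu: k / n for otu, k in counts.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical clusters (sets of detected OTUs) and bulk relative abundances.
clusters = [{"A", "B"}, {"A"}, {"A", "C"}, {"A", "B"}]
bulk_abundance = {"A": 0.6, "B": 0.3, "C": 0.1}

prev = prevalence(clusters)
otus = sorted(bulk_abundance)
r = pearson([prev[o] for o in otus], [bulk_abundance[o] for o in otus])
```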


We next explored whether these observed spatial distributions reflect specific associations between individual taxa that may result from processes such as positive or negative interspecies interactions (e.g., cooperative metabolism24, contact-dependent killing20) or local habitat filtering11. Across abundant and prevalent OTUs (>2% abundance in >10% of clusters, n=24), we assessed whether their pairwise co-occurrences were detected more or less frequently than expected in comparison to a null model of independent, random assortment of OTUs (Methods, Fisher's exact test, p<0.05, FDR=0.05). Application of this strategy to the cluster mixing control experiment confirmed our ability to accurately detect positive and negative spatial associations that are expected. Out of 276 possible pairwise combinations of taxa in the murine colon, we detected 75 statistically significant associations between diverse taxa, the majority of which were positive (72/75) but relatively weak in magnitude (FIG. 2d). The strongest co-occurrence was a positive association between abundant Bacteroidaceae and Porphyromonadaceae taxa from the Bacteroidales order (odds ratio 3.9, p<10^-23). In addition, a small number of negative associations were observed, which could reflect antagonistic processes such as production of inhibitory factors or competitive exclusion.
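Two bookkeeping steps in the analysis above lend themselves to a brief, non-limiting sketch: selecting abundant and prevalent OTUs, and controlling the false discovery rate across the pairwise tests. The selection rule below reads the stated criterion as "relative abundance above 2% in more than 10% of clusters". The Benjamini-Hochberg procedure is used as one common way to control FDR and is an assumption, since the specific procedure is not named in this passage. All names and values are hypothetical.

```python
def select_otus(profiles, min_abundance=0.02, min_prevalence=0.10):
    """Keep OTUs whose relative abundance exceeds min_abundance
    in more than min_prevalence of the clusters."""
    n = len(profiles)
    all_otus = {otu for profile in profiles for otu in profile}
    kept = []
    for otu in sorted(all_otus):
        hits = sum(1 for profile in profiles if profile.get(otu, 0.0) > min_abundance)
        if hits / n > min_prevalence:
            kept.append(otu)
    return kept

def benjamini_hochberg(pvalues, fdr=0.05):
    """Mark which hypotheses are rejected under the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected

# Hypothetical per-cluster relative-abundance profiles.
profiles = [
    {"OTU_1": 0.9, "OTU_2": 0.1},
    {"OTU_1": 1.0},
    {"OTU_1": 0.99, "OTU_3": 0.01},
    {"OTU_2": 1.0},
]
kept = select_otus(profiles)

# Hypothetical p-values from four pairwise tests.
significant = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

In this toy input, the p-values 0.03 and 0.04 each fall below 0.05 individually but fail their rank-specific thresholds (0.025 and 0.0375), so only the first test is retained.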


The number of detected associations increased as more of the dataset was sampled, implying that detection of weaker relationships between less abundant taxa can be improved by analyzing more clusters. Nonetheless, the detected associations showed good correspondence between technical replicates. Importantly, despite high inter-host microbiome variability, the nature of the associations (i.e., sign, magnitude, and number) and some strong associations could be recapitulated in MaP-seq profiling of a second co-housed mouse, such as the co-occurrence of Bacteroidales taxa. This characterization implies that individual taxa in the colon are organized in distinct and reproducible spatial relationships.


To further investigate how the spatial organization of the microbiota is influenced by their environmental context, we applied spatial metagenomics along the gastrointestinal (GI) tract. The mammalian GI tract is composed of distinct anatomical regions with different pH levels, oxygen concentrations, host-derived antimicrobials and transit times that together influence the local microbiota assemblage9. We first performed an adapted 16S community profiling approach along the murine GI tract that could also infer absolute OTU abundances25 (FIG. 3a). This new mouse cohort (2 co-housed mice) shared only ˜20% of OTUs with the previous group, illustrating the significant inter-animal microbiome heterogeneity inherent to such studies. This further highlights challenges for other spatial profiling techniques such as 16S FISH imaging where probes must be designed in advance, in comparison to MaP-seq, which can be applied to measure diverse bacteria without advance specification. Analysis of microbiota in absolute abundance across the intestine revealed increased bacterial density (˜16 fold higher) and species richness in the large intestine compared to the small intestine, with the cecum harboring the highest bacterial density and number of OTUs. We chose three separate GI regions that exhibited distinct microbiota assemblages for characterization by MaP-seq: the ileum (si6), cecum (cec) and distal colon (co2). Given the high degree of species mixing previously observed at ˜30 μm, we used smaller sized clusters (˜20 μm median diameter) to capture higher-resolution spatial associations.


We first assessed the distribution of OTUs per cluster to compare the spatial organization of taxa in the three regions (FIG. 3b). ˜20 μm clusters displayed lower numbers of OTUs per cluster than ˜30 μm clusters (median 3-4 OTUs per cluster). The ileum possessed significantly fewer OTUs per cluster than the cecum or distal colon (Mann-Whitney U test, p<10^-18 and p<10^-14 respectively). In comparison, the cecum and colon displayed similar OTU distributions, while the cecum harbored more clusters with a large number of OTUs. This suggests that GI regions with more diverse microbiota also exhibit higher spatial diversity at microscopic scales.


To understand how the local spatial organization of the microbiome may vary within and across different gut compartments, we visualized the cell clusters data across the three gut regions using t-distributed Stochastic Neighbor Embedding (tSNE, utilizing Bray-Curtis distance of OTU relative abundance within clusters), as well as the abundance of prevalent bacterial families in cell clusters across the resulting manifold (Methods, FIG. 3c). While some cell clusters from the ileum, cecum and distal colon separately projected into distinct groups, other clusters from each site projected more broadly across the manifold. Interestingly, a subset of cell clusters from the cecum projected into a dense group and are compositionally dominated by Lachnospiraceae, which were generally not present in clusters from the ileum or distal colon. When cell clusters from a second co-housed mouse were added to the tSNE analysis, they were distributed in a similar manner to clusters from the first mouse across the manifold and displayed a similar cecum-specific Lachnospiraceae group, further strengthening these results. Our observations suggest that the spatial distribution of some taxa at different GI regions may have distinct local organizations from one another while other taxa may have similar local organization along the GI tract.


Next, we explored whether these different spatial distributions reflect distinct spatial co-associations between taxa at each GI site (FIG. 3d). The ileum harbored a network of positive and negative associations between the few taxa present. On the other hand, the cecum exhibited a dense network of positively co-associated taxa, primarily between abundant Lachnospiraceae, Ruminococcaceae, and Porphyromonadaceae. Similar to the cecum, the distal colon displayed only positive associations, including strong groupings between three abundant Porphyromonadaceae (OTUs 5, 8, 9). Profiling the colon at an even smaller size-scale (˜7 μm) confirmed strong positive associations between these three taxa, indicating that this spatial clustering occurs robustly at short, local length scales. Species from these abundant Bacteroidales taxa often contain diverse carbohydrate-active enzymes26 and are known to engage in cooperative metabolic cross-feeding24,27, which could promote these spatial co-associations.


While the spatial association networks revealed by MaP-seq differed across the three GI regions, some common co-associations (or lack of associations) were observed. For example, a positive association between Lachnospiraceae (OTU 10) and Lactobacillaceae (OTU 4) was found in both the cecum and colon; on the other hand, Coriobacteriaceae (OTU 1), an abundant taxon at all sites, lacked co-associations with other taxa and was thus randomly assorted at all sites. Together, the differing spatial architectures observed across GI sites suggest that regional environmental factors can variably shape some local spatial structuring of the microbiota, while conserved spatial patterns across sites are more likely the result of robust ecological interactions not affected by environmental variations.


We further investigated whether MaP-seq could identify individual taxa with unique or altered spatial patterns. While the cecum harbored the densest community and the highest degree of species mixing of the three sites (FIG. 3a-b), we hypothesized that specific taxa may self-aggregate to a higher degree than others, for example by uniquely utilizing a specific metabolite11. Assessing the aggregation of abundant taxa revealed a Lachnospiraceae (OTU 7; putatively of the genus Dorea, 60% confidence by RDP) whose clustering metric was two-fold greater than the average value across all taxa. To validate this finding with an orthogonal approach, we performed 16S FISH on GI sections from the same murine sample using previously validated probes that targeted Lachnospiraceae (Erec482) as well as two other abundant taxa for which FISH probes were available but which were predicted not to cluster to a similar degree (Coriobacteriaceae: Ato291, Lactobacillaceae: Lab148; Methods). Strikingly, imaging confirmed that while Lachnospiraceae were distributed across the cecum, they also formed large clustered aggregates that appeared to exclude other bacteria. Importantly, this result highlights that individual taxa in the gut can organize in unique and spatially varying micron-scale structures that can be revealed by MaP-seq.


Having established the local spatial organization across the GI tract of mice fed a standard plant polysaccharide diet, we next sought to understand the extent to which diet might influence spatial structuring. Diet is known to play a major role in shaping the variation of gut microbiota across individuals28,29. While diet shifts can rapidly alter microbiota composition within days30, the detailed ecological mechanisms underlying these community-scale changes are not well understood. We thus took co-housed mice and split them into two cohorts where one was maintained on the plant polysaccharide based diet (LF, same as in the previous cohorts) and one was switched to a high fat, high sugar diet (HF, commonly utilized in dietary-induced obesity studies) to assess microbiota changes associated with these two diets representing distinct macronutrient profiles. After 10 days on the two diets, a considerable loss of species richness in the cecum and colon was observed in HF-fed mice compared to LF-fed mice (FIG. 4a).


To determine if a dietary shift could alter the spatial organization of the microbiota, which could contribute to the observed loss of species diversity, we performed MaP-seq on distal colon samples from mice fed the LF or HF diet. We found that the distribution of unique OTUs per ˜20 μm cluster was similar between both diets (FIG. 4b, top). This implies that species distributions at the local ˜20 μm scale are governed by factors that are either common to or not affected by the two diets, for example spatial autocorrelation of bacterial growth. However, assessing diversity at the higher taxonomic family rank revealed significantly higher diversity in HF clusters (Mann-Whitney U test, p<10^-22, FIG. 4b, bottom), indicating that while both LF and HF clusters contained similar numbers of OTUs, taxa within individual HF clusters were more phylogenetically diverse. Furthermore, positive co-associations were more frequently observed between diverse taxa in the HF diet than in the LF diet, which in contrast had co-associations mostly between Porphyromonadaceae or Lachnospiraceae. Interestingly, our observation of increased bacterial mixing at higher taxonomic levels has also been documented in mice fed a plant polysaccharide deficient diet (compared to a LF plant-polysaccharide rich diet) using confocal imaging with 16S FISH probes of limited phylum-level specificity6, which further highlights the utility of examining spatial organization at the higher taxonomic resolution that is achievable by MaP-seq.
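The distinction between OTU-level and family-level diversity within a cluster can be made concrete with a short Python sketch. The family assignments below follow the OTU labels given in the text; the two example clusters and the function name are hypothetical.

```python
# Family assignments for OTUs named in the text.
OTU_TO_FAMILY = {
    "OTU1": "Coriobacteriaceae",
    "OTU4": "Lactobacillaceae",
    "OTU5": "Porphyromonadaceae",
    "OTU6": "Bacteroidaceae",
    "OTU8": "Porphyromonadaceae",
    "OTU9": "Porphyromonadaceae",
}


def cluster_richness(cluster_otus, otu_to_family):
    """Return (number of unique OTUs, number of unique families) in a cluster."""
    otus = set(cluster_otus)
    families = {otu_to_family[otu] for otu in otus}
    return len(otus), len(families)


# Hypothetical clusters with equal OTU richness but different family richness.
lf_like = ["OTU5", "OTU8", "OTU9"]
hf_like = ["OTU1", "OTU4", "OTU6"]
lf_counts = cluster_richness(lf_like, OTU_TO_FAMILY)
hf_counts = cluster_richness(hf_like, OTU_TO_FAMILY)
```

Both clusters contain three OTUs, but the first contains a single family and the second contains three, mirroring the pattern in which OTU counts per cluster are similar while family-level diversity differs.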


Understanding the phylogenetic distribution of an ecosystem can provide important insights into ecological processes underlying community assembly31,32. To better quantify possible changes in phylogenetic diversity between the two diets, we calculated the net relatedness index (NRI) of clusters, a standardized effect size of the mean phylogenetic distance of taxa present within clusters against a null model of random sampling from the local species pool (Methods)31. For each microbiota cluster, a positive NRI value indicates phylogenetic clustering of its taxa, whereas a negative NRI indicates phylogenetic over-dispersion. While most clusters had NRI values near 0, suggesting random phylogenetic distributions, both LF and HF diets showed a subset of clusters with strongly negative NRI values, suggesting a high degree of phylogenetic over-dispersion. Interestingly, NRI values in LF clusters were overall significantly higher than HF values (Mann-Whitney U test, p<10^-18), driven by a subset of LF clusters with positive NRIs not observed in HF clusters (FIG. 4c). The phylogenetic clustering observed in this subset of LF clusters suggests that ecological habitat filtering due to factors associated with the LF diet (e.g. complex plant polysaccharides) may be important in shaping the formation of these clusters at a length scale of ˜20 μm (assuming that more phylogenetically similar taxa also have more similar phenotypes). A possible explanation for the loss of species diversity when transitioning from a LF to a HF diet could thus be the loss of this LF-specific local niche, which stably hosts these closely related taxa. Indeed, the same taxa (predominantly Lachnospiraceae OTUs) that are abundantly found in LF clusters with high NRI values are those that are almost completely lost on the HF diet.
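The NRI described above can be sketched in Python as the negated standardized effect size of the mean pairwise phylogenetic distance (MPD), with the null distribution generated by drawing the same number of taxa at random from the species pool. The toy distance table, the number of null draws, the fixed seed, and all function names are assumptions made for illustration.

```python
import random
import statistics
from itertools import combinations


def mean_pairwise_distance(taxa, dist):
    """Mean phylogenetic distance over all pairs of taxa in a cluster."""
    pairs = list(combinations(sorted(taxa), 2))
    if not pairs:
        raise ValueError("at least two taxa are required")
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


def net_relatedness_index(cluster, pool, dist, n_null=999, seed=0):
    """NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null)."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(cluster, dist)
    null = [
        mean_pairwise_distance(rng.sample(pool, len(cluster)), dist)
        for _ in range(n_null)
    ]
    spread = statistics.pstdev(null)
    if spread == 0:
        return 0.0  # degenerate null distribution; no signal either way
    return -(observed - statistics.mean(null)) / spread


# Toy species pool: A, B and C are closely related; all other pairs are distant.
pool = ["A", "B", "C", "D", "E", "F"]
close = {"A", "B", "C"}
dist = {
    frozenset((x, y)): (0.1 if {x, y} <= close else 1.0)
    for x, y in combinations(pool, 2)
}
nri_related = net_relatedness_index(["A", "B"], pool, dist)
nri_distant = net_relatedness_index(["A", "D"], pool, dist)
```

A cluster of closely related taxa has a smaller MPD than the null expectation and therefore a positive NRI, while a cluster of distantly related taxa yields a negative NRI.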


Next, to compare the spatial organization of taxa across the two diets, we visualized clusters using tSNE as before (FIG. 4d). Cell clusters from the two diets each formed highly distinct groups with minimal overlap, indicating that the spatial organization in the distal colon was significantly altered by the dietary shift. Despite this overall separation, we observed examples of cluster configurations that were shared between the two diets. For example, HF clusters were observed in a predominantly LF region marked by high abundance of a Porphyromonadaceae taxon (OTU 5), and LF clusters were observed in a predominantly HF region marked by high abundance of a Bacteroidaceae taxon (OTU 6). These shared cluster regions could represent spatial niches that may be independent of the diet (e.g. mucus layers secreted by the host). Taken together, MaP-seq profiling of a diet perturbation enabled mechanistic analysis of ecological processes underlying community shifts and loss of diversity.


Spatial metagenomics enables the high-throughput characterization of microbial biogeography through microscopic plot sampling of co-localized nucleic acids at tunable length scales. This general approach could be applied to interrogate a variety of perturbations in the gut (e.g., diet, antibiotics, fecal microbiota transplantation), other mammalian associated microbiota (e.g. skin, genital), or diverse environmental ecosystems, such as soils or biofilms. Importantly, MaP-seq enables in-depth analysis of these processes at previously inaccessible and ecologically meaningful local length scales within individual microbiomes. Improvements to further increase the throughput of the approach could better delineate weaker or rarer co-associations and help investigate structuring across many different characteristic length scales within microbiomes. A variety of established spatial ecology tools and emerging computational and analytical approaches could be applied to this new type of high-dimensional microbiome dataset. Extensions of this general framework to spatially profile other biological molecules such as RNA, proteins and metabolites will enable mapping of complex cellular systems across mechanistically important and functionally distinct axes. Plot sampling of biological structures at microscopic scales opens up new directions of research that employ spatial ecology tools to study these complex systems.


Materials and reagents. All primers and FISH probes were ordered from Integrated DNA Technologies. Primers containing any modifications were HPLC purified by the manufacturer. Photocleavable primers were protected from unnecessary light exposure throughout.


Animal procedures. All mouse procedures were approved by the Columbia University Medical Center Institutional Animal Care and Use Committee (protocol AC-AAAR1513) and complied with all relevant regulations. 6-8 week-old female C57BL6/J mice were obtained from Taconic (colonic analysis, FIG. 2) or Jackson (analysis across GI tract, FIG. 3; dietary perturbation, FIG. 4) and fed a plant-polysaccharide based diet (LabDiet 5053). Dietary perturbation was performed by splitting four co-housed mice into two cages; one cage received the same plant-polysaccharide based diet and one cage received high fat diet (Teklad TD.06414).


Microfluidic device fabrication. Devices were fabricated utilizing standard SU-8 soft lithography. Silanized SU-8 silicon wafer molds were fabricated by FlowJEM with a feature height of ˜40 μm. PDMS (Dow Corning Sylgard 184) was mixed for 5 minutes at a ratio of 10:1 base to curing agent, degassed under house vacuum for 30 minutes, and poured over the wafer. The PDMS mixture was cured at 80° C. for 1 hour, allowed to cool to room temperature and removed from the wafer. Individual devices were cut from the PDMS slab and ports were punched utilizing a 1 mm biopsy punch (World Precision Instruments 504646). FIG. 11.


Uniquely barcoded bead design and construction. We designed custom barcoded hydrogel beads containing one of 884,736 unique barcoded primers per bead and a partial sequencing adapter and 16S V4 primer 515f (see Parada, A. E., Needham, D. M. & Fuhrman, J. A. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ Microbiol 18, 1403-1414 (2016); Walters, W. et al. Improved Bacterial 16S rRNA Gene (V4 and V4-5) and Fungal Internal Transcribed Spacer Marker Gene Primers for Microbial Community Surveys. mSystems 1, e00009-15-10 (2015)). Theoretically, around 17,500 clusters can be captured per sample with a 1% multiple barcoding rate (see Klein, A. M. et al. Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells. Cell 161, 1187-1201 (2015)). Barcoded primer sequences were constructed via a split-and-pool primer extension strategy (see Klein, A. M. et al. (2015); Bose, S. et al. Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biology 1-16 (2015). doi:10.1186/s13059-015-0684-3) with three barcode extension rounds. Each barcode position contained 96 possible sequences, and each set of barcodes was selected such that each barcode had a Hamming distance of at least 3 bp from the other barcodes in its set (allowing for 1 bp error correction). The first barcode position was 7-9 bp in length (allowing for dephasing of reads to improve sequencing quality) while the second and third positions were 8 bp in length.
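The barcode arithmetic and the error-correction property described above can be illustrated with a minimal Python sketch. It assumes barcodes are compared only against others of the same length; the four example barcodes are the first four pe1 barcodes listed in Table 2, and the function names are hypothetical.

```python
from itertools import combinations

BARCODES_PER_ROUND = 96
EXTENSION_ROUNDS = 3
# Three rounds of 96 choices give 96 ** 3 = 884,736 combinations.
TOTAL_COMBINATIONS = BARCODES_PER_ROUND ** EXTENSION_ROUNDS


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes):
    """Smallest Hamming distance among same-length barcode pairs."""
    return min(
        hamming(a, b)
        for a, b in combinations(barcodes, 2)
        if len(a) == len(b)
    )


def correct_barcode(read, whitelist, max_mismatches=1):
    """Return the unique whitelist barcode within max_mismatches, else None."""
    hits = [
        barcode
        for barcode in whitelist
        if len(barcode) == len(read) and hamming(read, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# First four pe1 barcodes from Table 2.
example_barcodes = ["ACTAGGT", "AGCTCTA", "AGAGAGT", "GTGTTCC"]
corrected = correct_barcode("ACTAGGA", example_barcodes)  # one substitution
rejected = correct_barcode("TTTTTTT", example_barcodes)   # too many mismatches
```

With a minimum distance of 3, a read carrying a single substitution is still exactly one mismatch from its true barcode and at least two mismatches from every other barcode, so the correction is unambiguous.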


Construction of the barcoded beads followed procedures from Zilionis et al. (see Zilionis, R. et al. Single-cell barcoding and sequencing using droplet microfluidics. Nat Protoc 12, 44-73 (2017)) with minor modifications for our barcoding scheme. Briefly, acrylamide beads (6% w/w acrylamide, 0.18% w/w N,N′-methylenebisacrylamide [Sigma-Aldrich 146072], 20 μM acry_pc_pe1 [see Table 1]) were generated using a custom microfluidic droplet device. The resulting beads were ˜20-25 μm in diameter. Batches of ˜20 million beads were then subjected to three rounds of primer extension using the three sets of 96 barcode sequences (pe1, pe2, and pe3 primer extension sets, see Table 2). For each round, beads and primers were distributed into wells of a 96 well PCR microplate and primers were annealed to the beads by incubation. A Bst polymerase reaction master mix (NEB M0537L) was then distributed to each well and incubated to allow for extension. Finally, the reactions were quenched with EDTA and pooled for cleanup steps. The beads were then subjected to denaturing of the extension primers by sodium hydroxide and washing, and the extension protocol was repeated. These procedures were automated on a Biomek 4000 liquid handling robot where possible. After the final extension step, a primer targeted to the terminal 515f primer sequence (515f_RC, see Table 1) was annealed, and an ExoI enzymatic cleanup (NEB M0293L) was utilized to remove extension intermediates. The resulting barcoded beads were subjected to a final denaturing and washing step and stored at 4° C. in TET (10 mM Tris HCl [pH 8.0], 1 mM EDTA, 0.1% Tween-20). FIG. 12.
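The relationship between the barcode and primer columns of Table 2 can be sketched in Python. Judging from the listed pe1 and pe2 entries, each extension primer appears to consist of a constant 5′ flank, the reverse complement of the barcode, and a constant 3′ flank; this pattern is inferred from the listed sequences rather than stated in the text, and the function and variable names are hypothetical. The pe3 set is omitted because its flank contains degenerate bases.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Constant flanking sequences as they appear in the pe1 and pe2 rows of Table 2.
FLANKS = {
    "pe1": ("CGCTCAGCAGTGTCTCGC", "AGATCGGAAGAGCGTCGTG"),
    "pe2": ("CGACGAGGCTGGAGTGAC", "CGCTCAGCAGTGTCTCGC"),
}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extension_primer(round_name, barcode):
    """Assemble an extension primer: 5' flank + revcomp(barcode) + 3' flank."""
    five_prime, three_prime = FLANKS[round_name]
    return five_prime + reverse_complement(barcode) + three_prime


pe1_1 = extension_primer("pe1", "ACTAGGT")
pe2_1 = extension_primer("pe2", "GTACCAGT")
```

Under this reading, the 3′ flank of each pe2 primer matches the 5′ flank of the pe1 primers, which is consistent with each round annealing to the sequence added in the previous round.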










TABLE 1

Primer name | Primer sequence
acry_pc_pe1 | /5Acryd//iSpPC/GACTACTCCACGACGCTCTTCCGATCT (SEQ ID NO: 1)
acry_pc_pe2_816r | /5Acryd//iSpPC/ATTAGGTCGACGTGTGCTCTTCCGATCTGGACTACNVGGGTWTCTAAT (SEQ ID NO: 2)
515f_RC | TTACCGCGGCKGCTGRCAC (SEQ ID NO: 3)


















TABLE 2

Primer name | Primer sequence | Barcode sequence
pe1_1 | CGCTCAGCAGTGTCTCGCACCTAGTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 4) | ACTAGGT (SEQ ID NO: 292)
pe1_2 | CGCTCAGCAGTGTCTCGCTAGAGCTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 5) | AGCTCTA (SEQ ID NO: 293)
pe1_3 | CGCTCAGCAGTGTCTCGCACTCTCTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 6) | AGAGAGT (SEQ ID NO: 294)
pe1_4 | CGCTCAGCAGTGTCTCGCGGAACACAGATCGGAAGAGCGTCGTG (SEQ ID NO: 7) | GTGTTCC (SEQ ID NO: 295)
pe1_5 | CGCTCAGCAGTGTCTCGCCAGCTAAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 8) | TTAGCTG (SEQ ID NO: 296)
pe1_6 | CGCTCAGCAGTGTCTCGCGTATGGTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 9) | ACCATAC (SEQ ID NO: 297)
pe1_7 | CGCTCAGCAGTGTCTCGCAACGGTAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 10) | TACCGTT (SEQ ID NO: 298)
pe1_8 | CGCTCAGCAGTGTCTCGCAGTTGGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 11) | GCCAACT (SEQ ID NO: 299)
pe1_9 | CGCTCAGCAGTGTCTCGCAGACTTCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 12) | GAAGTCT (SEQ ID NO: 300)
pe1_10 | CGCTCAGCAGTGTCTCGCGTGCTTAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 13) | TAAGCAC (SEQ ID NO: 301)
pe1_11 | CGCTCAGCAGTGTCTCGCCCACTAGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 14) | CTAGTGG (SEQ ID NO: 302)
pe1_12 | CGCTCAGCAGTGTCTCGCGCGCTATAGATCGGAAGAGCGTCGTG (SEQ ID NO: 15) | ATAGCGC (SEQ ID NO: 303)
pe1_13 | CGCTCAGCAGTGTCTCGCTGACACTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 16) | AGTGTCA (SEQ ID NO: 304)
pe1_14 | CGCTCAGCAGTGTCTCGCGAGGAACAGATCGGAAGAGCGTCGTG (SEQ ID NO: 17) | GTTCCTC (SEQ ID NO: 305)
pe1_15 | CGCTCAGCAGTGTCTCGCTTGACCAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 18) | TGGTCAA (SEQ ID NO: 306)
pe1_16 | CGCTCAGCAGTGTCTCGCGGTAGCAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 19) | TGCTACC (SEQ ID NO: 307)
pe1_17 | CGCTCAGCAGTGTCTCGCCGTTGAGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 20) | CTCAACG (SEQ ID NO: 308)
pe1_18 | CGCTCAGCAGTGTCTCGCACAACTGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 21) | CAGTTGT (SEQ ID NO: 309)
pe1_19 | CGCTCAGCAGTGTCTCGCTCAGTCAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 22) | TGACTGA (SEQ ID NO: 310)
pe1_20 | CGCTCAGCAGTGTCTCGCCGTACATAGATCGGAAGAGCGTCGTG (SEQ ID NO: 23) | ATGTACG (SEQ ID NO: 311)
pe1_21 | CGCTCAGCAGTGTCTCGCTGAGTGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 24) | GCACTCA (SEQ ID NO: 312)
pe1_22 | CGCTCAGCAGTGTCTCGCCCTGTTAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 25) | TAACAGG (SEQ ID NO: 313)
pe1_23 | CGCTCAGCAGTGTCTCGCACCTCTAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 26) | TAGAGGT (SEQ ID NO: 314)
pe1_24 | CGCTCAGCAGTGTCTCGCATTCCACAGATCGGAAGAGCGTCGTG (SEQ ID NO: 27) | GTGGAAT (SEQ ID NO: 315)
pe1_25 | CGCTCAGCAGTGTCTCGCTCGTATGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 28) | CATACGA (SEQ ID NO: 316)
pe1_26 | CGCTCAGCAGTGTCTCGCAGGTTGTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 29) | ACAACCT (SEQ ID NO: 317)
pe1_27 | CGCTCAGCAGTGTCTCGCCGTAGTCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 30) | GACTACG (SEQ ID NO: 318)
pe1_28 | CGCTCAGCAGTGTCTCGCCTTCTCGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 31) | CGAGAAG (SEQ ID NO: 319)
pe1_29 | CGCTCAGCAGTGTCTCGCAGGTAAGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 32) | CTTACCT (SEQ ID NO: 320)
pe1_30 | CGCTCAGCAGTGTCTCGCGATCTCAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 33) | TGAGATC (SEQ ID NO: 321)
pe1_31 | CGCTCAGCAGTGTCTCGCATCGAACAGATCGGAAGAGCGTCGTG (SEQ ID NO: 34) | GTTCGAT (SEQ ID NO: 322)
pe1_32 | CGCTCAGCAGTGTCTCGCCACGCATAGATCGGAAGAGCGTCGTG (SEQ ID NO: 35) | ATGCGTG (SEQ ID NO: 323)
pe1_33 | CGCTCAGCAGTGTCTCGCAACTCAGGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 36) | CCTGAGTT (SEQ ID NO: 324)
pe1_34 | CGCTCAGCAGTGTCTCGCTGCCACAAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 37) | TTGTGGCA (SEQ ID NO: 325)
pe1_35 | CGCTCAGCAGTGTCTCGCATGGCGATAGATCGGAAGAGCGTCGTG (SEQ ID NO: 38) | ATCGCCAT (SEQ ID NO: 326)
pe1_36 | CGCTCAGCAGTGTCTCGCAATCAGCGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 39) | CGCTGATT (SEQ ID NO: 327)
pe1_37 | CGCTCAGCAGTGTCTCGCGGTTGTACAGATCGGAAGAGCGTCGTG (SEQ ID NO: 40) | GTACAACC (SEQ ID NO: 328)
pe1_38 | CGCTCAGCAGTGTCTCGCCTCGACTTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 41) | AAGTCGAG (SEQ ID NO: 329)
pe1_39 | CGCTCAGCAGTGTCTCGCTAGGAAGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 42) | GCTTCCTA (SEQ ID NO: 330)
pe1_40 | CGCTCAGCAGTGTCTCGCGTGCATGTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 43) | ACATGCAC (SEQ ID NO: 331)
pe1_41 | CGCTCAGCAGTGTCTCGCTCAATCGGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 44) | CCGATTGA (SEQ ID NO: 332)
pe1_42 | CGCTCAGCAGTGTCTCGCTCAAGCTCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 45) | GAGCTTGA (SEQ ID NO: 333)
pe1_43 | CGCTCAGCAGTGTCTCGCAGTGTCACAGATCGGAAGAGCGTCGTG (SEQ ID NO: 46) | GTGACACT (SEQ ID NO: 334)
pe1_44 | CGCTCAGCAGTGTCTCGCTGTGTTCCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 47) | GGAACACA (SEQ ID NO: 335)
pe1_45 | CGCTCAGCAGTGTCTCGCTCCGAATCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 48) | GATTCGGA (SEQ ID NO: 336)
pe1_46 | CGCTCAGCAGTGTCTCGCGGAGTACAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 49) | TGTACTCC (SEQ ID NO: 337)
pe1_47 | CGCTCAGCAGTGTCTCGCAGGACAGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 50) | TCTGTCCT (SEQ ID NO: 338)
pe1_48 | CGCTCAGCAGTGTCTCGCGCACAGTTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 51) | AACTGTGC (SEQ ID NO: 339)
pe1_49 | CGCTCAGCAGTGTCTCGCCGACAACAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 52) | TGTTGTCG (SEQ ID NO: 340)
pe1_50 | CGCTCAGCAGTGTCTCGCAGCACGTAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 53) | TACGTGCT (SEQ ID NO: 341)
pe1_51 | CGCTCAGCAGTGTCTCGCCCAACAGTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 54) | ACTGTTGG (SEQ ID NO: 342)
pe1_52 | CGCTCAGCAGTGTCTCGCTCAGGACAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 55) | TGTCCTGA (SEQ ID NO: 343)
pe1_53 | CGCTCAGCAGTGTCTCGCCTATCCTGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 56) | CAGGATAG (SEQ ID NO: 344)
pe1_54 | CGCTCAGCAGTGTCTCGCTGTCTGTCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 57) | GACAGACA (SEQ ID NO: 345)
pe1_55 | CGCTCAGCAGTGTCTCGCCCTAGTCTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 58) | AGACTAGG (SEQ ID NO: 346)
pe1_56 | CGCTCAGCAGTGTCTCGCGTAATGGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 59) | GCCATTAC (SEQ ID NO: 347)
pe1_57 | CGCTCAGCAGTGTCTCGCTAGTGGCTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 60) | AGCCACTA (SEQ ID NO: 348)
pe1_58 | CGCTCAGCAGTGTCTCGCGAATCTGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 61) | GCAGATTC (SEQ ID NO: 349)
pe1_59 | CGCTCAGCAGTGTCTCGCTTCGATGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 62) | GCATCGAA (SEQ ID NO: 350)
pe1_60 | CGCTCAGCAGTGTCTCGCGCTTGGTTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 63) | AACCAAGC (SEQ ID NO: 351)
pe1_61 | CGCTCAGCAGTGTCTCGCAGCTGATCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 64) | GATCAGCT (SEQ ID NO: 352)
pe1_62 | CGCTCAGCAGTGTCTCGCATAAGCGGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 65) | CCGCTTAT (SEQ ID NO: 353)
pe1_63 | CGCTCAGCAGTGTCTCGCACTTCGGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 66) | TCCGAAGT (SEQ ID NO: 354)
pe1_64 | CGCTCAGCAGTGTCTCGCCTAGTCGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 67) | TCGACTAG (SEQ ID NO: 355)
pe1_65 | CGCTCAGCAGTGTCTCGCCGTTCTTGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 68) | GCAAGAACG (SEQ ID NO: 356)
pe1_66 | CGCTCAGCAGTGTCTCGCTGTAGACTCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 69) | GAGTCTACA (SEQ ID NO: 357)
pe1_67 | CGCTCAGCAGTGTCTCGCGAAGGCCTAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 70) | TAGGCCTTC (SEQ ID NO: 358)
pe1_68 | CGCTCAGCAGTGTCTCGCTTCGTAAGGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 71) | CCTTACGAA (SEQ ID NO: 359)
pe1_69 | CGCTCAGCAGTGTCTCGCTGATCACCTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 72) | AGGTGATCA (SEQ ID NO: 360)
pe1_70 | CGCTCAGCAGTGTCTCGCTAGCTAACGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 73) | CGTTAGCTA (SEQ ID NO: 361)
pe1_71 | CGCTCAGCAGTGTCTCGCCGTAGAAGGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 74) | CCTTCTACG (SEQ ID NO: 362)
pe1_72 | CGCTCAGCAGTGTCTCGCTCTCTCGAAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 75) | TTCGAGAGA (SEQ ID NO: 363)
pe1_73 | CGCTCAGCAGTGTCTCGCTCTAGTTCCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 76) | GGAACTAGA (SEQ ID NO: 364)
pe1_74 | CGCTCAGCAGTGTCTCGCCCGAAGAGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 77) | TCTCTTCGG (SEQ ID NO: 365)
pe1_75 | CGCTCAGCAGTGTCTCGCAGGTGACATAGATCGGAAGAGCGTCGTG (SEQ ID NO: 78) | ATGTCACCT (SEQ ID NO: 366)
pe1_76 | CGCTCAGCAGTGTCTCGCCTGAGAACGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 79) | CGTTCTCAG (SEQ ID NO: 367)
pe1_77 | CGCTCAGCAGTGTCTCGCCCAGCTGAAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 80) | TTCAGCTGG (SEQ ID NO: 368)
pe1_78 | CGCTCAGCAGTGTCTCGCCGTTCGACAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 81) | TGTCGAACG (SEQ ID NO: 369)
pe1_79 | CGCTCAGCAGTGTCTCGCTCTTAGACCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 82) | GGTCTAAGA (SEQ ID NO: 370)
pe1_80 | CGCTCAGCAGTGTCTCGCCACGAGCAAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 83) | TTGCTCGTG (SEQ ID NO: 371)
pe1_81 | CGCTCAGCAGTGTCTCGCCTGCCGAATAGATCGGAAGAGCGTCGTG (SEQ ID NO: 84) | ATTCGGCAG (SEQ ID NO: 372)
pe1_82 | CGCTCAGCAGTGTCTCGCGGGCTCATAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 85) | TATGAGCCC (SEQ ID NO: 373)
pe1_83 | CGCTCAGCAGTGTCTCGCCACCGTACTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 86) | AGTACGGTG (SEQ ID NO: 374)
pe1_84 | CGCTCAGCAGTGTCTCGCGTGTCTCGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 87) | TCGAGACAC (SEQ ID NO: 375)
pe1_85 | CGCTCAGCAGTGTCTCGCTTACTGCGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 88) | TCGCAGTAA (SEQ ID NO: 376)
pe1_86 | CGCTCAGCAGTGTCTCGCTCCATACGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 89) | TCGTATGGA (SEQ ID NO: 377)
pe1_87 | CGCTCAGCAGTGTCTCGCGATCCAGGTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 90) | ACCTGGATC (SEQ ID NO: 378)
pe1_88 | CGCTCAGCAGTGTCTCGCAGTTGCGAAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 91) | TTCGCAACT (SEQ ID NO: 379)
pe1_89 | CGCTCAGCAGTGTCTCGCAGGTTGAGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 92) | TCTCAACCT (SEQ ID NO: 380)
pe1_90 | CGCTCAGCAGTGTCTCGCGTTGCGCTTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 93) | AAGCGCAAC (SEQ ID NO: 381)
pe1_91 | CGCTCAGCAGTGTCTCGCCTCGAGAGAAGATCGGAAGAGCGTCGTG (SEQ ID NO: 94) | TCTCTCGAG (SEQ ID NO: 382)
pe1_92 | CGCTCAGCAGTGTCTCGCTGTTCCTAGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 95) | CTAGGAACA (SEQ ID NO: 383)
pe1_93 | CGCTCAGCAGTGTCTCGCCTCACACTGAGATCGGAAGAGCGTCGTG (SEQ ID NO: 96) | CAGTGTGAG (SEQ ID NO: 384)
pe1_94 | CGCTCAGCAGTGTCTCGCACCACATGTAGATCGGAAGAGCGTCGTG (SEQ ID NO: 97) | ACATGTGGT (SEQ ID NO: 385)
pe1_95 | CGCTCAGCAGTGTCTCGCAGCTTAACCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 98) | GGTTAAGCT (SEQ ID NO: 386)
pe1_96 | CGCTCAGCAGTGTCTCGCCACCTATGCAGATCGGAAGAGCGTCGTG (SEQ ID NO: 99) | GCATAGGTG (SEQ ID NO: 387)
pe2_1 | CGACGAGGCTGGAGTGACACTGGTACCGCTCAGCAGTGTCTCGC (SEQ ID NO: 100) | GTACCAGT (SEQ ID NO: 388)
pe2_2 | CGACGAGGCTGGAGTGACGGTACTGTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 101) | ACAGTACC (SEQ ID NO: 389)
pe2_3 | CGACGAGGCTGGAGTGACTCTGTGTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 102) | CACACAGA (SEQ ID NO: 390)
pe2_4 | CGACGAGGCTGGAGTGACTATGGCTCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 103) | GAGCCATA (SEQ ID NO: 391)
pe2_5 | CGACGAGGCTGGAGTGACGTTGTCAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 104) | CTGACAAC (SEQ ID NO: 392)
pe2_6 | CGACGAGGCTGGAGTGACATGCCAGTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 105) | ACTGGCAT (SEQ ID NO: 393)
pe2_7 | CGACGAGGCTGGAGTGACCGCTACTACGCTCAGCAGTGTCTCGC (SEQ ID NO: 106) | TAGTAGCG (SEQ ID NO: 394)
pe2_8 | CGACGAGGCTGGAGTGACCATACACGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 107) | CGTGTATG (SEQ ID NO: 395)
pe2_9 | CGACGAGGCTGGAGTGACTCGAGGATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 108) | ATCCTCGA (SEQ ID NO: 396)
pe2_10 | CGACGAGGCTGGAGTGACGGTTCGATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 109) | ATCGAACC (SEQ ID NO: 397)
pe2_11 | CGACGAGGCTGGAGTGACACGGAACACGCTCAGCAGTGTCTCGC (SEQ ID NO: 110) | TGTTCCGT (SEQ ID NO: 398)
pe2_12 | CGACGAGGCTGGAGTGACCGTTGCATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 111) | ATGCAACG (SEQ ID NO: 399)
pe2_13 | CGACGAGGCTGGAGTGACATACGTCCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 112) | GGACGTAT (SEQ ID NO: 400)
pe2_14 | CGACGAGGCTGGAGTGACGATCTGGACGCTCAGCAGTGTCTCGC (SEQ ID NO: 113) | TCCAGATC (SEQ ID NO: 401)
pe2_15 | CGACGAGGCTGGAGTGACTCTCGAAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 114) | CTTCGAGA (SEQ ID NO: 402)
pe2_16 | CGACGAGGCTGGAGTGACCTGTGCTACGCTCAGCAGTGTCTCGC (SEQ ID NO: 115) | TAGCACAG (SEQ ID NO: 403)
pe2_17 | CGACGAGGCTGGAGTGACAGGTGGAACGCTCAGCAGTGTCTCGC (SEQ ID NO: 116) | TTCCACCT (SEQ ID NO: 404)
pe2_18 | CGACGAGGCTGGAGTGACTAGCAACGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 117) | CGTTGCTA (SEQ ID NO: 405)
pe2_19 | CGACGAGGCTGGAGTGACGGTCATTCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 118) | GAATGACC (SEQ ID NO: 406)
pe2_20 | CGACGAGGCTGGAGTGACAGATACGCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 119) | GCGTATCT (SEQ ID NO: 407)
pe2_21 | CGACGAGGCTGGAGTGACGAACTGCTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 120) | AGCAGTTC (SEQ ID NO: 408)
pe2_22 | CGACGAGGCTGGAGTGACAGTGCACACGCTCAGCAGTGTCTCGC (SEQ ID NO: 121) | TGTGCACT (SEQ ID NO: 409)
pe2_23 | CGACGAGGCTGGAGTGACCCGATCATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 122) | ATGATCGG (SEQ ID NO: 410)
pe2_24 | CGACGAGGCTGGAGTGACACAAGGACCGCTCAGCAGTGTCTCGC (SEQ ID NO: 123) | GTCCTTGT (SEQ ID NO: 411)
pe2_25 | CGACGAGGCTGGAGTGACATTCGGTCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 124) | GACCGAAT (SEQ ID NO: 412)
pe2_26 | CGACGAGGCTGGAGTGACTTGTGACGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 125) | CGTCACAA (SEQ ID NO: 413)
pe2_27 | CGACGAGGCTGGAGTGACGAAGTCTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 126) | CAGACTTC (SEQ ID NO: 414)
pe2_28 | CGACGAGGCTGGAGTGACTGGACGAACGCTCAGCAGTGTCTCGC (SEQ ID NO: 127) | TTCGTCCA (SEQ ID NO: 415)
pe2_29 | CGACGAGGCTGGAGTGACGAGTTCCTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 128) | AGGAACTC (SEQ ID NO: 416)
pe2_30 | CGACGAGGCTGGAGTGACGATAGGAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 129) | CTCCTATC (SEQ ID NO: 417)
pe2_31 | CGACGAGGCTGGAGTGACAGCTTGGACGCTCAGCAGTGTCTCGC (SEQ ID NO: 130) | TCCAAGCT (SEQ ID NO: 418)
pe2_32 | CGACGAGGCTGGAGTGACCACATCCTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 131) | AGGATGTG (SEQ ID NO: 419)
pe2_33 | CGACGAGGCTGGAGTGACAGTCCTGACGCTCAGCAGTGTCTCGC (SEQ ID NO: 132) | TCAGGACT (SEQ ID NO: 420)
pe2_34 | CGACGAGGCTGGAGTGACCTTGTAGCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 133) | GCTACAAG (SEQ ID NO: 421)
pe2_35 | CGACGAGGCTGGAGTGACCAGGAGTACGCTCAGCAGTGTCTCGC (SEQ ID NO: 134) | TACTCCTG (SEQ ID NO: 422)
pe2_36 | CGACGAGGCTGGAGTGACCACAAGGACGCTCAGCAGTGTCTCGC (SEQ ID NO: 135) | TCCTTGTG (SEQ ID NO: 423)
pe2_37 | CGACGAGGCTGGAGTGACTTCCTCTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 136) | CAGAGGAA (SEQ ID NO: 424)
pe2_38 | CGACGAGGCTGGAGTGACCCATTGCTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 137) | AGCAATGG (SEQ ID NO: 425)
pe2_39 | CGACGAGGCTGGAGTGACGCACATAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 138) | CTATGTGC (SEQ ID NO: 426)
pe2_40 | CGACGAGGCTGGAGTGACCACTGTACCGCTCAGCAGTGTCTCGC (SEQ ID NO: 139) | GTACAGTG (SEQ ID NO: 427)
pe2_41 | CGACGAGGCTGGAGTGACGTGATCTCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 140) | GAGATCAC (SEQ ID NO: 428)
pe2_42 | CGACGAGGCTGGAGTGACAATGCCGTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 141) | ACGGCATT (SEQ ID NO: 429)
pe2_43 | CGACGAGGCTGGAGTGACTCCTTGTCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 142) | GACAAGGA (SEQ ID NO: 430)
pe2_44 | CGACGAGGCTGGAGTGACAGTAGGCACGCTCAGCAGTGTCTCGC (SEQ ID NO: 143) | TGCCTACT (SEQ ID NO: 431)
pe2_45 | CGACGAGGCTGGAGTGACAGCCTCTTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 144) | AAGAGGCT (SEQ ID NO: 432)
pe2_46 | CGACGAGGCTGGAGTGACCGATTACGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 145) | CGTAATCG (SEQ ID NO: 433)
pe2_47 | CGACGAGGCTGGAGTGACCCAGGAATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 146) | ATTCCTGG (SEQ ID NO: 434)
pe2_48 | CGACGAGGCTGGAGTGACGAGTCAGTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 147) | ACTGACTC (SEQ ID NO: 435)
pe2_49 | CGACGAGGCTGGAGTGACTGAGAGGACGCTCAGCAGTGTCTCGC (SEQ ID NO: 148) | TCCTCTCA (SEQ ID NO: 436)
pe2_50 | CGACGAGGCTGGAGTGACACGACTCACGCTCAGCAGTGTCTCGC (SEQ ID NO: 149) | TGAGTCGT (SEQ ID NO: 437)
pe2_51 | CGACGAGGCTGGAGTGACTAGCTCAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 150) | CTGAGCTA (SEQ ID NO: 438)
pe2_52 | CGACGAGGCTGGAGTGACTAACCGGTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 151) | ACCGGTTA (SEQ ID NO: 439)
pe2_53 | CGACGAGGCTGGAGTGACGTACTGAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 152) | CTCAGTAC (SEQ ID NO: 440)
pe2_54 | CGACGAGGCTGGAGTGACAACCACTCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 153) | GAGTGGTT (SEQ ID NO: 441)
pe2_55 | CGACGAGGCTGGAGTGACCAGTTACCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 154) | GGTAACTG (SEQ ID NO: 442)
pe2_56 | CGACGAGGCTGGAGTGACGATGGATGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 155) | CATCCATC (SEQ ID NO: 443)
pe2_57 | CGACGAGGCTGGAGTGACCTACCTCTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 156) | AGAGGTAG (SEQ ID NO: 444)
pe2_58 | CGACGAGGCTGGAGTGACGTCAAGAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 157) | CTCTTGAC (SEQ ID NO: 445)
pe2_59 | CGACGAGGCTGGAGTGACGATCTACGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 158) | CGTAGATC (SEQ ID NO: 446)
pe2_60 | CGACGAGGCTGGAGTGACACATTCCGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 159) | CGGAATGT (SEQ ID NO: 447)
pe2_61 | CGACGAGGCTGGAGTGACCTGAATCCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 160) | GGATTCAG (SEQ ID NO: 448)
pe2_62 | CGACGAGGCTGGAGTGACTGGCCATACGCTCAGCAGTGTCTCGC (SEQ ID NO: 161) | TATGGCCA (SEQ ID NO: 449)
pe2_63 | CGACGAGGCTGGAGTGACGTCTTGCTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 162) | AGCAAGAC (SEQ ID NO: 450)
pe2_64 | CGACGAGGCTGGAGTGACACGTGTTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 163) | CAACACGT (SEQ ID NO: 451)
pe2_65 | CGACGAGGCTGGAGTGACGAAGCGTTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 164) | AACGCTTC (SEQ ID NO: 452)
pe2_66 | CGACGAGGCTGGAGTGACTAACGCCACGCTCAGCAGTGTCTCGC (SEQ ID NO: 165) | TGGCGTTA (SEQ ID NO: 453)
pe2_67 | CGACGAGGCTGGAGTGACAGGCTGTACGCTCAGCAGTGTCTCGC (SEQ ID NO: 166) | TACAGCCT (SEQ ID NO: 454)
pe2_68 | CGACGAGGCTGGAGTGACCTACAGTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 167) | CACTGTAG (SEQ ID NO: 455)
pe2_69 | CGACGAGGCTGGAGTGACTTCAGAGCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 168) | GCTCTGAA (SEQ ID NO: 456)
pe2_70 | CGACGAGGCTGGAGTGACTGCCTACACGCTCAGCAGTGTCTCGC (SEQ ID NO: 169) | TGTAGGCA (SEQ ID NO: 457)
pe2_71 | CGACGAGGCTGGAGTGACCGGATTGACGCTCAGCAGTGTCTCGC (SEQ ID NO: 170) | TCAATCCG (SEQ ID NO: 458)
pe2_72 | CGACGAGGCTGGAGTGACGGAGGATTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 171) | AATCCTCC (SEQ ID NO: 459)
pe2_73 | CGACGAGGCTGGAGTGACCATTAGCCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 172) | GGCTAATG (SEQ ID NO: 460)
pe2_74 | CGACGAGGCTGGAGTGACTTGGTCACCGCTCAGCAGTGTCTCGC (SEQ ID NO: 173) | GTGACCAA (SEQ ID NO: 461)
pe2_75 | CGACGAGGCTGGAGTGACCAAGCAAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 174) | CTTGCTTG (SEQ ID NO: 462)
pe2_76 | CGACGAGGCTGGAGTGACCAACATCCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 175) | GGATGTTG (SEQ ID NO: 463)
pe2_77 | CGACGAGGCTGGAGTGACGACGACAACGCTCAGCAGTGTCTCGC (SEQ ID NO: 176) | TTGTCGTC (SEQ ID NO: 464)
pe2_78 | CGACGAGGCTGGAGTGACATCGAGTCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 177) | GACTCGAT (SEQ ID NO: 465)
pe2_79 | CGACGAGGCTGGAGTGACTATGCGAGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 178) | CTCGCATA (SEQ ID NO: 466)
pe2_80 | CGACGAGGCTGGAGTGACTAGCTTCCCGCTCAGCAGTGTCTCGC (SEQ ID NO: 179) | GGAAGCTA (SEQ ID NO: 467)
pe2_81 | CGACGAGGCTGGAGTGACACCAACGTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 180) | ACGTTGGT (SEQ ID NO: 468)
pe2_82 | CGACGAGGCTGGAGTGACACGCGATACGCTCAGCAGTGTCTCGC (SEQ ID NO: 181) | TATCGCGT (SEQ ID NO: 469)
pe2_83 | CGACGAGGCTGGAGTGACGTCAGCTACGCTCAGCAGTGTCTCGC (SEQ ID NO: 182) | TAGCTGAC (SEQ ID NO: 470)
pe2_84 | CGACGAGGCTGGAGTGACCACCAGATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 183) | ATCTGGTG (SEQ ID NO: 471)
pe2_85 | CGACGAGGCTGGAGTGACCAACCTTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 184) | CAAGGTTG (SEQ ID NO: 472)
pe2_86 | CGACGAGGCTGGAGTGACTTGCCTTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 185) | CAAGGCAA (SEQ ID NO: 473)
pe2_87 | CGACGAGGCTGGAGTGACAGTCTGCTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 186) | AGCAGACT (SEQ ID NO: 474)
pe2_88 | CGACGAGGCTGGAGTGACGTCCTTCACGCTCAGCAGTGTCTCGC (SEQ ID NO: 187) | TGAAGGAC (SEQ ID NO: 475)
pe2_89 | CGACGAGGCTGGAGTGACCGGTCTATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 188) | ATAGACCG (SEQ ID NO: 476)
pe2_90 | CGACGAGGCTGGAGTGACTCTGCCTTCGCTCAGCAGTGTCTCGC (SEQ ID NO: 189) | AAGGCAGA (SEQ ID NO: 477)
pe2_91 | CGACGAGGCTGGAGTGACCAAGTTGGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 190) | CCAACTTG (SEQ ID NO: 478)
pe2_92 | CGACGAGGCTGGAGTGACATCTACGGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 191) | CCGTAGAT (SEQ ID NO: 479)
pe2_93 | CGACGAGGCTGGAGTGACCACTTCTGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 192) | CAGAAGTG (SEQ ID NO: 480)
pe2_94 | CGACGAGGCTGGAGTGACCACACAACCGCTCAGCAGTGTCTCGC (SEQ ID NO: 193) | GTTGTGTG (SEQ ID NO: 481)
pe2_95 | CGACGAGGCTGGAGTGACGCCTAATGCGCTCAGCAGTGTCTCGC (SEQ ID NO: 194) | CATTAGGC (SEQ ID NO: 482)
pe2_96 | CGACGAGGCTGGAGTGACGTTCGCATCGCTCAGCAGTGTCTCGC (SEQ ID NO: 195) | ATGCGAAC (SEQ ID NO: 483)
pe3_1
TTACCGCGGCKGCTGRCACACGAGTCTAGCGAC
CTAGACTC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 196)
NO: 484)





pe3_2
TTACCGCGGCKGCTGRCACACGCCTCTATCGAC
ATAGAGGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 197)
NO: 485)





pe3_3
TTACCGCGGCKGCTGRCACACGCCATTCTCGAC
AGAATGGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 198)
NO: 486)





pe3_4
TTACCGCGGCKGCTGRCACACTACGGTTGCGAC
CAACCGTA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 199)
NO: 487)





pe3_5
TTACCGCGGCKGCTGRCACACACTCTACCCGAC
GGTAGAGT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 200)
NO: 488)





pe3_6
TTACCGCGGCKGCTGRCACACTAGGTCCACGAC
TGGACCTA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 201)
NO: 489)





pe3_7
TTACCGCGGCKGCTGRCACACTCCTGAGTCGAC
ACTCAGGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 202)
NO: 490)





pe3_8
TTACCGCGGCKGCTGRCACACGTGGATAGCGAC
CTATCCAC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 203)
NO: 491)





pe3_9
TTACCGCGGCKGCTGRCACACGCGCTATTCGAC
AATAGCGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 204)
NO: 492)





pe3_10
TTACCGCGGCKGCTGRCACACGGAAGGAACGA
TTCCTTCC



CGAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 205)
NO: 493)





pe3_11
TTACCGCGGCKGCTGRCACACGGACTCAACGAC
TTGAGTCC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 206)
NO: 494)





pe3_12
TTACCGCGGCKGCTGRCACACAACACTCGCGAC
CGAGTGTT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 207)
NO: 495)





pe3_13
TTACCGCGGCKGCTGRCACACCCGGAATTCGAC
AATTCCGG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 208)
NO: 496)





pe3_14
TTACCGCGGCKGCTGRCACACAACTTGCCCGAC
GGCAAGTT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 209)
NO: 497)





pe3_15
TTACCGCGGCKGCTGRCACACTTGACAGGCGAC
CCTGTCAA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 210)
NO: 498)





pe3_16
TTACCGCGGCKGCTGRCACACTCTTAGCGCGAC
CGCTAAGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 211)
NO: 499)





pe3_17
TTACCGCGGCKGCTGRCACACCTGTTGCACGAC
TGCAACAG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 212)
NO: 500)





pe3_18
TTACCGCGGCKGCTGRCACACAGAACACGCGAC
CGTGTTCT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 213)
NO: 501)





pe3_19
TTACCGCGGCKGCTGRCACACCCTTGATGCGAC
CATCAAGG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 214)
NO: 502)





pe3_20
TTACCGCGGCKGCTGRCACACAGCGATCTCGAC
AGATCGCT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 215)
NO: 503)





pe3_21
TTACCGCGGCKGCTGRCACACGCTCAGAACGAC
TTCTGAGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 216)
NO: 504)





pe3_22
TTACCGCGGCKGCTGRCACACATTGCGTGCGAC
CACGCAAT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 217)
NO: 505)





pe3_23
TTACCGCGGCKGCTGRCACACCATCCGTTCGAC
AACGGATG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 218)
NO: 506)





pe3_24
TTACCGCGGCKGCTGRCACACTCTCTGGTCGAC
ACCAGAGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 219)
NO: 507)





pe3_25
TTACCGCGGCKGCTGRCACACAACGAGCACGAC
TGCTCGTT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 220)
NO: 508)





pe3_26
TTACCGCGGCKGCTGRCACACACGTTCACCGAC
GTGAACGT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 221)
NO: 509)





pe3_27
TTACCGCGGCKGCTGRCACACATCAGCACCGAC
GTGCTGAT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 222)
NO: 510)





pe3_28
TTACCGCGGCKGCTGRCACACGATAGCGACGAC
TCGCTATC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 223)
NO: 511)





pe3_29
TTACCGCGGCKGCTGRCACACAGAGCTTGCGAC
CAAGCTCT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 224)
NO: 512)





pe3_30
TTACCGCGGCKGCTGRCACACTGATCGTCCGAC
GACGATCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 225)
NO: 513)





pe3_31
TTACCGCGGCKGCTGRCACACACGATACGCGAC
CGTATCGT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 226)
NO: 514)





pe3_32
TTACCGCGGCKGCTGRCACACCTAACTGGCGAC
CCAGTTAG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 227)
NO: 515)





pe3_33
TTACCGCGGCKGCTGRCACACTCGCGTAACGAC
TTACGCGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 228)
NO: 516)





pe3_34
TTACCGCGGCKGCTGRCACACCGGTTCTTCGAC
AAGAACCG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 229)
NO: 517)





pe3_35
TTACCGCGGCKGCTGRCACACTTGGTTCGCGAC
CGAACCAA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 230)
NO: 518)





pe3_36
TTACCGCGGCKGCTGRCACACGAAGTAGCCGAC
GCTACTTC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 231)
NO: 519)





pe3_37
TTACCGCGGCKGCTGRCACACGGCTAGAACGAC
TTCTAGCC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 232)
NO: 520)





pe3_38
TTACCGCGGCKGCTGRCACACCATCGTGACGAC
TCACGATG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 233)
NO: 521)





pe3_39
TTACCGCGGCKGCTGRCACACTCACCAACCGAC
GTTGGTGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 234)
NO: 522)





pe3_40
TTACCGCGGCKGCTGRCACACCTTCAAGGCGAC
CCTTGAAG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 235)
NO: 523)





pe3_41
TTACCGCGGCKGCTGRCACACAGTAGCTCCGAC
GAGCTACT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 236)
NO: 524)





pe3_42
TTACCGCGGCKGCTGRCACACGCCACATTCGAC
AATGTGGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 237)
NO: 525)





pe3_43
TTACCGCGGCKGCTGRCACACTTCACGGACGAC
TCCGTGAA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 238)
NO: 526)





pe3_44
TTACCGCGGCKGCTGRCACACTGACGTTGCGAC
CAACGTCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 239)
NO: 527)





pe3_45
TTACCGCGGCKGCTGRCACACTCATCTGGCGAC
CCAGATGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 240)
NO: 528)





pe3_46
TTACCGCGGCKGCTGRCACACCGTTCATCCGAC
GATGAACG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 241)
NO: 529)





pe3_47
TTACCGCGGCKGCTGRCACACAACCGTCACGAC
TGACGGTT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 242)
NO: 530)





pe3_48
TTACCGCGGCKGCTGRCACACTGCTAAGCCGAC
GCTTAGCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 243)
NO: 531)





pe3_49
TTACCGCGGCKGCTGRCACACCAGGTAGACGAC
TCTACCTG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 244)
NO: 532)





pe3_50
TTACCGCGGCKGCTGRCACACAAGAACCGCGAC
CGGTTCTT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 245)
NO: 533)





pe3_51
TTACCGCGGCKGCTGRCACACAGGAGACTCGAC
AGTCTCCT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 246)
NO: 534)





pe3_52
TTACCGCGGCKGCTGRCACACAGTGAAGGCGAC
CCTTCACT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 247)
NO: 535)





pe3_53
TTACCGCGGCKGCTGRCACACTCTTCAGCCGAC
GCTGAAGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 248)
NO: 536)





pe3_54
TTACCGCGGCKGCTGRCACACAACGGAGTCGAC
ACTCCGTT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 249)
NO: 537)





pe3_55
TTACCGCGGCKGCTGRCACACGAAGAGACCGA
GTCTCTTC



CGAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 250)
NO: 538)





pe3_56
TTACCGCGGCKGCTGRCACACATTGGTGGCGAC
CCACCAAT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 251)
NO: 539)





pe3_57
TTACCGCGGCKGCTGRCACACCTGTCAAGCGAC
CTTGACAG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 252)
NO: 540)





pe3_58
TTACCGCGGCKGCTGRCACACAGGCATCACGAC
TGATGCCT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 253)
NO: 541)





pe3_59
TTACCGCGGCKGCTGRCACACAAGAGGTCCGAC
GACCTCTT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 254)
NO: 542)





pe3_60
TTACCGCGGCKGCTGRCACACTGCATTCGCGAC
CGAATGCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 255)
NO: 543)





pe3_61
TTACCGCGGCKGCTGRCACACTTGGACGTCGAC
ACGTCCAA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 256)
NO: 544)





pe3_62
TTACCGCGGCKGCTGRCACACTTGCTGGACGAC
TCCAGCAA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 257)
NO: 545)





pe3_63
TTACCGCGGCKGCTGRCACACTGGAGATGCGAC
CATCTCCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 258)
NO: 546)





pe3_64
TTACCGCGGCKGCTGRCACACTACGTACCCGAC
GGTACGTA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 259)
NO: 547)





pe3_65
TTACCGCGGCKGCTGRCACACTGACACCTCGAC
AGGTGTCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 260)
NO: 548)





pe3_66
TTACCGCGGCKGCTGRCACACGTCCATTGCGAC
CAATGGAC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 261)
NO: 549)





pe3_67
TTACCGCGGCKGCTGRCACACCAGAGAAGCGA
CTTCTCTG



CGAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 262)
NO: 550)





pe3_68
TTACCGCGGCKGCTGRCACACTGCTTCAGCGAC
CTGAAGCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 263)
NO: 551)





pe3_69
TTACCGCGGCKGCTGRCACACTACACTGCCGAC
GCAGTGTA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 264)
NO: 552)





pe3_70
TTACCGCGGCKGCTGRCACACGGACGTATCGAC
ATACGTCC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 265)
NO: 553)





pe3_71
TTACCGCGGCKGCTGRCACACCTCGCATACGAC
TATGCGAG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 266)
NO: 554)





pe3_72
TTACCGCGGCKGCTGRCACACGCATCCTACGAC
TAGGATGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 267)
NO: 555)





pe3_73
TTACCGCGGCKGCTGRCACACAGGCTTACCGAC
GTAAGCCT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 268)
NO: 556)





pe3_74
TTACCGCGGCKGCTGRCACACGTAAGTCGCGAC
CGACTTAC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 269)
NO: 557)





pe3_75
TTACCGCGGCKGCTGRCACACTTCTGGAGCGAC
CTCCAGAA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 270)
NO: 558)





pe3_76
TTACCGCGGCKGCTGRCACACGACACACACGAC
TGTGTGTC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 271)
NO: 559)





pe3_77
TTACCGCGGCKGCTGRCACACACCAGACACGAC
TGTCTGGT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 272)
NO: 560)





pe3_78
TTACCGCGGCKGCTGRCACACTGCAGCTTCGAC
AAGCTGCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 273)
NO: 561)





pe3_79
TTACCGCGGCKGCTGRCACACGCAACTTCCGAC
GAAGTTGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 274)
NO: 562)





pe3_80
TTACCGCGGCKGCTGRCACACACTCGCTTCGAC
AAGCGAGT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 275)
NO: 563)





pe3_81
TTACCGCGGCKGCTGRCACACTGAACTCCCGAC
GGAGTTCA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 276)
NO: 564)





pe3_82
TTACCGCGGCKGCTGRCACACGTGTAAGCCGAC
GCTTACAC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 277)
NO: 565)





pe3_83
TTACCGCGGCKGCTGRCACACATGCACCTCGAC
AGGTGCAT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 278)
NO: 566)





pe3_84
TTACCGCGGCKGCTGRCACACTCCGTCAACGAC
TTGACGGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 279)
NO: 567)





pe3_85
TTACCGCGGCKGCTGRCACACGTCGGTATCGAC
ATACCGAC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 280)
NO: 568)





pe3_86
TTACCGCGGCKGCTGRCACACACAGATCCCGAC
GGATCTGT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 281)
NO: 569)





pe3_87
TTACCGCGGCKGCTGRCACACTCGGATCTCGAC
AGATCCGA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 282)
NO: 570)





pe3_88
TTACCGCGGCKGCTGRCACACAGAGTCGTCGAC
ACGACTCT



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 283)
NO: 571)





pe3_89
TTACCGCGGCKGCTGRCACACGAATAGCGCGAC
CGCTATTC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 284)
NO: 572)





pe3_90
TTACCGCGGCKGCTGRCACACGGATTGGTCGAC
ACCAATCC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 285)
NO: 573)





pe3_91
TTACCGCGGCKGCTGRCACACGCCATAGACGAC
TCTATGGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 286)
NO: 574)





pe3_92
TTACCGCGGCKGCTGRCACACTGTCAGAGCGAC
CTCTGACA



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 287)
NO: 575)





pe3_93
TTACCGCGGCKGCTGRCACACCCTACGAACGAC
TTCGTAGG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 288)
NO: 576)





pe3_94
TTACCGCGGCKGCTGRCACACGTTACGTCCGAC
GACGTAAC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 289)
NO: 577)





pe3_95
TTACCGCGGCKGCTGRCACACCGAGATACCGAC
GTATCTCG



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 290)
NO: 578)





pe3_96
TTACCGCGGCKGCTGRCACACGCATTGACCGAC
GTCAATGC



GAGGCTGGAGTGAC
(SEQ ID



(SEQ ID NO: 291)
NO: 579)


















TABLE 3

Primer name  Primer sequence                                                                           Barcode sequence
p7_1         CAAGCAGAAGACGGCATACGAGATTCGATGAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 580)      CTCATCGA (SEQ ID NO: 612)
p7_2         CAAGCAGAAGACGGCATACGAGATAACGATCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 581)      GGATCGTT (SEQ ID NO: 613)
p7_3         CAAGCAGAAGACGGCATACGAGATTAACGTGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 582)      CCACGTTA (SEQ ID NO: 614)
p7_4         CAAGCAGAAGACGGCATACGAGATATGGAGGAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 583)      TCCTCCAT (SEQ ID NO: 615)
p7_5         CAAGCAGAAGACGGCATACGAGATGCGAAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 584)      ATCTTCGC (SEQ ID NO: 616)
p7_6         CAAGCAGAAGACGGCATACGAGATACTTCGCTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 585)      AGCGAAGT (SEQ ID NO: 617)
p7_7         CAAGCAGAAGACGGCATACGAGATTGCGTAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 586)      CTTACGCA (SEQ ID NO: 618)
p7_8         CAAGCAGAAGACGGCATACGAGATGGTCAAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 587)      ACTTGACC (SEQ ID NO: 619)
p7_9         CAAGCAGAAGACGGCATACGAGATAGGCTTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 588)      GTAAGCCT (SEQ ID NO: 620)
p7_10        CAAGCAGAAGACGGCATACGAGATGATTCTCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 589)      CGAGAATC (SEQ ID NO: 621)
p7_11        CAAGCAGAAGACGGCATACGAGATGTCTCCTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 590)      TAGGAGAC (SEQ ID NO: 622)
p7_12        CAAGCAGAAGACGGCATACGAGATGACGGTATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 591)      ATACCGTC (SEQ ID NO: 623)
p7_13        CAAGCAGAAGACGGCATACGAGATCATGGTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 592)      ACACCATG (SEQ ID NO: 624)
p7_14        CAAGCAGAAGACGGCATACGAGATTGTCTACCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 593)      GGTAGACA (SEQ ID NO: 625)
p7_15        CAAGCAGAAGACGGCATACGAGATACCATGCAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 594)      TGCATGGT (SEQ ID NO: 626)
p7_16        CAAGCAGAAGACGGCATACGAGATCATTCCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 595)      CAGGAATG (SEQ ID NO: 627)
p7_17        CAAGCAGAAGACGGCATACGAGATAGGACTAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 596)      CTAGTCCT (SEQ ID NO: 628)
p7_18        CAAGCAGAAGACGGCATACGAGATGCTTGTTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 597)      CAACAAGC (SEQ ID NO: 629)
p7_19        CAAGCAGAAGACGGCATACGAGATAGTCACACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 598)      GTGTGACT (SEQ ID NO: 630)
p7_20        CAAGCAGAAGACGGCATACGAGATCCAGTTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 599)      ACAACTGG (SEQ ID NO: 631)
p7_21        CAAGCAGAAGACGGCATACGAGATCTCCATTCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 600)      GAATGGAG (SEQ ID NO: 632)
p7_22        CAAGCAGAAGACGGCATACGAGATTTGCCAACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 601)      GTTGGCAA (SEQ ID NO: 633)
p7_23        CAAGCAGAAGACGGCATACGAGATGAGCACATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 602)      ATGTGCTC (SEQ ID NO: 634)
p7_24        CAAGCAGAAGACGGCATACGAGATATGTGGTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 603)      CACCACAT (SEQ ID NO: 635)
p5_1         AATGATACGGCGACCACCGAGATCTACACTAGATCGCACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 604)  TAGATCGC (SEQ ID NO: 636)
p5_2         AATGATACGGCGACCACCGAGATCTACACCTCTCTATACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 605)  CTCTCTAT (SEQ ID NO: 637)
p5_3         AATGATACGGCGACCACCGAGATCTACACTATCCTCTACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 606)  TATCCTCT (SEQ ID NO: 638)
p5_4         AATGATACGGCGACCACCGAGATCTACACAGAGTAGAACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 607)  AGAGTAGA (SEQ ID NO: 639)
p5_5         AATGATACGGCGACCACCGAGATCTACACGTAAGGAGACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 608)  GTAAGGAG (SEQ ID NO: 640)
p5_6         AATGATACGGCGACCACCGAGATCTACACACTGCATAACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 609)  ACTGCATA (SEQ ID NO: 641)
p5_7         AATGATACGGCGACCACCGAGATCTACACAAGGAGTAACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 610)  AAGGAGTA (SEQ ID NO: 642)
p5_8         AATGATACGGCGACCACCGAGATCTACACCTAAGCCTACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 611)  CTAAGCCT (SEQ ID NO: 643)


TABLE 4

Primer name    Primer sequence
bead_pe1_cy5   /5Cy5/AGATCGGAAGAGCGTCGTG (SEQ ID NO: 644)
bead_515f_cy5  /5Cy5/TTACCGCGGCKGCTGRCAC (SEQ ID NO: 645)
erec482_a488   /5Alex488N/GCTTCTTAGTCAGGTACCG (SEQ ID NO: 646)
lab158_cy3     /5Cy3/GGTATTAGCAYCTGTTTCCA (SEQ ID NO: 647)
ato291_cy5     /5Cy5/GGTCGGTCTCTCAACCC (SEQ ID NO: 648)
erec482_cy3    /5Cy3/GCTTCTTAGTCAGGTACCG (SEQ ID NO: 649)
eub338_cy5     /5Cy5/GCTGCCTCCCGTAGGAGT (SEQ ID NO: 650)
non338_cy5     /5Cy5/ACTCCTACGGGAGGCAGC (SEQ ID NO: 651)


TABLE 5

Dataset                                                       Cluster size  Reads cutoff,  Clusters,  Reads cutoff,  Clusters,  Number clusters  Final number
                                                              (microns)     TR1            TR1        TR2            TR2        discarded        of clusters
Community mixing, E. coli + homogenized fecal material        30            1440           399        N/A            N/A        N/A              399
Community mixing, S. pasteurii + homogenized fecal material   20            566            88         N/A            N/A        N/A              88
Mouse distal colon                                            30            992            715        717            754        63               1406
Mouse distal colon (co-housed mouse)                          30            920            730        651            624        126              1228
Mouse ileum (si6)                                             20            432            379        510            114        107              386
Mouse cecum (cec)                                             20            405            235        314            193        23               405
Mouse distal colon (co2)                                      20            404            164        442            124        29               259
Mouse distal colon (co2)                                      7             540            292        438            237        0                529
Mouse ileum (si6; co-housed mouse)                            20            379            157        396            104        0                261
Mouse cecum (cec; co-housed mouse)                            20            239            112        256            177        66               223
Mouse distal colon (co2; co-housed mouse)                     7             328            111        286            40         0                151
Mouse distal colon, LF diet                                   20            121            240        124            255        0                495
Mouse distal colon, LF diet, adjacent segment                 20            184            225        125            192        58               359
Mouse distal colon, HF diet                                   20            262            503        279            460        25               938


All mouse samples were collected in technical replicate (TR); a single technical replicate was collected for community mixing experiments. The procedure to remove technical artifacts (i.e., “Number clusters discarded”) was not performed on community mixing experiments, given that they are composed of highly homogeneous communities.


Sample fixation and in situ polymerization. Intact tissue segments (from the colon, cecum or small intestine as noted) were obtained by dissection and immediately fixed in methacarn solution (60% methanol, 30% chloroform, 10% acetic acid) for 24 hours (see Johansson, M. E. V. & Hansson, G. C. Preservation of mucus in histological sections, immunostaining of mucins in fixed tissue, and localization of bacteria with FISH. Methods Mol. Biol. 842, 229-235 (2012)). The fixed tissue was trimmed with a sterile razor into segments no larger than 3 mm in length, and segments containing digesta were selected. Thus, all input samples for MaP-seq analysis contained undisturbed epithelial tissue and lumenal digesta contents. The trimmed sample was then incubated in phosphate buffered saline (PBS) for 5 minutes and was permeabilized in PBS with 0.1% v/v Triton X-100 for 5 minutes. Next, a matrix embedding solution (see Chung, K. et al. Structural and molecular interrogation of intact biological systems. Nature 497, 332-337 (2013); Chen, F., Tillberg, P. W. & Boyden, E. S. Expansion microscopy. Science 347, 543-548 (2015)) containing a reverse sequencing primer with 16S V4 primer 806rB (see Klein, A. M. et al. (2015); Apprill, A., McNally, S., Parsons, R. & Weber, L. Minor revision to V4 region SSU rRNA 806R gene primer greatly increases detection of SAR11 bacterioplankton. Aquat. Microb. Ecol. 75, 129-137 (2015)) and acrydite and photocleavable linker groups was prepared on ice by mixing concentrated stocks of the following components in order: 1× PBS, 10% w/w acrylamide (Sigma-Aldrich A9099), 0.4% w/w N,N′-Bis(acryloyl)cystamine (BAC, Alfa Aesar 44132-03), 5 μM acry_pc_pe2_816r (see Table 1), 0.01% w/w 4-hydroxy-2,2,6,6-tetramethylpiperidin-1-oxyl (Sigma-Aldrich 176141), 0.2% w/w tetramethylethylenediamine (Sigma-Aldrich T7024) and 0.2% w/w ammonium persulfate (Sigma-Aldrich A3678). The BAC crosslinker enables gel degradation upon exposure to reducing conditions. 
The sample was dabbed dry with a sterile Kimwipe and placed in a PCR tube with excess matrix embedding solution (~50 μL per segment) and incubated on ice for 5 minutes. Excess embedding solution was removed by pipetting and replaced, and the sample was subsequently incubated on ice for >1 hour for perfusion. Excess embedding solution was removed, and samples were placed in a 37° C. incubator in an anaerobic chamber (Coy Laboratory Products) for >3 hours. Gel-embedded samples were removed, excess polymer matrix was trimmed from the sample with a sterile razor, and the sample was washed twice with PBS and once with TET and stored in TET at 4° C. See FIG. 13.


Sample fracturing, lysis and size-selection. Samples were placed in a stainless-steel vial (Biospec 2007) along with a 6.35 mm stainless steel bead (Biospec 11709635ss), and were sealed with a silicone rubber plug cap (Biospec 2008). The vial was placed in liquid nitrogen for >2 minutes, vigorously shaken to dislodge the sample from the vial wall, and quickly transferred to a bead beater (Biospec 112011) and subjected to beating for 10 seconds. PBS was added to the vial and vortexed; clusters in PBS were removed and washed twice with PBS via centrifugation at 15K RPM for 1 minute (Eppendorf 5424). Next, embedded cells were lysed (see Spencer, S. J. et al. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers. 1-10 (2015). doi:10.1038/ismej.2015.124); clusters were resuspended in 500 μL lysis buffer (10 mM Tris-HCl [pH 8.0], 1 mM ethylenediaminetetraacetic acid [EDTA], 100 mM NaCl) with 75 U/μL lysozyme (Epicentre R1810M) and were incubated at 37° C. for 1 hour. Clusters were then resuspended in 500 μL digestion buffer (30 mM Tris-HCl [pH 8.0], 1 mM EDTA, 0.5% Triton X-100, 800 mM guanidine hydrochloride [Sigma-Aldrich G9284]) with 0.1 μg/μL proteinase K (Epicentre MPRK092), and were incubated at 65° C. for 15 minutes. Finally, clusters were incubated at 95° C. for 5 minutes to inactivate proteinase K and washed three times with TET.


Samples were next subjected to size-selection. Clusters were first passed through a 40 μm cell strainer (Fisher 22-363-547) to remove large particulate matter. Next, nylon mesh filters (Component Supply Company, 7 μm: U-CMN-7-A, 15 μm: U-CMN-15-A, 31 μm: U-CMN-31-A) were cut to size using a ½″ hole punch and two filter punches were placed in a holder (EMD Millipore SX0001300) for each size. Clusters were passed through the 31 μm filter, 15 μm filter, and 7 μm filter sequentially using a 3 mL syringe (BD 309657); for each filter, clusters were passed through three times, and retained clusters on filters were washed once with TET. Clusters were washed off the 15 μm filter (large, ~30 μm median diameter) and the 7 μm filter (medium, ~20 μm median diameter) or collected from the pass-through of the final 7 μm filter (small, ~7 μm median diameter). The concentration of clusters was quantified by counting on a hemocytometer (INCYTO DHC-N01), and clusters were stored at 4° C. in TET for processing within ~2 days. See FIG. 13.


Co-encapsulation of beads and clusters. A microfluidic co-encapsulation strategy was utilized with three syringe pumps (Harvard Apparatus Pump 11 Elite) and observed under a microscope (Nikon Eclipse Ti2). First, 300 μL of HFE-7500 (3M) with 5% w/w surfactant (RAN Biotechnologies 008-FluoroSurfactant) was loaded into a 1 mL low dead volume syringe (Air-Tite Products A1); the syringe was fitted with a needle (BD 305122) and polyethylene tubing (Scientific Commodities Inc., BB31695-PE/2) and primed on a syringe pump. 30 μL of packed barcoded beads were then removed and washed twice with wash buffer (WB, 10 mM Tris-HCl [pH 8.0], 0.1 mM EDTA, 0.1% Tween-20) and twice with bead buffer (10 mM Tris-HCl [pH 8.0], 0.1% Tween-20, 50 mM KCl, 10 mM fresh DTT [utilized to degrade clusters within droplets]) by addition of buffer and centrifugation at 15K RPM for 1 minute. After the 4 washes, remaining buffer supernatant was removed with a gel-loading tip (Fisher 02-707-139). ~5 μL of packed beads were loaded into polyethylene tubing and primed with a 1 mL syringe (BD 309626) backfilled with 500 μL HFE-7500. The tubing was protected from light with a black tubing sheath (McMaster-Carr 5231K31) and primed on a syringe pump with needle facing upwards.


Next, a cluster stock was vortexed for 1 minute, 2,500 clusters were removed, washed three times in WB, and the remaining buffer was removed as above. A 45 μL encapsulation mix was prepared (25 μL NEBNext Q5 Hot Start HiFi PCR Master Mix [NEB M0543L], 4 μL Nycoprep Universal [Accurate Chemical & Scientific Corp. AN1106865], 5 μL 10% w/v Pluronic F-127 [Sigma-Aldrich P2443], 1.25 μL 20 mg/mL BSA [NEB B90005], 9.75 μL nuclease-free water) and clusters were resuspended in the mix and vortexed for >10 s. A 1 mL low dead volume syringe was backfilled with 500 μL HFE-7500, and the encapsulation mix was added directly into the tip of the syringe. A needle and polyethylene tubing were fitted to the syringe, protected from light with a black tubing sheath, and primed on a syringe pump with needle facing upwards.


Tubing for the carrier, bead and cluster encapsulation mix channels was connected to a new microfluidic device. Pumps were primed for the carrier, bead and cluster encapsulation mix channels in that order and, once stable bead packing was observed, set to final flow rates of 2 μL/min for carrier, 0.3 μL/min for beads, and 2.7 μL/min for cluster encapsulation mix. Once stable droplet formation was observed, polyethylene tubing was connected to the outlet port and emulsion was collected in a PCR tube (Axygen PCR-02-L-C) prefilled with 10 μL of 30% w/w surfactant in HFE-7500 and 50 μL of mineral oil. Under these conditions, generated droplets were ~35-45 μm in diameter with bead occupancy of ~25-50% (close-packed bead ordering enables bead loading that beats expected Poisson encapsulation statistics; see Abate, A. R., Chen, C.-H., Agresti, J. J. & Weitz, D. A. Beating Poisson encapsulation statistics using close-packed ordering. Lab Chip 9, 2628-2631 (2009)) and extremely low cluster occupancy of <0.1% (cluster aggregation and channel clogging are limiting factors at higher concentrations). See FIG. 14.


Emulsion PCR, library preparation and sequencing. The carrier phase underneath the emulsion was removed and replaced with 30 μL of 30% w/w surfactant in HFE-7500 to ensure droplet stability during PCR cycling. Tubes were placed on ice under a 365 nm UV light (Ted Pella Blak-Ray) and exposed for 10 minutes to release amplification primers. The emulsion was then subjected to PCR cycling (10° C. for 2 h, 98° C. for 30 s; 30 cycles of: 98° C. for 10 s, 55° C. for 20 s, 65° C. for 30 s; 65° C. for 2 m) with heated lid off. Coalesced droplet fraction, if present, was removed by pipetting and the carrier phase and mineral oil were removed. Droplets were broken by addition of 20 μL 1H,1H,2H,2H-perfluoro-1-octanol (Sigma-Aldrich 370533) and brief centrifugation in a microfuge tube. 
The aqueous phase was extracted and passed through a 0.45 μm spin column (Corning 8162) and subjected to an ExoI cleanup by adding 50 μL of 1× ExoI buffer with 1 U/μL ExoI (NEB M0293L) and incubating at 37° C. for 30 minutes. The mixture was then subjected to a 1× SPRI bead cleanup (Beckman Coulter A63881) per the manufacturer's protocol with addition of 1× volume beads and elution in 20 μL of 10 mM Tris-HCl (pH 8.0).


The resulting products were then subjected to a second PCR to add sample indexes and Illumina P5 and P7 adapters. 10 μL of cleanup product was used as template for a 50 μL reaction with 1× NEBNext Q5 Hot Start HiFi PCR Master Mix, 0.5 μM of each of the indexing primers (p5_X, p7_X, see Table 3), and 0.1× SYBR Green I (Invitrogen S7567). The PCR (98° C. for 30 s, cycle: 98° C. for 10 s, 68° C. for 20 s, 65° C. for 30 s; 65° C. for 2 m) was run on a real-time PCR machine (Bio-Rad CFX96) to stop reactions during exponential amplification (typically ~10 cycles). Products were assessed on an agarose gel (2% E-gel, Thermo Fisher G501802) to confirm the expected ~490 bp amplicon and were subjected to a 1× SPRI bead cleanup as above. Resulting libraries were quantified via fluorometric quantitation (Thermo Fisher Q32854), pooled, and subjected to sequencing with an Illumina MiSeq 500 cycle v2 kit (read1: 254 bp, read2: 254 bp) at 12 pM loading concentration with 20% PhiX spike-in.


Sequence filtering and 16S analysis. For MaP-seq data, a custom Python script was utilized to demultiplex reads based on barcode identity and strip primer sequences from reads. Reads were merged and filtered using USEARCH 9.2.64 (see Edgar, R. C. & Flyvbjerg, H. Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics 31, 3476-3482 (2015)) with maximum expected errors of 1. The resulting sequences were then dereplicated and de-novo clustered with a minimum cluster size of 2, and reads were mapped to OTUs at 97% identity (see Edgar, R. C. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods 10, 996-998 (2013)). Taxonomy was assigned to OTUs using the RDP classifier (see Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73, 5261-5267 (2007)). 
This yielded an OTU table consisting of individual barcodes (i.e., putative clusters) as samples.


Cluster mixing quality control experiment. Two bacterial communities were assembled; the first contained a single strain (e.g. E. coli NEB-beta), the second contained homogenized fecal bacteria. E. coli is not expected in the mouse gut at high abundances (see Xiao, L. et al. A catalog of the mouse gut metagenome. Nature Biotechnology 33, 1103-1108 (2015)). To generate homogenized fecal bacteria, fecal pellets were subjected to bead beating (Biospec 1001) with 0.1 mm glass beads in PBS for 1 minute and passed through a 40 μm cell strainer. The two communities were fixed in methacarn, resuspended in a volume of matrix embedding solution approximately equal to the fixed pellet volume, and subjected to cluster generation as per the MaP-seq protocol above. The resulting size-selected clusters were then mixed in equal quantity and subjected to encapsulation and sequencing.


Analysis of MaP-seq data. An overview of all MaP-seq datasets generated in this study can be found in Table 5. The resulting dataset contained a large number of barcodes/clusters with varying numbers of reads. A conservative threshold cutoff for considering real clusters was set as the total number of reads in a sample divided by 2,500 (i.e., the number of clusters that were utilized as input during microfluidic encapsulation, and assuming an equal read distribution for each cluster). Reactions yielding an extremely low number of clusters passing this threshold (i.e., <50) were conservatively excluded as they may represent failed encapsulation or amplification reactions.
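The read-count cutoff described above can be sketched as follows. This is a minimal illustration, not the original analysis script: the function name filter_clusters, the dictionary input format, and the choice to keep barcodes at or above (rather than strictly above) the cutoff are assumptions made for the example.

```python
def filter_clusters(reads_per_barcode, n_input_clusters=2500, min_clusters=50):
    """Apply the read cutoff (total reads / number of input clusters).

    reads_per_barcode: dict mapping barcode -> read count (hypothetical format).
    Returns (cutoff, kept), where kept is a dict of barcodes passing the cutoff,
    or None when fewer than min_clusters pass and the reaction is excluded.
    """
    total_reads = sum(reads_per_barcode.values())
    cutoff = total_reads / n_input_clusters
    # Assumption: a barcode with exactly `cutoff` reads is retained.
    kept = {bc: n for bc, n in reads_per_barcode.items() if n >= cutoff}
    if len(kept) < min_clusters:
        return cutoff, None  # likely failed encapsulation or amplification
    return cutoff, kept


if __name__ == "__main__":
    # 60 well-covered barcodes plus 1000 low-count background barcodes.
    reads = {f"real_{i}": 1000 for i in range(60)}
    reads.update({f"bg_{i}": 5 for i in range(1000)})
    cutoff, kept = filter_clusters(reads)
    print(cutoff, len(kept))
```

Under this definition, a reads cutoff of 1440 as listed in Table 5 would correspond to roughly 3.6 million total reads in that replicate (1440 × 2,500).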


Clusters were first pre-processed to remove a small number of clusters displaying highly similar OTU abundance profiles within a single technical replicate that appeared to represent technical artifacts (i.e., clusters encapsulated into droplets containing multiple barcoded beads or beads erroneously containing multiple barcodes), which could confound association detection. The pairwise Pearson correlation of all clusters was calculated, and highly correlated sets of clusters (r>0.95) dominated by a single technical replicate and large in size (>90% belonging to a single technical replicate, with clusters constituting >1% of the overall dataset) were removed. These artifacts constituted a small fraction of the overall dataset. For analysis of presence or absence of species within a cluster, a 2% relative abundance threshold within clusters was utilized, given observation of a small amount of background read-through across clusters and to ensure that at least 2 reads (and not singletons) were required to denote presence of a species.
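The two computations named in this paragraph, the 2% presence call and the pairwise Pearson correlation between cluster profiles, can be illustrated with a short sketch. The function names and the dictionary-of-counts input are hypothetical, and the sketch only flags highly correlated pairs; grouping pairs into sets and applying the replicate-dominance and size criteria is not shown.

```python
import math
from itertools import combinations


def present_otus(counts, threshold=0.02):
    """OTUs whose relative abundance within one cluster exceeds the threshold."""
    total = sum(counts.values())
    return {otu for otu, n in counts.items() if n / total > threshold}


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(profiles, otus, r_cutoff=0.95):
    """Pairs of barcodes whose relative-abundance profiles correlate above r_cutoff.

    profiles: dict barcode -> dict OTU -> read count (hypothetical format).
    otus: ordered list of OTU identifiers defining the vector layout.
    """
    vectors = {}
    for bc, counts in profiles.items():
        total = sum(counts.values())
        vectors[bc] = [counts.get(o, 0) / total for o in otus]
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if pearson(vectors[a], vectors[b]) > r_cutoff
    ]
```

For example, a cluster with read counts of 98, 1 and 1 for three OTUs would be called as containing only the first OTU, since each of the other two sits at 1% relative abundance.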


To determine pairwise associations, prevalent and abundant OTUs within filtered clusters (>2% relative abundance in >10% of clusters) were identified, and 2 by 2 contingency tables of appearance (>2% relative abundance) were calculated for all pairs of OTUs. Fisher's exact test was then used to calculate the probability of pairs co-occurring more or less often than expected (i.e. under a null model of random assortment of the two species, assuming equiprobable occupancy at all sites), and resulting p-values were adjusted via the Benjamini-Hochberg procedure (FDR=0.05).
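
A minimal, standard-library sketch of this co-occurrence test is given below. It is an illustration rather than the code used in the study: the function names and the list-of-sets input are hypothetical, and the two-sided p-value follows the common convention of summing all tables no more probable than the observed one.

```python
from math import comb

def cooccurrence_table(presence, otu_a, otu_b):
    """presence: list of sets (one per cluster) of OTUs called present."""
    a = sum(1 for s in presence if otu_a in s and otu_b in s)
    b = sum(1 for s in presence if otu_a in s and otu_b not in s)
    c = sum(1 for s in presence if otu_a not in s and otu_b in s)
    d = len(presence) - a - b - c
    return a, b, c, d

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def pmf(x):
        # Hypergeometric probability of x co-occurrences given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Pairs whose adjusted p-value falls below 0.05 would then be reported as associations; the direction (positive or negative) can be read from whether the observed co-occurrence count is above or below its expectation under the fixed margins.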


For t-distributed Stochastic Neighbor Embedding (tSNE) analysis (see Maaten, L. V. D. & Hinton, G. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 2579-2605 (2008)), reads for each cluster were subsampled to the lowest number for all clusters in the dataset (as specified in the text) since raw relative abundance values were analyzed (i.e. not utilizing a 2% relative abundance threshold as in other analyses). Bray-Curtis distance between taxa relative abundances within clusters was calculated, and this resulting distance matrix was utilized as the input for tSNE analysis.
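
The subsampling and distance steps that precede tSNE can be sketched as follows. The function names and the dictionary layout are assumptions for illustration, and the tSNE embedding itself is not shown; the resulting matrix would be passed to any tSNE implementation that accepts a precomputed distance matrix.

```python
import random

def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from a dict of OTU -> read count.

    Raises ValueError if the cluster has fewer than `depth` reads.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    out = {}
    for otu in rng.sample(pool, depth):
        out[otu] = out.get(otu, 0) + 1
    return out

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two dicts of OTU -> abundance."""
    otus = set(x) | set(y)
    num = sum(abs(x.get(o, 0) - y.get(o, 0)) for o in otus)
    den = sum(x.get(o, 0) + y.get(o, 0) for o in otus)
    return num / den if den else 0.0

def distance_matrix(clusters, seed=0):
    """Subsample every cluster to the smallest read depth, then compare all pairs."""
    rng = random.Random(seed)
    depth = min(sum(c.values()) for c in clusters.values())
    ids = sorted(clusters)
    sub = {i: subsample(clusters[i], depth, rng) for i in ids}
    return ids, [[bray_curtis(sub[a], sub[b]) for b in ids] for a in ids]
```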


The Net Relatedness Index (NRI) was calculated as previously described (see David, L. A. et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559-563 (2014)), adapting code from the relatedness_library.py script from Qiime 1.9.1 (see Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335-336 (2010)) which implements the same calculation as in phylocom 4.2 (see Webb, C. O., Ackerly, D. D. & Kembel, S. W. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 24, 2098-2100 (2008)). Briefly, species presence and absences across clusters were defined using the same 2% relative abundance threshold, and clusters containing only one OTU were omitted from analysis. OTU sequences were aligned and a neighbor-joining tree was constructed using MUSCLE 3.8.31 (see Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792-1797 (2004)). The NRI was calculated as a standardized effect size for each cluster: NRI = −1 × (MPD_cluster − MPD_null)/sd(MPD_null), where MPD_cluster denotes the mean phylogenetic distance (MPD) of the cluster, and MPD_null and sd(MPD_null) indicate the mean MPD and the standard deviation of the MPD, respectively, over 1000 iterations of a null model. The null model, calculated for each cluster, was random draws for the number of OTUs present in the sample (i.e. preserving cluster OTU richness) from the sample pool (i.e. any OTU observed at least once in any cluster in the sample) without replacement. The null model therefore preserves the OTU richness of each cluster but randomizes the OTUs present from the set of OTUs occurring in the sample.
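
The NRI calculation and its richness-preserving null model can be sketched as below. This is not the QIIME or phylocom code; the function names, the pairwise distance lookup keyed by frozenset, and the use of the population standard deviation are assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def mpd(otus, dist):
    """Mean pairwise phylogenetic distance among a collection of OTUs.

    dist: dict mapping frozenset({otu_i, otu_j}) -> tree distance (hypothetical layout).
    """
    return mean(dist[frozenset(p)] for p in combinations(sorted(otus), 2))

def nri(cluster_otus, pool, dist, n_iter=1000, rng=None):
    """NRI = -1 * (MPD_cluster - mean(MPD_null)) / sd(MPD_null).

    cluster_otus: OTUs present in the cluster (at least two).
    pool: list of all OTUs observed in any cluster of the sample.
    The null draws len(cluster_otus) OTUs from the pool without replacement.
    """
    rng = rng or random.Random(0)
    k = len(cluster_otus)
    observed = mpd(cluster_otus, dist)
    null = [mpd(rng.sample(pool, k), dist) for _ in range(n_iter)]
    sd = pstdev(null)  # assumption: population standard deviation
    if sd == 0:
        return float("nan")  # null distribution has no spread
    return -1 * (observed - mean(null)) / sd
```

Under this sign convention, a positive NRI indicates that the OTUs in a cluster are more closely related than expected by chance (phylogenetic clustering), and a negative NRI indicates overdispersion.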


Bulk 16S sequencing and spike-in for absolute abundance calculation. The bulk sequencing protocol followed our established spike-in sequencing pipeline (see Ji, B. W. et al. (2018)). Briefly, genomic DNA (gDNA) extraction was performed using a custom liquid handling protocol on a Biomek 4000 robot based on the Qiagen MagAttract PowerMicrobiome DNA/RNA Kit (Qiagen 27500-4-EP) but adapted for lower volumes. Samples were subjected to bead beating for a total of 10 minutes. For samples processed with the spike-in sequencing approach for absolute abundance calculation, the sample added was weighed on an analytical balance, and 10 μL of a frozen spike-in strain concentrate (Sporosarcina pasteurii, ATCC 11859, an environmental bacterium not found in the gut microbiome) was added during gDNA preparation. Resulting gDNA was subjected to amplification and sequencing of the 16S V4 region following a dual indexing scheme (see Kozich, J. J., Westcott, S. L., Baxter, N. T., Highlander, S. K. & Schloss, P. D. Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform. Applied and Environmental Microbiology 79, 5112-5120 (2013)) but utilizing updated 515f and 806rB primers as in the MaP-seq technique. A 20 μL PCR amplification was performed (1 μM forward and 1 μM reverse barcoded primers, 1 μL prepared gDNA, 10 μL NEBNext Q5 Hot Start HiFi Master Mix, 0.2× final concentration SYBR Green I). The PCR (98° C. for 30 s; cycle: 98° C. for 20 s, 55° C. for 20 s, 65° C. for 60 s; 65° C. for 5 min) was run on a real-time PCR machine to stop reactions during exponential amplification. Amplicon products were quantified and pooled, the expected 390 bp product was gel-extracted, and paired-end sequencing was performed with an Illumina MiSeq 300 cycle v2 kit (read1: 154 bp, read2: 154 bp, custom sequencing primers spiked into sequencing kit) at 10 pM loading concentration with 20% PhiX spike in. 
Resulting sequences were processed with USEARCH as above. The absolute bacterial density for a sample (A) was calculated from the weight of sample added (w) and the proportion of reads mapping to the spike-in strain (p_s) using the following formula: A=(1−p_s)/(p_s*w). The absolute density of individual OTUs was calculated by rescaling the total sample absolute density by the relative abundance of sample OTUs.


16S FISH and imaging. Samples were fixed as with the MaP-seq protocol and embedded within paraffin blocks, and 4 μm thick lumenal sections were cut and deparaffinized. 16S FISH was performed as previously described (see Mark Welch, J. L., Rossetti, B. J., Rieken, C. W., Dewhirst, F. E. & Borisy, G. G. Biogeography of a human oral microbiome at the micron scale. Proceedings of the National Academy of Sciences 113, E791-800 (2016); Whitaker, W. R., Shepherd, E. S. & Sonnenburg, J. L. (2017)). Briefly, previously validated FISH probes targeting abundant taxa present in the sample were obtained with conjugated fluorophores suitable for multiplex imaging: Erec482_a488 or Erec482_cy3 (see Franks, A. H. et al. Variations of bacterial populations in human feces measured by fluorescent in situ hybridization with group-specific 16S rRNA-targeted oligonucleotide probes. Applied and Environmental Microbiology 64, 3336-3345 (1998)) targeting Lachnospiraceae, Lab158_cy3 (see Harmsen, H., Elfferich, P. & Schut, F. A 16S rRNA-targeted probe for detection of lactobacilli and enterococci in faecal samples by fluorescent in situ hybridization. Microbial Ecology in Health and Disease 11, 3-12 (1999)) targeting Lactobacillaceae and Enterococcaceae, Ato291_cy5 (see Harmsen, H. et al. Development of 16S rRNA-based probes for the Coriobacterium group and the Atopobium cluster and their application for enumeration of Coriobacteriaceae in human feces from volunteers of different age groups. Applied and Environmental Microbiology 66, 4523-4527 (2000)) targeting Coriobacteriaceae, Eub338_cy5 (see Amann, R. I. et al. Combination of 16S rRNA-targeted oligonucleotide probes with flow cytometry for analyzing mixed microbial populations. Applied and Environmental Microbiology 56, 1919-1925 (1990)) targeting Bacteria, and Non338_cy5 (see Wallner, G., Amann, R. & Beisker, W. Optimizing fluorescent in situ hybridization with rRNA-targeted oligonucleotide probes for flow cytometric identification of microorganisms. Cytometry 14, 136-143 (1993)) control probe (see Table 4). Sections were incubated with probes at 10 ng/μL in FISH hybridization buffer (0.9 M NaCl, 20 mM Tris-HCl pH 7.5, 0.01% SDS, 10% formamide) at 47° C. for 4 hours. Sections were then incubated in preheated FISH wash buffer (0.9 M NaCl, 20 mM Tris-HCl pH 7.5) for 10 minutes, washed 3 times in PBS, incubated with 10 μg/mL DAPI in PBS for 10 minutes and washed 3 times in PBS. Sections were then mounted in mounting medium (Vector Laboratories H1000).


Images were acquired on a Nikon Eclipse Ti2 epifluorescence microscope with a SOLA-SE2 illuminator and Andor Zyla 4.2 plus camera controlled by Nikon Elements AR software. DAPI, FITC/GFP, RFP and CY5 filter cubes (Nikon 96359, 96362, 96364, 96366 respectively) were utilized. Large area four-color fluorescence scans with three 0.6 μm Z-stacks within the 4 μm section were performed with a Plan Apo λ 40× objective. The extended depth of focus (EDF) module was applied to resulting Z-stacks to obtain a focused image across the stack, and images across the entire section were stitched together.


REFERENCES



  • 1. Reichenbach, T., Mobilia, M. & Frey, E. Mobility promotes and jeopardizes biodiversity in rock—paper—scissors games. Nature 448, 1046-1049 (2007).

  • 2. MacArthur, R. H. & Wilson, E. O. The theory of island biogeography. (1967).

  • 3. Cordero, O. X. & Datta, M. S. Microbial interactions and community assembly at microscales. Current Opinion in Microbiology 31, 227-234 (2016).

  • 4. Swidsinski, A., Loening Baucke, V., Verstraelen, H., Osowska, S. & Doerffel, Y. Biostructure of Fecal Microbiota in Healthy Subjects and Patients With Chronic Idiopathic Diarrhea. Gastroenterology 135, 568-579.e2 (2008).

  • 5. Yasuda, K. et al. Biogeography of the Intestinal Mucosal and Lumenal Microbiome in the Rhesus Macaque. Cell Host & Microbe 17, 385-391 (2015).

  • 6. Earle, K. A. et al. Quantitative Imaging of Gut Microbiota Spatial Organization. Cell Host & Microbe 18, 478-488 (2015).

  • 7. Mark Welch, J. L., Rossetti, B. J., Rieken, C. W., Dewhirst, F. E. & Borisy, G. G. Biogeography of a human oral microbiome at the micron scale. Proceedings of the National Academy of Sciences 113, E791-800 (2016).

  • 8. Mark Welch, J. L., Hasegawa, Y., McNulty, N. P., Gordon, J. I. & Borisy, G. G. Spatial organization of a model 15-member human gut microbiota established in gnotobiotic mice. Proc. Natl. Acad. Sci. U.S.A. 21, 201711596-E9114 (2017).

  • 9. Donaldson, G. P., Lee, S. M. & Mazmanian, S. K. Gut biogeography of the bacterial microbiota. 1-13 (2015). doi:10.1038/nrmicro3552

  • 10. Lee, S. M. et al. Bacterial colonization factors control specificity and stability of the gut microbiota. Nature 1-6 (2013). doi:10.1038/nature12447

  • 11. Nagara, Y., Takada, T., Nagata, Y., Kado, S. & Kushiro, A. Microscale spatial analysis provides evidence for adhesive monopolization of dietary nutrients by specific intestinal bacteria. PLoS ONE 12, e0175497 (2017).

  • 12. Tropini, C., Earle, K. A., Huang, K. C. & Sonnenburg, J. L. The Gut Microbiome: Connecting Spatial Organization to Function. Cell Host & Microbe 21, 433-442 (2017).

  • 13. Nava, G. M., Friedrichsen, H. J. & Stappenbeck, T. S. Spatial organization of intestinal microbiota in the mouse ascending colon. ISME J 5, 627-638 (2010).

  • 14. Pedron, T. et al. A Crypt-Specific Core Microbiota Resides in the Mouse Colon. mBio 3, e00116-12-e00116-12 (2012).

  • 15. Valm, A. M., Welch, J. L. M. & Borisy, G. G. CLASI-FISH: Principles of combinatorial labeling and spectral imaging. Systematic and Applied Microbiology 35, 496-502 (2012).

  • 16. Geva-Zatorsky, N. et al. In vivo imaging and tracking of host-microbiota interactions via metabolic labeling of gut anaerobic bacteria. Nature Medicine 21, 1091-1100 (2015).

  • 17. Whitaker, W. R., Shepherd, E. S. & Sonnenburg, J. L. Tunable Expression Tools Enable Single-Cell Strain Distinction in the Gut Microbiome. Cell 169, 538-546.e12 (2017).

  • 18. Pereira, F. C. & Berry, D. Microbial nutrient niches in the gut. Environ Microbiol 19, 1366-1378 (2017).

  • 19. Donaldson, G. P. et al. Gut microbiota utilize immunoglobulin A for mucosal colonization. Science 360, 795-800 (2018).

  • 20. Wexler, A. G. et al. Human symbionts inject and neutralize antibacterial toxins to persist in the gut. Proc. Natl. Acad. Sci. U.S.A. 201525637-6 (2016). doi:10.1073/pnas.1525637113.

  • 21. Kim, H. J., Boedicker, J. Q., Choi, J. W. & Ismagilov, R. F. Defined spatial structure stabilizes a synthetic multi species bacterial community. Proceedings of the National Academy of Sciences 105, 18188-18193 (2008).

  • 22. Coyte, K. Z., Schluter, J. & Foster, K. R. The ecology of the microbiome: Networks, competition, and stability. Science 350, 663-666 (2015).

  • 23. Amann, R. & Fuchs, B. M. Single-cell identification in microbial communities by improved fluorescence in situ hybridization techniques. Nature Reviews Microbiology 6, 339-348 (2008).

  • 24. Rakoff-Nahoum, S., Coyne, M. J. & Comstock, L. E. An Ecological Network of Polysaccharide Utilization among Human Intestinal Symbionts. Current Biology 24, 40-49 (2014).

  • 25. Ji, B. W. et al. Quantifying spatiotemporal dynamics and noise in absolute microbiota abundances using replicate sampling. biorxiv.org doi:10.1101/310649

  • 26. Ormerod, K. L. et al. Genomic characterization of the uncultured Bacteroidales family S24-7 inhabiting the guts of homeothermic animals. Microbiome 1-17 (2016). doi:10.1186/s40168-016-0181-2

  • 27. Rakoff-Nahoum, S., Foster, K. R. & Comstock, L. E. The evolution of cooperation within the gut microbiota. Nature 533, 255-259 (2016).

  • 28. Carmody, R. N. et al. Diet Dominates Host Genotype in Shaping the Murine Gut Microbiota. Cell Host & Microbe 17, 72-84 (2015).

  • 29. Sonnenburg, E. D. et al. Diet-induced extinctions in the gut microbiota compound over generations. Nature 529, 212-215 (2016).

  • 30. David, L. A. et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559-563 (2014).

  • 31. Webb, C. O., Ackerly, D. D., McPeek, M. A. & Donoghue, M. J. Phylogenies and Community Ecology. Annu. Rev. Ecol. Syst. 33, 475-505 (2002).

  • 32. Cavender-Bares, J., Kozak, K. H., Fine, P. V. A. & Kembel, S. W. The merging of community ecology and phylogenetic biology. Ecology Letters 12, 693-715 (2009).

  • 33. Mazutis, L. et al. Single-cell analysis and sorting using droplet-based microfluidics. Nat Protoc 8, 870-891 (2013).

  • 34. Parada, A. E., Needham, D. M. & Fuhrman, J. A. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ Microbiol 18, 1403-1414 (2016).

  • 35. Walters, W. et al. Improved Bacterial 16S rRNA Gene (V4 and V4-5) and Fungal Internal Transcribed Spacer Marker Gene Primers for Microbial Community Surveys. mSystems 1, e00009-15-10 (2015).

  • 36. Klein, A. M. et al. Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells. Cell 161, 1187-1201 (2015).

  • 37. Bose, S. et al. Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biology 1-16 (2015). doi:10.1186/s13059-015-0684-3

  • 38. Zilionis, R. et al. Single-cell barcoding and sequencing using droplet microfluidics. Nat Protoc 12, 44-73 (2017).

  • 39. Johansson, M. E. V. & Hansson, G. C. Preservation of mucus in histological sections, immunostaining of mucins in fixed tissue, and localization of bacteria with FISH. Methods Mol. Biol. 842, 229-235 (2012).

  • 40. Chung, K. et al. Structural and molecular interrogation of intact biological systems. Nature 497, 332-337 (2013).

  • 41. Chen, F., Tillberg, P. W. & Boyden, E. S. Expansion microscopy. Science 347, 543-548 (2015).

  • 42. Apprill, A., McNally, S., Parsons, R. & Weber, L. Minor revision to V4 region SSU rRNA 806R gene primer greatly increases detection of SAR11 bacterioplankton. Aquat. Microb. Ecol. 75, 129-137 (2015).

  • 43. Spencer, S. J. et al. Massively parallel sequencing of single cells by epicPCR links functional genes with phylogenetic markers. 1-10 (2015). doi:10.1038/ismej.2015.124

  • 44. Abate, A. R., Chen, C.-H., Agresti, J. J. & Weitz, D. A. Beating Poisson encapsulation statistics using close-packed ordering. Lab Chip 9, 2628-2631 (2009).

  • 45. Edgar, R. C. & Flyvbjerg, H. Error filtering, pair assembly and error correction for next generation sequencing reads. Bioinformatics 31, 3476-3482 (2015).

  • 46. Edgar, R. C. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods 10, 996-998 (2013).

  • 47. Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73, 5261-5267 (2007).

  • 48. Xiao, L. et al. A catalog of the mouse gut metagenome. Nature Biotechnology 33, 1103-1108 (2015).

  • 49. Maaten, L. V. D. & Hinton, G. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 2579-2605 (2008).

  • 50. Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335-336 (2010).

  • 51. Webb, C. O., Ackerly, D. D. & Kembel, S. W. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 24, 2098-2100 (2008).

  • 52. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792-1797 (2004).

  • 53. Kozich, J. J., Westcott, S. L., Baxter, N. T., Highlander, S. K. & Schloss, P. D. Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform. Applied and Environmental Microbiology 79, 5112-5120 (2013).

  • 54. Franks, A. H. et al. Variations of bacterial populations in human feces measured by fluorescent in situ hybridization with group-specific 16S rRNA-targeted oligonucleotide probes. Applied and Environmental Microbiology 64, 3336-3345 (1998).

  • 55. Harmsen, H., Elfferich, P. & Schut, F. A 16S rRNA-targeted probe for detection of lactobacilli and enterococci in faecal samples by fluorescent in situ hybridization. Microbial Ecology in Health and Disease 11, 3-12 (1999).

  • 56. Harmsen, H. et al. Development of 16S rRNA-based probes for the Coriobacterium group and the Atopobium cluster and their application for enumeration of Coriobacteriaceae in human feces from volunteers of different age groups. Applied and Environmental Microbiology 66, 4523-4527 (2000).

  • 57. Amann, R. I. et al. Combination of 16S rRNA-targeted oligonucleotide probes with flow cytometry for analyzing mixed microbial populations. Applied and Environmental Microbiology 56, 1919-1925 (1990).

  • 58. Wallner, G., Amann, R. & Beisker, W. Optimizing fluorescent in situ hybridization with rRNA-targeted oligonucleotide probes for flow cytometric identification of microorganisms. Cytometry 14, 136-143 (1993).



Example 2 Precision Microbiome Replacement to Enhance Cancer Checkpoint Immunotherapy

The human gut contains trillions of microorganisms (microbiota) that form a complex and unique ecosystem within our bodies. It is now clear that these bacteria have systemic effects on the host and can directly interact with many classes of pharmaceutical interventions, altering efficacy and clinical outcomes1,2. A prime example of this effect is in cancer immunotherapy, where recent studies suggest that the commensal microbiota modulate the efficacy of therapies involving monoclonal antibodies (mAbs) targeted to the PD-1 receptor, via stimulation of the immune system2-7. Importantly, it has been observed that living bacteria in the gut are required to elicit this effect3. Correspondingly, approaches to alter microbiomes to improve the efficacy of cancer immunotherapy are sorely needed.


Current microbiome manipulation strategies broadly fall under two approaches: chemical perturbation and probiotic supplementation8. The abundance of bacterial species within a given microbiome can be altered by administration of chemical compounds (i.e. different diets, prebiotic compounds, antibiotics). Alternatively, new bacterial strains or combinations of strains (probiotics or fecal microbiota transplant) with functionality of interest can be administered. However, the pervasive variability of individual microbiomes limits the efficacy of these techniques. Chemical perturbations will be unsuccessful if a targeted bacterial species is not present, and their effect can be highly variable. Supplemented probiotic strains may not robustly colonize all microbiomes9. An alternative to these approaches is to completely replace a microbiome with a new defined microbiome containing specific desired functionality. Here, precision microbiome replacement, a new paradigm in manipulating microbiomes, can be used to enhance cancer immunotherapy.


Specific aims: To develop a precision microbiome replacement therapy to improve the efficacy of cancer immunotherapies, we will (1) generate a comprehensive reference collection of gut bacterial strains, (2) identify strains promoting immunotherapy efficacy using combinatorial in vivo animal model screens, and (3) develop a microbiome transplantation therapy and formulate strains into stable consortia for delivery.


Approach: (1) Generate a comprehensive reference collection of gut bacterial strains. Individual bacterial strains can act as effectors (i.e., stimulating the host immune system) in the context of complex communities10. Fecal samples will be collected from geographically and environmentally distinct individuals representing global gut microbial diversity. Samples will then be subjected to culturing and isolation in anaerobic settings, and individual strains will be isolated utilizing colony picking robots. Resulting bacterial strains will be identified and characterized using whole-genome sequencing and unique strains of interest will be subjected to long-term cryogenic storage. This sequencing characterization may be conducted by utilizing robotic liquid handling for library preparation (i.e. Labcyte Echo 550, Agilent Bravo, Formulatrix Mantis; sequence on HiSeq X Ten). This automated approach will allow for generation of a gut bacterial strain collection resource in an economic manner.


(2) Identify strains promoting immunotherapy efficacy using combinatorial in vivo animal model screens. Representative strains from the collection will be selected, revived from storage and inoculated into cohorts of germ-free mice. The mice will be subjected to standard cancer models (e.g. metastatic cutaneous squamous cell carcinoma) and given mAb checkpoint immunotherapy (e.g. cemiplimab) and efficacy and response to therapy will be measured. Importantly, the screen will be performed with different combinations of strains rather than individual strains, to enable efficient and higher throughput screens10. Strains promoting efficacy of immunotherapy will be identified.


(3) Develop a new microbiome transplantation therapy and formulate strains into stable consortia for delivery. To perform efficient microbiome transplantation, strategies utilizing oral antibiotic therapy to eradicate commensal microbiota and subsequent oral delivery of new microbial strains will be tested in gnotobiotic mouse models with humanized microbiota. Combinations of antibiotics, dosing, and timing of the therapy in addition to physical clearing of the gut and dietary changes will be explored to optimize efficient elimination of endogenous microbiota and colonization of new strains. Next, the identified immunotherapy enhancing strains will be formulated into a complex microbiome consortium recapitulating the ecology and functionality of naturally occurring microbiomes. The stability of the microbiome (i.e. retention of desired strains over time, resistance to invasion by other commensal strains) will be measured in mouse models and improved by iterative design.


Some species of gut bacteria may be recalcitrant to in vitro isolation. Recent studies, however, suggest that the majority of the gut microbiome is culturable12, and the cultivability of species could be further improved by systematic exploration of culture media formulation. The transplantation and resulting microbiome could differ across individuals due to interactions between the strains and the host. However, recent studies suggest that environment dominates host genotype in determining microbiota composition, implying that microbiome transplantation may be reproducible across different host backgrounds13.


Although there may be variability of microbiomes across individuals, direct therapeutic microbiome interventions can be used. Alternatively, new microbiomes with desired functionality can be designed and used to replace existing ones. Cancer immunotherapy offers a salient first application of the concept, but the pipeline could be broadly scaled to other microbiome-linked human disorders.


REFERENCES



  • 1. Spanogiannopoulos, P., Bess, E. N., Carmody, R. N. & Turnbaugh, P. J. The microbial pharmacists within us: a metagenomic view of xenobiotic metabolism. Nature Reviews Microbiology 14, 273-287 (2016).

  • 2. Zitvogel, L., Ma, Y., Raoult, D., Kroemer, G. & Gajewski, T. F. The microbiome in cancer immunotherapy: Diagnostic tools and therapeutic strategies. Science 359, 1366-1370 (2018).

  • 3. Sivan, A. et al. Commensal Bifidobacterium promotes antitumor immunity and facilitates anti-PD-L1 efficacy. Science 350, 1084-1089 (2015).

  • 4. Matson, V. et al. The commensal microbiome is associated with anti-PD-1 efficacy in metastatic melanoma patients. Science 359, 104-108 (2018).

  • 5. Routy, B. et al. Gut microbiome influences efficacy of PD-1-based immunotherapy against epithelial tumors. Science 359, 91-97 (2018).

  • 6. Gopalakrishnan, V. et al. Gut microbiome modulates response to anti-PD-1 immunotherapy in melanoma patients. Science 359, 97-103 (2018).

  • 7. Vétizou, M. et al. Anticancer immunotherapy by CTLA-4 blockade relies on the gut microbiota. Science 350, 1079-1084 (2015).

  • 8. Sheth, R. U., Cabral, V., Chen, S. P. & Wang, H. H. Manipulating Bacterial Communities by in situ Microbiome Engineering. Trends in Genetics 32, 189-200 (2016).

  • 9. Maldonado-Gómez, M. X. et al. Stable Engraftment of Bifidobacterium longum AH1206 in the Human Gut Depends on Individualized Features of the Resident Microbiome. Cell Host & Microbe 20, 515-526 (2016).

  • 10. Faith, J. J., Ahern, P. P., Ridaura, V. K., Cheng, J. & Gordon, J. I. Identifying gut microbe-host phenotype relationships using combinatorial communities in gnotobiotic mice. Science Translational Medicine 6, 220ra11-220ra11 (2014).

  • 11. Sheth, R. U., Yim, S. S., Wu, F. L. & Wang, H. H. Multiplex recording of cellular events over time on CRISPR biological tape. Science 358, 1457-1461 (2017).

  • 12. Browne, H. P. et al. Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation. Nature 533, 543-546 (2016).

  • 13. Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210-215 (2018).



Example 3 Antibiotics I

Disruption of the normal homeostatic balance of the gut can lead to profound changes in the gut microbiome. For example, antibiotics are known to cause large-scale alterations to the gut microbiome. In general, antibiotics not only target the intended pathogens, but often cause collateral damage by wiping out native commensal microbiota that are sensitive to the compound. Clinical administration of antibiotics not only reduces biodiversity in the gut microbiome, but also predisposes individuals to a variety of short- and long-term diseases, including antibiotic-associated C. difficile infections, diabetes, and inflammation. While it is generally believed that antibiotic exposure disrupts the state of the microbiome by increasing its fragility and susceptibility to pathogenic infections, the specific mechanisms mediating this process are not understood. In large ecological systems, changes in spatial patterning can play an important role in susceptibility to invasion, for example in exotic plant invasion in river and creek ecosystems. Exposure to antibiotics can lead to destabilization of the natural commensal microbiota by removing key members in the community that facilitate robust interspecies interactions, which in turn is marked by a profound change in the microbial spatial architecture that reduces the microbiome's natural resistance to colonization by pathogens. We used two wild-type C57BL6/J mice that were both fed a conventional diet and co-housed beforehand to normalize their gut microbiota, which was validated by bulk fecal sequencing. We then separated the mice into individual cages and introduced ciprofloxacin (0.625 mg/mL) in drinking water ad libitum for 2 days in one cage and a sham control in the other cage. We extracted small intestinal tissues from both the control and ciprofloxacin-treated mice and applied bulk 16S sequencing and MIST-seq. 
As expected, exposure to antibiotics significantly shifted the gut community, leading to an overall loss in microbiome diversity and the domination of particular groups (e.g. Lactobacillales and Clostridiales) compared to the wild-type control (FIG. 15A). More interestingly, we observed a robust small intestinal interaction network (FIG. 15B) that is significantly disrupted by ciprofloxacin, resulting in a few dominant species with drastically altered spatial organization.


Example 4 Antibiotics II

The prevalent use of antibiotics in both pediatric and adult populations and its impact on the gut microbiome is hypothesized to be a key contributor to the rise of autoimmune and metabolic disorders. However, the impact of specific antibiotics on the gut microbiome can vary significantly depending on the type (e.g. broad vs narrow spectrum, antibiotic class), therapeutic dosage and duration, resistance profiles of endogenous bacteria, and geographic location along the GI tract. We will explore how antibiotics can alter the spatial microbiota organization. Altered spatial patterns due to antibiotic exposure may reflect changes in microbiota function beyond simple variations in community composition or abundance. We will use antibiotics with various modes of action and varying levels of host and microbiota impact. Specifically, we will administer Ciprofloxacin (Fluoroquinolone; single oral gavage 10 mg/kg), Vancomycin (Glycopeptide, 0.625 mg/mL, drinking water ad libitum), Ampicillin (β-lactam, 0.5 mg/mL, drinking water ad libitum), Streptomycin (Aminoglycoside, 5 mg/mL, drinking water ad libitum) to different cohorts of 5 pre-cohoused wild-type C57BL6/J mice as previously described. Mice from each cohort will be sacrificed at day 0 (before treatment), 3, 7 and 10 (FIG. 16a). Samples from the small intestine, colon and fecal matter will be analyzed by MIST-seq. We will collect temporal samples to assess the transition states from an unperturbed microbiota to one that is compromised by antibiotics. Three biological replicate studies will be performed and both male and female mice will be tested separately. As before, should additional replicates be needed for sufficient statistical power, we will increase the number of mice per group accordingly. Based on our preliminary studies, we expect knockdown or abolition of specific species and a loss in biodiversity upon treatment. 
We anticipate that the spatial ecological role of strains killed by an antibiotic will be a key factor in its degree of GI microbiome disruption. Disrupted networks may lead to more fragile states with reduced inter-microbial interactions and increased vulnerability to infiltration by a pathogen. Importantly, previous work showed that some antibiotics (e.g. Ampicillin, Streptomycin) increased murine gut susceptibility to C. difficile infection, whereas others (e.g. Ciprofloxacin, Vancomycin) led to resistance. We will compare spatial mapping results between these two antibiotic “classes” to identify systematic spatial differences and key players. For validation, we will perform bulk sequencing to assess abundance and compositional changes. In addition, we will apply FISH techniques to visualize specific architectural changes using specific probes to identify major microbiota families, pre- and post-antibiotic treatment. We will also perform in vitro culture studies and antibiotic sensitivity assays on isolates to validate MIST-seq findings.


To functionally characterize gut microbiota ecology, we will employ a classical ecology approach: introducing species into novel or perturbed environments and tracking them longitudinally over space and time. We will introduce “mock” murine fecal transplants into wild-type and antibiotic-perturbed mice and profile the colonization process. Specifically, 5 cohorts of C57BL/6J mice will be obtained commercially (Taconic Biosciences), 4 of which will be orally treated with different antibiotics for 10 days, and the remaining cohort will serve as a control group. We will isolate live fecal microbiota from mice obtained through another vendor (e.g. Jackson Laboratories, Charles River Laboratories) that are known to harbor highly distinct microbiomes, which we will validate by bulk 16S sequencing (FIG. 16b). Freshly collected fecal pellets will be placed in pre-reduced PBS in anaerobic conditions and live microbiota will be isolated by established protocols. Two groups of 3 mice from each of the four cohorts will each receive a different fecal microbiota gavage (approximating a human FMT procedure), and the control group will receive a gavage of pre-reduced PBS. Animals will be sacrificed at days 0, 3 and 10, and tissue from the small intestine and colon will be profiled. The experiment will be performed in triplicate and with gender-controlled cohorts. As before, should additional replicates be necessary for sufficient statistical power, we will increase the number of mice per group accordingly. We will then perform detailed analysis of ecosystem assembly of the two different “donor” fecal transplants in the five “recipient” ecological contexts. Importantly, this will allow us to assess processes shaping FMT efficacy in an in vivo context. For example, given that diet plays an important role in microbiota composition via environmental filtering (i.e. available nutrients), the spatial and compositional structure of microbiota after FMT may be similar to that before perturbation.
On the other hand, novel spatial patterns may form due to other ecological processes such as microbial competition (ref. 17) or cooperation. Thus, this study will advance our functional knowledge of principles that contribute to microbiota colonization and maintenance, relevant for designing better FMT therapies (e.g. defined communities or personalized FMT).
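A pattern can be read directly off the sequences listed in Table 6 below: SEQ ID NOs: 4 and 5 share the same leading and trailing segments around a short variable segment, and the reverse complements of those variable segments match SEQ ID NOs: 292 and 293. The Python sketch below checks this for those two entries only. The flank constants and the helper names are illustrative assumptions made here, not part of the disclosure, and the sketch does not verify that every other entry follows the same layout.

```python
# Constant segments shared by SEQ ID NOs: 4 and 5, read off Table 6.
# Treating them as fixed "flanks" is an assumption made for this sketch.
LEFT_FLANK = "CGCTCAGCAGTGTCTCGC"
RIGHT_FLANK = "AGATCGGAAGAGCGTCGTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def variable_segment(oligo: str, left: str = LEFT_FLANK, right: str = RIGHT_FLANK) -> str:
    """Return the part of `oligo` lying between the two constant flanks."""
    if (
        len(oligo) < len(left) + len(right)
        or not oligo.startswith(left)
        or not oligo.endswith(right)
    ):
        raise ValueError("oligo does not carry the expected flanks")
    return oligo[len(left):len(oligo) - len(right)]


# SEQ ID NOs: 4 and 5 written out in full.
SEQ_4 = "CGCTCAGCAGTGTCTCGCACCTAGTAGATCGGAAGAGCGTCGTG"
SEQ_5 = "CGCTCAGCAGTGTCTCGCTAGAGCTAGATCGGAAGAGCGTCGTG"

for name, oligo in (("SEQ ID NO: 4", SEQ_4), ("SEQ ID NO: 5", SEQ_5)):
    segment = variable_segment(oligo)
    print(name, segment, reverse_complement(segment))
```

The same helper could be pointed at other entries by passing different `left` and `right` arguments, since later groups in the table use different constant segments.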









TABLE 6: Sequences

Sequence ID Number
Sequence
Species





SEQ ID NO: 1
GACTACTCCACGACGCTCTTCCGATCT
Synthetic





SEQ ID NO: 2
ATTAGGTCGACGTGTGCTCTTCCGATCTGGACTACNVGGGTWTCTAAT
Synthetic






SEQ ID NO: 3
TTACCGCGGCKGCTGRCAC
Synthetic





SEQ ID NO: 4
CGCTCAGCAGTGTCTCGCACCTAGTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 5
CGCTCAGCAGTGTCTCGCTAGAGCTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 6
CGCTCAGCAGTGTCTCGCACTCTCTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 7
CGCTCAGCAGTGTCTCGCGGAACACAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 8
CGCTCAGCAGTGTCTCGCCAGCTAAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 9
CGCTCAGCAGTGTCTCGCGTATGGTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 10
CGCTCAGCAGTGTCTCGCAACGGTAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 11
CGCTCAGCAGTGTCTCGCAGTTGGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 12
CGCTCAGCAGTGTCTCGCAGACTTCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 13
CGCTCAGCAGTGTCTCGCGTGCTTAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 14
CGCTCAGCAGTGTCTCGCCCACTAGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 15
CGCTCAGCAGTGTCTCGCGCGCTATAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 16
CGCTCAGCAGTGTCTCGCTGACACTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 17
CGCTCAGCAGTGTCTCGCGAGGAACAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 18
CGCTCAGCAGTGTCTCGCTTGACCAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 19
CGCTCAGCAGTGTCTCGCGGTAGCAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 20
CGCTCAGCAGTGTCTCGCCGTTGAGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 21
CGCTCAGCAGTGTCTCGCACAACTGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 22
CGCTCAGCAGTGTCTCGCTCAGTCAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 23
CGCTCAGCAGTGTCTCGCCGTACATAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 24
CGCTCAGCAGTGTCTCGCTGAGTGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 25
CGCTCAGCAGTGTCTCGCCCTGTTAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 26
CGCTCAGCAGTGTCTCGCACCTCTAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 27
CGCTCAGCAGTGTCTCGCATTCCACAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 28
CGCTCAGCAGTGTCTCGCTCGTATGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 29
CGCTCAGCAGTGTCTCGCAGGTTGTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 30
CGCTCAGCAGTGTCTCGCCGTAGTCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 31
CGCTCAGCAGTGTCTCGCCTTCTCGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 32
CGCTCAGCAGTGTCTCGCAGGTAAGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 33
CGCTCAGCAGTGTCTCGCGATCTCAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 34
CGCTCAGCAGTGTCTCGCATCGAACAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 35
CGCTCAGCAGTGTCTCGCCACGCATAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 36
CGCTCAGCAGTGTCTCGCAACTCAGGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 37
CGCTCAGCAGTGTCTCGCTGCCACAAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 38
CGCTCAGCAGTGTCTCGCATGGCGATAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 39
CGCTCAGCAGTGTCTCGCAATCAGCGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 40
CGCTCAGCAGTGTCTCGCGGTTGTACAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 41
CGCTCAGCAGTGTCTCGCCTCGACTTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 42
CGCTCAGCAGTGTCTCGCTAGGAAGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 43
CGCTCAGCAGTGTCTCGCGTGCATGTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 44
CGCTCAGCAGTGTCTCGCTCAATCGGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 45
CGCTCAGCAGTGTCTCGCTCAAGCTCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 46
CGCTCAGCAGTGTCTCGCAGTGTCACAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 47
CGCTCAGCAGTGTCTCGCTGTGTTCCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 48
CGCTCAGCAGTGTCTCGCTCCGAATCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 49
CGCTCAGCAGTGTCTCGCGGAGTACAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 50
CGCTCAGCAGTGTCTCGCAGGACAGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 51
CGCTCAGCAGTGTCTCGCGCACAGTTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 52
CGCTCAGCAGTGTCTCGCCGACAACAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 53
CGCTCAGCAGTGTCTCGCAGCACGTAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 54
CGCTCAGCAGTGTCTCGCCCAACAGTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 55
CGCTCAGCAGTGTCTCGCTCAGGACAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 56
CGCTCAGCAGTGTCTCGCCTATCCTGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 57
CGCTCAGCAGTGTCTCGCTGTCTGTCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 58
CGCTCAGCAGTGTCTCGCCCTAGTCTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 59
CGCTCAGCAGTGTCTCGCGTAATGGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 60
CGCTCAGCAGTGTCTCGCTAGTGGCTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 61
CGCTCAGCAGTGTCTCGCGAATCTGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 62
CGCTCAGCAGTGTCTCGCTTCGATGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 63
CGCTCAGCAGTGTCTCGCGCTTGGTTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 64
CGCTCAGCAGTGTCTCGCAGCTGATCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 65
CGCTCAGCAGTGTCTCGCATAAGCGGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 66
CGCTCAGCAGTGTCTCGCACTTCGGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 67
CGCTCAGCAGTGTCTCGCCTAGTCGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 68
CGCTCAGCAGTGTCTCGCCGTTCTTGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 69
CGCTCAGCAGTGTCTCGCTGTAGACTCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 70
CGCTCAGCAGTGTCTCGCGAAGGCCTAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 71
CGCTCAGCAGTGTCTCGCTTCGTAAGGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 72
CGCTCAGCAGTGTCTCGCTGATCACCTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 73
CGCTCAGCAGTGTCTCGCTAGCTAACGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 74
CGCTCAGCAGTGTCTCGCCGTAGAAGGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 75
CGCTCAGCAGTGTCTCGCTCTCTCGAAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 76
CGCTCAGCAGTGTCTCGCTCTAGTTCCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 77
CGCTCAGCAGTGTCTCGCCCGAAGAGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 78
CGCTCAGCAGTGTCTCGCAGGTGACATAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 79
CGCTCAGCAGTGTCTCGCCTGAGAACGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 80
CGCTCAGCAGTGTCTCGCCCAGCTGAAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 81
CGCTCAGCAGTGTCTCGCCGTTCGACAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 82
CGCTCAGCAGTGTCTCGCTCTTAGACCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 83
CGCTCAGCAGTGTCTCGCCACGAGCAAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 84
CGCTCAGCAGTGTCTCGCCTGCCGAATAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 85
CGCTCAGCAGTGTCTCGCGGGCTCATAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 86
CGCTCAGCAGTGTCTCGCCACCGTACTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 87
CGCTCAGCAGTGTCTCGCGTGTCTCGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 88
CGCTCAGCAGTGTCTCGCTTACTGCGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 89
CGCTCAGCAGTGTCTCGCTCCATACGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 90
CGCTCAGCAGTGTCTCGCGATCCAGGTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 91
CGCTCAGCAGTGTCTCGCAGTTGCGAAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 92
CGCTCAGCAGTGTCTCGCAGGTTGAGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 93
CGCTCAGCAGTGTCTCGCGTTGCGCTTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 94
CGCTCAGCAGTGTCTCGCCTCGAGAGAAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 95
CGCTCAGCAGTGTCTCGCTGTTCCTAGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 96
CGCTCAGCAGTGTCTCGCCTCACACTGAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 97
CGCTCAGCAGTGTCTCGCACCACATGTAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 98
CGCTCAGCAGTGTCTCGCAGCTTAACCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 99
CGCTCAGCAGTGTCTCGCCACCTATGCAGATCGGAAGAGCGTCGTG
Synthetic






SEQ ID NO: 100
CGACGAGGCTGGAGTGACACTGGTACCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 101
CGACGAGGCTGGAGTGACGGTACTGTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 102
CGACGAGGCTGGAGTGACTCTGTGTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 103
CGACGAGGCTGGAGTGACTATGGCTCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 104
CGACGAGGCTGGAGTGACGTTGTCAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 105
CGACGAGGCTGGAGTGACATGCCAGTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 106
CGACGAGGCTGGAGTGACCGCTACTACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 107
CGACGAGGCTGGAGTGACCATACACGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 108
CGACGAGGCTGGAGTGACTCGAGGATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 109
CGACGAGGCTGGAGTGACGGTTCGATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 110
CGACGAGGCTGGAGTGACACGGAACACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 111
CGACGAGGCTGGAGTGACCGTTGCATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 112
CGACGAGGCTGGAGTGACATACGTCCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 113
CGACGAGGCTGGAGTGACGATCTGGACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 114
CGACGAGGCTGGAGTGACTCTCGAAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 115
CGACGAGGCTGGAGTGACCTGTGCTACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 116
CGACGAGGCTGGAGTGACAGGTGGAACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 117
CGACGAGGCTGGAGTGACTAGCAACGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 118
CGACGAGGCTGGAGTGACGGTCATTCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 119
CGACGAGGCTGGAGTGACAGATACGCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 120
CGACGAGGCTGGAGTGACGAACTGCTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 121
CGACGAGGCTGGAGTGACAGTGCACACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 122
CGACGAGGCTGGAGTGACCCGATCATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 123
CGACGAGGCTGGAGTGACACAAGGACCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 124
CGACGAGGCTGGAGTGACATTCGGTCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 125
CGACGAGGCTGGAGTGACTTGTGACGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 126
CGACGAGGCTGGAGTGACGAAGTCTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 127
CGACGAGGCTGGAGTGACTGGACGAACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 128
CGACGAGGCTGGAGTGACGAGTTCCTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 129
CGACGAGGCTGGAGTGACGATAGGAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 130
CGACGAGGCTGGAGTGACAGCTTGGACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 131
CGACGAGGCTGGAGTGACCACATCCTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 132
CGACGAGGCTGGAGTGACAGTCCTGACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 133
CGACGAGGCTGGAGTGACCTTGTAGCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 134
CGACGAGGCTGGAGTGACCAGGAGTACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 135
CGACGAGGCTGGAGTGACCACAAGGACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 136
CGACGAGGCTGGAGTGACTTCCTCTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 137
CGACGAGGCTGGAGTGACCCATTGCTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 138
CGACGAGGCTGGAGTGACGCACATAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 139
CGACGAGGCTGGAGTGACCACTGTACCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 140
CGACGAGGCTGGAGTGACGTGATCTCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 141
CGACGAGGCTGGAGTGACAATGCCGTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 142
CGACGAGGCTGGAGTGACTCCTTGTCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 143
CGACGAGGCTGGAGTGACAGTAGGCACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 144
CGACGAGGCTGGAGTGACAGCCTCTTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 145
CGACGAGGCTGGAGTGACCGATTACGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 146
CGACGAGGCTGGAGTGACCCAGGAATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 147
CGACGAGGCTGGAGTGACGAGTCAGTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 148
CGACGAGGCTGGAGTGACTGAGAGGACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 149
CGACGAGGCTGGAGTGACACGACTCACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 150
CGACGAGGCTGGAGTGACTAGCTCAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 151
CGACGAGGCTGGAGTGACTAACCGGTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 152
CGACGAGGCTGGAGTGACGTACTGAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 153
CGACGAGGCTGGAGTGACAACCACTCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 154
CGACGAGGCTGGAGTGACCAGTTACCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 155
CGACGAGGCTGGAGTGACGATGGATGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 156
CGACGAGGCTGGAGTGACCTACCTCTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 157
CGACGAGGCTGGAGTGACGTCAAGAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 158
CGACGAGGCTGGAGTGACGATCTACGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 159
CGACGAGGCTGGAGTGACACATTCCGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 160
CGACGAGGCTGGAGTGACCTGAATCCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 161
CGACGAGGCTGGAGTGACTGGCCATACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 162
CGACGAGGCTGGAGTGACGTCTTGCTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 163
CGACGAGGCTGGAGTGACACGTGTTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 164
CGACGAGGCTGGAGTGACGAAGCGTTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 165
CGACGAGGCTGGAGTGACTAACGCCACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 166
CGACGAGGCTGGAGTGACAGGCTGTACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 167
CGACGAGGCTGGAGTGACCTACAGTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 168
CGACGAGGCTGGAGTGACTTCAGAGCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 169
CGACGAGGCTGGAGTGACTGCCTACACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 170
CGACGAGGCTGGAGTGACCGGATTGACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 171
CGACGAGGCTGGAGTGACGGAGGATTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 172
CGACGAGGCTGGAGTGACCATTAGCCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 173
CGACGAGGCTGGAGTGACTTGGTCACCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 174
CGACGAGGCTGGAGTGACCAAGCAAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 175
CGACGAGGCTGGAGTGACCAACATCCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 176
CGACGAGGCTGGAGTGACGACGACAACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 177
CGACGAGGCTGGAGTGACATCGAGTCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 178
CGACGAGGCTGGAGTGACTATGCGAGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 179
CGACGAGGCTGGAGTGACTAGCTTCCCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 180
CGACGAGGCTGGAGTGACACCAACGTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 181
CGACGAGGCTGGAGTGACACGCGATACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 182
CGACGAGGCTGGAGTGACGTCAGCTACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 183
CGACGAGGCTGGAGTGACCACCAGATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 184
CGACGAGGCTGGAGTGACCAACCTTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 185
CGACGAGGCTGGAGTGACTTGCCTTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 186
CGACGAGGCTGGAGTGACAGTCTGCTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 187
CGACGAGGCTGGAGTGACGTCCTTCACGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 188
CGACGAGGCTGGAGTGACCGGTCTATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 189
CGACGAGGCTGGAGTGACTCTGCCTTCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 190
CGACGAGGCTGGAGTGACCAAGTTGGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 191
CGACGAGGCTGGAGTGACATCTACGGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 192
CGACGAGGCTGGAGTGACCACTTCTGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 193
CGACGAGGCTGGAGTGACCACACAACCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 194
CGACGAGGCTGGAGTGACGCCTAATGCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 195
CGACGAGGCTGGAGTGACGTTCGCATCGCTCAGCAGTGTCTCGC
Synthetic






SEQ ID NO: 196
TTACCGCGGCKGCTGRCACACGAGTCTAGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 197
TTACCGCGGCKGCTGRCACACGCCTCTATCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 198
TTACCGCGGCKGCTGRCACACGCCATTCTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 199
TTACCGCGGCKGCTGRCACACTACGGTTGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 200
TTACCGCGGCKGCTGRCACACACTCTACCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 201
TTACCGCGGCKGCTGRCACACTAGGTCCACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 202
TTACCGCGGCKGCTGRCACACTCCTGAGTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 203
TTACCGCGGCKGCTGRCACACGTGGATAGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 204
TTACCGCGGCKGCTGRCACACGCGCTATTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 205
TTACCGCGGCKGCTGRCACACGGAAGGAACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 206
TTACCGCGGCKGCTGRCACACGGACTCAACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 207
TTACCGCGGCKGCTGRCACACAACACTCGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 208
TTACCGCGGCKGCTGRCACACCCGGAATTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 209
TTACCGCGGCKGCTGRCACACAACTTGCCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 210
TTACCGCGGCKGCTGRCACACTTGACAGGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 211
TTACCGCGGCKGCTGRCACACTCTTAGCGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 212
TTACCGCGGCKGCTGRCACACCTGTTGCACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 213
TTACCGCGGCKGCTGRCACACAGAACACGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 214
TTACCGCGGCKGCTGRCACACCCTTGATGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 215
TTACCGCGGCKGCTGRCACACAGCGATCTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 216
TTACCGCGGCKGCTGRCACACGCTCAGAACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 217
TTACCGCGGCKGCTGRCACACATTGCGTGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 218
TTACCGCGGCKGCTGRCACACCATCCGTTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 219
TTACCGCGGCKGCTGRCACACTCTCTGGTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 220
TTACCGCGGCKGCTGRCACACAACGAGCACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 221
TTACCGCGGCKGCTGRCACACACGTTCACCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 222
TTACCGCGGCKGCTGRCACACATCAGCACCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 223
TTACCGCGGCKGCTGRCACACGATAGCGACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 224
TTACCGCGGCKGCTGRCACACAGAGCTTGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 225
TTACCGCGGCKGCTGRCACACTGATCGTCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 226
TTACCGCGGCKGCTGRCACACACGATACGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 227
TTACCGCGGCKGCTGRCACACCTAACTGGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 228
TTACCGCGGCKGCTGRCACACTCGCGTAACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 229
TTACCGCGGCKGCTGRCACACCGGTTCTTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 230
TTACCGCGGCKGCTGRCACACTTGGTTCGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 231
TTACCGCGGCKGCTGRCACACGAAGTAGCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 232
TTACCGCGGCKGCTGRCACACGGCTAGAACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 233
TTACCGCGGCKGCTGRCACACCATCGTGACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 234
TTACCGCGGCKGCTGRCACACTCACCAACCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 235
TTACCGCGGCKGCTGRCACACCTTCAAGGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 236
TTACCGCGGCKGCTGRCACACAGTAGCTCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 237
TTACCGCGGCKGCTGRCACACGCCACATTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 238
TTACCGCGGCKGCTGRCACACTTCACGGACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 239
TTACCGCGGCKGCTGRCACACTGACGTTGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 240
TTACCGCGGCKGCTGRCACACTCATCTGGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 241
TTACCGCGGCKGCTGRCACACCGTTCATCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 242
TTACCGCGGCKGCTGRCACACAACCGTCACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 243
TTACCGCGGCKGCTGRCACACTGCTAAGCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 244
TTACCGCGGCKGCTGRCACACCAGGTAGACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 245
TTACCGCGGCKGCTGRCACACAAGAACCGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 246
TTACCGCGGCKGCTGRCACACAGGAGACTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 247
TTACCGCGGCKGCTGRCACACAGTGAAGGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 248
TTACCGCGGCKGCTGRCACACTCTTCAGCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 249
TTACCGCGGCKGCTGRCACACAACGGAGTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 250
TTACCGCGGCKGCTGRCACACGAAGAGACCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 251
TTACCGCGGCKGCTGRCACACATTGGTGGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 252
TTACCGCGGCKGCTGRCACACCTGTCAAGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 253
TTACCGCGGCKGCTGRCACACAGGCATCACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 254
TTACCGCGGCKGCTGRCACACAAGAGGTCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 255
TTACCGCGGCKGCTGRCACACTGCATTCGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 256
TTACCGCGGCKGCTGRCACACTTGGACGTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 257
TTACCGCGGCKGCTGRCACACTTGCTGGACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 258
TTACCGCGGCKGCTGRCACACTGGAGATGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 259
TTACCGCGGCKGCTGRCACACTACGTACCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 260
TTACCGCGGCKGCTGRCACACTGACACCTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 261
TTACCGCGGCKGCTGRCACACGTCCATTGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 262
TTACCGCGGCKGCTGRCACACCAGAGAAGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 263
TTACCGCGGCKGCTGRCACACTGCTTCAGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 264
TTACCGCGGCKGCTGRCACACTACACTGCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 265
TTACCGCGGCKGCTGRCACACGGACGTATCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 266
TTACCGCGGCKGCTGRCACACCTCGCATACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 267
TTACCGCGGCKGCTGRCACACGCATCCTACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 268
TTACCGCGGCKGCTGRCACACAGGCTTACCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 269
TTACCGCGGCKGCTGRCACACGTAAGTCGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 270
TTACCGCGGCKGCTGRCACACTTCTGGAGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 271
TTACCGCGGCKGCTGRCACACGACACACACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 272
TTACCGCGGCKGCTGRCACACACCAGACACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 273
TTACCGCGGCKGCTGRCACACTGCAGCTTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 274
TTACCGCGGCKGCTGRCACACGCAACTTCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 275
TTACCGCGGCKGCTGRCACACACTCGCTTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 276
TTACCGCGGCKGCTGRCACACTGAACTCCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 277
TTACCGCGGCKGCTGRCACACGTGTAAGCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 278
TTACCGCGGCKGCTGRCACACATGCACCTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 279
TTACCGCGGCKGCTGRCACACTCCGTCAACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 280
TTACCGCGGCKGCTGRCACACGTCGGTATCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 281
TTACCGCGGCKGCTGRCACACACAGATCCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 282
TTACCGCGGCKGCTGRCACACTCGGATCTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 283
TTACCGCGGCKGCTGRCACACAGAGTCGTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 284
TTACCGCGGCKGCTGRCACACGAATAGCGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 285
TTACCGCGGCKGCTGRCACACGGATTGGTCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 286
TTACCGCGGCKGCTGRCACACGCCATAGACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 287
TTACCGCGGCKGCTGRCACACTGTCAGAGCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 288
TTACCGCGGCKGCTGRCACACCCTACGAACGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 289
TTACCGCGGCKGCTGRCACACGTTACGTCCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 290
TTACCGCGGCKGCTGRCACACCGAGATACCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 291
TTACCGCGGCKGCTGRCACACGCATTGACCGACGAGGCTGGAGTGAC
Synthetic






SEQ ID NO: 292
ACTAGGT
Synthetic





SEQ ID NO: 293
AGCTCTA
Synthetic





SEQ ID NO: 294
AGAGAGT
Synthetic





SEQ ID NO: 295
GTGTTCC
Synthetic





SEQ ID NO: 296
TTAGCTG
Synthetic





SEQ ID NO: 297
ACCATAC
Synthetic





SEQ ID NO: 298
TACCGTT
Synthetic





SEQ ID NO: 299
GCCAACT
Synthetic





SEQ ID NO: 300
GAAGTCT
Synthetic





SEQ ID NO: 301
TAAGCAC
Synthetic





SEQ ID NO: 302
CTAGTGG
Synthetic





SEQ ID NO: 303
ATAGCGC
Synthetic





SEQ ID NO: 304
AGTGTCA
Synthetic





SEQ ID NO: 305
GTTCCTC
Synthetic





SEQ ID NO: 306
TGGTCAA
Synthetic





SEQ ID NO: 307
TGCTACC
Synthetic





SEQ ID NO: 308
CTCAACG
Synthetic





SEQ ID NO: 309
CAGTTGT
Synthetic





SEQ ID NO: 310
TGACTGA
Synthetic





SEQ ID NO: 311
ATGTACG
Synthetic





SEQ ID NO: 312
GCACTCA
Synthetic





SEQ ID NO: 313
TAACAGG
Synthetic





SEQ ID NO: 314
TAGAGGT
Synthetic





SEQ ID NO: 315
GTGGAAT
Synthetic





SEQ ID NO: 316
CATACGA
Synthetic





SEQ ID NO: 317
ACAACCT
Synthetic





SEQ ID NO: 318
GACTACG
Synthetic





SEQ ID NO: 319
CGAGAAG
Synthetic





SEQ ID NO: 320
CTTACCT
Synthetic





SEQ ID NO: 321
TGAGATC
Synthetic





SEQ ID NO: 322
GTTCGAT
Synthetic





SEQ ID NO: 323
ATGCGTG
Synthetic





SEQ ID NO: 324
CCTGAGTT
Synthetic





SEQ ID NO: 325
TTGTGGCA
Synthetic





SEQ ID NO: 326
ATCGCCAT
Synthetic





SEQ ID NO: 327
CGCTGATT
Synthetic





SEQ ID NO: 328
GTACAACC
Synthetic





SEQ ID NO: 329
AAGTCGAG
Synthetic





SEQ ID NO: 330
GCTTCCTA
Synthetic





SEQ ID NO: 331
ACATGCAC
Synthetic





SEQ ID NO: 332
CCGATTGA
Synthetic





SEQ ID NO: 333
GAGCTTGA
Synthetic





SEQ ID NO: 334
GTGACACT
Synthetic





SEQ ID NO: 335
GGAACACA
Synthetic





SEQ ID NO: 336
GATTCGGA
Synthetic





SEQ ID NO: 337
TGTACTCC
Synthetic





SEQ ID NO: 338
TCTGTCCT
Synthetic





SEQ ID NO: 339
AACTGTGC
Synthetic





SEQ ID NO: 340
TGTTGTCG
Synthetic





SEQ ID NO: 341
TACGTGCT
Synthetic





SEQ ID NO: 342
ACTGTTGG
Synthetic





SEQ ID NO: 343
TGTCCTGA
Synthetic





SEQ ID NO: 344
CAGGATAG
Synthetic





SEQ ID NO: 345
GACAGACA
Synthetic





SEQ ID NO: 346
AGACTAGG
Synthetic





SEQ ID NO: 347
GCCATTAC
Synthetic





SEQ ID NO: 348
AGCCACTA
Synthetic





SEQ ID NO: 349
GCAGATTC
Synthetic





SEQ ID NO: 350
GCATCGAA
Synthetic





SEQ ID NO: 351
AACCAAGC
Synthetic





SEQ ID NO: 352
GATCAGCT
Synthetic





SEQ ID NO: 353
CCGCTTAT
Synthetic





SEQ ID NO: 354
TCCGAAGT
Synthetic





SEQ ID NO: 355
TCGACTAG
Synthetic





SEQ ID NO: 356
GCAAGAACG
Synthetic





SEQ ID NO: 357
GAGTCTACA
Synthetic





SEQ ID NO: 358
TAGGCCTTC
Synthetic





SEQ ID NO: 359
CCTTACGAA
Synthetic





SEQ ID NO: 360
AGGTGATCA
Synthetic





SEQ ID NO: 361
CGTTAGCTA
Synthetic





SEQ ID NO: 362
CCTTCTACG
Synthetic





SEQ ID NO: 363
TTCGAGAGA
Synthetic





SEQ ID NO: 364
GGAACTAGA
Synthetic





SEQ ID NO: 365
TCTCTTCGG
Synthetic





SEQ ID NO: 366
ATGTCACCT
Synthetic





SEQ ID NO: 367
CGTTCTCAG
Synthetic





SEQ ID NO: 368
TTCAGCTGG
Synthetic





SEQ ID NO: 369
TGTCGAACG
Synthetic





SEQ ID NO: 370
GGTCTAAGA
Synthetic





SEQ ID NO: 371
TTGCTCGTG
Synthetic





SEQ ID NO: 372
ATTCGGCAG
Synthetic





SEQ ID NO: 373
TATGAGCCC
Synthetic





SEQ ID NO: 374
AGTACGGTG
Synthetic





SEQ ID NO: 375
TCGAGACAC
Synthetic





SEQ ID NO: 376
TCGCAGTAA
Synthetic





SEQ ID NO: 377
TCGTATGGA
Synthetic





SEQ ID NO: 378
ACCTGGATC
Synthetic





SEQ ID NO: 379
TTCGCAACT
Synthetic





SEQ ID NO: 380
TCTCAACCT
Synthetic





SEQ ID NO: 381
AAGCGCAAC
Synthetic





SEQ ID NO: 382
TCTCTCGAG
Synthetic





SEQ ID NO: 383
CTAGGAACA
Synthetic





SEQ ID NO: 384
CAGTGTGAG
Synthetic





SEQ ID NO: 385
ACATGTGGT
Synthetic





SEQ ID NO: 386
GGTTAAGCT
Synthetic





SEQ ID NO: 387
GCATAGGTG
Synthetic





SEQ ID NO: 388
GTACCAGT
Synthetic





SEQ ID NO: 389
ACAGTACC
Synthetic





SEQ ID NO: 390
CACACAGA
Synthetic





SEQ ID NO: 391
GAGCCATA
Synthetic





SEQ ID NO: 392
CTGACAAC
Synthetic





SEQ ID NO: 393
ACTGGCAT
Synthetic





SEQ ID NO: 394
TAGTAGCG
Synthetic





SEQ ID NO: 395
CGTGTATG
Synthetic





SEQ ID NO: 396
ATCCTCGA
Synthetic





SEQ ID NO: 397
ATCGAACC
Synthetic





SEQ ID NO: 398
TGTTCCGT
Synthetic





SEQ ID NO: 399
ATGCAACG
Synthetic





SEQ ID NO: 400
GGACGTAT
Synthetic





SEQ ID NO: 401
TCCAGATC
Synthetic





SEQ ID NO: 402
CTTCGAGA
Synthetic





SEQ ID NO: 403
TAGCACAG
Synthetic





SEQ ID NO: 404
TTCCACCT
Synthetic





SEQ ID NO: 405
CGTTGCTA
Synthetic





SEQ ID NO: 406
GAATGACC
Synthetic





SEQ ID NO: 407
GCGTATCT
Synthetic





SEQ ID NO: 408
AGCAGTTC
Synthetic





SEQ ID NO: 409
TGTGCACT
Synthetic





SEQ ID NO: 410
ATGATCGG
Synthetic





SEQ ID NO: 411
GTCCTTGT
Synthetic





SEQ ID NO: 412
GACCGAAT
Synthetic





SEQ ID NO: 413
CGTCACAA
Synthetic





SEQ ID NO: 414
CAGACTTC
Synthetic





SEQ ID NO: 415
TTCGTCCA
Synthetic





SEQ ID NO: 416
AGGAACTC
Synthetic





SEQ ID NO: 417
CTCCTATC
Synthetic





SEQ ID NO: 418
TCCAAGCT
Synthetic





SEQ ID NO: 419
AGGATGTG
Synthetic





SEQ ID NO: 420
TCAGGACT
Synthetic





SEQ ID NO: 421
GCTACAAG
Synthetic





SEQ ID NO: 422
TACTCCTG
Synthetic





SEQ ID NO: 423
TCCTTGTG
Synthetic





SEQ ID NO: 424
CAGAGGAA
Synthetic





SEQ ID NO: 425
AGCAATGG
Synthetic





SEQ ID NO: 426
CTATGTGC
Synthetic





SEQ ID NO: 427
GTACAGTG
Synthetic





SEQ ID NO: 428
GAGATCAC
Synthetic





SEQ ID NO: 429
ACGGCATT
Synthetic





SEQ ID NO: 430
GACAAGGA
Synthetic





SEQ ID NO: 431
TGCCTACT
Synthetic





SEQ ID NO: 432
AAGAGGCT
Synthetic





SEQ ID NO: 433
CGTAATCG
Synthetic





SEQ ID NO: 434
ATTCCTGG
Synthetic





SEQ ID NO: 435
ACTGACTC
Synthetic





SEQ ID NO: 436
TCCTCTCA
Synthetic





SEQ ID NO: 437
TGAGTCGT
Synthetic





SEQ ID NO: 438
CTGAGCTA
Synthetic





SEQ ID NO: 439
ACCGGTTA
Synthetic





SEQ ID NO: 440
CTCAGTAC
Synthetic





SEQ ID NO: 441
GAGTGGTT
Synthetic





SEQ ID NO: 442
GGTAACTG
Synthetic





SEQ ID NO: 443
CATCCATC
Synthetic





SEQ ID NO: 444
AGAGGTAG
Synthetic





SEQ ID NO: 445
CTCTTGAC
Synthetic





SEQ ID NO: 446
CGTAGATC
Synthetic





SEQ ID NO: 447
CGGAATGT
Synthetic





SEQ ID NO: 448
GGATTCAG
Synthetic





SEQ ID NO: 449
TATGGCCA
Synthetic





SEQ ID NO: 450
AGCAAGAC
Synthetic





SEQ ID NO: 451
CAACACGT
Synthetic





SEQ ID NO: 452
AACGCTTC
Synthetic





SEQ ID NO: 453
TGGCGTTA
Synthetic





SEQ ID NO: 454
TACAGCCT
Synthetic





SEQ ID NO: 455
CACTGTAG
Synthetic





SEQ ID NO: 456
GCTCTGAA
Synthetic





SEQ ID NO: 457
TGTAGGCA
Synthetic





SEQ ID NO: 458
TCAATCCG
Synthetic





SEQ ID NO: 459
AATCCTCC
Synthetic





SEQ ID NO: 460
GGCTAATG
Synthetic





SEQ ID NO: 461
GTGACCAA
Synthetic





SEQ ID NO: 462
CTTGCTTG
Synthetic





SEQ ID NO: 463
GGATGTTG
Synthetic





SEQ ID NO: 464
TTGTCGTC
Synthetic





SEQ ID NO: 465
GACTCGAT
Synthetic





SEQ ID NO: 466
CTCGCATA
Synthetic





SEQ ID NO: 467
GGAAGCTA
Synthetic





SEQ ID NO: 468
ACGTTGGT
Synthetic





SEQ ID NO: 469
TATCGCGT
Synthetic





SEQ ID NO: 470
TAGCTGAC
Synthetic





SEQ ID NO: 471
ATCTGGTG
Synthetic





SEQ ID NO: 472
CAAGGTTG
Synthetic





SEQ ID NO: 473
CAAGGCAA
Synthetic





SEQ ID NO: 474
AGCAGACT
Synthetic





SEQ ID NO: 475
TGAAGGAC
Synthetic





SEQ ID NO: 476
ATAGACCG
Synthetic





SEQ ID NO: 477
AAGGCAGA
Synthetic





SEQ ID NO: 478
CCAACTTG
Synthetic





SEQ ID NO: 479
CCGTAGAT
Synthetic





SEQ ID NO: 480
CAGAAGTG
Synthetic





SEQ ID NO: 481
GTTGTGTG
Synthetic





SEQ ID NO: 482
CATTAGGC
Synthetic





SEQ ID NO: 483
ATGCGAAC
Synthetic





SEQ ID NO: 484
CTAGACTC
Synthetic





SEQ ID NO: 485
ATAGAGGC
Synthetic





SEQ ID NO: 486
AGAATGGC
Synthetic





SEQ ID NO: 487
CAACCGTA
Synthetic





SEQ ID NO: 488
GGTAGAGT
Synthetic





SEQ ID NO: 489
TGGACCTA
Synthetic





SEQ ID NO: 490
ACTCAGGA
Synthetic





SEQ ID NO: 491
CTATCCAC
Synthetic





SEQ ID NO: 492
AATAGCGC
Synthetic





SEQ ID NO: 493
TTCCTTCC
Synthetic





SEQ ID NO: 494
TTGAGTCC
Synthetic





SEQ ID NO: 495
CGAGTGTT
Synthetic





SEQ ID NO: 496
AATTCCGG
Synthetic





SEQ ID NO: 497
GGCAAGTT
Synthetic





SEQ ID NO: 498
CCTGTCAA
Synthetic





SEQ ID NO: 499
CGCTAAGA
Synthetic





SEQ ID NO: 500
TGCAACAG
Synthetic





SEQ ID NO: 501
CGTGTTCT
Synthetic





SEQ ID NO: 502
CATCAAGG
Synthetic





SEQ ID NO: 503
AGATCGCT
Synthetic





SEQ ID NO: 504
TTCTGAGC
Synthetic





SEQ ID NO: 505
CACGCAAT
Synthetic





SEQ ID NO: 506
AACGGATG
Synthetic





SEQ ID NO: 507
ACCAGAGA
Synthetic





SEQ ID NO: 508
TGCTCGTT
Synthetic





SEQ ID NO: 509
GTGAACGT
Synthetic





SEQ ID NO: 510
GTGCTGAT
Synthetic





SEQ ID NO: 511
TCGCTATC
Synthetic





SEQ ID NO: 512
CAAGCTCT
Synthetic





SEQ ID NO: 513
GACGATCA
Synthetic





SEQ ID NO: 514
CGTATCGT
Synthetic





SEQ ID NO: 515
CCAGTTAG
Synthetic





SEQ ID NO: 516
TTACGCGA
Synthetic





SEQ ID NO: 517
AAGAACCG
Synthetic





SEQ ID NO: 518
CGAACCAA
Synthetic





SEQ ID NO: 519
GCTACTTC
Synthetic





SEQ ID NO: 520
TTCTAGCC
Synthetic





SEQ ID NO: 521
TCACGATG
Synthetic





SEQ ID NO: 522
GTTGGTGA
Synthetic





SEQ ID NO: 523
CCTTGAAG
Synthetic





SEQ ID NO: 524
GAGCTACT
Synthetic





SEQ ID NO: 525
AATGTGGC
Synthetic





SEQ ID NO: 526
TCCGTGAA
Synthetic





SEQ ID NO: 527
CAACGTCA
Synthetic





SEQ ID NO: 528
CCAGATGA
Synthetic





SEQ ID NO: 529
GATGAACG
Synthetic





SEQ ID NO: 530
TGACGGTT
Synthetic





SEQ ID NO: 531
GCTTAGCA
Synthetic





SEQ ID NO: 532
TCTACCTG
Synthetic





SEQ ID NO: 533
CGGTTCTT
Synthetic





SEQ ID NO: 534
AGTCTCCT
Synthetic





SEQ ID NO: 535
CCTTCACT
Synthetic





SEQ ID NO: 536
GCTGAAGA
Synthetic





SEQ ID NO: 537
ACTCCGTT
Synthetic





SEQ ID NO: 538
GTCTCTTC
Synthetic





SEQ ID NO: 539
CCACCAAT
Synthetic





SEQ ID NO: 540
CTTGACAG
Synthetic





SEQ ID NO: 541
TGATGCCT
Synthetic





SEQ ID NO: 542
GACCTCTT
Synthetic





SEQ ID NO: 543
CGAATGCA
Synthetic





SEQ ID NO: 544
ACGTCCAA
Synthetic





SEQ ID NO: 545
TCCAGCAA
Synthetic





SEQ ID NO: 546
CATCTCCA
Synthetic





SEQ ID NO: 547
GGTACGTA
Synthetic





SEQ ID NO: 548
AGGTGTCA
Synthetic





SEQ ID NO: 549
CAATGGAC
Synthetic





SEQ ID NO: 550
CTTCTCTG
Synthetic





SEQ ID NO: 551
CTGAAGCA
Synthetic





SEQ ID NO: 552
GCAGTGTA
Synthetic





SEQ ID NO: 553
ATACGTCC
Synthetic





SEQ ID NO: 554
TATGCGAG
Synthetic





SEQ ID NO: 555
TAGGATGC
Synthetic





SEQ ID NO: 556
GTAAGCCT
Synthetic





SEQ ID NO: 557
CGACTTAC
Synthetic





SEQ ID NO: 558
CTCCAGAA
Synthetic





SEQ ID NO: 559
TGTGTGTC
Synthetic





SEQ ID NO: 560
TGTCTGGT
Synthetic





SEQ ID NO: 561
AAGCTGCA
Synthetic





SEQ ID NO: 562
GAAGTTGC
Synthetic





SEQ ID NO: 563
AAGCGAGT
Synthetic





SEQ ID NO: 564
GGAGTTCA
Synthetic





SEQ ID NO: 565
GCTTACAC
Synthetic





SEQ ID NO: 566
AGGTGCAT
Synthetic





SEQ ID NO: 567
TTGACGGA
Synthetic





SEQ ID NO: 568
ATACCGAC
Synthetic





SEQ ID NO: 569
GGATCTGT
Synthetic





SEQ ID NO: 570
AGATCCGA
Synthetic





SEQ ID NO: 571
ACGACTCT
Synthetic





SEQ ID NO: 572
CGCTATTC
Synthetic





SEQ ID NO: 573
ACCAATCC
Synthetic





SEQ ID NO: 574
TCTATGGC
Synthetic





SEQ ID NO: 575
CTCTGACA
Synthetic





SEQ ID NO: 576
TTCGTAGG
Synthetic





SEQ ID NO: 577
GACGTAAC
Synthetic





SEQ ID NO: 578
GTATCTCG
Synthetic





SEQ ID NO: 579
GTCAATGC
Synthetic





SEQ ID NO: 580
CAAGCAGAAGACGGCATACGAGATTCGATGAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 581
CAAGCAGAAGACGGCATACGAGATAACGATCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 582
CAAGCAGAAGACGGCATACGAGATTAACGTGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 583
CAAGCAGAAGACGGCATACGAGATATGGAGGAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 584
CAAGCAGAAGACGGCATACGAGATGCGAAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 585
CAAGCAGAAGACGGCATACGAGATACTTCGCTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 586
CAAGCAGAAGACGGCATACGAGATTGCGTAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 587
CAAGCAGAAGACGGCATACGAGATGGTCAAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 588
CAAGCAGAAGACGGCATACGAGATAGGCTTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 589
CAAGCAGAAGACGGCATACGAGATGATTCTCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 590
CAAGCAGAAGACGGCATACGAGATGTCTCCTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 591
CAAGCAGAAGACGGCATACGAGATGACGGTATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 592
CAAGCAGAAGACGGCATACGAGATCATGGTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 593
CAAGCAGAAGACGGCATACGAGATTGTCTACCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 594
CAAGCAGAAGACGGCATACGAGATACCATGCAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 595
CAAGCAGAAGACGGCATACGAGATCATTCCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 596
CAAGCAGAAGACGGCATACGAGATAGGACTAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 597
CAAGCAGAAGACGGCATACGAGATGCTTGTTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 598
CAAGCAGAAGACGGCATACGAGATAGTCACACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 599
CAAGCAGAAGACGGCATACGAGATCCAGTTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 600
CAAGCAGAAGACGGCATACGAGATCTCCATTCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 601
CAAGCAGAAGACGGCATACGAGATTTGCCAACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 602
CAAGCAGAAGACGGCATACGAGATGAGCACATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 603
CAAGCAGAAGACGGCATACGAGATATGTGGTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Synthetic






SEQ ID NO: 604
AATGATACGGCGACCACCGAGATCTACACTAGATCGCACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 605
AATGATACGGCGACCACCGAGATCTACACCTCTCTATACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 606
AATGATACGGCGACCACCGAGATCTACACTATCCTCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 607
AATGATACGGCGACCACCGAGATCTACACAGAGTAGAACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 608
AATGATACGGCGACCACCGAGATCTACACGTAAGGAGACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 609
AATGATACGGCGACCACCGAGATCTACACACTGCATAACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 610
AATGATACGGCGACCACCGAGATCTACACAAGGAGTAACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 611
AATGATACGGCGACCACCGAGATCTACACCTAAGCCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
Synthetic






SEQ ID NO: 612
CTCATCGA
Synthetic





SEQ ID NO: 613
GGATCGTT
Synthetic





SEQ ID NO: 614
CCACGTTA
Synthetic





SEQ ID NO: 615
TCCTCCAT
Synthetic





SEQ ID NO: 616
ATCTTCGC
Synthetic





SEQ ID NO: 617
AGCGAAGT
Synthetic





SEQ ID NO: 618
CTTACGCA
Synthetic





SEQ ID NO: 619
ACTTGACC
Synthetic





SEQ ID NO: 620
GTAAGCCT
Synthetic





SEQ ID NO: 621
CGAGAATC
Synthetic





SEQ ID NO: 622
TAGGAGAC
Synthetic





SEQ ID NO: 623
ATACCGTC
Synthetic





SEQ ID NO: 624
ACACCATG
Synthetic





SEQ ID NO: 625
GGTAGACA
Synthetic





SEQ ID NO: 626
TGCATGGT
Synthetic





SEQ ID NO: 627
CAGGAATG
Synthetic





SEQ ID NO: 628
CTAGTCCT
Synthetic





SEQ ID NO: 629
CAACAAGC
Synthetic





SEQ ID NO: 630
GTGTGACT
Synthetic





SEQ ID NO: 631
ACAACTGG
Synthetic





SEQ ID NO: 632
GAATGGAG
Synthetic





SEQ ID NO: 633
GTTGGCAA
Synthetic





SEQ ID NO: 634
ATGTGCTC
Synthetic





SEQ ID NO: 635
CACCACAT
Synthetic





SEQ ID NO: 636
TAGATCGC
Synthetic





SEQ ID NO: 637
CTCTCTAT
Synthetic





SEQ ID NO: 638
TATCCTCT
Synthetic





SEQ ID NO: 639
AGAGTAGA
Synthetic





SEQ ID NO: 640
GTAAGGAG
Synthetic





SEQ ID NO: 641
ACTGCATA
Synthetic





SEQ ID NO: 642
AAGGAGTA
Synthetic





SEQ ID NO: 643
CTAAGCCT
Synthetic





SEQ ID NO: 644
AGATCGGAAGAGCGTCGTG
Synthetic





SEQ ID NO: 645
TTACCGCGGCKGCTGRCAC
Synthetic





SEQ ID NO: 646
GCTTCTTAGTCAGGTACCG
Synthetic





SEQ ID NO: 647
GGTATTAGCAYCTGTTTCCA
Synthetic





SEQ ID NO: 648
GGTCGGTCTCTCAACCC
Synthetic





SEQ ID NO: 649
GCTTCTTAGTCAGGTACCG
Synthetic





SEQ ID NO: 650
GCTGCCTCCCGTAGGAGT
Synthetic





SEQ ID NO: 651
ACTCCTACGGGAGGCAGC
Synthetic









The scope of the present invention is not limited by what has been specifically shown and described hereinabove. Those skilled in the art will recognize that there are suitable alternatives to the depicted examples of materials, configurations, constructions and dimensions. Numerous references, including patents and various publications, are cited and discussed in the description of this invention. The citation and discussion of such references is provided merely to clarify the description of the present invention and is not an admission that any reference is prior art to the invention described herein. All references cited and discussed in this specification are incorporated herein by reference in their entirety. Variations, modifications and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and scope of the invention. While certain embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that changes and modifications may be made without departing from the spirit and scope of the invention. The matter set forth in the foregoing description and accompanying drawings is offered by way of illustration only and not as a limitation.

Claims
  • 1. A method of determining microbial identities and/or abundances in a biological sample, the method comprising: (a) immobilizing the biological sample in a matrix; (b) fracturing the matrix into clusters; and (c) determining identities and/or abundances of microbes in the clusters, wherein in step (b) the matrix is fractured through cryo-fracturing.
  • 2. The method of claim 1, wherein the clusters comprise co-localized cells.
  • 3. The method of claim 1, wherein in step (c) identities and/or abundances of microbes are determined by sequencing DNAs and/or RNAs.
  • 4. The method of claim 3, wherein the DNAs are genomic DNAs.
  • 5. The method of claim 1, wherein in step (c) identities and/or abundances of microbes are determined by analyzing proteins, polypeptides, and/or metabolites.
  • 6. The method of claim 1, wherein in step (a) the biological sample is immobilized via perfusion and polymerization of the matrix.
  • 7. The method of claim 1, wherein the matrix comprises a polymer.
  • 8. The method of claim 1, wherein the matrix comprises an acrylamide polymer.
  • 9. The method of claim 1, wherein the matrix comprises a plurality of 16S ribosomal RNA (16S rRNA) amplification primers.
  • 10. The method of claim 9, wherein the plurality of 16S rRNA amplification primers are covalently linked to the matrix.
  • 11. The method of claim 9, wherein the plurality of 16S rRNA amplification primers are linked to the matrix through photocleavable linkers.
  • 12. The method of claim 11, wherein the photocleavable linkers are acrydite linkers.
  • 13. The method of claim 1, further comprising step (d) processing the matrix by chemical or enzymatic means after step (a) or step (b).
  • 14. The method of claim 13, wherein step (d) comprises lysing cells.
  • 15. The method of claim 1, further comprising passing the clusters through a filter for size selection.
  • 16. The method of claim 15, wherein after passing the clusters through a filter for size selection the clusters have a median diameter ranging from about 1 μm to about 100 μm.
  • 17. The method of claim 15, wherein after passing the clusters through a filter for size selection the clusters have a median diameter ranging from about 10 μm to about 50 μm.
  • 18. The method of claim 15, wherein after passing the clusters through a filter for size selection the clusters have a median diameter ranging from about 1 μm to about 20 μm.
  • 19. The method of claim 1, wherein the clusters are microparticles.
  • 20. The method of claim 1, wherein the cryo-fracturing is cryo-bead beating.
  • 21. The method of claim 1, wherein in step (c) identities and/or abundances of microbes are determined through droplet-based encapsulation.
  • 22. The method of claim 21, wherein the droplet-based encapsulation is through co-encapsulating the clusters with beads in droplets, wherein each droplet comprises a cluster and a bead, each bead comprising a unique molecular barcode.
  • 23. The method of claim 22, wherein the beads comprise a plurality of 16S rRNA amplification primers, and wherein the plurality of 16S rRNA amplification primers linked to each bead comprise a unique molecular barcode.
  • 24. The method of claim 23, wherein the plurality of 16S rRNA amplification primers are covalently linked to the beads.
  • 25. The method of claim 23, wherein the plurality of 16S rRNA amplification primers are linked to the beads through photocleavable linkers.
  • 26. The method of claim 25, wherein the photocleavable linkers are acrydite linkers.
  • 27. The method of claim 22, wherein the beads comprise a polymer.
  • 28. The method of claim 22, wherein the beads comprise an acrylamide polymer.
  • 29. The method of claim 21, wherein the droplet-based encapsulation is through capturing the clusters in emulsion droplets comprising molecular barcodes, each emulsion droplet comprising identical molecular barcodes.
  • 30. The method of claim 29, wherein the emulsion droplets have a diameter ranging from about 35 μm to about 45 μm.
  • 31. The method of claim 1, further comprising cleaving the plurality of 16S rRNA amplification primers from the matrix and/or the beads.
  • 32. The method of claim 1, further comprising degrading the matrix.
  • 33. The method of claim 32, wherein the matrix is degraded through exposure to reducing conditions.
  • 34. The method of claim 1, further comprising polymerase chain reaction (PCR) amplification.
  • 35. The method of claim 3, wherein the sequencing is deep sequencing.
  • 36. The method of claim 1, wherein the biological sample is obtained from a mammal.
  • 37. The method of claim 36, wherein the biological sample is obtained from a nervous system, a pulmonary system, a peripheral vascular system, a cardiovascular system, and/or a gastrointestinal system of a mammal.
  • 38. The method of claim 36, wherein the biological sample is obtained from the brain, a lung, a bronchus, an alveolus, an artery, a vein, a heart, an esophagus, a stomach, a small intestine, a large intestine, or combinations thereof.
  • 39. The method of claim 1, wherein the biological sample is obtained from a tumor or is a tumor sample.
  • 40. The method of claim 1, wherein the biological sample is a soil sample, a gut sample, and/or a biofilm sample.
  • 41. The method of claim 1, wherein the biological sample is an environmental sample.
  • 42. A method of determining microbial identities and/or abundances in a biological sample, the method comprising: (a) immobilizing the biological sample in a matrix; (b) fracturing the matrix into clusters; and (c) determining identities and/or abundances of microbes in the clusters, wherein the matrix comprises an acrylamide polymer.
CROSS REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Application No. 62/648,716 filed on Mar. 27, 2018, which is incorporated herein by reference in its entirety.

GOVERNMENT LICENSE RIGHTS

This invention was made with government support under OD009172 and AI132403 awarded by the National Institutes of Health. The government has certain rights in the invention.

US Referenced Citations (5)
Number Name Date Kind
20090098555 Roth et al. Apr 2009 A1
20100285975 Mathies et al. Nov 2010 A1
20120149584 Olle et al. Jun 2012 A1
20120165215 Andersen et al. Jun 2012 A1
20150259728 Cutliffe et al. Sep 2015 A1
Foreign Referenced Citations (1)
Number Date Country
2012048341 Apr 2012 WO
Non-Patent Literature Citations (12)
Entry
Sepp, R. et al., Rapid techniques for DNA extraction from routinely processed archival tissue for use in PCR, J. Clin. Pathol., vol. 47, pp. 318-323 (Year: 1994).
Xu et al., Virtual microfluidics for digital quantification and single-cell sequencing, Nature Meth., vol. 13, pp. 759-762 plus online methods pp. 1-2 (Year: 2016).
Welch et al., Biogeography of a human oral microbiome at the micron scale. PNAS 2015, 113(6): E791-800.
Macosko et al., Highly Parallel Genome-wide Expression profiling of Individual Cells Using Nanoliter Droplets. Cell 2015; 161(5): pp. 1202-1214.
Chung et al., Clarity for mapping the nervous system. Nature Methods, 2013; 10(6): pp. 508-513.
Geva-Zatorsky N et al, In vivo imaging and tracking of host-microbiota interactions via metabolic labeling of gut anaerobic bacteria, Nature Medicine, vol. 21/Issue 9, pp. 1091-1100, 2015.
Zhang et al., Spatial heterogeneity and co-occurrence patterns of human mucosal-associated intestinal microbiota, ISME J. vol. 8/Issue 4, pp. 881-893, 2014.
Nava et al., Spatial organization of intestinal microbiota in the mouse ascending colon, ISME J. vol. 5/Issue 4, pp. 627-638, 2011.
Gill et al., Metagenomic analysis of the human distal gut microbiome, Science, vol. 312/Issue 5778, pp. 1355-1359, 2006.
Valm et al., Systems-level analysis of microbial community organization through combinatorial labeling and spectral imaging, PNAS, vol. 108/Issue 10, pp. 4152-4157, 2011.
Wang, H., Functional metagenomic reprogramming of the human microbiome through mobilome eng, NIH Grant #: 1DP5OD009172-01. Award Notice Date: Sep. 20, 2011; Project Start Date: Sep. 20, 2011.
Alm, E., High-resolution analysis of diversity and variation in the human microbiome, NIH Grant #: 5R21AI084032-02. Award Notice Date: Jun. 6, 2011; Project Start Date: Jun. 15, 2010.
Related Publications (1)
Number Date Country
20190300968 A1 Oct 2019 US
Provisional Applications (1)
Number Date Country
62648716 Mar 2018 US