This application is related to U.S. Patent Application No. 62/120,873, entitled “Systems and Methods for Visualizing Structural Variation and Phasing Information,” filed Feb. 25, 2015, which is hereby incorporated by reference herein in its entirety.
This application is also related to U.S. Patent Application No. 62/102,926, entitled “Systems and Methods for Visualizing Structural Variation and Phasing Information,” filed Jan. 13, 2015, which is hereby incorporated by reference herein in its entirety.
This specification describes technologies relating to visualizing structural variation and phasing information in nucleic acid sequencing data.
Haplotype assembly from experimental data obtained from human genomes sequenced using massively parallelized sequencing methodologies has emerged as a prominent source of genetic data. Such data serves as a cost-effective way of implementing genetics based diagnostics as well as human disease study, detection, and personalized treatment.
The long-range information provided by such massively parallelized sequencing methodologies is disclosed, for example, in U.S. Patent Application No. 62/072,214, filed Oct. 29, 2014, entitled “Analysis of Nucleic Acid Sequences.” Such techniques greatly facilitate the detection of large-scale structural variations of the genome, such as translocations, large deletions, or gene fusions. Other examples include, but are not limited to, the sequencing-by-synthesis platform (ILLUMINA), Bentley et al., 2008, “Accurate whole human genome sequencing using reversible terminator chemistry,” Nature 456:53-59; sequencing-by-ligation platforms (POLONATOR; ABI SOLiD), Shendure et al., 2005, “Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome,” Science 309:1728-1732; pyrosequencing platforms (ROCHE 454), Margulies et al., 2005, “Genome sequencing in microfabricated high-density picoliter reactors,” Nature 437:376-380; and single-molecule sequencing platforms (HELICOS HELISCOPE), Pushkarev et al., 2009, “Single-molecule sequencing of an individual human genome,” Nature Biotech 27:847-850, and (PACIFIC BIOSCIENCES) Eid et al., “Real-time sequencing from single polymerase molecules,” Science 323:133-138, each of which is hereby incorporated by reference in its entirety.
With the availability of haplotype data spanning large portions of the human genome, the need has arisen for ways to work efficiently with this data in order to advance the above-stated objectives of diagnosis, discovery, and treatment, particularly as the cost of whole genome sequencing for a personal genome drops below $1000. To computationally assemble haplotypes from such data, it is necessary to disentangle the reads from the two haplotypes present in the sample and infer a consensus sequence for both haplotypes. Such a problem has been shown to be NP-hard. See Lippert et al., 2002, “Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem,” Brief. Bioinform. 3:23-31, which is hereby incorporated by reference.
The assembly viewer Consed supports visualization of reads obtained from the above-identified sequencing methods. See Gordon 1998, “Consed: A graphical tool for sequence finishing,” Genome Research 8:198-202.
Another visualization tool is EagleView. See Huang and Marth, 2008, “EagleView: A genome assembly viewer for next-generation sequencing technologies,” Genome Research 18:1538-1543.
Still another such viewer is HapEdit. See Kim et al., “HapEdit: an accuracy assessment viewer for haplotype assembly using massively parallel DNA-sequencing technologies,” Nucleic Acids Research, 2011, 1-5. HapEdit provides tools for assessing the accuracy of haplotype assemblies and permits a user to fit the composition rates of reads sequenced by numerous different sequencing technologies.
While the above-disclosed programs are each significant advancements in their own right, they do not adequately address the need in the art for tools for visually assessing structural variants (e.g., deletions, duplications, copy-number variants, insertions, inversions, translocations, long terminal repeats (LTRs), short tandem repeats (STRs), and a variety of other useful characterizations) in sequencing data.
Technical solutions (e.g., computing systems, methods, and non-transitory computer readable storage mediums) for visually assessing structural variants are provided. With platforms such as those disclosed in U.S. Patent Application No. 62/072,214, filed Oct. 29, 2014, entitled “Analysis of Nucleic Acid Sequences,” which is hereby incorporated by reference, the genome is fragmented, partitioned, and barcoded prior to the target identification. Therefore, the integrity of the barcode information is maintained across the genome. The barcode information is used to identify potential structural variation breakpoints by detecting regions of the genome that show significant barcode overlap. The barcode information is also used to obtain phasing information.
The following presents a summary of the invention in order to provide a basic understanding of some of the aspects of the invention. This summary is not an extensive overview of the invention. It is not intended to identify key/critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some of the concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.
One aspect of the present disclosure is a system for providing structural variation or phasing information over a network connection to a remote client computer. The system comprises one or more microprocessors, a persistent memory and a non-persistent memory. The persistent memory (e.g., a hard drive) and the non-persistent memory (e.g., RAM memory) collectively store one or more nucleic acid sequence datasets. Each respective nucleic acid sequencing dataset in the one or more nucleic acid sequence datasets corresponds to at least one target nucleic acid in a respective sample in a plurality of samples. The respective sample is associated with a reference genome of a species that may serve as a benchmark for analysis of the respective sample in some embodiments. For instance, in some embodiments the respective sample is mapped to the reference genome and the reference genome is thereby used as a template (reference) to parse queries to visualize portions of the respective sample. For instance, in some embodiments a sample is from a human subject. In such instance, a human genome (as opposed to a genome from a different species) serves as the reference genome and the respective sample is mapped to the human genome. In this way, requests to visualize sequences or sequence variations in certain human chromosomes, or portions thereof from the sample, can be interpreted and handled using the disclosed systems and methods, based on such mapping to the reference genome.
The respective nucleic acid sequencing dataset comprises (i) a header, (ii) a synopsis, and (iii) a data section. The data section comprises a plurality of aligned sequence reads from the sample and information about each variant call made. Advantageously, the data section is extensible and can store additional data. Each respective sequencing read in the plurality of sequencing reads comprises a first portion that corresponds to a subset of at least one target nucleic acid in the respective sample and a second portion that encodes a respective identifier for the respective sequencing read in a plurality of identifiers. Each respective identifier is independent of the sequence of the at least one target nucleic acid. Sequencing reads in the plurality of sequencing reads collectively include the plurality of identifiers.
The persistent memory and the non-persistent memory further collectively store one or more programs that use the one or more microprocessors to provide a haplotype visualization tool to a client for installation on the remote client computer. The system receives a request, sent from the client over a network connection (e.g., the Internet), for structural variation or phasing information using a first dataset in the one or more datasets. Responsive to receiving the request, the request is automatically filtered by performing a method comprising loading the header and the synopsis of the first dataset into the non-persistent memory, if not already loaded into the non-persistent memory, while retaining the data section in persistent memory. In the method, the request is compared to (analyzed against) the synopsis of the first dataset, thereby identifying one or more portions of the data section of the first dataset. These one or more identified portions of the data section are, in turn, loaded into non-persistent memory. Structural variation or phasing information is formatted for display on the client computer using the first dataset. Then the formatted structural variation or phasing information is transmitted over the network connection to the client device for display on the client device.
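The filtering method described above can be illustrated with a minimal Python sketch. The on-disk layout used here (two 8-byte lengths, a JSON header, a JSON synopsis carrying a hypothetical "index" of genomic ranges, then the raw data section) and the names write_dataset and LazyDataset are assumptions made for illustration only; the disclosure does not specify them. The sketch shows only the idea of keeping the header and synopsis in non-persistent memory while reading from the data section just the portions that a request touches.

```python
import io
import json
import struct


def write_dataset(stream, header, synopsis, chunks):
    """Write a toy dataset: [header len][synopsis len][header][synopsis][data].

    chunks is a list of (start, end, payload_bytes) with half-open ranges.
    The layout is hypothetical and exists only to make the sketch runnable.
    """
    data = b""
    index = []
    for start, end, payload in chunks:
        index.append({"start": start, "end": end,
                      "offset": len(data), "size": len(payload)})
        data += payload
    synopsis = dict(synopsis, index=index)
    header_bytes = json.dumps(header).encode()
    synopsis_bytes = json.dumps(synopsis).encode()
    stream.write(struct.pack("<QQ", len(header_bytes), len(synopsis_bytes)))
    stream.write(header_bytes)
    stream.write(synopsis_bytes)
    stream.write(data)


class LazyDataset:
    """Keeps header and synopsis in memory; leaves the data section on disk."""

    def __init__(self, stream):
        self.stream = stream
        stream.seek(0)
        header_len, synopsis_len = struct.unpack("<QQ", stream.read(16))
        self.header = json.loads(stream.read(header_len))
        self.synopsis = json.loads(stream.read(synopsis_len))
        self.data_start = 16 + header_len + synopsis_len
        self.loaded = {}  # data-section offset -> bytes already read

    def query(self, start, end):
        """Return the data-section portions overlapping [start, end)."""
        portions = []
        for entry in self.synopsis["index"]:
            if entry["start"] < end and start < entry["end"]:
                offset = entry["offset"]
                if offset not in self.loaded:
                    # Only now is this portion read into memory.
                    self.stream.seek(self.data_start + offset)
                    self.loaded[offset] = self.stream.read(entry["size"])
                portions.append(self.loaded[offset])
        return portions
```

In this sketch a request for positions 900 to 1100 against three 1000-base portions would read only the first two portions; the third is never loaded.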
In some embodiments, the header delineates a plurality of components in the respective nucleic acid sequencing dataset. In some embodiments the plurality of components comprises two or more components, three or more components, four or more components or five or more components selected from the group consisting of a summary, an index to variant call data, a phase block track, a refseq index track, a gene track, an exon track, an index to read data, a structural variant dataset track, an index to a target dataset, and an index to a fragment dataset.
In some embodiments, the plurality of components comprises the summary and this summary comprises two or more items, three or more items, four or more items, five or more items, or six or more items in the group consisting of: a percentage of known SNPs phased in the respective nucleic acid sequencing dataset, a longest phase block in the respective nucleic acid sequencing dataset, a number of unique barcodes used in the respective nucleic acid sequencing dataset, an average fragment length in the respective nucleic acid sequencing dataset, a mean of the average fragment length in the respective nucleic acid sequencing dataset, a percentage of fragments greater than a lower threshold in the respective nucleic acid sequencing dataset, a fragment length histogram in the respective nucleic acid sequencing dataset, an N50 phase block size in the respective nucleic acid sequencing dataset, a phase block histogram in the respective nucleic acid sequencing dataset, a number of sequence reads represented by the respective nucleic acid sequencing dataset, a median insert size in the respective nucleic acid sequencing dataset, a median depth in the respective nucleic acid sequencing dataset, a percent of the target genome with zero coverage in the respective nucleic acid sequencing dataset, a mapped reads percentage for the respective nucleic acid sequencing dataset, a PCR duplication percentage for the respective nucleic acid sequencing dataset, a coverage histogram for the respective nucleic acid sequencing dataset, an identity of a test nucleic acid that forms the basis for the respective nucleic acid sequencing dataset, a genome source for the respective nucleic acid sequencing dataset, a sex of an organism that originated the at least one test nucleic acid of the respective nucleic acid sequencing dataset, a sex of the organism that originated the respective sample of the respective nucleic acid sequencing dataset, a dataset file format version of the respective nucleic acid sequencing dataset, and a pointer to a plurality of structural variant calls made for the respective nucleic acid sequencing dataset. Advantageously, as this non-limiting example of the list of information indicates, the disclosed nucleic acid sequencing datasets can contain arbitrary bits of metadata (e.g., annotation data) that might be of user interest along with sequencing data.
In some embodiments, the plurality of components comprises the index to variant call data that provides a correspondence between respective ranges of the genome of the species to offsets in the data section where variant call data for the respective ranges is found.
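One way such a range-to-offset index could work is sketched below in Python. The class name VariantIndex, the (chromosome, start, end, offset) tuple layout, and the assumption that the ranges on a chromosome do not overlap are all hypothetical choices made for this illustration; the disclosure does not prescribe them.

```python
import bisect


class VariantIndex:
    """Maps genomic ranges to byte offsets in the data section.

    Assumes the indexed ranges on each chromosome are half-open and
    non-overlapping (a simplification for this sketch).
    """

    def __init__(self, entries):
        # entries: iterable of (chrom, start, end, offset)
        self._by_chrom = {}
        for chrom, start, end, offset in sorted(entries):
            starts, rows = self._by_chrom.setdefault(chrom, ([], []))
            starts.append(start)
            rows.append((start, end, offset))

    def offsets(self, chrom, start, end):
        """Return offsets of all indexed ranges overlapping [start, end)."""
        starts, rows = self._by_chrom.get(chrom, ([], []))
        # Binary search for the last range that begins at or before 'start'.
        i = max(bisect.bisect_right(starts, start) - 1, 0)
        hits = []
        while i < len(rows) and rows[i][0] < end:
            if rows[i][1] > start:
                hits.append(rows[i][2])
            i += 1
        return hits
```

A viewer could then seek directly to the returned offsets rather than scanning the whole data section for variant calls in the requested range.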
In some embodiments, the plurality of components comprises the phase block track. The phase block track comprises (i) a dictionary and (ii) a track data section comprising phase information for one or more chromosomes in the genome of the species. In some embodiments, the dictionary comprises a plurality of names, and for each respective name in the plurality of names, an offset into the track data where records for the corresponding name are found. In some embodiments, the track data section comprises a plurality of records, wherein each record in the plurality of records represents a phase block in the target nucleic acid. In some embodiments, the track data section is in the JSON file format.
In some embodiments, each respective record in the plurality of records specifies (i) a chromosome number corresponding to the respective record, (ii) a position where the phase block starts on the chromosome, (iii) a position where the phase block ends, (iv) a unique name for the record, and (v) phasing information about the phase block.
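A phase block track of this kind, with a dictionary of names pointing at offsets into a JSON track data section, might be sketched as follows. The field names ("chrom", "start", "end", "name", "phasing"), the one-record-per-line encoding, and the function names are assumptions made for illustration; the disclosure states only that records carry these five kinds of information and that the track data may be JSON.

```python
import json


def build_track(records):
    """Serialize records and return (dictionary, track_data).

    dictionary maps each record's unique name to the byte offset at
    which that record begins in track_data.
    """
    track_data = bytearray()
    dictionary = {}
    for record in records:
        dictionary[record["name"]] = len(track_data)
        # One JSON object per line; json.dumps never emits a raw newline.
        track_data += json.dumps(record).encode() + b"\n"
    return dictionary, bytes(track_data)


def read_record(dictionary, track_data, name):
    """Look up one phase block record by name using its stored offset."""
    offset = dictionary[name]
    end = track_data.index(b"\n", offset)
    return json.loads(track_data[offset:end])
```

Because each lookup jumps straight to a stored offset, a single record can be decoded without parsing the rest of the track.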
In some embodiments, each respective record in the plurality of records is represented by a node in a plurality of nodes in a respective interval tree in a plurality of interval trees, and each interval tree in the plurality of interval trees represents a chromosome in a plurality of chromosomes for the species. In some such embodiments, a node in the plurality of nodes of a first interval tree in the plurality of interval trees stores a midpoint of the node, the midpoint of the node is a position of the midpoint, on the corresponding chromosome, of the phase block corresponding to the node, each respective node in the plurality of nodes of the first interval tree has a link to a left child node, which corresponds to the phase block immediately to the left of (i.e., numerically less than) the phase block represented by the respective node in the genome of the species, each respective node in the plurality of nodes of the first interval tree has a link to a right child node, which corresponds to the phase block immediately to the right of (i.e., numerically greater than) the phase block represented by the respective node in the genome of the species, each respective node in the plurality of nodes of the first interval tree has a sorted set of nodes that represent phase blocks that overlap the midpoint of the respective node sorted by left hand position of such phase block, and each respective node in the plurality of nodes of the first interval tree has a sorted set of nodes that represent phase blocks that overlap the midpoint of the respective node sorted by right hand position of such phase blocks. In some such embodiments, each respective node in the plurality of nodes of the first interval tree further includes a name, which is an offset in the track data section to the record in the plurality of records that contains phase information for the phase block corresponding to the respective node.
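The interval tree described above can be illustrated with a small centered interval tree in Python. This is a textbook-style sketch rather than the disclosed implementation: it picks each node's center as the median interval endpoint (the text instead describes the midpoint of a phase block), it treats intervals as closed, and the class name IntervalNode and the (start, end, name) tuples are hypothetical. It does show the two sorted lists per node, by left-hand and by right-hand position, and how they answer "which phase blocks cover this position" without visiting every record.

```python
class IntervalNode:
    """Node of a centered interval tree over closed (start, end, name) intervals."""

    def __init__(self, intervals):
        # Choose a center that is guaranteed to lie inside at least one interval.
        points = sorted(p for start, end, _ in intervals for p in (start, end))
        self.mid = points[len(points) // 2]
        left = [iv for iv in intervals if iv[1] < self.mid]
        right = [iv for iv in intervals if iv[0] > self.mid]
        here = [iv for iv in intervals if iv[0] <= self.mid <= iv[1]]
        # Intervals overlapping the center, sorted by left-hand position ...
        self.by_start = sorted(here, key=lambda iv: iv[0])
        # ... and by right-hand position (largest first).
        self.by_end = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def at(self, pos):
        """Return all intervals that contain position pos."""
        hits = []
        if pos <= self.mid:
            for iv in self.by_start:
                if iv[0] > pos:
                    break  # every later interval starts even further right
                hits.append(iv)
            if pos < self.mid and self.left:
                hits += self.left.at(pos)
        else:
            for iv in self.by_end:
                if iv[1] < pos:
                    break  # every later interval ends even further left
                hits.append(iv)
            if self.right:
                hits += self.right.at(pos)
        return hits
```

One such tree would be built per chromosome, with the name in each tuple standing in for the offset into the track data section where the phase block record is stored.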
In some embodiments, the header further comprises the version of the dataset structure used by the nucleic acid sequencing dataset. In some embodiments, the plurality of components comprises the refseq index, and the refseq index comprises an index of a plurality of molecular variation identifiers that are called in the sample. In some such embodiments, each respective molecular variation identifier in the plurality of molecular variation identifiers is a dbSNP identifier.
In some embodiments, the plurality of components comprises the gene track. In such embodiments, the gene track comprises a plurality of genes and, for each respective gene in the plurality of genes, a number of single nucleotide polymorphisms in the respective gene.
Another aspect of the present disclosure provides a system for processing program output over a network connection using a local computer, where the local computer comprises one or more microprocessors, and a memory that stores one or more programs. The one or more programs use the one or more microprocessors to execute a method in accordance with a first operating system running on the local computer. In the method a first instance of a first program is invoked. Then, there is obtained through the first instance of the first program from a user, a login and a password to a user account on a remote computer. This is used to log the user into the user account on the remote computer automatically (using the login and the password provided by the first instance of the first program) across a network connection between the local computer and the remote computer. Responsive to successful login on the remote computer, there is automatically sent, without human intervention, a second instance of the first program configured to auto-install on the remote computer upon transmission to the remote computer when the remote computer does not already have the first program available in the user's account. Next, there is received from the remote computer a request to open a panel within the first instance of the first program. The panel is originated by the second instance of the first program running on the remote computer. The panel solicits input from the user for controlling the second instance of the first program. Responsive to receiving input from the user for controlling the second instance of the first program in the panel on the local computer, the input is sent to the second instance of the first program on the remote computer across the network connection (e.g., wireless or wired connection). Next, there is received, from the remote computer across the network connection, output from the second instance of the first program responsive to the input. This output is displayed at the local computer.
Another aspect of the present disclosure provides a system for viewing nucleic acid sequencing data. The system comprises one or more microprocessors and a memory. The memory stores one or more programs that use the one or more microprocessors to obtain a nucleic acid sequencing dataset corresponding to at least one target nucleic acid in a sample. The nucleic acid sequencing dataset comprises a plurality of sequencing reads from the sample. Each respective sequencing read in the plurality of sequencing reads comprises a first portion that corresponds to a subset of at least one target nucleic acid in the sample and a second portion that encodes a respective identifier (e.g., bar code) for the respective sequencing read in a plurality of identifiers. Each respective identifier is independent of the sequence of the at least one target nucleic acid. The plurality of sequencing reads collectively includes the plurality of identifiers. A visualization tool is displayed. A request is obtained from a user through the visualization tool. The request specifies a genomic region represented by the nucleic acid sequencing dataset. Responsive to obtaining the request, the request is parsed by obtaining a plurality of sequencing reads within the genomic region from the nucleic acid sequencing dataset. A scan window is run against the plurality of sequencing reads thereby creating a plurality of windows, each respective window of the plurality of windows corresponding to a different region of the genomic region and including an identity of each identifier of each sequencing read in the different region of the genomic region in the nucleic acid sequencing dataset. A two dimensional heat map that represents each possible window pair in the plurality of windows is displayed. Each respective window pair is displayed in the two dimensional heat map as a color selected from a color scheme based upon the number of identifiers in common in the respective window pair.
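The scan-window and heat map computation described above can be sketched in Python as follows. Reads are reduced here to hypothetical (position, barcode) pairs, windows are fixed-width and non-overlapping, and the four-color PALETTE is an arbitrary stand-in for a color scheme; all of these are assumptions for illustration, not details taken from the disclosure.

```python
def window_barcodes(reads, region_start, region_end, window):
    """Collect the set of barcodes seen in each fixed-width window.

    reads is an iterable of (position, barcode); the region is half-open.
    """
    n_windows = -(-(region_end - region_start) // window)  # ceiling division
    sets = [set() for _ in range(n_windows)]
    for pos, barcode in reads:
        if region_start <= pos < region_end:
            sets[(pos - region_start) // window].add(barcode)
    return sets


def overlap_matrix(sets):
    """Cell (i, j) is the number of barcodes shared by windows i and j."""
    return [[len(a & b) for b in sets] for a in sets]


PALETTE = ["white", "yellow", "orange", "red"]  # hypothetical color scheme


def color_for(count, max_count):
    """Map a shared-barcode count onto the palette (linear binning)."""
    if max_count <= 0:
        return PALETTE[0]
    return PALETTE[count * (len(PALETTE) - 1) // max_count]
```

Window pairs far from the diagonal that nonetheless share many barcodes are the cells of interest, since they suggest that distant reference positions sit on the same input molecules.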
Various embodiments of systems, methods and devices within the scope of the appended claims each have several aspects, no single one of which is solely responsible for the desirable attributes described herein. Without limiting the scope of the appended claims, some prominent features are described herein. After considering this discussion, and particularly after reading the section entitled “Detailed Description” one will understand how the features of various embodiments are used.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference in their entireties to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
The implementations disclosed herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. Like reference numerals refer to corresponding parts throughout the drawings.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first subject could be termed a second subject, and, similarly, a second subject could be termed a first subject, without departing from the scope of the present disclosure. The first subject and the second subject are both subjects, but they are not the same subject.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.
The implementations described herein provide various technical solutions to detect a structural variant (e.g., deletions, duplications, copy-number variants, insertions, inversions, translocations, long terminal repeats (LTRs), short tandem repeats (STRs), and a variety of other useful characterizations) in sequencing data of a test nucleic acid obtained from a biological sample. Details of implementations are now described in relation to the Figures.
In some implementations, the user interface 106 includes an input device (e.g., a keyboard, a mouse, a touchpad, a track pad, and/or a touch screen) 100 for a user to interact with the system 100 and a display 108.
In some implementations, one or more of the above identified elements are stored in one or more of the previously mentioned memory devices, and correspond to a set of instructions for performing a function described above. The above identified modules or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, the memory 112 optionally stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, the memory stores additional modules and data structures not described above. In some embodiments, one or more of the above identified elements is stored in a computer system, other than that of system 100, that is addressable by system 100 so that system 100 may retrieve all or a portion of such data when needed.
Although
Advantageously, because the nucleic acid sequence datasets 126 are large in typical embodiments (e.g., 1 gigabyte or greater, 5 gigabytes or greater, or 10 gigabytes or greater), in some embodiments the structural variation and phasing visualization system 100 is part of a system that includes one or more client devices 3102 that are in electronic communication with the structural variation and phasing visualization system 100 of
The process of running a program on a remote computer (e.g., in system 3100, the structural variation and phasing visualization system 100 is considered remote) and viewing the results on a client device 3102 (e.g., desktop or laptop) is cumbersome. A user must generally (i) install certain parts of the program on their computer 3102 and other parts on the server 100, (ii) use SSH or firewall software to create an open network port connecting the two computers (client device 3102 to system 100), and (iii) independently start different parts of the program on different systems. For example, a May 17, 2014, Trackets Blog post titled “SSH Tunnel—Local and Remote Port Forwarding Explained With Examples,” which is hereby incorporated by reference, explains one way of setting up forwarding. The present disclosure incorporates such techniques. However, advantageously, in some embodiments, the present disclosure affords solutions that automate and improve upon the networking processes described above. Once a user has installed the haplotype visualization tool 148 on their client device 3102, they only need to provide the tool 148 with their credentials (e.g., user-name and password) for the remote computer (structural variation and phasing visualization system 100) that has the data and computational facilities to run the haplotype visualization tool 148. For instance, in some embodiments, referring to
Referring once again to
Referring to
The sequencing reads ultimately form the basis of a nucleic acid sequencing dataset 126. Each respective sequencing read 202 in the plurality of sequencing reads comprises a first portion that corresponds to a subset of a test nucleic acid and a second portion that encodes identification information for the respective sequencing read. The identification information is independent of the sequencing data of the test nucleic acid.
In some embodiments, sequencing read lengths have an N50 (where the sum of the sequence read lengths that are greater than the stated N50 number is 50% of the sum of all sequencing read lengths). In typical embodiments, sequencing reads are tens or hundreds of bases in length, which in turn, are aligned to form constructs of at least about 10 kb, at least about 20 kb, or at least about 50 kb. In more preferred aspects, sequencing reads are tens or hundreds of bases in length, which in turn, are aligned to form constructs having at least about 100 kb, at least about 150 kb, at least about 200 kb, and in many cases, at least about 250 kb, at least about 300 kb, at least about 350 kb, at least about 400 kb, and in some cases, at least about 500 kb or more.
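For concreteness, the N50 statistic mentioned above can be computed as in the following Python sketch. It uses the common convention of returning the largest length L such that reads of length L or longer account for at least half of all sequenced bases; the function name n50 and the handling of ties are choices made for this illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or block) lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest reads contribute 11 bases, which is the first point at which half the total is reached, so the N50 is 5.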
In some embodiments, to obtain the plurality of sequencing reads from a biological sample from a subject, a test nucleic acid 206 is fragmented and these fragments are compartmentalized, or partitioned into discrete compartments or partitions (referred to interchangeably herein as partitions). In some embodiments, the test nucleic acid is the genome of a multi-chromosomal organism such as a human. In typical embodiments, multiple sequencing reads that are tens or hundreds of bases in length are measured from each such compartment or partition. Sequencing reads from the same compartment or partition that have the same bar code can be aligned to form sequence constructs that are at least about 25 kb, at least about 50 kb, at least about 100 kb, at least about 150 kb, at least about 200 kb, and in many cases, at least about 250 kb, at least about 300 kb, at least about 350 kb, at least about 400 kb, and in some cases, at least about 500 kb or more in length.
Each partition maintains separation of its own contents from the contents of other partitions. As used herein, the partitions refer to containers or vessels that may include a variety of different forms, e.g., wells, tubes, micro or nanowells, through holes, or the like. In preferred aspects, however, the partitions are flowable within fluid streams. In some embodiments, these vessels are comprised of, e.g., microcapsules or micro-vesicles that have an outer barrier surrounding an inner fluid center or core, or have a porous matrix that is capable of entraining and/or retaining materials within its matrix. In a preferred aspect, however, these partitions comprise droplets of aqueous fluid within a non-aqueous continuous phase, e.g., an oil phase. A variety of different vessels are described in, for example, U.S. patent application Ser. No. 13/966,150, filed Aug. 13, 2013, which is hereby incorporated by reference herein in its entirety. Likewise, emulsion systems for creating stable droplets in non-aqueous or oil continuous phases are described in detail in, e.g., Published U.S. Patent Application No. 2010-0105112, which is hereby incorporated by reference herein in its entirety. In certain embodiments, microfluidic channel networks are particularly suited for generating partitions as described herein. Examples of such microfluidic devices include those described in detail in Provisional U.S. Patent Application No. 61/977,804, filed Apr. 4, 2014, as well as PCT/US15/025197, the full disclosures of which are incorporated herein by reference in their entirety for all purposes. Alternative mechanisms may also be employed in the partitioning of individual cells, including porous membranes through which aqueous mixtures of cells are extruded into non-aqueous fluids. Such systems are generally available from, e.g., NANOMI, Inc.
In the case of droplets in an emulsion, partitioning of the test nucleic acid fragments into discrete partitions may generally be accomplished by flowing an aqueous, sample-containing stream into a junction into which is also flowing a non-aqueous stream of partitioning fluid, e.g., a fluorinated oil, such that aqueous droplets are created within the flowing stream of partitioning fluid, where such droplets include the sample materials. As described below, the partitions, e.g., droplets, also typically include co-partitioned barcode oligonucleotides.
The relative amount of sample materials within any particular partition may be adjusted by controlling a variety of different parameters of the system, including, for example, the concentration of test nucleic acid fragments in the aqueous stream, the flow rate of the aqueous stream and/or the non-aqueous stream, and the like. The partitions described herein are often characterized by having overall volumes that are less than 1000 pL, less than 900 pL, less than 800 pL, less than 700 pL, less than 600 pL, less than 500 pL, less than 400 pL, less than 300 pL, less than 200 pL, less than 100 pL, less than 50 pL, less than 20 pL, less than 10 pL, or even less than 1 pL. Where co-partitioned with beads, it will be appreciated that the sample fluid volume within the partitions may be less than 90% of the above described volumes, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or even less than 10% of the above described volumes. In some cases, the use of low reaction volume partitions is particularly advantageous in performing reactions with very small amounts of starting reagents, e.g., input test nucleic acid fragments. Methods and systems for analyzing samples with low input nucleic acids are presented in U.S. Provisional Patent Application No. 62/017,580, filed Jun. 26, 2014, the full disclosure of which is hereby incorporated by reference in its entirety.
Once the test nucleic acid fragments are introduced into their respective partitions, the test nucleic acid fragments within partitions are generally provided with unique identifiers such that, upon characterization of those test nucleic acid fragments, they may be attributed as having been derived from their respective partitions. Such unique identifiers may be previously, subsequently or concurrently delivered to the partitions that hold the compartmentalized or partitioned test nucleic acid fragments, in order to allow for the later attribution of the characteristics, e.g., nucleic acid sequence information, to the sample nucleic acids included within a particular compartment, and particularly to relatively long stretches of contiguous sample nucleic acids that may be originally deposited into the partitions.
Accordingly, the test nucleic acid fragments are typically co-partitioned with the unique identifiers (e.g., barcode sequences). In particularly preferred aspects, the unique identifiers are provided in the form of oligonucleotides that comprise nucleic acid barcode sequences that are attached to test nucleic acid fragments in the partitions. The oligonucleotides are partitioned such that, as between oligonucleotides in a given partition, the nucleic acid barcode sequences contained therein are the same, but as between different partitions, the oligonucleotides can have, and preferably do have, differing barcode sequences. In some embodiments, only one nucleic acid barcode sequence is associated with a given partition, although in some embodiments, two or more different barcode sequences are present in a given partition.
The nucleic acid barcode sequences will typically include from 6 to about 20 or more nucleotides within the sequence of the oligonucleotides. These nucleotides may be completely contiguous, i.e., in a single stretch of adjacent nucleotides, or they may be separated into two or more separate subsequences that are separated by one or more nucleotides. Separated subsequences are typically from about 4 to about 16 nucleotides in length.
The test nucleic acid is typically partitioned such that the nucleic acids are present in the partitions in relatively long fragments or stretches of contiguous nucleic acid molecules. These fragments typically represent a number of overlapping fragments of the overall test nucleic acid to be analyzed, e.g., an entire chromosome, exome, or other large genomic fragment. This test nucleic acid may include whole genomes, individual chromosomes, exomes, amplicons, or any of a variety of different nucleic acids of interest. Typically, the fragments of the test nucleic acid that are partitioned are longer than 1 kb, longer than 5 kb, longer than 10 kb, longer than 15 kb, longer than 20 kb, longer than 30 kb, longer than 40 kb, longer than 50 kb, longer than 60 kb, longer than 70 kb, longer than 80 kb, longer than 90 kb or even longer than 100 kb.
The test nucleic acid is also typically partitioned at a level whereby a given partition has a very low probability of including two overlapping fragments of the starting test nucleic acid. This is typically accomplished by providing the test nucleic acid at a low input amount and/or concentration during the partitioning process. As a result, in preferred cases, a given partition includes a number of long, but non-overlapping fragments of the starting test nucleic acid. The nucleic acid fragments in the different partitions are then associated with unique identifiers, where for any given partition, nucleic acids contained therein possess the same unique identifier, but where different partitions include different unique identifiers. Moreover, because the partitioning step allocates the sample components into very small volume partitions or droplets, it will be appreciated that in order to achieve the desired allocation as set forth above, one need not conduct substantial dilution of the sample, as would be required in higher volume processes, e.g., in tubes, or wells of a multiwell plate. Further, because the systems described herein employ such high levels of barcode diversity, one can allocate diverse barcodes among higher numbers of genomic equivalents, as provided above. In some embodiments, in excess of 10,000, 100,000, 500,000, etc. diverse barcode types are used to achieve genome:(barcode type) ratios that are on the order of 1:50 or less, 1:100 or less, 1:1000 or less, or even smaller ratios, while also allowing for loading higher numbers of genomes (e.g., on the order of greater than 100 genomes per assay, greater than 500 genomes per assay, 1000 genomes per assay, or even more) while still providing for far improved barcode diversity per genome. Here, each such genome is an example of a test nucleic acid.
Referring to
In some embodiments, the co-partitioned oligonucleotides also comprise functional sequences in addition to the barcode region 214 and the primer region 216 region of the nucleic acids within the sample within the partitions. See, for example, the disclosure on co-partitioning of oligonucleotides and associated barcodes and other functional sequences, along with sample materials as described in, for example, U.S. Patent Application Nos. 61/940,318, filed Feb. 7, 2014, 61/991,018, filed May 9, 2014, and U.S. patent application Ser. No. 14/316,383, filed on Jun. 26, 2014, as well as U.S. patent application Ser. No. 14/175,935, filed Feb. 7, 2014, the full disclosures of which are hereby incorporated by reference in their entireties.
In one exemplary process, beads are provided, where each such bead includes large numbers of the above described oligonucleotides releasably attached to the beads. In such embodiments, all of the oligonucleotides attached to a particular bead include the same nucleic acid barcode sequence, but a large number of diverse barcode sequences are represented across the population of beads used. Typically, the population of beads provides a diverse barcode sequence library that includes at least 1000 different barcode sequences, at least 10,000 different barcode sequences, at least 100,000 different barcode sequences, or in some cases, at least 1,000,000 different barcode sequences. Additionally, each bead typically is provided with large numbers of oligonucleotide molecules attached. In particular, the number of molecules of oligonucleotides including the barcode sequence on an individual bead may be at least about 10,000 oligonucleotides, at least 100,000 oligonucleotide molecules, at least 1,000,000 oligonucleotide molecules, at least 100,000,000 oligonucleotide molecules, and in some cases at least 1 billion oligonucleotide molecules.
In some embodiments, the oligonucleotides are releasable from the beads upon the application of a particular stimulus to the beads. In some cases, the stimulus may be a photo-stimulus, e.g., through cleavage of a photo-labile linkage that may release the oligonucleotides. In some cases, a thermal stimulus may be used, where elevation of the temperature of the beads' environment may result in cleavage of a linkage or other release of the oligonucleotides from the beads. In some cases, a chemical stimulus may be used that cleaves a linkage of the oligonucleotides to the beads, or otherwise may result in release of the oligonucleotides from the beads.
In accordance with the methods and systems described herein, the beads including the attached oligonucleotides may be co-partitioned with the individual samples, such that a single bead and a single sample are contained within an individual partition. In some cases, where single bead partitions are desired, it may be desirable to control the relative flow rates of the fluids such that, on average, the partitions contain less than one bead per partition, in order to ensure that those partitions that are occupied are primarily singly occupied. Likewise, one may wish to control the flow rate to provide that a higher percentage of partitions are occupied, e.g., allowing for only a small percentage of unoccupied partitions. In preferred aspects, the flows and channel architectures are controlled so as to ensure a desired number of singly occupied partitions, less than a certain level of unoccupied partitions and less than a certain level of multiply occupied partitions.
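The trade-off between singly occupied, unoccupied, and multiply occupied partitions can be illustrated numerically. The following is a minimal sketch that assumes bead loading follows a Poisson distribution with a mean of `lam` beads per partition. The Poisson model and the function names `occupancy` and `single_fraction_of_occupied` are assumptions made for illustration and are not stated in the present disclosure.

```python
import math

def occupancy(lam):
    """Return (empty, single, multiple) partition fractions.

    Assumes (hypothetically) Poisson loading with mean `lam` beads
    per partition.
    """
    empty = math.exp(-lam)          # P(0 beads)
    single = lam * math.exp(-lam)   # P(exactly 1 bead)
    multiple = 1.0 - empty - single # P(2 or more beads)
    return empty, single, multiple

def single_fraction_of_occupied(lam):
    """Fraction of occupied partitions that hold exactly one bead."""
    empty, single, _ = occupancy(lam)
    return single / (1.0 - empty)
```

Under this assumed model, lowering the mean loading raises the share of occupied partitions that are singly occupied, at the cost of a larger share of unoccupied partitions, which is the balance the flow rates are tuned to achieve.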
FIG. 3 of U.S. Patent Application No. 62/072,214, filed Oct. 29, 2014, entitled “Analysis of Nucleic Acid Sequences,” which is hereby incorporated by reference and the portions of the specification therein describing
Once co-partitioned, the oligonucleotides disposed upon the beads may be used to barcode and amplify the partitioned samples. One process for use of these barcode oligonucleotides in amplifying and barcoding samples is described in detail in U.S. Patent Application Nos. 61/940,318, filed Feb. 7, 2014, 61/991,018, filed May 9, 2014, and U.S. patent application Ser. No. 14/316,383, filed on Jun. 26, 2014, the full disclosures of which are hereby incorporated by reference in their entireties. Briefly, in one aspect, the oligonucleotides present on the beads that are co-partitioned with the samples are released from their beads into the partition with the samples. Each oligonucleotide typically includes, along with the barcode sequence, a primer sequence at its 5′ end. This primer sequence may be a random oligonucleotide sequence intended to randomly prime numerous different regions of the samples, or it may be a specific primer sequence targeted to prime upstream of a specific targeted region of the sample.
Once released, the primer portion of the oligonucleotide can anneal to a complementary region of the sample. Extension reaction reagents, e.g., DNA polymerase, nucleoside triphosphates, co-factors (e.g., Mg2+ or Mn2+ etc.), that are also co-partitioned with the samples and beads, then extend the primer sequence using the sample as a template, to produce a complementary fragment to the strand of the template to which the primer annealed, with the complementary fragment including the oligonucleotide and its associated barcode sequence. Annealing and extension of multiple primers to different portions of the sample may result in a large pool of overlapping complementary fragments of the sample, each possessing its own barcode sequence indicative of the partition in which it was created. In some cases, these complementary fragments may themselves be used as a template primed by the oligonucleotides present in the partition to produce a complement of the complement that, again, includes the barcode sequence. In some cases, this replication process is configured such that when the first complement is duplicated, it produces two complementary sequences at or near its termini, to allow the formation of a hairpin structure or partial hairpin structure that reduces the ability of the molecule to be the basis for producing further iterative copies. A schematic illustration of one example of this is shown in
As
In some embodiments, the sequencing reads in a nucleic acid sequencing dataset 126 are processed in order to sequence the at least one target nucleic acid. In some embodiments conventional methods are used to process the nucleic acid sequence reads in order to establish a sequence for the at least one target nucleic acid. In some embodiments the novel methods disclosed in PCT application PCT/US2015/038175, entitled “Processes and Systems for Nucleic Acid Sequence Assembly,” filed Jun. 26, 2015, which is hereby incorporated by reference, are used to process the nucleic acid sequence reads in order to establish a sequence for the at least one target nucleic acid. In some embodiments, such sequencing involves mapping the sequencing reads to a reference genome, such as the genome of the species from which the sample is taken. In some embodiments, the sample is expected to contain, or suspected of containing, multiple genomes (e.g., the case in which a sample, such as a human sample, is infected with a retrovirus). In such cases, multiple reference genomes from different species may be concurrently used.
In some embodiments, the sequencing reads are processed by phasing them and by looking for structural variations. In some embodiments, conventional phasing methods and structural variation methods are used. In some embodiments, novel phasing methods and structural variation methods, such as those disclosed in U.S. Provisional Application No. 62/238,077, entitled “Systems and Method for Determining Structural Variation Using Probabilistic Models,” filed Oct. 6, 2015, which is hereby incorporated by reference, are used. Although not disclosed in this reference, in some embodiments the teachings of the reference are extended to incorporate multiple reference genomes in instances where the sample potentially contains nucleic acid from multiple reference genomes. For instance, in the case where the sample is human but it is possible that the sample is infected with a retrovirus, the genome of the retrovirus is treated as an additional chromosome. In this way, it is possible to extend the visualization methods disclosed in the present disclosure to identify insertion of nucleic acid constructs, such as retroviruses, into the genome of the sample under study.
For example, the disclosed techniques can use the barcodes to distinguish between the following two scenarios. In the first scenario, a human sample contains free-floating HPV that has not been inserted into the human DNA, so that the virus and the human DNA are separate molecules. In that case, the measured sequence reads include reads that map to HPV as well as reads that map to the human genome, but the HPV reads and the human reads do not have barcodes in common, indicating that the human genome and the HPV are distinct. In the second scenario, the HPV has been inserted into one or more human chromosomes. In that case, sequence reads that map to a human chromosome and sequence reads that map to the HPV share the same barcodes, indicating that they originate from the same molecule as opposed to separate molecules (e.g., the HPV has been incorporated into a human chromosome). Moreover, the barcodes can be used to localize the precise location(s) of the HPV insertion into the human chromosome.
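The barcode comparison described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes each mapped read has already been reduced to a `(barcode, contig, position)` tuple, and the function names are hypothetical. Because a partition may hold several unrelated molecules, a practical analysis would also need to account for barcodes that are shared by chance, which this sketch does not attempt.

```python
from collections import defaultdict

def shared_barcodes(reads, contig_a, contig_b):
    """Return the set of barcodes observed on both contigs.

    `reads` is a list of (barcode, contig, position) tuples
    (a hypothetical, simplified read representation).
    """
    seen = defaultdict(set)
    for barcode, contig, _position in reads:
        seen[contig].add(barcode)
    return seen[contig_a] & seen[contig_b]

def insertion_candidates(reads, host_contig, viral_contig):
    """Host positions of reads whose barcode also tags viral reads."""
    shared = shared_barcodes(reads, host_contig, viral_contig)
    return sorted(
        position
        for barcode, contig, position in reads
        if contig == host_contig and barcode in shared
    )
```

An empty result from `shared_barcodes` corresponds to the first scenario (separate molecules), while a non-empty result is consistent with the second scenario, with `insertion_candidates` giving host positions near which the insertion may lie.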
As illustrated in
The synopsis section 308 contains data that is read by haplotype visualization tool 148 into volatile (e.g., random access) memory, typically in its entirety, when the dataset 126 is accessed. This data consists of indexes into the data section 340 as well as other data that is referenced frequently by visualization tool 148. As illustrated in
Summary 310 provides high level metrics extracted from the data. In some embodiments, summary 310 is used by summarization module 150 to provide summary data such as that illustrated in
Index to variant call data 312 is an example of an index found in the synopsis 308. It relates respective ranges 314 of the genome of the target nucleic acid to offsets 316 in the corresponding data section 340 where variant call data for the respective ranges is found.
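A range-to-offset index of this kind can be sketched as a sorted table searched by bisection. The class name `RangeIndex`, the half-open `[start, end)` range convention, and the restriction to a single chromosome are all assumptions made for illustration; the disclosure does not specify the index layout.

```python
import bisect

class RangeIndex:
    """Maps genomic ranges to offsets in a data section (illustrative only)."""

    def __init__(self, entries):
        # entries: (range_start, range_end, file_offset) tuples describing
        # non-overlapping half-open ranges on one chromosome (assumption).
        self.entries = sorted(entries)
        self.starts = [entry[0] for entry in self.entries]

    def offsets_for(self, start, end):
        """Offsets of all indexed ranges that overlap [start, end)."""
        # Begin at the last indexed range that starts at or before `start`.
        first = max(bisect.bisect_right(self.starts, start) - 1, 0)
        offsets = []
        for range_start, range_end, offset in self.entries[first:]:
            if range_start >= end:
                break
            if range_end > start:
                offsets.append(offset)
        return offsets
```

Only the offsets returned for a requested region need to be read from the data section, which is what allows the synopsis to stay small while the bulk data remains on disk.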
In some embodiments, the phase block track 318 is stored in the synopsis section 308 of the nucleic acid sequencing dataset 126. More details of the architecture of an exemplary phase block track 318 are found in
The dictionary 402 of the phase block track 318 comprises a plurality of names 404, and for each name 404, an offset 406 into the track data 408 where records for the corresponding name 404 are found. In some embodiments, the dictionary 402 for the phase block track 318 contains a single name, e.g., “phase data”.
In some embodiments, the track data 408 is in JSON format. In some embodiments, each record 410 represents a phase block in the target nucleic acid. As such, in some embodiments, each record 410 specifies a chromosome number 412 that the phase block is on as well as the position where the phase block starts 414 on the chromosome 412 and a position where the phase block ends 416 on the chromosome 412. Moreover, there is a unique name 418 for each record and phasing information 420 about the phase block. In some embodiments, the purpose for the information 420 is to provide details of phasing information of the phase block. In some embodiments, a phase block includes information about two haplotypes corresponding to the two parents (e.g., respectively denoted haplotype “A” and haplotype “B”). Accordingly, in some embodiments, the phase information comprises PhaseASNP 422 (the number of counted single nucleotide polymorphisms on haplotype “A” in the phase block), Unphased SNP 424 (the number of counted single nucleotide polymorphisms of unknown haplotype in the phase block) and PhaseBSNP (the number of counted single nucleotide polymorphisms on haplotype “B” in the phase block). As such, the track data 408 holds certain phase block data (e.g., SNP counts) for the nucleic acid sequencing dataset 126. Techniques for phasing genomic data and phase blocks are described in Browning and Browning, “Haplotype phasing: Existing methods and new developments,” Nat Rev Genet.; 12(10): 703-714. doi:10.1038/nrg3054, which is hereby incorporated by reference in its entirety.
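For orientation, a phase block record of the kind described above might look like the following when the track data is in JSON format. This is a sketch only: the key names (`chromosome`, `start`, `end`, `name`, `info`, `phase_a_snps`, `unphased_snps`, `phase_b_snps`) and all of the values are hypothetical and chosen purely for illustration.

```python
import json

# Hypothetical record: key names and values are illustrative assumptions.
RECORD_JSON = """
{"chromosome": "chr1", "start": 1000000, "end": 1450000, "name": "pb_0001",
 "info": {"phase_a_snps": 212, "unphased_snps": 17, "phase_b_snps": 198}}
"""

def phased_fraction(record):
    """Fraction of counted SNPs in the phase block assigned to a haplotype."""
    info = record["info"]
    phased = info["phase_a_snps"] + info["phase_b_snps"]
    total = phased + info["unphased_snps"]
    return phased / total if total else 0.0

record = json.loads(RECORD_JSON)
```

A derived quantity such as `phased_fraction(record)` is the sort of per-block summary that the stored SNP counts make cheap to compute without touching the read data.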
In some embodiments, the track data 408 is put into context by corresponding interval trees 422. As such, each record 410 is represented by a node 424 in an interval tree 422. Each such interval tree 422 is a ternary tree with each node 424 of the tree storing a midpoint of the node xmed 432. This midpoint 432 is the position of the midpoint, on the corresponding chromosome, of the phase block corresponding to the node. Each respective node 424 has a link to a left child node 428, which corresponds to the phase block immediately to the left of the phase block represented by the respective node 424 in the genome of the species of the target (genetic source) organism. Each respective node 424 has a link to a right child node 430, which corresponds to the phase block immediately to the right of the phase block represented by the respective node 424. Each respective node 424 has a sorted set of nodes 425 that represent phase blocks that overlap the xmed 432 of the respective node 424 sorted by left hand position of such phase blocks. Each respective node 424 has a sorted set of nodes 436 that represent phase blocks that overlap the xmed 432 of the respective node 424 sorted by right hand position of such phase blocks. In some embodiments, sorted sets 425 and 436 are represented in a node 424 by arrays or linked lists. Each respective node 424 further includes a name 426, which is an offset in track data 408 to the record 410 that contains phase information 420 for the phase block corresponding to the respective node 424.
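The node layout described above (a midpoint, left and right children, and two sorted sets of intervals that overlap the midpoint) matches the textbook centered interval tree. The following is a minimal sketch of that general technique, assuming closed intervals given as `(start, end, name)` tuples; it is not asserted to be the exact structure used by the disclosed dataset, and the choice of the median endpoint as the midpoint is an assumption.

```python
class IntervalNode:
    """Centered interval tree node (textbook form, illustrative only)."""

    def __init__(self, intervals):
        # intervals: non-empty list of (start, end, name), closed intervals.
        points = sorted(p for s, e, _ in intervals for p in (s, e))
        self.xmed = points[len(points) // 2]  # median endpoint (assumption)
        left = [iv for iv in intervals if iv[1] < self.xmed]
        right = [iv for iv in intervals if iv[0] > self.xmed]
        mid = [iv for iv in intervals if iv[0] <= self.xmed <= iv[1]]
        # Intervals overlapping xmed, kept in two sort orders.
        self.by_start = sorted(mid, key=lambda iv: iv[0])
        self.by_end = sorted(mid, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def stab(self, x):
        """Names of all intervals that contain position x."""
        hits = []
        if x < self.xmed:
            for start, _end, name in self.by_start:
                if start > x:
                    break
                hits.append(name)
            if self.left:
                hits += self.left.stab(x)
        elif x > self.xmed:
            for _start, end, name in self.by_end:
                if end < x:
                    break
                hits.append(name)
            if self.right:
                hits += self.right.stab(x)
        else:
            hits = [name for _, _, name in self.by_start]
        return hits
```

Keeping the overlapping intervals in both sort orders is what lets a query stop scanning early: to the left of the midpoint only start positions matter, and to the right only end positions matter. In the dataset described above, the `name` stored with each interval would play the role of the offset into the track data.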
As illustrated in
Referring to
In some embodiments, the synopsis 308 further comprises a gene track 320, which provides a reference of human genes tagged with the number of SNPs found in each gene. More details of the architecture of an exemplary gene track 320 are found in
The dictionary 602 of the gene track 320 comprises a plurality of names 604, and for each name 604, an offset 606 into the track data 608 where records for the corresponding name 604 are found. In some embodiments, each name 604 in dictionary 602 is the name of a chromosome in the target genome.
In some embodiments, the track data 608 for gene track 320 comprises a plurality of gene records 610. In some embodiments, the track data 608 is in JSON format. In some embodiments, each gene record 610 represents a gene in the species of the target nucleic acid. As such, in some embodiments, each gene record 610 specifies a chromosome number 612 the corresponding gene is on, the position where the gene starts 614 on the chromosome 612 and a position where the gene ends 616 on the chromosome 612. Moreover, there is a unique name 618 for each gene record and gene information 620 about the gene. In some embodiments, the purpose for the information 620 is to provide genetic information about the gene, such as, for example, an alternative name 622 for the gene, a count of single nucleotide polymorphisms 624 on the gene, and a direction (e.g., plus or minus) 626 of the gene.
In some embodiments, the track data 608 is put into context by the corresponding interval trees 628. Each gene record 610 forms a node 630 in an interval tree 628. Each interval tree 628 is a ternary tree with each node 630 storing a midpoint of the node xmed 642. This midpoint 642 is the position of the midpoint, on the corresponding chromosome, of the gene corresponding to the node. Each respective node 630 has a link to a left child node 638, which corresponds to the gene immediately to the left (lesser position on the chromosome) of the gene represented by the respective node 630 in the species of the target organism. Each respective node 630 has a link to a right child node 640, which corresponds to the gene immediately to the right (greater position on the chromosome) of the gene represented by the respective node 630 in the species of the target organism. Each respective node 630 has a sorted set of nodes 632 that respectively represent genes that overlap the xmed 642 of the respective node 630 sorted by left hand position. Each respective node 630 has a sorted set of nodes 644 that respectively represent genes that overlap the xmed 642 of the respective node 630 sorted by right hand position. In some embodiments, sorted sets 632 and 644 are represented in a node 630 by arrays or linked lists. Each respective node 630 further includes a name 636, which is an offset in track data 608 to the gene record 610 that contains genetic information 620 for the gene corresponding to the respective node 630.
As illustrated in
In some embodiments, the synopsis 308 further comprises an exon track 322. In some embodiments, the exon track 322 has the same architecture as the gene track 320, the exception being that whereas the gene track 320 represents genetic information for genes in the species of the target organism, the exon track 322 provides genetic information for exons in the species of the target organism.
In some embodiments, the synopsis 308 further comprises an index to read data 324. This index 324 provides an index into sequence/read data 1048 in the data section 340 of the nucleic acid sequencing set, which is described in more detail below with reference to
The index 324 further comprises a per chromosome array of chromosome-offset→file-offset associations 328 into read data 1048 as well as a length of each such data element which allow lookup of the corresponding data for a specific genomic range. In some embodiments the read data is stored as a blocked index, and each record 328 is a fixed bit record for each entry in a BAM file that was incorporated into the dataset 126. Each such entry in the BAM file is organized into chunks within the data section 340 of the file. The index 324 in the synopsis 308 helps to find the correct chunk within the data section 340 to read. Referring to
where O is always O, X indicates the read quality is below a threshold value (e.g., below 60), L indicates the read is from parental haplotype A, R indicates the read is from parental haplotype B, I is a numerical identifier corresponding to the barcode in the read, E is the ‘end’ length of the read, and S is the ‘start’ position of this read, relative to the start of the chunk 1050. More generally, referring to
In some embodiments, the synopsis 308 further comprises a structural variant dataset track 330. In some embodiments, the structural variants dataset track 330 comprises a listing of the called structural variants in the sample represented by the dataset 126. More details of the architecture of an exemplary structural variant dataset track 330 are found in
The dictionary 802 of the structural variant dataset track 330 comprises a plurality of names 804, and for each name 804, an offset 806 into the track data 808 where records for the corresponding name 804 are found. In some embodiments, each name 804 in dictionary 802 is the name of a chromosome in the target genome.
In some embodiments, the track data 808 for structural variant dataset track 330 comprises a plurality of structural variant records 810. In some embodiments, the track data 808 is in JSON format. In some embodiments, each structural variant record 810 represents a structural variant call made for the target nucleic acid of the single organism represented by the dataset 126. As such, in some embodiments, each structural variant record 810 specifies a chromosome number 812, a start position 814 represented by the structural variation, a stop position 816 represented by the structural variation on the chromosome 812, a unique name 818 for the structural variation, and information 820 about the structural variation. In some embodiments, the structural variant dataset track 330 includes information analogous to, corresponding to, or in a BEDPE format, which advantageously provides a concise description of disjoint genome features, such as structural variations or paired-end sequence alignments. Accordingly, in some embodiments, the information section 820 in each structural variant record 810 includes a chromosome 1 name 822, which is the name of the chromosome on which the first end of the feature exists. In some embodiments chromosome 1 name 822 is in string format, for example, “chr1”, “III”, “myChrom”, or “contig1112.23.”
In some embodiments, the information section 820 in each record 810 further comprises a start 1 position 830, which is a zero-based starting position of the first end of the feature on chromosome 1 name 822.
In some embodiments, the information section 820 in each record 810 further comprises stop 1 (end 1) position 826, which is the one-based ending position of the first end of the feature (e.g., structural variation) represented by record 810 on chromosome 1 name 822.
In some embodiments, the information section 820 in each record 810 further comprises chromosome 2 name 836, which is the name of the chromosome on which the second end of the feature represented by record 810 exists. In some embodiments chromosome 2 name 836 is in string format, for example, “chr1”, “III”, “myChrom”, or “contig1112.23.”
In some embodiments, the information section 820 in each record 810 further comprises a start 2 position 828, which is the zero-based starting position of the second end of the feature represented by record 810 on chromosome 2 name 836.
In some embodiments, the information section 820 in each record 810 further comprises a stop 2 (end 2) position 824, which is the one-based ending position of the second end of the feature (e.g., structural variation) represented by record 810 on chromosome 2 name 836.
In some embodiments, the information section 820 in each record 810 further comprises a name of the structural variant field 834, which is the name of the feature (e.g., structural variation) represented by record 810. In some embodiments, the name of the structural variant 834 is in string format, for example, “LINE”, “Exon3”, “HWIEAS_0001:3:1:0:266#0/1”, or “my_Feature”.
In some embodiments, the information section 820 in each record 810 further comprises a quality (score) field 832, which is any metric that scores the quality of the feature (e.g., structural variation) represented by record 810. In some embodiments, quality 832 is in string format thereby permitting the expression of quality of the feature in any scientific metric, e.g., p-values, mean enrichment values, etc.
In some embodiments, the information section 820 in each record 810 further comprises further information 838 on the feature represented by the record 810, such as edit distance for each end of an alignment, or a type such as “deletion”, “inversion”, etc.
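The fields enumerated above follow the column order of a BEDPE line (chromosome 1, start 1, end 1, chromosome 2, start 2, end 2, name, score, then optional fields). The sketch below parses one tab-separated line into a record; the class name `StructuralVariant`, its attribute names, and the helper functions are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class StructuralVariant:
    chrom1: str
    start1: int   # zero-based start of the first end
    stop1: int    # one-based end of the first end
    chrom2: str
    start2: int   # zero-based start of the second end
    stop2: int    # one-based end of the second end
    name: str
    quality: str  # kept as a string so any metric can be expressed
    extra: tuple  # any further columns, e.g. "deletion"

def parse_bedpe_line(line):
    """Parse one tab-separated BEDPE-style line into a record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    return StructuralVariant(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], int(fields[4]), int(fields[5]),
        fields[6], fields[7], tuple(fields[8:]),
    )

def is_interchromosomal(sv):
    """True when the two ends of the feature lie on different chromosomes."""
    return sv.chrom1 != sv.chrom2

def end_length(start, stop):
    """Length of one end given a zero-based start and a one-based end."""
    return stop - start
```

With a zero-based start and a one-based end, the length of each end is simply the difference of the two coordinates, which is one practical reason for that convention.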
Continuing to refer to
As illustrated in
Referring to
Thus, as the above provides, the nucleic acid sequencing datasets 126 of the present disclosure provide a streamlined file format that combines several forms of data that are conventionally found in separate files along with data that is of only secondary value. Advantageously, the nucleic acid sequencing dataset 126 file format is self-contained and has all the data required to support the features of haplotype visualization tool 148.
In some embodiments, a user provides a search query of syntax Y1, Y2, . . . , YN, where each Yi in Y1, Y2, . . . , YN is either an alphanumeric identification of a selected gene, a selection of a chromosomal region, or selection of a region of a contig sequence. In some such embodiments, a first Yi in Y1, Y2, . . . , YN is an identity of a first chromosome or a first contig sequence having the syntax X1:N1-N2, where X1 is an identity of the first chromosome or the first contig sequence, N1 is a selected start position within the first chromosome or the first contig sequence, and N2 is a selected end position within the first chromosome or the first contig sequence, and a second Yi in Y1, Y2, . . . , YN is an alphanumeric identification of a selected gene. In some embodiments, the request is converted, without human intervention, to genomic coordinates by comparison of the request against one or more lookup tables that match alphanumeric entries of genes to genomic coordinates. In some embodiments, the request comprises one or more gene names, one or more genomic coordinates, or a combination thereof.
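The conversion of such a query to genomic coordinates can be sketched as follows. The sketch assumes the terms Y1, Y2, . . . , YN are separated by commas and that the gene lookup table is a dictionary mapping gene names to `(contig, start, end)` tuples; the function name `parse_query` and the table contents used with it are hypothetical.

```python
import re

# Matches the X1:N1-N2 syntax, e.g. "chr1:1000-2000".
REGION = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")

def parse_query(query, gene_table):
    """Convert a comma-separated query into (contig, start, end) tuples.

    `gene_table` is a hypothetical lookup table mapping gene names to
    genomic coordinates.
    """
    regions = []
    for term in query.split(","):
        term = term.strip()
        if not term:
            continue
        match = REGION.match(term)
        if match:
            contig = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            if start > end:
                raise ValueError("start exceeds end in %r" % term)
            regions.append((contig, start, end))
        elif term in gene_table:
            regions.append(gene_table[term])
        else:
            raise KeyError("unknown gene or malformed region: %r" % term)
    return regions
```

Terms that match the region syntax are used directly, and everything else is treated as a gene name and resolved through the table, so a single query can mix both forms.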
Advantageously, the haplotype visualization tool 148 can be invoked in a variety of different system topologies. For instance, referring to
In some embodiments, the persistent memory and the non-persistent memory, collectively referenced as memory 112 in
The data section 340 comprises a plurality of sequencing reads and is the largest component of the dataset 126. Each respective sequencing read in the plurality of sequencing reads comprises a first portion that corresponds to a subset of at least one target nucleic acid in the respective sample and a second portion that encodes a respective identifier for the respective sequencing read in a plurality of identifiers. Each respective identifier is independent of the sequence of the at least one target nucleic acid. The plurality of sequencing reads collectively includes the plurality of identifiers.
The persistent memory and the non-persistent memory further collectively store one or more programs that use the one or more microprocessors 102 to provide a haplotype visualization tool 148 to the client for installation on the remote client computer. In turn, a request, sent from the client over the network connection, is received for structural variation or phasing information using a first dataset 126 in the one or more datasets. Responsive to receiving the request, the request is automatically filtered by loading the header 302 and the synopsis 308 of the first dataset into the non-persistent memory, if not already loaded into the non-persistent memory, while retaining the data section 340 in persistent memory. In this way, the amount of non-persistent memory used is minimized. The request is compared to the synopsis 308 of the first dataset, thereby identifying one or more portions of the data section of the first dataset. In particular, the various components of the synopsis 308, as described in further detail below, are used to identify which portions of the data 340 are needed to fulfill the request. In some embodiments, the request identifies a particular dataset 126 and a region of a genome. In some embodiments, the request identifies a particular dataset 126 and one or more genes. In some embodiments, the request identifies a particular dataset 126 and one or more exons. Once the portions of the data section that are needed to fulfill the request are identified, they are loaded into non-persistent memory and the requested structural variation or phasing information is formatted for display on the client computer 3102 using the first dataset. This formatted structural variation or phasing information is then sent over the network connection 3106 to the client device for display on the client device. In some embodiments, as disclosed in
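The load-on-demand behavior described above can be sketched with a toy file layout. Everything about the layout here is an assumption made for illustration: an 8-byte length prefix, a JSON-encoded synopsis that maps region labels to `[offset, length]` pairs, and a raw data section. The actual format of the dataset 126 is not reproduced, and the names `write_dataset`, `Dataset`, and `fetch` are hypothetical.

```python
import json
import struct

def write_dataset(path, synopsis, data):
    """Write a toy file: 8-byte synopsis length, JSON synopsis, raw data."""
    blob = json.dumps(synopsis).encode("utf-8")
    with open(path, "wb") as handle:
        handle.write(struct.pack("<Q", len(blob)))
        handle.write(blob)
        handle.write(data)

class Dataset:
    """Holds only the synopsis in memory; the data section stays on disk."""

    def __init__(self, path):
        self.path = path
        with open(path, "rb") as handle:
            (size,) = struct.unpack("<Q", handle.read(8))
            self.synopsis = json.loads(handle.read(size).decode("utf-8"))
        self.data_start = 8 + size

    def fetch(self, region):
        """Read just the bytes the synopsis lists for `region`."""
        offset, length = self.synopsis["chunks"][region]
        with open(self.path, "rb") as handle:
            handle.seek(self.data_start + offset)
            return handle.read(length)
```

Opening the dataset reads only the small synopsis, and each request then seeks directly to the needed bytes, so memory use scales with the size of the request rather than with the size of the data section.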
Now that advantages of splitting up the nucleic acid sequence dataset 126 have been explained, graphical user interface features of the haplotype visualization tool 148, and its component modules (e.g., summarization module 150, phase visualization module 152, structural variations module 154, etc.) will be described in further detail. Turning to
In the haplotype view, phased portions of the selected region are enclosed in black rectangular boxes 1440. The entire region illustrated in
Vertical bars in the haplotype 1 (1402), haplotype 2 (1404), and middle area 1406 represent single nucleotide polymorphisms and small insertions and deletions. In some embodiments, these bars are color coded with a first color (e.g., grey) representing the reference genotype, and a second color (e.g., green) representing the alternative genotype.
A homozygous SNP will have a vertical bar spanning the two haplotype tracks and the middle area (unphased track) since homozygous variants cannot be phased. This is illustrated as element 2602 in
Phased heterozygous SNPs are placed on the haplotype tracks 1402/1404. This is illustrated as element 2604 in
Heterozygous SNPs that are not phased are placed in the middle area 1406 (unphased track), sandwiched between the haplotype tracks 1402/1404. This is illustrated as element 2606 in
Finally, if both phased single nucleotide polymorphisms are of alternative genotype, two vertical bars of the second color (e.g., green) will be displayed in the haplotype tracks 1402/1404, one for each track. This is illustrated as element 2608 in
Dark regions, such as region 2710 of
Referring to
In some embodiments the genome browser further provides a chromosome map 1424 and the location 1426 on the chromosome that is being displayed. Referring to
The disclosed genome browser further provides a graphic representation 1408 of each gene that is in the displayed genomic region. This genes track 1408 displays annotated reference genes. Multiple genes can be displayed using the search bar 1250 by entering the genes of interest. The direction of each gene is indicated with arrows. Although not illustrated in
The disclosed genome browser further provides a graphic representation 1410 of exons that are in the displayed genomic region.
The disclosed genome browser further provides a coverage track 1412 for the coverage in the displayed genomic region. Aligned sequence reads are shown on the coverage track. Each vertical bar in the coverage track 1412 shows the average coverage-per-base for the area of the genome under the bar. The height is scaled such that the maximum height corresponds to four times the median coverage. In some embodiments, when a user clicks on a portion of the coverage track 1412, the mean reads per base pair and the total number of reads are displayed in a coverage details pop-up black box for that portion of the coverage track.
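By way of illustration only, the height scaling described above for the coverage track can be sketched as follows. The function name `bar_heights`, the pixel height of the track, and the choice to clamp bars that exceed four times the median are assumptions made for this sketch and are not taken from the disclosure.

```python
from statistics import median


def bar_heights(coverage, track_height_px=40.0, cap_multiple=4.0):
    """Scale per-bar mean coverage so that the full track height
    corresponds to cap_multiple times the median coverage.

    coverage: list of average coverage-per-base values, one per bar.
    Bars above the cap are clamped to the full height (an assumption).
    """
    if not coverage:
        return []
    cap = cap_multiple * median(coverage)
    if cap <= 0:
        # No usable signal; draw every bar with zero height.
        return [0.0 for _ in coverage]
    return [min(c, cap) / cap * track_height_px for c in coverage]


if __name__ == "__main__":
    print(bar_heights([10, 10, 10, 100]))
```

Anchoring the scale to the median rather than the maximum keeps a single high-coverage outlier from flattening every other bar in the track.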
The disclosed genome browser further provides a breakpoints track 1414 in the displayed region. Structural variants including inter-chromosomal translocations, gene fusions, inversions and deletions are highlighted in the breakpoints track 1414. Structural variants are arbitrarily numbered in the display. Structural variant calls are indicated in a first color (e.g., orange) in the breakpoints track 1414 and structural variant candidates are indicated in a second color (e.g., grey) in the breakpoints track 1414. To display structural variant breakpoint pairs, a user can click on the structural variant displayed for the gene, as illustrated in
Advantageously, what is not shown in some embodiments of the display mode of the disclosed genome browser, illustrated in
Referring to
In some embodiments, the search bar 1250 of the disclosed genome browser provides intelligent autocomplete features. For instance, when a user starts typing a gene name in the search bar 1250, the genome browser autocompletes on the genes. In some embodiments, the genome browser accomplishes this by comparing partial search queries that the user enters against genomic information stored in the nucleic acid sequencing dataset, such as the names of genes in the gene track. Advantageously, in such embodiments the search bar 1250 autocompletes on gene names. For instance, referring to
As illustrated in
In particular, referring to
Again referring to
There are nine pairs of chunks between region (1) and region (2) which can be placed in a matrix such as the one set forth below in Table 1.
Computing the overlap between the two sets of barcodes in each cell yields the values set forth in Table 2.
Table 2 can be displayed by the structural variants module 154 as a heat map which efficiently shows areas of low and high barcode correlation to the user. In some embodiments, the structural variants module 154 provides additional information, such as gene and exon boundaries overlaid with the matrix to allow easy alignment of the data to known places of interest. In some embodiments, the structural variants module 154 also allows a textual copy of the matrix to be downloaded for analysis with other computer programs. In some embodiments, the user may adjust the region of the genome that is visualized in the structural variants module 154 by scrolling or zooming in real time. In some embodiments, the user can adjust the resolution (chunk size/window size) to avoid aliasing or overload when looking at very small or very large areas of the genome.
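By way of illustration only, the computation of a barcode overlap matrix between two regions can be sketched as follows. The representation of a read as a `(position, barcode)` tuple, the function names, and the use of the count of shared barcodes as the overlap measure are assumptions made for this sketch; the disclosure does not prescribe them.

```python
def barcodes_per_chunk(reads, start, end, chunk_size):
    """Collect the set of barcodes seen in each fixed-size chunk of
    the half-open region [start, end).

    reads: iterable of (position, barcode) tuples (hypothetical layout).
    """
    n_chunks = -(-(end - start) // chunk_size)  # ceiling division
    chunks = [set() for _ in range(n_chunks)]
    for position, barcode in reads:
        if start <= position < end:
            chunks[(position - start) // chunk_size].add(barcode)
    return chunks


def barcode_overlap_matrix(region1_chunks, region2_chunks):
    """Cell (i, j) holds the number of barcodes shared by chunk i of
    the first region and chunk j of the second region."""
    return [[len(a & b) for b in region2_chunks] for a in region1_chunks]


if __name__ == "__main__":
    reads = [(5, "A"), (15, "B"), (25, "C"),
             (105, "A"), (115, "A"), (125, "C"), (125, "B")]
    region1 = barcodes_per_chunk(reads, 0, 30, 10)
    region2 = barcodes_per_chunk(reads, 100, 130, 10)
    for row in barcode_overlap_matrix(region1, region2):
        print(row)
```

With three chunks per region, the sketch produces the nine cells discussed above. The `chunk_size` argument plays the role of the user-adjustable resolution.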
Some embodiments of the present disclosure provide a system 100 for viewing nucleic acid sequencing data (e.g., information obtained from nucleic acid sequencing datasets 126). The system 100 comprises one or more microprocessors 102 and a memory 112. The memory stores a nucleic acid sequence dataset 126 corresponding to at least one target nucleic acid in a sample. The memory further stores one or more programs (e.g., the haplotype visualization tool 148) that use the one or more microprocessors to obtain the nucleic acid sequencing dataset that comprises a plurality of sequencing reads from a sample. Then, a request is obtained from a user (e.g., through search bar 1250 of the haplotype visualization tool 148 illustrated in
Referring to
Referring to affordance 3304, each respective sequence read 1048 is mapped to a location on a reference genome with a confidence value that represents a probability that the respective sequence read was correctly mapped. The default is to show data for sequence reads only when this confidence value satisfies a stringent (high) threshold value so that misleading information is not displayed. But sometimes a user still wants to see information for sequence reads that do not satisfy the stringent threshold confidence value. For instance, when too much data is filtered out based on the confidence threshold, unusual artifacts may appear in the heat map, such as regions of the heat map that appear to have no data. In reality, such regions may simply be regions where the confidence in the localization of sequence reads 1048 is low (e.g., regions of the genome that exhibit extensive repeats). To determine whether there is actually no data (perhaps indicating an extensive structural variation), affordance 3304 allows the user to remove (or lower) the stringent threshold value and to permit the display of data from sequence reads 1048 that have been mapped to the reference genome with lower confidence values. In this way, the user can determine whether there is in fact a structural variation at sites that were missing data when the stringent threshold value was turned on, or whether the genomic region simply represents a region where the confidence values for the sequence reads are low.
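By way of illustration only, the effect of such a threshold toggle can be sketched as follows. The `MappedRead` record, the `visible_reads` function, and the default threshold of 0.999 are hypothetical and are chosen solely for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedRead:
    position: int          # mapped location on the reference genome
    barcode: str
    map_confidence: float  # probability that the read was mapped correctly


def visible_reads(reads, threshold=0.999, apply_threshold=True):
    """Return the reads whose data would be displayed.

    With apply_threshold=True only reads meeting the stringent
    threshold are kept; with apply_threshold=False every read is kept,
    mimicking a user switching the filter off.
    """
    if not apply_threshold:
        return list(reads)
    return [r for r in reads if r.map_confidence >= threshold]


if __name__ == "__main__":
    reads = [
        MappedRead(1_000, "A", 0.9999),
        MappedRead(2_000, "B", 0.60),   # e.g., a read in a repetitive region
        MappedRead(3_000, "C", 0.9995),
    ]
    print(len(visible_reads(reads)))                         # strict view
    print(len(visible_reads(reads, apply_threshold=False)))  # relaxed view
```

Comparing the strict and relaxed views of the same region indicates whether an apparently empty area reflects missing data or merely reads that were mapped with low confidence.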
In a typical use case scenario associated with affordance 3304, sequence reads 1048 that do not satisfy a quality threshold are discarded and so are not used in downstream phasing algorithms and structural variation algorithms. A consequence of discarding such sequence reads is that doing so can introduce what looks like structure in the heat map plot illustrated in
Referring to
Referring to
In some embodiments, the different views offered (e.g., haplotype/phase 152, structural variants 154, and reads 156) by the haplotype visualization tool 148 are all linked. For instance, a user may navigate from one view to another to see the same data using an alternate visualization without reentering information using affordances 1252, 1254, and 1256. For instance, the user may toggle between the matrix view of the structural variants module 154 and the haplotype view of the phase visualization module 152.
A “smart” search affordance 1250 is employed in the various views. Referring to
In some embodiments, system 100 stores genomic data to be displayed in a custom file format (e.g., the format of nucleic acid sequencing dataset 126). The file is generated by a “preprocessor” which takes reference data, the VCF file, the BAM file, and the structural variant file as inputs and produces a single output nucleic acid sequencing dataset 126. The nucleic acid sequencing dataset 126 contains all of the information that is required to display a given dataset. The file is organized into several sections, including a small synopsis section 308 that is roughly 25 MB and a much larger data section 340 (100 MB to 20 GB). These sections are further subdivided as described above. When the nucleic acid sequencing dataset 126 is loaded, only the index (synopsis) section is loaded into memory. System 100 uses that data to find appropriate ranges of the data section to load into memory on demand. Variant calls and read information are stored in the data section; the rest of the data that loupe needs is small enough to store in the index section.
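By way of illustration only, on-demand loading of the data section can be sketched as follows. The class `LazyDataset`, the layout of the index as `(first_position, byte_offset, byte_length)` tuples, and the in-memory cache are assumptions made for this sketch and do not describe the actual file format.

```python
import bisect
import io


class LazyDataset:
    """Keeps a small chunk index in memory and reads chunks of the
    data section from a seekable binary stream only when requested."""

    def __init__(self, stream, chunk_index):
        # chunk_index: (first_position, byte_offset, byte_length) per
        # chunk; a hypothetical stand-in for the synopsis section.
        self._stream = stream
        self._index = sorted(chunk_index)
        self._starts = [entry[0] for entry in self._index]
        self._cache = {}

    def chunks_for_range(self, start, end):
        """Return the raw bytes of every chunk that may hold records
        in the half-open genomic range [start, end)."""
        first = max(bisect.bisect_right(self._starts, start) - 1, 0)
        last = bisect.bisect_left(self._starts, end)
        result = []
        for i in range(first, last):
            if i not in self._cache:
                _, offset, length = self._index[i]
                self._stream.seek(offset)
                self._cache[i] = self._stream.read(length)
            result.append(self._cache[i])
        return result


if __name__ == "__main__":
    data_section = io.BytesIO(b"aaaabbbbcccc")
    index = [(0, 0, 4), (1000, 4, 4), (2000, 8, 4)]
    dataset = LazyDataset(data_section, index)
    print(dataset.chunks_for_range(1200, 1500))
```

Because the chunks are ordered by genomic position, a binary search over the in-memory index suffices to locate the byte ranges to read, and the bulk of the file never needs to be resident in memory.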
The data section is organized into chunks which are about 250 KB in some embodiments. When system 100 requires information stored in the data section, it consults the relevant index in the synopsis section (e.g., gene track, exon track, etc.) to find the chunk that should have the data and loads the entire chunk into memory. In some embodiments, the chunks for variant data are JSON-encoded structures containing the variant data as well as the supporting barcode information. In some embodiments, the chunks for read data have an array of small (8-byte) data structures in which each structure contains the position, length, and barcode of a single read. In some embodiments, both variant and read data are sorted by genomic position so that, in general, system 100 will make only a small number of on-disk reads to acquire all of the data it needs to display a given subset of the data. In some embodiments, the rest of the data that system 100 needs for visualization (such as the location of genes, structural variant breakpoints, etc.) is stored in the index (synopsis) section of the nucleic acid sequencing dataset 126 file as an “itree”. An itree is an implementation of an interval tree. It is a reusable data structure (usually encoded in JSON) for annotating ranges of the genome. Thus exons, genes, phase blocks, and structural variant breakpoints are all encoded with the same mechanism even though they are displayed differently.
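By way of illustration only, a small interval tree for annotating ranges of the genome can be sketched as follows. The class name `ITree`, the `(start, end, label)` tuple layout, and the half-open interval convention are assumptions made for this sketch; the sketch is not the itree implementation of the disclosure and omits the JSON encoding.

```python
class ITree:
    """Minimal centered interval tree over half-open intervals
    [start, end), each carrying a label. Requires start < end."""

    def __init__(self, intervals):
        intervals = list(intervals)
        self.center = None
        self.here = []
        self.left = None
        self.right = None
        if not intervals:
            return
        # Using an interval's own start as the center guarantees that
        # at least one interval is stored at this node.
        starts = sorted(start for start, _, _ in intervals)
        self.center = starts[len(starts) // 2]
        self.here = [iv for iv in intervals if iv[0] <= self.center < iv[1]]
        left = [iv for iv in intervals if iv[1] <= self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left = ITree(left) if left else None
        self.right = ITree(right) if right else None

    def overlapping(self, query_start, query_end):
        """Return all intervals that overlap [query_start, query_end)."""
        if self.center is None:
            return []
        found = [iv for iv in self.here
                 if iv[0] < query_end and iv[1] > query_start]
        if self.left is not None and query_start < self.center:
            found += self.left.overlapping(query_start, query_end)
        if self.right is not None and query_end > self.center:
            found += self.right.overlapping(query_start, query_end)
        return found


if __name__ == "__main__":
    annotations = ITree([
        (100, 200, "geneA"), (150, 400, "geneB"),
        (500, 600, "exon1"), (50, 60, "phase_block_1"),
    ])
    print(sorted(label for _, _, label in annotations.overlapping(180, 520)))
```

The same structure can hold exons, genes, phase blocks, or breakpoints, since only the label attached to each interval differs.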
Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the implementation(s). In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the implementation(s).
It will also be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first object could be termed a second object, and, similarly, a second object could be termed a first object, without changing the meaning of the description, so long as all occurrences of the “first object” are renamed consistently and all occurrences of the “second object” are renamed consistently. The first object and the second object are both objects, but they are not the same object.
The terminology used herein is for the purpose of describing particular implementations only and is not intended to be limiting of the claims. As used in the description of the implementations and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined (that a stated condition precedent is true)” or “if (a stated condition precedent is true)” or “when (a stated condition precedent is true)” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.
The foregoing description included example systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative implementations. For purposes of explanation, numerous specific details were set forth in order to provide an understanding of various implementations of the inventive subject matter. It will be evident, however, to those skilled in the art that implementations of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures and techniques have not been shown in detail.
The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles and their practical applications, to thereby enable others skilled in the art to best utilize the implementations and various implementations with various modifications as are suited to the particular use contemplated.
Number | Name | Date | Kind |
---|---|---|---|
5149625 | Church et al. | Sep 1992 | A |
5202231 | Drmanac et al. | Apr 1993 | A |
5413924 | Kosak et al. | May 1995 | A |
5436130 | Mathies et al. | Jul 1995 | A |
5512131 | Kumar et al. | Apr 1996 | A |
5587128 | Wilding et al. | Dec 1996 | A |
5605793 | Stemmer | Feb 1997 | A |
5618711 | Gelfand et al. | Apr 1997 | A |
5695940 | Drmanac et al. | Dec 1997 | A |
5736330 | Fulton | Apr 1998 | A |
5834197 | Parton | Nov 1998 | A |
5851769 | Gray et al. | Dec 1998 | A |
5856174 | Lipshutz et al. | Jan 1999 | A |
5958703 | Dower et al. | Sep 1999 | A |
5994056 | Higuchi | Nov 1999 | A |
6046003 | Mandecki | Apr 2000 | A |
6051377 | Mandecki | Apr 2000 | A |
6057107 | Fulton | May 2000 | A |
6103537 | Ullman et al. | Aug 2000 | A |
6143496 | Brown et al. | Nov 2000 | A |
6172218 | Brenner | Jan 2001 | B1 |
6297006 | Drmanac et al. | Oct 2001 | B1 |
6297017 | Thompson | Oct 2001 | B1 |
6327410 | Walt et al. | Dec 2001 | B1 |
6355198 | Kim et al. | Mar 2002 | B1 |
6361950 | Mandecki | Mar 2002 | B1 |
6372813 | Johnson et al. | Apr 2002 | B1 |
6406848 | Bridgham et al. | Jun 2002 | B1 |
6432360 | Church | Aug 2002 | B1 |
6485944 | Church et al. | Nov 2002 | B1 |
6511803 | Church et al. | Jan 2003 | B1 |
6524456 | Ramsey et al. | Feb 2003 | B1 |
6586176 | Trnovsky et al. | Jul 2003 | B1 |
6632606 | Ullman et al. | Oct 2003 | B1 |
6632655 | Mehta et al. | Oct 2003 | B1 |
6670133 | Knapp et al. | Dec 2003 | B2 |
6767731 | Hannah | Jul 2004 | B2 |
6800298 | Burdick et al. | Oct 2004 | B1 |
6806052 | Bridgham et al. | Oct 2004 | B2 |
6806058 | Jesperson et al. | Oct 2004 | B2 |
6859570 | Walt et al. | Feb 2005 | B2 |
6913935 | Thomas | Jul 2005 | B1 |
6929859 | Chandler et al. | Aug 2005 | B2 |
6969488 | Bridgham et al. | Nov 2005 | B2 |
6974669 | Mirkin et al. | Dec 2005 | B2 |
7041481 | Anderson et al. | May 2006 | B2 |
7115400 | Adessi et al. | Oct 2006 | B1 |
7129091 | Ismagilov et al. | Oct 2006 | B2 |
7268167 | Higuchi et al. | Sep 2007 | B2 |
7282370 | Bridgham et al. | Oct 2007 | B2 |
7323305 | Leamon et al. | Jan 2008 | B2 |
7425431 | Church et al. | Sep 2008 | B2 |
7536928 | Kazuno | May 2009 | B2 |
7604938 | Takahashi et al. | Oct 2009 | B2 |
7622280 | Holliger et al. | Nov 2009 | B2 |
7638276 | Griffiths et al. | Dec 2009 | B2 |
7645596 | Williams et al. | Jan 2010 | B2 |
7666664 | Sarofim et al. | Feb 2010 | B2 |
7708949 | Stone et al. | May 2010 | B2 |
7709197 | Drmanac | May 2010 | B2 |
7745178 | Dong | Jun 2010 | B2 |
7776927 | Chu et al. | Aug 2010 | B2 |
RE41780 | Anderson et al. | Sep 2010 | E |
7799553 | Mathies et al. | Sep 2010 | B2 |
7842457 | Berka et al. | Nov 2010 | B2 |
7901891 | Drmanac | Mar 2011 | B2 |
7910354 | Drmanac et al. | Mar 2011 | B2 |
7960104 | Drmanac et al. | Jun 2011 | B2 |
7968287 | Griffiths et al. | Jun 2011 | B2 |
7972778 | Brown et al. | Jul 2011 | B2 |
8003312 | Krutzik et al. | Aug 2011 | B2 |
8067159 | Brown et al. | Nov 2011 | B2 |
8133719 | Drmanac et al. | Mar 2012 | B2 |
8252539 | Quake et al. | Aug 2012 | B2 |
8268564 | Roth et al. | Sep 2012 | B2 |
8273573 | Ismagilov et al. | Sep 2012 | B2 |
8278071 | Brown et al. | Oct 2012 | B2 |
8304193 | Ismagilov et al. | Nov 2012 | B2 |
8329407 | Ismagilov et al. | Dec 2012 | B2 |
8337778 | Stone et al. | Dec 2012 | B2 |
8592150 | Drmanac et al. | Nov 2013 | B2 |
8603749 | Gillevet | Dec 2013 | B2 |
8748094 | Weitz et al. | Jun 2014 | B2 |
8748102 | Berka et al. | Jun 2014 | B2 |
8765380 | Berka et al. | Jul 2014 | B2 |
8822148 | Ismagliov et al. | Sep 2014 | B2 |
8871444 | Griffiths et al. | Oct 2014 | B2 |
8889083 | Ismagilov et al. | Nov 2014 | B2 |
9012370 | Hong | Apr 2015 | B2 |
9017948 | Agresti et al. | Apr 2015 | B2 |
9029083 | Griffiths et al. | May 2015 | B2 |
9347059 | Saxonov | May 2016 | B2 |
9388465 | Hindson et al. | Jul 2016 | B2 |
9410201 | Hindson et al. | Aug 2016 | B2 |
9694361 | Bharadwaj | Jul 2017 | B2 |
9695468 | Hindson et al. | Jul 2017 | B2 |
9824068 | Wong et al. | Nov 2017 | B2 |
10119167 | Srinivasan et al. | Nov 2018 | B2 |
10221442 | Hindson et al. | Mar 2019 | B2 |
20010020588 | Adourian et al. | Sep 2001 | A1 |
20010044109 | Mandecki | Nov 2001 | A1 |
20020034737 | Drmanac | Mar 2002 | A1 |
20020051992 | Bridgham et al. | May 2002 | A1 |
20020089100 | Kawasaki | Jul 2002 | A1 |
20020092767 | Bjornson et al. | Jul 2002 | A1 |
20020179849 | Maher et al. | Dec 2002 | A1 |
20030008285 | Fischer | Jan 2003 | A1 |
20030008323 | Ravkin et al. | Jan 2003 | A1 |
20030027221 | Scott et al. | Feb 2003 | A1 |
20030028981 | Chandler et al. | Feb 2003 | A1 |
20030039978 | Hannah | Feb 2003 | A1 |
20030044777 | Beattie | Mar 2003 | A1 |
20030044836 | Levine et al. | Mar 2003 | A1 |
20030104466 | Knapp et al. | Jun 2003 | A1 |
20030108897 | Drmanac | Jun 2003 | A1 |
20030149307 | Hai et al. | Aug 2003 | A1 |
20030170698 | Gascoyne et al. | Sep 2003 | A1 |
20030182068 | Battersby et al. | Sep 2003 | A1 |
20030207260 | Trnovsky et al. | Nov 2003 | A1 |
20030215862 | Parce et al. | Nov 2003 | A1 |
20040063138 | McGinnis et al. | Apr 2004 | A1 |
20040132122 | Banerjee et al. | Jul 2004 | A1 |
20040258701 | Dominowski et al. | Dec 2004 | A1 |
20050019839 | Jespersen et al. | Jan 2005 | A1 |
20050042625 | Schmidt et al. | Feb 2005 | A1 |
20050079510 | Berka et al. | Apr 2005 | A1 |
20050130188 | Walt et al. | Jun 2005 | A1 |
20050172476 | Stone et al. | Aug 2005 | A1 |
20050181379 | Su et al. | Aug 2005 | A1 |
20050202429 | Trau et al. | Sep 2005 | A1 |
20050202489 | Cho et al. | Sep 2005 | A1 |
20050221339 | Griffiths et al. | Oct 2005 | A1 |
20050244850 | Huang et al. | Nov 2005 | A1 |
20050287572 | Mathies et al. | Dec 2005 | A1 |
20060020371 | Ham et al. | Jan 2006 | A1 |
20060073487 | Oliver et al. | Apr 2006 | A1 |
20060078888 | Griffiths et al. | Apr 2006 | A1 |
20060153924 | Griffiths et al. | Jul 2006 | A1 |
20060163385 | Link et al. | Jul 2006 | A1 |
20060199193 | Koo et al. | Sep 2006 | A1 |
20060240506 | Kushmaro et al. | Oct 2006 | A1 |
20060257893 | Takahashi et al. | Nov 2006 | A1 |
20060263888 | Fritz et al. | Nov 2006 | A1 |
20060292583 | Schneider et al. | Dec 2006 | A1 |
20070003442 | Link et al. | Jan 2007 | A1 |
20070020617 | Trnovsky et al. | Jan 2007 | A1 |
20070054119 | Garstecki et al. | Mar 2007 | A1 |
20070077572 | Tawfik et al. | Apr 2007 | A1 |
20070092914 | Griffiths et al. | Apr 2007 | A1 |
20070099208 | Drmanac et al. | May 2007 | A1 |
20070111241 | Cereb et al. | May 2007 | A1 |
20070154903 | Marla et al. | Jul 2007 | A1 |
20070172873 | Brenner et al. | Jul 2007 | A1 |
20070190543 | Livak | Aug 2007 | A1 |
20070195127 | Ahn et al. | Aug 2007 | A1 |
20070207060 | Zou et al. | Sep 2007 | A1 |
20070228588 | Noritomi et al. | Oct 2007 | A1 |
20070264320 | Lee et al. | Nov 2007 | A1 |
20080003142 | Link et al. | Jan 2008 | A1 |
20080004436 | Tawfik et al. | Jan 2008 | A1 |
20080014589 | Link et al. | Jan 2008 | A1 |
20080213766 | Brown et al. | Sep 2008 | A1 |
20080241820 | Krutzik et al. | Oct 2008 | A1 |
20080268431 | Choy et al. | Oct 2008 | A1 |
20090005252 | Drmanac et al. | Jan 2009 | A1 |
20090011943 | Drmanac et al. | Jan 2009 | A1 |
20090012187 | Chu et al. | Jan 2009 | A1 |
20090025277 | Takanashi | Jan 2009 | A1 |
20090035770 | Mathies et al. | Feb 2009 | A1 |
20090048124 | Leamon et al. | Feb 2009 | A1 |
20090053169 | Castillo et al. | Feb 2009 | A1 |
20090068170 | Weitz et al. | Mar 2009 | A1 |
20090098555 | Roth et al. | Apr 2009 | A1 |
20090118488 | Drmanac et al. | May 2009 | A1 |
20090137404 | Drmanac et al. | May 2009 | A1 |
20090137414 | Drmanac et al. | May 2009 | A1 |
20090143244 | Bridgham et al. | Jun 2009 | A1 |
20090155781 | Drmanac et al. | Jun 2009 | A1 |
20090197248 | Griffiths et al. | Aug 2009 | A1 |
20090197772 | Griffiths et al. | Aug 2009 | A1 |
20090202984 | Cantor | Aug 2009 | A1 |
20090203531 | Kurn | Aug 2009 | A1 |
20090264299 | Drmanac et al. | Oct 2009 | A1 |
20090286687 | Dressman et al. | Nov 2009 | A1 |
20100021973 | Makarov et al. | Jan 2010 | A1 |
20100021984 | Edd et al. | Jan 2010 | A1 |
20100022414 | Link et al. | Jan 2010 | A1 |
20100069263 | Shendure et al. | Mar 2010 | A1 |
20100105112 | Holtze et al. | Apr 2010 | A1 |
20100130369 | Shenderov et al. | May 2010 | A1 |
20100136544 | Agresti et al. | Jun 2010 | A1 |
20100137163 | Link et al. | Jun 2010 | A1 |
20100173394 | Colston et al. | Jul 2010 | A1 |
20100210479 | Griffiths et al. | Aug 2010 | A1 |
20100216153 | Lapidus et al. | Aug 2010 | A1 |
20110033854 | Drmanac et al. | Feb 2011 | A1 |
20110053798 | Hindson et al. | Mar 2011 | A1 |
20110071053 | Drmanac et al. | Mar 2011 | A1 |
20110086780 | Colston et al. | Apr 2011 | A1 |
20110092376 | Colston et al. | Apr 2011 | A1 |
20110092392 | Colston et al. | Apr 2011 | A1 |
20110160078 | Fodor et al. | Jun 2011 | A1 |
20110195496 | Muraguchi et al. | Aug 2011 | A1 |
20110201526 | Berka et al. | Aug 2011 | A1 |
20110217736 | Hindson | Sep 2011 | A1 |
20110218123 | Weitz et al. | Sep 2011 | A1 |
20110257889 | Klammer et al. | Oct 2011 | A1 |
20110263457 | Krutzik et al. | Oct 2011 | A1 |
20110267457 | Weitz et al. | Nov 2011 | A1 |
20110281738 | Drmanac et al. | Nov 2011 | A1 |
20110305761 | Shum et al. | Dec 2011 | A1 |
20110319281 | Drmanac | Dec 2011 | A1 |
20120000777 | Garrell et al. | Jan 2012 | A1 |
20120010098 | Griffiths et al. | Jan 2012 | A1 |
20120010107 | Griffiths et al. | Jan 2012 | A1 |
20120015382 | Weitz et al. | Jan 2012 | A1 |
20120015822 | Weitz et al. | Jan 2012 | A1 |
20120041727 | Mishra et al. | Feb 2012 | A1 |
20120071331 | Casbon et al. | Mar 2012 | A1 |
20120121481 | Romanowsky et al. | May 2012 | A1 |
20120132288 | Weitz et al. | May 2012 | A1 |
20120135893 | Drmanac et al. | May 2012 | A1 |
20120172259 | Rigatti et al. | Jul 2012 | A1 |
20120184449 | Hixson et al. | Jul 2012 | A1 |
20120190032 | Ness et al. | Jul 2012 | A1 |
20120196288 | Beer | Aug 2012 | A1 |
20120211084 | Weitz et al. | Aug 2012 | A1 |
20120220494 | Samuels et al. | Aug 2012 | A1 |
20120220497 | Jacobson et al. | Aug 2012 | A1 |
20120222748 | Weitz et al. | Sep 2012 | A1 |
20120230338 | Ganeshalingam et al. | Sep 2012 | A1 |
20120309002 | Link | Dec 2012 | A1 |
20120316074 | Saxonov | Dec 2012 | A1 |
20130028812 | Prieto et al. | Jan 2013 | A1 |
20130046030 | Rotem et al. | Feb 2013 | A1 |
20130078638 | Berka et al. | Mar 2013 | A1 |
20130079231 | Pushkarev et al. | Mar 2013 | A1 |
20130109575 | Kleinschmidt et al. | May 2013 | A1 |
20130130919 | Chen et al. | May 2013 | A1 |
20130157870 | Pushkarev et al. | Jun 2013 | A1 |
20130157899 | Adler et al. | Jun 2013 | A1 |
20130178368 | Griffiths et al. | Jul 2013 | A1 |
20130185096 | Giusti | Jul 2013 | A1 |
20130189700 | So et al. | Jul 2013 | A1 |
20130203605 | Shendure et al. | Aug 2013 | A1 |
20130210639 | Link et al. | Aug 2013 | A1 |
20130225418 | Watson | Aug 2013 | A1 |
20130268206 | Porreca et al. | Oct 2013 | A1 |
20130274117 | Church et al. | Oct 2013 | A1 |
20130311106 | White et al. | Nov 2013 | A1 |
20130317755 | Mishra et al. | Nov 2013 | A1 |
20140037514 | Stone et al. | Feb 2014 | A1 |
20140057799 | Johnson et al. | Feb 2014 | A1 |
20140065234 | Shum et al. | Mar 2014 | A1 |
20140155295 | Hindson et al. | Jun 2014 | A1 |
20140194323 | Gillevet | Jul 2014 | A1 |
20140199730 | Agresti et al. | Jul 2014 | A1 |
20140199731 | Agresti et al. | Jul 2014 | A1 |
20140200166 | Van Rooyen et al. | Jul 2014 | A1 |
20140206554 | Hindson et al. | Jul 2014 | A1 |
20140214334 | Plattner et al. | Jul 2014 | A1 |
20140227684 | Hindson et al. | Aug 2014 | A1 |
20140227706 | Kato et al. | Aug 2014 | A1 |
20140228255 | Hindson et al. | Aug 2014 | A1 |
20140235506 | Hindson et al. | Aug 2014 | A1 |
20140287963 | Hindson et al. | Sep 2014 | A1 |
20140302503 | Lowe et al. | Oct 2014 | A1 |
20140323316 | Drmanac et al. | Oct 2014 | A1 |
20140378322 | Hindson et al. | Dec 2014 | A1 |
20140378345 | Hindson et al. | Dec 2014 | A1 |
20140378349 | Hindson et al. | Dec 2014 | A1 |
20140378350 | Hindson et al. | Dec 2014 | A1 |
20150005199 | Hindson et al. | Jan 2015 | A1 |
20150005200 | Hindson et al. | Jan 2015 | A1 |
20150011430 | Saxonov | Jan 2015 | A1 |
20150011432 | Saxonov | Jan 2015 | A1 |
20150066385 | Schnall-Levin et al. | Mar 2015 | A1 |
20150111256 | Church et al. | Apr 2015 | A1 |
20150133344 | Shendure et al. | May 2015 | A1 |
20150218633 | Hindson et al. | Aug 2015 | A1 |
20150220532 | Wong | Aug 2015 | A1 |
20150224466 | Hindson et al. | Aug 2015 | A1 |
20150225777 | Hindson et al. | Aug 2015 | A1 |
20150225778 | Hindson et al. | Aug 2015 | A1 |
20150292988 | Bharadwaj et al. | Oct 2015 | A1 |
20150298091 | Weitz et al. | Oct 2015 | A1 |
20150299772 | Zhang | Oct 2015 | A1 |
20150376605 | Jarosz et al. | Dec 2015 | A1 |
20150376609 | Hindson et al. | Dec 2015 | A1 |
20150376700 | Schnall-Levin | Dec 2015 | A1 |
20150379196 | Schnall-Levin et al. | Dec 2015 | A1 |
20160232291 | Kyriazopoulou-Panagiotopoulou et al. | Aug 2016 | A1 |
20160304860 | Hindson et al. | Oct 2016 | A1 |
20160350478 | Chin et al. | Dec 2016 | A1 |
20170235876 | Jaffe et al. | Aug 2017 | A1 |
20180196781 | Wong | Jul 2018 | A1 |
20180265928 | Schnall-Levin et al. | Sep 2018 | A1 |
Number | Date | Country |
---|---|---|
0249007 | Dec 1987 | EP |
0637996 | Jul 1997 | EP |
1019496 | Sep 2004 | EP |
1482036 | Oct 2007 | EP |
1594980 | Nov 2009 | EP |
1967592 | Apr 2010 | EP |
2258846 | Dec 2010 | EP |
2145955 | Feb 2012 | EP |
1905828 | Aug 2012 | EP |
2136786 | Oct 2012 | EP |
1908832 | Dec 2012 | EP |
2540389 | Jan 2013 | EP |
2485850 | May 2012 | GB |
5949832 | Mar 1984 | JP |
2006-507921 | Mar 2006 | JP |
2006-289250 | Oct 2006 | JP |
2007-268350 | Oct 2007 | JP |
2009-208074 | Sep 2009 | JP |
2321638 | Apr 2008 | RU |
WO-1996029629 | Sep 1996 | WO |
WO-1996041011 | Dec 1996 | WO |
WO-1999009217 | Feb 1999 | WO |
WO-1999052708 | Oct 1999 | WO |
WO-2000008212 | Feb 2000 | WO |
WO-2000026412 | May 2000 | WO |
WO-2001014589 | Mar 2001 | WO |
WO-2001089787 | Nov 2001 | WO |
WO-2002031203 | Apr 2002 | WO |
WO-2002086148 | Oct 2002 | WO |
WO-2004002627 | Jan 2004 | WO |
WO-2004010106 | Jan 2004 | WO |
WO-2004069849 | Aug 2004 | WO |
WO-2004091763 | Oct 2004 | WO |
WO-2004102204 | Nov 2004 | WO |
WO-2004103565 | Dec 2004 | WO |
WO-2004105734 | Dec 2004 | WO |
WO-2005002730 | Jan 2005 | WO |
WO-2005021151 | Mar 2005 | WO |
WO-2005023331 | Mar 2005 | WO |
WO-2005040406 | May 2005 | WO |
WO-2005049787 | Jun 2005 | WO |
WO-2005082098 | Sep 2005 | WO |
WO-2006030993 | Mar 2006 | WO |
WO-2006078841 | Jul 2006 | WO |
WO-2006096571 | Sep 2006 | WO |
WO-2007001448 | Jan 2007 | WO |
WO-2007002490 | Jan 2007 | WO |
WO-2007024840 | Mar 2007 | WO |
WO-2007081385 | Jul 2007 | WO |
WO-2007081387 | Jul 2007 | WO |
WO-2007089541 | Aug 2007 | WO |
WO-2007114794 | Oct 2007 | WO |
WO-2007121489 | Oct 2007 | WO |
WO-2007133710 | Nov 2007 | WO |
WO-2007138178 | Dec 2007 | WO |
WO-2007139766 | Dec 2007 | WO |
WO-2007140015 | Dec 2007 | WO |
WO-2007149432 | Dec 2007 | WO |
WO-2008021123 | Feb 2008 | WO |
WO-2008091792 | Jul 2008 | WO |
WO-2008102057 | Aug 2008 | WO |
WO-2008109176 | Sep 2008 | WO |
WO-2008121342 | Oct 2008 | WO |
WO-2008134153 | Nov 2008 | WO |
WO-2009005680 | Jan 2009 | WO |
WO-2009011808 | Jan 2009 | WO |
WO 2009023821 | Feb 2009 | WO |
WO-2009061372 | May 2009 | WO |
WO-2009085215 | Jul 2009 | WO |
WO-2010004018 | Jan 2010 | WO |
WO-2010033200 | Mar 2010 | WO |
WO-2010115154 | Oct 2010 | WO |
2010126614 | Nov 2010 | WO |
WO-2010127304 | Nov 2010 | WO |
WO-2010148039 | Dec 2010 | WO |
WO-2010151776 | Dec 2010 | WO |
WO-2011047870 | Apr 2011 | WO |
WO-2011056546 | May 2011 | WO |
WO-2011066476 | Jun 2011 | WO |
WO-2011074960 | Jun 2011 | WO |
WO-2012012037 | Jan 2012 | WO |
WO-2012048341 | Apr 2012 | WO |
WO 2012061832 | May 2012 | WO |
WO-2012055929 | May 2012 | WO |
WO 2012100216 | Jul 2012 | WO |
2012112804 | Aug 2012 | WO |
WO-2012106546 | Aug 2012 | WO |
WO-2012116331 | Aug 2012 | WO |
WO-2012083225 | Sep 2012 | WO |
WO 2012142531 | Oct 2012 | WO |
WO 2012142611 | Oct 2012 | WO |
WO-2012149042 | Nov 2012 | WO |
WO-2012166425 | Dec 2012 | WO |
WO-2013035114 | Mar 2013 | WO |
2013055955 | Apr 2013 | WO |
WO-2013123125 | Aug 2013 | WO |
WO-2013177220 | Nov 2013 | WO |
WO-2014028537 | Feb 2014 | WO |
WO 2014093676 | Jun 2014 | WO |
WO 2015157567 | Oct 2015 | WO |
WO 2015200891 | Dec 2015 | WO |
WO-2016130578 | Aug 2016 | WO |
Entry |
---|
Zheng, X., SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants, Department of Biostatistics, University of Washington-Seattle, Dec. 28, 2014. |
Ekblom, R. et al. “A field guide to whole-genome sequencing, assembly and annotation” Evolutionary Apps (Jun. 24, 2014) 7(9):1026-1042. |
Jarosz, M. et al. “Using 1ng of DNA to detect haplotype phasing and gene fusions from whole exome sequencing of cancer cell lines” Cancer Res (2015) 75(supp15):4742. |
Lo, et al. On the design of clone-based haplotyping. Genome Biol. 2013;14(9):R100. |
McCoy, R. et al. “Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements” PLOS (2014) 9(9):e1016689. |
Ritz, A. et al. “Characterization of structural variants with single molecule and hybrid sequencing approaches” Bioinformatics (2014) 30(24):3458-3466. |
Voskoboynik, A. et al. “The genome sequence of the colonial chordate, Botryllus schlosseri.” eLife Jul. 2, 2013, 2: e00569. |
Zerbino, D.R. “Using the Velvet de novo assembler for short-read sequencing technologies” Curr Protoc Bioinformatics (Sep. 1, 2010) 31:11.5:11.5.1-11.5.12. |
Zheng, X.Y. et al. “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotech (Feb. 1, 2016) 34(3):303-311 and Supplemental Material. |
Extended European Search Report for EP Application No. 16737834.8, dated Jul. 27, 2018, 10 pages. |
U.S. Appl. No. 15/019,928, filed Feb. 9, 2016, 10X Genomics, Inc. |
International Search Report for International Patent Application No. PCT/US2016/013290, dated May 19, 2016, 11 pages. |
International Search Report for International Patent Application No. PCT/US2016/017196, dated May 29, 2016, 14 pages. |
Bansal et al., “HapCUT: an efficient and accurate algorithm for the haplotype assembly problem,” Bioinformatics, vol. 24, 2008, pp. i153-i159. |
Bansal et al., 2008, “An MCMC algorithm for haplotype assembly from whole-genome sequence data,” Genome Res, 18:1336-1346. |
Bentley et al., 2008, “Accurate whole human genome sequencing using reversible terminator chemistry,” Nature 456:53-59. |
Bray, “The JavaScript Object Notation (JSON) Data Interchange Format,” Mar. 2014, retrieved from the Internet Feb. 15, 2015; https://tools.ietf.org/html/rfc7159. |
Browning et al., “Haplotype phasing: Existing methods and new developments,” Nat Rev Genet., 12(10), Apr. 1, 2012, pp. 703-714. |
Chen et al., 2009, “BreakDancer: an algorithm for high-resolution mapping of genomic structural variation,” Nature Methods 6(9), pp. 677-681. |
Choi et al., 2008, “Identification of novel isoforms of the EML4-ALK transforming gene in non-small cell lung cancer,” Cancer Res, 68:4971-4976. |
Cleary et al., 2014, “Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data,” J Comput Biol, 21:405-419. |
Eid et al., “Real-time sequencing from single polymerase molecules,” Science 323:133-138. |
Gordon et al., 1998, “Consed: A Graphical Tool for Sequence Finishing,” Genome Research 8:198-202. |
Heng and Durbin, 2010, “Fast and accurate long-read alignment with Burrows-Wheeler transform,” Bioinformatics, 25(14): 1754-1760. |
Huang and Marth, 2008, “EagleView: A genome assembly viewer for next-generation sequencing technologies,” Genome Research 18:1538-1543. |
Kanehisa and Goto, 2000, “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research 28, 27-30. |
Kim et al., “HapEdit: an accuracy assessment viewer for haplotype assembly using massively parallel DNA-sequencing technologies,” Nucleic Acids Research, 2011, pp. 1-5. |
Kirkness et al., 2013, “Sequencing of isolated sperm cells for direct haplotyping of a human genome,” Genome Res, 23:826-832. |
Kitzman et al., 2011, “Haplotype-resolved genome sequencing of a Gujarati Indian individual.” Nat Biotechnol, 29:59-63. |
Layer et al., 2014, “LUMPY: A probabilistic framework for structural variant discovery,” Genome Biology 15(6):R84. |
Lippert et al., 2002, “Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem,” Brief. Bioinform. 3:23-31. |
Margulies et al., 2005, “Genome sequencing in microfabricated high-density picoliter reactors,” Nature 437:376-380. |
McKenna et al., “The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, 2010, pp. 1297-1303. |
Miller et al., “Assembly Algorithms for next-generation sequencing data,” Genomics, 95 (2010), pp. 315-327. |
Myllykangas et al., 2011, “Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing,” Nat Biotechnol, 29:1024-1027. |
Pushkarev et al., 2009, “Single-molecule sequencing of an individual human genome,” Nature Biotech 17:847-850. |
Shendure et al., 2005, “Accurate Multiplex Polony Sequencing of an Evolved bacterial Genome” Science 309:1728-1732. |
Tewhey et al., 2011, “The importance of phase information for human genomics,” Nat Rev Genet, 12:215-223. |
The SAM/BAM Format Specification Working Group, “Sequence Alignment/Map Format Specification,” Dec. 28, 2014. |
Wheeler et al., 2007, “Database resources of the National Center for Biotechnology Information,” Nucleic Acids Res. 35 (Database issue): D5-12. |
Zerbino et al., “Velvet: Algorithms for de novo short read assembly using de Bruijn graphs,” Genome Research 18, 2008, pp. 821-829. |
Zerbino, Daniel, “Velvet Manual—version 1.1,” Aug. 15, 2008, pp. 1-22. |
“SSH Tunnel—Local and Remote Port Forwarding Explained With Examples,” Trackets Blog, http://blog.trackets.com/2014/05/17/ssh-tunnel-local-and-remote-port-forwarding-explained-with-examples.html; Retrieved from the Internet Jul. 7, 2016. |
“Bedtools: General Usage,” http://bedtools.readthedocs.io/en/latest/content/general-usage.html; Retrieved from the Internet Jul. 8, 2016. |
Co-pending U.S. Appl. No. 15/242,256, filed Aug. 19, 2016. |
Margulies 2005 Supplementary methods (Year: 2005). |
Abate et al., Valve-based flow focusing for drop formation. Appl Phys Lett. 2009;94. 3 pages. |
Abate, A.R. et al. “Beating Poisson encapsulation statistics using close-packed ordering” Lab on a Chip (Sep. 21, 2009) 9(18):2628-2631. |
Abate, et al. High-throughput injection with microfluidics using picoinjectors. Proc Natl Acad Sci U S A. Nov. 9, 2010;107(45):19163-6. doi: 10.1073-pnas.1006888107. Epub Oct. 20, 2010. |
Agresti, et al. Selection of ribozymes that catalyse multiple-turnover Diels-Alder cycloadditions by using in vitro compartmentalization. Proc Natl Acad Sci U S A. Nov. 8, 2005;102(45):16170-5. Epub Oct. 31, 2005. |
Aitman, et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature. Feb. 16, 2006;439(7078):851-5. |
Akselband, “Enrichment of slow-growing marine microorganisms from mixed cultures using gel microdrop (GMD) growth assay and fluorescence-activated cell sorting”, J. Exp. Marine Biol., 329: 196-205 (2006). |
Akselband, “Rapid mycobacteria drug susceptibility testing using gel microdrop (GMD) growth assay and flow cytometry”, J. Microbiol. Methods, 62:181-197 (2005). |
Anna et al., “Formation of dispersions using ‘flow focusing’ in microchannels”, Appln. Phys. Letts. 82:3 364 (2003). |
Attia, U.M et al., “Micro-injection moulding of polymer microfluidic devices” Microfluidics and nanofluidics (2009) 7(1):1-28. |
Balikova, et al. Autosomal-dominant microtia linked to five tandem copies of a copy-number-variable region at chromosome 4p16. Am J Hum Genet. Jan. 2008;82(1):181-7. doi: 10.1016-j.ajhg.2007.08.001. |
Baret et al. “Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity” Lab on a Chip (2009) 9(13):1850-1858. |
Boone, et al. Plastic advances microfluidic devices. The devices debuted in silicon and glass, but plastic fabrication may make them hugely successful in biotechnology application. Analytical Chemistry. Feb. 2002; 78A-86A. |
Braeckmans et al., Scanning the Code. Modern Drug Discovery. 2003:28-32. |
Bransky, et al. A microfluidic droplet generator based on a piezoelectric actuator. Lab Chip. Feb. 21, 2009;9(4):516-20. doi: 10.1039-b814810d. Epub Nov. 20, 2008. |
Brouzes, E et al., “Droplet microfluidic technology for single-cell high-throughput screening” PNAS (2009) 106(34):14195-14200. |
Cappuzzo, et al. Increased HER2 gene copy number is associated with response to gefitinib therapy in epidermal growth factor receptor-positive non-small-cell lung cancer patients. J Clin Oncol. Aug. 1, 2005;23(22):5007-18. |
Carroll, “The selection of high-producing cell lines using flow cytometry and cell sorting”, Exp. Op. Biol. Ther., 4:11 1821-1829 (2004). |
Chaudhary “A rapid method of cloning functional variable-region antibody genes in Escherichia coli as single-chain immunotoxins” Proc. Natl. Acad. Sci USA 87: 1066-1070 (Feb. 1990). |
Chechetkin et al., Sequencing by hybridization with the generic 6-mer oligonucleotide microarray: an advanced scheme for data processing. J Biomol Struct Dyn. Aug. 2000;18(1):83-101. |
Chen, F et al., “Chemical transfection of cells in picoliter aqueous droplets in fluorocarbon oil” Anal. Chem. (2011) 83:8816-8820. |
Chokkalingam, V et al., “Probing cellular heterogeneity in cytokine-secreting immune cells using droplet-based microfluidics” Lab Chip (2013) 13:4740-4744. |
Chou, H-P. et al. “Disposable Microdevices for DNA Analysis and Cell Sorting” Proc. Solid-State Sensor and Actuator Workshop, Hilton Head, SC Jun. 8-11, 1998, pp. 11-14. |
Chu, L-Y. et al., “Controllable monodisperse multiple emulsions” Angew. Chem. Int. Ed. (2007) 46:8970-8974. |
Clausell-Tormos et al., “Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms”, Chem. Biol. 15:427-437 (2008). |
Cook, et al. Copy-number variations associated with neuropsychiatric conditions. Nature. Oct. 16, 2008;455(7215):919-23. doi: 10.1038-nature07458. |
De Bruin et al., UBS Investment Research. Q-Series: DNA Sequencing. UBS Securities LLC. Jul. 12, 2007. 15 pages. |
Demirci, et al. “Single cell epitaxy by acoustic picolitre droplets” Lab Chip. Sep. 2007;7(9):1139-45. Epub Jul. 10, 2007. |
Doerr, “The smallest bioreactor”, Nature Methods, 2:5 326 (2005). |
Dowding, et al. “Oil core-polymer shell microcapsules by internal phase separation from emulsion droplets. II: controlling the release profile of active molecules” Langmuir. Jun. 7, 2005;21(12):5278-84. |
Draper, M.C. et al., “Compartmentalization of electrophoretically separated analytes in a multiphase microfluidic platform” Anal. Chem. (2012) 84:5801-5808. |
Dressler, O.J. et al., “Droplet-based microfluidics enabling impact on drug discovery” J. Biomol. Screen (2014) 19(4):483-496. |
Drmanac et al., Sequencing by hybridization (SBH): advantages, achievements, and opportunities. Adv Biochem Eng Biotechnol. 2002;77 :75-101. |
Droplet Based Sequencing (slides) dated (Mar. 12, 2008). |
Eastburn, D.J. et al., “Ultrahigh-throughput mammalian single-cell reverse-transcriptase polymerase chain reaction in microfluidic droplets” Anal. Chem. (2013) 85:8016-8021. |
Esser-Kahn, et al. Triggered release from polymer capsules. Macromolecules. 2011; 44:5539-5553. |
Fabi, et al. Correlation of efficacy between EGFR gene copy number and lapatinib-capecitabine therapy in HER2-positive metastatic breast cancer. J. Clin. Oncol. 2010; 28:15S. 2010 ASCO Meeting abstract Jun. 14, 2010:1059. |
Fisher, S. et al. “A Scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries” Genome Biology (2011) 2:R1-R15. doi: 10.1186-gb-2011-12-1-r1. Epub Jan. 4, 2011. |
Fredrickson, C.K. et al., “Macro-to-micro interfaces for microfluidic devices” Lab Chip (2004) 4:526-533. |
Freiberg, et al. “Polymer microspheres for controlled drug release” Int J Pharm. Sep. 10, 2004;282(1-2):1-18. |
Fu. A.Y. et al. “A microfabricated fluorescence-activated cell sorter” Nature Biotech (Nov. 1999) 17:1109-1111. |
Fulton et al., “Advanced multiplexed analysis with the FlowMetrix system” Clin Chem. Sep. 1997;43(9): 1749-56. |
Garstecki, P. et al. “Formation of monodisperse bubbles in a microfluidic flow-focusing device” Appl. Phys. Lett (2004) 85(13):2649-2651. DOI: 10.1063-1.1796526. |
Gartner, et al. The Microfluidic Toolbox: examples for fluidic interfaces and standardization concepts. Proc. SPIE 4982, Microfluidics, BioMEMS, and Medical Microsystems, (Jan. 17, 2003); doi: 10.1117-12.479566. |
Ghadessy, et al. Directed evolution of polymerase function by compartmentalized self-replication. Proc Natl Acad Sci U S A. Apr. 10, 2001;98(8):4552-7. Epub Mar. 27, 2001. |
Gonzalez, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1-AIDS susceptibility. Science. Mar. 4, 2005;307(5714):1434-40. Epub Jan. 6, 2005. |
Granieri, Lucia “Droplet-based microfluidics and engineering of tissue plasminogen activator for biomedical applications” Ph.D. Thesis, Nov. 13, 2009 (131 pages). |
Grasland-Mongrain, E. et al. “Droplet coalescence in microfluidic devices” Internet Citation, 2003, XP002436104, Retrieved from the Internet: URL:http:--www.eleves.ens.fr.home-grasland-rapports-stage4.pdf [retrieved on Jun. 4, 2007]. |
Guo, M.T. et al., “Droplet microfluidics for high-throughput biological assays” Lab Chip (2012) 12:2146-2155. |
Gyarmati et al., “Reversible Disulphide Formation in Polymer Networks: A Versatile Functional Group from Synthesis to Application,” European Polymer Journal, 2013, 49, 1268-1286. |
Hashimshony, T et al. “CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification” Cell Rep. Sep. 27, 2012;2(3):666-73. doi: 10.1016-j.celrep.2012.08.003. Epub Aug. 30, 2012. |
He “Selective Encapsulation of Single Cells and Subcellular Organelles into Picoliter- and Femtoliter-Volume Droplets” Anal. Chem. 77: 1539-1544 (2005). |
Holtze, C. et al. Biocompatible surfactants for water-in-fluorocarbon emulsions. Lab Chip. Oct. 2008;8(10):1632-9. doi: 10.1039-b806706f. Epub Sep. 2, 2008. |
Huebner, “Quantitative detection of protein expression in single cells using droplet microfluidics”, Chem. Commun. 1218-1220 (2007). |
Hug, H. et al. “Measurement of the number of molecules of a single mRNA species in a complex mRNA preparation” J Theor Biol. Apr. 21, 2003;221(4):615-24. |
Illumina, Inc. An Introduction to Next-Generation Sequencing Technology. Feb. 28, 2012. |
Jena et al., “Cyclic olefin copolymer based microfluidic devices for biochip applications: Ultraviolet surface grafting using 2-methacryloyloxyethyl phosphorylchloline” Biomicrofluidics (Mar. 15, 2012) 6:012822 (12 pages). |
Jung, W-C et al., “Micromachining of injection mold inserts for fluidic channel of polymeric biochips” Sensors (2007) 7:1643-1654. |
Khomiakov A et al., “Analysis of perfect and mismatched DNA duplexes by a generic hexanucleotide microchip”. Mol Biol (Mosk). Jul.-Aug. 2003;37(4):726-41. Russian. Abstract only. |
Kim, et al. Albumin loaded microsphere of amphiphilic poly(ethylene glycol)-poly(alpha-ester) multiblock copolymer. Eur J Pharm Sci. Nov. 2004;23(3):245-51. |
Kim, et al. Fabrication of monodisperse gel shells and functional microgels in microfluidic devices. Angew Chem Int Ed Engl. 2007;46(11):1819-22. |
Kim, J et al., “Rapid prototyping of microfluidic systems using a PDMS-polymer tape composite” Lab Chip (2009) 9:1290-1293. |
Kitzman, et al. Noninvasive whole-genome sequencing of a human fetus. Sci Transl Med. Jun. 6, 2012;4(137):137ra76. doi: 10.1126-scitranslmed.3004323. |
Klein, et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. May 21, 2015; 161:1187-1201. |
Knight, et al. Subtle chromosomal rearrangements in children with unexplained mental retardation. Lancet. Nov. 13, 1999;354(9191):1676-81. |
Koster et al., “Drop-based microfluidic devices for encapsulation of single cells”, Lab on a Chip The Royal Soc. of Chem. 8: 1110-1115 (2008). |
Kutyavin, et al. Oligonucleotides containing 2-aminoadenine and 2-thiothymine act as selectively binding complementary agents. Biochemistry. Aug. 27, 1996;35(34):11170-6. |
Lagus, T.P. et al., “A review of the theory, methods and recent applications of high-throughput single-cell droplet microfluidics” J. Phys. D: Appl. Phys. (2013) 46:114005 (21 pages). |
Li, Y., et al., “PEGylated PLGA nanoparticles as protein carriers: synthesis, preparation and biodistribution in rats,” Journal of Controlled Release, vol. 71, pp. 203-211 (2001). |
Liu, et al. Preparation of uniform-sized PLA microcapsules by combining Shirasu porous glass membrane emulsification technique and multiple emulsion-solvent evaporation method. J Control Release. Mar. 2, 2005;103(1):31-43. Epub Dec. 21, 2004. |
Liu, et al. Smart thermo-triggered squirting capsules for nanoparticle delivery. Soft Matter. 2010; 6(16):3759-3763. |
Loscertales, I.G., et al., “Micro-Nano Encapsulation via Electrified Coaxial Liquid Jets,” Science, vol. 295, pp. 1695-1698 (2002). |
Love, “A microengraving method for rapid selection of single cells producing antigen-specific antibodies”, Nature Biotech, 24(6):703-707 (Jun. 2006). |
Lowe, Adam J. “Norbornenes and [n]polynorbornanes as molecular scaffolds for anion recognition” Ph.D. Thesis (May 2010). (361 pages). |
Lupski. Genomic rearrangements and sporadic disease. Nat Genet. Jul. 2007;39(7 Suppl):S43-7. |
Macosko, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. May 21, 2015;161(5):1202-14. doi: 10.1016-j.cell.2015.05.002. |
Mair, D.A. et al., “Injection molded microfluidic chips featuring integrated interconnects” Lab Chip (2006) 6:1346-1354. |
Makino, K. et al. “Preparation of hydrogel microcapsules: Effects of preparation conditions upon membrane properties” Colloids and Surfaces: B Biointerfaces (1998) 12:97-104. |
Marcus. Gene method offers diagnostic hope. The Wall Street Journal. Jul. 11, 2012. |
Matochko, W.L. et al., “Uniform amplification of phage display libraries in monodisperse emulsions,” Methods (2012) 58:18-27. |
Mazutis, et al. Selective droplet coalescence using microfluidic systems. Lab Chip. Apr. 24, 2012;12(10):1800-6. doi: 10.1039-c2lc40121e. Epub Mar. 27, 2012. |
Merriman, et al. Progress in ion torrent semiconductor chip based sequencing. Electrophoresis. Dec. 2012;33(23):3397-3417. doi: 10.1002-elps.201200424. |
Microfluidic ChipShop. Microfluidic product catalogue. Mar. 2005. |
Microfluidic ChipShop. Microfluidic product catalogue. Oct. 2009. |
Mirzabekov, “DNA Sequencing by Hybridization—a Megasequencing Method and a Diagnostic Tool?” Trends in Biotechnology 12(1): 27-32 (1994). |
Moore, J.L. et al., “Behavior of capillary valves in centrifugal microfluidic devices prepared by three-dimensional printing” Microfluid Nanofluid (2011) 10:877-888. |
Mouritzen et al., Single nucleotide polymorphism genotyping using locked nucleic acid (LNA). Expert Rev Mol Diagn. Jan. 2003;3(1):27-38. |
Nagashima, S. et al. “Preparation of monodisperse poly(acrylamide-co-acrylic acid) hydrogel microspheres by a membrane emulsification technique and their size dependent surface properties” Colloids and Surfaces: B Biointerfaces (1998) 11:47-56. |
Navin, N. E. “The first five years of single-cell cancer genomics and beyond” Genome Res. (2015) 25:1499-1507. |
Nguyen, et al. In situ hybridization to chromosomes stabilized in gel microdrops. Cytometry. 1995; 21:111-119. |
Novak, R. et al., “Single cell multiplex gene detection and sequencing using microfluidically generated agarose emulsions” Angew. Chem. Int. Ed. Engl. (2011) 50(2):390-395. |
Oberholzer, et al. Polymerase chain reaction in liposomes. Chem Biol. Oct. 1995;2(10):677-82. |
Ogawa, et al. Production and characterization of O-W emulsions containing cationic droplets stabilized by lecithin-chitosan membranes. J Agric Food Chem. Apr. 23, 2003;51(9):2806-12. |
Okushima, “Controlled production of monodisperse double emulsions by two-step droplet breakup in microfluidic devices”, Langmuir, 20:9905-9908 (2004). |
Perez, C., et al., “Poly(lactic acid)-poly(ethylene glycol) nanoparticles as new carriers for the delivery of plasmid DNA,” Journal of Controlled Release, vol. 75, pp. 211-224 (2001). |
Peters et al., “Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells,” Nature, Jul. 12, 2012, vol. 487, pp. 190-195. |
Pinto, et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. Jul. 15, 2010;466(7304):368-72. doi: 10.1038-nature09146. Epub Jun. 9, 2010. |
Plunkett, et al. Chymotrypsin responsive hydrogel: application of a disulfide exchange protocol for the preparation of methacrylamide containing peptides. Biomacromolecules. Mar.-Apr. 2005;6(2):632-7. |
Ropers. New perspectives for the elucidation of genetic disorders. Am J Hum Genet. Aug. 2007;81(2):199-207. Epub Jun. 29, 2007. |
Rotem, A. et al. “Single Cell Chip-Seq Using Drop-Based Microfluidics” Abstract #50. Frontiers of Single Cell Analysis, Stanford University Sep. 5-7, 2013. |
Rotem, A. et al., “High-throughput single-cell labeling (Hi-SCL) for RNA-Seq using drop-based microfluidics” PLOS One (May 22, 2015) 0116328 (14 pages). |
Ryan, et al. Rapid assay for mycobacterial growth and antibiotic susceptibility using gel microdrop encapsulation. J Clin Microbiol. Jul. 1995;33(7):1720-6. |
Schirinzi et al., Combinatorial sequencing-by-hybridization: analysis of the NFI gene. Genet Test. 2006 Spring;10(1):8-17. |
Schmitt, “Bead-based multiplex genotyping of human papillomaviruses”, J. Clinical Microbiol., 44:2 504-512 (2006). |
Sebat, et al. Strong association of de novo copy number mutations with autism. Science. Apr. 20, 2007;316(5823):445-9. Epub Mar. 15, 2007. |
Seiffert, S. et al., “Smart microgel capsules from macromolecular precursors” J. Am. Chem. Soc. (2010) 132:6606-6609. |
Shah, “Fabrication of monodisperse thermosensitive microgels and gel capsules in microfluidic devices”, Soft Matter, 4:2303-2309 (2008). |
Shimkus et al. “A chemically cleavable biotinylated nucleotide: Usefulness in the recovery of protein-DNA complexes from avidin affinity columns” PNAS (1985) 82:2593-2597. |
Shlien, et al. Copy number variations and cancer. Genome Med. Jun. 16, 2009;1(6):62. doi: 10.1186-gm62. |
Shlien, et al. Excessive genomic DNA copy number variation in the Li-Fraumeni cancer predisposition syndrome. Proc Natl Acad Sci U S A. Aug. 12, 2008;105(32):11264-9. doi: 10.1073-pnas.0802970105. Epub Aug. 6, 2008. |
Simeonov et al., Single nucleotide polymorphism genotyping using short, fluorescently labeled locked nucleic acid (LNA) probes and fluorescence polarization detection. Nucleic Acids Res. Sep. 1, 2002;30(17):e91. |
Sorokin et al., Discrimination between perfect and mismatched duplexes with oligonucleotide gel microchips: role of thermodynamic and kinetic effects during hybridization. J Biomol Struct Dyn. Jun. 2005;22(6):725-34. |
Su, et al., Microfluidics-Based Biochips: Technology Issues, Implementation Platforms, and Design-Automation Challenges. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2006;25(2):211-23. (Feb. 2006). |
Sun et al., Progress in research and application of liquid-phase chip technology. Chinese Journal Experimental Surgery. May 2005;22(5):639-40. |
Tawfik, D.S. et al. “Man-made cell-like compartments for molecular evolution” Nature Biotech (Jul. 1998) 16:652-656. |
Tewhey, R. et al., “Microdroplet-based PCR enrichment for large-scale targeted sequencing” Nature Biotech. (2009) 27(11):1025-1031 and Online Methods (11 pages). |
Theberge, A.B, et al. Microdroplets in microfluidics: an evolving platform for discoveries in chemistry and biology. Angew Chem Int Ed Engl. Aug. 9, 2010;49(34):5846-68. doi: 10.1002-anie.200906653. |
Tonelli, C. et al., “Perfluoropolyether functional oligomers: unusual reactivity in organic chemistry” J. Fluorine Chem. (2002) 118:107-121. |
Tubeleviciute, et al. Compartmentalized self-replication (CSR) selection of Thermococcus litoralis Sh1B DNA polymerase for diminished uracil binding. Protein Eng Des Sel. Aug. 2010;23(8):589-97. doi: 10.1093-protein-gzq032. Epub May 31, 2010. |
Turner, et al. “Methods for genomic partitioning” Annu Rev Genomics Human Genet. (2009) 10:263-284. doi: 10.1146-annurev-genom-082908-150112. Review. |
Wagner, O et al., “Biocompatible fluorinated polyglycerols for droplet microfluidics as an alternative to PEG-based copolymer surfactants” Lab Chip DOI:10.1039-C5LC00823A. 2015. |
Wang et al., Single nucleotide polymorphism discrimination assisted by improved base stacking hybridization using oligonucleotide microarrays. Biotechniques. 2003;35:300-08. |
Wang, et al. A novel thermo-induced self-bursting microcapsule with magnetic-targeting property. Chemphyschem. Oct. 5, 2009;10(14):2405-9. |
Wang, et al. Digital karyotyping. Proc Natl Acad Sci U S A. Dec. 10, 2002;99(25):16156-61. Epub Dec. 2, 2002. |
Weaver, J.C. et al. “Rapid clonal growth measurements at the single-cell level: gel microdroplets and flow cytometry”, Biotechnology, 9:873-877 (1991). |
Whitesides, “Soft lithography in biology and biochemistry”, Annual Review of Biomedical Engineering, 3:335-373 (2001). |
Williams, R. et al. “Amplification of complex gene libraries by emulsion PCR” Nature Methods (Jul. 2006) 3(7):545-550. |
Woo, et al. G-C-modified oligodeoxynucleotides with selective complementarity: synthesis and hybridization properties. Nucleic Acids Res. Jul. 1, 1996;24(13):2470-5. |
Xia, “Soft lithography”, Annual Review of Material Science, 28: 153-184 (1998). |
Yamamoto, et al. Chemical modification of Ce(IV)-EDTA-base artificial restriction DNA cutter for versatile manipulation of double-stranded DNA. Nucleic Acids Research. 2007; 35(7):e53. |
Zhang, “Combinatorial marking of cells and organelles with reconstituted fluorescent proteins”, Cell, 119:137-144 (Oct. 1, 2004). |
Zhang, et al. Degradable disulfide core-cross-linked micelles as a drug delivery system prepared from vinyl functionalized nucleosides via the RAFT process. Biomacromolecules. Nov. 2008;9(11):3321-31. doi: 10.1021-bm800867n. Epub Oct. 9, 2008. |
Zhao, J., et al., “Preparation of hemoglobin-loaded nano-sized particles with porous structure as oxygen carriers,” Biomaterials, vol. 28, pp. 1414-1422 (2007). |
Zhu, S. et al., “Synthesis and self-assembly of highly incompatible polybutadiene-poly(hexafluoropropylene oxide) diblock copolymers” J. Polym. Sci. (2005) 43:3685-3694. |
Zimmermann et al., Microscale production of hybridomas by hypo-osmolar electrofusion. Human Antibodies Hybridomas. Jan. 1992;3(1): 14-8. |
Zong, C. et al. “Genome-wide detection of single-nucleotide and copy-number variations of a single human cell” Science. Dec. 21, 2012;338(6114):1622-6. doi: 10.1126-science.1229164. |
Number | Date | Country | |
---|---|---|---|
20160203196 A1 | Jul 2016 | US |
Number | Date | Country | |
---|---|---|---|
62120873 | Feb 2015 | US | |
62102926 | Jan 2015 | US |