Analysis methods

Information

  • Patent Grant
  • 10370710
  • Patent Number
    10,370,710
  • Date Filed
    Monday, November 20, 2017
  • Date Issued
    Tuesday, August 6, 2019
Abstract
The invention generally relates to methods for analyzing nucleic acids to identify novel mutations associated with diseases. In certain embodiments, methods of the invention involve obtaining nucleic acid from a subject having a disease, identifying at least one mutation in the nucleic acid, and comparing the mutation to a database of mutations known to be associated with the disease, wherein mutations that do not match to the database are identified as novel mutations.
Description
FIELD OF THE INVENTION

The invention generally relates to methods for analyzing nucleic acids to identify novel mutations associated with diseases.


BACKGROUND

All genetic diseases are associated with some form of genomic instability. Abnormalities can range from a discrete mutation in a single base in the DNA of a single gene to a gross chromosome abnormality involving the addition or subtraction of an entire chromosome or set of chromosomes. Being able to identify the genetic abnormalities associated with a particular disease provides a mechanism by which one can diagnose a subject as having the disease.


SUMMARY

The invention generally relates to methods for analyzing nucleic acids to identify novel mutations associated with diseases. Methods of the invention involve obtaining nucleic acid from a subject having a disease, identifying at least one mutation in the nucleic acid, and comparing the mutation to a database of mutations known to be associated with the disease, wherein mutations that do not match to the database are identified as novel mutations.


Numerous methods of identifying mutations in nucleic acids are known by those of skill in the art and any of those methods may be used with methods of the invention. In certain embodiments, identifying a mutation in a nucleic acid from a sample involves sequencing the nucleic acid, and comparing the sequence of the nucleic acid from the sample to a reference sequence. Any sequencing technique known in the art may be used, such as sequencing-by-synthesis and more particularly single molecule sequencing-by-synthesis. The reference sequence may be a consensus human sequence or a sequence from a non-diseased sample.


Certain aspects of the invention are especially amenable for implementation using a computer. Such systems generally include a central processing unit (CPU) and storage coupled to the CPU. The storage stores instructions that, when executed by the CPU, cause the CPU to execute the method steps described above and throughout the present application.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is an illustration of EIR for a simple homopolymeric sequence.



FIG. 2 is an illustration of the CFTR exon 10 5′ boundary (hg18).



FIG. 3 illustrates a system for performing methods of the invention.





DETAILED DESCRIPTION

The invention generally relates to methods for analyzing nucleic acids to identify novel mutations associated with diseases. Methods of the invention involve obtaining nucleic acid from a subject having a disease, identifying at least one mutation in the nucleic acid, and comparing the mutation to a database of mutations known to be associated with the disease, wherein mutations that do not match to the database are identified as novel mutations.


Obtaining a Tissue Sample and Extraction of nucleic acid


Methods of the invention involve obtaining a sample, e.g., tissue, blood, bone, that is suspected to be associated with a disease. Such samples may include tissue from brain, kidney, liver, pancreas, bone, skin, eye, muscle, intestine, ovary, prostate, vagina, cervix, uterus, esophagus, stomach, bone marrow, lymph node, and blood. Once the sample is obtained, nucleic acids are extracted.


Nucleic acids may be obtained by methods known in the art. Generally, nucleic acids can be extracted from a biological sample by a variety of techniques such as those described by Maniatis, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., pp. 280-281, (1982), the contents of which are incorporated by reference herein in their entirety.


It may be necessary to first prepare an extract of the cell and then perform further steps—i.e., differential precipitation, column chromatography, extraction with organic solvents and the like—in order to obtain a sufficiently pure preparation of nucleic acid. Extracts may be prepared using standard techniques in the art, for example, by chemical or mechanical lysis of the cell. Extracts then may be further treated, for example, by filtration and/or centrifugation and/or with chaotropic salts such as guanidinium isothiocyanate or urea or with organic solvents such as phenol and/or HCCl3 to denature any contaminating and potentially interfering proteins.


Capture of target sequences


Any method known in the art for capturing target sequences may be used with methods of the invention. In certain embodiments, an oligonucleotide-driven annealing reaction is performed between genomic DNA and target-specific probes to form open loop complexes, where the target sequence is flanked by the ends of each oligo. Then, polymerase and ligase enzymes are added to fill and seal the gap between the two oligonucleotide probe ends, forming a covalently-closed circular molecule that contains the target sequence. Finally, an exonuclease mix is added to degrade any non-circular DNA (un-reacted probe, genomic DNA). What remains is circular DNA containing the set of targets captured by the reaction. Further details are provided for example in the following U.S. Pat. Nos. 5,866,337; 7,790,388; 6,858,412; 7,993,880; 7,700,323; 6,558,928; 6,235,472; 7,320,860; 7,351,528; 7,074,564; 5,871,921; 7,510,829; 7,862,999; and 7,883,849, the content of each of which is incorporated by reference herein in its entirety.


Barcode Sequences


In certain embodiments, at least one barcode sequence is attached to or incorporated into a nucleic acid template prior to sequencing. Strategies for barcoding nucleic acid templates are described for example in Porreca et al. (U.S. patent application Ser. No. 13/081,660) and Umbarger et al. (U.S. patent application Ser. No. 13/081,660), the content of each of which is incorporated by reference herein in its entirety. In embodiments that use more than one barcode, the barcode sequences may be attached to the template such that a first barcode sequence is attached to a 5′ end of the template and a second barcode sequence is attached to a 3′ end of the template. The first and second barcode sequences may be the same, or they may be different. Barcode sequence may be incorporated into a contiguous region of a template that includes the target to be sequenced.


Exemplary methods for designing sets of barcode sequences and other methods for attaching barcode sequences are shown in U.S. Pat. Nos. 6,138,077; 6,352,828; 5,636,400; 6,172,214; 6,235,475; 7,393,665; 7,544,473; 5,846,719; 5,695,934; 5,604,097; 6,150,516; RE39,793; 7,537,897; 6,172,218; and 5,863,722, the content of each of which is incorporated by reference herein in its entirety.


The barcode sequence generally includes certain features that make the sequence useful in sequencing reactions. For example, the barcode sequences can be designed to have minimal or no homopolymer regions, i.e., 2 or more of the same base in a row such as AA or CCC, within the barcode sequence. The barcode sequences can also be designed so that they do not overlap the target region to be sequenced or contain a sequence that is identical to the target.


The first and second barcode sequences are designed such that each pair of sequences is correlated to a particular sample, allowing samples to be distinguished and validated. Methods of designing sets of barcode sequences are shown, for example, in Brenner et al. (U.S. Pat. No. 6,235,475), the contents of which are incorporated by reference herein in their entirety. In certain embodiments, the barcode sequences range from about 2 nucleotides to about 50 nucleotides, and preferably from about 4 to about 20 nucleotides. Since the barcode sequence is sequenced along with the template nucleic acid or may be sequenced in a separate read, the oligonucleotide should be of minimal length so as to permit the longest read from the template nucleic acid attached. Generally, the barcode sequences are spaced from the template nucleic acid molecule by at least one base.
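The design constraints described in the preceding paragraphs (no homopolymer runs, no sequence identical to the target, and a preferred length of about 4 to about 20 nucleotides) can be illustrated with a small screening function. This is only a sketch under stated assumptions: the function name `is_usable_barcode`, its parameters, and the strict "no run of 2 or more" rule are illustrative choices, not part of the described methods.

```python
import re

def is_usable_barcode(barcode, target, min_len=4, max_len=20):
    """Hypothetical screen applying three of the design checks described above."""
    barcode, target = barcode.upper(), target.upper()
    # Length check: the preferred range is about 4 to about 20 nucleotides.
    if not (min_len <= len(barcode) <= max_len):
        return False
    # Reject any homopolymer run, i.e. 2 or more identical bases in a row.
    if re.search(r"(.)\1", barcode):
        return False
    # Reject barcodes whose sequence also occurs inside the target.
    if barcode in target:
        return False
    return True
```

A real barcode set would normally also be screened for pairwise distance between barcodes so that sequencing errors do not convert one barcode into another; that check is omitted here.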


Methods of the invention involve attaching the barcode sequences to the template nucleic acids. Template nucleic acids are able to be fragmented or sheared to desired length, e.g. generally from 100 to 500 bases or longer, using a variety of mechanical, chemical and/or enzymatic methods. DNA may be randomly sheared via sonication, exposed to a DNase or one or more restriction enzymes, a transposase, or nicking enzyme. RNA may be fragmented by brief exposure to an RNase, heat plus magnesium, or by shearing. The RNA may be converted to cDNA before or after fragmentation.


Barcode sequence is integrated with template using methods known in the art. Barcode sequence is integrated with template using, for example, a ligase, a polymerase, Topo cloning (e.g., Invitrogen's topoisomerase vector cloning system using a topoisomerase enzyme), or chemical ligation or conjugation. The ligase may be any enzyme capable of ligating an oligonucleotide (RNA or DNA) to the template nucleic acid molecule. Suitable ligases include T4 DNA ligase and T4 RNA ligase (such ligases are available commercially, from New England Biolabs). Methods for using ligases are well known in the art. The polymerase may be any enzyme capable of adding nucleotides to the 3′ and the 5′ terminus of template nucleic acid molecules. Barcode sequence can be incorporated via a PCR reaction as part of the PCR primer.


The ligation may be blunt ended or via use of overhanging ends. In certain embodiments, following fragmentation, the ends of the fragments may be repaired, trimmed (e.g. using an exonuclease), or filled (e.g., using a polymerase and dNTPs), to form blunt ends. Upon generating blunt ends, the ends may be treated with a polymerase and dATP to form a template independent addition to the 3′-end and the 5′-end of the fragments, thus producing a single A overhanging. This single A is used to guide ligation of fragments with a single T overhanging from the 5′-end in a method referred to as T-A cloning.


Alternatively, because the possible combination of overhangs left by the restriction enzymes are known after a restriction digestion, the ends may be left as is, i.e., ragged ends. In certain embodiments double stranded oligonucleotides with complementary over hanging ends are used.


Sequencing


Sequencing may be by any method known in the art. DNA sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, sequencing by synthesis using reversibly terminated labeled nucleotides, pyrosequencing, 454 sequencing, allele specific hybridization to a library of labeled oligonucleotide probes, sequencing by synthesis using allele specific hybridization to a library of labeled clones that is followed by ligation, real time monitoring of the incorporation of labeled nucleotides during a polymerization step, polony sequencing, and SOLiD sequencing. Sequencing of separated molecules has more recently been demonstrated by sequential or single extension reactions using polymerases or ligases as well as by single or sequential differential hybridizations with libraries of probes.


A sequencing technique that can be used in the methods of the provided invention includes, for example, Helicos True Single Molecule Sequencing (tSMS) (Harris T. D. et al. (2008) Science 320:106-109). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3′ end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface. The templates can be at a density of about 100 million templates/cm2. The flow cell is then loaded into an instrument, e.g., HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides to the primer in a template directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are detected by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Further description of tSMS is shown for example in Lapidus et al. (U.S. Pat. No. 7,169,560), Lapidus et al. (U.S. patent application number 2009/0191565), Quake et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No. 7,282,337), Quake et al. (U.S. 
patent application number 2002/0164629), and Braslavsky, et al., PNAS (USA), 100: 3960-3964 (2003), the contents of each of which are incorporated by reference herein in their entirety.


Another example of a DNA sequencing technique that can be used in the methods of the provided invention is 454 sequencing (Roche) (Margulies, M et al. 2005, Nature, 437, 376-380). 454 sequencing involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which contains 5′-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (pico-liter sized). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi) which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5′ phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is detected and analyzed.
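Because the light signal in each nucleotide flow is proportional to the number of nucleotides incorporated, a sequence can in principle be read off by rounding each normalized signal to a homopolymer length. The following is a minimal sketch, assuming signals have already been normalized so that a single incorporation gives a value near 1.0; the function name `decode_flowgram` and the example flow order are hypothetical.

```python
def decode_flowgram(flow_order, signals):
    """Convert per-flow light signals into a base sequence (illustrative only)."""
    bases = []
    for base, signal in zip(flow_order, signals):
        # Signal strength is proportional to the number of bases incorporated,
        # so rounding a normalized signal estimates the homopolymer length.
        count = max(0, round(signal))
        bases.append(base * count)
    return "".join(bases)

# Two cycles of a hypothetical T, A, C, G flow order.
read = decode_flowgram("TACGTACG", [1.02, 0.05, 2.1, 0.0, 0.0, 0.97, 0.1, 3.05])
```

Production base callers model signal decay and phasing rather than rounding directly, so this should be read as an explanation of the proportionality, not as a base-calling method.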


Another example of a DNA sequencing technique that can be used in the methods of the provided invention is SOLiD technology (Applied Biosystems). In SOLiD sequencing, genomic DNA is sheared into fragments, and adaptors are attached to the 5′ and 3′ ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5′ and 3′ ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5′ and 3′ ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3′ modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed and the process is then repeated.


Another example of a DNA sequencing technique that can be used in the methods of the provided invention is Ion Torrent sequencing (U.S. patent application numbers 2009/0026082, 2009/0127589, 2010/0035252, 2010/0137143, 2010/0188073, 2010/0197507, 2010/0282617, 2010/0300559, 2010/0300895, 2010/0301398, and 2010/0304982), the content of each of which is incorporated by reference herein in its entirety. In Ion Torrent sequencing, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to a surface at a resolution such that the fragments are individually resolvable. Addition of one or more nucleotides releases a proton (H+), and this signal is detected and recorded in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated.


Another example of a sequencing technology that can be used in the methods of the provided invention is Illumina sequencing. Illumina sequencing is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. Genomic DNA is fragmented, and adapters are added to the 5′ and 3′ ends of the fragments. DNA fragments that are attached to the surface of flow cell channels are extended and bridge amplified. The fragments become double stranded, and the double stranded molecules are denatured. Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded DNA molecules of the same template in each channel of the flow cell. Primers, DNA polymerase and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded. The 3′ terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated.


Another example of a sequencing technology that can be used in the methods of the provided invention includes the single molecule, real-time (SMRT) technology of Pacific Biosciences. In SMRT, each of the four DNA bases is attached to one of four different fluorescent dyes. These dyes are phospholinked. A single DNA polymerase is immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW). A ZMW is a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Detection of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.


Another example of a sequencing technique that can be used in the methods of the provided invention is nanopore sequencing (Soni G V and Meller A. (2007) Clin Chem 53: 1996-2001). A nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.


Another example of a sequencing technique that can be used in the methods of the provided invention involves using a chemical-sensitive field effect transistor (chemFET) array to sequence DNA (for example, as described in US Patent Application Publication No. 20090026082). In one example of the technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3′ end of the sequencing primer can be detected by a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, and the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.


Another example of a sequencing technique that can be used in the methods of the provided invention involves using an electron microscope (Moudrianakis E. N. and Beer M. Proc Natl Acad Sci USA. 1965 March; 53:564-71). In one example of the technique, individual DNA molecules are labeled using metallic labels that are distinguishable using an electron microscope. These molecules are then stretched on a flat surface and imaged using an electron microscope to measure sequences.


Analysis


Alignment and/or compilation of sequence results obtained from the image stacks produced as generally described above utilizes look-up tables that take into account possible sequence changes (due, e.g., to errors, mutations, etc.). Essentially, sequencing results obtained as described herein are compared to a look-up table that contains all possible reference sequences plus 1 or 2 base errors. Sequence alignment algorithms and methods are described for example in U.S. Pat. No. 8,209,130, the content of which is incorporated by reference herein in its entirety.
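One way to picture such a look-up table is as a dictionary keyed by every reference sequence and every sequence that differs from a reference by a single base substitution. The sketch below is an assumption-laden illustration: `build_lookup` is a hypothetical name, only 1-base substitution errors are enumerated (not 2-base errors, insertions, or deletions), and the reference sequences are invented.

```python
def build_lookup(references, alphabet="ACGT"):
    """Map each reference, and each 1-substitution variant of it, to the reference name."""
    # Insert exact reference sequences first so they always take priority.
    table = {seq: name for name, seq in references.items()}
    for name, seq in references.items():
        for i, original in enumerate(seq):
            for base in alphabet:
                if base != original:
                    variant = seq[:i] + base + seq[i + 1:]
                    # setdefault never overwrites an entry that is already present.
                    table.setdefault(variant, name)
    return table

table = build_lookup({"ref1": "ACGT", "ref2": "TTTT"})
```

The table grows quickly with read length and error count, which is one reason index-based aligners are generally used for longer reads.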


In some embodiments, de novo assembly proceeds according to so-called greedy algorithms. For assembly according to greedy algorithms, one of the reads of a group of reads is selected, and it is paired with another read with which it exhibits a substantial amount of overlap—generally it is paired with the read with which it exhibits the most overlap of all of the other reads. Those two reads are merged to form a new read sequence, which is then put back in the group of reads and the process is repeated. Assembly according to a greedy algorithm is described, for example, in Schatz, et al., Genome Res., 20:1165-1173 (2010) and U.S. Pub. 2011/0257889, each of which is hereby incorporated by reference in its entirety.
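The merge-and-repeat loop described above can be sketched in a few lines. This is a simplified illustration, not the algorithm of the cited references: it selects the globally best-overlapping pair at each step rather than starting from one chosen read, it uses exact suffix/prefix overlap only, and the names `overlap` and `greedy_assemble` and the `min_len` threshold are assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining pair overlaps enough to merge
        merged = reads[i] + reads[j][n:]
        # Put the merged read back into the pool in place of the two originals.
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"])
```

Greedy merging can misassemble repeats, since the locally best overlap is not always the correct one.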


In other embodiments, assembly proceeds by pairwise alignment, for example, exhaustive or heuristic (i.e., not exhaustive) pairwise alignment. Exhaustive pairwise alignment, sometimes called a “brute force” approach, calculates an alignment score for every possible alignment between every possible pair of sequences among a set. Assembly by heuristic multiple sequence alignment ignores certain mathematically unlikely combinations and can be computationally faster. One heuristic method of assembly by multiple sequence alignment is the so-called “divide-and-conquer” heuristic, which is described, for example, in U.S. Pub. 2003/0224384. Another heuristic method of assembly by multiple sequence alignment is progressive alignment, as implemented by the program ClustalW (see, e.g., Thompson, et al., Nucl. Acids. Res., 22:4673-80 (1994)). Assembly by multiple sequence alignment in general is discussed in Lecompte, O, et al., Gene 270:17-30 (2001); Mullan, L. J., Brief Bioinform., 3:303-5 (2002); Nicholas, H. B. Jr., et al., Biotechniques 32:572-91 (2002); and Xiong, G., Essential Bioinformatics, 2006, Cambridge University Press, New York, N.Y.


An alignment according to the invention can be performed using any suitable computer program known in the art.


One exemplary alignment program, which implements a BWT approach, is Burrows-Wheeler Aligner (BWA) available from the SourceForge web site maintained by Geeknet (Fairfax, Va.). BWA can align reads, contigs, or consensus sequences to a reference. BWT occupies 2 bits of memory per nucleotide, making it possible to index nucleotide sequences as long as 4G base pairs with a typical desktop or laptop computer. The pre-processing includes the construction of BWT (i.e., indexing the reference) and the supporting auxiliary data structures.
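For readers unfamiliar with the Burrows-Wheeler transform (BWT) that underlies this indexing step, the transform itself can be shown on a toy sequence. The sketch below builds the BWT by sorting all rotations, which is only practical for short strings; it is not how BWA constructs its index, and the function name `bwt` and the `$` terminator convention are illustrative assumptions.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (suitable for short strings only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # Sort every rotation of the terminated string; '$' sorts before A, C, G, T.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)

transformed = bwt("ACAACG")
```

Real indexers derive the same last column from a suffix array and add auxiliary tables so that exact matches can be located without storing the rotations.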


BWA implements two different algorithms, both based on BWT. Alignment by BWA can proceed using the algorithm bwa-short, designed for short queries up to ~200 bp with low error rate (<3%) (Li H. and Durbin R. Bioinformatics, 25:1754-60 (2009)). The second algorithm, BWA-SW, is designed for long reads with more errors (Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub.). The BWA-SW component performs heuristic Smith-Waterman-like alignment to find high-scoring local hits. One skilled in the art will recognize that bwa-sw is sometimes referred to as “bwa-long”, “bwa long algorithm”, or similar. Such usage generally refers to BWA-SW.


An alignment program that implements a version of the Smith-Waterman algorithm is MUMmer, available from the SourceForge web site maintained by Geeknet (Fairfax, Va.). MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form (Kurtz, S., et al., Genome Biology, 5:R12 (2004); Delcher, A. L., et al., Nucl. Acids Res., 27:11 (1999)). For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.


Another exemplary alignment program according to embodiments of the invention is BLAT from Kent Informatics (Santa Cruz, Calif.) (Kent, W. J., Genome Research 4: 656-664 (2002)). BLAT (which is not BLAST) keeps an index of the reference genome in memory such as RAM. The index consists of all non-overlapping k-mers (except optionally for those heavily involved in repeats), where k=11 by default. The genome itself is not kept in memory. The index is used to find areas of probable homology, which are then loaded into memory for a detailed alignment.
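The idea of indexing non-overlapping k-mers of the reference and then probing the index with k-mers from the query can be sketched as follows. This is not BLAT's implementation: the names `build_index` and `candidate_hits` are hypothetical, repeat masking is omitted, and the usage example uses k=4 on an invented sequence purely so the result is easy to follow.

```python
from collections import defaultdict

def build_index(reference, k=11):
    """Index the non-overlapping k-mers of the reference by their start positions."""
    index = defaultdict(list)
    for start in range(0, len(reference) - k + 1, k):
        index[reference[start:start + k]].append(start)
    return index

def candidate_hits(index, query, k=11):
    """Probe the index with every overlapping k-mer of the query.

    Returns (query_pos, ref_pos) pairs marking areas of probable homology.
    """
    hits = []
    for q in range(len(query) - k + 1):
        for r in index.get(query[q:q + k], []):
            hits.append((q, r))
    return hits

reference = "ACGTTGCAGGCATTAC"
hits = candidate_hits(build_index(reference, k=4), "TTGCAGG", k=4)
```

Each hit suggests a diagonal (reference position minus query position) around which a detailed alignment would then be attempted.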


Another alignment program is SOAP2, from Beijing Genomics Institute (Beijing, CN) or BGI Americas Corporation (Cambridge, Mass.). SOAP2 implements a 2-way BWT (Li et al., Bioinformatics 25(15):1966-67 (2009); Li, et al., Bioinformatics 24(5):713-14 (2008)).


Another program for aligning sequences is Bowtie (Langmead, et al., Genome Biology, 10:R25 (2009)). Bowtie indexes reference genomes by making a BWT.


Other exemplary alignment programs include: Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) or the ELANDv2 component of the Consensus Assessment of Sequence and Variation (CASAVA) software (Illumina, San Diego, Calif.); RTG Investigator from Real Time Genomics, Inc. (San Francisco, Calif.); Novoalign from Novocraft (Selangor, Malaysia); Exonerate, European Bioinformatics Institute (Hinxton, UK) (Slater, G., and Birney, E., BMC Bioinformatics 6:31(2005)), Clustal Omega, from University College Dublin (Dublin, Ireland) (Sievers F., et al., Mol Syst Biol 7, article 539 (2011)); ClustalW or ClustalX from University College Dublin (Dublin, Ireland) (Larkin M. A., et al., Bioinformatics, 23, 2947-2948 (2007)); and FASTA, European Bioinformatics Institute (Hinxton, UK) (Pearson W. R., et al., PNAS 85(8):2444-8 (1988); Lipman, D. J., Science 227(4693):1435-41 (1985)).


Once the mutations in the nucleic acid sequence from the sample are determined, those mutations are compared to a database(s) of known mutations associated with the particular disease. Such databases are publicly available and known to those of skill in the art. Mutations that do not match to the database are identified as novel mutations.
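The comparison step can be illustrated with a minimal sketch, assuming variants are represented as (chromosome, position, reference base, alternate base) tuples and the database is a simple in-memory set. The function names, the tuple layout, and the coordinates below are all made up for illustration and do not correspond to any real database or locus.

```python
def call_substitutions(sample, reference, chrom, start=1):
    """List (chrom, pos, ref, alt) for each mismatch between two aligned, equal-length sequences."""
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [(chrom, start + i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

def find_novel(called, known):
    """Return the called variants that have no match in the known-mutation set."""
    known = set(known)
    return [v for v in called if v not in known]

# Hypothetical data: one called substitution is already in the database.
called = call_substitutions("ACCTACGA", "ACGTACGT", chrom="chrN", start=100)
known_mutations = {("chrN", 102, "G", "C")}
novel = find_novel(called, known_mutations)
```

Exact tuple matching is adequate for substitutions; insertions and deletions need the position normalization discussed in the following paragraphs before they can be compared reliably.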


Novel insertion and deletion variants present a particular challenge for high-throughput sequencing technologies. Aligned reads with coordinate-altering variants require the use of penalized gaps in either the query or reference sequence to maintain global coordinate order. Extended gaps tend to reduce overall mappability, leading to false negative insertions and deletions. Gaps are often inserted at the ends of reads to artificially maintain optimality, leading to false positive insertion, deletion, and substitution variants. Realignment improves sensitivity (of insertions/deletions) and specificity (of substitutions); however, these techniques often use Smith-Waterman alignment algorithms without gap penalties. Without penalizing gaps, false positive (FP) insertions and deletions often result.


An additional complication results from the sequence context where the majority of insertion and deletion variants are found. Small insertions and deletions (less than 100 bp) commonly occur within tandem repeats where polymerase slippage or intra-chromosomal recombination leads to nucleotide expansion or contraction. Relative to the original (or reference) genome, the consequences of these processes appear as insertions or deletions, respectively. Insertions and deletions within tandem repeats are spatially ambiguous, that is, they may not be faithfully represented using a single genomic coordinate (FIG. 1). It is necessary to calculate the variant's equivalent insertion/deletion region (EIR), which is essentially the contiguous block of DNA representing its associated tandem repeat. It is important to note that alignment algorithms arbitrarily assign variant positions within EIRs.
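The spatial ambiguity can be made concrete for deletions: shifting a deletion left or right within a repeat gives the same resulting sequence, and the range of such shifts bounds the equivalent region. The sketch below is one simple way to find that range; it assumes 0-based coordinates, handles deletions only, and uses the hypothetical name `deletion_eir`. It is not presented as the procedure of any cited reference.

```python
def deletion_eir(reference, start, length):
    """Return (lo, hi): the lowest and highest 0-based start positions at which
    deleting `length` bases gives the same sequence as deleting at `start`."""
    lo = start
    # Shifting left by one is equivalent when the base entering the deleted
    # window on the left equals the base leaving it on the right.
    while lo > 0 and reference[lo - 1] == reference[lo + length - 1]:
        lo -= 1
    hi = start
    # Shifting right by one is equivalent under the mirrored condition.
    while hi + length < len(reference) and reference[hi] == reference[hi + length]:
        hi += 1
    return lo, hi

def delete(reference, start, length):
    """Apply a deletion to the reference sequence."""
    return reference[:start] + reference[start + length:]

homopolymer_eir = deletion_eir("ACGGGTA", 3, 1)      # one G deleted from GGG
dinucleotide_eir = deletion_eir("TTCACACAGG", 4, 2)  # one CA unit deleted from CACACA
```

A deletion whose returned range contains a single position (lo equal to hi) is unambiguous; a wider range indicates that the variant sits in repetitive context.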


Due to the biological mechanisms mentioned above, naturally occurring insertion and deletion mutations tend to occur in tandem repeats (i.e., within EIRs) much more often than would be expected by chance. This fact can be exploited to distinguish true variants from false positives. For example, within capture regions of capture probes, 13 (21%) and 53 (100%) of dbSNP insertion and deletion variants, respectively, have EIRs with lengths greater than one. Thus, known insertions and deletions are strongly associated with tandem repeats. Appropriate probability-based scores can be used to measure the mutual dependence between these two variables and reduce uncertainty about whether a called variant represents a true positive or a false positive. For example:







$$p(\text{deletion} \mid \text{repeat}) = \frac{p(\text{repeat} \mid \text{deletion})\, p(\text{deletion})}{p(\text{repeat})}$$








where p(repeat|deletion) is the likelihood of a repeat given a deletion (in the example above, this value equals 1.0), p(deletion) is the prior probability of a deletion in the absence of additional evidence, and p(repeat) is a normalization factor that accounts for local variability in sequence repetitiveness (the latter two values depend on the specific genomic regions under consideration). It is likely that probabilities would be calculated separately for different sized variants. In combination with other pieces of evidence, such as genotype qualities, a sample lookup table would provide additional confidence in any particular variant call given its presence in a repetitive region.
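As a worked illustration of this relationship, the posterior can be computed directly from the three quantities. In the sketch below only the likelihood value of 1.0 comes from the example above; the prior of 0.001 and the repeat frequency of 0.05 are invented placeholder numbers, and the function name is hypothetical.

```python
def p_deletion_given_repeat(p_repeat_given_deletion, p_deletion, p_repeat):
    """Posterior probability of a deletion given that the site lies in a repeat."""
    if not 0.0 < p_repeat <= 1.0:
        raise ValueError("p_repeat must be in (0, 1]")
    posterior = p_repeat_given_deletion * p_deletion / p_repeat
    if posterior > 1.0:
        # The inputs are mutually inconsistent if the posterior exceeds 1.
        raise ValueError("inconsistent probabilities")
    return posterior

# Likelihood 1.0 as in the text; prior and p(repeat) are assumed values.
posterior = p_deletion_given_repeat(1.0, p_deletion=0.001, p_repeat=0.05)
```

With these placeholder inputs the posterior is about 0.02, a twentyfold increase over the assumed prior, which shows how presence in a repeat can raise confidence in a deletion call.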


Once a particular insertion/deletion variant is determined to be real, the EIR is further required to determine its precise functional or clinical significance. This is illustrated with reference to FIG. 2. Consider a scenario of a three base pair homopolymeric repeat (GGG) that partially overlaps the exon boundary and its associated splice site (chr7:116975929-116975930). Depending on its size, a deletion of one or more nucleotides from within this repeat may be reported by detection algorithms at any of three equivalent positions (chr7:116975929-116975931) within the EIR chr7:116975929-chr7:116975932; however, in this particular case, the functional annotation depends on the exact position of the variant. Translating genomic positions directly into their functional analogues would lead to a splice site annotation for chr7:116975929delG, whereas the equivalent chr7:116975931delG is a frameshift.


Consistent annotation requires implementing rules (or performing simulations) that consider insertion and deletion variants in both genomic and functional contexts. Taken together, the process of applying EIR-assisted confidence scores and functional annotations can be reduced to the following steps:

    • 1. Determine if the variant is known to be disease causing by consulting a relevant database(s);
    • 2. If the variant is not known to be disease causing then by definition it is novel. If the variant is a substitution, determine its clinical impact directly from its genomic coordinate. Otherwise calculate the equivalent insertion/deletion region (EIR) using methods described in Krawitz et al., 2010;
    • 3. If the variant EIR length is equal to one, use this information to assess the likelihood that the variant is a false positive (e.g., the result of a sequence artifact). If it is determined that the variant is real, continue to the next step, otherwise stop.
    • 4. Annotate the variant EIR with all proportional functional information.
    • 5. Attempt to push the variant completely out of the functional region by retrieving the extreme lower or upper position of the variant EIR. Choosing the correct extreme position depends on the orientation of the variant relative to its associated functional region or regions.
    • 6. If the variant can be pushed completely out of the functional region, do not report it, or report it as unknown or benign; otherwise, determine the variant's clinical significance.
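Steps 1, 5, and 6 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it handles deletions only, omits the false-positive assessment of step 3, uses inclusive integer coordinates on a single chromosome, and all names (`Interval`, `can_push_out`, `triage_deletion`) and the example coordinates are hypothetical. Rather than testing only the two extreme positions, the sketch tests every equivalent placement within the EIR, which also covers the case of functional regions on both sides.

```python
from typing import Iterable, NamedTuple, Set


class Interval(NamedTuple):
    lo: int  # first base, inclusive
    hi: int  # last base, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    return a.lo <= b.hi and b.lo <= a.hi


def can_push_out(eir: Interval, length: int, regions: Iterable[Interval]) -> bool:
    """True if some equivalent placement of the deletion within its EIR
    avoids every functional region (step 5)."""
    if length < 1 or length > eir.hi - eir.lo + 1:
        raise ValueError("deletion length must fit inside the EIR")
    region_list = list(regions)
    for start in range(eir.lo, eir.hi - length + 2):
        placement = Interval(start, start + length - 1)
        if not any(overlaps(placement, r) for r in region_list):
            return True
    return False


def triage_deletion(key: str, known: Set[str], eir: Interval,
                    length: int, regions: Iterable[Interval]) -> str:
    if key in known:                          # step 1: database lookup
        return "known"
    if can_push_out(eir, length, regions):    # steps 5 and 6
        return "unknown or benign"
    return "assess clinical significance"


# Hypothetical coordinates: a 3-base repeat whose first base lies in a splice site.
eir = Interval(100, 102)
splice_site = Interval(95, 100)
print(triage_deletion("del1", set(), eir, 1, [splice_site]))  # unknown or benign
print(triage_deletion("del3", set(), eir, 3, [splice_site]))  # assess clinical significance
```

A one-base deletion can be placed at position 101 or 102, outside the splice site, whereas a three-base deletion has only one placement and cannot avoid it.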


      Computers and Software


Other embodiments are within the scope and spirit of the invention. For example, due to the nature of software, functions described above can be implemented using software, hardware, firmware, hardwiring, or combinations of any of these. Features implementing functions can also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations.


As one skilled in the art would recognize as necessary or best-suited for performance of the methods of the invention and sequence assembly in general, computer system 200 or machines of the invention include one or more processors (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory, and a static memory, which communicate with each other via a bus.


In an exemplary embodiment shown in FIG. 3, system 200 can include a sequencer 201 with data acquisition module 205 to obtain sequence read data. Sequencer 201 may optionally include or be operably coupled to its own, e.g., dedicated, sequencer computer 233 (including an input/output mechanism 237, one or more of processor 241 and memory 245). Additionally or alternatively, sequencer 201 may be operably coupled to a server 213 or computer 249 (e.g., laptop, desktop, or tablet) via network 209. Computer 249 includes one or more of processor 259 and memory 263 as well as an input/output mechanism 254. Where methods of the invention employ a client/server architecture, steps of methods of the invention may be performed using server 213, which includes one or more of processor 221 and memory 229, capable of obtaining data, instructions, etc., or providing results via interface module 225 or providing results as a file 217. Server 213 may be engaged over network 209 through computer 249 or terminal 267, or server 213 may be directly connected to terminal 267, including one or more of processor 275 and memory 279, as well as input/output mechanism 271.


System 200 or machines according to the invention may further include, for any of I/O 254, 237, or 271, a video display unit (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). Computer systems or machines according to the invention can also include an alphanumeric input device (e.g., a keyboard), a cursor control device (e.g., a mouse), a disk drive unit, a signal generation device (e.g., a speaker), a touchscreen, an accelerometer, a microphone, a cellular radio frequency antenna, and a network interface device, which can be, for example, a network interface card (NIC), Wi-Fi card, or cellular modem.


Memory 263, 245, 279, or 229 according to the invention can include a machine-readable medium on which is stored one or more sets of instructions (e.g., software) embodying any one or more of the methodologies or functions described herein. The software may also reside, completely or at least partially, within the main memory and/or within the processor during execution thereof by the computer system, the main memory and the processor also constituting machine-readable media.


The software may further be transmitted or received over a network via the network interface device.


While the machine-readable medium can in an exemplary embodiment be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories (e.g., subscriber identity module (SIM) card, secure digital card (SD card), micro SD card, or solid-state drive (SSD)), optical and magnetic media, and any other tangible storage media.


INCORPORATION BY REFERENCE

References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made throughout this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes.


EQUIVALENTS

The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein.

Claims
  • 1. A method for identifying a novel mutation associated with a disease, the method comprising: obtaining nucleic acid from a subject having a disease; determining that the nucleic acid comprises a variant comprising an insertion or deletion; comparing the variant to a database of variants known to be associated with the disease, wherein a variant that does not match to the database is identified as a novel variant; determining a lower boundary and an upper boundary of an equivalent insertion/deletion region (EIR) of the novel variant; determining that part of the EIR falls within a functional region.
  • 2. The method according to claim 1, wherein determining that the nucleic acid comprises a variant comprising an insertion or deletion comprises: sequencing the nucleic acid; and comparing the sequence of the nucleic acid to a reference sequence.
  • 3. The method according to claim 2, wherein sequencing is sequencing-by-synthesis.
  • 4. The method according to claim 3, wherein sequencing-by-synthesis is single molecule sequencing-by-synthesis.
  • 5. The method according to claim 2, wherein the reference sequence is a consensus human sequence or a sequence from a non-diseased sample.
  • 6. The method according to claim 1, wherein prior to determining that the nucleic acid comprises a variant comprising an insertion or deletion, the method further comprises attaching a barcode sequence to the nucleic acid.
  • 7. The method according to claim 1, wherein the disease is cystic fibrosis.
  • 8. The method according to claim 7, wherein the subject is Hispanic.
  • 9. The method according to claim 1, further comprising determining that the novel variant is causative of the disease by annotating the variant with functional information.
  • 10. A method for determining if a mutation is causative of a disease, the method comprising: conducting an assay to obtain a nucleic acid sequence from a subject having a disease; determining a presence of at least one novel variant comprising an insertion or deletion in the sequence; annotating the variant with appropriate functional information; identifying a lower and upper boundary of an equivalent insertion/deletion region (EIR) of the novel variant; and determining that part of the EIR falls within a functional region.
RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 14/984,644, filed Dec. 30, 2015, which is a continuation of U.S. patent application Ser. No. 13/616,788, filed Sep. 14, 2012, which claims the benefit of and priority to U.S. provisional application Ser. No. 61/548,073, filed Oct. 17, 2011, the contents of each of which are incorporated by reference.

Non-Patent Literature Citations (288)
Entry
Schrijver et al., “Diagnostic Testing by CFTR Gene Mutation Analysis in a Large Group of Hispanics: Novel Mutations and Assessment of a Population-Specific Mutation Spectrum,” J. Mol. Diagn. 2005, 7:289-299. (Year: 2005).
Mamanova, 2010, Target-enrichment strategies for next-generation sequencing, Nature Methods 7(2):111-8.
Margulies, 2005, Genome sequencing in micro-fabricated high-density picotiter reactors, Nature, 437:376-380.
Marras, 1999, Multiplex detection of single-nucleotide variations using molecular beacons, Genetic Analysis: Biomolecular Engineering 14:151.
Maxam, 1977, A new method for sequencing DNA, PNAS 74:560-564.
May, 1988, How Many Species Are There on Earth?, Science 241(4872):1441-9.
McKenna, 2010, The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, Genome Res 20(9):1297-1303.
Meyer, 2007, Targeted high-throughput sequencing of tagged nucleic acid samples, Nucleic Acids Research 35(15):e97 (5 pages).
Meyer, 2008, Parallel tagged sequencing on the 454 platform, Nature Protocols 3(2):267-78.
Miesenbock, 1998, Visualizing secretion and synaptic transmission with pH-sensitive green fluorescent proteins, Nature 394(6689):192-95.
Miller, 2010, Assembly algorithms for next-generation sequencing data, Genomics 95:315-327.
Mills, 2010, Mapping copy number variation by population-scale genome sequencing, Nature 470(7332):59-65.
Miner, 2004, Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR, Nucl Acids Res 32(17):e135.
Minton, 2011, Mutation Surveyor: software for DNA sequence analysis, Meth Mol Biol 688:143-53.
Miyazaki, 2009, Characterization of deletion breakpoints in patients with dystrophinopathy carrying a deletion of exons 45-55 of the Duchenne muscular dystrophy (DMD) gene, J Hum Gen 54:127-30.
Mockler, 2005, Applications of DNA tiling arrays for whole-genome analysis, Genomics 85(1):1-15.
Mohammed, 2012, DELIMINATE—a fast and efficient method for loss-less compression of genomic sequences, Bioinformatics 28(19):2527-2529.
Moudrianakis, 1965, Base sequence determination in nucleic acids with the electron microscope, III. Chemistry and microscopy of guanine-labeled DNA, PNAS 53:564-71.
Mullan, 2002, Multiple sequence alignment—the gateway to further analysis, Brief Bioinform 3(3):303-5.
Munne, 2012, Preimplantation genetic diagnosis for aneuploidy and translocations using array comparative genomic hybridization, Curr Genomics 13(6):463-470.
Nan, 2006, A novel CFTR mutation found in a Chinese patient with cystic fibrosis, Chinese Med J 119(2):103-9.
Narang, 1979, Improved phosphotriester method for the synthesis of gene fragments, Meth Enz 68:90-98.
Nelson, 1989, Bifunctional oligonucleotide probes synthesized using a novel CPG support are able to detect single base pair mutations, Nucl Acids Res 17(18):7187-7194.
Ng, 2009, Targeted capture and massively parallel sequencing of 12 human exomes, Nature 461(7261):272-6.
Nicholas, 2002, Strategies for multiple sequence alignment, Biotechniques 32:572-91.
Nickerson, 1990, Automated DNA diagnostics using an ELISA-based oligonucleotide ligation assay, PNAS 87:8923-7.
Nielsen et al., 1999, Peptide Nucleic Acids, Protocols and Applications (Norfolk: Horizon Scientific Press, 1-19).
Nilsson, 2006, Analyzing genes using closing and replicating circles, Trends in Biotechnology 24:83-8.
Ning, 2001, SSAHA: a fast search method for large DNA databases, Genome Res 11(10):1725-9.
Nordhoff, 1993, Ion stability of nucleic acids in infrared matrix-assisted laser desorption/ ionization mass spectrometry, Nucl Acid Res 21(15):3347-57.
Nuttle, 2013, Rapid and accurate large-scale genotyping of duplicated genes and discovery of interlocus gene conversions, Nat Meth 10(9):903-909.
Nuttle, 2014, Resolving genomic disorder-associated breakpoints within segmental DNA duplications using massively parallel sequencing, Nat Prot 9(6):1496-1513.
O'Roak, 2012, Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders, Science 338(6114):1619-1622.
Oefner, 1996, Efficient random sub-cloning of DNA sheared in a recirculating point-sink flow system, Nucleic Acids Res 24(20):3879-3886.
Oka, 2006, Detection of loss of heterozygosity in the p53 gene in renal cell carcinoma and bladder cancer using the polymerase chain reaction, Mol Carcinogenesis 4(1):10-13.
Okoniewski, 2013, Precise breakpoint localization of large genomic deletions using PacBio and Illumina next-generation sequencers, Biotechniques 54(2):98-100.
Oliphant, 2002, BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping, Biotechniques Suppl:56-8, 60-1.
Ordahl, 1976, Sheared DNA fragment sizing: comparison of techniques, Nucleic Acids Res 3:2985-2999.
Ostrer, 2001, A genetic profile of contemporary Jewish populations, Nat Rev Genet 2(11):891-8.
Owens, 1998, Aspects of oligonucleotide and peptide sequencing with MALDI and electrospray mass spectrometry, Bioorg Med Chem 6:1547-1554.
Parameswaran, 2007, A pyrosequencing-tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing, Nucl Acids Res 35:e130.
Parkinson, 2012, Preparation of high-quality next-generation sequencing libraries from picogram quantities of target DNA, Genome Res 22:125-133.
Pastor, 2010, Conceptual modeling of human genome mutations: a dichotomy between what we have and what we should have, 2010 Proc BIOSTEC Bioinformatics, pp. 160-166.
Paton, 2000, Conceptual modelling of genomic information, Bioinformatics 16(6):548-57.
Pearson, 1988, Improved tools for biological sequence comparison, PNAS 85(8):2444-8.
Pertea et al., 2003, TIGR Gene indices clustering tools (TGICL): a software system for fast clustering of large EST datasets, Bioinformatics 19(5):651-52.
Pertea, 2003, TIGR gene indices clustering tools (TGICL), Bioinformatics 19(5):651-52.
Pieles, 1993, Matrix-assisted laser desorption ionization time-of-flight mass spectrometry: A powerful tool for the mass and sequence analysis of natural and modified oligonucleotides, Nucleic Acids Res 21:3191-3196.
Pinho, 2013, MFCompress: a compression tool for FASTA and multi-FASTA data, Bioinformatics 30(1):117-8.
Porreca, 2007, Multiplex amplification of large sets of human exons, Nat Meth 4(11):931-936.
Porreca, 2013, Analytical performance of a Next-Generation DNA sequencing-based clinical workflow for genetic carrier screening, American Society of Human Genetics 63rd Annual Meeting, Abstract, Oct. 22, 2013.
Procter, 2006, Molecular diagnosis of Prader-Willi and Angelman syndromes by methylation-specific melting analysis and methylation-specific multiplex ligation-dependent probe amplification, Clin Chem 52(7):1276-83.
Qiagen, 2011, Gentra Puregene handbook, 3d Ed. (72 pages).
Thompson, 1994, Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nuc Acids Res 22:4673-80.
Quail, 2010, DNA: Mechanical Breakage, in Encyclopedia of Life Sciences, John Wiley & Sons Ltd, Chichester (5 pages).
Rambaut, 1997, Seq-Gen:an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees, Bioinformatics 13:235-38.
Richards, 2008, ACMG recommendations for standards for interpretation and reporting of sequence variations: Revisions, Genet Med 10(4):294-300.
Richter, 2008, MetaSim—A Sequencing Simulator for Genomics and Metagenomics, PLoS One 3:e3373.
Roberts, 1980, Restriction and modification enzymes and their recognition sequences, Nucleic Acids Res 8(1):r63-r80.
Robinson et al., 2013, Graph Databases, O'Reilly Media, Inc., Sebastopol, CA (223 pages).
Rodriguez, 2010, Constructions from Dots and Lines, Bull Am Soc Inf Sci Tech 36(6):35-41.
Rosendahl et al., 2013, CFTR, SPINK1, CTRC and PRSS1 variants in chronic pancreatitis: is the role of mutated CFTR overestimated?, Gut 62:582-592.
Rothberg, 2011, An integrated semiconductor device enabling non-optical genome sequencing, Nature 475:348-352.
Rowntree, 2003, The phenotypic consequences of CFTR mutations, Ann Hum Gen 67:471-485.
Saihan, 2009, Update on Usher syndrome, Cur Op Neurology 22:19-27.
Sanger, 1977, DNA Sequencing with chain-terminating inhibitors, PNAS 74(12):5463-5467.
Santa Lucia, 1998, A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, PNAS 95(4):1460-5.
Sargent, 1988, Isolation of differentially expressed genes, Methods Enzymol 152:423-432.
Sauro, 2004, How Do You Calculate a Z-Score/ Sigma Level?, https://www.measuringusability.com/zcalc.htm (online publication).
Sauro, 2004, What's a Z-Score and Why Use it in Usability Testing?, https://www.measuringusability.com/z.htm (online publication).
Schadt, 2010, A window into third-generation sequencing, Human Mol Genet 19(R2):R227-40.
Schatz et al., 2010, Assembly of large genomes using second-generation sequencing, Genome Res., 20:1165-1173.
Schiffman, 2009, Molecular inversion probes reveal patterns of 9p21 deletion and copy number aberrations in childhood leukemia, Cancer Genetics and Cytogenetics 193:9-18.
Schneeberger, 2011, Reference-guided assembly of four diverse Arabidopsis thaliana genomes, PNAS 108(25):10249-10254.
Schoolcraft, 2010, Clinical application of comprehensive chromosomal screening at the blastocyst stage, Fert Steril 94(5):1700-1706.
Schouten, 2002, Relative Quantification of 40 Nucleic Acid Sequences by Multiplex Ligation-Dependent Probe Amplification, Nucl Acids Res 30(12):e57.
Schrijver et al., 2005, Diagnostic testing by CFTR gene mutation analysis in a large group of hispanics: novel mutations and assessment of a population-specific mutation spectrum, J Mol Diag 7(2):289-299.
Schrijver, 2005, Diagnostic testing by CFTR gene mutation analysis in a large group of Hispanics, J Mol Diag 7(2):289-299.
Schuette, 1995, Sequence analysis of phosphorothioate oligonucleotides via matrix-assisted laser desorption ionization time-of-flight mass spectrometry, J Pharm Biomed Anal 13:1195-1203.
Schwartz, 2009, Identification of cystic fibrosis variants by polymerase chain reaction/oligonucleotide ligation assay, J Mol Diag 11(3):211-15.
Schwartz, 2011, Clinical utility of single nucleotide polymorphism arrays, Clin Lab Med 31(4):581-94.
Sequeira, 1997, Implementing generic, object-oriented models in biology, Ecological Modeling 94.1:17-31.
Shen, 2013, Multiplex capture with double-stranded DNA probes, Genome Medicine 5(50):1-8.
Sievers, 2011, Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega, Mol Syst Biol 7:539.
Simpson, 2009, ABySS: A parallel assembler for short read sequence data, Genome Res., 19(6):1117-23.
Slater, 2005, Automated generation of heuristics for biological sequence comparison, BMC Bioinformatics 6:31.
Smirnov, 1996, Sequencing oligonucleotides by exonuclease digestion and delayed extraction matrix-assisted laser desorption ionization time-of-flight mass spectrometry, Anal Biochem 238:19-25.
Smith, 1985, The synthesis of oligonucleotides containing an aliphatic amino group at the 5′ terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis, Nucl. Acid Res., 13:2399-2412.
Smith, 2010, Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples, Nucleic Acids Research 38(13):e142 (8 pages).
Soni, 2007, Progress toward ultrafast DNA sequencing using solid-state nanopores, Clin Chem 53(11):1996-2001.
Spanu, 2010, Genome expansion and gene loss in powdery mildew fungi reveal tradeoffs in extreme parasitism, Science 330(6010):1543-46.
Sproat, 1987, The synthesis of protected 5′-mercapto-2′,5′-dideoxyribonucleoside-3′-O-phosphoramidites; uses of 5′-mercapto-oligodeoxyribonucleotides, Nucl Acid Res 15:4837-4848.
Strom, 2005, Mutation detection, interpretation, and applications in the clinical laboratory setting, Mutat Res 573:160-67.
Summerer, 2009, Enabling technologies of genomic-scale sequence enrichment for targeted high-throughput sequencing, Genomics 94(6):363-8.
Summerer, 2010, Targeted High Throughput Sequencing of a Cancer-Related Exome Subset by Specific Sequence Capture With a Fully Automated Microarray Platform, Genomics 95(4):241-246.
Sunnucks, 1996, Microsatellite and chromosome evolution of parthenogenetic sitobion aphids in Australia, Genetics 144:747-756.
Tan, 2014, Clinical outcome of preimplantation genetic diagnosis and screening using next generation sequencing, GigaScience 3(30):1-9.
Thauvin-Robinet, 2009, The very low penetrance of cystic fibrosis for the R117H mutation: a reappraisal for genetic counseling and newborn screening, J Med Genet 46:752-758.
Thiyagarajan, 2006, PathogenMIPer: a tool for the design of molecular inversion probes to detect multiple pathogens, BMC Bioinformatics 7:500.
Thompson et al., Nucl Acids Res 22:4673-80 (1994).
Adey, 2010, Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition, Genome Biol 11:R119.
Ageno, 1969, The alkaline denaturation of DNA, Biophys J 9(11):1281-1311.
Agrawal, 1990, Site-specific functionalization of oligodeoxynucleotides for non-radioactive labelling, Tetrahedron Let 31:1543-1546.
Akhras, 2007, Connector Inversion Probe Technology: A Powerful One-Primer Multiplex DNA Amplification System for Numerous Scientific Applications, PLoS One 2(9):e915.
Alazard, 2002, Sequencing of production-scale synthetic oligonucleotides by enriching for coupling failures using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, Anal Biochem 301:57-64.
Alazard, 2006, Sequencing oligonucleotides by enrichment of coupling failures using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, Curr Protoc Nucleic Acid Chem, Chapter 10, Unit 10:1-7.
Albert, 2007, Direct selection of human genomic loci by microarray hybridization, Nature Methods 4(11):903-5.
Aljanabi, 1997, Universal and rapid salt-extraction of high quality genomic DNA for PCR-based techniques, Nucl. Acids Res 25:4692-4693.
Antonarakis and the Nomenclature Working Group, 1998, Recommendations for a nomenclature system for human gene mutations, Human Mutation 11:1-3.
Archer, 2014, Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage, BMC Genomics 15(1):401.
Australian Patent Examination Report No. 1 dated Aug. 12, 2014, for Australian Patent Application No. 2010242073, filed Apr. 30, 2010, (4 pages).
Ball, 2009, Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells, Nat Biotech 27:361-8.
Balzer, 2013, Filtering duplicate reads from 454 pyrosequencing data, Bioinformatics 29(7):830-836.
Barany, 1991, Genetic disease detection and DNA amplification using cloned thermostable ligase, PNAS 88:189-193.
Barany, 1991, The Ligase Chain Reaction in a PCR World, Genome Research 1:5-16.
Bau, 2008, Targeted next-generation sequencing by specific capture of multiple genomic loci using low-volume microfluidic DNA arrays, Analytical and Bioanal Chem 393(1):171-5.
Beer, 1962, Determination of base sequence in nucleic acids with the electron microscope: visibility of a marker, PNAS 48(3):409-416.
Bell, 2011, Carrier testing for severe childhood recessive diseases by next-generation sequencing, Sci Trans Med 3(65ra4).
Benner, 2001, Evolution, language and analogy in functional genomics, Trends Genet 17:414-8.
Bentzley, 1996, Oligonucleotide sequence and composition determined by matrix-assisted laser desorption/ionization, Anal Chem 68:2141-2146.
Bentzley, 1998, Base specificity of oligonucleotide digestion by calf spleen phosphodiesterase with matrix-assisted laser desorption ionization analysis, Anal Biochem 258:31-37.
Bickle, 1993, Biology of DNA Restriction, Microbiol Rev 57(2):434-50.
Bonfield, 2013, Compression of FASTQ and SAM format sequencing data, PLoS One 8(3):e59190.
Bose, 2012, BIND—An algorithm for loss-less compression of nucleotide sequence data, J Biosci 37(4):785-789.
Boyden, 2013, High-throughput screening for SMN1 copy number loss by next-generation sequencing, American Society of Human Genetics 63rd Annual Meeting, Abstract, Oct. 22, 2013.
Boyer, 1971, DNA restriction and modification mechanisms in bacteria, Ann Rev Microbiol 25:153-76.
Braasch, 2001, Locked nucleic acid (LNA): fine-tuning the recognition of DNA and RNA, Chemistry & Biology 8(1):1-7.
Braslavsky, 2003, Sequence information can be obtained from single DNA molecules, PNAS 100:3960-4.
Brinkman, 2004, Splice Variants as Cancer Biomarkers, Clin Biochem 37:584.
Brown et al., 1979, Chemical synthesis and cloning of a tyrosine tRNA gene, Methods Enzymol 68:109-51.
Browne, 2002, Metal ion-catalyzed nucleic Acid alkylation and fragmentation, J Am Chem Soc 124(27):7950-7962.
Brownstein, 2014, An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge, Genome Biol 15:R53.
Bunyan, 2004, Dosage analysis of cancer predisposition genes by multiplex ligation-dependent probe amplification, British Journal of Cancer, 91(6):1155-59.
Burrows, 1994, A block-sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, CA.
Carpenter, 2013, Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries, Am J Hum Genet 93(5):852-864.
Caruthers, 1985, Gene synthesis machines: DNA chemistry and its uses, Science 230:281-285.
Castellani, 2008, Consensus on the use and interpretation of cystic fibrosis mutation analysis in clinical practice, J Cyst Fib 7:179-196.
CDC, 2010 Assisted Reproductive Technology: Fertility Clinic Success Rates Report.
Challis, 2012, An integrative variant analysis suite for whole exome next-generation sequencing data, BMC Bioinformatics 13(8):1-12.
Chan, 2011, Natural and engineered nicking endonucleases—from cleavage mechanism to engineering of strand-specificity, Nucl Acids Res 39(1):1-18.
Chen, 2010, Identification of racehorse and sample contamination by novel 24-plex STR system, Forensic Sci Int: Genetics 4:158-167.
Chennagiri, 2013, A generalized scalable database model for storing and exploring genetic variations detected using sequencing data, American Society of Human Genetics 63rd Annual Meeting, Abstract, Oct. 22, 2013.
Chevreux, 1999, Genome sequence assembly using trace signals and additional sequence information, Proc GCB 99:45-56.
Chirgwin et al., 1979, Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease, Biochemistry, 18:5294-99.
Choe, 2010, Novel CFTR Mutations in a Korean Infant with Cystic Fibrosis and Pancreatic Insufficiency, J Korean Med Sci 25:163-5.
Ciotti, 2004, Triplet repeat primed PCR (TP PCR) in molecular diagnostic testing for Friedreich ataxia, J Mol Diag 6(4):285-9.
Cock, 2010, The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants, Nucleic Acids Res 38(6):1767-1771.
Collins, 2004, Finishing the euchromatic sequence of the human genome, Nature 431(7011):931-45.
Cremers, 1998, Autosomal Recessive Retinitis Pigmentosa and Cone-Rod Dystrophy Caused by Splice Site Mutations in the Stargardt's Disease Gene ABCR, Hum Mol Gen 7(3):355.
Cronin, 1996, Cystic Fibrosis Mutation Detection by Hybridization to Light-Generated DNA Probe Arrays, Human Mutation 7:244.
Homer, 2009, BFAST: An alignment tool for large scale genome resequencing, PLoS One 4(11):e7767.
Housley, 2009, SNP discovery and haplotype analysis in the segmentally duplicated DRD5 coding region, Ann Hum Genet 73(3):274-282.
Huang, 2008, Comparative analysis of common CFTR polymorphisms poly-T, TG-repeats and M470V in a healthy Chinese population, World J Gastroenterol 14(12):1925-30.
Husemann, 2009, Phylogenetic Comparative Assembly, Algorithms in Bioinformatics: 9th International Workshop, pp. 145-156, Salzberg & Warnow, Eds. Springer-Verlag, Berlin, Heidelberg.
Illumina, 2010, De Novo assembly using Illumina reads, Technical Note (8 pages).
International Human Genome Sequencing Consortium, 2004, Finishing the euchromatic sequence of the human genome, Nature 431:931-945.
Iqbal, 2012, De novo assembly and genotyping of variants using colored de Bruijn graphs, Nature Genetics 44:226-232.
Isosomppi, 2009, Disease-causing mutations in the CLRN1 gene alter normal CLRN1 protein trafficking to the plasma membrane, Mol Vis 15:1806-1818.
Jaijo, 2010, Microarray-based mutation analysis of 183 Spanish families with Usher syndrome, Invest Ophthalmol Vis Sci 51(3):1311-7.
Jensen, 2001, Orthologs and paralogs—we need to get it right, Genome Biol 2(8):1002-1002.3.
Jones, 2008, Core signaling pathways in human pancreatic cancers revealed by global genomic analyses, Science 321(5897):1801-1806.
Kambara et al., Optimization of Parameters in a DNA Sequenator Using Fluorescence Detection, Nature Biotechnology 6:816-821 (1988).
Kennedy et al., 2013, Accessing more human genetic variation with short sequencing reads, American Society of Human Genetics 63rd Annual Meeting, Abstract, Oct. 22, 2013.
Kent, 2002, BLAT: the BLAST-like alignment tool, Genome Res 12(4):656-664.
Kerem, 1989, Identification of the cystic fibrosis gene: genetic analysis, Science 245:1073-1080.
Kinde, 2012, FAST-SeqS: a simple and effective method for detection of aneuploidy by massively parallel sequencing, PLoS One 7(7):e41162.
Kircher, 2010, High-throughput DNA sequencing—concepts and limitations, BioEssays 32:524-36.
Kirpekar, 1994, Matrix assisted laser desorption/ionization mass spectrometry of enzymatically synthesized RNA up to 150 kDa, Nucl Acids Res 22:3866-3870.
Klein, 2011, LOCAS—A low coverage sequence assembly tool for re-sequencing projects, PLoS One 6(8):e23455.
Kneen, 1998, Green fluorescent protein as a noninvasive intracellular pH indicator, Biophys J 74(3):1591-99.
Koboldt, 2009, VarScan: variant detection in massively parallel sequencing of individual and pooled samples, Bioinformatics 25:2283-85.
Krawitz, 2010, Microindel detection in short-read sequence data, Bioinformatics 26(6):722-729.
Kreindler, 2010, Cystic fibrosis: exploiting its genetic basis in the hunt for new therapies, Pharmacol Ther 125(2):219-229.
Krishnakumar, 2008, A comprehensive assay for targeted multiplex amplification of human DNA sequences, PNAS 105:9296-301.
Kumar, 2010, Comparing de novo assemblers for 454 transcriptome data, BMC Genomics 11:571.
Kurtz, 2004, Versatile and open software for comparing large genomes, Genome Biol 5:R12.
Lam, 2008, Compressed indexing and local alignment of DNA, Bioinformatics 24(6):791-97.
Langmead, 2009, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biol 10:R25.
Larkin, 2007, Clustal W and Clustal X version 2.0, Bioinformatics, 23(21):2947-2948.
Lecompte, 2001, Multiple alignment of complete sequences (MACS) in the post-genomic era, Gene 270(1-2):17-30.
Li, 2003, DNA binding and cleavage by the periplasmic nuclease Vvn: a novel structure with a known active site, EMBO J 22(15):4014-4025.
Li, 2008, SOAP: short oligonucleotide alignment program, Bioinformatics 24(5):713-14.
Li, 2009, Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, 25 (14):1754-60.
Li, 2009, SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics 25(15):1966-67.
Li, 2009, The Sequence Alignment/Map format and SAMtools, Bioinformatics 25(16):2078-9.
Li, 2010, Fast and accurate long-read alignment with Burrows-Wheeler transform, Bioinformatics 26(5):589-95.
Li, 2011, Improving SNP discovery by base alignment quality, Bioinformatics 27:1157.
Li, 2011, Single nucleotide polymorphism genotyping and point mutation detection by ligation on microarrays, J Nanosci Nanotechnol 11(2):994-1003.
Li, 2012, A new approach to detecting low-level mutations in next-generation sequence data, Genome Biol 13:1-15.
Li, 2014, HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads, JAMIA 21:363-373.
Lin, 2008, ZOOM! Zillions of Oligos Mapped, Bioinformatics 24:2431.
Lin, 2010, A molecular inversion probe assay for detecting alternative splicing, BMC Genomics 11(712):1-14.
Lin, 2012, Development and evaluation of a reverse dot blot assay for the simultaneous detection of common alpha and beta thalassemia in Chinese, Blood Cells Molecules, and Diseases 48(2):86-90.
Lipman, 1985, Rapid and sensitive protein similarity searches, Science 227(4693):1435-41.
Liu, 2012, Comparison of next-generation sequencing systems, J Biomed Biotech 2012:251364.
Llopis, 1998, Measurement of cytosolic, mitochondrial, and Golgi pH in single living cells with green fluorescent proteins, PNAS 95(12):6803-08.
Ma, 2006, Application of real-time polymerase chain reaction (RT-PCR), J Am Soc 1-15.
MacArthur, 2014, Guidelines for investigating causality of sequence variants in human disease, Nature 508:469-76.
Maddalena, 2005, Technical standards and guidelines: molecular genetic testing for ultra-rare disorders, Genet Med 7:571-83.
Malewicz, 2010, Pregel: a system for large-scale graph processing, Proc. ACM SIGMOD Int Conf Mgmt Data 135-46.
Thompson, 2011, The properties and applications of single-molecule DNA sequencing, Genome Biol 12(2):217.
Thorstenson, 1998, An Automated Hydrodynamic Process for Controlled, Unbiased DNA Shearing, Genome Res 8(8):848-855.
Thorvaldsdottir, 2012, Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration, Brief Bioinform 24(2):178-92.
Tkachuk, 1990, Detection of bcr-abl Fusion in Chronic Myelogeneous Leukemia by in Situ Hybridization, Science 250:559.
Tobler, 2005, The SNPlex Genotyping System: A Flexible and Scalable Platform for SNP Genotyping, J Biomol Tech 16(4):398.
Tokino, 1996, Characterization of the human p57 KIP2 gene: alternative splicing, insertion/deletion polymorphisms in VNTR sequences in the coding region, and mutational analysis, Human Genetics 96:625-31.
Turner, 2009, Massively parallel exon capture and library-free resequencing across 16 genomes, Nat Meth 6:315-316.
Turner, 2009, Methods for genomic partitioning, Ann Rev Hum Gen 10:263-284.
Umbarger, 2013, Detecting contamination in Next Generation DNA sequencing libraries, American Society of Human Genetics 63rd Annual Meeting, Abstract, Oct. 22, 2013.
Umbarger, 2014, Next-generation carrier screening, Gen Med 16(2):132-140.
Veeneman, 2012, Oculus: faster sequence alignment by streaming read compression, BMC Bioinformatics 13:297.
Wallace, 1979, Hybridization of synthetic oligodeoxyribonucleotides to ΦX174 DNA: the effect of single base pair mismatch, Nucl Acids Res 6:3543-3557.
Wallace, 1987, Oligonucleotide probes for the screening of recombinant DNA libraries, Meth Enz 152:432-442.
Wang et al., 2005, Allele quantification using molecular inversion probes (MIP), Nucleic Acids Research 33(21):e183.
Warner, 1996, A general method for the detection of large CAG repeat expansions by fluorescent PCR, J Med Genet 33(12):1022-6.
Warren, 2007, Assembling millions of short DNA sequences using SSAKE, Bioinformatics, 23:500-501.
Waszak, 2010, Systematic inference of copy-number genotypes from personal genome sequencing data reveals extensive olfactory gene content diversity, PLoS Comp Biol 6(11):e1000988.
Watson et al., 2004, Cystic fibrosis population carrier screening: 2004 revision of American College of Medical Genetics mutation panel, Genetics in Medicine 6(5):387-391.
Williams, 2003, Restriction endonucleases classification, properties, and applications, Mol Biotechnol 23(3):225-43.
Wittung, 1997, Extended DNA-Recognition Repertoire of Peptide Nucleic Acid (PNA): PNA-dsDNA Triplex Formed with Cytosine-Rich Homopyrimidine PNA, Biochemistry 36:7973.
Wu, 1998, Sequencing regular and labeled oligonucleotides using enzymatic digestion and ionspray mass spectrometry, Anal Biochem 263:129-138.
Wu, 2001, Improved oligonucleotide sequencing by alkaline phosphatase and exonuclease digestions with mass spectrometry, Anal Biochem 290:347-352.
Xu, 2012, FastUniq: A fast de novo duplicates removal tool for paired short reads, PLoS One 7(12):e52249.
Yau, 1996, Accurate diagnosis of carriers of deletions and duplications in Duchenne/Becker muscular dystrophy by fluorescent dosage analysis, J Med Gen 33(7):550-8.
Ye, 2009, Pindel: a pattern growth approach to detect break points of large deletions and medium size insertions from paired-end short reads, Bioinformatics 25(21):2865-2871.
Yershov, 1996, DNA analysis and diagnostics on oligonucleotide microchips, PNAS 93:4913-4918.
Yoo, 2009, Applications of DNA microarray in disease diagnostics, J Microbiol Biotech 19(7):635-46.
Yoon, 2014, MicroDuMIP: target-enrichment technique for microarray-based duplex molecular inversion probes, Nucl Ac Res 43(5):e28.
Yoshida, 2004, Role of BRCA1 and BRCA2 as regulators of DNA repair, transcription, and cell cycle in response to DNA damage, Cancer Science 95(11):866-71.
Yu, 2007, A novel set of DNA methylation markers in urine sediments for sensitive/specific detection of bladder cancer, Clin Cancer Res 13(24):7296-7304.
Yuan, 1981, Structure and mechanism of multifunctional restriction endonucleases, Ann Rev Biochem 50:285-319.
Zerbino, 2008, Velvet: Algorithms for de novo short read assembly using de Bruijn graphs, Genome Research 18(5):821-829.
Zhang, 2011, Is Mitochondrial tRNAphe Variant m.593T>C a Synergistically Pathogenic Mutation in Chinese LHON Families with m.11778G>A?, PLoS One 6(10):e26511.
Zhao, 2009, PGA4genomics for comparative genome assembly based on genetic algorithm optimization, Genomics 94(4):284-6.
Zheng, 2011, iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences, BMC Bioinformatics 12:453.
Zhou, 2014, Bias from removing read duplication in ultra-deep sequencing experiments, Bioinformatics 30(8):1073-1080.
Zimmerman, 2010, A novel custom resequencing array for dilated cardiomyopathy, Gen Med 12(5):268-78.
Zuckerman, 1987, Efficient methods for attachment of thiol specific probes to the 3′-ends of synthetic oligodeoxyribonucleotides, Nucl Acid Res 15(13):5305-5321.
Dahl et al., 2005, Multiplex amplification enabled by selective circularization of large sets of genomic DNA fragments, Nucleic Acids Res 33(8):e71.
Danecek, 2011, The variant call format and VCFtools, Bioinformatics 27(15):2156-2158.
de la Bastide, 2007, Assembling genome DNA sequences with PHRAP, Current Protocols in Bioinformatics 17:11.4.1-11.4.15.
Delcher, 1999, Alignment of whole genomes, Nuc Acids Res 27(11):2369-2376.
den Dunnen, 2003, Mutation Nomenclature, Curr Prot Hum Genet 7.13.1-7.13.8.
Deng et al., 2012, Supplementary Material, Nature Biotechnology, S1-1-S1-1 1, Retrieved from the Internet on Oct. 24, 2012.
Deng, 2009, Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming, Nature Biotechnology 27:353-60 (and supplement).
Deorowicz, 2013, Data compression for sequencing data, Alg for Mole Bio 8:25.
Diep, 2012, Library-free methylation sequencing with bisulfite padlock probes, Nature Methods 9:270-272 (and supplemental information).
DiGuistini, 2009, De novo sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data, Genome Biology, 10:R94.
Dolinsek, 2013, Depletion of unwanted nucleic acid templates by selection cleavage: LNAzymes, catalytically active oligonucleotides containing locked nucleic acids, open a new window for detecting rare microbial community members, App Env Microbiol 79(5):1534-1544.
Dong & Yu, 2011, Mutation surveyor: An in silico tool for sequencing analysis, Methods Mol Biol 760:223-37.
Drmanac, 1992, Sequencing by hybridization: towards an automated sequencing of one million M13 clones arrayed on membranes, Electrophoresis 13:566-573.
Dudley, 2009, A quick guide for developing effective bioinformatics programming skills, PLoS Comp Biol 5(12):e1000589.
Ericsson, 2008, A dual-tag microarray platform for high-performance nucleic acid and protein analyses, Nucl Acids Res 36:e45.
Examination Report from the European Patent Office for EP 10770071.8 dated Jul. 16, 2013, (5 pages).
Extended European Search Report for Application No. 12765217.0 dated Aug. 26, 2014, (5 pages).
Extended European Search Report dated Nov. 11, 2015, for EP Application 13772357.3 (8 pages).
Fares, 2008, Carrier frequency of autosomal-recessive disorders in the Ashkenazi Jewish population: should the rationale for mutation choice for screening be reevaluated?, Prenatal Diagnosis 28:236-41.
Faulstich, 1997, A sequencing method for RNA oligonucleotides based on mass spectrometry, Anal Chem 69:4349-4353.
Faust, 2014, SAMBLASTER: fast duplicate marking and structural variant read extraction, Bioinformatics published online May 7, 2014.
Fitch, 1970, Distinguishing homologs from analogous proteins, Syst Biol 19(2):99-113.
Frey, 2006, Statistics Hacks 108-115.
Friedenson, 2005, BRCA1 and BRCA2 Pathways and the Risk of Cancers Other Than Breast or Ovarian, Medscape General Medicine 7(2):60.
Furtado, 2011, Characterization of large genomic deletions in the FBN1 gene using multiplex ligation-dependent probe amplification, BMC Med Gen 12:119-125.
Garber, 2008, Fixing the front end, Nat Biotech 26(10):1101-1104.
Gemayel, 2010, Variable tandem repeats accelerate evolution of coding and regulatory sequences, Ann Rev Genet 44:445-77.
Giusti, 1993, Synthesis and Characterization of 5′-Fluorescent-dye-labeled Oligonucleotides, PCR Meth Appl 2:223-227.
Glover, 1995, Sequencing of oligonucleotides using high performance liquid chromatography and electrospray mass spectrometry, Rapid Com Mass Spec 9:897-901.
Gnirke et al., 2009, Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing, Nature Biotechnology 27:182-9.
Goto, 1994, A Study on Development of a Deductive Object-Oriented Database and Its Application to Genome Analysis, PhD Thesis, Kyushu University, Kyushu, Japan (106 pages).
Goto, 2010, BioRuby: bioinformatics software for the Ruby programming language, Bioinformatics 26(20):2617-2619.
Green, 2005, Suicide polymerase endonuclease restriction, a novel technique for enhancing PCR amplification of minor DNA template, Appl Env Microbiol 71(8):4721-4727.
Guerrero-Fernandez, 2013, FQbin: a compatible and optimized format for storing and managing sequence data, IWBBIO Proceedings, Granada 337-344.
Gupta, 1991, A general method for the synthesis of 3′-sulfhydryl and phosphate group containing oligonucleotides, Nucl Acids Res 19(11):3019-3025.
Gustincich et al., 1991, A fast method for high-quality genomic DNA extraction from whole human blood, BioTechniques 11(3):298-302.
Gut, 1995, A procedure for selective DNA alkylation and detection by mass spectrometry, Nucl Acids Res 23(8):1367-1373.
Hallam, 2014, Validation for Clinical Use of, and Initial Clinical Experience with, a Novel Approach to Population-Based Carrier Screening using High-Throughput Next-Generation DNA Sequencing, J Mol Diagn 16:180-9.
Hammond, 1996, Extraction of DNA from preserved animal specimens for use in randomly amplified polymorphic DNA analysis, Anal Biochem 240:298-300.
Hardenbol, 2003, Multiplexed genotyping with sequence-tagged molecular inversion probes, Nat Biotech 21:673-8.
Hardenbol, 2005, Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay, Genome Res 15:269-75.
Harris, 2006, Defects can increase the melting temperature of DNA-nanoparticle assemblies, J Phys Chem B 110(33):16393-6.
Harris, 2008, Helicos True Single Molecule Sequencing (tSMS), Science 320:106-109.
Harris, 2008, Single-molecule DNA sequencing of a viral genome, Science 320(5872):106-9.
Heger, 2006, Protonation of Cresol Red in Acidic Aqueous Solutions Caused by Freezing, J Phys Chem B 110(3):1277-1287.
Heid, 1996, Real time quantitative PCR, Genome Res 6:986-994.
Hiatt, 2013, Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation, Genome Res 23:843-54.
Hodges, 2007, Genome-wide in situ exon capture for selective resequencing, Nat Genet 39(12):1522-7.
Holland, 2008, BioJava: an open-source framework for bioinformatics, Bioinformatics 24(18):2096-2097.
Homer et al., 2008, Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays, PLoS One 4(8):e1000167.
Related Publications (1)
Number Date Country
20180298440 A1 Oct 2018 US
Provisional Applications (1)
Number Date Country
61548073 Oct 2011 US
Continuations (2)
Number Date Country
Parent 14984644 Dec 2015 US
Child 15818165 US
Parent 13616788 Sep 2012 US
Child 14984644 US