A significant challenge facing testing laboratories is quality control. Some reports have indicated that mutations in cancer genes were correctly identified by only 70% of testing laboratories (Bellon, et al. External Quality Assessment for KRAS Testing Is Needed: Setup of a European Program and Report of the First Joined Regional Quality Assessment Rounds. Oncologist. 2011 April; 16(4): 467-478). Questions have been raised regarding how to monitor next generation sequencing and assays as well as the concordance of variant calls across multiple platforms, library preparation methods, and bioinformatic pipelines. Compositions and methods providing a flexible, single reagent representing specific genetic variants are desired by those of ordinary skill in the art and are described herein.
The disclosure provides compositions, controls, plasmids, cells, methods and kits comprising nucleic acid molecules.
In one embodiment, a nucleic acid molecule comprising multiple variants of a reference sequence is disclosed. In other embodiments, a mixture or combination of nucleic acid molecules comprising variants of the reference sequence is disclosed.
In certain embodiments, the nucleic acid molecule or mixture of nucleic acid molecules comprises one or more variants present at a high or low frequency.
In certain embodiments, the disclosure provides a control reagent comprising multiple nucleic acid molecules.
In yet another embodiment, a kit comprising at least one nucleic acid molecule or mixture of nucleic acid molecules comprising variants is disclosed.
In another embodiment, a method for confirming the validity of a sequencing reaction is disclosed. The method comprises including a known number of representative sequences and/or variants thereof in a mixture comprising a test sample potentially comprising a test nucleic acid sequence, and sequencing the nucleic acids in the mixture, wherein detection of all of the representative sequences and/or variants in the mixture indicates the sequencing reaction was accurate.
The disclosure also provides a composition comprising multiple nucleic acid species wherein the nucleic acid sequence of each species differs from its neighbor species by a predetermined percentage.
In certain embodiments, a method is provided that comprises sequencing a nucleic acid species in order to calibrate a sequencing instrument.
In yet other embodiments, the disclosure provides plasmids and cells encoding the nucleic acids or mixture of nucleic acids disclosed herein.
The disclosure also provides a plasmid and/or a cell comprising multiple nucleic acid species wherein the nucleic acid sequence of each species differs from its neighbor species by a predetermined percentage.
The disclosure further provides a frequency ladder. The frequency ladder comprises a plurality of variants at different frequencies.
The disclosure also provides a method for preparing a formalin-fixed, paraffin-embedded (FFPE) control, the method comprising: a) obtaining a defined concentration of cellular material; b) introducing into the cellular material a nucleic acid molecule or mixture of nucleic acid molecules comprising multiple variants of a reference sequence or a mixture of variants with the reference sequence; c) mixing the cellular material of b) with a gelling polymer, creating a gel/cellular material; and d) adding the gel/cellular material to a mold with a defined shape until the gelling polymer solidifies.
In certain embodiments, the method is carried out with a mixture of variants, wherein the variants comprise at least one single nucleotide polymorphism (SNP), multiple nucleotide polymorphisms (MNP), insertion, deletion, copy number variation, gene fusion, duplication, inversion, repeat polymorphism, homopolymer of a reference sequence, and/or a non-human sequence.
In yet other embodiments, the method is carried out with a nucleic acid molecule or mixture of nucleic acid molecules comprising at least 30 variants. In other embodiments, the nucleic acid molecule or mixture of nucleic acid molecules used in the methods comprises a variant that is related to cancer, an inherited disease, and/or an infectious disease.
The disclosure also provides for a kit comprising a formalin fixed paraffin-embedded (FFPE) control produced by the method of the invention.
Provided herein are compositions, methods, kits, plasmids, and cells comprising nucleic acid reference sequences and variants of a reference sequence. The compositions disclosed herein have a variety of uses, including but not limited to, assay optimization, validation, and calibration; peer-to-peer comparison; training and PT/EQA, QC monitoring, reagent QC, and system installation assessment.
There is a recognized need in the market for flexible, reliable control materials for NGS testing (see Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice; Next Generation Sequencing: Standardization of Clinical Testing (Nex-SToCT) Working group Principles and Guidelines, Nature Biotechnology, doi:10.1038/nbt.2403; and ACMG Clinical laboratory standards for next generation sequencing, American College of Medical Genetics and Genomics, doi: 10.1038/gim.2013.92). This disclosure provides such control materials.
This disclosure relates to control reagents representing reference sequences and/or variants thereof (e.g., mutations) that may be used for various purposes such as, for instance, assay validation/quality control in sequencing reactions (e.g., next generation sequencing (NGS) assays). Traditional metrics used to characterize the quality of a sequencing reaction include, for instance, read length, minimum quality scores, percent target-mapped reads, percent pathogen-specific reads, percent unique reads, coverage levels, uniformity, percent of non-covered targeted bases and/or real-time error rate. Parameters that may affect quality include, for instance, the types and/or number of analytes being monitored (e.g., the types and number of polymorphisms (single or multiple nucleotide polymorphisms (SNPs, MNPs)), insertions and/or deletions, amplicons, assay contexts and/or limits of detection), sample type (e.g., mammalian cells, infectious organism, sample source), commutability (e.g., validation across multiple technology platforms and/or types of screening panels being utilized), sample preparation (e.g., library preparation type/quality and/or type of sequencing reaction (e.g., run conditions, sequence context)), and/or other parameters. Those of ordinary skill in the art realize, for instance, that the quality of such reactions may vary between laboratories due to subtle differences in guidelines, the metrics and parameters mentioned above, the reference standards used, and the fact that many NGS technologies are highly complex and evolving. This disclosure provides quality control reagents that may be used in different laboratories, under different conditions, with different types of samples, and/or across various technology platforms to confirm that assays are being carried out correctly and that results from different laboratories may be reliably compared to one another (e.g., that each is of suitable quality).
In some embodiments, the problem of confirming the quality of a sequencing reaction is solved using a multiplex control comprising multiple nucleic acid fragments, each representing a different variant of a reference sequence.
In certain embodiments, a control reagent for use in sequencing reactions is provided. The control reagent may comprise one or more components that may be used alone or combined to assess the quality of a particular reaction. For instance, some assays are carried out to identify genetic variants present within a biological sample. The control reagents described herein may also provide users with the ability to compare results between laboratories, across technology platforms, and/or with different sample types. For instance, in some embodiments, the control reagent may represent a large number of low percentage (e.g., low frequency) variants of different cancer-related genes that could be used to detect many low percentage variants in a single assay and/or confirm the reliability of an assay. The control reagent could be used to generate numerous data points to compare reactions (e.g., run-to-run comparisons). The control reagent may be used to determine the reproducibility of variant detection over time across multiple variables. The control reagent may be used to assess the quality of a sequencing run (i.e., that the instrument has sufficient sensitivity to detect the included variants at the given frequencies). The control reagent may also be used to differentiate between a proficient and a non-proficient user by comparing their sequencing runs, and/or to differentiate the quality of reagents between different lots. The control reagent may also aid in assay validation studies, as many variants are combined in one sample material. This obviates the need for multiple samples containing one or two variants each, and greatly shortens the work and time required to validate the assay.
The control reagent typically comprises one or more nucleic acid (e.g., DNA, RNA, circular RNA, hairpin DNA and/or RNA) fragments containing a defined reference sequence of a reference genome (defined as chromosome and nucleotide range) and/or one or more variants of the reference sequence. The source material for the variants may be genomic DNA, synthetic DNA, and combinations thereof. A variant typically includes nucleotide sequence variations relative to the reference sequence. The variant and reference sequence typically share at least 50% or about 75-100% (e.g., any of about 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99%) sequence identity. In some embodiments, however, the identity shared may be significantly less where, for example, the variant represents a deletion or insertion mutation (either of which may be up to several kilobases or more). An exemplary deletion may be, for instance, the recurrent 3.8 kb deletion involving exons 17a and 17b within the CFTR gene as described by Tang, et al. (J. Cystic Fibrosis, 12(3): 290-294 (2013) (describing a c.2988+1616_c.3367+356del3796ins62 change, flanked by a pair of perfectly inverted repeats of 32 nucleotides)). In some embodiments, variants may include at least one of a single nucleotide polymorphism (SNP), one or more multiple nucleotide polymorphism(s) (MNP), insertion(s), deletion(s), copy number variation(s), gene fusion(s), duplication(s), inversion(s), repeat polymorphism(s), homopolymer(s), non-human sequence(s), or any combination thereof. Such variants (which may include by reference any combinations) may be included in a control reagent as part of the same or different components. The reference sequence(s) and/or variants may be arranged within a control reagent as cassettes.
A cassette contains a reference sequence or variant adjoined and/or operably linked to one or more restriction enzyme site(s), sequencing primer site(s), and/or hairpin-forming site(s). In some embodiments, it may be useful to include different types of sequences adjacent to each cassette; for instance, it may be useful to design one cassette to be adjacent to a restriction enzyme site and/or a hairpin sequence. Doing so may help prevent problems such as cross-amplification between adjacent fragments/cassettes. As such, each reference sequence and/or variant may be releasable and/or detectable separate from any other reference sequence and/or variant. The typical cassette may be about 400 bp in length but may vary between 50-20,000 bp (e.g., such as about any of 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 2500, 3000, 4000, 5000, 6000, 7000, 7500, 8000, 9000, 10000, 12500, 15000, 17500, or 20000 bp). Each control reagent may comprise one or more cassettes, each representing one or more reference sequence(s) and/or variant(s) (e.g., each being referred to as a “control sequence”). Each reference sequence and/or variant may be present in a control sequence and/or control reagent at a percentage of about any of 0.1% to 100% (e.g., about any of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2.5, 5, 7.5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100%). For instance, a control sequence or control reagent that is 100% reference sequence or variant would be a reagent representing only one reference sequence or variant. Similarly, a control sequence or control reagent comprising 50% of a variant would be a control reagent representing only up to two reference sequences and/or variants. The remaining percentage could consist of other sequences such as control sequences and the like.
In certain embodiments, a T7 or other promoter can be present upstream of each cassette. This allows for massively parallel transcription of many gene regions. This technique facilitates construction of a control containing equivalent amounts of each target sequence. When there are equivalent amounts of many targets, ease of use of the control is increased. For example, contamination of a patient sample by the control would be easier to detect because all transcripts would show up in the contaminated sample. Because it is highly unlikely for patient samples to contain, e.g., a large number of fusion transcripts, such an assay result would signal to the user that a contamination issue is present. This is in contrast to a situation in which only one transcript is present at a much higher abundance in a contaminated sample, which could lead to the contaminant being mistaken for a true positive signal. The ability to construct a control with equivalent amounts of each target sequence eliminates the potential for this type of error.
In certain embodiments, the reference sequence(s) and/or variants may be adjoined and/or operably linked to one or more different restriction enzyme sites, sequencing primer site(s), and/or hairpin-forming sites. As described above, certain designs may be used to prevent problems such as cross-amplification between reference sequences and/or variants. In some embodiments, the control sequences and/or cassettes may optionally be arranged such that the same are releasable from the control reagent. This may be accomplished by, for instance, including restriction enzyme (RE) sites at either end of the control sequence. A control reagent may therefore be arranged as follows: RE site/control sequence/RE site. The RE sites may be the same and/or different from one another. The RE sites in one cassette may also be the same and/or different to those present in any other cassette. As such, the control sequences may be released from the cassette as desired by the user by treating the control sequence with one or more particular restriction enzymes.
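The RE site/control sequence/RE site arrangement described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `build_cassette`, the `RE_SITES` table, and the choice of EcoRI and BamHI are assumptions made for the example and are not taken from the disclosure. The internal-site check is one possible design safeguard, so that digestion releases the control sequence intact rather than cutting within it.

```python
# Illustrative sketch; names and the selected enzymes are assumptions.
# EcoRI (GAATTC) and BamHI (GGATCC) recognition sites are palindromic,
# so scanning one strand is sufficient for these two enzymes.
RE_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}


def build_cassette(control_seq, left_enzyme, right_enzyme):
    """Return 'RE site / control sequence / RE site' as a single string.

    Raises ValueError if either flanking site also occurs inside the
    control sequence, because digestion would then cut the control
    sequence itself instead of only releasing it.
    """
    control_seq = control_seq.upper()
    left = RE_SITES[left_enzyme]
    right = RE_SITES[right_enzyme]
    for name, site in ((left_enzyme, left), (right_enzyme, right)):
        if site in control_seq:
            raise ValueError(
                f"{name} site {site} occurs inside the control sequence"
            )
    return left + control_seq + right


if __name__ == "__main__":
    # Short excerpt used purely as a stand-in for a full-length cassette.
    print(build_cassette("CCAATCTC", "EcoRI", "BamHI"))
```

Using different sites on each side, as here, corresponds to the case in which the RE sites differ from one another; passing the same enzyme name twice models the case in which they are the same.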
In some embodiments, the control reagent may comprise multiple components that may be used together. In certain embodiments, the multiple components comprise a first and a second component which may be plasmids comprising different control sequences and/or different arrangements of the same control sequences. Thus, the components may represent the same or different reference sequences and/or variants. Such components may be used together as a panel, for instance, such that a variety of reference sequences and/or variants may be assayed together. Where the reference sequences and/or variants are the same, each component may include those variants in different cassette arrangements and/or forms. In some embodiments, the multiple components may comprise a first component representing one or more SNP variants and a second component representing one or more multiple nucleotide polymorphism(s), insertion(s), deletion(s), copy number variation(s), gene fusion(s), duplication(s), inversion(s), repeat polymorphism(s), homopolymer(s), and/or non-human sequence(s). The components may be the same or different types of nucleic acids such as plasmids, with each comprising the same or different variants of one or more reference sequences arranged as described herein or as may be otherwise determined to be appropriate by one of ordinary skill in the art. In some embodiments, different types of plasmids may be combined to provide a multi-component control reagent representing many different reference sequences and/or variants.
Plasmids can be quantified by any known means. In one embodiment, quantitation of each plasmid is performed using a non-human ‘xeno’ digital PCR target sequence. The exact copy number of the plasmid is determined. The exact copy number of genomic DNA is also determined (obtained by quantification of genomic target site(s)). With this information, controls can be accurately and reproducibly developed that contain all targets/variants within a tight frequency range.
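Once plasmid and genomic copy numbers are known, the amount of plasmid needed to reach a target variant frequency follows from simple arithmetic. The sketch below assumes, for illustration only, that the plasmid contributes only variant copies and the genomic DNA contributes only wild-type copies of the target, so that frequency = plasmid copies / (plasmid copies + genomic copies). The function names are hypothetical.

```python
# Illustrative sketch; assumes plasmid = variant only, genomic = wild type only.

def plasmid_copies_for_frequency(genomic_copies, target_frequency):
    """Plasmid copies to add to a given number of genomic target copies
    so that the variant is present at target_frequency (a fraction, 0-1).

    Derived from f = P / (P + G), which rearranges to P = f * G / (1 - f).
    """
    if not 0 < target_frequency < 1:
        raise ValueError("target_frequency must be between 0 and 1 (exclusive)")
    return target_frequency * genomic_copies / (1.0 - target_frequency)


def expected_frequency(plasmid_copies, genomic_copies):
    """Variant frequency expected from the measured copy numbers."""
    return plasmid_copies / (plasmid_copies + genomic_copies)


if __name__ == "__main__":
    # Hypothetical copy numbers, e.g. as measured by digital PCR.
    genomic = 9500
    needed = plasmid_copies_for_frequency(genomic, 0.05)
    print(round(needed), round(expected_frequency(needed, genomic), 4))
```

Because both inputs are absolute copy numbers, the same calculation can be repeated for every plasmid in a multi-component reagent to keep all variants within a narrow frequency range.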
The variants may be contained within the control reagent as DNA fragments, each containing a defined sequence derived from a reference genome (defined as chromosome and nucleotide range) with one or more variations (e.g., nucleotide differences) introduced into the fragment. A variant may be, for instance, a sequence having one or more nucleotide sequence differences from the defined sequence (e.g., a reference sequence). For instance, an exemplary reference sequence may comprise “hotspots” suitable for modification. Such hotspots may represent nucleotides and/or positions in a reference sequence at which variation occurs in nature (e.g., mutations observed in cancer cells). One or more of such hotspots may be modified by changing one or more nucleotides therein to produce a control sequence (or portion thereof) that may be incorporated into a control reagent. For example, modification of the exemplary epidermal growth factor receptor (EGFR) Ex19 reference sequence to produce control sequences (Hotspots 1, 2, 3, 4, 5) is shown below (see also,
Wild Type (e.g., EGFR Ex19): CCAAGCTC (SEQ ID NO: 1) . . . AGGATCTTGA (SEQ ID NO: 2) . . . AACTGAATTC (SEQ ID NO: 3) . . . AAAAAG (SEQ ID NO: 4) . . . ATCAAAGTGC (SEQ ID NO: 5) (400 bp)
Hotspot ID 1: CCAATCTC (SEQ ID NO: 6) . . . AGGATCTTGA (SEQ ID NO: 2) . . . AACTGAATTC (SEQ ID NO: 3) . . . AAAAAG (SEQ ID NO: 4) . . . ATCAAAGTGC (SEQ ID NO: 5)
Control Sequence Containing Multiple Hotspots: CCAATCTC (SEQ ID NO: 6; HOTSPOT ID 1) . . . AGGAACTTGA (SEQ ID NO: 7; HOTSPOT ID 2) . . . AACTCAATTC (SEQ ID NO: 8; HOTSPOT ID 3) . . . ATAAAG (SEQ ID NO: 9; HOTSPOT ID 4) . . . ATGAAAGTGC (SEQ ID NO: 10; HOTSPOT ID 5)
This exemplary control sequence thereby represents multiple EGFR variants (e.g., Hotspot IDs 1, 2, 3, 4, 5, etc.). A control reagent may comprise multiple control sequences, each representing one or more variants of the same or different reference sequences. Any number of variants may be represented by a control sequence, and any number of control sequences may be included in a control reagent. A control reagent may comprise, for instance, a number of variants such that all possible variants of a particular reference sequence are represented by a single control reagent. For instance, the control reagent may comprise multiple SNPs, MNPs, deletions, insertions and the like, each representing a different variant of the reference sequence. Additional, exemplary, non-limiting variants are shown in Tables 1A and 1B and Table 6.
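The single-nucleotide changes shown in the EGFR Ex19 excerpts above can be reproduced with a short Python sketch. The function name `apply_substitutions` and the 0-based positions are assumptions for illustration; the positions are local to each short excerpt and are not genomic coordinates.

```python
# Illustrative sketch; positions are 0-based offsets within each excerpt.

def apply_substitutions(reference, substitutions):
    """Return a copy of `reference` with each (position, ref_base, alt_base)
    substitution applied.

    The reference base is checked before it is replaced, so that a wrong
    coordinate raises an error instead of silently producing a wrong
    control sequence.
    """
    bases = list(reference)
    for pos, ref_base, alt_base in substitutions:
        if bases[pos] != ref_base:
            raise ValueError(
                f"expected {ref_base} at position {pos}, found {bases[pos]}"
            )
        bases[pos] = alt_base
    return "".join(bases)


if __name__ == "__main__":
    # Wild-type excerpts (SEQ ID NOs: 1-5) and the change at each hotspot.
    hotspots = [
        ("CCAAGCTC", (4, "G", "T")),    # Hotspot ID 1
        ("AGGATCTTGA", (4, "T", "A")),  # Hotspot ID 2
        ("AACTGAATTC", (4, "G", "C")),  # Hotspot ID 3
        ("AAAAAG", (1, "A", "T")),      # Hotspot ID 4
        ("ATCAAAGTGC", (2, "C", "G")),  # Hotspot ID 5
    ]
    for wild_type, change in hotspots:
        print(wild_type, "->", apply_substitutions(wild_type, [change]))
```

Passing several substitutions in one call yields a control sequence carrying multiple hotspots, mirroring the multi-hotspot control sequence shown above.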
Control reagents may also be designed to represent multiple types of control sequences. For instance, control reagents may be designed that represent multiple types of reference sequences and/or variants thereof (which may be found in control sequences alone or in combination). Exemplary categories of control sequences for which the control reagents described herein could have relevance include not only the aforementioned cancer-related areas but also fields of inherited disease, microbiology (e.g., with respect to antibiotic resistance mutations, immune-escape related mutations), agriculture (e.g., plant microbe and/or drug resistance-related mutations), livestock (e.g., mutations related to particular livestock traits), food and water testing, and other areas. Exemplary combinations (e.g., panels) of cancer-related reference sequences that may be represented by a particular control reagent (or combinations thereof) are shown in Table 2.
The control reagents and methods for using the same described herein may provide consistent control materials for training, proficiency testing and quality control monitoring. For instance, the control reagents may be used to confirm that an assay is functioning properly by including a specific number of representative sequences and/or variants thereof that should be detected in an assay and then calculating the number that were actually detected. This is exemplified by the data presented in Table 3:
As illustrated in Table 3, a “bad run” is identified where the number of variants detected does not match the number of variants expected to be detected (e.g., included in the assay). As shown in the exemplary assay of Table 3, if a particular control reagent (or combination thereof) used in an assay includes 15 representative sequences and/or variants thereof, all 15 should be detected if the assay is properly carried out. If fewer than 15 of these control sequences are detected, the assay is identified as inaccurate (e.g., a “Bad Run”). If all 15 of the sequences are detected, the assay is identified as accurate (e.g., a “Good Run”). Variations of this concept are also contemplated herein, as would be understood by those of ordinary skill in the art.
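The good run/bad run decision described above amounts to comparing the set of expected control variants with the set actually detected. A minimal Python sketch follows; the function name `classify_run` and the variant identifiers are hypothetical, and a real pipeline would likely match variants by genomic coordinate and allele rather than by label.

```python
# Illustrative sketch; variant identifiers are placeholder labels.

def classify_run(expected_variants, detected_variants):
    """Classify a sequencing run from its control-variant calls.

    Returns (status, number_of_expected_variants_detected, missing), where
    status is "Good Run" only if every expected variant was detected.
    Detected variants that are not part of the control are ignored here.
    """
    expected = set(expected_variants)
    detected = set(detected_variants)
    missing = sorted(expected - detected)
    status = "Good Run" if not missing else "Bad Run"
    return status, len(expected) - len(missing), missing


if __name__ == "__main__":
    expected = [f"V{i}" for i in range(1, 16)]  # 15 control variants
    print(classify_run(expected, expected))
    print(classify_run(expected, expected[:13]))
```

Returning the list of missing variants, rather than only the status, makes it easier to see whether failures cluster in a particular variant type or amplicon.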
In certain embodiments, the control reagent may be prepared by mixing variant DNA fragments (e.g., as may be incorporated into a plasmid) with genomic DNA or synthesized DNA comprising “wild-type” (e.g., non-variant) sequence. Such sequence may be obtained from or present in control cells (e.g., naturally occurring or engineered/cultured cell lines). In some embodiments, the wild-type sequence may be included on a DNA fragment along with the variant sequence, or the variant sequences may be transfected into and/or mixed with cells (e.g., control cells). In certain embodiments, such mixtures may be used to prepare formalin-fixed, paraffin-embedded (FFPE) samples (e.g., control FFPE samples), for example. For instance, in some embodiments, the control reagent may be prepared and tested by designing a control sequence (e.g., an amplicon) comprising a representative sequence and/or variant thereof; designing restriction sites to surround each amplicon; synthesizing a nucleic acid molecule comprising a cassette comprising the amplicon and the restriction sites; and incorporating the cassette into a plasmid backbone. The construct may then be tested by sequencing it alone (e.g., providing an expected frequency of 100%) or after mixing the same with, for example, genomic DNA at particular expected frequencies (e.g., 50%). Such constructs may also be mixed with cells for various uses, including as FFPE controls.
In certain embodiments, the control reagents described herein can also be used to provide a frequency ladder. A frequency ladder is composed of many variants at different frequencies. In some embodiments, the control reagent could be used to provide a “ladder” in, for example, 5% increments of abundance (e.g., about any of 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% abundance). For example, the ladder could be constructed by taking a single sample with many different variants present at a high frequency (e.g., 80% allele frequency) and making dilutions down to low frequencies. Alternatively, the ladder could be a single sample containing variants at different frequencies. The ladder could be used as a reference for many sample types, including somatic variants at low abundance (e.g., tumor single nucleotide polymorphisms), or germline variants present at, as a non-limiting example, about 50% abundance. Such a ladder may also be used to determine instrument limits of detection for many different variants at the same time. Because all variants are present in a single sample rather than many, this saves users the time otherwise spent finding materials that each contain one to a few variants, as well as the resources needed for testing them. An example is provided in Table 8:
As shown in Table 8, a ladder was constructed by diluting a sample containing 555 variants starting at approximately 50% frequency down to ~3% frequency. The ladder was tested in duplicate with the Ion AMPLISEQ® Cancer Hotspot Panel v2 on the Ion Torrent PERSONAL GENOME MACHINE® (PGM). The frequencies for 35 of the variants are reported for each sample tested. The shaded cells indicate that the variant was not detected. Such data could be used to establish the limit of detection for each variant.
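One simple way to derive a per-variant limit of detection from ladder data of this kind is sketched below in Python. The definition used here (the lowest expected frequency at which the variant was detected in every replicate, with no failures at any higher level) is an assumption chosen for illustration, as are the function name and the example frequencies; none of the values are taken from Table 8.

```python
# Illustrative sketch; the LoD definition and the data are assumptions.

def limit_of_detection(calls):
    """Estimate a limit of detection for one variant.

    `calls` maps each expected frequency (in percent) to a list of booleans,
    one per replicate, indicating whether the variant was detected.
    Levels are walked from highest to lowest; the result is the lowest level
    reached before the first level with a missed or absent replicate.
    Returns None if even the highest level was not fully detected.
    """
    lod = None
    for freq in sorted(calls, reverse=True):
        replicates = calls[freq]
        if replicates and all(replicates):
            lod = freq
        else:
            break
    return lod


if __name__ == "__main__":
    # Hypothetical duplicate results for a single variant across a ladder.
    variant_calls = {
        50: [True, True],
        25: [True, True],
        12.5: [True, True],
        6: [True, False],
        3: [False, False],
    }
    print(limit_of_detection(variant_calls))
```

Applying the function to every variant in the ladder gives a table of limits of detection from a single dilution series, which is the use contemplated for the ladder above.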
The ladder could be used across many platforms, including Sanger sequencing and next generation platforms, and both RUO and IVD applications could benefit from use of this standard. The frequency ladder could also serve as internal controls in sequencing reactions, much like the 1 kb DNA ladder serves as a reference in almost every agarose gel. As an example, one design would provide five unique and five identical sequences as shown in
One of ordinary skill in the art would understand that the control reagents described herein are broadly useful in a variety of sequencing systems and/or platforms. For instance, the control reagents described herein may be used in any type of sequencing procedure including but not limited to Ion Torrent semiconductor sequencing, Illumina MISEQ®, capillary electrophoresis, microsphere-based systems (e.g., Luminex), Roche 454 system, DNA replication-based systems (e.g., SMRT by Pacific Biosciences), nanoball- and/or probe-anchor ligation-based systems (Complete Genomics), nanopore-based systems and/or any other suitable system.
One of ordinary skill in the art would also understand that the control reagents described herein are broadly useful in a variety of nucleic acid amplification-based systems and/or platforms. The control reagents described herein may be used in and/or with any in vitro system for multiplying the copies of a target sequence of nucleic acid, as may be ascertained by one of ordinary skill in the art. Such systems may include, for instance, linear, logarithmic, and/or any other amplification method including both polymerase-mediated amplification reactions (such as polymerase chain reaction (PCR), helicase-dependent amplification (HDA), recombinase-polymerase amplification (RPA), and rolling circle amplification (RCA)), as well as ligase-mediated amplification reactions (such as ligase detection reaction (LDR), ligase chain reaction (LCR), and gap-versions of each), and combinations of nucleic acid amplification reactions such as LDR and PCR (see, for example, U.S. Pat. No. 6,797,470). Such systems and/or platforms may therefore include, for instance, PCR (U.S. Pat. Nos. 4,683,202; 4,683,195; 4,965,188; and/or 5,035,996), isothermal procedures (using one or more RNA polymerases (see, e.g., PCT Publication No. WO 2006/081222)), strand displacement (see, e.g., U.S. Pat. No. RE39007E), partial destruction of primer molecules (see, e.g., PCT Publication No. WO 2006/087574), ligase chain reaction (LCR) (see, e.g., Wu, et al., Genomics 4: 560-569 (1990); and/or Barany, et al., Proc. Natl. Acad. Sci. USA 88:189-193 (1991)), Qβ RNA replicase systems (see, e.g., PCT Publication No. WO 1994/016108), RNA transcription-based systems (e.g., TAS, 3SR), rolling circle amplification (RCA) (see, e.g., U.S. Pat. No. 5,854,033; U.S. Patent Application Publication No. 2004/265897; Lizardi et al., Nat. Genet. 19: 225-232 (1998); and/or Banér et al., Nucleic Acids Res. 26: 5073-5078 (1998)), and/or strand displacement amplification (SDA) (Little, et al., Clin. Chem. 45:777-784 (1999)), among others. These systems, along with the many other systems available to the skilled artisan, may be suitable for use with the control reagents described herein.
In one embodiment, a control reagent may be designed and tested using one or more of the steps below:
In certain embodiments, individual cassettes can be synthesized for all genes of interest and combined with wild type. In certain embodiments, a cassette can be designed with a plurality of variants that do not interfere with the detection of variants near or adjacent thereto.
In some embodiments, NGS may be performed using the Ion Personal Genome Machine (PGM) by first constructing libraries following the Ion AMPLISEQ® Library Preparation Manual with AMPLISEQ® Cancer Hotspot Panel v2 reagents; preparing template-positive Ion sphere particles (ISPs) and enriching the same using the Ion OneTouch2 instrument following the Ion PGM Template OT2 200 Kit Manual; sequencing following the Ion PGM Sequencing 200 Kit v2 Manual, or sequencing on the Illumina MISEQ® following the TRUSEQ® Amplicon Cancer Panel user manual or the Illumina MISEQ® user manual; and performing data analysis for PGM using the Torrent Variant Caller v3.4 and v3.6, and for MISEQ® using the MISEQ® Reporter v2.3.
The reagents and methods described herein may be used in a variety of settings with a variety of samples. For instance, these reagents and methods may be used to analyze biological samples such as serum, whole blood, saliva, tissue, urine, dried blood on filter paper (e.g., for newborn screening), nasal samples, stool samples or the like obtained from a patient and/or preparations thereof (e.g., FFPE preparations). In some embodiments, control preparations comprising the control reagents described herein may be provided.
This disclosure further relates to kits comprising one or more control reagents described herein. The kits may be used to carry out the methods described herein or others available to those of ordinary skill in the art along with, optionally, instructions for use. A kit may include, for instance, control sequence(s) including multiple reference sequences and/or variations thereof in the form of, for instance, one or more plasmids. In some embodiments, the kit may contain a combination of control sequences organized to provide controls for many variations of one or more reference sequences. In some embodiments, the variations may relate to an oncogene that is diagnostic for a particular cancer. In some embodiments, for instance, the kit may comprise control reagents and/or control samples (e.g., tissue samples) known to cover the breadth of mutations known for a particular cancer. In some embodiments, the variations of the marker are variations of a mutation in a gene that are prognostic for the usefulness of treating with a drug. In some embodiments, the marker or markers are for a particular disease and/or a variety of diseases (e.g., cancer, infectious disease). In some embodiments, the control reagent(s) may be included in a test to ascertain the efficacy of a drug in testing for the presence of a disease and/or progression thereof. In some embodiments, the kit may comprise control reagents for testing for a series of diseases that have common characteristics and/or symptoms (e.g., related diseases). In some embodiments, the marker may have unknown significance but may otherwise be of interest to the user (e.g., for basic research purposes). The kit may also include a container (e.g., vial, test tube, flask, bottle, syringe or other packaging system (e.g., including injection or blow-molded plastic containers) into which one or more control reagents may be placed/contained, and in some embodiments, aliquoted).
Where more than one component is included in the kit, it will generally include at least one second, third or other additional container into which the additional components can be separately placed. Various combinations of components may also be packaged in a single container. The kits may also include reagent containers in close confinement for commercial sale. When the components of the kit are provided in one or more liquid solutions, the liquid solution comprises an aqueous solution that may be a sterile aqueous solution. As mentioned above, the kit may also include instructions for employing the kit components as well as the use of any other reagent not included in the kit. Instructions may include variations that may optionally be implemented. The instructions may be provided as a separate part of the kit (e.g., a paper or plastic insert or attachment) or as an internet-based application. In some embodiments, the kit may comprise control reagents relating to any number of reference sequences and/or variants thereof which may be detected alone or in combination with one another (e.g., a multiplex assay). In some embodiments, the kit may also comprise at least one other sample containing a defined amount of control reagent and “control” test cell admixed such that the same may provide a reference point for the user. Kits may further comprise one or more of a polymerase and/or one or more oligonucleotide primers. Other variations and arrangements for the kits of this disclosure are contemplated as would be understood by those of ordinary skill in the art.
Thus, in some embodiments, the disclosure provides a nucleic acid molecule or mixture of nucleic acid molecules comprising multiple variants of a reference sequence, each variant sequence may optionally be releasable from the nucleic acid molecule. In certain embodiments, the nucleic acid molecule or mixture of nucleic acid molecules comprises variants releasable from the nucleic acid molecule using a restriction enzyme.
In some embodiments, the nucleic acid molecule or mixture of nucleic acid molecules comprises at least one single nucleotide polymorphism (SNP), multiple nucleotide polymorphism (MNP), insertion, deletion, copy number variation, gene fusion, duplication, inversion, repeat polymorphism, homopolymer of a reference sequence, and/or a non-human sequence. In some embodiments, the nucleic acid molecule or mixture of nucleic acid molecules comprises at least 5 variants. In certain embodiments, at least 15, 20, 30, 50, 100, 200, 300, 400, 700, or 1000 variants are present. In yet other embodiments, greater than 1000 variants are present. In some embodiments, each variant is present (e.g., in the sample being tested) at a high or low frequency. For instance, in certain embodiments, each variant may be present at a frequency of 1%, 5%, 10%, 15%, 20%, 30%, 40% or 50% or more. In other embodiments, each variant may be present at a frequency of less than 50%, less than 40%, less than 20%, less than 15%, less than 10%, less than 5%, less than 3%, less than 1%, less than 0.5%, less than 0.1%, and any integer in between.
An advantage of the disclosed control materials is that the “truth” of a sample is known. There are currently no reference materials for which absolute frequency (i.e., the truth) is known; that is, the actual frequency of a given variant or combination of variants present is not known. In contrast, in the disclosed control materials, the actual frequency of variants is known.
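The relationship between molecule counts and the known variant frequency can be sketched as follows. This is a minimal illustration, not the disclosed procedure: the function names, the 650 g/mol-per-base-pair approximation, the haploid genome size, and the assumption that one variant-bearing plasmid is counted against one haploid genome copy are all assumptions introduced here.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one double-stranded base pair
HAPLOID_GENOME_BP = 3.1e9    # assumed size of one haploid human genome copy


def copies_from_mass(mass_ng: float, length_bp: float) -> float:
    """Estimate how many double-stranded DNA molecules a given mass contains."""
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (length_bp * BP_MASS_G_PER_MOL)


def expected_variant_frequency(variant_copies: float, wild_type_copies: float) -> float:
    """Fraction of all copies of a locus that carry the variant."""
    return variant_copies / (variant_copies + wild_type_copies)


def plasmid_mass_for_frequency(gdna_ng: float, target_freq: float,
                               plasmid_bp: float,
                               genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Mass of plasmid (ng) to mix with gdna_ng of wild-type genomic DNA
    so that the variant is expected at target_freq."""
    if not 0.0 < target_freq < 1.0:
        raise ValueError("target_freq must be strictly between 0 and 1")
    wild_type_copies = copies_from_mass(gdna_ng, genome_bp)
    variant_copies = wild_type_copies * target_freq / (1.0 - target_freq)
    return variant_copies * plasmid_bp * BP_MASS_G_PER_MOL / AVOGADRO * 1e9


# A 1:1 molecular ratio of variant to wild-type copies gives 50%.
print(expected_variant_frequency(1000, 1000))  # 0.5

# Hypothetical 3,100 bp plasmid spiked into 100 ng of genomic DNA at 5%.
print(plasmid_mass_for_frequency(100.0, 0.05, 3100))
```

Because the plasmid is far smaller than the genome, the required plasmid mass is tiny relative to the genomic DNA mass, which is one reason accurate quantification of both components matters.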
Attendant to the teachings of this disclosure, standardized control materials for next generation sequencing (NGS) assays can be produced. Issues such as variant call differences between sites, variability of reagents across instruments, variation introduced by diverse bioinformatics pipelines and filters, run-to-run and lab-to-lab variability can be identified and resolved and/or obviated utilizing the control materials.
A further advantage is that the control materials disclosed herein can comprise any number and type of variants, including insertions and deletions of differing lengths, large numbers of SNPs, etc. No other control material exists that provides such diversity.
The variants can be any of interest. There is no limit provided herein with respect to the type and number of variants that can be utilized in the current disclosure.
In certain embodiments, modified nucleotides can be utilized as variants. In certain embodiments, methylation can be detected. For example, CpG methylation can be utilized as a biomarker variant.
This disclosure also provides reagents and methods for confirming the validity of a sequencing reaction by including a known number of representative sequences and/or variants thereof in a mixture comprising a test sample potentially comprising a test nucleic acid sequence and sequencing the nucleic acids in the mixture, wherein detection of all of the representative sequences and/or variants in the mixture indicates the sequencing reaction was accurate. The representative sequences and/or variants may be of the type described herein. Compositions comprising the same are also provided. The pre-determined percentage may be, for instance, about 1, 5 or 10%. And each species may be from, for instance, 20-500 nucleotides. Each species may comprise a homopolymer sequence of at least 3 nucleotides. The nucleic acids may be DNA. Each species may possess a nucleic acid barcode that may be unique to each species. The nucleic acid species described herein may be used to calibrate a sequencing instrument, for instance. Kits comprising such species, optionally further comprising one or more polymerases and/or one or more oligonucleotide primers are also provided. Plasmids and/or cells comprising multiple nucleic acid species wherein the nucleic acid sequence of each species differs from its neighbor species by a predetermined percentage are also provided.
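The acceptance rule described above, in which a run is considered accurate only when every spiked-in representative sequence or variant is detected, can be sketched as a simple set comparison. The identifiers below are hypothetical placeholders, and how a "detected" call is produced is left to the user's pipeline.

```python
from typing import Iterable, List, Tuple


def sequencing_run_is_valid(expected_controls: Iterable[str],
                            detected: Iterable[str]) -> Tuple[bool, List[str]]:
    """Return (valid, missing): the run passes only if every expected
    control sequence or variant appears among the detected calls."""
    missing = sorted(set(expected_controls) - set(detected))
    return (len(missing) == 0, missing)


# Hypothetical identifiers for the control variants added to the test sample.
expected = ["CTRL_VAR_1", "CTRL_VAR_2", "CTRL_VAR_3"]

# Calls reported by the pipeline: control variants plus calls from the test sample.
calls = ["CTRL_VAR_1", "SAMPLE_VAR_A", "CTRL_VAR_3"]

valid, missing = sequencing_run_is_valid(expected, calls)
print(valid, missing)  # False ['CTRL_VAR_2']
```

Returning the list of missing controls, rather than only a pass/fail flag, makes it easier to see which variant type a failing run struggled with.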
It is to be understood that the descriptions of this disclosure are exemplary and explanatory only and are not intended to limit the scope of the current teachings. In this application, the use of the singular includes the plural unless specifically stated otherwise. Also, the use of “comprise”, “contain”, and “include”, or modifications of those root words, for example but not limited to, “comprises”, “contained”, and “including”, is not intended to be limiting. Use of “or” means “and/or” unless stated otherwise. The term “and/or” means that the terms before and after can be taken together or separately. For illustration purposes, but not as a limitation, “X and/or Y” can mean “X” or “Y” or “X and Y”. Whenever a range of values is provided herein, the range is meant to include the starting value and the ending value and any value or value range therebetween unless otherwise specifically stated. For example, “from 0.2 to 0.5” may mean 0.2, 0.3, 0.4, and 0.5; ranges therebetween such as 0.2-0.3, 0.3-0.4, 0.2-0.4; increments therebetween such as 0.25, 0.35, 0.225, 0.335, 0.49; increment ranges therebetween such as 0.26-0.39; and the like. The term “about” or “approximately” may refer to the ordinary meaning of the term but may also indicate a value or values within about any of 1-10 percent of the listed value.
The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described in any way. All literature and similar materials cited in this application including, but not limited to, patents, patent applications, articles, books, treatises, and internet web pages, regardless of the format of such literature and similar materials, are expressly incorporated by reference in their entirety for any purpose. In the event that one or more of the incorporated literature and similar materials defines or uses a term in such a way that it contradicts that term's definition in this application, this application controls. While the present teachings are described in conjunction with various embodiments, it is not intended that the present teachings be limited to such embodiments. On the contrary, the present teachings encompass various alternatives, modifications, and equivalents, as will be appreciated by those of skill in the art. Certain embodiments are further described in the following examples. These embodiments are provided as examples only and are not intended to limit the scope of the claims in any way.
Aspects of this disclosure may be further understood in light of the following examples, which should not be construed as limiting the scope of the disclosure in any way.
An exemplary control reagent was prepared and tested as described below:
a) amplicons were designed comprising the fragments shown in Tables 1-3;
b) genomic sequences were selected to encompass each amplicon (the selected genomic sequences being the chromosome and nucleotide positions of the reference genome corresponding to the 5′ nucleotide of the forward and reverse primers for each amplicon and all the sequence between these two nucleotides);
c) a cassette was designed comprising an ˜400 bp EGFR sequence comprising the amplicon surrounded by (e.g., 5′ and 3′) the genomic sequence identified in step b) (the reference sequence is added in roughly equal amounts to each end of the region defined in step b) to comprise a ˜400 bp region);
d) restriction enzyme and other sites were added to each cassette prepared in step c) (e.g., where one version may additionally include sequences that create a hairpin when the DNA is single-stranded; the restriction enzymes being chosen such that the sequences of interest are not digested but simply released from the control reagent) as shown below:
EGFR_1-ClaI-EGFR_2-HindIII-EGFR_3-SmaI-EGFR_4-XhoI-EGFR_5-NotI-EGFR_6/7-EGFR_8
**EGFR_1, etc. represent EGFR variants; restriction enzyme sites for ClaI, HindIII, SmaI, XhoI and NotI enzymes were positioned between variants.
EGFR_4-HP(7)-ClaI-EGFR_5-HP(7)-HindIII-EGFR_6/7-HP(9)-SmaI-EGFR_8
***Hairpin 7 (HP(7)): GGGGGGGTTTTCCCCCCC (SEQ ID NO: 11); HindIII=HindIII RE site;
Hairpin 9 (HP(9)): GGGGGGGGGAACCCCCCCCC (SEQ ID NO: 12); SmaI=SmaI RE site
e) the cassette of step d) was incorporated into a common vector (pUC57) (e.g., plasmid V1) by automated synthesis of oligonucleotides on solid-phase synthesizers followed by ligation of overlapping oligonucleotides;
f) a second plasmid (e.g., “plasmid V2”) comprising multiple fragments of the gene of interest (and/or variants thereof) with a hairpin structure and a restriction site between each region (e.g., as in exemplary construct EGFR V2 above and Table 4) was also prepared by automated synthesis of oligonucleotides on solid-phase synthesizers followed by ligation of overlapping oligonucleotides;
g) the variant sequences (Tables 4-6) contained within plasmids V1 and/or V2 were then linearized with HindIII;
h) the variants were then mixed with genomic DNA (e.g., wild-type gDNA) at a particular expected variant frequency (e.g., approximately 50%) (plasmid DNA and human embryonic kidney (HEK-293) genomic DNA were quantified using a fluorometer (QUBIT®) to determine the concentration; plasmid and genomic DNA were then mixed together to obtain a 1:1 molecular ratio (50% variant frequency));
i) the “variant sequences” were then tested alone (to provide an expected variant frequency of 100%) to confirm sequencing; and,
j) variants of step h) were detected by NGS using the Ion Personal Genome Machine (PGM) and Illumina MiSeq (results are presented in Table 7).
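The cassette layout of step d) can be checked computationally before synthesis. The sketch below is an illustration under stated assumptions, not the procedure used to design the exemplary constructs: the fragment sequences are hypothetical placeholders rather than EGFR sequences, and the helper names are introduced here. The recognition sequences are the standard ones for ClaI, HindIII, SmaI, XhoI and NotI. The check confirms that each chosen site occurs exactly once in the assembled cassette, so that digestion would release the neighboring fragments without cutting inside them, and that a hairpin sequence has a self-complementary stem.

```python
import re

# Standard recognition sequences (all palindromic, so one strand suffices).
RESTRICTION_SITES = {
    "ClaI": "ATCGAT",
    "HindIII": "AAGCTT",
    "SmaI": "CCCGGG",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}

HP7 = "GGGGGGGTTTTCCCCCCC"    # Hairpin 7, SEQ ID NO: 11
HP9 = "GGGGGGGGGAACCCCCCCCC"  # Hairpin 9, SEQ ID NO: 12

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def build_cassette(fragments, enzymes):
    """Join fragments with one restriction site between each adjacent pair."""
    if len(enzymes) != len(fragments) - 1:
        raise ValueError("need exactly one enzyme between each pair of fragments")
    parts = [fragments[0]]
    for enzyme, fragment in zip(enzymes, fragments[1:]):
        parts.append(RESTRICTION_SITES[enzyme])
        parts.append(fragment)
    return "".join(parts)


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of a site, including overlapping ones."""
    return len(re.findall(f"(?={site})", seq))


def unexpected_cuts(cassette: str, enzymes) -> dict:
    """Map each enzyme to its site count when that count is not exactly one."""
    problems = {}
    for enzyme in enzymes:
        n = count_sites(cassette, RESTRICTION_SITES[enzyme])
        if n != 1:
            problems[enzyme] = n
    return problems


def stem_length(seq: str) -> int:
    """Longest k for which the first k bases pair with the last k bases."""
    for k in range(len(seq) // 2, 0, -1):
        if seq[:k] == reverse_complement(seq[-k:]):
            return k
    return 0


# Hypothetical placeholder fragments standing in for variant-bearing regions.
fragments = ["ACACACACAC", "TGTGTGTGTG", "AGAGAGAGAG"]
enzymes = ["ClaI", "HindIII"]
cassette = build_cassette(fragments, enzymes)
print(unexpected_cuts(cassette, enzymes))  # {} when every site is unique
print(stem_length(HP7), stem_length(HP9))
```

Counting sites in the fully assembled sequence, rather than in each fragment separately, also catches sites created accidentally at fragment junctions.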
The results of monitoring assays using FFPE-embedded controls are presented in
Each embodiment disclosed herein may be used or otherwise combined with any of the other embodiments disclosed. Any element of any embodiment may be used in any embodiment. Although the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the true spirit and scope of the invention. In addition, modification may be made without departing from the essential teachings of the invention.
A control sample was constructed that contained 555 variants from 53 different genes and tested with the Ion AMPLISEQ® Cancer Hotspot Panel v2 (CHPv2), TRUSEQ® Amplicon Cancer Panel (TSACP) and the TRUSIGHT® Tumor Panel. For each panel, two lots of the AcroMetrix® Oncology Hotspot Control were tested in duplicate, in at least two sites. Additional sites only tested one of the lots at least twice or both lots once. Sources of variation between sites may include different instruments, operators and general workflows. Also, variation in bioinformatics pipelines may have contributed significantly to variation in performance results.
To assess the detection of specific variants across different panels, twenty-two clinically-relevant variants that were targeted by three panels were selected.
Performance of the control material comprising 555 variants is shown in Table 9 for single nucleotide variants (SNV), multiple nucleotide variants (MNV), deletions (DEL), and insertions (INS) on the CHPv2 (AMPLISEQ® Cancer Hotspot Panel v2), TSACP (TRUSEQ® Amplicon Cancer Panel), and TSTP (TRUSIGHT® Tumor Panel) panels. A variant was considered to be covered by the test method if the variant was positioned between the upstream and downstream primers. A variant was considered detected if it was detected in at least one run of the control. Sanger sequencing was performed on the synthetic DNA prior to dilution with genomic DNA. Variants detected in the genomic DNA were confirmed using publicly available whole genome sequencing information for GM24385.
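The two definitions above (covered: positioned between the upstream and downstream primers; detected: called in at least one run) can be expressed as a short sketch. The coordinate convention, the class and function names, and the example positions are assumptions made for illustration; "between the primers" is read here as lying strictly inside the region flanked by the primer binding sites.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Amplicon:
    chrom: str
    upstream_primer_end: int      # last base of the upstream primer
    downstream_primer_start: int  # first base of the downstream primer


def is_covered(chrom: str, start: int, end: int,
               amplicons: Iterable[Amplicon]) -> bool:
    """True if the variant lies entirely between the primers of any amplicon."""
    return any(
        a.chrom == chrom
        and a.upstream_primer_end < start
        and end < a.downstream_primer_start
        for a in amplicons
    )


def is_detected(variant_id: str, runs: Iterable[Set[str]]) -> bool:
    """True if the variant was called in at least one run of the control."""
    return any(variant_id in run for run in runs)


# Hypothetical panel design and run results.
panel = [Amplicon("chr7", 1000, 1150)]
runs = [{"var_A"}, {"var_A", "var_B"}]

print(is_covered("chr7", 1100, 1100, panel))  # True
print(is_covered("chr7", 995, 995, panel))    # False, falls within a primer
print(is_detected("var_B", runs))             # True
print(is_detected("var_C", runs))             # False
```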
The control materials provided herein can be used for rapid cell line generation by transiently transfecting plasmids and/or RNA into cells and incorporating such cells into a formalin-fixed paraffin-embedded (FFPE) block for use as a control. Methods for generating FFPE control are provided in US Patent Application Publication No. 2014/0335533 which is incorporated herein by reference in its entirety for all purposes. Accordingly, FFPE material was generated by directly introducing nucleic acids into cells after cell growth and processed into FFPE material. This reduces the time to generate a mutant cell material from 7 months to 1 day, representing significant time and cost savings. Also, by introducing nucleic acid after cell growth, many toxic combinations that can inhibit cell growth or lead to cell death can be avoided. This also simplifies the process of growing and storing cells as one cell line can accommodate hundreds of mutations versus the 10+ engineered cell lines that would be required for the same number of mutations. The reagents and methods provided herein allow for the generation of, for example, a single cell containing one or more predetermined nucleic acid sequences containing one or more predetermined mutations. The reagents and methods provided herein permit the generation of any cell line containing an unlimited number of plasmids or RNA transcripts. Further, the reagents and methods provided herein do not require the integration of non-native nucleic acids into the genome of an engineered cell line.
This method has been demonstrated to be feasible by transfecting either DNA or RNA into human embryonic kidney (HEK 293) cells. For the DNA study, non-growing HEK 293 cells were transfected with eight (8) different DNA fragments simultaneously, each about 6-14 kb long and containing approximately 50 different mutations. Lipofectamine 2000 was used for transfection. The cells were subsequently mixed with a polymer and processed into FFPE material. DNA from the FFPE material was extracted and was tested using the Ion Torrent AmpliSeq Cancer Hotspot Panel v2. Over 300 hotspot variants were detected from sequencing. Table 10 and Table 11 provide data showing the results of the DNA transfection method. It is understood that methods provided herein can be used with any technique suitable for transferring nucleic acids into a cell. In general, a transfection reagent is a compound or compounds that bind(s) to or complex(es) with oligonucleotides and polynucleotides, and mediates their entry into cells. The transfection reagent also mediates the binding and internalization of oligonucleotides and polynucleotides into cells. Examples of transfection reagents include cationic liposomes and lipids, polyamines, calcium phosphate precipitates, histone proteins, polyethylenimine, and polylysine complexes. It has been shown that cationic proteins like histones and protamines, or synthetic polymers like polylysine, polyarginine, polyornithine, DEAE dextran, polybrene, and polyethylenimine may be effective intracellular delivery agents, while small polycations like spermine are ineffective. Typically, the transfection reagent has a net positive charge that binds to the oligonucleotide's or polynucleotide's negative charge. The transfection reagent mediates binding of oligonucleotides and polynucleotides to cells or via ligands that bind to receptors in the cell.
For example, cationic liposomes or polylysine complexes have net positive charges that enable them to bind to DNA or RNA. Polyethylenimine, which facilitates gene transfer without additional treatments, probably disrupts endosomal function itself. Other vehicles are also used, in the prior art, to transfer genes into cells. These include complexing the nucleic acids on particles that are then accelerated into the cell. These are termed “biolistic” or “gun” techniques. Other methods include electroporation, microinjection, liposome fusion, protoplast fusion, viral infection, and iontophoresis.
In addition, to assess whether the Fast FFPE method produced fragmented DNA as expected for a typical FFPE material, qPCR assays that amplify different lengths of DNA were used to compare the FFPE DNA to intact plasmid DNA. This study demonstrated that the FFPE DNA was more fragmented than the plasmids.
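One common way to summarize this kind of comparison is a delta-delta-Cq calculation between a long and a short amplicon, using intact plasmid as the reference. The sketch below is an assumption about how such data could be analyzed, not the analysis actually performed in the study; it assumes both assays amplify with the same efficiency (doubling per cycle by default), and the Cq values are hypothetical.

```python
def relative_long_amplicon_yield(cq_short_sample: float, cq_long_sample: float,
                                 cq_short_ref: float, cq_long_ref: float,
                                 efficiency: float = 2.0) -> float:
    """Fraction of long-amplicon template remaining in the sample relative
    to the reference, normalized to the short amplicon.

    Values below 1 suggest the sample is more fragmented than the reference.
    """
    delta_sample = cq_long_sample - cq_short_sample
    delta_ref = cq_long_ref - cq_short_ref
    return efficiency ** -(delta_sample - delta_ref)


# Hypothetical Cq values: the long assay lags by 3 cycles in FFPE DNA
# but not in intact plasmid.
ratio = relative_long_amplicon_yield(
    cq_short_sample=25.0, cq_long_sample=28.0,
    cq_short_ref=20.0, cq_long_ref=20.0,
)
print(ratio)  # 0.125
```

Normalizing to the short amplicon removes differences in input amount, so the ratio reflects template length integrity rather than concentration.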
For the RNA study, two different EML4-ALK in-vitro fusion gene RNA transcripts were generated and transfected into non-growing HEK 293 cells using Lipofectamine 2000. The cells were subsequently processed into FFPE material. RNA from the FFPE material was extracted and tested using two qPCR assays that specifically amplify the EML4-ALK fusion. The FFPE material was positive for both transcripts. Table 12 provides data indicating that RNA transcripts of EML4-ALK fusions are detectable following transfection.
The reagents and methods provided herein demonstrate that FFPE material containing hundreds of different DNA or RNA mutations can be created by a single transfection and that the nucleic acid extracted from such materials shows aspects of true FFPE material.
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
One or more variants of each of these reference sequences may also be represented in each control sequence and/or control reagent. In some embodiments, for instance, multiple variants may be included for each reference sequence. Panels of reference sequences may also be designed to represent particular metabolic, genetic information processing, environmental information processing, cellular process, organismal system, disease, drug development, or other pathways (e.g., KEGG pathways (http://www.genome.jp/kegg/pathway.html, Nov. 8, 2013)). Control reagents such as these may be assayed separately or combined into a single assay. The control reagents may also be designed to include various amounts of each reference sequence and/or variants thereof.
This application claims priority to U.S. Provisional Patent Application No. 62/232,261, filed Sep. 24, 2015, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62232261 | Sep 2015 | US