The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jul. 10, 2019, is named 920006-295629_SL.txt and is 777,782 bytes in size.
The invention relates to control compositions for sequencing and chemical analyses. More particularly, the invention relates to control compositions for sequencing and chemical analyses having at least one barcode sequence fragment and at least one universal sequence fragment, and to methods of their use.
Sequencing controls are needed that can be used starting after the extraction step (e.g., by spiking the extract with the control constructs) or in every step of analysis of an unknown test sample (e.g., from nucleic acid extraction to nucleic acid purification to library preparation and sequencing). Sample swapping or sample-to-sample contamination can occur during any of these steps, but without a priori knowledge of what is in the sample, one may not know if the samples were contaminated or just contained similar genetic profiles. Also, sequencing controls that can be used both for 1) detection of sample swapping and sample-to-sample contamination, and 2) quantitation are needed.
For quantitation, metagenomic communities are currently analyzed by determining the relative abundance of 16S genes or unique k-mers that can differentiate microbial species and strains. However, the methods used to process the samples can influence the relative abundance of the community members. For example, during DNA extraction, the chemical or physical lysis process can bias the analysis due to different lysis efficiencies for different microbial membranes or cell wall compositions (e.g., fungi typically are underrepresented in metagenomes due to lysis resistance). After DNA extraction, the library preparation method can introduce additional bias. As an example, amplification of library molecules relies on polymerases, which can bias results towards fragments with fifty percent GC content or towards shorter fragments rather than longer molecules, as polymerases tend to amplify shorter fragments and molecules with lower or balanced GC content faster than molecules with high GC content.
Analytical chemistry analysis of unknown materials can be confounded by identification of compounds that do not seem to fit with what is expected. These unexpected compounds could be the result of a cross contamination event or may actually be present in the sample. Therefore, spike-in cross contamination and sample swapping controls are also needed for analytical chemistry analyses.
The present invention provides sequencing controls that can be used starting after the extraction step (e.g., by spiking the extract with the control constructs) or in every step of analysis of an unknown test sample (e.g., from nucleic acid extraction to nucleic acid purification to library preparation and sequencing). In one embodiment, nucleic acid constructs comprising a barcode sequence fragment are provided that can be encapsulated in a simulated cell membrane (e.g., a simulated bacterial cell membrane or eukaryotic cell membrane), or embedded directly in the genome of an organism for use as spike-in sequencing controls. In one aspect, the barcode sequence fragment comprises a unique sequence not present in any known genome. In one embodiment, the sequencing controls can be spiked into the unknown test sample prior to or after nucleic acid extraction and then can be detected in the final sequenced samples. In another embodiment, different nucleic acid constructs (i.e., with different barcode sequence fragments) can be spiked into different samples so that cross-contamination of samples or sample swapping can be detected.
In one embodiment, the barcode sequence fragment can be flanked by universal sequence fragments. The universal sequence fragments can add length to the nucleic acid construct and can serve as markers for bioinformatic analysis to identify the beginning and end of the barcode sequence fragment after sequencing. In another illustrative aspect, the barcode sequence fragment may be flanked by primer binding site sequence fragments (i.e., directly or indirectly linked to the barcode sequence fragment) so that the nucleic acid construct comprising the barcode sequence fragment can be amplified during an amplicon sequencing protocol. In another embodiment, primer binding site sequence fragments may be lacking for use of the sequencing controls in whole genome sequencing protocols. In another embodiment, a set of different nucleic acid construct spike-ins with different barcode sequence fragments (e.g., 384 or 96 different barcode sequence fragments) can be used to allow for multiplexing of samples on one sequencing run.
In various embodiments, samples with microorganisms containing nucleic acids (e.g., DNA), or samples with other sources of nucleic acids, may be analyzed by sequencing using the control compositions for sequencing described herein. The samples can be, for example, selected from the group consisting of urine, nasal secretions, nasal washes, inner ear fluids, bronchial lavages, bronchial washes, alveolar lavages, spinal fluid, bone marrow aspirates, sputum, pleural fluids, synovial fluids, pericardial fluids, peritoneal fluids, saliva, tears, gastric secretions, stool, reproductive tract secretions, lymph fluid, whole blood, serum, plasma, a tissue sample, a soil sample, a water sample, a food sample, an air sample, a plant sample, an industrial waste sample, a surface wipe sample, a dust sample, a hair sample, and an animal sample.
In another embodiment, a method is provided for the use of spike-in controls that simultaneously 1) control for cross-contamination and/or sample swapping and 2) allow for quantitation while controlling for different GC content samples (e.g., low, balanced, and high GC content) and/or for different lysis efficiencies. In one aspect, barcoded DNA molecules are produced with different GC contents, using GC content fragments, wherein the barcode sequence fragments and the GC content fragments are flanked by universal sequence fragments, and then the nucleic acid construct is encapsulated in a simulated cell membrane. By using the same type of nucleic acid construct, but with different barcode sequence fragments, different quantities of the encapsulated nucleic acid construct can be spiked-in, and a standard curve for quantitation can be produced. In this embodiment, the barcode sequence fragments can be used to verify that no cross-contamination or sample swapping occurred during sample preparation or processing. Also in this quantitation embodiment, the different GC content fragments (e.g., low, balanced, and high GC content) have the same barcode sequence fragment at each GC percentage (e.g., low, balanced, and high GC content), but at each separate concentration of the nucleic acid construct used to produce the standard curve, the barcode sequence fragments are unique to each concentration used to produce the standard curve. In this embodiment, the encapsulation method can also be varied to control for different resistances to lysis to mimic, for example, Gram positive, Gram negative, and fungal cell walls. In this encapsulation embodiment, the type of encapsulation method can be correlated to a unique barcode sequence fragment in the nucleic acid construct to enable differentiation post sequencing.
The present invention also provides spike-in cross-contamination and sample swapping controls for analytical chemistry analysis of unknown materials. These controls can be used in analytical chemistry procedures, such as mass spectrometry.
The following clauses, and combinations thereof, provide various additional illustrative aspects of the invention described herein. The various embodiments described in any other section of this patent application, including the section titled “DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS” and the “EXAMPLES” are applicable to any of the following embodiments of the invention described in the numbered clauses below.
The present invention provides sequencing controls that can be used starting after the extraction step (e.g., by spiking the extract with the control constructs) or in every step of analysis of an unknown test sample (e.g., from nucleic acid extraction to nucleic acid purification to library preparation and sequencing). In one embodiment, nucleic acid constructs comprising a barcode sequence fragment are provided that can be encapsulated in a simulated cell membrane (e.g., a simulated bacterial cell membrane or eukaryotic cell membrane), or embedded directly in the genome of an organism for use as spike-in sequencing controls. In one aspect, the barcode sequence fragment comprises a unique sequence not present in any known genome. In one embodiment, the sequencing controls can be spiked in the unknown test sample prior to or after nucleic acid extraction and then can be detected in the final sequenced samples. In another embodiment, different nucleic acid constructs (i.e., with different barcode sequence fragments) can be spiked in different samples so that cross-contamination of samples or sample swapping can be detected.
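By way of illustration only, the detection of cross-contamination or sample swapping from spiked-in barcodes could be sketched as follows. The function name `check_sample`, the per-barcode read-count dictionary, and the `min_fraction` noise threshold are hypothetical and are not taken from this application; the sketch assumes that one known barcode sequence fragment was spiked into each sample.

```python
from typing import Dict


def check_sample(expected_barcode: str,
                 barcode_counts: Dict[str, int],
                 min_fraction: float = 0.01) -> str:
    """Classify one sequenced sample from its spike-in barcode read counts.

    expected_barcode: barcode that was spiked into this sample.
    barcode_counts:   reads observed per barcode in the sequenced sample.
    min_fraction:     hypothetical noise threshold (assumption, not from the text).
    """
    total = sum(barcode_counts.values())
    if total == 0:
        return "no spike-in detected"

    # Keep only barcodes whose share of spike-in reads exceeds the threshold.
    present = {bc for bc, n in barcode_counts.items() if n / total >= min_fraction}

    if present == {expected_barcode}:
        return "ok"
    if expected_barcode not in present:
        # The expected barcode is missing and another one dominates.
        return "possible sample swap"
    # The expected barcode is present together with at least one other barcode.
    return "possible cross-contamination"
```

In this sketch, a sample that shows only a foreign barcode is flagged as a possible swap, while a sample that shows its own barcode plus a foreign one is flagged as possible cross-contamination.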
In one embodiment, the barcode sequence fragment can be flanked at its 5′ or 3′ end, or both, by universal sequence fragments. The universal sequence fragments can add length to the nucleic acid construct and can serve as markers for bioinformatic analysis to identify the beginning and end of the barcode sequence fragment after sequencing. In another illustrative aspect, the barcode sequence fragment may be flanked by primer binding site sequence fragments (i.e., directly or indirectly linked to the barcode sequence fragment) so that the nucleic acid construct comprising the barcode sequence fragment can be amplified during an amplicon sequencing protocol. In another embodiment, primer binding site sequence fragments may be lacking for use of the sequencing controls in whole genome sequencing protocols. In another embodiment, a set of different nucleic acid construct spike-ins with different barcode sequence fragments (e.g., 384 or 96 different barcode sequence fragments) can be used to allow for multiplexing of samples on one sequencing run.
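As a non-limiting sketch of how universal sequence fragments can mark the beginning and end of a barcode sequence fragment during bioinformatic analysis, a read can be searched for the two flanking sequences and the intervening bases taken as the barcode. The two flanking sequences used below are assumed, for illustration only, to play the role of the 5′ and 3′ universal sequence fragments; the function name `extract_barcode` is hypothetical, and the default length bounds follow the about 10 to about 35 base pair range described herein for barcode sequence fragments.

```python
import re
from typing import Optional

# Flanking sequences assumed for illustration only.
UNIVERSAL_5P = "GCAGATCTCG"
UNIVERSAL_3P = "AGTCAGTCAGCC"


def extract_barcode(read: str, min_len: int = 10, max_len: int = 35) -> Optional[str]:
    """Return the bases found between the two flanking sequences, or None."""
    # Lazy quantifier: stop at the first 3' flank that follows the 5' flank.
    barcode_group = "([ACGT]{" + str(min_len) + "," + str(max_len) + "}?)"
    pattern = re.compile(
        re.escape(UNIVERSAL_5P) + barcode_group + re.escape(UNIVERSAL_3P)
    )
    match = pattern.search(read.upper())
    return match.group(1) if match else None
```

Exact string matching is used here for brevity; a practical pipeline would likely tolerate sequencing errors in the flanking sequences, which this sketch does not attempt.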
In various embodiments, samples with microorganisms containing nucleic acids (e.g., DNA), or samples with other sources of nucleic acids, may be analyzed by sequencing using the control compositions for sequencing described herein. The samples can be, for example, selected from the group consisting of urine, nasal secretions, nasal washes, inner ear fluids, bronchial lavages, bronchial washes, alveolar lavages, spinal fluid, bone marrow aspirates, sputum, pleural fluids, synovial fluids, pericardial fluids, peritoneal fluids, saliva, tears, gastric secretions, stool, reproductive tract secretions, lymph fluid, whole blood, serum, plasma, hair, a tissue sample, a soil sample, a water sample, a food sample, an air sample, a plant sample, an industrial waste sample, a surface wipe sample, and an animal sample.
In another embodiment, compositions and methods are provided for the use of spike-in controls that simultaneously 1) control for cross-contamination and/or sample swapping and 2) allow for quantitation while controlling for different GC content samples (e.g., low, balanced, and high GC content) and/or for different lysis efficiencies. In one aspect, barcoded DNA molecules are produced with different GC contents, using GC content fragments, wherein barcode sequence fragments and GC content fragments are flanked by universal sequence fragments, and then the nucleic acid construct can be encapsulated in a simulated cell membrane. By using the same type of nucleic acid construct, but with different barcode sequence fragments, different quantities of the encapsulated or unencapsulated nucleic acid construct can be spiked-in, and a standard curve for quantitation can be produced. In this embodiment, the barcode sequence fragments can be used to verify that no cross-contamination or sample swapping occurred during sample preparation or processing. In this quantitation embodiment, the different GC content fragments (e.g., low, balanced, and high GC content) have the same barcode sequence fragment at each GC percentage (e.g., low, balanced, and high GC content), but at each separate concentration of the nucleic acid construct used to produce the standard curve, the barcode sequence fragments are unique to each concentration used to produce the standard curve. In this embodiment, the encapsulation method can also be varied to control for different resistances to lysis to mimic, for example, Gram-positive bacterial cell walls, Gram-negative bacterial cell walls, and fungal cell walls. In this encapsulation embodiment, the type of encapsulation method can be correlated to a unique barcode sequence fragment in the nucleic acid construct to enable differentiation post sequencing.
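For illustration, a standard curve of the kind described above could be computed from the read counts of barcodes spiked in at known quantities. The log-log linear model, the function names `fit_standard_curve` and `estimate_copies`, and the input dictionaries are assumptions made for this sketch and are not specified in this application; all quantities and read counts are assumed to be positive.

```python
import math
from typing import Dict, Tuple


def fit_standard_curve(spiked_copies: Dict[str, float],
                       read_counts: Dict[str, int]) -> Tuple[float, float]:
    """Least-squares fit of log10(reads) = slope * log10(copies) + intercept.

    Each key is a barcode that is unique to one spiked-in concentration.
    """
    xs = [math.log10(spiked_copies[bc]) for bc in spiked_copies]
    ys = [math.log10(read_counts[bc]) for bc in spiked_copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_copies(reads: int, slope: float, intercept: float) -> float:
    """Invert the fitted curve to estimate the input quantity for a read count."""
    return 10 ** ((math.log10(reads) - intercept) / slope)
```

Because each concentration carries its own barcode, the same read counts that build the curve can also be checked for unexpected barcodes, so that quantitation and cross-contamination control come from a single set of spike-ins.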
In one embodiment, the nucleic acid construct can be constructed (5′ to 3′) with a universal sequence fragment, a unique barcode sequence fragment, a GC content fragment (e.g., with high, balanced, or low GC content), and a second universal sequence fragment. In this embodiment, the unique barcode sequence fragment is a sequence that is not present in any known genome. An exemplary GC content fragment can contain about 60 to about 100 percent GC content for high GC content, about 40 to about 60 percent GC content for balanced GC content, and about 1 to about 40 percent GC content for low GC content. In this embodiment, the universal sequence fragments can add length to the nucleic acid construct and can serve as markers for bioinformatic analysis to identify the beginning and end of the nucleic acid construct after sequencing. In alternate embodiments, the universal sequence fragments could be extended as needed to make the total nucleic acid construct longer for different applications such as long read sequencing. In various embodiments, the nucleic acid constructs can either be encapsulated to spike into samples at sample collection and control for full sample preparation and processing or can be unencapsulated and can be spiked in after extraction to control for library preparation. In one aspect, two or more mixtures of three different GC content fragment constructs can be prepared (e.g., a low quantity standard and a high quantity standard with each having a unique barcode sequence fragment so that the high and low quantity standards can be differentiated post-sequencing).
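As a minimal sketch, the exemplary GC content ranges given above can be applied to a candidate GC content fragment as follows. The function names are hypothetical. Because the stated ranges share their endpoints (about 40 and about 60 percent), the assignment of those endpoint values to the balanced class is a choice made for this sketch only.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in ("G", "C"))
    return 100.0 * gc / len(seq)


def gc_class(seq: str) -> str:
    """Label a fragment as low, balanced, or high GC content.

    Thresholds follow the exemplary ranges in the text; endpoint handling
    (40 and 60 percent counted as balanced) is an assumption of this sketch.
    """
    pct = gc_percent(seq)
    if pct > 60:
        return "high"
    if pct >= 40:
        return "balanced"
    return "low"
```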
In yet another embodiment, spike-in cross-contamination and sample swapping controls for analytical chemistry analysis of unknown materials are provided. These controls can be used in analytical chemistry procedures, such as mass spectrometry, and any of the nucleic acid constructs described herein can be used.
The following clauses, and combinations thereof, provide various additional illustrative aspects of the invention described herein. The various embodiments described in any other section of this patent application, including the summary portion of the section titled “BACKGROUND AND SUMMARY”, the “EXAMPLES”, and this “DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS” section of the application are applicable to any of the following embodiments of the invention described in the numbered clauses below.
Control compositions for sequencing or chemical analyses and methods of their use are provided herein. The polymerase chain reaction (PCR) has been developed to analyze nucleic acids in a laboratory. PCR evolved over the last decade into a new generation of devices and methods known as Next Generation Sequencing (NGS). NGS provides faster detection and amplification of nucleic acids at lower cost. The NGS devices and methods allow for rapid sequencing as the nucleic acids are amplified in massively parallel, high-throughput platforms.
NGS, and other sequencing methods, for detection of nucleic acids are powerful techniques, for example, for pathogen detection and identification purposes, including for biosurveillance. However, the field suffers from a lack of standards for use in sequencing methods and devices, including NGS methods and devices. Currently, researchers are able to detect and identify nucleic acids from, for example, pathogens through sequencing, but are unable to monitor sample cross-contamination and sample swapping throughout the sequencing protocol. More effective standards are also needed for monitoring sample cross-contamination and sample swapping after the extraction process, and for quantitation of nucleic acids during sequencing.
Analytical chemistry analysis of unknown materials can be confounded by identification of compounds that do not seem to fit with what is expected. These unexpected compounds could be the result of a cross contamination event or may actually be present in the sample. Therefore, spike-in cross contamination and sample swapping controls are also needed for analytical chemistry analyses.
In one embodiment, control compositions for sequencing or chemical analyses are provided. The control compositions comprise a nucleic acid construct comprising at least one barcode sequence fragment. The barcode sequence fragment comprises a unique sequence not found in any known genome. In one embodiment, the control composition is used to determine if cross-contamination between samples for sequencing or chemical analyses has occurred. In another embodiment, the control composition is used to determine if sample swapping has occurred. In yet another embodiment, the control composition can be used for quantitation of nucleic acids during sequencing. In one aspect, the nucleic acid construct is a deoxyribonucleic acid construct. In another aspect, the nucleic acid construct is a ribonucleic acid. In another embodiment, the nucleic acid construct is incorporated into a plasmid.
In various embodiments, the barcode sequence fragment can be from about 10 to about 35 base pairs in length, about 10 to about 34 base pairs in length, about 10 to about 33 base pairs in length, about 10 to about 32 base pairs in length, about 10 to about 31 base pairs in length, about 10 to about 30 base pairs in length, about 10 to about 29 base pairs in length, about 10 to about 28 base pairs in length, about 10 to about 27 base pairs in length, about 10 to about 26 base pairs in length, about 10 to about 25 base pairs in length, about 10 to about 24 base pairs in length, about 10 to about 15 base pairs in length, about 21 to about 28 base pairs in length, about 21 to about 27 base pairs in length, about 21 to about 26 base pairs in length, about 21 to about 25 base pairs in length, about 22 to about 28 base pairs in length, about 22 to about 27 base pairs in length, about 22 to about 26 base pairs in length, about 22 to about 25 base pairs in length, about 23 to about 25 base pairs in length, or about 24 base pairs in length.
Various embodiments of barcode sequence fragments are shown below in Table 1 (labeled barcode sequence fragments). These barcode sequence fragments can be used alone or in combinations of, for example, two or more barcode sequence fragments. Additional barcode sequence fragments are shown in Table 2 between the bolded fragments and within the exemplary nucleic acid constructs having SEQ ID NOS:1 to 384.
[Table 1 (barcode sequence fragments) and Table 2 (exemplary nucleic acid constructs, SEQ ID NOS:1 to 384) are not reproduced here. The only table text that survives is the fragment pair GCAGATCTCG and AGTCAGTCAGCC, which repeats once for each of the 384 entries.]
In one illustrative aspect, the control composition for sequencing or chemical analyses comprises a nucleic acid construct comprising at least one barcode sequence fragment linked at its 5′ or 3′ end to at least one universal sequence fragment. In another embodiment, the nucleic acid construct comprises at least a first and a second universal sequence fragment, and the first universal sequence fragment can be linked to the 5′ end of the barcode sequence fragment and the second universal sequence fragment can be linked to the 3′ end of the barcode sequence fragment. In one aspect, the universal sequence fragments can be extended as needed to make the nucleic acid construct longer for different applications such as whole genome sequencing where short inserts may be lost.
In yet another embodiment, a universal sequence fragment is not included in the nucleic acid construct (e.g., for microarray applications). In this microarray embodiment, primer binding site fragments are also not included. The complementary sequence to each barcode sequence fragment may be spotted onto the microarray alongside nucleic acid sequences of interest to detect the barcode sequence fragments. The barcode sequence fragment detected would be in a fixed location that would identify which barcode sequence fragment was present.
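By way of example only, the probe sequence to be spotted for a given barcode sequence fragment can be derived as its reverse complement, which is the complementary strand written 5′ to 3′. The function names `probe_for` and `probe_layout` are hypothetical, and the sketch assumes DNA barcodes that contain only A, C, G, and T.

```python
from typing import Dict, List, Tuple

# Translation table for DNA base pairing (A-T, C-G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_for(barcode: str) -> str:
    """Return the reverse complement of a DNA barcode, written 5' to 3'."""
    return barcode.upper().translate(_COMPLEMENT)[::-1]


def probe_layout(barcodes: List[str]) -> Dict[int, Tuple[str, str]]:
    """Assign each barcode a fixed spot index and pair it with its probe.

    A signal at a given spot index then identifies which barcode was present.
    """
    return {spot: (bc, probe_for(bc)) for spot, bc in enumerate(barcodes)}
```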
In various embodiments, the universal sequence fragments can be from about 10 base pairs in length to about 270 base pairs in length, from about 10 base pairs in length to about 260 base pairs in length, from about 10 base pairs in length to about 250 base pairs in length, from about 10 base pairs in length to about 240 base pairs in length, from about 10 base pairs in length to about 230 base pairs in length, from about 10 base pairs in length to about 220 base pairs in length, from about 10 base pairs in length to about 210 base pairs in length, from about 10 base pairs in length to about 200 base pairs in length, from about 10 base pairs in length to about 190 base pairs in length, from about 10 base pairs in length to about 180 base pairs in length, from about 10 base pairs in length to about 170 base pairs in length, from about 10 base pairs in length to about 160 base pairs in length, from about 10 base pairs in length to about 150 base pairs in length, from about 10 base pairs in length to about 140 base pairs in length, from about 10 base pairs in length to about 130 base pairs in length, from about 10 base pairs in length to about 120 base pairs in length, from about 10 base pairs in length to about 110 base pairs in length, from about 10 base pairs in length to about 100 base pairs in length, from about 10 base pairs in length to about 90 base pairs in length, from about 10 base pairs in length to about 80 base pairs in length, from about 10 base pairs in length to about 70 base pairs in length, from about 10 base pairs in length to about 60 base pairs in length, from about 10 base pairs in length to about 50 base pairs in length, from about 10 base pairs in length to about 40 base pairs in length, from about 10 base pairs in length to about 30 base pairs in length, from about 10 base pairs in length to about 20 base pairs in length, from about 10 base pairs in length to about 15 base pairs in length, from about 8 base pairs in length to about 15 base pairs in length, or from about 8 base pairs in length to about 12 base pairs in length.
In embodiments for amplicon sequencing or chemical analyses involving amplicon sequencing to detect the nucleic acid construct of the control composition, the nucleic acid construct can further comprise at least a first and a second primer binding site fragment. In this aspect, the primers can be any primers of interest. In this embodiment, the first primer binding site fragment is linked at its 3′ end to the 5′ end of the first universal sequence fragment and the second primer binding site fragment is linked at its 5′ end to the 3′ end of the second universal sequence fragment (see
In all of the various embodiments described above, the entire nucleic acid construct, not including plasmid sequence if a plasmid is present, can range in length from about 80 base pairs to about 300 base pairs, from about 80 base pairs to about 290 base pairs, from about 80 base pairs to about 280 base pairs, from about 80 base pairs to about 270 base pairs, from about 80 base pairs to about 260 base pairs, from about 80 base pairs to about 250 base pairs, from about 80 base pairs to about 240 base pairs, from about 80 base pairs to about 230 base pairs, from about 80 base pairs to about 220 base pairs, from about 80 base pairs to about 210 base pairs, from about 80 base pairs to about 200 base pairs, from about 80 base pairs to about 190 base pairs, from about 80 base pairs to about 180 base pairs, from about 80 base pairs to about 170 base pairs, or from about 80 base pairs to about 160 base pairs.
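As a minimal sketch of the layout described above, the following Python snippet concatenates fragments in the order first primer binding site, first universal sequence, barcode, second universal sequence, second primer binding site, and checks the total against the about 80 to about 300 base pair range. Every sequence, the names `ConstructParts`, `assemble`, and `length_in_range`, and the hard 80/300 cutoffs are assumptions for illustration only; none of the sequences comes from the Sequence Listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstructParts:
    """Hypothetical fragments, written 5' to 3' on one strand."""
    primer_site_5p: str
    universal_5p: str
    barcode: str
    universal_3p: str
    primer_site_3p: str


def assemble(parts: ConstructParts) -> str:
    """Join the fragments in 5' to 3' order into one construct sequence."""
    return "".join([
        parts.primer_site_5p,
        parts.universal_5p,
        parts.barcode,
        parts.universal_3p,
        parts.primer_site_3p,
    ])


def length_in_range(seq: str, lo: int = 80, hi: int = 300) -> bool:
    """True if the construct length falls inside the assumed lo..hi window."""
    return lo <= len(seq) <= hi


# Placeholder sequences (assumptions), sized 18 + 25 + 24 + 25 + 18 = 110 bp.
parts = ConstructParts(
    primer_site_5p="ACGT" * 4 + "AC",
    universal_5p="GATTC" * 5,
    barcode="TCAGGA" * 4,
    universal_3p="CCTAG" * 5,
    primer_site_3p="TGCA" * 4 + "TG",
)
construct = assemble(parts)
print(len(construct), length_in_range(construct))
```

For a whole genome sequencing construct without primer binding sites, the two primer fields could simply be left as empty strings.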
Various embodiments of the nucleic acid constructs, including the forward and reverse primer binding site fragments, the 5′ and 3′ universal sequence fragments, and the barcode sequence fragment are shown in Table 2 above having SEQ ID NOS:1 to 384. The corresponding full sequences are also shown as SEQ ID NOS:385 to 768 in Table 3 below. These embodiments have primer binding site sequence fragments similar to the nucleic acid construct exemplified in
In another embodiment, spike-in control compositions are provided for use in a method that simultaneously 1) controls for cross-contamination and/or sample swapping and 2) allows for quantitation while controlling for different GC content samples (e.g., low, balanced, and high GC content). In this embodiment, nucleic acid constructs are used with barcode sequence fragments, and with GC content fragments where the barcode sequence fragments and the GC content fragments are positioned between universal sequence fragments (see
By using the same type of nucleic acid construct, but with different barcode sequence fragments, different quantities of the nucleic acid construct can be spiked into samples (see the “Low Quantity Standard” and the “High Quantity Standard” with “Barcode 1” and “Barcode 2”, respectively in
Various embodiments of the GC content fragments are shown below in Tables 4 through 7.
GCAGATCTCGTACG
CGAA
GCAGATCTCGTACG
CGAA
GCAGATCTCGTACG
CGAA
GCAGATCTCGTACG
CGAA
GCAGATCTCGTACG
CGAA
GCAGATCTCGTACG
CGAA
ATGATTACAGTTAACAGTATCTTAATGATTACAGTTAACAGTATCTTA
CTGACTGCAGTTAGCAGTACCTGAATGCTGACAGTCAGCAGTACCTGA
CGACGGCTCAGGCCTCAGCGTGGCCGACGGCTGAGGCCTCAGCGTGGC
ATGATTACAGTTAACAGTATCTTAATGATTACAGTTAACAGTATCTTA
CTGACTGCAGTTAGCAGTACCTGAATGCTGACAGTCAGCAGTACCTGA
CGACGGCTCAGGCCTCAGCGTGGCCGACGGCTGAGGCCTCAGCGTGGC
In this quantitation embodiment, the GC content fragment can be from about 100 base pairs in length to about 270 base pairs in length, from about 100 base pairs in length to about 260 base pairs in length, from about 100 base pairs in length to about 250 base pairs in length, from about 100 base pairs in length to about 240 base pairs in length, from about 100 base pairs in length to about 230 base pairs in length, from about 100 base pairs in length to about 220 base pairs in length, from about 100 base pairs in length to about 210 base pairs in length, from about 100 base pairs in length to about 200 base pairs in length, from about 100 base pairs in length to about 190 base pairs in length, from about 100 base pairs in length to about 180 base pairs in length, from about 100 base pairs in length to about 170 base pairs in length, from about 100 base pairs in length to about 160 base pairs in length, from about 100 base pairs in length to about 150 base pairs in length, from about 100 base pairs in length to about 140 base pairs in length, from about 100 base pairs in length to about 130 base pairs in length, from about 100 base pairs in length to about 120 base pairs in length, from about 50 base pairs in length to about 270 base pairs in length, from about 50 base pairs in length to about 260 base pairs in length, from about 50 base pairs in length to about 250 base pairs in length, from about 50 base pairs in length to about 240 base pairs in length, from about 50 base pairs in length to about 230 base pairs in length, from about 50 base pairs in length to about 220 base pairs in length, from about 50 base pairs in length to about 210 base pairs in length, from about 50 base pairs in length to about 200 base pairs in length, from about 50 base pairs in length to about 190 base pairs in length, from about 50 base pairs in length to about 180 base pairs in length, from about 50 base pairs in length to about 170 base pairs in length, from about 50 base pairs in length to about 160 base pairs in length, from about 50 base pairs in length to about 150 base pairs in length, from about 50 base pairs in length to about 140 base pairs in length, from about 50 base pairs in length to about 130 base pairs in length, from about 50 base pairs in length to about 120 base pairs in length, from about 60 base pairs in length to about 120 base pairs in length, from about 70 base pairs in length to about 120 base pairs in length, from about 80 base pairs in length to about 120 base pairs in length, from about 90 base pairs in length to about 120 base pairs in length, or from about 100 base pairs in length to about 120 base pairs in length.
In quantitation embodiments where GC content fragments are present, the GC content of the GC content fragments can vary. As exemplary embodiments, the GC content fragments can have GC contents of about 1 to about 40 percent, about 1 to about 35 percent, about 1 to about 30 percent, about 1 to about 25 percent, about 1 to about 20 percent, about to about 65 percent, about 40 to about 65 percent, about 40 to about 60 percent, about 40 to about 55 percent, about 40 to about 50 percent, about 45 to about 65 percent, about 45 to about 60 percent, about 45 to about 55 percent, about 45 to about 50 percent, about 65 to about 100 percent, about 65 to about 95 percent, about 65 to about 90 percent, about 65 to about 85 percent, about 65 to about 80 percent, about 65 to about 75 percent, about 65 to about 70 percent, about 60 to about 100 percent, about 60 to about 95 percent, about 60 to about 90 percent, about 60 to about 85 percent, about 60 to about 80 percent, about 60 to about 75 percent, or about 60 to about 70 percent. In one aspect, the GC content fragments can have low (e.g., about 1 to about 40 percent), balanced (e.g., about 40 to about 60 percent or about 45 to about 60 percent), or high GC content (e.g., about 60 to about 100 percent or about 65 to about 100 percent). In this quantitation embodiment, the GC content fragments in different nucleic acid constructs can have, for example, at least one, two, three, or four different GC content percentages in the different nucleic acid constructs.
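The low, balanced, and high categories above can be illustrated with a short GC content calculation. This is a sketch only: the function names and the exact cutoffs (below 40 percent is low, above 60 percent is high, anything in between is balanced) are assumptions chosen from the "about" ranges in the text, and the three example sequences are taken from the fragment listing above, read as 48-base sequences.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(base in "GC" for base in seq)
    return 100.0 * gc / len(seq)


def gc_class(seq: str, low_max: float = 40.0, high_min: float = 60.0) -> str:
    """Classify a fragment as low, balanced, or high GC (assumed cutoffs)."""
    pct = gc_percent(seq)
    if pct < low_max:
        return "low"
    if pct > high_min:
        return "high"
    return "balanced"


# Example fragments from the listing above, each read as one 48-base sequence.
fragments = {
    "A": "ATGATTACAGTTAACAGTATCTTAATGATTACAGTTAACAGTATCTTA",
    "B": "CTGACTGCAGTTAGCAGTACCTGAATGCTGACAGTCAGCAGTACCTGA",
    "C": "CGACGGCTCAGGCCTCAGCGTGGCCGACGGCTGAGGCCTCAGCGTGGC",
}
for name, seq in fragments.items():
    print(name, len(seq), gc_percent(seq), gc_class(seq))
# A 48 25.0 low
# B 48 50.0 balanced
# C 48 75.0 high
```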
In this quantitation embodiment, the different GC content fragments (e.g., low, balanced, and high GC content) have the same barcode sequence fragment at each GC percentage (e.g., low, balanced, and high GC content), but at each separate concentration of the nucleic acid construct used to produce the standard curve (e.g., “Low Quantity Standard” and the “High Quantity Standard” in
In quantitation embodiments for amplicon sequencing, the nucleic acid construct can further comprise at least a first and a second primer binding site fragment. In this aspect, the primers can be any primers of interest. In this embodiment, the first primer binding site fragment is linked at its 3′ end to the 5′ end of the first universal sequence fragment and the second primer binding site fragment is linked at its 5′ end to the 3′ end of the second universal sequence fragment. In embodiments for whole genome sequencing, the nucleic acid construct may lack primer binding site fragments. In embodiments where primer binding site fragments are included in the nucleic acid construct, the primer binding site fragments can range in length from about 15 base pairs to about 28 base pairs, from about 15 base pairs to about 26 base pairs, from about 15 base pairs to about 24 base pairs, from about 15 base pairs to about 22 base pairs, from about 15 base pairs to about 20 base pairs, from about 16 base pairs to about 22 base pairs, from about 16 base pairs to about 20 base pairs, from about 17 base pairs to about 20 base pairs, or can be about 18 base pairs.
In an illustrative embodiment of the quantitation embodiment, the nucleic acid construct is a deoxyribonucleic acid construct. In another aspect, the nucleic acid construct is a ribonucleic acid. In another embodiment, the nucleic acid construct is incorporated into a plasmid. In yet another embodiment, the nucleic acid construct is incorporated into the genome of an organism.
In all of the various quantitation embodiments described above, the entire nucleic acid construct, not including plasmid sequence if a plasmid is present, can range in length from about 80 base pairs to about 300 base pairs, from about 80 base pairs to about 290 base pairs, from about 80 base pairs to about 280 base pairs, from about 80 base pairs to about 270 base pairs, from about 80 base pairs to about 260 base pairs, from about 80 base pairs to about 250 base pairs, from about 80 base pairs to about 240 base pairs, from about 80 base pairs to about 230 base pairs, from about 80 base pairs to about 220 base pairs, from about 80 base pairs to about 210 base pairs, from about 80 base pairs to about 200 base pairs, from about 80 base pairs to about 190 base pairs, from about 80 base pairs to about 180 base pairs, from about 80 base pairs to about 170 base pairs, or from about 80 base pairs to about 160 base pairs.
In another embodiment, any of the nucleic acid constructs, whether or not incorporated into a plasmid and whether or not encapsulated, can be in the form of a kit. In this illustrative aspect, the kit can further comprise a reagent for nucleic acid extraction, a reagent for nucleic acid purification, a reagent for library preparation, a reagent for amplification, a probe (for example for use in exome/targeted hybridization sequencing as described below), a reagent for sequencing, a reagent for chemical analyses, such as mass spectrometry, and/or instructions for use of the kit. In this illustrative embodiment, the kit can comprise more than one of the control compositions for sequencing or chemical analyses wherein each control composition comprises a different nucleic acid construct wherein the different nucleic acid constructs comprise different barcode sequence fragments (e.g., the 384 barcode sequence fragments contained in SEQ ID NOS:1 to 384 or SEQ ID NOS:385 to 768, or, for example, a subset of 96 of these sequences for use in multiplex sequencing applications).
In yet another illustrative aspect, a kit for quantitation of nucleic acids during sequencing can comprise more than one of any of the control compositions described herein wherein each control composition comprises a different nucleic acid construct wherein the different nucleic acid constructs comprise different barcode sequence fragments. In this quantitation embodiment, the nucleic acid constructs comprising different barcode sequence fragments can be spiked into the sample at different concentrations (see the “Low Quantity Standard” and the “High Quantity Standard” with “Barcode 1” and “Barcode 2”, respectively in
In yet another illustrative aspect, the kits described herein can comprise more than one of any of the control compositions described herein wherein the nucleic acid construct in each control composition is encapsulated in a different type of liposome. In this embodiment, each control composition wherein the nucleic acid construct is encapsulated in a different type of liposome may have a different barcode sequence fragment to differentiate the various types of liposomes post-sequencing (see
In one embodiment, the probes for use in exome/targeted hybridization sequencing, primers for use in amplicon sequencing, whole genome sequencing, or exome/targeted hybridization sequencing, and the nucleic acid constructs, including nucleic acid constructs incorporated into a plasmid, described herein can be made by methods well-known in the art, including syntheses and recombinant methods. Such techniques are described in Sambrook et al., “Molecular Cloning: A Laboratory Manual”, 3rd Edition, Cold Spring Harbor Laboratory Press, (2001), incorporated herein by reference. Plasmids, primers, probes, and the nucleic acid constructs described herein can also be made commercially (e.g., Blue Heron, Bothell, WA 98021). Techniques for purifying or isolating the probes, primers, or nucleic acid constructs, including nucleic acid constructs incorporated into a plasmid, described herein are well-known in the art. Such techniques are described in Sambrook et al., “Molecular Cloning: A Laboratory Manual”, 3rd Edition, Cold Spring Harbor Laboratory Press, (2001), incorporated herein by reference. The nucleic acid constructs, including nucleic acid constructs incorporated into a plasmid, described herein can be analyzed by techniques known in the art, such as sequencing, to determine if the sequence is correct.
In one illustrative aspect, the nucleic acid construct, incorporated into a plasmid or not incorporated into a plasmid, can be encapsulated. In one exemplary embodiment, the nucleic acid construct, incorporated into a plasmid or not incorporated into a plasmid, can be encapsulated in a liposome, and the liposome can comprise a lipid selected from the group consisting of cholesterol, a cholesterol ester salt, a lipopolysaccharide, a sphingolipid, a peptidoglycan, a phospholipid, any other suitable lipid, and combinations thereof.
In this embodiment, liposomes can be closed, spherical vesicles comprising amphiphilic lipids in proportions such that they arrange themselves into multiple concentric bilayers when hydrated in aqueous solutions. In another aspect, the liposomes can be converted into single bilayer liposomes which are useful carriers of both hydrophilic molecules, which can reside entrapped in the aqueous interior of the liposome, and of hydrophobic molecules, which can reside entrapped in the lipid bilayer. An exemplary hydrophilic chain constituent is polyethylene glycol.
In various embodiments, the lipids can include those having two hydrocarbon chains, typically acyl chains, and a polar head group, such as phospholipids and glycolipids. In this aspect, phospholipids may include any one type of phospholipid or a combination of phospholipids capable of forming liposomes, including, but not limited to, phosphatidylcholines, phosphatidylethanolamine, phosphatidic acid, phosphatidylinositol, and sphingomyelin, where the two hydrocarbon chains are typically between about 14 to 22 carbons in length, and have varying degrees of unsaturation. The glycolipids include, but are not limited to, cerebrosides and gangliosides. Exemplary phosphatidylcholines, include those obtained from natural sources or those that are partially or wholly synthetic, or are of variable chain length and unsaturation.
In various embodiments, the nucleic acid construct can be encapsulated, incorporated into a plasmid or not incorporated into a plasmid, into a simulated cell membrane that mimics the cell membrane of the microorganism or a eukaryotic cell, or another cell of interest. In one illustrative embodiment, lipids with varying crystal transition temperatures, including cholesterol and lipopolysaccharide, can be incorporated during encapsulation to better mimic the mechanical and material characteristics of a microorganism cell wall (e.g., a bacterial cell wall). In this embodiment, variation in liposome production parameters such as the lipid:DNA ratio, the solvent:non-solvent ratio, and the lipid charge can be used to better tune the liposome composition and size to mimic the cell membrane of the microorganism or a eukaryotic cell, or another cell of interest.
For example, membrane rigidity may be increased with increasing amounts of cholesterol. In one embodiment, this allows the production of a range of liposomes that include easy to lyse (i.e., non-resistant liposomes) through difficult to lyse liposomes (i.e., resistant liposomes). In another embodiment, LPS may be used to mimic Gram-negative bacterial membranes. The hydrated saccharide chains can act as a barrier to hydrophobic species while the phospholipid layer can act as a barrier to hydrophilic species. A periplasm layer of water and peptidoglycan (PG) separates the LPS outer membrane from an inner membrane composed of a more conventional phospholipid lipid bilayer. Polyethylene Glycol (PEG) is a hydrophilic, biologically inert, synthetic material that may confer similar membrane robustness. The PEG can assemble into a brush-like layer on the outer membrane of the liposomes, and act as a hydrated barrier while also increasing the apparent size. Although PEG has been extensively used in liposomes for drug delivery, it may not have been demonstrated as an LPS mimic in an artificial cell. PG, teichoic acids, or similar materials can be added to mimic a Gram-positive cell wall, as the thick PG layers increase lysis resistance. In one aspect, after synthesis, liposome size can be adjusted by extruding the liposomes through a filter membrane with well-defined pore sizes. In this embodiment, the final liposome will comprise small, unilamellar vesicles with a size that is determined by the pore size in the membrane used for extrusion. With no extrusion step, the liposomes may be larger, multi-lamellar liposomes.
In one illustrative aspect, direct encapsulation of the nucleic acid construct without a plasmid or genome backbone (shown schematically in
In all of the encapsulation embodiments described above, encapsulation of the control composition for sequencing or chemical analyses, including the nucleic acid construct, or incorporation of the nucleic acid construct into the genome of a cell (e.g., a bacterial or eukaryotic cell), allows the control composition for sequencing or chemical analyses to be used in every step of sequencing analysis or chemical analyses of an unknown test sample, from extraction to purification to library preparation, sequencing or chemical analyses, and data analysis, because degradation of the control sample can be avoided so that sample cross-contamination and sample swapping can be effectively monitored throughout the protocol. In another aspect for the quantitation embodiments described herein, the nucleic acid constructs can be encapsulated in a simulated cell membrane to control for differential lysis during sample preparation. In another illustrative aspect, encapsulation of the nucleic acid constructs described herein can enable simultaneous quantification that is controlled for extraction efficiency, cross-contamination control, and extraction quality control.
In embodiments where the nucleic acid construct is not artificially encapsulated in, for example, a liposome, the nucleic acid construct can be incorporated into the genome of a microorganism for use as a control composition for sequencing. This embodiment is shown schematically in
The CRISPR/Cas9 system for genome editing has benefits over other genome editing systems. In this embodiment, the Cas9 endonuclease is capable of introducing a double strand break into a DNA target sequence (e.g., the natural primer binding sites described above). In this aspect, the Cas9 endonuclease is guided by the guide polynucleotide (e.g., guide RNA) to recognize and optionally introduce a double strand break at a specific target site in the genome of a cell, such as a microorganism, a eukaryotic cell, or another cell of interest for use in the methods described herein. The Cas9 endonuclease can unwind the DNA duplex in close proximity to the genomic target site and cleave both DNA strands upon recognition of a target sequence by a guide polynucleotide (e.g., guide RNA), but only if the correct protospacer-adjacent motif (PAM) is appropriately oriented at the 3′ end of the target. In this embodiment, the donor polynucleotide construct (e.g., the nucleic acid construct described herein) can then be incorporated into the genomic target site. Methods for using the CRISPR/Cas9 system for genome editing are well-known in the art.
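The PAM requirement can be illustrated with a simple scan for candidate target sites. This sketch assumes the commonly used Streptococcus pyogenes Cas9, whose PAM is NGG immediately 3′ of a 20-nucleotide protospacer; the text above does not name a particular Cas9 or PAM, so both are assumptions, as are the function name and the example sequence. Only the given strand is scanned; a fuller search would also scan the reverse complement.

```python
import re


def find_target_sites(genome: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every site followed by an NGG PAM.

    Assumes an SpCas9-style NGG PAM directly 3' of the protospacer and
    scans the given strand only.
    """
    genome = genome.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(genome)]


# Hypothetical sequence: one 20-nt protospacer followed by the PAM "AGG".
demo = "TTT" + "ACGT" * 5 + "AGG" + "TTTT"
for start, protospacer, pam in find_target_sites(demo):
    print(start, protospacer, pam)
# 3 ACGTACGTACGTACGTACGT AGG
```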
In one illustrative aspect, for sequencing or chemical analyses, the nucleic acids in the sample (e.g., microorganisms such as bacteria or viruses) and the nucleic acids in the control composition for sequencing or chemical analyses (e.g., the nucleic acid construct incorporated or not incorporated into a plasmid or into the genome of a microorganism), are extracted and purified for analysis. In various embodiments, the preparation of the nucleic acids (e.g., DNA or RNA) can involve rupturing the cells that contain the nucleic acids (e.g., cells of a microorganism or the nucleic acid construct in a simulated cell membrane) and isolating and purifying the nucleic acids (e.g., DNA or RNA) from the lysate. Techniques for rupturing cells and for isolation and purification of nucleic acids (e.g., DNA or RNA) are well-known in the art. In one embodiment, for example, nucleic acids may be isolated and purified by rupturing cells using a detergent or a solvent, such as phenol-chloroform. In another aspect, nucleic acids (e.g., DNA or RNA) may be separated from the lysate by physical methods including, but not limited to, centrifugation, pressure techniques, or by using a substance with an affinity for nucleic acids (e.g., DNA or RNA), such as, for example, beads that bind nucleic acids. In one embodiment, after sufficient washing, the isolated, purified nucleic acids may be suspended in either water or a buffer. In another aspect, the nucleic acids (e.g., DNA or RNA) are “isolated” or “purified” before sequencing. In one embodiment, “isolated” means that the nucleic acids are removed from their normal environment. In another aspect, “purified” in the context of the nucleic acids that are sequenced means the nucleic acids are substantially free of other cellular material, or culture medium, or other chemicals used in the extraction process. 
In other embodiments, commercial kits are available, such as Qiagen™ (e.g., Qiagen DNeasy PowerSoil Kit™), Nuclisens™, Wizard™ (Promega), and Promega™, for extraction and purification of nucleic acids. Methods for preparing nucleic acids for sequencing or chemical analyses and library preparation are also described in Green and Sambrook, “Molecular Cloning: A Laboratory Manual”, 4th Edition, Cold Spring Harbor Laboratory Press, (2012), incorporated herein by reference.
In one illustrative aspect, after preparation for sequencing of the nucleic acids in the sample (e.g., in microorganisms such as bacteria or viruses) and the nucleic acid constructs in the control compositions for sequencing or chemical analyses (e.g., nucleic acid construct incorporated or not incorporated into a plasmid or the genome of a microorganism), a library can be prepared, and the nucleic acids can be sequenced using any suitable sequencing method. In one embodiment, Next Generation Sequencing (e.g., using Illumina, ThermoFisher, or PacBio or Oxford Nanopore Technologies sequencing platforms), sequencing by synthesis, pyrosequencing, nanopore sequencing, or modifications or combinations thereof can be used.
In one embodiment, the sequencing can be amplicon sequencing. In another embodiment, the sequencing can be whole genome sequencing. Whole genome sequencing includes, for example, metagenomics, and is utilized heavily in environmental microbial community research, microbiome research, and cancer or human diagnostics. In another embodiment, the sequencing can be exome/targeted hybridization sequencing.
An exemplary nucleic acid construct and probe for exome/targeted hybridization sequencing is shown schematically in
In one aspect, libraries can be pooled and concentrated before sequencing. Methods for library preparation and for sequencing are described in Green and Sambrook, “Molecular Cloning: A Laboratory Manual”, 4th Edition, Cold Spring Harbor Laboratory Press, (2012), incorporated herein by reference. In one illustrative aspect, after sequencing, the number of reads (i.e., read counts) obtained by sequencing the nucleic acids in the sample or the nucleic acids in the control compositions for sequencing (e.g., nucleic acid construct incorporated or not incorporated into a plasmid or the genome of a microorganism) can be determined.
In various illustrative embodiments, using the control compositions for sequencing or chemical analyses described herein, patient samples or environmental samples (e.g., containing animals, plants, bacteria, viruses, fungi, or archaea) can be analyzed by sequencing or chemical analyses. In accordance with the invention, the term “patient” means a human or an animal, such as a domestic animal (e.g., a dog or a cat). Accordingly, the methods and control compositions for sequencing or chemical analyses described herein can be used, for example, for human clinical medicine (e.g., infectious disease diagnosis, cancer genomics, Mendelian genetic testing, and paternity testing), veterinary applications, forensics, environmental or ecological use, and consumer sequencing services such as ancestry DNA, American Gut, or other amplicon sequencing-based technologies that sequence amplicons to determine ancestry or the consumer's microbiome composition.
In various aspects, the patient can be a human, or in the case of veterinary applications, can be a laboratory, agricultural, domestic or wild animal. In one embodiment, the patient can include, but is not limited to, a human, a laboratory animal such as a rodent (e.g., mice, rats, hamsters, etc.), a rabbit, a monkey, a chimpanzee, a domestic animal such as a dog, a cat, and a rabbit, and an agricultural animal such as a cow, a horse, a pig, a sheep, a goat, a chicken, and a wild animal in captivity such as a bear, a panda, a lion, a tiger, a leopard, an elephant, a zebra, a giraffe, a gorilla, a dolphin, and a whale.
In various illustrative embodiments, the samples that can be tested using the control compositions for sequencing or chemical analyses and the methods described herein comprise patient body fluids including, but not limited to, urine, nasal secretions, nasal washes, inner ear fluids, bronchial lavages, bronchial washes, alveolar lavages, spinal fluid, bone marrow aspirates, sputum, pleural fluids, synovial fluids, pericardial fluids, peritoneal fluids, saliva, tears, gastric secretions, stool, reproductive tract secretions, such as seminal fluid, lymph fluid, and whole blood, serum, or plasma, or any other suitable patient sample. In another embodiment, nucleic acids extracted from microorganisms (e.g., bacteria or viruses) isolated or purified from patient samples or environmental samples can be tested using the control compositions for sequencing or chemical analyses and methods described herein. In various embodiments, patient tissue samples that can be tested by using the control compositions for sequencing or chemical analyses and the methods described herein can include tissue biopsies of hospital patients or out-patients and autopsy specimens. As used herein, the term “tissue” includes, but is not limited to, biopsies (including tumor biopsies), autopsy specimens, cell extracts, hair, tissue sections, aspirates, tissue swabs, and fine needle aspirates.
In various illustrative embodiments, environmental samples that can be tested by using the control compositions for sequencing or chemical analyses and the methods described herein can be selected from the group consisting of a soil sample, a water sample, a food sample, an air sample, a plant sample, an industrial waste sample, an agricultural sample, a surface wipe sample, a dust sample, a hair sample, and an animal sample, or any other suitable environmental sample.
In another illustrative embodiment, any of the unencapsulated or encapsulated nucleic acid constructs, incorporated into a plasmid or not incorporated into a plasmid, as described herein may be spiked into a sample that will undergo analysis by an analytical chemistry method, such as mass spectrometry, thermal analysis, electrochemical analysis, chromatographic analysis, and the like. In this embodiment, the analytical chemistry analysis may be quantitative and/or qualitative and the small molecules analyzed may be inorganic or organic compounds. In this aspect, the analysis may be selected from the group consisting of forensic analysis, environmental analysis, industrial analysis (e.g., quality control), or medical analysis. In this illustrative aspect, the nucleic acid construct samples can be extracted and treated in a similar fashion as the analytical chemistry samples, and archived samples, after the analytical chemistry analysis protocol is performed, can be saved for sequencing analysis of the cross-contamination or sample swapping controls. In this embodiment, forensic analysis, for example, may be stomach content analysis, checking blood alcohol content, monitoring substance abuse, toxin analysis, poison analysis, and the like. In this embodiment, the archived samples can be subjected to DNA sequencing to confirm or deny cross-contamination or sample swapping (e.g., at the time of sample collection).
In various illustrative embodiments, the microorganisms present in the patient sample or the environmental sample to be tested can be bacteria or viruses. In this aspect, the bacteria can be selected from Gram-negative and Gram-positive cocci and bacilli, and can comprise antibiotic-resistant bacteria. In another illustrative aspect, the bacteria can be selected from the group consisting of Pseudomonas species, Staphylococcus species, Streptococcus species, Escherichia species, Haemophilus species, Neisseria species, Chlamydia species, Helicobacter species, Campylobacter species, Salmonella species, Shigella species, Clostridium species, Treponema species, Ureaplasma species, Listeria species, Legionella species, Mycoplasma species, and Mycobacterium species, or the group consisting of S. aureus, P. aeruginosa, and E. coli. In another aspect, the viruses can be selected from DNA and RNA viruses, or can be selected from the group consisting of papilloma viruses, parvoviruses, adenoviruses, herpesviruses, vaccinia viruses, arenaviruses, coronaviruses, rhinoviruses, respiratory syncytial viruses, influenza viruses, picornaviruses, paramyxoviruses, reoviruses, retroviruses, and rhabdoviruses. In another illustrative embodiment, mixtures of any of these microorganisms can be present in the patient sample or the environmental sample. In yet another embodiment, the sample to be tested comprises eukaryotic cells.
In one illustrative aspect, a method is provided using any of the non-quantitation control compositions described herein. The method is for monitoring cross-contamination or sample swapping over all steps of a DNA sequencing protocol including collection of a sample comprising DNA, DNA extraction from the sample, purification of the extracted DNA, library preparation, and sequencing. The method comprises a) spiking the sample with a control composition comprising a nucleic acid construct wherein the nucleic acid construct comprises at least one barcode sequence fragment linked to at least one universal sequence fragment and wherein the nucleic acid construct is a deoxyribonucleic acid construct, b) extracting total DNA wherein total DNA comprises the DNA from the sample and DNA from the nucleic acid construct, c) purifying total DNA, d) preparing a library from total DNA, e) sequencing the extracted, purified total DNA, and f) detecting the nucleic acid construct in total DNA.
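Step f) can be sketched as a barcode lookup in the sequencing reads. The snippet below is a minimal illustration, not the method of the invention: it assumes exact, error-free matches to the flanking universal sequences, reads on the same strand as the construct, and a single expected barcode per sample. The function names, the flank and barcode sequences, and the `min_reads` threshold are all hypothetical.

```python
from collections import Counter


def count_barcodes(reads, universal_5p, universal_3p, barcode_len):
    """Count barcodes found between the two universal flanks (exact match only)."""
    counts = Counter()
    for read in reads:
        start = read.find(universal_5p)
        if start == -1:
            continue  # read does not contain the 5' universal fragment
        bc_start = start + len(universal_5p)
        bc_end = bc_start + barcode_len
        # Accept the barcode only if the 3' universal fragment follows it.
        if read[bc_end:bc_end + len(universal_3p)] == universal_3p:
            counts[read[bc_start:bc_end]] += 1
    return counts


def flag_unexpected(counts, expected_barcode, min_reads=1):
    """Barcodes other than the one spiked into this sample, with read counts."""
    return {bc: n for bc, n in counts.items()
            if bc != expected_barcode and n >= min_reads}


# Hypothetical flanks and barcodes.
U5, U3 = "GATTCGATTC", "CCTAGCCTAG"
expected, foreign = "AAAACCCC", "GGGGTTTT"
reads = (
    ["TT" + U5 + expected + U3 + "AA"] * 3   # construct spiked into this sample
    + ["TT" + U5 + foreign + U3 + "AA"]      # construct from another sample
    + ["ACGTACGTACGTACGT"]                   # read from the sample itself
)
counts = count_barcodes(reads, U5, U3, barcode_len=8)
print(dict(counts))                       # {'AAAACCCC': 3, 'GGGGTTTT': 1}
print(flag_unexpected(counts, expected))  # {'GGGGTTTT': 1}
```

A foreign barcode at low read count alongside the expected barcode would be consistent with cross-contamination, whereas absence of the expected barcode with a foreign barcode dominating would be consistent with a sample swap; any thresholds for making that call would need to be set empirically.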
In another embodiment, a method is provided using any of the quantitation control compositions described herein that contain GC content fragments, where the method is for monitoring sample cross-contamination and/or sample swapping and for quantification of nucleic acids during sequencing. The method comprises a) extracting DNA from a sample, b) purifying the DNA, c) spiking the sample, after DNA extraction and purification and before library preparation, with a control composition comprising a nucleic acid construct wherein the nucleic acid construct comprises at least one barcode sequence fragment, at least one universal sequence fragment, and at least one GC content fragment, and wherein the nucleic acid construct is a deoxyribonucleic acid construct, wherein total DNA is obtained after spiking the sample, and wherein total DNA comprises the DNA from the sample and the DNA from the nucleic acid construct, d) preparing a library from total DNA, e) sequencing total DNA, and f) detecting and quantifying the nucleic acid construct in total DNA.
In another embodiment, a method is provided using any of the quantitation control compositions described herein that contain GC content fragments. The method is for monitoring sample cross-contamination and/or sample swapping and for quantification of nucleic acids during sequencing. The method comprises a) spiking a sample with a control composition comprising a nucleic acid construct wherein the nucleic acid construct comprises at least one barcode sequence fragment, at least one universal sequence fragment, and at least one GC content fragment and wherein the nucleic acid construct is a deoxyribonucleic acid construct, b) extracting total DNA from the sample wherein total DNA comprises the DNA from the sample and the DNA from the nucleic acid construct, c) purifying total DNA, d) preparing a library from total DNA, e) sequencing total DNA, and f) detecting and quantifying the nucleic acid construct in total DNA.
In another illustrative aspect, a method is provided using any of the non-quantitation control compositions described herein. The method is for monitoring cross-contamination or sample swapping over steps of a DNA sequencing protocol including collection of a sample comprising DNA, DNA extraction from the sample, purification of the extracted DNA, library preparation, and sequencing. The method comprises a) spiking the sample, after DNA extraction and purification and before library preparation, with a control composition comprising a nucleic acid construct wherein the nucleic acid construct comprises at least one barcode sequence fragment, at least one universal sequence fragment, and wherein the nucleic acid construct is a deoxyribonucleic acid construct, wherein total DNA comprises the DNA from the sample and the DNA from the nucleic acid construct, b) extracting total DNA, c) purifying total DNA, d) preparing a library from total DNA, e) sequencing the extracted, purified total DNA, and f) detecting the nucleic acid construct in total DNA.
In another embodiment, a method for monitoring cross-contamination or sample swapping during an analytical chemistry protocol is provided. The method comprises a) spiking an analytical chemistry protocol sample with a control composition comprising a nucleic acid construct wherein the nucleic acid construct comprises at least one barcode sequence fragment linked to at least one universal sequence fragment and wherein the nucleic acid construct is a deoxyribonucleic acid construct; b) performing the analytical chemistry protocol; c) archiving a sample from the analytical chemistry protocol; d) extracting total DNA from the archived sample wherein total DNA comprises the DNA from the nucleic acid construct and DNA from the analytical chemistry protocol sample, if any; e) purifying total DNA; f) preparing a library from total DNA; g) sequencing the extracted, purified total DNA; and h) detecting the nucleic acid construct in total DNA.
Referring now to
In the illustrative embodiment, the method 100 begins with block 102 in which a computing device receives sequencing reads associated with a plurality of samples. The sequencing reads received in block 102 will typically have been generated during multiplex sequencing of the plurality of samples. As discussed above, each of the plurality of samples is spiked with a different control composition comprising a different nucleic acid construct, with each different nucleic acid construct comprising a different barcode sequence fragment, to allow for monitoring cross-contamination or sample swapping over all steps of a DNA sequencing protocol being applied to the plurality of samples. As such, the sequencing reads received in block 102 will include sequencing reads of the DNA found in each sample and DNA from the nucleic acid constructs of the control compositions spiked into the samples. Each sequencing read is associated with the sample from which it was read, either by the use of a tag or by grouping in a distinct data structure. Block 102 may involve receiving the sequencing reads in the form of one or more FASTA, FASTQ, or similar files.
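By way of illustration only, the grouping of reads by sample described for block 102 might be sketched as follows. The sketch assumes one four-line-per-record FASTQ stream per sample; the function names, sample names, and read sequences are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence) pairs from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string (not used in this sketch)
        yield header.strip()[1:], sequence.upper()


def group_reads_by_sample(streams):
    """Map each sample name to the list of read sequences found in its stream."""
    reads_by_sample = defaultdict(list)
    for sample, handle in streams.items():
        for _read_id, sequence in read_fastq(handle):
            reads_by_sample[sample].append(sequence)
    return dict(reads_by_sample)


# Hypothetical in-memory FASTQ content standing in for per-sample files.
example_streams = {
    "A1": StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGCA\n+\nIIIII\n"),
    "A2": StringIO("@r3\nggcc\n+\nIIII\n"),
}
reads = group_reads_by_sample(example_streams)
```

Here the association between a read and its sample is carried by the dictionary key, which corresponds to the "grouping in a distinct data structure" option; a per-read tag would serve equally well.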
After block 102, the method 100 proceeds to block 104 in which the computing device analyzes the sequencing reads associated with a particular sample to identify the presence of one or more universal sequence fragments. As discussed above, universal sequence fragments may be linked to the 5′ end and/or the 3′ end of the barcode sequence fragment to assist the bioinformatic software in locating and processing the barcode sequence fragments found in the nucleic acid constructs of the control compositions. In some embodiments, block 104 may involve using a text-matching algorithm to identify the presence of one or more universal sequence fragments in the sequencing reads. By way of example, if a 10-base pair universal sequence fragment is included in the nucleic acid constructs of the control compositions, block 104 may involve utilizing a text-matching algorithm to compare each string of 10 characters present in the sequencing reads to the 10 characters representing that 10-base pair universal sequence fragment. In some embodiments, block 104 may also involve referencing a database of universal sequence fragments that may be included in the nucleic acid constructs of the control compositions. In such embodiments, each text string present in the sequencing reads being analyzed may be compared to each of the text strings representing a universal sequence fragment in the database to identify any matches.
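By way of illustration only, the text-matching step of block 104 might be sketched as a sliding-window comparison. The sketch assumes exact matching; the 10-base fragment and the read shown are hypothetical and are not sequences from the disclosure.

```python
def find_universal_fragments(read, universal_fragments):
    """Return sorted (position, fragment) pairs for every exact occurrence in the read."""
    hits = []
    for fragment in universal_fragments:
        width = len(fragment)
        # Compare each window of `width` characters in the read to the fragment.
        for start in range(len(read) - width + 1):
            if read[start:start + width] == fragment:
                hits.append((start, fragment))
    return sorted(hits)


UNIVERSAL_FRAGMENTS = ["ACGTACGTAC"]  # hypothetical database of universal fragments
example_read = "TTACGTACGTACGGGG"     # hypothetical sequencing read
hits = find_universal_fragments(example_read, UNIVERSAL_FRAGMENTS)
```

An implementation that must tolerate sequencing errors would likely replace the equality test with an approximate comparison; that choice is outside what this sketch shows.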
After block 104, the method 100 proceeds to block 106 in which the computing device compares sequence fragments that are adjacent the universal sequence fragments identified in block 104 to the barcode sequence fragments included in the nucleic acid constructs of the control compositions spiked into the samples. In some embodiments, where the barcode sequence fragments are linked to two universal sequence fragments (one at the 5′ end of the barcode sequence fragment and another at the 3′ end of the barcode sequence fragment), block 106 may involve comparing each sequence fragment located between two universal sequence fragments in a sequencing read (identified in block 104) to the barcode sequence fragments included in the nucleic acid constructs of the control compositions. In some embodiments, block 106 may involve using a text-matching algorithm to identify the barcode sequence fragment adjacent the universal sequence fragment(s). By way of example, block 106 may involve utilizing a text-matching algorithm to compare the text string representing the sequence fragment adjacent the universal sequence fragment(s) to a plurality of text strings representing the different barcode sequence fragments included in the nucleic acid constructs of the control compositions spiked into the samples. In some embodiments, block 106 may involve referencing a database of barcode sequence fragments that may be included in the nucleic acid constructs of the control compositions for this purpose.
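By way of illustration only, the comparison of block 106 might be sketched for the case in which the barcode sequence fragment is flanked by two universal sequence fragments. The flanking sequences, barcode names, and barcode sequences below are hypothetical, and exact matching is assumed.

```python
def extract_flanked_fragment(read, upstream, downstream):
    """Return the sequence between the two universal fragments, or None if either is absent."""
    start = read.find(upstream)
    if start == -1:
        return None
    inner_start = start + len(upstream)
    end = read.find(downstream, inner_start)
    if end == -1:
        return None
    return read[inner_start:end]


def match_barcode(fragment, known_barcodes):
    """Return the name of the barcode that exactly matches the fragment, or None."""
    for name, barcode in known_barcodes.items():
        if fragment == barcode:
            return name
    return None


UPSTREAM = "AAGGTT"    # hypothetical 5' universal fragment
DOWNSTREAM = "CCTTGG"  # hypothetical 3' universal fragment
KNOWN_BARCODES = {"barcode_1": "ACGTACGA", "barcode_2": "TGCATGCA"}  # hypothetical

example_read = "GG" + UPSTREAM + "TGCATGCA" + DOWNSTREAM + "AT"
fragment = extract_flanked_fragment(example_read, UPSTREAM, DOWNSTREAM)
barcode_name = match_barcode(fragment, KNOWN_BARCODES)
```

The dictionary of known barcodes plays the role of the database of barcode sequence fragments mentioned above.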
After block 106, the method 100 proceeds to block 108 in which the computing device determines whether the sequence fragments analyzed in block 106 collectively match multiple barcode sequence fragments included in the nucleic acid constructs of the control compositions spiked into the samples. If no cross-contamination between samples has occurred, all of the barcode sequence fragments found in the sequencing reads associated with a particular sample will be identical and match only the barcode sequence fragment included in the nucleic acid construct of the control composition spiked into that sample. As such, if block 108 determines that the sequence fragments analyzed in block 106 collectively match multiple barcode sequence fragments, the method 100 proceeds to block 112 in which the computing device identifies a cross-contamination condition. If block 108 determines that all of the sequence fragments analyzed in block 106 are identical, the method 100 instead proceeds to block 110.
In block 110 of the method 100, the computing device determines whether the sequence fragments analyzed in block 106 all match an unexpected barcode sequence fragment included in the nucleic acid constructs of the control compositions spiked into the samples. The sequencing reads associated with each sample will be expected to include a particular barcode sequence fragment based upon the nucleic acid construct of the control composition spiked into that sample. As such, if block 110 determines that the sequence fragments analyzed in block 106 all match an unexpected barcode sequence fragment, the method 100 proceeds to block 114 in which the computing device identifies a sample swap condition. If block 110 determines that all of the sequence fragments analyzed in block 106 are identical and match the expected barcode sequence fragment, the method 100 instead proceeds to block 116 in which the computing device identifies a (normal) controlled sample condition.
After reaching any of blocks 112, 114, or 116 for each sample, the method 100 returns to block 104 and repeats blocks 104-116 for the sequencing reads associated with another sample of the plurality of samples. This process repeats until the sequencing reads associated with each of the plurality of samples have been analyzed. As such, at the conclusion of the method 100, each of the plurality of samples will have been identified as subject to a cross-contamination condition, a sample swap condition, or a controlled sample condition.
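By way of illustration only, the decisions of blocks 108 and 110 and the per-sample loop might be sketched as follows. The sample names and barcode names are hypothetical. The "undetermined" result for a sample in which no barcode is found is an assumption of this sketch and is not part of the method 100 as described.

```python
def classify_sample(observed_barcodes, expected_barcode):
    """Classify one sample from the barcodes matched in its sequencing reads."""
    distinct = set(observed_barcodes)
    if not distinct:
        # Assumption of this sketch: the described method does not address this case.
        return "undetermined"
    if len(distinct) > 1:
        return "cross-contamination"  # block 108 -> block 112
    if distinct != {expected_barcode}:
        return "sample swap"          # block 110 -> block 114
    return "controlled"               # block 110 -> block 116


def classify_all(observed_by_sample, expected_by_sample):
    """Repeat the classification for every sample in the plurality of samples."""
    return {
        sample: classify_sample(observed_by_sample.get(sample, []), expected)
        for sample, expected in expected_by_sample.items()
    }


# Hypothetical barcode assignments and observations for three samples.
expected = {"A1": "barcode_1", "A2": "barcode_2", "A3": "barcode_3"}
observed = {
    "A1": ["barcode_1", "barcode_1"],
    "A2": ["barcode_2", "barcode_1"],
    "A3": ["barcode_4", "barcode_4"],
}
conditions = classify_all(observed, expected)
```

Note that this sketch treats any second barcode as cross-contamination regardless of its read count; a practical implementation would probably apply a read-count or percentage threshold, which the description above does not specify.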
For each sample identified as subject to a controlled sample condition by the method 100, the graphic 200 includes a first icon 202 at a location corresponding to the well containing that sample. For each sample identified as subject to a cross-contamination condition by the method 100, the graphic 200 includes a second icon 204 at a location corresponding to the well containing that sample. For each sample identified as subject to a sample swap condition by the method 100, the graphic 200 may include a third icon (not shown) at a location corresponding to the well containing that sample. The first icon 202, second icon 204, and third icon may each be visually distinct from one another, allowing a user observing graphic 200 to quickly identify which samples are subject to which conditions. It is contemplated that in some embodiments, the graphic 200 may provide additional information on each sample, particularly in response to user interaction with the graphic 200. For instance, where a user clicks on and/or hovers over one of the icons 202, 204 with a mouse pointer, the graphic 200 may display additional information related to the sample represented by that icon, such as the barcode sequence fragment(s) found in that sample and their amounts (e.g., in number of reads or percentage of total reads).
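By way of illustration only, a plate-style summary of the per-sample conditions might be sketched as plain text. The sketch assumes a 96-well layout (rows A-H, columns 1-12); the characters standing in for the first, second, and third icons are hypothetical, as are the well assignments.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)
# Hypothetical text stand-ins for the visually distinct icons.
ICONS = {"controlled": "o", "cross-contamination": "X", "sample swap": "S"}


def render_plate(conditions):
    """Return a text grid with one icon per well; wells without a sample show '.'."""
    lines = ["  " + " ".join(f"{column:>2}" for column in COLUMNS)]
    for row in ROWS:
        cells = [ICONS.get(conditions.get(f"{row}{column}"), ".") for column in COLUMNS]
        lines.append(row + " " + " ".join(f"{cell:>2}" for cell in cells))
    return "\n".join(lines)


plate_text = render_plate({"A1": "controlled", "B3": "cross-contamination"})
print(plate_text)
```

An interactive version would attach the per-well barcode counts to each icon so that they can be shown on click or hover; that behavior is not sketched here.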
The following examples are for illustrative purposes only. The examples are not intended to limit the invention in any way.
The goal was to encapsulate the CCC-1 and CCC-2 DNA (see description below) in a synthetic cell wall-like membrane that would mimic a natural bacterium, and to verify the encapsulation through spectrophotometric analysis (UV absorbance, or fluorescence), and then to test the encapsulated CCC-1 and CCC-2 DNA molecules for use as control compositions for sequencing (as described herein) in a spiked soil sample using amplicon sequencing.
Encapsulation Protocol
The thin film hydration method is a viable liposome production method because it is applicable to the small volumes used for pDNA (plasmid DNA, here CCC-1 and CCC-2 DNA) samples. Stock pDNA was purchased (see below), and only 5 μL of pDNA (at 10 μg/mL) is required for an amplicon sequencing test. The thin film hydration method (without extrusion) produces a small volume of liposomes in good yield.
Materials
Escherichia coli EH100 (Ra mutant)
The encapsulation methods involved generating a standard calibration curve of pDNA in a UV transparent 96-well plate and reading the absorbance at 260 nm. To a micro-vial, 781 μL of ethanol was added. Then 16 μL of pDNA at 841 μg/mL (i.e. ng/μL) was added. The resulting solution was 98% ethanol with 20 μg/mL pDNA. This is the standard solution. CCC-1 and CCC-2 DNA was quantitated as described in
200 μL of ethanol was then added to wells B-H of columns 1 and 2 of a 96-well plate. Then 400 μL of the pDNA standard solution was added to well A of columns 1 and 2 of the plate. A 2-fold, 8-step serial dilution was performed, leaving row H as pure ethanol. The absorbance at 260 nm was then read.
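By way of illustration only, the concentrations in the dilution series and a linear calibration fit might be sketched as follows. The sketch assumes that row A holds the standard solution at its nominal 20 μg/mL, that each subsequent row through G is a 2-fold dilution of the row above it, that row H is the ethanol blank, and that absorbance at 260 nm is linear in concentration over this range. The absorbance values are synthetic placeholders generated inside the sketch, not measured data.

```python
def dilution_series(top_conc, rows="ABCDEFGH", blank_row="H", fold=2.0):
    """Return the nominal concentration (ug/mL) in each row of the plate column."""
    concentrations = {}
    current = top_conc
    for row in rows:
        if row == blank_row:
            concentrations[row] = 0.0  # solvent-only blank
        else:
            concentrations[row] = current
            current /= fold
    return concentrations


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a260, slope, intercept):
    """Invert the calibration line to estimate concentration from absorbance."""
    return (a260 - intercept) / slope


standards = dilution_series(20.0)
xs = [standards[row] for row in "ABCDEFGH"]
ys = [0.05 + 0.02 * x for x in xs]  # synthetic absorbances, for illustration only
slope, intercept = fit_line(xs, ys)
unknown = concentration_from_absorbance(0.25, slope, intercept)
```

With the synthetic line used here, an absorbance of 0.25 maps back to roughly 10 μg/mL; real readings would be the mean of the duplicate columns 1 and 2.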
To three separate 1-dram glass vials, the mass of lipids shown in Table 8 below was weighed. The actual masses were recorded and the required volume of chloroform was calculated to bring each lipid solution to its target concentration. The required volume of chloroform to add is shown under Vol solvent, add. Then the three lipid solutions were mixed by combining 1.25 mL of each in a single container.
The lipid solution was added to a round bottom flask and the chloroform was removed to yield a thin film. To a 1-dram glass vial, 2.5 mL of Tris-EDTA buffer was added. Then 59 μL of pDNA at 841.7 μg/mL (i.e. ng/μL) was added to the vial, and the vial was vortexed briefly to disperse the DNA. Then the pDNA solution (2.5 mL) was added to the flask, and the flask was vortexed at room temperature until the lipid film dissolved. This yielded a white turbid dispersion of pDNA encapsulated in liposomes. The solution was stored in the refrigerator until use.
Spike-In Protocol
Each 0.25 gram soil sample was spiked with 12.5 ng of CCC-1 DNA, CCC-2 DNA, or a mixture of CCC-1 and CCC-2 DNA, encapsulated as described above. The average size of the encapsulated CCC-1 DNA or CCC-2 DNA (each of which includes a plasmid) was 8±2 μm in diameter, and the encapsulation efficiency was demonstrated to be ˜85%. The CCC-1 and CCC-2 DNA molecules are plasmids comprising a barcode sequence fragment and were purchased from Blue Heron, Bothell, WA 98021. The CCC-1 DNA and CCC-2 DNA sequences, including the plasmid, are shown below as SEQ ID NOS:769 and 770, respectively. The nucleic acid construct sequences within the CCC-1 DNA and CCC-2 DNA sequences are shown below as SEQ ID NOS:771 and 772, respectively.
Extraction and Purification Protocol
The DNA in the spiked soil samples was then extracted using the Qiagen DNeasy PowerSoil Kit™. The Agilent Bioanalyzer confirmed that samples contained amplicon products from both the soil microorganisms of the sample and the nucleic acid construct described herein, based on the different amplicon sizes: 16S (soil sample)=˜600 bp and the nucleic acid construct=200 bp (
Library Preparation and Sequencing Protocol
The 16S DNA was amplified and prepared for Illumina NGS sequencing on an Illumina MiSeq. The bead based clean-ups were replaced with MinElute PCR clean up columns. After library preparation, the libraries were visualized on the Agilent Bioanalyzer using the DNA High Sensitivity Assay to check the amplification and size. The libraries were then sequenced on an Illumina MiSeq 300 cycle nanoflow cell. The data was processed and sequencing results showed that the expected microorganisms (
The data shown in
Analytical chemistry analysis of unknown materials can be confounded by identification of compounds that do not seem to fit with what is expected. These unexpected compounds could be the result of a cross contamination event or may actually be present in the sample. Therefore, the next generation sequencing (NGS) cross contamination controls described herein were tested in a mass spectrometry protocol.
Chemical Sample Composition and Analysis
Mock chemical samples composed of 10 mL MilliQ water spiked with Cannabigerol at 10 ng/mL and Dicamba at 10 ng/mL were prepared for analysis. Two replicates (1A and 1B) were spiked with 20 μL (240 ng) of cross contamination control 1 (CCC1). Two replicates (2A and 2B) were spiked with 20 μL (240 ng) of cross contamination control 2 (CCC2), and two replicates (3A and 3B) were spiked with 20 μL (240 ng) of CCC1 and 20 μL (240 ng) of CCC2 for a total of six samples for analysis. A negative control blank of water was also prepared and tested. A positive control spiked with Cannabigerol at 10 ng/mL and Dicamba at 10 ng/mL in water was run concurrently with the mock contamination samples. Chemical analysis was performed on a Waters Xevo TQ-XS triple quadrupole mass spectrometer following standard analytical methods for both spiked compounds. The pertinent instrument conditions are shown in Table 10.
Nucleic Acid Extraction
Encapsulated DNA from the cross-contamination control spike-in mock chemical samples was captured using a 0.22 μm nylon membrane filter (Agilent, Cat. No. R000038111) within a filtration system. The cross-contamination controls were extracted from the nylon membranes using a DNeasy PowerWater Kit (Qiagen, 14900-50-NF) and the DNA was eluted in molecular biology grade water. One tenth of the filtrate volume of 3M sodium acetate pH 5.2 was added to each filtrate. Twice the filtrate volume of ethanol (Fisher; Cat. No. BP2818-500) was added and the mixture was incubated overnight at −20° C. Each sample was centrifuged at 16,000×g for 20 minutes at 4° C. and the supernatant was discarded. The DNA pellet was washed with 10 mL of 70% ethanol and centrifuged at 16,000×g for 2 minutes. The alcohol was removed, and the nucleic acid pellet was air dried in a BSC until visibly dry (˜15-30 minutes). The DNA pellet was resuspended in the 100 μL of DNA sample prepared using the PowerWater Kit. The extracted DNA was cleaned using a OneStep PCR Inhibitor Removal Kit (Zymo Research; Cat. No. D6030).
Sequencing
The extracted DNA samples, and a non-encapsulated CCC-1 positive control, were amplified using a KAPA HiFi Hot Start Ready Mix (KAPA Biosystems; Cat. No. 07958935001) following the Illumina 16S Metagenomic Sequencing Library Preparation guideline. The thermocycler conditions were as follows: one cycle of 95° C. for 3 minutes, 25 cycles of 95° C. for 0.5 minutes, 55° C. for 0.5 minutes, and 72° C. for 0.5 minutes; and one cycle of 72° C. for 5 minutes. The following 16S rRNA gene-specific primers coupled to Illumina adapter overhang nucleotide sequences were used:
After amplification, the products were purified using a MinElute PCR Purification Kit (Qiagen; Cat. No. 28006) followed by a SPRISelect (Beckman Coulter; Cat. No. B23317) bead size selection (0.9× Beads). Nextera Dual-index adapters (Illumina; Cat. No. 15055293) were added to the PCR products through amplification with the KAPA HiFi Hot Start Ready Mix (KAPA Biosystems; Cat. No. 07958935001) with the following thermocycler conditions: one cycle of 95° C. for 3 minutes, 8 cycles of 95° C. for 0.5 minutes, 55° C. for 0.5 minutes, and 72° C. for 0.5 minutes; and one cycle of 72° C. for 5 minutes. The libraries were purified using a MinElute PCR Purification Kit (Qiagen; Cat. No. 28006) followed by a SPRISelect (Beckman Coulter; Cat. No. B23317) bead size selection (1.4× Beads). Libraries were quantified using a Qubit dsDNA High Sensitivity kit (Invitrogen; Cat. No. Q32854), analyzed on an Agilent High Sensitivity DNA chip (Agilent; Cat. No. 5067-4627) using a 2100 Bioanalyzer, and pooled and normalized to 1 nM with 10 mM Tris-HCl (pH 8.5). The pooled library was denatured using 0.2N NaOH, neutralized with 200 mM Tris-HCl pH 7, diluted to 10 pM with hybridization buffer, and combined with 5% PhiX volume (10 pM). The denatured PhiX and amplicon library pool was heat denatured at 96° C. before loading onto a 500-cycle MiSeq Nano Kit V2 (Cat. No. MS-103-1003) and was sequenced on an Illumina MiSeq instrument using 250×250 bp paired-end reads.
Bioinformatics
Data files were downloaded and unzipped for analysis. Cutadapt was run to remove adapters prior to analysis. A grep search was conducted to identify and count the custom control sequences in each file.
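By way of illustration only, a grep-style count of reads containing each control sequence might be sketched as follows. The control names and sequences are hypothetical placeholders (the actual construct sequences are given in the Sequence Listing), and the reverse-complement check is an assumption of this sketch rather than a stated part of the analysis.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def count_control_reads(reads, controls):
    """Count reads containing each control sequence on either strand."""
    counts = {name: 0 for name in controls}
    for read in reads:
        for name, sequence in controls.items():
            if sequence in read or reverse_complement(sequence) in read:
                counts[name] += 1  # counts reads, not occurrences, like `grep -c`
    return counts


# Hypothetical control sequences and adapter-trimmed reads.
CONTROLS = {"control_1": "ACGGTCA", "control_2": "TTGACCA"}
example_reads = [
    "GGACGGTCAGG",  # contains control_1
    "AATGACCGTAA",  # contains the reverse complement of control_1
    "CCTTGACCACC",  # contains control_2
    "GGGGGGGGGGG",  # contains neither control
]
counts = count_control_reads(example_reads, CONTROLS)
```

Counting matching reads rather than total occurrences mirrors how a line-count search over a file of reads behaves when each read occupies one line.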
Results
Mock chemical samples containing Cannabigerol and Dicamba were analyzed on a Waters Xevo TQ-XS triple quadrupole mass spectrometer. The MilliQ water negative control came back blank on the MS and the chemical samples all matched the positive control showing there was no influence in the spectra from the presence of the cross-contamination controls (Table 11).
Results showed acceptable variance at this concentration, which was specifically chosen to be at the lower detection limit of the mass spectrometer. There were no detectable interferences from the mock cross contamination compounds that would suppress or enhance chromatography, or otherwise influence result interpretation by an analyst.
Aliquots from each sample, along with a control of CCC1 in water, were prepared and sequenced. Adapter sequences and short or low-quality reads were removed prior to data analysis. The reads that passed quality control were counted for CCC1 or CCC2. The number of reads for each cross-contamination control is shown in Table 12.
The results show that the cross-contamination controls spiked into the chemical samples do not interfere with chemical analysis and the controls can be detected in analytical chemistry samples when the solvent is water.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application Ser. No. 62/674,533 filed on May 21, 2018, U.S. Provisional Application Ser. No. 62/703,266 filed on Jul. 25, 2018 and U.S. Provisional Application Ser. No. 62/801,520 filed on Feb. 5, 2019, the entire disclosures of which are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
20190382837 A1 | Dec 2019 | US |
Number | Date | Country | |
---|---|---|---|
62801520 | Feb 2019 | US | |
62703266 | Jul 2018 | US | |
62674533 | May 2018 | US |