A Sequence Listing accompanies this application and is submitted as an ASCII text file of the sequence listing named “155554_00719_Sequence_Listing.xml” which is 197,938 bytes in size and was created on Oct. 31, 2023. The sequence listing is electronically submitted via Patent Center with the application and is incorporated herein by reference in its entirety.
The three human Ras genes, i.e., KRAS, HRAS, and NRAS, are the most common oncogenes detected in human cancer. KRAS mutations are found at high rates in leukemias, colorectal cancer, pancreatic cancer, and lung cancer. Conventional next generation sequencing (NGS) methods are used to detect KRAS mutations, but they have a detection limit of about one mutant copy per 1000 gene copies. A mutation may, however, be present in a sample at a frequency much lower than one mutant copy per 1000 gene copies, and even lower than one mutant copy per 2×10⁶ gene copies. Therefore, improved sequencing methods are needed to detect these rare mutations.
The present disclosure provides methods and kits for detecting a mutation in a Ras gene in a human sample.
In an aspect, provided herein is a method of detecting a mutation in a KRAS gene in a sample from a human subject, the method comprising: (a) digesting genomic DNA in the sample with at least one enzyme that cleaves at the 3′ end of a region of interest (ROI); (b) adding a forward adaptor and a barcode to the 3′ end of the ROI by mixing the digested genomic DNA with an adaptor-barcode primer and performing a single round of extension, wherein the adaptor-barcode primer comprises from 5′ to 3′: a forward adaptor sequence, a barcode sequence, and a first ROI-specific sequence that is complementary to the 3′ end of the digested ROI, wherein adding the forward adaptor and the barcode to the ROI produces a barcoded DNA; (c) performing linear amplification of the barcoded DNA produced in (b) using a forward adaptor primer that anneals to the forward adaptor sequence; (d) performing exponential amplification of the linearly amplified barcoded DNA produced in (c) using: an exon-specific reverse primer comprising from 5′ to 3′: a reverse adaptor sequence, and a second ROI-specific sequence that is complementary to the 5′ end of the digested ROI; the forward adaptor primer; and a reverse adaptor primer that anneals to the reverse adaptor sequence; and (e) sequencing the exponentially amplified barcoded DNA produced in (d); wherein a different nucleotide in the sequenced DNA produced in (e) compared to a wild type KRAS sequence indicates that a mutation is detected.
In embodiments, the ROI is on the transcribed strand of KRAS exon 1, the non-transcribed strand of KRAS exon 1, or the non-transcribed strand of KRAS exon 2.
In embodiments, the ROI is on the transcribed strand of KRAS exon 1; and: the enzyme is StuI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 6, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 7 or SEQ ID NO: 8; the enzyme is HinfI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 25, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is MluCI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 26, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is Hpy188I, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 27, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is AlwI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 28, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is DpnII, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 29, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is MnlI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 30, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is NsiI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 31, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is HpyCH4V, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 32, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; or the enzyme is BsrI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 33, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17.
In embodiments, the ROI is on the non-transcribed strand of KRAS exon 1; and: the enzyme is HinfI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 9, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 10; the enzyme is PsiI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 18, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is Tsp45I, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 19, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is selected from AflIII, PciI and FatI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 20, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is selected from NspI and NlaIII, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 21, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is CviAII, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 22, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is CviQI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 23, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; or the enzyme is HphI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 24, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8.
In embodiments, the ROI is on the non-transcribed strand of KRAS exon 2; and the enzyme is XmnI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 11, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 12.
In embodiments, the adaptor-barcode primer comprises the forward adaptor sequence of SEQ ID NO: 1, and the forward adaptor primer comprises SEQ ID NO: 2. In embodiments, the exon-specific reverse primer comprises the reverse adaptor sequence of SEQ ID NO: 3, and the reverse adaptor primer comprises SEQ ID NO: 4.
In embodiments, the adaptor-barcode primer further comprises a first index sequence between the forward adaptor sequence and the barcode sequence; wherein the exon-specific reverse primer further comprises a second index sequence between the reverse adaptor sequence and the second ROI-specific sequence; and wherein the first index sequence and the second index sequence comprise between one and seven nucleotides. In embodiments, the first index sequence and the second index sequence are selected from A, GA, CGA, TCGA, ATCGA, GATCGA, and CGATCGA.
In embodiments, the adaptor-barcode primer comprises a sequence selected from SEQ ID NOs: 37 and 41-111. In embodiments, the exon-specific reverse primer comprises a sequence selected from SEQ ID NOs: 38 and 112-118.
In embodiments, the adaptor-barcode primer comprises a sequence selected from SEQ ID NOs: 34 and 119-181. In embodiments, the exon-specific reverse primer comprises a sequence selected from SEQ ID NOs: 36 and 182-188.
In embodiments, performing exponential amplification in step (d) comprises performing at least 20 PCR cycles.
In embodiments, the sample is a biopsy. In embodiments, the subject has or is suspected of having cancer.
In another aspect, provided herein is a kit for detecting a mutation in a KRAS gene in a human subject, the kit comprising at least one set of primers comprising: a forward adaptor primer comprising SEQ ID NO: 2; a reverse adaptor primer comprising SEQ ID NO: 4; an adaptor-barcode primer selected from SEQ ID NOs: 34, 37, 39, 41-111, and 119-181; and an exon-specific reverse primer selected from SEQ ID NOs: 35, 38, 40, 112-118, and 182-188; wherein each of the at least one set of primers comprises an adaptor-barcode primer and an exon-specific reverse primer that comprise a sequence that targets the same region of interest in KRAS.
In embodiments, the kit further comprises at least one enzyme selected from StuI, HinfI, AlwI, BsrI, DpnII, Hpy188I, HpyCH4V, MluCI, MnlI, NsiI, AflIII, PciI, FatI, NlaIII, CviAII, CviQI, HphI, NspI, PsiI, XmnI, and Tsp45I; wherein the at least one enzyme cleaves KRAS at the 3′ end of the region of interest that the adaptor-barcode primer and the exon-specific reverse primer target.
In embodiments, the enzyme is StuI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 6, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 7 or SEQ ID NO: 8; the enzyme is HinfI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 25, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is MluCI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 26, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is Hpy188I, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 27, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is AlwI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 28, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is DpnII, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 29, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is MnlI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 30, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is NsiI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 31, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is HpyCH4V, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 32, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is BsrI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 33, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 17; the enzyme is HinfI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 9, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 10; the enzyme is PsiI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 18, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is Tsp45I, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 19, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is selected from AflIII, PciI and FatI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 20, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is selected from NspI and NlaIII, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 21, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is CviAII, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 22, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is CviQI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 23, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; the enzyme is HphI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 24, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 8; or the enzyme is XmnI, the adaptor-barcode primer comprises the first ROI-specific sequence of SEQ ID NO: 11, and the exon-specific reverse primer comprises the second ROI-specific sequence of SEQ ID NO: 12.
The present invention provides methods and kits for the detection of rare genetic mutations in the human KRAS gene. These methods were adapted from the previously developed, error-corrected, high-throughput sequencing method referred to as maximum depth sequencing (MDS). MDS overcomes the limitations of conventional next generation sequencing (NGS) to allow for the identification of ultra-rare (1×10⁻⁶, i.e., 1 mutant per 10⁶ templates) antibiotic-resistance mutations arising in bacterial populations (ref. 13). The inventors adapted the MDS method for use with the much larger mammalian genome.
To perform MDS, genomic DNA is cleaved (e.g., using a restriction enzyme) at the 3′ end of a region of interest (ROI) (Step a). Then, a single PCR cycle is performed to anneal “adaptor-barcode primers,” which comprise a forward adaptor sequence and a barcode sequence, to the 3′ end of the ROI (Step b). In this step, the exposed 3′ end of the genomic DNA molecule serves as a “primer” that allows the adaptor sequence and barcode sequence to be synthesized onto the end of the ROI. Next, unused adaptor-barcode primers are removed, and the ROI is subjected to linear amplification using a “forward adaptor primer” that anneals to the newly added forward adaptor sequence (Step c). Next, the DNA is subjected to exponential amplification using an “exon-specific reverse primer” that comprises a reverse adaptor sequence and a “reverse adaptor primer” that anneals to the reverse adaptor sequence (Step d). Finally, the DNA is sequenced, and the results are used to identify mutations in the ROI. See
The inventors modified the original MDS protocol developed for bacteria to function in mammals in several ways. Restriction enzymes that target a specific site at the 3′ end of a region of interest (ROI) within the human KRAS gene were selected, and primers were designed for the specific amplification of the digested ROIs. Several additional parameters, such as the PCR annealing temperatures and the number of PCR cycles, were also optimized for use with these particular ROIs and primers.
Provided herein is a method of detecting a mutation in the human KRAS gene in a sample from a subject. The method comprises: (a) digesting genomic DNA in the sample with at least one enzyme that cleaves at the 3′ end of a region of interest (ROI); (b) adding a forward adaptor and a barcode to the 3′ end of the ROI by mixing the digested genomic DNA with an adaptor-barcode primer and performing a single round of extension, wherein the adaptor-barcode primer comprises from 5′ to 3′: a forward adaptor sequence, a barcode sequence, and a first ROI-specific sequence that is complementary to the 3′ end of the digested ROI, wherein adding the forward adaptor and the barcode to the ROI produces a barcoded DNA; (c) performing linear amplification of the barcoded DNA produced in (b) using a forward adaptor primer that anneals to the forward adaptor sequence; (d) performing exponential amplification of the linearly amplified barcoded DNA produced in (c) using (i) an exon-specific reverse primer comprising from 5′ to 3′: a reverse adaptor sequence and a second ROI-specific sequence that is complementary to the 5′ end of the digested ROI, and (ii) a reverse adaptor primer that anneals to the reverse adaptor sequence; and (e) sequencing the exponentially amplified barcoded DNA produced in (d); wherein a different nucleotide in the sequenced DNA produced in (e) compared to a wild type KRAS sequence indicates that a mutation is detected.
The methods of the present invention are designed to detect a mutation, typically a point mutation, within the human KRAS gene. Kras is a member of the Ras family of proteins, which are GTPases that function as molecular switches that control intracellular signaling pathways. Overactive Ras signaling can lead to cancer. The three human Ras genes, i.e., KRAS, HRAS, and NRAS, are the most common oncogenes detected in human cancer. Therefore, the mutation sought may be an oncogenic driver mutation. The methods are designed to detect rare mutations. Conventional next generation sequencing (NGS) methods have a detection limit of about one mutant copy per 1000 gene copies. In contrast, a mutation detected by the method may be present in the sample at a frequency lower than one mutant copy per 1000 gene copies, or even at a frequency lower than one mutant copy per 2×10⁶ gene copies.
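By way of non-limiting illustration, the relationship between mutation frequency and the number of gene copies that must be analyzed can be sketched in Python as follows. The sketch assumes that gene copies are sampled independently (a simple binomial model, which is an assumption of this illustration and not a feature of the disclosed methods), and the function names are hypothetical.

```python
import math


def expected_mutant_copies(frequency, copies_sampled):
    """Expected number of mutant copies among the sampled gene copies."""
    return frequency * copies_sampled


def prob_at_least_one_mutant(frequency, copies_sampled):
    """P(at least one mutant copy) = 1 - (1 - f)^N under a binomial model."""
    # log1p and expm1 preserve precision when the frequency is very small.
    return -math.expm1(copies_sampled * math.log1p(-frequency))


if __name__ == "__main__":
    # 1e-3 is the approximate conventional NGS limit stated in the text;
    # 5e-7 corresponds to one mutant copy per 2 x 10^6 gene copies.
    for frequency in (1e-3, 5e-7):
        for copies in (1_000, 2_000_000):
            print(
                f"f={frequency:g}, N={copies}: "
                f"expected={expected_mutant_copies(frequency, copies):.4g}, "
                f"P(>=1)={prob_at_least_one_mutant(frequency, copies):.4g}"
            )
```

Under this model, a sample must contain on the order of the reciprocal of the mutation frequency in gene copies before a mutant copy is likely to be present at all, independent of the accuracy of the sequencing chemistry.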
The present methods rely on the analysis of short reads within a defined region of interest (ROI). Thus, a specific portion of the KRAS gene must be selected for analysis. Preferably, the ROI is greater than 50 bp and less than 150 bp in size. In the Examples, the inventors disclose methods that can be used to detect mutations in the transcribed strand of KRAS exon 1, the non-transcribed strand of KRAS exon 1, or the non-transcribed strand of KRAS exon 2. Thus, the ROI may be on the transcribed strand of KRAS exon 1, the non-transcribed strand of KRAS exon 1, or the non-transcribed strand of KRAS exon 2.
In Step (a) of the present methods, at least one enzyme is used to digest genomic DNA in the sample. Any enzyme that is capable of cleaving or nicking a genomic DNA at the 3′ end of a ROI may be used in this step. The enzyme may be an endonuclease. Exemplary enzymes include, but are not limited to, StuI, HinfI, AlwI, BsrI, DpnII, Hpy188I, HpyCH4V, MluCI, MnlI, NsiI, StuI, AflIII, PciI, FatI, NlaIII, CviAII, CviQI, HphI, NspI, PsiI, XmnI, and Tsp45I. The enzyme may be a restriction enzyme. In embodiments, enzymatic digestion is performed by CRISPR/Cas9. Notably, since the barcode/adaptor sequences are added to the ROI using DNA polymerization, the method is not limited to only the use of restriction enzymes with overhangs of a specific length, as are required by other approaches.
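As a non-limiting sketch of how candidate cleavage sites flanking an ROI might be located in silico, the following Python example searches a sequence for enzyme recognition sites and checks the preferred ROI size range of greater than 50 bp and less than 150 bp. The recognition sequences shown for StuI, HinfI, and XmnI are the commonly published ones, the toy sequence is hypothetical and is not a KRAS sequence, and the function names are assumptions of this illustration. The function reports where each recognition site starts, not the exact cut position within the site.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Commonly published recognition sequences (5' to 3') for three of the enzymes.
RECOGNITION_SITES = {
    "StuI": "AGGCCT",
    "HinfI": "GANTC",
    "XmnI": "GAANNNNTTC",
}


def find_sites(sequence, enzyme):
    """Return 0-based start positions of every recognition site in sequence."""
    pattern = "".join(IUPAC[base] for base in RECOGNITION_SITES[enzyme])
    # A lookahead is used so that overlapping sites are also reported.
    regex = re.compile("(?=(" + pattern + "))")
    return [match.start() for match in regex.finditer(sequence.upper())]


def roi_length_ok(length_bp, low=50, high=150):
    """True if the ROI is greater than `low` and less than `high` bp."""
    return low < length_bp < high


# Hypothetical toy sequence for demonstration only (not a KRAS sequence).
TOY_SEQUENCE = "TTGACCAGGCCTAAGATTCGG"

if __name__ == "__main__":
    for name in RECOGNITION_SITES:
        print(name, find_sites(TOY_SEQUENCE, name))
    print("100 bp ROI acceptable:", roi_length_ok(100))
```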
In Step (b), a single round of extension with an adaptor-barcode primer is used to add a forward adaptor and a barcode to the 3′ end of the ROI. The adaptor-barcode primer is an oligonucleotide that comprises from 5′ to 3′: a forward adaptor sequence, a barcode sequence, and a first ROI-specific sequence. The adaptor-barcode primer may have varying lengths and compositions as required by the protocol. In some cases, more than one adaptor-barcode primer may be used.
Adaptor sequences are designed to interact with a specific sequencing platform (e.g., the surface of a flow-cell (Illumina) or beads (Ion Torrent)) to facilitate a sequencing reaction. Thus, the optimal length of the forward adaptor sequence and the reverse adaptor sequence will vary depending on the sequencing platform used. One of ordinary skill will understand that adaptor sequences may be as short as 20 nucleotides or substantially longer. For example, an adaptor sequence of 58 nucleotides may be used with an Illumina machine.
A barcode sequence is a short, pre-defined sequence that is used to track the origin of specific DNA molecules (i.e., which sample they came from) through the sequencing process. A barcode sequence may be about 6-40 nucleotides in length. In exemplary embodiments, the barcode sequence is 14 nucleotides in length (as in the generic barcode sequence of NNNNNNNNNNNNNN). Multiple barcodes may be used. For example, a second barcoded primer may be used to add a second barcode sequence after linear amplification is performed in Step (c).
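For a sense of scale, the number of distinct barcodes available at a given barcode length can be estimated as in the following Python sketch. It assumes that each N position of the generic barcode is drawn uniformly and independently from A, C, G, and T, and it uses the standard birthday-problem approximation for the chance that two molecules receive the same barcode; both are assumptions of this illustration, and the function names are hypothetical.

```python
import math


def barcode_space(length, alphabet_size=4):
    """Number of distinct barcodes of the given length."""
    return alphabet_size ** length


def prob_shared_barcode(num_molecules, length=14):
    """Approximate probability that at least two molecules share a barcode.

    Uses the birthday-problem approximation 1 - exp(-n(n-1) / (2M)),
    where M is the number of possible barcodes.
    """
    space = barcode_space(length)
    return -math.expm1(-num_molecules * (num_molecules - 1) / (2 * space))


if __name__ == "__main__":
    print("14-nt barcodes available:", barcode_space(14))
    for n in (1_000, 10_000, 100_000):
        print(f"{n} molecules: P(shared barcode) ~ {prob_shared_barcode(n):.3g}")
```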
The first ROI-specific sequence is a sequence that is complementary to the 3′ end of the digested ROI, allowing the adaptor-barcode primer to anneal to the 3′ end of the digested ROI. The first ROI-specific sequence may be about 8-30 nucleotides in length.
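The 5′ to 3′ layout of the adaptor-barcode primer described above can be sketched in Python as follows. This is a minimal illustration only: the adaptor and ROI sequences are hypothetical placeholders (they are not SEQ ID NO: 1 or any KRAS sequence), the barcode is drawn at random as a stand-in for a degenerate N position, and the function names are assumptions of this sketch.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def random_barcode(length=14, rng=None):
    """Draw one random barcode; stands in for an NNNN... degenerate region."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_adaptor_barcode_primer(forward_adaptor, roi, roi_specific_len=20,
                                 barcode_len=14, rng=None):
    """Assemble, 5' to 3': forward adaptor, barcode, first ROI-specific sequence.

    The ROI-specific part is the reverse complement of the last
    roi_specific_len nucleotides of the digested ROI strand, so that the
    primer can anneal to the 3' end of that strand.
    """
    roi_specific = reverse_complement(roi[-roi_specific_len:])
    return forward_adaptor + random_barcode(barcode_len, rng) + roi_specific


if __name__ == "__main__":
    placeholder_adaptor = "AAAACCCCGGGGTTTTAAAA"            # hypothetical
    placeholder_roi = "ACGTTGCAAGGCTTACCGATAGCTAGGATCCATG"  # hypothetical
    primer = build_adaptor_barcode_primer(
        placeholder_adaptor, placeholder_roi, rng=random.Random(0))
    print(primer)
```

In this arrangement the ROI-specific portion sits at the 3′ end of the primer, which is the end that base-pairs with the digested ROI during the single round of extension.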
In Step (c) of the present methods, the barcoded DNA is subjected to linear amplification using a single primer to generate multiple copies of the same barcoded DNA template. In linear amplification, the same DNA molecule is copied multiple times, which reduces the probability of recovering a defective copy. Linear amplification is accomplished using PCR with a forward adaptor primer that anneals to the forward adaptor sequence. For example, in some embodiments, the forward adaptor sequence is SEQ ID NO: 1, and the forward adaptor primer comprises SEQ ID NO: 2. Advantageously, annealing the linear amplification primer to an adaptor sequence rather than the ROI itself allows for more uniform amplification in multiplexed reactions and can reduce the amount of off-target amplification. The optimal length of the forward adaptor primer may be similar to or shorter than the length of primers used for standard PCR, i.e., between about 10 and about 20 nucleotides. In exemplary embodiments, the linear amplification comprises at least 12 PCR cycles. However, the number of cycles may be scaled according to the amount of DNA at the start of the reaction.
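The difference between single-primer (linear) and two-primer (exponential) amplification can be illustrated with the following idealized Python sketch. It assumes that every template is copied in every cycle, which real reactions only approximate, and the function names are hypothetical.

```python
def linear_copies(templates, cycles):
    """New strands made when only the original templates are copied each cycle."""
    return templates * cycles


def exponential_copies(templates, cycles):
    """Strand count when every strand is copied each cycle (ideal doubling)."""
    return templates * 2 ** cycles


def fraction_with_one_copying_error(cycles):
    """In linear amplification each copy derives directly from the original
    template, so an error made while synthesizing one copy is confined to
    that copy: 1 out of `cycles` copies."""
    return 1 / cycles


if __name__ == "__main__":
    print("12 linear cycles, 1 template:", linear_copies(1, 12))
    print("20 exponential cycles, 1 template:", exponential_copies(1, 20))
    print("Share of linear copies carrying a single copying error:",
          fraction_with_one_copying_error(12))
```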
In Step (d), the DNA is subjected to exponential amplification using both an exon-specific reverse primer and a reverse adaptor primer. The exon-specific reverse primer is used to generate PCR products from the original linear amplification in combination with the forward adaptor primer from Step (c). The generated PCR products are further amplified by the reverse adaptor primer in combination with the forward adaptor primer to generate enough DNA to be sequenced. In exemplary embodiments, the exponential amplification of Step (d) comprises at least 20 PCR cycles. Additional PCR cycles were needed compared to the prior methods using bacterial templates because the genomic DNA of mammals is larger and there are fewer copies of the templates within the reaction at the same concentration of genomic DNA.
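The effect of genome size on template copy number can be approximated as in the following Python sketch. The genome sizes and the average mass per base pair are rounded, commonly cited approximations supplied for this illustration (they do not appear in the disclosure), and the function name is hypothetical.

```python
AVOGADRO = 6.022e23              # molecules per mole
BP_MASS_G_PER_MOL = 650.0        # approximate average mass of one base pair

# Rounded genome sizes used only for illustration (assumptions).
HUMAN_HAPLOID_GENOME_BP = 3.1e9
E_COLI_GENOME_BP = 4.6e6


def genome_copies_per_ng(genome_size_bp):
    """Approximate number of genome copies contained in 1 ng of genomic DNA."""
    grams_per_copy = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return 1e-9 / grams_per_copy


if __name__ == "__main__":
    human = genome_copies_per_ng(HUMAN_HAPLOID_GENOME_BP)
    bacterial = genome_copies_per_ng(E_COLI_GENOME_BP)
    print(f"Human haploid genome copies per ng: {human:.0f}")
    print(f"E. coli genome copies per ng: {bacterial:.0f}")
    print(f"Fold difference: {bacterial / human:.0f}")
```

With these rounded values, the same mass of genomic DNA contains several hundred-fold fewer copies of a single-copy human locus than of a bacterial locus, which is consistent with the need for additional PCR cycles noted above.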
The exon-specific reverse primer is designed to add a reverse adaptor sequence to the 5′ end of the ROI, and it comprises from 5′ to 3′: a reverse adaptor sequence and a second ROI-specific sequence. The exon-specific reverse primer may have varying lengths and compositions as required by the protocol.
The second ROI-specific sequence is a sequence that is identical to the 5′ end of the digested ROI, allowing the exon-specific reverse primer to anneal to the 3′ end of the product from linear amplification of the digested ROI. The second ROI-specific sequence may be about 8-30 nucleotides in length.
The reverse adaptor primer is designed to anneal to the reverse adaptor sequence added by the exon-specific reverse primer to amplify the DNA product. For example, in some embodiments, the exon-specific reverse primer comprises the reverse adaptor sequence of SEQ ID NO: 3 and the reverse adaptor primer comprises SEQ ID NO: 4. The optimal length of the reverse adaptor primer may be similar to or shorter than the length of primers used for standard PCR (between about 10 and about 20 nucleotides).
An index sequence may be added to the adaptor-barcode primer and the exon-specific reverse primer, but it is not always necessary. On the adaptor-barcode primer, the index sequence (first index sequence) may be between the forward adaptor sequence and the barcode sequence. On the exon-specific reverse primer, the index sequence (second index sequence) may be between the reverse adaptor sequence and the second ROI-specific sequence. Index sequences are unique identifying sequences that are used to track DNA in a multiplexed sequencing reaction, and they may be used to increase the total number of index variations (i.e., to increase the number of samples that can be combined in one sequencing library). In contrast to a barcode sequence, which is usually read in the same read as the genomic DNA, the index sequence is read in a separate sequencing read. Many index sequences are commercially available. In exemplary embodiments, the index sequence is 0-7 nucleotides in length, or 1-7 nucleotides in length. Each of the first and second index sequences may be selected from A, GA, CGA, TCGA, ATCGA, GATCGA, and CGATCGA. The first and second index sequences may be the same or different.
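The placement of the index sequences and the number of possible first/second index pairings can be sketched in Python as follows. The seven listed index sequences are each a 3′-terminal portion of the longest one (CGATCGA), which the sketch uses to generate them; the primer component arguments are hypothetical placeholders, and the function names are assumptions of this illustration.

```python
from itertools import product

LONGEST_INDEX = "CGATCGA"

# A, GA, CGA, TCGA, ATCGA, GATCGA, CGATCGA: suffixes of the longest index.
INDEX_SEQUENCES = [LONGEST_INDEX[-n:] for n in range(1, len(LONGEST_INDEX) + 1)]


def adaptor_barcode_primer_layout(forward_adaptor, first_index, barcode,
                                  first_roi_specific):
    """5' to 3': forward adaptor, first index, barcode, first ROI-specific."""
    return forward_adaptor + first_index + barcode + first_roi_specific


def exon_specific_reverse_primer_layout(reverse_adaptor, second_index,
                                        second_roi_specific):
    """5' to 3': reverse adaptor, second index, second ROI-specific."""
    return reverse_adaptor + second_index + second_roi_specific


# Every ordered pairing of a first index with a second index.
INDEX_PAIRS = list(product(INDEX_SEQUENCES, repeat=2))

if __name__ == "__main__":
    print(INDEX_SEQUENCES)
    print("Distinct first/second index pairings:", len(INDEX_PAIRS))
```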
In Step (e) of the present methods, the DNA products are sequenced. Any suitable sequencing method may be used with the present methods. For example, sequencing may be accomplished using a next generation sequencer, such as a NextSeq 550, 10×, Illumina, or another sequencing instrument. Paired-end sequencing may be used to increase the yield of sequencing reads. Alternatively, single-end sequencing may be used.
The sequencing results are compared to a reference sequence to identify the mutation in the KRAS gene. The term “reference sequence” is used to refer to the known normal or wild-type sequence of the KRAS gene of interest. Reference sequences can be obtained, for example, from a reference genome, such as those provided by the National Center for Biotechnology Information. A mutation in the KRAS gene is identified when the reference sequence does not match the majority of the sequencing results at a particular position within the ROI. Sequencing data may be analyzed using any known method including, for example, through the Galaxy web platform (ref. 73).
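One possible way to implement the majority-based comparison to a reference sequence is sketched below in Python. The sketch assumes that reads have already been trimmed to the ROI and are the same length as the reference, and that reads are grouped by barcode before the majority base is taken at each position; the grouping, the minimum family size, and all names are assumptions of this illustration rather than a description of any particular analysis pipeline.

```python
from collections import Counter, defaultdict


def family_consensus(reads):
    """Majority base at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def call_mutations(tagged_reads, reference, min_family_size=3):
    """Return {barcode: [(position, reference_base, observed_base), ...]}.

    tagged_reads is an iterable of (barcode, read) pairs. Reads whose length
    differs from the reference are skipped, and barcode families with fewer
    than min_family_size reads are ignored.
    """
    families = defaultdict(list)
    for barcode, read in tagged_reads:
        if len(read) == len(reference):
            families[barcode].append(read)

    calls = {}
    for barcode, reads in families.items():
        if len(reads) < min_family_size:
            continue
        consensus = family_consensus(reads)
        differences = [
            (position, ref_base, obs_base)
            for position, (ref_base, obs_base) in enumerate(zip(reference, consensus))
            if ref_base != obs_base
        ]
        if differences:
            calls[barcode] = differences
    return calls


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"  # hypothetical, not a KRAS sequence
    toy_reads = [
        ("BC1", "ACGTACGTAC"), ("BC1", "ACGTACGTAC"), ("BC1", "ACGAACGTAC"),
        ("BC2", "ACGTTCGTAC"), ("BC2", "ACGTTCGTAC"), ("BC2", "ACGTTCGTAC"),
        ("BC3", "TCGTACGTAC"),
    ]
    print(call_mutations(toy_reads, toy_reference))
```

In the toy data, the isolated mismatch in one BC1 read is outvoted by the other reads of its family, whereas the change shared by all BC2 reads is reported.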
To detect mutations within the desired regions of interest (ROI) within the human KRAS gene, the inventors selected appropriate enzymes that target a site at the 3′ end of the ROIs and designed ROI-specific primers (i.e., an adaptor-barcode primer and an exon-specific reverse primer) for the amplification of the enzyme-digested ROI. Specifically, to detect the transcribed strand of human KRAS exon 1, the inventors used the restriction enzyme StuI to digest the genomic DNA, and they used an adaptor-barcode primer that comprises the first ROI-specific sequence of SEQ ID NO: 6 and an exon-specific reverse primer that comprises the second ROI-specific sequence of SEQ ID NO: 7 or SEQ ID NO: 8. To detect the non-transcribed strand of human KRAS exon 1, the inventors used the restriction enzyme HinfI to digest the genomic DNA, and they used an adaptor-barcode primer that comprises the first ROI-specific sequence of SEQ ID NO: 9 and an exon-specific reverse primer that comprises the second ROI-specific sequence of SEQ ID NO: 10. To detect the non-transcribed strand of human KRAS exon 2, the inventors used the restriction enzyme XmnI to digest the genomic DNA, and they used an adaptor-barcode primer that comprises the first ROI-specific sequence of SEQ ID NO: 11 and an exon-specific reverse primer that comprises the second ROI-specific sequence of SEQ ID NO: 12. Thus, any of these sets of restriction enzymes and primers may be used to detect a mutation in an ROI within the human KRAS gene using the methods of the present invention. Other identified sets of restriction enzymes and primers include, but are not limited to: restriction enzyme HinfI, first ROI-specific sequence SEQ ID NO: 25, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme MluCI, first ROI-specific sequence SEQ ID NO: 26, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme Hpy188I, first ROI-specific sequence SEQ ID NO: 27, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme AlwI, first ROI-specific sequence SEQ ID NO: 28, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme DpnII, first ROI-specific sequence SEQ ID NO: 29, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme MnlI, first ROI-specific sequence SEQ ID NO: 30, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme NsiI, first ROI-specific sequence SEQ ID NO: 31, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme HpyCH4V, first ROI-specific sequence SEQ ID NO: 32, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme BsrI, first ROI-specific sequence SEQ ID NO: 33, second ROI-specific sequence SEQ ID NO: 17; restriction enzyme PsiI, first ROI-specific sequence SEQ ID NO: 18, second ROI-specific sequence SEQ ID NO: 8; restriction enzyme Tsp45I, first ROI-specific sequence SEQ ID NO: 19, second ROI-specific sequence SEQ ID NO: 8; restriction enzyme AflIII, PciI, or FatI, first ROI-specific sequence SEQ ID NO: 20, second ROI-specific sequence SEQ ID NO: 8; restriction enzyme NspI or NlaIII, first ROI-specific sequence SEQ ID NO: 21, second ROI-specific sequence SEQ ID NO: 8; restriction enzyme CviAII, first ROI-specific sequence SEQ ID NO: 22, second ROI-specific sequence SEQ ID NO: 8; restriction enzyme CviQI, first ROI-specific sequence SEQ ID NO: 23, second ROI-specific sequence SEQ ID NO: 8; and restriction enzyme HphI, first ROI-specific sequence SEQ ID NO: 24, second ROI-specific sequence SEQ ID NO: 8. The first ROI-specific sequence and second ROI-specific sequence identified in any of the disclosed pairs may be unpaired and used independently of each other.
The methods may further comprise isolating the genomic DNA prior to the digestion step. Genomic DNA may be isolated from the cells within the biological samples using standard methods that are well known in the art, including those that rely on organic extraction, ethanol precipitation, silica-binding chemistry, cellulose-binding chemistry, and ion exchange chemistry. Many reagents and kits for DNA isolation are commercially available. In exemplary embodiments, genomic DNA is isolated by phenol/chloroform extraction followed by ethanol precipitation.
Any sample that comprises genomic DNA may be used with the present invention. The term “genomic DNA” refers to the chromosomal DNA of an organism.
About a third of all human cancers are driven by mutations in RAS genes. Thus, in some embodiments, the methods are performed on a sample from a subject that has or is suspected of having cancer. In such cases, the sample may comprise a tumor biopsy, circulating tumor cells (CTCs; i.e., a liquid biopsy), or circulating tumor DNA (ctDNA). As used herein the term “cancer” refers to an abnormal mass of tissue in which the growth of the mass surpasses and is not coordinated with the growth of normal tissue. The term cancer includes both benign and malignant cancers. Typical cancers include but are not limited to carcinomas, lymphomas, or sarcomas, such as, for example, ovarian cancer, colon cancer, breast cancer, pancreatic cancer, lung cancer, prostate cancer, colorectal cancer, endometrial cancer, urinary tract cancer, uterine cancer, acute lymphatic leukemia, Hodgkin's disease, small cell carcinoma of the lung, melanoma, neuroblastoma, glioma, and soft tissue sarcoma of humans, among others.
Because MDS allows for the detection of extremely rare oncogenic mutations, this method is well suited for detecting genetic heterogeneity within a particular subject or within a particular tumor. Thus, in some embodiments, the detected mutation is an oncogenic mutation. In some embodiments, the mutation is an oncogenic driver mutation, i.e., a mutation that is responsible for both the initiation and maintenance of the cancer. Thus, the methods of the present invention can be used in the clinic, for example, to predict cancer treatment outcomes before chemotherapy is administered.
The terms “nucleic acid,” “polynucleotide,” and “oligonucleotide,” as used herein, refer to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), and to any other type of polynucleotide that is an N-glycoside of a purine or pyrimidine base. There is no intended distinction in length between the terms “nucleic acid”, “oligonucleotide” and “polynucleotide”, and these terms will be used interchangeably. These terms refer only to the primary structure of the molecule. Thus, these terms include double- and single-stranded DNA, as well as double- and single-stranded RNA. For use in the present methods, an oligonucleotide also can comprise nucleotide analogs in which the base, sugar, or phosphate backbone is modified as well as non-purine or non-pyrimidine nucleotide analogs.
Oligonucleotides can be prepared by any suitable method, including direct chemical synthesis by a method such as the phosphotriester method of Narang et al., 1979, Meth. Enzymol. 68:90-99; the phosphodiester method of Brown et al., 1979, Meth. Enzymol. 68:109-151; the diethylphosphoramidite method of Beaucage et al., 1981, Tetrahedron Letters 22:1859-1862; and the solid support method of U.S. Pat. No. 4,458,066, each incorporated herein by reference. A review of synthesis methods of conjugates of oligonucleotides and modified nucleotides is provided in Goodchild, 1990, Bioconjugate Chemistry 1(3): 165-187, incorporated herein by reference.
The term “amplification reaction” refers to any chemical reaction, including an enzymatic reaction, which results in increased copies of a template nucleic acid sequence or results in transcription of a template nucleic acid. Amplification reactions include reverse transcription, the polymerase chain reaction (PCR), including Real Time PCR (see U.S. Pat. Nos. 4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al., eds, 1990)), and the ligase chain reaction (LCR) (see Barany et al., U.S. Pat. No. 5,494,810). Exemplary “amplification reaction conditions” or “amplification conditions” typically comprise either two or three step cycles. Two-step cycles have a high temperature denaturation step followed by a hybridization/elongation (or ligation) step. Three-step cycles comprise a denaturation step followed by a hybridization step followed by a separate elongation step. Multiplex PCR refers to the use of the polymerase chain reaction to amplify several different DNA sequences simultaneously (as if performing many separate PCR reactions all together in one reaction). This process amplifies DNA in samples using multiple primers and a temperature-mediated DNA polymerase in a thermal cycler. The primer design has to be optimized so that all primer pairs can work at the same annealing temperature during PCR.
The terms “target,” “target sequence”, “target region”, and “target nucleic acid,” as used herein, are synonymous and refer to a region or sequence of a nucleic acid which is to be amplified, sequenced, or detected.
The terms “annealing” and “hybridization,” as used herein, refer to the formation of a duplex structure by two single-stranded nucleic acids due to complementary base pairing. Hybridization can occur between fully complementary nucleic acid strands or between “substantially complementary” nucleic acid strands that contain minor regions of mismatch. Conditions under which hybridization of fully complementary nucleic acid strands is strongly preferred are referred to as “stringent hybridization conditions” or “sequence-specific hybridization conditions”. Stable duplexes of substantially complementary sequences can be achieved under less stringent hybridization conditions; the degree of mismatch tolerated can be controlled by suitable adjustment of the hybridization conditions. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length and base pair composition of the oligonucleotides, ionic strength, and incidence of mismatched base pairs, following the guidance provided by the art (see, e.g., Sambrook et al., 1989, Molecular Cloning—A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York; Wetmur, 1991, Critical Review in Biochem. and Mol. Biol. 26(3/4):227-259; and Owczarzy et al., 2008, Biochemistry, 47: 5336-5353, which are incorporated herein by reference).
The term “primer,” as used herein, refers to an oligonucleotide capable of acting as a point of initiation of DNA synthesis under suitable conditions. Such conditions include those in which synthesis of a primer extension product complementary to a nucleic acid strand is induced in the presence of four different nucleoside triphosphates and an agent for extension (for example, a DNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. A primer is preferably a single-stranded DNA. The appropriate length of a primer depends on the intended use of the primer but typically ranges from about 6 to about 225 nucleotides, including intermediate ranges, such as from 15 to 35 nucleotides, from 18 to 75 nucleotides and from 25 to 150 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template nucleic acid but must be sufficiently complementary to hybridize with the template. The design of suitable primers for the amplification of a given target sequence is well known in the art and described in the literature cited herein.
Primers can incorporate additional features which allow for the detection or immobilization of the primer but do not alter the basic property of the primer, that of acting as a point of initiation of DNA synthesis. For example, primers may contain an additional nucleic acid sequence at the 5′ end which does not hybridize to the target nucleic acid, but which facilitates cloning or detection of the amplified product, or which enables transcription of RNA (for example, by inclusion of a promoter) or translation of protein (for example, by inclusion of a 5′ -UTR, such as an Internal Ribosome Entry Site (IRES) or a 3′ -UTR element, such as a poly(A)n sequence, where n is in the range from about 20 to about 200). The region of the primer that is sufficiently complementary to the template to hybridize is referred to herein as the hybridizing region.
As used herein, a primer is “specific,” for a target sequence if, when used in an amplification reaction under sufficiently stringent conditions, the primer hybridizes primarily to the target nucleic acid. Typically, a primer is specific for a target sequence if the primer-target duplex stability is greater than the stability of a duplex formed between the primer and any other sequence found in the sample. One of skill in the art will recognize that various factors, such as salt conditions as well as base composition of the primer and the location of the mismatches, will affect the specificity of the primer, and that routine experimental confirmation of the primer specificity will be needed in many cases. Hybridization conditions can be chosen under which the primer can form stable duplexes only with a target sequence. Thus, the use of target-specific primers under suitably stringent amplification conditions enables the selective amplification of those target sequences that contain the target primer binding sites.
In another aspect, provided herein is a kit for detecting a mutation in a KRAS gene in a subject. The kits comprise at least one set of primers for the detection of a mutation in a specific gene region of interest (ROI). The set of primers includes a forward adaptor primer, a reverse adaptor primer, an adaptor-barcode primer, and an exon-specific reverse primer. The set of primers may comprise a forward adaptor primer comprising SEQ ID NO: 2; a reverse adaptor primer comprising SEQ ID NO: 4; an adaptor barcode primer selected from SEQ ID NOs: 34, 37, 39, 41-111, and 119-181; and an exon-specific reverse primer selected from SEQ ID NOs: 35, 38, 40, 112-118, and 182-188. Each of the at least one set of primers should comprise an adaptor barcode primer and an exon-specific barcode primer pair that targets a region of interest in KRAS.
The kit may further comprise at least one enzyme selected from StuI, HinfI, AlwI, BsrI, DpnII, Hpy188I, HpyCH4V, MluCI, MnlI, NsiI, AflIII, PciI, FatI, NlaIII, CviAII, CviQI, HphI, NspI, PsiI, XmnI, and Tsp45I. The at least one enzyme should cleave KRAS at the 3′ end of the region of interest targeted by the at least one adaptor barcode primer and the exon-specific barcode primer in the kit.
The kit may further comprise enzymes, buffers, and reagents necessary to perform PCR reactions.
Unless otherwise specified or indicated by context, the terms “a”, “an”, and “the” mean “one or more.” For example, “a molecule” should be interpreted to mean “one or more molecules.”
As used herein, “about”, “approximately,” “substantially,” and “significantly” will be understood by persons of ordinary skill in the art and will vary to some extent on the context in which they are used. If there are uses of the term which are not clear to persons of ordinary skill in the art given the context in which it is used, “about” and “approximately” will mean plus or minus ≤10% of the particular term and “substantially” and “significantly” will mean plus or minus >10% of the particular term.
As used herein, the terms “include” and “including” have the same meaning as the terms “comprise” and “comprising.” The terms “comprise” and “comprising” should be interpreted as being “open” transitional terms that permit the inclusion of additional components further to those components recited in the claims. The terms “consist” and “consisting of” should be interpreted as being “closed” transitional terms that do not permit the inclusion of additional components other than the components recited in the claims. The term “consisting essentially of” should be interpreted to be partially closed and allowing the inclusion only of additional components that do not fundamentally alter the nature of the claimed subject matter.
“Percentage of sequence identity”, “percent similarity”, or “percent identity” is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or peptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
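By way of non-limiting illustration, the percent identity calculation described above may be sketched as follows. The sketch assumes that the two sequences have already been optimally aligned (for example by an external alignment program) and that gaps are written as "-"; the function name `percent_identity` is hypothetical and is used here for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Matched positions are divided by the total number of positions in
    the comparison window and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"  # a gap never counts as a match
    )
    return 100.0 * matches / len(aligned_a)
```

For example, two aligned 8-nucleotide sequences that differ at a single position share 7 of 8 positions, i.e., 87.5% identity.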
The term “substantial identity” or “substantial similarity” of polynucleotide or peptide sequences means that a polynucleotide or peptide comprises a sequence that has at least 75% sequence identity. Alternatively, percent identity can be any integer from 75% to 100%. More preferred embodiments include at least: 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% compared to a reference sequence using the programs described herein; preferably BLAST using standard parameters, as described. These values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Any of the primer sequences disclosed herein include sequences having at least 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity thereto.
All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein. Preferred aspects of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred aspects may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect a person having ordinary skill in the art to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
The environmental carcinogen urethane exhibits a profound specificity for pulmonary tumors driven by an oncogenic Q61L/R mutation in the gene Kras. Similarly, the frequency, isoform, position, and substitution of oncogenic RAS mutations are often unique to human cancers. In the following example, to elucidate the principles underlying this RAS mutation tropism caused by urethane, the inventors adapted an error-corrected, high-throughput sequencing approach to detect mutations in murine Ras genes with high sensitivity. This approach not only captured the initiating Kras mutation days after urethane exposure, but also revealed that the sequence specificity of urethane mutagenesis coupled with transcription and isoform locus are major influences on the extreme tropism of this carcinogen.
Cell culture. Mouse embryonic fibroblasts (MEFs) derived from E13.5 mouse embryos were stably infected with an ecotropic retrovirus derived from pBabeHygro (ref. 70) encoding the early region of SV40 (ref. 71) and selected with 100 μg·ml−1 hygromycin to establish immortalized cultures using standard procedures and then cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin.
Construction of Kras mutant plasmids. A region upstream of the Kras start codon was amplified from murine genomic DNA (termed PCR1). PCR reactions were comprised of 100 ng of genomic DNA, 2.5 μl of 10 μM forward (5′-AATTGCGGCCGCCCAGGGGGTATAGCGTACTATGCAGAAT-3′) (SEQ ID NO: 189) and reverse (5′-CATTTTCAGCAGGCCTTACAAT-3′) (SEQ ID NO: 190) primers, 4 μl of 2.5 mM dNTP, 10 μl of 5×buffer (NEB), and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB) in a total volume of 50 μl. PCR cycles were as follows: one cycle at 98° C. for 30 seconds, 28 cycles at 98° C. for 8 seconds, 64° C. for 15 seconds, 72° C. for 10 seconds, and one cycle at 72° C. for 2 minutes. PCR products were gel purified using QIAquick Gel Extraction Kit following the manufacturer's protocol (Qiagen).
Mutations in Kras cDNA were generated through error-prone PCR (termed PCR2). PCR reactions were comprised of 15 nmol of plasmid containing Kras cDNA, 2 μl of 10 μM forward (5′-ATTGTAAGGCCTGCTGAAAGAAGAGTATAAACTTGTGGT-3′) (SEQ ID NO: 191) and reverse (5′-CAGGGTCGACTCACATAACTGTACACCTTGTC-3′) (SEQ ID NO: 192) primers, 2 μl of 2.5 mM dNTP, 1.25 μl of 50 mM MgCl2, 2.5 μl of 10×buffer (Invitrogen), 5 μl of 2.5 mM MnCl2, and 0.2 μl of Platinum Taq DNA polymerase (Invitrogen) in a total volume of 25 μl. PCR cycles were as follows: one cycle at 94° C. for 1 minute, 18 cycles at 94° C. for 30 seconds, 55° C. for 30 seconds, 72° C. for 3 minutes, and one cycle at 72° C. for 3 minutes. PCR products were gel purified as described above.
Products from PCR1 and PCR2 were fused through overlap PCR (termed PCR3). 20 ng of product from PCR1 and 40 ng of product from PCR2 were mixed with 4 μl of 2.5 mM dNTP, 10 μl of 5×buffer (NEB), and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB) in a total reaction volume of 50 μl. PCR cycles were 98° C. for 30 seconds and 10 cycles at 98° C. for 8 seconds, 63° C. for 15 seconds, and 72° C. for 15 seconds. 2.5 μl of forward primer from PCR1 and 2.5 μl of reverse primer from PCR2 were then added and the reaction was continued under the following conditions: 98° C. for 30 seconds, 25 cycles at 98° C. for 8 seconds, 72° C. for 40 seconds, and one cycle at 72° C. for 2 minutes. PCR products were gel purified as described above.
The plasmid backbone was amplified from the pUC19 (ref. 72) (Addgene 50005) plasmid (termed PCR4). PCR reactions were comprised of 1 ng of pUC19 DNA, 2.5 μl of 10 μM forward (5′-AATTGTCGACTTAGACGTCAGGTGGCAC-3′) (SEQ ID NO: 193) and reverse (5′-TTAAGCGGCCGCGTTTGCGTATTGGGCGCT-3′) (SEQ ID NO: 194) primers, 4 μl of 2.5 mM dNTP, 10 μl of 5×buffer (NEB), and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB) in a total volume of 50 μl. PCR cycles were as follows: one cycle at 98° C. for 30 seconds, 28 cycles at 98° C. for 8 seconds, 65° C. for 15 seconds, 72° C. for 1 minute, and one cycle at 72° C. for 2 minutes. PCR products were gel purified as described above.
Products from PCR3 and PCR4 were digested with SalI and NotI according to the manufacturer's protocol (NEB). Digested products were column purified using QIAquick PCR Purification Kit following the manufacturer's protocol (Qiagen), ligated, and transformed using standard methodologies. DNA was isolated from individual clones by NucleoSpin® Plasmid miniprep kit (MACHEREY-NAGEL) and validated by Sanger sequencing. Ten clones with different sets of co-occurring mutations in Kras exon 1 and/or 2 were selected to be spiked into wildtype mouse genomic DNA at different ratios to test the detection limit of maximum depth sequencing (see below).
Urethane treatment. 6-8-week-old male and female A/J mice (JAX Stock #000646) were intraperitoneally injected daily for three days with either urethane (Sigma U2500) dissolved in PBS (1 g·kg−1) or the vehicle PBS alone. Mice were humanely euthanized 1, 2, 3, or 4 weeks after the last injection and the lung, liver, and pancreas collected for the extraction of genomic DNA. All mouse care and experiments were performed in accordance with protocols approved by the IACUC of Duke University.
Pharmacokinetic analysis. 6-8-week-old male and female A/J mice (JAX Stock #000646) were intraperitoneally injected with one dose of urethane dissolved in PBS (1 g·kg−1). Mice were humanely euthanized 2, 4, and 8 hours later after which plasma, lungs, pancreas, and livers were harvested and snap frozen. Liquid chromatography (LC) tandem-mass spectrometry (MS/MS) was used to measure urethane (ethyl carbamate, EC) (Sigma U2500) and vinyl carbamate (VC) (Santa Cruz Biotechnology sc-213157) concentrations in plasma and tissues. The LC-MS/MS system consisted of a Shimadzu 20A series LC and an Applied Biosystems/SCIEX API 4000 QTrap MS/MS instrument. LC columns: Phenomenex C18 3×4 mm guard column (#AJO-4287) and Agilent ZORBAX Eclipse Plus C18 150×4.6 mm 1.8 μm analytical column (#959994-902). Mobile phase A: 0.1% formic acid, 10 μM sodium acetate, and 2% acetonitrile; mobile phase B: 100% methanol. Elution gradient: isocratic flow 30% A. Flow rate: 0.8 ml·min−1. The run time was 10 minutes. Calibration samples were prepared by adding pure standards of EC or VC to the corresponding matrix (plasma or tissue homogenate) in the appropriate concentration range. The calibration samples were analyzed alongside study samples as a single analytical batch on the day of analysis.
In a 2 ml screw cap vial, 20 μl (EC) or 50 μl (VC) of plasma or tissue homogenate (1 part tissue and 2 parts water) diluted with water 1/100 (EC) or undiluted (VC), 10 μl of 2 μg·ml−1 MC-d5 in water (internal standard) (Toronto Research Chemicals), and 60 μl (EC) or 100 μl (VC) of 20 mM xanthydrol (Sigma) in glacial HAc were added and incubated at room temperature for 30 minutes. 100 μl of water and 500 μl of chloroform were then added and the mixture was vigorously agitated (speed 4, 40 seconds; Fast-Prep FP120, Thermo Savant). After centrifugation at 16,000 g for 5 minutes at room temperature, 200 μl (EC) or 400 μl (VC) of the chloroform (lower) layer was subjected to a gentle stream of nitrogen for 30 minutes. The dry residue was reconstituted with 50 μl (EC) or 100 μl (VC) of 50% A/50% B and centrifuged at 16,000 g for 5 minutes at 4° C., after which 5 μl (EC) or 10 μl (VC) was injected into the LC-MS/MS system. The mass spectrometer was operated in positive mode with the following MRM transitions (m/z): 292/180.8 [EC-1st], 292/151.3 [EC-2nd], 297/181.8 [EC-d5-1st], 297/151.7 [EC-d5-2nd] for EC and 290/180.5 [VC-1st], 290/151.2 [VC-2nd], 297/181.8 [EC-d5-1st], 297/151.7 [EC-d5-2nd] for VC.
Isolation of genomic DNA. MEF cells were resuspended in lysis buffer (100 mM NaCl, 10 mM Tris pH 7.6, 25 mM EDTA pH 8.0, and 0.5% SDS in H2O, supplemented with 20 μg·ml−1 RNase A (Sigma)). Lung, pancreas, and liver (right lobe) from A/J mice (JAX Stock #000646) were cut into fine pieces and similarly resuspended in lysis buffer. Samples were incubated at 37° C. for 1 hour. 2 μl of 800 U·ml−1 proteinase K (NEB) was then added to each sample, the samples were vortexed, and then incubated at 55° C. overnight. Genomic DNA was isolated by phenol/chloroform extraction followed by ethanol precipitation using standard procedures and quantified using Qubit fluorometer.
Maximum depth sequencing (MDS). The MDS assay (ref. 13) was adapted for mammalian Ras genes as follows. 20-50 μg of genomic DNA was incubated with StuI (NEB) for analysis of the transcribed strand of Kras exon 1, EcoRV (NEB) and EcoRI (NEB) for analysis of the non-transcribed strand of Kras exon 1, XmnI (NEB) for analysis of the non-transcribed strand of Kras exon 2, PleI (NEB) for analysis of the transcribed strand of Kras exon 2, or HphI (NEB) for analysis of the non-transcribed strand of Hras exon 2. Reaction conditions were 5 units of the indicated restriction enzyme per 1 μg DNA per 20 μl reaction (e.g., 20 μg genomic DNA, 5 μl enzyme (20 units/μl), and 40 μl 10×buffer in a 400 μl reaction). Digested genomic DNA was column purified using QIAquick PCR Purification Kit following the manufacturer's protocol (Qiagen) and resuspended in ddH2O (35 μl H2O per 10 μg DNA). The barcode and adaptor were added to the target DNA by incubating purified DNA with the appropriate barcode primer (see below) for one cycle of PCR. PCR reactions were comprised of 10 μg DNA, 2.5 μl of 10 μM barcode primer, 4 μl of 2.5 mM dNTP, 10 μl of 5×buffer (NEB), and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB) in a total volume of 50 μl. The number of PCR reactions was scaled according to the amount of DNA. PCR conditions were 98° C. for 1 minute, barcode primer annealing temperature (see below) for 15 seconds, and 72° C. for 1 minute. 1 μl of 20,000 U·ml−1 exonuclease I (NEB) and 5 μl of 10×exonuclease I buffer (NEB) were then added to each 50 μl reaction to remove unused barcoded primers and incubated at 37° C. for 1 hour and then 80° C. for minutes. Processed DNA was column purified using QIAquick PCR Purification Kit as above and resuspended in ddH2O (35 μl H2O per column). The concentration of purified product was measured with a SimpliNano spectrophotometer (GE Healthcare Life Sciences). Samples were linearly amplified with the forward adaptor primer (see below). 
PCR reactions were comprised of 1.5 μg DNA, 2.5 μl of 10 μM forward-adaptor primer, 4 μl of 2.5 mM dNTP, 10 μl of 5×buffer (NEB), and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB) in a total volume of 50 μl. The number of PCR reactions was scaled according to the amount of DNA. PCR conditions were as follows: 12 cycles of 98° C. for 15 seconds, 70° C. for 15 seconds, 72° C. for 8 seconds. 2.5 μl of 10 μM exon-specific reverse primer (see below) and 2.5 μl of 10 μM reverse-adaptor primer (see below) were then added to each 50 μl reaction. The mixtures were then subjected to 20 cycles of exponential amplification. PCR conditions were as follows: 4 cycles of 98° C. for 15 seconds, exon-specific reverse primer annealing temperature (see below) for 15 seconds, 72° C. for 8 seconds, 16 cycles of 98° C. for 15 seconds, 70° C. for 15 seconds, and 72° C. for 8 seconds. The final library was size selected and purified with Ampure XP beads according to the manufacturer's protocol (Beckman Coulter). Sequencing was performed using HiSeq 2500 100 bp PE rapid run, HiSeq 4000 150 bp PE or NovaSeq 6000 S Prime 150 bp PE at Duke Center for Genomic and Computational Biology. For the optimization of barcode recovery, the same amount of genomic DNA was processed in parallel by MDS assay targeting Kras exon 1 transcribed strand and the PCR products were pooled together at different concentrations in one library to obtain different sequencing depths.
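By way of non-limiting illustration, the restriction digest conditions stated above (5 units of enzyme per 1 μg DNA per 20 μl reaction, with 10×buffer) scale linearly with the amount of genomic DNA, and the arithmetic may be sketched as follows. The function name `digest_recipe` and its parameter names are hypothetical; the default enzyme stock of 20 units/μl is taken from the worked example in the text and would need to be adjusted for other enzyme stocks.

```python
def digest_recipe(dna_ug: float, enzyme_units_per_ul: float = 20.0) -> dict:
    """Scale the restriction digest to a given mass of genomic DNA.

    Uses the stated ratios: 5 enzyme units and 20 ul of reaction
    volume per 1 ug of DNA, with buffer supplied as a 10x stock.
    """
    if dna_ug <= 0 or enzyme_units_per_ul <= 0:
        raise ValueError("inputs must be positive")
    total_ul = dna_ug * 20.0        # 20 ul reaction per ug DNA
    enzyme_units = dna_ug * 5.0     # 5 units per ug DNA
    return {
        "total_ul": total_ul,
        "enzyme_ul": enzyme_units / enzyme_units_per_ul,
        "buffer_10x_ul": total_ul / 10.0,
    }
```

For 20 μg of genomic DNA this reproduces the example given above: 5 μl of enzyme and 40 μl of 10×buffer in a 400 μl reaction.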
All primers were synthesized by Integrated DNA Technologies (IDT).
Data analysis. All sequencing data were analyzed through the Galaxy web platform (ref. 73). Specifically, raw data were uploaded to the usegalaxy website or Galaxy Cloudman. For analysis of the Kras exon 1 transcribed strand in mouse lung tissue, only read 1 was used. For all the other experiments, read 1 and read 2 were joined via the PEAR paired-end read merger (ref. 74). The reads were then filtered by quality by requiring 90% of bases in the sequence to have a quality score ≥20. Filtered reads were split into different files based on assigned sample indexes and variation in sequence lengths using the tool Barcode Splitter and the tool Filter Sequences by Length.
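By way of non-limiting illustration, the read quality criterion described above (at least 90% of bases having a quality score of 20 or more) may be sketched as follows. This is not the Galaxy tool itself; the sketch assumes Phred+33 encoded FASTQ quality strings, and the function name `passes_quality` is hypothetical.

```python
def passes_quality(quality_string: str, min_score: int = 20,
                   min_fraction: float = 0.90, offset: int = 33) -> bool:
    """Return True if enough bases in a read meet the quality cutoff.

    quality_string is the FASTQ quality line; each character encodes
    a Phred score as ord(char) - offset (Phred+33 assumed here).
    """
    if not quality_string:
        return False
    good = sum(1 for ch in quality_string if ord(ch) - offset >= min_score)
    return good / len(quality_string) >= min_fraction
```

Under the Phred+33 assumption, the character "5" encodes a score of 20 and "4" encodes a score of 19, so a read consisting only of "5" passes while a read consisting only of "4" does not.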
For the experiment optimizing barcode recovery, the reads were trimmed down to the barcode and grouped into families by barcode. The number of families containing 1 read and ≥2 reads was then counted, respectively.
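By way of non-limiting illustration, grouping reads into barcode families and counting families of one read versus two or more reads may be sketched as follows. The sketch assumes that each read has already been trimmed down to its barcode; the function name `family_size_summary` is hypothetical.

```python
from collections import Counter


def family_size_summary(barcodes):
    """Count barcode families containing 1 read and >=2 reads.

    barcodes is an iterable with one barcode string per read; reads
    sharing a barcode belong to the same family.
    """
    family_sizes = Counter(barcodes)
    singletons = sum(1 for n in family_sizes.values() if n == 1)
    multi_read = sum(1 for n in family_sizes.values() if n >= 2)
    return singletons, multi_read
```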
For the mutant plasmid spike-in experiments, the reads were trimmed down to the barcode and the bases containing engineered mutations. Trimmed reads were grouped by barcode into different families. The frequency of mutants present was calculated by dividing the counts of families containing engineered co-occurring mutations by the total number of families. The frequency of mutants detected was calculated by dividing the counts of families that contained ≥2 reads and in which ≥90% of reads shared the same engineered mutation at one specified position by the total number of families.
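By way of non-limiting illustration, the calculation of the frequency of mutants detected may be sketched as follows. The sketch assumes that reads are supplied as (barcode, trimmed sequence) pairs and that the engineered mutation is specified by a zero-based position and a mutant base; the function and parameter names are hypothetical.

```python
from collections import defaultdict


def detected_mutant_frequency(reads, position, mutant_base,
                              min_reads=2, min_fraction=0.90):
    """Fraction of barcode families in which the mutation is detected.

    A family counts as detected when it has at least min_reads reads
    and at least min_fraction of them carry mutant_base at position.
    The denominator is the total number of families.
    """
    families = defaultdict(list)
    for barcode, sequence in reads:
        families[barcode].append(sequence)
    if not families:
        raise ValueError("no reads supplied")
    detected = 0
    for sequences in families.values():
        if len(sequences) < min_reads:
            continue  # singletons cannot be error-corrected
        carrying = sum(1 for s in sequences if s[position] == mutant_base)
        if carrying / len(sequences) >= min_fraction:
            detected += 1
    return detected / len(families)
```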
For the experiments examining carcinogen-induced mutations in Kras exon 1 or 2, the reads were trimmed down to the barcode and the target exon. Trimmed reads were grouped by barcode. Barcode families containing ≥3 reads and a unique consensus sequence were selected. To ensure sufficient barcode recovery for the purpose of sensitivity and accuracy, samples with fewer than 1.5×10⁵ barcode families recovered were excluded from downstream analysis. Sequences from selected barcode families were compared against annotated reference mutant sequences containing all possible single nucleotide substitutions in the exon of interest and the mutation in the reference mutant sequence was assigned to the matched barcode family. The frequency of the corresponding mutation was calculated by dividing the counts of the families containing the mutation by the total number of families.
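By way of non-limiting illustration, the family selection and mutation frequency calculation described above may be sketched as follows. Several points are assumptions made for the sketch rather than statements of the method used: a "unique consensus sequence" is interpreted as all reads in a family being identical, the "total number of families" is taken to be the number of selected families, and mutation labels of the form reference base, one-based position, substituted base are hypothetical, as are all function names.

```python
from collections import Counter, defaultdict

MIN_FAMILIES = 150_000  # 1.5 x 10^5 barcode families, as stated in the text


def single_substitutions(wild_type):
    """Map every single-nucleotide substitution of wild_type to a label."""
    table = {}
    for i, ref in enumerate(wild_type):
        for alt in "ACGT":
            if alt != ref:
                mutant = wild_type[:i] + alt + wild_type[i + 1:]
                table[mutant] = f"{ref}{i + 1}{alt}"
    return table


def mutation_frequencies(reads, wild_type, min_reads=3,
                         min_families=MIN_FAMILIES):
    """Per-mutation frequency among selected barcode families.

    Returns None when too few families are recovered, mirroring the
    exclusion of such samples from downstream analysis.
    """
    families = defaultdict(list)
    for barcode, sequence in reads:
        families[barcode].append(sequence)
    selected = [
        seqs[0] for seqs in families.values()
        if len(seqs) >= min_reads and len(set(seqs)) == 1  # assumed rule
    ]
    if len(selected) < min_families:
        return None
    reference = single_substitutions(wild_type)
    counts = Counter(reference[s] for s in selected if s in reference)
    return {label: n / len(selected) for label, n in counts.items()}
```

In the tests accompanying this sketch, `min_families` is lowered to 1 so that a toy data set can be evaluated; with the default threshold a small data set is excluded and `None` is returned.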
C>T and G>T substitutions have high background in PBS-treated mice and have been previously identified as artifacts caused by deamination of cytosine or methyl-cytosine or oxidation of guanine arising during library preparation (refs. 11, 75), or as mis-incorporated nucleotides in vivo not yet repaired (ref. 13). Consistent with this, we detect high C>T or G>T substitutions but not the complementary G>A or C>A substitutions from the strand processed by MDS. To circumvent this background, the frequency of C>T or G>T substitutions was estimated from the strand with the reverse complementary G>A or C>A substitutions when necessary. Specifically, the frequencies of G12/13C and G12/13V mutations in Kras exon 1 (G>T substitutions on the non-transcribed strand) were estimated from the MDS targeting the transcribed strand, while the frequencies of G12/13S and G12/13D mutations in Kras exon 1 (C>T substitutions on the transcribed strand) were estimated from the MDS targeting the non-transcribed strand.
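By way of non-limiting illustration, the correspondence between a substitution on one strand and the complementary substitution read from the opposite strand may be sketched as follows. The function name `complementary_substitution` and the "X>Y" string format are hypothetical conventions chosen for the sketch.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary_substitution(substitution: str) -> str:
    """Return the substitution as it appears on the opposite strand.

    substitution is written as 'X>Y', e.g. 'C>T'; both the reference
    base and the substituted base are complemented.
    """
    ref, alt = substitution.split(">")
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
```

Thus a C>T substitution on one strand corresponds to G>A on the other, and G>T corresponds to C>A, which is why the artifact-prone C>T and G>T frequencies can be estimated from the opposite strand.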
Droplet digital PCR (ddPCR). ddPCR was performed using the QX200 AutoDG Droplet Digital PCR System (Bio-Rad) following the manufacturer's protocol in a 22 μl ddPCR reaction containing 11 μl of 2×ddPCR SuperMix for probes (no dUTP) (Bio-Rad), 66 ng template DNA, 450 nM forward and reverse primers, and 250 nM FAM- and HEX-labelled probes. The primer and probe oligonucleotides were synthesized (IDT) based on sequences previously described (ref. 76) with minor modifications. The sequences for the primers are: Kras_Q61_For: 5′-ATGGAGAAACCTGTCTCTTGG-3′ (SEQ ID NO: 205); and Kras_Q61_Rev: 5′-CTCATGTACTGGTCCCTCATT-3′ (SEQ ID NO: 206). The sequences for the probes are: Kras_Q61L_MUT_FAM: 5′-/56-FAM/CAGGT+C+T+AGA+GGAG/3IABkFQ/-3′; and Kras_Q61L_WT_HEX: 5′-/5HEX/CAGGT+C+A+AGA+GGAG/3IABkFQ/-3′, where “+” denotes that the following base is a locked nucleic acid. Following droplet generation on the AutoDG, the plate was sealed with pierceable foil heat seal (Bio-Rad) and PCR was performed on a C1000 Touch™ thermal cycler (Bio-Rad). Thermal cycling conditions were as follows: one cycle at 95° C. for 10 minutes, 40 cycles at 94° C. for 30 seconds and 60° C. for 60 seconds, one cycle at 98° C. for 10 minutes, and 4° C. until the sample was removed. Every ddPCR run included a no-template control, a wildtype control with DNA from PBS-treated mice, and a mutation-positive control. To achieve detection sensitivity of 1 in 10,000, each sample was assayed in at least 2 wells. Plates were read on a QX200 droplet reader (Bio-Rad) and analyzed with QuantaSoft™ Analysis Pro software (version 1.0.596) (Bio-Rad) to assess the number of droplets positive for mutant DNA, wild-type DNA, both, or neither. 
The mutant allele fraction (ref. 43) was estimated as follows. The concentration of mutant DNA (copies of mutant DNA per droplet) was estimated from the Poisson distribution using the formula M_mu = −ln(1 − (n_mu/n)), where n_mu = number of droplets positive for the mutant FAM probe and n = total number of droplets. The DNA concentration in the reaction was estimated using the formula M_DNAconc = −ln(1 − (n_DNAconc/n)), where n_DNAconc = number of droplets positive for the mutant FAM probe and/or the wild-type HEX probe and n = total number of droplets. The mutant allele fraction = M_mu/M_DNAconc.
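By way of non-limiting illustration, the Poisson-based estimate of the mutant allele fraction described above may be sketched as follows. The function and parameter names are hypothetical; the input checks (for example, requiring at least one negative droplet so that the logarithm is defined) are assumptions added for the sketch.

```python
import math


def mutant_allele_fraction(n_mutant_positive: int, n_dna_positive: int,
                           n_total: int) -> float:
    """Mutant allele fraction from droplet counts.

    n_mutant_positive: droplets positive for the mutant FAM probe
    n_dna_positive:    droplets positive for the FAM and/or HEX probe
    n_total:           total number of droplets
    """
    if not 0 <= n_mutant_positive <= n_dna_positive < n_total:
        raise ValueError("need 0 <= mutant <= DNA-positive < total")
    if n_dna_positive == 0:
        raise ValueError("no DNA-positive droplets")
    # Copies per droplet from the Poisson distribution: M = -ln(1 - n_pos/n)
    m_mu = -math.log(1 - n_mutant_positive / n_total)
    m_dna = -math.log(1 - n_dna_positive / n_total)
    return m_mu / m_dna
```

For example, with 10 mutant-positive droplets and 5,000 DNA-positive droplets out of 20,000 total droplets, the estimated mutant allele fraction is approximately 0.0017.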
RNA isolation and quantitative PCR. RNA was extracted from the lung, liver, and pancreas of 6-week-old A/J mice using TRIzol (Thermo Fisher Scientific) and converted to cDNA using iScript™ cDNA Synthesis Kit (Bio-Rad) following the manufacturer's instructions. Quantitative PCR reactions were performed using iTaq Universal SYBR Green Supermix (Bio-Rad) and CFX384 touch real-time PCR detection system (Bio-Rad) using the forward (5′-CCAGCGTCGTGATTAGCGA-3′) (SEQ ID NO: 207) and reverse (5′-CCAGCAGGTCAGCAAAGAAC-3′) (SEQ ID NO: 208) primers (IDT) to detect the control Hprt mRNA and the forward (5′-GCAAGAGCGCCTTGACGATA-3′) (SEQ ID NO: 209) and reverse (5′-CATGTACTGGTCCCTCATTGCAC-3′) (SEQ ID NO: 210) primers (IDT) to detect Kras mRNA. Gene expression values were calculated using the comparative Ct (−ΔΔCt) method (ref. 77), using the Hprt housekeeping gene as internal control.
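By way of non-limiting illustration, the comparative Ct calculation may be sketched as follows. The sketch uses the conventional form of the method, in which relative expression equals 2 raised to the power −ΔΔCt, and assumes roughly 100% amplification efficiency for both primer pairs; the choice of calibrator sample, the function name, and the parameter names are hypothetical and are not taken from the text.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float,
                        ct_reference_cal: float) -> float:
    """Relative expression by the comparative Ct method.

    ct_target / ct_reference:         Ct of the target gene (e.g. Kras)
                                      and reference gene (e.g. Hprt)
                                      in the sample of interest
    ct_target_cal / ct_reference_cal: the same Ct values in the
                                      calibrator sample
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

A sample whose normalized Ct is one cycle lower than that of the calibrator is thus scored as having twofold higher expression.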
Whole exome analysis of mutation frequency versus gene expression. Mutation counts were obtained from published datasets3. Single-nucleotide variations (SNVs) identified by the whole-exome sequencing of urethane-induced adenomas and adenocarcinomas were examined. The expression levels of the genes containing these SNVs were determined from published datasets36. FPKM values of genes expressed in the lung of six-week-old C57BL/6JJcl mice were used. The second set of gene expression data37 was obtained from the mouse ENCODE project. FPKM values of genes expressed in the lung of eight-week-old male C57BL/6 mice were used. To bin the genes into different expression groups, the genes were sorted by the mean FPKM value across biological replicates and split into quartiles. The sum of CAN→CTN transversions in the non-transcribed or transcribed strand for the genes in each quartile was calculated and the mean±SEM of all tumors was plotted.
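The binning step can be sketched as follows, under stated assumptions: genes are ranked by mean FPKM across replicates, divided into four groups of near-equal size, and the mutation counts are summed per group. Gene names, FPKM values, mutation counts, and the tie-handling at quartile boundaries are hypothetical.

```python
from statistics import mean


def split_into_quartiles(fpkm_by_gene: dict) -> list:
    """Rank genes by mean FPKM (low to high) and split into four groups."""
    ranked = sorted(fpkm_by_gene, key=lambda g: mean(fpkm_by_gene[g]))
    n = len(ranked)
    bounds = [(i * n) // 4 for i in range(5)]  # integer quartile boundaries
    return [ranked[bounds[i]:bounds[i + 1]] for i in range(4)]


def mutations_per_quartile(quartiles: list, mutation_counts: dict) -> list:
    """Sum the mutation counts of the genes in each expression quartile."""
    return [sum(mutation_counts.get(g, 0) for g in q) for q in quartiles]


# Hypothetical FPKM replicates and CAN>CTN counts for one tumor.
fpkm = {
    "g1": [0.1, 0.3], "g2": [5.0, 7.0], "g3": [1.0, 1.2], "g4": [50.0, 70.0],
    "g5": [12.0, 10.0], "g6": [0.0, 0.0], "g7": [3.0, 2.0], "g8": [100.0, 90.0],
}
counts = {"g1": 0, "g2": 1, "g4": 2, "g7": 1, "g8": 3}

quartiles = split_into_quartiles(fpkm)
per_quartile = mutations_per_quartile(quartiles, counts)
print(per_quartile)
```

Repeating this per tumor and per strand would give the values from which a mean±SEM across tumors could be computed.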
Generation of heatmaps. All heatmaps were generated using Morpheus. All mutation frequencies used in the heatmaps were corrected by the addition of the detection limit at a barcode recovery of 1.5×105 (˜6.67×10−6). For the heatmap showing the mutation frequency per nucleotide (
Statistics. The number of independent experiments and the statistical analysis used are indicated in the legend of each figure. Data are represented as mean±SEM. p values were determined by Holm-Sidak multiple comparisons test following one-way or two-way ANOVA, non-parametric Dunn's multiple comparison test following Kruskal-Wallis test, or two-tailed non-parametric Mann-Whitney U test. p<0.05 was considered significant. Different levels of significance are indicated as *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001 and ns: not significant. Holm-Sidak multiple comparisons test following ANOVA and non-parametric Dunn's multiple comparison test following Kruskal-Wallis test were executed using GraphPad Prism 6. Two-tailed non-parametric Mann-Whitney U test was executed using Excel supplemented with the Real Statistics Resource Pack.
Data availability. All raw Illumina® sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under accession number PRJNA561927.
Adapting MDS to the mammalian genome. A barrier to detecting initiating Kras mutations in vivo at the time they occur after urethane exposure is that the mutation rate induced by this carcinogen is well below the detection limit of next generation sequencing (NGS). To overcome this limitation, we turned to the error-corrected, high-throughput sequencing approach of maximum depth sequencing (MDS), which has been shown to recover mutants in bacteria at a frequency as low as 1×10−6, or 1 mutant per 106 templates13. The key steps of MDS are: first, synthesis of unique barcodes onto one strand of a genomic region-of-interest (ROI); second, linear amplification to obtain multiple direct copies of the barcoded genomic DNA; third, exponential amplification to obtain families of PCR products sharing the same barcode; and fourth, ultra-deep sequencing of millions of barcode families from the single region-of-interest13. Bona fide mutations are differentiated from PCR and sequencing errors by virtue of being detected in all members of one barcode family13. The challenge of adapting MDS to the mammalian genome is maintaining the recovery of a sufficient number of analyzable barcode families (with at least 2 or 3 members) in a genome that is three orders of magnitude larger in size and weight14,15. To this end, we optimized assay conditions (see Methods) for mammalian Kras (
R: number of independent reads sharing the same barcode
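The error-correction principle described above, in which a substitution is accepted only when every read in a barcode family carries it, can be sketched as follows. This is an illustrative simplification and not the analysis pipeline used herein: it assumes reads are already grouped by barcode, are the same length as the reference, and are aligned to it without gaps. The sequences and the minimum family size are hypothetical.

```python
def call_family_mutations(family: list, wild_type: str,
                          min_members: int = 2) -> dict:
    """Return {position: base} for substitutions shared by all reads.

    family      : read sequences sharing one barcode
    wild_type   : reference sequence of the region of interest
    min_members : smallest family size considered analyzable
    """
    if len(family) < min_members:
        return {}  # too few reads to separate real mutations from errors
    calls = {}
    for pos, ref_base in enumerate(wild_type):
        bases = {read[pos] for read in family}
        if len(bases) == 1:  # all family members agree at this position
            base = next(iter(bases))
            if base != ref_base:
                calls[pos] = base
    return calls


# Hypothetical reference and reads, for illustration only.
reference = "CAAGAG"
family_a = ["CTAGAG", "CTAGAG", "CTAGAT"]  # last read has a private error
family_b = ["CAGGAG"]                       # single read: not analyzable

calls_a = call_family_mutations(family_a, reference)
calls_b = call_family_mutations(family_b, reference)
print(calls_a, calls_b)
```

In this sketch the A>T change at position 1 is called because all three reads carry it, whereas the G>T change seen in one read only is treated as a PCR or sequencing error.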
To validate the sensitivity of this mammalian version of MDS, we generated a panel of Kras-mutant plasmids, each comprised of Kras cDNA with a unique set of co-occurring double or triple mutations in the region encoded by exon 1 and/or exon 2 (Table 5). Each was spiked at specific concentrations into genomic DNA isolated from mouse embryonic fibroblasts (MEFs) or murine lungs to benchmark different levels of sensitivity. As the error rates of PCR and sequencing are unlikely to give the same two or three exact improper base calls, the actual frequency of mutants present in the sample was estimated by calculating the frequency of barcode families with the pre-engineered co-occurring mutations. The frequency of mutations determined by MDS was then compared against the aforementioned actual frequency. Using this approach, we demonstrated that MDS adapted for the transcribed strand of Kras exon 1 detected mutations at a sensitivity of 5×10−7 or 1 mutant per 2×106 templates (
Capturing the initiating oncogenic mutation in Kras. Urethane induces pulmonary tumors driven by a KrasQ61L/R oncogenic mutation1,3-5, exemplifying the selectivity of this carcinogen at the level of tissue, isoform, position, and substitution. To elucidate the processes behind this RAS mutation tropism, we exposed A/J mice to urethane or the vehicle PBS via three daily intraperitoneal injections. After 1, 2, 3, and 4 weeks, genomic DNA was isolated from the lungs of four to seven mice from each condition. The non-transcribed strand of exon 2 of the endogenous Kras gene was then sequenced by MDS. To ensure abundant depth for mutant recovery and the accuracy of detected mutation frequency, samples with fewer than 1.5×105 barcodes were excluded from analysis. For the remaining samples, mutation frequencies were summed by either nucleotide position or substitution type, normalized to control PBS, log10 transformed, and then plotted as a heatmap (
This analysis identified the well-established1,3-5 oncogenic L (and to a lesser extent R) mutation at codon Q61 preferentially in the urethane, but not PBS cohort of mice, as early as 1 week after exposure to this carcinogen. We confirm that these are initiating mutations, as they expanded over time indicative of tumor growth (
Substitution tropism. Previous whole-exome sequencing of urethane-induced tumors revealed a strong bias towards A>T/G substitutions3, consistent with ethenodeoxyadenosine adducts forming in vivo after urethane exposure17,18. These substitutions were also detected in Kras by MDS at a high frequency, although A>T transversions were far more common than A>G transitions (
Position tropism. This bias of urethane for (C)A>T/G substitutions similarly argues against mutations arising at an appreciable level in codons 12 (G34GT) or 13 (G37GC) in exon 1, as neither fits the CAN pattern in either strand orientation. Related to this, despite the fact that oncogenic mutations at G12, and to a lesser extent G13, occur frequently in human cancers19 and when introduced into the lungs of mice are tumorigenic to varying degrees20, they are rarely recovered from urethane-induced tumors3. We therefore sequenced the transcribed strand of exon 1 of Kras by MDS from genomic DNA isolated from the lungs of mice 1, 2, 3, and 4 weeks after exposure to urethane or PBS. To overcome interference from strand-specific background (see Methods), we also sequenced the non-transcribed strand of exon 1 of Kras by MDS from the lungs of mice at the 1- and 4-week time points. While the CAN→CTN signature was again preferentially detected 1 week after urethane exposure (
Isoform tropism. The other two Ras genes, Hras and Nras, encode the identical codon 61 (CAA). CAN→CT/GN substitutions at this codon generate the identical oncogenic Q61L/R mutations, which are well known to render Hras and Nras oncogenic22,23. Despite this, oncogenic mutations in Hras or Nras are not recovered in urethane-induced lung tumors3. This suggests that either these loci are resistant in some manner to urethane mutagenesis or oncogenic mutations in these two genes are unable to initiate tumorigenesis. To differentiate between these two possibilities, we optimized the MDS assay to detect mutations in the non-transcribed strand of exon 2 in Hras (see Methods). We then applied this approach to genomic DNA isolated from the lungs of mice 1 and 4 weeks after exposure to urethane or PBS. We found a high prevalence of A>T followed by A>G mutations in exon 2 of Hras (
Organ tropism. Pulmonary lesions are the primary tumors arising in mice after intraperitoneal injections of urethane2. However, activating an oncogenic Kras allele in a broad spectrum of murine organs has been documented to be tumorigenic24. This raises the question of why urethane fails to induce other types of tumors. We thus analyzed the mutation status by MDS of the non-transcribed strand of exon 2 of Kras from lung compared to the liver and pancreas from mice 1 and 4 weeks after exposure to urethane versus PBS. The liver was chosen as in rare cases tumors develop in this organ during urethane carcinogenesis25,26. The pancreas was chosen as it is sensitive to tumorigenesis by oncogenic Kras mutations27,28 but is not known to develop tumors after intraperitoneal injections of urethane2. In comparison with the lung, significantly fewer CAN→CTN transversions were recovered in the liver and pancreas 1 and 4 weeks after urethane exposure (
Strand bias. Given the above differences in the mutation frequency between different tissues, we revisited the MDS sequencing of the Kras locus, finding a bias towards mutations in the non-transcribed strand in mice exposed to urethane. In more detail, MDS targeting the non-transcribed strand of Kras exon 2 revealed that CAN→CTN, but not the complement NTG→NAG transversions, were the predominant mutations in the lungs of mice 1 week after exposure to urethane (
Mutational strand asymmetry has been observed for other mutational processes32,33 and correlated with the transcriptional status of mutated genes34. Kras mRNA levels determined by quantitative RT-PCR (RT-qPCR)35 or RNA-seq36 have been reported to be higher in the murine lung compared to the liver. In agreement, we validated the higher expression of Kras mRNA in lung compared to liver and pancreas by RT-qPCR (
Here we adapted MDS, an error-corrected, high-throughput sequencing approach originally developed for use in microbiology13, to detect extremely rare mutations in the mammalian genome at a sensitivity of up to 5×10−7 (1 mutant per 2×106 templates). While we developed this assay to study RAS mutation tropism, MDS could find value in other applications, such as early detection38. Nevertheless, by leveraging MDS to study the mutagenesis process at the earliest stage of tumorigenesis, we detected the initiating Q61L/R mutations in Kras in the lungs of mice only days after exposure to urethane, capturing the very birth of cancer. We note that mutant allele-specific amplification39,40 and droplet digital PCR41 have documented Kras mutations after carcinogen exposure. However, we chose to develop MDS for the mammalian setting as these assays are either not as quantitative and sensitive39,42, or are designed to examine pre-selected mutations41,43. Indeed, capitalizing on the ability of MDS to detect any sequence variation in targeted regions of Ras genes at great sensitivity, we show at least three features underpinning the extreme mutational tropism of urethane: the mutational bias of this environmental carcinogen, transcription, and the gene locus.
With regards to the substitution and position bias of urethane, we demonstrate that the prevalence of Q61L/R mutations arises in large part due to the known preference of urethane for A>T/G substitutions3, especially, as we show here, in the context of a 5′ C. This mutational bias, coupled with codon 61 containing a CAN trinucleotide that when the A is mutated to either T or G gives rise to an oncogenic L (CT182A) or R (CG182A) amino acid, favors the KrasQ61L/R driver mutation characteristic of this carcinogen. Other oncogenic mutations at Q61, G12, or G13 codons do not result from CA→CT/G substitutions, and in agreement, were rarely detected following urethane exposure. The implication is that a mutagenic preference may influence the type of initiating mutations in cancer. Similarly in humans, a CCT→CTC mutation characteristic of C>T transitions induced by UV encodes an activating P29S mutation in RAC1 in sun-exposed melanoma44.
While Q61H, G12, and G13 oncogenic mutations in Kras, which are not favored by urethane mutagenesis, were rare or absent 1 week after urethane exposure, they were detectable 4 weeks later. This implies that extremely rare mutations induced by urethane, provided they have a favorable oncogenic outcome, may initiate tumorigenesis (although we cannot formally rule out that these were pre-existing mutations unveiled by a cooperating mutation induced by urethane). In agreement, while the Q61L mutation is more frequent than Q61R in urethane-induced lung tumors of the A/J mouse strain, the reverse is true in the B6 strain5. Similarly, the mutation spectrum of urethane is also shifted in a variety of mutant Ras backgrounds3,24,45,46. If the mutagenesis preference of urethane is independent of strain background, the prevalence of the Q61R mutation suggests that this less common mutation is more conducive to tumor initiation in the B6 strain. As such, the most dominant mutation of a mutagen may not always dictate the initiating event, echoing the common discordance between the mutagenic signatures and the putative initiating mutation in certain human cancers47-49.
Another fascinating feature of urethane mutagenesis revealed by MDS sequencing relates to isoform tropism. We found that codon 61 was readily mutated in Hras in lung tissue, yet the oncogenic Hras allele was not expanded appreciably over time. This suggests that either HrasQ61L is not as oncogenic as KrasQ61L or the encoded protein is expressed too low (or high) to be tumorigenic. In support of the first, RAS isoforms differ in their residency at different membranes50 and the composition of proteins within the immediate vicinity differs between RAS isoforms51,52, with proteins like PIP5K1A52, calmodulin53, galectin-354, and so forth documented to specifically associate with KRAS. In support of the second, a Kras allele whereby the 3′ end was replaced with Hras exons to encode Hras protein was found mutated in urethane-induced tumors55, indicating that under a Kras promoter HrasQ61L is indeed oncogenic in the lung. Whether the inability of oncogenic mutations in Hras to promote lung tumorigenesis is because the protein is less oncogenic, expressed too low, too high, combinations thereof, or for other reasons7,24,56,57 remains to be elucidated. Nevertheless, the finding that Hras is mutated yet such mutations are not recovered in lung tumors3 after urethane exposure is in itself an important finding, and perhaps related, of the three RAS genes, HRAS is mutated the least often in human cancers6,7,58.
With regards to organ tropism, a very different mechanism appears to be at play. In this case, we found that Kras is rarely mutated in the liver and pancreas, despite the presence of the carcinogen. While a number of factors could contribute to this variation in mutagenesis59-61, one notable difference is that Kras mRNA levels are higher in the lung compared to these other tissues, suggestive of increased transcription. In fact, the lung was found to have the second highest levels of Kras mRNA of 15 adult murine tissues analyzed, second only to the brain35. Kras expression in the mouse lung also correlates with strain susceptibility to urethane carcinogenesis62,63. Related, we discovered that the non-transcribed strand of Kras is preferentially mutated, which for other mutagens has been linked to transcription-coupled repair of the transcribed strand64 or transcription-coupled damage of the displaced, non-transcribed strand34. Indeed, we found a global correlation between mRNA levels and the mutation frequency of urethane. This is not to say that there is a universal concordance between high gene transcription and an elevated mutation frequency of the non-transcribed strand. Indeed, high transcription has been associated with a lower mutation frequency in chromatin-dense genomic regions in cutaneous squamous cell carcinomas65. Thus, the type of cancer, mutational process, specific genes, and so forth may influence the bias of a mutagenic process. In the case of urethane however, we suggest that the tissue tropism is related to the high transcription of Kras in the lung, increasing the susceptibility of this gene to urethane mutagenesis.
In humans, there are also very distinct patterns to RAS mutations at the level of the organ (e.g., RAS is commonly mutated in pancreatic but rarely in breast cancer), isoform (e.g., KRAS is mutated in lung cancer while NRAS is mutated in melanoma), position (e.g., G12 is mutated in CMML while Q61 is mutated in thyroid carcinoma), and substitution (e.g., G12V is the primary mutation in bladder carcinoma while it is G12S in mouth carcinoma). There is no definitive mechanism to explain this phenomenon, although the pattern itself has been widely reported for decades6,7,24,58,66-68. In this regard, the extreme specificity of urethane carcinogenesis for KrasQ61L/R-mutant pulmonary tumors may inform the basic principles of the RAS mutation patterns observed in these clinical samples. Admittedly, urethane is not a major environmental carcinogen in humans compared to, for example, tobacco smoke. KrasQ61L/R mutations are also rare in human lung cancers. With these two provisos, we speculate that the RAS mutation tropism of human cancers may similarly be a product of mutagenesis selectivity factors, for example the specificity of the mutagenic process or susceptibility of a specific locus to mutations, and selection factors, for example differences in the oncogenic activity of one isoform over another. Moreover, it is entirely possible, if not likely, that different combinations of these or even other factors such as cooperating mutations, as elegantly demonstrated in MNU carcinogenesis3, cell type69, signaling intensity45, and so forth24 underlie the RAS mutation tropism of human cancers. As such, each cancer initiating event may be molded by a unique set of factors, each with varying influence.
86, 3070-4 (1989).
16. Hindson, B.J. et al. High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Anal Chem 83, 8604-10 (2011).
39. Ichikawa, T. et al. The activation of K-ras gene at an early stage of lung tumorigenesis in mice. Cancer Lett 107, 165-70 (1996).
In the following example, the inventors adapted an error-corrected, high-throughput sequencing approach to detect mutations in human Ras genes with high sensitivity. Methods similar to those described in Example 1 were modified for the detection of human genes as described herein.
Isolation of genomic DNA. Cells were resuspended in lysis buffer (100 mM NaCl, 10 mM Tris pH 7.6, 25 mM EDTA pH 8.0, and 0.5% SDS in H2O, supplemented with 20 μg·ml−1 RNase A (Sigma)). Samples were incubated at 37° C. for 1 hour. 2 μl of 800 U·ml−1 proteinase K (NEB) was then added to each sample, and the samples were vortexed and then incubated at 55° C. overnight. Genomic DNA was isolated by phenol/chloroform extraction followed by ethanol precipitation using standard procedures and quantified using a Qubit fluorometer.
Maximum depth sequencing (MDS). The MDS assay was adapted for mammalian Ras genes as follows. 20-50 μg of genomic DNA was incubated with StuI (NEB) for analysis of the transcribed strand of Kras exon 1, HinfI for analysis of the non-transcribed strand of Kras exon 1, or XmnI (NEB) for analysis of the non-transcribed strand of Kras exon 2. Reaction conditions were 5 units of the indicated restriction enzyme per 1 μg DNA per 20 μl reaction (e.g., 20 μg genomic DNA, 5 μl enzyme (20 units/μl), and 40 μl 10×buffer in a 400 μl reaction). Digested genomic DNA was column purified using the QIAquick PCR Purification Kit following the manufacturer's protocol (Qiagen) and resuspended in ddH2O (35 μl H2O per 10 μg DNA). The barcode and adaptor were added to the target DNA by incubating purified DNA with the appropriate adaptor-barcode primer (see below) for one cycle of PCR. PCR reactions were comprised of 10 μg DNA, 2.5 μl of 10 μM adaptor-barcode primer, 4 μl of 2.5 mM dNTP, 10 μl of 5×buffer (NEB), and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB) in a total volume of 50 μl. The number of PCR reactions was scaled according to the amount of DNA. PCR conditions were 98° C. for 1 minute, adaptor-barcode primer annealing temperature (see below) for 15 seconds, and 72° C. for 1 minute. 1 μl of 20,000 U·ml−1 exonuclease I (NEB) and 5 μl of 10×exonuclease I buffer (NEB) were then added to each 50 μl reaction to remove unused adaptor-barcode primers, and the reactions were incubated at 37° C. for 1 hour and then 80° C. for 20 minutes. Processed DNA was column purified using the QIAquick PCR Purification Kit as above and resuspended in ddH2O (35 μl H2O per column). The concentration of purified product was measured with a SimpliNano spectrophotometer (GE Healthcare Life Sciences). Samples were linearly amplified with the forward adaptor primer (see below).
PCR reactions were comprised of 1.5 μg DNA, 2.5 μl of 10 μM forward adaptor primer, 4 μl of 2.5 mM dNTP, 10 μl of 5×buffer (NEB), and 0.5 μl Q5® Hot Start High-Fidelity DNA Polymerase (NEB) in a total volume of 50 μl. The number of PCR reactions was scaled according to the amount of DNA. PCR conditions were as follows: 12 cycles of 98° C. for 15 seconds, 70° C. for 15 seconds, 72° C. for 8 seconds. 2.5 μl of 10 μM exon-specific reverse primer (see below) and 2.5 μl of 10 μM reverse adaptor primer (see below) were then added to each 50 μl reaction. The mixtures were then subjected to 20 cycles of exponential amplification. PCR conditions were as follows: 4 cycles of 98° C. for 15 seconds, exon-specific reverse primer annealing temperature (see below) for 15 seconds, 72° C. for 8 seconds, 16 cycles of 98° C. for 15 seconds, 70° C. for 15 seconds, and 72° C. for 8 seconds. The final library was size selected and purified with Ampure XP beads according to the manufacturer's protocol (Beckman Coulter). Sequencing was performed using HiSeq 2500 100 bp PE rapid run, HiSeq 4000 150 bp PE or NovaSeq 6000 S Prime 150 bp PE at Duke Center for Genomic and Computational Biology. For the optimization of barcode recovery, the same amount of genomic DNA was processed in parallel by MDS assay targeting Kras exon 1 transcribed strand and the PCR products were pooled together at different concentrations in one library to obtain different sequencing depths.
All primers were synthesized by Integrated DNA Technologies (IDT).
For the experiment optimizing barcode recovery, the reads were trimmed down to the barcode and grouped into families by barcode. The numbers of families containing 1 read and ≥2 reads were then counted.
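The counting step described above can be sketched as follows. The barcode position and length used here are hypothetical placeholders (the actual values depend on the adaptor-barcode primer design), and the reads are invented for illustration.

```python
from collections import Counter


def trim_to_barcode(read: str, start: int, length: int) -> str:
    """Keep only the barcode portion of a read (position is an assumption)."""
    return read[start:start + length]


def family_size_summary(barcodes: list) -> tuple:
    """Count barcode families with exactly 1 read and with >=2 reads."""
    sizes = Counter(barcodes)  # barcode -> number of reads in its family
    singletons = sum(1 for count in sizes.values() if count == 1)
    multi_read = sum(1 for count in sizes.values() if count >= 2)
    return singletons, multi_read


# Hypothetical reads with a 4-nt barcode at the start, for illustration only.
reads = ["ACGTTTGCA", "ACGTTTGCC", "GGCATTGCA",
         "TTAGTTGCA", "TTAGTTGCA", "TTAGTTGCA"]
barcodes = [trim_to_barcode(r, start=0, length=4) for r in reads]
summary = family_size_summary(barcodes)
print(f"families with 1 read: {summary[0]}, with >=2 reads: {summary[1]}")
```

Comparing these two counts across libraries pooled at different concentrations indicates how sequencing depth affects the recovery of analyzable barcode families.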
Adapting MDS to the mammalian genome. Similar to the endeavors discussed in Example 1, we optimized assay conditions (see Methods) for the detection of mutations in the human KRAS gene (
Adapting K-MDS to detect G12/13 mutations in human KRAS. We modified the K-MDS assay for the transcribed strand of exon 1 of human KRAS. In brief, human 293T gDNA was spiked with a panel of KRASGAT* DNA templates, all of which contained a G34GT→GAT mutation encoding the common G12D oncogenic mutation and a second unique co-occurring mutation (*) to benchmark specific concentrations of template, ranging from 1×10−3 to 1×10−6. Using this panel, we tested various restriction enzymes, primers, annealing temperatures, linear and exponential amplification cycles, and so forth to optimize K-MDS for human KRAS exon 1. As above, the actual frequency of G12D mutants present in the sample was estimated by calculating the frequency of barcode families with the pre-engineered co-occurring mutations. We find complete concordance between detecting a single mutation alone versus with a co-occurring mutation down to the lowest sensitivity assayed (1×10−6,
15. Dwyer-Nield, L.D. et al. Epistatic interactions govern chemically-induced lung tumor susceptibility and Kras mutation site in murine C57BL/6J-ChrA/J chromosome substitution strains. Int J Cancer 126, 125-32 (2010).
This example features the high-throughput sequencing method using Kras exon 1 primers. These primers utilize a HinfI restriction site. A barcode is added to any exon-specific primer.
This application claims priority to U.S. Provisional Application No. 63/421,491 filed on Nov. 1, 2022, the content of which is incorporated by reference in its entirety.
This invention was made with government support under grant numbers R01CA94184 and P01CA203657 awarded by the National Cancer Institute and under grant number R35GM127062 awarded by the National Institute of General Medical Sciences. The government has certain rights in this invention.
Number | Date | Country | |
---|---|---|---|
63421491 | Nov 2022 | US |