Throughout this application, various publications are referenced, including references in parentheses. The disclosures of all publications mentioned in this application are hereby incorporated by reference in their entireties into this application in order to more fully describe the state of the art to which this invention pertains.
This application incorporates-by-reference nucleotide sequences which are present in the file named “190913_90418-A-PCT_Sequence_Listing_DH.txt”, which is 41 kilobytes in size, and which was created on Sep. 13, 2019 in the IBM-PC machine format, having an operating system compatibility with MS-Windows, which is contained in the text file filed Sep. 13, 2019 as part of this application.
Cells in the human body accumulate hundreds of de novo mutations over the span of a lifetime, especially in frequently dividing cell types. While only a fraction of somatic mutations alter protein function, harmful variants eventually emerge and contribute to age-related disorders, including cancer. Therefore, timely detection of pathogenic protein alterations is an important goal in disease screening, early diagnosis, and treatment monitoring.
One strategy for detecting altered proteins is based on antibodies (e.g., fluorescence-activated cell sorting (FACS), immunohistochemistry (IHC), and enzyme-linked immunosorbent assay (ELISA)). However, antibodies against specific amino acid alterations are challenging to generate. An alternative is to find cellular biomarkers downstream of initiating mutations, but such biomarkers are often present in other cell types as well. In either case, the low specificity of mutation-associated antibodies or biomarkers precludes multiplexed assays with a low false-positive rate for early or residual disease detection.
Another strategy is to sequence the DNA and infer the consequences of mutations on the protein in silico. In cancer research, next-generation sequencing (NGS) has been instrumental in classifying cancer types based on high-frequency mutations or molecular signatures. In early or residual cancer, however, somatic mutations are present in a small number of cells, necessitating either clonal expansion of single cells in vitro (e.g., organoids) or deep sequencing. But single-cell expansion is not an option for real-time disease assessment, and the cost of deep sequencing over a large patient population limits the number of genetic markers surveyed, reducing its diagnostic specificity and sensitivity (
A less costly strategy involves allele-specific probes for PCR or droplet-based assays. Allele-specific primer technologies have been available for decades, and they can be relatively specific, robust, and affordable for a handful of mutations. However, detection specificity varies from one locus to another, making it challenging to multiplex a large number of allele-specific probes. Furthermore, the disease-causing base at each locus must be known in advance to design allele-specific primers, which is challenging for loci with numerous allelic combinations. Because of these limitations, allele-specific PCR and droplet-based assays are not suited to profiling mutations across a large number of genetic loci (
Among recent technologies, multiplexed single-molecule fluorescent in situ DNA or RNA hybridization (smFISH) could be an option for detecting somatic mutations, owing to its sensitivity, simplicity, and versatility. Allele-specific smFISH has been demonstrated as a proof of concept, but its single-base specificity is inadequate for clinical applications (
Recent advances in NGS technologies permit the quantification of rare DNA variants even in blood samples (i.e., ‘liquid biopsy’), and single-cell NGS workflows can profile somatic mutations from different tissue regions using microdissection. While these advances are conceptually important, they have not translated into readily deployable clinical assays for assessing early or residual disease. A key reason is insufficient sensitivity and specificity, largely due to the cost of deep sequencing across a large number of loci.
The high cost of deep sequencing is attributable to its sequence-agnostic nature. For example, one needs to sequence >10⁶ molecules in order to detect a single mutant molecule among 10⁶ wild-type DNA molecules. Therefore, sequencing 100 different loci at this depth requires sequencing 10⁸ reads on a single Illumina HiSeq lane. By extension, sequencing 1,000 loci at a variant allele frequency (VAF) of 10⁻⁸ across 100 patients could cost up to $100 million USD using NGS technologies. As a consequence, the sensitivity of clinical high-throughput sequencing is generally capped at a VAF of 10⁻² for practical reasons, which limits its utility in detecting early or residual disease.
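The depth arithmetic above can be reproduced with a short calculation. The sketch below is a minimal illustration, assuming each locus must be read about 1/VAF times to expect one mutant read and that a lane yields 10⁸ reads (as stated above); the per-lane cost of $1,000 is a hypothetical placeholder chosen only so that the result lands on the order of magnitude quoted in the text, and the function and parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DeepSequencingEstimate:
    reads: int       # total reads required across all loci and patients
    lanes: float     # sequencing lanes at the assumed lane yield
    cost_usd: float  # lanes multiplied by the assumed per-lane cost


def estimate_deep_sequencing(loci: int, one_in: int, patients: int,
                             reads_per_lane: int = 10**8,
                             usd_per_lane: float = 1_000.0) -> DeepSequencingEstimate:
    """Estimate sequence-agnostic sequencing effort for rare-variant detection.

    `one_in` expresses the VAF as one mutant molecule per `one_in` molecules,
    so each locus needs at least `one_in` reads to expect a single mutant read.
    `usd_per_lane` is a hypothetical placeholder, not a quoted price.
    """
    reads = loci * one_in * patients
    lanes = reads / reads_per_lane
    return DeepSequencingEstimate(reads, lanes, lanes * usd_per_lane)


# 100 loci at VAF 10^-6 for one patient: 10^8 reads, about one lane.
print(estimate_deep_sequencing(loci=100, one_in=10**6, patients=1))
# 1,000 loci at VAF 10^-8 for 100 patients: 10^13 reads.
print(estimate_deep_sequencing(loci=1_000, one_in=10**8, patients=100))
```

Because the read count scales with both the number of loci and 1/VAF, lowering the detectable VAF by two orders of magnitude raises the sequencing effort by the same factor, which is the bottleneck described above.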
In the example above, the cost of NGS reflects the disproportionate amount of unaltered sequences from ‘normal’ cells in the tissue sample. To overcome this bottleneck, antibody-dependent diseased cell sorting is often used to enrich for the variant sequence (
Regardless, NGS-based approaches do not address whether protein modifications are pertinent to the tissue of interest. In contrast, antibodies detect proteins that are actually expressed in disease-relevant tissues; however, antibodies lack the specificity to discriminate amino acid alterations. While RNA-seq can simultaneously discriminate genetic mutations and quantify gene expression levels, it has the same sensitivity limitation as other NGS applications (described above), because the overwhelming abundance of wild-type transcripts necessitates deep sequencing.
In summary, early or residual cancer screening requires the detection of functional mutations from numerous genetic loci amidst a large number of normal cells (
The subject invention provides a method for determining the presence or absence of variant ribonucleic acid molecules in a population of ribonucleic acid molecules, wherein the reference sequence of the variant ribonucleic acid molecules is known, the method comprising:
The subject invention also provides a composition comprising a primer molecule and at least two probes,
The subject invention also provides a kit comprising a primer molecule and at least two probes,
The subject invention also provides a composition comprising complexes of primer molecules, probes and ribonucleic acid molecules,
The subject invention also provides a method of treating a disease or condition associated with the presence of variant ribonucleic acid molecules in a subject, the method comprising:
The present invention provides for characterizing individual cells by sequencing RNA directly, without cDNA synthesis, an approach that advances diagnostics and discovery. The present invention discloses a probe design that increases the accuracy of RNA-templated ligation and enables multiple rounds of ligation and sequencing of mRNA variant classes without a priori knowledge of their exact sequences. The programmable sequencing chemistry permits cell characterization using conditional statements about single cells.
The subject invention also provides for the methods, processes, compositions, devices, and kits for practicing substantially what is shown and described.
The present invention provides a method for determining the presence or absence of variant ribonucleic acid molecules in a population of ribonucleic acid molecules, wherein the reference sequence of the variant ribonucleic acid molecules is known, the method comprising:
In an embodiment, the primer molecules and probes form the following sequence, read 3′ to 5′, when hybridized to their respective complementary sequences on a ribonucleic acid molecule in the population of ribonucleic acid molecules, wherein the numbers in brackets represent the number of nucleotides, N represents nucleotides of the primer molecule that are fully complementary to the reference sequence, P represents additional nucleotides of the primer molecule, and X is any whole number sufficient for the primer molecules to have a melting temperature of at least 50° C. when hybridized to their complementary ribonucleic acid molecules in the population of ribonucleic acid molecules:
In an embodiment, the method further comprises a step of removing excess unhybridized or partially hybridized primer molecules and/or probes after step (a).
In an embodiment, step (c) comprises sequencing the ligated nucleic acid molecules.
In an embodiment, L is 1, 2, 3, 4, 5, 6, 7, or 8. In an embodiment, L is 3.
In an embodiment, the plurality of probes consists of probes complementary to each respective single base variant along the length of L.
In an embodiment, the plurality of probes consists of probes complementary to each possible single base variant along the length of L other than non-actionable sequences, synonymous mutations, non-functional polymorphisms, or mutational patterns not observed in the human population.
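For L = 3 the probe set described above is small enough to enumerate directly. The sketch below is a hypothetical illustration, assuming the interrogated window is a single in-frame codon, the standard genetic code, and probes represented only by their hybridizing bases (no fluorophores, barcodes, or amplification sequences); synonymous filtering stands in for one of the exclusions listed above, and all function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def single_base_variants(window: str) -> list[str]:
    """Every window that differs from `window` at exactly one position."""
    variants = []
    for i, ref_base in enumerate(window):
        for alt in "ACGT":
            if alt != ref_base:
                variants.append(window[:i] + alt + window[i + 1:])
    return variants


def probe_for(window: str) -> str:
    """Reverse complement of a target window (probe written 5'->3', no tags)."""
    return window.translate(COMPLEMENT)[::-1]


def design_probe_set(codon: str, drop_synonymous: bool = True) -> dict[str, tuple[str, str]]:
    """Map each retained variant codon to (encoded amino acid, probe sequence).

    The codon may be given in RNA letters; U is converted to T. The window is
    assumed to be an in-frame codon, i.e. L = 3.
    """
    codon = codon.upper().replace("U", "T")
    ref_aa = CODON_TABLE[codon]
    probes = {}
    for variant in single_base_variants(codon):
        aa = CODON_TABLE[variant]
        if drop_synonymous and aa == ref_aa:
            continue  # synonymous change: excluded from the probe set
        probes[variant] = (aa, probe_for(variant))
    return probes


# Hypothetical glycine codon GGU: 9 single-base variants, 3 of them synonymous.
for variant, (aa, probe) in design_probe_set("GGU").items():
    print(variant, aa, probe)
```

Excluding non-actionable sequences or common polymorphisms would follow the same filtering pattern but requires external variant annotation, which is not shown here.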
In an embodiment, some or all of the plurality of probes comprise a fluorophore.
In an embodiment, some or all of the plurality of probes further comprise a signal amplification functional group.
In an embodiment the signal amplification functional group is horseradish peroxidase, alkaline phosphatase, digoxigenin, or fluorescein isothiocyanate (FITC).
In an embodiment, some or all of the plurality of probes comprise an amplification sequence. In an embodiment, the amplification sequence is an adapter sequence for circularization and rolling circle amplification (RCA). In an embodiment, the amplification sequence is a sequence for hybridization of a PCR primer.
In an embodiment, some or all of the plurality of probes comprise a barcode. In an embodiment, some or all of the plurality of probes comprise a cleavable terminator.
In an embodiment, the cleavable terminator is an inosine base. In an embodiment, some or all of the plurality of probes comprise a functional group that accepts and transfers energy from an external source for generation of cytotoxic processes.
In an embodiment, the method comprises one or more further rounds of interrogation and ligation, wherein the ligated nucleic acid molecules formed in step (b) serve as the primer molecules for the next round of interrogation and ligation with a plurality of probes designed as described in step (a) based on the nucleotides of the reference sequence that are adjacent to the nucleotides of the reference sequence to which each such ligated nucleic acid molecule is complementary.
In this embodiment, some or all of the plurality of probes further comprise a cleavable terminator, wherein the cleavable terminator is cleaved to form cleaved ligated nucleic acid molecules, which serve as the primer molecules for the next round of interrogation and ligation with a plurality of probes designed as in step (a) based on the nucleotides of the reference sequence that are adjacent to the nucleotides of the reference sequence to which each cleaved ligated nucleic acid molecule is complementary.
In this embodiment, Endonuclease V is used to cleave the cleavable terminator of the ligated nucleic acid molecules.
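The serial rounds described above can be modeled as reading the target L nucleotides at a time, with each cleaved ligation product priming the next window. The sketch below is a simplified, hypothetical model: it assumes perfect ligation specificity, ignores strand orientation (positions are indexed along the reference in the direction of extension), and uses the reference window plus its single-base variants as the probe set in each round; the function names and sequences are illustrative.

```python
def window_probe_set(reference_window: str) -> set[str]:
    """Reference window plus all of its single-base variants."""
    probes = {reference_window}
    for i, ref_base in enumerate(reference_window):
        for alt in "ACGT":
            if alt != ref_base:
                probes.add(reference_window[:i] + alt + reference_window[i + 1:])
    return probes


def read_by_serial_ligation(target: str, reference: str, start: int,
                            rounds: int, L: int = 3) -> list:
    """Simulate successive interrogation/ligation rounds of L nucleotides.

    In each round only a probe exactly matching the next L target bases is
    assumed to ligate; after cleavage the product primes the next window.
    Returns the decoded window per round, or None when no probe matches.
    """
    calls = []
    pos = start
    for _ in range(rounds):
        observed = target[pos:pos + L]
        probes = window_probe_set(reference[pos:pos + L])
        calls.append(observed if observed in probes else None)
        pos += L  # the cleaved ligation product now ends L bases further along
    return calls


reference = "ATGGGTGCCAAG"
target = "ATGGATGCCAAG"  # hypothetical single-base change in the second window
print(read_by_serial_ligation(target, reference, start=0, rounds=4))
# -> ['ATG', 'GAT', 'GCC', 'AAG']
```

In this model the variant base is decoded from whichever probe ligated, so its identity need not be known in advance; a window carrying more than one change falls outside the probe set and is reported as None.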
In an embodiment, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules do not further comprise a signal amplification functional group. In this embodiment, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules do not further comprise an amplification sequence. In an embodiment, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules are tagged with a non-functional control sequence, which control sequence blocks amplification of the ligated nucleic acid molecule. In an embodiment, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules comprise an inverted dT to prevent circularization and rolling circle amplification.
In an embodiment, probes that are fully complementary to non-actionable sequences, synonymous mutations, non-functional polymorphisms, and/or mutational patterns not observed in the human population do not further comprise a signal amplification functional group. In an embodiment, probes that are fully complementary to non-actionable sequences, synonymous mutations, non-functional polymorphisms, and/or mutational patterns not observed in the human population are tagged with a non-functional control sequence, which control sequence blocks amplification of the ligated nucleic acid molecule. In an embodiment, probes that are fully complementary to non-actionable sequences, synonymous mutations, non-functional polymorphisms, and/or mutational patterns not observed in the human population comprise an inverted dT to prevent circularization and rolling circle amplification.
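The effect of leaving amplification-competent tags off reference-matching and excluded probes can be illustrated by counting which ligation products would be amplified. This is a hypothetical sketch: the `Probe` record, the tag descriptions in the comments, and the example counts are illustrative assumptions, and ligation is assumed to occur only on exactly matching windows.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    window: str        # target window the probe is complementary to
    amplifiable: bool  # True: carries an RCA adapter; False: inverted dT/control tag


def build_probes(reference_window: str, variant_windows: list[str],
                 excluded: set[str]) -> dict[str, Probe]:
    """Reference and excluded variants get blocked probes; others get adapters."""
    probes = {reference_window: Probe(reference_window, amplifiable=False)}
    for window in variant_windows:
        probes[window] = Probe(window, amplifiable=window not in excluded)
    return probes


def amplifiable_signal(molecule_windows: list[str], probes: dict[str, Probe]) -> int:
    """Count ligation products that could be circularized and amplified."""
    counts = Counter(molecule_windows)
    return sum(n for window, n in counts.items()
               if window in probes and probes[window].amplifiable)


probes = build_probes("GGT", ["GAT", "GTT", "GGA"], excluded={"GGA"})
molecules = ["GGT"] * 999 + ["GAT"] + ["GGA"] * 5  # hypothetical sample
print(amplifiable_signal(molecules, probes))  # -> 1
```

Under these assumptions the 999 wild-type and 5 excluded-variant products still ligate but contribute no amplified signal, so the readout is not swamped by the abundant reference sequence.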
In an embodiment, at least 8 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 9 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 10 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 11 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 12 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 13 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 14 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 15 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence.
In an embodiment, at least 8 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 9 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 10 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 11 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 12 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 13 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 14 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment, at least 15 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence.
In an embodiment, the primer molecules comprise 20-50 nucleotides. In an embodiment, the primer molecules comprise 20-50 nucleotides that are complementary to the reference sequence of the ribonucleic acid molecules.
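Choosing a primer length that reaches a melting temperature of at least 50° C. within the 20-50 nucleotide range can be roughed out from base composition. The sketch below assumes the Wallace rule for oligonucleotides shorter than 14 nucleotides and the basic GC-content approximation Tm = 64.9 + 41 × (G + C − 16.4) / N otherwise; both are DNA/DNA estimates that ignore salt, oligonucleotide concentration, and RNA/DNA hybrid thermodynamics, so they are illustrative only. `flank` is a hypothetical input holding the reference bases the primer would cover, ordered outward from the interrogated site.

```python
from typing import Optional


def estimate_tm(seq: str) -> float:
    """Rough melting temperature (degrees C) from base composition only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    n = len(seq)
    if n < 14:
        return 2.0 * at + 4.0 * gc  # Wallace rule for short oligonucleotides
    return 64.9 + 41.0 * (gc - 16.4) / n  # basic GC-content approximation


def shortest_primer_length(flank: str, min_tm: float = 50.0,
                           min_len: int = 20, max_len: int = 50) -> Optional[int]:
    """Shortest primer length in [min_len, max_len] whose estimated Tm >= min_tm.

    Returns None when no length in range reaches the threshold.
    """
    for length in range(min_len, min(max_len, len(flank)) + 1):
        if estimate_tm(flank[:length]) >= min_tm:
            return length
    return None


print(shortest_primer_length("GC" * 25))  # GC-rich flank: 20 nt suffices
print(shortest_primer_length("AT" * 25))  # AT-rich flank: needs 46 nt
print(shortest_primer_length("AT" * 10))  # too short to reach 50 C: None
```

A nearest-neighbor thermodynamic model with RNA/DNA parameters would give more realistic values; the composition-only formulas are used here because they are short and self-contained.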
In an embodiment, each primer molecule comprises an amplification sequence. In an embodiment the amplification sequence is an adapter sequence for circularization and rolling circle amplification (RCA). In an embodiment the amplification sequence is a sequence for hybridization of a PCR primer.
In an embodiment, each primer molecule comprises a signal amplification functional group. In an embodiment, the signal amplification functional group is horseradish peroxidase. In an embodiment, the signal amplification functional group is alkaline phosphatase. In an embodiment, the signal amplification functional group is digoxigenin. In an embodiment, the signal amplification functional group is fluorescein isothiocyanate (FITC).
In an embodiment, each primer molecule comprises a blocking group to make the ligated nucleic acid molecules resistant to degradation. In an embodiment, the blocking group is an inverted dT. In an embodiment, the blocking group is a phosphorothioate nucleotide or phosphorothioate nucleotides. In an embodiment, the blocking group is an inert spacer moiety. In an embodiment, the blocking group is a locked nucleic acid or locked nucleic acids. In an embodiment, the blocking group is a modified base or modified bases.
In an embodiment, each primer molecule comprises a fluorescent or colorimetric sequence. In an embodiment, each primer molecule comprises an inverted dT. In an embodiment, each primer molecule comprises a phosphorothioate nucleotide or phosphorothioate nucleotides. In an embodiment, each primer molecule comprises a locked nucleic acid or locked nucleic acids. In an embodiment, each primer molecule comprises a modified base or modified bases. In an embodiment, each primer molecule comprises a functional group that accepts and transfers energy from an external source for generation of cytotoxic processes. Accordingly, the invention provides a method of generating cytotoxicity depending on the presence or absence of a variant ribonucleic acid molecule.
In an embodiment, the method further comprises a step of degrading un-ligated, excess, and/or off-target probes after step (b).
In an embodiment, degrading is by an endonuclease. In an embodiment, degrading is by an exonuclease. In an embodiment, degrading is by a surveyor enzyme. In an embodiment, degrading is by a resolvase. In an embodiment, degrading is by a ssDNA-binding protein.
In an embodiment, the exonuclease is Exonuclease I. In an embodiment, the exonuclease is T7 exonuclease. In an embodiment, the exonuclease is Exonuclease III.
In an embodiment, the endonuclease is T7 endonuclease I.
In an embodiment, the exonuclease is used in combination with RNase H and/or an RNase cocktail.
In an embodiment, degrading comprises the use of exonucleases that remove bound RNA to degrade partially hybridized probes.
In an embodiment, degrading of bound RNA results in diffusion of the ligated product for in situ applications in fixed cells or tissues.
In an embodiment, degrading further comprises hybridization independent degrading.
In an embodiment, degradation of ligated nucleic acid molecules is blocked by an inverted dT, phosphorothioate nucleotide, or inert spacer moiety from the primer molecule.
In an embodiment, partially hybridized probes of step (b) are in a complex with DNA or RNA molecules or are non-covalently associated with proteins or other cellular material.
In an embodiment, the method further comprises a step of amplifying the ligated nucleic acid molecules before step (c).
In an embodiment, the step of amplifying the ligated nucleic acid molecules comprises multiple displacement amplification (MDA).
In an embodiment, the step of amplifying the ligated nucleic acid molecules comprises rolling circle amplification (RCA).
In an embodiment, the step of amplifying the ligated nucleic acid molecules comprises polymerase chain reaction (PCR) amplification.
In an embodiment, the step of amplifying the ligated nucleic acid molecules comprises inhibiting the partially hybridized probe/nucleic acid molecule complexes from being amplified.
In an embodiment, the step of amplifying the ligated nucleic acid comprises first ligating an oligomer assembly to the ligated nucleic acid, wherein the oligomer assembly extends the length of the ligated nucleic acid molecules so as to form extended ligated nucleic acid molecules, preferably wherein the extended ligated nucleic acid molecules are immobilized.
In an embodiment, the oligomer assembly contains multiple copies of the same sequence.
In an embodiment, the ligation of the oligomer assembly to the ligated nucleic acid enables degradation of the entire oligomer assembly complex, unless the ligated nucleic acid molecule is exonuclease-resistant.
In an embodiment, degradation of the oligomer assembly amplifies the detectable signal from ligated nucleic acid molecules that are complementary to a sequence that differs from the reference sequence.
In an embodiment, degrading of the oligomer assembly complex results in the formation of single-stranded DNA of a known orientation. In an embodiment, the single-stranded DNA contains multiple copies of the same sequence corresponding to a sequence of the oligomer assembly. In an embodiment, the single-stranded DNA can be hybridized and sequenced in situ. In an embodiment, the single-stranded DNA is hybridized to primer molecules linked to magnetic nanoparticles to magnetize the cell for cell purification.
In an embodiment, the oligomer assembly is formed using well-, condition-, or batch-specific monomer sequences that can subsequently be extended with further monomers of alternate sequences for combinatorial labeling of the ligated nucleic acid, preferably wherein the oligomer assembly for combinatorial labeling can be used to multiplex 100 to 1,000,000 single cells or wells, or can be used in high-throughput bulk DNA sequencing.
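The capacity of combinatorial labeling grows geometrically with the number of monomer addition rounds. The sketch below is an illustrative calculation, assuming a hypothetical 96 distinct well-specific monomers per round and uniformly random label assignment, as in split-pool schemes; neither assumption is specified in the text, and the function names are illustrative.

```python
def label_capacity(monomers_per_round: int, rounds: int) -> int:
    """Number of distinct oligomer assemblies after `rounds` additions."""
    return monomers_per_round ** rounds


def rounds_needed(n_targets: int, monomers_per_round: int) -> int:
    """Fewest rounds whose label capacity covers n_targets (integer arithmetic)."""
    rounds, capacity = 0, 1
    while capacity < n_targets:
        capacity *= monomers_per_round
        rounds += 1
    return rounds


def expected_unique_fraction(n_cells: int, n_labels: int) -> float:
    """Expected fraction of cells whose label is shared with no other cell,
    assuming each cell draws one of n_labels uniformly at random."""
    return (1.0 - 1.0 / n_labels) ** (n_cells - 1)


print(label_capacity(96, 3))          # 884,736 labels
print(rounds_needed(1_000_000, 96))   # 4 rounds to exceed one million
print(expected_unique_fraction(10_000, label_capacity(96, 4)))
```

With random assignment, collisions depend on the ratio of cells to labels, so the label space is generally sized well above the number of cells or wells to be multiplexed.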
In an embodiment, 50% of the primer molecules are hybridized within two minutes.
In an embodiment, the reaction temperature of step (b) is about 37° C. In an embodiment, the reaction temperature of step (b) is 37° C.
In an embodiment, the ligating of step (b) is ligation with PBCV ligase. In an embodiment, the ligating of step (b) is ligation with T4 Rnl2. In an embodiment, the ligating of step (b) is ligation with T4 DNA ligase.
In an embodiment, in step (b), partially hybridized probes are ligated to adjacent primer molecules at a rate such that they constitute less than 1% of the ligated nucleic acid molecules.
In an embodiment, the method can detect the presence of variant ribonucleic acids with a variant allele frequency (VAF) of less than 5%, less than 4%, less than 3%, less than 2%, or about 1%.
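How a mis-ligation background (bounded above at less than 1% of ligated products) interacts with the lowest detectable VAF can be explored with a simple model. The sketch below assumes, hypothetically, that mis-ligated products are read as variant signal at a fixed fraction e of all ligation products, so the observed variant fraction is f = e + (1 − e) × VAF, and that variant molecules are sampled independently; it is a back-of-the-envelope model, not an analysis stated in the text, and the function names are illustrative.

```python
import math


def corrected_vaf(observed_fraction: float, misligation_rate: float) -> float:
    """Invert f = e + (1 - e) * VAF; clip at zero when f is at or below background."""
    vaf = (observed_fraction - misligation_rate) / (1.0 - misligation_rate)
    return max(0.0, vaf)


def molecules_for_detection(vaf: float, probability: float = 0.95) -> int:
    """Molecules to interrogate so that at least one variant molecule is present
    with the given probability, i.e. 1 - (1 - vaf)**n >= probability."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - vaf))


print(corrected_vaf(0.02, 0.01))      # about 0.0101
print(corrected_vaf(0.005, 0.01))     # 0.0: indistinguishable from background
print(molecules_for_detection(0.01))  # 299 molecules at 1% VAF
```

In this model, an observed variant fraction close to the mis-ligation rate carries little information, which illustrates why keeping mis-ligated products below 1% matters when calling VAFs of a few percent.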
In an embodiment, the sensitivity of the method to detect variant ribonucleic acid molecules is 75%-90%.
In an embodiment, the method is conducted ex vivo. In an embodiment, the method is conducted in vitro. In an embodiment, the method is conducted in situ.
In an embodiment, the population of ribonucleic acid molecules is in a tissue culture. In an embodiment, the population of ribonucleic acid molecules is bound to a solid support such as a bead. In an embodiment, the population of ribonucleic acid molecules is bound to parts of a cell. In an embodiment, the population of ribonucleic acid molecules is in a fixed cell or tissue.
In an embodiment, the variant ribonucleic acid molecule is associated with functional changes. In an embodiment, the variant ribonucleic acid molecule is associated with disease. In an embodiment, the variant ribonucleic acid molecule is associated with cancer. In an embodiment, the functional changes affect protein structure.
In an embodiment, the variant ribonucleic acid molecule is used for cell tracing. In an embodiment, the variant ribonucleic acid molecule is used for cell labeling.
In an embodiment, the presence or absence of multiple variant ribonucleic acid molecules with different reference sequences is determined by simultaneously performing the method on the population of ribonucleic acid molecules using multiple sets of probes and primer molecules that are each designed as described in step (a) based on the different reference sequences of each of the multiple variant ribonucleic acid molecules.
This invention also provides a composition comprising a primer molecule and at least two probes,
This invention also provides a kit comprising a primer molecule and at least two probes,
In an embodiment, the composition or kit further comprises a ligase. In an embodiment, the ligase is PBCV ligase. In an embodiment, the ligase is T4 Rnl2. In an embodiment, the ligase is T4 DNA ligase.
In an embodiment of the composition or kit, L is 1, 2, 3, 4, 5, 6, 7, or 8. In an embodiment of the composition or kit, L is 3.
In an embodiment, the composition or kit is for use in determining the presence or absence of variant ribonucleic acids in a population of ribonucleic acid molecules.
In an embodiment, the composition or kit comprises probes and primers designed as in (a), (b) and (c) to hybridize to multiple different target sequences such that multiple different target sequences can be interrogated in series or preferably simultaneously.
In an embodiment, the composition or kit comprises an endonuclease. In an embodiment, the composition or kit comprises an exonuclease. In an embodiment, the composition or kit comprises a surveyor enzyme. In an embodiment, the composition or kit comprises a resolvase. In an embodiment, the composition or kit comprises an ssDNA-binding protein. In an embodiment, the exonuclease is Exonuclease I. In an embodiment, the exonuclease is Exonuclease III. In an embodiment, the composition or kit further comprises RNase H and/or an RNase cocktail.
In an embodiment of the composition or kit, the plurality of probes consists of probes complementary to each respective single base variant along the length of L.
In an embodiment of the composition or kit, the plurality of probes consists of probes complementary to each possible single base variant along the length of L other than non-actionable sequences, synonymous mutations, non-functional polymorphisms, or mutational patterns not observed in the human population.
In an embodiment of the composition or kit, some or all of the plurality of probes comprise a signal amplification functional group. In an embodiment of the composition or kit, the signal amplification functional group is horseradish peroxidase. In an embodiment of the composition or kit, the signal amplification functional group is alkaline phosphatase. In an embodiment of the composition or kit, the signal amplification functional group is digoxigenin. In an embodiment of the composition or kit, the signal amplification functional group is fluorescein isothiocyanate (FITC).
In an embodiment of the composition or kit, some or all of the plurality of probes further comprise an amplification sequence. In an embodiment, the amplification sequence is an adapter sequence for circularization and rolling circle amplification (RCA). In an embodiment, the amplification sequence is a sequence for hybridization of a PCR primer.
In an embodiment of the composition or kit, some or all of the plurality of probes further comprise a barcode.
In an embodiment of the composition or kit, some or all of the plurality of probes further comprise an inverted dT.
In an embodiment of the composition or kit, some or all of the plurality of probes further comprise a cleavable terminator. In an embodiment the cleavable terminator is an inosine base.
In an embodiment of the composition or kit, some or all of the plurality of probes further comprise a functional group that accepts and transfers energy from an external source for generation of cytotoxic processes.
In an embodiment of the composition or kit, some or all of the plurality of probes comprise a fluorophore.
In an embodiment of the composition or kit, some or all of the plurality of probes further comprise a cleavable terminator and Endonuclease V is used to cleave the terminator of the ligated nucleic acid molecule.
In an embodiment of the composition or kit, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules do not further comprise a signal amplification functional group.
In an embodiment of the composition or kit, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules do not further comprise an amplification sequence.
In an embodiment of the composition or kit, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules are tagged with a non-functional control sequence, which control sequence blocks amplification of the ligated nucleic acid molecule.
In an embodiment of the composition or kit, probes that are fully complementary to the reference sequence of the variant ribonucleic acid molecules comprise an inverted dT to prevent circularization and rolling circle amplification.
In an embodiment of the composition or kit, probes that are fully complementary to non-actionable sequences, synonymous mutations, non-functional polymorphisms, and/or mutational patterns not observed in the human population do not further comprise a signal amplification functional group.
In an embodiment of the composition or kit, probes that are fully complementary to non-actionable sequences, synonymous mutations, non-functional polymorphisms, and/or mutational patterns not observed in the human population do not further comprise an amplification sequence.
In an embodiment of the composition or kit, probes that are fully complementary to non-actionable sequences, synonymous mutations, non-functional polymorphisms, and/or mutational patterns not observed in the human population are tagged with a non-functional control sequence, which control sequence blocks amplification of the ligated nucleic acid molecule.
In an embodiment of the composition or kit, probes that are fully complementary to non-actionable sequences, synonymous mutations, non-functional polymorphisms, and/or mutational patterns not observed in the human population comprise an inverted dT to prevent circularization and rolling circle amplification.
In an embodiment of the composition or kit, at least 8 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 9 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 10 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 11 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 12 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 13 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 14 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 15 consecutive nucleotides starting at the 5′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence.
In an embodiment of the composition or kit, at least 8 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 9 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 10 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 11 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 12 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 13 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 14 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence. In an embodiment of the composition or kit, at least 15 consecutive nucleotides starting at the 3′ end of each primer molecule are fully complementary to consecutive nucleotides of the reference sequence.
In an embodiment of the composition or kit, the primer molecule comprises 20-50 nucleotides. In an embodiment of the composition or kit, the primer molecule comprises 20-50 nucleotides that are complementary to the reference sequence of the ribonucleic acid molecules.
In an embodiment of the composition or kit, each primer molecule further comprises an amplification sequence. In an embodiment of the composition or kit, each primer molecule further comprises a signal amplification functional group. In an embodiment of the composition or kit, each primer molecule further comprises a blocking group to make the ligated nucleic acid molecules resistant to degradation. In an embodiment of the composition or kit, each primer molecule further comprises a fluorescent or colorimetric sequence. In an embodiment of the composition or kit, each primer molecule further comprises an inverted dT. In an embodiment of the composition or kit, each primer molecule further comprises. In an embodiment of the composition or kit, each primer molecule further comprises a phosphorothioate nucleotide or phosphorothioate nucleotides. In an embodiment of the composition or kit, each primer molecule further comprises a locked nucleic acid or locked nucleic acids. In an embodiment of the composition or kit, each primer molecule further comprises a modified base or modified bases. In an embodiment of the composition or kit, each primer molecule further comprises a functional group that accepts and transfers energy from an external source for generation of cytotoxic processes.
In an embodiment of the composition or kit, each primer molecule comprises a blocking group to make the ligated nucleic acid molecules resistant to degradation. In an embodiment the blocking group is an inverted dT.
In an embodiment the blocking group is a phosphorothioate nucleotide or phosphorothioate nucleotides. In an embodiment the blocking group is an inert spacer moiety. In an embodiment the blocking group is a locked nucleic acid or locked nucleic acids. In an embodiment the blocking group is a modified base or modified bases.
In an embodiment each primer molecule comprises an amplification sequence. In an embodiment the amplification sequence is an adapter sequence for circularization and rolling circle amplification (RCA). In an embodiment the amplification sequence is a sequence for hybridization of a PCR primer.
This invention also provides a composition comprising complexes of primer molecules, probes and ribonucleic acid molecules.
In an embodiment, the complexes further comprise a ligase. In an embodiment, the composition of complexes comprises the complexes formed by performing the methods described herein.
This invention also provides a method of treating a disease or condition associated with the presence of variant ribonucleic acid molecules in a subject, the method comprising:
In an embodiment, the subject is a human. In an embodiment, the subject is not a human.
The present invention also provides for methods, processes, compositions, devices, and kits for practicing substantially what is shown and described.
Each embodiment disclosed herein is contemplated as being applicable to each of the other disclosed embodiments. Thus, all combinations of the various elements described herein are within the scope of the invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which this invention belongs.
As used herein, and unless stated otherwise or required otherwise by context, each of the following terms shall have the definition set forth below.
As used herein, “about” in the context of a numerical value or range means ±10% of the numerical value or range recited or claimed, unless the context requires a more limited range.
The terms “sequencing primer” and “primer molecule” are used interchangeably herein. As used herein, the “primer molecule” encompasses both (a) the nucleotides that are fully complementary to ribonucleic acid molecules in the population of ribonucleic acid molecules, and (b) any other component that is covalently attached to these nucleotides, such as, without limitation, additional nucleotides that are partially complementary to the ribonucleic acid molecules, additional nucleotides that are complementary to PCR primers for subsequent amplification, additional nucleotides that block exonuclease digestion, spacers, signal amplification functional groups, or other functional groups. The nucleotides of the primer molecule that are fully complementary to ribonucleic acid molecules in the population of ribonucleic acid molecules are preferably deoxyribonucleotides.
The terms “template”, “nucleic acid”, and “nucleic acid molecule” are used interchangeably herein, and each refers to a polymer of nucleotides. “Nucleotide” shall mean any monomer units for forming the deoxyribonucleic acids and ribonucleic acids or derivatives or analogues thereof, or hybrids of any of these. These monomer units include, without limitation, deoxyribonucleotides and ribonucleotides, nucleotides that have been modified according to techniques known in the art, and the monomer units of nucleic acid analogues. “Nucleic acid analogues” are structural analogues of DNA or RNA, designed to hybridize to complementary nucleic acid sequences. Examples of nucleic acid analogues include, but are not limited to, the nucleic acid analogues disclosed in Hunziker, J, and Leumann, C. (1995), peptide nucleic acids (PNA), locked nucleic acids (LNA) (Imanishi, et al WO 98/39352; Imanishi, et al WO 98/22489; Wengel, et al WO 00/14226), 2′-O-methyl nucleic acids (Ohtsuka, et al, U.S. Pat. No. 5,013,830), 2′-fluoro nucleic acids, phosphorothioates, and metal phosphonates. The term “nucleotide base” may be used interchangeably with “nucleotide”. “Genomic nucleic acid” refers to DNA derived from a genome, which can be extracted from, for example, a cell, a tissue, a tumor or blood.
As used herein, the term “amplifying” refers to the process of synthesizing nucleic acid molecules that are complementary to one or both strands of a template nucleic acid. Amplifying a nucleic acid molecule typically includes denaturing the template nucleic acid, annealing primers to the template nucleic acid at a temperature that is below the melting temperatures of the primers, and enzymatically elongating from the primers to generate an amplification product. The denaturing, annealing and elongating steps each can be performed once. Generally, however, the denaturing, annealing and elongating steps are performed multiple times (e.g., polymerase chain reaction (PCR)) such that the amount of amplification product increases, oftentimes exponentially, although exponential amplification is not required by the present methods. Amplification typically requires the presence of deoxyribonucleoside triphosphates, a DNA polymerase enzyme and an appropriate buffer and/or co-factors for optimal activity of the polymerase enzyme. An “amplification sequence” is a sequence of nucleotides whose presence is necessary to amplify a nucleic acid molecule using a given amplification method, such as, without limitation, an adapter sequence for rolling circle amplification (RCA), or a sequence to which PCR primers may hybridize for PCR amplification.
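For the repeated-cycle case, product accumulation is often approximated by an idealized model in which each cycle multiplies the number of template copies by (1 + E), where E is the per-cycle efficiency (E = 1 corresponds to perfect doubling). The sketch below assumes this textbook model; the function name and the example numbers are illustrative only.

```python
def amplicon_copies(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential amplification: N = N0 * (1 + E) ** cycles.

    `efficiency` is the assumed fraction of templates copied per cycle (0 to 1).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(amplicon_copies(10, 1.0, 3))   # 80.0: perfect doubling for 3 cycles
print(amplicon_copies(10, 0.9, 30))  # below the ideal 10 * 2**30 copies
```

A single pass of denaturing, annealing and elongating corresponds to `cycles=1`.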
As used herein the term “amplicon” refers to a nucleic acid molecule that is the product of amplifying a nucleic acid molecule.
As used herein the term “multiple displacement amplification” (MDA) refers to a method of isothermal, strand-displacing amplification as described in Dean et al. 2002.
As used herein, the term “sequence” may mean either a strand or part of a strand of nucleotides, or the order of nucleotides within a strand or part of a strand, depending on the appropriate context in which the term is used. Unless specified otherwise in context, the order of nucleotides is recited from the 5′ to the 3′ direction of a strand.
As used herein, the term “read” or “sequence read” refers to the nucleotide or base sequence information of a nucleic acid that has been generated by any sequencing method. A read therefore corresponds to the sequence information obtained from one strand of a nucleic acid fragment. For example, a DNA fragment where sequence has been generated from one strand in a single reaction will result in a single read. However, multiple reads for the same DNA strand can be generated where multiple copies of that DNA fragment exist in a sequencing project or where the strand has been sequenced multiple times. A read therefore corresponds to the purine or pyrimidine base calls or sequence determinations of a particular sequencing reaction.
As used herein, the terms “sequencing”, “obtaining a sequence” or “obtaining sequences” refer to obtaining nucleotide sequence information that is sufficient to identify or characterize the nucleic acid molecule and could be the full length or only partial sequence information for the nucleic acid molecule.
As used herein, the terms “wild-type” or “reference sequence” refer to a non-mutant sequence of nucleotides from a genome of the same species as that being analyzed, for which genome at least the non-mutant sequence information is known. As used herein, the term “wild-type” may be used interchangeably with “reference”. Reference sequence may refer to a non-mutant ribonucleotide sequence.
In embodiments of the present invention, “having a known nucleotide sequence” may refer to having a known “reference nucleotide sequence.”
As used herein, the term “variant” or “variant allele” refers to a sequence of nucleotides, variant codon, or indel, resulting in a sequence other than a wild-type sequence from the genome of the same species as that being analyzed, for which genome the non-mutant sequence information is known. As used herein, the term “variant allele frequency” (VAF) refers to the ratio of variant alleles to wild-type alleles in a population. For example, 1 variant allele among 1,000,000 wild-type alleles may be represented as a VAF of 10^-6. In embodiments of the present invention, the VAF may be less than about 10^-2, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, or 10^-8. In embodiments of the present invention the VAF may be less than 10^-9. As used herein, “variant allele” may refer to the variant allele in the genome or the variant allele that has been transcribed into a variant ribonucleic acid molecule. As used herein, a “variant ribonucleic acid molecule” is a ribonucleic acid molecule that has a sequence of ribonucleotides other than the ribonucleic acid wild-type sequence.
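A minimal sketch of the VAF calculation as defined above (variant alleles divided by wild-type alleles). Many sequencing tools instead divide by all alleles (variant plus wild-type); at the low frequencies discussed here the two values are practically identical. The function name is hypothetical.

```python
def variant_allele_frequency(variant_alleles: int, wild_type_alleles: int) -> float:
    """VAF as defined here: the ratio of variant alleles to wild-type alleles."""
    if wild_type_alleles <= 0:
        raise ValueError("at least one wild-type allele is required")
    return variant_alleles / wild_type_alleles


vaf = variant_allele_frequency(1, 1_000_000)
print(vaf)         # 1e-06
print(vaf < 1e-5)  # True: below the 10^-5 embodiment
```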
As used herein, the term “functionally relevant sequences” refers to sequences whose alterations could lead to functional changes, diseases, or lend themselves to lineage or cell labeling applications (See e.g.,
As used herein a “functionally relevant sequence variant” refers to the “variant allele” of a functionally relevant sequence. A “variant ribonucleic acid molecule” may be a functionally relevant sequence variant if it encodes a sequence whose alterations could lead to functional changes, diseases, or lend themselves to lineage or cell labeling applications. Thus, “functionally relevant sequence variant” encompasses functionally relevant variant ribonucleic acid molecules.
In embodiments of the present invention, the wild-type allele for a functionally relevant sequence has a known nucleotide sequence. Accordingly, in embodiments of the present invention nucleic acid molecules comprising the wild-type allele of the functionally relevant sequence are preferentially not amplified.
As used herein, the term “saturate” may be used interchangeably with the term “capture”. In embodiments of the present invention, saturating a sequence with, for example, probes or primers comprises saturating the sequence with a concentration of probes or primers capable of saturating the sequence.
In embodiments of the present invention, each probe may differ from the reference sequence at one or more nucleotide bases.
In embodiments of the present invention, ligating each probe to a primer in competition under conditions favoring the ligation of fully hybridized probes over partially hybridized probes comprises ligating only hybridized probes.
In embodiments of the present invention, degrading un-ligated, excess, and/or off-target probes comprises removing un-ligated, excess, and/or off-target partially degenerate probes.
In embodiments of the present invention, a mixture of nucleic acid molecules comprising a plurality of functionally relevant sequences may refer to a mixture comprising a plurality of nucleic acid molecules each comprising the same functionally relevant sequence, or comprising a plurality of functionally relevant sequences among the nucleic acid molecules.
As used herein, the term “barcode”, also known as an “index,” refers to a unique DNA sequence within a sequencing adaptor used to identify the sample of origin for each fragment.
As used herein, the term “gene” includes a DNA region encoding a gene product, as well as all DNA regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins and locus control regions.
As used herein, the term “sequencing target” refers to the sequence of interest which is selected, amplified, and/or revealed via the sequencing operation. This sequence is represented in a traditional format via the oligonucleotide bases (e.g. G, T, A, C, and U) or in a similar textual format. “Target sequences on a ribonucleic acid molecule” are sequences of A, G, U and C nucleotides on the ribonucleic acid molecule that the primer molecules and probes are designed to hybridize to.
As used herein, the term “next generation sequencing” or “NGS” refers to any modern high-throughput sequencing technology. NGS includes, but is not limited to, sequencing technologies such as Illumina (Solexa) sequencing and SOLiD sequencing.
As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the melting temperature of the formed hybrid, and the G:C ratio within the nucleic acids.
As used herein, the term “probe” refers to an oligonucleotide (i.e., a sequence of nucleotides), whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by amplification (e.g. PCR), which is capable of hybridizing to another oligonucleotide of interest. Probes are useful in the detection, identification and isolation of particular gene sequences (e.g., Her2, marker A1, marker A2 or marker B). The term probe encompasses the oligonucleotide portion of the probe that is designed to hybridize to a target sequence as well as any other component that is covalently attached to these nucleotides. For example, it is contemplated that any probe used in the present invention may be labeled with any “reporter molecule,” so that it is detectable in any detection system, including, but not limited to, enzyme (e.g., ELISA, as well as enzyme-based immunohistochemical assays), fluorescent (e.g., FISH), radioactive, mass spectroscopy, and luminescent systems. It is not intended that the present invention be limited to any particular detection system or label. As used herein the term “k-mer” refers to a probe of length k.
As used herein, the term “oligomer assembly” is used interchangeably with “concatemers”. Concatemers may be formed by short monomers that anneal to one another by virtue of having partially overlapping oligonucleotides.
In embodiments of the present invention, a mixture of nucleic acid molecules comprising a plurality of functionally relevant sequences may comprise nucleic acid molecules comprising the wild-type allele of the functionally relevant sequence and/or nucleic acid molecules comprising the variant allele, variant codon, or indel for the functionally relevant sequence. A “population of ribonucleic acid molecules” may comprise ribonucleic acid molecules comprising the wild-type/reference sequence of the functionally relevant sequence and/or ribonucleic acid molecules comprising the variant ribonucleic acid molecule. A “population of ribonucleic acid molecules” may refer to any composition that comprises ribonucleic acid molecules, such as, without limitation, a cell, a tissue, a tumor or blood.
As used herein, the term “Tm” refers to the melting point of a nucleic acid template, measured as the temperature(s) at which half of the nucleic acid template is present in a single-stranded (denatured) form.
As used herein, the term “Treaction” refers to the temperature(s) at which a hybridization reaction is being conducted.
The human body has trillions of cells, excluding microbial organisms, and each cell has a unique combination of gene expression, somatic mutation, epigenetic modification, and post-transcriptional processing. If cells could be labeled using genomic signatures with single-nucleotide specificity and sensitivity, it might be possible to map functional genetic mosaicism and/or distinguish aberrant from normal cells early in the progression of complex-trait diseases, and to use this information for disease screening or early detection of diseases such as cancer. (
The present invention discloses an algorithm and reaction parameters to reduce the degenerate probe complexity in DNA or RNA or nucleic acid sequencing, and its application in single cells for highly accurate consensus base calling using a wide range of enzymes and conditions. The algorithm of the present invention for probe selection favorably impacts the detection of rare cancer cells in an affordable and scalable manner, compared to traditional sequencing or single-cell quantification methods (
Embodiments of the present invention disclose a method for quantifying or labeling single cells based on RNA-templated in situ sequencing chemistry, overcoming barriers with regard to sequencing and the detection sensitivity, specificity, bias, speed, scalability, and read-length for sequencing RNA molecules directly, i.e. RNA-seq, in single cells for massively parallel single-cell analysis, image-based functional genomics, and cancer diagnostics (
Embodiments of the present invention describe methods for sequencing a subset of RNA or nucleic acid sequences from any given loci using DNA ligase-dependent primer extension methods. The present invention enables one to choose the desired sequencing product (e.g. variant base compositions or positions) versus indiscriminately interrogating all possible sequence variants. By selecting a set of oligonucleotides containing mixed bases to interrogate functionally relevant subsequences, while ignoring uninformative or background sequences (
Embodiments of the present invention may utilize sequencing probes capable of only detecting a subset of relevant sequences (
The primer molecules have a melting temperature of at least 50° C. when hybridized to their complementary ribonucleic acid molecules in the population of ribonucleic acid molecules. In embodiments the melting temperature of the primer molecules may be about 50, 51, 52, 53, 54, 55, 56, 57, 58, or 59° C. or may be at least 60° C. An aspect of the invention is a large differential between the melting temperature of the primer molecules and that of the probes when hybridized to their respective complementary sequences. Probes that are fully complementary along the length of L+S have a melting temperature that is about the same as the reaction temperature, such that probes that have a mismatch along the length of L have a melting temperature that is below the reaction temperature. Since the reaction temperature is well below the melting temperature of the primer molecules, the primer molecules are fully hybridized to their reference sequences on the ribonucleic acid molecules during the ligation reaction.
In embodiments of the present invention, each sequencing target is comprised of a constant primer region, either upstream or downstream of a variable region of length L to be interrogated. The variable region or the sequence within the interrogated region may be as short as one (1)-base, three-(3) bases (e.g. codon), or longer (e.g. insertion, deletion, splicing enhancers, protein binding motifs, junction, fusion, molecular barcodes); however, L is generally, and in some embodiments always, less than the sequencing read-length. In a particularly useful embodiment of the invention, the probes are designed to interrogate three-(3) bases that form a codon (anti-codon oligonucleotides) to determine the presence or absence of variant ribonucleic acid molecules that produce a functional change in a protein that is translated from the ribonucleic acid molecule.
The length (L) determines the total number of possible sequences (4^L) that could be expected at random, i.e. if every base is uniformly degenerate. There is one wild-type or reference sequence for a given locus, i.e. for each locus (
The overwhelming majority of somatic mutational events are point mutations, resulting from errors in DNA replication or repair. They occur at a rate of ~1 to 10 per cell division from embryonic to adult development (Bae, T. et al. (2017); Lodato, M. A., et al. (2018)). Hundreds to thousands of somatic point mutations are present in single cells (Bae, T. et al. (2017); Lodato, M. A., et al. (2018); Enge, M. et al. (2017); Navin, N., et al. (2011)). The low frequency of somatic mutations and the large size of the human genome make it unlikely that any two independent point mutations occur within the span of L bases in a single cell. If one considers only single point mutations within L, only 3L non-wild-type sequences are possible, far fewer than the 4^L − 1 non-wild-type sequences expected if every base were uniformly degenerate. When L = 12, 36 non-wild-type sequences exist, a ~466,000-fold reduction [(4^L − 1)/3L] in potential sequences and therefore in interrogation probe complexity. It is possible to represent all sequences (e.g. of a three-nucleotide codon) containing a single point mutation, i.e. all possible single point mutations, and their complementary sequences as a set of n oligonucleotides using mixed bases that exclude the wild-type base at each position, as shown in Table 1 below. Here, n is equal to L for single point mutations. When L = 12, twelve (12) synthetic mixed-base oligonucleotides can represent all possible single point mutation sequences.
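The set of n = L mixed-base oligonucleotides can be generated mechanically. The sketch below is an illustration under stated assumptions, not a reproduction of Table 1: it uses the standard IUPAC codes B (not A), D (not C), H (not G) and V (not T), writes sequences in the same sense as the reference (the complementary probes would follow by reverse-complementing, with B and V, and D and H, swapping), and the function names and 12-base locus are hypothetical. It confirms that, for L = 12, twelve oligonucleotides cover exactly the 36 single-point-mutant sequences and exclude the wild-type.

```python
from itertools import product

# IUPAC mixed-base symbol that excludes exactly the given wild-type base.
EXCLUDING = {"A": "B", "C": "D", "G": "H", "T": "V"}
# Concrete bases represented by each symbol used here.
EXPANDS_TO = {"A": "A", "C": "C", "G": "G", "T": "T",
              "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG"}


def point_mutation_oligos(reference: str) -> list[str]:
    """n = L oligos: wild-type everywhere except one position, which carries
    the mixed base that excludes the wild-type base at that position."""
    return [reference[:i] + EXCLUDING[b] + reference[i + 1:]
            for i, b in enumerate(reference)]


def expand(oligo: str) -> list[str]:
    """All concrete sequences represented by one mixed-base oligo."""
    return ["".join(bases) for bases in product(*(EXPANDS_TO[s] for s in oligo))]


def complexity_reduction(L: int) -> float:
    """(4^L - 1) / 3L: all non-wild-type sequences vs. single point mutants."""
    return (4 ** L - 1) / (3 * L)


reference = "ACGTACGTACGT"  # hypothetical 12-base locus (L = 12)
oligos = point_mutation_oligos(reference)
covered = {s for o in oligos for s in expand(o)}
print(len(oligos), len(covered), reference in covered)  # 12 36 False
print(round(complexity_reduction(12)))                  # 466034
```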
Compared to standard NGS, in which all bases are degenerate, programmable k-mers have ~10^5-fold lower sequence complexity and permit a higher molar concentration per sequence for SBL.
In embodiments of the present invention the sequence space can be further reduced to ignore non-informative sequences, including synonymous mutations, non-functional polymorphisms, and unobserved mutational patterns (for example, mutational patterns not observed in human diseases), by changing the mixed-base symbols among the n oligonucleotides. In embodiments of the present invention n remains unchanged as long as all base positions can be mutated. Therefore, in embodiments of the present invention, a set of n oligonucleotides can interrogate any sequence subspace containing a single point mutation using oligonucleotide extension methods (e.g. Sequencing-By-Ligation).
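As one example of such a reduction, the sketch below keeps, at each position of a codon, only the alternate bases that change the encoded amino acid under the standard genetic code, and encodes them with the matching IUPAC mixed-base symbol. It is a hypothetical illustration (the function names and example codons are not from the source); in this simplified version a position with no non-synonymous alternative receives no oligonucleotide, and gaining or losing a stop codon counts as an amino-acid change.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC symbol for each set of DNA bases.
IUPAC = {frozenset(bases): symbol for symbol, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG"}.items()}


def nonsynonymous_bases(codon: str, pos: int) -> set[str]:
    """Alternate bases at `pos` that change the encoded amino acid."""
    wild_type_aa = CODON_TABLE[codon]
    return {b for b in "ACGT"
            if b != codon[pos]
            and CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] != wild_type_aa}


def nonsynonymous_oligos(codon: str) -> list[str]:
    """One mixed-base oligo per position that has at least one non-synonymous change."""
    oligos = []
    for pos in range(3):
        alternates = nonsynonymous_bases(codon, pos)
        if alternates:
            oligos.append(codon[:pos] + IUPAC[frozenset(alternates)] + codon[pos + 1:])
    return oligos


print(nonsynonymous_oligos("GGG"))  # ['HGG', 'GHG']: third-position changes are synonymous
print(nonsynonymous_oligos("CTG"))  # ['RTG', 'CVG']: TTG (Leu) is dropped at position 1
```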
Sequencing-By-Ligation (SBL) interrogates multiple contiguous bases at once, with a variable base calling accuracy (Landegren, U., et al. (1988); Shendure, J. et al. (2005)). In embodiments of the present invention, this variable base calling accuracy arises because ligation specificity decreases further away from the ligation junction. Therefore, the ligation specificity and reaction conditions are important parameters for determining the allowable value of L (
In SBL, the sequencing template (hereinafter T), a DNA template, is pre-hybridized to the sequencing primer (SP) and is transiently bound to interrogating oligonucleotides of length L+S (
Because base-pairing mismatches incur ΔG° penalties, the ratio of T:M to T:C pre-ligation complexes (i.e. SP:k-mer(mismatch) to SP:k-mer(perfect match)) is determined by the Keq of hybridization, which can be inferred from the ΔΔG° between the two possibilities, T:M and T:C. The ΔG° penalty for a single-base mismatch is +0.5 kcal/mol, whereas a correct pair contributes −1.3 kcal/mol. When Treaction is equal to Tm, the amount of T:C (SP:k-mer(perfect match)) is ~50% more than that of T:M for a single-base mismatch (SP:k-mer(mismatch)), based on ΔΔG° calculations (Zhang, D. Y. et al. (2015); Wang, J. S., et al. (2015)) (
If Treaction << Tm, the ratio of T:C (SP:k-mer(perfect match)) to T:M (SP:k-mer(mismatch)) is dictated by the initial concentrations of the C and MM probes, since the complexes do not equilibrate (
Not all possible k-mer sequences for a given locus will have the same Tm; for example, Tm depends on GC content. However, it is impractical to calculate Tm for each possible sequence across the sequencing loci, or for every sequencing locus, to optimize probe hybridization. This is the reason why allele-specific primer hybridization (e.g. PCR, FISH) cannot be multiplexed: the Tm differences among target-specific primers can exceed the Tm differences between alleles.
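To illustrate how Tm varies with GC content across the possible k-mers of a single locus, the sketch below uses the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule of thumb is a swapped-in assumption, not the method of the source: it is only a rough estimate for short oligonucleotides under standard salt conditions and ignores nearest-neighbor effects. The function names and the 12-mer locus are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm (deg C) for a short DNA oligo: 2 per A/T plus 4 per G/C."""
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    return 2 * at + 4 * gc


def point_mutants(reference: str) -> list[str]:
    """All 3L sequences that differ from the reference at exactly one position."""
    return [reference[:i] + b + reference[i + 1:]
            for i, wt in enumerate(reference) for b in "ACGT" if b != wt]


reference = "ACGTACGTACGT"  # hypothetical 12-mer locus
tms = [wallace_tm(s) for s in point_mutants(reference)]
print(wallace_tm(reference), min(tms), max(tms))  # 36 34 38
```

Even within one locus, single substitutions shift the estimated Tm by several degrees; across many loci in a multiplexed panel, the spread is typically much larger.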
Therefore, purely hybridization-based methods for sequencing are prone to error as the number of degenerate probes increases (or in multiplexing experiments), even if Treaction approaches the average Tm(
To overcome the extremely low concentration of C (SP:k-mer(perfect match)) (SP:PM in
DNA ligases have distinct Km and kcat for T:C (SP:PM) and T:M (SP:MM) complexes. Km reflects the affinity with which the enzyme recognizes the substrate, and kcat describes the turnover rate of the substrate once bound to the enzyme. For high-fidelity ligases, kcat/Km for perfectly matched substrates can be several orders of magnitude larger than kcat/Km for mismatched substrates.
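When perfectly matched and mismatched complexes compete for the same ligase, the standard Michaelis-Menten treatment of competing substrates gives the ratio of their initial ligation rates as ((kcat/Km)_PM × [SP:PM]) / ((kcat/Km)_MM × [SP:MM]). The sketch below applies that relation; the specificity constants and concentrations are hypothetical placeholders, not measured values for any ligase, and the function name is illustrative.

```python
def fraction_correct(spec_pm: float, spec_mm: float, conc_pm: float, conc_mm: float) -> float:
    """Share of initial ligation flux going to perfectly matched complexes.

    spec_* are kcat/Km specificity constants; conc_* are complex concentrations.
    """
    v_pm = spec_pm * conc_pm
    v_mm = spec_mm * conc_mm
    return v_pm / (v_pm + v_mm)


# Hypothetical 1000-fold specificity difference between matched and mismatched substrates.
print(fraction_correct(1e6, 1e3, 1.0, 1.0))     # ~0.999 at equal concentrations
print(fraction_correct(1e6, 1e3, 1.0, 1000.0))  # 0.5 when mismatches are 1000x more abundant
```

The second case shows why ligase fidelity alone is not sufficient when mismatched complexes vastly outnumber perfectly matched ones.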
If T:C (SP:PM) and T:M (SP:MM) cannot equilibrate (Treaction<<Tm), a large fraction of T:M (SP:MM) will prevent the formation of T:C (SP:PM) for productive ligation, reducing the overall SBL yield. In addition, DNA ligases will eventually ligate T:M as well, increasing the sequencing error rate over time (
Therefore, SBL across any contiguous bases of length L can be driven to near completion if DNA ligases demonstrate a measurable kcat/Km difference between T:C (SP:PM) and T:M (SP:MM), as long as T:C and T:M (SP:PM and SP:MM) continue to equilibrate (Treaction ~ Tm). This assumes that no other trapped or non-productive products are formed during the reaction. Some ligases and sequence motifs form adenylated DNA products during the reaction, which reduce the concentration of T:C (SP:PM) and also inhibit the activity of DNA ligases. In embodiments of the present invention, this limits the practical efficiency of SBL at any given concentration or reaction temperature.
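The contrast between the trapped regime (Treaction << Tm) and the equilibrating regime (Treaction ~ Tm) can be sketched with a simple first-order model. This is an illustrative assumption, not the source's model: every primer is bound by some probe, a fraction f_c of bound primers initially carries a perfect match, complexes ligate at first-order rates k_c (match) and k_m (mismatch), and adenylated dead-end products are ignored. All parameter values are hypothetical.

```python
import math


def equilibrating(f_c: float, k_c: float, k_m: float, t: float) -> tuple[float, float]:
    """Treaction ~ Tm: SP:PM and SP:MM exchange faster than ligation, so the
    unligated primers keep the same partition. Returns (correct, error) yields."""
    f_m = 1.0 - f_c
    k_eff = f_c * k_c + f_m * k_m
    ligated = 1.0 - math.exp(-k_eff * t)
    return ligated * f_c * k_c / k_eff, ligated * f_m * k_m / k_eff


def trapped(f_c: float, k_c: float, k_m: float, t: float) -> tuple[float, float]:
    """Treaction << Tm: complexes formed at t = 0 never exchange."""
    f_m = 1.0 - f_c
    return f_c * (1.0 - math.exp(-k_c * t)), f_m * (1.0 - math.exp(-k_m * t))


# Hypothetical: 10% of primers start as SP:PM; mismatches ligate 1000x slower (per min).
for model in (equilibrating, trapped):
    correct, error = model(f_c=0.1, k_c=1.0, k_m=1e-3, t=60.0)
    print(model.__name__, round(correct, 3), round(error, 3))
```

Under these hypothetical parameters, the equilibrating case ligates almost all primers correctly within 60 min, whereas in the trapped case the correct yield stalls near the initial 10% and roughly a third of the products are errors.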
Embodiments of the present invention restrict the probe complexity to detect single point mutations or a specific subset of possible sequence variants (e.g. non-synonymous mutations), thereby increasing the initial relative concentration of C (SP:PM) for ligation reactions (e.g. ~466,000-fold for L = 12). In embodiments of the present invention the concentration is increased compared to completely degenerate k-mers (
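Because the total oligonucleotide concentration in a reaction is finite, restricting probe complexity raises the molar concentration available to each sequence, including the perfectly matched one. A minimal sketch, assuming a hypothetical 100 µM total probe concentration split evenly across all sequences represented by the probe set (the function name is illustrative):

```python
def per_sequence_molar(total_molar: float, n_sequences: int) -> float:
    """Concentration of each sequence when the total is split evenly."""
    return total_molar / n_sequences


L = 12
total = 100e-6                                 # hypothetical 100 uM total probe
degenerate = per_sequence_molar(total, 4 ** L) # fully degenerate L-mers
restricted = per_sequence_molar(total, 3 * L)  # single point mutants only
print(f"{degenerate:.2e} M vs {restricted:.2e} M, {restricted / degenerate:,.0f}-fold")
```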
In embodiments of the present invention it is understood that controlling the reaction temperature of SBL and L is important for reducing the error rate and increasing the efficiency. For example, at a reaction temperature below the Tm of the k-mers, the probability of hybrids containing a single-base mismatch increases, which forces T4 DNA ligase to ligate incorrect oligonucleotides. In the absence of correct k-mers, erroneous primer ligation is as high as 25% (
In embodiments of the present invention it is understood that 5′ phosphate base may be critical for high rSBL efficiency (
In embodiments of the present invention SBL can be implemented on DNA or RNA via click chemistry, using alkenyl and azide modifications to the sequencing primer and k-mer ends. The reaction condition of click-based DNA ligation is adjusted to maximize the difference in ligation between matched and mismatched k-mer probes.
Embodiments of the present invention provide an SBL product engineered to contain DNA modifications for conditional DNA amplification or elimination. This allows one to selectively amplify any subset of sequences after SBL or degrade wild-type sequences that could interfere with the rare variant detection.
The initial SBL product remains hybridized to the RNA template, forming a DNA-RNA duplex. If an error occurs in SBL, it creates one or more mismatches between the DNA and RNA strands. Embodiments of the present invention provide the use of an endonuclease, Surveyor enzyme, resolvases, or ssDNA-binding proteins specific for mismatched ssDNA loops, which can recognize such mismatches. Cleaving error-containing SBL products in this way prevents them from being amplified (e.g. enzyme-linked), enabling a highly specific molecular readout (e.g. optical imaging).
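The mismatch check performed enzymatically on the DNA-RNA duplex can be mirrored in silico, for example when simulating which SBL products would be cleaved. The sketch below is a hypothetical helper, not a model of any specific enzyme's substrate preferences: it compares an SBL product with the Watson-Crick partner of the RNA segment it is hybridized to, assuming both strands are written 5′→3′ and span the same duplex region.

```python
# DNA base expected opposite each RNA base in a Watson-Crick DNA-RNA duplex.
DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def mismatch_positions(dna_product: str, rna_template: str) -> list[int]:
    """0-based positions (from the DNA 5' end) that do not pair with the RNA.

    The duplex is antiparallel, so the expected DNA is the reverse
    complement of the RNA segment.
    """
    if len(dna_product) != len(rna_template):
        raise ValueError("strands must span the same duplex region")
    expected = "".join(DNA_PARTNER[b] for b in reversed(rna_template))
    return [i for i, (got, want) in enumerate(zip(dna_product, expected)) if got != want]


rna = "AUGCAUGCAUGC"
print(mismatch_positions("GCATGCATGCAT", rna))  # []: error-free product, retained
print(mismatch_positions("GCATGGATGCAT", rna))  # [5]: would be flagged for cleavage
```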
In embodiments of the present invention, an exonuclease degrades sequences that are not part of an SBL product, preventing them from participating in a PCR reaction. In embodiments of the present invention, the exonuclease is Exo 1 or T7 exo. In embodiments of the present invention, the exonuclease is used in combination with an RNase. In embodiments of the present invention the RNase is RNase H. (
The probe with a variable region L can also be modified using adapter sequences for heterodimer ligation, circularization, and RCA. The adapter sequences can be arranged so that the SBL product is amplified only if subsequences A and B are both present. This property can be used to label single cells only when mutations X and Y are both present. Here, adapter sequences for X and Y form a heterodimer concatemer capable of self-circularization and RCA. If either X or Y is missing, the concatemer cannot be formed or circularized.
In further embodiments of the invention the adapter sequences added to the SBL primer and interrogating oligonucleotides can be further modified to include phosphorothioate, locked nucleic acids (LNAs), and other modified bases in order to change their Tm or DNA cleavage sensitivity. This is important for ssDNA-specific error correction mechanisms used after SBL, if one were to utilize the adapter sequence for PCR amplification or NGS.
Other embodiments of the invention include an acrydite, azide, or biotin moiety for conditionally immobilizing the SBL product based on the sequence detected. Embodiments may also include a phosphorothioate, inverted T, or inert spacer moiety for conditionally blocking exonuclease digestion. Certain modifications (e.g. deoxyUridine, chimeric RNA nucleotide) can be used to selectively cleave the SBL product, while others (e.g. inverted T, spacers) prevent circular ligation and RCA of specific variant sequences. Together, they can pull-down, isolate, amplify, degrade, or cleave any number of sequence variants or their combinations after SBL for a variety of applications.
Additional embodiments of the invention include digoxigenin, digoxin, HRP, alkaline phosphatase, or other moieties used for enzyme-linked assays. In embodiments of the present invention SBL may be used to conjugate a specific enzyme activity to a subset of DNA or RNA sequences, followed by the degradation of error-associated or off-target probe-enzyme binding. When used in conjunction with the appropriate enzyme substrate, any specific or general category of DNA or RNA variants can be detected using a fluorescent or colorimetric assay. Such a method could be suitable for rapid and highly multiplexed testing for the presence of mutant cells, pathogens, contaminants, and DNA/RNA-based molecular diagnostic markers using a portable, point-of-care device. In embodiments of the present invention the method provides a substitute for antibodies in enzyme-linked assays to estimate the abundance of mutant proteins by quantifying non-synonymous codon alterations directly from the cell or tissue lysate for point-of-care clinical applications.
SBL can be performed using PBCV-1 DNA ligase or similar ligases capable of DNA ligation splinted by RNA (
Embodiments of the present invention that perform RNA-based SBL with PBCV DNA ligase and four competing oligonucleotides detect up to 50% of the sequencing primer bound to the RNA template within one minute at 25° C. or 37° C. After 60 min, the sensitivity of RNA SBL is 75% at 25° C. and 90% at 37° C. (
Since RNA-based somatic mutations are present in multiple copies (generally ~20 or more for common oncogenes), embodiments of the present invention use multiple SBL reads per target to call mutations with low false positive and false negative rates, even in the presence of a high per-read error rate (e.g. long-read sequencing).
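Why multiple copies tolerate a high per-read error rate can be made concrete with a binomial estimate. The sketch below is an illustration, not the method of the application: it assumes read errors are independent and, pessimistically, that every erroneous read votes for the same wrong base, so a majority-vote call fails whenever at least half of the reads are wrong. The function name is hypothetical.

```python
from math import comb


def majority_vote_failure_bound(n_reads: int, per_read_error: float) -> float:
    """Conservative probability that a majority-vote base call over n_reads fails.

    Assumes independent read errors and treats all erroneous reads as agreeing
    on one wrong base; ties (possible for even n_reads) count as failures.
    """
    if n_reads < 1 or not 0.0 <= per_read_error <= 1.0:
        raise ValueError("need n_reads >= 1 and 0 <= per_read_error <= 1")
    k_min = (n_reads + 1) // 2  # smallest number of wrong reads that can flip or tie the call
    return sum(
        comb(n_reads, k) * per_read_error**k * (1.0 - per_read_error) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


if __name__ == "__main__":
    for n in (1, 5, 11, 20):
        print(n, majority_vote_failure_bound(n, 0.10))
```

With ~20 copies and a 10% per-read error rate, the bound is below 10⁻⁵, compared with 10⁻¹ for a single read. Real sequencing errors are neither fully independent nor all identical, so the numbers are only indicative.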
To generate a consensus base call, metadata describing the origin of each read must be attached to that read. Accordingly, embodiments of the present invention may use UMIs to tag individual molecules, for example when technical noise during molecular amplification may be an issue. To compare all reads from a given cell, embodiments of the present invention may label SBL reads with a cellular ‘UMI.’ For example, individual cells can be sorted into separate wells; since all SBL reads in a well then come from a single cell, they can be averaged to eliminate random sequencing errors and identify true biological variants. Other embodiments localize individual reads to single cells in situ. Therefore, the accuracy of SBL for identifying somatic mutations in a single cell depends on its compatibility with single-cell manipulation and analysis.
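Grouping reads by their origin tag and taking a per-position majority can be sketched as follows. The record layout `(tag, position, base)`, the function name, and the `min_reads`/`min_fraction` thresholds are hypothetical choices for illustration; the tag can be a molecular UMI or a cellular ‘UMI’ such as a well identifier.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical read record: (origin tag such as a UMI or well/cell ID, template position, called base)
Read = Tuple[str, int, str]


def consensus_calls(
    reads: Iterable[Read], min_reads: int = 3, min_fraction: float = 0.5
) -> Dict[Tuple[str, int], str]:
    """Majority-vote consensus per (tag, position); sites with too little support get no call."""
    votes: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for tag, position, base in reads:
        votes[(tag, position)][base] += 1

    calls: Dict[Tuple[str, int], str] = {}
    for key, counter in votes.items():
        total = sum(counter.values())
        base, count = counter.most_common(1)[0]
        # Require enough reads and a strict majority before reporting a base.
        if total >= min_reads and count / total > min_fraction:
            calls[key] = base
    return calls


if __name__ == "__main__":
    reads = [("cell1", 12, "A")] * 4 + [("cell1", 12, "G")] + [("cell2", 12, "G")] * 3
    print(consensus_calls(reads))
```

Here the single discordant read in `cell1` is outvoted, and a tag with fewer than `min_reads` reads would be left uncalled rather than guessed.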
In embodiments of the present invention, a C or G may be present adjacent to the target of interest, lowering rSBL efficiency; however, base specificity extends up to 3 bases from the ligation site with greater than 90% specificity in both the 5′ and 3′ rSBL directions (
In further embodiments of the invention, SBL primers and interrogating k-mer oligonucleotides can include ribonucleotides, inosine, locked nucleic acids (LNAs), and other modified bases, and their Tm and probe length can be adjusted to maintain the balance between k-mer hybridization and exchange of mismatched oligonucleotides at a given reaction temperature.
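Balancing Tm against probe length can be illustrated with the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) °C, a rough rule of thumb for short oligonucleotides that ignores salt, nearest-neighbor stacking, and modified bases. In this sketch, `tm_shift` is a hypothetical, user-supplied net Tm change from modifications (e.g. LNAs or backbone changes); it is not a value given in this application, and the function names are illustrative.

```python
def wallace_tm(seq: str) -> float:
    """Approximate Tm (deg C) of a short oligo by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    weak = s.count("A") + s.count("T") + s.count("U")
    strong = s.count("G") + s.count("C")
    return 2.0 * weak + 4.0 * strong


def pick_kmer_length(
    template: str,
    target_tm: float,
    min_len: int = 8,
    max_len: int = 14,
    tm_shift: float = 0.0,
) -> int:
    """Choose the k-mer length whose estimated Tm (plus tm_shift) is closest to target_tm.

    The k-mer is taken as a prefix of `template`; ties go to the shorter length.
    """
    upper = min(max_len, len(template))
    if upper < min_len:
        raise ValueError("template shorter than min_len")
    return min(
        range(min_len, upper + 1),
        key=lambda n: abs(wallace_tm(template[:n]) + tm_shift - target_tm),
    )


if __name__ == "__main__":
    site = "ACGTACGTACGTACGT"
    print(pick_kmer_length(site, target_tm=36.0))
    # A modification assumed to lower Tm by 4 deg C is compensated by a longer k-mer.
    print(pick_kmer_length(site, target_tm=38.0, tm_shift=-4.0))
```

The same trade-off applies when backbone modifications alter Tm: a negative `tm_shift` pushes the selection toward longer k-mers.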
Other embodiments of the invention include an acrydite, amino-allyl, azide, or biotin moiety for conditionally immobilizing the SBL product. Embodiments may also include a phosphorothioate, inverted T, or inert spacer moiety for conditionally blocking exonuclease digestion. Together, these modifications can be used to pull down, isolate, amplify, degrade, or cleave any number of sequence variants, or combinations thereof, after rSBL.
In embodiments of the present invention, SBL is implemented on DNA or RNA by click chemistry, using alkynyl and azide modifications to the sequencing primer and k-mer ends. The reaction conditions of click-based DNA ligation are adjusted to maximize the difference in ligation between matched and mismatched k-mer probes. In another embodiment of the present invention, rSBL is performed using ribozyme sequences incorporated into either the sequencing primer or the k-mers, in which the ribozyme sequences are evolved to ligate DNA probes on RNA templates with different kinetics depending on the number of mismatches.
In embodiments of the present invention SBL is implemented using k-mers that are ligated to the sequencing primer by T7 RNA ligase or other ligases capable of joining 5′ and 3′ RNA ends. Embodiments include RNA k-mers that contain tracer RNA, RNA aptamers, ribozymes, or other RNA-based functional groups for programmable activation in vitro.
The result of a successful SBL reaction is a single-stranded DNA product hybridized to each sequenced template. This allows one to selectively label, pull-down, or amplify any subset of sequences after SBL in addition to removing or degrading wild-type sequences that could interfere with the rare variant detection.
If rSBL occurs on the RNA template, it results in the formation of a DNA-RNA duplex. If an error in rSBL occurs, it creates one or more mismatches between the DNA and RNA strands. Embodiments of the present invention provide for the use of endonucleases, resolvases, or ssDNA-binding proteins that can recognize such mismatches (
In embodiments of the present invention, exonucleases degrade sequences not incorporated into SBL products and prevent them from being PCR amplified (
In embodiments of the present invention, target-specific sequencing primers are designed to be orthogonal (
In embodiments of the present invention, absolute rather than relative RNA quantification is possible by performing rSBL directly on single-molecules or molecular amplicons in a flow cell or on glass, similar to Nanostring or Illumina NGS platforms. Embodiments of this invention enable one to quantify or sequence only those molecules bearing deleterious functional mutations, significantly lowering the bandwidth needed to quantify low-abundance nucleic acids associated with early cancer (
In embodiments of the present invention, k-mers bearing one or more mixed bases (e.g. K or R) are used for rSBL either at 5′ or 3′ ends of the sequencing primer (
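A k-mer written with mixed bases stands for a pool of concrete oligonucleotides. The sketch below expands such a k-mer using the standard IUPAC nucleotide codes (e.g. K = G/T, R = A/G); the function name is hypothetical.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide codes mapped to the bases each symbol represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(kmer: str) -> List[str]:
    """List every concrete sequence encoded by a mixed-base k-mer."""
    pools = [IUPAC[symbol] for symbol in kmer.upper()]  # KeyError for unknown symbols
    return ["".join(bases) for bases in product(*pools)]


if __name__ == "__main__":
    print(expand_degenerate("AKR"))
```

Each K or R doubles the pool size and each N quadruples it, which is one way to keep track of how far a partially degenerate k-mer dilutes any single sequence in the probe mix.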
In embodiments of the invention automated fluidics and imaging instrumentation enables quantifying DNA amplicon molecules arrayed on glass or in flow cells using fluorescent single-base extension or NGS (e.g. SBS chemistry); however, only those amplicons containing functionally deleterious variants form productive sequencing primers after k-mer SBL that can be extended and visualized. This embodiment enables one to perform deep-sequencing of billions of single-molecules or molecular amplicons without wasting reads on uninformative sequences (e.g. wild-type, synonymous mutations) (
In embodiments of the invention target-specific sequencing primers and k-mers can include universal or barcoded adapter sequences for secondary probe hybridization, in situ PCR, and/or rolling circle amplification (RCA) for detecting rSBL products from programmable k-mers inside chemically fixed cells or tissue sections in situ for cell imaging or in suspension for cell sorting (
In embodiments of the invention, k-mer sequences can differ in composition by virtue of containing different types of adapter sequences, end modifications, degradation-resistant phosphate backbone modifications, or stem-loop structures for conditional SBL, SBL product degradation, or amplification (
In embodiments of the invention the sequencing primer can incorporate existing methods for detecting DNA probes in situ through molecular amplification (
In embodiments of the invention rSBL products bound to RNA inside single cells in situ can be amplified 100-1,000-fold using sequential antibody-based amplification (e.g. primary and secondary antibodies), followed by enzymatic conversion of cell labeling substrates (e.g. fluorescein-labeled tyramide) (
In embodiments of the present invention additional signal amplification is achieved by ligating or hybridizing reporter molecules comprised of short oligonucleotide monomers bearing modifications suitable for fluorescence or colorimetric detection (
In embodiments of the present invention, T7 or SP6 bacteriophage promoters attached to k-mer interrogation probes can be used to synthesize short RNA transcripts by in vitro transcription (IVT). The RNA molecules are functionally modified during or after IVT (e.g. with aminoallyl-UTP or biotin-UTP) to reduce their diffusion through cross-linking. Embodiments of the invention containing T7 or SP6 promoters enable one to translate synthetic peptides in vitro or in situ using coupled in vitro transcription and translation systems (e.g. PURExpress from NEB). Such peptides can be short tags (e.g. 6× His tag, FLAG tag, HA tag) or longer enzymes or fluorescent proteins (e.g. GFP, RFP). Depending on the number of rSBL products, embodiments of the invention enable multiple signal amplification steps (e.g. in vitro transcription: ~100-fold; in vitro translation: ~1,000-fold; 1° and 2° antibodies: ~1,000-fold; FITC-tyramide-converting enzyme: ~100-fold), mimicking the massive signal amplification that occurs from genomic DNA to proteins inside single cells. After in vitro transcription/translation, cell culture or tissue section slides can be used for standard immunohistochemistry (IHC) with anti-tag primary antibodies.
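The approximate fold gains quoted for each step compound only if each step feeds the next. Assuming that they chain multiplicatively and each reaches its quoted gain (both are assumptions, not measurements), the overall amplification is simply their product:

```python
from math import log10, prod
from typing import Dict

# Approximate per-step fold gains quoted in the text; multiplicative chaining is an assumption.
STEP_GAINS: Dict[str, float] = {
    "in vitro transcription": 1e2,
    "in vitro translation": 1e3,
    "primary + secondary antibodies": 1e3,
    "FITC-tyramide conversion": 1e2,
}


def cumulative_gain(gains: Dict[str, float]) -> float:
    """Overall fold amplification if every step multiplies the output of the previous one."""
    return prod(gains.values())


if __name__ == "__main__":
    total = cumulative_gain(STEP_GAINS)
    print(f"~{total:.0e}-fold overall, about {log10(total):.0f} orders of magnitude")
```

Under these assumptions the chain spans roughly ten orders of magnitude; in practice, losses at each step would reduce the achieved gain.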
In embodiments of the invention, programmable sequencing of functional mutations by rSBL with partially degenerate k-mers is performed on disposable paper, a dipstick, or other forms of solid substrate to ‘fish out’ nucleic acid variants of interest for rapid quantification (
In embodiments of the invention, target-specific sequencing primers are immobilized onto a solid substrate such as a paper strip. The paper strip is immersed in the sample (e.g. tissue lysate, concentrated blood, body fluids) to capture the nucleic acids of interest, followed by a wash to remove unbound material. The paper strip is then transferred to another tube for rSBL with programmable k-mers that carry signal-amplification functional groups (e.g. horseradish peroxidase, alkaline phosphatase, digoxigenin, FITC). The paper strip is washed again and then transferred to a signal read-out tube containing enzyme substrates (
Embodiments of this invention may include driver codon mutation probes against KRAS (
Embodiments of the present invention include the use of locus-specific probe design principles to label single cells using induced somatic mutations, for example through the Cas9/CRISPR system. (
Embodiments of the present invention read any genetic information of length L in single cells. In specific embodiments, the genetic information that is written or edited can be interspersed throughout the genome, as with cancer point mutations or Cas9-induced insertions or deletions. Embodiments of the present invention convert this information into short single-stranded DNA fragments inside the cell for signal amplification and oligonucleotide detection. The short DNA fragments are stable and amenable to single-molecule amplification in solution or in situ. Embodiments of the present invention may assemble the short DNA fragments into larger polymers using specific end-joining adapter sequences. Such polymeric structures built from SBL-derived DNA fragments can be amplified and interrogated in solution or in situ to generate a consensus read, since the number of polymerizable DNA fragments can be adjusted by varying the number of unique ends available for end-joining. In embodiments of the present invention, such DNA polymers could comprise SBL products from multiple loci, and can be either linear or circular for signal amplification using strand-displacing DNA polymerases (e.g. Phi29 DNA polymerase).
Embodiments of the present invention utilize barcoded SBL-capable oligonucleotides for readout of individual bases. To discriminate individual SBL products that represent one specific sequence, embodiments of the present invention may sequence every base in the single-stranded DNA fragments using molecular sequencing (e.g. SBL) after signal amplification. Additional embodiments barcode individual oligonucleotides so as to allow easier discrimination by probe hybridization, antibody-based detection, or any other means of affinity-based detection. For the latter, individual oligonucleotides capable of representing the genetic information in single cells have to be synthesized. For large L, massively parallel synthesis of modified or barcoded oligonucleotides can become rate-limiting (e.g. 4¹², or ~16.8 million, sequences for L=12). Embodiments of the present invention can reduce the complexity of the interrogation oligonucleotides by several orders of magnitude (e.g. a ~470,000-fold reduction at L=12 for single point mutations), allowing affordable synthesis of individual single-stranded DNA probes for sequencing and enabling rapid, non-enzymatic interrogation of individual point mutations, polymorphisms, or variants by probe hybridization to barcode sequences, which significantly shortens interrogation time. This can be used to sequence multiple loci for point mutations using ordinary high-throughput oligonucleotide synthesis platforms (e.g. Custom Arrays, LC Sciences, IDT), followed by a rapid, probe hybridization-based readout. This enables easier, non-enzymatic ‘painting’ of single cells in a tissue section or cell culture according to their single-nucleotide variant profiles, including tumor mutations, akin to general tissue stains used in medical pathology.
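One way to reproduce the ~470,000-fold figure is to compare all 4^L sequences of length L with the 3L sequences that differ from a known reference at exactly one base. Treating the reduced probe set as exactly these single-base substitutions is an assumption made for this illustration; the function names are hypothetical.

```python
def full_library_size(L: int) -> int:
    """Number of distinct DNA sequences of length L (4 choices per position)."""
    return 4 ** L


def single_substitution_count(L: int) -> int:
    """Number of sequences differing from a known reference at exactly one of L positions."""
    return 3 * L  # 3 alternative bases at each position


def complexity_reduction(L: int) -> float:
    return full_library_size(L) / single_substitution_count(L)


if __name__ == "__main__":
    L = 12
    print(full_library_size(L), single_substitution_count(L), round(complexity_reduction(L)))
```

For L = 12 this gives 4¹² = 16,777,216 sequences versus 36 probes, a ~466,000-fold reduction, consistent with the ~470,000-fold figure above.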
In situ hybridization of short probes results in diffuse background or non-specific binding to fixed proteins or nucleic acids due to charge-charge and hydrophobic interactions. In single-molecule FISH, multiple probes co-localize on the same molecule to generate a high SNR. Alternatively, molecular inversion probes can generate a high SNR only when the two arms of the probe are ligated together. In these methods, false positives are a concern, especially when trying to detect single-base differences with high-sensitivity amplification techniques. In embodiments of the present invention, non-specifically bound probes are completely degraded after SBL on RNA. In these embodiments, only fully ligated products survive, for example, exonuclease degradation, and initiate in vitro transcription (IVT) of RNA reporters in situ. In embodiments of the present invention, 8 to 12 hours of IVT is sufficient to amplify the bound probe, and signal amplification can continue indefinitely as long as fresh enzymes are continually added. Because non-specific probe binding is absent, the SNR increases linearly over time, which allows one to detect rare transcripts in an allele-specific manner. In embodiments of the present invention, the synthesized RNA spreads gradually and eventually fills the whole cell, allowing one to perform single-cell quantification in situ using low-magnification objectives, or to classify cells by Fluorescence Activated Cell Sorting (FACS) based on low-abundance or short transcripts. In further embodiments, reporter RNA can be transcribed from the bound DNA probes even after a protracted archival period or protein immunocytochemistry. To visualize the amplified reporter RNA, fluorescent UTP can be directly incorporated during IVT for a one-color assay, or barcoded reporter RNAs can be used for rapid sequential readout by FISH.
In embodiments of the present invention, programmable k-mers for rSBL consist of related sequences that form highly repetitive sequences associated with human disease progression (e.g. triplet expansion). Embodiments also include k-mers that bind to small exons and introns that compete for the same splicing acceptor sites (
In embodiments of the present invention, sequential rSBL counts the number of short sequence repeats using ligation of partially degenerate repeat k-mers that end with a cleavable terminator. Cleavable terminators prevent simultaneous ligation of multiple k-mers on repetitive sequences, and they may include Endonuclease V-based cleavage of DNA (
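The counting logic of sequential rSBL can be modeled in a few lines: each cycle ligates one repeat k-mer, the cleavable terminator is then removed, and cycles continue until the next stretch of template no longer matches. This is a deliberately simplified, hypothetical model (one repeat unit per cycle, perfect ligation and cleavage, IUPAC mixed bases allowed in the unit); it is not a description of the reaction chemistry.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def unit_matches(segment: str, unit: str) -> bool:
    """True if a template segment is compatible with a (possibly mixed-base) repeat unit."""
    return len(segment) == len(unit) and all(
        base in IUPAC[symbol] for base, symbol in zip(segment.upper(), unit.upper())
    )


def count_repeats_by_cycles(template: str, start: int, unit: str, max_cycles: int = 1000) -> int:
    """Count ligation/cleavage cycles until the repeat unit no longer matches.

    Hypothetical model: one k-mer covering one repeat unit ligates per cycle, and
    terminator cleavage exposes the next ligation site before the following cycle.
    """
    cycles, position = 0, start
    while cycles < max_cycles and unit_matches(template[position:position + len(unit)], unit):
        cycles += 1            # one k-mer ligated; terminator cleaved
        position += len(unit)  # next ligation site
    return cycles


if __name__ == "__main__":
    template = "GG" + "CAG" * 7 + "CAA" + "TT"
    print(count_repeats_by_cycles(template, 2, "CAG"))
    print(count_repeats_by_cycles(template, 2, "CAR"))
```

A partially degenerate unit such as `CAR` also accepts the interrupting `CAA` unit, so the choice of mixed bases determines which repeat variants are counted.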
In embodiments of the invention, programmable k-mers are mixed-base-containing oligonucleotides that represent a repetitive sequence motif, in which conserved positions are known fixed bases while variable positions are represented by mixed-base symbols in the k-mer sequence (
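To see which sites in a transcript a mixed-base motif could interrogate, the motif can be translated into a regular expression, with each IUPAC symbol becoming a character class. The sketch uses a lookahead so that overlapping occurrences are reported; the function names are hypothetical.

```python
import re
from typing import List

IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_regex(motif: str) -> str:
    """Translate a mixed-base motif into a regex, e.g. 'CAR' -> '[C][A][AG]'."""
    return "".join(f"[{IUPAC_CLASS[symbol]}]" for symbol in motif.upper())


def find_motif_sites(sequence: str, motif: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) motif matches."""
    pattern = re.compile(f"(?=({motif_regex(motif)}))")  # zero-width lookahead allows overlaps
    return [match.start() for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_motif_sites("ACAGTCAGACAAG", "CAR"))
```

Fixed positions in the motif constrain where a k-mer can bind, while mixed-base positions let one k-mer cover several related sites.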
In embodiments of the present invention, programmable k-mers can represent short sequences that are shared by different groups of DNA or RNA molecules (
In embodiments of the present invention, rSBL using programmable k-mers may be utilized inside a living cell. Upon successful ligation of rSBL probes to the pathogenic target sequence (e.g. missense or nonsense mutations), in vivo signal amplification is performed to sensitize the cell to external cytotoxic modalities, including pharmacological agents, radiation, viral agents, and immune cells. (
Embodiments of the present invention may use endogenous DNA or RNA ligases, probe-associated ribozymes, or chemical ligation for rSBL in live cells. Antisense oligonucleotides that form constituents of the live-cell rSBL mix may include chemical modifications to the phosphate backbone for improved stability and delivery, as long as their effect on the Tm of the k-mers is compensated by changing the k-mer probe length (
In embodiments of the present invention, sequencing primers and k-mers may be covalently attached to functional groups, including metal nanoparticles, split proteins, aptamers, and chemical moieties, that accept energy from an external source, including microwave and shorter-wavelength radiation, and transfer it in a proximity-dependent manner. Embodiments of the present invention enable cytotoxic processes (e.g. free radicals, heat, protein modifications, enzyme inhibition) to be generated within the cell if a sufficient number of functional nucleic acid mutations are present for k-mer-based rSBL (
In embodiments of the present invention, rSBL using k-mers may be used to fluorescently label circulating tumor cells based on the presence of functionally deleterious mutations for FACS analysis and subsequent genome or proteome profiling. In other embodiments, rSBL may be performed with k-mers associated with metal isotopes for mass spectrometry-based imaging or single-cell quantification (
The PBCV DNA ligase employed in the Examples of this application has been shown to be an effective ligase for the methods described herein. Accordingly, in an embodiment of the invention, the ligase is a DNA ligase that has the same or similar activity as PBCV DNA ligase. Such a ligase can be a homologue of PBCV DNA ligase. For example, the DNA Ligase Encoded by Chlorella Virus PBCV-1 has been characterized in Ho, C. K., et al. (1997), and is found to be suitable for the methods described herein. Furthermore, additional homologues of the PBCV DNA ligase can be readily identified and validated based on the information disclosed herein. Ho, C. K., et al. (1997) in its entirety and/or for the specific description of the Chlorella Virus PBCV-1 DNA Ligase is incorporated herein by reference.
In addition to homologues of PBCV DNA ligase, in another embodiment of the invention, the ligase is produced to have properties analogous to one or more, or all, of the properties of PBCV DNA ligase. Such a ligase may, for example, be produced by rational design, artificial selection, and/or directed evolution starting from PBCV DNA ligase or homologues thereof. Various methods of directed evolution are known in the art (see, e.g., Turner, N. J. (2009)) and can include, for example, directed evolution as described in Arnold, F. H., et al. (1999) or computer-aided protein directed evolution as described in Verma, R., et al. (2012). Turner, N. J. (2009), Arnold, F. H., et al. (1999), and Verma, R., et al. (2012), in their entirety and/or for the specific description of directed evolution or artificial selection, are incorporated herein by reference.
In another embodiment of the invention, T4 RNA ligase 2 (Rnl2) is found to be effective in the methods described herein and is used to ligate the primers and probes in the methods and compositions described herein. Rnl2 has been characterized in Ho, C. K., et al. (2002), and Larman, H. B., et al. (2014) describes the detection of RNA sequences using a modified RNA Annealing, Selection and Ligation (RASL) assay. Ho, C. K., et al. (2002) and Larman, H. B., et al. (2014), in their entirety and/or for the specific description of T4 RNA ligase 2 (Rnl2), are incorporated herein by reference. In a further embodiment, the ligase is a homologue of Rnl2.
In embodiments of the present invention, one has strict control over the DNA or RNA molecules that are interrogated in vitro with single-nucleotide specificity. The present invention enables one to amplify, visualize, or sequence functional or clinically relevant nucleic acid variants without the need for specialized target enrichment, targeted library construction, or deep sequencing.
In embodiments of the present invention the entire collection of sampled DNA molecules is amplified but only deleterious mutation-bearing DNA amplicons are sequenced using fluorescently labeled programmable k-mers after rSBL. This enables the operator to overload the sequencing flow-cell with cell-free DNA or their amplicons as molecular over-crowding does not impair imaging. Embodiments may utilize DNA amplicons immobilized onto a flow-cell coupled to optical imaging systems, enabling the detection of ultra-rare circulating tumor DNA molecules in a miniaturized flow cell. Embodiments of this invention may utilize fluorescence imaging of k-mer labeled DNA amplicons, followed by subsequent terminator cleavage and re-ligation for short DNA sequencing using automated fluidics handling.
In embodiments of the present invention, the size of DNA amplicons can be made arbitrarily large for a high signal-to-noise ratio, since wild-type or non-deleterious molecules do not fluoresce. This enables an instrument to use low-cost, low-magnification objectives for quantitative imaging. Such signal amplification methods may include multiple displacement amplification (MDA) of the template DNA. Embodiments of the present invention based on programmable rSBL using k-mers include a portable or benchtop instrument for counting or sequencing ultra-rare cell-free DNA in a blood sample.
In embodiments of the present invention, cell-free DNA detection may be performed by, inter alia: (1) generating short 5′-phosphorylated single-stranded DNA (ssDNA) by exonuclease digestion, asymmetric PCR amplification, or oligonucleotide synthesis; (2) circularizing the 5′-phosphorylated ssDNA using end-joining DNA or RNA ligases; (3) binding a 5′-biotinylated RCA primer to a streptavidin- or avidin-coated glass surface or bead to saturation (in preferred embodiments of the present invention, the beads are Dynabeads); (4) hybridizing the circularized ssDNA to the bead-bound RCA primer; and (5) adding a DNA polymerase to generate rolling circle amplification products (RCPs). In embodiments of the present invention, the polymerase is Phi29 DNA polymerase.
In such embodiments, rSBL may be performed by (1) hybridizing an rSBL sequencing primer to the RCPs on a bead, in embodiments for 10 minutes; (2) adding DNA ligase and a fluorescently labeled k-mer, in embodiments for a 60-minute reaction, the ligase in embodiments being T4 DNA ligase; (3) washing un-ligated k-mers from the beads; and (4) imaging the fluorescently labeled DNA amplicons. In embodiments of the invention, the preferred imaging modality is inverted epifluorescence microscopy with a 4-megapixel CCD camera.
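Because only k-mer-labeled amplicons fluoresce, the imaging step largely reduces to counting discrete bright spots. The sketch below is a hypothetical, minimal analysis and not the application's pipeline: it assumes a background-subtracted image given as a 2D list, a single global intensity `threshold`, and 4-connected pixels, with `min_area` discarding single-pixel noise if desired. A production pipeline would more likely rely on an established image-analysis library.

```python
from collections import deque
from typing import List


def count_spots(image: List[List[float]], threshold: float, min_area: int = 1) -> int:
    """Count 4-connected groups of pixels at or above threshold with at least min_area pixels."""
    rows, cols = len(image), (len(image[0]) if image else 0)
    seen = [[False] * cols for _ in range(rows)]
    spots = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill over the bright region containing (r, c).
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx] \
                            and image[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                spots += 1
    return spots


if __name__ == "__main__":
    frame = [[0, 9, 9, 0, 0],
             [0, 9, 0, 0, 5],
             [0, 0, 0, 0, 0],
             [7, 0, 0, 0, 0]]
    print(count_spots(frame, threshold=5))
```

Raising `min_area` trades sensitivity for robustness against isolated hot pixels; the appropriate values depend on amplicon size and optics.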
The following examples are presented in order to more fully illustrate some embodiments of the invention. They should, in no way be construed, however, as limiting the broad scope of the invention.
PBCV DNA ligase had originally been described as incapable of performing DNA-to-DNA ligation when splinted by an RNA template; however, Lohman et al. showed that the enzyme is ~100 times more efficient than T4 DNA ligase at this reaction (Lohman, G. J., et al. (2013)). Others have shown that its single-base specificity is variable, making PBCV DNA ligase ill-suited for high-fidelity RNA sequencing applications. In this example, it was reasoned that the specificity of mismatch recognition in sequencing-by-ligation using T4 DNA ligase comes from the rapid exchange of competing base-interrogating probes, in addition to the difference in enzymatic kinetics between perfectly matched and mismatched base pairs (
In this example, it is tested whether the single-base detection sensitivity and specificity of PBCV DNA ligase could be improved by establishing a solid phase-based in vitro assay. A biotinylated RNA template (30-mer) is bound to streptavidin beads, a sequencing primer (20-mer) is hybridized to the template in three-fold excess, and the base-interrogating oligonucleotides are added along with PBCV DNA ligase for rSBL using RNA as the template. Because PBCV DNA ligase exhibits nucleotide-specific bias at the ligation junction, all sixteen possible two-base combinations are tested in vitro. Results are quantified by high-throughput capillary gel electrophoresis to measure the absolute amount of ligated products as well as any ligation intermediates. Single-base discrimination specificity is found to be 100% at position +1 across multiple experiments and probe designs. At room temperature, the specificity is less than perfect for several base combinations, likely due to slower competitive probe exchange. Deep sequencing of the ligation products demonstrated single-nucleotide specificity ranging from 99% to 99.99% (positions +1 to +4) and below 99% after position +8. The ligation efficiency is more variable, but it is >93% as long as the 5′ base of the sequencing primer is either A or T, significantly higher than the allele detection rate of RT-based sequencing methods. These results define the core sequence requirement and the read length for designing sequencing primers and interrogation probes.
In this example, immobilized RNA targets in fixed cells are primed with a high excess of DNA target primer in a hybridization buffer (HB) of 10 mM Tris-HCl, 50 mM KCl, and 1.5 mM MgCl2, pH 7.5-8.0 at 25° C. rSBL is conducted for 60 minutes at 37° C. with an rSBL mix of interrogation probes, SplintR ligase, 10 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2 (pH 7.5-8.0), 1 mM ATP, and 200 μM dNTPs. A clean-up solution of RNase H and Exonuclease I in 10 mM Tris-HCl, 50 mM KCl, and 1.5 mM MgCl2 (pH measured at 25° C.) is added for 15 minutes at 37° C. to degrade un-ligated degenerate sequences, followed by heating to 95° C. for 5 minutes. PCR is conducted for 30 cycles using a PCR primer solution of hot-start Taq, PCR primers, dNTPs, 10 mM Tris-HCl, 50 mM KCl, and 1.5 mM MgCl2 (pH measured at 25° C.).
PBCV DNA ligase requires N-mers longer than 8 bases for >90% ligation efficiency. Longer N-mers (>12 bases) do not compete well at 25° C. because of their higher Tm, leading to misincorporation and base errors (>5% vs. 1-2%). PBCV DNA ligase works at both 25° C. and 37° C.; the ideal reaction temperature is 37° C., and the ideal N-mer length is 12 bases to obtain the requisite sensitivity and specificity.
An increased error rate is found at 25° C. versus 37° C., and ligation efficiency is significantly lower with 8-mers. A 5′ inverted dT was required to block degradation of the correctly ligated product.
In this example, the exceptional sensitivity, specificity, and SNR of the method are applied to detect specific mutations in suspended single cells. The goal is to detect rare tumor cells and to enable volume-filling signal amplification for monitoring, or cell sorting for downstream analysis. To assess the sensitivity and specificity of such an approach, two populations of HEK293 cells expressing CFP or GFP, which differ by a single point mutation, are mixed. A probe pair that discriminates GFP from CFP mRNA is designed, followed by conditional IVT amplification during which Cy5-UTP is used to label the amplified reporter RNA. Fluorescence microscopy is used to quantify the false negative and false positive rates, demonstrating unparalleled performance in identifying cells based on a single-nucleotide mutation. To demonstrate ease of use and potential applications, ~10 GFP-positive cells per million unlabeled cells are spotted on a piece of nitrocellulose. After gel encapsulation, the nitrocellulose strip is dipped into three different tubes in turn (ligation, exonuclease, and IVT). Using basic epifluorescence microscopy, one or more GFP-positive cells out of >1 million cells can be detected in ten independent experiments with false negative and false positive rates of <10′. If significant variation were to exist in GFP protein synthesis, the actual false positive rate would be even lower. To see whether the transcriptome remained intact after in situ mutation detection, Cy5-positive cells (GFP mRNA) are sorted, and standard mRNA-seq is performed before or after in situ RNA mutation detection. Using two million reads per sample, the total number of unique genes detected was similar (10,000 vs. 12,000), and global gene expression was highly correlated (R²=0.88; Spearman's p-value <10′).
To see if our platform can also be used for sequencing short RNA barcodes in situ, a cell line stably expressing GFP and Cas9 is used. After transfection of GFP-specific sgRNA-expressing plasmids, the region downstream of the PAM sequence predicted to contain short somatic indels is interrogated using partly degenerate interrogation probes that are barcoded for each base (+1 to +4). Prior to sgRNA transfection, 99% of the cells were GFP-positive, and 98% of the cells displayed the same GFP template sequence. After sgRNA transfection, however, 52% of the cells lost their GFP fluorescence after 24 hours, and each such cell displayed a unique indel sequence in situ, appearing as 1-, 2-, or 4-cell mosaics. Of the cells that stayed GFP-positive, 17% displayed unique GFP indel sequences, suggesting in-frame indels. Because somatic mutations are sequenced in situ, multiple independent reporters can be introduced and their phase maintained for each cell. Combined with high sensitivity and speed, this makes it possible to interrogate dozens of induced somatic mutations sequentially to comprehensively reconstruct cell lineage or activity information in situ.
The ultimate goal of cell atlas projects is to scale these approaches for whole tissue, whole organ, or whole organism reconstruction based on molecular or cellular information. To demonstrate the scalability of our platform, 200 adult brain sections are generated from heterozygous mice. Probe pairs for those transcripts most associated with cell identity information from published single-cell RNA studies are generated (See
In one experimental realization of our method, tissues are dissociated by enzymatic digestion, or blood is collected and spun down at 4° C. Suspended cells are then fixed in formalin, ethanol, or methanol for 15 min, followed by cell permeabilization, if necessary, using Triton X-100. The sequencing primer is hybridized in situ at 42° C. for 2 hours in the presence of formamide and RNase inhibitors. The excess primer is then washed out, followed by the addition of mutation-scanning probes along with the DNA ligase of choice (e.g. PBCV DNA ligase) for up to 1 hour. Cells are then washed and used for in situ PCR or RCA. For PCR, a pair of 5′-modified primers is used so that one PCR strand can be digested after the PCR reaction. For RCA, the rSBL product is circularized using DNA-splinted ligation or CircLigase, followed by strand-displacement amplification using Phi29 DNA polymerase. These steps enable fluorescent probe hybridization or DNA barcode sequencing to interrogate individual bases in the amplified product.
In another experimental realization, single cells are sorted, manually or by FACS, into 96-well plates containing a cell lysis buffer. The sequencing primer is annealed to endogenous mRNA for 2 hours, and mutation-scanning probes along with DNA ligase are then added to each well for 1 hour at 37° C. Un-ligated rSBL probes are digested using exonucleases (I, III, or lambda), followed by heat inactivation of the exonucleases. Real-time quantitative PCR is performed using mutation- or sequence variant-specific PCR primers, using the ΔCt relative to the wild-type sequence to quantify the relative amounts of mutant alleles on RNA. This method can quantify single-cell heterogeneity in somatic mutations or allele-specific gene expression. In contrast, current methods for single-cell mutation sequencing or amplification suffer from a high drop-out rate due to their low sensitivity (e.g. inefficiencies in reverse transcription), limiting quantitative analysis of mutation- or allele-specific gene expression in single cells to highly expressed genes.
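The ΔCt comparison can be written out explicitly. This sketch assumes that the mutant-specific and wild-type assays amplify with the same efficiency (`efficiency=1.0` means perfect doubling per cycle); that assumption, and the function names, are illustrative rather than taken from the application.

```python
def relative_mutant_abundance(ct_mutant: float, ct_wild_type: float, efficiency: float = 1.0) -> float:
    """Mutant-to-wild-type template ratio from qPCR threshold cycles.

    Assumes both assays share the same per-cycle efficiency, so each cycle
    multiplies product by (1 + efficiency); a later Ct means less template.
    """
    delta_ct = ct_mutant - ct_wild_type
    return (1.0 + efficiency) ** (-delta_ct)


def mutant_allele_fraction(ct_mutant: float, ct_wild_type: float, efficiency: float = 1.0) -> float:
    """Fraction of mutant transcripts among mutant + wild-type, from the same ratio."""
    ratio = relative_mutant_abundance(ct_mutant, ct_wild_type, efficiency)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # A mutant assay crossing threshold 3 cycles after wild type implies ~1/8 as much template.
    print(relative_mutant_abundance(28.0, 25.0), mutant_allele_fraction(28.0, 25.0))
```

If the two assays differ in efficiency, each should be calibrated separately (e.g. from a dilution series) before comparing Ct values.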
In another experimental realization of our method, fixed frozen OCT-embedded tissues or FFPE tissues are mounted on a glass slide, followed by cell permeabilization using proteinase K. The sequencing primer is hybridized to tissues in situ at 42° C. for 2 hours to overnight in the presence of formamide and RNase inhibitors. The excess primers are then washed out, followed by the addition of mutation-scanning probes along with the DNA ligase of choice (e.g. PBCV DNA ligase) for up to 1 hour. Tissues are then washed and used for in situ PCR or RCA. For PCR, a pair of 5′-modified primers is used so that one PCR strand can be digested after the PCR reaction. For RCA, the rSBL product is circularized using DNA splinted ligation or CircLigase, followed by strand-displacement amplification using Phi29 DNA polymerase. These steps enable fluorescent probe hybridization or DNA sequencing (e.g. rSBL) to interrogate individual bases in the amplified product. This allows somatic mutations to be sequenced in situ to map tumor mutational heterogeneity, and other types of RNA variants (e.g. T-cell receptor variants, splicing variants, RNA modifications) to be mapped spatially.
Step 1. A first A or T base upstream from a codon-of-interest is identified. If A or T is within 9 bases from the codon, the codon is suitable for sequencing, as indicated below with the codon-of-interest indicated in uppercase and the A or T, here a T, indicated in underline.
For RNA-based SBL, a first A or T base upstream from a codon-of-interest is identified. If A or T is within 6 bases from the codon, the codon is suitable for sequencing, as indicated below with the codon-of-interest indicated in uppercase and the A or T, here a T, indicated in underline. For DNA-based programmable SBL, any base adjacent to the codon sequence is suitable for the targeted primer design.
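Step 1 amounts to a short upstream scan from the codon. The sketch below assumes the template is given 5′ to 3′ as a plain string, that positions are 0-based indices, and that "within N bases" means the A or T lies at most N positions upstream of the codon's first base; the function name is hypothetical, and the 9-base and 6-base limits from the text are passed in as a parameter.

```python
from typing import Optional


def find_ligation_junction(template: str, codon_start: int,
                           max_distance: int) -> Optional[int]:
    """Return the index of the nearest A or T (or U in RNA) upstream of a codon.

    template     -- target sequence written 5'->3'
    codon_start  -- 0-based index of the first base of the codon of interest
    max_distance -- how far upstream the A/T may lie (e.g. 9, or 6 for rSBL)

    Returns None when no A/T is close enough, i.e. the codon is not suitable.
    """
    seq = template.upper()
    lowest = max(0, codon_start - max_distance)
    # Walk upstream (toward the 5' end) starting from the base just before the codon.
    for i in range(codon_start - 1, lowest - 1, -1):
        if seq[i] in "ATU":
            return i
    return None


if __name__ == "__main__":
    # KRAS codons 9-15 (GTT GGA GCT GGT GGC GTA GGC); codon 12 (GGT) starts at index 9.
    kras = "GTTGGAGCTGGTGGCGTAGGC"
    print(find_ligation_junction(kras, 9, 6))  # 8, the last base (T) of codon 11
```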
Step 2. A 20-base sequence, or a 20- to 35-base sequence (Tm ~60-80° C.), extending away from the codon sequence is chosen, starting from the chosen A or T base (rSBL) or any adjacent base (SBL), as indicated below in italics. Its reverse complement sequence is generated as the target-specific rSBL primer (bottom strand in the figure below).
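The primer selection in Step 2 can be sketched as taking the template bases that end at the junction base and extend away from the codon, then reverse-complementing them so that the primer's 5′ base pairs with the junction A or T. This sketch assumes a 5′→3′ template string and 0-based indices; it does not model the 5′ phosphate added in Step 3, and it does not check the Tm window, which would require a nearest-neighbor Tm calculator. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    """DNA reverse complement; RNA U is paired with A so RNA templates work too."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_sequencing_primer(template: str, junction_index: int,
                             length: int = 20) -> str:
    """Primer (5'->3') whose 5' base pairs with the junction base.

    The hybridization region runs upstream from the junction, i.e. away from the
    codon, so it covers template[junction_index - length + 1 : junction_index + 1].
    """
    if not 20 <= length <= 35:
        raise ValueError("Step 2 uses a 20- to 35-base hybridization region")
    start = junction_index - length + 1
    if start < 0:
        raise ValueError("not enough template upstream of the junction")
    return reverse_complement_dna(template[start:junction_index + 1])


if __name__ == "__main__":
    template = "G" * 10 + "C" * 9 + "T" + "GGTGGC"  # junction T at index 19
    print(design_sequencing_primer(template, 19))    # starts with A, pairing the T
```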
Step 3. The rSBL primer is 5′ phosphorylated for ligation.
The 5′ end of the sequencing primer forming a base pair with the A or T is the rSBL junction, indicated below with a vertical line. The 5′ phosphorylation is not shown for clarity of presentation. For DNA-templated SBL, the 5′ end of the sequencing primer forming a base pair with the DNA template base is the SBL junction. Only the rSBL junction example is shown below. The hybridization region of the sequencing primer is shown in italics, and the ligation junction is shown as a vertical line through the RNA template sequence.
Step 4. Starting from the ligation junction, 12 bases containing the codon sequence (indicated below in bold) are selected, and their reverse complement sequence is generated.
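Step 4 can be sketched in the same style: take the 12 template bases that begin immediately 3′ of the junction base, confirm that the codon lies inside this window, and reverse-complement the window. As before, this assumes a 5′→3′ template string and 0-based indices, and the function names are hypothetical; the reverse-complement helper is repeated so the block stands alone.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def interrogation_window(template: str, junction_index: int, codon_start: int,
                         window: int = 12) -> Tuple[str, str]:
    """Return (template window 5'->3', its reverse complement 5'->3')."""
    start = junction_index + 1           # first base 3' of the junction
    end = start + window
    if end > len(template):
        raise ValueError("template too short for the interrogation window")
    if not (start <= codon_start and codon_start + 3 <= end):
        raise ValueError("codon does not fall inside the window")
    segment = template[start:end].upper()
    return segment, reverse_complement_dna(segment)


if __name__ == "__main__":
    kras = "GTTGGAGCTGGTGGCGTAGGC"   # KRAS codons 9-15; junction T at index 8
    print(interrogation_window(kras, 8, 9))
    # ('GGTGGCGTAGGC', 'GCCTACGCCACC')
```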
Step 6. To exclude the wild-type sequence from detection, the base complementary to the wild-type codon base is replaced with one of the mixed bases identified in Table 1.
Step 7. For point mutations, the wild-type complementary sequence is fixed at the other two positions for every mixed base in programmable rSBL probes. The anti-codon sequence is underlined. Note the direction of rSBL probes (5′ to 3′).
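Steps 6 and 7 can be sketched as generating one probe per interrogated codon position, each carrying a single mixed base at that position and the fixed wild-type complement everywhere else. Table 1 is not reproduced here, so the sketch assumes the mixed bases are the IUPAC "any base except X" codes (B = not A, D = not C, H = not G, V = not T), which is consistent with the D in the KRAS G12 probe examples given later in this section. Probes are written 3′ to 5′, base-for-base under the template, with fixed bases in lowercase and the mixed base in uppercase; reverse the string to obtain the 5′ to 3′ order. The function and table names are hypothetical.

```python
from typing import List, Sequence

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}
# Assumed Table 1 mapping: IUPAC code for "any base except X".
EXCLUDE = {"A": "B", "C": "D", "G": "H", "T": "V"}


def degenerate_probes(window: str, codon_offset: int,
                      codon_positions: Sequence[int] = (0, 1, 2)) -> List[str]:
    """One partially degenerate probe per interrogated codon position.

    window          -- the 12 template bases 5'->3' starting just 3' of the junction
    codon_offset    -- index of the codon's first base inside the window
    codon_positions -- 0-based codon positions to interrogate
    """
    wild_type = [COMPLEMENT[b] for b in window.upper()]
    probes = []
    for pos in codon_positions:
        i = codon_offset + pos
        probe = [b.lower() for b in wild_type]   # fixed wild-type complements
        probe[i] = EXCLUDE[wild_type[i]]         # excludes only the wild-type complement
        probes.append("".join(probe))
    return probes


if __name__ == "__main__":
    # KRAS codons 12-15 downstream of the junction T (codon 12 at offset 0).
    for p in degenerate_probes("GGTGGCGTAGGC", 0, (0, 1)):
        print(p)  # Dcaccgcatccg, then cDaccgcatccg
```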
Step 8. To further reduce the probe complexity to non-synonymous mutations, only probes interrogating bases expected to change amino acid identity are used, as, e.g., identifiable from Table 2:
For example, in the case of KRAS G12, a change at the third codon base does not alter amino acid identity. Therefore, only two probes, with mixed degeneracy at codon base 1 or 2, are used to detect non-synonymous point mutations at KRAS G12.
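The filter in Step 8 can be recomputed from the standard genetic code. Table 2 is not reproduced here, so this sketch derives, for a given codon, which positions admit at least one amino-acid-changing point mutation; whether mutations to a stop codon count is exposed as a parameter, since nonsense mutations change the final probe count. The names are hypothetical.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nonsynonymous_positions(codon: str, include_nonsense: bool = True) -> List[int]:
    """0-based codon positions where at least one point mutation changes the residue."""
    codon = codon.upper().replace("U", "T")
    original = CODON_TABLE[codon]
    positions = []
    for pos in range(3):
        for base in "ACGT":
            if base == codon[pos]:
                continue
            residue = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if residue == "*" and not include_nonsense:
                continue  # ignore nonsense (stop) mutations if requested
            if residue != original:
                positions.append(pos)
                break
    return positions


if __name__ == "__main__":
    print(nonsynonymous_positions("GGT"))  # KRAS G12 -> [0, 1]
```

For KRAS G12 (GGT) this yields positions 0 and 1, i.e. two mutant probes per codon, matching the two degenerate probes in the example.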
Step 8. Programmable rSBL probe sequences are added to amplification-enabling primer sequences for PCR, FISH, RCA, or other universal primer-based amplification methods. An example of an adapter sequence for PCR or RCA is shown below. Note that the RNA template direction is 5′ to 3′, while the adapter-containing rSBL probe is 3′ to 5′. Adapter 1 is added to the sequencing primer, and Adapter 2 is added to the rSBL probe.
Dcaccgcatccg-(adapter2) 5′
cDaccgcatccg-(adapter2) 5′
Step 9. The wild type rSBL probe is tagged with a scrambled control sequence to block PCR amplification from wild type sequences.
Step 10. This process yields three rSBL interrogation oligonucleotide sequences that are added to amplification or control adapter sequences.
Step 11. To prevent excess rSBL probes from interfering with signal amplification, a phosphorothioate (PPT) modification or an inverted T is added to the 3′ end of Adapter 1 in the sequencing primer. This protects successfully ligated rSBL products from digestion by 3′ exonucleases (e.g. Exo I or III), while un-ligated rSBL probes containing (adapter2) sequences are degraded.
Step 12. Adapter 2 can include a >15-nt barcode sequence so that fluorescent hybridization can be used to determine the specific sequence incorporated into the final rSBL product. For in situ sequencing readout using optical microscopy, the barcode length can be 1 or 2 bases.
For additional codons, Steps 1-12 are iterated. For 50 codons, this procedure generates 50 phosphorothioated target-specific primers (20-35 nt + adapter sequence) and 150 partially degenerate rSBL probes (12 nt + adapter sequence), including wild-type sequence competitors. If nonsense mutations are considered in addition to missense mutations, the final number of partially degenerate rSBL probes may change.
A practical result of the method exemplified in Example 8 is the creation of a generic cancer probe with high single-base specificity and sensitivity, capable of labeling cells based on common driver mutations rather than on functional biomarkers that require extensive testing and validation. Our algorithm results in a set of pancreatic ductal adenocarcinoma (PDA)-specific probes capable of sequencing seven Kras mutations that account for 86% of PDAs. Our algorithm enables the de novo detection of up to 112 non-synonymous somatic mutation variants using 23 oligonucleotides in a single-pot reaction, as shown in Table 3. The algorithm can be broadly generalized to create multiple cancer-specific probe panels or a pan-cancer probe panel for labeling, visualizing, and isolating human cancer cells. Each cancer-specific probe panel can be combined with the SBL and signal amplification reagents described herein for various medical and research purposes.
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
GGGTCATATCGGTCACTGTT
Synthetic 42-base RNA transcripts are obtained with 16 different ligation junctions located at bases 30 and 31. The RNA is biotinylated at the 3′ end. The DNA sequencing primer is designed to be complementary to the RNA over a 30-base hybridization region and carries a 3′-FAM fluorophore and a 5′ phosphate. Forward primers are obtained with each base combination at the 3′ end and the remaining 11 bases complementary to the template, giving a total of six oligonucleotides. The entire workspace is cleaned to ensure an RNase-free reaction. RNA template is added at 5-uM and DNA sequencing primer at twice the concentration of the RNA template (10-uM) in 2×SSC to a total volume of 50-uL. Oligos are mixed by gentle pipetting up and down. The oligo mixture is then incubated at 95° C. for 5 minutes, 60° C. for 10 minutes, and room temperature for 10 minutes. During the incubation, 50-uL of Dynabeads streptavidin M270 (ThermoFisher, 65306) per 50-uL volume of oligos is washed in 2×SSC 3-4 times depending on the volume of beads. After the oligos have cooled to room temperature, they are added to the washed beads and shaken with gentle agitation for 15 minutes at room temperature. After 15 minutes, the beads are placed on a magnetic stand until the supernatant becomes clear (2 minutes), and the supernatant is removed. Beads with oligos are then washed three times with 10-mM Tris buffer. Beads are then split into two aliquots for positive and negative controls. A SplintR master mix is prepared consisting of 20-uM of forward primer per base (80-uM of forward primers in total), 1.0-uM of SplintR Ligase (NEB, M0375L), and 1× SplintR buffer in a total volume of 20-uL per reaction. The master mix is added to the washed beads, mixed gently, and incubated at 37° C. for 60 minutes, followed by a 10-minute heat kill at 70° C. Beads are then washed three times with 10 mM Tris buffer.
An RNase cocktail is prepared from 6.25 U of RNase H (Enzymatics, Y9220L), 2× RNase H buffer, 20-ug of DNase-free RNase (Sigma Aldrich 11119915001), and ultrapure water to 50-uL per reaction. The cocktail is added to the beads and incubated at 37° C. for 1 hour, followed by 10 minutes at 70° C. The supernatant is removed and diluted 1:30 in ultrapure water. 3-uL of the dilution is added to 9-uL of HiDi Formamide (ThermoFisher, 4311320) and 0.5-uL of GeneScan ROX 500 (ThermoFisher, 401734) per reaction. The mixture is heated to 95° C. for 5 minutes, placed on ice, and centrifuged for 5 seconds. The plate is then run on an ABI 3730 instrument, and the data are analyzed using GeneMapper software. Downstream NGS preparation included PCR amplification with 0.25-uL of each primer, 6.0-uL of ligation product, 25.0-uL of Phusion Pfu high fidelity mastermix (NEB M0531s), and ultrapure water to 50-uL. The amplified product is cleaned with a 2% SizeSelect E-Gel (ThermoFisher G661012) and processed with the NEBNext Ultra II Library Prep Kit for Illumina (NEB E7645s and E7335s) according to the manufacturer's protocol. All primers are formulated as desalted oligonucleotides.
96 sequencing primers are added at 10-uL each (100 uM stock concentration) for a total of 960-uL of primers. A phosphorylation master mix is then made with 10-uL of 10× T4 DNA ligase buffer (NEB Catalog B0202S), 50 U of PNK enzyme (NEB Catalog M0201S), 25-uL of sequencing primer mix (stock 100-uM, final concentration 25-uM), and ultrapure water to 100-uL per reaction. The mix is then incubated at 37° C. for 1 hour and heat inactivated at 65° C. for 20 minutes. Cells are then lysed on the plate in 50-uL of Single Shot Lysis Buffer (BioRad Catalog 1725080) at ~100,000 cells per 50-uL following the manufacturer's protocol. Alternatively, the lysate is incubated with poly(dT) oligonucleotides and streptavidin magnetic beads, followed by mRNA isolation on the beads. 5-uL of the lysate is then added to 5-uL of the sequencing primer mix (25-uM, 5-uM final concentration), 10-uL of the phosphorylated degenerate forward primer mix (100-uM, 40-uM final concentration), and 2.5-uL of 10× SplintR buffer (NEB Catalog M0375S). For bead-based protocols, excess probes and reagents are decanted from the sample, and the sample is washed twice in wash buffer. The mixture is then heated to 95° C. for 5 minutes, 60° C. for 10 minutes, and room temperature for 10 minutes, and held at 4° C. 2.5-uL of SplintR ligase (NEB Catalog M0375S) is then added to each reaction, or 2.5-uL of water for negative controls. The ligation mixture is then incubated at 37° C. for 1 hour and heat inactivated at 70° C. for 10 minutes. To quantify, 1-uL of 10,000× diluted product is added to 5-uL of PowerUp SYBR Green Master Mix (ThermoFisher Catalog A25742), 0.1-uL of 100-uM primers, and 3.8-uL of ultrapure water. The mix is run on a QuantStudio qPCR instrument and analyzed. Downstream NGS preparation included PCR amplification with 0.25-uL of each primer, 6.0-uL of ligation product, 25.0-uL of Phusion Pfu high fidelity mastermix (NEB M0531S), and ultrapure water to 50-uL.
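The final concentrations quoted for the ligation mixture follow from C_final = C_stock × V_added / V_total. The sketch below reads the "2.5 of 10× SplintR buffer" entry as 2.5 μL and assumes the total reaction volume is the sum of the listed volumes, including the 2.5 μL of SplintR ligase (or water); the component labels are descriptive only.

```python
def final_concentration(stock: float, added_volume_ul: float,
                        total_volume_ul: float) -> float:
    """C1*V1 = C2*V2 solved for C2 (units follow the stock concentration)."""
    return stock * added_volume_ul / total_volume_ul


# (stock concentration, volume in uL); None = nothing to report for that component
LIGATION_MIX = {
    "lysate": (None, 5.0),
    "sequencing primer mix, uM": (25.0, 5.0),
    "degenerate forward primer mix, uM": (100.0, 10.0),
    "SplintR buffer, x": (10.0, 2.5),
    "SplintR ligase or water": (None, 2.5),
}

total_ul = sum(volume for _, volume in LIGATION_MIX.values())
finals = {
    name: final_concentration(stock, volume, total_ul)
    for name, (stock, volume) in LIGATION_MIX.items()
    if stock is not None
}

if __name__ == "__main__":
    print(total_ul)  # 25.0
    for name, value in finals.items():
        print(f"{name}: {value:g}")  # 5, 40 and 1 (i.e. 1x buffer)
```

With these volumes the stated 5 μM primer and 40 μM probe concentrations are reproduced, and the buffer comes out at 1×.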
The amplified product is cleaned with a 2% SizeSelect E-Gel (ThermoFisher G661012) and processed with the NEBNext Ultra II Library Prep Kit for Illumina (NEB E7645s and E7335s) according to the manufacturer's protocol. All primers are formulated as desalted oligonucleotides.
In conclusion, the above examples demonstrate a flexible and scalable platform for detecting or sequencing RNA single-nucleotide variants with sensitivity and specificity surpassing existing single-cell methods. In addition, the platform incorporates signal amplification, enabling the detection of molecular, cellular, or pathway identifiers across many different tissue types and applications, including the isolation of rare single cells based on somatic mutations. Because of its simplicity, the platform can be adapted for ‘staining’ clinical tissue specimens using their genetic characteristics, including point mutations, translocations, and tumor type gene expression markers. Fundamentally, the platform is a nucleotide-specific targeted in situ amplification method compatible with multiple downstream applications, including single cell genomics, in situ hybridization, and in situ sequencing methods. More specifically, the technology can be used to mark the position of individual cells prior to dissociation-dependent single cell analysis or to improve the detection sensitivity of in situ sequencing methods. By incorporating gel encapsulation and probe immobilization techniques, its spatial resolution can be improved even further. The platform, named Heuristic In Situ Targeted Oligopaint sequencing (HISTO-seq), enables the development of disease-specific genetic ‘dyes’ for use in basic research or clinical applications.
In this example, 100 ng of 50-mer 5′ biotinylated RNA templates are bound to Dynabeads (ThermoFisher) in the provided binding buffer at 25° C. for 10 minutes, followed by a wash cycle in 2×SSC. The 5′ phosphorylated 20-mer DNA sequencing primer with a 3′ FITC modification (IDT) is added in 3-fold molar excess for DNA-RNA hybridization in 2×SSC with RNaseOUT (ThermoFisher) for 10 min at 60° C. After two rounds of washing with 2×SSC, 2 U PBCV DNA ligase (SplintR, NEB) along with a 10-fold molar excess of programmable rSBL probes are added to the DNA-RNA complex bound to Dynabeads in SplintR reaction buffer containing RNaseOUT. The reaction is incubated at 37° C. for 60 minutes and washed twice using 2×SSC. The immobilized RNA is degraded using 1 U RNase H (NEB) and RNase A (NEB) in the ligation buffer, releasing the FITC-labeled sequencing primer and the full rSBL product. After a brief spin in a microcentrifuge, the supernatant is collected and diluted 100-fold in ddH2O for fragment size analysis on an ABI 3730 capillary sequencer. The area under the FITC signal is quantified, as well as the position of individual peaks reflecting the size of ligated or un-ligated products. Base-specific rSBL products differ in size by 5-10 bases, enabling correct incorporation of interrogating oligonucleotides to be distinguished from incorrect incorporation. The ligation efficiency of correct rSBL is expressed as (Area under the correct rSBL product)/(Area under the un-ligated FITC primer + Area under the incorrectly ligated rSBL product). When the 5′ base of the sequencing primer is A or T, the ligation efficiency is >90% after 60 minutes of rSBL at 37° C.
After rSBL, the ligation product in the supernatant is used for PCR, qPCR, digital droplet PCR, or in situ PCR/RCA/MDA on a flow cell. In this example, the ligation product in the supernatant after RNase H and RNase A digestion is diluted 1- to 1,000-fold in ddH2O, depending on the starting amount of immobilized RNA template. For 100 ng RNA template, the rSBL product was diluted 1,000-fold in ddH2O. Two microliters of the diluted product are added to KAPA Real-Time Sybr-Green qPCR 2× Master Mix, along with 10 μM forward and reverse PCR primers against the Adapter 1 and Adapter 2 sequences in the rSBL product. The cycling parameters are as follows: 95° C. for 30 sec, 60° C. for 10 sec, and 72° C. for 10 sec for 40 cycles. The real-time qPCR benchtop instrument (Eppendorf) is used to quantify the rate of PCR amplification to estimate the amount of rSBL products using un-ligated and wild-type reference samples for ΔΔCt calculations. The final product size (85 nt) was validated using 2% agarose gel electrophoresis.
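The ΔΔCt comparison mentioned above is commonly computed with the 2^−ΔΔCt method. The sketch below shows one way to set it up and is an assumption rather than the protocol's stated formula: it assumes roughly 100% amplification efficiency, uses a reference assay (for example the wild-type product) as the normalizer, and uses a reference sample (for example the un-ligated control) as the calibrator. The function name and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample: float, ct_reference_sample: float,
                     ct_target_calibrator: float, ct_reference_calibrator: float,
                     amplification_factor: float = 2.0) -> float:
    """Relative quantity of the target in a sample vs. a calibrator (2^-ddCt)."""
    delta_sample = ct_target_sample - ct_reference_sample
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta = delta_sample - delta_calibrator
    return amplification_factor ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: relative to the reference assay, the target crosses
    # threshold 4 cycles earlier in the sample than in the calibrator (2**4 = 16).
    print(fold_change_ddct(24.0, 20.0, 28.0, 20.0))  # 16.0
```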
In this example, single cells of interest from the blood (Ficoll centrifugation) or from enzymatic tissue dissociation (trypsin) are fixed in 4% PFA in PBS-T at 4° C. for 15 min. Cells are pelleted by centrifugation at 100 × g for 15 min at 4° C. and washed twice in cold DEPC-PBS. The 5′ phosphorylated 50-mer DNA sequencing primer (25-nt target-specific sequence + 20-nt adapter sequence; 1 uM) is added for in situ RNA hybridization in 2×SSC with RNaseOUT (ThermoFisher) for 2 hours to overnight at 42 to 60° C., depending on cell type. After two rounds of washing with 2×SSC, 2 U PBCV DNA ligase (SplintR, NEB) along with 20-uM programmable rSBL probes are added to the fixed cells in SplintR reaction buffer containing RNaseOUT. The reaction is incubated at 37° C. for 60 minutes and washed twice using 2×SSC. Un-ligated rSBL probes are degraded by 1 U Exonuclease I/III at 37° C. for 1 hour, while ligated rSBL products survive enzymatic digestion due to phosphorothioate modifications in the sequencing primer. Individual cells are further stabilized using degassed 4% polyacrylamide (no bis-acrylamide) solution with APS and TEMED for 1 hour. Single-cell hydrogel particles are filtered through a 200-μm nylon mesh to eliminate large particle aggregates. The collected single-cell hydrogel mixtures are added to KAPA PCR Master Mix with forward and reverse PCR primers against adapter sequences 1 and 2. Cycling parameters can be as follows: 95° C. for 30 sec, 60° C. for 30 sec, and 72° C. for 30 sec for 10-30 cycles for in situ PCR. The resulting double-stranded PCR products are converted into single-stranded DNA using lambda 5′ exonuclease at 37° C. for 1 hour in the provided buffer, followed by re-fixation with 4% PFA in PBS prior to fluorescent FISH probe hybridization in 2×SSC. The labeled cells are then ready for FACS analysis.
In this example, adherent cells or fresh frozen tissue sections on a glass slide are fixed in 4% PFA prior to rSBL. Silicone gaskets (Grace-bio) are cut to size (~10-mm chamber diameter) and placed to enclose the specimen, forming an open flow cell accessible to direct manipulation. The 5′ phosphorylated 50-mer DNA sequencing primer (25-nt target-specific sequence + 20-nt adapter sequence; 1 uM) is used for in situ RNA hybridization in 2×SSC with RNaseOUT overnight at 42° C. After two rounds of washing with 2×DEPC-SSC, 2 U PBCV DNA ligase along with 20-uM programmable rSBL probes are added to fixed cells or tissue sections in SplintR reaction buffer containing RNaseOUT. The reaction is incubated at 37° C. for 60 minutes and washed twice using 2×SSC. Un-ligated rSBL probes are degraded by 1 U Exonuclease I/III at 37° C. for 1 hour, while ligated rSBL products survive enzymatic digestion due to phosphorothioate modifications in the sequencing primer. The fixed cells or tissues are incubated with KAPA PCR Master Mix with 5′ phosphorylated forward and non-phosphorylated reverse PCR primers against adapter sequences 1 and 2. Cycling parameters can be as follows: 95° C. for 30 sec, 60° C. for 30 sec, and 72° C. for 30 sec for 10-30 cycles for in situ PCR. The resulting double-stranded PCR products are converted into single-stranded DNA using lambda 5′ exonuclease at 37° C. for 1 hour in the provided buffer, followed by re-fixation with 4% PFA in PBS prior to fluorescent FISH probe hybridization in 2×SSC. The labeled cells are then ready for FACS analysis.
In this example, rSBL products are amplified using RCA rather than in situ PCR. Adherent cells or fresh frozen tissue sections on a glass slide are fixed in 4% PFA prior to rSBL. Silicone gaskets are cut to size and placed to enclose the specimen. The 5′ phosphorylated 50-mer DNA sequencing primer is used for in situ RNA hybridization in 2×SSC with RNaseOUT overnight at 42° C. After two rounds of washing with 2×DEPC-SSC, 2 U PBCV DNA ligase along with 20-uM programmable rSBL probes are added to fixed cells or tissue sections in SplintR reaction buffer containing RNaseOUT. The reaction is incubated at 37° C. for 60 minutes and washed twice using 2×SSC. The sample is then washed with DEPC H2O to remove any trace of residual chloride. Two units of CircLigase II (Epicenter) in CircLigase buffer are added to the rSBL product-containing fixed cells or tissues and incubated for 2 hours at 60° C. in a humidified oven. The RCA primer is then hybridized to the circularized rSBL products in 2×SSC and 10% formamide solution at 60° C. for 10 min, followed by two 1-min washes in 2×SSC. Two units of Phi29 DNA polymerase along with an aminoallyl-dUTP spike-in are added to the specimen at 30° C. for up to overnight. After RCA, the specimen is incubated with BS(PEG)9 in PBS pH 8.0 for 10 min at 25° C. to cross-link RCA products in situ. 100 uM fluorescently labeled FISH detection probes are hybridized to RCA products at 60° C. for 5 min, followed by three washes in 2×SSC and imaging on an epifluorescence or confocal microscope. Note that wild-type rSBL probes are included, but they end with an inverted dT so that they cannot be circularized by CircLigase. Table 3 shows examples of codon-specific probes used for in situ rSBL for RCA-based optical imaging.
Many different single-molecule FISH methods exist for increasing the fluorescent signal from single ssDNAs in situ. Some enable ssDNA rSBL products to be visualized, while others permit rSBL amplicons to be sequenced. Depending on applications, the precise nature of rSBL signal amplification differs; therefore, the experimental details for alternative signal amplification methods will not be summarized here for the sake of clarity.
CGGGA
A
GCTGA
AGAta
CC
CGC
TGAAG
Atacg
CGC
TGAAG
Atacg
CGCT
GAAG
Agcct
TCTCG
GGAA
GCTGA
AGACA
ACG
TGAAG
ACACA
G
TGAAG
ACACA
TGAAG
AAGGC
TCGGG
AA
GCTGA
AGAGC
ATA
CGC
TGAAG
AGCGG
A
CGC
TGAAG
AGCGG
B
CGC
TGAAG
ACAGG
A 3′ biotinylated RNA template and a 5′ phosphorylated, 3′ FAM-conjugated DNA oligo (sequencing primer) complementary to the hybridization site of the RNA template were mixed (5 μM and 10 μM, respectively) in 2×SSC to a total volume of 50 μL. The mixture was incubated at 95° C. for 5 min, 60° C. for 10 min, and room temperature for 10 min. In the meantime, 50 μL of Dynabeads streptavidin M270 (ThermoFisher, 65306) per 50 μL volume of oligos was washed in 2×SSC 3-4 times depending on the volume of beads. The hybridized RNA:DNA duplex was added to the washed beads and gently agitated for 15 min at room temperature before being transferred to a magnetic stand for 2 min, after which the supernatant was removed. The beads were then washed three times with 10 mM Tris buffer. Beads were then split into two aliquots, 75 μL and 25 μL, for positive and negative controls, respectively. A SplintR master mix was prepared consisting of 20 μM of forward primer per base (80 μM of forward primers in total), 1.0 μM of SplintR Ligase (NEB, M0375L), and 1× SplintR buffer in a total volume of 20 μL per reaction. The master mix was added to the washed beads, mixed gently, and incubated at 37° C. for 60 min, followed by a 10 min heat kill at 70° C. For time courses, reactions were removed from 37° C. at each time point and transferred to a separate 70° C. incubator for a 10 min heat kill. Beads were then washed three times with 10 mM Tris buffer. An RNase cocktail was prepared from 6.25 U of RNase H (Enzymatics, Y9220L), 2× RNase H buffer, 20 μg of DNase-free RNase (Sigma Aldrich 11119915001), and ultrapure H2O to 50 μL per reaction. The cocktail was added to the beads and incubated at 37° C. for 1 hour, followed by 10 min at 70° C. The supernatant was removed and diluted 1:3 in ultrapure H2O. 3 μL of the dilution was added to 9 μL of HiDi Formamide (ThermoFisher, 4311320) and 0.5 μL of GeneScan ROX 500 (ThermoFisher, 401734) per reaction. The mixture was heated to 95° C. for 5 min, placed on ice, and centrifuged for 5 sec.
Ligation products were then run on the ABI 3730 and analyzed using GeneMapper. For base specificity, an RNA template with 16 different ligation junctions located at bases 30 and 31 was used. All primers were ordered from IDT as desalted oligonucleotides.
To determine the attainable read length of ProRSBL, interrogation probes consisting of four degenerate bases at positions 1-4, 5-8, or 9-12 were used to interrogate bases either 5′ or 3′ to the sequencing primer, and ligation products were quantified using an Illumina MiSeq. For forward interrogation (positive positions from the ligation junction), a 3′ biotinylated RNA template and a 5′ phosphorylated DNA oligo (sequencing primer) complementary to the hybridization site of the RNA template were mixed (5 μM and 10 μM, respectively) in 2×SSC to a total volume of 50 μL. For reverse interrogation (negative positions from the ligation junction), a 3′ biotinylated RNA template and a DNA oligo (sequencing primer) complementary to the hybridization site of the RNA template were mixed (5 μM and 10 μM, respectively) in 2×SSC to a total volume of 50 μL. SplintR ligation followed the previously described protocol, using 80 μM of total degenerate probes. RNase digestion followed the RNA-templated ligation protocol. Downstream NGS preparation included PCR amplification with 0.25 μL of each primer, 6.0 μL of ligation product, 25.0 μL of Phusion Pfu high fidelity mastermix (NEB M0531S), and ultrapure H2O to 50 μL. The amplified product was cleaned with a 2% SizeSelect E-Gel (ThermoFisher G661012) and processed with the NEBNext Ultra II Library Prep Kit for Illumina (NEB E7645s and E7335s) according to the manufacturer's protocol.
To determine the kinetics of the cleavage reaction, 200 pmol of a 5′ biotinylated RNA template identical to the sequence of 28S rRNA and 400 pmol of complementary DNA with an inosine (Cleavage substrate 13) were mixed (5 μM and 10 μM, respectively) in 2×SSC to a total volume of 50 μL. The mixture was incubated at 95° C. for 5 min, 60° C. for 10 min, and room temperature for 10 min. In the meantime, 100 μL of Dynabeads streptavidin M270 (ThermoFisher, 65306) per 100 μL volume of oligos was washed in 2×SSC 3-4 times depending on the volume of beads. The hybridized RNA:DNA duplex was added to the washed beads and gently agitated for 15 min at room temperature before being transferred to a magnetic stand for 2 min, after which the supernatant was removed. The beads were then resuspended in 150 μL of DEPC H2O to 100 pmol. 60 μL of resuspended beads (20 pmol) were combined with 20 μL of 10× NEBuffer 4 (NEB, M0305S) and 60 μL of DEPC H2O. 30 μL was removed for the negative control. 1 μL of EndoV (NEB, M0305S) was added for a total volume of 170 μL. Cleavage was performed at 37° C. for 60 min, followed by a 20 min heat kill at 65° C. For time courses, reactions were removed from 37° C. at each time point and transferred to a separate 65° C. incubator for a 20 min heat kill. The RNase cocktail was prepared as described above. 10 μL of the cocktail was added directly to the cleavage reaction and incubated at 37° C. for 1 hour, followed by 10 min at 70° C. The supernatant was removed and diluted 1:2 in ultrapure H2O. The product was run on the ABI 3730 as in the RNA-templated ligation protocol.
Human total RNA extracted from Capan-1 (ATCC® HTB-79™) cells was diluted to 50 ng/μL and incubated with 0.1 μL Endonuclease V (10 U/μL) per 10 μL reaction for up to 60 min at 37° C. Samples were heat killed at 65° C. for 20 min following Endonuclease V digestion, and then immediately stored at −80° C. Samples from each timepoint were run on a Nano chip Agilent 2100 Bioanalyzer to inspect integrity via an electronic gel image.
For post-cleavage ligation, 500 pmol of a synthetic 5′ biotinylated RNA template and 1000 pmol of complementary DNA with an inosine (Cleavage substrate 18) were mixed (5 μM and 10 μM, respectively) in 2×SSC to a total volume of 50 μL. The mixture was incubated at 95° C. for 5 min, 60° C. for 10 min, and room temperature for 10 min. In the meantime, 100 μL of Dynabeads streptavidin M270 (ThermoFisher, 65306) per 100 μL volume of oligos was washed in 2×SSC 3-4 times depending on the volume of beads. The hybridized RNA:DNA duplex was added to the washed beads and gently agitated for 15 min at room temperature before being transferred to a magnetic stand for 2 min, after which the supernatant was removed. The beads were then resuspended in 100 μL of DEPC H2O. 50 μL of resuspended beads were combined with 11 μL of 10× NEBuffer 4 (NEB, M0305S) and 39 μL of DEPC H2O. 10 μL of the master mix was removed for the negative control. 10 μL of EndoV (NEB, M0305S) was added for a total volume of 100 μL. Cleavage was performed at 37° C. for 60 min, followed by a 20 min heat kill at 65° C. The reaction was then gently washed twice in 10 mM Tris and resuspended in 100 μL of 10 mM Tris. Beads were then split into two aliquots, 75 μL and 25 μL, for positive and negative controls, respectively. A SplintR master mix was prepared consisting of 40 μM per base A or T, 1.0 μM of SplintR Ligase (NEB, M0375L), and 1× SplintR buffer in a total volume of 20 μL per reaction. The master mix was added to the washed beads, mixed gently, and incubated at 37° C. for 60 min. Beads were then washed three times with 10 mM Tris buffer. RNase digestion and analysis followed the RNA-templated ligation protocol.
A 32-mer RNA polymer (RNA template) flanking different mutations of interest and a 5′ phosphorylated, 3′ biotinylated DNA oligo (sequencing primer) complementary to the hybridization site of the RNA template were mixed (5 μM and 10 μM, respectively) in 2×SSC to a total volume of 50 μL. The hybridization, ligation, and RNA digestion protocols are the same as described in the ProRSBL section. The ligated fragments attached to the beads are diluted 1:10,000, and 1 μL is added to the qPCR reaction mix containing 5 μL SYBR green (2×) and 200 μM each of qPCR adapter primer sequence. The qPCR cycles were set up as 95° C. for 5 min, 95° C. for 30 sec, 62° C. for 30 sec, 72° C. for 30 sec, 72° C. for 7 min, and a 4° C. hold.
Adherent immortalized human astrocytes (E6/E7 and hTERT) were cultured on a glass-bottom MatTek dish. The cells are fixed with 2 mL of 10% formalin in PBS for 15 min at 25° C. and washed three times with 2 mL of PBS. Following fixation, cells are permeabilized with 2 mL of 0.25% (vol/vol) Triton X-100 in DEPC-PBS for 10 min and washed three times with 2 mL of PBS. Cells are treated with 0.1 N HCl in DEPC-treated H2O for 10 min to improve permeabilization. The sequencing primer (2.5 μM) is added to the cells in 2×SSC containing 10% formamide and SUPERase In (ThermoFisher AM2694) (0.1 U) and incubated overnight at 60° C. in a humidified chamber. The cells are washed three times with 2 mL of 2×SSC with 10% formamide. The ligation mix is prepared by combining 20 μL of 10× SplintR ligase buffer, 5 μL of SplintR enzyme, 30 μL (10 μM) of each probe interrogating the mutant allele, and 10 μL (5 μM) of the wild-type probe. DEPC H2O is added to a total volume of 200 μL.
Ligation was performed at 37° C. for 2 hours, followed by three washes with 2 mL of 2×SSC containing 10% formamide. After ligation, the solution was aspirated, and 10 μL of DNase-free RNase and 5 μL of RNase H in 1× RNase H buffer were added and incubated for 1 hour at 37° C. The sample was rinsed twice with 2 mL of nuclease-free H2O to remove traces of phosphate. A CircLigase II (Lucigen CL9021K) reaction mixture was prepared on ice with 20 μL of 10× CircLigase II buffer, 10 μL (2.5 mM) of 50 mM MnCl2, 40 μL (0.5 M) of 5 M Betaine, 5 μL (1 U μL−1) of CircLigase II, and nuclease-free H2O to 200 μL. The master mix was added to the glass-bottom dish containing the sample.
Cells were incubated at 60° C. in a humidified chamber for 2 hours. The RCA primer (2.5 μM) in 200 μL of hybridization buffer containing 2×SSC with 30% formamide was added to the glass-bottom dish and incubated at 60° C. for 15 min. The primers were aspirated, and the sample was washed with 2×SSC with 10% formamide for 10 min at 60° C., followed by a wash with 2×SSC for 10 min at 60° C. The RCA reaction mixture was prepared on ice with 20 μL of φ29 10× buffer, 2 μL (250 μM) of 25 mM dNTPs (Enzymatics N2050L), 2 μL (40 μM) of 4 mM aminoallyl-dUTP (Anaspec AS-83203), 10 μL (1 U μL−1) of φ29 DNA polymerase (Enzymatics P7020-HC-L), and nuclease-free H2O to 200 μL. The master mix was added to the glass-bottom plate. The incubation was performed at 30° C. overnight.
To cross-link the aminoallyl-dUTP-containing cDNA molecules, the RCA reaction mix was aspirated, and 20 μL of reconstituted BS(PEG)9 diluted in 980 μL of PBS was added to the sample and incubated for 1 hour at room temperature. The sample was washed with PBS and incubated with 1 M Tris, pH 8.0, for 30 min to quench the remaining cross-linker. The Tris solution was aspirated, and the sample was incubated for 10 min at room temperature with 2.5 μM detection probe in 200 μL of 2×SSC preheated to 80° C. The sample was then washed three times for 10 min each with gentle shaking.
RCPs were quantified using 8-bit grayscale images of hybridized fluorescent detection oligos. The images were first filtered with a grayscale morphological erosion (Gray Morphology plugin 2.3.4 in Fiji) using a circular structuring element with a radius of 2 pixels, to remove speckles and non-RCP fluorescent signal.
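The filtering step can be illustrated with a NumPy-only sketch of grayscale erosion. This is not the Fiji plugin itself: the disk structuring element, the padding behaviour, the intensity threshold of 100, and the 4-connected spot counting after erosion are all assumptions made for illustration, and the helper names are hypothetical. Erosion replaces each pixel with the minimum over a disk of radius 2, so features smaller than the disk (single-pixel speckles) vanish while larger RCP spots keep a bright core.

```python
from collections import deque

import numpy as np


def disk_offsets(radius):
    """(dy, dx) offsets that fall inside a circle of the given pixel radius."""
    r = int(radius)
    return [(dy, dx)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius * radius]


def gray_erode(img, radius=2):
    """Grayscale erosion: each pixel becomes the minimum over a disk neighbourhood.

    Assumption: out-of-image pixels are padded with 255 so they never lower the minimum.
    """
    r = int(radius)
    h, w = img.shape
    padded = np.pad(img, r, mode="constant", constant_values=255)
    out = np.full_like(img, 255)
    for dy, dx in disk_offsets(radius):
        out = np.minimum(out, padded[r + dy:r + dy + h, r + dx:r + dx + w])
    return out


def count_spots(img, threshold):
    """Count 4-connected groups of pixels brighter than `threshold` (assumed counting rule)."""
    mask = img > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            count += 1
            seen[y, x] = True
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one spot
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return count


# Synthetic 8-bit image: two 9x9 "RCPs" plus two single-pixel speckles.
img = np.zeros((40, 40), dtype=np.uint8)
img[5:14, 5:14] = 200
img[25:34, 25:34] = 200
img[20, 5] = 255
img[2, 30] = 255
print(count_spots(img, 100), count_spots(gray_erode(img, 2), 100))  # 4 2
```

On the synthetic image, four bright objects are counted before erosion and two after it, because only the 9×9 spots contain a core that the radius-2 disk fits inside.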
The introduction of T4 RNA ligase 2 (Rnl2) and Chlorella virus DNA ligase (PBCV-1) in place of T4 DNA ligase has increased the efficiency of in situ RNA-templated DNA ligation (RTDL). Current RTDL methods nonetheless have several limitations: padlock probe (PLP) approaches rely on connected (dependent) oligo arms that must both anneal to the same target with similar kinetics. (
To increase ligation accuracy, we designed ProRSBL so that the sequencing primer anneals more stably to the RNA template than the competing probes do (Tm >60° C. and ~37° C., respectively). Using this design, we found that probe competition reduced random erroneous ligation from 18.5±1.6% to 4.0±0.1% (P-value=0.0038, Welch Two Sample t-test, t=15.994, df=2.0075) (
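The Welch statistic and its Welch-Satterthwaite degrees of freedom can be recomputed from summary values. The sketch below does this from the reported means and spreads; the text does not state the replicate count or whether "±" denotes a standard deviation, so n = 3 per group and "± = SD" are hypothetical assumptions, and `welch_t_from_summary` is an illustrative helper rather than the analysis actually used.

```python
import math


def welch_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from group means (m), standard deviations (s) and sample sizes (n)."""
    v1 = s1 ** 2 / n1  # squared standard error of group 1
    v2 = s2 ** 2 / n2  # squared standard error of group 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Erroneous ligation (%): 18.5 +/- 1.6 without competition, 4.0 +/- 0.1 with it.
# ASSUMPTION: "+/-" is an SD and n = 3 per group (not stated in the source).
t, df = welch_t_from_summary(18.5, 1.6, 3, 4.0, 0.1, 3)
print(f"t = {t:.2f}, df = {df:.2f}")  # t = 15.67, df = 2.02
```

Under these assumptions the sketch gives t ≈ 15.67 and df ≈ 2.02, close to the reported t = 15.994 and df = 2.0075. The remaining difference may reflect rounding of the reported summary values or a different replicate count. A df near 2 is expected whenever one group's variance dominates and each group has few replicates.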
To determine the read length attainable with ProRSBL, we designed probes with degenerate bases interrogating positions 1 to 4, 5 to 8 or 9 to 12 of an RNA template (
To enable probe cleavage and re-ligation for cyclic ProRSBL, we took advantage of Endonuclease V, which hydrolyzes the second or third phosphodiester bond downstream (3′) of an inosine base. Hydrolysis occurs predominantly at the second phosphodiester bond (95%), and this can be increased to 100% if the third bond is substituted with a phosphorothioate. Endonuclease V cleavage of DNA-templated SBL ligation products allows multiple ligation cycles to delineate longer sequences. To investigate the feasibility of cyclic RTDL, we first determined the cleavage kinetics of Endonuclease V on an inosine-bearing DNA:RNA hybrid and found that cleavage exceeded 75% within 30 minutes (
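Two quantitative points in this paragraph can be made concrete: where Endonuclease V cuts relative to the inosine, and what "more than 75% within 30 minutes" implies about the cleavage rate. The sketch below is illustrative only. The probe sequence is hypothetical ("I" marks inosine, written 5′→3′), and the rate calculation assumes simple first-order kinetics, 1 − e^(−kt) = fraction cleaved, which the source does not state.

```python
import math


def endov_cleave(probe, bond=2):
    """Split a 5'->3' probe at the `bond`-th phosphodiester bond 3' of its first inosine ('I').

    Bond k lies between nucleotides i+k-1 and i+k, where i is the inosine index;
    Endonuclease V cuts mainly at bond 2.
    """
    i = probe.index("I")
    cut = i + bond
    if cut >= len(probe):
        raise ValueError("cleavage site lies beyond the 3' end of the probe")
    return probe[:cut], probe[cut:]


def first_order_rate(fraction_cleaved, minutes):
    """Rate constant k (per min) satisfying 1 - exp(-k * t) = fraction_cleaved (assumed model)."""
    return -math.log(1.0 - fraction_cleaved) / minutes


print(endov_cleave("ACGIATGC"))  # ('ACGIA', 'TGC')

k = first_order_rate(0.75, 30)   # lower bound, since cleavage *exceeded* 75%
half_life = math.log(2) / k      # upper bound on the half-life, in minutes
print(f"k >= {k:.4f} per min, half-life <= {half_life:.1f} min")
```

Under the first-order assumption, 75% cleavage corresponds to exactly two half-lives. Observing more than 75% at 30 minutes therefore implies a half-life of at most about 15 minutes (k ≥ ln 4 / 30 ≈ 0.046 min⁻¹).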
Rare variants (e.g., SNPs, alleles, and indels) are often masked by abundant wild-type sequences. To avoid detecting non-informative sequences, we programmed probes to sequence only the variants of interest. Our probe-programming algorithm applies a NOT logic gate at an interrogated base to exclude ligation products formed on an undesired sequence (
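One way to picture a NOT gate at a single interrogated base uses IUPAC degenerate codes. Each code for "not X" (B = not A, D = not C, H = not G, V = not T) lets a probe pool carry every base except the one that would pair with the wild-type template base. The sketch below is a hypothetical representation, not the authors' algorithm. The helper names, the example probe, and the choice of writing the interrogated position as an index into the probe string are all assumptions.

```python
# IUPAC "not X" codes and the concrete bases each degenerate code stands for.
IUPAC_NOT = {"A": "B", "C": "D", "G": "H", "T": "V"}
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
               "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
# DNA probe base that pairs with each RNA template base.
RNA_TO_PROBE = {"A": "T", "U": "A", "G": "C", "C": "G"}


def not_gate_probe(probe, pos, wildtype_rna_base):
    """Replace probe[pos] with the IUPAC code that excludes the wild-type-pairing base."""
    excluded = RNA_TO_PROBE[wildtype_rna_base]
    return probe[:pos] + IUPAC_NOT[excluded] + probe[pos + 1:]


def expand(degenerate):
    """List every concrete sequence encoded by a degenerate (IUPAC) probe."""
    seqs = [""]
    for ch in degenerate:
        seqs = [s + b for s in seqs for b in IUPAC_BASES[ch]]
    return seqs


# Hypothetical probe whose first base interrogates a site with wild-type RNA base G.
programmed = not_gate_probe("CCTG", 0, "G")
print(programmed, expand(programmed))  # DCTG ['ACTG', 'GCTG', 'TCTG']
```

In this representation, the probe variant that would pair with the wild-type base (here C opposite G) is simply absent from the pool, so a ligation product can only form on a variant template.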
ProRSBL overcomes the two-body and dual-search problems inherent in current RTDL methods. Moreover, the option of cleavage and re-ligation allows multiple variants in close proximity on the same RNA to be genotyped. One of the most clinically relevant applications of ProRSBL will be the detection of early or disseminated cancer cells. However, we expect the main strength of ProRSBL to lie in the versatility of programmable probes. Each programmed probe can make a logical statement (AND, OR, NOT), and these statements could be integrated (e.g., via in situ PCR stitching or primer exchange reaction) to assemble a complex statement about the cell in situ. For example, we envision labeling cells based on conditions such as ‘gene x is expressed, not gene y, but only in the presence of mutation z’ (
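As a computational analogy only, and not the in situ molecular implementation, the example condition can be written as a composition of AND, OR and NOT predicates over the set of probe readouts detected in each cell. The signal labels and cell data below are hypothetical.

```python
from typing import Callable, FrozenSet

Predicate = Callable[[FrozenSet[str]], bool]


def has(signal: str) -> Predicate:
    """True when a cell's readouts include `signal`."""
    return lambda s: signal in s


def AND(*ps: Predicate) -> Predicate:
    return lambda s: all(p(s) for p in ps)


def OR(*ps: Predicate) -> Predicate:
    return lambda s: any(p(s) for p in ps)


def NOT(p: Predicate) -> Predicate:
    return lambda s: not p(s)


# 'gene x is expressed, not gene y, but only in the presence of mutation z'
rule = AND(has("gene_x"), NOT(has("gene_y")), has("mutation_z"))

cells = {  # hypothetical per-cell readouts
    "cell_1": frozenset({"gene_x", "mutation_z"}),
    "cell_2": frozenset({"gene_x", "gene_y", "mutation_z"}),
    "cell_3": frozenset({"gene_x"}),
}
print([c for c, s in cells.items() if rule(s)])  # ['cell_1']
```

Each combinator mirrors one logical operation that a programmed probe could contribute. Nesting them shows how simple per-probe statements compose into a single label for the cell.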
This application claims priority of U.S. Provisional Application No. 62/731,708, filed Sep. 14, 2018, the contents of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/051184 | 9/13/2019 | WO | 00 |
Number | Date | Country |
---|---|---|
62731708 | Sep 2018 | US |