Protein-nucleic acid interactions are involved in many cellular functions, including transcription, RNA splicing, mRNA decay, and mRNA translation. Readily accessible synthetic molecules that can bind with high affinity to specific sequences and structural components of single- or double-stranded nucleic acids have the potential to interfere with these interactions in a controllable way, making them attractive tools for molecular biology and medicine.
The human transcriptome is composed of a vast RNA population that undergoes further diversification by splicing. Genome-wide studies highlight that 90% of genes are alternatively spliced in humans, making splicing one of the main drivers of proteomic diversity and, consequently, a determinant of cellular function. Unsurprisingly, given the extent of alternative splicing, numerous splice isoforms have been associated with diseases, including cancer. Interestingly, many of the splice isoforms involved in cancers are derived from the same gene and have antagonistic functions, e.g., pro- and anti-angiogenic, or pro- and anti-apoptotic (in their translated protein form). Thus, splicing could drive key regulatory processes that switch a cell from a non-cancerous to a cancerous state.
In addition, mutations affecting mRNA expression have been shown to account for up to half of all disease-causing gene alterations. This potentially represents the most frequent cause of hereditary disease. The most common consequence of these mutations is exon skipping. Detecting specific splice sites in this large sequence pool is the responsibility of the major and minor spliceosomes in collaboration with hundreds of additional splicing factors. Outside of the core splice site motifs, the bulk of the information required for splicing is thought to be contained in exonic and intronic cis-regulatory elements that function by recruiting sequence-specific RNA-binding protein factors that either activate or repress the use of adjacent splice sites. This complexity makes splicing susceptible to sequence polymorphisms and deleterious mutations. Beyond this, the complex and dynamic process of splicing may require several key interactions to take place at particular points in time. Indeed, RNA mis-splicing underlies a growing number of human diseases with substantial societal consequences.
However, targeting RNA splicing, and more specifically targeting RNA, is intractable because limited data are available on: 2-dimensional and 3-dimensional structures of RNA; chemotypes that engender RNA binding affinity or selectivity; chemotypes that engender RNA binding affinity and selectivity at particular mRNA splicing hot spots; and RNA structural elements that form small molecule binding pockets. Screening small molecule libraries for binding to RNA targets could generate data about chemotypes that engender RNA binding. However, few small molecule screening collections are enriched in RNA binders; in fact, most libraries are biased toward compounds that bind to proteins. In addition, several of the available RNA binder libraries are not specific or selective for particular RNAs. To address these needs and others, the present disclosure in various embodiments provides a structure-based screening platform that can be used to identify small molecules that bind to RNA and/or an RNA-protein complex, design novel molecules that can fit into particular RNA binding pockets, and improve the specificity and selectivity of small molecules towards disease-associated pre-mRNA splicing defects.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
In some aspects, the present disclosure provides a method comprising: providing a polynucleotide sample comprising a target polynucleotide; contacting to the target polynucleotide a first binding agent, a second binding agent, or both; wherein the target polynucleotide and the first binding agent form a first complex, wherein the second binding agent and the first complex form a second complex; and obtaining a nuclear magnetic resonance (NMR) spectrum of the first complex, the second complex, or both using a NMR device. In some embodiments, the target polynucleotide is a target ribonucleic acid (RNA). In some embodiments, the target RNA is a precursor messenger RNA (pre-mRNA) or a portion thereof. In some embodiments, the target polynucleotide contains a splice site or a portion thereof. In some embodiments, the splice site is a 5′ splice site, a cryptic 5′ splice site, a 3′ splice site, or a cryptic 3′ splice site, or any combinations thereof. In some embodiments, the target polynucleotide contains a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof. In some embodiments, the target polynucleotide contains at least one intron or a fragment thereof. In some embodiments, the target polynucleotide contains at least one exon or a fragment thereof. In some embodiments, the target polynucleotide contains at least one exon-intron boundary. In some embodiments, the target polynucleotide is at least 8 nucleotides in length. In some embodiments, the target polynucleotide is at least 25 nucleotides in length. In some embodiments, the target polynucleotide is at most 1000 nucleotides in length. In some embodiments, the target polynucleotide is from 100 to 200 nucleotides in length. 
In some embodiments, the target polynucleotide comprises none or at least one nucleotide isotopically labeled with one or more atomic labels comprising 2H, 13C, 15N, 19F and 31P. In some embodiments, the first binding agent comprises a first polynucleotide, a first polypeptide, or a combination thereof. In some embodiments, the first polynucleotide is a first RNA. In some embodiments, the first RNA is a small nuclear RNA (snRNA) or a portion thereof. In some embodiments, the snRNA is U1 snRNA, U2 snRNA, U4 snRNA, U5 snRNA, U6 snRNA, U11 snRNA, U12 snRNA, U4atac snRNA, U5 snRNA, U6atac snRNA; or a portion thereof. In some embodiments, the first polypeptide is a protein component of a ribonucleoprotein or a portion thereof. In some embodiments, the ribonucleoprotein is a small nuclear ribonucleoprotein (snRNP) or a portion thereof. In some embodiments, the snRNP is U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP; or a portion thereof. In some embodiments, the first polypeptide is a protein or a portion thereof selected from the group comprising 9G8, A1 hnRNP, A2 hnRNP, ASD-1, ASD-2b, ASF, B1 hnRNP, C1 hnRNP, C2 hnRNAP, CBP20, CBP80, CELF, F hnRNP, FBP11, Fox-1, Fox-2, G hnRNP, H hnRNP, hnRNP 1, hnRNP 3, hnRNP C, hnRNP G, hnRNP K, hnRNP M, hnRNP U, Hu, HUR, I hnRNP, K hnRNP, KH-type splicing regulatory protein (KSRP), L hnRNP, M hnRNP, mBBP, muscle-blind like (MBNL), NF45, NFAR, Nova-1, Nova-2, nPTB, P54/SFRS11, polypyrimidine tract binding protein (PTB), PRP19 complex proteins, R hnRNP, RNPC1, SAM68, SC35, SF, SF1/BBP, SF2, SF3A, SF3B, SFRS10, Sm proteins, SR proteins, SRm300, SRp20, SRp30c, SRP35C, SRP36, SRP38, SRp40, SRp55, SRp75, SRSF, STAR, GSG, SUP-12, TASR-1, TASR-2, TIA, TIAR, TRA2, TRA2a/b, U hnRNP, U1 snRNP, U11 snRNP, U12 snRNP, U1-C, U2 snRNP, U2AF1-RS2, U2AF35, U2AF65, U4 snRNP, U5 snRNP, U6 snRNP, Urp, YB1, or any combination thereof. 
In some embodiments, the second binding agent is a small molecule. In some embodiments, the first binding agent comprises a small molecule. In some embodiments, the second binding agent comprises a second polynucleotide, a second polypeptide, or a combination thereof. In some embodiments, the second polynucleotide is a second RNA. In some embodiments, the second RNA is a small nuclear RNA (snRNA) or a portion thereof. In some embodiments, the snRNA is U1 snRNA, U2 snRNA, U4 snRNA, U5 snRNA, U6 snRNA, U11 snRNA, U12 snRNA, U4atac snRNA, U5 snRNA, U6atac snRNA; or a portion thereof. In some embodiments, the second polypeptide is a protein component of a ribonucleoprotein or a portion thereof. In some embodiments, the ribonucleoprotein is a small nuclear ribonucleoprotein (snRNP) or a portion thereof. In some embodiments, the snRNP is U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP or a portion thereof. In some embodiments, the second polypeptide is a protein or a portion thereof selected from the group comprising 9G8, A1 hnRNP, A2 hnRNP, ASD-1, ASD-2b, ASF, B1 hnRNP, C1 hnRNP, C2 hnRNAP, CBP20, CBP80, CELF, F hnRNP, FBP11, Fox-1, Fox-2, G hnRNP, H hnRNP, hnRNP 1, hnRNP 3, hnRNP C, hnRNP G, hnRNP K, hnRNP M, hnRNP U, Hu, HUR, I hnRNP, K hnRNP, KH-type splicing regulatory protein (KSRP), L hnRNP, M hnRNP, mBBP, muscle-blind like (MBNL), NF45, NFAR, Nova-1, Nova-2, nPTB, P54/SFRS11, polypyrimidine tract binding protein (PTB), PRP19 complex proteins, R hnRNP, RNPC1, SAM68, SC35, SF, SF1/BBP, SF2, SF3A, SF3B, SFRS10, Sm proteins, SR proteins, SRm300, SRp20, SRp30c, SRP35C, SRP36, SRP38, SRp40, SRp55, SRp75, SRSF, STAR, GSG, SUP-12, TASR-1, TASR-2, TIA, TIAR, TRA2, TRA2a/b, U hnRNP, U1 snRNP, U11 snRNP, U12 snRNP, U1-C, U2 snRNP, U2AF1-RS2, U2AF35, U2AF65, U4 snRNP, U5 snRNP, U6 snRNP, Urp, YB1, or any combination thereof. In some embodiments, the first complex comprises a binding pocket. 
In some embodiments, the binding pocket comprises a bulge, or a mutation, or a stem-loop, or any combinations thereof. In some embodiments, the binding pocket does not comprise a bulge, a mutation, or a stem-loop. In some embodiments, the bulge or the mutation causes a 3-dimensional structural change in the first polynucleotide. In some embodiments, the second binding agent binds to the binding pocket. In some embodiments, the target polynucleotide comprises a sequence encoded by a gene or a gene variant thereof selected from the group consisting of ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CD46, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, and USH2A.
In some embodiments, a first NMR spectrum is obtained for the first complex, and a second NMR spectrum is obtained for the second complex. In some embodiments, the method further comprises comparing the first and the second NMR spectrum. In some embodiments, the method further comprises selecting a second binding agent based on a comparison of the first and the second NMR spectrum. In some embodiments, the method further comprises determining a chemical shift of the first and the second NMR spectrums.
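The comparison of the first and the second NMR spectrum described above can be sketched as a per-atom chemical shift difference. The following is a minimal illustration only, assuming that each spectrum has already been reduced to an assigned peak list; the atom labels, the shift values, and the 0.05 ppm cutoff are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List

def chemical_shift_changes(first: Dict[str, float],
                           second: Dict[str, float]) -> Dict[str, float]:
    """Return second-minus-first shift (ppm) for atoms assigned in both peak lists."""
    shared = first.keys() & second.keys()
    return {atom: second[atom] - first[atom] for atom in sorted(shared)}

def perturbed_atoms(first: Dict[str, float],
                    second: Dict[str, float],
                    threshold_ppm: float = 0.05) -> List[str]:
    """List atoms whose shift moved by at least threshold_ppm (an assumed cutoff)."""
    changes = chemical_shift_changes(first, second)
    return [atom for atom, delta in changes.items() if abs(delta) >= threshold_ppm]

# Hypothetical imino proton shifts (ppm) for the first and the second complex.
first_complex = {"G3.H1": 12.40, "U5.H3": 13.80, "G7.H1": 11.95}
second_complex = {"G3.H1": 12.41, "U5.H3": 13.62, "G7.H1": 12.10}

print(perturbed_atoms(first_complex, second_complex))
```

In this sketch, atoms whose shifts change on addition of the second binding agent are reported as candidates for the binding site; a second binding agent could then be selected according to whether such changes are observed.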
In some aspects, the present disclosure provides a method comprising: providing a polynucleotide sample comprising a target polynucleotide, wherein the target polynucleotide comprises a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof; contacting with the target polynucleotide a first binding agent; and obtaining a first NMR spectrum of the polynucleotide sample using a NMR device. In some embodiments, the target polynucleotide is a target RNA. In some embodiments, the target polynucleotide is a pre-mRNA or a portion thereof. In some embodiments, the target polynucleotide contains at least one exon or a fragment thereof. In some embodiments, the target polynucleotide contains at least one intron or a fragment thereof. In some embodiments, the target polynucleotide contains an exon-intron boundary. In some embodiments, the target polynucleotide contains a splice site. In some embodiments, the splice site is a 5′ splice site, a cryptic 5′ splice site, 3′ splice site, or a cryptic 3′ splice site, or a portion thereof. In some embodiments, the target polynucleotide is at least 8 nucleotides in length. In some embodiments, the target polynucleotide is at least 25 nucleotides in length. In some embodiments, the target polynucleotide is at most 1000 nucleotides in length. In some embodiments, the target polynucleotide comprises none or at least one nucleotide isotopically labeled with one or more atomic labels comprising 2H, 13C, 15N, 19F and 31P. In some embodiments, the first binding agent comprises a first polynucleotide, a first polypeptide, or a combination thereof. In some embodiments, the first polynucleotide is a first RNA. In some embodiments, the first RNA is a small nuclear RNA (snRNA) or a portion thereof. 
In some embodiments, the snRNA is U1 snRNA, U2 snRNA, U4 snRNA, U5 snRNA, U6 snRNA, U11 snRNA, U12 snRNA, U4atac snRNA, U5 snRNA, U6atac snRNA; or a portion thereof. In some embodiments, the first polypeptide is a protein component of a ribonucleoprotein or a portion thereof. In some embodiments, the ribonucleoprotein is a small nuclear ribonucleoprotein (snRNP) or a portion thereof. In some embodiments, the snRNP is U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP or a portion thereof. In some embodiments, the first polypeptide is a protein or a portion thereof selected from the group comprising 9G8, A1 hnRNP, A2 hnRNP, ASD-1, ASD-2b, ASF, B1 hnRNP, C1 hnRNP, C2 hnRNAP, CBP20, CBP80, CELF, F hnRNP, FBP11, Fox-1, Fox-2, G hnRNP, H hnRNP, hnRNP 1, hnRNP 3, hnRNP C, hnRNP G, hnRNP K, hnRNP M, hnRNP U, Hu, HUR, I hnRNP, K hnRNP, KH-type splicing regulatory protein (KSRP), L hnRNP, M hnRNP, mBBP, muscle-blind like (MBNL), NF45, NFAR, Nova-1, Nova-2, nPTB, P54/SFRS11, polypyrimidine tract binding protein (PTB), PRP19 complex proteins, R hnRNP, RNPC1, SAM68, SC35, SF, SF1/BBP, SF2, SF3A, SF3B, SFRS10, Sm proteins, SR proteins, SRm300, SRp20, SRp30c, SRP35C, SRP36, SRP38, SRp40, SRp55, SRp75, SRSF, STAR, GSG, SUP-12, TASR-1, TASR-2, TIA, TIAR, TRA2, TRA2a/b, U hnRNP, U1 snRNP, U11 snRNP, U12 snRNP, U1-C, U2 snRNP, U2AF1-RS2, U2AF35, U2AF65, U4 snRNP, U5 snRNP, U6 snRNP, Urp, YB1, or any combination thereof. In some embodiments, the target polynucleotide and the first binding agent form a first complex. In some embodiments, the first complex comprises a binding pocket. In some embodiments, the binding pocket comprises a bulge, or a mutation, or a stem-loop, or any combinations thereof. In some embodiments, the binding pocket does not comprise a bulge, a mutation, or a stem-loop. In some embodiments, the bulge or the mutation causes a 3-dimensional structural change in the first polynucleotide. 
In some embodiments, the method further comprises contacting with the first complex a second binding agent. In some embodiments, the second binding agent comprises one or more molecules selected from a group comprising a polynucleotide, a polypeptide, a protein, a small molecule, an ion, a salt, and an atom. In some embodiments, the second binding agent is a small molecule. In some embodiments, the small molecule is a library of small molecules. In some embodiments, the method further comprises obtaining a second NMR spectrum after contacting with the first complex the second binding agent. In some embodiments, the method further comprises comparing the first and the second NMR spectrum. In some embodiments, the method further comprises determining a chemical shift of the one or more atoms from the first and the second NMR spectrums. In some embodiments, the target polynucleotide comprises a sequence encoded by a gene or a gene variant thereof selected from the group consisting of ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CD46, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, and USH2A.
In some aspects, the present disclosure provides a method for selecting a binding agent to a polynucleotide, the method comprising: (a) providing a polynucleotide sample comprising a target polynucleotide; (b) obtaining a first NMR spectrum of the polynucleotide sample using a NMR device; (c) contacting with the polynucleotide sample a binding agent; (d) obtaining a second NMR spectrum of the polynucleotide sample after contacting with the binding agent; and (e) comparing the first and the second NMR spectrum; and (f) selecting the binding agent based on the comparison. In some embodiments, the binding agent comprises a small molecule, a polynucleotide, or a polypeptide, or any combinations thereof. In some embodiments, the binding agent comprises a library of small molecules. In some embodiments, the polynucleotide sample further comprises a first polynucleotide. In some embodiments, the target polynucleotide and the first polynucleotide are added with about equimolar amounts. In some embodiments, the first polynucleotide is a first RNA. In some embodiments, the first RNA is a small nuclear RNA (snRNA) or a portion thereof. In some embodiments, the snRNA is U1, U2, U4, U5, U6, U11, U12, U4atac, U5, or U6atac snRNA; or a portion thereof. In some embodiments, the target and the first polynucleotide form a duplex. In some embodiments, the duplex contains a binding pocket. In some embodiments, the binding pocket comprises a bulge, or a mutation, or a stem-loop, or any combinations thereof. In some embodiments, the binding pocket does not comprise a bulge, a mutation, or a stem-loop. In some embodiments, the target polynucleotide comprises a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or a portion thereof. In some embodiments, the target polynucleotide contains at least one exon or a fragment thereof. 
In some embodiments, the target polynucleotide contains at least one intron or a fragment thereof. In some embodiments, the target polynucleotide contains at least one exon-intron boundary. In some embodiments, the target polynucleotide is at least 8 nucleotides in length. In some embodiments, the target polynucleotide is at least 25 nucleotides in length. In some embodiments, the target polynucleotide is at most 1000 nucleotides in length. In some embodiments, the target polynucleotide is from 100 to 200 nucleotides in length. In some embodiments, the target polynucleotide comprises none or at least one nucleotide isotopically labeled with one or more atomic labels comprising 2H, 13C, 15N, 19F and 31P. In some embodiments, the method further comprises determining a chemical shift of the first or the second NMR spectrum. In some embodiments, the method further comprises determining a 3-dimensional atomic resolution structure of the polynucleotide and the bound small molecule. In some embodiments, the 3-dimensional atomic resolution structure is determined by structure prediction software. In some embodiments, the structure prediction software is Amos/Candid-program suite. In some embodiments, the structure prediction software is MC-fold|MC-Sym pipeline. In some embodiments, determining the 3-dimensional atomic resolution structure comprises generating a plurality of theoretical structural polynucleotide 2-dimensional models using the nucleotide sequence and one or more 2-dimensional structure prediction algorithms. In some embodiments, the method further comprises generating a plurality of theoretical structural polynucleotide 3-dimensional models using a 3-dimensional structure predicting algorithm using the plurality of theoretical structural polynucleotide 2-dimensional models and optionally one or more known and/or assumed polynucleotide 2-dimensional models. 
In some embodiments, the method further comprises generating a predicted chemical shift set for each of the plurality of theoretical structural polynucleotide 3-dimensional models. In some embodiments, the method further comprises comparing the predicted chemical shift set to the chemical shift(s). In some embodiments, the method further comprises selecting one or more theoretical structural polynucleotide 3-dimensional models having an agreement between the respective predicted chemical shift set and the chemical shift(s) as the one or more 3-dimensional atomic resolution structures. In some embodiments, the 2-dimensional structure prediction algorithm is a nearest neighbor algorithm. In some embodiments, the method further comprises the step: generating one or more refined 3-dimensional atomic resolution structures by refining the selected one or more theoretical structural polynucleotide 3-dimensional model using a modeling software that performs one or more functions comprising energy minimization and/or a molecular dynamics simulation. In some embodiments, the predicted chemical shift set is generated by comparing each theoretical structural polynucleotide 3-dimensional model with a NMR data-structure database. In some embodiments, generating the predicted chemical shift set comprises calculating a polynucleotide structural metric comprising atomic coordinates, stacking interactions, magnetic susceptibility, electromagnetic fields, or dihedral angles from one or more experimentally determined polynucleotide 3-dimensional structures. In some embodiments, the method further comprises using a regression algorithm to generate a set of mathematical functions or objects that describe relationships between experimental chemical shifts and the polynucleotide structural metric of the experimentally determined 3-dimensional polynucleotide structures. 
In some embodiments, the method further comprises calculating a polynucleotide structural metric for each of the theoretical structural polynucleotide 3-dimensional models. In some embodiments, the method further comprises inputting the polynucleotide structural metric for each of the theoretical structural polynucleotide 3-dimensional models into the set of mathematical functions or objects to generate the predicted chemical shift set. In some embodiments, the regression algorithm is a machine learning algorithm comprising a Random Forest algorithm. In some embodiments, the NMR spectrum is obtained with a NMR spectrometer frequency ranging from about 1 GHz to about 20 MHz. In some embodiments, the NMR spectrum is obtained with a NMR spectrometer frequency ranging from 500 MHz to 900 MHz. In some embodiments, the NMR device is AVANCE III. In some embodiments, the method further comprises determining a binding kinetics of a snRNA binding to the target polynucleotide with or without the binding agent selected from the step (f). In some embodiments, the method further comprises determining a binding kinetics of a snRNP binding to the target polynucleotide with or without the binding agent selected from the step (f). In some embodiments, the method further comprises comparing the binding kinetics determined with and without the binding agent selected from step (f). In some embodiments, the method further comprises selecting a first small molecule and a second small molecule. In some embodiments, the method further comprises determining a first binding kinetics of a snRNA binding to the target polynucleotide with or without the first small molecule, and a second binding kinetics of the snRNA binding to the target polynucleotide with or without the second small molecule. In some embodiments, the method further comprises comparing the first binding kinetics and the second binding kinetics. 
In some embodiments, the binding kinetics is determined by surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy. In some embodiments, the method comprises determining a 2-dimensional model or a 3-dimensional structure of the first small molecule and the second small molecule. In some embodiments, the method comprises comparing the 2-dimensional model or the 3-dimensional structure of the first and the second small molecule.
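The structure-determination embodiments of this aspect select, from a plurality of theoretical 3-dimensional models, those whose predicted chemical shift set agrees with the measured chemical shifts. One way this selection could be sketched is shown below. The sketch assumes agreement is scored as a root-mean-square deviation over shared atoms, which is an illustrative choice and not a metric specified by the disclosure; the model names, atom labels, and shift values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def shift_rmsd(predicted: Dict[str, float], measured: Dict[str, float]) -> float:
    """Root-mean-square deviation (ppm) over atoms present in both shift sets."""
    shared = predicted.keys() & measured.keys()
    if not shared:
        raise ValueError("predicted and measured shift sets share no atoms")
    squared = sum((predicted[a] - measured[a]) ** 2 for a in shared)
    return math.sqrt(squared / len(shared))

def rank_models(predicted_sets: Dict[str, Dict[str, float]],
                measured: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (model name, RMSD) pairs sorted from best to worst agreement."""
    scores = [(name, shift_rmsd(pred, measured)) for name, pred in predicted_sets.items()]
    return sorted(scores, key=lambda item: item[1])

# Hypothetical measured shifts and predicted shift sets for three candidate models.
measured = {"G3.H1": 12.40, "U5.H3": 13.80, "A6.H2": 7.60}
predicted_sets = {
    "model_A": {"G3.H1": 12.10, "U5.H3": 13.20, "A6.H2": 7.90},
    "model_B": {"G3.H1": 12.38, "U5.H3": 13.75, "A6.H2": 7.65},
    "model_C": {"G3.H1": 12.60, "U5.H3": 13.95, "A6.H2": 7.40},
}

best_name, best_rmsd = rank_models(predicted_sets, measured)[0]
print(best_name, round(best_rmsd, 3))
```

The top-ranked model or models could then be passed to the refinement step (energy minimization and/or molecular dynamics) described in the same aspect.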
In some aspects, the present disclosure provides a method comprising: identifying one or more binding pockets formed by a target polynucleotide and a first polynucleotide, wherein the target polynucleotide contains a sequence of a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof; and virtually screening one or more small molecules or fragments thereof against the one or more binding pockets, wherein the virtual screening process identifies putative small molecule or fragment hits. In some embodiments, identifying one or more binding pockets comprises solving a 3-dimensional atomic resolution structure comprising the target polynucleotide and the first polynucleotide. In some embodiments, the 3-dimensional atomic resolution structure is determined by a NMR spectrum. In some embodiments, the method further comprises testing one or more small molecule or fragment hits from the virtual screen using an experimental assay. In some embodiments, the experimental assay is surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy. In some embodiments, the target polynucleotide is a RNA. In some embodiments, the target polynucleotide is a pre-mRNA. In some embodiments, the splice site is a 5′ splice site, a cryptic 5′ splice site, a 3′ splice site, or a cryptic 3′ splice site. In some embodiments, the target polynucleotide contains at least one intron or a fragment thereof. In some embodiments, the target polynucleotide contains at least one exon or a fragment thereof. In some embodiments, the target polynucleotide contains at least one exon-intron boundary. In some embodiments, the target polynucleotide is at least 8 nucleotides in length. 
In some embodiments, the target polynucleotide is at least 25 nucleotides in length. In some embodiments, the target polynucleotide is at most 1000 nucleotides in length. In some embodiments, the target polynucleotide is from 100 to 200 nucleotides in length. In some embodiments, the target polynucleotide comprises a sequence encoded by a gene or a gene variant thereof selected from the group consisting of ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CD46, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, and USH2A. In some embodiments, the method further comprises identifying a first putative small molecule or fragment hit and a second putative small molecule or fragment hit. In some embodiments, the method further comprises determining a first binding kinetics of the first putative small molecule or fragment hit binding to the target polynucleotide, and a second binding kinetics of the second putative small molecule or fragment hit binding to the target polynucleotide. In some embodiments, the method further comprises comparing the first binding kinetics and the second binding kinetics, thereby selecting a stronger small molecule or fragment hit. 
In some embodiments, the binding kinetics are determined using surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy.
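The comparison of a first and a second binding kinetics to select a stronger hit can be illustrated with a short sketch. It assumes a simple 1:1 binding model in which each hit is summarized by an association rate constant and a dissociation rate constant (as might be fitted from SPR or BLI data), and it assumes that "stronger" means a lower equilibrium dissociation constant K_D = k_off / k_on. Both assumptions, as well as the class name, function name, and numerical values, are illustrative and are not specified by the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Kinetics:
    """Rate constants for one hit under an assumed 1:1 binding model."""
    name: str
    k_on: float   # association rate constant, 1/(M*s)
    k_off: float  # dissociation rate constant, 1/s

    @property
    def k_d(self) -> float:
        """Equilibrium dissociation constant in M (k_off / k_on)."""
        return self.k_off / self.k_on

def stronger_hit(first: Kinetics, second: Kinetics) -> Kinetics:
    """Return the hit with the lower K_D; ties go to the first hit."""
    return first if first.k_d <= second.k_d else second

# Hypothetical rate constants for two putative hits.
hit_1 = Kinetics("hit_1", k_on=1.0e5, k_off=1.0e-2)  # K_D = 1e-7 M
hit_2 = Kinetics("hit_2", k_on=2.0e5, k_off=1.0e-1)  # K_D = 5e-7 M

print(stronger_hit(hit_1, hit_2).name)
```

Other selection criteria, such as a slower dissociation rate alone, would be equally consistent with the text; the K_D comparison is used here only because it is compact.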
In some aspects, the present disclosure provides a method of selecting a binding agent to a target polynucleotide, comprising: contacting to a sample containing the target polynucleotide a binding agent, wherein the target polynucleotide contains a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof; obtaining a structure of the binding agent and the target polynucleotide in a first assay; obtaining a binding kinetics of the binding agent in a second assay; and selecting the binding agent based on the structure and the binding kinetics. In some embodiments, the first assay and the second assay are the same. In some embodiments, the first assay and the second assay are NMR. In some embodiments, the first assay is NMR, and the second assay is surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy. In some embodiments, the binding agent is a small molecule. In some embodiments, the sample further comprises a first polynucleotide. In some embodiments, the first polynucleotide is a RNA.
In some embodiments, the RNA is a small nuclear RNA (snRNA) or a portion thereof. In some embodiments, the snRNA is U1, U2, U4, U5, U6, U11, U12, U4atac, U5, or U6atac snRNA; or a portion thereof. In some embodiments, the target and the first polynucleotide form a duplex. In some embodiments, the duplex contains a binding pocket. In some embodiments, the binding pocket comprises a bulge, or a mutation, or a stem-loop, or any combinations thereof. In some embodiments, the binding pocket does not comprise a bulge, a mutation, or a stem-loop. In some embodiments, the sample further comprises a protein or a portion thereof. In some embodiments, the protein is a ribonucleoprotein. In some embodiments, the ribonucleoprotein is a small nuclear ribonucleoprotein (snRNP) or a portion thereof. In some embodiments, the snRNP is U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP or a portion thereof. In some embodiments, the protein is selected from the group comprising 9G8, A1 hnRNP, A2 hnRNP, ASD-1, ASD-2b, ASF, B1 hnRNP, C1 hnRNP, C2 hnRNAP, CBP20, CBP80, CELF, F hnRNP, FBP11, Fox-1, Fox-2, G hnRNP, H hnRNP, hnRNP 1, hnRNP 3, hnRNP C, hnRNP G, hnRNP K, hnRNP M, hnRNP U, Hu, HUR, I hnRNP, K hnRNP, KH-type splicing regulatory protein (KSRP), L hnRNP, M hnRNP, mBBP, muscle-blind like (MBNL), NF45, NFAR, Nova-1, Nova-2, nPTB, P54/SFRS11, polypyrimidine tract binding protein (PTB), PRP19 complex proteins, R hnRNP, RNPC1, SAM68, SC35, SF, SF1/BBP, SF2, SF3A, SF3B, SFRS10, Sm proteins, SR proteins, SRm300, SRp20, SRp30c, SRP35C, SRP36, SRP38, SRp40, SRp55, SRp75, SRSF, STAR, GSG, SUP-12, TASR-1, TASR-2, TIA, TIAR, TRA2, TRA2a/b, U hnRNP, U1 snRNP, U11 snRNP, U12 snRNP, U1-C, U2 snRNP, U2AF1-RS2, U2AF35, U2AF65, U4 snRNP, U5 snRNP, U6 snRNP, Urp, YB1, or any combination thereof.
In some embodiments, the target polynucleotide comprises GGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagc, AGA/gugagu, AGA/gugagu, GGA/gugagu, CGA/guccgu, GGA/guaagu, GGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaaga, AGA/guaagu, AGA/guaagu, AGA/guaagu, GGA/guaagu, AGA/guaagg, AGA/guaagu, AGA/guaagu, AGA/guaagu, GGA/guaagu, AGA/guaaga, AGA/guaagu, AGA/guaagu, AGA/guaagu, GGA/guaagg, AGA/guaagu, AGA/guaagu, GGA/guaagu, AGA/guaagu, AGA/guaaga, AGA/guaagu, AGA/guagau, UGA/gugaau, GGA/guuagu, AGA/guaggu, AGA/guaggu, GGA/guaggu, or AGA/gugcgu.
In some embodiments, the target polynucleotide comprises ACA/gugagg, AAA/auaagu, GAA/ggaagu, GAA/guaaau, GCA/guagga, CAA/gugagu, GUA/gugagu, GAA/guggg, CCA/guaaac, UUA/guaaau, CAA/guaaac, ACA/guaaau, GAA/guaaac, UCA/guaaac, UCA/guaaau, GCA/guaaau, ACA/guaaau, CAA/gcaag, CAA/guaagg, UCA/guaagu, AUA/gugaau, CAA/gugaaa, CCA/gugaga, UCA/gugauu, GAA/gugugu, GAA/uaaguu, CAA/guaugu, AAA/guaugu, CAA/guauuu, ACA/guuagu, GCA/guuagu, or ACA/guuuga.
In some embodiments, the target polynucleotide comprises CAA/guaacu, AUA/gucagu, GAA/gucugg, or AAA/guacau.
In some embodiments, the target polynucleotide comprises NNBgunnnn, NNBhunnnn, or NNBgvnnnn, wherein N/n is A, U, G or C; B is C, G, or U; h is a, c, or u; v is a, c or g.
In some embodiments, the target polynucleotide comprises NNBgurrrn, NNBguwwdn, NNBguvmvn, NNBguvbbn, NNBgukddn, NNBgubnbd, NNBhunngn, NNBhurmhd, or NNBgvdnvn, wherein N/n is A, U, G or C; B is C, G, or U; h is a, c, or u; v is a, c or g; r is a or g; m is a or c; d is a, g or u; k is g or u; w is a or u.
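The degenerate motifs above can be checked against a concrete site mechanically. The following is a minimal Python sketch, not part of the disclosed method: it expands the single-letter codes exactly as they are defined in the preceding paragraphs into a regular expression and tests a site written in the exon/intron notation used in this section. The function names are hypothetical, and two conventions are assumptions of the sketch: letter case is ignored, and the "/" boundary marker is stripped before matching.

```python
import re

# Ambiguity codes as defined in the text above (treated case-insensitively).
CODES = {
    "a": "a", "c": "c", "g": "g", "u": "u",
    "n": "acgu",  # N/n is A, U, G or C
    "b": "cgu",   # B is C, G, or U
    "h": "acu",   # h is a, c, or u
    "v": "acg",   # v is a, c or g
    "r": "ag",    # r is a or g
    "m": "ac",    # m is a or c
    "d": "agu",   # d is a, g or u
    "k": "gu",    # k is g or u
    "w": "au",    # w is a or u
}


def motif_to_regex(motif):
    """Turn a degenerate motif such as 'NNBgurrrn' into a compiled regex."""
    parts = []
    for ch in motif.lower():
        bases = CODES[ch]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def matches_motif(site, motif):
    """Return True if a site written like 'CAG/guaagu' fits the motif."""
    seq = site.replace("/", "").lower()
    return motif_to_regex(motif).fullmatch(seq) is not None
```

Under these assumptions, `matches_motif("CAG/guaagu", "NNBgunnnn")` is true, while `matches_motif("AGA/guaagu", "NNBgunnnn")` is false because the last exonic position is A, which B excludes.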
In some embodiments, the target polynucleotide comprises CAC/gugagc, UCC/gugagc, AGC/gugagu, AGC/gugagu, AGG/gugagg, GUG/gugagc, GAG/gugagg, CCG/gugagg, UUG/gugagc, GUG/gugagu, UUU/gugagc, UUU/gugagc, GAU/gugagg, AGU/gugagu, AGU/gugagu, AGU/gugagu, AGU/gugagu, AGC/guaagu, GGC/guaagu, AAC/guaagu, GGC/guaagu, AGC/guaagg, GGC/guaagu, AGC/guaagu, GGC/guaagu, GGC/guaagu, AGC/guaagu, GAG/guaaga, CAG/guaagu, AGU/guaagc, AAU/guaagc, AAU/guaagg, CCU/guaagc, AGU/guaagu, GGU/guaagu, AGU/guaagu, AGU/guaagu, AGU/guaagu, GAU/guaagu, UCC/gugaau, CCG/gugaau, ACG/gugaac, CUG/gugaau, AGG/gugaau, UUG/gugaau, CCG/gugaau, GAG/gugaag, CCU/gugaau, CGU/gugaau, CCU/gugaau, GAG/guagga, CAU/guaggg, UGG/guggau, CAG/guggau, UGG/guggau, CGG/gugggu, GCG/guggga, UGG/guggggg, UGG/gugggug, CGU/gugggu, AUC/gguaaaa, GGG/guaaau, GCG/guaaaa, CAG/guaaag, UGG/guaaag, AAG/guaaag, AAG/guaaau, CAG/guaaag, UAG/guaaag, UUG/guaaag, GAG/guaaag, CAG/guaaag, AUG/guaaaa, AAG/guaaag, CAG/guaaag, CAG/guaaaa, GAG/guaaag, AAG/guaaag, UGU/guaaau, GUU/guaaau, GUU/guaaau, UCU/guaaau, GCU/guaaau, GAU/guaaau, GCU/guaaau, UCU/guaaau, ACU/guaaau, CCU/guaaau, CCU/guaaau, ACU/guaaau, AAU/guaaau, AGG/guagac, UUG/guagau, CAG/guagag, AAG/guagag, AAU/gugagu, CAG/gugagc, AAG/gugggu, AAG/guaggg, CAG/guaggc, or AGC/guaggu.
In some embodiments, the target polynucleotide comprises CAG/guaau, CAG/guaaugu, CAG/guaaugu, CAG/guaaugu, CAG/guaaugu, GAG/guaauac, GAG/guaauau, GAG/guaaugu, AAG/guaauaa, AAG/guaaugu, AAG/guaaugu, AAG/guaaugua, AAG/guaaugu, AAG/guaaugu, GCU/guaauu, CCU/guaauu, GAU/guaauu, CAU/guaauu, AAU/guaauu, AGG/guauau, CAG/guauau, UAG/guauau, CAG/guauau, CGG/guauau, GAG/guauau, CGG/guauau, CAG/guauag, AAG/guauau, CAG/guauag, AAG/guauac, UAG/guauau, CAG/guauag, CAG/guauau, AAG/guuaag, AUC/guuaga, GCG/guuagu, AAG/guuagc, UGG/guuagu, GCG/guuagu, CUG/guuugu, CUG/guauga, CAG/guauga, UAG/guauga, AAG/guaugg, AAG/guauga, GAG/guaugg, CAG/guauga, CAG/guaugg, AAG/guaugg, UGG/guaugc, CAG/guaugu, AUG/guaugu, AAG/guaugu, AAG/guaugg, CAG/guaugg, GAG/guauga, CGG/guaugg, AAU/guaugu, AAG/guauuu, AUG/guauuu, UAG/guauug, AAG/guauuu, CAG/guauug, CAG/guauug, CAU/guauuu, ACU/guauu, AAG/guuuau, AAG/guuuaa, CAG/guuugg, CAG/guuugg, CAG/guuugc, AAG/guuugg, AAG/guuugg, or UGG/guaugc.
In some embodiments, the target polynucleotide comprises CCG/guaacu, UUG/guaaca, AUG/guaacc, GGG/guaacu, AAG/guaaca, AAG/guaacu, UUG/guaaca, GCU/guaacu, ACU/guaacu, GCU/guaacu, UAG/guaccc, AAG/guaccu, CAG/guaccg, UGG/guacca, CAG/gucaau, AAG/gucaau, AAG/gucaag, AUG/guacau, GGG/guacau, UUG/guacau, CAG/guacag, CAG/guacag, CAG/guacag, CAG/guacag, AAG/guacag, CAG/guacag, GAG/guacaa, AAG/guacag, CAG/guacaa, UGU/guacau, CAG/gugcac, GGG/gugcau, CUG/gugcau, UAG/gugcau, CAG/gugcag, CAG/gugcag, AGG/gugcaa, AAC/gugacu, UCC/gugacu, CCG/gugacu, GCG/gugacu, GGG/gugacg, GGG/gugacg, GCG/gugacu, AUG/gugacc, GAU/gugacu, GGC/gucagu, or UAG/gucaga.
In some embodiments, the target polynucleotide comprises AAG/guacgg, AAG/guacgg, AAG/guacug, AAG/guagcg, AAG/guagua, AAG/guagua, AAG/guagua, AAG/guagug, AAG/guauca, AAG/guaucg, AAG/guaucu, AAG/gucucu, AAG/gugccu, AAG/guggua, AAG/guguua, ACG/guagcu, AGC/guacgu, CAG/guacug, CAG/guagua, CAG/guagug, CAG/guagug, CAG/guaucc, CAG/gugcgc, or GAG/gugccu.
In some embodiments, the target polynucleotide comprises CGG/guguau, AAG/guguau, GAG/guguac, CAG/guguau, UAG/guguau, CAG/guguag, GAG/guguau, AAG/gugugc, CAG/guguga, AAG/gugugu, CAG/guguga, CAG/gugugu, UGG/gugugg, CUG/guguga, CGG/gugugu, GAG/gugugc, CAG/guguga, AAU/gugugu, CAG/gugugu, CAG/gugugu, GAG/gugugu, CAG/guuguu, CAG/guuguc, GUG/guugua, CAG/guuguu, AAC/gugauu, CAG/gugaua, AGG/gugauc, GUG/gugauc, CCU/gugauu, GAU/gugauu, CAC/guuggu, CAG/guuggc, AAG/guuagc, or CAG/guugau.
In some embodiments, the target polynucleotide comprises AUG/gucauu, CGG/gucauaauc, AAG/gucugu, AAG/gucuggg, CAG/gucugga, CAG/gucuggu, CAG/gucuga, GAG/gucuggu, AAG/gugucu, AAG/gugucu, AGG/gugucu, CUG/gugcuu, CAG/gucuuu, CAG/guugcu, GAG/gugcug, or CAG/gugcug.
In some embodiments, the target polynucleotide comprises CGC/auaagu, UUC/auaagu, UGG/auaagg, ACG/auaagg, GUU/auaagu, CCU/auaagu, UUU/auaagc, GAG/aucugg, AAC/augagga, GAC/augagg, ACC/augagu, GGG/augagu, AAG/augagc, CAG/augagg, GAG/augagg, GCG/augagu, AAG/gaugag, CCU/augagu, GAU/augagu, GAU/augagu, UAG/augcgu, CAG/auuggu, AAG/auuugu, ACG/cuaagc, CAG/cugugu, CUG/uuaag, GAG/uuaagu, AAG/uuaagg, AUU/uuaagc, CUG/uugaga, CAG/uuuggu, or GGG/auaagu.
In some embodiments, the target polynucleotide comprises CAG/auaacu, GAG/cugcag, or AAG/uuaaua.
In some embodiments, the target polynucleotide comprises GCG/gagagu, AAG/ggaaaa, AUC/gguaaaa, AAG/gcaaaa, UGU/gcaagu, GAG/gcaggu, GAG/gcgugg, GAG/gcuccc, CAG/gcuggu, or AAG/gaugag.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a”, “an” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. Thus, for example, reference to “a binding agent” includes mixtures of binding agents; reference to “an NMR resonance” includes more than one resonance, and the like. The terms “comprises,” “comprising,” “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed.
In one aspect, provided herein is a method comprising: providing a polynucleotide sample comprising a target polynucleotide; contacting the target polynucleotide with a first binding agent, a second binding agent, or both; wherein the target polynucleotide and the first binding agent form a first complex, wherein the second binding agent and the first complex form a second complex; and obtaining a nuclear magnetic resonance (NMR) spectrum of the first complex, the second complex, or both using an NMR device. In some embodiments, the target polynucleotide is a target ribonucleic acid (RNA). In some embodiments, the target RNA is a precursor messenger RNA (pre-mRNA) or a portion thereof. In some embodiments, the target polynucleotide contains a splice site or a portion thereof. In some embodiments, the splice site is a 5′ splice site, a cryptic 5′ splice site, a 3′ splice site, or a cryptic 3′ splice site, or a portion thereof. In some embodiments, the target polynucleotide contains a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof. In some embodiments, the target polynucleotide contains at least one intron or a fragment thereof. In some embodiments, the target polynucleotide contains at least one exon or a fragment thereof. In some embodiments, the target polynucleotide contains at least one exon-intron boundary. In some embodiments, the target polynucleotide is at least 8 nucleotides in length. In some embodiments, the target polynucleotide is at least 25 nucleotides in length. In some embodiments, the target polynucleotide is at most 1000 nucleotides in length. In some embodiments, the target polynucleotide is from 100 to 200 nucleotides in length.
In some embodiments, the target polynucleotide comprises none or at least one nucleotide isotopically labeled with one or more atomic labels comprising 2H, 13C, 15N, 19F and 31P. In some embodiments, the first binding agent comprises a first polynucleotide, a first polypeptide, or a combination thereof. In some embodiments, the first polynucleotide is a first RNA. In some embodiments, the first RNA is a small nuclear RNA (snRNA) or a portion thereof. In some embodiments, the first polypeptide is a protein or a protein component of a protein-RNA complex. In some embodiments, the polypeptide is a protein or protein component of a trans-acting factor. In some embodiments, the polypeptide is a portion, e.g. a domain or subdomain, of a protein associated with RNA splicing. In some embodiments, the polypeptide is a protein component or a portion thereof of one of proteins selected from a group comprising SR, TRA2, SF, SRSF, U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U1-C, Sm proteins, FBP11, SF3A, SF3B, U2AF65, U2AF35, PRP19 complex proteins, hnRNP 1, hnRNP 3, hnRNP C, hnRNP G, hnRNP K, hnRNP M, hnRNP U, ASF, SF2, 9G8, SRP20, TRA2a/b, SRP36, SRP35C, SRP30C, SRP38, SRP40, SRP55, SRP75, HUR, NFAR, NF45, YB1, and junction complex proteins. Other exemplary proteins that are associated with RNA splicing include mBBP, polypyrimidine tract binding protein (PTB), nPTB, KH-type splicing regulatory protein (KSRP), SAM68, STAR/GSG, ASD-2b, ASD-1, SUP-12, RNPC1, ASF, snRNP auxiliary factor-35 (U2AF35), ASF/SF2, Nova-1/2, Fox-1/2, Muscle-blind like (MBNL), CELF, Hu, TIA, TIAR, and their aliases. In some embodiments, the first polypeptide is a protein component of a ribonucleoprotein or a portion thereof. In some embodiments, the ribonucleoprotein is a small nuclear ribonucleoprotein (snRNP) or a portion thereof. 
In some embodiments, the snRNP is U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP or a portion thereof. In some embodiments, the second binding agent is a small molecule. In some embodiments, the first binding agent comprises a small molecule. In some embodiments, the second binding agent comprises a second polynucleotide, a second polypeptide, or a combination thereof. In some embodiments, the second polynucleotide is a second RNA. In some embodiments, the second RNA is a small nuclear RNA (snRNA) or a portion thereof. In some embodiments, the second polypeptide is a protein component of a ribonucleoprotein or a portion thereof. In some embodiments, the ribonucleoprotein is a small nuclear ribonucleoprotein (snRNP) or a portion thereof. In some embodiments, the snRNP is U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP or a portion thereof. In some embodiments, the first complex comprises a binding pocket. In some embodiments, the binding pocket comprises a bulge, or a mutation, or a stem-loop, or any combinations thereof. In some embodiments, the binding pocket comprises a region or sequence adjacent to a stem-loop structure. In some embodiments, the binding pocket does not comprise a bulge, a mutation, or a stem-loop. In some embodiments, the bulge or the mutation causes a 3-dimensional structural change in the first polynucleotide. In some embodiments, a binding agent targeting the binding pocket can induce a 3-dimensional structural change upon binding to the binding pocket. In some embodiments, the second binding agent binds to the binding pocket. 
In some embodiments, the pre-mRNA comprises a sequence encoded by a gene or a gene variant thereof selected from the group consisting of ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, CD46, and USH2A. In some embodiments, a first NMR spectrum is obtained for the first complex, and a second NMR spectrum is obtained for the second complex. In some embodiments, the method further comprises comparing the first and the second NMR spectrum. In some embodiments, the method further comprises selecting a second binding agent based on a comparison of the first and the second NMR spectrum. In some embodiments, the method further comprises determining a chemical shift of the first and the second NMR spectrums.
In one aspect, provided herein is a method comprising: providing a polynucleotide sample comprising a target polynucleotide, wherein the target polynucleotide comprises a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof; contacting the target polynucleotide with a first binding agent; and obtaining a first NMR spectrum of the polynucleotide sample using an NMR device. In some embodiments, the target polynucleotide is a target RNA. In some embodiments, the target polynucleotide is a pre-mRNA or a portion thereof. In some embodiments, the target polynucleotide contains at least one exon or a fragment thereof. In some embodiments, the target polynucleotide contains at least one intron or a fragment thereof. In some embodiments, the target polynucleotide contains an exon-intron boundary. In some embodiments, the target polynucleotide contains a splice site or a portion thereof. In some embodiments, the splice site is a 5′ splice site, a cryptic 5′ splice site, a 3′ splice site, or a cryptic 3′ splice site, or any combinations thereof. In some embodiments, the target polynucleotide is at least 8 nucleotides in length. In some embodiments, the target polynucleotide is at least 25 nucleotides in length. In some embodiments, the target polynucleotide is at most 1000 nucleotides in length. In some embodiments, the target polynucleotide comprises none or at least one nucleotide isotopically labeled with one or more atomic labels comprising 2H, 13C, 15N, 19F and 31P. In some embodiments, the first binding agent comprises a first polynucleotide, a first polypeptide, or a combination thereof. In some embodiments, the first polynucleotide is a first RNA. In some embodiments, the first RNA is a small nuclear RNA (snRNA) or a portion thereof.
In some embodiments, the first polypeptide is a protein component of a ribonucleoprotein or a portion thereof. In some embodiments, the ribonucleoprotein is a small nuclear ribonucleoprotein (snRNP) or a portion thereof. In some embodiments, the snRNP is U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP or a portion thereof. In some embodiments, the polypeptide is a protein or protein component of a trans-acting factor. In some embodiments, the polypeptide is a portion, e.g. a domain or subdomain, of a protein associated with RNA splicing. In some embodiments, the polypeptide is a protein component or a portion thereof of one of proteins selected from a group comprising SR, TRA2, SF, SRSF, U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U1-C, Sm proteins, FBP11, SF3A, SF3B, U2AF65, U2AF35, PRP19 complex proteins, hnRNP 1, hnRNP 3, hnRNP C, hnRNP G, hnRNP K, hnRNP M, hnRNP U, ASF, SF2, 9G8, SRP20, TRA2a/b, SRP36, SRP35C, SRP30C, SRP38, SRP40, SRP55, SRP75, HUR, NFAR, NF45, YB1, and junction complex proteins. Other exemplary proteins that are associated with RNA splicing include mBBP, polypyrimidine tract binding protein (PTB), nPTB, KH-type splicing regulatory protein (KSRP), SAM68, STAR/GSG, ASD-2b, ASD-1, SUP-12, RNPC1, ASF, snRNP auxiliary factor-35 (U2AF35), ASF/SF2, Nova-1/2, Fox-1/2, Muscle-blind like (MBNL), CELF, Hu, TIA, TIAR, and their aliases. In some embodiments, the target polynucleotide and the first binding agent form a first complex. In some embodiments, the first complex comprises a binding pocket. In some embodiments, the binding pocket comprises a bulge, a mutation, or a stem-loop, or any combinations thereof. In some embodiments, the bulge or the mutation causes a 3-dimensional structural change in the first polynucleotide. In some embodiments, the method further comprises contacting with the first complex a second binding agent. 
In some embodiments, the second binding agent comprises one or more molecules selected from a group comprising a polynucleotide, a polypeptide, a protein, a small molecule, an ion, a salt, and an atom. In some embodiments, the second binding agent is a small molecule. In some embodiments, the small molecule is a library of small molecules. In some embodiments, the second binding agent further causes a detectable structural change in the first complex. In some embodiments, the method further comprises obtaining a second NMR spectrum after contacting with the first complex the second binding agent. In some embodiments, the method further comprises comparing the first and the second NMR spectrum. In some embodiments, the method further comprises determining a chemical shift of the one or more atoms from the first and the second NMR spectrums. In some embodiments, the target polynucleotide comprises a sequence encoded by a gene or a gene variant thereof selected from the group consisting of ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, CD46, and USH2A.
In one aspect, provided herein is a method for selecting a binding agent to a polynucleotide, the method comprising: providing a polynucleotide sample comprising a target polynucleotide; obtaining a first NMR spectrum of the polynucleotide sample using an NMR device; contacting the polynucleotide sample with a binding agent; obtaining a second NMR spectrum of the polynucleotide sample after contacting with the binding agent; comparing the first and the second NMR spectrum; and selecting the binding agent based on the comparison. In some embodiments, the binding agent comprises a small molecule, a polynucleotide, or a protein, or any combinations thereof. In some embodiments, the polynucleotide sample further comprises a first polynucleotide. In some embodiments, the target polynucleotide and the first polynucleotide are added in about equimolar amounts. In some embodiments, the first polynucleotide is a first RNA. In some embodiments, the first RNA is a small nuclear RNA (snRNA) or a portion thereof. In some embodiments, the snRNA is U1-U12 snRNA or a portion thereof. In some embodiments, the target and the first polynucleotide form a duplex. In some embodiments, the duplex contains a binding pocket. In some embodiments, the binding pocket comprises a bulge, or a mutation, or a stem-loop, or any combinations thereof. In some embodiments, the binding pocket does not comprise a mutation, a bulge, or a stem-loop. In some embodiments, the target polynucleotide comprises a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof. In some embodiments, the target polynucleotide contains at least one exon or a fragment thereof. In some embodiments, the target polynucleotide contains at least one intron or a fragment thereof.
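The comparison step of the selection method above (a first spectrum before contact, a second spectrum after contact, and selection based on the comparison) can be illustrated with a short Python sketch. The disclosure does not prescribe how the comparison is scored, so everything here is an assumption made for illustration: peaks are taken to be already assigned and keyed by a resonance label, shifts are single-nucleus values in ppm, the 0.05 ppm threshold is arbitrary, and the function names are hypothetical.

```python
def shift_changes(first, second):
    """Per-resonance absolute chemical shift change (ppm) between two spectra.

    Both arguments map a resonance label to its chemical shift in ppm.
    Only labels present in both spectra are compared.
    """
    return {label: abs(second[label] - first[label])
            for label in first if label in second}


def is_candidate_binder(first, second, threshold_ppm=0.05, min_resonances=1):
    """Flag the agent when enough resonances move by more than the threshold.

    Returns a (flag, moved_labels) pair; the threshold and the minimum
    count are illustrative defaults, not values from the disclosure.
    """
    changes = shift_changes(first, second)
    moved = sorted(label for label, delta in changes.items()
                   if delta > threshold_ppm)
    return len(moved) >= min_resonances, moved
```

A caller would pass the assigned peak lists of the first and second spectra; an agent whose addition moves no resonance beyond the threshold is not flagged.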
In some embodiments, the target polynucleotide contains at least one exon-intron boundary. In some embodiments, the target polynucleotide is at least 8 nucleotides in length. In some embodiments, the target polynucleotide is at least 25 nucleotides in length. In some embodiments, the target polynucleotide is at most 1000 nucleotides in length. In some embodiments, the target polynucleotide is from 100 to 200 nucleotides in length. In some embodiments, the target polynucleotide comprises none or at least one nucleotide isotopically labeled with one or more atomic labels comprising 2H, 13C, 15N, 19F and 31P. In some embodiments, the method further comprises determining a chemical shift of the first or the second NMR spectrum. In some embodiments, the method further comprises determining a 3-dimensional atomic resolution structure of the polynucleotide and the bound or molecularly interacting small molecule. In some embodiments, the 3-dimensional atomic resolution structure is determined by structure prediction software. In some embodiments, the structure prediction software is the ATNOS/CANDID program suite. In some embodiments, the structure prediction software is the MC-Fold|MC-Sym pipeline. In some embodiments, determining the 3-dimensional atomic resolution structure comprises generating a plurality of theoretical structural polynucleotide 2-dimensional models using the nucleotide sequence and one or more 2-dimensional structure prediction algorithms. In some embodiments, the method further comprises generating a plurality of theoretical structural polynucleotide 3-dimensional models using a 3-dimensional structure prediction algorithm using the plurality of theoretical structural polynucleotide 2-dimensional models and optionally one or more known and/or assumed polynucleotide 2-dimensional models.
In some embodiments, the method further comprises generating a predicted chemical shift set for each of the plurality of theoretical structural polynucleotide 3-dimensional models. In some embodiments, the method further comprises comparing the predicted chemical shift set to the chemical shift(s) of the one or more atoms. In some embodiments, the NMR device is used to perform resonance assignments and identify NOE-derived distances to drive structure calculations. In some embodiments, the method further comprises selecting one or more theoretical structural polynucleotide 3-dimensional models having an agreement between the respective predicted chemical shift set and the chemical shift(s) of the one or more atoms as the one or more 3-dimensional atomic resolution structures. In some embodiments, the 2-dimensional structure prediction algorithm is a nearest neighbor algorithm. In some embodiments, the method further comprises the step: generating one or more refined 3-dimensional atomic resolution structures by refining the selected one or more theoretical structural polynucleotide 3-dimensional models using a modeling software that performs one or more functions comprising energy minimization and/or a molecular dynamics simulation. In some embodiments, the predicted chemical shift set is generated by comparing each theoretical structural polynucleotide 3-dimensional model with an NMR data-structure database. In some embodiments, generating the predicted chemical shift set comprises calculating a polynucleotide structural metric comprising atomic coordinates, stacking interactions, magnetic susceptibility, electromagnetic fields, or dihedral angles from one or more experimentally determined polynucleotide 3-dimensional structures.
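Selecting the theoretical 3-dimensional models whose predicted chemical shift set agrees with the measured chemical shifts can be sketched as a simple ranking. The disclosure speaks only of "an agreement" and does not name a metric, so the root-mean-square deviation used below is an assumption of this sketch, as are the function names and the dictionary layout (atom label mapped to shift in ppm).

```python
import math


def shift_rmsd(predicted, measured):
    """Root-mean-square deviation (ppm) over atoms present in both sets."""
    shared = [atom for atom in measured if atom in predicted]
    if not shared:
        raise ValueError("no atoms in common between predicted and measured shifts")
    total = sum((predicted[atom] - measured[atom]) ** 2 for atom in shared)
    return math.sqrt(total / len(shared))


def select_models(predicted_sets, measured, keep=1):
    """Rank candidate models by agreement and return the best `keep` names.

    `predicted_sets` maps a model name to its predicted chemical shift set.
    A lower RMSD is treated as better agreement (an assumed metric).
    """
    scored = sorted(predicted_sets.items(),
                    key=lambda item: shift_rmsd(item[1], measured))
    return [name for name, _ in scored[:keep]]
```

The models returned by `select_models` would then be the ones carried forward to any refinement step, such as energy minimization or molecular dynamics.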
In some embodiments, the method further comprises using a regression algorithm to generate a set of mathematical functions or objects that describe relationships between experimental chemical shifts and the polynucleotide structural metric of the experimentally determined 3-dimensional polynucleotide structures. In some embodiments, the method further comprises calculating a polynucleotide structural metric for each of the theoretical structural polynucleotide 3-dimensional models. In some embodiments, the method further comprises inputting the polynucleotide structural metric for each of the theoretical structural polynucleotide 3-dimensional models into the set of mathematical functions or objects to generate the predicted chemical shift set. In some embodiments, the regression algorithm is a machine learning algorithm comprising a Random Forest algorithm. In some embodiments, the NMR spectrum is obtained with an NMR spectrometer frequency ranging from about 20 MHz to about 1 GHz. In some embodiments, the NMR spectrum is obtained with an NMR spectrometer frequency ranging from 500 MHz to 900 MHz. In some embodiments, the NMR device is AVANCE III. In some embodiments, the method further comprises determining the binding kinetics of the binding agent to the duplex. In some embodiments, the binding kinetics is determined by surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy.
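Where the kinetics assay reports association and dissociation rate constants, as fits of SPR or BLI sensorgrams commonly do, two summary quantities follow directly from them. The Python sketch below is illustrative only and assumes a simple 1:1 binding model; the function names are hypothetical and no threshold for selecting a binding agent is implied by the disclosure.

```python
def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D in M.

    k_on is the association rate constant in 1/(M*s) and k_off is the
    dissociation rate constant in 1/s, for an assumed 1:1 binding model.
    """
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def residence_time(k_off):
    """Mean lifetime of the complex in seconds, taken as 1 / k_off."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return 1.0 / k_off
```

Two agents with the same K_D can differ in residence time, which is one reason to record the rate constants themselves rather than the equilibrium constant alone.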
In one aspect, provided herein is a method comprising: identifying one or more binding pockets formed by a first polynucleotide and a second polynucleotide, wherein the first polynucleotide contains a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof; and virtually screening one or more small molecules against the one or more binding pockets, wherein the virtual screening process identifies putative small molecule hits. In some embodiments, identifying one or more binding pockets comprises solving a 3-dimensional atomic resolution structure comprising the first polynucleotide and the second polynucleotide. In some embodiments, the 3-dimensional atomic resolution structure is determined by an NMR spectrum. In some embodiments, the method further comprises testing one or more small molecule hits from the virtual screen using an experimental assay. In some embodiments, the experimental assay is surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy. In some embodiments, the first polynucleotide is an RNA. In some embodiments, the first polynucleotide is a pre-mRNA. In some embodiments, the splice site is a 5′ splice site, a cryptic 5′ splice site, a 3′ splice site, or a cryptic 3′ splice site. In some embodiments, the first polynucleotide contains at least one intron or a fragment thereof. In some embodiments, the first polynucleotide contains at least one exon or a fragment thereof. In some embodiments, the first polynucleotide contains at least one exon-intron boundary. In some embodiments, the first polynucleotide is at least 8 nucleotides in length. In some embodiments, the first polynucleotide is at least 25 nucleotides in length.
In some embodiments, the first polynucleotide is at most 1000 nucleotides in length. In some embodiments, the first polynucleotide is from 100 to 200 nucleotides in length. In some embodiments, the first polynucleotide comprises a sequence encoded by a gene or a gene variant thereof selected from the group consisting of ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, CD46, and USH2A.
The term “polynucleotide” as used herein generally refers to a molecule comprising one or more nucleic acid subunits, or nucleotides, and can be used interchangeably with “nucleic acid” or “oligonucleotide”. A polynucleotide may include one or more nucleotides selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide generally includes a nucleoside and at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more phosphate (PO3) groups. A nucleotide can include a nucleobase, a five-carbon sugar (either ribose or deoxyribose), and one or more phosphate groups. Ribonucleotides are nucleotides in which the sugar is ribose. Deoxyribonucleotides are nucleotides in which the sugar is deoxyribose. A nucleotide can be a nucleoside monophosphate or a nucleoside polyphosphate. A nucleotide can be a deoxyribonucleoside polyphosphate, such as, e.g., a deoxyribonucleoside triphosphate (dNTP), which can be selected from deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), deoxyuridine triphosphate (dUTP) and deoxythymidine triphosphate (dTTP), including dNTPs that include detectable tags, such as luminescent tags or markers (e.g., fluorophores). A nucleotide can be isotopically labeled with, for example, 2H, 13C, 15N, 19F, and 31P. A nucleotide can include any subunit that can be incorporated into a growing nucleic acid strand. Such a subunit can be an A, C, G, T, or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or variant thereof) or a pyrimidine (i.e., C, T or U, or variant thereof). In some examples, a polynucleotide is deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or derivatives or variants thereof.
In some embodiments, a polynucleotide is, for example, a short interfering RNA (siRNA), a microRNA (miRNA), a plasmid DNA (pDNA), a short hairpin RNA (shRNA), a small nuclear RNA (snRNA), a messenger RNA (mRNA), a precursor mRNA (pre-mRNA), or an antisense RNA (asRNA), and the term encompasses both the nucleotide sequence and any structural embodiments thereof, such as single-stranded, double-stranded, triple-stranded, helical, hairpin, etc. In some cases, a polynucleotide molecule is circular. A polynucleotide can have various lengths. A nucleic acid molecule can have a length of at least about 10 bases, 20 bases, 30 bases, 40 bases, 50 bases, 100 bases, 200 bases, 300 bases, 400 bases, 500 bases, 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 50 kb, or more. A polynucleotide can be isolated from a cell or a tissue. As embodied herein, the polynucleotide sequences may comprise isolated and purified DNA/RNA molecules, synthetic DNA/RNA molecules, and synthetic DNA/RNA analogs.
Polynucleotides may include one or more nucleotide variants, including nonstandard nucleotide(s), non-natural nucleotide(s), nucleotide analog(s) and/or modified nucleotides. Examples of modified nucleotides include, but are not limited to, diaminopurine, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, 2,6-diaminopurine and the like. In some cases, nucleotides may include modifications in their phosphate moieties, including modifications to a triphosphate moiety. Non-limiting examples of such modifications include phosphate chains of greater length (e.g., a phosphate chain having 4, 5, 6, 7, 8, 9, 10 or more phosphate moieties) and modifications with thiol moieties (e.g., alpha-thiotriphosphate and beta-thiotriphosphates). Nucleic acid molecules may also be modified at the base moiety (e.g., at one or more atoms that typically are available to form a hydrogen bond with a complementary nucleotide and/or at one or more atoms that are not typically capable of forming a hydrogen bond with a complementary nucleotide), sugar moiety or phosphate backbone.
Nucleic acid molecules may also contain amine-modified groups, such as aminoallyl-dUTP (aa-dUTP) and aminohexylacrylamide-dCTP (aha-dCTP), to allow covalent attachment of amine-reactive moieties, such as N-hydroxysuccinimide (NHS) esters. Alternatives to standard DNA base pairs or RNA base pairs in the oligonucleotides of the present disclosure can provide higher density in bits per cubic mm, higher safety (resistance to accidental or purposeful synthesis of natural toxins), easier discrimination in photo-programmed polymerases, or lower secondary structure. Such alternative base pairs compatible with natural and mutant polymerases for de novo and/or amplification synthesis are described in Betz K, Malyshev D A, Lavergne T, Welte W, Diederichs K, Dwyer T J, Ordoukhanian P, Romesberg F E, Marx A. Nat. Chem. Biol. 2012 Jul; 8(7):612-4, which is herein incorporated by reference for all purposes.
The term “polynucleotide sample” includes a polynucleotide or a certain quantity (e.g., a number of moles or a concentration of polynucleotide) of the polynucleotide, optionally dissolved in a solvent, wherein the polynucleotides in the polynucleotide sample have a single nucleotide sequence. In some examples, the polynucleotides in the polynucleotide sample may contain only one type of nucleotide, or the polynucleotide sample can contain polynucleotides synthesized with different nucleotides. In some examples, the polynucleotides are free of any labels. In some other examples, the polynucleotides are labeled with one or more atomic labels.
As used herein, the term “protein” refers to a long polymer of amino acid residues linked via peptide bonds and which may be composed of one or more polypeptide chains. More specifically, the term “protein” refers to a molecule composed of one or more chains of amino acids in a specific order; for example, the order as determined by the base sequence of nucleotides in the gene coding for the protein. Proteins are essential for the structure, function, and regulation of the body's cells, tissues, and organs, and each protein has unique functions. Examples are hormones, enzymes, antibodies, and any fragments thereof. In some cases, a protein can be a portion of the protein, for example, a domain, a subdomain, or a motif of the protein. In some cases, a protein can be a variant (or mutation) of the protein, wherein one or more amino acid residues are inserted into, deleted from, and/or substituted into the naturally occurring (or at least a known) amino acid sequence of the protein. A protein or a variant thereof can be naturally occurring or recombinant.
As used herein, the term “peptide” refers to a polymer in which the monomers are amino acids joined together through amide bonds, and is alternatively referred to as a polypeptide. In the context of this specification it should be appreciated that the amino acids may be the L-optical isomer or the D-optical isomer. Peptides are two or more amino acid monomers long, and often can be more than 20 amino acid monomers long.
A binding pocket can refer to any location on a polynucleotide (e.g. RNA) with sufficient structural complexity (e.g. secondary or tertiary structure) to enable specific interactions of a binding agent at that location to influence the conformation and structure of the RNA, such that the binding agent essentially inhibits or activates a splicing process. A binding pocket can contain a bulge, a single-stranded or duplex RNA without a mutation, a stem-loop, sequences adjacent to a stem-loop, or a single-stranded or duplex RNA containing a mutation. A binding pocket may or may not comprise a mutation. In some cases, a binding pocket comprises a sequence portion with a mutation upstream/downstream of the binding pocket, wherein such mutation impacts the structure of RNA at the binding pocket.
A “binding agent” as used herein refers to a molecule that can specifically bind to a nucleic acid molecule, a complex formed by two or more nucleic acid molecules, or a complex formed by a nucleic acid and protein. A binding agent may be a protein, peptide, nucleic acid, carbohydrate, lipid, or small molecular weight compound. A binding agent disclosed herein can modulate or correct RNA mis-splicing.
As used herein, a “small molecular weight compound” can be used interchangeably with “small molecule” or “small organic molecule”. Small molecules refer to compounds other than peptides, oligonucleotides, or analogs thereof and typically have molecular weights of less than about 2,000 Daltons.
A ribonucleoprotein (RNP) refers to a nucleoprotein that contains RNA. It is an association that combines a ribonucleic acid and an RNA-binding protein together. Such a combination can also be referred to as a protein-RNA complex. These complexes can function in a number of biological functions that include DNA replication, regulating gene expression and regulating the metabolism of RNA. A few examples of RNPs include the ribosome, the enzyme telomerase, vault ribonucleoproteins, RNase P, heterogeneous nuclear RNPs (hnRNPs) and small nuclear RNPs (snRNPs).
Nascent RNA transcripts from protein-coding genes and mRNA processing intermediates, collectively referred to as pre-mRNA, are generally bound by proteins in the nuclei of eukaryotic cells. From the time nascent transcripts first emerge from RNA polymerase II until mature mRNAs are transported into the cytoplasm, the RNA molecules are associated with an abundant set of nuclear proteins. These proteins are the major protein components of hnRNPs, which contain heterogeneous nuclear RNA (hnRNA), a collective term referring to pre-mRNA and other nuclear RNAs of various sizes.
Splicing factors are proteins or protein complexes that function in splicing or splicing regulation. Splicing factors include those that may be required for constitutive splicing, regulated splicing and splicing of specific messages or groups of messages. A group of related proteins, the SR proteins, can function in constitutive pre-mRNA splicing and may also regulate alternative splice-site selection in a concentration-dependent manner. SR proteins have a modular structure that consists of one or two RNA-recognition motifs (RRMs) and a C-terminal domain rich in arginine and serine residues (RS domain). Their activity in alternative splicing may be antagonized by members of the hnRNP A/B family of proteins. Splicing factors can also include proteins that are associated with one or more snRNAs. SR proteins in humans include SC35, SRp55, SRp40, SRm300, SFRS10, TASR-1, TASR-2, SF2/ASF, 9G8, SRp75, SRp30c, SRp20 and P54/SFRS11. Other splicing factors in humans that can be involved in splice site selection include, but are not limited to, U2 snRNA auxiliary factors (e.g. U2AF65, U2AF35), Urp/U2AF1-RS2, SF1/BBP, CBP80, CBP20 and PTB/hnRNP I. The hnRNP proteins in humans include, but are not limited to, A1, A2/B1, L, M, K, U, F, H, G, R, I and C1/C2. Splicing factors may be stably or transiently associated with a snRNP or with a transcript.
The term “intron” refers to both the DNA sequence within a gene and the corresponding sequence in the unprocessed RNA transcript. As part of the RNA processing pathway, introns are removed by RNA splicing either shortly after or concurrent with transcription. Introns are found in the genes of most organisms and many viruses. They can be located in a wide range of genes, including those that generate proteins, ribosomal RNA (rRNA), and transfer RNA (tRNA). An “exon” can be any part of a gene that encodes a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term “exon” refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. A “spliceosome” is assembled from snRNAs and protein complexes. The spliceosome removes introns from a transcribed pre-mRNA.
As used herein, the term “target” or “target molecule” describes a molecule that can be selected from any biological molecule which is modulated by a binding agent bound to a recognition portion on the molecule. The modulation can be activation, inhibition, or any structural change. For example, in some embodiments of the present disclosure, a binding agent can bind to a target molecule (e.g. mRNA) and modulate RNA splicing to correct some defects in splicing. Target molecules encompassed by the present technology can include a diverse array of compounds including polynucleotides, proteins, polypeptides, oligopeptides, ribonucleoproteins, and nucleic acids, including RNA and DNA. In some cases, the target molecule can be target polynucleotide, target RNA, or target DNA. The recognition portion on a molecule refers to a structural portion that interacts with the binding agent. The recognition portion can be a binding pocket, (e.g. a binding pocket on the mRNA), formed by one or more molecules (e.g. RNA and RNA duplexes). In various embodiments provided herein, the binding pocket formed by a target polynucleotide comprises a bulge, or a mutation, or a stem-loop, or any combinations thereof, and can accommodate binding agents such as small molecules. In some embodiments, the binding pocket may not comprise a bulge, a mutation, or a stem-loop.
Splicing or RNA splicing typically refers to the editing of the nascent precursor messenger RNA (pre-mRNA) transcript into a mature messenger RNA (mRNA). Splicing is a biochemical process which includes the removal of introns followed by exon ligation. Sequential transesterification reactions are initiated by a nucleophilic attack on the 5′ splice site (5′ss) by the branch adenosine (branch point; BP) in the downstream intron, resulting in the formation of an intron lariat intermediate with a 2′,5′-phosphodiester linkage. This is followed by a 5′ss-mediated attack on the 3′ splice site (3′ss), leading to the removal of the intron lariat and the formation of the spliced RNA product.
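The net result of intron removal followed by exon ligation can be illustrated with a minimal Python sketch. It models only the sequence-level outcome, not the transesterification chemistry or the lariat intermediate. The function name `splice`, the toy sequence, and the intron coordinates are hypothetical and chosen purely for illustration.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the exon-only sequence left after removing introns.

    Each intron is a 0-based, half-open (start, end) interval on pre_mrna.
    This is an illustrative assumption, not a model of the spliceosome.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        # Reject empty, overlapping, or out-of-range intron intervals.
        if start < position or start >= end or end > len(pre_mrna):
            raise ValueError(f"invalid intron interval: ({start}, {end})")
        exons.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Toy transcript (hypothetical): exon 1, one GU...AG intron, exon 2.
toy_pre_mrna = "AUGGCC" + "GUAAGUCUAACAG" + "UUUGAA"
toy_mature = splice(toy_pre_mrna, [(6, 19)])
print(toy_mature)
```

Here the intron occupies positions 6 to 18 of the toy transcript, so the sketch joins the two flanking exons directly.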
Splicing can be regulated by various cis-acting elements and trans-acting factors. Cis-acting elements are sequences of the mRNA and can include core consensus sequences and other regulatory elements. Core consensus sequences typically can refer to conserved RNA sequence motifs, including the 5′ss, 3′ss, polypyrimidine tract and BP region, which can function for spliceosome recruitment. Core consensus sequences can be referred to as construct scaffolds when used in vitro for experimentation. BP refers to a partially conserved sequence of pre-mRNA, generally less than 50 nucleotides upstream of the 3′ss. BP reacts with the 5′ss during the first step of the splicing reaction. Other regulatory cis-acting elements can include exonic splicing enhancer (ESE), exonic splicing silencer (ESS), intronic splicing enhancer (ISE), and intronic splicing silencer (ISS). Trans-acting factors can be proteins or ribonucleoproteins which bind to cis-acting elements.
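Because the BP generally lies less than 50 nucleotides upstream of the 3′ss, a first-pass search can simply list the adenosines in that window. The Python sketch below does only that; it does not score sequence context and should not be read as a branch point predictor. The function name, the `window` parameter, and the toy sequence are hypothetical.

```python
def candidate_branch_points(pre_mrna: str, three_ss_index: int, window: int = 50) -> list[int]:
    """List 0-based positions of adenosines in the window upstream of a 3'ss.

    three_ss_index is assumed to be the index of the first exonic nucleotide
    after the intron (an illustrative convention, not taken from the text).
    """
    if not 0 <= three_ss_index <= len(pre_mrna):
        raise ValueError("three_ss_index is outside the sequence")
    start = max(0, three_ss_index - window)
    return [i for i in range(start, three_ss_index) if pre_mrna[i] == "A"]


# Toy transcript (hypothetical): the intron spans positions 6 to 18.
toy_pre_mrna = "AUGGCC" + "GUAAGUCUAACAG" + "UUUGAA"
bp_candidates = candidate_branch_points(toy_pre_mrna, three_ss_index=19, window=10)
print(bp_candidates)
```

A narrow window is used in the example only because the toy intron is short; the default of 50 follows the distance stated above.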
Splice site identification and regulated splicing can be accomplished principally by two dynamic macromolecular machines, the major (U2-dependent) and minor (U12-dependent) spliceosomes. Each spliceosome contains five snRNPs: U1, U2, U4, U5 and U6 snRNPs for the major spliceosome (which processes ~95.5% of all introns); and U11, U12, U4atac, U5 and U6atac snRNPs for the minor spliceosome. Spliceosome assembly involves recognition of consensus sequence elements along with particular structural RNA features. Usually, the U1 snRNP binds to the GU sequence at the 5′ss of an intron. In addition, a number of proteins including U2 small nuclear RNA auxiliary factor 1 (U2AF1, also known as U2AF35), U2AF2 (also known as U2AF65) and splicing factor 1 (SF1, also known as branch point binding protein) may sometimes be required for major spliceosome assembly. U2AF1 can bind at the 3′ss of the intron, and U2AF2 can bind to the polypyrimidine tract. SF1 can bind to the intron BP sequence. The U2 snRNP displaces SF1 and binds to the branch point sequence, and ATP is hydrolyzed. The U5/U4/U6 snRNP trimer binds, and the U5 snRNP binds exons at the 5′ site, with U6 binding to U2. The U1 snRNP is then released, U5 shifts from exon to intron, and U6 binds at the 5′ss. U4 is then released, and U6/U2 catalyzes a transesterification reaction in which the 5′ end of the intron is ligated to the “A” in the intron to form a lariat. U5 binds the exon at the 3′ss, and the 5′ site is cleaved, resulting in the formation of the lariat. U2/U5/U6 remain bound to the lariat, the 3′ site is cleaved, and the exons are ligated using ATP hydrolysis. The spliced RNA is released, the lariat is released and degraded, and the snRNPs are recycled. Spliceosome recognition of consensus sequence elements at the 5′ss, 3′ss and BP sites is one of the steps in the splicing pathway, and can be modulated by ESEs, ISEs, ESSs, and ISSs, which can be recognized by auxiliary splicing factors, including SR proteins and hnRNPs.
Polypyrimidine tract-binding protein (PTBP, also known as PTB or hnRNP I) can bind to the polypyrimidine tract of introns and may promote RNA looping.
Alternative splicing is a mechanism by which a single gene may eventually give rise to several different proteins. Alternative splicing can be accomplished by the concerted action of a variety of different proteins, termed “alternative splicing regulatory proteins,” that associate with the pre-mRNA and cause distinct alternative exons to be included in the mature mRNA. These alternative forms of the gene's transcript can give rise to distinct isoforms of the specified protein. Sequences in pre-mRNA molecules that can bind to alternative splicing regulatory proteins can be found in introns or exons, including, but not limited to, ISS, ISE, ESS, ESE, and polypyrimidine tract. Many mutations or upstream signaling pathways can alter splicing patterns. For example, mutations can occur in cis-acting elements, and can be located in core consensus sequences (e.g. 5′ss, 3′ss and BP), in the regulatory elements that modulate spliceosome recruitment, including ESE, ESS, ISE, and ISS, or in regions that modulate the RNA structure, such as stem-loops. Mutations can also reside in a sequence considered an alternative 5′ss that is activated and recognized by the splicing machinery as a result of a mutation, or a mutation within a 5′ss can cause the use of an alternative 5′ss. As another example, mis-signaling can induce more or less of a trans-acting splicing factor to bind to pre-mRNAs and modulate their production of a particular mRNA isoform.
A cryptic splice site, for example, a cryptic 5′ss or a cryptic 3′ss, can refer to a splice site that is not normally recognized by the spliceosome and is therefore usually in a dormant state. A cryptic splice site can be recognized or activated by mutations in either cis-acting elements or trans-acting factors.
Splicing factors can be de-regulated in cancer and, in some cases, are themselves oncogenes or pseudo-oncogenes and can contribute to positive feedback loops driving cancer progression. For example, CD44 splice isoform switching in human and mouse epithelium is essential for epithelial-mesenchymal transition and breast cancer progression. FOXM1 is expressed in three distinct splice variants, which arise from the same gene through differential splicing of the two facultative exons. FoxM1B and FoxM1C are both transcriptionally active, and proteins from these transcripts drive cancer cell cycle progression, whereas FoxM1A is transcriptionally inactive because the addition of an exon abolishes any transcriptional activity of FOXM1; FoxM1A acts as a dominant negative form when expressed and can stop cancer cell cycle progression. Another example is IG20/MADD, two splice isoforms that differ by a single exon and have opposing effects in cancer cells and mice. IG20 is an anti-apoptotic form that prevents TRAIL-induced apoptosis, whereas MADD is a pro-apoptotic form that promotes TRAIL-induced apoptosis. Indeed, RNA mis-splicing underlies a growing number of human diseases with substantial societal consequences.
However, targeting RNA splicing, and more specifically targeting RNA, has been difficult because limited data are available on the 2-dimensional and 3-dimensional structures of RNA, on chemotypes that engender RNA binding affinity or selectivity, on chemotypes that engender RNA binding affinity and selectivity at particular mRNA splicing hot spots, and on RNA structural elements that form small molecule binding pockets. In addition, splicing of the pre-mRNA is heavily influenced by a kinetic component, such that particular 3-dimensional structures are formed by the RNA and/or RNA-protein complexes at particular moments in time. RNA splicing is a dynamic process involving several trans-acting protein factors that bind to the RNA and influence RNA secondary and tertiary structure. Thus, screening for specific and selective small molecule binding agents to correct RNA splicing may sometimes require tools that can accurately assess binding of multiple agents to RNA, measure or confirm structural changes resulting from the binding agents, and, as a result, determine changes in molecular associations and sometimes kinetic affinities (dissociation constants) of particular key proteins at particular key binding regions, or mRNA hot spots, that influence the direction of RNA splicing to include or exclude key regions of the RNA that drive isoform RNA expression. Small molecule interactions with these 3-D binding pockets can therefore influence and correct RNA mis-expression in disease. Screening of small molecule libraries for binding to RNA targets could generate data about chemotypes that engender RNA binding. However, few small molecule screening collections are enriched in RNA binders; in fact, most libraries are biased toward compounds that bind to proteins. In addition, several of the available RNA binder libraries are not specific or selective for particular RNAs.
To address these needs and others, the present disclosure in various embodiments provides a structure-based screening platform that can be used to identify small molecules that bind to RNA and/or RNA-protein complexes, design novel molecules that can fit into particular RNA binding pockets, and improve specificity and selectivity of small molecules towards disease-associated pre-mRNA splicing defects.
The present disclosure in various embodiments provides a structure-based screening platform or method to identify small molecules that can bind polynucleotides and/or complexes formed by polynucleotides and proteins (i.e. polynucleotide-protein complexes) and influence the conformation of the RNA such that it influences the RNA expression. The present disclosure also provides methods to identify small molecules that can bind polynucleotides and/or polynucleotide-protein complexes involved in RNA splicing. The present disclosure also provides methods to identify small molecules that can influence the structure of the RNA and the binding affinity of the trans-acting proteins. In some embodiments, the target polynucleotide is RNA. In some embodiments, the target polynucleotide is mRNA. In some embodiments, the target polynucleotide is a pre-mRNA or a portion of the pre-mRNA. In some embodiments, the target polynucleotide contains a splice site or a portion thereof which includes a 5′ss, a cryptic 5′ss, a 3′ss, or a cryptic 3′ss. In some embodiments, the target polynucleotide comprises one or more other cis-acting elements or a portion thereof, including BP, ESE, ESS, ISE, ISS, and polypyrimidine tract. In some embodiments, the target polynucleotide comprises at least one intron or a fragment thereof. In some embodiments, the target polynucleotide comprises two, three, four, five, six, or more introns or fragments thereof. In some embodiments, the target polynucleotide comprises at least one exon or a fragment thereof. In some embodiments, the target polynucleotide comprises two, three, four, five, six, or more exons or fragments thereof. In some embodiments, the target polynucleotide comprises at least one exon-intron boundary. As used herein, the exon-intron boundary can refer to any polynucleotide that contains intron and exon sequences located at the boundary between an intron and an exon. 
In some embodiments, the exon-intron boundary may contain a complete sequence of an exon and a fragment sequence of an intron. In some other embodiments, the exon-intron boundary may contain a complete sequence of an intron and a fragment sequence of an exon. In some cases, the target polynucleotide contains both exon and intron sequences, and it is to be understood that the order of exon and intron can vary. For example, the exon can be on the 5′ end of the intron, or the exon can be on the 3′ end of the intron. In some embodiments, the exon-intron boundary comprises 5′ss. In some embodiments, the exon-intron boundary comprises 3′ss. The target polynucleotide can be in various lengths. For example, in some embodiments, the target polynucleotide is at least 5 nucleotides, at least 8 nucleotides, at least 10 nucleotides, at least 15 nucleotides, at least 20 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 35 nucleotides, at least 40 nucleotides, at least 45 nucleotides, at least 50 nucleotides, at least 55 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 75 nucleotides, at least 80 nucleotides, at least 85 nucleotides, at least 90 nucleotides, at least 95 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. In some embodiments, the target polynucleotide is at most 20 nucleotides, at most 50 nucleotides, at most 100 nucleotides, at most 150 nucleotides, at most 200 nucleotides, at most 300 nucleotides, at most 400 nucleotides, at most 500 nucleotides, at most 600 nucleotides, at most 700 nucleotides, at most 800 nucleotides, at most 900 nucleotides, or at most 1000 nucleotides in length. 
In some embodiments, the target polynucleotide is from 3 to 5 nucleotides, from 5 to 10 nucleotides, from 10 to 20 nucleotides, from 20 to 40 nucleotides, from 40 to 50 nucleotides, from 50 to 100 nucleotides, from 100 to 150 nucleotides, from 150 to 200 nucleotides, from 200 to 250 nucleotides, from 250 to 300 nucleotides, from 300 to 350 nucleotides, from 350 to 400 nucleotides, from 400 to 450 nucleotides, or from 450 to 500 nucleotides in length.
In some embodiments, the polynucleotide comprises a sequence encoded by a gene selected from the group consisting of ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, CD46, and USH2A. In some embodiments, the polynucleotide is a pre-mRNA encoded by a genetic sequence with at least about 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the above mentioned gene.
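A percent sequence identity threshold such as those listed above can be checked with a short calculation. The Python sketch below assumes the two sequences have already been aligned to equal length, treats gap characters as mismatches, and divides by the alignment length; other conventions for the denominator exist, and the disclosure does not specify one. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumptions for illustration: '-' marks a gap, gaps never count as
    matches, and the denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-nucleotide fragments differing at one position.
reference = "AUGGCCGUAA"
variant = "AUGGCUGUAA"
identity = percent_identity(reference, variant)
print(identity, identity >= 80.0)
```

In practice the alignment step itself (not shown) largely determines the value obtained, so the threshold comparison is only as meaningful as the alignment that precedes it.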
In some embodiments, the target polynucleotide may be labeled or modified on one or more nucleotides.
The present disclosure provides a platform screening method to identify small molecule binding agents that bind to polynucleotides and/or polynucleotide-protein complexes by nuclear magnetic resonance (NMR) spectroscopy. In some embodiments, the target polynucleotide is free of any label. In some embodiments, the target polynucleotides comprise no nucleotide that is isotopically labeled. In some other embodiments, the target polynucleotides comprise at least one nucleotide isotopically labeled with one or more atomic labels. In some embodiments, the target polynucleotides comprise two or more nucleotides that are isotopically labeled. Typically, the atomic labels used in NMR spectroscopy can include 2H, 13C, 15N, 19F, and 31P.
In various embodiments of the present disclosure, at least one binding agent is introduced in a sample containing a target polynucleotide. In some embodiments, the target polynucleotide itself may form a recognition portion or a binding pocket to accommodate a binding agent such as a small molecule. In some embodiments, the target polynucleotide forms a complex with the at least one binding agent to form a recognition portion or a binding pocket to accommodate additional binding agent(s). The binding agent disclosed herein can be a polynucleotide, a polypeptide, a ribonucleoprotein, a small molecule, or any combinations thereof. In some embodiments, the binding agent can be a mixture of binding agents. In some embodiments, two or more binding agents are introduced to the target polynucleotide. In some embodiments, two or more binding agents are introduced together with the target polynucleotide. In some embodiments, two or more binding agents can be introduced in sequential order to the target polynucleotide.
In some embodiments, the binding agent is a polynucleotide. In a preferred embodiment, the binding agent is a snRNA or a portion thereof. In some embodiments, the binding agent is U1 snRNA or a portion thereof. In some embodiments, the binding agent is U2 snRNA or a portion thereof. In some other embodiments, the binding agent is U1 snRNA, U2 snRNA, U4 snRNA, U5 snRNA, U6 snRNA, U11 snRNA, U12 snRNA, U4atac snRNA, U5 snRNA, U6atac snRNA, or any portions thereof. In some embodiments, the binding agent is a polypeptide. In some embodiments, the binding agent is a protein component of a ribonucleoprotein. In some embodiments, the binding agent is a domain, a motif, or any portion of a protein. In some embodiments, the binding agent can be a protein or a portion thereof selected from the group comprising U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U4atac snRNP, U5 snRNP, U6atac snRNP, or any combinations thereof. In some embodiments, the binding agent can be an auxiliary splicing factor or a portion thereof. Exemplary auxiliary splicing factors include, but are not limited to, SR proteins and hnRNPs. In some embodiments, the binding agent can be a protein or a portion thereof selected from the group comprising SC35, SRp55, SRp40, SRm300, SFRS10, TASR-1, TASR-2, SF2/ASF, 9G8, SRp75, SRp30c, SRp20, P54/SFRS11, U2AF65, U2AF35, Urp/U2AF1-RS2, SF1/BBP, CBP80, CBP 20, PTB/hnRNP I, A1 hnRNP, A2/B1 hnRNP, L hnRNP, M hnRNP, K hnRNP, U hnRNP, F hnRNP, H hnRNP, G hnRNP, R hnRNP, I hnRNP, C1/C2 hnRNP, or any combinations thereof. In some embodiments, the polypeptide is a protein or protein component of a trans-acting factor. In some embodiments, the polypeptide is a portion, e.g. a domain or subdomain, of a protein associated with RNA splicing. 
In some embodiments, the polypeptide is a protein component or a portion thereof of one of proteins selected from a group comprising SR, TRA2, SF, SRSF, U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP, U11 snRNP, U12 snRNP, U1-C, Sm proteins, FBP11, SF3A, SF3B, U2AF65, U2AF35, PRP19 complex proteins, hnRNP 1, hnRNP 3, hnRNP C, hnRNP G, hnRNP K, hnRNP M, hnRNP U, ASF, SF2, 9G8, SRP20, TRA2a/b, SRP36, SRP35C, SRP30C, SRP38, SRP40, SRP55, SRP75, HUR, NFAR, NF45, YB1, and junction complex proteins. Other exemplary proteins that are associated with RNA splicing include mBBP, polypyrimidine tract binding protein (PTB), nPTB, KH-type splicing regulatory protein (KSRP), SAM68, STAR/GSG, ASD-2b, ASD-1, SUP-12, RNPC1, ASF, snRNP auxiliary factor-35 (U2AF35), ASF/SF2, Nova-1/2, Fox-1/2, Muscle-blind like (MBNL), CELF, Hu, TIA, TIAR, and their aliases. In some embodiments, the protein is a protein variant, a mutant, or a portion of the protein. In some embodiments, the binding agent is a small molecule. In some embodiments, the binding agent is a library of small molecules. Various small molecule libraries can be used with the methods disclosed herein.
In some embodiments, a first binding agent is introduced to the target polynucleotide, thereby allowing the first binding agent and the target polynucleotide to form a first complex. In some embodiments, a second binding agent is introduced to the target polynucleotides, thereby contacting the first complex. In some embodiments, the second binding agent forms a second complex with the first complex. The complex can be a nucleic acid duplex, or a polynucleotide-protein complex, or a polynucleotide-small molecule complex. For example, a first binding agent comprising a polynucleotide can be introduced to a target polynucleotide to form a duplex, and a second binding agent comprising a polypeptide and a small molecule can then be introduced. For another example, a first binding agent comprising a polynucleotide can be introduced to a target polynucleotide to form a duplex, and a second binding agent comprising a small molecule can then be introduced. For yet another example, a first binding agent comprising a polypeptide can be introduced to a target polynucleotide, and a second binding agent comprising a small molecule can then be introduced. It is to be understood that there is no required order for introducing the binding agent to a target polynucleotide. In some embodiments, a binding agent can comprise more than one molecule, and those molecules can be introduced simultaneously or sequentially.
A binding pocket formed by a polynucleotide, a polynucleotide-polynucleotide complex, or a polynucleotide-protein complex can be used to accommodate a binding agent such as a small molecule. In various embodiments, a target polynucleotide forms a binding pocket. In some embodiments, a target polynucleotide binds to an additional polynucleotide to form a complex which comprises a binding pocket. In some embodiments, a target polynucleotide binds to a protein-RNA complex to form a binding pocket. In some embodiments, a binding pocket comprises a bulge, or a mutation, or a stem-loop, or any combination thereof. In some embodiments, a binding pocket may not comprise a bulge, a mutation, or a stem-loop.
Mutations in cis-acting elements of splicing can alter splicing patterns. Common mutations can be found in the core consensus sequences, including 5′ss, 3′ss, and BP regions, or other regulatory elements, including ESE, ESS, ISE, and ISS. Mutations in these cis-acting elements can result in multiple diseases. Exemplary diseases are included in Tables 1-3. The present disclosure provides methods to screen small molecule binding agents that can target pre-mRNA containing one or more mutations in the cis-acting elements. In some embodiments, the present disclosure provides methods to screen small molecule binding agents that can target pre-mRNA containing one or more mutations in the splice sites or BP regions. In some embodiments, the present disclosure provides methods to screen small molecule binding agents that can target pre-mRNA containing one or more mutations in other regulatory elements, for example, ESE, ESS, ISE, and ISS.
Mutations in cis-acting elements, and upstream mis-signaling, can induce 3-dimensional structural change in pre-mRNA, including when the pre-mRNA is bound to at least one snRNA, or at least one snRNP, or at least one other auxiliary splicing factor. In some embodiments, a binding pocket can be formed when the 5′ss is bound to U1 snRNA or a portion thereof. A binding pocket can contain a bulge, a non-mutated single-stranded or duplex RNA, a stem-loop, sequences adjacent to a stem-loop, or a mutation-containing single-stranded or duplex RNA. A binding pocket may or may not comprise a mutation. In some cases, a binding pocket comprises a sequence portion with a mutation upstream/downstream of the binding pocket, wherein such mutation impacts the structure of RNA at the binding pocket. In some embodiments, a bulge can be formed when the 5′ss is bound to U1 snRNA or a portion thereof with or without other protein binding partners associated with splicing. In some embodiments, a bulge can be induced to form when a 5′ss containing at least one mutation is bound to U1 snRNA or a portion thereof. In some embodiments, a mutation can induce the use of a cryptic 5′ss and create a bulge when it is bound to the U1 snRNA or a portion thereof. In some embodiments, a binding pocket can be formed when the 3′ss is bound to U2AF or a portion thereof. In some embodiments, a mutation can induce the use of a cryptic 3′ss and create a binding pocket when it is bound to the U2AF or a portion thereof. In some embodiments, a binding pocket can be formed when the BP region is bound to U2 snRNA. The protein components of the snRNP may or may not be present to form such a binding pocket. Exemplary 5′ss sequences are summarized in Table 1. A polynucleotide in the methods disclosed herein can contain any one of the 5′ss sequences summarized in Table 1. In some embodiments, a small molecule can bind to the bulge.
In one aspect of the present disclosure, the binding pocket formed on the target polynucleotide comprises a bulge. In some embodiments, a bulge is naturally occurring. In some embodiments, a bulge is formed by non-canonical base-pairing between the splice site and the small nuclear RNA. For example, a bulge can be formed by non-canonical base-pairing between the 5′ss and any one of the U1-U12 snRNAs. The bulge can comprise 1 nucleotide, or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides.
In some embodiments, 3-dimensional structural changes can be induced by a mutation or an upstream mis-signaling without bulge formation. In some embodiments, a bulge may be formed without any mutation in a splice site. More exemplary 5′ss mutations with or without bulge formation are summarized in Table 1. A polynucleotide in the methods disclosed herein can contain any one of the 5′ss sequences summarized in Table 1. In some embodiments, a recognition portion can be formed by a mutation in any of the cis-acting elements. In some embodiments, a small molecule can bind to a binding pocket that is induced by a mutation.
In some embodiments, a mutation in authentic 5′ss can activate usage of cryptic 5′ss during splicing. Exemplary mutated authentic 5′ss targets and corresponding activated cryptic splice site targets are summarized in Table 2.
In some embodiments, a mutation can be in one of the regulatory elements including ESE, ESS, ISE, and ISS.
In some embodiments, a target polynucleotide comprises a splice site, wherein the splice site comprises a sequence selected from the group consisting of NGAgunvrn, NHAdddddn, NNBnnnnnn, and NHAddmhvk; wherein N (or n) is A, U, G or C; B is C, G, or U; H is A, C, or U; d is a, g, or u; m is a or c; r is a or g; v is a, c or g; k is g or u.
In some embodiments, the target polynucleotide comprises a splice site, wherein the splice site comprises a sequence selected from the group consisting of NNBgunnnn, NNBhunrmn, and NNBgvnrmn, wherein N/n is A, U, G or C; B is C, G, or U; h is a, c, or u; v is a, c or g.
In some embodiments, the target polynucleotide comprises a splice site, wherein the splice site comprises a sequence selected from the group consisting of NNBgtrrm, NNBgtwwdn, NNBgtvmvn, NNBgtvbbn, NNBgtkddn, NNBgtbnbd, NNBhtnngn, NNBhtrmhd, or NNBgvdnvn, wherein N/n is A, U, G or C; B is C, G, or U; h is a, c, or u; v is a, c or g; r is a or g; m is a or c; d is a, g or u; k is g or u; w is a or u.
Nuclear Magnetic Resonance (NMR) spectroscopy can be a powerful analytical technique used to determine qualitative and quantitative information about organic molecules. NMR can be used to solve and provide valuable information about the structure of a variety of chemical and biological molecules, ranging from small organic compounds to complex polymers such as proteins and nucleic acids. In NMR, a sample is placed in a magnetic field and is subjected to radiofrequency (RF) excitation at a characteristic frequency called the Larmor frequency (f):
$$f = \frac{\gamma B_0}{2\pi}$$
where γ is the gyromagnetic ratio of the nuclei and B0 is the magnetic field strength. The nuclei in the magnetic field absorb the energy provided and become energized. The frequency of the radiation necessary for absorption depends on the type of nuclei to be excited (e.g., 1H, 13C, or 15N). The frequency will typically also depend on the chemical environment of the nucleus (e.g., the presence of various chemical electronegative groups, salts, pH of solution, and the presence of binding agents). Lastly, the frequency may also depend on the spatial location in the magnetic field if the magnetic field is not uniform, i.e., the field is not homogeneous.
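As an illustration of the Larmor relation f = γB0/(2π), the following minimal sketch converts a 1H spectrometer frequency into a static field strength and then into the resonance frequencies of other nuclei at that field. The function names are hypothetical, and the γ/(2π) values are rounded literature values supplied here as assumptions rather than taken from the present disclosure.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla.
# Rounded literature values; treated as assumptions for this sketch.
GAMMA_BAR_MHZ_PER_T = {
    "1H": 42.577,
    "2H": 6.536,
    "13C": 10.708,
}


def larmor_mhz(nucleus: str, b0_tesla: float) -> float:
    """Larmor frequency f = gamma * B0 / (2*pi), returned in MHz."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * b0_tesla


def field_for_proton_mhz(f_1h_mhz: float) -> float:
    """Static field B0 (tesla) that gives the requested 1H Larmor frequency."""
    return f_1h_mhz / GAMMA_BAR_MHZ_PER_T["1H"]


# A spectrometer described by its 1H frequency of 200 MHz.
b0 = field_for_proton_mhz(200.0)
f_2h = larmor_mhz("2H", b0)    # deuterium frequency at the same field
f_13c = larmor_mhz("13C", b0)  # carbon-13 frequency at the same field

print(f"B0 = {b0:.2f} T, 2H = {f_2h:.1f} MHz, 13C = {f_13c:.1f} MHz")
```

Under these assumed constants, the 2H frequency at a 200 MHz (1H) field comes out near 30.7 MHz.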
In various embodiments, the methods for determining a 2-D structure and/or a 3-D atomic structure utilize NMR devices operating at commercially available spectrometer frequencies. For example, a 1H Larmor frequency of greater than about 1 GHz, about 1 GHz, from about 1 GHz to about 20 MHz, or about 900 MHz, about 800 MHz, about 700 MHz, about 600 MHz, about 500 MHz, about 400 MHz, about 300 MHz, about 200 MHz, about 100 MHz, about 75 MHz, about 50 MHz, or about 20 MHz can be used to determine the structure of a biomolecule, for example, a polynucleotide. Solely for the purpose of convenience, the disclosure of the present methods will be exemplified with the use of polynucleotides, but the methods described herein are applicable to determine the interactions or structure of a protein or a polypeptide as the target or desired biomolecule of interest. Methods for selectively labeling proteins and polypeptides are known in the art. In some embodiments, the methods of the present technology can be performed using an NMR module operable to provide a 1H Larmor frequency of 300 MHz or less.
In some embodiments, lower magnetic fields (for example, 300 MHz or less) can be used, which can significantly shorten the repetition delay, so that the total experimental time can be reduced to ¼-⅕ of that at high fields. This is because the repetition delay depends on the T1 relaxation time, which is significantly shorter at low magnetic field (i.e., the T1 relaxation time at 100 MHz is more than 6 times shorter than that at 600 MHz for molecules with a correlation time of 4-8 ns (oligonucleotides of 25-50 bases)). This difference in T1 relaxation time between high and low magnetic fields becomes larger as the molecular weight or size of a molecule increases. Within a given time, 4-5 times more measurements can be repeated and added at low magnetic fields to yield a signal-to-noise gain of about a factor of 2.
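The factor-of-2 figure follows from the general rule that, when repeated measurements are added, signal grows linearly with the number of scans while uncorrelated noise grows as its square root. A minimal sketch of that arithmetic, with a hypothetical function name and assuming ideal uncorrelated noise:

```python
import math


def snr_gain(scan_ratio: float) -> float:
    """Signal-to-noise improvement from acquiring `scan_ratio` times more scans.

    Assumes coherent signal addition and uncorrelated noise, so the gain
    scales as the square root of the number of added scans.
    """
    return math.sqrt(scan_ratio)


# 4-5 times more scans in the same total experiment time.
gains = {n: round(snr_gain(n), 2) for n in (4, 5)}
print(gains)
```

With 4 to 5 times more scans the estimated gain is between 2.0 and about 2.2.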
In some embodiments, there are unexpected advantages using a low field NMR device, for example, an NMR device having a spectrometer frequency of 300 MHz or less. In some embodiments, the methods are derived from the surprising finding that low field NMR can be employed to obtain structurally detailed information concerning a complex structure, such as a polynucleotide. Combining the use of low field NMR (i.e., a 1H Larmor frequency of 300 MHz or less) with selective labeling of the sample provides a sufficient resolution that permits NMR studies of complex 3-D structures using chemical shift information.
In some embodiments, the methods of the present disclosure utilize a low field NMR. These methods illustratively include interrogation of the target or selected polynucleotide selectively labeled with one or more nucleotides using a static magnetic field and reference frequency of 300 MHz or less, or about 299 MHz or less, or about 250 MHz or less, or about 225 MHz or less, or about 200 MHz or less, or less than about 175 MHz, or less than about 150 MHz, or less than about 125 MHz, or less than about 100 MHz, preferably, ranging from about 20 MHz to about 300 MHz, or from about 20 MHz to about 299 MHz, or from about 50 MHz to about 275 MHz, or from about 75 MHz to about 250 MHz, or from about 75 MHz to about 225 MHz, or from about 75 MHz to about 200 MHz, or from about 75 MHz to about 175 MHz, or from about 100 MHz to about 300 MHz, or from about 125 MHz to about 275 MHz, or from about 20 MHz to about 250 MHz, or from about 20 MHz to about 225 MHz, or from about 20 MHz to about 200 MHz, or from about 20 MHz to about 150 MHz, or from about 20 MHz to about 100 MHz.
In some embodiments, a number of small molecule-bound biomolecular structures can be determined for uses comprising computer-aided drug discovery efforts, which commonly rely on biomolecular structures determined when bound to a small molecule.
In order to identify which small molecules interact with the biomolecule, in some embodiments, one synthesizes a uniformly isotopically labeled biomolecular sample; mixes each small molecule with it, individually or in a combinatorial manner, at a ratio at which one would expect to see changes in NMR signals for relatively tight binding small molecules (for a low μM Kd, a ratio of 2:1 or 4:1 could be used); collects NMR data such as chemical shifts, resonance intensities, and/or NOEs; compares the NMR data of the biomolecule in the presence of the small molecule to the NMR data of the biomolecule in the absence of the small molecule; and selects small molecules that cause significant changes in the NMR data. In some embodiments, changes in NMR data comprise a chemical shift change of a portion of a linewidth, for example one linewidth. In some embodiments, changes in NMR data comprise a significant reduction in an NOE and/or a resonance intensity when comparing the biomolecule NMR data in the absence and presence of the small molecule. In various embodiments, NMR data of the small molecule could be monitored and similar perturbations observed on addition of the biomolecule of interest, where, in some embodiments, the biomolecule is non-isotopically labeled. In various embodiments, the same solution conditions (e.g., buffer or solubilization solution) for each sample are used to minimize random noise due to differences in solution environments.
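The comparison step described above can be sketched as a simple filter over chemical shift tables. This is an illustrative sketch only: the resonance labels, compound names, shift values, and the 0.03 ppm linewidth are hypothetical, and the one-linewidth threshold is just one of the criteria mentioned above.

```python
def select_hits(free_shifts, bound_shifts_by_compound, linewidth_ppm):
    """Return compounds whose largest per-resonance shift change is at least
    one linewidth, mapped to that largest change (in ppm)."""
    hits = {}
    for compound, bound in bound_shifts_by_compound.items():
        # Compare only resonances observed both with and without the compound.
        deltas = [abs(bound[res] - free_shifts[res])
                  for res in free_shifts if res in bound]
        max_delta = max(deltas, default=0.0)
        if max_delta >= linewidth_ppm:
            hits[compound] = max_delta
    return hits


# Hypothetical 1H chemical shifts (ppm) of the labeled polynucleotide alone.
free = {"G5-H8": 7.80, "U9-H6": 7.62, "A12-H2": 7.35}

# Hypothetical shifts measured after adding each small molecule.
bound = {
    "cmpd_A": {"G5-H8": 7.86, "U9-H6": 7.63, "A12-H2": 7.35},
    "cmpd_B": {"G5-H8": 7.805, "U9-H6": 7.62, "A12-H2": 7.352},
}

hits = select_hits(free, bound, linewidth_ppm=0.03)
print(sorted(hits))
```

In this made-up data set only cmpd_A moves a resonance by more than the assumed linewidth, so it alone is selected.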
In some aspects, the methods described herein fit within the drug discovery paradigm used in the pharmaceutical and biotech industries. In a first example, the subject matter described herein exploits nucleic acid (e.g., RNA) plasticity to solve atomic-resolution nucleic acid (e.g., RNA) structures and uncover binding pockets optimized to identify key small molecule-nucleic acid (e.g., RNA) interactions. In various embodiments, these binding pockets afford efficient hit identification with atomic-level guidance during target screening. In a second example, in pursuing small molecules for hit-to-lead studies and lead optimization, the atomic-level interactions enable medicinal chemists to rationally design new compounds. In some embodiments, this affords accurate and efficient target validation.
In some aspects, the present disclosure provides a method for determining the 2-dimensional (2-D) or 3-dimensional (3-D) atomic resolution structure of a polynucleotide. The method includes providing a polynucleotide sample comprising a polynucleotide, the polynucleotide comprising none or at least one nucleotide isotopically labeled with one or more atomic labels selected from the group consisting of 2H, 13C, 15N, 19F and 31P. In some embodiments, the method further comprises obtaining a NMR spectrum of the polynucleotide sample using a NMR device. In some embodiments, the method further comprises determining a chemical shift of the one or more atoms or a subset of atoms with close molecular interactions. In some embodiments, the method further comprises determining a 2-D or a 3-D atomic resolution structure of the polynucleotide from the chemical shifts.
In some embodiments, a first NMR spectrum can be obtained for a first complex in the sample, and a second NMR spectrum can be obtained for a second complex in the sample. The second complex can contain one or more molecules (e.g., a polynucleotide, polypeptide, or small molecule) more than the first complex. In some embodiments, the method further comprises comparing the first and the second NMR spectra. In some embodiments, a NMR spectrum is obtained for a polynucleotide sample without a small molecule. In some embodiments, a NMR spectrum is obtained for a polynucleotide sample containing a small molecule. In some embodiments, the method comprises selecting or identifying a binding agent based on comparing different NMR spectra. In some embodiments, the method comprises selecting or identifying a small molecule based on comparing different NMR spectra.
In some embodiments, the method to determine the 2-D or 3-D structure of a polynucleotide may need interrogation of multiple polynucleotides having the same nucleotide sequence, but differing from each other in that each polynucleotide is isotopically labeled on a different nucleotide. In other words, the method determines the chemical shifts of multiple polynucleotides, each polynucleotide having the identical nucleotide sequence as the first polynucleotide analyzed, and each polynucleotide being synthesized with a different nucleotide labeled with the one or more atomic labels. For example, if the polynucleotide has 5 nucleotides, the method would require 5 polynucleotide samples, each polynucleotide labeled with the one or more atomic labels on a different nucleotide. In this same 5-mer polynucleotide example, the method may utilize a smaller number of distinct polynucleotides than the number of nucleotides present in the nucleotide sequence, by strategically labeling one or more nucleotides in the polynucleotide with one or more atomic labels as described herein. In some embodiments, the polynucleotide sample has only one polynucleotide with one nucleotide labeling pattern. In other embodiments, the polynucleotide sample may contain two or more polynucleotides, each having a different nucleotide labeled with one or more atomic labels.
In some aspects, the method obtains a NMR spectrum of the polynucleotide sample by interrogating the polynucleotide sample with a NMR spectrometer frequency ranging from about 1 GHz to about 20 MHz. In one of these aspects, the NMR spectrometer frequency is 300 MHz or less, for example, from about 20 MHz to about 100 MHz.
In some embodiments, the NMR interrogation includes one or more of the following 6 steps. First, in some embodiments, the interrogation comprises a temperature regulation step. In this aspect, the liquid sample containing the polynucleotide of interest in the appropriate chemical environment is transferred to a sample conduit and fills the analysis volume with sample for NMR interrogation. Second, in some embodiments, the sample in the sample conduit is equilibrated at a selected temperature ranging from 0 to 60° C. Third, in some embodiments, a tuning and matching step can be performed. This process adjusts the resonant circuit frequency and impedance until they coincide with the frequency of the pulses transmitted to the circuit and the impedance of the transmission line (typically 50 ohm). For best signal-to-noise and minimal RF coil heating, the tuning and matching can be done for each sample. However, with pre-adjustment during the manufacturing process, little or no adjustment is necessary for low field magnets. Fourth, in some embodiments, a locking step is performed. In this process, the 2H signal from the deuterated solvent is located and used for an internal feedback mechanism by which magnetic field drift can be compensated. The 2H signal (for example, 30.7 MHz on a 200 MHz spectrometer), being distant from the 1H signal, is acquired and processed independently. The lock signal also serves as a chemical shift reference.
Fifth, in some embodiments, a shimming step is performed prior to acquiring NMR data on the sample being interrogated. In some embodiments, the interrogation step may require creating a homogeneous magnetic field at the analysis volume by controlling electric currents in a set of coils which generate small static magnetic fields of different geometries and strengths and correct inhomogeneity of the B0 field. For NMR interrogation of biomolecules of the present disclosure, it is preferred to have at least 50 ppb (parts per billion) of field homogeneity when analyzing samples using NMR.
Sixth, in some embodiments, a sequence of precise pulses and delays is applied to the 1H and 13C transmission lines connected to each resonant circuit around the analysis volume to manipulate the spin quantum states of nuclei in the sample. As a result, only the desired signals, such as 1H nuclei spins attached to 13C, are selected and measured, excluding all other 1H nuclei spins attached to other nuclei; alternatively, using shaped pulses (selective pulses), nuclei having a certain chemical shift range are detected. Many different types of pulse sequences can be applicable for different purposes, including a variety of HSQC, HMQC, COSY, TOCSY, NOESY, and ROESY experiments for structural determinations of biomolecules in 1-D, 2-D, and 3-D experimental settings. In some embodiments, after the pulse sequence, the same resonant circuits (including the 2 or more RF coils) sense the fluctuation of the magnetic field around the analysis volume (called the FID, or free induction decay) as an electric voltage, which is digitized and recorded for a predefined duration. To improve the signal-to-noise (S/N), a set of pulsing and recording steps is repeated multiple times and added, with a delay in between called the relaxation delay, which allows the spin systems to return to their initial state before pulsing starts again.
In some aspects, the present disclosure provides methods for determining the structure of a target biomolecule when mixed with a small molecule, biomolecule, ligand or other chemical entity (collectively referred to as a binding agent) that could interact with the biomolecule of interest. Chemical shift changes on the addition of the binding agent indicate that the biomolecule may be interacting with the binding agent. The chemical shifts in the presence of the binding agent can be collected and used to determine the biomolecular structure of the biomolecule and the bound binding agent. In some embodiments of this aspect, the method includes the steps of providing a polynucleotide sample comprising a plurality of polynucleotides, the plurality of polynucleotides having an identical nucleotide sequence, wherein each polynucleotide comprises at least one nucleotide isotopically labeled with one or more atomic labels selected from the group consisting of 2H, 13C, 15N, 19F and 31P; admixing the polynucleotide sample with the binding agent forming a plurality of bound complexes; obtaining a NMR spectrum of the bound complexes using a NMR device; determining a chemical shift of the one or more atomic labels; and determining the 3-D atomic resolution structure of the polynucleotides from the chemical shifts.
In some embodiments of the present methods, the target polynucleotide is analyzed by creating a plurality of polynucleotides all having the same nucleotide sequence but differing in the location(s) of isotopically labeled nucleotide(s). In some embodiments, the secondary structure of the polynucleotide is used to determine the placement of the labeled nucleotide or nucleotides to reduce the number of polynucleotide samples. Taking the primary sequence of the polynucleotide, the secondary structure is predicted. Then a plurality of secondary structure predictions can be computed using a secondary structure prediction algorithm (e.g., nearest neighbor algorithm) or computer program. The method then uses an alignment step with the top 10 or so secondary structure predictions and then determines the sites that exhibit the greatest variance in secondary structure. Then the site or sites in the polynucleotide sequence that exhibit largest variance are labeled isotopically for NMR detection or a derivative, wherein one or more nucleotides are labeled per polynucleotide. The labeling scheme can be informed from the chemical shift database whereby multiple isotopic labels can be incorporated into a polynucleotide while maximizing chemical shift dispersion.
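One way to picture the variance step is to treat each predicted secondary structure as a dot-bracket string of equal length and score every position by how inconsistently it is paired across the predictions. The sketch below is a simplified illustration under that assumption; the function names and the example structures are hypothetical, and the paired/unpaired variance used here is only one possible measure of structural variance.

```python
def pairing_variance(structures):
    """Per-position variance of the paired/unpaired state across predictions.

    Each structure is a dot-bracket string; '.' means unpaired. For a
    position paired in a fraction p of the predictions, the variance of the
    paired indicator is p * (1 - p).
    """
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all structures must have the same length")
    variances = []
    for i in range(length):
        p = sum(s[i] != "." for s in structures) / len(structures)
        variances.append(p * (1.0 - p))
    return variances


def most_variable_sites(structures, k):
    """Return up to k 0-based positions with the largest non-zero variance."""
    var = pairing_variance(structures)
    order = sorted(range(len(var)), key=lambda i: var[i], reverse=True)
    return [i for i in order[:k] if var[i] > 0]


# Hypothetical top-ranked predictions for a 10-nucleotide hairpin.
predictions = [
    "((((..))))",
    "((((..))))",
    "(((....)))",
    "(((....)))",
]

sites = most_variable_sites(predictions, k=2)
print(sites)
```

In this toy ensemble the two positions that close the loop are paired in half of the predictions, so they are the candidate labeling sites.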
In some embodiments, the present disclosure provides a method for determining one or more specific isotopic labeling positions of one or more nucleotides within a polynucleotide sequence for the determination of 3-D atomic resolution structure or collecting other NMR interaction data of a polynucleotide. The method includes providing one or more polynucleotides, each of the one or more polynucleotides having an identical polynucleotide sequence, wherein each of the one or more polynucleotides comprises one or more nucleotides labeled with an isotopic label comprising 2H, 13C, 15N, 19F or 31P; predicting a plurality of structures of the polynucleotide sequence using a computational algorithm (e.g., MC-Fold|MC-Sym); identifying one or more region(s) on each of the plurality of polynucleotide structures that exhibit a large structural variation using metrics comprising an S2<0.8 and/or RMSF>0.5 Å; calculating a plurality of chemical shifts from regions of the predicted structures having a large structural variation using a chemical shift predictor, such as Nymirum's RANDOM FOREST™ Predictors (RAMSEY), SHIFTS, NUCHEMICS, or QM methods, from the predicted structures; and determining one or more specific isotopic labeling positions on each of the polynucleotide sample(s) such that the chemical shift dispersion is maximized and the number of samples is minimized. The MC-Fold|MC-Sym pipeline is a web-hosted service for RNA secondary and tertiary structure prediction. In this pipeline, MC-Fold takes the input sequence and outputs secondary structures, which are passed directly to MC-Sym, which outputs tertiary structures.
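The RMSF criterion can be illustrated with a short calculation over an ensemble of predicted structures. This is a minimal sketch, assuming the models have already been superimposed on a common frame and that each model is given as a list of (x, y, z) coordinates in Å with one representative atom per nucleotide; the function names and coordinates are hypothetical.

```python
import math


def rmsf(ensemble):
    """Root-mean-square fluctuation per atom across pre-aligned models.

    `ensemble` is a list of models; each model is a list of (x, y, z)
    tuples in angstroms, with atoms in the same order in every model.
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    values = []
    for a in range(n_atoms):
        mean = [sum(m[a][d] for m in ensemble) / n_models for d in range(3)]
        msd = sum(
            sum((m[a][d] - mean[d]) ** 2 for d in range(3)) for m in ensemble
        ) / n_models
        values.append(math.sqrt(msd))
    return values


def flexible_sites(ensemble, cutoff=0.5):
    """0-based indices of atoms whose RMSF exceeds the cutoff (in angstroms)."""
    return [i for i, v in enumerate(rmsf(ensemble)) if v > cutoff]


# Two hypothetical models with two representative atoms each.
models = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.2), (7.0, 0.0, 0.0)],
]

fluctuations = rmsf(models)
flexible = flexible_sites(models, cutoff=0.5)
print(fluctuations, flexible)
```

Here the first atom fluctuates by about 0.1 Å and the second by 1.0 Å, so only the second exceeds the 0.5 Å threshold and would be flagged as structurally variable.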
In some aspects, the present invention provides a NMR device that is small enough to sit on top of a standard laboratory bench. In some embodiments of the second aspect, the NMR device includes a housing; a sample handling device operable to receive a sample comprising a polynucleotide; and an NMR module. The NMR module may include a sample conduit comprising an analysis volume operable to receive at least a portion of the sample from the sample handling device; a plurality of radiofrequency coils disposed proximately to the analysis volume, each coil operable to generate a distinct excitation frequency pulse across the analysis volume to generate nuclear magnetic resonance of the nuclei of the polynucleotide in the analysis volume; and at least one magnet operable to provide a static magnetic field across the analysis volume and the radiofrequency coils. The NMR module may have a 1H Larmor frequency of 300 MHz or less and the RF coils are operable to transmit the excitation frequency pulse to the analysis volume and detect signals from NMR produced by the nuclei of the polynucleotide contained in the analysis volume. Optionally, the device further comprises a heating and cooling device in thermal coupling with the analysis volume. In this regard, the NMR device can employ the use of a sample conduit or analysis volume heating and cooling device for heating the sample containing the biomolecule, for example a protein or a nucleic acid, for example, an RNA polynucleotide to anneal the polynucleotide and bring the polynucleotide into a relaxed or stable conformation prior to acquisition of NMR spectra.
In certain embodiments of the method, the step of providing the polynucleotide sample includes determining one or more 2-D or 3-D models of the polynucleotide sequence using a 2-D or 3-D structure predicting algorithm, respectively; identifying one or more structurally heterogeneous regions on each of the one or more 2-D or 3-D models of the polynucleotide sequence; calculating one or more chemical shifts from the one or more structurally heterogeneous regions; and synthesizing a polynucleotide comprising one or more nucleotides having one or more atomic labels positioned at one or more nuclei, which results in a polynucleotide having a minimized chemical shift overlap.
In some embodiments, determining the 3-D atomic resolution structure includes generating a plurality of theoretical structural polynucleotide 2-D models using the nucleotide sequence and one or more 2-D structure predicting algorithms; generating a plurality of theoretical structural polynucleotide 3-D models using a 3-D structure predicting algorithm using the plurality of theoretical structural polynucleotide 2-D models and optionally one or more known or assumed polynucleotide 2-D model; generating a predicted chemical shift set for each of the plurality of theoretical structural polynucleotide 3-D models; comparing the predicted chemical shift set to the chemical shift(s) of the one or more atoms; and selecting one or more theoretical structural polynucleotide 3-D model having an agreement (e.g., the best agreement) between the respective predicted chemical shift set and the chemical shift(s) of the one or more atomic labels as the one or more 3-D atomic resolution structures. In some embodiments, the predicted chemical shift set is generated by comparing each theoretical structural polynucleotide 3-D model with a NMR-data polynucleotide structure database. 
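The final comparison and selection steps can be sketched as a ranking of candidate 3-D models by how closely their predicted chemical shifts match the measured ones. The sketch below uses a root-mean-square deviation as the agreement score, which is an assumption made for illustration; the model names, atom labels, and shift values are hypothetical.

```python
import math


def shift_rmsd(predicted, measured):
    """RMS deviation (ppm) between predicted and measured shifts,
    computed over the labeled atoms present in both tables."""
    shared = [atom for atom in measured if atom in predicted]
    if not shared:
        raise ValueError("no atoms in common between the two shift sets")
    total = sum((predicted[a] - measured[a]) ** 2 for a in shared)
    return math.sqrt(total / len(shared))


def rank_models(predicted_by_model, measured):
    """Return (model name, RMSD) pairs sorted from best to worst agreement."""
    scores = {name: shift_rmsd(pred, measured)
              for name, pred in predicted_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical measured 13C shifts (ppm) for three labeled atoms.
measured = {"G3:C8": 137.2, "A7:C2": 153.9, "U9:C6": 141.5}

# Hypothetical predicted shift sets for two candidate 3-D models.
predicted = {
    "model_1": {"G3:C8": 137.0, "A7:C2": 154.3, "U9:C6": 142.1},
    "model_2": {"G3:C8": 137.3, "A7:C2": 153.8, "U9:C6": 141.4},
}

ranking = rank_models(predicted, measured)
best_model = ranking[0][0]
print(best_model)
```

With these made-up numbers model_2 deviates by about 0.1 ppm on every atom and is ranked first.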
In some embodiments, generating the predicted chemical shift set includes calculating a polynucleotide structural metric comprising atomic coordinates, stacking interactions, magnetic susceptibility, electromagnetic fields, or dihedral angles from one or more experimentally determined polynucleotide 3-D structures; generating a set of mathematical functions or objects that describe relationships between experimental chemical shifts and the polynucleotide structural metric of the experimentally determined 3-D polynucleotide structures using a regression algorithm; calculating a polynucleotide structural metric for each of the theoretical structural polynucleotide 3-D models; and inputting the polynucleotide structural metric for each of the theoretical structural polynucleotide 3-D models into the set of mathematical functions or objects to generate the predicted chemical shift set.
In some embodiments, the regression algorithm is a machine learning algorithm comprising a Random Forest algorithm. In some embodiments, determining the experimental chemical shift set comprises modeling the chemical shift set using a NMR spectrometer frequency from about 1 GHz to about 20 MHz.
In some embodiments, the method also includes the step of identifying a binding pocket in the one or more 3-D atomic resolution structures. In some embodiments, the method also includes the step of associating another molecule with the identified binding pocket of each of the one or more 3-D atomic resolution structures. In some embodiments, the method also includes the step of refining the associated other molecule and binding pocket of each of the one or more 3-D atomic resolution structures using modeling software that performs one or more functions comprising energy minimization and/or a molecular dynamics simulation. In some embodiments, the method also includes the step of identifying a binding pocket in the one or more refined 3-D atomic resolution structures. In some embodiments, the method also includes the step of using one or more coordinates of the associated other molecule in the refined 3-D structures and binding pocket of each of the one or more 3-D atomic resolution structures. In some embodiments, the predicted chemical shift set is generated by comparing each theoretical structural polynucleotide 3-D model with a NMR-data polynucleotide structure database.
In some embodiments, structural dynamics can be determined by obtaining structural information by NMR in a temporal manner. For example, when a small molecule binds to a target polynucleotide, structural information on the binding can be determined by NMR at different times after contacting the small molecule with the target polynucleotide. The structural information can be obtained by acquiring NMR spectra at different time points. The NMR spectra acquired at different time points can be used to calculate the chemical shifts, and the chemical shifts can be compared in order to determine the binding kinetics.
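As a non-limiting sketch of how chemical shifts from a time course might be compared, the example below estimates an observed rate constant under the assumption that the shift of a reporter resonance moves from its free value to its bound value along a single exponential. That kinetic model, the function name `observed_rate`, and the requirement that the free and bound shifts are known are all assumptions of this illustration and are not stated in the present disclosure.

```python
import math
from typing import Sequence


def observed_rate(times: Sequence[float], shifts: Sequence[float],
                  shift_free: float, shift_bound: float) -> float:
    """Estimate k_obs (1/time unit) from a chemical shift time course.

    Assumed model: delta(t) = free + (bound - free) * (1 - exp(-k_obs * t)).
    Rearranged, -ln(1 - progress) = k_obs * t, so k_obs is the slope of a
    least-squares line through the origin.
    """
    span = shift_bound - shift_free
    num = 0.0
    den = 0.0
    for t, delta in zip(times, shifts):
        remaining = 1.0 - (delta - shift_free) / span
        if t <= 0 or remaining <= 0:
            continue  # points at t = 0 or at/after the plateau carry no slope information
        y = -math.log(remaining)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("no usable time points")
    return num / den
```

Time points recorded after the resonance has reached its bound value are skipped because they cannot be log-transformed; a nonlinear fit would handle such points more gracefully.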
In some embodiments, binding kinetics between a small molecule and a target polynucleotide can be determined by various methods in the art. For example, assays for measuring binding kinetics include, but are not limited to, surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy. In some embodiments, one or more of the binding kinetics assays are used to confirm binding between the identified small molecule and the target polynucleotide.
Binding kinetics of RNA splicing can broadly encompass the mechanism by which the alternative splicing machinery functions in conjunction with the structural RNA and executes pre-mRNA splicing, that is, the excision of introns and the joining of exons to produce the final mature mRNA isoform. The kinetics of splicing can be a highly dynamic process involving both positive and negative regulators of exon inclusion, such that the overall net effect can be exon inclusion or exon exclusion. Binding agents, such as small molecules, can interact with this process and shift exon splicing in one direction by altering the affinity of particularly relevant trans-acting binding factors that form the spliceosomal complex. Binding kinetics can be reflected by various parameters, including the association rate constant kon, the dissociation rate constant koff, and the equilibrium dissociation constant Kd. A lower Kd usually indicates stronger binding and therefore higher binding affinity.
Binding kinetics of a small molecule binding to a target can be used to determine whether the small molecule is a strong binder or not. Binding kinetics of a polynucleotide binding to another polynucleotide (e.g. a target polynucleotide) with or without a small molecule can be used to determine whether the two polynucleotides bind more strongly or more weakly in the presence of the small molecule. Binding kinetics of a protein binding to a target polynucleotide with or without a small molecule can be used to infer whether the protein binds more strongly or more weakly in the presence of the small molecule. Kd can be determined by varying the concentration of the binding agent in the presence of a constant concentration of a target. For example, in determining the Kd of a small molecule binding to a target mRNA or RNA-RNA duplex, the concentration of the small molecule can be varied. Kd can also be determined by measuring kon and koff during a binding process; Kd is then calculated as the ratio koff/kon.
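The two routes to Kd described above can be illustrated with the following non-limiting sketch. It assumes simple 1:1 binding with the binding agent in large excess over the target, so that the free concentration of the binding agent is approximately its total concentration. The function names and the grid-search bounds are hypothetical choices made for this illustration.

```python
from typing import Sequence


def kd_from_rates(kon: float, koff: float) -> float:
    """Kd (M) from kon (1/(M*s)) and koff (1/s), assuming 1:1 binding."""
    if kon <= 0:
        raise ValueError("kon must be positive")
    return koff / kon


def fraction_bound(ligand_conc: float, kd: float) -> float:
    """Fraction of target occupied at a given free ligand concentration."""
    return ligand_conc / (kd + ligand_conc)


def fit_kd(concs: Sequence[float], fractions: Sequence[float],
           lo: float = 1e-9, hi: float = 1e-3, steps: int = 2000) -> float:
    """Estimate Kd from a titration by a log-spaced least-squares grid search.

    concs:     concentrations of the binding agent (M), target held constant.
    fractions: measured fraction of target bound at each concentration.
    lo, hi:    search bounds for Kd (M); hypothetical defaults.
    """
    best_kd = lo
    best_err = float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        err = sum((fraction_bound(c, kd) - f) ** 2
                  for c, f in zip(concs, fractions))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd
```

A grid search is used here only to keep the sketch dependency-free; in practice a nonlinear least-squares routine, or the fitting software supplied with the instrument, would typically be used.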
In some embodiments, the binding kinetics between a binding agent and a target polynucleotide can be determined. In some embodiments, the binding kinetics between a binding agent and an RNA-RNA complex can be determined. In some embodiments, the binding kinetics between a binding agent and an RNA-protein complex can be determined. For example, the binding kinetics between a small molecule and a target polynucleotide (e.g. mRNA) can be determined to infer how strong the binding is.
In some embodiments, the binding kinetics of a polynucleotide binding to a target polynucleotide to form an RNA-RNA duplex with or without a small molecule binding agent can be determined. In some embodiments, the binding kinetics of a polynucleotide binding to a target polynucleotide with and without a small molecule binding agent are determined, and the binding kinetics with and without the small molecule can be compared to infer whether the polynucleotide binds to the target polynucleotide more strongly or more weakly with the small molecule.
In some embodiments, the binding kinetics of a protein or protein component/polypeptide binding to a target RNA to form a protein-RNA complex with or without a small molecule binding agent can be determined. In some embodiments, the binding kinetics of a protein or polypeptide binding to a target polynucleotide with and without a small molecule binding agent are determined, and the binding kinetics with and without the small molecule can be compared to infer whether the protein binds to the target polynucleotide stronger or weaker with the small molecule.
In some embodiments, the binding kinetics of a protein-RNA complex binding to a target RNA to form a complex with or without a small molecule binding agent can be determined. In some embodiments, the binding kinetics of a protein-RNA complex binding to a target polynucleotide with and without a small molecule binding agent are determined, and the binding kinetics with and without the small molecule can be compared to infer whether the protein-RNA complex binds to the target polynucleotide stronger or weaker with the small molecule.
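The with-and-without comparison described in the preceding embodiments can be reduced, as one non-limiting sketch, to a fold change in Kd. The function name `compare_affinity` and the `tolerance` threshold, below which a change is treated as being within experimental noise, are hypothetical and would need to be set from replicate measurements.

```python
from typing import Tuple


def compare_affinity(kd_without: float, kd_with: float,
                     tolerance: float = 1.5) -> Tuple[str, float]:
    """Classify the effect of a small molecule on a binding interaction.

    kd_without, kd_with: Kd of the binding partner for the target
                         polynucleotide, measured without and with the
                         small molecule (same units).
    tolerance:           minimum fold change regarded as meaningful
                         (hypothetical default).
    Returns (label, fold), where fold = kd_without / kd_with.
    """
    if kd_without <= 0 or kd_with <= 0:
        raise ValueError("Kd values must be positive")
    fold = kd_without / kd_with
    if fold >= tolerance:
        return "stronger", fold   # lower Kd with the small molecule
    if fold <= 1.0 / tolerance:
        return "weaker", fold     # higher Kd with the small molecule
    return "unchanged", fold
```

The same comparison applies whether the binding partner is a polynucleotide, a protein or polypeptide, or a protein-RNA complex.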
In some embodiments, small molecule binding agents are selected by an NMR assay and then tested in the kinetics assay. For example, the kinetics assay can be used to measure the binding kinetics of two or more different molecules against the same target (e.g. RNA, RNA-RNA complex, or RNA-protein complex) and compare the Kd values to infer which small molecules are strong binders. The kinetics assay can serve as a secondary screening assay following the initial NMR screening. In some embodiments, the kinetics assay can also serve as the initial screening assay, followed by NMR for structural determination.
In some embodiments, the binding kinetics is measured by SPR and/or BLI. In such cases, a polynucleotide is immobilized on a surface. In some situations, the target polynucleotide (e.g. target mRNA) is immobilized on a surface. In some situations, a polynucleotide such as a snRNA is immobilized on a surface. The method to immobilize a polynucleotide on a surface can include labeling the polynucleotide with biotin and conjugating the surface with streptavidin, thereby immobilizing the polynucleotide through the biotin-streptavidin interaction.
In some embodiments, the binding kinetics is measured by fluorescence anisotropy, wherein a polynucleotide can be labeled with a fluorophore. In some other embodiments, the binding kinetics is measured by ITC.
In any of the above-mentioned embodiments, the kinetics assay can be performed in the presence of one or more polynucleotide molecules, or one or more polypeptides or a portion thereof. For example, U1 snRNP binding to a target mRNA containing a 5′ss can be tested in the presence of one or more auxiliary splicing factors or proteins involved in splicing. The proteins used herein can comprise a portion, for example a domain, of the proteins.
Also provided herein are methods to determine the specificity of a small molecule. For example, a small molecule selected by an initial NMR screening can be tested in any of the above mentioned kinetic assays to determine the binding affinity of the small molecule against different targets. The target can be a target mRNA bound with a snRNA in the presence or absence of a protein or a portion thereof. In some embodiments, the specificity of the small molecule is tested against different RNA-RNA duplexes comprising a target mRNA (e.g. 5′ss) and a snRNA (e.g. U1 snRNA). In some embodiments, the specificity of the small molecule is tested against different protein-RNA complexes comprising a target mRNA (e.g. 5′ss), a snRNA (e.g. U1 snRNA) and a protein or a protein domain (e.g. U1-C zinc finger domain).
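One non-limiting way to summarize such a specificity test is a fold-selectivity profile: the Kd for each comparison target divided by the Kd for the intended target. The function names, the target labels, and the `min_fold` threshold in the sketch below are hypothetical, and the Kd values would come from any of the kinetic assays described above.

```python
from typing import Dict


def selectivity_profile(kd_by_target: Dict[str, float],
                        intended: str) -> Dict[str, float]:
    """Fold selectivity of a small molecule for the intended target.

    kd_by_target: Kd of the small molecule for each target tested, e.g.
                  different RNA-RNA duplexes or protein-RNA complexes.
    intended:     key of the target the small molecule is meant to bind.
    Returns, for every other target, Kd(other) / Kd(intended); values
    above 1 mean the intended target is bound more tightly.
    """
    kd_on = kd_by_target[intended]
    return {name: kd / kd_on
            for name, kd in kd_by_target.items() if name != intended}


def is_selective(kd_by_target: Dict[str, float], intended: str,
                 min_fold: float = 10.0) -> bool:
    """True if every other target is bound at least min_fold more weakly."""
    profile = selectivity_profile(kd_by_target, intended)
    return all(fold >= min_fold for fold in profile.values())
```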
Virtual screening or structure-based drug design can be performed following the NMR study. In the above-mentioned NMR studies, a 3-dimensional structural model can be generated for each target polynucleotide in the presence of any binding partners (e.g. a polynucleotide or a polypeptide). For example, a 3-dimensional structural model can be generated for a target mRNA bound with a snRNA or a portion thereof, and a binding pocket can be identified for the RNA-RNA duplex. As another example, a 3-dimensional structural model can be generated for a target mRNA bound with a snRNA in the presence of a protein binding partner or a domain of the protein, and a binding pocket can be identified for the RNA-protein complex. The identified binding pocket can be further used for a structure-based drug design or virtual screening process. Structure-based drug design (or direct drug design) can rely on knowledge of the 3-dimensional structure of the biological target molecule (e.g. mRNA) obtained through methods such as x-ray crystallography or NMR spectroscopy. If an experimental structure of a target is not available, it may be possible to create a homology model of the target based on the experimental structure of a related molecule. Using the structure of the biological target, candidate drugs that are predicted to bind with high affinity and selectivity to the target may be designed using interactive graphics and the intuition of a medicinal chemist. Alternatively, various automated computational procedures may be used to suggest new drug candidates.
Current methods for structure-based drug design can be divided roughly into three main categories. The first category is identification of new ligands for a given receptor by searching large databases of 3D structures of small molecules to find those fitting the binding pocket of a target using fast approximate docking programs. A second category is de novo design of new ligands. In this method, ligand molecules are built up within the constraints of the binding pocket by assembling small pieces in a stepwise manner. These pieces can be either individual atoms or molecular fragments. The key advantage of such a method is that novel structures, not contained in any database, can be suggested. A third category is the optimization of known ligands by evaluating proposed analogs within the binding pocket. Structure-based drug design can be aided by computer programs (e.g. GOLD); therefore, it can be referred to as a virtual screening process. As used herein, virtual screen or screening can broadly cover all of the above categories of structure-based drug design. In one aspect of the present disclosure, a virtual screening process is provided to select small molecules or fragments thereof for de novo drug design and/or lead optimization. In some embodiments, the present disclosure provides a method comprising: identifying one or more binding pockets formed by a target polynucleotide and a first polynucleotide, wherein the target polynucleotide contains a splice site, a branch point (BP), an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), an intronic splicing enhancer (ISE), an intronic splicing silencer (ISS), or a polypyrimidine tract, or any combinations thereof; and virtually screening one or more small molecules or fragments thereof against the one or more binding pockets, wherein the virtual screening process identifies putative small molecule or fragment hits.
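The hit-selection step of such a virtual screen can be sketched, without limitation, as a filter over precomputed docking scores. The sketch does not call any docking program; it assumes that scores have already been produced externally and that a lower score indicates a better predicted fit, which is a convention chosen for this illustration and differs between programs. The function name `select_hits`, the score cutoff, and the molecule and pocket labels are hypothetical.

```python
from typing import Dict, List, Tuple


def select_hits(scores: Dict[str, Dict[str, float]],
                cutoff: float) -> List[Tuple[str, str, float]]:
    """Pick putative hits from precomputed docking scores.

    scores: molecule or fragment name -> {binding pocket name -> score},
            where a lower score means a better predicted fit (assumed).
    cutoff: scores at or below this value count as hits (hypothetical).
    Returns (molecule, best pocket, best score) tuples, best first.
    """
    hits = []
    for molecule, per_pocket in scores.items():
        if not per_pocket:
            continue  # molecule was not docked against any pocket
        pocket, best = min(per_pocket.items(), key=lambda item: item[1])
        if best <= cutoff:
            hits.append((molecule, pocket, best))
    return sorted(hits, key=lambda hit: hit[2])
```

The resulting list of putative hits could then be carried forward to the binding kinetics assays for experimental confirmation.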
In some embodiments, a first and a second small molecule hit can be identified through the virtual screening process, and the binding kinetics of the first and the second small molecule hits can be determined. In some embodiments, the binding kinetics of the first and the second small molecule hits can be compared to infer the binding affinity of each hit and to select the stronger small molecule (i.e. the one with higher binding affinity). The binding kinetics can be determined by various assays, including surface plasmon resonance (SPR), Bio-Layer Interferometry (BLI) technology (Octet Systems), isothermal titration calorimetry (ITC), or fluorescence anisotropy.
Diseases associated with changes to RNA transcript amount are often treated with a focus on the aberrant protein expression. However, if the processes responsible for the aberrant changes in RNA levels, such as components of the splicing process or associated transcription factors or associated stability factors, could be targeted by treatment with a small molecule, it would be possible to restore protein expression levels such that the unwanted effects of the expression of aberrant levels of RNA transcripts or associated proteins are prevented or reduced. The present disclosure provides methods of modulating the amount of RNA transcripts encoded by certain genes as a way to prevent or treat diseases associated with aberrant expression of the RNA transcripts or associated proteins.
In various embodiments, the present disclosure provides methods to identify small molecule binding agents that bind to a target polynucleotide, for example, an mRNA. In some embodiments, the present disclosure provides methods to identify small molecule binding agents that bind to a polynucleotide-protein complex, for example a complex formed by a pre-mRNA and a protein involved in splicing. In various embodiments, the present disclosure provides a screening method to select small molecule binding agents that can bind to a polynucleotide-protein complex. In various embodiments, the present disclosure provides screening methods to select small molecule binding agents that can correct aberrant RNA splicing. In various embodiments, the present disclosure provides methods to select small molecule binding agents by NMR.
Aberrant splicing can happen in pre-mRNA transcribed from various genes, including, but not limited to, ABCA4, ABCB4, ABCD1, ACADSB, ADA, ADAMTS13, AGL, ALB, ALDH3A2, ALG6, APC, APOB, AR, ATM, ATP7A, ATR, B2M, BMP2K, BRCA1, BRCA2, BTK, C3, CAT, CD46, CDH1, CDH23, CFTR, CHM, COL11A1, COL11A2, COL1A1, COL1A2, COL2A1, COL3A1, COL4A5, COL6A1, COL6A3, COL7A1, COL9A2, COLQ, CUL4B, CYBB, CYP17, CYP19, CYP27, CYP27A1, DES, DMD, DYSF, EGFR, EMD, ETV4, F13A1, F5, F7, F8, FAH, FANCA, FANCC, FANCG, FBN1, FECH, FGA, FGFR2, FGG, FIX, FLNA, FOXM1, FRAS1, GALC, GH1, GHV, HADHA, HBA2, HBB, HEXA, HEXB, HLCS, HMBS, HMGCL, HNF1A, HPRT1, HPRT2, HSF4, HSPG2, HTT, IDS, IKBKAP, INSR, ITGB2, ITGB3, JAG1, KRAS, KRT5, L1CAM, LAMA3, LDLR, LMNA, LPL, MADD, MAPT, MLH1, MSH2, MST1R, MTHFR, MUT, MVK, NF1, NF2, OAT, OPA1, OTC, PAH, PBGD, PCCA, PDH1, PGK1, PHEX, PKD2, PKLR, PLEKHM1, PLKR, POMT2, PRDM1, PRKAR1A, PROC, PSEN1, PTCH1, PTEN, PYGM, RP6KA3, RPGR, RSK2, SBCAD, SCN5A, SERPINA1, SLC12A3, SLC6A8, SMN2, SPINK5, SPTA1, TP53, TRAPPC2, TSC1, TSC2, TSHB, UGT1A1, and USH2A.
Exemplary diseases caused by such aberrant splicing can include cystic fibrosis, myotonia congenita, protoporphyria (erythropoietic), lymphoproliferative syndrome (X-linked), neurofibromatosis, retinitis pigmentosa, spondyloepiphyseal dysplasia tarda, epilepsy (progressive myoclonus), Rubinstein-Taybi syndrome, muscular dystrophy (merosin deficient), occipital horn syndrome, medium-chain acyl-CoA DH deficiency, tuberous sclerosis, frontotemporal dementia with parkinsonism, osteogenesis imperfecta, familial dysautonomia, spinal muscular atrophy, cancer, hypoxanthine phosphoribosyltransferase deficiency, Ehlers-Danlos syndrome, Fanconi anemia, Marfan syndrome, thrombotic thrombocytopenic purpura, glycogen storage disease Type III, and atypical hemolytic uremic syndrome (aHUS).
In some embodiments, the non-cancer diseases and/or conditions associated therewith that can be prevented and/or treated in accordance with the present disclosure are selected from the group consisting of Hutchinson-Gilford progeria syndrome (HGPS), Limb girdle muscular dystrophy type 1B, Familial partial lipodystrophy type 2, Frontotemporal dementia with parkinsonism chromosome 17, Neonatal Hypoxia-Ischemia, Familial Dysautonomia, Hypoxanthine phosphoribosyltransferase deficiency, Ehlers-Danlos syndrome, Occipital Horn Syndrome, Fanconi Anemia, Marfan Syndrome, thrombotic thrombocytopenic purpura, glycogen Storage Disease Type III, Tyrosinemia (type I), Menkes Disease, Analbuminemia, Congenital acetylcholinesterase deficiency, Haemophilia B deficiency (coagulation factor IX deficiency), Recessive dystrophic epidermolysis bullosa, Dominant dystrophic epidermolysis bullosa, Somatic mutations in kidney tubular epithelial cells, X-linked adrenoleukodystrophy (X-ALD), FVII deficiency, Homozygous hypobetalipoproteinemia, Ataxia-telangiectasia, Androgen Sensitivity, Common congenital afibrinogenemia, Risk for emphysema, Mucopolysaccharidosis type II (Hunter syndrome), Severe type III osteogenesis imperfecta, Ehlers-Danlos syndrome IV, Glanzmann thrombasthenia, Mild Bethlem myopathy, Dowling-Meara epidermolysis bullosa simplex, Severe deficiency of MTHFR, Acute intermittent porphyria, Tay-Sachs Syndrome, Myophosphorylase deficiency (McArdle disease), Chronic Tyrosinemia Type 1, Mutation in placenta, Leukocyte adhesion deficiency, Hereditary C3 deficiency, Placental aromatase deficiency, Cerebrotendinous xanthomatosis, Duchenne and Becker muscular dystrophy, Severe factor V deficiency, Alpha-thalassemia, Beta-thalassemia, Hereditary HL deficiency, Lesch-Nyhan syndrome, Familial hypercholesterolemia, Phosphoglycerate kinase deficiency, Cowden syndrome, X-linked retinitis pigmentosa (RP3), Crigler-Najjar syndrome type 1, Chronic tyrosinemia type I, Sandhoff disease, Maturity onset diabetes of the young (MODY), Familial tuberous sclerosis, Polycystic kidney disease 1, Primary Hyperthyroidism, cystic fibrosis, Spinal muscular atrophy, neurofibromatosis, Neurofibromatosis type I, and Neurofibromatosis type II.
In specific embodiments, the cancer treated by the compounds of the present disclosure is leukemia, acute myeloid leukemia, colon cancer, gastric cancer, macular degeneration, acute monocytic leukemia, breast cancer, hepatocellular carcinoma, cone-rod dystrophy, alveolar soft part sarcoma, myeloma, skin melanoma, prostatitis, pancreatitis, pancreatic cancer, retinitis, adenocarcinoma, adenoiditis, adenoid cystic carcinoma, cataract, retinal degeneration, gastrointestinal stromal tumor, Wegener's granulomatosis, sarcoma, myopathy, prostate adenocarcinoma, Hodgkin's lymphoma, ovarian cancer, non-Hodgkin's lymphoma, multiple myeloma, chronic myeloid leukemia, acute lymphoblastic leukemia, renal cell carcinoma, transitional cell carcinoma, colorectal cancer, chronic lymphocytic leukemia, anaplastic large cell lymphoma, kidney cancer, or cervical cancer.
In specific embodiments, the cancer prevented and/or treated in accordance with the present disclosure is basal cell carcinoma, goblet cell metaplasia, a malignant glioma, or cancer of the liver, breast, lung, prostate, cervix, uterus, colon, pancreas, kidney, stomach, bladder, ovary, or brain.
In specific embodiments, the cancers prevented and/or treated in accordance with the present disclosure include, but are not limited to, cancer of the head, neck, eye, mouth, throat, esophagus, chest, bone, lung, kidney, colon, rectum or other gastrointestinal tract organs, stomach, spleen, skeletal muscle, subcutaneous tissue, prostate, breast, ovaries, testicles or other reproductive organs, skin, thyroid, blood, lymph nodes, liver, pancreas, and brain or central nervous system.
Specific examples of cancers that can be prevented and/or treated in accordance with the present disclosure include, but are not limited to, the following: renal cancer, kidney cancer, glioblastoma multiforme, metastatic breast cancer; breast carcinoma; breast sarcoma; neurofibroma; neurofibromatosis; pediatric tumors; neuroblastoma; malignant melanoma; carcinomas of the epidermis; leukemias such as but not limited to, acute leukemia, acute lymphocytic leukemia, acute myelocytic leukemias such as myeloblastic, promyelocytic, myelomonocytic, monocytic, erythroleukemia leukemias and myelodysplastic syndrome, chronic leukemias such as but not limited to, chronic myelocytic (granulocytic) leukemia, chronic lymphocytic leukemia, hairy cell leukemia; polycythemia vera; lymphomas such as but not limited to Hodgkin's disease, non-Hodgkin's disease; multiple myelomas such as but not limited to smoldering multiple myeloma, nonsecretory myeloma, osteosclerotic myeloma, plasma cell leukemia, solitary plasmacytoma and extramedullary plasmacytoma; Waldenstrom's macroglobulinemia; monoclonal gammopathy of undetermined significance; benign monoclonal gammopathy; heavy chain disease; bone cancer and connective tissue sarcomas such as but not limited to bone sarcoma, myeloma bone disease, multiple myeloma, cholesteatoma-induced bone osteosarcoma, Paget's disease of bone, osteosarcoma, chondrosarcoma, Ewing's sarcoma, malignant giant cell tumor, fibrosarcoma of bone, chordoma, periosteal sarcoma, soft-tissue sarcomas, angiosarcoma (hemangiosarcoma), fibrosarcoma, Kaposi's sarcoma, leiomyosarcoma, liposarcoma, lymphangiosarcoma, neurilemmoma, rhabdomyosarcoma, and synovial sarcoma; brain tumors such as but not limited to, glioma, astrocytoma, brain stem glioma, ependymoma, oligodendroglioma, nonglial tumor, acoustic neurinoma, craniopharyngioma, medulloblastoma, meningioma, pineocytoma, pineoblastoma, and primary brain lymphoma; breast cancer including but not limited to adenocarcinoma,
lobular (small cell) carcinoma, intraductal carcinoma, medullary breast cancer, mucinous breast cancer, tubular breast cancer, papillary breast cancer, Paget's disease (including juvenile Paget's disease) and inflammatory breast cancer; adrenal cancer such as but not limited to pheochromocytoma and adrenocortical carcinoma; thyroid cancer such as but not limited to papillary or follicular thyroid cancer, medullary thyroid cancer and anaplastic thyroid cancer; pancreatic cancer such as but not limited to, insulinoma, gastrinoma, glucagonoma, vipoma, somatostatin-secreting tumor, and carcinoid or islet cell tumor; pituitary cancers such as but not limited to Cushing's disease, prolactin-secreting tumor, acromegaly, and diabetes insipidus; eye cancers such as but not limited to ocular melanoma such as iris melanoma, choroidal melanoma, and ciliary body melanoma, and retinoblastoma; vaginal cancers such as squamous cell carcinoma, adenocarcinoma, and melanoma; vulvar cancer such as squamous cell carcinoma, melanoma, adenocarcinoma, basal cell carcinoma, sarcoma, and Paget's disease; cervical cancers such as but not limited to, squamous cell carcinoma, and adenocarcinoma; uterine cancers such as but not limited to endometrial carcinoma and uterine sarcoma; ovarian cancers such as but not limited to, ovarian epithelial carcinoma, borderline tumor, germ cell tumor, and stromal tumor; cervical carcinoma; esophageal cancers such as but not limited to, squamous cancer, adenocarcinoma, adenoid cystic carcinoma, mucoepidermoid carcinoma, adenosquamous carcinoma, sarcoma, melanoma, plasmacytoma, verrucous carcinoma, and oat cell (small cell) carcinoma; stomach cancers such as but not limited to, adenocarcinoma, fungating (polypoid), ulcerating, superficial spreading, diffusely spreading, malignant lymphoma, liposarcoma, fibrosarcoma, and carcinosarcoma; colon cancers; KRAS mutated colorectal cancer; colon carcinoma; rectal cancers; liver cancers such as but not limited to
hepatocellular carcinoma and hepatoblastoma; gallbladder cancers such as adenocarcinoma; cholangiocarcinomas such as but not limited to papillary, nodular, and diffuse; lung cancers such as KRAS-mutated non-small cell lung cancer, non-small cell lung cancer, squamous cell carcinoma (epidermoid carcinoma), adenocarcinoma, large-cell carcinoma and small-cell lung cancer; lung carcinoma; testicular cancers such as but not limited to germinal tumor, seminoma, anaplastic, classic (typical), spermatocytic, nonseminoma, embryonal carcinoma, teratoma carcinoma, choriocarcinoma (yolk-sac tumor); prostate cancers such as but not limited to, androgen-independent prostate cancer, androgen-dependent prostate cancer, adenocarcinoma, leiomyosarcoma, and rhabdomyosarcoma; penile cancers; oral cancers such as but not limited to squamous cell carcinoma; basal cancers; salivary gland cancers such as but not limited to adenocarcinoma, mucoepidermoid carcinoma, and adenoid cystic carcinoma; pharynx cancers such as but not limited to squamous cell cancer, and verrucous; skin cancers such as but not limited to, basal cell carcinoma, squamous cell carcinoma and melanoma, superficial spreading melanoma, nodular melanoma, lentigo malignant melanoma, acral lentiginous melanoma; kidney cancers such as but not limited to renal cell cancer, adenocarcinoma, hypernephroma, fibrosarcoma, transitional cell cancer (renal pelvis and/or ureter); renal carcinoma; Wilms' tumor; bladder cancers such as but not limited to transitional cell carcinoma, squamous cell cancer, adenocarcinoma, carcinosarcoma. In addition, cancers include myxosarcoma, osteogenic sarcoma, endotheliosarcoma, lymphangioendotheliosarcoma, mesothelioma, synovioma, hemangioblastoma, epithelial carcinoma, cystadenocarcinoma, bronchogenic carcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma and papillary adenocarcinomas.
In certain embodiments, cancers that can be prevented and/or treated in accordance with the present disclosure include the following: pediatric solid tumor, Ewing's sarcoma, Wilms tumor, neuroblastoma, neurofibroma, carcinoma of the epidermis, malignant melanoma, cervical carcinoma, colon carcinoma, lung carcinoma, renal carcinoma, breast carcinoma, breast sarcoma, metastatic breast cancer, HIV-related Kaposi's sarcoma, prostate cancer, androgen-independent prostate cancer, androgen-dependent prostate cancer, neurofibromatosis, lung cancer, non-small cell lung cancer, KRAS-mutated non-small cell lung cancer, malignant melanoma, melanoma, colon cancer, KRAS-mutated colorectal cancer, glioblastoma multiforme, renal cancer, kidney cancer, bladder cancer, ovarian cancer, hepatocellular carcinoma, thyroid carcinoma, rhabdomyosarcoma, acute myeloid leukemia, and multiple myeloma.
In some embodiments, cancers and conditions associated therewith that are prevented and/or treated in accordance with the present disclosure are triple negative breast cancer, metastatic colorectal cancer, endometrial cancer, metastatic melanoma, hereditary nonpolyposis colorectal cancer, adenocarcinoma, sarcoma, melanoma, liver cancer, hepatocellular carcinoma, hepatoblastoma, liver carcinoma, prostate cancer, prostate adenocarcinoma, androgen-independent prostate cancer, androgen-dependent prostate cancer, leiomyosarcoma, rhabdomyosarcoma, prostate carcinoma, brain cancer, glioma, astrocytoma, brain stem glioma, ependymoma, oligodendroglioma, nonglial tumor, acoustic neurinoma, craniopharyngioma, medulloblastoma, meningioma, pineocytoma, pineoblastoma, primary brain lymphoma, anaplastic astrocytoma, juvenile pilocytic astrocytoma, a mixture of oligodendroglioma and astrocytoma elements, breast cancer, metastatic breast cancer, breast carcinoma, breast sarcoma, adenocarcinoma, lobular (small cell) carcinoma, intraductal carcinoma, medullary breast cancer, mucinous breast cancer, tubular breast cancer, papillary breast cancer, Paget's disease, juvenile Paget's disease, inflammatory breast cancer, lung cancer, KRAS-mutated non-small cell lung cancer, non-small cell lung cancer, squamous cell carcinoma (epidermoid carcinoma), adenocarcinoma, large-cell carcinoma, small cell lung cancer, lung carcinoma, colon cancer, KRAS mutated colorectal cancer, colon carcinoma, pancreatic cancer, insulinoma, gastrinoma, glucagonoma, vipoma, somatostatin-secreting tumor, carcinoid tumor, islet cell tumor, pancreas carcinoma, skin cancer, skin melanoma, basal cell carcinoma, squamous cell carcinoma, melanoma, superficial spreading melanoma, nodular melanoma, lentigo malignant melanoma, acral lentiginous melanoma, skin carcinoma, cervical cancer, squamous cell carcinoma, adenocarcinoma, cervical carcinoma, ovarian cancer, ovarian epithelial carcinoma, borderline
tumor, germ cell tumor, stromal tumor, ovarian carcinoma, cancer of the mouth, blood cancer, leukemia, acute myeloid leukemia, acute monocytic leukemia, chronic myeloid leukemia, acute lymphoblastic leukemia, chronic lymphocytic leukemia, acute leukemia, acute lymphocytic leukemia, acute myelocytic leukemia, myeloblastic leukemia, promyelocytic leukemia, myelomonocytic leukemia, monocytic leukemia, erythroleukemia, myelodysplastic syndrome, chronic leukemia, chronic myelocytic (granulocytic) leukemia, chronic lymphocytic leukemia, hairy cell leukemia, plasma cell leukemia, cancer of the nervous system, cancer of the central nervous system, a primary central nervous system (CNS) lymphoma, a CNS germ cell tumor, goblet cell metaplasia, kidney cancer, renal cell cancer, adenocarcinoma, hypernephroma, fibrosarcoma, transitional cell cancer (renal pelvis and/or ureter), bladder cancer, transitional cell carcinoma, squamous cell cancer, adenocarcinoma, carcinosarcoma, stomach cancer, adenocarcinoma, fungating (polypoid), ulcerating, superficial spreading, diffusely spreading, malignant lymphoma, liposarcoma, fibrosarcoma, carcinosarcoma, uterine cancer, endometrial carcinoma, uterine sarcoma, cancer of the esophagus, squamous cancer, adenocarcinoma, adenoid cystic carcinoma, mucoepidermoid carcinoma, adenosquamous carcinoma, sarcoma, melanoma, plasmacytoma, verrucous carcinoma, oat cell (small cell) carcinoma, esophageal carcinomas, cancer of the rectum, colorectal cancer, rectal cancers, colorectal carcinoma, gallbladder cancer, adenocarcinoma, cholangiocarcinoma, papillary cholangiocarcinoma, nodular cholangiocarcinoma, diffuse cholangiocarcinoma, testicular cancer, germinal tumor, seminoma, anaplastic testicular cancer, classic (typical) testicular cancer, spermatocytic testicular cancer, nonseminoma testicular cancer, embryonal carcinoma, teratoma carcinoma, choriocarcinoma (yolk-sac tumor), gastric cancer, gastrointestinal stromal tumor, cancer
of other gastrointestinal tract organs, gastric carcinomas, bone cancer, connective tissue sarcoma, bone sarcoma, myeloma bone disease, multiple myeloma, cholesteatoma-induced bone osteosarcoma, Paget's disease of bone, osteosarcoma, chondrosarcoma, Ewing's sarcoma, malignant giant cell tumor, fibrosarcoma of bone, chordoma, periosteal sarcoma, soft-tissue sarcoma, angiosarcoma (hemangiosarcoma), fibrosarcoma, Kaposi's sarcoma, leiomyosarcoma, liposarcoma, lymphangiosarcoma, neurilemmoma, rhabdomyosarcoma, synovial sarcoma, Hodgkin's lymphoma, non-Hodgkin's lymphoma, anaplastic large cell lymphoma, cancer of the lymph node, lymphangioendotheliosarcoma, myeloma, multiple myeloma, smoldering multiple myeloma, nonsecretory myeloma, osteosclerotic myeloma, solitary plasmacytoma, extramedullary plasmacytoma, alveolar soft part sarcoma, adenoid cystic carcinoma, renal cell carcinoma, transitional cell carcinoma, germ cell cancer, a malignant glioma, renal carcinoma, vaginal cancer, squamous cell carcinoma, adenocarcinoma, melanoma, vulvar cancer, squamous cell carcinoma, melanoma, adenocarcinoma, sarcoma, Paget's disease, cancer of other reproductive organs, thyroid cancer, papillary thyroid cancer, follicular thyroid cancer, medullary thyroid cancer, anaplastic thyroid cancer, thyroid carcinoma, salivary gland cancer, adenocarcinoma, mucoepidermoid carcinoma, eye cancer, ocular melanoma, iris melanoma, choroidal melanoma, ciliary body melanoma, retinoblastoma, penile cancers, oral cancer, squamous cell carcinoma, basal cancer, pharynx cancer, squamous cell cancer, verrucous pharynx cancer, Wilms' tumor, cancer of the head, cancer of the neck, cancer of the eye, cancer of the throat, cancer of the chest, cancer of the spleen, cancer of skeletal muscle, cancer of subcutaneous tissue, adrenal cancer, pheochromocytoma, adrenocortical carcinoma, pituitary cancer, Cushing's disease, prolactin-secreting tumor, acromegaly, diabetes insipidus, myxosarcoma, osteogenic sarcoma,
endotheliosarcoma, mesothelioma, synovioma, hemangioblastoma, epithelial carcinoma, cystadenocarcinoma, bronchogenic carcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, ependymoma, optic nerve glioma, primitive neuroectodermal tumor, rhabdoid tumor, renal cancer, glioblastoma multiforme, neurofibroma, neurofibromatosis, pediatric cancer, neuroblastoma, malignant melanoma, carcinoma of the epidermis, polycythemia vera, Waldenstrom's macroglobulinemia, monoclonal gammopathy of undetermined significance, benign monoclonal gammopathy, heavy chain disease, pediatric solid tumor, Ewing's sarcoma, Wilms tumor, carcinoma of the epidermis, HIV-related Kaposi's sarcoma, rhabdomyosarcoma, thecomas, arrhenoblastomas, endometrial carcinoma, endometrial hyperplasia, endometriosis, fibrosarcomas, choriocarcinoma, nasopharyngeal carcinoma, laryngeal carcinoma, hepatoblastoma, Kaposi's sarcoma, hemangioma, cavernous hemangioma, hemangioblastoma, retinoblastoma, glioblastoma, Schwannoma, neuroblastoma, rhabdomyosarcoma, osteogenic sarcoma, leiomyosarcoma, urinary tract carcinoma, abnormal vascular proliferation associated with phakomatoses, edema (such as that associated with brain tumors), Meigs' syndrome, pituitary adenoma, primitive neuroectodermal tumor, medulloblastoma, and acoustic neuroma.
In certain embodiments, cancers and conditions associated therewith that are prevented and/or treated in accordance with the present disclosure are breast carcinomas, lung carcinomas, gastric carcinomas, esophageal carcinomas, colorectal carcinomas, liver carcinomas, ovarian carcinomas, thecomas, arrhenoblastomas, cervical carcinomas, endometrial carcinoma, endometrial hyperplasia, endometriosis, fibrosarcomas, choriocarcinoma, head and neck cancer, nasopharyngeal carcinoma, laryngeal carcinomas, hepatoblastoma, Kaposi's sarcoma, melanoma, skin carcinomas, hemangioma, cavernous hemangioma, hemangioblastoma, pancreas carcinomas, retinoblastoma, astrocytoma, glioblastoma, Schwannoma, oligodendroglioma, medulloblastoma, neuroblastomas, rhabdomyosarcoma, osteogenic sarcoma, leiomyosarcomas, urinary tract carcinomas, thyroid carcinomas, Wilms' tumor, renal cell carcinoma, prostate carcinoma, abnormal vascular proliferation associated with phakomatoses, edema (such as that associated with brain tumors), or Meigs' syndrome. In a specific embodiment, the cancer is an astrocytoma, an oligodendroglioma, a mixture of oligodendroglioma and astrocytoma elements, an ependymoma, a meningioma, a pituitary adenoma, a primitive neuroectodermal tumor, a medulloblastoma, a primary central nervous system (CNS) lymphoma, or a CNS germ cell tumor.
In specific embodiments, the cancer treated in accordance with the present disclosure is an acoustic neuroma, an anaplastic astrocytoma, a glioblastoma multiforme, or a meningioma.
In other specific embodiments, the cancer treated in accordance with the present disclosure is a brain stem glioma, a craniopharyngioma, an ependymoma, a juvenile pilocytic astrocytoma, a medulloblastoma, an optic nerve glioma, a primitive neuroectodermal tumor, or a rhabdoid tumor.
In some aspects of the present disclosure, small molecules identified by the screening methods can be formulated for administration to a mammal by intravenous administration, subcutaneous administration, oral administration, inhalation, nasal administration, dermal administration, or ophthalmic administration. In one aspect, small molecules identified by the screening methods can be used to treat a disease or condition that can be treated by modulating RNA splicing of a protein associated with the disease or condition.
In some embodiments, a small molecule identified by the present disclosure has a molecular weight of at most about 2000 Daltons, 1500 Daltons, 1000 Daltons or 900 Daltons. In some embodiments, a small molecule identified by the present disclosure has a molecular weight of at least 100 Daltons, 200 Daltons, 300 Daltons, 400 Daltons or 500 Daltons. In some embodiments, a small molecule identified by the present disclosure does not comprise a phosphodiester linkage.
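The molecular weight window and the linkage criterion above can be applied as a simple library filter. The following is a minimal sketch under stated assumptions: the `Candidate` record, the compound names, and the choice of a 100 to 900 Dalton window (one of several bounds listed above) are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                 # hypothetical compound identifier
    mol_weight_da: float      # molecular weight in Daltons
    has_phosphodiester: bool  # True if the compound contains a phosphodiester linkage


def passes_filter(candidate, min_da=100.0, max_da=900.0):
    """Keep compounds inside the weight window that lack a phosphodiester linkage."""
    in_window = min_da <= candidate.mol_weight_da <= max_da
    return in_window and not candidate.has_phosphodiester


# Made-up library entries, for illustration only.
library = [
    Candidate("cpd-1", 412.5, False),
    Candidate("cpd-2", 2350.0, False),  # above the upper bound
    Candidate("cpd-3", 650.0, True),    # contains a phosphodiester linkage
]
hits = [c.name for c in library if passes_filter(c)]
print(hits)
```

The bounds are keyword arguments so that any of the alternative limits recited above (for example 2000 or 1500 Daltons) can be substituted.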
The small molecules identified in the present disclosure can be used to modulate aberrant splicing caused by mutation in a 5′ss, cryptic 5′ss, 3′ss, cryptic 3′ss, ESE, ESS, ISE, and/or ISS. The modulation can include both enhancement/activation and prevention/inhibition. In some embodiments, the modulation can be enhancement/activation, wherein the small molecule stabilizes or enhances binding of a polynucleotide or polypeptide to a target polynucleotide. For example, small molecules can bind to target mRNAs and thereby promote the binding of an additional polynucleotide or polypeptide to the target polynucleotide. In some cases, the small molecules can promote the binding of an RNA to a target mRNA. In some cases, the small molecules can promote the binding of a protein or a portion thereof to a target mRNA. In some cases, the small molecules can promote the binding of a protein or a portion thereof to a target RNA-RNA duplex. In some cases, the small molecules can promote the binding of a protein-RNA complex (e.g., snRNP) to a target mRNA. In some cases, the small molecules can promote the binding of a protein or a portion thereof to a target RNA-RNA duplex by changing the secondary or tertiary structure or a molecular moiety of the target mRNA. For example, small molecules can promote binding of a polynucleotide and/or a polypeptide to a target mRNA containing a 5′ss or 3′ss or a portion thereof, thereby facilitating inclusion of the adjacent exon.
In some embodiments, the modulation can be prevention/inhibition, wherein the small molecule destabilizes or prevents the binding of a polynucleotide or polypeptide to a target polynucleotide. For example, small molecules can bind to target mRNAs and thereby prevent an additional polynucleotide or polypeptide from binding to the target polynucleotide. In some cases, the small molecules can prevent an RNA from binding to a target mRNA. In some cases, the small molecules can prevent a protein or a portion thereof from binding to a target mRNA. In some cases, the small molecules can prevent a protein or a portion thereof from binding to a target RNA-RNA duplex. In some cases, the small molecules can prevent a protein-RNA complex (e.g., snRNP) from binding to a target mRNA. In some cases, the small molecules can prevent a protein or a portion thereof from binding to a target RNA-RNA duplex by changing the secondary or tertiary structure or a molecular moiety of the target mRNA. For example, small molecules can prevent a polynucleotide and/or a polypeptide from binding to a target mRNA containing a cryptic 5′ss or cryptic 3′ss or a portion thereof, thereby facilitating inclusion of the adjacent exon. For example, small molecules can prevent a polynucleotide and/or a polypeptide from binding to a target mRNA containing an authentic 5′ss or authentic 3′ss or a portion thereof, thereby facilitating the loss of an exon.
The small molecules identified in the present disclosure can be used to treat a disease or condition associated with aberrant splicing in one or more proteins. The small molecules identified in the present disclosure may be used to modulate splicing, for example modulating the amount of RNA transcripts generated. In some embodiments, the small molecules identified in the present disclosure may be used to modulate splicing not related to any mutation in the cis-acting elements.
In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence GGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagu, AGA/gugagc, AGA/gugagu, AGA/gugagu, GGA/gugagu, CGA/guccgu, GGA/guaagu, GGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaagu, AGA/guaaga, AGA/guaagu, AGA/guaagu, AGA/guaagu, GGA/guaagu, AGA/guaagg, AGA/guaagu, AGA/guaagu, AGA/guaagu, GGA/guaagu, AGA/guaaga, AGA/guaagu, AGA/guaagu, AGA/guaagu, GGA/guaagg, AGA/guaagu, AGA/guaagu, GGA/guaagu, AGA/guaagu, AGA/guaaga, AGA/guaagu, AGA/guagau, UGA/gugaau, GGA/guuagu, AGA/guaggu, AGA/guaggu, GGA/guaggu, or AGA/gugcgu. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence ACA/gugagg, AAA/auaagu, GAA/ggaagu, GAA/guaaau, GCA/guagga, CAA/gugagu, GUA/gugagu, GAA/guggg, CCA/guaaac, UUA/guaaau, CAA/guaaac, ACA/guaaau, GAA/guaaac, UCA/guaaac, UCA/guaaau, GCA/guaaau, ACA/guaaau, CAA/guaagc, CAA/guaagg, UCA/guaagu, AUA/gugaau, CAA/gugaaa, CCA/gugaga, UCA/gugauu, GAA/gugugu, GAA/uaaguu, CAA/guaugu, AAA/guaugu, CAA/guauuu, ACA/guuagu, GCA/guuagu, or ACA/guuuga. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence CAA/guaacu, AUA/gucagu, GAA/gucugg, or AAA/guacau. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence NNBgunnnn, NNBhunnnn, or NNBgvnnnn. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence NNBgurrrn, NNBguwwdn, NNBguvmvn, NNBguvbbn, NNBgukddn, NNBgubnbd, NNBhunngn, NNBhurmhd, or NNBgvdnvn. 
In those embodiments, N (or n) is A, U, G or C; B (or b) is C, G, or U; H (or h) is A, C, or U; d is a, g, or u; m is a or c; r is a or g; v is a, c or g; k is g or u; w is a or u. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence CAC/gugagc, UCC/gugagc, AGC/gugagu, AGC/gugagu, AGG/gugagg, GUG/gugagc, GAG/gugagg, CCG/gugagg, UUG/gugagc, GUG/gugagu, UUU/gugagc, UUU/gugagc, GAU/gugagg, AGU/gugagu, AGU/gugagu, AGU/gugagu, AGU/gugagu, AGC/guaagu, GGC/guaagu, AAC/guaagu, GGC/guaagu, AGC/guaagg, GGC/guaagu, AGC/guaagu, GGC/guaagu, GGC/guaagu, AGC/guaagu, GAG/guaaga, CAG/guaagu, AGU/guaagc, AAU/guaagc, AAU/guaagg, CCU/guaagc, AGU/guaagu, GGU/guaagu, AGU/guaagu, AGU/guaagu, AGU/guaagu, GAU/guaagu, UCC/gugaau, CCG/gugaau, ACG/gugaac, CUG/gugaau, AGG/gugaau, UUG/gugaau, CCG/gugaau, GAG/gugaag, CCU/gugaau, CGU/gugaau, CCU/gugaau, GAG/guagga, CAU/guaggg, UGG/guggau, CAG/guggau, UGG/guggau, CGG/gugggu, GCG/guggga, UGG/guggggg, UGG/gugggug, CGU/gugggu, AUC/gguaaaa, GGG/guaaau, GCG/guaaaa, CAG/guaaag, UGG/guaaag, AAG/guaaag, AAG/guaaau, CAG/guaaag, UAG/guaaag, UUG/guaaag, GAG/guaaag, CAG/guaaag, AUG/guaaaa, AAG/guaaag, CAG/guaaag, CAG/guaaaa, GAG/guaaag, AAG/guaaag, UGU/guaaau, GUU/guaaau, GUU/guaaau, UCU/guaaau, GCU/guaaau, GAU/guaaau, GCU/guaaau, UCU/guaaau, ACU/guaaau, CCU/guaaau, CCU/guaaau, ACU/guaaau, AAU/guaaau, AGG/guagac, UUG/guagau, CAG/guagag, AAG/guagag, AAU/gugagu, CAG/gugagc, AAG/gugggu, AAG/guaggg, CAG/guaggc, or AGC/guaggu. 
In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence CAG/guaau, CAG/guaaugu, CAG/guaaugu, CAG/guaaugu, CAG/guaaugu, GAG/guaauac, GAG/guaauau, GAG/guaaugu, AAG/guaauaa, AAG/guaaugu, AAG/guaaugu, AAG/guaaugua, AAG/guaaugu, AAG/guaaugu, GCU/guaauu, CCU/guaauu, GAU/guaauu, CAU/guaauu, AAU/guaauu, AGG/guauau, CAG/guauau, UAG/guauau, CAG/guauau, CGG/guauau, GAG/guauau, CGG/guauau, CAG/guauag, AAG/guauau, CAG/guauag, AAG/guauac, UAG/guauau, CAG/guauag, CAG/guauau, AAG/guuaag, AUC/guuaga, GCG/guuagu, AAG/guuagc, UGG/guuagu, GCG/guuagu, CUG/guuugu, CUG/guauga, CAG/guauga, UAG/guauga, AAG/guaugg, AAG/guauga, GAG/guaugg, CAG/guauga, CAG/guaugg, AAG/guaugg, UGG/guaugc, CAG/guaugu, AUG/guaugu, AAG/guaugu, AAG/guaugg, CAG/guaugg, GAG/guauga, CGG/guaugg, AAU/guaugu, AAG/guauuu, AUG/guauuu, UAG/guauug, AAG/guauuu, CAG/guauug, CAG/guauug, CAU/guauuu, ACU/guauu, AAG/guuuau, AAG/guuuaa, CAG/guuugg, CAG/guuugg, CAG/guuugc, AAG/guuugg, AAG/guuugg, or UGG/guaugc. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence CCG/guaacu, UUG/guaaca, AUG/guaacc, GGG/guaacu, AAG/guaaca, AAG/guaacu, UUG/guaaca, GCU/guaacu, ACU/guaacu, GCU/guaacu, UAG/guaccc, AAG/guaccu, CAG/guaccg, UGG/guacca, CAG/gucaau, AAG/gucaau, AAG/gucaag, AUG/guacau, GGG/guacau, UUG/guacau, CAG/guacag, CAG/guacag, CAG/guacag, CAG/guacag, AAG/guacag, CAG/guacag, GAG/guacaa, AAG/guacag, CAG/guacaa, UGU/guacau, CAG/gugcac, GGG/gugcau, CUG/gugcau, UAG/gugcau, CAG/gugcag, CAG/gugcag, AGG/gugcaa, AAC/gugacu, UCC/gugacu, CCG/gugacu, GCG/gugacu, GGG/gugacg, GGG/gugacg, GCG/gugacu, AUG/gugacc, GAU/gugacu, GGC/gucagu, or UAG/gucaga. 
In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence AAG/guacgg, AAG/guacgg, AAG/guacug, AAG/guagcg, AAG/guagua, AAG/guagua, AAG/guagua, AAG/guagug, AAG/guauca, AAG/guaucg, AAG/guaucu, AAG/gucucu, AAG/gugccu, AAG/guggua, AAG/guguua, ACG/guagcu, AGC/guacgu, CAG/guacug, CAG/guagua, CAG/guagug, CAG/guagug, CAG/guaucc, CAG/gugcgc, or GAG/gugccu. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence CGG/guguau, AAG/guguau, GAG/guguac, CAG/guguau, UAG/guguau, CAG/guguag, GAG/guguau, AAG/gugugc, CAG/guguga, AAG/gugugu, CAG/guguga, CAG/gugugu, UGG/gugugg, CUG/guguga, CGG/gugugu, GAG/gugugc, CAG/guguga, AAU/gugugu, CAG/gugugu, CAG/gugugu, GAG/gugugu, CAG/guuguu, CAG/guuguc, GUG/guugua, CAG/guuguu, AAC/gugauu, CAG/gugaua, AGG/gugauc, GUG/gugauc, CCU/gugauu, GAU/gugauu, CAC/guuggu, CAG/guuggc, AAG/guuagc, or CAG/guugau. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence AUG/gucauu, CGG/gucauaauc, AAG/gucugu, AAG/gucuggg, CAG/gucugga, CAG/gucuggu, CAG/gucuga, GAG/gucuggu, AAG/gugucu, AAG/gugucu, AGG/gugucu, CUG/gugcuu, CAG/gucuuu, CAG/guugcu, GAG/gugcug, or CAG/gugcug. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence CGC/auaagu, UUC/auaagu, UGG/auaagg, ACG/auaagg, GUU/auaagu, CCU/auaagu, UUU/auaagc, GAG/aucugg, AAC/augagga, GAC/augagg, ACC/augagu, GGG/augagu, AAG/augagc, CAG/augagg, GAG/augagg, GCG/augagu, AAG/gaugag, CCU/augagu, GAU/augagu, GAU/augagu, UAG/augcgu, CAG/auuggu, AAG/auuugu, ACG/cuaagc, CAG/cugugu, CUG/uuaag, GAG/uuaagu, AAG/uuaagg, AUU/uuaagc, CUG/uugaga, CAG/uuuggu, or GGG/auaagu. 
In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence CAG/auaacu, GAG/cugcag, or AAG/uuaaua. In some embodiments, a small molecule identified in the present disclosure modulates splicing of a splice site sequence comprising a sequence GCG/gagagu, AAG/ggaaaa, AUC/gguaaaa, AAG/gcaaaa, UGU/gcaagu, GAG/gcaggu, GAG/gcgugg, GAG/gcuccc, CAG/gcuggu, or AAG/gaugag.
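The degenerate splice site patterns recited above (e.g., NNBgunnnn) can be checked against concrete splice site sequences mechanically. The sketch below translates a pattern into a regular expression using the symbol definitions given above (N, B, H, d, m, r, v, k, w). The function names are hypothetical, case is ignored, and the exon/intron slash is simply dropped before matching; these are simplifying assumptions rather than part of the disclosed method.

```python
import re

# Degenerate symbols as defined in the text (treated case-insensitively here).
CODES = {
    "n": "acgu", "b": "cgu", "h": "acu", "d": "agu",
    "m": "ac", "r": "ag", "v": "acg", "k": "gu", "w": "au",
}


def pattern_to_regex(pattern):
    """Translate a degenerate pattern such as 'NNBgunnnn' into a compiled regex."""
    parts = []
    for symbol in pattern.lower():
        if symbol in CODES:
            parts.append("[" + CODES[symbol] + "]")
        elif symbol in "acgu":
            parts.append(symbol)  # literal nucleotide
        else:
            raise ValueError(f"unknown symbol {symbol!r}")
    return re.compile("".join(parts))


def matches(pattern, splice_site):
    """Check a splice site written as EXON/intron against a degenerate pattern."""
    sequence = splice_site.replace("/", "").lower()
    return pattern_to_regex(pattern).fullmatch(sequence) is not None


print(matches("NNBgunnnn", "CAG/guaagu"))  # third exonic base G is allowed by B
print(matches("NNBgunnnn", "AGA/guaagu"))  # third exonic base A is excluded by B
```

Because `fullmatch` is used, sequences longer than the nine-symbol patterns would need to be trimmed or matched as a prefix.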
Exemplary small molecules that could be identified by the present disclosure are summarized in Table 3.
This example provides an exemplary experimental plan using the methods provided herein to identify a binding agent that binds to a target RNA. The experiment comprises the following steps:
Step 1 can include RNA duplex formation and NMR screening. NMR spectra recorded with and without the small molecule can be compared to determine whether the small molecule binds to the RNA duplex. To identify splicing modifiers of the target genes described herein, a library of compounds can be tested for the ability to bind the RNA duplex. In this case, a 2D 1H-1H TOCSY fingerprint of the free RNA duplex will be recorded and compared with the same fingerprint after addition of the candidate molecule. Comparing the two fingerprint spectra quickly shows whether they differ. If addition of the candidate molecule changes the chemical shifts of the RNA, this supports a direct interaction between the molecule and the RNA duplex. Comparing the chemical shifts and fingerprints of the two spectra thus distinguishes small molecules that bind to the RNA duplex from those that do not.
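The fingerprint comparison in Step 1 can be sketched as a per-peak chemical shift difference between the free and compound-added peak lists. The code below is a minimal illustration only: the peak labels, the shift values, and the 0.02 ppm cutoff are made-up assumptions, and real peak lists would come from the spectral analysis software.

```python
import math


def shift_changes(free, bound):
    """Chemical shift change (ppm) per peak between two assigned 2D 1H-1H peak lists.

    Each list maps a peak label to its (F1, F2) proton shifts in ppm.
    """
    changes = {}
    for label, (f1_free, f2_free) in free.items():
        if label not in bound:
            continue  # peak not found after addition; would need separate handling
        f1_bound, f2_bound = bound[label]
        changes[label] = math.hypot(f1_bound - f1_free, f2_bound - f2_free)
    return changes


def perturbed_peaks(free, bound, threshold_ppm=0.02):
    """Labels of peaks whose shift change exceeds the (assumed) threshold."""
    deltas = shift_changes(free, bound)
    return sorted(label for label, delta in deltas.items() if delta > threshold_ppm)


# Hypothetical pyrimidine H5-H6 cross peaks; values are illustrative, not measured.
free = {"U2 H5-H6": (5.60, 7.80), "C7 H5-H6": (5.45, 7.65), "U9 H5-H6": (5.70, 7.90)}
bound = {"U2 H5-H6": (5.60, 7.80), "C7 H5-H6": (5.52, 7.71), "U9 H5-H6": (5.71, 7.90)}
print(perturbed_peaks(free, bound))
```

A non-empty result would be read as support for a direct interaction; an empty result would suggest no detectable binding under the chosen threshold.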
Step 2 can include binding specificity and the effect of the U1-C zinc finger domain. The screening will be based on a comparison of the free RNA with the RNA after addition of the small molecule. RNA duplex binders will be selected for further investigation. First, the strength of the interaction can be determined by titrating the RNA with the small molecule of interest. Second, the specificity of the interaction can be determined by testing the hit molecule against several different RNA duplexes. Third, the specificity and unique binding position of the small-molecule binders on the RNA duplexes can be elucidated by comparing various RNA binders with each other. Finally, the zinc finger of U1-C can be added to the assay to test how it influences or competes with the RNA duplex-small molecule interaction.
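The titration mentioned in Step 2 can be turned into a dissociation constant by fitting a binding model to the shift changes. The sketch below assumes simple 1:1 binding in fast exchange with ligand depletion, which the text does not specify; the function names, the grid of candidate Kd values, and the synthetic data are hypothetical.

```python
import math


def bound_fraction(rna_um, ligand_um, kd_um):
    """Fraction of RNA bound for 1:1 binding, accounting for ligand depletion."""
    total = rna_um + ligand_um + kd_um
    return (total - math.sqrt(total * total - 4.0 * rna_um * ligand_um)) / (2.0 * rna_um)


def fit_kd(rna_um, ligand_um_series, observed_shifts, kd_grid):
    """Grid search over Kd; the maximal shift is solved in closed form for each Kd."""
    best = None
    for kd in kd_grid:
        fractions = [bound_fraction(rna_um, lig, kd) for lig in ligand_um_series]
        denom = sum(f * f for f in fractions)
        if denom == 0.0:
            continue
        delta_max = sum(f * y for f, y in zip(fractions, observed_shifts)) / denom
        sse = sum((delta_max * f - y) ** 2 for f, y in zip(fractions, observed_shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, delta_max)
    return best[1], best[2]


# Synthetic titration generated from Kd = 50 uM and a maximal shift of 0.10 ppm.
rna = 100.0
ligand = [25.0, 50.0, 100.0, 200.0, 400.0, 800.0]
shifts = [0.10 * bound_fraction(rna, lig, 50.0) for lig in ligand]
kd_fit, delta_fit = fit_kd(rna, ligand, shifts, kd_grid=range(1, 201))
print(kd_fit, round(delta_fit, 3))
```

A grid search is used only to keep the example dependency-free; a nonlinear least-squares routine would be the more usual choice.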
Step 3 can include NMR structure determination of the RNA duplex-small molecule complex. The most promising small molecule-RNA duplex complex will be selected for structure determination using solution-state NMR. To solve the structure of such a complex, access to a high-field NMR spectrometer is crucial, both to perform the resonance assignment and to identify the NOE-derived distances that drive structure calculations. A spectrometer operating at 900 MHz or higher may be required to collect the data needed to solve the structure of such a complex.
This example provides a method that uses an mRNA fragment containing an exon-intron boundary and up to 200 nucleotides in length. In some experiments, the mRNA will not be labeled, and a 1H spectrum will be obtained for the unlabeled target. In other cases, the exonic/intronic nucleotides within the 8-12 nucleotides of the 5′ss sequence can be isotopically labeled for NMR measurement. This preserves the secondary structure of the mRNA without losing resolution or the ability to determine compound binding to the rest of the sequence. The duplex RNA between the 5′-end of U1 (5′-AUACψψACCUG-3′) and the 5′ss of the various targets (see Tables 1-2) can be formed by adding the U1 snRNA and the 5′ss in about equimolar amounts in NMR buffer. The experiment comprises the following steps: 1) optionally, isotopically labeling a section of the mRNA sequence, in this case the 5′ss, while the larger region of the mRNA sequence remains unlabeled (but provides 2-D/3-D structural context); 2) obtaining an NMR spectrum of the polynucleotide sample, e.g., duplex RNA, using an NMR device; 3) introducing the U1 protein and then the small molecule of interest to determine a chemical shift of one or more atoms of the 5′ss duplex with snRNA; 4) measuring chemical shift changes upon addition of the U1 protein, which indicate whether the mRNA interacts with the U1 protein; 5) measuring chemical shift changes upon addition of the small molecule and the U1 protein, which indicate whether the mRNA interacts with the small molecule and protein differently than with the U1 protein alone; and 6) collecting the chemical shifts in the presence of the U1 protein and/or the small molecule. The chemical shifts can be used to determine the bimolecular structure of the mRNA and the bound small molecule.
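The duplex between the U1 5′-end and a 5′ss can be previewed by aligning the two strands antiparallel and classifying each position. The sketch below is illustrative and rests on stated assumptions: pseudouridine (ψ) is treated as uridine for pairing, the register is the canonical one in which the first nucleotide of the nine-nucleotide 5′ss faces the 3′-most nucleotide of the U1 sequence given above, and the function name is hypothetical.

```python
WATSON_CRICK = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}
WOBBLE = {("g", "u"), ("u", "g")}

U1_5P_END = "AUACψψACCUG"  # 5'-end of U1 as given in the text; ψ is pseudouridine


def pair_with_u1(splice_site, u1=U1_5P_END):
    """Return one symbol per 5'ss position: '|' Watson-Crick, ':' wobble, 'x' unpaired.

    The 5'ss is read 5'->3' and the U1 strand 3'->5' (antiparallel alignment).
    """
    site = splice_site.replace("/", "").lower()
    u1_3_to_5 = u1.lower().replace("ψ", "u")[::-1]  # assumption: ψ pairs like U
    calls = []
    for site_nt, u1_nt in zip(site, u1_3_to_5):
        if (site_nt, u1_nt) in WATSON_CRICK:
            calls.append("|")
        elif (site_nt, u1_nt) in WOBBLE:
            calls.append(":")
        else:
            calls.append("x")
    return "".join(calls)


print(pair_with_u1("CAG/guaagu"))  # consensus-like site
print(pair_with_u1("GGA/guaagu"))  # site with an A at the last exonic position
```

In this simple model, an unpaired exonic position shows up as `x`, which is the kind of feature a duplex-binding small molecule might stabilize.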
From the NMR spectra, a 2-D or 3-D atomic-resolution structure of the 5′ss and the small molecule can be computationally modeled. A plurality of secondary structure predictions can be computed using a secondary structure prediction algorithm (e.g., a nearest-neighbor algorithm) or computer program. The MC-Fold|MC-Sym pipeline is a web-hosted service for RNA secondary and tertiary structure prediction. In this pipeline, MC-Fold takes the input sequence and outputs secondary structures, which are passed directly to MC-Sym, which outputs tertiary structures.
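To illustrate what a secondary structure prediction step computes, the sketch below uses the classic Nussinov base-pair maximization algorithm. This is a deliberately simpler stand-in: it is neither the nearest-neighbor energy model nor the MC-Fold method mentioned above, and the minimum hairpin loop of three nucleotides and the test sequence are assumptions chosen for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return a dot-bracket structure with the maximum number of nested base pairs."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])  # split into two parts
            best[i][j] = score

    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:  # trace back one optimal solution
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and best[i][j] == best[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure)


print(nussinov("GGGAAAUCCC"))
```

Pair counting ignores stacking energies, so the result should be read only as a demonstration of the dynamic-programming idea, not as a realistic fold.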
This example provides an exemplary experimental procedure for NMR preparation of RNA and RNA-compound complex samples. RNA for the survival of motor neuron (SMN) protein is used as an example here. SMN 5′ss RNA (5′-GGAGUAAGUCU), U1 snRNA (5′-GAUACUUACCUG) and SMN ssRNA/U1 snRNP-linked RNA (5′-GGAGUAAGUCU-GAUACUUACCUG) can be synthesized by TriLink BioTechnologies or Integrated DNA Technologies. The dsRNA can be prepared by mixing equimolar concentrations of SMN ssRNA and U1 snRNA in NMR buffer (20 mM potassium phosphate, pH 6.2, 100 mM KCl and 0.1 mM EDTA). Different RNA-RNA duplexes can be used for this experiment and there are examples in
NMR experiments can be performed on AVANCE III 600 MHz or 800 MHz spectrometers (Bruker). The sample temperature can be 20° C. for binding experiments with the dsRNA and 5-37° C. for structure determination experiments including 1D 1H, and 2-D COSY and TOCSY with RNA-11 and RNA-12. The model was assembled from a data set that included analysis of TOCSY spectra.
NMR spectra can be acquired at 303 K and 313 K for RNA-protein complexes or 313 K for all other protein complexes on Bruker Avance III 500, 600, 700 or 900 MHz spectrometers equipped with cryoprobes and on a Bruker Avance III 750 MHz spectrometer with a room temperature probe. Spectra can be processed with Topspin 2.1 or Topspin 3.0 and analyzed in Sparky 3.0. 1H, 13C and 15N assignments of RNA and protein can be achieved by standard methods in the art. For modeling of the RNA-protein complex, intramolecular distance restraints derived from HHC- and HHN-3D-NOESY experiments as well as residual dipolar couplings measured for backbone amides and RNA-C1′-H1′, C5-H5, C6-H6, C8-H8 and C2-H2 bonds can be used. Intermolecular distance restraints can be extracted from 3-D 13C—F1-edited, F3-filtered-NOESY-HSQCs and 2-D 1H—1H F1—13C-filtered, F2—13C-edited NOESY spectra recorded on complexes reconstituted either from 13C15N-labeled protein and unlabeled RNA or from 15N-labeled protein and 13C15N-labeled RNA.
This example provides an exemplary modeling strategy. Modeling of an RNA-protein complex can be implemented with a combination of software packages commonly used for structure prediction and determination of protein-RNA complexes. The Atnos/Candid program suite and artificial RRM NOESY matrices can be used to generate peak lists corresponding to intramolecular NOESY patterns typical for the RRM fold. CYANA 3.0, and more particularly the CYANA noeassign command, can be used to integrate distance and angle restraints and to calculate models. For modeling, CUR-MS/MS-data can be inserted as ambiguous distance restraints because crosslinking sites define various distances between base rings of nucleic acids and side chains of amino acids, respectively. Intramolecular restraints can be derived from published protein structures in the RCSB Protein Data Bank (PDB) and RNA structures predicted by MC-FOLD and MC-SYM. Additional specific protein-RNA contacts extracted from available complex structures can be integrated as unambiguous distance restraints. For all models, about 200 structures per cycle can be calculated and about 20 of the lowest-energy structures can be selected as a starting ensemble for the next cycle. For modeling RNA-protein complexes, the CYANA noeassign calculation can be initiated with the average protein-RNA complex structure from PDB in cycle 1 excluding the RNA moiety. The final 20 lowest-energy models obtained with CYANA noeassign can be refined with the AMBER 12 force field to avoid steric clashes and to improve electrostatic and hydrophobic protein-RNA contacts.
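The per-cycle selection described above (about 200 structures calculated, about 20 of lowest energy carried forward) amounts to a simple ranking step. The sketch below is illustrative only: the model identifiers and randomly generated energies are placeholders, and it does not call or emulate CYANA.

```python
import heapq
import random


def select_starting_ensemble(structures, keep=20):
    """Return the `keep` lowest-energy (model_id, energy) tuples, lowest first."""
    return heapq.nsmallest(keep, structures, key=lambda item: item[1])


# Placeholder cycle output: 200 models with made-up energies.
random.seed(0)
cycle_models = [(f"model_{i:03d}", random.uniform(-500.0, -100.0)) for i in range(200)]
ensemble = select_starting_ensemble(cycle_models)
print(len(ensemble))
```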
This example shows binding kinetics by SPR analysis of U1 snRNP binding to RNA. Biotinylated RNAs (5′-biotinTEG/UCUAAGGCGUAAGUCUGCCAG-3′ and 5′-biotinTEG/UCUAAGCAGUAAGUCUGCCAG-3′) can be synthesized by Integrated DNA Technologies. Initial SPR studies with compound only in the association phase can be performed on a Biacore T100 at 25° C. RNA will be diluted into SPR buffer (38 mM HEPES, pH 7.6, 60 mM KCl, 0.12 mM EDTA, 3.2 mM MgCl2, 0.05% P20), heated to 90° C., slowly cooled to room temperature and centrifuged for 10 min at 14,000 g, and a target level of 110 relative units (RU) will be captured onto a streptavidin-coated SA chip (GE Healthcare). U1 snRNP will be diluted 1:50 with SPR buffer containing either DMSO or compound. The final DMSO concentration will be 0.5%, and the running buffer will be adjusted to the same percentage. The surface will be regenerated with 1 M NaCl, 10 mM NaOH. Co-injection experiments will be performed under the same buffer conditions on a ProteOn XPR36 at 25° C. using an NLC chip (Bio-Rad) with a minimum of 25 RUs of target RNA loaded on the surface. The ProteOn's co-inject function allows testing of NVS-SM2 or DMSO in both the association and dissociation phases. Dissociation rate constants are independent of analyte concentration and can be measured using the ProteOn software from two duplicate injections. All data will be double referenced to a protein-only surface as well as a buffer injection, and a DMSO correction for excluded volume will be performed.
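The double referencing and dissociation-rate measurement described above can be sketched numerically. The code below is not the ProteOn or Biacore software and makes simplifying assumptions: the dissociation phase is modeled as a single exponential decay, the rate constant is taken from a log-linear least-squares fit, and all traces are synthetic.

```python
import math


def double_reference(active, reference_surface, buffer_active, buffer_reference):
    """Subtract the reference-surface signal, then the equally referenced buffer injection."""
    return [
        (a - r) - (ba - br)
        for a, r, ba, br in zip(active, reference_surface, buffer_active, buffer_reference)
    ]


def dissociation_rate(times_s, responses_ru):
    """Estimate kd (1/s) as minus the slope of ln(R) versus t for R(t) = R0*exp(-kd*t)."""
    logs = [math.log(r) for r in responses_ru]  # responses must be positive
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(logs) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, logs))
    variance = sum((t - mean_t) ** 2 for t in times_s)
    return -covariance / variance


# Synthetic dissociation phase generated with kd = 0.01 1/s and R0 = 50 RU.
times = [10.0 * i for i in range(11)]
trace = [50.0 * math.exp(-0.01 * t) for t in times]
print(round(dissociation_rate(times, trace), 4))
```

Because the decay model contains no analyte concentration term, the fitted value depends only on the shape of the dissociation phase, consistent with the statement above.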
This example shows binding kinetics by SPR analysis of U1 snRNA binding to RNA. SPR studies will be performed on a ProteOn XPR36 at 20° C. using an NLC chip (Bio-Rad) with a minimum of 300 RUs of target RNA loaded on the surface. U1 snRNA (5′-AUACUUACCUG-3′) will be diluted to 1 μM with SPR buffer containing either DMSO or compound. The co-inject feature will be used so that the association and dissociation phases contain either DMSO or compound. Surface regeneration and referencing will be performed as in Example 5 above.
The small molecule of interest disclosed herein can be tested in a cell-based assay to measure its potency, for example, its IC50. To measure cell viability, cells were plated in 96-well plastic tissue culture plates at a density of 5×10³ cells/well. Twenty-four hours after plating, cells were treated with RG-11-1 compound. After 72 hours, the cell culture media was removed and plates were stained with 100 μL/well of a solution containing 0.5% crystal violet and 25% methanol, rinsed with deionized water, dried overnight, and resuspended in 100 μL citrate buffer (0.1 M sodium citrate in 50% ethanol) to assess plating efficiency. Intensity of crystal violet staining, assessed at 570 nm and quantified using a Vmax Kinetic Microplate Reader and Softmax software (Molecular Devices Corp., Menlo Park, Calif.), was directly proportional to cell number. Data were normalized to vehicle-treated cells and are presented in
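The normalization to vehicle-treated cells and the IC50 readout can be sketched as follows. This is a simplified illustration, not the analysis used for the data referenced above: the IC50 is estimated by linear interpolation on log10(concentration) between the two doses that bracket 50% viability, and the concentrations and absorbance values are made up.

```python
import math


def normalize_to_vehicle(absorbances, vehicle_absorbance):
    """Express each absorbance reading as percent of the vehicle-treated control."""
    return [100.0 * a / vehicle_absorbance for a in absorbances]


def ic50_by_interpolation(concentrations, viability_percent):
    """Interpolate on log10(concentration) between the doses bracketing 50% viability.

    Concentrations are assumed to be positive and sorted in ascending order.
    """
    points = list(zip(concentrations, viability_percent))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            fraction = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ic50
    return None  # 50% viability was not bracketed by the tested doses


# Made-up absorbance readings at 570 nm for four doses (in uM) and the vehicle control.
doses_um = [0.01, 0.1, 1.0, 10.0]
viability = normalize_to_vehicle([0.98, 0.80, 0.20, 0.05], vehicle_absorbance=1.0)
ic50 = ic50_by_interpolation(doses_um, viability)
print(ic50)
```

A four-parameter logistic fit is the more common choice when enough dose points are available; interpolation is used here only to keep the example short.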
For example, the disclosed methods can be used to select small molecule binding agents for modulating splicing of mRNA expressed from FOXM1 gene. The exemplary small molecules can target 5′ss of FOXM1 mRNA (5′ss of exon 9). They may also target some other elements of mRNA or target other mRNA for other genes. Exemplary structures are summarized herein:
In one aspect, a compound that could be identified by the present disclosed methods has the structure of Formula (I), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified by the present disclosed methods has the structure of Formula (II), or a pharmaceutically acceptable salt or solvate thereof:
In some embodiments, a compound that could be identified herein has the structure of Formula (III), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified herein has the structure of Formula (IV), or a pharmaceutically acceptable salt or solvate thereof:
In one aspect, a compound that could be identified herein has the structure of Formula (V), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified herein has the structure of Formula (VI), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified herein has the structure of Formula (VII), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified herein has the structure of Formula (VIII), or a pharmaceutically acceptable salt or solvate thereof:
In one aspect, a compound that could be identified herein has the structure of Formula (IX), or a pharmaceutically acceptable salt or solvate thereof:
In one aspect, described herein is a compound that has the structure of Formula (X), or a pharmaceutically acceptable salt or solvate thereof:
each R1 is independently H, substituted or unsubstituted C1-C6alkyl, substituted or unsubstituted C1-C6fluoroalkyl, substituted or unsubstituted C1-C6heteroalkyl, substituted or unsubstituted phenyl, or substituted or unsubstituted heteroaryl;
In one aspect, a compound that could be identified herein has the structure of Formula (XI), or a pharmaceutically acceptable salt or solvate thereof:
In one aspect, a compound that could be identified herein has the structure of Formula (XII), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified herein has the structure of Formula (XIII), or a pharmaceutically acceptable salt or solvate thereof:
In one aspect, a compound that could be identified herein has the structure of Formula (XIV), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified herein has the structure of Formula (XV), or a pharmaceutically acceptable salt or solvate thereof:
In one aspect, a compound that could be identified herein has the structure of Formula (XVI), or a pharmaceutically acceptable salt or solvate thereof:
In one aspect, a compound that could be identified herein has the structure of Formula (XVII), or a pharmaceutically acceptable salt or solvate thereof:
In another aspect, a compound that could be identified herein has the structure of Formula (XVIII), or a pharmaceutically acceptable salt or solvate thereof:
To develop or screen for new SMN2 splicing modifiers, the molecular basis for the SMN2-specific splicing correction mediated by Compound A was investigated. The ability of the splicing modifier Compound A to bind to the RNA duplex formed by the 5′-end of U1 snRNA and the 5′-splice site of SMN2 exon 7 was first verified. Then, the solution structure of the Compound A-RNA duplex complex was solved by solution-state NMR spectroscopy. By comparing the solution structures of the RNA duplex free and in complex with the splicing modifier, the mechanism of action of Compound A was determined. Compound A interacts with the RNA duplex at the exon-intron junction in the major groove and pulls the unpaired adenine into the RNA helix base stack. The splicing modifier thereby transforms the weak 5′-splice site of SMN2 exon 7 into a stronger one. The structure of the complex revealed that Compound A repairs the bulge at position -1 to correct the splicing of SMN2 exon 7.
Spinal Muscular Atrophy (SMA) is an autosomal recessive neuromuscular disease that represents the leading genetic cause of infant mortality. The disorder can be characterized by progressive degeneration of motor neurons from the spinal cord and brain stem, resulting in muscle weakness and atrophy. SMA is caused by homozygous genetic inactivation of the survival of motor neuron-1 gene (SMN1), the main source of SMN protein, which is ubiquitously expressed and involved in multiple cellular processes. Although a paralog gene, SMN2, is found in the human genome, it differs by several silent mutations (including the C6T mutation in exon 7) that mainly trigger the production of a different mRNA isoform lacking exon 7 and encoding an unstable protein. A reduced amount of functional SMN protein can impair motor neuron functions; however, the exact mechanism remains unclear. SMN2 still produces small amounts of functional SMN protein (˜20%), but not enough to compensate for the loss of SMN1. All SMA patients have at least one copy of the SMN2 gene, and the severity of the disease inversely correlates with the SMN2 gene copy number. Recently, splicing modifiers that promote SMN2 E7 inclusion have been discovered. They can increase the production of functional SMN protein and the survival of SMA-model mice. The splicing modifiers can act at the pre-mRNA splicing level with high specificity for SMN2 E7 and may favor the early steps of spliceosome assembly by stabilizing a specific enhancer complex at the 5′-SS of E7. To understand in depth how the splicing correction is driven at the atomic level and to develop new therapeutic molecules, the molecular mechanisms of the SMN2 splicing correction mediated by Compound A were investigated.
Compound A Binds the RNA Duplex Formed by the U1 snRNA 5′-End and the 5′-Splice Site of SMN2 Exon 7.
Compound A acts at the pre-mRNA level and should favor a splicing enhancer complex at the 5′-splice site of SMN2 exon 7. To evaluate the binding of Compound A to the RNA duplex formed upon spliceosome assembly, in vitro binding assays were performed by solution-state NMR. The RNA duplex was prepared at 250 μM in 5 mM MES d-8, pH 5.5, 50 mM NaCl, and reference spectra (1D 1H and 2D 1H-1H TOCSY) were recorded on a 600 MHz AVIII HD spectrometer equipped with a cryoprobe. Compound A was then dissolved in the same buffer and added to the RNA sample. Upon addition of the splicing modifier, the resonances of the RNA experienced chemical shift changes, in line with a direct interaction between both partners (
To obtain structural insights into the specific splicing correction induced by Compound A, the solution structure of the RNA duplex bound to Compound A was investigated. As a first step, the proton resonances of Compound A were assigned (
The solution structure of the Compound A-RNA duplex complex was solved using 316 intramolecular distances for the RNA duplex, 18 constraints to maintain the base pairing, 146 angular restraints to define the ribose puckers, and 30 intermolecular NOEs. The structure of the RNA part was computed using a semi-automated approach with CYANA NOEASSIGN, which analyzes the NMR data based on the chemical shifts provided and couples this interpretation to torsion-angle simulated annealing. The program performs seven cycles of NOE assignment, calibration, structure calculation, and evaluation of the agreement between the structure and the experimental data. The output from the automatic structure calculation was then combined with manually integrated intermolecular NOE-derived distances to calculate the structure of the complex, still in torsion-angle space. Once a low target function was achieved, the structure was refined by simulated annealing in Cartesian space using the SANDER module of AMBER12. This structure was then utilized to develop and screen for new SMN2 splicing modifiers.
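As a bookkeeping illustration only, the restraint classes listed above can be tallied as follows. This is a minimal sketch: the class and function names are hypothetical and are not part of CYANA or AMBER; only the four counts are taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestraintClass:
    """One class of experimental restraint (hypothetical helper type)."""
    name: str
    count: int
    intermolecular: bool = False


# Counts as stated in the description of the structure calculation.
RESTRAINTS = [
    RestraintClass("intramolecular RNA distances", 316),
    RestraintClass("base-pairing constraints", 18),
    RestraintClass("ribose pucker angular restraints", 146),
    RestraintClass("intermolecular NOEs", 30, intermolecular=True),
]


def summarize(restraints):
    """Return the total restraint count and the intermolecular share."""
    total = sum(r.count for r in restraints)
    inter = sum(r.count for r in restraints if r.intermolecular)
    return {
        "total": total,
        "intermolecular": inter,
        "intermolecular_fraction": inter / total,
    }


summary = summarize(RESTRAINTS)
print(f"total restraints: {summary['total']}")
print(f"intermolecular NOEs: {summary['intermolecular']} "
      f"({summary['intermolecular_fraction']:.1%} of all restraints)")
```

The tally shows that the 30 intermolecular NOEs, which position Compound A relative to the RNA, make up a small share of the 510 restraints in total.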
By solving the solution structure of the Compound A splicing modifier bound to the RNA duplex formed upon recognition of the 5′-splice site of SMN2 exon 7 by U1 snRNP, it was found that Compound A stabilizes the unpaired adenine at the exon-intron junction within the RNA helix base stack. The conformational switch of the adenine mimics a strong 5′-splice site and induces the specific splicing correction. The atomic details of the Compound A binding pocket exemplify the ability to rationally design new splicing modifiers for SMN2 and other targets.
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
This application is a U.S. National Phase Application under 35 U.S.C. § 371 of International Application No. PCT/US2018/052743, filed Nov. 7, 2018, which claims priority to U.S. Provisional Patent Application No. 62/562,941, filed Sep. 25, 2017, which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US18/52743 | 9/25/2018 | WO | |

Number | Date | Country |
---|---|---|
62562941 | Sep 2017 | US |