The Sequence Listing written in file 048536.txt, created 2022, bytes, machine format IBM-PC, MS Windows operating system, is hereby incorporated by reference.
Interactions between glycans and proteins play important biological roles in living systems. Proteins interact with glycans of glycoproteins, glycolipids, and polysaccharides presented on the cell surface to influence biological activity and recognition. (Ref. 1). Protein-glycan interactions are involved in a broad range of biological processes, such as cell-cell communication, organism development, tumor cell metastasis, bacterial and viral invasion, and immune response. (Refs. 2-4). Despite their central role in molecular encounters, protein-glycan interactions are challenging to study due to their dynamic nature, transient interaction, and the often large number of interacting partners involved. (Ref. 5). Glycan structure is not genetically encoded, which makes it not amenable to common genetic techniques and makes monosaccharide specificity difficult to achieve. Another salient feature adding to the difficulty is the generally low affinity of a single protein-glycan interaction, with the equilibrium dissociation constant Kd most often in the millimolar range and in some cases in the micromolar range. (Refs. 1, 6). Thus, it has been extremely difficult to generate high affinity protein binders for glycans. In particular, many cancer cells overexpress distinct surface glycans and pathogenic microbes are covered with glycans not found in eukaryotic cells, but anti-glycan antibodies with high affinity and specificity remain lacking. (Refs. 4, 7, 8). Therefore, to facilitate basic research in glycobiology and to exploit glycan-based diagnosis and therapy, a general approach is highly desired to increase the affinity of proteins for glycans and to specifically stabilize their transient interactions. The ability to covalently cross-link proteins with glycans under mild cellular and in vivo settings would offer a unique solution to these challenges. The present disclosure is directed to these, as well as other, important ends.
Provided herein are compounds of Formula (I), where the substituents are as defined herein:
Provided herein are biomolecules comprising an unnatural amino acid, where the unnatural amino acid comprises a side chain of Formula (II), where the substituents are as defined herein:
In embodiments, the biomolecule is a lipid, RNA, or glycan. In embodiments, the biomolecule is a protein. In embodiments, the biomolecule is an RNA-binding protein. In embodiments, the biomolecule is a glycan-binding protein. In embodiments, the protein is sialic acid binding Ig like lectin (Siglec) or a sialoglycan binding V-set domain of sialic acid binding Ig like lectin (Siglec).
Provided herein are biomolecule conjugates of Formula (III), where the substituents are as defined herein:
In embodiments, R2 is a protein, a lipid, RNA, or a glycan, and R3 is a protein, a lipid, RNA, or a glycan. In embodiments, R2 is a protein, and R3 is a protein. In embodiments, R2 is a protein, and R3 is an RNA. In embodiments, R2 is a protein, and R3 is an mRNA. In embodiments, R2 is an RNA-binding protein, and R3 is an RNA. In embodiments, R2 is a protein, a lipid, or RNA, and R3 is a glycan. In embodiments, R2 is a protein, and R3 is a glycan. In embodiments, R2 is a glycan-binding protein, and R3 is a glycan. In embodiments, R2 comprises Siglec or a sialoglycan binding V-set domain of Siglec, and R3 comprises a sialoglycan. In embodiments, R3 is bonded to —S(O2)— via a ribose moiety.
Provided herein are methods of treating cancer in a patient by administering to the patient an effective amount of a protein comprising an unnatural amino acid, wherein the unnatural amino acid comprises a side chain of Formula (II), and wherein the protein comprises Siglec or a sialoglycan binding V-set domain of Siglec. In embodiments, the cancer has elevated levels of sialoglycan.
These and other embodiments of the disclosure are provided in detail herein.
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art. See, e.g., Singleton et al., Dictionary of Microbiology and Molecular Biology, 2nd ed., J. Wiley & Sons (New York, NY 1994); Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press (Cold Spring Harbor, NY 1989). Any methods, devices and materials similar or equivalent to those described herein can be used in the practice of this disclosure. The following definitions are provided to facilitate understanding of certain terms used frequently herein and are not meant to limit the scope of the present disclosure.
“Nucleic acid” refers to nucleotides (e.g., deoxyribonucleotides or ribonucleotides) and polymers thereof in either single-, double- or multiple-stranded form, or complements thereof. The terms “polynucleotide,” “oligonucleotide,” “oligo” or the like refer, in the usual and customary sense, to a linear sequence of nucleotides. The term “nucleotide” refers, in the usual and customary sense, to a single unit of a polynucleotide, i.e., a monomer. Nucleotides can be ribonucleotides, deoxyribonucleotides, or modified versions thereof. Examples of polynucleotides contemplated herein include single and double stranded DNA, single and double stranded RNA, and hybrid molecules having mixtures of single and double stranded DNA and RNA. Examples of nucleic acids, e.g., polynucleotides, contemplated herein include any types of RNA, e.g., mRNA, siRNA, miRNA, and guide RNA, and any types of DNA, e.g., genomic DNA, plasmid DNA, and minicircle DNA, and any fragments thereof. The term “duplex” in the context of polynucleotides refers, in the usual and customary sense, to double strandedness. Nucleic acids can be linear or branched. For example, nucleic acids can be a linear chain of nucleotides or the nucleic acids can be branched, e.g., such that the nucleic acids comprise one or more arms or branches of nucleotides. Optionally, the branched nucleic acids are repetitively branched to form higher ordered structures such as dendrimers and the like.
Nucleic acids, including, e.g., nucleic acids with a phosphorothioate backbone, can include one or more reactive moieties. As used herein, the term “reactive moiety” includes any group capable of reacting with another molecule, e.g., a nucleic acid or polypeptide, through covalent, non-covalent or other interactions. By way of example, the nucleic acid can include an amino acid reactive moiety that reacts with an amino acid on a protein or polypeptide through a covalent, non-covalent or other interaction.
The terms also encompass nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphodiester derivatives including, e.g., phosphoramidate, phosphorodiamidate, phosphorothioate (also known as phosphothioate, having a double-bonded sulfur replacing oxygen in the phosphate), phosphorodithioate, phosphonocarboxylic acids, phosphonocarboxylates, phosphonoacetic acid, phosphonoformic acid, methyl phosphonate, boron phosphonate, or O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press) as well as modifications to the nucleotide bases such as in 5-methyl cytidine or pseudouridine and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones; non-ionic backbones, modified sugars, and non-ribose backbones (e.g., phosphorodiamidate morpholino oligos or locked nucleic acids (LNA) as known in the art), including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 580, Carbohydrate Modifications in Antisense Research, Sanghvi & Cook, eds. Nucleic acids containing one or more carbocyclic sugars are also included within one definition of nucleic acids. Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments or as probes on a biochip. Mixtures of naturally occurring nucleic acids and analogs can be made; alternatively, mixtures of different nucleic acid analogs may be made.
In embodiments, the internucleotide linkages in DNA are phosphodiester, phosphodiester derivatives, or a combination of both.
Nucleic acids can include nonspecific sequences. As used herein, the term “nonspecific sequence” refers to a nucleic acid sequence that contains a series of residues that are not designed to be complementary to or are only partially complementary to any other nucleic acid sequence. By way of example, a nonspecific nucleic acid sequence is a sequence of nucleic acid residues that does not function as an inhibitory nucleic acid when contacted with a cell or organism.
A polynucleotide is typically composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U) for thymine (T) when the polynucleotide is RNA). Thus, the term “polynucleotide sequence” is the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself. This alphabetical representation can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching. Polynucleotides may optionally include one or more non-standard nucleotide(s), nucleotide analog(s) and/or modified nucleotides.
The term “complement,” as used herein, refers to a nucleotide (e.g., RNA or DNA) or a sequence of nucleotides capable of base pairing with a complementary nucleotide or sequence of nucleotides. As described herein and commonly known in the art, the complementary (matching) nucleotide of adenosine is thymidine and the complementary (matching) nucleotide of guanosine is cytidine. Thus, a complement may include a sequence of nucleotides that base pair with corresponding complementary nucleotides of a second nucleic acid sequence. The nucleotides of a complement may partially or completely match the nucleotides of the second nucleic acid sequence. Where the nucleotides of the complement completely match each nucleotide of the second nucleic acid sequence, the complement forms base pairs with each nucleotide of the second nucleic acid sequence. Where the nucleotides of the complement partially match the nucleotides of the second nucleic acid sequence, only some of the nucleotides of the complement form base pairs with nucleotides of the second nucleic acid sequence. Examples of complementary sequences include coding and non-coding sequences, wherein the non-coding sequence contains complementary nucleotides to the coding sequence and thus forms the complement of the coding sequence. A further example of complementary sequences are sense and antisense sequences, wherein the sense sequence contains complementary nucleotides to the antisense sequence and thus forms the complement of the antisense sequence.
As described herein, the complementarity of sequences may be partial, in which only some of the nucleotides match according to base pairing, or complete, where all the nucleotides match according to base pairing. Thus, two sequences that are complementary to each other may have a specified percentage of nucleotides that are the same (i.e., about 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region).
The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that functions in a manner similar to a naturally occurring amino acid. The terms “non-naturally occurring amino acid” and “unnatural amino acid” refer to amino acid analogs, synthetic amino acids, and amino acid mimetics which are not found in nature.
Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
The term “amino acid side chain” refers to the functional substituent contained on amino acids. For example, an amino acid side chain may be the side chain of a naturally occurring amino acid. Naturally occurring amino acids are those encoded by the genetic code (e.g., alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, or valine), as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. In embodiments, the amino acid side chain is a non-natural amino acid side chain. In embodiments, the amino acid side chain is H,
The term “non-natural amino acid side chain” or “unnatural amino acid side chain” refers to the functional substituent of compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium, allylalanine, 2-aminoisobutyric acid. Non-natural amino acids are non-proteinogenic amino acids that either occur naturally or are chemically synthesized. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Non-limiting examples include exo-cis-3-aminobicyclo[2.2.1]hept-5-ene-2-carboxylic acid hydrochloride, cis-2-aminocycloheptane-carboxylic acid hydrochloride, cis-6-amino-3-cyclohexene-1-carboxylic acid hydrochloride, cis-2-amino-2-methylcyclohexanecarboxylic acid hydrochloride, cis-2-amino-2-methylcyclopentane-carboxylic acid hydrochloride, 2-(Boc-aminomethyl)benzoic acid, 2-(Boc-amino)octanedioic acid, Boc-4,5-dehydro-Leu-OH (dicyclohexylammonium), Boc-4-(Fmoc-amino)-L-phenylalanine, Boc-β-Homopyr-OH, Boc-(2-indanyl)-Gly-OH, 4-Boc-3-morpholineacetic acid, Boc-pentafluoro-D-phenylalanine, Boc-pentafluoro-L-phenylalanine, Boc-Phe(2-Br)-OH, Boc-Phe(4-Br)-OH, Boc-D-Phe(4-Br)-OH, Boc-D-Phe(3-Cl)-OH, Boc-Phe(4-NH2)-OH, Boc-Phe(3-NO2)-OH, Boc-Phe(3,5-F2)-OH, 2-(4-Boc-piperazino)-2-(3,4-dimethoxy-phenyl)acetic acid purum, 2-(4-Boc-piperazino)-2-(2-fluorophenyl)acetic acid purum, 2-(4-Boc-piperazino)-2-(3-fluorophenyl)acetic acid purum, 2-(4-Boc-piperazino)-2-(4-fluorophenyl)acetic acid purum, 2-(4-Boc-piperazino)-2-(4-methoxyphenyl)acetic acid purum, 2-(4-Boc-piperazino)-2-phenylacetic acid purum, 2-(4-Boc-piperazino)-2-(3-pyridyl)acetic acid purum, 2-(4-Boc-piperazino)-2-[4-(trifluoromethyl)phenyl]acetic acid purum, Boc-β-(2-quinolyl)-Ala-OH, N-Boc-1,2,3,6-tetrahydro-2-pyridinecarboxylic acid, Boc-β-(4-thiazolyl)-Ala-OH, Boc-β-(2-thienyl)-D-Ala-OH, Fmoc-N-(4-Boc-aminobutyl)-Gly-OH, Fmoc-N-(2-Boc-aminoethyl)-Gly-OH, Fmoc-N-(2,4-dimethoxy benzyl)-Gly-OH, Fmoc-(2-indanyl)-Gly-OH, Fmoc-pentafluoro-L-phenylalanine, Fmoc-Pen(Trt)-OH, Fmoc-Phe(2-Br)-OH, Fmoc-Phe(4-Br)-OH, Fmoc-Phe(3,5-F2)-OH, Fmoc-β-(4-thiazolyl)-Ala-OH, Fmoc-β-(2-thienyl)-Ala-OH, 4-(Hydroxymethyl)-D-phenylalanine.
“Conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, “conservatively modified variants” refers to those nucleic acids that encode identical or essentially identical amino acid sequences. Because of the degeneracy of the genetic code, a number of nucleic acid sequences will encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations,” which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and UGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
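By way of illustration only, the enumeration of silent variations can be sketched in a few lines of Python. The sketch assumes the standard genetic code written in RNA notation (so the tryptophan codon appears as UGG) and lists only the three amino acids named above; the table `CODONS` and the function `silent_variants` are hypothetical names introduced for this example and are not part of the disclosure.

```python
from itertools import product

# Partial RNA codon table (assumption: standard genetic code; only the
# amino acids named in the text are included).
CODONS = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Met": ["AUG"],
    "Trp": ["UGG"],
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def silent_variants(rna: str) -> list[str]:
    """Return every RNA sequence encoding the same amino acids as `rna`.

    Raises KeyError for codons outside the partial table above.
    """
    triplets = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    # For each codon, collect all codons that encode the same amino acid.
    choices = [CODONS[CODON_TO_AA[t]] for t in triplets]
    return ["".join(combo) for combo in product(*choices)]
```

For the toy sequence AUG-GCA-UGG (Met-Ala-Trp), only the alanine codon can vary, so the function returns four sequences, consistent with the statement that the methionine and tryptophan codons ordinarily have no synonymous alternatives.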
As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure.
The following groups each contain amino acids that are conservative substitutions for one another: (1) Alanine (A), Glycine (G); (2) Aspartic acid (D), Glutamic acid (E); (3) Asparagine (N), Glutamine (Q); (4) Arginine (R), Lysine (K); (5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); (6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); (7) Serine (S), Threonine (T); and (8) Cysteine (C), Methionine (M).
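As a minimal sketch, and not a definitive implementation, the eight groups above can be encoded as sets and queried to decide whether a given substitution is conservative under this grouping. The names `GROUPS` and `is_conservative` are hypothetical and introduced only for illustration.

```python
# The eight groups listed above, by one-letter code.
GROUPS = [
    {"A", "G"},
    {"D", "E"},
    {"N", "Q"},
    {"R", "K"},
    {"I", "L", "M", "V"},
    {"F", "Y", "W"},
    {"S", "T"},
    {"C", "M"},
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b differ and share at least one group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in group and b in group for group in GROUPS)
```

Because methionine appears in both group (5) and group (8), the check is made per group: an isoleucine-to-cysteine change is not treated as conservative even though each residue shares a group with methionine.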
The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues, wherein the polymer may in embodiments be conjugated to a moiety that does not consist of amino acids. The terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers. A “fusion protein” refers to a chimeric protein encoding two or more separate protein sequences that are recombinantly expressed as a single moiety.
An amino acid or nucleotide base “position” is denoted by a number that sequentially identifies each amino acid (or nucleotide base) in the reference sequence based on its position relative to the N-terminus (or 5′-end). Due to deletions, insertions, truncations, fusions, and the like that must be taken into account when determining an optimal alignment, in general the amino acid residue number in a test sequence determined by simply counting from the N-terminus will not necessarily be the same as the number of its corresponding position in the reference sequence. For example, in a case where a variant has a deletion relative to an aligned reference sequence, there will be no amino acid in the variant that corresponds to a position in the reference sequence at the site of deletion. Where there is an insertion in an aligned reference sequence, that insertion will not correspond to a numbered amino acid position in the reference sequence. In the case of truncations or fusions there can be stretches of amino acids in either the reference or aligned sequence that do not correspond to any amino acid in the corresponding sequence.
The terms “numbered with reference to” or “corresponding to,” when used in the context of the numbering of a given amino acid or polynucleotide sequence, refers to the numbering of the residues of a specified reference sequence when the given amino acid or polynucleotide sequence is compared to the reference sequence.
An amino acid residue in a protein “corresponds” to a given residue when it occupies the same essential structural position within the protein as the given residue. For example, a selected residue in a selected protein corresponds to Lysine127 of Siglec-7 when the selected residue occupies the same essential spatial or other structural relationship as Lysine127 in Siglec-7. In embodiments, where a selected protein is aligned for maximum homology with Siglec-7, the position in the aligned selected protein aligning with Lysine127 is said to correspond to Lysine127. Instead of a primary sequence alignment, a three dimensional structural alignment can also be used, e.g., where the structure of the selected protein is aligned for maximum correspondence with Siglec-7 and the overall structures compared. In this case, an amino acid that occupies the same essential position as Lysine127 in the structural model is said to correspond to the Lysine127 residue. Thus, Lysine127 of SEQ ID NO: 1 corresponds to Lysine127 of SEQ ID NOS:2-4 (which can alternatively be referred to as Lysine86 in SEQ ID NO:2; as Lysine87 in SEQ ID NO:3; and as Lysine86 in SEQ ID NO:4).
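The renumbering described above (for example, a residue numbered 127 in a full-length sequence appearing at a lower number in a truncated sequence) can be sketched for the primary sequence alignment case. The following Python sketch assumes that a gapped pairwise alignment is already available as two equal-length strings using "-" for gaps; the function name `corresponding_position` and the toy sequences in the usage note are hypothetical and are not the sequences of the Sequence Listing.

```python
def corresponding_position(aligned_ref: str, aligned_test: str, ref_pos: int):
    """Map a 1-based residue number in the reference sequence to the
    1-based residue number in the test sequence.

    Both arguments are rows of one pairwise alignment ("-" marks a gap).
    Returns None when the reference residue aligns to a gap in the test
    sequence, i.e., there is no corresponding residue.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    test_count = 0
    for ref_char, test_char in zip(aligned_ref, aligned_test):
        if ref_char != "-":
            ref_count += 1
        if test_char != "-":
            test_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            return test_count if test_char != "-" else None
    raise ValueError("ref_pos is outside the reference sequence")
```

With the toy alignment of reference `MKTAYK` against a truncated test sequence `--TAYK`, reference position 6 maps to test position 4, while reference position 1 has no counterpart and yields `None`.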
“Percentage of sequence identity” is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
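The calculation described above can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been optimally aligned over the comparison window and are supplied as equal-length strings with "-" marking gaps; the function name `percent_identity` is hypothetical. It does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are those where the same base or residue occurs in
    both rows; gap positions count toward the window but never match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matched / len(aligned_a)
```

For example, the aligned rows `ACGT-ACG` and `ACGTTACC` share six matched positions in a window of eight, giving 75% sequence identity.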
The terms “identical” or percent “identity,” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 60% identity, or at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using a BLAST or BLAST 2.0 sequence comparison algorithm with default parameters described below, or by manual alignment and visual inspection (e.g., NCBI web site ncbi.nlm.nih.gov/BLAST/ or the like). Such sequences are then said to be “substantially identical.” This definition also refers to, or may be applied to, the complement of a test sequence. The definition also includes sequences that have deletions and/or additions, as well as those that have substitutions. As described below, the preferred algorithms can account for gaps and the like. Preferably, identity exists over a region that is at least about 25 amino acids or nucleotides in length, or more preferably over a region that is 50-100 amino acids or nucleotides in length.
The term “biomolecule” as used herein refers to large macromolecules such as, for example, proteins, glycans, lipids, and nucleic acids, as well as small molecules such as, for example, primary and secondary metabolites. In embodiments, the term biomolecule refers to a protein. In embodiments, the term biomolecule refers to a glycan. In embodiments, the term biomolecule refers to RNA.
The term “biomolecule moiety” as used herein refers to biomolecules, including large macromolecules such as, for example, proteins, glycans, lipids, and nucleic acids (e.g., RNA), as well as small molecules such as, for example, primary and secondary metabolites. Thus, in embodiments, the biomolecule moiety is a peptidyl moiety, a glycan moiety, a lipid moiety or a nucleic acid moiety. Biomolecule moieties may form part of a molecule (e.g., biomolecule). For example, biomolecule moieties may form part of a biomolecule conjugate, where the biomolecule conjugate includes two or more biomolecule moieties. In embodiments, the biomolecule conjugate includes two or more biomolecule moieties conjugated via a bioconjugate linker.
The term “glycan” or “carbohydrate” as used herein refers to compounds containing monosaccharides linked glycosidically (e.g., N-linked, O-linked). Monosaccharides generally contain from about three to about nine carbon atoms. Exemplary monosaccharides include glyceraldehyde-3-phosphate, erythrose, threose, erythrulose, ribose, deoxyribose, arabinose, lyxose, xylose, ribulose, xylulose, glucose, mannose, galactose, gulose, idose, talose, allose, altrose, fructose, psicose, sorbose, tagatose, glycero-D-manno-heptose, sedoheptulose, methylthiolincosamide, neuraminic acid, sialic acid, legionaminic acid, pseudaminic acid, and the like. In embodiments, the term “glycan” refers to a compound comprising a ribose.
The term “glycan moiety” refers to a monovalent radical of a glycan. The glycan moiety may be substituted with additional chemical moieties. In embodiments, the glycan moiety is bonded (covalently or non-covalently) with a protein, a lipid, a glycan, or RNA. In embodiments, the glycan moiety is associated with (e.g., on the surface of or embedded within the surface membrane of) a cancer cell. In embodiments, the glycan moiety is covalently bonded via a ribose moiety with a protein, a lipid, a glycan, or RNA. In embodiments, the glycan moiety is covalently bonded via a ribose moiety with a protein.
The term “peptidyl moiety” refers to a protein, protein fragment, or peptide. The peptidyl moiety may be substituted with additional chemical moieties. In embodiments, a peptidyl moiety is a monovalent radical of a protein.
The term “lipid moiety” refers to a lipid or lipid fragment. The lipid may be substituted with additional chemical moieties. In embodiments, a lipid moiety is a monovalent radical of a lipid.
The term “RNA moiety” refers to an RNA, as described herein. In embodiments, an RNA moiety is a monovalent radical of RNA. In embodiments, RNA moiety refers to mRNA. In embodiments, an mRNA moiety is a monovalent radical of mRNA.
The term “pyrrolysyl-tRNA synthetase” refers to an enzyme (including homologs, isoforms, and functional fragments thereof) with pyrrolysyl-tRNA synthetase activity. Pyrrolysyl-tRNA synthetase is an aminoacyl-tRNA synthetase that catalyzes the reaction necessary to attach the α-amino acid pyrrolysine to the cognate tRNA (tRNAPyl), thereby allowing incorporation of pyrrolysine during proteinogenesis at amber stop codons (i.e., UAG). The term includes any recombinant or naturally-occurring form of pyrrolysyl-tRNA synthetase or variants, homologs, or isoforms thereof that maintain pyrrolysyl-tRNA synthetase activity (e.g., within at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100% activity compared to wild-type pyrrolysyl-tRNA synthetase). In embodiments, the variants, homologs, or isoforms have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring pyrrolysyl-tRNA synthetase. In embodiments, the mutant pyrrolysyl-tRNA synthetase catalyzes the attachment of the compound of Formula (I) to a tRNAPyl.
The terms “tRNAPyl” and “tRNAPylCUA” (i.e., tRNA(superscript Pyl)(subscript CUA)) are used interchangeably and both refer to a single-stranded RNA molecule containing about 70 to 90 nucleotides which folds via intrastrand base pairing to form a characteristic cloverleaf structure that carries a specific amino acid (e.g., compound of Formula (I)) and matches it to its corresponding codon (i.e., a codon complementary to the anticodon of the tRNA) on an mRNA during protein synthesis. In tRNAPyl, the anticodon is CUA. Anticodon CUA is complementary to amber stop codon UAG. The abbreviation “Pyl” of tRNAPyl stands for pyrrolysine and the “CUA” of tRNAPyl refers to its anticodon CUA. In embodiments, tRNAPyl is attached to the compound of Formula (I).
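The relationship between anticodon CUA and amber stop codon UAG follows from antiparallel Watson-Crick pairing: with both written 5′ to 3′, the codon is the reverse complement of the anticodon. A minimal Python sketch, assuming standard A-U and G-C pairing only (no wobble pairing or modified bases) and using the hypothetical function name `codon_for_anticodon`:

```python
# Watson-Crick complements for RNA (assumption: no wobble pairing).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def codon_for_anticodon(anticodon: str) -> str:
    """Return the mRNA codon (5'->3') that pairs with an anticodon (5'->3').

    The two strands pair antiparallel, so the codon is the reverse
    complement of the anticodon.
    """
    return anticodon.upper().translate(RNA_COMPLEMENT)[::-1]
```

Applied to the anticodon CUA, the function returns UAG, the amber stop codon.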
The term “substrate-binding site” as used herein refers to residues located in the enzyme active site that form temporary bonds or interactions with the substrate. In embodiments, the substrate-binding site of pyrrolysyl-tRNA synthetase refers to residues located in the active site of pyrrolysyl-tRNA synthetase that form temporary bonds or interactions with the amino acid substrate.
The term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a “plasmid”, which refers to a linear or circular double stranded DNA loop into which additional DNA segments can be ligated. Another type of vector is a viral vector, wherein additional DNA segments can be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “expression vectors,” In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. The terms “plasmid” and “vector” can be used interchangeably as the plasmid is the most commonly used form of vector. However, the disclosure is intended to include such other forms of expression vectors, such as viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions. Some viral vectors are capable of targeting a particular cells type either specifically or nonspecifically. Exemplary vectors that can be used include, but are not limited to, pEvol vector, pMP vector, pET vector, pTak vector, pBad vector.
The term “complex” refers to a composition that includes two or more components, where the components bind together to make a functional unit. In embodiments, a complex described herein include a mutant pyrrolysyl-tRNA synthetase described herein and an amino acid substrate (e.g., the compound of Formula (I)). In embodiments, a complex described herein includes a mutant pyrrolysyl-tRNA synthetase described herein and a tRNA (e.g., tRNAPy). In embodiments, a complex described herein includes a mutant pyrrolysyl-tRNA synthetase described herein, an amino acid substrate (e.g., SFY) and a tRNA (e.g., tRNAPy). In embodiments, a complex described herein includes at least two components selected from the group consisting of a mutant pyrrolysyl-tRNA synthetase described herein, an amino acid substrate (e.g., the compound of Formula (I)), a polypeptide containing the compound of Formula (I), and a tRNA (e.g., tRNAPy).
The term “protein complex” refers to a composition that includes two or more proteins, where the proteins are proximal to each other but not bound together; the proteins are covalently bound together; or the proteins are ionically bound together. In embodiments, the proteins are proximal to each other but not bound together. In embodiments, the proteins are covalently bonded together. In embodiments, the proteins are ionically bonded together. In embodiments, the proteins are covalently and ionically bonded together. In embodiments, a first protein in the protein complex comprises the compound of Formula (I), and a second protein in the protein complex comprises serine, threonine, or a combination thereof. In embodiments, the compound of Formula (I) in the first protein is proximal to the serine and/or threonine in the second protein. In embodiments, “proximal” means that the compound of Formula (I) in the first protein and the serine and/or threonine in the second protein are close enough to each other for a chemical reaction to occur between the compound of Formula (I) and the serine and/or threonine. In embodiments, the chemical reaction is a SuFEx reaction.
The term “glycan-binding protein/glycan complex” refers to a composition that includes at least one glycan-binding protein and at least one glycan, where the glycan-binding protein and glycan are proximal to each other but not bound together; the glycan-binding protein and glycan are covalently bound together; or the glycan-binding protein and glycan are ionically bound together. In embodiments, the glycan-binding protein and glycan are proximal to each other but not bound together. In embodiments, the glycan-binding protein and glycan are covalently bonded together. In embodiments, the glycan-binding protein and glycan are covalently bonded together via a ribose moiety in the glycan. In embodiments, the glycan-binding protein and glycan are ionically bonded together. In embodiments, the protein and glycan are covalently and ionically bonded together. In embodiments, the glycan-binding protein comprises the compound of Formula (I), and the glycan comprises a hydroxyl moiety. In embodiments, the compound of Formula (I) in the glycan-binding protein is proximal to the hydroxyl moiety in the glycan. In embodiments, “proximal” means that the compound of Formula (I) in the glycan-binding protein and the hydroxyl moiety in the glycan are close enough to each other for a chemical reaction to occur between the compound of Formula (I) and the hydroxyl moiety in the glycan. In embodiments, the chemical reaction is a SuFEx reaction.
The term “RNA-binding protein/RNA complex” refers to a composition that includes at least one RNA-binding protein and at least one RNA, where the RNA-binding protein and RNA are proximal to each other but not bound together; the RNA-binding protein and RNA are covalently bound together; or the RNA-binding protein and RNA are ionically bound together. In embodiments, the RNA-binding protein and RNA are proximal to each other but not bound together. In embodiments, the RNA-binding protein and RNA are covalently bonded together. In embodiments, the RNA-binding protein and RNA are ionically bonded together. In embodiments, the protein and RNA are covalently and ionically bonded together. In embodiments, the RNA-binding protein comprises the compound of Formula (I), and the RNA comprises a hydroxyl moiety or an N6-methyladenosine moiety. In embodiments, the compound of Formula (I) in the RNA-binding protein is proximal to the RNA. In embodiments, “proximal” means that the compound of Formula (I) in the RNA-binding protein and the RNA are close enough to each other for a chemical reaction to occur between the compound of Formula (I) and the RNA. In embodiments, the chemical reaction is a SuFEx reaction.
The terms “transfection”, “transduction”, “transfecting” or “transducing” can be used interchangeably and are defined as a process of introducing a nucleic acid molecule or a protein to a cell. Nucleic acids are introduced to a cell using non-viral or viral-based methods. The nucleic acid molecules may be gene sequences encoding complete proteins or functional portions thereof. Non-viral methods of transfection include any appropriate transfection method that does not use viral DNA or viral particles as a delivery system to introduce the nucleic acid molecule into the cell. Exemplary non-viral transfection methods include calcium phosphate transfection, liposomal transfection, nucleofection, sonoporation, transfection through heat shock, magnetofection and electroporation. In embodiments, the nucleic acid molecules are introduced into a cell using electroporation following standard procedures well known in the art. For viral-based methods of transfection, any useful viral vector may be used in the methods described herein. Examples of viral vectors include, but are not limited to, retroviral, adenoviral, lentiviral and adeno-associated viral vectors. In embodiments, the nucleic acid molecules are introduced into a cell using a retroviral vector following standard procedures well known in the art. The terms “transfection” or “transduction” also refer to introducing proteins into a cell from the external environment. Typically, transduction or transfection of a protein relies on attachment of a peptide or protein capable of crossing the cell membrane to the protein of interest. See, e.g., Ford et al. (2001) Gene Therapy 8:1-4 and Prochiantz (2007) Nat. Methods 4:119-20.
The term “isolated,” when applied to a nucleic acid or protein, denotes that the nucleic acid or protein is essentially free of other cellular components with which it is associated in the natural state. It can be, for example, in a homogeneous state and may be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein that is the predominant species present in a preparation is substantially purified.
“Contacting” is used in accordance with its plain ordinary meaning and refers to the process of allowing at least two distinct species (e.g., chemical compounds including glycans, RNA, amino acids, proteins, peptides, biomolecules, or cells) to become sufficiently proximal to react, interact or physically touch. It should be appreciated, however, that the resulting reaction product can be produced directly from a reaction between the added reagents or from an intermediate from one or more of the added reagents that can be produced in the reaction mixture. The term “contacting” may include allowing two species to react, interact, or physically touch, wherein the two species may be biomolecule moieties as described herein. In some embodiments, contacting includes allowing two proteins, a protein and a glycan, or a protein and RNA, as described herein to interact.
The symbol “” or “-” denotes the point of attachment of a chemical moiety to the remainder of a molecule or chemical formula.
The compounds described herein may also contain unnatural proportions of atomic isotopes at one or more of the atoms that constitute such compounds. For example, the compounds may be radiolabeled with radioactive isotopes, such as for example tritium (3H), iodine-125 (125I), or carbon-14 (14C). All isotopic variations of the compounds described herein, whether radioactive or not, are encompassed within the scope of the present disclosure.
“Analog,” or “analogue” is used in accordance with its plain ordinary meaning within Chemistry and Biology and refers to a chemical compound that is structurally similar to another compound (i.e., a so-called “reference” compound) but differs in composition, e.g., in the replacement of one atom by an atom of a different element, or in the presence of a particular functional group, or the replacement of one functional group by another functional group, or the absolute stereochemistry of one or more chiral centers of the reference compound. Accordingly, an analog is a compound that is similar or comparable in function and appearance but not in structure or origin to a reference compound.
A “detectable agent” or “detectable moiety” is a compound or composition detectable by appropriate means such as spectroscopic, photochemical, biochemical, immunochemical, chemical, magnetic resonance imaging, or other physical means. In embodiments, the compounds described herein comprise a detectable agent. For example, useful detectable agents include 18F, 32P, 33P, 45Ti, 47Sc, 52Fe, 59Fe, 62Cu, 64Cu, 67Cu, 67Ga, 68Ga, 77As, 86Y, 90Y, 89Sr, 89Zr, 94Tc, 94Tc, 99mTc, 99Mo, 105Pd, 105Rh, 111Ag, 111In, 123I, 124I, 125I, 131I, 142Pr, 143Pr, 149Pm, 153Sm, 154-158Gd, 161Tb, 166Dy, 166Ho, 169Er, 175Lu, 177Lu, 186Re, 188Re, 189Re, 194Ir, 198Au, 199Au, 211At, 211Pb, 212Bi, 212Pb, 213Bi, 223Ra, 225Ac, Cr, V, Mn, Fe, Co, Ni, Cu, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, 32P, fluorophore (e.g., fluorescent dyes), electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, paramagnetic molecules, paramagnetic nanoparticles, ultrasmall superparamagnetic iron oxide (“USPIO”) nanoparticles, USPIO nanoparticle aggregates, superparamagnetic iron oxide (“SPIO”) nanoparticles, SPIO nanoparticle aggregates, monocrystalline iron oxide nanoparticles, monocrystalline iron oxide, nanoparticle contrast agents, liposomes or other delivery vehicles containing Gadolinium chelate (“Gd-chelate”) molecules, Gadolinium, radioisotopes, radionuclides (e.g., carbon-11, nitrogen-13, oxygen-15, fluorine-18, rubidium-82), fluorodeoxyglucose (e.g., fluorine-18 labeled), any gamma ray emitting radionuclides, positron-emitting radionuclide, radiolabeled glucose, radiolabeled water, radiolabeled ammonia, biocolloids, microbubbles (e.g., including microbubble shells including albumin, galactose, lipid, and/or polymers; microbubble gas core including air, heavy gases, perfluorocarbon, nitrogen, octafluoropropane, perflexane lipid microsphere, perflutren, etc.), iodinated contrast agents (e.g., iohexol, iodixanol, ioversol, iopamidol, ioxilan, iopromide, diatrizoate, metrizoate, ioxaglate), barium sulfate, thorium dioxide, gold, gold nanoparticles, gold nanoparticle aggregates, fluorophores, two-photon fluorophores, or haptens and proteins or other entities which can be made detectable, e.g., by incorporating a radiolabel into a peptide or antibody specifically reactive with a target peptide. A detectable moiety is a monovalent detectable agent or a detectable agent capable of forming a bond with another compound or composition.
Radioactive substances (e.g., radioisotopes) that may be used as imaging and/or labeling agents in accordance with the embodiments of the disclosure include, but are not limited to, 18F, 32P, 33P, 45Ti, 47Sc, 52Fe, 59Fe, 62Cu, 64Cu, 67Cu, 67Ga, 68Ga, 77As, 86Y, 90Y, 89Sr, 89Zr, 94Tc, 94Tc, 99mTc, 99Mo, 105Pd, 105Rh, 111Ag, 111In, 123I, 124I, 125I, 131I, 142Pr, 143Pr, 149Pm, 153Sm, 154-158Gd, 161Tb, 166Dy, 166Ho, 169Er, 175Lu, 177Lu, 186Re, 188Re, 189Re, 194Ir, 198Au, 199Au, 211At, 211Pb, 212Bi, 212Pb, 213Bi, 223Ra and 225Ac. Paramagnetic ions that may be used as additional imaging agents in accordance with the embodiments of the disclosure include, e.g., ions of transition and lanthanide metals (e.g., metals having atomic numbers of 21-29, 42, 43, 44, or 57-71). These metals include ions of Cr, V, Mn, Fe, Co, Ni, Cu, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb and Lu. In embodiments, the compounds described herein comprise a radioisotope.
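By way of illustration only, the atomic-number ranges recited above for paramagnetic ions can be checked against the named metals with a short script. The following is a minimal sketch; the names `ATOMIC_NUMBERS` and `in_recited_ranges` are hypothetical and are not part of the disclosure, and the atomic numbers are standard periodic-table values rather than values taken from the text.

```python
# Standard atomic numbers for the metals named in the paragraph above.
# The dictionary and function names are illustrative assumptions only.
ATOMIC_NUMBERS = {
    "V": 23, "Cr": 24, "Mn": 25, "Fe": 26, "Co": 27, "Ni": 28, "Cu": 29,
    "La": 57, "Ce": 58, "Pr": 59, "Nd": 60, "Pm": 61, "Sm": 62, "Eu": 63,
    "Gd": 64, "Tb": 65, "Dy": 66, "Ho": 67, "Er": 68, "Tm": 69, "Yb": 70,
    "Lu": 71,
}


def in_recited_ranges(z: int) -> bool:
    """Return True if atomic number z is 21-29, 42, 43, 44, or 57-71."""
    return 21 <= z <= 29 or z in (42, 43, 44) or 57 <= z <= 71


# Collect any named metal whose atomic number falls outside the ranges.
outside = [symbol for symbol, z in ATOMIC_NUMBERS.items()
           if not in_recited_ranges(z)]
print("Named metals outside the recited ranges:", outside)
```

Under these assumptions, every named metal falls within the recited ranges, while the ranges additionally cover elements (e.g., atomic numbers 21, 22, 42, 43, and 44) that the list does not name.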
The term “sulfur-fluoride exchange reaction” or “SuFEx” refers to a type of click chemistry as described in detail by, e.g., Dong et al., Angewandte Chemie, 53(36):9430-9448 (2014); and Wang et al., J. Am. Chem. Soc., 140(15):4995-4999 (2018). The term “proximally-enabled” SuFEx refers to the sulfur-fluoride exchange reaction occurring when the reactive species are proximal to each other, i.e., spatially close enough for the SuFEx reaction to occur. The proximity may occur within a single biomolecule (e.g., protein) or between two different biomolecules (e.g., protein and glycan). The skilled artisan could readily determine whether the reactive species are sufficiently proximal for the reaction to occur, e.g., sulfur-fluoride exchange reaction between the compound of Formula (I) and glycan (e.g., a hydroxyl group on a glycan).
In embodiments, “proximal” means that two compounds (e.g., biomolecules, proteins, peptides, amino acids, glycans) are adjacent (e.g., but not covalently bonded together). In embodiments, “proximal” means up to about 25 angstroms. In embodiments, “proximal” means up to about 20 angstroms. In embodiments, “proximal” means up to about 15 angstroms. In embodiments, “proximal” means up to about 10 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 25 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 20 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 15 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 12 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 10 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 8 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 6 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 5 angstroms. In embodiments, “proximal” means from about 1 angstrom to about 4 angstroms.
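By way of illustration only, the distance-based meanings of “proximal” recited above could be evaluated computationally, e.g., from atomic coordinates in a structural model. The following is a minimal sketch under stated assumptions: the function names, the example coordinates, and the treatment of “about” as an exact cutoff are all hypothetical and do not limit the disclosure.

```python
import math

# Upper bounds, in angstroms, recited in the "proximal" embodiments above.
# Treating "about" as an exact cutoff is a simplifying assumption.
PROXIMAL_UPPER_BOUNDS = (25.0, 20.0, 15.0, 12.0, 10.0, 8.0, 6.0, 5.0, 4.0)


def distance(a, b):
    """Euclidean distance between two (x, y, z) coordinates in angstroms."""
    return math.dist(a, b)


def is_proximal(a, b, upper=10.0, lower=0.0):
    """Return True if the two atoms are between lower and upper angstroms apart."""
    return lower <= distance(a, b) <= upper


def satisfied_bounds(a, b):
    """List every recited upper bound that the pair of atoms satisfies."""
    d = distance(a, b)
    return [u for u in PROXIMAL_UPPER_BOUNDS if d <= u]


# Hypothetical coordinates: a reactive sulfur atom and a hydroxyl oxygen atom.
sulfur = (0.0, 0.0, 0.0)
hydroxyl_oxygen = (3.0, 4.0, 0.0)

print(distance(sulfur, hydroxyl_oxygen))
print(is_proximal(sulfur, hydroxyl_oxygen, upper=10.0, lower=1.0))
print(satisfied_bounds(sulfur, hydroxyl_oxygen))
```

In this sketch the two hypothetical atoms are 5 angstroms apart, so the pair satisfies every recited upper bound down to about 5 angstroms but not the bound of about 4 angstroms. Whether a SuFEx reaction actually occurs at a given distance is not determined by such a calculation.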
Where substituent groups are specified by their conventional chemical formulae, written from left to right, they equally encompass the chemically identical substituents that would result from writing the structure from right to left, e.g., —CH2O— is equivalent to —OCH2—.
The term “alkyl,” by itself or as part of another substituent, means, unless otherwise stated, a straight (i.e., unbranched) or branched carbon chain (or carbon), or combination thereof, which may be fully saturated, mono- or polyunsaturated and can include mono-, di- and multivalent radicals. The alkyl may include a designated number of carbons (e.g., C1-C10 means one to ten carbons). Alkyl is an uncyclized chain. Examples of saturated hydrocarbon radicals include, but are not limited to, groups such as methyl, ethyl, n-propyl, isopropyl, n-butyl, t-butyl, isobutyl, sec-butyl, and homologs and isomers of, for example, n-pentyl, n-hexyl, n-heptyl, n-octyl, and the like. An unsaturated alkyl group is one having one or more double bonds or triple bonds. Examples of unsaturated alkyl groups include, but are not limited to, vinyl, 2-propenyl, crotyl, 2-isopentenyl, 2-(butadienyl), 2,4-pentadienyl, 3-(1,4-pentadienyl), ethynyl, 1- and 3-propynyl, 3-butynyl, and the higher homologs and isomers. An alkoxy is an alkyl attached to the remainder of the molecule via an oxygen linker (—O—). An alkyl moiety may be an alkenyl moiety. An alkyl moiety may be an alkynyl moiety. An alkyl moiety may be fully saturated. An alkenyl may include more than one double bond and/or one or more triple bonds in addition to the one or more double bonds. An alkynyl may include more than one triple bond and/or one or more double bonds in addition to the one or more triple bonds.
The term “alkylene,” by itself or as part of another substituent, means, unless otherwise stated, a divalent radical derived from an alkyl, as exemplified by, e.g., —CH2CH2CH2CH2—. Typically, an alkyl (or alkylene) group will have from 1 to 24 carbon atoms, with those groups having 10 or fewer carbon atoms being preferred herein. A “lower alkyl” or “lower alkylene” is a shorter chain alkyl or alkylene group, generally having eight or fewer carbon atoms. The term “alkenylene,” by itself or as part of another substituent, means, unless otherwise stated, a divalent radical derived from an alkene.
The term “heteroalkyl,” by itself or in combination with another term, means, unless otherwise stated, a stable straight or branched chain, or combinations thereof, including at least one carbon atom and at least one heteroatom (e.g., O, N, P, Si, and S), and wherein the nitrogen and sulfur atoms may optionally be oxidized, and the nitrogen heteroatom may optionally be quaternized. The heteroatom(s) may be placed at any interior position of the heteroalkyl group or at the position at which the alkyl group is attached to the remainder of the molecule. Heteroalkyl is an uncyclized chain. Examples include, but are not limited to: —CH2—CH2—O—CH3, —CH2—CH2—NH—CH3, —CH2—CH2—N(CH3)—CH3, —CH2—S—CH2—CH3, —CH2—CH2—S(O)—CH3, —CH2—CH2—S(O)2—CH3, —CH═CH—O—CH3, —Si(CH3)3, —CH2—CH═N—OCH3, —CH═CH—N(CH3)—CH3, —O—CH3, —O—CH2—CH3, and —CN. Up to two or three heteroatoms may be consecutive, such as, for example, —CH2—NH—OCH3 and —CH2—O—Si(CH3)3. A heteroalkyl moiety may include one heteroatom. A heteroalkyl moiety may include two optionally different heteroatoms. A heteroalkyl moiety may include three optionally different heteroatoms. A heteroalkyl moiety may include four optionally different heteroatoms. A heteroalkyl moiety may include five optionally different heteroatoms. A heteroalkyl moiety may include up to 8 optionally different heteroatoms. The term “heteroalkenyl,” by itself or in combination with another term, means, unless otherwise stated, a heteroalkyl including at least one double bond. A heteroalkenyl may optionally include more than one double bond and/or one or more triple bonds in addition to the one or more double bonds. The term “heteroalkynyl,” by itself or in combination with another term, means, unless otherwise stated, a heteroalkyl including at least one triple bond. A heteroalkynyl may optionally include more than one triple bond and/or one or more double bonds in addition to the one or more triple bonds.
Similarly, the term “heteroalkylene,” by itself or as part of another substituent, means, unless otherwise stated, a divalent radical derived from heteroalkyl, as exemplified, but not limited by, —CH2—CH2—S—CH2—CH2— and —CH2—S—CH2—CH2—NH—CH2—. For heteroalkylene groups, heteroatoms can also occupy either or both of the chain termini (e.g., alkyleneoxy, alkylenedioxy, alkyleneamino, alkylenediamino, and the like). Still further, for alkylene and heteroalkylene linking groups, no orientation of the linking group is implied by the direction in which the formula of the linking group is written. For example, the formula —C(O)2R′— represents both —C(O)2R′— and —R′C(O)2—. As described above, heteroalkyl groups, as used herein, include those groups that are attached to the remainder of the molecule through a heteroatom, such as —C(O)R′, —C(O)NR′, —NR′R″, —OR′, —SR′, and/or —SO2R′. Where “heteroalkyl” is recited, followed by recitations of specific heteroalkyl groups, such as —NR′R″ or the like, it will be understood that the terms heteroalkyl and —NR′R″ are not redundant or mutually exclusive. Rather, the specific heteroalkyl groups are recited to add clarity. Thus, the term “heteroalkyl” should not be interpreted herein as excluding specific heteroalkyl groups, such as —NR′R″ or the like.
The terms “cycloalkyl” and “heterocycloalkyl,” by themselves or in combination with other terms, mean, unless otherwise stated, cyclic versions of “alkyl” and “heteroalkyl,” respectively. Cycloalkyl and heterocycloalkyl are not aromatic. Additionally, for heterocycloalkyl, a heteroatom can occupy the position at which the heterocycle is attached to the remainder of the molecule. Examples of cycloalkyl include, but are not limited to, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl, 1-cyclohexenyl, 3-cyclohexenyl, cycloheptyl, and the like. Examples of heterocycloalkyl include, but are not limited to, 1-(1,2,5,6-tetrahydropyridyl), 1-piperidinyl, 2-piperidinyl, 3-piperidinyl, 4-morpholinyl, 3-morpholinyl, tetrahydrofuran-2-yl, tetrahydrofuran-3-yl, tetrahydrothien-2-yl, tetrahydrothien-3-yl, 1-piperazinyl, 2-piperazinyl, and the like. A “cycloalkylene” and a “heterocycloalkylene,” alone or as part of another substituent, means a divalent radical derived from a cycloalkyl and heterocycloalkyl, respectively.
In embodiments, the term “cycloalkyl” means a monocyclic, bicyclic, or a multicyclic cycloalkyl ring system. In embodiments, monocyclic ring systems are cyclic hydrocarbon groups containing from 3 to 8 carbon atoms, where such groups can be saturated or unsaturated, but not aromatic. In embodiments, cycloalkyl groups are fully saturated. Examples of monocyclic cycloalkyls include cyclopropyl, cyclobutyl, cyclopentyl, cyclopentenyl, cyclohexyl, cyclohexenyl, cycloheptyl, and cyclooctyl. Bicyclic cycloalkyl ring systems are bridged monocyclic rings or fused bicyclic rings. In embodiments, bridged monocyclic rings contain a monocyclic cycloalkyl ring where two non-adjacent carbon atoms of the monocyclic ring are linked by an alkylene bridge of between one and three additional carbon atoms (i.e., a bridging group of the form (CH2)w, where w is 1, 2, or 3). Representative examples of bicyclic ring systems include, but are not limited to, bicyclo[3.1.1]heptane, bicyclo[2.2.1]heptane, bicyclo[2.2.2]octane, bicyclo[3.2.2]nonane, bicyclo[3.3.1]nonane, and bicyclo[4.2.1]nonane. In embodiments, fused bicyclic cycloalkyl ring systems contain a monocyclic cycloalkyl ring fused to either a phenyl, a monocyclic cycloalkyl, a monocyclic cycloalkenyl, a monocyclic heterocyclyl, or a monocyclic heteroaryl. In embodiments, the bridged or fused bicyclic cycloalkyl is attached to the parent molecular moiety through any carbon atom contained within the monocyclic cycloalkyl ring. In embodiments, cycloalkyl groups are optionally substituted with one or two groups which are independently oxo or thia.
In embodiments, the fused bicyclic cycloalkyl is a 5 or 6 membered monocyclic cycloalkyl ring fused to either a phenyl ring, a 5 or 6 membered monocyclic cycloalkyl, a 5 or 6 membered monocyclic cycloalkenyl, a 5 or 6 membered monocyclic heterocyclyl, or a 5 or 6 membered monocyclic heteroaryl, wherein the fused bicyclic cycloalkyl is optionally substituted by one or two groups which are independently oxo or thia. In embodiments, multicyclic cycloalkyl ring systems are a monocyclic cycloalkyl ring (base ring) fused to either (i) one ring system selected from the group consisting of a bicyclic aryl, a bicyclic heteroaryl, a bicyclic cycloalkyl, a bicyclic cycloalkenyl, and a bicyclic heterocyclyl; or (ii) two other ring systems independently selected from the group consisting of a phenyl, a bicyclic aryl, a monocyclic or bicyclic heteroaryl, a monocyclic or bicyclic cycloalkyl, a monocyclic or bicyclic cycloalkenyl, and a monocyclic or bicyclic heterocyclyl. In embodiments, the multicyclic cycloalkyl is attached to the parent molecular moiety through any carbon atom contained within the base ring. In embodiments, multicyclic cycloalkyl ring systems are a monocyclic cycloalkyl ring (base ring) fused to either (i) one ring system selected from the group consisting of a bicyclic aryl, a bicyclic heteroaryl, a bicyclic cycloalkyl, a bicyclic cycloalkenyl, and a bicyclic heterocyclyl; or (ii) two other ring systems independently selected from the group consisting of a phenyl, a monocyclic heteroaryl, a monocyclic cycloalkyl, a monocyclic cycloalkenyl, and a monocyclic heterocyclyl. Examples of multicyclic cycloalkyl groups include, but are not limited to tetradecahydrophenanthrenyl, perhydrophenothiazin-1-yl, and perhydrophenoxazin-1-yl.
In embodiments, a cycloalkyl is a cycloalkenyl. The term “cycloalkenyl” is used in accordance with its plain ordinary meaning. In embodiments, a cycloalkenyl is a monocyclic, bicyclic, or a multicyclic cycloalkenyl ring system. In embodiments, monocyclic cycloalkenyl ring systems are cyclic hydrocarbon groups containing from 3 to 8 carbon atoms, where such groups are unsaturated (i.e., containing at least one annular carbon-carbon double bond), but not aromatic. Examples of monocyclic cycloalkenyl ring systems include cyclopentenyl and cyclohexenyl. In embodiments, bicyclic cycloalkenyl rings are bridged monocyclic rings or fused bicyclic rings. In embodiments, bridged monocyclic rings contain a monocyclic cycloalkenyl ring where two non-adjacent carbon atoms of the monocyclic ring are linked by an alkylene bridge of between one and three additional carbon atoms (i.e., a bridging group of the form (CH2)w, where w is 1, 2, or 3). Representative examples of bicyclic cycloalkenyls include, but are not limited to, norbornenyl and bicyclo[2.2.2]oct-2-enyl. In embodiments, fused bicyclic cycloalkenyl ring systems contain a monocyclic cycloalkenyl ring fused to either a phenyl, a monocyclic cycloalkyl, a monocyclic cycloalkenyl, a monocyclic heterocyclyl, or a monocyclic heteroaryl. In embodiments, the bridged or fused bicyclic cycloalkenyl is attached to the parent molecular moiety through any carbon atom contained within the monocyclic cycloalkenyl ring. In embodiments, cycloalkenyl groups are optionally substituted with one or two groups which are independently oxo or thia.
In embodiments, multicyclic cycloalkenyl rings contain a monocyclic cycloalkenyl ring (base ring) fused to either (i) one ring system selected from the group consisting of a bicyclic aryl, a bicyclic heteroaryl, a bicyclic cycloalkyl, a bicyclic cycloalkenyl, and a bicyclic heterocyclyl; or (ii) two ring systems independently selected from the group consisting of a phenyl, a bicyclic aryl, a monocyclic or bicyclic heteroaryl, a monocyclic or bicyclic cycloalkyl, a monocyclic or bicyclic cycloalkenyl, and a monocyclic or bicyclic heterocyclyl. In embodiments, the multicyclic cycloalkenyl is attached to the parent molecular moiety through any carbon atom contained within the base ring. In embodiments, multicyclic cycloalkenyl rings contain a monocyclic cycloalkenyl ring (base ring) fused to either (i) one ring system selected from the group consisting of a bicyclic aryl, a bicyclic heteroaryl, a bicyclic cycloalkyl, a bicyclic cycloalkenyl, and a bicyclic heterocyclyl; or (ii) two ring systems independently selected from the group consisting of a phenyl, a monocyclic heteroaryl, a monocyclic cycloalkyl, a monocyclic cycloalkenyl, and a monocyclic heterocyclyl.
In embodiments, a heterocycloalkyl is a heterocyclyl. The term “heterocyclyl” as used herein, means a monocyclic, bicyclic, or multicyclic heterocycle. The heterocyclyl monocyclic heterocycle is a 3, 4, 5, 6 or 7 membered ring containing at least one heteroatom independently selected from the group consisting of O, N, and S where the ring is saturated or unsaturated, but not aromatic. The 3 or 4 membered ring contains 1 heteroatom selected from the group consisting of O, N and S. The 5 membered ring can contain zero or one double bond and one, two or three heteroatoms selected from the group consisting of O, N and S. The 6 or 7 membered ring contains zero, one or two double bonds and one, two or three heteroatoms selected from the group consisting of O, N and S. The heterocyclyl monocyclic heterocycle is connected to the parent molecular moiety through any carbon atom or any nitrogen atom contained within the heterocyclyl monocyclic heterocycle. Representative examples of heterocyclyl monocyclic heterocycles include, but are not limited to, azetidinyl, azepanyl, aziridinyl, diazepanyl, 1,3-dioxanyl, 1,3-dioxolanyl, 1,3-dithiolanyl, 1,3-dithianyl, imidazolinyl, imidazolidinyl, isothiazolinyl, isothiazolidinyl, isoxazolinyl, isoxazolidinyl, morpholinyl, oxadiazolinyl, oxadiazolidinyl, oxazolinyl, oxazolidinyl, piperazinyl, piperidinyl, pyranyl, pyrazolinyl, pyrazolidinyl, pyrrolinyl, pyrrolidinyl, tetrahydrofuranyl, tetrahydrothienyl, thiadiazolinyl, thiadiazolidinyl, thiazolinyl, thiazolidinyl, thiomorpholinyl, 1,1-dioxidothiomorpholinyl (thiomorpholine sulfone), thiopyranyl, and trithianyl. The heterocyclyl bicyclic heterocycle is a monocyclic heterocycle fused to either a phenyl, a monocyclic cycloalkyl, a monocyclic cycloalkenyl, a monocyclic heterocycle, or a monocyclic heteroaryl.
The heterocyclyl bicyclic heterocycle is connected to the parent molecular moiety through any carbon atom or any nitrogen atom contained within the monocyclic heterocycle portion of the bicyclic ring system. Representative examples of bicyclic heterocyclyls include, but are not limited to, 2,3-dihydrobenzofuran-2-yl, 2,3-dihydrobenzofuran-3-yl, indolin-1-yl, indolin-2-yl, indolin-3-yl, 2,3-dihydrobenzothien-2-yl, decahydroquinolinyl, decahydroisoquinolinyl, octahydro-1H-indolyl, and octahydrobenzofuranyl. In embodiments, heterocyclyl groups are optionally substituted with one or two groups which are independently oxo or thia. In certain embodiments, the bicyclic heterocyclyl is a 5 or 6 membered monocyclic heterocyclyl ring fused to a phenyl ring, a 5 or 6 membered monocyclic cycloalkyl, a 5 or 6 membered monocyclic cycloalkenyl, a 5 or 6 membered monocyclic heterocyclyl, or a 5 or 6 membered monocyclic heteroaryl, wherein the bicyclic heterocyclyl is optionally substituted by one or two groups which are independently oxo or thia. Multicyclic heterocyclyl ring systems are a monocyclic heterocyclyl ring (base ring) fused to either (i) one ring system selected from the group consisting of a bicyclic aryl, a bicyclic heteroaryl, a bicyclic cycloalkyl, a bicyclic cycloalkenyl, and a bicyclic heterocyclyl; or (ii) two other ring systems independently selected from the group consisting of a phenyl, a bicyclic aryl, a monocyclic or bicyclic heteroaryl, a monocyclic or bicyclic cycloalkyl, a monocyclic or bicyclic cycloalkenyl, and a monocyclic or bicyclic heterocyclyl. The multicyclic heterocyclyl is attached to the parent molecular moiety through any carbon atom or nitrogen atom contained within the base ring. 
In embodiments, multicyclic heterocyclyl ring systems are a monocyclic heterocyclyl ring (base ring) fused to either (i) one ring system selected from the group consisting of a bicyclic aryl, a bicyclic heteroaryl, a bicyclic cycloalkyl, a bicyclic cycloalkenyl, and a bicyclic heterocyclyl; or (ii) two other ring systems independently selected from the group consisting of a phenyl, a monocyclic heteroaryl, a monocyclic cycloalkyl, a monocyclic cycloalkenyl, and a monocyclic heterocyclyl. Examples of multicyclic heterocyclyl groups include, but are not limited to 10H-phenothiazin-10-yl, 9,10-dihydroacridin-9-yl, 9,10-dihydroacridin-10-yl, 10H-phenoxazin-10-yl, 10,11-dihydro-5H-dibenzo[b,f]azepin-5-yl, 1,2,3,4-tetrahydropyrido[4,3-g]isoquinolin-2-yl, 12H-benzo[b]phenoxazin-12-yl, and dodecahydro-1H-carbazol-9-yl.
The terms “halo” or “halogen,” by themselves or as part of another substituent, mean, unless otherwise stated, a fluorine, chlorine, bromine, or iodine atom. Additionally, terms such as “haloalkyl” are meant to include monohaloalkyl and polyhaloalkyl. For example, the term “halo(C1-C4)alkyl” includes, but is not limited to, fluoromethyl, difluoromethyl, trifluoromethyl, 2,2,2-trifluoroethyl, 4-chlorobutyl, 3-bromopropyl, and the like.
The term “acyl” means, unless otherwise stated, —C(O)R where R is a substituted or unsubstituted alkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, or substituted or unsubstituted heteroaryl.
The term “aryl” means, unless otherwise stated, a polyunsaturated, aromatic, hydrocarbon substituent, which can be a single ring or multiple rings (preferably from 1 to 3 rings) that are fused together (i.e., a fused ring aryl) or linked covalently. A fused ring aryl refers to multiple rings fused together wherein at least one of the fused rings is an aryl ring. The term “heteroaryl” refers to aryl groups (or rings) that contain at least one heteroatom such as N, O, or S, wherein the nitrogen and sulfur atoms are optionally oxidized, and the nitrogen atom(s) are optionally quaternized. Thus, the term “heteroaryl” includes fused ring heteroaryl groups (i.e., multiple rings fused together wherein at least one of the fused rings is a heteroaromatic ring). A 5,6-fused ring heteroarylene refers to two rings fused together, wherein one ring has 5 members and the other ring has 6 members, and wherein at least one ring is a heteroaryl ring. Likewise, a 6,6-fused ring heteroarylene refers to two rings fused together, wherein one ring has 6 members and the other ring has 6 members, and wherein at least one ring is a heteroaryl ring. And a 6,5-fused ring heteroarylene refers to two rings fused together, wherein one ring has 6 members and the other ring has 5 members, and wherein at least one ring is a heteroaryl ring. A heteroaryl group can be attached to the remainder of the molecule through a carbon or heteroatom. 
Non-limiting examples of aryl and heteroaryl groups include phenyl, naphthyl, pyrrolyl, pyrazolyl, pyridazinyl, triazinyl, pyrimidinyl, imidazolyl, pyrazinyl, purinyl, oxazolyl, isoxazolyl, thiazolyl, furyl, thienyl, pyridyl, pyrimidyl, benzothiazolyl, benzoxazolyl, benzimidazolyl, benzofuranyl, isobenzofuranyl, indolyl, isoindolyl, benzothiophenyl, isoquinolyl, quinoxalinyl, quinolyl, 1-naphthyl, 2-naphthyl, 4-biphenyl, 1-pyrrolyl, 2-pyrrolyl, 3-pyrrolyl, 3-pyrazolyl, 2-imidazolyl, 4-imidazolyl, pyrazinyl, 2-oxazolyl, 4-oxazolyl, 2-phenyl-4-oxazolyl, 5-oxazolyl, 3-isoxazolyl, 4-isoxazolyl, 5-isoxazolyl, 2-thiazolyl, 4-thiazolyl, 5-thiazolyl, 2-furyl, 3-furyl, 2-thienyl, 3-thienyl, 2-pyridyl, 3-pyridyl, 4-pyridyl, 2-pyrimidyl, 4-pyrimidyl, 5-benzothiazolyl, purinyl, 2-benzimidazolyl, 5-indolyl, 1-isoquinolyl, 5-isoquinolyl, 2-quinoxalinyl, 5-quinoxalinyl, 3-quinolyl, and 6-quinolyl. Substituents for each of the above noted aryl and heteroaryl ring systems are selected from the group of acceptable substituents described below. An “arylene” and a “heteroarylene,” alone or as part of another substituent, mean a divalent radical derived from an aryl and heteroaryl, respectively. A heteroaryl group substituent may be —O— bonded to a ring heteroatom nitrogen.
A fused ring heterocycloalkyl-aryl is an aryl fused to a heterocycloalkyl. A fused ring heterocycloalkyl-heteroaryl is a heteroaryl fused to a heterocycloalkyl. A fused ring heterocycloalkyl-cycloalkyl is a heterocycloalkyl fused to a cycloalkyl. A fused ring heterocycloalkyl-heterocycloalkyl is a heterocycloalkyl fused to another heterocycloalkyl. Fused ring heterocycloalkyl-aryl, fused ring heterocycloalkyl-heteroaryl, fused ring heterocycloalkyl-cycloalkyl, or fused ring heterocycloalkyl-heterocycloalkyl may each independently be unsubstituted or substituted with one or more of the substituents described herein.
Spirocyclic rings are two or more rings wherein adjacent rings are attached through a single atom. The individual rings within spirocyclic rings may be identical or different. Individual rings in spirocyclic rings may be substituted or unsubstituted and may have different substituents from other individual rings within a set of spirocyclic rings. Possible substituents for individual rings within spirocyclic rings are the possible substituents for the same ring when not part of spirocyclic rings (e.g. substituents for cycloalkyl or heterocycloalkyl rings). Spirocyclic rings may be substituted or unsubstituted cycloalkyl, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkyl or substituted or unsubstituted heterocycloalkylene and individual rings within a spirocyclic ring group may be any of the immediately previous list, including having all rings of one type (e.g. all rings being substituted heterocycloalkylene wherein each ring may be the same or different substituted heterocycloalkylene). When referring to a spirocyclic ring system, heterocyclic spirocyclic rings means spirocyclic rings wherein at least one ring is a heterocyclic ring and wherein each ring may be a different ring. When referring to a spirocyclic ring system, substituted spirocyclic rings means that at least one ring is substituted and each substituent may optionally be different.
The symbol “” or “-” denotes the point of attachment of a chemical moiety to the remainder of a molecule or chemical formula.
The term “oxo,” as used herein, means an oxygen that is double bonded to a carbon atom.
The term “alkylsulfonyl,” as used herein, means a moiety having the formula —S(O2)—R′, where R′ is a substituted or unsubstituted alkyl group as defined above. R′ may have a specified number of carbons (e.g., “C1-C4 alkylsulfonyl”).
The term “alkylarylene” refers to an arylene moiety covalently bonded to an alkylene moiety (also referred to herein as an alkylene linker). In embodiments, the alkylarylene group has the formula:
Each of the above terms (e.g., “alkyl,” “heteroalkyl,” “cycloalkyl,” “heterocycloalkyl,” “aryl,” and “heteroaryl”) includes both substituted and unsubstituted forms of the indicated radical. Preferred substituents for each type of radical are provided below.
Substituents for the alkyl and heteroalkyl radicals (including those groups often referred to as alkylene, alkenyl, heteroalkylene, heteroalkenyl, alkynyl, cycloalkyl, heterocycloalkyl, cycloalkenyl, and heterocycloalkenyl) can be one or more of a variety of groups selected from, but not limited to, —OR′, ═O, ═NR′, ═N—OR′, —NR′R″, —SR′, -halogen, —SiR′R″R′″, —OC(O)R′, —C(O)R′, —CO2R′, —CONR′R″, —OC(O)NR′R″, —NR″C(O)R′, —NR′—C(O)NR″R′″, —NR″C(O)2R′, —NR—C(NR′R″R′″)═NR″″, —NR—C(NR′R″)═NR′″, —S(O)R′, —S(O)2R′, —S(O)2NR′R″, —NRSO2R′, —NR′NR″R′″, —ONR′R″, —NR′C(O)NR″NR′″R″″, —CN, —NO2, —NR′SO2R″, —NR′C(O)R″, —NR′C(O)—OR″, —NR′OR″, in a number ranging from zero to (2m′+1), where m′ is the total number of carbon atoms in such radical. R, R′, R″, R′″, and R″″ each preferably independently refer to hydrogen, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl (e.g., aryl substituted with 1-3 halogens), substituted or unsubstituted heteroaryl, substituted or unsubstituted alkyl, alkoxy, or thioalkoxy groups, or arylalkyl groups. When a compound described herein includes more than one R group, for example, each of the R groups is independently selected as are each R′, R″, R′″, and R″″ group when more than one of these groups is present. When R′ and R″ are attached to the same nitrogen atom, they can be combined with the nitrogen atom to form a 4-, 5-, 6-, or 7-membered ring. For example, —NR′R″ includes, but is not limited to, 1-pyrrolidinyl and 4-morpholinyl. From the above discussion of substituents, one of skill in the art will understand that the term “alkyl” is meant to include groups including carbon atoms bound to groups other than hydrogen groups, such as haloalkyl (e.g., —CF3 and —CH2CF3) and acyl (e.g., —C(O)CH3, —C(O)CF3, —C(O)CH2OCH3, and the like).
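As an illustration of the upper bound stated above, the maximum number of substituents on an alkyl or heteroalkyl radical, (2m′+1), can be computed directly from the carbon count m′. The following is a minimal sketch only; the function name `max_substituents` and the example radicals are hypothetical and are not part of the disclosure. For a saturated acyclic alkyl radical CmH(2m+1), the bound equals the number of hydrogens available for replacement.

```python
def max_substituents(m_prime: int) -> int:
    """Return the upper bound (2m' + 1) on the number of substituents
    for a radical containing m' carbon atoms (hypothetical helper)."""
    if m_prime < 1:
        raise ValueError("m' must be a positive number of carbon atoms")
    return 2 * m_prime + 1


# Example radicals chosen for illustration only.
for name, carbons in [("methyl", 1), ("ethyl", 2), ("n-butyl", 4)]:
    print(f"{name}: zero to {max_substituents(carbons)} substituents")
```

For example, a methyl radical (m′ = 1) may carry from zero to three substituents, as in trifluoromethyl.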
Similar to the substituents described for the alkyl radical, substituents for the aryl and heteroaryl groups are varied and are selected from, for example: —OR′, —NR′R″, —SR′, -halogen, —SiR′R″R′″, —OC(O)R′, —C(O)R′, —CO2R′, —CONR′R″, —OC(O)NR′R″, —NR″C(O)R′, —NR′—C(O)NR″R′″, —NR″C(O)2R′, —NR—C(NR′R″R′″)═NR″″, —NR—C(NR′R″)═NR′″, —S(O)R′, —S(O)2R′, —S(O)2NR′R″, —NRSO2R′, —NR′NR″R′″, —ONR′R″, —NR′C(O)NR″NR′″R″″, —CN, —NO2, —R′, —N3, —CH(Ph)2, fluoro(C1-C4)alkoxy, and fluoro(C1-C4)alkyl, —NR′SO2R″, —NR′C(O)R″, —NR′C(O)—OR″, —NR′OR″, in a number ranging from zero to the total number of open valences on the aromatic ring system; and where R′, R″, R′″, and R″″ are preferably independently selected from hydrogen, substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, and substituted or unsubstituted heteroaryl. When a compound described herein includes more than one R group, for example, each of the R groups is independently selected as are each R′, R″, R′″, and R″″ groups when more than one of these groups is present.
Substituents for rings (e.g. cycloalkyl, heterocycloalkyl, aryl, heteroaryl, cycloalkylene, heterocycloalkylene, arylene, or heteroarylene) may be depicted as substituents on the ring rather than on a specific atom of a ring (commonly referred to as a floating substituent). In such a case, the substituent may be attached to any of the ring atoms (obeying the rules of chemical valency) and in the case of fused rings or spirocyclic rings, a substituent depicted as associated with one member of the fused rings or spirocyclic rings (a floating substituent on a single ring), may be a substituent on any of the fused rings or spirocyclic rings (a floating substituent on multiple rings). When a substituent is attached to a ring, but not a specific atom (a floating substituent), and a subscript for the substituent is an integer greater than one, the multiple substituents may be on the same atom, same ring, different atoms, different fused rings, different spirocyclic rings, and each substituent may optionally be different. Where a point of attachment of a ring to the remainder of a molecule is not limited to a single atom (a floating substituent), the attachment point may be any atom of the ring and in the case of a fused ring or spirocyclic ring, any atom of any of the fused rings or spirocyclic rings while obeying the rules of chemical valency. Where a ring, fused rings, or spirocyclic rings contain one or more ring heteroatoms and the ring, fused rings, or spirocyclic rings are shown with one or more floating substituents (including, but not limited to, points of attachment to the remainder of the molecule), the floating substituents may be bonded to the heteroatoms. Where the ring heteroatoms are shown bound to one or more hydrogens (e.g. a ring nitrogen with two bonds to ring atoms and a third bond to a hydrogen) in the structure or formula with the floating substituent, when the heteroatom is bonded to the floating substituent, the substituent will be understood to replace the hydrogen, while obeying the rules of chemical valency.
Two or more substituents may optionally be joined to form aryl, heteroaryl, cycloalkyl, or heterocycloalkyl groups. Such so-called ring-forming substituents are typically, though not necessarily, found attached to a cyclic base structure. In embodiments, the ring-forming substituents are attached to adjacent members of the base structure. For example, two ring-forming substituents attached to adjacent members of a cyclic base structure create a fused ring structure. In embodiments, the ring-forming substituents are attached to a single member of the base structure. For example, two ring-forming substituents attached to a single member of a cyclic base structure create a spirocyclic structure. In embodiments, the ring-forming substituents are attached to non-adjacent members of the base structure.
Two of the substituents on adjacent atoms of the aryl or heteroaryl ring may optionally form a ring of the formula -T-C(O)—(CRR′)q—U—, wherein T and U are independently —NR—, —O—, —CRR′—, or a single bond, and q is an integer of from 0 to 3. Alternatively, two of the substituents on adjacent atoms of the aryl or heteroaryl ring may optionally be replaced with a substituent of the formula -A-(CH2)r—B—, wherein A and B are independently —CRR′—, —O—, —NR—, —S—, —S(O)—, —S(O)2—, —S(O)2NR′—, or a single bond, and r is an integer of from 1 to 4. One of the single bonds of the new ring so formed may optionally be replaced with a double bond. Alternatively, two of the substituents on adjacent atoms of the aryl or heteroaryl ring may optionally be replaced with a substituent of the formula —(CRR′)s—X′— (C″R″R′″)d—, where s and d are independently integers of from 0 to 3, and X′ is —O—, —NR′—, —S—, —S(O)—, —S(O)2—, or —S(O)2NR′—. The substituents R, R′, R″, and R′″ are preferably independently selected from hydrogen, substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, and substituted or unsubstituted heteroaryl.
As used herein, the terms “heteroatom” or “ring heteroatom” are meant to include oxygen (O), nitrogen (N), sulfur (S), phosphorus (P), and silicon (Si).
A “substituent group,” as used herein, means a group selected from the following moieties:
A “size-limited substituent” or “size-limited substituent group,” as used herein, means a group selected from all of the substituents described above for a “substituent group,” wherein each substituted or unsubstituted alkyl is a substituted or unsubstituted C1-C20 alkyl, each substituted or unsubstituted heteroalkyl is a substituted or unsubstituted 2 to 20 membered heteroalkyl, each substituted or unsubstituted cycloalkyl is a substituted or unsubstituted C3-C8 cycloalkyl, each substituted or unsubstituted heterocycloalkyl is a substituted or unsubstituted 3 to 8 membered heterocycloalkyl, each substituted or unsubstituted aryl is a substituted or unsubstituted C6-C10 aryl, and each substituted or unsubstituted heteroaryl is a substituted or unsubstituted 5 to 10 membered heteroaryl.
A “lower substituent” or “lower substituent group,” as used herein, means a group selected from all of the substituents described above for a “substituent group,” wherein each substituted or unsubstituted alkyl is a substituted or unsubstituted C1-C8 alkyl, each substituted or unsubstituted heteroalkyl is a substituted or unsubstituted 2 to 8 membered heteroalkyl, each substituted or unsubstituted cycloalkyl is a substituted or unsubstituted C3-C7 cycloalkyl, each substituted or unsubstituted heterocycloalkyl is a substituted or unsubstituted 3 to 7 membered heterocycloalkyl, each substituted or unsubstituted aryl is a substituted or unsubstituted C6-C10 aryl, and each substituted or unsubstituted heteroaryl is a substituted or unsubstituted 5 to 9 membered heteroaryl.
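The size ranges recited in the two preceding definitions can be summarized as a lookup table. The sketch below is illustrative only: the dictionary names, the `classify` helper, and the use of a single integer to represent group size (carbon count for alkyl, cycloalkyl, and aryl; member count for heteroalkyl, heterocycloalkyl, and heteroaryl) are assumptions made for this example and are not part of the disclosure.

```python
# Inclusive (min, max) sizes taken from the definitions above.
SIZE_LIMITED = {
    "alkyl": (1, 20), "heteroalkyl": (2, 20),
    "cycloalkyl": (3, 8), "heterocycloalkyl": (3, 8),
    "aryl": (6, 10), "heteroaryl": (5, 10),
}
LOWER = {
    "alkyl": (1, 8), "heteroalkyl": (2, 8),
    "cycloalkyl": (3, 7), "heterocycloalkyl": (3, 7),
    "aryl": (6, 10), "heteroaryl": (5, 9),
}


def classify(kind: str, size: int) -> list[str]:
    """Return the tiers ('size-limited', 'lower') whose range contains
    a group of the given kind and size (hypothetical helper)."""
    tiers = []
    for label, table in (("size-limited", SIZE_LIMITED), ("lower", LOWER)):
        low, high = table[kind]
        if low <= size <= high:
            tiers.append(label)
    return tiers


print(classify("alkyl", 4))        # fits both ranges
print(classify("alkyl", 12))       # fits only the size-limited range
print(classify("heteroaryl", 10))  # fits only the size-limited range
```

Every range in the lower tier lies within the corresponding size-limited range, so any group that qualifies as lower under this sketch also qualifies as size-limited.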
In embodiments, each substituted group described in the compounds herein is substituted with at least one substituent group. More specifically, in embodiments, each substituted alkyl, substituted heteroalkyl, substituted cycloalkyl, substituted heterocycloalkyl, substituted aryl, substituted heteroaryl, substituted alkylene, substituted heteroalkylene, substituted cycloalkylene, substituted heterocycloalkylene, substituted arylene, and/or substituted heteroarylene described in the compounds herein are substituted with at least one substituent group. In embodiments, at least one or all of these groups are substituted with at least one size-limited substituent group. In embodiments, at least one or all of these groups are substituted with at least one lower substituent group.
In embodiments of the compounds herein, each substituted or unsubstituted alkyl may be a substituted or unsubstituted C1-C20 alkyl, each substituted or unsubstituted heteroalkyl is a substituted or unsubstituted 2 to 20 membered heteroalkyl, each substituted or unsubstituted cycloalkyl is a substituted or unsubstituted C3-C8 cycloalkyl, each substituted or unsubstituted heterocycloalkyl is a substituted or unsubstituted 3 to 8 membered heterocycloalkyl, each substituted or unsubstituted aryl is a substituted or unsubstituted C6-C10 aryl, and/or each substituted or unsubstituted heteroaryl is a substituted or unsubstituted 5 to 10 membered heteroaryl. In embodiments of the compounds herein, each substituted or unsubstituted alkylene is a substituted or unsubstituted C1-C20 alkylene, each substituted or unsubstituted heteroalkylene is a substituted or unsubstituted 2 to 20 membered heteroalkylene, each substituted or unsubstituted cycloalkylene is a substituted or unsubstituted C3-C8 cycloalkylene, each substituted or unsubstituted heterocycloalkylene is a substituted or unsubstituted 3 to 8 membered heterocycloalkylene, each substituted or unsubstituted arylene is a substituted or unsubstituted C6-C10 arylene, and/or each substituted or unsubstituted heteroarylene is a substituted or unsubstituted 5 to 10 membered heteroarylene.
In embodiments, each substituted or unsubstituted alkyl is a substituted or unsubstituted C1-C8 alkyl, each substituted or unsubstituted heteroalkyl is a substituted or unsubstituted 2 to 8 membered heteroalkyl, each substituted or unsubstituted cycloalkyl is a substituted or unsubstituted C3-C7 cycloalkyl, each substituted or unsubstituted heterocycloalkyl is a substituted or unsubstituted 3 to 7 membered heterocycloalkyl, each substituted or unsubstituted aryl is a substituted or unsubstituted C6-C10 aryl, and/or each substituted or unsubstituted heteroaryl is a substituted or unsubstituted 5 to 9 membered heteroaryl. In embodiments, each substituted or unsubstituted alkylene is a substituted or unsubstituted C1-C8 alkylene, each substituted or unsubstituted heteroalkylene is a substituted or unsubstituted 2 to 8 membered heteroalkylene, each substituted or unsubstituted cycloalkylene is a substituted or unsubstituted C3-C7 cycloalkylene, each substituted or unsubstituted heterocycloalkylene is a substituted or unsubstituted 3 to 7 membered heterocycloalkylene, each substituted or unsubstituted arylene is a substituted or unsubstituted C6-C10 arylene, and/or each substituted or unsubstituted heteroarylene is a substituted or unsubstituted 5 to 9 membered heteroarylene.
In embodiments, a substituted or unsubstituted moiety (e.g., substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, and/or substituted or unsubstituted heteroarylene) is unsubstituted (e.g., is an unsubstituted alkyl, unsubstituted heteroalkyl, unsubstituted cycloalkyl, unsubstituted heterocycloalkyl, unsubstituted aryl, unsubstituted heteroaryl, unsubstituted alkylene, unsubstituted heteroalkylene, unsubstituted cycloalkylene, unsubstituted heterocycloalkylene, unsubstituted arylene, and/or unsubstituted heteroarylene, respectively). In embodiments, a substituted or unsubstituted moiety (e.g., substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, and/or substituted or unsubstituted heteroarylene) is substituted (e.g., is a substituted alkyl, substituted heteroalkyl, substituted cycloalkyl, substituted heterocycloalkyl, substituted aryl, substituted heteroaryl, substituted alkylene, substituted heteroalkylene, substituted cycloalkylene, substituted heterocycloalkylene, substituted arylene, and/or substituted heteroarylene, respectively).
In embodiments, a substituted moiety (e.g., substituted alkyl, substituted heteroalkyl, substituted cycloalkyl, substituted heterocycloalkyl, substituted aryl, substituted heteroaryl, substituted alkylene, substituted heteroalkylene, substituted cycloalkylene, substituted heterocycloalkylene, substituted arylene, and/or substituted heteroarylene) is substituted with at least one substituent group, wherein if the substituted moiety is substituted with a plurality of substituent groups, each substituent group may optionally be different. In embodiments, if the substituted moiety is substituted with a plurality of substituent groups, each substituent group is different.
In embodiments, a substituted moiety (e.g., substituted alkyl, substituted heteroalkyl, substituted cycloalkyl, substituted heterocycloalkyl, substituted aryl, substituted heteroaryl, substituted alkylene, substituted heteroalkylene, substituted cycloalkylene, substituted heterocycloalkylene, substituted arylene, and/or substituted heteroarylene) is substituted with at least one size-limited substituent group, wherein if the substituted moiety is substituted with a plurality of size-limited substituent groups, each size-limited substituent group may optionally be different. In embodiments, if the substituted moiety is substituted with a plurality of size-limited substituent groups, each size-limited substituent group is different.
In embodiments, a substituted moiety (e.g., substituted alkyl, substituted heteroalkyl, substituted cycloalkyl, substituted heterocycloalkyl, substituted aryl, substituted heteroaryl, substituted alkylene, substituted heteroalkylene, substituted cycloalkylene, substituted heterocycloalkylene, substituted arylene, and/or substituted heteroarylene) is substituted with at least one lower substituent group, wherein if the substituted moiety is substituted with a plurality of lower substituent groups, each lower substituent group may optionally be different. In embodiments, if the substituted moiety is substituted with a plurality of lower substituent groups, each lower substituent group is different.
In embodiments, a substituted moiety (e.g., substituted alkyl, substituted heteroalkyl, substituted cycloalkyl, substituted heterocycloalkyl, substituted aryl, substituted heteroaryl, substituted alkylene, substituted heteroalkylene, substituted cycloalkylene, substituted heterocycloalkylene, substituted arylene, and/or substituted heteroarylene) is substituted with at least one substituent group, size-limited substituent group, or lower substituent group; wherein if the substituted moiety is substituted with a plurality of groups selected from substituent groups, size-limited substituent groups, and lower substituent groups; each substituent group, size-limited substituent group, and/or lower substituent group may optionally be different. In embodiments, if the substituted moiety is substituted with a plurality of groups selected from substituent groups, size-limited substituent groups, and lower substituent groups; each substituent group, size-limited substituent group, and/or lower substituent group is different.
Certain compounds of the present disclosure possess asymmetric carbon atoms (optical or chiral centers) or double bonds; the enantiomers, racemates, diastereomers, tautomers, geometric isomers, stereoisomeric forms that may be defined, in terms of absolute stereochemistry, as (R)- or (S)- or, as (D)- or (L)- for amino acids, and individual isomers are encompassed within the scope of the present disclosure. The compounds of the present disclosure do not include those that are known in the art to be too unstable to synthesize and/or isolate. The present disclosure is meant to include compounds in racemic and optically pure forms. Optically active (R)- and (S)-, or (D)- and (L)-isomers may be prepared using chiral synthons or chiral reagents, or resolved using conventional techniques. When the compounds described herein contain olefinic bonds or other centers of geometric asymmetry, and unless specified otherwise, it is intended that the compounds include both E and Z geometric isomers. As used herein, the term “isomers” refers to compounds having the same number and kind of atoms, and hence the same molecular weight, but differing in respect to the structural arrangement or configuration of the atoms. The term “tautomer,” as used herein, refers to one of two or more structural isomers which exist in equilibrium and which are readily converted from one isomeric form to another. It will be apparent to one skilled in the art that certain compounds of this disclosure may exist in tautomeric forms, all such tautomeric forms of the compounds being within the scope of the disclosure. Unless otherwise stated, structures depicted herein are also meant to include all stereochemical forms of the structure; i.e., the R and S configurations for each asymmetric center. Therefore, single stereochemical isomers as well as enantiomeric and diastereomeric mixtures of the present compounds are within the scope of the disclosure.
It should be noted that, throughout the application, alternatives are written in Markush groups, for example, each amino acid position that contains more than one possible amino acid. It is specifically contemplated that each member of the Markush group should be considered separately, thereby comprising another embodiment, and the Markush group is not to be read as a single unit.
“Analog,” or “analogue” is used in accordance with its plain ordinary meaning within Chemistry and Biology and refers to a chemical compound that is structurally similar to another compound (i.e., a so-called “reference” compound) but differs in composition, e.g., in the replacement of one atom by an atom of a different element, or in the presence of a particular functional group, or the replacement of one functional group by another functional group, or the absolute stereochemistry of one or more chiral centers of the reference compound. Accordingly, an analog is a compound that is similar or comparable in function and appearance but not in structure or origin to a reference compound.
The terms “a” or “an,” as used herein, mean one or more. In addition, the phrase “substituted with a[n],” as used herein, means the specified group may be substituted with one or more of any or all of the named substituents. For example, where a group, such as an alkyl or heteroaryl group, is “substituted with an unsubstituted C1-C20 alkyl, or unsubstituted 2 to 20 membered heteroalkyl,” the group may contain one or more unsubstituted C1-C20 alkyls, and/or one or more unsubstituted 2 to 20 membered heteroalkyls.
Where a moiety is substituted with an R substituent, the group may be referred to as “R-substituted.” Where a moiety is R-substituted, the moiety is substituted with at least one R substituent and each R substituent is optionally different. Where a particular R group is present in the description of a chemical genus (such as Formula (I)), a Roman alphabetic symbol may be used to distinguish each appearance of that particular R group. For example, where multiple R3 substituents are present, each R3 substituent may be distinguished as R3A, R3B, wherein each of R3A, R3B, is defined within the scope of the definition of R3 and optionally differently.
A person of ordinary skill in the art will understand when a variable (e.g., moiety or linker) of a compound or of a compound genus (e.g., a genus described herein) is described by a name or formula of a standalone compound with all valencies filled, the unfilled valence(s) of the variable will be dictated by the context in which the variable is used. For example, when a variable of a compound as described herein is connected (e.g., bonded) to the remainder of the compound through a single bond, that variable is understood to represent a monovalent form (i.e., capable of forming a single bond due to an unfilled valence) of a standalone compound (e.g., if the variable is named “methane” in an embodiment but the variable is known to be attached by a single bond to the remainder of the compound, a person of ordinary skill in the art would understand that the variable is actually a monovalent form of methane, i.e., methyl or —CH3). Likewise, for a linker variable (e.g., L1, L2, or L3 as described herein), a person of ordinary skill in the art will understand that the variable is the divalent form of a standalone compound (e.g., if the variable is assigned to “PEG” or “polyethylene glycol” in an embodiment but the variable is connected by two separate bonds to the remainder of the compound, a person of ordinary skill in the art would understand that the variable is a divalent (i.e., capable of forming two bonds through two unfilled valences) form of PEG instead of the standalone compound PEG).
The terms “bond” or “bonded” refer to direct bonds, such as covalent bonds (e.g., direct or a linking group), or indirect bonds, such as non-covalent bonds (e.g., electrostatic interactions (e.g., ionic bond, hydrogen bond, halogen bond), van der Waals interactions (e.g., dipole-dipole, dipole-induced dipole, London dispersion), ring stacking (pi effects), hydrophobic interactions, and the like).
The terms “bioconjugate” and “bioconjugate linker” refer to the resulting association between atoms or molecules of “bioconjugate reactive groups” or “bioconjugate reactive moieties”. The association can be direct or indirect. For example, a conjugate between a first bioconjugate reactive group (e.g., —NH2, —C(O)OH, —N-hydroxysuccinimide, or -maleimide) and a second bioconjugate reactive group (e.g., sulfhydryl, sulfur-containing amino acid, amine, amine sidechain containing amino acid, or carboxylate) provided herein can be direct, e.g., by covalent bond or linker (e.g. a first linker or second linker), or indirect, e.g., by non-covalent bond (e.g. electrostatic interactions (e.g. ionic bond, hydrogen bond, halogen bond), van der Waals interactions (e.g. dipole-dipole, dipole-induced dipole, London dispersion), ring stacking (pi effects), hydrophobic interactions and the like). In embodiments, bioconjugates or bioconjugate linkers are formed using bioconjugate chemistry (i.e. the association of two bioconjugate reactive groups) including, but not limited to, nucleophilic substitutions (e.g., reactions of amines and alcohols with acyl halides, active esters), electrophilic substitutions (e.g., enamine reactions) and additions to carbon-carbon and carbon-heteroatom multiple bonds (e.g., Michael reaction, Diels-Alder addition). These and other useful reactions are discussed in, for example, March, Advanced Organic Chemistry, 3rd Ed., John Wiley & Sons, New York, 1985; Hermanson, Bioconjugate Techniques, Academic Press, San Diego, 1996; and Feeney et al, Modification of Proteins, Advances in Chemistry Series, Vol. 198, American Chemical Society, Washington, D.C., 1982. In embodiments, the first bioconjugate reactive group (e.g., unnatural amino acid side chain) is covalently attached to the second bioconjugate reactive group (e.g., a hydroxyl group).
“Siglec” or “sialic-acid-binding immunoglobulin-like lectin” refers to a subset of I-type lectins that bind to sialoglycans and are predominantly expressed on cells of the hematopoietic system in a manner dependent on cell type and differentiation. Whereas sialic acid is ubiquitously expressed, typically at the terminal position of glycoproteins and lipids, only specific, distinct sialoglycan structures are recognized by individual Siglec receptors, depending on identity and linkage to subterminal carbohydrate moieties. Siglecs are generally divided into two groups, a first subset made up of Siglec-1, Siglec-2, Siglec-4 and Siglec-15, and the CD33-related group of Siglecs which includes Siglec-3, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14 and Siglec-16.
“Siglec-7” or “CD328” is a type I transmembrane protein belonging to the human CD33-related Siglec receptors and is characterized by a sialic acid-binding N-terminal V-set Ig domain, two C2-set Ig domains, and an intracytoplasmic region containing one immunoreceptor tyrosine-based inhibitory motif (ITIM) and one ITIM-like motif. Siglec-7 is constitutively expressed on NK cells, dendritic cells, monocytes and neutrophils. The extracellular domain of this receptor preferentially binds α(2,8)-linked disialic acids and branched α2,6-sialyl residues, such as those displayed by ganglioside GD3.
Provided herein are biomolecules formed through the interaction of latent bioreactive unnatural amino acids with naturally occurring amino acids. The compound of Formula (I), a bioreactive unnatural amino acid, facilitates formation of chemically reactive amino acids with proximal target amino acid residues (e.g., lysine, arginine) by undergoing a click chemistry reaction (e.g., sulfur-fluoride exchange reaction (SuFEx)). For example, the compound of Formula (I) may be inserted into or replace an amino acid in a naturally occurring protein, thereby endowing the protein with the ability to form a chemically reactive amino acid with proximally positioned target functional groups (e.g., a hydroxyl group in a glycan) or amino acid residues (e.g., serine, threonine) with other proteins. The compound of Formula (I) may be used to facilitate the formation of chemically reactive amino acids in proteins and within proteins in both in vitro and in vivo conditions. As such, the bioreactive unnatural amino acid of Formula (I) is useful for forming chemically reactive amino acid residues that can be further chemically modified, as desired.
The compound of Formula (I) has shown excellent chemical functionality (i.e., superior properties) compared to previously described bioreactive unnatural amino acids. For example, the compound of Formula (I) is stable, nontoxic and nonreactive inside cells, yet when placed in proximity to target amino acid residues or reactive moieties (e.g., a hydroxyl group in a glycan) it becomes reactive under cellular conditions. The compound of Formula (I) is able to react with target amino acid residues or other reactive moieties (e.g., a hydroxyl group in a glycan) with great selectivity via proximity-enabled SuFEx reaction within and between proteins and glycans under physiological conditions.
Provided herein are compounds of Formula (I):
wherein R1, L1, and x are as defined herein. In embodiments, the compound of Formula (I) is referred to as an unnatural amino acid.
In embodiments, the compound of Formula (I) is a compound of Formula (IA):
wherein R1, L1, and x are as defined herein.
In embodiments, the compound of Formula (I) is a compound of Formula (IB):
In embodiments, the compound of Formula (IB) is referred to as SFY.
Provided herein are biomolecules comprising an unnatural amino acid, wherein the unnatural amino acid comprises a side chain of Formula (II):
wherein R1, L1, and x are as defined herein. In embodiments, the biomolecules are proteins, lipids, RNA, or glycans. In embodiments, the biomolecule is a lipid. In embodiments, the biomolecule is RNA. In embodiments, the biomolecule is a glycan. In embodiments, the biomolecule is a protein.
Provided herein are proteins comprising an unnatural amino acid, wherein the unnatural amino acid comprises a side chain of Formula (II):
wherein R1, L1, and x are as defined herein. In embodiments, the protein comprising the unnatural amino acid comprises a RNA-binding protein. In embodiments, the protein comprising the unnatural amino acid comprises a N6-methyladenosine reader protein. In embodiments, the protein comprising the unnatural amino acid comprises a N6-methyladenosine demethylase protein. In embodiments, the protein comprising the unnatural amino acid comprises a glycan-binding protein. In embodiments, the protein comprising the unnatural amino acid comprises Siglec. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-1. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-2. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-3. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-4. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-5. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-6. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-8. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-9. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-10. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-11. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-12. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-14. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-15. In embodiments, the protein comprising the unnatural amino acid comprises Siglec-7.
In embodiments, the protein comprising the unnatural amino acid comprises Siglec-7 (e.g., SEQ ID NO:1, including embodiments as described herein). In embodiments, the protein comprising the unnatural amino acid comprises a glycan binding domain of a glycan-binding protein. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of a Siglec. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-1. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-2. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-3. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-4. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-5. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-6. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-8. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-9. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-10. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-11. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-12.
In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-14. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-15. In embodiments, the protein comprising the unnatural amino acid comprises a sialoglycan binding V-set domain of Siglec-7 (e.g., SEQ ID NO:2, SEQ ID NO:3, or SEQ ID NO:4, including embodiments as described herein). In embodiments, the term “sialoglycan binding V-set domain” is equivalent to the term “sialoglycan binding domain.”
In embodiments, the unnatural amino acid side chain of Formula (II) is an unnatural amino acid side chain of Formula (IIA):
wherein R1, L1, and x are as defined herein. In embodiments, the protein is a protein as described for Formula (II), e.g., RNA-binding protein, glycan-binding protein, Siglec, Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15; a glycan binding domain of a glycan-binding protein; or a sialoglycan binding V-set domain of Siglec, Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15, and all embodiments thereof as described for Formula (II).
In embodiments, the unnatural amino acid side chain of Formula (II) is an unnatural amino acid side chain of Formula (IIB):
Provided herein are biomolecule conjugates of Formula (III):
where R1, R2, R3, L1, L2, L3, and x are as defined herein.
In embodiments, the biomolecule conjugate of Formula (III) is a biomolecule conjugate of Formula (IIIA):
where R1, R2, R3, L1, L2, L3, and x are as defined herein.
In embodiments, the biomolecule conjugate of Formula (III) is a biomolecule conjugate of Formula (IIIB):
where R2, R3, L2, and L3 are as defined herein.
Provided herein are compounds of Formula (IV):
where L4 is as defined herein.
In embodiments, the compound of Formula (IV) is a compound of Formula (IVA):
wherein x is as defined herein.
In embodiments, the compound of Formula (IV) is NHFS:
Provided herein are compounds of Formula (V):
where L5 is as defined herein.
With reference to the compounds described herein, x is an integer from 0 to 8. In embodiments, x is an integer from 1 to 8. In embodiments, x is an integer from 1 to 7. In embodiments, x is an integer from 1 to 6. In embodiments, x is an integer from 1 to 5. In embodiments, x is an integer from 1 to 4. In embodiments, x is an integer from 1 to 3. In embodiments, x is an integer of 1 or 2. In embodiments, x is 1. In embodiments, x is 2. In embodiments, x is 3. In embodiments, x is 4. In embodiments, x is 5. In embodiments, x is 6. In embodiments, x is 7. In embodiments, x is 8. In embodiments, x is 0.
With reference to the compounds described herein, R1 is halogen, —CX13, —CHX12, —CH2X1, —OCX13, —OCH2X1, —OCHX12, —CN, —SOn1R1A, —SOv1NR1AR1B, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl. In embodiments, R1 is halogen, —CX13, —CHX12, —CH2X1, —OCX13, —OCH2X1, —OCHX12, —CN, —SOn1R1A, —SOv1NR1AR1B, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, or substituted or unsubstituted heteroalkyl. In embodiments, R1 is halogen, —CX13, —CHX12, —CH2X1, —OCX13, —OCH2X1, —OCHX12, —CN, —SOn1R1A, —SOv1NR1AR1B, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, or unsubstituted heteroalkyl. In embodiments, R1 is —CN, —SOn1R1A, —SOv1NR1AR1B, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, or unsubstituted heteroalkyl. In embodiments, R1 is —CN, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, or unsubstituted heteroalkyl. In embodiments, the alkyl is a C1-4 alkyl. In embodiments, R1 is substituted or unsubstituted heteroalkyl. In embodiments, R1 is unsubstituted heteroalkyl. In embodiments, R1 is unsubstituted 2 to 8 membered heteroalkyl. In embodiments, R1 is unsubstituted 2 to 6 membered heteroalkyl. In embodiments, R1 is unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R1 is —O—(CH2)mCH3, and m is an integer from 0 to 6. In embodiments, R1 is —O—(CH2)mCH3, and m is an integer from 0 to 4. In embodiments, R1 is —O—(CH2)mCH3, and m is an integer from 0 to 3. In embodiments, R1 is —O—(CH2)mCH3, and m is an integer from 0 to 2. In embodiments, R1 is —O—(CH2)mCH3, and m is 0 or 1. 
In embodiments, R1 is —O—CH3. In embodiments, R1 is —O—CH2CH3. In embodiments, R1 is —O—(CH2)2CH3. In embodiments, R1 is —O—(CH2)3CH3.
With reference to the compounds described herein, R1 is ortho, para, or meta to the —S(═O)2F group. In embodiments, R1 is ortho to the —S(═O)2F group. In embodiments, R1 is para to the —S(═O)2F group. In embodiments, R1 is meta to the —S(═O)2F group.
With reference to the compounds described herein, R1A is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl. In embodiments, R1A is hydrogen, unsubstituted alkyl, or unsubstituted heteroalkyl. In embodiments, R1A is hydrogen, substituted or unsubstituted C1-4 alkyl, or substituted or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R1A is hydrogen, unsubstituted C1-4 alkyl, or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R1A is hydrogen. In embodiments, R1A is unsubstituted C1-4 alkyl. In embodiments, R1A is unsubstituted 2 to 4 membered heteroalkyl.
With reference to the compounds described herein, R1B is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl. In embodiments, R1B is hydrogen, unsubstituted alkyl, or unsubstituted heteroalkyl. In embodiments, R1B is hydrogen, substituted or unsubstituted C1-4 alkyl, or substituted or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R1B is hydrogen, unsubstituted C1-4 alkyl, or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R1B is hydrogen. In embodiments, R1B is unsubstituted C1-4 alkyl. In embodiments, R1B is unsubstituted 2 to 4 membered heteroalkyl.
With reference to the compounds described herein, X1 is independently —F, —Cl, —Br, or —I. In embodiments, X1 is independently —F or —Cl. In embodiments, X1 is —F. In embodiments, X1 is —Cl. In embodiments, X1 is —Br. In embodiments, X1 is —I.
With reference to the compounds described herein, n1 is an integer from 0 to 4. In embodiments, n1 is an integer from 0 to 3. In embodiments, n1 is an integer from 0 to 2. In embodiments, n1 is 0. In embodiments, n1 is 1. In embodiments, n1 is 2. In embodiments, n1 is 3. In embodiments, n1 is 4.
With reference to the compounds described herein, m1 is 1 or 2. In embodiments, m1 is 1. In embodiments, m1 is 2.
With reference to the compounds described herein, v1 is 1 or 2. In embodiments, v1 is 1. In embodiments, v1 is 2.
With reference to the compounds described herein, L1 is a bond, substituted or unsubstituted alkylene, or substituted or unsubstituted heteroalkylene. In embodiments, L1 is a bond. In embodiments, L1 is substituted or unsubstituted alkylene. In embodiments, L1 is substituted or unsubstituted C1-6 alkylene. In embodiments, L1 is substituted or unsubstituted C1-4 alkylene. In embodiments, L1 is substituted or unsubstituted heteroalkylene. In embodiments, L1 is substituted or unsubstituted 2 to 8 membered heteroalkylene. In embodiments, L1 is substituted or unsubstituted 2 to 6 membered heteroalkylene. In embodiments, L1 is —NH—C(O)—(CH2)y— or —NH—C(O)—O—(CH2)y—, and y is an integer from 0 to 6. In embodiments, L1 is —NH—C(O)—(CH2)y— or —NH—C(O)—O—(CH2)y—, and y is an integer from 0 to 5. In embodiments, L1 is —NH—C(O)—(CH2)y— or —NH—C(O)—O—(CH2)y—, and y is an integer from 0 to 4. In embodiments, L1 is —NH—C(O)—(CH2)y— or —NH—C(O)—O—(CH2)y—, and y is an integer from 0 to 3. In embodiments, L1 is —NH—C(O)—(CH2)y— or —NH—C(O)—O—(CH2)y—, and y is an integer from 0 to 2. In embodiments, L1 is —NH—C(O)—(CH2)y—, and y is an integer from 0 to 3. In embodiments, L1 is —NH—C(O)—. In embodiments, L1 is —NH—C(O)—(CH2)—. In embodiments, L1 is —NH—C(O)—(CH2)2—. In embodiments, L1 is —NH—C(O)—(CH2)3—. In embodiments, L1 is —NH—C(O)—O—(CH2)y—, and y is an integer from 0 to 3. In embodiments, L1 is —NH—C(O)—O—. In embodiments, L1 is —NH—C(O)—O—(CH2)—. In embodiments, L1 is —NH—C(O)—O—(CH2)2—. In embodiments, L1 is —NH—C(O)—O—(CH2)3—.
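For illustration, the amide and carbamate L1 linkers recited above, —NH—C(O)—(CH2)y— and —NH—C(O)—O—(CH2)y— with y an integer from 0 to 6, can be enumerated programmatically. The following is a minimal sketch; the function name l1_linkers is hypothetical, and ASCII hyphens stand in for the bond dashes used in the text.

```python
def l1_linkers(max_y: int = 6) -> list[str]:
    """Enumerate the amide and carbamate L1 linkers for y = 0..max_y.

    Hypothetical helper for illustration; linkers are written with ASCII
    hyphens, e.g. "-NH-C(O)-(CH2)2-".
    """
    if max_y < 0:
        raise ValueError("max_y must be non-negative")
    linkers = []
    for head in ("-NH-C(O)-", "-NH-C(O)-O-"):
        for y in range(max_y + 1):
            if y == 0:
                tail = ""            # no methylene units
            elif y == 1:
                tail = "(CH2)-"      # a single methylene unit
            else:
                tail = f"(CH2){y}-"  # y methylene units
            linkers.append(head + tail)
    return linkers


if __name__ == "__main__":
    for linker in l1_linkers(3):
        print(linker)
```

Calling l1_linkers(3) lists the eight linkers individually recited above for y from 0 to 3.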
With reference to the compounds described herein, L2 is a bond, —NR2A—, —S—, —S(O)2—, —O—, —C(O)—, —C(O)O—, —OC(O)—, —N(R2A)C(O)—, —C(O)N(R2A)—, —NR2AC(O)NR2B—, —NR2AC(NH)NR2B—, —SO2N(R2A)—, —N(R2A)SO2—, —C(S)—, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, or substituted or unsubstituted heteroarylene. In embodiments, L2 is a bond, —NH—, —S—, —S(O)2—, —O—, —C(O)—, —C(O)O—, —OC(O)—, —NHC(O)—, —C(O)NH—, —NHC(O)NH—, —NHC(NH)NH—, —SO2NH—, —NHSO2—, —C(S)—, L12-substituted or unsubstituted alkylene, L12-substituted or unsubstituted heteroalkylene, L12-substituted or unsubstituted cycloalkylene, L12-substituted or unsubstituted heterocycloalkylene, L12-substituted or unsubstituted arylene, or L12-substituted or unsubstituted heteroarylene. In embodiments, L2 is a bond, —NH—, —S—, —S(O)2—, —O—, —C(O)—, —C(O)O—, —OC(O)—, —NHC(O)—, —C(O)NH—, —NHC(O)NH—, —NHC(NH)NH—, —SO2NH—, —NHSO2—, —C(S)—, unsubstituted alkylene, unsubstituted heteroalkylene, unsubstituted cycloalkylene, unsubstituted heterocycloalkylene, unsubstituted arylene, or unsubstituted heteroarylene. In embodiments, L2 is a bond. In embodiments, the alkylene is a C1-6 alkylene. In embodiments, the alkylene is a C1-4 alkylene. In embodiments, the heteroalkylene is a 2 to 6 membered heteroalkylene. In embodiments, the heteroalkylene is a 2 to 4 membered heteroalkylene. In embodiments, the cycloalkylene is a C5-C6 cycloalkylene. In embodiments, the heterocycloalkylene is a 5 or 6 membered heterocycloalkylene. In embodiments, the arylene is a C5-6 arylene. In embodiments, the heteroarylene is a 5 or 6 membered heteroarylene.
With reference to the compounds described herein, R2A and R2B are independently hydrogen, substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, or substituted or unsubstituted heteroaryl. In embodiments, the alkylene is a C1-4 alkylene. In embodiments, the heteroalkylene is a 2 to 6 membered heteroalkylene. In embodiments, the heteroalkylene is a 2 to 4 membered heteroalkylene. In embodiments, the cycloalkylene is a C5-C6 cycloalkylene. In embodiments, the heterocycloalkylene is a 5 or 6 membered heterocycloalkylene. In embodiments, the arylene is a C5-6 arylene. In embodiments, the heteroarylene is a 5 or 6 membered heteroarylene.
With reference to the compounds described herein, L12 is halogen, —CF3, —CBr3, —CCl3, —CI3, —CHF2, —CHBr2, —CHCl2, —CHI2, —CH2F, —CH2Br, —CH2Cl, —CH2I, —OCF3, —OCBr3, —OCCl3, —OCI3, —OCHF2, —OCHBr2, —OCHCl2, —OCHI2, —OCH2F, —OCH2Br, —OCH2Cl, —OCH2I, —CN, —OH, —NH2, —COOH, —CONH2, —NO2, —SH, —SO3H, —SO4H, —SO2NH2, —NHNH2, —ONH2, —NHC(O)NHNH2, —N(O)2, —NHSO2H, —NHC(O)H, —NHC(O)OH, —NHOH, —N3, unsubstituted alkyl, unsubstituted heteroalkyl, unsubstituted cycloalkyl, unsubstituted heterocycloalkyl, unsubstituted aryl, or unsubstituted heteroaryl. In embodiments, the alkylene is a C1-4 alkylene. In embodiments, the heteroalkylene is a 2 to 6 membered heteroalkylene. In embodiments, the heteroalkylene is a 2 to 4 membered heteroalkylene. In embodiments, the cycloalkylene is a C5-C6 cycloalkylene. In embodiments, the heterocycloalkylene is a 5 or 6 membered heterocycloalkylene. In embodiments, the arylene is a C5-6 arylene. In embodiments, the heteroarylene is a 5 or 6 membered heteroarylene.
With reference to the compounds described herein, L3 is a bond, —N(R3A)—, —S—, —S(O)2—, —O—, —C(O)—, —C(O)O—, —OC(O)—, —N(R3A)C(O)—, —C(O)N(R3A)—, —NR3AC(O)NR3B—, —NR3AC(NH)NR3B—, —SO2N(R3A)—, —N(R3A)SO2—, —C(S)—, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, or substituted or unsubstituted heteroarylene. In embodiments, L3 is a bond, —NH—, —S—, —S(O)2—, —O—, —C(O)—, —C(O)O—, —OC(O)—, —NHC(O)—, —C(O)NH—, —NHC(O)NH—, —NHC(NH)NH—, —SO2NH—, —NHSO2—, —C(S)—, L13-substituted or unsubstituted alkylene, L13-substituted or unsubstituted heteroalkylene, L13-substituted or unsubstituted cycloalkylene, L13-substituted or unsubstituted heterocycloalkylene, L13-substituted or unsubstituted arylene, or L13-substituted or unsubstituted heteroarylene. In embodiments, the alkylene is a C1-4 alkylene. In embodiments, the heteroalkylene is a 2 to 6 membered heteroalkylene. In embodiments, the heteroalkylene is a 2 to 4 membered heteroalkylene. In embodiments, the cycloalkylene is a C5-C6 cycloalkylene. In embodiments, the heterocycloalkylene is a 5 or 6 membered heterocycloalkylene. In embodiments, the arylene is a C5-6 arylene. In embodiments, the heteroarylene is a 5 or 6 membered heteroarylene.
With reference to the compounds described herein, R3A and R3B are independently hydrogen, substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, or substituted or unsubstituted heteroaryl. In embodiments, the alkylene is a C1-4 alkylene. In embodiments, the heteroalkylene is a 2 to 6 membered heteroalkylene. In embodiments, the heteroalkylene is a 2 to 4 membered heteroalkylene. In embodiments, the cycloalkylene is a C5-C6 cycloalkylene. In embodiments, the heterocycloalkylene is a 5 or 6 membered heterocycloalkylene. In embodiments, the arylene is a C5-6 arylene. In embodiments, the heteroarylene is a 5 or 6 membered heteroarylene.
With reference to the compounds described herein, L13 is halogen, —CF3, —CBr3, —CCl3, —CI3, —CHF2, —CHBr2, —CHCl2, —CHI2, —CH2F, —CH2Br, —CH2Cl, —CH2I, —OCF3, —OCBr3, —OCCl3, —OCI3, —OCHF2, —OCHBr2, —OCHCl2, —OCHI2, —OCH2F, —OCH2Br, —OCH2Cl, —OCH2I, —CN, —OH, —NH2, —COOH, —CONH2, —NO2, —SH, —SO3H, —SO4H, —SO2NH2, —NHNH2, —ONH2, —NHC(O)NHNH2, —N(O)2, —NHSO2H, —NHC(O)H, —NHC(O)OH, —NHOH, —N3, unsubstituted alkyl, unsubstituted heteroalkyl, unsubstituted cycloalkyl, unsubstituted heterocycloalkyl, unsubstituted aryl, or unsubstituted heteroaryl. In embodiments, the alkylene is a C1-4 alkylene. In embodiments, the heteroalkylene is a 2 to 6 membered heteroalkylene. In embodiments, the heteroalkylene is a 2 to 4 membered heteroalkylene. In embodiments, the cycloalkylene is a C5-C6 cycloalkylene. In embodiments, the heterocycloalkylene is a 5 or 6 membered heterocycloalkylene. In embodiments, the arylene is a C5-6 arylene. In embodiments, the heteroarylene is a 5 or 6 membered heteroarylene.
With reference to the compounds described herein, L4 is a bond, —N(R4A)—, —S—, —S(O)2—, —C(O)—, —C(O)O—, —O—, —OC(O)—, —N(R4A)C(O)—, —C(O)N(R4A)—, —NR4AC(O)NR4B—, —NR4AC(NH)NR4B—, —SO2N(R4A)—, —N(R4A)SO2—, —C(S)—, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, or substituted or unsubstituted heteroarylene. In embodiments, L4 is a bond, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl. In embodiments, L4 is a bond, unsubstituted alkyl, or unsubstituted heteroalkyl. In embodiments, L4 is substituted alkyl. In embodiments, L4 is substituted C1-8 alkyl. In embodiments, L4 is substituted C1-6 alkyl. In embodiments, L4 is substituted C1-4 alkyl. In embodiments, L4 is unsubstituted alkyl. In embodiments, L4 is unsubstituted C1-8 alkyl. In embodiments, L4 is unsubstituted C1-6 alkyl. In embodiments, L4 is unsubstituted C1-4 alkyl. In embodiments, L4 is unsubstituted heteroalkyl. In embodiments, L4 is unsubstituted 2 to 8 membered heteroalkyl. In embodiments, L4 is unsubstituted 2 to 6 membered heteroalkyl. In embodiments, L4 is unsubstituted 2 to 4 membered heteroalkyl. In embodiments, L4 is substituted heteroalkyl. In embodiments, L4 is substituted 2 to 8 membered heteroalkyl. In embodiments, L4 is substituted 2 to 6 membered heteroalkyl. In embodiments, L4 is substituted 2 to 4 membered heteroalkyl. In embodiments, L4 is a bond or —NH—(CH2)y—C(═O)—. In embodiments, L4 is a bond. In embodiments, L4 is —NH—(CH2)—C(═O)—. In embodiments, L4 is —NH—(CH2)2—C(═O)—. In embodiments, L4 is —NH—(CH2)3—C(═O)—. In embodiments, L4 is —NH—(CH2)4—C(═O)—. In embodiments, L4 is —NH—(CH2)5—C(═O)—. In embodiments, L4 is —NH—(CH2)6—C(═O)—.
In embodiments, L4 is —NH—(CH2)7—C(═O)—. In embodiments, L4 is —NH—(CH2)8—C(═O)—.
With reference to the compounds described herein, R4A and R4B are independently hydrogen, substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, or substituted or unsubstituted heteroaryl. In embodiments, R4A and R4B are independently hydrogen, substituted or unsubstituted C1-4 alkyl, substituted or unsubstituted 2 to 4 membered heteroalkyl, substituted or unsubstituted C5-6 cycloalkyl, substituted or unsubstituted 5 or 6 membered heterocycloalkyl, substituted or unsubstituted C5-6 aryl, or substituted or unsubstituted 5 or 6 membered heteroaryl. In embodiments, R4A and R4B are independently hydrogen, unsubstituted C1-4 alkyl, unsubstituted 2 to 4 membered heteroalkyl, unsubstituted C5-6 cycloalkyl, unsubstituted 5 or 6 membered heterocycloalkyl, unsubstituted C5-6 aryl, or unsubstituted 5 or 6 membered heteroaryl. In embodiments, R4A and R4B are independently hydrogen, substituted or unsubstituted C1-4 alkyl, or substituted or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R4A and R4B are hydrogen. In embodiments, R4A and R4B are substituted or unsubstituted C1-4 alkyl. In embodiments, R4A and R4B are unsubstituted C1-4 alkyl. In embodiments, R4A and R4B are substituted or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R4A and R4B are unsubstituted 2 to 4 membered heteroalkyl.
With reference to the compounds described herein, L5 is a bond, —N(R5A)—, —S—, —S(O)2—, —C(O)—, —C(O)O—, —O—, —OC(O)—, —N(R5A)C(O)—, —C(O)N(R5A)—, —NR5AC(O)NR5B—, —NR5AC(NH)NR5B—, —SO2N(R5A)—, —N(R5A)SO2—, —C(S)—, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, or substituted or unsubstituted heteroarylene. In embodiments, L5 is a bond, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl. In embodiments, L5 is a bond, unsubstituted alkyl, or unsubstituted heteroalkyl. In embodiments, L5 is substituted alkyl. In embodiments, L5 is substituted C1-8 alkyl. In embodiments, L5 is substituted C1-6 alkyl. In embodiments, L5 is substituted C1-4 alkyl. In embodiments, L5 is unsubstituted alkyl. In embodiments, L5 is unsubstituted C1-8 alkyl. In embodiments, L5 is unsubstituted C1-6 alkyl. In embodiments, L5 is unsubstituted C1-4 alkyl. In embodiments, L5 is unsubstituted heteroalkyl. In embodiments, L5 is unsubstituted 2 to 8 membered heteroalkyl. In embodiments, L5 is unsubstituted 2 to 6 membered heteroalkyl. In embodiments, L5 is unsubstituted 2 to 4 membered heteroalkyl. In embodiments, L5 is substituted heteroalkyl. In embodiments, L5 is substituted 2 to 8 membered heteroalkyl. In embodiments, L5 is substituted 2 to 6 membered heteroalkyl. In embodiments, L5 is substituted 2 to 4 membered heteroalkyl. In embodiments, L5 is a bond or —NH—(CH2)y—C(═O)—. In embodiments, L5 is a bond. In embodiments, L5 is —NH—(CH2)—C(═O)—. In embodiments, L5 is —NH—(CH2)2—C(═O)—. In embodiments, L5 is —NH—(CH2)3—C(═O)—. In embodiments, L5 is —NH—(CH2)4—C(═O)—. In embodiments, L5 is —NH—(CH2)5—C(═O)—. In embodiments, L5 is —NH—(CH2)6—C(═O)—.
In embodiments, L5 is —NH—(CH2)7—C(═O)—. In embodiments, L5 is —NH—(CH2)8—C(═O)—.
With reference to the compounds described herein, R5A and R5B are independently hydrogen, substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, or substituted or unsubstituted heteroaryl. In embodiments, R5A and R5B are independently hydrogen, substituted or unsubstituted C1-4 alkyl, substituted or unsubstituted 2 to 4 membered heteroalkyl, substituted or unsubstituted C5-6 cycloalkyl, substituted or unsubstituted 5 or 6 membered heterocycloalkyl, substituted or unsubstituted C5-6 aryl, or substituted or unsubstituted 5 or 6 membered heteroaryl. In embodiments, R5A and R5B are independently hydrogen, unsubstituted C1-4 alkyl, unsubstituted 2 to 4 membered heteroalkyl, unsubstituted C5-6 cycloalkyl, unsubstituted 5 or 6 membered heterocycloalkyl, unsubstituted C5-6 aryl, or unsubstituted 5 or 6 membered heteroaryl. In embodiments, R5A and R5B are independently hydrogen, substituted or unsubstituted C1-4 alkyl, or substituted or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R5A and R5B are hydrogen. In embodiments, R5A and R5B are substituted or unsubstituted C1-4 alkyl. In embodiments, R5A and R5B are unsubstituted C1-4 alkyl. In embodiments, R5A and R5B are substituted or unsubstituted 2 to 4 membered heteroalkyl. In embodiments, R5A and R5B are unsubstituted 2 to 4 membered heteroalkyl.
With reference to the compounds described herein, R2 is a first biomolecule moiety. In embodiments, R2 is a peptidyl moiety, a lipid moiety, an RNA moiety, or a glycan moiety. In embodiments, R2 is a lipid moiety. In embodiments, R2 is a glycan moiety. In embodiments, R2 is an RNA moiety. In embodiments, R2 is a peptidyl moiety. In embodiments, the peptidyl moiety comprises a RNA-binding peptidyl moiety. In embodiments, the peptidyl moiety comprises a N6-methyladenosine reader peptidyl moiety. In embodiments, the peptidyl moiety comprises a N6-methyladenosine demethylase peptidyl moiety. In embodiments, the peptidyl moiety comprises a glycan-binding peptidyl moiety. In embodiments, the peptidyl moiety comprises Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15. In embodiments, the protein moiety comprises Siglec-7. In embodiments, the peptidyl moiety comprises Siglec-7 (e.g., SEQ ID NO: 1, including embodiments as described herein). In embodiments, the peptidyl moiety comprises a sialoglycan binding V-set domain of Siglec. In embodiments, the peptidyl moiety comprises a sialoglycan binding V-set domain of Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15. In embodiments, the peptidyl moiety comprises a sialoglycan binding V-set domain of Siglec-7 (e.g., SEQ ID NO:2, SEQ ID NO:3, or SEQ ID NO:4, including embodiments as described herein).
In embodiments, R2 or the protein comprising an unnatural amino acid comprises a glycan-binding protein. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-1. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-2. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-3. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-4. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-5. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-6. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-8. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-9. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-10. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-11. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-12. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-14. In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-15.
In embodiments, R2 or the protein comprising an unnatural amino acid comprises Siglec-7. In embodiments, Siglec-7 comprises SEQ ID NO:1. In embodiments, Siglec-7 is SEQ ID NO:1. In embodiments, R2 or the protein comprising the unnatural amino acid has at least 85% sequence identity to SEQ ID NO: 1. In embodiments, R2 or the protein comprising the unnatural amino acid has at least 90% sequence identity to SEQ ID NO: 1. In embodiments, R2 or the protein comprising the unnatural amino acid has at least 92% sequence identity to SEQ ID NO:1. In embodiments, R2 or the protein comprising the unnatural amino acid has at least 94% sequence identity to SEQ ID NO: 1. In embodiments, R2 or the protein comprising the unnatural amino acid has at least 95% sequence identity to SEQ ID NO: 1. In embodiments, R2 or the protein comprising the unnatural amino acid has at least 96% sequence identity to SEQ ID NO:1. In embodiments, R2 or the protein comprising the unnatural amino acid has at least 98% sequence identity to SEQ ID NO:1. In embodiments, the unnatural amino acid is at a lysine residue or asparagine residue in Siglec-7. In embodiments, the lysine residue is at position 104 or position 127 in SEQ ID NO: 1. In embodiments, the lysine residue is at position 104 in SEQ ID NO:1. In embodiments, the lysine residue is at position 127 in SEQ ID NO:1. In embodiments, the asparagine residue is at position 129 in SEQ ID NO: 1.
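For illustration, the sequence identity thresholds and 1-based residue positions recited above can be checked computationally. The following is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with "-" marking gaps, identity is counted as identical residues over all aligned columns other than gap-gap columns (one of several conventions in use), and the function names and the short toy sequences are hypothetical. SEQ ID NO:1 is not reproduced here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residues divided by the number of
    aligned columns, skipping columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position, as positions are recited above."""
    if not 1 <= position <= len(sequence):
        raise IndexError("position outside sequence")
    return sequence[position - 1]


if __name__ == "__main__":
    # Toy sequences only; these are NOT SEQ ID NO:1 or any variant of it.
    reference = "MKTLLNAK"
    variant = "MKTLLNAX"
    identity = percent_identity(reference, variant)
    print(identity, identity >= 85.0)
    print(residue_at(reference, 2) == "K")  # lysine at toy position 2
```

In practice, one would first align a candidate protein to SEQ ID NO:1 with an established alignment tool, then apply a threshold such as 85% and confirm that the residue at position 104, 127, or 129 of SEQ ID NO:1 is the lysine or asparagine to be replaced.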
In embodiments, R2 or the protein comprising an unnatural amino acid comprises the glycan binding domain of a glycan-binding protein. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-1, the sialoglycan binding V-set domain of Siglec-2, the sialoglycan binding V-set domain of Siglec-3, the sialoglycan binding V-set domain of Siglec-4, the sialoglycan binding V-set domain of Siglec-5, the sialoglycan binding V-set domain of Siglec-6, the sialoglycan binding V-set domain of Siglec-7, the sialoglycan binding V-set domain of Siglec-8, the sialoglycan binding V-set domain of Siglec-9, the sialoglycan binding V-set domain of Siglec-10, the sialoglycan binding V-set domain of Siglec-11, the sialoglycan binding V-set domain of Siglec-12, the sialoglycan binding V-set domain of Siglec-14, or the sialoglycan binding V-set domain of Siglec-15. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-1. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-2. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-3. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-4. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-5. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-6. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-8.
In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-9. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-10. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-11. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-12. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-14. In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-15.
In embodiments, R2 or the protein comprising an unnatural amino acid comprises the sialoglycan binding V-set domain of Siglec-7. In embodiments, the sialoglycan binding V-set domain of Siglec-7 comprises SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 is SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 85% sequence identity to the amino acid sequence of SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 92% sequence identity to the amino acid sequence of SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 94% sequence identity to the amino acid sequence of SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 95% sequence identity to the amino acid sequence of SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 96% sequence identity to the amino acid sequence of SEQ ID NO:2. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 98% sequence identity to the amino acid sequence of SEQ ID NO:2. In embodiments, the unnatural amino acid is at a lysine residue or asparagine residue in the sialoglycan binding V-set domain of Siglec-7. In embodiments, the lysine residue is at position 104 or position 127 in SEQ ID NO:2. In embodiments, the lysine residue is at position 104 in SEQ ID NO:2. In embodiments, the lysine residue is at position 127 in SEQ ID NO:2. In embodiments, the asparagine residue is at position 129 in SEQ ID NO:2.
In embodiments, the sialoglycan binding V-set domain of Siglec-7 comprises SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 is SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 85% sequence identity to the amino acid sequence of SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 92% sequence identity to the amino acid sequence of SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 94% sequence identity to the amino acid sequence of SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 95% sequence identity to the amino acid sequence of SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 96% sequence identity to the amino acid sequence of SEQ ID NO:3. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 98% sequence identity to the amino acid sequence of SEQ ID NO:3. In embodiments, the unnatural amino acid is at a lysine residue or asparagine residue in the sialoglycan binding V-set domain of Siglec-7. In embodiments, the lysine residue is at position 104 or position 127 in SEQ ID NO:3. In embodiments, the lysine residue is at position 104 in SEQ ID NO:3. In embodiments, the lysine residue is at position 127 in SEQ ID NO:3. In embodiments, the asparagine residue is at position 129 in SEQ ID NO:3.
In embodiments, the sialoglycan binding V-set domain of Siglec-7 comprises SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 is SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 85% sequence identity to the amino acid sequence of SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 92% sequence identity to the amino acid sequence of SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 94% sequence identity to the amino acid sequence of SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 95% sequence identity to the amino acid sequence of SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 96% sequence identity to the amino acid sequence of SEQ ID NO:4. In embodiments, the sialoglycan binding V-set domain of Siglec-7 has at least 98% sequence identity to the amino acid sequence of SEQ ID NO:4. In embodiments, the unnatural amino acid is at a lysine residue or asparagine residue in the sialoglycan binding V-set domain of Siglec-7. In embodiments, the lysine residue is at position 104 or position 127 in SEQ ID NO:4. In embodiments, the lysine residue is at position 104 in SEQ ID NO:4. In embodiments, the lysine residue is at position 127 in SEQ ID NO:4. In embodiments, an asparagine residue in the sialoglycan binding V-set domain of Siglec-7 comprises an unnatural amino acid side chain of Formula (II), including embodiments thereof. In embodiments, the asparagine residue is at position 129 in SEQ ID NO:4.
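The sequence identity thresholds recited above (e.g., at least 85%, 90%, or 98% identity to a reference sequence) can be illustrated with a minimal sketch. The sketch assumes that the two sequences have already been aligned to equal length with "-" as the gap character, and that identity is counted as matching columns divided by aligned columns; the definition of sequence identity given elsewhere in the disclosure governs, and other tools may use a different denominator. The function names and the toy sequences are hypothetical and are not SEQ ID NOs:1-4.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (assumed definition, see lead-in).

    Both inputs must be pre-aligned to the same length, with '-' marking gaps.
    Columns in which both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the percent identity is at least the recited threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position, as positions are numbered in a SEQ ID NO."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside the sequence")
    return sequence[position - 1]


# Hypothetical toy sequences that differ at one of ten positions.
reference = "MKTLLNAGKW"
candidate = "MKTLLQAGKW"

identity = percent_identity(reference, candidate)  # 9 of 10 columns match
print(identity, meets_threshold(reference, candidate, 85.0))
print(residue_at(reference, 9))  # 1-based lookup, here a lysine (K)
```

In this toy case the candidate satisfies an "at least 85%" or "at least 90%" threshold but not an "at least 92%" threshold. The 1-based lookup mirrors how the lysine and asparagine positions are referenced against a SEQ ID NO.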
With reference to the compounds described herein, R3 is a second biomolecule moiety. In embodiments, R3 is a peptidyl moiety, a lipid moiety, an RNA moiety, or a glycan moiety. In embodiments, R3 is a peptidyl moiety. In embodiments, R3 is a lipid moiety. In embodiments, R3 is an RNA moiety. In embodiments, L3 is bonded to a hydroxyl group within the RNA moiety. In embodiments, L3 is bonded to a 2′-hydroxyl group within the RNA moiety. In embodiments, L3 is bonded to a 2′-hydroxyl group of a ribose or to an amine within the RNA moiety. In embodiments, L3 is bonded to a 2′-hydroxyl group of a ribose within the RNA moiety. In embodiments, L3 is bonded to an amine within the RNA moiety. In embodiments, L3 is a bond. By targeting the 2′-hydroxyl group of a ribose, the RNA-binding protein can crosslink with all four nucleotides.
In embodiments of the compounds described herein, R3 is a glycan moiety. In embodiments, R3 is a sialoglycan moiety. In embodiments, a hydroxyl group of the glycan moiety bonds to L3 via an oxygen atom (—O—) within the glycan moiety, represented as —O-L3-. In embodiments, a hydroxyl group of the sialoglycan moiety bonds to L3 via an oxygen atom (—O—) within the sialoglycan moiety, represented as —O-L3-. In embodiments, L3 is a bond, such that the oxygen atom that is part of the structure of the sialoglycan moiety is bonded to the sulfur atom of the unnatural amino acid side chain. In embodiments, L3 is bonded to a sialoglycan containing a terminal 2,8-linked sialic acid (i.e., the unnatural amino acid side chain binds to a 2,8-linked sialic acid in a glycan). In embodiments, L3 is a bond. In embodiments, L3 is bonded to a sialoglycan containing a linear Neu5Acα2-8Neu5Ac-terminating ligand, e.g., Neu5Acα2-8Neu5Acα2-3Galβ1-4Glc, Neu5Acα2-8Neu5Gcα2-3Galβ1-4Glc, Neu5Acα2-8Kdncα2-3Galβ1-4Glc, Neu5Gcα2-8Neu5Acα2-3Galβ1-4Glc, or Neu5Gcα2-8Neu5Gcα2-3Galβ1-4Glc, shown in
With reference to the compounds described herein, R2 is a peptidyl moiety, a lipid moiety, an RNA moiety, or a glycan moiety; and R3 is a peptidyl moiety, a lipid moiety, an RNA moiety, or a glycan moiety. In embodiments, R2 is a peptidyl moiety, a lipid moiety, or an RNA moiety; and R3 is a glycan moiety. In embodiments, R2 is a peptidyl moiety and R3 is a glycan moiety. In embodiments, R2 is a lipid moiety and R3 is a glycan moiety. In embodiments, R2 is an RNA moiety and R3 is a glycan moiety. In embodiments, R2 is a peptidyl moiety and R3 is a peptidyl moiety.
In embodiments of the biomolecule conjugates described herein, the compound of Formula (III) further comprises a protein, a lipid, or RNA bonded to R3. In embodiments, the compound of Formula (III) further comprises a protein bonded to R3. In embodiments, the compound of Formula (III) further comprises a lipid bonded to R3. In embodiments, the compound of Formula (III) further comprises RNA bonded to R3. In embodiments, the lipid comprises a lipid membrane of a cell. In embodiments, the lipid comprises a lipid membrane of a cancer cell. In embodiments, the bond is a direct bond. In embodiments, the bond is an indirect bond. In embodiments, the bond is an electrostatic interaction (e.g., ionic bond, hydrogen bond, halogen bond). In embodiments, the bond is a van der Waals interaction (e.g., dipole-dipole, dipole-induced dipole, London dispersion). In embodiments, the bond is ring stacking (pi effects). In embodiments, the bond is a hydrophobic interaction.
In embodiments, the compound of Formula (III) further comprising a protein, a lipid, or RNA bonded to R3 is represented by the compound of Formula (IIIC):
where x, L1, L2, L3, R2, and R3 are as defined herein. The dashed line (-----) is a bond. R4 is a protein, a lipid, or RNA. In embodiments, R4 is a protein, a lipid, or RNA. In embodiments, R4 is a protein. In embodiments, R4 is a lipid. In embodiments, R4 is RNA. In embodiments, the lipid comprises a lipid membrane of a cell. In embodiments, the lipid comprises a lipid membrane of a cancer cell. In embodiments, the bond (-----) is a direct bond. In embodiments, the bond (-----) is an indirect bond. In embodiments, the bond is an electrostatic interaction (e.g., ionic bond, hydrogen bond, halogen bond). In embodiments, the bond is a van der Waals interaction (e.g., dipole-dipole, dipole-induced dipole, London dispersion). In embodiments, the bond is ring stacking (pi effects). In embodiments, the bond is a hydrophobic interaction.
The disclosure provides cells comprising the compounds, compositions and complexes provided herein, including embodiments thereof. Therefore, in an embodiment is provided a cell including the compound of Formula (I) and embodiments thereof, the compound of Formula (II) and embodiments thereof, the compound of Formula (III) and embodiments thereof, the compound of Formula (IV) and embodiments thereof, or the compound of Formula (V) and embodiments thereof.
In embodiments, the cell further includes a mutant pyrrolysyl-tRNA synthetase as described herein, including embodiments thereof. In embodiments, the cell further includes a vector as described herein, including embodiments thereof. In embodiments, the cell further includes a tRNAPyl.
In embodiments, the compound of Formula (I) (including embodiments thereof) is biosynthesized inside the cell, thereby generating a cell containing the compound of Formula (I). In embodiments, the compound of Formula (I) is contained in the medium outside the cell and penetrates into the cell, thereby generating a cell containing the compound of Formula (I). In embodiments, the cell comprises the compound of Formula (II) (including embodiments thereof). In embodiments, the cell comprises the compound of Formula (II) that is synthesized inside the cell. In embodiments, the cell comprises the compound of Formula (II) that is synthesized outside a cell, and that penetrates into the cell.
In embodiments, the cell comprises the biomolecule conjugates described herein. In embodiments, the cell comprises biomolecule conjugate of Formula (III), including embodiments thereof.
A cell can be any prokaryotic or eukaryotic cell. In aspects, the cell is prokaryotic. In aspects, the cell is eukaryotic. In aspects, the cell is a bacterial cell, a fungal cell, a plant cell, an archaeal cell, or an animal cell. In aspects, the animal cell is an insect cell or a mammalian cell. In aspects, the cell is a bacterial cell. In aspects, the cell is a fungal cell. In aspects, the cell is a plant cell. In aspects, the cell is an archaeal cell. In aspects, the cell is an animal cell. In aspects, the cell is an insect cell. In aspects, the cell is a mammalian cell. In aspects, the cell is a human cell. For example, any of the compositions described herein can be expressed in bacterial cells such as E. coli, insect cells, yeast, or mammalian cells (such as HeLa cells, Chinese hamster ovary (CHO) cells, or COS cells). In aspects, the cell is a premature mammalian cell, i.e., a pluripotent stem cell. In aspects, the cell is derived from other human tissue. Other suitable cells are known to those skilled in the art.
As described herein, an unnatural amino acid (e.g., of Formula (I) and embodiments thereof) may be inserted into or replace a naturally occurring amino acid in a biomolecule (e.g., protein). In order for the unnatural amino acid to be inserted into or replace an amino acid in a biomolecule (e.g., protein), it must be capable of being incorporated during proteinogenesis. Thus, the unnatural amino acid must be present on a transfer RNA molecule (tRNA) such that it may be used in translation. Loading of amino acids occurs via an aminoacyl-tRNA synthetase, which is an enzyme that facilitates the attachment of appropriate amino acids to tRNA molecules. However, the attachment of unnatural amino acids to tRNA may not necessarily be accomplished by the naturally occurring aminoacyl-tRNA synthetase. Engineered aminoacyl-tRNA synthetases (e.g., mutant pyrrolysyl-tRNA synthetase (PylRS)) may be useful for attaching unnatural amino acids to tRNA. A PylRS mutant library was generated. Compared to previously described PylRS mutant libraries, the PylRS mutant library generated herein was constructed using the new small-intelligent mutagenesis approach, which allows a greater number of amino acid residues to be mutated simultaneously (e.g., 10 amino acid residues).
The disclosure provides a mutant pyrrolysyl-tRNA synthetase, including at least 5 amino acid residue substitutions within the substrate-binding site of the mutant pyrrolysyl-tRNA synthetase. In embodiments, the mutant pyrrolysyl-tRNA synthetase is a mutant Methanosarcina mazei PylRS (e.g., SEQ ID NO:5). In embodiments, the mutant pyrrolysyl-tRNA synthetase comprises at least 5 amino acid residue substitutions in the amino acid sequence of SEQ ID NO:5. In embodiments, the substrate-binding site includes residues tyrosine at position 306, leucine at position 309, asparagine at position 346, cysteine at position 348, and tryptophan at position 417 as set forth in the amino acid sequence of SEQ ID NO:5. In embodiments, the at least 5 amino acid residue substitutions are leucine for tyrosine at position 306 (Y306L), alanine for leucine at position 309 (L309A), alanine for asparagine at position 346 (N346A), methionine for cysteine at position 348 (C348M), and threonine for tryptophan at position 417 (W417T) as set forth in the amino acid sequence of SEQ ID NO:5.
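The substitution shorthand used above (e.g., Y306L, meaning that leucine replaces the tyrosine at position 306) can be illustrated with a minimal sketch that applies such substitutions to a sequence string and checks that the stated original residue is actually present at each 1-based position. The helper name `apply_substitutions` and the 7-residue toy sequence are hypothetical. The toy sequence is not SEQ ID NO:5, and its positions are renumbered for illustration only.

```python
import re
from typing import Iterable


def apply_substitutions(sequence: str, substitutions: Iterable[str]) -> str:
    """Apply point substitutions written as <original><position><new>, e.g. 'Y306L'.

    Positions are 1-based. A ValueError is raised if a substitution is
    malformed, out of range, or names an original residue that does not
    match the sequence at that position.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"malformed substitution: {sub!r}")
        original, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(
                f"{sub}: expected {original} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy sequence carrying Y, L, N, C and W at positions 2, 4, 5, 6 and 7.
toy_wild_type = "MYALNCW"
# Same residue changes as Y306L, L309A, N346A, C348M and W417T, renumbered for the toy.
toy_mutant = apply_substitutions(toy_wild_type, ["Y2L", "L4A", "N5A", "C6M", "W7T"])
print(toy_mutant)
```

Checking the original residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are taken from a sequence listing.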
In embodiments, the mutant pyrrolysyl-tRNA synthetase has the amino acid sequence of SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase includes an amino acid sequence of SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 80% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 85% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 90% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 92% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 94% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 95% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 96% identity to SEQ ID NO:6. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 98% identity to SEQ ID NO:6.
The disclosure provides a mutant pyrrolysyl-tRNA synthetase, including at least 5 amino acid residue substitutions within the substrate-binding site of the mutant pyrrolysyl-tRNA synthetase. In embodiments, the mutant pyrrolysyl-tRNA synthetase is a mutant Methanomethylophilus alvus PylRS (e.g., SEQ ID NO:7). In embodiments, the mutant pyrrolysyl-tRNA synthetase comprises at least 5 amino acid residue substitutions in the amino acid sequence of SEQ ID NO:7. In embodiments, the substrate-binding site includes residues tyrosine at position 126, methionine at position 129, asparagine at position 166, valine at position 168, and tryptophan at position 239 as set forth in the amino acid sequence of SEQ ID NO:7. In embodiments, the at least 5 amino acid residue substitutions are leucine for tyrosine at position 126 (Y126L), alanine for methionine at position 129 (M129A), alanine for asparagine at position 166 (N166A), methionine for valine at position 168 (V168M), and threonine for tryptophan at position 239 (W239T) as set forth in the amino acid sequence of SEQ ID NO:7.
In embodiments, the mutant pyrrolysyl-tRNA synthetase has the amino acid sequence of SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase includes an amino acid sequence of SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 80% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 85% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 90% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 92% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 94% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 95% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 96% identity to SEQ ID NO:8. In embodiments, the mutant pyrrolysyl-tRNA synthetase has an amino acid sequence that has at least 98% identity to SEQ ID NO:8.
The compositions (e.g., mutant pyrrolysyl-tRNA synthetase, tRNAPyl) provided herein may be delivered to cells using methods well known in the art. Thus, in an embodiment is provided a vector including a nucleic acid sequence encoding a mutant pyrrolysyl-tRNA synthetase as described herein, including embodiments thereof. In embodiments, the vector comprises a nucleic acid sequence encoding a mutant pyrrolysyl-tRNA synthetase that comprises at least 5 amino acid residue substitutions within the substrate-binding site of the mutant pyrrolysyl-tRNA synthetase. In embodiments, the vector further includes a nucleic acid sequence encoding tRNAPyl.
The compositions provided herein are useful for forming a biomolecule or biomolecule conjugate. Thus, in an embodiment is provided a method of forming a biomolecule (e.g., protein) by contacting a biomolecule (e.g., protein such as Siglec-7 or a fragment thereof), a mutant pyrrolysyl-tRNA synthetase, a tRNAPyl, and a compound of Formula (I) (including embodiments thereof), thereby producing the biomolecule, i.e., a biomolecule comprising the unnatural amino acid of Formula (I) (including embodiments thereof). The biomolecule produced by the method will comprise the unnatural amino acid side chain of Formula (II) (including embodiments thereof). The mutant pyrrolysyl-tRNA synthetase used in the method of producing the biomolecule is any described herein. The tRNAPyl used in the method of producing the biomolecule is any described herein. In embodiments, the biomolecule is a protein. In embodiments, the biomolecule is a glycan. In embodiments, the reaction is performed in vitro. In embodiments, the reaction is performed in vivo. In embodiments, the reaction is performed in one or more living cells. In embodiments, the reaction is performed in one or more living bacterial cells. In embodiments, the reaction is performed in one or more living mammalian cells.
As shown in
Provided herein are pharmaceutical compositions comprising: (i) a biomolecule which comprises an unnatural amino acid and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) a lipid which comprises an unnatural amino acid and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) RNA which comprises an unnatural amino acid and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) a protein which comprises an unnatural amino acid and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) a nucleic acid capable of encoding a protein which comprises an unnatural amino acid and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) a vector which comprises a nucleic acid capable of encoding a protein which comprises an unnatural amino acid and (ii) a pharmaceutically acceptable excipient. In embodiments, the protein is a glycan binding protein or a fragment thereof. In embodiments, the protein is a sialoglycan binding protein or a fragment thereof. In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises Siglec or a fragment thereof, and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises a sialoglycan binding V-set domain of Siglec or a fragment thereof, and (ii) a pharmaceutically acceptable excipient. In embodiments, the Siglec is Siglec-1, Siglec-2, Siglec-3, Siglec-4, Siglec-5, Siglec-6, Siglec-7, Siglec-8, Siglec-9, Siglec-10, Siglec-11, Siglec-12, Siglec-14, or Siglec-15. The compositions are suitable for formulation and administration in vitro or in vivo. 
Suitable carriers and excipients and their formulations are described in Remington: The Science and Practice of Pharmacy, 21st Edition, David B. Troy, ed., Lippincott Williams & Wilkins (2005).
In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises Siglec-7 or a fragment thereof, and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a nucleic acid encoding the compound of Formula (II), wherein the protein comprises Siglec-7 (or a fragment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a vector which comprises a nucleic acid encoding the compound of Formula (II), wherein the protein comprises Siglec-7 (or a fragment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises SEQ ID NO:1 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:1 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a vector which comprises a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:1 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient.
In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises a sialoglycan binding V-set domain of Siglec-7, and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a nucleic acid encoding the compound of Formula (II), wherein the protein comprises a sialoglycan binding V-set domain of Siglec-7, and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a vector which comprises a nucleic acid encoding the compound of Formula (II), wherein the protein comprises a sialoglycan binding V-set domain of Siglec-7, and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises SEQ ID NO:2 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:2 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a vector which comprises a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:2 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient.
In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises SEQ ID NO:3 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:3 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a vector which comprises a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:3 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient.
In embodiments, the pharmaceutical compositions comprise (i) the compound of Formula (II), wherein the protein comprises SEQ ID NO:4 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:4 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient. In embodiments, the pharmaceutical composition comprises (i) a vector which comprises a nucleic acid encoding the compound of Formula (II), wherein the protein comprises SEQ ID NO:4 (or any embodiment thereof), and (ii) a pharmaceutically acceptable excipient.
In embodiments, the pharmaceutical composition comprises: (i) an RNA-binding protein comprising the compound of Formula (II) and (ii) a pharmaceutically acceptable excipient.
“Pharmaceutically acceptable excipient” and “pharmaceutically acceptable carrier” refer to a substance that aids the administration of an active agent to and absorption by a subject and can be included in the compositions of the disclosure without causing a significant adverse toxicological effect on the patient. Non-limiting examples of pharmaceutically acceptable excipients include water, NaCl, normal saline solutions, lactated Ringer's, normal sucrose, normal glucose, binders, fillers, disintegrants, lubricants, coatings, sweeteners, flavors, salt solutions (such as Ringer's solution), alcohols, oils, gelatins, carbohydrates such as lactose, amylose or starch, fatty acid esters, hydroxymethylcellulose, polyvinylpyrrolidone, colors, and the like. Such preparations can be sterilized and, if desired, mixed with auxiliary agents such as lubricants, preservatives, stabilizers, wetting agents, emulsifiers, salts for influencing osmotic pressure, buffers, coloring, and/or aromatic substances and the like that do not deleteriously react with the compounds of the disclosure. One of skill in the art will recognize that other pharmaceutical excipients are useful.
Solutions of the active compounds as free base or pharmacologically acceptable salt can be prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose. Dispersions can also be prepared in glycerol, liquid polyethylene glycols, and mixtures thereof and in oils. Under ordinary conditions of storage and use, these preparations can contain a preservative to prevent the growth of microorganisms.
Pharmaceutical compositions can be delivered via intranasal or inhalable solutions or sprays, aerosols or inhalants. Nasal solutions can be aqueous solutions designed to be administered to the nasal passages in drops or sprays. Nasal solutions can be prepared so that they are similar in many respects to nasal secretions. Thus, the aqueous nasal solutions usually are isotonic and slightly buffered to maintain a pH of 5 to 7. In addition, antimicrobial preservatives, similar to those used in ophthalmic preparations and appropriate drug stabilizers, if required, may be included in the formulation. Various commercial nasal preparations are known and can include, for example, antibiotics and antihistamines.
Oral formulations can include excipients such as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate and the like. These compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained release formulations or powders. In embodiments, oral pharmaceutical compositions will comprise an inert diluent or edible carrier, or they may be enclosed in hard or soft shell gelatin capsules, or they may be compressed into tablets, or they may be incorporated directly with the food. For oral therapeutic administration, the active compounds may be incorporated with excipients and used in the form of ingestible tablets, buccal tablets, troches, capsules, elixirs, suspensions, syrups, wafers, and the like. The percentage of the compositions and preparations may, of course, be varied and may conveniently be between about 1% and about 99% of the weight of the unit. The amount of active compounds in such compositions is such that a suitable dosage can be obtained.
For parenteral administration in an aqueous solution, for example, the solution should be suitably buffered and the liquid diluent first rendered isotonic with sufficient saline or glucose. Aqueous solutions, in particular, sterile aqueous media, are especially suitable for intravenous, intramuscular, subcutaneous and intraperitoneal administration. For example, one dosage could be dissolved in 1 ml of isotonic NaCl solution and either added to 1000 ml of hypodermoclysis fluid or injected at the proposed site of infusion.
Sterile injectable solutions can be prepared by incorporating the active compounds in the required amount in the appropriate solvent followed by filtered sterilization. Generally, dispersions are prepared by incorporating the various sterilized active ingredients into a sterile vehicle which contains the basic dispersion medium. Vacuum-drying and freeze-drying techniques, which yield a powder of the active ingredient plus any additional desired ingredients, can be used to prepare sterile powders for reconstitution of sterile injectable solutions. The preparation of more, or highly, concentrated solutions for direct injection is also contemplated. Organic solvents can be used for rapid penetration, delivering high concentrations of the active agents to a small area.
The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampules and vials. Thus, the composition can be in unit dosage form. In such form the preparation is subdivided into unit doses containing appropriate quantities of the active component. Thus, the compositions can be administered in a variety of unit dosage forms depending upon the method of administration. For example, unit dosage forms suitable for oral administration include, but are not limited to, powder, tablets, pills, capsules and lozenges.
The dosage and frequency (single or multiple doses) of the pharmaceutical compositions comprising a protein which comprises an unnatural amino acid (e.g., a compound of Formula (II) and embodiments thereof) administered to a subject can vary depending upon a variety of factors, for example, whether the mammal suffers from another disease, and its route of administration; size, age, sex, health, body weight, body mass index, and diet of the recipient; nature and extent of symptoms of the disease being treated (e.g., symptoms of cancer and severity of such symptoms), kind of concurrent treatment, complications from the disease being treated or other health-related problems. Other therapeutic regimens or agents can be used in conjunction with the methods and compounds described herein. Adjustment and manipulation of established dosages (e.g., frequency and duration) are well within the ability of those skilled in the art.
For any composition and compound of Formula (V) (and embodiments thereof) described herein, the effective amount can be initially determined from cell culture assays. Target concentrations will be those concentrations that are capable of achieving the methods described herein, as measured using the methods described herein or known in the art. As is known in the art, effective amounts of the compounds and pharmaceutical compositions for use in humans can also be determined from animal models. For example, a dose for humans can be formulated to achieve a concentration that has been found to be effective in animals. The dosage in humans can be adjusted by monitoring effectiveness and adjusting the dosage upwards or downwards, as described above. Adjusting the dose to achieve maximal efficacy in humans based on the methods described above and other methods is well within the capabilities of the ordinarily skilled artisan.
Dosages of the compounds and pharmaceutical compositions may be varied depending upon the requirements of the patient. The dose administered to a patient should be sufficient to effect a beneficial therapeutic response in the patient over time. The size of the dose also will be determined by the existence, nature, and extent of any adverse side-effects. Determination of the proper dosage for a particular situation is within the skill of the art. Dosage amounts and intervals can be adjusted individually to provide levels of the compounds effective for the particular clinical indication being treated. This will provide a therapeutic regimen that is commensurate with the severity of the individual's disease state.
In embodiments, the compounds are administered to a patient at an amount of about 0.01 mg/kg to about 500 mg/kg. It is understood that where the amount is referred to as “mg/kg,” the amount is milligram per kilogram body weight of the subject being administered with the compounds described herein. In embodiments, the compound is administered to a patient in an amount from about 1 mg to about 500 mg per day, as a single dose, or in a dose administered two or three times per day.
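As a worked illustration of the weight-based arithmetic described above, the following minimal sketch converts an amount expressed in mg/kg into a total amount for a subject of a given body weight. The function name and the 70 kg body weight are hypothetical and are used only for illustration; the 0.01 mg/kg and 500 mg/kg values are the endpoints of the range recited above.

```python
def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Return the total amount in mg for a weight-based dose (mg/kg x kg)."""
    if dose_mg_per_kg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and body weight positive")
    return dose_mg_per_kg * body_weight_kg


if __name__ == "__main__":
    weight_kg = 70.0  # hypothetical subject weight, for illustration only
    for dose in (0.01, 500.0):  # endpoints of the recited mg/kg range
        total = total_dose_mg(dose, weight_kg)
        print(f"{dose} mg/kg x {weight_kg} kg = {total:.2f} mg")
```

The sketch only performs the unit conversion; selecting an appropriate dose for a particular subject remains a matter for the skilled artisan, as described above.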
Provided herein are methods of identifying N6-methyladenosine (m6A) sites on RNA, e.g., by contacting an N6-methyladenosine reader protein which comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) with RNA. Provided herein are in vivo methods of identifying N6-methyladenosine (m6A) sites on RNA in the transcriptome. Provided herein are in vivo methods of identifying N6-methyladenosine (m6A) sites on RNA in the transcriptome comprising incorporating the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) into the YTH domain in mammalian cells, and identifying N6-methyladenosine (m6A) sites through high-throughput sequencing. Provided herein are in vivo methods of identifying N6-methyladenosine (m6A) sites on RNA in the transcriptome comprising genetically incorporating the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) into the YTH domain in mammalian cells, and identifying N6-methyladenosine (m6A) sites through high-throughput sequencing. In embodiments, the method of identifying N6-methyladenosine (m6A) sites in RNA comprises contacting an N6-methyladenosine reader protein which comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) with RNA. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in cells. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in mammalian cells. In embodiments, the RNA is in the transcriptome. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in the transcriptome in cells. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in the transcriptome in mammalian cells.
The disclosure provides methods of detecting endogenous m6A sites in cells throughout the transcriptome comprising contacting an N6-methyladenosine reader protein which comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) with RNA using high-throughput sequencing. In embodiments, the N6-methyladenosine reader protein comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) at an N6-methyladenosine binding site of the N6-methyladenosine reader protein. Expression of the N6-methyladenosine reader protein comprising the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) crosslinks at N6-methyladenosine sites in RNA (
Provided herein are methods of identifying N6-methyladenosine (m6A) sites on RNA, e.g., by contacting an N6-methyladenosine (m6A) demethylase (eraser) protein which comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) with RNA. Provided herein are in vivo methods of identifying N6-methyladenosine (m6A) sites on RNA in the transcriptome. Provided herein are in vivo methods of identifying N6-methyladenosine (m6A) sites on RNA in the transcriptome comprising incorporating the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) into the YTH domain in mammalian cells, and identifying N6-methyladenosine (m6A) sites through high-throughput sequencing. Provided herein are in vivo methods of identifying N6-methyladenosine (m6A) sites on RNA in the transcriptome comprising genetically incorporating the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) into the YTH domain in mammalian cells, and identifying N6-methyladenosine (m6A) sites through high-throughput sequencing. In embodiments, the method of identifying N6-methyladenosine (m6A) sites in RNA comprises contacting an m6A demethylase protein which comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) with RNA. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in cells. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in mammalian cells. In embodiments, the RNA is in the transcriptome. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in the transcriptome in cells. In embodiments, the N6-methyladenosine (m6A) sites in RNA are endogenous m6A sites in the transcriptome in mammalian cells.
The disclosure provides methods of detecting endogenous m6A sites in cells throughout the transcriptome comprising contacting an m6A demethylase protein which comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) with RNA using high-throughput sequencing. In embodiments, the m6A demethylase protein comprises the compound of Formula (I) (or an embodiment thereof) or Formula (II) (or an embodiment thereof) at an N6-methyladenosine binding site of the m6A demethylase protein. In embodiments, the m6A demethylase protein is FTO or ALKBH5. In embodiments, the m6A demethylase protein is FTO. In embodiments, the m6A demethylase protein is ALKBH5. The method described herein provides an antibody-free approach for identifying m6A with single-nucleotide resolution in vivo, which will reflect m6A physiological status more closely. The methods described herein provide for high-throughput sequence mapping of all m6A in the transcriptome. In addition, the present methods can be generalized to map other RNA modifications in vivo for which a reader or binder exists.
The disclosure provides methods of treating a disease in a patient in need thereof by administering to the patient an effective amount of the compounds or compositions described herein to treat the disease. In embodiments, the disease comprises an elevated level of sialoglycan relative to a control. In embodiments, the disclosure provides methods of treating cancer in a patient in need thereof by administering to the patient an effective amount of the compounds or compositions described herein to treat the cancer. In embodiments, the disclosure provides methods of treating cancer in a patient in need thereof by administering to the patient an effective amount of the compounds or compositions described herein to treat the cancer, wherein the cancer has an elevated level of sialoglycan relative to a control (e.g., an elevated level of sialoglycan on the cancer cells relative to a control). In embodiments, the disclosure provides methods of treating cancer in a patient in need thereof by administering to the patient an effective amount of the compounds or compositions described herein to treat the cancer, wherein the cancer comprises sialoglycan (e.g., sialoglycan on the cancer cells). In embodiments, the methods further comprise detecting an elevated level of sialoglycan in a biological sample obtained from the patient. In embodiments, the cancer is melanoma or breast cancer. In embodiments, the cancer is melanoma. In embodiments, the cancer is breast cancer. In embodiments, the breast cancer is breast carcinoma. In embodiments, the breast cancer is breast adenocarcinoma.
The disclosure provides methods of treating cancer in a patient in need thereof by detecting an elevated level of sialoglycan in a biological sample obtained from the patient, and administering to the patient an effective amount of the compounds or compositions described herein. In embodiments, the cancer is melanoma or breast cancer. In embodiments, the cancer is melanoma. In embodiments, the cancer is breast cancer. In embodiments, the breast cancer is breast carcinoma. In embodiments, the breast cancer is breast adenocarcinoma.
“Disease” or “condition” refer to a state of being or health status of a patient or subject capable of being treated with a compound, pharmaceutical composition, or method provided herein. The disease may be a cancer (e.g., ovarian cancer, bladder cancer, head and neck cancer, brain cancer, breast cancer, lung cancer, cervical cancer, liver cancer, colorectal cancer, pancreatic cancer, glioblastoma, neuroblastoma, rhabdomyosarcoma, osteosarcoma, renal cancer, renal cell carcinoma, non-small cell lung cancer, uterine cancer, testicular cancer, anal cancer, bile duct cancer, biliary tract cancer, gastrointestinal carcinoid tumors, esophageal cancer, gall bladder cancer, appendix cancer, small intestine cancer, stomach (gastric) cancer, urinary bladder cancer, genitourinary tract cancer, endometrial cancer, nasopharyngeal cancer, head and neck squamous cell carcinoma, or prostate cancer).
The term “cancer” refers to all types of cancer, neoplasm or malignant tumors found in mammals, including leukemia, carcinomas and sarcomas. Exemplary cancers that may be treated with a compound or method provided herein include brain cancer, glioma, glioblastoma, neuroblastoma, prostate cancer, colorectal cancer, pancreatic cancer, medulloblastoma, melanoma, cervical cancer, gastric cancer, ovarian cancer, lung cancer, cancer of the head, Hodgkin's Disease, and Non-Hodgkin's Lymphomas. Exemplary cancers that may be treated with a compound or method provided herein include cancer of the thyroid, endocrine system, brain, breast, cervix, colon, head & neck, liver, kidney, lung, ovary, pancreas, rectum, stomach, and uterus. Additional examples include thyroid carcinoma, cholangiocarcinoma, pancreatic adenocarcinoma, skin cutaneous melanoma, colon adenocarcinoma, rectum adenocarcinoma, stomach adenocarcinoma, esophageal carcinoma, head and neck squamous cell carcinoma, breast invasive carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, non-small cell lung carcinoma, mesothelioma, multiple myeloma, neuroblastoma, glioma, glioblastoma multiforme, ovarian cancer, rhabdomyosarcoma, primary thrombocytosis, primary macroglobulinemia, primary brain tumors, malignant pancreatic insulinoma, malignant carcinoid, urinary bladder cancer, premalignant skin lesions, testicular cancer, thyroid cancer, neuroblastoma, esophageal cancer, genitourinary tract cancer, malignant hypercalcemia, endometrial cancer, adrenal cortical cancer, neoplasms of the endocrine or exocrine pancreas, medullary thyroid cancer, medullary thyroid carcinoma, melanoma, colorectal cancer, papillary thyroid cancer, hepatocellular carcinoma, or prostate cancer.
In embodiments, the cancer or tumor type is adrenocortical cancer, bladder/urothelial cancer, breast cancer, cervical cancer, cholangiocarcinoma, colorectal adenocarcinoma, diffuse large B-cell lymphoma, glioma, head and neck squamous cell carcinoma, renal cancer, renal clear cell cancer, papillary cell cancer, hepatocellular cancer, lung cancer, mesothelioma, ovarian cancer, pancreatic cancer, pheochromocytoma, paraganglioma, prostate cancer, rectal cancer, sarcoma, melanoma, stomach or esophageal cancer, testicular cancer, thyroid cancer, thymoma, uterine cancer, and/or uveal melanoma.
The term “melanoma” is taken to mean a tumor arising from the melanocytic system of the skin and other organs. Melanomas that may be treated with a compound or method provided herein include, for example, acral-lentiginous melanoma, amelanotic melanoma, benign juvenile melanoma, Cloudman's melanoma, S91 melanoma, Harding-Passey melanoma, juvenile melanoma, lentigo maligna melanoma, malignant melanoma, nodular melanoma, subungual melanoma, or superficial spreading melanoma.
The terms “treating” or “treatment” refer to any indicia of success in the therapy or amelioration of an injury, disease, pathology or condition, including any objective or subjective parameter such as abatement; remission; diminishing of symptoms or making the injury, pathology or condition more tolerable to the patient; slowing in the rate of degeneration or decline; making the final point of degeneration less debilitating; improving a patient's physical or mental well-being. The treatment or amelioration of symptoms can be based on objective or subjective parameters, including the results of a physical examination, neuropsychiatric exams, and/or a psychiatric evaluation. The term “treating” and conjugations thereof may include prevention of an injury, pathology, condition, or disease. In embodiments, treating is preventing. In embodiments, treating does not include preventing.
“Treating” or “treatment” as used herein (and as well-understood in the art) also broadly includes any approach for obtaining beneficial or desired results in a subject's condition, including clinical results. Beneficial or desired clinical results can include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions, diminishment of the extent of a disease, stabilizing (i.e., not worsening) the state of disease, prevention of a disease's transmission or spread, delay or slowing of disease progression, amelioration or palliation of the disease state, diminishment of the reoccurrence of disease, and remission, whether partial or total and whether detectable or undetectable. In other words, “treatment” as used herein includes any cure, amelioration, or prevention of a disease. Treatment may prevent the disease from occurring; inhibit the disease's spread; relieve the disease's symptoms (e.g., ocular pain, seeing halos around lights, red eye, very high intraocular pressure), fully or partially remove the disease's underlying cause, shorten a disease's duration, or do a combination of these things.
“Treating” and “treatment” as used herein include prophylactic treatment. Treatment methods include administering to a subject a therapeutically effective amount of an active agent. The administering step may consist of a single administration or may include a series of administrations. The length of the treatment period depends on a variety of factors, such as the severity of the condition, the age of the patient, the concentration of active agent, the activity of the compositions used in the treatment, or a combination thereof. It will also be appreciated that the effective dosage of an agent used for the treatment or prophylaxis may increase or decrease over the course of a particular treatment or prophylaxis regime. Changes in dosage may result and become apparent by standard diagnostic assays known in the art. In instances, chronic administration may be required. For example, the compositions are administered to the subject in an amount and for a duration sufficient to treat the patient. In embodiments, the treating or treatment is not prophylactic treatment.
“Patient” or “subject in need thereof” refers to a living organism suffering from or prone to a disease or condition that can be treated by administration of a pharmaceutical composition as provided herein. Non-limiting examples include humans, other mammals, bovines, rats, mice, dogs, monkeys, goats, sheep, cows, deer, and other non-mammalian animals. In embodiments, a patient is human.
An “effective amount”, as used herein, is an amount sufficient for a compound to accomplish a stated purpose relative to the absence of the compound (e.g., achieve the effect for which it is administered, treat a disease, reduce enzyme activity, increase enzyme activity, reduce a signaling pathway, or reduce one or more symptoms of a disease or condition). In these methods, the effective amount of the compound is an amount effective to accomplish the stated purpose of the method. An example of an “effective amount” is an amount sufficient to contribute to the treatment, prevention, or reduction of a symptom or symptoms of a disease, which could also be referred to as a “therapeutically effective amount.” A “reduction” of a symptom or symptoms (and grammatical equivalents of this phrase) means decreasing of the severity or frequency of the symptom(s), or elimination of the symptom(s). The exact amounts will depend on the purpose of the treatment, and will be ascertainable by one skilled in the art using known techniques (see, e.g., Lieberman, Pharmaceutical Dosage Forms (vols. 1-3, 1992); Lloyd, The Art, Science and Technology of Pharmaceutical Compounding (1999); Pickar, Dosage Calculations (1999); and Remington: The Science and Practice of Pharmacy, 20th Edition, 2003, Gennaro, Ed., Lippincott, Williams & Wilkins).
The term “therapeutically effective amount,” as used herein, refers to that amount of the therapeutic agent sufficient to ameliorate the disorder, as described above. For example, for the given parameter, a therapeutically effective amount will show an increase or decrease of at least 5%, 10%, 15%, 20%, 25%, 40%, 50%, 60%, 75%, 80%, 90%, or at least 100%. Therapeutic efficacy can also be expressed as “−fold” increase or decrease. For example, a therapeutically effective amount can have at least a 1.2-fold, 1.5-fold, 2-fold, 5-fold, or more effect over a control.
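The percentage and fold comparisons described above can be illustrated with a short calculation. The following is a minimal sketch, assuming that a measured parameter is compared against a control value; the function names and the numerical values are hypothetical and are not taken from the disclosure.

```python
def percent_change(treated: float, control: float) -> float:
    """Percent increase (positive) or decrease (negative) relative to control."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (treated - control) / abs(control) * 100.0


def fold_change(treated: float, control: float) -> float:
    """Ratio of the treated value to the control value."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return treated / control


if __name__ == "__main__":
    control_value = 100.0  # hypothetical control measurement
    treated_value = 150.0  # hypothetical measurement after treatment
    print(f"percent change: {percent_change(treated_value, control_value):.1f}%")
    print(f"fold change: {fold_change(treated_value, control_value):.2f}-fold")
```

In this illustration the treated value corresponds to a 50% increase, or a 1.5-fold effect, over the control.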
As used herein, the term “administering” means oral administration, administration as a suppository, topical contact, intravenous, parenteral, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal or subcutaneous administration, or the implantation of a slow-release device, e.g., a mini-osmotic pump, to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial. Other modes of delivery include, but are not limited to, the use of liposomal formulations, intravenous infusion, transdermal patches, etc. In embodiments, the administering does not include administration of any active agent other than the recited active agent.
“Biological sample” is used in accordance with its plain and ordinary meaning and encompasses any sample type that can be used in a diagnostic, prognostic, or treatment method described herein. The biological sample may be any bodily fluid, tissue or any other sample obtained from a subject or subject's body from which clinically relevant protein marker levels or antibody levels may be determined. The definition encompasses blood and other liquid samples of biological origin, solid tissue samples such as a biopsy specimen or tissue cultures or cells derived therefrom and the progeny thereof. The definition also includes samples that have been manipulated in any way after their procurement, such as by treatment with reagents, solubilization, or enrichment for certain components, such as polypeptides or proteins. The term “biological sample” encompasses a clinical sample, but also, in embodiments, includes cells in culture, cell supernatants, cell lysates, blood, serum, plasma, urine, cerebral spinal fluid, biological fluid, and tissue samples. The sample may be pretreated as necessary by dilution in an appropriate buffer solution or concentrated, if desired. In embodiments, the biological sample is a blood sample. In embodiments, the biological sample is whole blood, plasma, or serum. In embodiments, the biological sample is a cancer cell. In embodiments, the biological sample is a cancer tumor.
“Control,” “suitable control,” or “control experiment” is used in accordance with its plain ordinary meaning and refers to an experiment in which the subjects or reagents of the experiment are treated as in a parallel experiment except for omission of a procedure, reagent, or variable of the experiment. In embodiments, the control is used as a standard of comparison in evaluating experimental effects. In embodiments, a control is the measurement of the activity of a protein in the absence of a compound as described herein (including embodiments and examples). For example, a test sample can be taken from a patient suspected of having a given disease (e.g., cancer) and compared to samples from a known cancer patient, or a known normal (non-disease) individual. A control can also represent an average value gathered from a population of similar individuals, e.g., cancer patients or healthy individuals with a similar medical background, same age, weight, etc. A control value can also be obtained from the same individual, e.g., from an earlier-obtained sample, prior to disease, or prior to treatment. One of skill will recognize that controls can be designed for assessment of any number of parameters. In embodiments, a control is a negative control. One of skill in the art will understand which controls are valuable in a given situation and be able to analyze data based on comparisons to control values. Controls are also valuable for determining the significance of data. For example, if values for a given parameter are widely variant in controls, variation in test samples will not be considered as significant.
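The point that wide variation among controls reduces the significance of variation in a test sample can be illustrated numerically. The following is a minimal sketch using a z-score, which is only one of many ways to compare a test value with a set of control values and is not a method prescribed by the disclosure; the function name and all values are hypothetical.

```python
from statistics import mean, stdev


def z_score(test_value: float, control_values: list[float]) -> float:
    """Number of control standard deviations separating a test value from the control mean."""
    if len(control_values) < 2:
        raise ValueError("at least two control values are required")
    spread = stdev(control_values)  # sample standard deviation of the controls
    if spread == 0:
        raise ValueError("control values show no variation")
    return (test_value - mean(control_values)) / spread


if __name__ == "__main__":
    test_value = 13.0                     # hypothetical test-sample measurement
    tight_controls = [9.0, 10.0, 11.0]    # hypothetical controls with low variation
    wide_controls = [0.0, 10.0, 20.0]     # hypothetical controls with high variation
    print(f"tight controls: z = {z_score(test_value, tight_controls):.2f}")
    print(f"wide controls:  z = {z_score(test_value, wide_controls):.2f}")
```

Both control sets have the same mean, yet the same test value lies three standard deviations from the tightly clustered controls and only 0.3 standard deviations from the widely variant ones.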
wherein: x is an integer from 0 to 8; L1 is a bond, substituted or unsubstituted alkylene, or substituted or unsubstituted heteroalkylene; R1 is halogen, —CX13, —CHX12, —CH2X1, —OCX13, —OCH2X1, —OCHX12, —CN, —SOn1R1A, —SOv1NR1AR1B, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; X1 is independently —F, —Cl, —Br, or —I; R1A is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; R1B is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; n1 is an integer from 0 to 4; m1 is 1 or 2; and v1 is 1 or 2.
wherein: x is an integer from 1 to 8; L1 is a bond, substituted or unsubstituted alkylene, or substituted or unsubstituted heteroalkylene; R1 is halogen, —CX13, —CHX12, —CH2X1, —OCX13, —OCH2X1, —OCHX12, —CN, —SOn1R1A, —SOv1NR1AR1B, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; X1 is independently —F, —Cl, —Br, or —I; R1A is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; R1B is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; n1 is an integer from 0 to 4; m1 is 1 or 2; and v1 is 1 or 2.
wherein: R2 is a first biomolecule moiety; R3 is a second biomolecule moiety; L1 is a bond, substituted or unsubstituted alkylene, or substituted or unsubstituted heteroalkylene; x is an integer from 1 to 8; R1 is halogen, —CX13, —CHX12, —CH2X1, —OCX13, —OCH2X1, —OCHX12, —CN, —SOn1R1A, —SOv1NR1AR1B, —NHC(O)NR1AR1B, —N(O)m1, —NR1AR1B, —C(O)R1A, —C(O)—OR1A, —C(O)NR1AR1B, —OR1A, —NR1ASO2R1B, —NR1AC(O)R1B, —NR1AC(O)OR1B, —NR1AOR1B, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; X1 is independently —F, —Cl, —Br, or —I; R1A is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; R1B is hydrogen, substituted or unsubstituted alkyl, or substituted or unsubstituted heteroalkyl; n1 is an integer from 0 to 4; m1 is 1 or 2; v1 is 1 or 2; L2 is a bond, —NR2A—, —S—, —S(O)2—, —O—, —C(O)—, —C(O)O—, —OC(O)—, —N(R2A)C(O)—, —C(O)N(R2A)—, —NR2AC(O)NR2B—, —NR2AC(NH)NR2B—, —SO2N(R2A)—, —N(R2A)SO2—, —C(S)—, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, or substituted or unsubstituted heteroarylene; L3 is a bond, —N(R3A)—, —S—, —S(O)2—, —O—, —C(O)—, —C(O)O—, —OC(O)—, —N(R3A)C(O)—, —C(O)N(R3A)—, —NR3AC(O)NR3B—, —NR3AC(NH)NR3B—, —SO2N(R3A)—, —N(R3A)SO2—, —C(S)—, substituted or unsubstituted alkylene, substituted or unsubstituted heteroalkylene, substituted or unsubstituted cycloalkylene, substituted or unsubstituted heterocycloalkylene, substituted or unsubstituted arylene, or substituted or unsubstituted heteroarylene; and R2A, R2B, R3A, and R3B are independently hydrogen, substituted or unsubstituted alkyl, substituted or unsubstituted heteroalkyl, substituted or unsubstituted cycloalkyl, substituted or unsubstituted heterocycloalkyl, substituted or unsubstituted aryl, or substituted or unsubstituted heteroaryl.
The following examples are intended to further illustrate certain embodiments of the disclosure. The examples are put forth so as to provide guidance to one of ordinary skill in the art and are not intended to limit the scope of the disclosure.
Recent success in creating covalent linkages between proteins in vivo has enabled the capture of elusive protein-protein interactions as well as the development of covalent protein drugs for cancer immunotherapy. (Refs. 9-12). However, glycans contain mainly the weakly nucleophilic hydroxyl group, which is difficult to react with under mild aqueous conditions. Unlike the amino acid residues of proteins, many of which have distinct functional groups in their side chains that distinguish them, the functional groups of monosaccharides are more or less the same. Efficient differentiation between the multiple hydroxyl groups of an unprotected glycan is also difficult without enzyme catalysis. (Ref. 13). These chemical features make glycans challenging not only to synthesize but also to target selectively with biocompatible chemistry. (Ref. 14).
Sialic acid-binding immunoglobulin-like lectin 7 (Siglec-7) is an inhibitory transmembrane receptor mainly expressed on human natural killer (NK) cells. (Ref. 15). Siglec-7 recognizes sialic acid via its extracellular V-set immunoglobulin domain and signals through its cytosolic immunoreceptor tyrosine-based inhibitory motif (ITIM) to attenuate NK cell activation. (Ref. 16). The preferred glycan ligands for Siglec-7 are Neu5Acα2-8Neu5Ac-containing glycans, which bind with generally low affinity. (Refs. 17-18). Siglec-7 natively contributes to the discrimination between self and non-self, but some pathogens and cancers can up-regulate sialoglycan to evade immune surveillance and NK cell-mediated killing. (Ref. 19). New strategies to block such exploitation would be valuable for developing glycan-based immunotherapy.
Here we developed a biocompatible method, genetically encoded chemical cross-linking of proteins with sugar (GECX-sugar), to generate covalent linkages between proteins and glycans with residue specificity. We identified that sulfonyl fluoride was able to cross-link sugar via proximity-enabled reactivity, and genetically encoded into proteins a novel bioreactive unnatural amino acid (Uaa) SFY containing the sulfonyl fluoride. The SFY-incorporated Siglec-7 covalently and specifically cross-linked its substrate sialoglycan in vitro and on cancer cell surface. Moreover, through covalent binding with sialoglycan on cancer cell surface, SFY-incorporated Siglec-7 enhanced the killing of cancer cells by natural killer (NK) cells. The site-specific covalent linkage between protein and glycan enabled by GECX-sugar will facilitate the study of protein-glycan interactions and open new avenues for engineering covalent protein-glycan complex for research and therapeutic purposes.
It is challenging to covalently target glycan under mild physiological conditions. The dominant functional group, the hydroxyl group, is a weak nucleophile and difficult to chemically differentiate from water. Because we recently succeeded in covalently targeting amino acid side chains and RNA nucleotides in vivo via proximity-enabled reactivity (Ref. 9), we expected that glycan could also be covalently targeted via this mechanism.
To identify functional groups that react with glycan with reactivity driven by proximity effect, we developed a strategy involving the use of plant-and-cast small molecule cross-linkers to cross-link protein-glycan complex (
Based on our experience with targeting amino acid side chains via proximity-enabled reactivity, we designed and synthesized five plant-and-cast cross-linkers containing sulfonyl fluoride, benzyl bromide, fluorosulfate, and photocaged quinone methide (QM), respectively (
To cross-link protein-glycan interactions, we chose to work with Siglec-7, a transmembrane receptor expressed on human immune cells to regulate immune function through recognizing sialoglycans. We cloned and expressed the extracellular, sialoglycan binding V-set domain of Siglec-7 in E. coli, referred to herein as Siglec-7v or SEQ ID NO:3. The Siglec-7v was purified from inclusion bodies in high concentrations of guanidine and refolded using step-wise dialysis. The intact Siglec-7v was analyzed with electrospray ionization time-of-flight mass spectrometry (ESI-TOF MS). A major peak was observed at 15937.5 Da, corresponding to intact Siglec-7v with disulfide bond formed. A glycosphingolipid glycan microarray featuring 58 glycan epitopes was used for functional analysis of the refolded Siglec-7v (
From the binding assay of Siglec-7v, we chose G11 as the model ligand for the Siglec-7v cross-linking study. G11 is the GD3 ganglioside sugar, a tumor-associated glycan antigen. (Refs. 29-31). To facilitate identification of GD3, we added an azido group at the lactose terminus, referred to as azido-GD3 (
We then investigated if the various functional groups synthesized onto the cross-linkers (
To study the specificity of NHSF-mediated glycan cross-linking, we investigated the cross-linking of Siglec-7v with azido-GD3 by NHSF in more detail. The cross-linking band was observed only when Siglec-7v, azido-GD3, and NHSF were all present: no cross-linking was detected when any one or two of the components were omitted from the incubation (
To assess if NHSF cross-linking of glycan was distance dependent, we first determined which Lys residue of Siglec-7v was NHSF planted on via the succinimide ester to cross-link azido-GD3. We individually mutated all Lys residues on the Siglec-7v to Gly (
To further evaluate distance dependence of NHSF cross-linking and to optimize cross-linking efficiency, we next altered the length of the cross-linker. Three additional analogs of NHSF were synthesized with increasing numbers of methylene in the linker to study the effects of linker length and flexibility on the cross-linking efficiency (
Design and Genetic Incorporation of SFY into Proteins in E. coli
To introduce the identified sulfonyl fluoride group into proteins, we designed and synthesized the unnatural amino acid (Uaa) o-sulfonyl fluoride-O-methyltyrosine (SFY) (
We next evolved a mutant pyrrolysyl-tRNA synthetase (PylRS) specific for SFY to genetically incorporate it into proteins. A PylRS mutant library was generated by mutating residues Ala302, Leu305, Tyr306, Leu309, Ile322, Asn346, Cys348, Tyr384, Val401, and Trp417 of the Methanosarcina mazei PylRS using the small-intelligent mutagenesis approach, and subjected to selection as described. (Refs. 33-35). A hit showing SFY-dependent phenotype was identified, which contained the following mutations (306L/309A/346A/348M/417T) and was named as MmSFYRS (
To evaluate the incorporation specificity of SFY into proteins in E. coli, we expressed the superfolder green fluorescent protein (sfGFP) gene containing a TAG codon at position 2 (sfGFP-2TAG) with the tRNAPyl/MmSFYRS pair in E. coli. When 1 mM SFY was added to the growth media, full-length sfGFP(2SFY) was produced at a significant level (
We further transplanted the mutations of MmSFYRS into Methanomethylophilus alvus PylRS to generate MaSFYRS, as transplanting mutations from a Methanosarcina barkeri PylRS mutant into M. alvus PylRS has been shown to increase the Uaa incorporation efficiency. (Ref. 27). Expression of the sfGFP-2TAG gene together with the Ma-tRNAPyl/MaSFYRS pair in E. coli also showed SFY-dependent production of full-length sfGFP (
To prepare SFY incorporated Siglec-7v proteins, we expressed Siglec-7v(104TAG) gene with the Ma-tRNAPyl/MaSFYRS in E. coli in the presence of 1 mM of SFY. The expressed Siglec-7v(104SFY) protein was purified and refolded similarly as WT Siglec-7v. The Siglec-7v(104SFY) protein was produced with a yield of 5 mg/L, and the WT Siglec-7v yielded 20 mg/L. The intact mass of the purified Siglec-7v(104SFY) was analyzed with ESI-TOF MS (
We then determined if SFY could enable Siglec-7v to cross-link the bound glycan ligand. Inspired by NHSF-mediated cross-linking of azido-GD3 through Lys on Siglec-7v protein, we incorporated SFY into individual Lys sites of Siglec-7v, including sites 20, 24, 75, 104, 127, 131, and 135. Each SFY incorporated Siglec-7v mutant was incubated with 2 mM azido-GD3 for 1 h at room temperature, followed with click labeling of alkyne-biotin and Western blot analysis using streptavidin-HRP (
We further explored if SFY incorporated Siglec-7v could cross-link sialoglycan on mammalian cell surface. SK-MEL-5 is a human melanoma cell line with a high level of sialylation on cell surface. (Ref. 37). We incubated SK-MEL-5 cells with different concentrations of WT Siglec-7v or Siglec-7v(127SFY), followed with washing. Siglec-7v proteins bound to cell surface were stained with a fluorescently labeled antibody specific for the His×6 tag appended at the C-terminus of Siglec-7v, and quantified with flow cytometry. Remarkably, cells incubated with Siglec-7v(127SFY) showed higher mean fluorescence intensity (MFI) over those incubated with WT Siglec-7v in all protein concentrations tested (
Many tumors upregulate cell surface sialic acids, which bind with Siglec-7 on human NK cells, inhibiting NK cell cytotoxicity and evading immune-surveillance. Since Siglec-7v(127SFY) could irreversibly cross-link with cell surface sialoglycan, we reasoned that it would competitively block the interaction of tumor cell surface sialoglycan with Siglec-7 of NK cells, thus enhancing NK cell killing of tumor cells (
To test this hypothesis, we incubated Siglec-7v(127SFY) with three hypersialylated human cancer cell lines, SK-MEL-28 (melanoma), BT-20 (breast carcinoma), and MCF-7 (breast adenocarcinoma), respectively for 2 h to allow binding and cross-linking, using WT Siglec-7v as the control. (Ref. 39). The cells were washed and then subjected to incubation with human NK-92 cells. NK-92 is a cytotoxic human NK cell line that is currently in clinical trials for cancer treatment. (Ref. 40). Cancer cell viability was evaluated with propidium iodide staining and quantified with flow cytometry. The percent of cancer cells killed by NK-92 cells was calculated (
A major advantage of genetically encoding SFY is the ability to introduce the sulfonyl fluoride into the Siglec-7v protein site specifically. Although sulfonyl fluoride could be installed on Siglec-7v through pretreating Siglec-7v with NHSF, this approach resulted in the installation of sulfonyl fluoride at multiple Lys sites nonselectively. Consequently, NHSF-pretreated Siglec-7v failed to bind with azido-GD3 in vitro and with sialoglycan on BT-20 or SK-MEL-28 cells. As expected, NHSF-pretreated Siglec-7v also had no effect in enhancing NK cell killing of cancer cells. These results indicate the importance of site specificity enabled by genetic encoding.
We identified sulfonyl fluoride as able to cross-link glycan via proximity-enabled reactivity by applying plant-and-cast cross-linkers onto a protein-glycan complex. A novel bioreactive Uaa, SFY, bearing sulfonyl fluoride was then designed and genetically incorporated into proteins via genetic code expansion. SFY-incorporated Siglec-7v specifically cross-linked its sialoglycan ligand in vitro and on cancer cell surface. Moreover, through covalently cloaking sialoglycan on cancer cell surface, Siglec-7v(SFY) significantly enhanced NK cell killing of cancer cells over the noncovalent WT Siglec-7v.
Protein-glycan interactions are noncovalent in nature. Through developing the GECX-sugar technology, here we changed this paradigm and enabled the site-specific introduction of covalent linkages into interacting protein-glycan for the first time. The latent bioreactive Uaa SFY is genetically encoded into the protein to achieve residue specificity for the covalent linkage. SFY remains stable inside cells and in the protein. The reaction of SFY with glycan is enabled by the close proximity of the SFY side chain to the glycan hydroxyl group when the protein binds to the glycan. Therefore, through strategically placing SFY into different sites of the protein, monosaccharide selectivity for the bound glycan can also be achieved for the covalent linkage. This site-specificity for both protein and glycan of GECX-sugar will enable the precise engineering of covalent linkages to cross-link protein to the interacting glycan. Such irreversible cross-linking fundamentally overcomes the general low affinity of glycan toward protein. Similar to how covalent cross-linking of proteins by GECX has enabled the identification of weak protein-protein interactions, GECX-sugar should provide a new route to the identification of the weak and transient protein-glycan interactions. (Ref. 11). In addition, in contrast and complementary to metabolic pathway engineering, which modifies the glycan, GECX-sugar is able to covalently target endogenous glycans and is thus suitable for in vivo studies and therapeutic applications. (Refs. 41-42). Cross-linking of protein to the unmodified glycan converts the binding protein into an irreversible inhibitor for the native protein-glycan interaction, which can be exploited for glycan-based diagnostic and therapeutic applications, such as enhancing NK cell killing of cancer cells demonstrated here. In essence, GECX-sugar is able to transform a glycan binding protein into a non-antibody binder for specific glycan with high affinity.
Siglec-7v(SFY) covalently cross-linked to its sialoglycan ligand specifically. Based on the SuFEx reactivity of sulfonyl fluoride and the SFY incorporation site in Siglec-7v, SFY should have reacted with the hydroxyl group of sialic acid. As all monosaccharides contain the hydroxyl group, we expect that SFY can be incorporated into other glycan binding proteins to covalently target various glycans, which will be verified experimentally in the future. Siglec-7v(SFY) significantly increased NK killing of cancer cells in vitro, but its anti-tumor effect in vivo awaits demonstration. Since covalent PD-1 containing a SuFEx bioreactive Uaa shows a dramatically enhanced anti-tumor effect compared with the noncovalent WT PD-1 in multiple xenograft mouse models, primarily due to decoupling of the pharmacodynamics and pharmacokinetics via the covalent mechanism, we expect that the difference in enhancing NK killing between Siglec-7v(SFY) and WT Siglec-7v would be similarly more drastic in vivo than in vitro. (Ref. 12).
In summary, GECX-sugar enables site-specific introduction of covalent linkages between proteins and glycans, providing a solution to the long-standing challenge of low affinity and weak interaction. GECX-sugar will thus advance the basic study of glycobiology and inspire new avenues for protein diagnostics and therapeutics via effectively targeting glycan.
Primers were synthesized by Integrated DNA Technologies (IDT), and all plasmids were sequenced by GENEWIZ. All reagents were obtained from New England Biolabs.
Positions K20, K24, K75, K104, K127, N129, I130, K131, K135 are in bold and underlined.
The Siglec-7v gene was synthesized by IDT. Residue 20, 24, 75, 104, 127, 129, 130, 131 or 135 of Siglec-7v was mutated to an amber stop codon TAG, respectively, via site-directed mutagenesis using primers in
Bold Underlined: amber codon TAG at 2nd position
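As an illustration of the amber-codon substitutions described for Siglec-7v and sfGFP, the following minimal sketch replaces the codon of a chosen residue with TAG in a coding sequence. The function name `introduce_amber` and the toy sequence are hypothetical and are not taken from the disclosure; the sketch also assumes residue numbering starts at the first codon of the sequence supplied, which may differ from the numbering used for Siglec-7v.

```python
def introduce_amber(cds: str, residue: int) -> str:
    """Return the coding sequence with the codon of a 1-based residue replaced by TAG."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    if not 1 <= residue <= n_codons:
        raise ValueError(f"residue must be between 1 and {n_codons}")
    start = (residue - 1) * 3
    # Swap the three bases of the chosen codon for the amber stop codon.
    return cds[:start] + "TAG" + cds[start + 3:]


# Hypothetical toy open reading frame (Met-Lys-Gly-Glu-stop), not a real gene.
toy_cds = "ATGAAAGGTGAATAA"
print(introduce_amber(toy_cds, 2))
```

In practice the substitution is made by site-directed mutagenesis with designed primers; the sketch only shows the resulting sequence change.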
pEvol-MmSFYRS. The pEvol-MmSFYRS plasmid was generated by introducing the MmSFYRS encoding gene into the pEvol vector via homologous recombination. Briefly, the SFYRS gene was amplified with primers MmSFYRS-SpeI-F and MmSFYRS-SalI-R, purified, and ligated into the pEvol vector (linearized with SpeI and SalI) with Exnase™ II.
pEvol-MaPylRS-wt. According to gene alignment, the active sites of Methanosarcina mazei PylRS (MmPylRS) and Methanomethylophilus alvus PylRS (MaPylRS) are highly conserved. However, MaPylRS and its derivatives usually present better solubility than synthetases derived from MmPylRS, which may lead to higher incorporation efficiency. In order to enhance the incorporation efficiency of SFY, we decided to examine the incorporation of SFY using the Ma-tRNAPyl/PylRS pair. To achieve this goal, a pEvol-MaPylRS plasmid encoding an orthogonal pair of wt-MaPylRS and evolved MaPylT was first constructed. Briefly, the wild-type MaPylRS gene (Supp Ref. 1) was chemically synthesized, amplified with MaSFYRS-SpeI-F/MaSFYRS-SalI-R primers, and introduced into the pEvol vector via homologous recombination. Then an evolved Ma-pyrrolysyl-tRNA gene MaPylT(6) (Supp Ref 2) was introduced into the pEvol vector via site-directed mutagenesis with MaPylT(6)-F/R primers. The resultant plasmid was named pEvol-MaPylRS-wt and used as the template to generate pEvol-MaSFYRS.
pEvol-MaSFYRS. Mutations carried by MmSFYRS were directly transplanted into MaPylRS via PCR amplification with primers (MaSFYRS-R1, -F2, -R2, -F3, -R3) and then ligated into the pEvol vector via multiple-fragment homologous recombination. To further improve the incorporation efficiency of SFY, the evolved MaPylT(6) was swapped with the wild-type MaPylT by using site-directed mutagenesis with MaPylT(wt)-F/R primers to afford the pEvol-MaSFYRS plasmid. As shown in Figure S5, the WT-MaPylT afforded much higher incorporation efficiency than the evolved MaPylT(6) (indicated as MaPylT-mut in the figure).
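For bookkeeping, the shorthand used for the MmSFYRS mutations (306L/309A/346A/348M/417T) can be expanded into full substitution names by looking up the wild-type residues listed for the MmPylRS library sites. The sketch below is illustrative only: the helper name `expand_mutations` is hypothetical, and it does not attempt to map Mm positions onto MaPylRS numbering, which the text does not provide.

```python
import re

# Three-letter to one-letter codes for the residues named at the library sites.
THREE_TO_ONE = {
    "Ala": "A", "Leu": "L", "Tyr": "Y", "Ile": "I",
    "Asn": "N", "Cys": "C", "Val": "V", "Trp": "W",
}

# Randomized MmPylRS positions as listed in the description.
LIBRARY_SITES = [
    "Ala302", "Leu305", "Tyr306", "Leu309", "Ile322",
    "Asn346", "Cys348", "Tyr384", "Val401", "Trp417",
]


def expand_mutations(shorthand: str, sites: list[str]) -> list[str]:
    """Turn '306L/309A' style shorthand into 'Y306L', 'L309A' style names."""
    wild_type = {int(site[3:]): THREE_TO_ONE[site[:3]] for site in sites}
    expanded = []
    for token in shorthand.split("/"):
        match = re.fullmatch(r"(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        position, new_residue = int(match.group(1)), match.group(2)
        if position not in wild_type:
            raise ValueError(f"position {position} is not a library site")
        expanded.append(f"{wild_type[position]}{position}{new_residue}")
    return expanded


print("/".join(expand_mutations("306L/309A/346A/348M/417T", LIBRARY_SITES)))
```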
The scheme for chemo-enzymatic synthesis of azido-GD3 and azido-lac was shown in
Synthesis of 2-(4-((fluorosulfonyl)oxy)phenyl)acetic acid (2). 2-(4-hydroxyphenyl)acetic acid (1) was converted to fluorosulfate using [4-(acetylamino)phenyl]imidodisulfuryl difluoride (AISF). (Supp Ref. 3) Compound 1 (1.5 g, 9.9 mmol) and AISF (3.4 g, 10.8 mmol) were dissolved in 50 mL anhydrous THF. Then 1,8-Diazabicyclo[5.4.0]undec-7-ene (DBU, 3.2 g, 22 mmol) was added dropwise at room temperature (r.t.). The mixture was stirred at r.t. for 10 min. Then 200 mL EtOAc was added to dilute the reaction mixture and the organic phase was washed sequentially with H2O (100 mL) and brine (100 mL). The organic phase was dried over anhydrous Na2SO4 and evaporated under reduced pressure to give the crude product, which was then purified by column chromatography (silica gel, DCM:MeOH=50:1) to give a white solid (1.2 g, 53%).
Synthesis of 2,5-dioxopyrrolidin-1-yl 2-(4-((fluorosulfonyl)oxy)-phenyl)acetate (NHFS). To a stirred solution of compound 2 (500 mg, 2.1 mmol) and N-hydroxysuccinimide (NHS, 358 mg, 3.1 mmol) in 4 mL anhydrous DMF was added N-(3-Dimethylaminopropyl)-N′-ethylcarbodiimide hydrochloride (EDC·HCl, 611 mg, 3.2 mmol). The mixture was stirred at r.t. for 24 h. Then the reaction was quenched with the addition of H2O (30 mL) and the mixture was extracted with EtOAc (2×30 mL). The combined organic phase was dried over anhydrous Na2SO4 and evaporated under reduced pressure to give the crude product, which was purified by column chromatography (silica gel, DCM:EtOAc=50:1) to give a white solid (452 mg, 65%). 1H NMR (CDCl3): δ 7.46 (d, J=8.8 Hz, 2H), 7.34 (d, J=8.8 Hz, 2H), 3.98 (s, 2H), 2.84 (s, 4H). 13C NMR (CDCl3): δ 169.0, 166.2, 149.6, 132.4, 131.5, 121.5, 37.0, 25.7. NHFS itself gave a poor signal in mass spectrometry analysis, so it was converted to compound 4 for mass analysis. Briefly, 20 mM NHFS, 20 mM tert-butyl (3-aminopropyl)carbamate (3), and 20 mM NaOH were incubated in H2O at r.t. for 2 h. Then the solution was subjected to mass spectrometry analysis. HRMS calcd for C16H23FN2Na2O6S [M+Na]+ 413.1153, found: 413.1158.
Compound 5 was synthesized from 4-(fluorosulfonyl)benzoic acid using SOCl2 according to literature procedure. (Supp Ref 4). Synthesis of compound 7. To a stirred solution of compound 6 (2.5 mmol) and triethylamine (Et3N, 5.0 mmol) in H2O (2 mL) was added dropwise compound 5 (2.5 mmol) in THF (4 mL) at 0° C. The mixture was allowed to warm to room temperature and was stirred for 2 h. Then the reaction was quenched with the addition of H2O (25 mL) and the mixture was extracted with EtOAc (2×25 mL). The combined organic phase was dried over anhydrous Na2SO4 and evaporated under reduced pressure to give the crude product, which was purified by column chromatography (silica gel, DCM:MeOH=20:1) to give a white solid (about 70%).
Synthesis of compound 8. Compound 7 (0.72 mmol), N-Hydroxysulfosuccinimide sodium salt (0.72 mmol), and N,N′-Dicyclohexylcarbodiimide (DCC, 0.72 mmol) were dissolved in 1.5 mL anhydrous DMF. The mixture was stirred at r.t. for 24 h under N2. A white precipitate formed during the reaction and was removed by filtration. 20 mL diethyl ether was added to the filtrate, and the white precipitate that formed was collected by centrifugation (10 min, 3,000 rpm). The white precipitate was redissolved in 4 mL MeOH, 20 mL diethyl ether was added, and the white precipitate that formed was collected by centrifugation (10 min, 3,000 rpm). The white precipitate was further purified by preparative HPLC (C18 column) using H2O/ACN (0.05% TFA) as the mobile phase (˜65%).
Sodium 1-((3-(4-(fluorosulfonyl)benzamido)propanoyl)oxy)-2,5-dioxopyrrolidine-3-sulfonate (8a, NHSF 2C). 1H NMR (DMSO): δ 9.08 (t, J=5.6 Hz, 1H), 8.26 (d, J=8.4 Hz, 2H), 8.15 (d, J=8.4 Hz, 2H), 3.94 (d, J=8.0 Hz, 1H), 3.63-3.58 (m, 2H), 3.14-3.07 (m, 1H), 3.02 (t, J=6.4 Hz, 2H), 2.85 (dd, J=16.0 Hz, J=2.4 Hz, 1H). 13C NMR (DMSO): δ 168.7, 165.2, 164.7, 141.2, 133.6 (d, J=23 Hz, C—F), 129.1, 128.7, 56.3, 35.2, 31.0, 30.2. HRMS calcd for C14H12FN2Na2O10S2[M+Na]+496.9707, found: 496.9716.
Sodium 1-((4-(4-(fluorosulfonyl)benzamido)butanoyl)oxy)-2,5-dioxopyrrolidine-3-sulfonate (8b, NHSF 3C). 1H NMR (DMSO): δ 8.94 (t, J=5.6 Hz, 1H), 8.24 (d, J=8.4 Hz, 2H), 8.17 (d, J=8.4 Hz, 2H), 3.95 (d, J=8.8 Hz, 1H), 3.40-3.35 (m, 2H), 3.20-2.83 (m, 2H), 2.78 (t, J=7.6 Hz, 2H), 1.94-1.87 (m, 2H). 13C NMR (DMSO): δ 168.8, 165.4, 164.2, 141.7, 133.3 (d, J=24 Hz, C—F), 129.0, 128.6, 56.3, 38.6, 31.0, 27.9, 24.0. HRMS calcd for C15H14FN2Na2O10S2 [M+Na]+ 510.9864, found: 510.9851.
Sodium 1-((8-(4-(fluorosulfonyl)benzamido)octanoyl)oxy)-2,5-dioxopyrrolidine-3-sulfonate (8c). 1H NMR (DMSO): δ 8.84 (t, J=5.6 Hz, 1H), 8.24 (d, J=8.4 Hz, 2H), 8.15 (d, J=8.4 Hz, 2H), 3.94 (d, J=8.4 Hz, 1H), 3.30-3.26 (m, 2H), 3.15-2.82 (m, 2H), 2.65 (t, J=7.6 Hz, 2H), 1.64-1.52 (m, 4H), 1.38-1.30 (m, 6H). 13C NMR (DMSO): δ 168.8, 165.3, 164.6, 141.5, 133.4 (d, J=24 Hz, C—F), 129.2, 128.6, 56.3, 31.0, 30.2, 28.8, 28.2, 28.0, 26.3, 24.3. HRMS calcd for C19H23FN2NaO10S2 [M+H]+ 545.0670, found: 545.0660.
Synthesis of tert-butyl (S)-2-((tert-butoxycarbonyl)amino)-3-(4-methoxy-3-thiocyanatophenyl)propanoate (10). To a stirred solution of Selectfluor (2.4 g, 6.8 mmol) and NaSCN (550 mg, 6.8 mmol) in ACN (20 mL) was added compound 9 (800 mg, 2.27 mmol) in ACN (5 mL) at 0° C. under N2. The reaction mixture was allowed to stir at r.t. overnight. Then the solvent was removed under reduced pressure and the residue was dissolved in 25 mL EtOAc. The organic phase was washed sequentially with H2O (25 mL) and brine (25 mL). The organic phase was dried over anhydrous Na2SO4 and evaporated under reduced pressure to give the crude product, which was then purified by column chromatography (silica gel, Hexane:EtOAc=5:1) to give a yellow solid (637 mg, 69%). 1H NMR (CDCl3): δ 7.33 (d, J=2.0 Hz, 1H), 7.18 (dd, J=8.4 Hz, J=2.0 Hz, 1H), 6.85 (d, J=8.4 Hz, 1H), 5.04 (d, J=8.0 Hz, 1H), 4.43-4.38 (m, 1H), 3.89 (s, 3H), 3.10-2.95 (m, 2H), 1.43 (d, 18H). 13C NMR (CDCl3): δ 170.7, 155.6, 155.1, 131.8, 131.1, 130.7, 111.5, 110.4, 82.6, 80.0, 56.4, 54.9, 37.7, 28.4, 28.2. HRMS calcd for C20H28N2NaO5S [M+Na]+ 431.1611, found: 431.1627.
Synthesis of tert-butyl (S)-2-((tert-butoxycarbonyl)amino)-3-(3-(fluorosulfonyl)-4-methoxyphenyl)propanoate (12). To a solution of 10 (620 mg, 1.52 mmol) in EtOH (3.5 mL) was added Na2S·9H2O (730 mg, 3.0 mmol) in H2O (12 mL) at 60° C. The reaction mixture was then heated at 85° C. for 2 h. The reaction mixture was then allowed to cool down to r.t., and 10 mL H2O was added. The mixture was then adjusted to pH 6.5 with acetic acid and extracted with EtOAc (10 mL×3). The combined organic phase was dried over anhydrous Na2SO4 and evaporated under reduced pressure to give the crude thiol product as a yellow solid, which was immediately used for the next step. To a stirred solution of N-chlorosuccinimide (0.65 g, 4.9 mmol) in 2M HCl (0.6 mL) and acetonitrile (2.5 mL) was added dropwise the crude thiol in acetonitrile (1 mL) at 0° C. The mixture was stirred at 0° C. for another 30 min. The mixture was then diluted with EtOAc (10 mL), and the organic phase was washed sequentially with H2O (10 mL) and brine (10 mL). The organic phase was dried over anhydrous Na2SO4 and evaporated under reduced pressure to give the crude sulfonyl chloride (compound 11) product as a yellow oil (604 mg).
Half of the newly prepared crude sulfonyl chloride was used for the next step. To a stirred solution of compound 11 (300 mg, 0.67 mmol) in anhydrous THF (2 mL) was added 1.3 mL 1M tetrabutylammonium fluoride (TBAF, 1.33 mmol) in THF. The mixture was stirred at r.t. for 1 h and the completion of the reaction was monitored by mass spectrometry. The mixture was then diluted with EtOAc (10 mL), and the organic phase was washed sequentially with H2O (10 mL) and brine (10 mL). The organic phase was dried over anhydrous Na2SO4 and evaporated under reduced pressure to give the crude product, which was then purified by column chromatography (silica gel, DCM:EtOAc=25:1) to give compound 12 as a white solid (76 mg, 24% for 3 steps). 1H NMR (CDCl3): δ 7.70 (d, J=2.0 Hz, 1H), 7.51 (dd, J=8.4 Hz, J=2.0 Hz, 1H), 7.03 (d, J=8.4 Hz, 1H), 5.07 (d, J=8.4 Hz, 1H), 4.43-4.38 (m, 1H), 3.98 (s, 3H), 3.16-2.98 (m, 2H), 1.42 (d, 18H). 13C NMR (CDCl3): δ 170.3, 157.1, 155.1, 138.5, 132.1, 129.4, 121.1 (d, J=23 Hz, C—F), 112.9, 83.0, 80.1, 56.7, 54.8, 37.3, 28.4, 28.1. HRMS calcd for C19H28FNNaO7S [M+Na]+ 456.1463, found: 456.1473.
Synthesis of (S)-1-carboxy-2-(3-(fluorosulfonyl)-4-methoxyphenyl)ethan-1-aminium (SFY). Compound 12 (76 mg, 0.18 mmol) was stirred in 4 M HCl in dioxane (0.5 mL) at r.t. for 24 h. Then 5 mL diethyl ether was added to the reaction mixture, and the white precipitate that formed was collected by centrifugation (10 min, 3,000 rpm). The white solid was further dried under reduced pressure to give SFY in HCl salt form (52 mg, 92%). 1H NMR (D2O): δ 7.87 (d, J=2.4 Hz, 1H), 7.74 (dd, J=8.8 Hz, J=2.4 Hz, 1H), 7.32 (d, J=8.8 Hz, 1H), 4.30 (t, J=6.8 Hz, 1H), 4.01 (s, 3H), 3.37-3.24 (m, 2H). 13C NMR (D2O): δ 171.3, 157.5, 139.4, 131.6, 126.8, 119.5 (d, J=21 Hz, C—F), 114.3, 56.7, 54.0, 34.4. HRMS calcd for C10H13FNO5S [M+H]+ 278.0493, found: 278.0510.
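The calculated HRMS values reported above can be cross-checked from the ion formulas. The sketch below is a minimal illustration, not part of the described procedure: it sums standard monoisotopic masses, subtracts one electron mass per positive charge, and reports the deviation of a found value in ppm. The helper names `cation_mz` and `ppm_error` are hypothetical, and the formula passed in is assumed to be that of the whole ion (for example C10H13FNO5S for SFY [M+H]+).

```python
import re

# Standard monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "H": 1.00782503, "C": 12.0, "N": 14.0030740, "O": 15.9949146,
    "F": 18.9984032, "Na": 22.9897693, "S": 31.9720707,
}
ELECTRON_MASS = 0.00054858


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C10H13FNO5S' (no brackets)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def cation_mz(ion_formula: str, charge: int = 1) -> float:
    """m/z of a cation whose full elemental composition is ion_formula."""
    mass = sum(MONOISOTOPIC[el] * n for el, n in parse_formula(ion_formula).items())
    return (mass - charge * ELECTRON_MASS) / charge


def ppm_error(calculated: float, found: float) -> float:
    """Signed deviation of the found m/z from the calculated m/z, in ppm."""
    return (found - calculated) / calculated * 1e6


print(f"SFY [M+H]+ calcd m/z: {cation_mz('C10H13FNO5S'):.4f}")
print(f"Compound 12 [M+Na]+ calcd m/z: {cation_mz('C19H28FNNaO7S'):.4f}")
print(f"SFY deviation: {ppm_error(278.0493, 278.0510):.1f} ppm")
```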
For wildtype Siglec-7v expression, the plasmid pBAD-siglec-7v was transformed into E. coli BL21(DE3). For the incorporation of SFY into Siglec-7v, the plasmid pBAD-siglec-7v(TAG) was co-transformed with pEVOL-SFYRS into E. coli BL21(DE3), and plated on LB agar plate supplemented with 100 μg/mL ampicillin and 34 μg/mL chloramphenicol. Several colonies were picked and inoculated in 50 mL 2×YT (5 g/L NaCl, 16 g/L Tryptone, 10 g/L Yeast extract). The cells were grown at 37° C., 220 rpm to an OD of 0.5; the medium was then supplemented with either 0.2% L-arabinose only or 0.2% L-arabinose plus 1 mM SFY, and expression was carried out at 25° C., 220 rpm for 18-22 h. Cells were harvested at 3000 g, 4° C. for 10 min. For protein purification, cells were resuspended in lysis buffer (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 20 mM imidazole) supplemented with EDTA-free protease inhibitor cocktail and 1 μg/mL DNase. The cells were lysed by sonication, after which the cell lysate was centrifuged at 10,000 g at 4° C. for 15 min. The pellet was suspended in guanidine buffer (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 6 M guanidine) and centrifuged at 10,000 g at 4° C. for 15 min. The supernatant was collected and incubated with 500 μL Ni-NTA affinity resin. The resin was washed 3 times with guanidine wash buffer (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 20 mM imidazole, 6 M guanidine), and then the protein was eluted twice with 20 mM Tris-HCl pH 8.0, 200 mM NaCl, 300 mM imidazole, 6 M guanidine. The eluted protein was diluted into dialysis buffer (20 mM Tris-HCl pH 8.0, 200 mM NaCl) with 4 M guanidine to a final concentration of 0.1 mg/mL and dialyzed sequentially against dialysis buffer with 2 M and then 0 M guanidine for 8 hr each at 4° C. The refolded protein was concentrated to 1 mg/mL for further use.
sfGFP(2SFY)
For the incorporation of SFY into sfGFP, the plasmid pBAD-sfGFP(2TAG) was co-transformed with pEVOL-SFYRS into E. coli BL21(DE3), and plated on LB agar plate supplemented with 100 μg/mL ampicillin and 34 μg/mL chloramphenicol. Several colonies were picked and inoculated in 50 mL 2×YT (5 g/L NaCl, 16 g/L Tryptone, 10 g/L Yeast extract). The cells were grown at 37° C., 220 rpm to an OD of 0.5; the medium was then supplemented with either 0.2% L-arabinose only or 0.2% L-arabinose plus 1 mM SFY, and expression was carried out at 18° C., 220 rpm for 18-22 h. Cells were harvested at 3000 g, 4° C. for 10 min. For protein purification, cells were resuspended in lysis buffer (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 20 mM imidazole, EDTA-free protease inhibitor cocktail, 1 μg/mL DNase). The cells were lysed by sonication, after which the cell lysate was centrifuged at 10,000 g at 4° C. for 15 min. The supernatant was collected and incubated with 500 μL Ni-NTA affinity resin. The resin was washed 3 times with wash buffer (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 20 mM imidazole), and then the protein was eluted twice with 20 mM Tris-HCl pH 8.0, 200 mM NaCl, 300 mM imidazole.
For the incorporation of SFY into Z protein, the plasmid pBAD-Z(24TAG) was co-transformed with pEVOL-SFYRS into E. coli BL21(DE3), and plated on LB agar plate supplemented with 100 μg/mL ampicillin and 34 μg/mL chloramphenicol. The Z protein expression and purification were the same as described above.
Siglec-7v (20 μg/mL) was incubated with the array at room temperature for 3 h with gentle shaking, then washed 3 times at room temperature with TSMT buffer (20 mM Tris-HCl, pH 7.4, 150 mM NaCl, 2 mM CaCl2, 2 mM MgCl2, and 0.05% Tween-20). Alexa Fluor 647 conjugated 6× His tag antibody was diluted and incubated with the array at room temperature for 2 h with gentle shaking. After 3 washes with TSMT buffer, the array was scanned at 635 nm with a GenePix 4000B. The microarray was analyzed according to the fluorescence intensity, and the data were plotted as a two-dimensional bar chart. The y-axis is the fluorescence intensity, which reveals the relative protein binding signal for each glycan.
To test if a small molecule cross-linker could cross-link Siglec-7v with azido-GD3 and azido-lac, 60 μM Siglec-7v was incubated with 2 mM azido-GD3 or azido-lac in PBS buffer, pH 7.4, at room temperature for 1 hr. The solution was treated with or without 0.3 mM NHSF, NHBr, NHFS, NHQM, or HoQM at room temperature for 1 hr. The NHQM or HoQM samples were then illuminated with or without UV at 365 nm for 15 min. After that, 200 μM alkyne-biotin, 0.05 mM CuSO4, 1 mM THPTA, and 1 mM sodium ascorbate were added and the reaction mixture was incubated at room temperature in the dark for 0.5 hours. Samples were then boiled at 95° C. for 5 min and analyzed by Western blot using a 6× His tag antibody or streptavidin-horseradish peroxidase (HRP).
Selection of SFY-specific synthetase (SFYRS)
DH10B cells (100 μL) harboring the pREP positive selection reporter were transformed with 122 ng of pBK-TK3 library via electroporation. The electroporated cells were subjected to selections by following procedures previously described. (Supp Refs 5-7). The pBK plasmids encoding the selected SFYRS gene were extracted by miniprep and separated from the reporter plasmids by DNA electrophoresis. The resulting pBK plasmids were analyzed by Sanger sequencing.
SK-MEL-5 cells were plated into a 6-well plate and incubated for 24 h. Then 100 μL Vibrio cholerae sialidase (Sigma) or 100 μL PBS was added with 400 μL medium without FBS for a 24 h treatment. To test if Siglec-7v(127SFY) could cross-link sialoglycan on mammalian cell surface, different concentrations of Siglec-7v or Siglec-7v(127SFY) were incubated with SK-MEL-5 cells pre-treated with or without sialidase in PBS buffer, pH 7.4, at 37° C. in a 5% CO2 incubator for 2 h. Cells were washed 3 times in PBS buffer and labeled with Alexa Fluor 488 conjugated 6× His tag monoclonal antibody at room temperature for 1 h. Cells were harvested for fluorescence-activated cell sorting (FACS) analysis.
Target cells were pre-labeled with CellTrace far red dye (Thermo Fisher Scientific) at room temperature for 10 min. Siglec-7v or Siglec-7v(127SFY) at different concentrations was incubated with 5×10^4 target cells in PBS buffer, pH 7.4, at 37° C. in a 5% CO2 incubator for 2 h. Cells were washed 3 times in PBS buffer and subsequently incubated with 5×10^5 NK cells in the incubator for 4 h. Propidium iodide (10 μg/mL, Sigma) was added to each sample, and NK cell cytotoxicity was evaluated by fluorescence-activated cell sorting (FACS) analysis. Cells were acquired after electronic gating on CellTrace far red dye, and the percentage of propidium iodide-positive cells was determined. Cell death percentage was calculated as experimental % death minus control % death. Control % death was determined using the group without protein incubation.
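The cytotoxicity calculation described above can be sketched as follows. This is an illustrative outline only: the function names and the example percentages are hypothetical, each flow cytometry event is reduced to two booleans (CellTrace far red positive, propidium iodide positive), and the cell numbers are read as 5×10^5 NK cells and 5×10^4 target cells on the assumption that the exponents were flattened in the text.

```python
def percent_pi_positive(events: list[tuple[bool, bool]]) -> float:
    """Percent PI-positive events among CellTrace far red positive (target) events."""
    target_pi = [pi_positive for is_target, pi_positive in events if is_target]
    if not target_pi:
        raise ValueError("no target-cell events after gating")
    return 100.0 * sum(target_pi) / len(target_pi)


def specific_death(experimental_pct: float, control_pct: float) -> float:
    """Cell death percentage: experimental % death minus control % death."""
    return experimental_pct - control_pct


def effector_to_target_ratio(n_effector: float, n_target: float) -> float:
    """Ratio of NK (effector) cells to target cells."""
    return n_effector / n_target


# Hypothetical gated events: (far red positive, PI positive).
events = [(True, True), (True, False), (True, False), (True, True), (False, True)]
print(percent_pi_positive(events))             # NK-cell event is excluded by gating
print(specific_death(35.0, 12.5))              # hypothetical percentages
print(effector_to_target_ratio(5e5, 5e4))      # cell numbers as read from the text
```

Subtracting the no-protein control removes the baseline NK killing, so the reported value reflects only the change attributable to the Siglec-7v protein.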
The intact protein mass was obtained using electrospray ionization mass spectrometry (ESI-MS) with a QTOF Ultima (Waters) mass spectrometer, operating under positive electrospray ionization mode, connected to an LC-20AD (Shimadzu) liquid chromatography unit. For tandem mass spectrometry, peptides were separated by nano-LC Ultimate 3000 high-performance liquid chromatography system (Thermo Fisher). The cross-linking mass spectra were analyzed with pLink 2.3. (Supp Ref 8-9).
Here we demonstrate the incorporation of SFY (
To test SFY incorporation in mammalian cells, we transfected HEK293 cells with plasmid pcDNA-EGFP-40TAG (expressing the EGFP gene containing a TAG codon at site Tyr40) and plasmid pNEU-MmSFYRS expressing the Mm-tRNAPyl/MmSFYRS pair. Fluorescence confocal microscopy showed that, in the presence of SFY, strong EGFP fluorescence was observed throughout the cells, and cell morphology remained normal (
To determine which amino acid residues could react with SFY via proximity-enabled reactivity directly in cells, we coexpressed in E. coli the Z protein and an affibody (Afb) specifically binding it. Based on the crystal structure of Afb-Z complex, we introduced SFY at site 24 of the Z protein and various natural residues at site 7 of the affibody (
We also verified if SFY incorporated into Hfq could covalently capture RNA in E. coli cells. E. coli DH10B cells expressing Hfq(25SFY) or Hfq(49SFY) were lysed and analyzed with Urea-PAGE (
In addition, to check if SFY could cross-link all four nucleotides, we incubated 50 mM SFY with 50 mM different nucleoside monophosphates (NMPs: AMP, UMP, CMP, or GMP) at 37° C. for 16 hours. Cross-linking adducts of SFY with all four NMPs were detected using MS, confirming SFY could also cross-link nucleotides unbiasedly (data not shown).
An In Vivo Method for Detecting m6A in Mammalian Cells with Single-Nucleotide Resolution
N6-methyladenosine (m6A) is a widespread RNA modification that plays important roles in the regulation and function of mRNA. (Ref 39). Identification of the m6A sites in mRNA is critical for understanding m6A function. Although many m6A detection methods have been reported, the majority of them lack single nucleotide resolution and rely on the use of m6A-specific antibody, in which the recognition of m6A is in vitro in nature. (Refs 40-42). Specifically, we proposed to use a reader protein of m6A to recognize m6A sites on mRNA, and to incorporate a bioreactive SFY into the m6A binding site of the reader to cross-link nucleotides neighboring m6A (
We used the YTH domain of human YTHDF1 protein, which is a conserved m6A reader. (Xu et al, J. Biol. Chem, 290:24902-24913 (2015); Meyer, Nat. Methods, 16:1275-1280 (2019)). Based on the crystal structure of YTHDF1 in complex with a 5-mer m6A RNA, we chose Tyr397, a residue next to the binding pocket of m6A, as the site for incorporating the bioreactive Uaa, to aim the Uaa side chain for targeting nucleotides upstream of m6A (
To detect endogenous m6A sites in mammalian cells throughout the transcriptome, we developed GRIP-seq through combining GRIP for m6A with high-throughput sequencing, enabling global identification of m6A sites in vivo with single-nucleotide resolution (
We generated four pairs of GRIP-seq libraries. For each pair, we generated one library for the INPUT sample, which represents the RNA fragments from the whole cell lysate, and one library for the IP sample, which represents the RNA fragments cross-linked with the purified YTH proteins. These four pairs included one pair from HEK293 cells expressing YTH-WT protein serving as quality control, and three pairs from the three biological replicates of HEK293 cells expressing YTH-397SFY protein. For each library, around 10 to 35 million reads were obtained (data not shown). After removing adaptors, we first mapped the reads to the transcriptome. For IP libraries, we then used the CLIPPER algorithm to identify enriched peaks, which would represent RNA regions covering the reverse transcriptional termination sites and the cross-linking sites. Lovci et al. Nat. Struct. Mol. Biol. 20:1434-1442 (2013). While only 16,659 peaks were identified from the YTH-WT IP sample, 118,746, 151,153, and 139,741 peaks were identified from the three YTH-397SFY IP samples, respectively. Aside from the drastic difference in total peak numbers between YTH-397SFY and YTH-WT IP samples, comparisons of each gene's peak numbers among the three YTH-397SFY IP samples indicated high reproducibility (Pearson's r>0.96).
To determine if YTH-397SFY IP samples enriched m6A sites, we first identified the cross-linking-caused reverse-transcription-termination sites in these peaks (see materials and methods). Next we performed the sequence logo analysis of the sequences surrounding these reverse-transcription-termination sites. In all YTH-397SFY IP samples, the highest enriched motif was DRACH motif, which matched exactly the preferred consensus motif for m6A (
In our design, the SFY residue in YTH-397SFY protein should cross-link with the nucleotide at the close upstream of m6A (
Based on these features, we predicted a total of 13,968 m6A sites from the GRIP-seq data (data not shown). To further validate the m6A sites identified in GRIP-seq, we applied individual m6A GRIP procedures for two RNA regions that contain known m6A sites in JUN mRNA and DICER mRNA, employing gene-specific reverse transcription, ligation, amplification, and Sanger sequencing (
To further evaluate the capacity of GRIP-seq for identifying novel m6A sites, we compared the m6A sites from GRIP-seq with the known human m6A sites from the m6A-atlas, a comprehensive database for human m6A sites collected from seven published m6A-identification methods. Tang et al. Nucleic Acids Res. 49:D134-D143 (2020). Of the m6A sites from GRIP-seq, 6,686 were known m6A sites annotated in the m6A-atlas, further validating GRIP-seq's ability to identify m6A. Interestingly, 7,274 m6A sites from GRIP-seq have not been reported by any method in the m6A-atlas. Sequence logo analysis of these novel m6A sites from GRIP-seq showed strong enrichment of DRACH motif (
RNA secondary structure could alter the ability of RBPs to bind target RNA and the reactivity of RNA nucleotides. To assess the potential effect of RNA secondary structure on GECX-RNA, we analyzed the predicted structural potential in RNA regions surrounding m6A sites from GRIP-seq and from the m6A-atlas, respectively. The m6A regions from GRIP-seq displayed a slightly lower potential for stable secondary structures than the m6A regions from the m6A-atlas (
The proximity-driven reactivity of GECX-RNA would enable cross-linking with target RNA continuously whenever interaction occurs, allowing enrichment of the cross-linked product over a long period to improve the capture of interactions on low abundance RNAs. To determine if GRIP-seq was able to detect unknown m6A modifications on low abundance RNAs, we examined the abundance of mRNAs containing m6A sites detected by GRIP-seq. Among the m6A sites identified with GRIP-seq, 6,686 sites were also detected by previous methods and thus termed “known m6A sites,” while 7,274 sites were detected by GRIP-seq only and termed “novel m6A sites.” Between the group of genes containing only the known m6A sites and the group of genes containing only the novel m6A sites, we found that the genes containing only the novel m6A sites had significantly lower RNA abundances (
Cloning of pNEU-MmSFYRS-4xU6M15 plasmid. The MmSFYRS gene was amplified with primers HR-MmPylRS-NheI-F/HR-MmPylRS-NotI-R and ligated into pNEU-XYRS-4xU6M15 (derived from pNEU-hMbPylRS-4xU6M15, a gift from Irene Coin, Addgene plasmid #105830), which was linearized with NheI/NotI, to generate pNEU-MmSFYRS-4xU6M15.
Cloning of pNEU-MaSFYRS-NxU6-MaPylT (N=1 to 4) plasmids. The MaSFYRS and Ma-PylT expression cassettes were cloned into pNEU-XYRS-4xU6M15. Specifically, the U6 promoter was amplified from pNEU-XYRS-4xU6M15 with primers U6-F1/U6-R1, and the evolved Ma-PylT(6) was amplified from pEvol-MaSFYRS with primers Ma-PylT(6)-F2/Ma-PylT(6)-R2. The resulting fragments were joined together by overlapping PCR with primers U6-F1/Ma-PylT(6)-R2 and then amplified again with primers HR-pNEU-tRNA-XhoI-F/HR-pNEU-tRNA-SalI-R to generate a monomeric U6-MaPylT expression cassette containing XbaI-XhoI and SalI restriction sites. The first monomeric U6-MaPylT expression cassette was ligated into the pNEU-XYRS-4xU6M15 vector, which was linearized with XhoI/SalI, to generate pNEU-XYRS-1xU6-MaPylT. Then the MaSFYRS was amplified from pEvol-MaSFYRS with primers HR-Ma-SFYRS-NheI-F/HR-Ma-SFYRS-NotI-R and ligated into the pNEU-XYRS-1xU6-MaPylT vector, which was linearized with NheI/NotI, to generate pNEU-MaSFYRS-1xU6-MaPylT. The second U6-MaPylT cassette was digested with XbaI/SalI and ligated into the pNEU-MaSFYRS-1xU6-MaPylT vector that was linearized with XbaI/XhoI to generate pNEU-MaSFYRS-2xU6-MaPylT. Two more U6-MaPylT cassettes were tandemly introduced into the pNEU-MaSFYRS vector following the same procedure to construct pNEU-MaSFYRS-4xU6-MaPylT.
Cross-linking of MBP-Z24SFY and Afb4A-7X in live E. coli cells. The pET-Duet-Afb4A-7X-MBP-Z24TAG (X=A, C, S, T, H, Y, or K)1 was co-transformed with pEvol-MmSFYRS and pEvol-MaSFYRS2 respectively into BL21(DE3) E. coli chemical competent cells. The transformants were plated on an LB-Amp100Cm34 agar plate and incubated overnight at 37° C. A single colony was inoculated into 5 mL of 2×YT-Amp100Cm34 and cultured overnight at 37° C. On the following day, 1 mL of overnight cell culture was diluted into 50 mL 2×YT-Amp100Cm34 and agitated vigorously at 37° C. When OD600 reached 0.4-0.6, the cell culture was induced with 0.5 mM IPTG and 0.2% arabinose in the presence of 1 mM SFY, and then incubated at 37° C. for 6 h. Cells from 1 mL of culture were pelleted by centrifugation at 21,000 g for 5 min at 4° C. and directly applied for immunoblot analysis. The rest of the cells were collected by centrifugation at 4,200 g for 30 min at 4° C. The cross-linking products of MBP-Z24SFY and Afb4A-7X (X=H, Y, or K) were purified with affinity chromatography as described previously1.
Cross-linking of GST-103SFY-107X in live mammalian cells. One day before transfection, 3×10⁵ HEK293T cells were seeded in a Greiner 6-well cell culture dish containing 2 mL of DMEM media with 10% FBS, and incubated at 37° C. in a CO2 incubator. 1 μg of pcDNA-GST-103TAG-107X (X=A, H, Y or K)3 and 1 μg of pNEU-MmSFYRS-4xU6M15 were co-transfected into target cells using 5 μL of lipofectamine 2000 following the manufacturer's instructions. Six hours post transfection, the media were replaced with complete DMEM media with or without 1 mM SFY. The cells were incubated at 37° C. for an additional 48 h, collected, and applied for immunoblot analysis.
Fluorescence confocal microscopy. One day before transfection, 3×10⁵ HEK293T cells were seeded in a Greiner 6-well cell culture dish containing 2 mL of DMEM media with 10% FBS, and incubated at 37° C. in a CO2 incubator. Plasmids pcDNA-EGFP-40TAG (1 μg) and pNEU-MmSFYRS-4xU6M15 (1 μg) were co-transfected into target cells using 5 μL of lipofectamine 2000 following the manufacturer's instructions. Six hours post transfection, the media were replaced with complete DMEM media with or without 1 mM SFY. The cells were incubated at 37° C. for an additional 24-48 h and imaged with a Nikon Eclipse Ti confocal microscope.
FACS analysis of SFY incorporation. One day before transfection, 3×10⁵ HEK293T cells were seeded in a Greiner 6-well cell culture dish containing 2 mL of DMEM media with 10% FBS, and incubated at 37° C. in a CO2 incubator. Plasmids pcDNA-EGFP-40TAG (1 μg) and pNEU-MaSFYRS-NxU6-MaPylT (N=1 to 4) (1 μg) were co-transfected into target cells using 5 μL of lipofectamine 2000 following the manufacturer's instructions. Six hours post transfection, the media containing transfection complex were replaced with fresh DMEM media with 10% FBS in the presence or absence of 1 mM SFY. After incubation at 37° C. for 24-48 h, transfected cells were trypsinized and collected by centrifugation (1500 rpm, 5 min, r.t.). The cells were resuspended in 500 μL of FACS buffer (1×PBS, 2% FBS, 1 mM EDTA, 0.1% sodium azide, 0.28 μM DAPI) and analyzed by BD LSRFortessa™ cell analyzer.
Cell viability assay. HEK293T cells were seeded at 2×10⁴ cells/well in a 96-well plate. On the next day, the media were replaced with fresh DMEM media supplemented with 0, 0.0625, 0.125, 0.25, 0.5, or 1 mM of SFY. The SFY-treated and control cells were cultured for an additional 24-48 h at 37° C. and then analyzed with the CellTiter-Blue® Cell Viability Assay following the manufacturer's instructions.
RNase treatment and detection for exogenous Hfq-expressing E. coli cells (Hfq-SFY samples). The procedure is the same as the RNase treatment and detection for exogenous Hfq-expressing E. coli cells (Hfq-WT and Hfq-FSY samples), with the following modifications: For the transformations, pBAD-Hfq TAG mutant plasmids (pBAD-Hfq-25TAG, pBAD-Hfq-49TAG) were each co-transformed with pEvol-MmSFYRS into DH10B E. coli chemical competent cells. For the exogenous expression of Hfq-SFY proteins, the cell culture was induced with 0.2% arabinose and 1 mM SFY.
In vitro incubations of NMPs and SFY. 50 mM SFY (HCl salt) and 50 mM NMP were incubated in DI H2O. 50 mM NaOH was added to neutralize the HCl salt. The mixture was incubated at 37° C. for 48 h. Then the reaction mixture was diluted 50-fold in H2O/acetonitrile (50/50, v/v, with 0.1% trifluoroacetic acid) and subjected to mass spectrometry analysis in positive mode. Mass spectrometry analysis was performed on a SCIEX MDS, 3200 Q TRAP system.
The molecular weight (MW) of adduct products between SFY and NMP was calculated following this equation: MW (adduct product)=MW (SFY)+MW (NMP)−MW (HF).
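As an illustration of the equation above, the following minimal Python sketch computes expected adduct molecular weights for the four NMPs. The NMP values are average molecular weights of the free acids and HF is taken as 20.01 g/mol; the molecular weight of SFY is not stated in this section, so it is left as an argument supplied by the user. The function names are hypothetical.

```python
# Average molecular weights (g/mol) of the free-acid nucleoside monophosphates.
NMP_MW = {"AMP": 347.22, "UMP": 324.18, "CMP": 323.20, "GMP": 363.22}
HF_MW = 20.01  # hydrogen fluoride lost on SuFEx-type cross-linking


def adduct_mw(sfy_mw, nmp):
    """MW(adduct product) = MW(SFY) + MW(NMP) - MW(HF).

    sfy_mw is supplied by the caller because it is not given in the text.
    """
    return sfy_mw + NMP_MW[nmp] - HF_MW


def expected_adducts(sfy_mw):
    """Expected adduct MW for each of the four NMPs, rounded to 2 decimals."""
    return {nmp: round(adduct_mw(sfy_mw, nmp), 2) for nmp in NMP_MW}
```

Because detection was performed in positive mode, observed m/z values would additionally reflect the ionizing adduct (for example, a proton), which this sketch does not add.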
Cloning of YTH domain from human YTHDF1 protein. To generate plasmids expressing the YTH domain from human YTHDF1 protein with TwinStrep tag and HA tag at the C-terminus in mammalian cells, three PCR products were prepared. The insert with the YTHDF1 domain was amplified with the primer pair pc31-Hd3-YTHDF1-F and YTHDF1-2xstrep-R using cDNA reverse-transcribed from total RNA of HEK293T cells as template. The insert with the TwinStrep tag was amplified with the primer pair 2xstrep-tag_Hs-F and 2xstrep-tag_Hs-R. The pcDNA3.1 vector backbone was amplified with the primer pair pc31-HA-strep-F and pc31-Nde1-R using empty pcDNA3.1 vector as template. The final plasmid pcDNA3.1-HsYTHDF1-WT, expressing the wildtype YTHDF1 domain with TwinStrep tag and HA tag at the C-terminus, was cloned by ligating these three PCR products together using the ClonExpress II one step cloning kit (Vazyme). To generate the pcDNA3.1-HsYTHDF1-397TAG mutant plasmid, residue 397 of the YTHDF1 gene in pcDNA3.1-HsYTHDF1-WT was mutated into an amber stop codon TAG using site-directed mutagenesis with the following primers: YTHDF1-Y397TAG-F and YTHDF1-Y397TAG-R.
GRIP for in vivo m6A detection. HEK293T cells were plated in 15-cm plates and transfected with 15 μg of pcDNA3.1-HsYTHDF1 plasmids, with an additional 15 μg of pNEU-SFYRS plasmid (encoding SFY-tRNA synthetase-tRNA system for expression in mammalian cells) and 1 mM SFY for conditions involving YTHDF1-397SFY protein expression. Forty-eight hours after transfection, cells were washed twice with ice-cold PBS and centrifuged to collect cell pellets. Cells were lysed with 1.5 mL of 1×RIPA Buffer supplemented with protease inhibitors and RNase inhibitor. Cells were lysed on ice for 10 min and then passed through 26G needles 20 times to achieve full lysis. Lysates were then cleared by centrifugation at 16,000 g for 10 min at 4° C., and the supernatants containing cleared lysates were used for pulldown with magnetic beads.
For Strep-Tactin® XT magnetic beads (Iba-lifesciences), 200 μL of beads per sample were pelleted by application of a magnet, and the supernatant was removed. Beads were washed twice with wash buffer (PBS buffer with 6 M Urea, 1 M NaCl, 1 mM DTT), and resuspended in 11.25 mL of wash buffer. 750 μL of sample lysate were added to the beads and rotated overnight at 4° C.
After incubation with sample lysate, beads were pelleted, washed three times with wash buffer (PBS buffer with 6 M Urea, 1 M NaCl, 1 mM DTT), washed once with PBS buffer with 1 M NaCl, washed once with PBS buffer, and then washed with DNase buffer (350 mM Tris-HCl (pH 6.5); 50 mM MgCl2; 5 mM DTT). Beads were resuspended in DNase buffer and TURBO DNase was added to a final concentration of 0.1 U/μL. The DNase digestion was incubated with shaking for 30 min at 37° C. Proteins were then digested by incubation with shaking with 50 μL of 5 mg/mL proteinase K, 2 M urea at 37° C. for 1 h. RNA was purified using QuickRNA micro prep kits.
RNA samples were reverse-transcribed with gene-specific RT primers targeting different cross-linking genes and regions (ACTB-m6A-1-RT, DICER1-m6A-1-RT, and JUN-m6A-1-RT, as listed in
Library preparation for GRIP-seq. HEK293T cells were plated in 15-cm plates and transfected with 15 μg of pcDNA3.1-HsYTHDF1 plasmids, with an additional 15 μg of pNEU-SFYRS plasmid (encoding SFY-synthetase-tRNA system for expression in mammalian cells) and 1 mM SFY for conditions involving YTHDF1-397SFY protein expression. Forty-eight hours after transfection, cells were washed twice with ice-cold PBS and centrifuged to collect cell pellets. The library preparation procedure for GRIP-seq was similar to the protocol from eCLIP. In brief, the cell pellets were lysed in 1 mL of eCLIP lysis buffer and partially digested with RNase I (Invitrogen). 20 μL of the cell lysate was stored as the “INPUT” sample for subsequent direct library preparation (similar to the eCLIP protocol). Van Nostrand, Nat. Methods, 14:508-514 (2016). The remainder of the cell lysate (about 1 mL) was immunoprecipitated using 200 μL of pre-washed strep-tactin-XT magnetic beads (Iba-lifesciences) targeting the 2xStrep-tag sequence fused at the C-terminus of YTH proteins, and stringently washed (twice with high-salt denaturing buffer (PBS buffer with 6 M Urea, 1 M NaCl, 1 mM DTT) and twice with PBS buffer). After dephosphorylation with FastAP (ThermoFisher) and T4 PNK (NEB), a barcoded RNA adaptor (1:1 mixed RNA_X1A and RNA_X1B adaptors, Table S1) was ligated to the 3′ end (T4 RNA Ligase, NEB) of cross-linked and co-purified RNA. Ligations were performed on-bead. Next, samples were run on protein gels and transferred to nitrocellulose membranes. On the membranes, the regions containing YTH protein-RNA cross-links were excised (membrane regions 75 kDa above the YTH protein) and treated with proteinase K to release the cross-linked RNA. RNA was then reverse-transcribed with SuperScript IV reverse transcriptase (ThermoFisher) and AR17 primer (Table S1), and treated with ExoSAP-IT (ThermoFisher) to remove excess oligonucleotides.
A second DNA adaptor (Rand3Tr3 adaptor, Table S1) was then ligated to the 3′ end of the cDNA fragment (T4 RNA Ligase, NEB). After cleanup (Dynabeads MyOne Silane, ThermoFisher), an aliquot of each sample was first subjected to qPCR for determining the proper number of PCR cycles. Then, the remainder was amplified (Phanta Max Super-Fidelity DNA Polymerase, Vazyme) with a pair of PCR primers for final library amplification (P1A-0N-F and P1A-0N-R, where “N” represents the specific index for each sample, Table S1) and size selected via agarose gel electrophoresis. Samples were sequenced on the Illumina NovaSeq S4 platform with paired-end 2×100 format.
Table S1 Note: for rand103Tr3 adaptor, RNA_X1A and RNA_X1B, “N” and “rN” represent random nucleotides. Note: for P1A0NF and P1A0NR, “NNNNNNNN” represents the library index sequences for Illumina sequencing. Note: “r” preceding a letter refers to ribose or RNA; and “3SpC3” or “/SpC3/” refers to a linking group.
Read Processing. After standard Illumina HiSeq demultiplexing, GRIP-seq libraries were first processed with the Fastp tool to remove PCR duplicates and trim Illumina adaptors, and then processed with the Cutadapt tool to remove the GRIP-seq adaptors and retrieve the inserted RNA sequences according to the following GRIP-seq final library structure.
Library structure with X1A_adaptor: (Read1) NNNNNCCTATAT-INSERT-NNNNNNNNNN (Read2)
Library structure with X1B_adaptor: (Read1) NNNNNTGCTATT-INSERT-NNNNNNNNNN (Read2)
Note: “N” in the library structures represents random nucleotides. Chen et al, Bioinformatics, 34:1884-1890 (2018); Martin, EMBnet.journal 17:10-12 (2011).
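To make the library structure concrete, the following Python sketch splits a fragment into its adaptor type and insert. It is a simplified illustration, not the Fastp/Cutadapt workflow used here: it assumes the whole fragment is available as one string in Read1 orientation (real paired-end data would require handling Read2 separately as its reverse complement), and the function and constant names are hypothetical.

```python
# Fixed barcode portions of the two adaptors, taken from the library structures above.
ADAPTOR_BARCODES = {"X1A": "CCTATAT", "X1B": "TGCTATT"}
LEADING_RANDOM = 5    # NNNNN at the Read1 end
TRAILING_RANDOM = 10  # NNNNNNNNNN at the Read2 end


def split_fragment(fragment):
    """Return (adaptor_name, insert) for a fragment in Read1 orientation.

    Returns None when neither adaptor barcode follows the leading random
    nucleotides, or when the fragment is too short to hold an insert.
    """
    for name, barcode in ADAPTOR_BARCODES.items():
        barcode_end = LEADING_RANDOM + len(barcode)
        if fragment[LEADING_RANDOM:barcode_end] != barcode:
            continue
        insert_end = len(fragment) - TRAILING_RANDOM
        if insert_end <= barcode_end:
            return None  # nothing left between the adaptor and the trailing Ns
        return name, fragment[barcode_end:insert_end]
    return None
```

For instance, a fragment consisting of five random nucleotides, CCTATAT, the insert GGACT, and ten random nucleotides would be reported as an X1A fragment with insert GGACT.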
Read mapping. Reads were mapped with STAR to the human genome (hg19) using default settings. Dobin et al, Bioinformatics, 29:15-21 (2013).
Identification of m6A clusters and reverse-transcription-termination sites. After mapping, CLIPper was applied on the mapped reads 2 (reads 2 is the read starting right after the cross-linking site (
Metagene and motif analyses. After reverse-transcription-termination site was identified, the sequences spanning a region 10-nt up- and downstream of termination sites were extracted and used as input for motif discovery using MEME. Metagene analysis was performed with reads mapped within m6A clusters using metaPlotR. Bailey et al. Nucleic Acids Res. 37:W202-W208 (2009); Olarerin-George et al. Bioinformatics, 33:1563-1564 (2017).
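The extraction of sequences flanking each termination site, prior to motif discovery, can be sketched as follows. This is a minimal illustration under stated assumptions: sites are 0-based coordinates on a transcript sequence held as a Python string, the window includes the site itself (giving 21 nt for a 10-nt flank), and sites too close to either end are skipped. The function names are hypothetical, and MEME itself is not invoked.

```python
def flank(seq, site, width=10):
    """Return the window from `width` nt upstream to `width` nt downstream of `site`.

    `site` is a 0-based index into `seq`. Returns None if the window would
    run past either end of the sequence.
    """
    start, end = site - width, site + width + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def windows_for_sites(seq, sites, width=10):
    """Collect flanking windows for all sites that fit within the sequence."""
    windows = []
    for site in sites:
        window = flank(seq, site, width)
        if window is not None:
            windows.append(window)
    return windows
```

The resulting windows could then be written out in FASTA format as input to a motif discovery tool.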
Analysis of cross-linking site positions relative to the m6A motif. Reads 2 overlapping with regions containing the DRACH motif from motif analysis were extracted. The numbers of reads 2 starting right after each position relative to the DRACH motif (the middle A in the motif was designated as position 0) were calculated and plotted.
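A minimal Python sketch of this positional tally is given below. It assumes a DNA-alphabet transcript sequence, read 2 start coordinates that are 0-based on the same transcript strand, and a hypothetical cutoff (`max_dist`) for pairing a read with a motif; none of these details are specified in the text. DRACH is expanded with the standard IUPAC codes (D = A/G/T, R = A/G, H = A/C/T).

```python
import re
from collections import Counter

# Lookahead so that overlapping DRACH occurrences are all reported.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def motif_a_positions(seq):
    """0-based positions of the middle A (position 0) of every DRACH motif."""
    return [m.start() + 2 for m in DRACH.finditer(seq)]


def relative_position_counts(seq, read2_starts, max_dist=20):
    """Count reads 2 starting right after each position relative to the motif A.

    For a read 2 starting at coordinate s, the position right before it is
    s - 1; its offset from a motif A at coordinate a is (s - 1) - a.
    """
    counts = Counter()
    a_positions = motif_a_positions(seq)
    for start in read2_starts:
        for a in a_positions:
            offset = (start - 1) - a
            if abs(offset) <= max_dist:
                counts[offset] += 1
    return counts
```

Negative offsets correspond to positions upstream of the motif A, so a peak at a small negative offset would be consistent with cross-linking just upstream of m6A.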
Analysis of nucleotide composition at cross-linking sites. For reads in m6A clusters, the cross-linking sites were designated as the nucleotides 1-nt upstream of read 2 starting positions.
Identification of m6A sites. After the position of the cross-linking site relative to the m6A motif was revealed, the precise m6A sites were assigned according to the distance to the reverse-transcription-termination sites.
Secondary structure analysis around m6A sites. The coordinates of published m6A sites were from m6A-atlas database. The coordinates of m6A sites from DART-seq were m6A sites from “HEK293T, DART-seq, control sample” in m6A-atlas database. For each m6A site, a sliding window of 30 nucleotides with a step of 3 nucleotides was used to calculate RNA minimum fold free energy (MFE) spanning the regions 120-nt up- and downstream of m6A sites. For each window, MFE was calculated by ViennaRNA, using default parameters. For m6A sites from different datasets, a mean MFE in each window was calculated by averaging MFE values of the windows in the same position. Tang et al, Nucleic Acids Res. 49:D134-D143 (2020); Lorenz et al, Algorithms Mol. Biol. 6:26 (2011).
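The sliding-window averaging described above can be sketched in Python as follows. To keep the example self-contained, the folding energy is passed in as a function; with the ViennaRNA Python bindings installed, `lambda w: RNA.fold(w)[1]` could be supplied, since `RNA.fold` returns a (structure, MFE) pair. The helper names are hypothetical, and this sketch is an illustration rather than the custom code referenced under Code availability.

```python
def sliding_windows(seq, size=30, step=3):
    """Return (start, window) pairs for windows of `size` nt taken every `step` nt."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


def energy_profile(seq, energy_fn, size=30, step=3):
    """Apply `energy_fn` (e.g. an MFE calculation) to each window of `seq`."""
    return [energy_fn(window) for _, window in sliding_windows(seq, size, step)]


def mean_profile(profiles):
    """Average per-window values across sites; all profiles must be equal length."""
    return [sum(column) / len(column) for column in zip(*profiles)]
```

For a region spanning 120 nt upstream and downstream of a site (241 nt including the site), a 30-nt window with a 3-nt step gives 71 windows per site; `mean_profile` then averages the values of windows in the same position across all sites in a dataset.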
Code availability. Custom code used is available at https://github.com/Shall-We-Dance/GRIP-seq.
Accession codes. All GRIP-seq data are available in SRA database with Accession number: PRJNA797913.
In SEQ ID NO:1-4, the first occurrence of the bold underlined K refers to lysine at position 104 (or a position corresponding to position 104); the second occurrence of the bold underlined K refers to lysine at position 127 (or a position corresponding to position 127); and the bold underlined N refers to asparagine at position 129 (or a position corresponding to position 129).
This application claims the benefit of priority to U.S. Application No. 63/238,357 filed Aug. 30, 2021, and U.S. Application No. 63/196,006 filed Jun. 2, 2021, the disclosures of which are incorporated by reference herein in their entirety.
This invention was made with government support under grant no. R01GM118384 and R01CA258300 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/031925 | 6/2/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63238357 | Aug 2021 | US | |
63196006 | Jun 2021 | US |