METHODS AND COMPOUNDS FOR THE TREATMENT OF GENETIC DISEASE

Description

FIELD OF INVENTION

Disclosed herein are new chimeric heterocyclic polyamide compounds and compositions and their application as pharmaceuticals for the treatment of disease. Methods to modulate the expression of fxn in a human or animal subject are also provided for the treatment of diseases such as Friedreich's ataxia.

BACKGROUND

The disclosure relates to the treatment of inherited genetic diseases characterized by underproduction of mRNA.

Friedreich's ataxia (FA or FRDA) is an autosomal recessive neurodegenerative disorder caused by mutations in the fxn gene, which encodes the protein frataxin (FXN), an iron-binding mitochondrial protein involved in electron transport and metabolism. In most subjects with FA, a GAA trinucleotide repeat (from about 66 to over 1000 trinucleotides) is included in the first intron of fxn, and this hyperexpansion is responsible for the observed pathology. Hyperexpansion of the GAA repeats results in reduced expression of FXN.

Friedreich's ataxia is characterized by progressive degradation of the nervous system, particularly sensory neurons. In addition, cardiomyocytes and pancreatic beta cells are susceptible to frataxin depletion. Symptoms usually present by age 18; however, later diagnoses of FA are not uncommon. FA patients develop neurodegeneration of the large sensory neurons and spinocerebellar tracts, as well as cardiomyopathy and diabetes mellitus. Clinical symptoms of FA include ataxia; gait ataxia; muscle weakness; loss of upper body strength; loss of balance; lack of reflexes in lower limbs and tendons; loss of sensation, particularly to vibrations; impairment of position sense; impaired perception of temperature, touch, and pain; hearing and vision impairment, including distorted color vision and involuntary eye movements; irregular foot configuration, including pes cavus and inversion; dysarthria; dysphagia; impaired breathing; scoliosis; diabetes; intolerance to glucose and carbohydrates; and cardiac dysfunctions, including hypertrophic cardiomyopathy, arrhythmia, myocardial fibrosis, and cardiac failure. Currently there is no cure for FA, with medical treatments being limited to surgical intervention for the spine and the heart, as well as therapy to assist with balance and coordination, motion, and speech.

SUMMARY

This disclosure utilizes regulatory molecules present in cell nuclei that control gene expression. Eukaryotic cells provide several mechanisms for controlling gene replication, transcription, and/or translation. Regulatory molecules that are produced by various biochemical mechanisms within the cell can modulate the various processes involved in the conversion of genetic information to cellular components.

Several regulatory molecules are known to modulate the production of mRNA and, if directed to fxn, would modulate the production of fxn mRNA that causes Friedreich's ataxia, and thus reverse the progress of the disease.

The disclosure provides compounds and methods for recruiting a regulatory molecule into close proximity to fxn. The compounds disclosed herein contain: (a) a recruiting moiety that will bind to a regulatory molecule, linked to (b) a DNA binding moiety that will selectively bind to fxn. The compounds will counteract the expression of defective fxn in the following manner:

- (1) The DNA binding moiety will bind selectively the characteristic GAA trinucleotide repeat sequence of fxn;
- (2) The recruiting moiety, linked to the DNA binding moiety, will thus be held in proximity to fxn;
- (3) The recruiting moiety, now in proximity to fxn, will recruit the regulatory molecule into proximity with the gene; and
- (4) The regulatory molecule will modulate expression, and therefore counteract the production of defective fxn by direct interaction with the gene.

The mechanism set forth above will provide an effective treatment for Friedreich's ataxia, which is caused by the expression of defective fxn. Correction of the expression of the defective fxn gene thus represents a promising method for the treatment of Friedreich's ataxia.

The disclosure provides recruiting moieties that will bind to regulatory molecules. Small molecule inhibitors of regulatory molecules serve as templates for the design of recruiting moieties, since these inhibitors generally act via noncovalent binding to the regulatory molecules.

The disclosure further provides for DNA binding moieties that will selectively bind to one or more copies of the GAA trinucleotide repeat that is characteristic of the defective fxn gene. Selective binding of the DNA binding moiety to fxn, made possible due to the high GAA count associated with the defective fxn gene, will direct the recruiting moiety into proximity of the gene, and recruit the regulatory molecule into position to up-regulate gene transcription.

The DNA binding moiety will comprise a polyamide segment that will bind selectively to the target GAA sequence. Polyamides have been designed by Dervan and others that can selectively bind to selected DNA sequences. These polyamides sit in the minor groove of double helical DNA and form hydrogen bonding interactions with the Watson-Crick base pairs. Polyamides that selectively bind to particular DNA sequences can be designed by linking monoamide building blocks according to established chemical rules. One building block is provided for each DNA base pair, with each building block binding noncovalently and selectively to one of the DNA base pairs: A/T, T/A, G/C, and C/G. Following this guideline, trinucleotides will bind to molecules with three amide units, i.e. triamides. In general, these polyamides will orient in either direction of a DNA sequence, so that the 5′-GAA-3′ trinucleotide repeat sequence of fxn can be targeted by polyamides selective either for GAA or for AAG. Furthermore, polyamides that bind to the complementary sequence, in this case, TTC or CTT, will also bind to the trinucleotide repeat sequence of fxn and can be employed as well.
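The relationship between the four trinucleotide words named above (GAA, AAG, TTC, and CTT) can be illustrated with a short sketch. It assumes nothing more than plain string handling of a sequence written 5′ to 3′; the function names are hypothetical and are introduced here only for illustration.

```python
# Illustrative only: enumerate the trinucleotide "words" that describe the same
# GAA repeat when it is read in either direction or on either strand.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse(seq: str) -> str:
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Return the Watson-Crick complement, base by base (no reversal)."""
    return "".join(COMPLEMENT[base] for base in seq)


def target_words(unit: str) -> dict:
    """Map each reading of the repeat unit to the corresponding word."""
    return {
        "forward": unit,
        "reverse": reverse(unit),
        "complement": complement(unit),
        "reverse_complement": reverse(complement(unit)),
    }


words = target_words("GAA")
print(words)
```

For the unit GAA, the reverse reading is AAG, the complement is CTT, and the reverse complement is TTC, which matches the four sequences listed in the preceding paragraph.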

In principle, longer DNA sequences can be targeted with higher specificity and/or higher affinity by combining a larger number of monoamide building blocks into longer polyamide chains. Ideally, the binding affinity for a polyamide would simply be equal to the sum of each individual monoamide/DNA base pair interaction. In practice, however, due to the geometric mismatch between the fairly rigid polyamide and DNA structures, longer polyamide sequences do not bind to longer DNA sequences as tightly as would be expected from a simple additive contribution. The geometric mismatch between longer polyamide sequences and longer DNA sequences induces an unfavorable geometric strain that subtracts from the binding affinity that would be otherwise expected.
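The additive ideal and its shortfall can be written compactly. The following is only a sketch, assuming that binding affinity is expressed as a standard free energy of binding; the symbols N, ΔG°_i, and ΔG°_strain are introduced here for illustration and do not appear elsewhere in this disclosure.

```latex
\Delta G^{\circ}_{\text{ideal}} = \sum_{i=1}^{N} \Delta G^{\circ}_{i},
\qquad
\Delta G^{\circ}_{\text{observed}} = \sum_{i=1}^{N} \Delta G^{\circ}_{i} + \Delta G^{\circ}_{\text{strain}},
\qquad
\Delta G^{\circ}_{\text{strain}} \ge 0
```

Here N is the number of monoamide/DNA base pair interactions. A positive strain term makes the observed binding less favorable than the simple sum, which is the effect described above for longer polyamide chains.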

The disclosure therefore provides DNA moieties that comprise hexaamide or pentaamide subunits that are connected by flexible spacers. The spacers alleviate the geometric strain that would otherwise decrease binding affinity of a larger polyamide sequence.

Disclosed herein are polyamide compounds that can bind to one or more copies of the trinucleotide repeat sequence GAA, and can modulate the expression of the defective fxn gene. Treatment of a subject with these compounds will counteract the expression of the defective fxn gene, and this can reduce the occurrence, severity, and/or frequency of symptoms associated with Friedreich's ataxia. Certain compounds disclosed herein will provide higher binding affinity and/or selectivity than has been observed previously for this class of compounds.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

DETAILED DESCRIPTION

The transcription modulator molecule described herein represents an interface of chemistry, biology, and precision medicine in that the molecule can be programmed to regulate the expression of a target gene containing the nucleotide repeat GAA. The transcription modulator molecule contains DNA binding moieties that will selectively bind to one or more copies of the GAA trinucleotide repeat that is characteristic of the defective fxn gene. The transcription modulator molecule also contains moieties that bind to regulatory proteins. The selective binding of the target gene will bring the regulatory protein into proximity to the target gene and thus modulate transcription of the target gene. The molecules and compounds disclosed herein provide higher binding affinity and selectivity than has been observed previously for this class of compounds and can be more effective in treating diseases associated with the defective fxn gene.

Treatment of a subject with these compounds will modulate the expression of the defective fxn gene, and this can reduce the occurrence, severity, or frequency of symptoms associated with Friedreich's ataxia. The transcription modulator molecules described herein recruit the regulatory molecule to modulate the expression of the defective fxn gene and effectively treat and alleviate the symptoms associated with diseases such as Friedreich's ataxia.

Transcription Modulator Molecule

The transcription modulator molecules disclosed herein possess useful activity for modulating the transcription of a target gene having one or more GAA repeats (e.g., fxn), and may be used in the treatment or prophylaxis of a disease or condition in which the target gene (e.g., fxn) plays an active role. Thus, in broad aspect, certain embodiments also provide pharmaceutical compositions comprising one or more compounds disclosed herein together with a pharmaceutically acceptable carrier, as well as methods of making and using the compounds and compositions. Certain embodiments provide methods for modulating the expression of fxn. Other embodiments provide methods for treating a fxn-mediated disorder in a patient in need of such treatment, comprising administering to said patient a therapeutically effective amount of a compound or composition according to the present disclosure. Also provided is the use of certain compounds disclosed herein for use in the manufacture of a medicament for the treatment of a disease or condition ameliorated by the modulation of the expression of fxn.

Some embodiments relate to a transcription modulator molecule or compound having a first terminus, a second terminus, and an oligomeric backbone, wherein: a) the first terminus comprises a DNA-binding moiety capable of noncovalently binding to a nucleotide repeat sequence GAA; b) the second terminus comprises a protein-binding moiety binding to a regulatory molecule that modulates an expression of a gene comprising the nucleotide repeat sequence GAA; and c) the oligomeric backbone comprises a linker between the first terminus and the second terminus. In some embodiments, the second terminus is not a Brd4 binding moiety.

In certain embodiments, the compounds have structural Formula I:

X-L-Y (I)

or a salt thereof, wherein:

- X comprises a recruiting moiety that is capable of noncovalent binding to a regulatory moiety within the nucleus;
- Y comprises a DNA recognition moiety that is capable of noncovalent binding to one or more copies of the trinucleotide repeat sequence GAA; and
- L is a linker.

Certain compounds disclosed herein may possess useful activity for modulating the transcription of fxn, and may be used in the treatment and/or prophylaxis of a disease or condition in which fxn plays an active role. Thus, in broad aspect, certain embodiments also provide pharmaceutical compositions comprising one or more compounds disclosed herein together with a pharmaceutically acceptable carrier, as well as methods of making and using the compounds and compositions. Certain embodiments provide methods for modulating the expression of fxn. Other embodiments provide methods for treating a fxn-mediated disorder in a patient in need of such treatment, comprising administering to said patient a therapeutically effective amount of a compound or composition according to the present disclosure. Also provided is the use of certain compounds disclosed herein for use in the manufacture of a medicament for the treatment of a disease or condition ameliorated by the modulation of the expression of fxn.

In certain embodiments, the regulatory molecule is chosen from a bromodomain-containing protein, a nucleosome remodeling factor (NURF), a bromodomain PHD finger transcription factor (BPTF), a ten-eleven translocation enzyme (TET), methylcytosine dioxygenase (TET1), a DNA demethylase, a helicase, an acetyltransferase, and a histone deacetylase (“HDAC”).

In some embodiments, the first terminus is Y, and the second terminus is X, and the oligomeric backbone is L.

In certain embodiments, the compounds have structural Formula II:

X-L-(Y₁—Y₂—Y₃)_n—Y₀ (II)

- or a salt thereof, wherein:
- X comprises a recruiting moiety that is capable of noncovalent binding to a regulatory molecule within the nucleus;
- L is a linker;
- Y₁, Y₂, and Y₃ are internal subunits, each of which comprises a moiety chosen from a heterocyclic ring or a C_1-6 straight chain aliphatic segment, and each of which is chemically linked to its two neighbors;
- Y₀ is an end subunit which comprises a moiety chosen from a heterocyclic ring or a straight chain aliphatic segment, which is chemically linked to its single neighbor;
- each subunit can noncovalently bind to an individual nucleotide in the GAA repeat sequence;
- n is an integer between 1 and 200, inclusive; and
- (Y₁—Y₂—Y₃)_n—Y₀ combine to form a DNA recognition moiety that is capable of noncovalent binding to one or more copies of the trinucleotide repeat sequence GAA.
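The way Formula II expands for a particular value of n can be sketched as follows. This is an illustration only: the labels are treated as plain strings, no chemistry is modeled, and the function names are hypothetical.

```python
# Illustrative only: write out Formula II, X-L-(Y1-Y2-Y3)_n-Y0, for a chosen n.


def expand_formula_ii(n: int, unit=("Y1", "Y2", "Y3")) -> list:
    """Return the ordered parts of Formula II for n repeats of the internal unit."""
    if not 1 <= n <= 200:
        raise ValueError("n must be an integer between 1 and 200, inclusive")
    return ["X", "L"] + list(unit) * n + ["Y0"]


def dna_subunit_count(n: int) -> int:
    """Count subunits in the DNA recognition moiety: three per repeat plus Y0."""
    return 3 * n + 1


parts = expand_formula_ii(2)
print("-".join(parts))
print(dna_subunit_count(2))
```

With n = 2, the sketch writes out two copies of the internal unit followed by the end subunit, for seven subunits in the DNA recognition moiety.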

In certain embodiments, the compounds of structural Formula II comprise a subunit for each individual nucleotide in the GAA repeat sequence.

In certain embodiments, each internal subunit has an amino (—NH—) group and a carboxy (—CO—) group.

In certain embodiments, the compounds of structural Formula II comprise amide (—NHCO—) bonds between each pair of internal subunits.

In certain embodiments, the compounds of structural Formula II comprise an amide (—NHCO—) bond between L and the leftmost internal subunit.

In certain embodiments, the compounds of structural Formula II comprise an amide bond between the rightmost internal subunit and the end subunit.

In certain embodiments, each subunit comprises a moiety that is independently chosen from a heterocycle and an aliphatic chain.

In certain embodiments, the heterocycle is a monocyclic heterocycle. In certain embodiments, the heterocycle is a monocyclic 5-membered heterocycle. In certain embodiments, each heterocycle contains a heteroatom independently chosen from N, O, or S. In certain embodiments, each heterocycle is independently chosen from pyrrole, imidazole, thiazole, oxazole, thiophene, and furan.

In certain embodiments, the aliphatic chain is a C_1-6 straight chain aliphatic chain. In certain embodiments, the aliphatic chain has structural formula —(CH₂)_m—, for m chosen from 1, 2, 3, 4, and 5. In certain embodiments, the aliphatic chain is —CH₂CH₂—.

In certain embodiments, each subunit comprises a moiety independently chosen from

embedded image

—NH-benzopyrazinylene-CO—, —NH-phenylene-CO—, —NH-pyridinylene-CO—, —NH-piperidinylene-CO—, —NH-pyrimidinylene-CO—, —NH-anthracenylene-CO—, —NH-quinolinylene-CO—, and

embedded image

wherein Z is H, NH₂, C_1-6alkyl, C_1-6haloalkyl or C_1-6alkyl-NH₂.

In some embodiments, Py is

embedded image

Im is

embedded image

Hp is

embedded image

Th is

embedded image

Pz is

embedded image

Nt is

embedded image

Tn is

embedded image

Nh is

embedded image

iNt is

embedded image

Um is

embedded image

HpBi is

embedded image

ImBi is

embedded image

PyBi is

embedded image

Dp is

embedded image

—NH-benzopyrazinylene-CO— is

embedded image

—NH-phenylene-CO— is

embedded image

—NH-pyridinylene-CO— is

embedded image

—NH-piperidinylene-CO— is

embedded image

—NH-pyrazinylene-CO— is

embedded image

—NH-anthracenylene-CO— is

embedded image

and —NH-quinolinylene-CO— is

embedded image

In some embodiments, Py is

embedded image

Im is

embedded image

Hp is

embedded image

Th is

embedded image

Pz is

embedded image

Nt is

embedded image

Tn is

embedded image

Nh is

embedded image

iNt is

embedded image

and iIm is

embedded image

In certain embodiments, n is between 1 and 100, inclusive. In certain embodiments, n is between 1 and 50, inclusive. In certain embodiments, n is between 1 and 20, inclusive. In certain embodiments, n is between 1 and 10, inclusive. In certain embodiments, n is between 1 and 5, inclusive. In certain embodiments, n is an integer between 1 and 3, inclusive. In certain embodiments, n is chosen from 1 and 2. In certain embodiments, n is 1.

In certain embodiments, L comprises a C_1-6 straight chain aliphatic segment.

In certain embodiments, L comprises (CH₂OCH₂)_m; and m is an integer between 1 and 20, inclusive. In certain further embodiments, m is an integer between 1 and 10, inclusive. In certain further embodiments, m is an integer between 1 and 5, inclusive.

In certain embodiments, the compounds have structural Formula III:

X-L-(Y₁—Y₂—Y₃)—(W—Y₁—Y₂—Y₃)_n—Y₀ (III)

- or a salt thereof, wherein:
- X comprises a recruiting moiety that is capable of noncovalent binding to a regulatory molecule within the nucleus;
- L is a linker;
- Y₁, Y₂, and Y₃ are internal subunits, each of which comprises a moiety chosen from a heterocyclic ring or a C_1-6 straight chain aliphatic segment, and each of which is chemically linked to its two neighbors;
- Y₀ is an end subunit which comprises a moiety chosen from a heterocyclic ring or a straight chain aliphatic segment, which is chemically linked to its single neighbor;
- each subunit can noncovalently bind to an individual nucleotide in the GAA repeat sequence;
- W is a spacer;
- n is an integer between 1 and 200, inclusive; and
- (Y₁—Y₂-Y₃)—(W—Y₁—Y₂-Y₃)_n—Y₀ combine to form a DNA recognition moiety that is capable of noncovalent binding to one or more copies of the trinucleotide repeat sequence GAA.

In certain embodiments, Y₁—Y₂-Y₃ is:

embedded image

In certain embodiments, Y₁—Y₂-Y₃ is:

embedded image

In certain embodiments, Y₁—Y₂-Y₃ is Im-Py-β.

In certain embodiments, Y₁—Y₂-Y₃ is Im-Im-β.

In certain embodiments, each Y₁—Y₂-Y₃ is independently chosen from β-Py-Im and β-Im-Im.

In certain embodiments, at most one Y₁—Y₂-Y₃ is β-Im-Im.

In certain embodiments of the compound of structural Formula III, n is between 1 and 100, inclusive. In certain embodiments of the compound of structural Formula III, n is between 1 and 50, inclusive. In certain embodiments of the compound of structural Formula III, n is between 1 and 20, inclusive. In certain embodiments of the compound of structural Formula III, n is between 1 and 10, inclusive. In certain embodiments of the compound of structural Formula III, n is between 1 and 5, inclusive. In certain embodiments of the compound of structural Formula III, n is chosen from 1 and 2. In certain embodiments of the compound of structural Formula III, n is 1.

In certain embodiments, the compounds have structural Formula IV:

X-L-(Y₁—Y₂-Y₃)—V—(Y₄-Y₅—Y₆)—Y₀ (IV)

- or a salt thereof, wherein:
- X comprises a recruiting moiety that is capable of noncovalent binding to a regulatory molecule within the nucleus;
- Y₁, Y₂, Y₃, Y₄, Y₅, and Y₆ are internal subunits, each of which comprises a moiety chosen from a heterocyclic ring or a C_1-6 straight chain aliphatic segment, and each of which is chemically linked to its two neighbors;
- Y₀ is an end subunit which comprises a moiety chosen from a heterocyclic ring or a straight chain aliphatic segment, which is chemically linked to its single neighbor;
- each subunit can noncovalently bind to an individual nucleotide in the GAA repeat sequence;
- L is a linker;
- V is a turn component for forming a hairpin turn;
- n is an integer between 1 and 200, inclusive; and
- (Y₁—Y₂-Y₃)—V—(Y₄-Y₅—Y₆)—Y₀ combine to form a DNA recognition moiety that is capable of noncovalent binding to one or more copies of the trinucleotide repeat sequence GAA.

In certain embodiments of the compound of structural Formula IV, n is between 1 and 100, inclusive. In certain embodiments of the compound of structural Formula IV, n is between 1 and 50, inclusive. In certain embodiments of the compound of structural Formula IV, n is between 1 and 20, inclusive. In certain embodiments of the compound of structural Formula IV, n is between 1 and 10, inclusive. In certain embodiments of the compound of structural Formula IV, n is between 1 and 5, inclusive. In certain embodiments of the compound of structural Formula IV, n is chosen from 1 and 2. In certain embodiments of the compound of structural Formula IV, n is 1.

In certain embodiments, V is —HN—CH₂CH₂CH₂—CO—.

In certain embodiments, the compounds have structural Formula V:

X—C(═O)—CH₂CH₂—(Y₁—Y₂-Y₃)_n—NH—Y₀ (V)

- or a salt thereof, wherein:
- X comprises a recruiting moiety that is capable of noncovalent binding to a regulatory molecule within the nucleus;
- each Y₁—Y₂-Y₃ is independently chosen from β-Py-Im and β-Im-Im;
- Y₀ is an end subunit which comprises a moiety chosen from a heterocyclic ring or a straight chain aliphatic segment, which is chemically linked to its single neighbor; and
- n is an integer between 1 and 200, inclusive.

In certain embodiments of the compounds of structural Formula V, at most one of Y₁—Y₂-Y₃ is β-Im-Im.

In certain embodiments of the compounds of structural Formula V, Y₁—Y₂-Y₃ is β-Py-Im.

In certain embodiments of the compound of structural Formula V, n is between 1 and 100, inclusive. In certain embodiments of the compound of structural Formula V, n is between 1 and 50, inclusive. In certain embodiments of the compound of structural Formula V, n is between 1 and 20, inclusive. In certain embodiments of the compound of structural Formula V, n is between 1 and 10, inclusive. In certain embodiments of the compound of structural Formula V, n is between 1 and 5, inclusive. In certain embodiments of the compound of structural Formula V, n is chosen from 1 and 2. In certain embodiments of the compound of structural Formula V, n is 1.

In certain embodiments, the compounds have structural Formula VI:

embedded image

or a salt thereof, wherein:

X comprises a recruiting moiety that is capable of noncovalent binding to a regulatory molecule within the nucleus;

Y₀ is an end subunit which comprises a moiety chosen from a heterocyclic ring or a straight chain aliphatic segment, which is chemically linked to its single neighbor; and

n is an integer between 1 and 200, inclusive.

In certain embodiments of the compound of structural Formula VI, n is between 1 and 100, inclusive. In certain embodiments of the compound of structural Formula VI, n is between 1 and 50, inclusive. In certain embodiments of the compound of structural Formula VI, n is between 1 and 20, inclusive. In certain embodiments of the compound of structural Formula VI, n is between 1 and 10, inclusive. In certain embodiments of the compound of structural Formula VI, n is between 1 and 5, inclusive. In certain embodiments of the compound of structural Formula VI, n is chosen from 1 and 2. In certain embodiments of the compound of structural Formula VI, n is 1.

In certain embodiments, the compounds have structural Formula VII:

embedded image

or a salt thereof, wherein:

X comprises a recruiting moiety that is capable of noncovalent binding to a regulatory molecule within the nucleus;

W is a spacer;

Y₀ is an end subunit which comprises a moiety chosen from a heterocyclic ring or a straight chain aliphatic segment, which is chemically linked to its single neighbor; and

n is an integer between 1 and 200, inclusive.

In certain embodiments of the compound of structural Formula VII, n is between 1 and 100, inclusive. In certain embodiments of the compound of structural Formula VII, n is between 1 and 50, inclusive. In certain embodiments of the compound of structural Formula VII, n is between 1 and 20, inclusive. In certain embodiments of the compound of structural Formula VII, n is between 1 and 10, inclusive. In certain embodiments of the compound of structural Formula VII, n is between 1 and 5, inclusive. In certain embodiments of the compound of structural Formula VII, n is chosen from 1 and 2. In certain embodiments of the compound of structural Formula VII, n is 1.

In certain embodiments of the compounds of structural Formula VII,

W is —NHCH₂—(CH₂OCH₂)_p—CH₂CO—; and

p is an integer between 1 and 4, inclusive.
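The four spacers covered by this definition can be enumerated with a short sketch. It is an illustration only: the spacer is handled as a text string with the repeat written out p times, the backbone atom count is simple arithmetic on that written formula, and the function names are hypothetical.

```python
# Illustrative only: enumerate W = -NHCH2-(CH2OCH2)p-CH2CO- for p = 1 to 4.


def spacer_w(p: int) -> str:
    """Write the spacer with the (CH2OCH2) repeat expanded p times."""
    if not 1 <= p <= 4:
        raise ValueError("p must be an integer between 1 and 4, inclusive")
    return "-NHCH2-" + "CH2OCH2-" * p + "CH2CO-"


def backbone_atoms(p: int) -> int:
    """Count chain atoms from the amine N to the carbonyl C, inclusive."""
    # N and CH2, then (CH2, O, CH2) for each repeat, then CH2 and the carbonyl C.
    return 2 + 3 * p + 2


for p in range(1, 5):
    print(p, spacer_w(p), backbone_atoms(p))
```

Each additional repeat adds three atoms to the chain, so the written formula spans 7 backbone atoms at p = 1 and 16 at p = 4.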

In some embodiments, V is —(CH₂)_a—NR¹—(CH₂)_b—, —(CH₂)_a—, —(CH₂)_a—O—(CH₂)_b—, —(CH₂)_a—CH(NHR¹)—, —(CH₂)_a—CH(NHR¹)—, —(CR²R³)_a—, or —(CH₂)_a—CH(NR¹₃)⁺—(CH₂)_b—, wherein each a is independently an integer between 2 and 4; R¹ is H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, an optionally substituted C_6-10aryl, an optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl; each R² and R³ are independently H, halogen, OH, NHAc, or C_1-4alkyl. In some embodiments, R¹ is H. In some embodiments, R¹ is C_1-6alkyl optionally substituted by 1-3 substituents selected from —C(O)-phenyl. In some embodiments, V is —(CR²R³)—(CH₂)_a— or —(CH₂)_a—(CR²R³)—(CH₂)_b—, wherein each a is independently 1-3, b is 0-3, and each R² and R³ are independently H, halogen, OH, NHAc, or C_1-4alkyl. In some embodiments, V is —(CH₂)—CH(NH₃)⁺—(CH₂)— or —(CH₂)—CH₂CH(NH₃)⁺—.

In one aspect, the compounds of the present disclosure bind to the GAA of fxn and recruit a regulatory moiety to the vicinity of fxn. The regulatory moiety, due to its proximity to the gene, will be more likely to modulate the expression of fxn.

Also provided are embodiments wherein any compound disclosed above, including compounds of Formulas I-VII, are singly, partially, or fully deuterated. Methods for accomplishing deuterium exchange for hydrogen are known in the art.

Also provided are embodiments wherein any embodiment above may be combined with any one or more of these embodiments, provided the combination is not mutually exclusive.

As used herein, two embodiments are “mutually exclusive” when one is defined to be something which is different than the other. For example, an embodiment wherein two groups combine to form a cycloalkyl is mutually exclusive with an embodiment in which one group is ethyl and the other group is hydrogen. Similarly, an embodiment wherein one group is CH₂ is mutually exclusive with an embodiment wherein the same group is NH.

In one aspect, the compounds of the present disclosure provide a polyamide sequence for interaction of a single polyamide subunit to each base pair in the GAA repeat sequence. In one aspect, the compounds of the present disclosure provide a turn component V, in order to enable hairpin binding of the compound to the GAA, in which each nucleotide pair interacts with two subunits of the polyamide.

In one aspect, the compounds of the present disclosure provide more than one copy of the polyamide sequence for noncovalent binding to fxn, and the individual polyamide sequences in this compound are linked by a spacer W, as defined above. The spacer W allows this compound to adjust its geometry as needed to alleviate the geometric strain that otherwise affects the noncovalent binding of longer polyamide sequences.

First Terminus—DNA Binding Moiety

The first terminus interacts and binds with the gene, particularly with the minor groove of the GAA sequence. In one aspect, the compounds of the present disclosure provide a polyamide sequence for interaction of a single polyamide subunit to each base pair in the GAA repeat sequence. In one aspect, the compounds of the present disclosure provide a turn component (e.g., an aliphatic amino acid moiety), in order to enable hairpin binding of the compound to the GAA, in which each nucleotide pair interacts with two subunits of the polyamide.

In one aspect, the compounds of the present disclosure are more likely to bind to the repeated GAA of fxn than to GAA elsewhere in the subject's DNA, due to the high number of GAA repeats associated with fxn.
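The preference described above can be made concrete with a simple count. The sketch below assumes, purely for illustration, that a compound recognizes k consecutive GAA units and that every in-register window of k units within a repeat tract counts as one potential binding site. The sequences and function names are hypothetical.

```python
import re


def longest_gaa_tract(seq: str) -> int:
    """Return the largest number of consecutive GAA units found in seq."""
    runs = re.findall(r"(?:GAA)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def in_register_sites(repeats: int, k: int) -> int:
    """Count windows of k consecutive GAA units in a tract of `repeats` units."""
    return max(repeats - k + 1, 0)


# Hypothetical sequences: an expanded tract and a short, isolated occurrence.
expanded = "CCT" + "GAA" * 500 + "AGC"
isolated = "CCTGAAGAAAGC"

for name, seq in (("expanded", expanded), ("isolated", isolated)):
    tract = longest_gaa_tract(seq)
    print(name, tract, in_register_sites(tract, k=2))
```

Under these assumptions, a tract of 500 repeats offers 499 windows for a compound spanning two GAA units, while the isolated two-unit occurrence offers one.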

In one aspect, the compounds of the present disclosure provide more than one copy of the polyamide sequence for noncovalent binding to GAA. In one aspect, the compounds of the present disclosure bind to fxn with an affinity that is greater than a corresponding compound that contains a single polyamide sequence.

In one aspect, the compounds of the present disclosure provide more than one copy of the polyamide sequence for noncovalent binding to the GAA, and the individual polyamide sequences in this compound are linked by a spacer W, as defined above. The spacer W allows this compound to adjust its geometry as needed to alleviate the geometric strain that otherwise affects the noncovalent binding of longer polyamide sequences.

In certain embodiments, the DNA recognition or binding moiety binds in the minor groove of DNA.

In certain embodiments, the DNA recognition or binding moiety comprises a polymeric sequence of monomers, wherein each monomer in the polymer selectively binds to a certain DNA base pair.

In certain embodiments, the DNA recognition or binding moiety comprises a polyamide moiety.

In certain embodiments, the DNA recognition or binding moiety comprises a polyamide moiety comprising heteroaromatic monomers, wherein each heteroaromatic monomer binds noncovalently to a specific nucleotide, and each heteroaromatic monomer is attached to its neighbor or neighbors via amide bonds.

In certain embodiments, the DNA recognition moiety binds to a sequence comprising at least 1000 trinucleotide repeats. In certain embodiments, the DNA recognition moiety binds to a sequence comprising at least 500 trinucleotide repeats. In certain embodiments, the DNA recognition moiety binds to a sequence comprising at least 200 trinucleotide repeats. In certain embodiments, the DNA recognition moiety binds to a sequence comprising at least 100 trinucleotide repeats. In certain embodiments, the DNA recognition moiety binds to a sequence comprising at least 50 trinucleotide repeats. In certain embodiments, the DNA recognition moiety binds to a sequence comprising at least 20 trinucleotide repeats.

In certain embodiments, the compounds comprise a cell-penetrating ligand moiety.

In certain embodiments, the cell-penetrating ligand moiety is a polypeptide.

In certain embodiments, the cell-penetrating ligand moiety is a polypeptide containing fewer than 30 amino acid residues.

In certain embodiments, the polypeptide is chosen from any one of SEQ ID NO. 1 to SEQ ID NO. 37, inclusive.

The form of the polyamide selected can vary based on the target gene. The first terminus can include a polyamide selected from the group consisting of a linear polyamide, a hairpin polyamide, an H-pin polyamide, an overlapped polyamide, a slipped polyamide, a cyclic polyamide, a tandem polyamide, and an extended polyamide. In some embodiments, the first terminus comprises a linear polyamide. In some embodiments, the first terminus comprises a hairpin polyamide.

The binding affinity between the polyamide and the target gene can be adjusted based on the composition of the polyamide. In some embodiments, the polyamide is capable of binding the DNA with an affinity of less than about 600 nM, about 500 nM, about 400 nM, about 300 nM, about 250 nM, about 200 nM, about 150 nM, about 100 nM, or about 50 nM. In some embodiments, the polyamide is capable of binding the DNA with an affinity of less than about 300 nM. In some embodiments, the polyamide is capable of binding the DNA with an affinity of less than about 200 nM. In some embodiments, the polyamide is capable of binding the DNA with an affinity of greater than about 200 nM, about 150 nM, about 100 nM, about 50 nM, about 10 nM, or about 1 nM. In some embodiments, the polyamide is capable of binding the DNA with an affinity in the range of about 1-600 nM, 10-500 nM, 20-500 nM, 50-400 nM, or 100-300 nM.

The binding affinity between the polyamide and the target DNA can be determined using a quantitative footprint titration experiment. The experiment involves measuring the dissociation constant Kd of the polyamide for the target sequence at either 24° C. or 37° C., using either standard polyamide assay solution conditions or approximate intracellular solution conditions.
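By way of a non-limiting sketch, a dissociation constant can be estimated from titration data by fitting a binding isotherm. The text does not specify a fitting model; the example below assumes a single-site isotherm, fraction bound = [L]/(Kd + [L]), with the polyamide in excess over the DNA, and uses a simple iterated grid search in log space. The function names, search bounds, and synthetic data are hypothetical.

```python
import math


def fraction_bound(conc_nM, kd_nM):
    """Single-site isotherm (assumed model): fraction of sites occupied."""
    return conc_nM / (kd_nM + conc_nM)


def fit_kd(concs_nM, fractions, lo_nM=1e-3, hi_nM=1e6, rounds=4, points=200):
    """Least-squares estimate of Kd (nM) by repeated grid search over log10(Kd)."""

    def sse(kd):
        # Sum of squared differences between model and observed fractions.
        return sum((fraction_bound(c, kd) - f) ** 2
                   for c, f in zip(concs_nM, fractions))

    lo, hi = math.log10(lo_nM), math.log10(hi_nM)
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / points
        grid = [lo + i * step for i in range(points + 1)]
        best = min(grid, key=lambda x: sse(10 ** x))
        # Narrow the window to the two grid cells around the current best.
        lo, hi = best - step, best + step
    return 10 ** best


if __name__ == "__main__":
    # Synthetic, noise-free data generated with Kd = 150 nM (illustrative only).
    concs = [1, 10, 50, 100, 200, 500, 1000, 5000]
    data = [fraction_bound(c, 150.0) for c in concs]
    kd = fit_kd(concs, data)
    print(round(kd, 1), "nM; below 200 nM:", kd < 200)
```

With real footprinting data, the fractions would come from normalized band intensities, and a fitted Kd could then be compared with the affinity ranges recited above.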

The binding affinity between the regulatory protein and the ligand on the second terminus can be determined using an assay suitable for the specific protein. The experiment involves measuring the dissociation constant Kd of the ligand for the protein, using either standard protein assay solution conditions or approximate intracellular solution conditions.

In some embodiments, the first terminus comprises —NH-Q-C(O)—, wherein Q is an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene group. In some embodiments, Q is an optionally substituted C_6-10arylene group or optionally substituted 5-10 membered heteroarylene group. In some embodiments, Q is an optionally substituted 5-10 membered heteroarylene group. In some embodiments, the 5-10 membered heteroarylene group is optionally substituted with 1-4 substituents selected from H, OH, halogen, C_1-10alkyl, NO₂, CN, NR′R″, C_1-6haloalkyl, C_1-6alkoxyl, C_1-6haloalkoxy, (C_1-6alkoxy)C_1-6alkyl, C_2-10alkenyl, C_2-10alkynyl, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, (C_3-7carbocyclyl)C_1-6alkyl, (4-10 membered heterocyclyl)C_1-6alkyl, (C_6-10aryl)C_1-6alkyl, (C_6-10aryl)C_1-6alkoxy, (5-10 membered heteroaryl)C_1-6alkyl, (C_3-7carbocyclyl)-amine, (4-10 membered heterocyclyl)amine, (C_6-10aryl)amine, (5-10 membered heteroaryl)amine, acyl, C-carboxy, O-carboxy, C-amido, N-amido, S-sulfonamido, N-sulfonamido, —SR′, COOH, or CONR′R″; wherein each R′ and R″ is independently H, C_1-10alkyl, C_1-10haloalkyl, or C_1-10alkoxyl.

In some embodiments, the first terminus comprises at least three aromatic carboxamide moieties selected to correspond to the nucleotide repeat sequence GAA and at least one aliphatic amino acid residue chosen from the group consisting of glycine, β-alanine, γ-aminobutyric acid, 2,4-diaminobutyric acid, and 5-aminovaleric acid. In some embodiments, the first terminus comprises at least one β-alanine subunit.

In some embodiments, the monomer element is independently selected from the group consisting of optionally substituted pyrrole carboxamide monomer, optionally substituted imidazole carboxamide monomer, optionally substituted C—C linked heteromonocyclic/heterobicyclic moiety, and β-alanine.

In some embodiments, the first terminus comprises a structure of Formula (A-1):

-L_1a-[A-M]_p-E₁ (A-1)

- wherein:
- each [A-M] appears p times and p is an integer in the range of 1 to 10,
- L_1a is a bond, a C_1-6alkylene, —NR^a—C_1-6alkylene-C(O)—, —NR^aC(O)—, —NR^a—C_1-6alkylene, —O—, or —O—C_1-6alkylene;
- each A is selected from the group consisting of a bond, C_1-10alkylene, optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, —C_1-10alkylene-C(O)—, C_1-10alkylene-NR^a—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —S(O)—, —S(O)₂—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, (CH₂)_0-4—CH═CH—(CH₂)_0-4, —N(CH₃)—C_1-6alkylene, and

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, —NH—N═N—, —NH—C(O)—NH—, and any combinations thereof, and at least one A is —CONH—;

- each M is an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- E₁ is H or -A^E-G;
- A^E is absent or —NHCO—;
- G is selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH)R^a, and optionally substituted amine; and
- each R^a and R^b is independently selected from the group consisting of H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, and optionally substituted 5-10 membered heteroaryl.

In some embodiments, the first terminus can comprise a structure of Formula (A-2):

embedded image

- wherein:
- L_2a is a linker selected from —C_1-12alkylene-CR^a, —CH, N, —C_1-6alkylene-N, —C(O)N, —NR^a—C_1-6alkylene-CH, —O—C_0-6alkylene-CH,

embedded image

- each p and q are independently an integer in the range of 1 to 10;
- each m and n are independently an integer in the range of 0 to 10;
- each A is independently selected from a bond, C_1-10alkylene, —C_1-10alkylene-NR^a—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —S(O)—, —S(O)₂—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, or —C(O)—CH═CH—, and at least one A is —CONH—;
- each M is independently an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- each E₁ and E₂ is independently H or -A^E-G;
- each A^E is independently absent or —NHCO—;
- each G is independently selected from the group consisting of C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH)R^a, and optionally substituted amine; and
- each R^a and R^b is independently selected from the group consisting of H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, and an optionally substituted 5-10 membered heteroaryl; and
- each R^1a and R^1b is independently H or C_1-6alkyl.

In certain embodiments, the integers p and q satisfy 2≤p+q≤20. In some embodiments, p is in the range of about 2 to 10. In some embodiments, p is in the range of about 4 to 8. In some embodiments, q is in the range of about 2 to 10. In some embodiments, q is in the range of about 4 to 8.
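For illustration, the integer constraints on p and q in Formula (A-2) can be enumerated directly. The sketch below treats the "about" ranges as exact integer bounds, which is an assumption, and the function name is hypothetical.

```python
from itertools import product


def allowed_pairs(p_range=(1, 10), q_range=(1, 10), sum_range=(2, 20)):
    """List (p, q) pairs that fall inside both per-index ranges and the sum range."""
    p_lo, p_hi = p_range
    q_lo, q_hi = q_range
    s_lo, s_hi = sum_range
    return [
        (p, q)
        for p, q in product(range(p_lo, p_hi + 1), range(q_lo, q_hi + 1))
        if s_lo <= p + q <= s_hi
    ]


if __name__ == "__main__":
    # Broadest ranges from Formula (A-2): p and q each from 1 to 10.
    print(len(allowed_pairs()))
    # Narrower embodiment: p and q each from 4 to 8.
    narrowed = allowed_pairs((4, 8), (4, 8))
    print(len(narrowed), min(p + q for p, q in narrowed), max(p + q for p, q in narrowed))
```

Under these assumptions every pair with p and q each between 1 and 10 already satisfies 2≤p+q≤20, so the sum condition only narrows the set if the individual ranges are widened.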

In certain embodiments, L_2a is —C_2-8alkylene-CH,

embedded image

and wherein each m and n is independently an integer in the range of 0 to 10. In certain embodiments, L_2a is

embedded image

In some embodiments, L_2a is —C_2-8alkylene-CH. In some embodiments, L_2a is

embedded image

wherein (m+n) is in the range of about 1 to 4. In some embodiments, L_2a is

embedded image

and (m+n) is in the range of about 2 to 5. In some embodiments, L_2a is

embedded image

wherein (m+n) is in the range of about 1 to 6.

In some embodiments, the first terminus comprises a structure of Formula (A-3):

-L_1a-[A-M]_p1-L_3a-[M-A]_q1-E₁ (A-3)

- wherein:
  - L_1a is a bond, a C_1-6alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, or —O—C_0-6alkylene;
  - L_3a is a bond, C_1-6alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, —O—C_0-6alkylene, —(CH₂)_a—NR^a—(CH₂)_b—, —(CH₂)_a—, —(CH₂)_a—O—(CH₂)_b—, —(CH₂)_a—CH(NHR^a)—, —(CR^1aR^1b)_a—, or —(CH₂)_a—CH(NR^aR^b)—(CH₂)_b—;
  - each a and b is independently an integer between 2 and 4;
  - each R^a and R^b is independently selected from H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, and an optionally substituted 5-10 membered heteroaryl;
  - each R^1a and R^1b is independently H, halogen, OH, NHAc, or C_1-4alkyl;
  - each [A-M] appears p¹ times and p¹ is an integer in the range of 1 to 10;
  - each [M-A] appears q¹ times and q¹ is an integer in the range of 1 to 10;
  - each A is selected from a bond, C_1-10alkylene, optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NR^a—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —S(O)—, —S(O)₂—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, —N(CH₃)—C_1-6alkylene, and

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, —NH—N═N—, —NH—C(O)—NH—, and any combinations thereof, and at least one A is —NHCO—;

- each M in each [A-M] and [M-A] unit is independently an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene; and
- E₁ is selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, and C_0-4alkylene-NHC(═NH)R^a.

In certain embodiments, the integers p¹ and q¹ satisfy 2≤p¹+q¹≤20.

In some embodiments, for Formula (A-1) to (A-4), each A is independently a bond, C_1-6alkylene, optionally substituted phenylene, optionally substituted thiophenylene, optionally substituted furanylene, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NH—, CO, CONR^aC_1-4alkylene, NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —CH═CH—, —NH—N═N—, —NH—C(O)—NH—, —N(CH₃)—C_1-6alkylene, and

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, an optionally substituted 5-10 membered heteroarylene group, and any combinations thereof. In some embodiments, in Formula (A-1) and (A-3), L_1a is a bond. In some embodiments, in Formula (A-1) and (A-3), L_1a is a C_1-6alkylene. In some embodiments, in Formula (A-1) and (A-3), L_1a is —NH—C_1-6alkylene-C(O)—. In some embodiments, in Formula (A-1) and (A-3), L_1a is —N(CH₃)—C_1-6alkylene-. In some embodiments, in Formula (A-1) and (A-3), L_1a is —O—C_0-6alkylene-.

In some embodiments, L_1a is a bond. In some embodiments, L_1a is C_1-6alkylene. In some embodiments, L_3a is —NH—C_1-6alkylene-C(O)—. In some embodiments, L_3a is —N(CH₃)—C_1-6alkylene-C(O)—. In some embodiments, L_3a is —O—C_0-6alkylene. In some embodiments, L_3a is —(CH₂)_a—NR^a—(CH₂)_b—. In some embodiments, L_3a is —(CH₂)_a—O—(CH₂)_b—. In some embodiments, L_3a is —(CH₂)_a—CH(NHR^a)—. In some embodiments, L_3a is —(CR^1aR^1b)_a—. In some embodiments, L_3a is —(CH₂)_a—CH(NR^aR^b)—(CH₂)_b—.

In some embodiments, for Formula (A-1) to (A-4), at least one A is NH and at least one A is C(O). In some embodiments, for Formula (A-1) to (A-4), at least two A are NH and at least two A are C(O). In some embodiments, when M is a bicyclic ring, A is a bond. In some embodiments, at least one A is a phenylene optionally substituted with one or more alkyl. In some embodiments, at least one A is a thiophenylene optionally substituted with one or more alkyl. In some embodiments, at least one A is a furanylene optionally substituted with one or more alkyl. In some embodiments, at least one A is —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, preferably —CH═CH—. In some embodiments, at least one A is —NH—N═N—. In some embodiments, at least one A is —NH—C(O)—NH—. In some embodiments, at least one A is —N(CH₃)—C_1-6alkylene. In some embodiments, at least one A is

embedded image

In some embodiments, at least one A is —NH—C_1-6alkylene-NH—. In some embodiments, at least one A is —O—C_1-6alkylene-O—.

In some embodiments, each M in [A-M] of Formula (A-1) to (A-4) is C_6-10arylene group, 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or C_1-6alkylene; each optionally substituted by 1-3 substituents selected from H, OH, halogen, C_1-10alkyl, NO₂, CN, NR^aR^b, C_1-6haloalkyl, —C_1-6alkoxyl, C_1-6haloalkoxy, (C_1-6alkoxy)C_1-6alkyl, C_2-10alkenyl, C_2-10alkynyl, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, —(C_3-7carbocyclyl)C_1-6alkyl, (4-10 membered heterocyclyl)C_1-6alkyl, (C_6-10aryl)C_1-6alkyl, (C_6-10aryl)C_1-6alkoxy, (5-10 membered heteroaryl)C_1-6alkyl, —(C_3-7carbocyclyl)-amine, (4-10 membered heterocyclyl)amine, (C_6-10aryl)amine, (5-10 membered heteroaryl)amine, acyl, C-carboxy, O-carboxy, C-amido, N-amido, S-sulfonamido, N-sulfonamido, —SR′, COOH, or CONR^aR^b; wherein each R^a and R^b is independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl. In some embodiments, each M in [A-M] of Formula (A-1) to (A-3) is a 5-10 membered heteroarylene containing at least one heteroatom selected from O, S, and N or a C_1-6alkylene, and the heteroarylene or the C_1-6alkylene is optionally substituted with 1-3 substituents selected from OH, halogen, C_1-10alkyl, NO₂, CN, NR^aR^b, C_1-6haloalkyl, —C_1-6alkoxyl, C_1-6haloalkoxy, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, —SR′, COOH, or CONR^aR^b; wherein each R^a and R^b is independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl. In some embodiments, each M in [A-M] of Formula (A-1) to (A-3) is a 5-10 membered heteroarylene containing at least one heteroatom selected from O, S, and N, and the heteroarylene is optionally substituted with 1-3 substituents selected from OH, C_1-6alkyl, halogen, and C_1-6alkoxyl.

In some embodiments, for Formula (A-1) to (A-4), at least one M is a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one M is a pyrrole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one M is an imidazole optionally substituted with one or more C_1-10alkyl. In some embodiments, for Formula (A-1) to (A-4), at least one M is a C_2-6alkylene optionally substituted with one or more C_1-10alkyl. In some embodiments, for Formula (A-1) to (A-4), at least one M is a bicyclic heteroarylene or arylene. In some embodiments, at least one M is a phenylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one M is a benzimidazole optionally substituted with one or more C_1-10alkyl.

In some embodiments, the first terminus comprises a structure of Formula (A-4):

embedded image

- wherein:
- L_1c is a bivalent or trivalent group selected from

embedded image

a C_1-10alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, and

embedded image

- p is an integer in the range of 3 to 10;
- 2≤q≤(p−1);
- 2≤r≤(p−1);
- m and n are each independently, an integer in the range of 0 to 10;
- each A² through A^p is independently selected from the group consisting of a bond, C_1-10alkylene, optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NR^a—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —S(O)—, —S(O)₂—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, —N(CH₃)—C_1-6alkylene,

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, —NH—N═N—, —NH—C(O)—NH—, and any combinations thereof, and at least one of A² through A^p is —NHCO—;

- each M¹ through M^p is an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- each T² through T^p is independently selected from the group consisting of a bond, C_1-10alkylene, optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NR^a—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —S(O)—, —S(O)₂—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, —N(CH₃)—C_1-6alkylene,

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, —NH—N═N—, —NH—C(O)—NH—, and any combinations thereof;

- each Q¹ to Q^p is an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- each A¹, T¹, E₁, and E₂ is independently H or -A^E-G;
- each A^E is independently absent or —NHCO—;
- each G is independently selected from the group consisting of H, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH)R^a, and optionally substituted amine;
- when L_1c is a trivalent group, the oligomeric backbone is attached to the first terminus through L_1c, and each G is an end group independently selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH)R^a, and optionally substituted amine;
- when L_1c is a bivalent group, the oligomeric backbone is attached to the first terminus through one of A¹, T¹, E₁, and E₂, and each G is independently selected from the group consisting of a bond, a —C_1-6alkylene-, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, —C(O)—, —C(O)—C_1-10alkylene, and —O—C_0-6alkylene, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH)R^a, and optionally substituted amine; or
- when L_1c is a bivalent group, the oligomeric backbone is attached to the first terminus through a nitrogen or carbon atom on one of M¹, M², . . . , M^p−1, M^p, T¹, T², . . . T^p−1, and T^p, and each G is an end group independently selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH)R^a, and optionally substituted amine;
- each R^a and R^b is independently H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl; and
- each R^1a and R^1b is independently H or an optionally substituted C_1-6alkyl.
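The index constraints recited for Formula (A-4) (p from 3 to 10, 2≤q≤(p−1), 2≤r≤(p−1), and m and n each from 0 to 10) can be checked mechanically. The following is a minimal sketch with a hypothetical function name; it validates only the integers and says nothing about chemical structure.

```python
def a4_index_violations(p, q, r, m, n):
    """Return a list of the Formula (A-4) index constraints that are violated."""
    checks = [
        (3 <= p <= 10, "p must be in the range 3 to 10"),
        (2 <= q <= p - 1, "q must satisfy 2 <= q <= p - 1"),
        (2 <= r <= p - 1, "r must satisfy 2 <= r <= p - 1"),
        (0 <= m <= 10, "m must be in the range 0 to 10"),
        (0 <= n <= 10, "n must be in the range 0 to 10"),
    ]
    # Keep only the messages for constraints that failed.
    return [message for ok, message in checks if not ok]


if __name__ == "__main__":
    print(a4_index_violations(3, 2, 2, 0, 0))  # smallest allowed p
    print(a4_index_violations(3, 3, 2, 0, 0))  # q exceeds p - 1
```

An empty list means all five constraints hold; with p at its minimum of 3, the only permitted values are q = 2 and r = 2.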

In some embodiments, the first terminus comprises a structure of Formula (A-4a) or (A-4b):

embedded image

- wherein:
- L_1c is a bivalent or trivalent group selected from

embedded image

a C_1-10alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, and

embedded image

- p is an integer in the range of 2 to 10;
- p¹ is an integer in the range of 2 to 10;
- p′ is an integer in the range of 2 to 10;
- 2≤q≤(p−1);
- 2≤r≤(p−1);
- m and n are each independently an integer in the range of 0 to 10;
- each A² through A^p is independently selected from the group consisting of a bond, C_1-10alkylene, optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NR^a—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —S(O)—, —S(O)₂—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, —N(CH₃)—C_1-6alkylene, and

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, —NH—N═N—, —NH—C(O)—NH—, and any combinations thereof, and at least one of A² through A^p is —CONH—;

- each M¹ through M^p is an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;

- each T² through T^p′ in Formula (A-4a) is independently selected from the group consisting of a bond, C_1-10alkylene, optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NR^a—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —S(O)—, —S(O)₂—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, —N(CH₃)—C_1-6alkylene, and

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, —NH—N═N—, —NH—C(O)—NH—, and any combinations thereof, and at least one of T² through T^p′ is —CONH—;

- each Q¹ to Q^p′ is an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- each A¹, T¹, E₁, and E₂ is independently H or -A^E-G;
- each A^E is independently absent or —NHCO—;
- each G is independently selected from the group consisting of H, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH)R^a, and optionally substituted amine;
- when L_1c is a trivalent group, the oligomeric backbone is attached to the first terminus through L_1c; when L_1c is a bivalent group, the oligomeric backbone is attached to the first terminus through one of A¹, T¹, E₁, and E₂, or the oligomeric backbone is attached to the first terminus through a nitrogen or carbon atom on one of M¹, M², . . . M^p−1, M^p, T¹, T², . . . T^p′−1, and T^p′;
- each R^a and R^b is independently H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl; and
- each R^1a and R^1b is independently H or an optionally substituted C_1-6alkyl.

In certain embodiments, L_1c is

embedded image

C_1-10alkylene, or

embedded image

In certain embodiments, L_1c is C_3-8alkylene. In certain embodiments, L_1c is

embedded image

and wherein 2≤m+n≤10. In some embodiments, L_1c is C_2-8alkylene. In some embodiments, L_1c is C_3-8alkylene. In some embodiments, L_1c is C_4-8alkylene. In some embodiments, L_1c is C₃alkylene, C₄alkylene, C₅alkylene, C₆alkylene, C₇alkylene, C₈alkylene, or alkylene.

In certain embodiments, 3≤m+n≤7. In certain embodiments, (m+n) is 3, 4, 5, 6, 7, 8, or 9. In certain embodiments, m is in the range of 3 to 8. In certain embodiments, m is 3, 4, 5, 6, 7, 8, or 9.

In certain embodiments, M^q is a five to 10 membered heteroaryl ring comprising at least one nitrogen; Q^q is a five to 10 membered heteroaryl ring comprising at least one nitrogen; and M^q is linked to Q^q through L_1c. In certain embodiments, M^q is a five membered heteroaryl ring comprising at least one nitrogen; Q^q is a five membered heteroaryl ring comprising at least one nitrogen; M^q is linked to Q^q through L_1c, and L_1c is attached to the nitrogen atom on M^q and to the nitrogen atom on Q^q.

In certain embodiments, each M¹ through M^p is independently selected from an optionally substituted pyrrolylene, an optionally substituted imidazolylene, an optionally substituted pyrazolylene, an optionally substituted thiazolylene, an optionally substituted diazolylene, an optionally substituted benzopyridazinylene, an optionally substituted benzopyrazinylene, an optionally substituted phenylene, an optionally substituted pyridinylene, an optionally substituted thiophenylene, an optionally substituted furanylene, an optionally substituted piperidinylene, an optionally substituted pyrimidinylene, an optionally substituted anthracenylene, an optionally substituted quinolinylene, and an optionally substituted C_1-6alkylene.

In certain embodiments, at least one M of M¹ through M^p is a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least two M of M¹ through M^p are each a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least three, four, five, or six M of M¹ through M^p are each a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of M¹ through M^p is a pyrrole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of M¹ through M^p is an imidazole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of M¹ through M^p is a C_2-6alkylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of M¹ through M^p is a phenyl optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of M¹ through M^p is a bicyclic heteroarylene or arylene. In some embodiments, at least one of M¹ through M^p is a phenylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of M¹ through M^p is a benzimidazole optionally substituted with one or more C_1-10alkyl.

In certain embodiments, each Q¹ to Q^p is independently selected from an optionally substituted pyrrolylene, an optionally substituted imidazolylene, an optionally substituted pyrazolylene, an optionally substituted thiazolylene, an optionally substituted diazolylene, an optionally substituted benzopyridazinylene, an optionally substituted benzopyrazinylene, an optionally substituted phenylene, an optionally substituted pyridinylene, an optionally substituted thiophenylene, an optionally substituted furanylene, an optionally substituted piperidinylene, an optionally substituted pyrimidinylene, an optionally substituted anthracenylene, an optionally substituted quinolinylene, and an optionally substituted C_1-6alkylene.

In certain embodiments, at least one Q of Q¹ through Q^p is a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least two Q of Q¹ through Q^p are each a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least three, four, five, or six Q of Q¹ through Q^p are each a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q¹ through Q^p is a pyrrole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q¹ through Q^p is an imidazole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q¹ through Q^p is a C_2-6alkylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q¹ through Q^p is a phenyl optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q¹ through Q^p is a bicyclic heteroarylene or arylene. In some embodiments, at least one of Q¹ through Q^p is a phenylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q¹ through Q^p is a benzimidazole optionally substituted with one or more C_1-10alkyl.

In some embodiments, at least one of A² through A^p is NH and at least one of A² through A^p is C(O). In some embodiments, at least two of A² through A^p are NH and at least two of A² through A^p are C(O). In some embodiments, when one of M² through M^p is a bicyclic ring, the adjacent A is a bond. In some embodiments, one of A² through A^p is a phenylene optionally substituted with one or more alkyl. In some embodiments, one of A² through A^p is a thiophenylene optionally substituted with one or more alkyl. In some embodiments, one of A² through A^p is a furanylene optionally substituted with one or more alkyl. In some embodiments, one of A² through A^p is —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, preferably —CH═CH—. In some embodiments, one of A² through A^p is —NH—N═N—. In some embodiments, one of A² through A^p is —NH—C(O)—NH—. In some embodiments, one of A² through A^p is —N(CH₃)—C_1-6alkylene. In some embodiments, one of A² through A^p is

embedded image

In some embodiments, one of A² through A^p is —NH—C_1-6alkylene-NH—. In some embodiments, one of A² through A^p is —O—C_1-6alkylene-O—.

In certain embodiments, each A² through A^p is independently selected from a bond, C_1-10alkylene, optionally substituted phenylene, optionally substituted thiophenylene, optionally substituted furanylene, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NH—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —CH═CH—, —NH—N═N—, —NH—C(O)—NH—, —N(CH₃)—C_1-6alkylene,

embedded image

—NH—C_1-6alkylene-NH—, and —O—C_1-6alkylene-O—, and any combinations thereof.

In some embodiments, at least one T of T² through T^p is NH and at least one T of T² through T^p is C(O). In some embodiments, at least two T of T² through T^p are NH and at least two T of T² through T^p are C(O). In some embodiments, when one Q of Q² through Q^p is a bicyclic ring, the adjacent T is a bond. In some embodiments, one T of T² through T^p is a phenylene optionally substituted with one or more alkyl. In some embodiments, one T of T² through T^p is a thiophenylene optionally substituted with one or more alkyl. In some embodiments, one T of T² through T^p is a furanylene optionally substituted with one or more alkyl. In some embodiments, one T of T² through T^p is —(CH₂)_0-4—CH═CH—(CH₂)_0-4—, preferably —CH═CH—. In some embodiments, one T of T² through T^p is —NH—N═N—. In some embodiments, one T of T² through T^p is —NH—C(O)—NH—. In some embodiments, one T of T² through T^p is —N(CH₃)—C_1-6alkylene. In some embodiments, one T of T² through T^p is

embedded image

In some embodiments, one T of T² through T^p is —NH—C_1-6alkylene-NH—. In some embodiments, one T of T² through T^p is —O—C_1-6alkylene-O—.

In certain embodiments, each T² through T^p is independently selected from a bond, C_1-10alkylene, optionally substituted phenylene, optionally substituted thiophenylene, optionally substituted furanylene, —C_1-10alkylene-C(O)—, —C_1-10alkylene-NH—, —CO—, —NR^a—, —CONR^a—, —CONR^aC_1-4alkylene-, —NR^aCO—C_1-4alkylene-, —C(O)O—, —O—, —S—, —C(═S)—NH—, —C(O)—NH—NH—, —C(O)—N═N—, —C(O)—CH═CH—, —CH═CH—, —NH—N═N—, —NH—C(O)—NH—, —N(CH₃)—C_1-6alkylene, and

embedded image

—NH—C_1-6alkylene-NH—, —O—C_1-6alkylene-O—, and any combinations thereof.

In certain embodiments, each A¹, T¹, E₁, and E₂ is independently -A^E-G, and each A^E is independently absent or —NHCO—. In certain embodiments, each A¹, T¹, E₁, and E₂ is independently -A^E-G and each A^E is independently —NHCO—.

In certain embodiments, for Formula (A-1) to (A-4), each end group G independently comprises a moiety selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, a 5-10 membered heteroaryl optionally substituted with 1-3 substituents selected from C_1-6alkyl, —NHCOH, halogen, —NR^aR^b, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, C_0-4alkylene-NHC(═NH)—R_E, —C_1-4alkylene-R_E, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, —CO-halogen, and optionally substituted amine, wherein each R^a and R^b are independently H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl. In certain embodiments, for Formula (A-1) to (A-4), each end group G independently comprises an NH or CO group. In certain embodiments, each R^a and R^b are independently H or C_1-6alkyl. In certain embodiments, for Formula (A-1) to (A-4), at least one of the end groups is H. In certain embodiments, for Formula (A-1) to (A-4), at least two of the end groups are H. In certain embodiments, for Formula (A-1) to (A-4), at least one of the end groups is NH-5-10 membered heteroaryl ring optionally substituted with one or more alkyl or —CO-5-10 membered heteroaryl ring optionally substituted with one or more alkyl.

In certain embodiments, for Formula (A-1) to (A-4), each end group G is independently selected from C_1-4alkylNHC(═NH)NH₂,

embedded image

—C(═NH)(NH₂),

embedded image

In certain embodiments, for Formula (A-1) to (A-4), each E₁ independently comprises an optionally substituted thiophene-containing moiety, optionally substituted pyrrole-containing moiety, optionally substituted imidazole-containing moiety, or optionally substituted amine.

In certain embodiments, for Formula (A-1) to (A-4), each E₂ independently comprises an optionally substituted thiophene-containing moiety, optionally substituted pyrrole-containing moiety, optionally substituted imidazole-containing moiety, or optionally substituted amine.

In certain embodiments, for Formula (A-1) to (A-4), each E₁ and E₂ independently comprises a moiety selected from the group consisting of optionally substituted N-methylpyrrole, optionally substituted N-methylimidazole, optionally substituted benzimidazole moiety, and optionally substituted 3-(dimethylamino)propanamidyl. In certain embodiments, each E₁ and E₂ independently comprises thiophene, benzothiophene, C—C linked benzimidazole/thiophene-containing moiety, or C—C linked hydroxybenzimidazole/thiophene-containing moiety. In certain embodiments, for Formula (A-1) to (A-4), each E₁ and E₂ independently also comprises an NH or CO group.

In certain embodiments, for Formula (A-1) to (A-4), each E₁ or E₂ independently comprises a moiety selected from the group consisting of isophthalic acid; phthalic acid; terephthalic acid; morpholine; N,N-dimethylbenzamide; N,N-bis(trifluoromethyl)benzamide; fluorobenzene; (trifluoromethyl)benzene; nitrobenzene; phenyl acetate; phenyl 2,2,2-trifluoroacetate; phenyl dihydrogen phosphate; 2H-pyran; 2H-thiopyran; benzoic acid; isonicotinic acid; and nicotinic acid; wherein one, two, or three ring members in any of the end-group candidates can be independently substituted with C, N, S or O; and where any one, two, three, four or five of the hydrogens bound to the ring can be substituted with R^3a, wherein R₅ may be independently selected from H, OH, halogen, C_1-10alkyl, NO₂, NH₂, C_1-10haloalkyl, —OC_1-10haloalkyl, COOH, and CONR^1cR^1d; wherein each R^1c and R^1d are independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl.

In some embodiments, the first terminus comprises the structure of Formula (A-5a) or Formula (A-5b):

A^1a-NH-Q¹-C(O)—NH-Q²-C(O)—NH-Q³-C(O) . . . —NH-Q^p−1C(O)—NH—C(O)NH-G (A-5a)

T^1a-C(O)-Q¹-NH—C(O)-Q²NH—C(O)-Q³-NH— . . . —C(O)-Q^p−1NH—C(O)-Q^p-NHC(O)-G (A-5b)

- wherein:
- each Q¹, Q², Q³ . . . through Q^p are independently an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- each A^1a and T^1a are independently a bond, H, a —C_1-6alkylene-, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, —C(O)—, —C(O)—C_1-10alkylene, and —O—C_0-6alkylene, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, and optionally substituted amine;
- p is an integer between 2 and 10;
- G is selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, and optionally substituted amine;
- each R^a and R^b are independently H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl; and
- wherein the first terminus is connected to the oligomeric backbone through either A¹ or T¹, or a nitrogen or carbon atom on one of Q¹ through Q^p.

In certain embodiments, the first terminus comprises the structure of Formula (A-5c):

embedded image

- wherein:
- each Q_a¹, Q_a² . . . Q_a^q . . . through Q_a^p are independently an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- each Q_b¹, Q_b² . . . Q_b^r . . . through Q_b^p are independently an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- p is an integer between 3 and 10;
- 2≤q≤(p−1);
- 2≤r≤(p−1);
- L_a is a divalent or trivalent group selected from the group consisting of

embedded image

a C_1-10alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, and

embedded image

- each m and n are independently an integer in the range of 1 to 10;
- n is an integer in the range of 1 to 10;
- each R^1a and R^1b are independently H or C_1-6alkyl;
- when L_a is a trivalent group, the oligomeric backbone is attached to the first terminus through L_a, and each W_a¹, G_a, G_b, and W_b¹ are end groups independently selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, and optionally substituted amine;

- when L_a is a divalent group, the oligomeric backbone is attached to the first terminus through one of W_a¹, G_a, G_b, and W_b¹, and each W_a¹, G_a, G_b, and W_b¹ are independently selected from the group consisting of a bond, a —C_1-6alkylene-, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, —C(O)—, —C(O)—C_1-10alkylene, and —O—C_0-6alkylene, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, and optionally substituted amine; or

- when L_a is a bivalent group, the oligomeric backbone is attached to the first terminus through a nitrogen or carbon atom on one of Q_a¹, Q_a², . . . Q_a^p−1, Q_a^p, Q_b¹, Q_b², . . . Q_b^p−1, and Q_b^p, and each W_a¹, G_a, G_b, and W_b¹ are end groups independently selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, and optionally substituted amine; and
- each R^a and R^b are independently H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl.
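The index constraints recited for Formula (A-5c) can be checked mechanically. The following is only an illustrative sketch: the function name `check_a5c_indices` is hypothetical, and it assumes that "between 3 and 10" is inclusive of both endpoints.

```python
def check_a5c_indices(p: int, q: int, r: int) -> list[str]:
    """Return the index constraints of Formula (A-5c) that are violated.

    Assumption: "p is an integer between 3 and 10" is read as inclusive.
    An empty list means that p, q, and r satisfy all three constraints.
    """
    problems = []
    if not 3 <= p <= 10:
        problems.append("p must be an integer from 3 to 10")
    if not 2 <= q <= p - 1:
        problems.append("q must satisfy 2 <= q <= p - 1")
    if not 2 <= r <= p - 1:
        problems.append("r must satisfy 2 <= r <= p - 1")
    return problems


# Example: p = 5 allows q and r to take the values 2, 3, or 4.
print(check_a5c_indices(5, 2, 4))
print(check_a5c_indices(3, 3, 2))
```

Under this reading, the smallest permitted case is p = 3 with q = r = 2.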

In some embodiments, the first terminus comprises the structure of Formula (A-5c) or (A-5d):

embedded image

- wherein:
- each Q_a¹, Q_a² . . . Q_a^q . . . through Q_a^p are independently an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;
- each Q_b¹, Q_b². . . Q_b^r. . . through Q_b^p′ are independently an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or an optionally substituted alkylene;

- p and p′ are independently an integer between 3 and 10;

- 2≤q≤(p−1);
- 2≤r≤(p−1);
- L_a is a divalent or trivalent group selected from the group consisting of

embedded image

a C_1-10alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, and

embedded image

- each m and n are independently an integer in the range of 1 to 10;
- n is an integer in the range of 1 to 10;
- each R^1a and R^1b are independently H or C_1-6alkyl;
- each W_a¹, G_a, G_b, and W_b¹ are end groups independently selected from the group consisting of H, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, and optionally substituted amine;
- when L_a is a trivalent group, the oligomeric backbone is attached to the first terminus through L_a; and when L_a is a divalent group, the oligomeric backbone is attached to the first terminus through one of W_a¹, G_a, G_b, and W_b¹, or the oligomeric backbone is attached to the first terminus through a nitrogen or carbon atom on one of Q_a¹, Q_a², . . . Q_a^p−1, Q_a^p, Q_b¹, Q_b², . . . Q_b^p′−1, and Q_b^p′; and
- each R^a and R^b are independently H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl.

In certain embodiments of Formula (A-5c)-(A-5d), L_a is a C_2-8alkylene. In certain embodiments, L_a is C_3-8alkylene. In certain embodiments, L_a is

embedded image

and wherein 2≤m+n≤10. In some embodiments, L_a is C_4-8alkylene. In some embodiments, L_a is C_3-7alkylene. In some embodiments, L_a is C₃alkylene, C₄alkylene, C₅alkylene, C₆alkylene, C₇alkylene, C₈alkylene, or C₉alkylene.

In certain embodiments, for Formula (A-5c)-(A-5d), 3≤m+n≤7. In certain embodiments, (m+n) is 3, 4, 5, 6, 7, 8, or 9. In certain embodiments, m is in the range of 3 to 8. In certain embodiments, m is 3, 4, 5, 6, 7, 8, or 9. In certain embodiments, for Formula (A-5c), p is 2-10. In certain embodiments, for Formula (A-5c), p is 3-8. In certain embodiments, for Formula (A-5c), p is 2, 3, 4, 5, 6, 7, or 8. In certain embodiments, for Formula (A-5c), q is 2-5. In certain embodiments, for Formula (A-5c), p is 2-4. In certain embodiments, for Formula (A-5c), p is 2, 3, 4, 5, or 6.
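To make the linker-length limits concrete, the short sketch below enumerates the (m, n) pairs that satisfy a given bound on m+n. It is an illustration only: the helper name `linker_index_pairs` is hypothetical, and it assumes that m and n are each integers from 1 to 10 inclusive, as recited for L_a.

```python
def linker_index_pairs(total_min: int, total_max: int, lo: int = 1, hi: int = 10):
    """List (m, n) pairs with lo <= m, n <= hi and total_min <= m + n <= total_max.

    Assumption: the recited ranges for m and n are inclusive of both endpoints.
    """
    return [
        (m, n)
        for m in range(lo, hi + 1)
        for n in range(lo, hi + 1)
        if total_min <= m + n <= total_max
    ]


# Pairs allowed under 2 <= m+n <= 10 and under the narrower 3 <= m+n <= 7.
broad = linker_index_pairs(2, 10)
narrow = linker_index_pairs(3, 7)
print(len(broad), len(narrow))
```

Every pair permitted by the narrower range is also permitted by the broader one, so the second set of embodiments is a subset of the first.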

In certain embodiments, Q_a^q is a five to 10 membered heteroaryl ring comprising at least one nitrogen; Q_b^r is a five to 10 membered heteroaryl ring comprising at least one nitrogen; and Q_a^q is linked to Q_b^r through L_a. In certain embodiments, Q_a^q is a five membered heteroaryl ring comprising at least one nitrogen; Q_b^r is a five membered heteroaryl ring comprising at least one nitrogen; Q_a^q is linked to Q_b^r through L_a; and L_a is attached to the nitrogen atom on Q_a^q and to the nitrogen atom on Q_b^r.

In certain embodiments, each Q_a¹ through Q_a^p is independently selected from an optionally substituted pyrrolylene, an optionally substituted imidazolylene, an optionally substituted pyrazolylene, an optionally substituted thioazolylene, an optionally substituted diazolylene, an optionally substituted benzopyridazinylene, an optionally substituted benzopyrazinylene, an optionally substituted phenylene, an optionally substituted pyridinylene, an optionally substituted thiophenylene, an optionally substituted furanylene, an optionally substituted piperidinylene, an optionally substituted pyrimidinylene, an optionally substituted anthracenylene, an optionally substituted quinolinylene, and an optionally substituted C_1-6alkylene.

In certain embodiments, at least one Q of Q_a¹ through Q_a^p is a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least two Q of Q_a¹ through Q_a^p are a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least three, four, five, or six Q of Q_a¹ through Q_a^p are a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one Q of Q_a¹ through Q_a^p is a pyrrole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one Q of Q_a¹ through Q_a^p is an imidazole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one Q of Q_a¹ through Q_a^p is a C_2-6alkylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one Q of Q_a¹ through Q_a^p is a phenyl optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one Q of Q_a¹ through Q_a^p is a bicyclic heteroarylene or arylene. In some embodiments, at least one Q of Q_a¹ through Q_a^p is a phenylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one Q of Q_a¹ through Q_a^p is a benzimidazole optionally substituted with one or more C_1-10alkyl.

In certain embodiments, each Q_b¹ through Q_b^p′ is independently selected from an optionally substituted pyrrolylene, an optionally substituted imidazolylene, an optionally substituted pyrazolylene, an optionally substituted thioazolylene, an optionally substituted diazolylene, an optionally substituted benzopyridazinylene, an optionally substituted benzopyrazinylene, an optionally substituted phenylene, an optionally substituted pyridinylene, an optionally substituted thiophenylene, an optionally substituted furanylene, an optionally substituted piperidinylene, an optionally substituted pyrimidinylene, an optionally substituted anthracenylene, an optionally substituted quinolinylene, and an optionally substituted C_1-6alkylene.

In certain embodiments, at least one Q of Q_b¹ through Q_b^p′ is a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least two Q of Q_b¹ through Q_b^p′ are a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In certain embodiments, at least three, four, five, or six Q of Q_b¹ through Q_b^p′ are a 5 membered heteroarylene having at least one heteroatom selected from O, N, S and optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q_b¹ through Q_b^p′ is a pyrrole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q_b¹ through Q_b^p′ is an imidazole optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q_b¹ through Q_b^p′ is a C_2-6alkylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q_b¹ through Q_b^p′ is a phenyl optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q_b¹ through Q_b^p′ is a bicyclic heteroarylene or arylene. In some embodiments, at least one of Q_b¹ through Q_b^p′ is a phenylene optionally substituted with one or more C_1-10alkyl. In some embodiments, at least one of Q_b¹ through Q_b^p′ is a benzimidazole optionally substituted with one or more C_1-10alkyl.

In certain embodiments, for Formula (A-5c), each end group G_a, G_b, W_a¹, and W_b¹ is independently selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, a 5-10 membered heteroaryl optionally substituted with 1-3 substituents selected from C_1-6alkyl, —NHCOH, halogen, —NR^aR^b, an optionally substituted C_1-6alkyl, C_0-4alkylene-NHC(═NH)NH, C_0-4alkylene-NHC(═NH)—R^a, —C_1-4alkylene-R^a, —CN, —C_0-4alkylene-C(═NH)(NR^aR^b), —C_0-4alkylene-C(═N⁺H₂)(NR^aR^b)C_1-5alkylene-NR^aR^b, C_0-4alkylene-NHC(═NH) R^a, —CO-halogen, and optionally substituted amine, wherein each R^a and R^b are independently H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl. In certain embodiments, each R^a and R^b are independently H or C_1-6alkyl. In certain embodiments, at least one of the end groups is 5-10 membered heteroaryl optionally substituted with C_1-6alkyl, COOH, or OH. In certain embodiments, at least two of the end groups are 5-10 membered heteroaryl optionally substituted with C_1-6alkyl, COOH, or OH. In certain embodiments, for Formula (A-1) to (A-5d), at least one of the end groups is 5-10 membered heteroaryl optionally substituted with C_1-6alkyl, COOH, or OH. In certain embodiments, at least one of the end groups is a 5-10 membered heteroaryl ring optionally substituted with one or more alkyl.

In some embodiments, A^E is absent. In some embodiments, A^E is —NHCO—.

In some embodiments, the first terminus comprises at least one C_3-5 achiral aliphatic or heteroaliphatic amino acid.

In some embodiments, the first terminus comprises one or more subunits selected from the group consisting of optionally substituted pyrrole, optionally substituted imidazole, optionally substituted thiophene, optionally substituted furan, optionally substituted beta-alanine, γ-aminobutyric acid, (2-aminoethoxy)-propanoic acid, 3-((2-aminoethyl)(2-oxo-2-phenyl-1λ²-ethyl)amino)-propanoic acid, and dimethylaminopropylamide monomer.

In some embodiments, the first terminus comprises a polyamide having the structure of Formula (A-6):

embedded image

- wherein:
- each A¹ is —NH— or —NH—(CH₂)_m—CH₂—C(O)—NH—;
- each M is an optionally substituted C_6-10arylene group, optionally substituted 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or optionally substituted alkylene;
- m is an integer between 1 and 10; and
- n is an integer between 1 and 6.

In some embodiments, each M¹ in [A¹-M¹] of Formula (A-6) is a C_6-10arylene group, 4-10 membered heterocyclene, optionally substituted 5-10 membered heteroarylene group, or C_1-6alkylene; each optionally substituted by 1-3 substituents selected from H, OH, halogen, C_1-10alkyl, NO₂, CN, NR′R″, C_1-6haloalkyl, —C_1-6alkoxyl, C_1-6haloalkoxy, (C_1-6alkoxy)C_1-6alkyl, C_2-10alkenyl, C_2-10alkynyl, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, —(C_3-7carbocyclyl)C_1-6alkyl, (4-10 membered heterocyclyl)C_1-6alkyl, (C_6-10aryl)C_1-6alkyl, (C_6-10aryl)C_1-6alkoxy, (5-10 membered heteroaryl)C_1-6alkyl, —(C_3-7carbocyclyl)-amine, (4-10 membered heterocyclyl)amine, (C_6-10aryl)amine, (5-10 membered heteroaryl)amine, acyl, C-carboxy, O-carboxy, C-amido, N-amido, S-sulfonamido, N-sulfonamido, —SR′, COOH, or CONR′R″; wherein each R′ and R″ are independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl. In some embodiments, each R¹ in [A¹-R¹] of Formula (A-6) is a 5-10 membered heteroarylene containing at least one heteroatom selected from O, S, and N or a C_1-6alkylene, and the heteroarylene or the C_1-6alkylene is optionally substituted with 1-3 substituents selected from OH, halogen, C_1-10alkyl, NO₂, CN, NR′R″, C_1-6haloalkyl, —C_1-6alkoxyl, C_1-6haloalkoxy, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, —SR′, COOH, or CONR′R″; wherein each R′ and R″ are independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl. In some embodiments, each R¹ in [A¹-R¹] of Formula (A-6) is a 5-10 membered heteroarylene containing at least one heteroatom selected from O, S, and N, and the heteroarylene is optionally substituted with 1-3 substituents selected from OH, C_1-6alkyl, halogen, and C_1-6alkoxyl.

In some embodiments, the first terminus has a structure of Formula (A-7):

embedded image

or a salt thereof, wherein:

E is an end subunit which comprises a moiety chosen from a heterocyclic group or a straight chain aliphatic group, which is chemically linked to its single neighbor;

- X¹, Y¹, and Z¹ in each m¹ unit are independently selected from CR⁴, N, O or S;
- X², Y², and Z² in each m³ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X³, Y³, and Z³ in each m⁵ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X⁴, Y⁴, and Z⁴ in each m⁷ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- each R⁴ is independently H, —OH, halogen, C_1-6alkyl, or C_1-6alkoxyl;
- each R⁵ is independently H, C_1-6alkyl or C_1-6alkylamine;
- each m¹, m³, m⁵ and m⁷ are independently an integer between 0 and 5;
- each m², m⁴ and m⁶ are independently an integer between 0 and 3; and
- m¹+m²+m³+m⁴+m⁵+m⁶+m⁷ is between 3 and 15.

In some embodiments, m¹ is 3, and X¹, Y¹, and Z¹ in the first unit are respectively CH, N(CH₃), and CH; X¹, Y¹, and Z¹ in the second unit are respectively CH, N(CH₃), and N; and X¹, Y¹, and Z¹ in the third unit are respectively CH, N(CH₃), and N. In some embodiments, m³ is 1, and X², Y², and Z² in the first unit are respectively CH, N(CH₃), and CH. In some embodiments, m⁵ is 2, and X³, Y³, and Z³ in the first unit are respectively CH, N(CH₃), and N; and X³, Y³, and Z³ in the second unit are respectively CH, N(CH₃), and N. In some embodiments, m⁷ is 2, and X⁴, Y⁴, and Z⁴ in the first unit are respectively CH, N(CH₃), and CH; and X⁴, Y⁴, and Z⁴ in the second unit are respectively CH, N(CH₃), and CH. In some embodiments, each m², m⁴ and m⁶ are independently 0 or 1. In some embodiments, each of the X¹, Y¹, and Z¹ in each m¹ unit are independently selected from CH, N, or N(CH₃). In some embodiments, each of the X², Y², and Z² in each m³ unit are independently selected from CH, N, or N(CH₃). In some embodiments, each of the X³, Y³, and Z³ in each m⁵ unit are independently selected from CH, N, or N(CH₃). In some embodiments, each of the X⁴, Y⁴, and Z⁴ in each m⁷ unit are independently selected from CH, N, or N(CH₃). In some embodiments, each Z¹ in each m¹ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z² in each m³ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z³ in each m⁵ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z⁴ in each m⁷ unit is independently selected from CR⁴ or NR⁵. In some embodiments, R⁴ is H, CH₃, or OH. In some embodiments, R⁵ is H or CH₃.

In some embodiments, for Formula (A-7), the sum of m², m⁴ and m⁶ is between 1 and 6. In some embodiments, for Formula (A-7), the sum of m², m⁴ and m⁶ is between 2 and 6. In some embodiments, for Formula (A-7), the sum of m³, m⁵ and m⁷ is between 2 and 10. In some embodiments, the sum of m¹, m³, m⁵ and m⁷ is between 3 and 8. In some embodiments, for Formula (A-7), (m¹+m²+m³+m⁴+m⁵+m⁶+m⁷) is between 3 and 12. In some embodiments, (m¹+m²+m³+m⁴+m⁵+m⁶+m⁷) is between 4 and 10.
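The numeric limits on the indices of Formula (A-7) can be combined into a single check. The sketch below is illustrative only: the function name `a7_indices_ok` and the tuple ordering (m¹ through m⁷) are hypothetical conventions, and all recited ranges are assumed to be inclusive. The default total of 3 to 12 follows the embodiment stated above; the narrower 4 to 10 embodiment can be passed as arguments.

```python
ODD_INDEX_RANGE = range(0, 6)   # m1, m3, m5, m7: integers 0..5 (assumed inclusive)
EVEN_INDEX_RANGE = range(0, 4)  # m2, m4, m6: integers 0..3 (assumed inclusive)


def a7_indices_ok(m, total_min=3, total_max=12):
    """Check a 7-tuple (m1, ..., m7) against the recited index limits.

    Returns True only if each index lies in its individual range and the
    sum of all seven indices lies between total_min and total_max.
    """
    if len(m) != 7:
        raise ValueError("expected exactly seven indices, m1 through m7")
    odd = m[0::2]   # m1, m3, m5, m7
    even = m[1::2]  # m2, m4, m6
    return (
        all(v in ODD_INDEX_RANGE for v in odd)
        and all(v in EVEN_INDEX_RANGE for v in even)
        and total_min <= sum(m) <= total_max
    )


# m1 = 3, m3 = 1, m5 = 2, m7 = 2, with m2 = m4 = 1 and m6 = 0 (total of 10).
print(a7_indices_ok((3, 1, 1, 1, 2, 0, 2)))
print(a7_indices_ok((3, 1, 1, 1, 2, 0, 2), total_min=4, total_max=10))
```

The example tuple uses the m¹, m³, m⁵, and m⁷ values from the specific embodiments described above; the values chosen for m², m⁴, and m⁶ are arbitrary picks from the permitted 0 or 1.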

In some embodiments, for Formula (A-1) to (A-7), the first terminus comprises at least one beta-alanine moiety. In some embodiments, for Formula (A-1) to (A-7), the first terminus comprises at least two beta-alanine moieties. In some embodiments, for Formula (A-1) to (A-7), the first terminus comprises at least three or four beta-alanine moieties.

In some embodiments, the first terminus has the structure of Formula (A-8):

embedded image

or a salt thereof, wherein:

E is an end subunit which comprises a moiety chosen from a heterocyclic group or a straight chain aliphatic group, which is chemically linked to its single neighbor;

- W is C_1-6alkylene,

embedded image

- X^1′, Y^1′, and Z^1′ in each n¹ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^2′, Y^2′, and Z^2′ in each n³ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^3′, Y^3′, and Z^3′ in each n⁵ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^4′, Y^4′, and Z^4′ in each n⁶ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^5′, Y^5′, and Z^5′ in each n⁸ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^6′, Y^6′, and Z^6′ in each n¹⁰ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- each R⁴ is independently H, —OH, halogen, C_1-6alkyl, or C_1-6alkoxyl;
- each R⁵ is independently H, C_1-6alkyl or C_1-6alkylamine;
- n is an integer between 1 and 5;
- each n¹, n³, n⁵, n⁶, n⁸ and n¹⁰ are independently an integer between 0 and 5;
- each n², n⁴, n⁷ and n⁹ are independently an integer between 0 and 3; and
- n¹+n²+n³+n⁴+n⁵+n⁶+n⁷+n⁸+n⁹+n¹⁰ is between 3 and 15.

In some embodiments, for Formula (A-8), the sum of n², n⁴, n⁷ and n⁹ is between 1 and 6. In some embodiments, for Formula (A-8), the sum of n², n⁴, n⁷ and n⁹ is between 2 and 6. In some embodiments, for Formula (A-8), the sum of n¹, n³, n⁶, n⁸ and n¹⁰ is between 3 and 13. In some embodiments, the sum of n¹, n³, n⁵, n⁶, n⁸ and n¹⁰ is between 4 and 10. In some embodiments, for Formula (A-8), (n¹+n²+n³+n⁴+n⁵+n⁶+n⁷+n⁸+n⁹+n¹⁰) is between 3 and 12. In some embodiments, (n¹+n²+n³+n⁴+n⁵+n⁶+n⁷+n⁸+n⁹+n¹⁰) is between 4 and 10.

In some embodiments, n¹ is 3, and X^1′, Y^1′, and Z^1′ in the first unit are respectively CH, N(CH₃), and CH; X^1′, Y^1′, and Z^1′ in the second unit are respectively CH, N(CH₃), and N; and X^1′, Y^1′, and Z^1′ in the third unit are respectively CH, N(CH₃), and N. In some embodiments, n³ is 1, and X^2′, Y^2′, and Z^2′ in the first unit are respectively CH, N(CH₃), and CH. In some embodiments, n⁵ is 2, and X^3′, Y^3′, and Z^3′ in the first unit are respectively CH, N(CH₃), and N; and X^3′, Y^3′, and Z^3′ in the second unit are respectively CH, N(CH₃), and N. In some embodiments, n⁶ is 2, and X^4′, Y^4′, and Z^4′ in the first unit are respectively CH, N(CH₃), and N; and X^4′, Y^4′, and Z^4′ in the second unit are respectively CH, N(CH₃), and N. In some embodiments, the X^1′, Y^1′, and Z^1′ in each n¹ unit are independently selected from CH, N, or N(CH₃). In some embodiments, the X^2′, Y^2′, and Z^2′ in each n³ unit are independently selected from CH, N, or N(CH₃). In some embodiments, the X^3′, Y^3′, and Z^3′ in each n⁵ unit are independently selected from CH, N, or N(CH₃). In some embodiments, the X^4′, Y^4′, and Z^4′ in each n⁶ unit are independently selected from CH, N, or N(CH₃). In some embodiments, the X^5′, Y^5′, and Z^5′ in each n⁸ unit are independently selected from CH, N, or N(CH₃). In some embodiments, the X^6′, Y^6′, and Z^6′ in each n¹⁰ unit are independently selected from CH, N, or N(CH₃). In some embodiments, each Z^1′ in each n¹ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z^2′ in each n³ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z^3′ in each n⁵ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z^4′ in each n⁶ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z^5′ in each n⁸ unit is independently selected from CR⁴ or NR⁵. In some embodiments, each Z^6′ in each n¹⁰ unit is independently selected from CR⁴ or NR⁵. In some embodiments, R⁴ is H, CH₃, or OH. In some embodiments, R⁵ is H or CH₃.

In some embodiments, the first terminus has the structure of Formula (A-9):

embedded image

- or a salt thereof, wherein:
- X^1′, Y^1′, and Z^1′ in each n¹ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^2′, Y^2′, and Z^2′ in each n³ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^3′, Y^3′, and Z^3′ are independently selected from CR⁴, N, NR⁵, O, or S;
- X^4′, Y^4′, and Z^4′ in each n¹¹ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^5′, Y^5′, and Z^5′ in each n⁸ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^6′, Y^6′, and Z^6′ in each n⁹ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^7′, Y^7′, and Z^7′ in each n¹¹ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^8′, Y^8′, and Z^8′ are independently selected from CR⁴, N, NR⁵, O, or S;
- X^9′, Y^9′, and Z^9′ in each n¹¹ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- X^10′, Y^10′, and Z^10′ in each n¹⁶ unit are independently selected from CR⁴, N, NR⁵, O, or S;
- each R⁴ is independently H, —OH, halogen, C_1-6alkyl, or C_1-6alkoxyl;
- each R⁵ is independently H, C_1-6alkyl or C_1-6alkylamine;
- each n³, n⁶, n⁸, n⁹, n¹¹, n¹⁴, and n¹⁶ are independently an integer between 0 and 5;
- each n², n⁴, n⁵, n⁷, n¹⁰, n¹³, and n¹⁵ are independently an integer between 0 and 3; and
- n¹+n²+n³+n⁴+n⁵+n⁶+n⁷+n⁸+n⁹+n¹⁰+n¹¹+n¹²+n¹³+n¹⁴+n¹⁵+n¹⁶ is between 3 and 18;
- or a salt thereof, wherein:
- L_a is a divalent or trivalent group selected from the group consisting of

embedded image

a C_1-10alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, and

embedded image

- each R^1a and R^1b are independently H or a C_1-6alkyl;
- each m and n are independently an integer between 1 and 10;
- when L_a is a trivalent group, the oligomeric backbone is attached to the first terminus through L_a, and each E_1a, E_2a, E_1b, and E_2b are end groups independently selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, and optionally substituted amine;

- when L_a is a divalent group, the oligomeric backbone is attached to the first terminus through one of E_1a, E_2a, E_1b, and E_2b, and each E_1a, E_2a, E_1b, and E_2b are independently selected from the group consisting of a bond, a —C_1-6alkylene-, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, —C(O)—, —C(O)—C_1-16alkylene, and —O—C_0-6alkylene, optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, and optionally substituted amine; or

- when L_a is a bivalent group, the oligomeric backbone is attached to the first terminus through a nitrogen or carbon atom on one of the five-membered heteroaryl rings, and each E_1a, E_2a, E_1b, and E_2b are end groups independently selected from the group consisting of optionally substituted C_6-10aryl, optionally substituted 4-10 membered heterocyclyl, optionally substituted 5-10 membered heteroaryl, an optionally substituted C_1-6alkyl, and optionally substituted amine.

In some embodiments, the first terminus comprises a polyamide having the structure of Formula (A-10):

embedded image

- wherein:
- each Y¹, Y², Z¹, and Z² are independently CR⁴, N, NR⁵, O, or S;
- each R⁴ is independently H, —OH, halogen, C_1-6alkyl, or C_1-6alkoxyl;
- each R⁵ is independently H, C_1-6alkyl, or C_1-6alkylamine;
- each W¹ and W² are independently a bond, NH, a C_1-6alkylene, —NH—C_1-6alkylene, —NH-5-10 membered heteroarylene, —NH-5-10 membered heterocyclene, —N(CH₃)—C_0-6alkylene, —C(O)—, —C(O)—C_1-10alkylene, or —O—C_0-6alkylene; and
- n is an integer between 2 and 11.

In some embodiments, each R⁴ is independently H, —OH, halogen, C_1-6alkyl, or C_1-6alkoxyl; and each R² is independently H, C_1-6alkyl or C_1-6alkylamine. In some embodiments, each R⁴ is selected from the group consisting of H, COH, Cl, NO, N-acetyl, benzyl, C_1-6alkyl, C_1-6alkoxyl, C_1-6alkenyl, C_1-6alkynyl, C_1-6alkylamine, and —C(O)NH—(CH₂)_1-4—C(O)NH—(CH₂)_1-4—NR^aR^b; and each R^a and R^b are independently hydrogen or C_1-6alkyl.

In some embodiments, each R⁵ is independently selected from the group consisting of H, C_1-6alkyl, and C_1-6alkylNH₂, preferably H, methyl, or isopropyl.

In some embodiments, R⁴ in Formula (A-7) to (A-8) is independently selected from H, OH, C_1-6alkyl, halogen, and C_1-6alkoxyl. In some embodiments, R⁴ in Formula (A-7) to (A-8) is selected from H, OH, halogen, C_1-10alkyl, NO₂, CN, NR′R″, C_1-6haloalkyl, —C_1-6alkoxyl, C_1-6haloalkoxy, (C_1-6alkoxy)C_1-6alkyl, C_2-10alkenyl, C_2-10alkynyl, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, —(C_3-7carbocyclyl)C_1-6alkyl, (4-10 membered heterocyclyl)C_1-6alkyl, (C_6-10aryl)C_1-6alkyl, (C_6-10aryl)C_1-6alkoxy, (5-10 membered heteroaryl)C_1-6alkyl, —(C_3-7carbocyclyl)-amine, (4-10 membered heterocyclyl)amine, (C_6-10aryl)amine, (5-10 membered heteroaryl)amine, acyl, C-carboxy, O-carboxy, C-amido, N-amido, S-sulfonamido, N-sulfonamido, —SR′, COOH, or CONR′R″; wherein each R′ and R″ are independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl. In some embodiments, R⁴ in Formula (A-7) to (A-8) is selected from O, S, and N or a C_1-6alkylene, and the heteroarylene or the C_1-6alkylene is optionally substituted with 1-3 substituents selected from OH, halogen, C_1-10alkyl, NO₂, CN, NR′R″, C_1-6haloalkyl, alkoxyl, C_1-6haloalkoxy, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, —SR′, COOH, or CONR′R″, wherein each R′ and R″ are independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl.

For the chemical Formula (A-1) to (A-9), each E, E₁ and E₂ is independently an optionally substituted thiophene-containing moiety, an optionally substituted pyrrole-containing moiety, an optionally substituted imidazole-containing moiety, or an optionally substituted amine. In some embodiments, each E, E₁ and E₂ is independently selected from the group consisting of N-methylpyrrole, N-methylimidazole, a benzimidazole moiety, and 3-(dimethylamino)propanamidyl, each group optionally substituted by 1-3 substituents selected from the group consisting of H, OH, halogen, C_1-10alkyl, NO₂, CN, NR′R″, C_1-6haloalkyl, —C_1-6alkoxyl, haloalkoxy, (C_1-6alkoxy)C_1-6alkyl, C_2-10alkenyl, C_2-10alkynyl, C_3-7carbocyclyl, 4-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, amine, acyl, C-carboxy, O-carboxy, C-amido, N-amido, S-sulfonamido, N-sulfonamido, COOH, or CONR′R″; wherein each R′ and R″ are independently H, C_1-10alkyl, C_1-10haloalkyl, or alkoxyl. In some embodiments, each E₁ and E₂ independently comprises thiophene, benzothiophene, a C-C linked benzimidazole/thiophene-containing moiety, or a C-C linked hydroxybenzimidazole/thiophene-containing moiety, wherein each R′ and R″ are independently H, C_1-10alkyl, C_1-10haloalkyl, or alkoxyl.

In some embodiments, each E, E₁ or E₂ is independently selected from the group consisting of isophthalic acid; phthalic acid; terephthalic acid; morpholine; N,N-dimethylbenzamide; N,N-bis(trifluoromethyl)benzamide; fluorobenzene; (trifluoromethyl)benzene; nitrobenzene; phenyl acetate; phenyl 2,2,2-trifluoroacetate; phenyl dihydrogen phosphate; 2H-pyran; 2H-thiopyran; benzoic acid; isonicotinic acid; and nicotinic acid; wherein one, two or three ring members in any of these end-group candidates can be independently substituted with C, N, S or O; and wherein any one, two, three, four or five of the hydrogens bound to the ring can be substituted with R₅, wherein R₅ may be independently selected for any substitution from H, OH, halogen, C_1-10alkyl, NO₂, NH₂, C_1-10haloalkyl, —OC_1-10haloalkyl, COOH, and CONR′R″; wherein each R′ and R″ are independently H, C_1-10alkyl, C_1-10haloalkyl, or —C_1-10alkoxyl.

The DNA recognition or binding moiety can include one or more subunits selected from the group consisting of:

embedded image

—NH-benzopyrazinylene-CO—, —NH-phenylene-CO—, —NH-pyridinylene-CO—, —NH-piperidinylene-CO—, —NH-pyrimidinylene-CO—, —NH-anthracenylene-CO—, —NH-quinolinylene-CO—, and

embedded image

wherein Z is H, NH₂, C_1-6alkyl, or C_1-6alkylNH₂.

In some embodiments, Py is

embedded image

Im is

embedded image

Hp is

embedded image

Th is

embedded image

Pz is

embedded image

Nt is

embedded image

Tn is

embedded image

Nh is

embedded image

iNt is

embedded image

iIm is

embedded image

HpBi is

embedded image

ImBi is

embedded image

PyBi is

embedded image

Dp is

embedded image

—NH-benzopyrazinylene-CO— is

embedded image

—NH-phenylene-CO— is

embedded image

—NH-pyridinylene-CO— is

embedded image

—NH-piperidinylene-CO— is

embedded image

—NH-pyrazinylene-CO— is

embedded image

—NH-anthracenylene-CO— is

embedded image

and —NH-quinolinylene-CO— is

embedded image

In some embodiments, the first terminus comprises one or more subunits selected from the group consisting of optionally substituted N-methylpyrrole, optionally substituted N-methylimidazole, and β-alanine (β).

In some embodiments, the first terminus does not have a structure of

embedded image

The first terminus in the molecules described herein has a high binding affinity to a sequence having multiple repeats of GAA and binds to the target nucleotide repeats preferentially over other nucleotide repeats or nucleotide sequences. In some embodiments, the first terminus has a higher binding affinity to a sequence having multiple repeats of GAA than to a sequence having repeats of CGG. In some embodiments, the first terminus has a higher binding affinity to a sequence having multiple repeats of GAA than to a sequence having repeats of CCG. In some embodiments, the first terminus has a higher binding affinity to a sequence having multiple repeats of GAA than to a sequence having repeats of CCTG. In some embodiments, the first terminus has a higher binding affinity to a sequence having multiple repeats of GAA than to a sequence having repeats of TGGAA. In some embodiments, the first terminus has a higher binding affinity to a sequence having multiple repeats of GAA than to a sequence having repeats of GGGGCC. In some embodiments, the first terminus has a higher binding affinity to a sequence having multiple repeats of GAA than to a sequence having repeats of CAG. In some embodiments, the first terminus has a higher binding affinity to a sequence having multiple repeats of GAA than to a sequence having repeats of CTG.

Due to the preferential binding between the first terminus and the target nucleotide repeat, the transcription modulation molecules described herein become localized around regions having multiple repeats of GAA. In some embodiments, the local concentration of the first terminus or the molecules described herein is higher near a sequence having multiple repeats of GAA than near a sequence having repeats of CGG. In some embodiments, the local concentration of the first terminus or the molecules described herein is higher near a sequence having multiple repeats of GAA than near a sequence having repeats of CCG. In some embodiments, the local concentration of the first terminus or the molecules described herein is higher near a sequence having multiple repeats of GAA than near a sequence having repeats of CCTG. In some embodiments, the local concentration of the first terminus or the molecules described herein is higher near a sequence having multiple repeats of GAA than near a sequence having repeats of TGGAA. In some embodiments, the local concentration of the first terminus or the molecules described herein is higher near a sequence having multiple repeats of GAA than near a sequence having repeats of GGGGCC. In some embodiments, the local concentration of the first terminus or the molecules described herein is higher near a sequence having multiple repeats of GAA than near a sequence having repeats of CTG. In some embodiments, the local concentration of the first terminus or the molecules described herein is higher near a sequence having multiple repeats of GAA than near a sequence having repeats of CAG.

The first terminus is localized to a sequence having multiple repeats of GAA and binds to the target nucleotide repeats preferentially over other nucleotide repeats. In some embodiments, the sequence has at least 2, 3, 4, 5, 8, 10, 12, 15, 20, 25, 30, 40, 50, 100, 200, 300, 400, or 500 repeats of GAA. In certain embodiments, the sequence comprises at least 1000 nucleotide repeats of GAA. In certain embodiments, the sequence comprises at least 500 nucleotide repeats of GAA. In certain embodiments, the sequence comprises at least 200 nucleotide repeats of GAA. In certain embodiments, the sequence comprises at least 100 nucleotide repeats of GAA. In certain embodiments, the sequence comprises at least 50 nucleotide repeats of GAA. In certain embodiments, the sequence comprises at least 20 nucleotide repeats of GAA.
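The repeat-count thresholds above can be checked mechanically against a sequence. The following is a minimal Python sketch offered for illustration only; the function names `longest_gaa_run` and `meets_repeat_threshold` are hypothetical, and counting only uninterrupted GAA units read on a single strand is an assumption, not a definition taken from this disclosure.

```python
import re


def longest_gaa_run(sequence: str) -> int:
    """Return the largest number of consecutive GAA units in the sequence.

    Assumption: only uninterrupted runs on the given strand are counted.
    """
    runs = re.findall(r"(?:GAA)+", sequence.upper())
    # Each run is a string such as "GAAGAAGAA"; its length / 3 is the repeat count.
    return max((len(run) // 3 for run in runs), default=0)


def meets_repeat_threshold(sequence: str, minimum_repeats: int) -> bool:
    """True if the sequence has at least `minimum_repeats` consecutive GAA units."""
    return longest_gaa_run(sequence) >= minimum_repeats
```

For example, `meets_repeat_threshold(seq, 50)` would test the "at least 50 nucleotide repeats of GAA" embodiment for a sequence `seq`.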

In one aspect, the compounds of the present disclosure can bind to the repeated GAA of fxn in preference to GAA elsewhere in the subject's DNA.

The polyamide composed of a pre-selected combination of subunits can selectively bind to the DNA in the minor groove. In the hairpin structure, antiparallel side-by-side pairings of two aromatic amino acids bind to DNA sequences, with a polyamide ring packed specifically against each DNA base. N-Methylpyrrole (Py) favors T, A, and C bases, excluding G; N-methylimidazole (Im) is a G-reader; and 3-hydroxy-N-methylpyrrole (Hp) is specific for the thymine base. The nucleotide base pairs can be recognized using different pairings of the amino acid subunits according to the pairing principles shown in Tables 1A and 1B below. For example, an Im/Py pairing reads G•C; by symmetry, a Py/Im pairing reads C•G; an Hp/Py pairing can distinguish T•A from A•T, G•C, and C•G; and a Py/Py pairing nonspecifically discriminates both A•T and T•A from G•C and C•G.
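The pairing code described in the preceding paragraph can be written as a small lookup. This Python sketch is illustrative only: the names `PAIRING_CODE` and `read_pair` are hypothetical, the Py/Hp entry is taken from Table 1B rather than from the paragraph itself, and an empty result simply means the pairing is not covered by this small excerpt.

```python
# Side-by-side ring pairings and the base pairs they are described as reading.
PAIRING_CODE = {
    ("Im", "Py"): frozenset({"G•C"}),
    ("Py", "Im"): frozenset({"C•G"}),
    ("Hp", "Py"): frozenset({"T•A"}),
    ("Py", "Hp"): frozenset({"A•T"}),          # from Table 1B
    ("Py", "Py"): frozenset({"A•T", "T•A"}),   # does not distinguish A•T from T•A
}


def read_pair(first: str, second: str) -> frozenset:
    """Return the base pairs favored by the pairing first/second (empty if not listed)."""
    return PAIRING_CODE.get((first, second), frozenset())
```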

In some embodiments, the first terminus comprises Im corresponding to the nucleotide G; Py or beta corresponding to the nucleotide A; and Py corresponding to the nucleotide A, wherein Im is N-alkyl imidazole, Py is N-alkyl pyrrole, and beta is β-alanine. In some embodiments, the first terminus comprises Im/Py to correspond to the nucleotide pair G/C, and Py/beta or Py/Py to correspond to the nucleotide pair A/T, wherein Im is N-alkyl imidazole (e.g., N-methyl imidazole), Py is N-alkyl pyrrole (e.g., N-methyl pyrrole), and beta is β-alanine.

TABLE 1A

Base pairing for single amino acid subunit (Favored (+), disfavored (−))

Subunit
G
C
A
T

Py
−
+
+
+

Im
+
−
−

embedded image

−
−
−
+

Hp

embedded image

−
−
+
+

(Th),

embedded image

−
−
+
+

(Pz),

embedded image

−
−
+
+

(Tp),

embedded image

+
−
−
−

(Nt)

embedded image

−
−
−
+

(Ht),

embedded image

+
−
−
−

(iPTA)

embedded image

−
−
−
+

(“CTh”);

embedded image

−
+
+
+

PEG

embedded image

+
−
−
−

iIm

embedded image

+
−
−
−

Ip

embedded image

−
−
−
+

Hz

embedded image

−
−
−
+

Bi

embedded image

−
−
−
−

(gly)

embedded image

−
−
+
+

(β)

embedded image

−
−
+ (as a part of the turn)
+ (as a part of the turn)

(gAB)

embedded image

−
+
−
−

(Alx)

embedded image

−
−
+
+

(Da)

embedded image

−
−
+
+

(Dp)

embedded image

−
−
+
+

(iPP)

embedded image

+
+
−
−

(CTh)

embedded image

−
−
+
+

(Dab)

embedded image

−
−
+
+

(gAH)

embedded image

WW* (bind to two nucleotides with same selectivity as Hp-Py)

HpBi

embedded image

WW* (bind to two nucleotides with same selectivity as Py-Py)

PyBi

embedded image

GW* (bind to two nucleotides with same selectivity as Im-Py)

ImBi

*The subunit HpBi, ImBi, and PyBi function as a conjugate of two monomer subunits and bind to two nucleotides. The binding property of HpBi, ImBi, and PyBi corresponds to Hp-Py, Im-Py, and Py-Py respectively.

TABLE 1B

Base pairing for hairpin polyamide (Favored (+), disfavored (−))

Pair      G•C   C•G   T•A   A•T
Im/β      +     −     −     −
β/Im      −     +     −     −
Py/β      −     −     +     +
β/Py      −     −     +     +
β/β       −     −     +     +
Py/Py     −     −     +     +
Im/Im     −     −     −     −
Im/Py     +     −     −     −
Py/Im     −     +     −     −
Th/Py     −     −     +     −
Py/Th     −     −     −     +
Th/Im     +     −     −     −
Im/Th     −     +     −     −
β/Th      −     −     +     −
Th/β      −     −     −     +
Hp/Py     −     −     +     −
Py/Hp     −     −     −     +
Hp/Im     +     −     −     −
Im/Hp     −     +     −     −
Tn/Py     −     −     +     +
Py/Tn     −     −     +     +
Ht/Py     −     −     +     +
Py/Ht     −     −     +     +
Bi/Py     −     −     +     +
Py/Bi     −     −     +     +
β/Bi      −     −     +     +
Bi/β      −     −     +     +
Bi/Im     −     +     −     −
Im/Bi     +     −     −     −
Tp/Py     −     −     +     +
Py/Tp     −     −     +     +
β/Tp      −     −     +     +
Tp/β      −     −     +     +
Tp/Im     −     +     −     −
Im/Tp     +     −     −     −
Tp/Tp     −     −     +     +
Tp/Tn     −     −     +     +
Tn/Tp     −     −     +     +
Hz/Py     −     −     +     −
Py/Hz     −     −     −     +
Ip/Py     +     −     −     −
Py/Ip     −     +     −     −
Bi/Hz     −     −     −     +
Hz/Bi     −     −     +     +
Bi/Bi     −     +     +     +
Th/Py     −     −     +     +
Py/Th     −     −     +     +
Im/gAB    +     −     −     −
gAB/Im    −     +     −     −
Py/gAB    +     −     −     −
gAB/Py    −     +     −     −
gAB/β     −     −     +     +
β/gAB     −     −     +     +
Im/Dp     +     −     −     −
Dp/Im     −     +     −     −
Py/Dp     −     −     +     +
Dp/Py     −     −     +     +
Dp/β      −     −     +     +

Each of HpBi, ImBi, and PyBi can bind to two nucleotides and have binding properties corresponding to Hp-Py, Im-Py, and Py-Py respectively. HpBi, ImBi, and PyBi can be paired with two monomer subunits or with themselves in a hairpin structure to bind to two nucleotide pairs.
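For bookkeeping purposes, the stated correspondence between the benzimidazole conjugates and monomer pairs can be applied by expanding each conjugate into its two-monomer equivalent before the pairing tables are consulted. The sketch below is a hypothetical helper (`CONJUGATE_EQUIVALENTS` and `expand_conjugates` are illustrative names); it models only the recognition equivalence stated above, not the chemical structure of the conjugates.

```python
# Recognition equivalents stated in the text: HpBi ~ Hp-Py, ImBi ~ Im-Py, PyBi ~ Py-Py.
CONJUGATE_EQUIVALENTS = {
    "HpBi": ("Hp", "Py"),
    "ImBi": ("Im", "Py"),
    "PyBi": ("Py", "Py"),
}


def expand_conjugates(subunits):
    """Replace each two-nucleotide conjugate by its two-monomer recognition equivalent."""
    expanded = []
    for subunit in subunits:
        # Subunits that are not conjugates pass through unchanged.
        expanded.extend(CONJUGATE_EQUIVALENTS.get(subunit, (subunit,)))
    return expanded
```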

The monomer subunits of the polyamide can be strung together based on the pairing principles shown in Table 1A and Table 1B. The monomer subunits of the polyamide can also be strung together based on the pairing principles shown in Table 1C and Table 1D.

Table 1C shows examples of the monomer subunits that can bind to each specific nucleotide. The first terminus can include a polyamide having several monomer subunits strung together, with a monomer subunit selected from each column. For example, the polyamide can include Im-β-Py that binds to GAA, with Im selected from the G column, β from the first A column, and Py from the second A column. The polyamide can be any combination that binds to the nucleotides of GAA, with a subunit selected from each column in Table 1C, wherein the subunits are strung together following the GAA order.

In addition, the polyamide can also include a partial set or multiple sets of the three subunits, such as 1.5, 2, 2.5, 3, 3.5, or 4 sets of the three subunits. The polyamide can include 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, or 16 monomer subunits. The multiple sets can be joined together by W. In addition to the three subunits or six subunits, the polyamide can also include 1-4 additional subunits that can link multiple sets of the three subunits.

The polyamide can include monomer subunits that bind to 2, 3, 4, or 5 nucleotides of GAA. For example, the polyamide can bind to GA, AA, GAA, AAG, AGA, GAAG, AAA, GAAGA or GAAGAA.

The polyamide can include monomer subunits that bind to 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of GAA repeats. The nucleotides can be joined by W.
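The column-by-column selection from Table 1C can be sketched as follows. This is an illustrative Python example under stated assumptions: it uses only one choice per column (Im for G, β for the first A, Py for the second A, as in the Im-β-Py example), it accepts only targets that are a contiguous stretch of a GAA repeat, and the names `DEFAULT_SUBUNIT` and `linear_polyamide` are hypothetical.

```python
# One illustrative choice per Table 1C column: G, first A, second A.
DEFAULT_SUBUNIT = ("Im", "β", "Py")


def linear_polyamide(target: str) -> str:
    """Return a subunit string for a target taken from a GAA repeat (e.g. "GAAGA")."""
    target = target.upper()
    if not target:
        raise ValueError("target must not be empty")
    # A template long enough to contain the target at any of the three phases.
    template = "GAA" * (len(target) // 3 + 2)
    for phase in range(3):
        if template[phase:phase + len(target)] == target:
            return "-".join(
                DEFAULT_SUBUNIT[(phase + i) % 3] for i in range(len(target))
            )
    raise ValueError(f"{target!r} is not a contiguous stretch of a GAA repeat")
```

Other subunits from the same column of Table 1C could be substituted by changing `DEFAULT_SUBUNIT`; the sketch does not attempt to rank those alternatives.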

The monomer subunit, when positioned as a terminal unit, does not have an amine or a carboxylic acid group at the terminal. The amine or carboxylic acid group in the terminal is replaced by a hydrogen. For example, Py, when used as a terminal unit, is understood to have the structure of

embedded image

and Im, when positioned as a terminal unit, is understood to have the structure of

embedded image

In addition, when Py or Im is used as a terminal unit, Py and Im can be respectively replaced by PyT

embedded image

and ImT

embedded image

Nonlimiting examples of the linear polyamide include β-Py-Im, Im-Py-β-Im-Py-β-Im-Py, Im-Py-β-Im-Py-Py-Im-β, Im-Py-Py-Im-Py-β-Im-β, and any combinations thereof.

TABLE 1C

Examples of monomer subunits in a linear polyamide that binds to GAA.

Subunits that selectively bind to each nucleotide:

Nucleotide    G              A       A
              Im or ImT      Py      Py
              iIm or iImT    Th      Th
              PEG            Pz      Pz
              CTh            Tp      Tp
              Nt             PEG     PEG
              iPTA           β       β
              Ip             iPP     iPP
              CTh            Da      Da
                             Dp      Dp
                             Dab     Dab
                             gAH     gAH

The DNA-binding moiety can also include a hairpin polyamide having subunits that are strung together based on the pairing principle shown in Table 1B. Table 1D shows some examples of the monomer subunit pairs that selectively bind to the nucleotide pair. The hairpin polyamide can include 2n monomer subunits (n is an integer in the range of 2-8), and the polyamide also includes a W in the center of the 2n monomer subunits. W can be —(CH₂)_a—NR¹—(CH₂)_b—, —(CH₂)_a—, —(CH₂)_a—O—(CH₂)_b—, —(CH₂)_a—CH(NHR¹)—, —(CR²R³)_a— or —(CH₂)_a—CH(NR¹₃)⁺—(CH₂)_b—, wherein each a is independently an integer between 2 and 4; R¹ is H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, an optionally substituted C_6-10aryl, an optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl; and each R² and R³ are independently H, halogen, OH, NHAc, or C_1-4alkyl. In some embodiments, W is —(CH₂)—CH(NH₃)⁺—(CH₂)— or —(CH₂)—CH₂CH(NH₃)⁺—. In some embodiments, R¹ is H. In some embodiments, R¹ is C_1-6alkyl optionally substituted by 1-3 substituents selected from —C(O)-phenyl. In some embodiments, W is —(CR²R³)—(CH₂)_a— or —(CH₂)_a—(CR²R³)—(CH₂)_b—, wherein each a is independently 1-3, b is 0-3, and each R² and R³ are independently H, halogen, OH, NHAc, or C_1-4alkyl. W can be an aliphatic amino acid residue shown in Table 4 such as gAB.

When n is 2, the polyamide includes 4 monomer subunits, and the polyamide also includes a W joining the first set of two subunits with the second set of two subunits, Q1-Q2-W-Q3-Q4, and Q1/Q4 correspond to a first nucleotide pair on the DNA double strand, Q2/Q3 correspond to a second nucleotide pair, and the first and the second nucleotide pairs are a part of the GAA repeat. When n is 3, the polyamide includes 6 monomer subunits, and the polyamide also includes a W joining the first set of three subunits with the second set of three subunits, Q1-Q2-Q3-W-Q4-Q5-Q6, and Q1/Q6 correspond to a first nucleotide pair on the DNA double strand, Q2/Q5 correspond to a second nucleotide pair, Q3/Q4 correspond to a third nucleotide pair, and the first and the second nucleotide pairs are a part of the GAA repeat. When n is 4, the polyamide includes 8 monomer subunits, and the polyamide also includes a W joining the first set of four subunits with the second set of four subunits, Q1-Q2-Q3-Q4-W-Q5-Q6-Q7-Q8, and Q1/Q8 correspond to a first nucleotide pair on the DNA double strand, Q2/Q7 correspond to a second nucleotide pair, Q3/Q6 correspond to a third nucleotide pair, and Q4/Q5 correspond to a fourth nucleotide pair on the DNA double strand. When n is 5, the polyamide includes 10 monomer subunits, and the polyamide also includes a W joining a first set of five subunits with a second set of five subunits, Q1-Q2-Q3-Q4-Q5-W-Q6-Q7-Q8-Q9-Q10, and Q1/Q10, Q2/Q9, Q3/Q8, Q4/Q7, and Q5/Q6 respectively correspond to the first to the fifth nucleotide pair on the DNA double strand. When n is 6, the polyamide includes 12 monomer subunits, and the polyamide also includes a W joining a first set of six subunits with a second set of six subunits, Q1-Q2-Q3-Q4-Q5-Q6-W-Q7-Q8-Q9-Q10-Q11-Q12, and Q1/Q12, Q2/Q11, Q3/Q10, Q4/Q9, Q5/Q8, and Q6/Q7 respectively correspond to the first to the sixth nucleotide pair on the DNA double strand.
When n is 8, the polyamide includes 16 monomer subunits, and the polyamide also includes a W joining a first set of eight subunits with a second set of eight subunits, Q1-Q2-Q3-Q4-Q5-Q6-Q7-Q8-W-Q9-Q10-Q11-Q12-Q13-Q14-Q15-Q16, and Q1/Q16, Q2/Q15, Q3/Q14, Q4/Q13, Q5/Q12, Q6/Q11, Q7/Q10, and Q8/Q9 respectively correspond to the first to the eighth nucleotide pair on the DNA double strand. In some hairpin polyamide structures, the number of monomer subunits on each side of W can be different, and one side of the hairpin can partially pair with the other side of the hairpin to bind the nucleotide pairs on a double strand DNA based on the binding principle in Tables 1B and 1D, while the remaining unpaired monomer subunit(s) can bind to the nucleotide based on the binding principle in Tables 1A and 1C but do not pair with a monomer subunit on the other side. The hairpin polyamide can have one or more overhanging monomer subunits that bind to the nucleotide but do not pair with a monomer subunit on the antiparallel strand. For example, the hairpin structure can include 5 monomer subunits on one side of W and 4 monomer subunits on the other side of W, Q1-Q2-Q3-Q4-Q5-W-Q6-Q7-Q8-Q9, and Q2/Q9, Q3/Q8, Q4/Q7, and Q5/Q6 respectively correspond to the first to the fourth nucleotide pair on the DNA double strand, and Q1 binds to a single nucleotide but does not pair with a monomer subunit on the other strand to bind with a nucleotide pair. W can be an aliphatic amino acid residue such as gAB or other appropriate spacers as shown in Table 4. In some instances, when W is gAB, it favors binding to T.
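The index bookkeeping in the hairpin description follows one rule: subunits pair outward from the turn W, and any subunits left over on the longer arm form an overhang. The following Python sketch enumerates the Q-index pairs under that reading; `hairpin_pairs` is a hypothetical helper name, and the sketch covers only the numbering, not binding chemistry.

```python
def hairpin_pairs(n_first: int, n_second: int):
    """Return (pairs, overhang) of 1-based Q indices for Q1..Qa-W-Q(a+1)..Q(a+b).

    n_first is the number of subunits before W, n_second the number after W.
    """
    if n_first < 1 or n_second < 1:
        raise ValueError("each arm needs at least one subunit")
    total = n_first + n_second
    paired_count = min(n_first, n_second)
    # Pairing starts at the turn: the last subunit before W pairs with the first after W.
    pairs = [(n_first - k, n_first + 1 + k) for k in range(paired_count)]
    pairs.reverse()  # list pairs starting from the end farthest from the turn
    paired = {index for pair in pairs for index in pair}
    overhang = [q for q in range(1, total + 1) if q not in paired]
    return pairs, overhang
```

With four subunits on each side this reproduces Q1/Q8, Q2/Q7, Q3/Q6, Q4/Q5; with five and four subunits it reproduces Q2/Q9 through Q5/Q6 with Q1 as the overhang.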

Because the target gene can include multiple repeats of GAA, the subunits can be strung together to bind at least two, three, four, five, six, seven, eight, nine, or ten nucleotides in one or more GAA repeats (e.g., GAAGAAGAAGAA). For example, the polyamide can bind to the GAA repeat by binding to a partial copy, a full copy, or multiple repeats of GAA such as GA, AA, GAA, AAG, AGA, GAAG, AAGA, GAAGA or GAAGAA. For example, the polyamide can include Im-Py-β-W-Py-β-Py that binds to GAA and its complementary nucleotides on a double strand DNA, in which the Im/Py pair binds to G•C, the Py/β pair binds to A•T, and the β/Py pair binds to A•T. In another example, Im-Py-β-Im-W-β-Py-β-Py binds to GAAG and its complementary nucleotides on a double strand DNA, in which the Im/Py pair binds to G•C, the Py/β pair binds to A•T, the β/Py pair binds to A•T, and the Im/β pair binds to G•C. W can be an aliphatic amino acid residue such as gAB or other appropriate spacers as shown in Table 4. In another example, Im-Py-β-Im-gAB-Im-Py binds with a part of the complementary nucleotides (ACG) on the double strand DNA, in which Im binds to G, Py binds to A, β/Py binds to A•T, and Im/Im binds to G•C.
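The worked examples in the preceding paragraph can be traced with a short script that splits a hairpin at its turn, pairs the two arms antiparallel, and looks each pair up in an excerpt of Table 1B. This is a hedged sketch: `PAIR_TABLE` holds only the eight pairings needed here, `read_hairpin` is a hypothetical name, and the sketch handles only hairpins with arms of equal length.

```python
AT = "A•T or T•A"  # Table 1B marks these pairings as favored for both orientations

# Excerpt of Table 1B (pair -> favored base pair); not the full table.
PAIR_TABLE = {
    ("Im", "Py"): "G•C", ("Py", "Im"): "C•G",
    ("Im", "β"): "G•C", ("β", "Im"): "C•G",
    ("Py", "β"): AT, ("β", "Py"): AT,
    ("β", "β"): AT, ("Py", "Py"): AT,
}


def read_hairpin(polyamide: str, turn: str = "gAB") -> list:
    """Return the base pair read by each antiparallel subunit pair, Q1/Q2n first."""
    units = polyamide.split("-")
    turn_index = units.index(turn)  # raises ValueError if the turn is absent
    first_arm, second_arm = units[:turn_index], units[turn_index + 1:]
    if len(first_arm) != len(second_arm):
        raise ValueError("this sketch only handles arms of equal length")
    # Q1 pairs with the last subunit, Q2 with the second to last, and so on.
    return [PAIR_TABLE[(a, b)] for a, b in zip(first_arm, reversed(second_arm))]
```

A pairing that is missing from the excerpt raises `KeyError`, which is a limit of the excerpt rather than a statement that the pairing does not bind.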

Some additional examples of the polyamide include but are not limited to Im-Py-Py-Im-gAB-Py-Im-Im-Py; Im-Py-Py-Im-gAB-Py-Im-Im-PyT; Im-Py-Py-Im-gAB-Py-Im-Im-β; Im-Py-Py-Im-gAB-Py-Im-Im-β-G; Im-β-Py-Im-gAB-Py-Im-Im-β; Im-β-Py-Im-gAB-Py-Im-Im-β-G; Im-β-Py-Im-gAB-Py-Im-Im-Py; Im-β-Py-Im-gAB-Py-Im-Im-PyT; Py-Py-Im-β-gAB-Im-Py-Im-Im; Py-Py-Im-β-gAB-Im-Py-Im-ImT; Py-Py-Im-Py-gAB-Im-Py-Im-Im; Py-Py-Im-Py-gAB-Im-Py-Im-ImT; Py-Py-Im-β-gAB-Im-β-Im-Im; Py-Py-Im-β-gAB-Im-β-Im-ImT; Py-Py-Im-Py-gAB-Im-β-Im-Im; Py-Py-Im-Py-gAB-Im-β-Im-ImT; Im-β-Py-gAB-Im-Im-Py; Im-β-Py-gAB-Im-Im-PyT; Im-β-Py-gAB-Im-Im-β; Im-β-Py-gAB-Im-Im-β-G; Im-Py-Py-gAB-Im-Im-β; Im-Py-Py-gAB-Im-Im-β-G; Im-Py-Py-gAB-Im-Im-Py; Im-Py-Py-gAB-Im-Im-PyT; Im-β-Py-gAB-Im-Im-Py; and Im-β-Py-gAB-Im-Im-PyT; wherein G may be hydrogen, alkyl, alkenyl, alkynyl, or —C(O)—R_B; and R_B may be a hydrogen, C₁-C₆alkyl, C₁-C₆alkenyl, or C₁-C₆alkynyl group. In some embodiments, the hairpin polyamide has a structure of Im-Py-β-Im-gAB-Im-Py; Im-Py-β-Im-gAB-Im-Py-β-Im; Py-β-Im-gAB-Im-Py-β-Im; or β-Im-gAB-Im-Py-β-Im.

TABLE 1D

Examples of monomer pairs in a hairpin or H-pin polyamide that binds to GAA.

Subunit pairs that selectively bind to each nucleotide pair:

Nucleotide    G•C       A•T      A•T
              Im/β      Py/β     Py/β
              Im/Py     β/Py     β/Py
              Th/Im     β/β      β/β
              Hp/Im     Py/Py    Py/Py
              Im/Bi     Py/Th    Py/Th
              Im/Tp     Th/β     Th/β
              Ip/Py     Py/Hp    Py/Hp
              Im/gAB    Tn/Py    Tn/Py
              Py/gAB    Py/Tn    Py/Tn
              Im/Dp     Ht/Py    Ht/Py
                        Py/Ht    Py/Ht
                        Bi/Py    Bi/Py
                        Py/Bi    Py/Bi
                        β/Bi     β/Bi
                        Bi/β     Bi/β
                        Tp/Py    Tp/Py
                        Py/Tp    Py/Tp
                        β/Tp     β/Tp
                        Tp/β     Tp/β
                        Tp/Tp    Tp/Tp
                        Tp/Tn    Tp/Tn
                        Tn/Tp    Tn/Tp
                        Py/Hz    Py/Hz
                        Bi/Hz    Bi/Hz
                        Hz/Bi    Hz/Bi
                        Bi/Bi    Bi/Bi
                        Th/Py    Ht/Py
                        Py/Th    Py/Th
                        gAB/β    gAB/β
                        β/gAB    β/gAB
                        Py/Dp    Py/Dp
                        Dp/Py    Dp/Py
                        Dp/β     Dp/β

Recognition of a nucleotide repeat or DNA sequence by two antiparallel polyamide strands depends on a code of side-by-side aromatic amino acid pairs in the minor groove, usually oriented N to C with respect to the 5′ to 3′ direction of the DNA helix. Enhanced affinity and specificity of polyamide nucleotide binding is accomplished by covalently linking the antiparallel strands. The “hairpin motif” connects the N and C termini of the two strands with a W (e.g., gamma-aminobutyric acid unit (gamma-turn)) to form a folded linear chain. The “H-pin motif” connects the antiparallel strands across a central or near central ring/ring pairs by a short, flexible bridge.

The DNA-binding moiety can also include an H-pin polyamide having subunits that are strung together based on the pairing principles shown in Table 1A and/or Table 1B. Table 1C shows some examples of the monomer subunit that selectively binds to the nucleotide, and Table 1D shows some examples of the monomer subunit pairs that selectively bind to the nucleotide pair. The H-pin polyamide can include 2 strands, each strand can include 2-8 monomer subunits, and the polyamide also includes a bridge L₁ to connect the two strands in the center or near the center of each strand. At least one or two of the monomer subunits on each strand are paired with the corresponding monomer subunits on the other strand following the pairing principle in Table 1D to favor binding of a G•C, C•G, A•T, or T•A pair, and these monomer subunit pairs are often positioned in the center or close to the center region, at or close to the bridge that connects the two strands. In some instances, the H-pin polyamide can have all of the monomer subunits paired with the corresponding monomer subunits on the antiparallel strand based on the pairing principle in Tables 1B and 1D to bind to the nucleotide pairs on the double strand DNA. In some instances, the H-pin polyamide can have a part of the monomer subunits (2, 3, 4, 5, or 6) paired with the corresponding monomer subunits on the antiparallel strand based on the binding principle in Tables 1B and 1D to bind to the nucleotide pairs on the double strand DNA, while the rest of the monomer subunits bind to the nucleotide based on the binding principle in Tables 1A and 1C but do not pair with a monomer subunit on the antiparallel strand. The H-pin polyamide can have one or more overhanging monomer subunits that bind to the nucleotide but do not pair with a monomer subunit on the antiparallel strand.

Another polyamide structure that derives from the H-pin structure connects the two antiparallel strands at the end through a bridge. Only the two monomer subunits that are connected by the bridge form a pair that binds to the nucleotide pair G•C or C•G based on the binding principle in Tables 1B and 1D, while the rest of the monomer subunits on each strand form an overhang, bind to the nucleotide based on the binding principle in Table 1A and/or 1C, and do not pair with a monomer subunit on the other strand.

The bridge can be a bivalent or trivalent group selected from

embedded image

a C_1-10alkylene, —NH—C_0-6alkylene-C(O)—, —N(CH₃)—C_0-6alkylene, and

embedded image

—(CH₂)_a—NR¹—(CH₂)_b—, —(CH₂)_a—, —(CH₂)_a—O—(CH₂)_b—, —(CH₂)_a—CH(NHR¹)—, —(CR²R³)_a— or —(CH₂)_a—CH(NR¹₃)⁺—(CH₂)_b—, wherein m is an integer in the range of 0 to 10; n is an integer in the range of 0 to 10; each a is independently an integer between 2 and 4; R¹ is H, an optionally substituted C_1-6alkyl, an optionally substituted C_3-10cycloalkyl, an optionally substituted C_6-10aryl, an optionally substituted 4-10 membered heterocyclyl, or an optionally substituted 5-10 membered heteroaryl; and each R² and R³ are independently H, halogen, OH, NHAc, or C_1-4alkyl. In some embodiments, W is —(CH₂)—CH(NH₃)⁺—(CH₂)— or —(CH₂)—CF₂CH(NH₃)⁺—. In some embodiments, R¹ is H. In some embodiments, R¹ is C_1-6alkyl optionally substituted by 1-3 substituents selected from —C(O)-phenyl. In some embodiments, L₁ is —(CR²R³)—(CH₂)_a— or —(CH₂)_a—(CR²R³)—(CH₂)_b—, wherein each a is independently 1-3, b is 0-3, and each R² and R³ are independently H, halogen, OH, NHAc, or C_1-4alkyl. L₁ can be a C_2-9alkylene or (PEG)_2-8.

When n is 3, the polyamide includes 6 monomer subunits, and the polyamide also includes a bridge L₁ joining the first set of three subunits with the second set of three subunits; Q1-Q2-Q3 can be joined to Q4-Q5-Q6 through L₁ at the center Q2 and Q5, and Q1/Q4 correspond to a first nucleotide pair on the DNA double strand, Q2/Q5 correspond to a second nucleotide pair, and Q3/Q6 correspond to a third nucleotide pair. When n is 4, the polyamide includes 8 monomer subunits, and the polyamide also includes a bridge L₁ joining the first set of four subunits with the second set of four subunits; Q1-Q2-Q3-Q4 can be joined to Q5-Q6-Q7-Q8 through L₁ at the Q2 and Q6, Q2 and Q7, Q3 and Q6, or Q3 and Q7 positions; Q1/Q5 may correspond to a nucleotide pair on the DNA double strand, and Q3/Q8 may correspond to another nucleotide pair; or Q1 and Q8 form overhangs on each strand, or Q4 and Q5 form overhangs on each strand. When n is 5, the polyamide includes 10 monomer subunits, and the polyamide also includes a bridge L₁ joining a first set of five subunits with a second set of five subunits; Q1-Q2-Q3-Q4-Q5 can be joined to Q6-Q7-Q8-Q9-Q10 through a bridge L₁ at non-terminal positions (any position except for Q1, Q5, Q6 and Q10); if the two strands are linked at Q3 and Q8 by the bridge, Q1/Q6, Q2/Q7, Q3/Q8, Q4/Q9, and Q5/Q10 can be paired to bind to the nucleotide pairs; if the two strands are linked at Q2 and Q9 by the bridge, then Q1/Q8 and Q3/Q10 can be paired to bind to the nucleotide pairs, Q4 and Q5 form an overhang on one strand, and Q6 and Q7 form an overhang on the other strand.
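The H-pin examples are consistent with a simple alignment model: the two bridged subunits face each other, every other subunit pairs with the one at the same offset on the other strand, and subunits with no partner form overhangs. The Python sketch below encodes that model; it is an inference from the numbered examples rather than a rule stated in this disclosure, and `hpin_pairs` is a hypothetical name.

```python
def hpin_pairs(n: int, bridge_first: int, bridge_second: int):
    """Return (pairs, overhang) of Q indices for strands Q1..Qn and Q(n+1)..Q(2n).

    bridge_first and bridge_second are the Q numbers joined by the bridge,
    one on each strand. Assumed model: a constant index offset set by the bridge.
    """
    if not (1 <= bridge_first <= n < bridge_second <= 2 * n):
        raise ValueError("the bridge must join one subunit from each strand")
    offset = bridge_second - bridge_first
    # A subunit on the first strand is paired only if its partner exists on the second.
    pairs = [(q, q + offset) for q in range(1, n + 1) if n < q + offset <= 2 * n]
    paired = {index for pair in pairs for index in pair}
    overhang = [q for q in range(1, 2 * n + 1) if q not in paired]
    return pairs, overhang
```

For n = 5 with the bridge at Q2 and Q9, the sketch returns the pairs Q1/Q8, Q2/Q9, Q3/Q10 and the overhangs Q4, Q5, Q6, Q7, matching the example above; bridging the strand ends (for example Q4 and Q5 when n = 4) leaves only the bridged pair.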

In some embodiments, the monomer subunit at or near the center (position n/2 or (n±1)/2) of one strand is paired with the corresponding subunit on the other strand to bind to the nucleotide pairs on the double stranded DNA. In some embodiments, the monomer subunit at or near the center (position n/2 or (n±1)/2) of one strand is connected with the corresponding subunit on the other strand through a bridge L₁.

When n is 4, the polyamide includes 8 monomer subunits, and the polyamide also includes a bridge L₁joining the first set of four subunits with the second set of four subunits, Q1-Q2-Q3-Q4 can be joined to Q5-Q6-Q7-Q8 at the end Q4 and Q5 through L₁, while Q4/Q5 can be paired to bind to the nucleotide pairs, Q1-Q2-Q3 form an overhang on one strand and Q6-Q7-Q8 form an overhang on the other strand.

Some additional examples of the polyamide include but are not limited to Im-Py-Py-Im (linked in the middle, at either position 2 or 3) to Py-Py-Py-Py; Im-Py-Py-Im (linked in the middle, at position 3, Py and Py) to Im-Py-β-Py-Py; Im-Py-β-Im (linked at the bolded position) Im-Py; Im-Py-β-Im (linked in the middle, at either position 2 or 3) Im-Py-β-Im; Py-β-Im (linked at the middle position, bolded) Im-Py-β-Im; or β-Im (linked at the bolded position) Im-Py-β-Im.

Second Terminus Regulatory protein binding moiety

In certain embodiments, the regulatory molecule is chosen from a nucleosome remodeling factor (NURF), a bromodomain PHD finger transcription factor (BPTF), a ten-eleven translocation enzyme (TET), methylcytosine dioxygenase (TET1), a DNA demethylase, a helicase, an acetyltransferase, and a histone deacetylase (“HDAC”).

The binding affinity between the regulatory protein and the second terminus can be adjusted based on the composition of the molecule or type of protein. In some embodiments, the second terminus binds the regulatory molecule with an affinity of less than about 600 nM, about 500 nM, about 400 nM, about 300 nM, about 250 nM, about 200 nM, about 150 nM, about 100 nM, or about 50 nM. In some embodiments, the second terminus binds the regulatory molecule with an affinity of less than about 300 nM. In some embodiments, the second terminus binds the regulatory molecule with an affinity of less than about 200 nM. In some embodiments, the polyamide is capable of binding the DNA with an affinity of greater than about 200 nM, about 150 nM, about 100 nM, about 50 nM, about 10 nM, or about 1 nM. In some embodiments, the polyamide is capable of binding the DNA with an affinity in the range of about 1-600 nM, 10-500 nM, 20-500 nM, 50-400 nM, 100-300 nM, or 50-200 nM.

In some embodiments, the second terminus comprises one or more optionally substituted C_6-10aryl, optionally substituted C_4-10carbocyclic, optionally substituted 4 to 10 membered heterocyclic, or optionally substituted 5 to 10 membered heteroaryl.

In some embodiments, the protein-binding moiety binds to a regulatory molecule that is selected from the group consisting of a CREB binding protein (CBP), a P300, an O-linked β-N-acetylglucosamine-transferase (OGT), a P300-CBP-associated factor (PCAF), a histone methyltransferase, a histone demethylase, a chromodomain, a cyclin-dependent kinase 9 (CDK9), a nucleosome remodeling factor (NURF), a bromodomain PHD finger transcription factor (BPTF), a ten-eleven translocation enzyme (TET), a methylcytosine dioxygenase (TET1), a histone acetyltransferase (HAT), a histone deacetylase (HDAC), a host cell factor 1 (HCF1), an octamer-binding transcription factor (OCT1), a P-TEFb, a cyclin T1, a PRC2, a DNA demethylase, a helicase, an acetyltransferase, and a methylated histone lysine protein.

In some embodiments, the second terminus comprises a moiety that binds to an O-linked β-N-acetylglucosamine-transferase (OGT), or CREB binding protein (CBP). In some embodiments, the protein binding moiety is a residue of a compound that binds to an O-linked β-N-acetylglucosamine-transferase (OGT), or CREB binding protein (CBP).

In some embodiments, the second terminus does not comprise JQ1, iBET762, OTX015, RVX208, or AU1. In some embodiments, the second terminus does not comprise JQ1. In some embodiments, the second terminus does not comprise a moiety that binds to a bromodomain protein.

In some embodiments, the second terminus comprises a diazine or diazepine ring, wherein the diazine or diazepine ring is fused with a C_6-10aryl or a 5-10 membered heteroaryl ring comprising one or more heteroatom selected from S, N and O.

In some embodiments, the second terminus comprises an optionally substituted bicyclic or tricyclic structure. In some embodiments, the optionally substituted bicyclic or tricyclic structure comprises a diazepine ring fused with a thiophene ring.

In some embodiments, the second terminus does not comprise an optionally substituted bicyclic structure, wherein the bicyclic structure comprises a diazepine ring fused with a thiophene ring.

In some embodiments, the second terminus does not comprise an optionally substituted tricyclic structure, wherein the tricyclic structure is a diazepine ring that is fused with a thiophene and a triazole.

In some embodiments, the second terminus does not comprise an optionally substituted diazine ring.

In some embodiments, the second terminus does not comprise a structure of Formula (C-1.1):

embedded image

- wherein:
- each of A^1p and B^1p is independently an optionally substituted aryl or heteroaryl ring;
- X^1p is CH or N;
- R^1p is hydrogen, halogen, or an optionally substituted C_1-6alkyl group; and
- R^2p is an optionally substituted C_1-6alkyl, cycloalkyl, C_6-10aryl, or heteroaryl.

In some embodiments, X^1p is N. In some embodiments, A^1p is an aryl or heteroaryl substituted with one or more substituents. In some embodiments, A^1p is an aryl or heteroaryl substituted with one or more substituents selected from halogen, C_1-6alkyl, hydroxyl, C_1-6alkoxy, and C_1-6haloalkyl. In some embodiments, B^1p is an aryl or heteroaryl substituted with one or more substituents selected from halogen, C_1-6alkyl, hydroxyl, C_1-6alkoxy, and C_1-6haloalkyl.

In some embodiments, A^1p is an optionally substituted thiophene or phenyl. In some embodiments, A^1p is a thiophene or phenyl, each substituted with one or more substituents selected from halogen, C_1-6alkyl, hydroxyl, C_1-6alkoxy, and C_1-6haloalkyl. In some embodiments, B^1p is an optionally substituted triazole. In some embodiments, B^1p is a triazole substituted with one or more substituents selected from halogen, C_1-6alkyl, hydroxyl, C_1-6alkoxy, and C_1-6haloalkyl.

In some embodiments, the protein binding moiety is not

embedded image

In some embodiments, the protein binding moiety is not

embedded image

In some embodiments, the protein binding moiety does not have the structure of Formula (C-12):

embedded image

- wherein:
- R_1q is a hydrogen or an optionally substituted alkyl, hydroxyalkyl, aminoalkyl, alkoxyalkyl, halogenated alkyl, hydroxyl, alkoxy, or —COOR_4q;
- R_4q is hydrogen, or an optionally substituted aryl, aralkyl, cycloalkyl, heteroaryl, heteroaralkyl, heterocycloalkyl, alkyl, alkenyl, alkynyl, or cycloalkylalkyl group, optionally containing one or more heteroatoms;
- R_2q is an optionally substituted aryl, alkyl, cycloalkyl, or aralkyl group;
- R_3q is hydrogen, halogen, or an optionally substituted alkyl group, preferably (CH₂)_x—C(O)N(R₂₀)(R₂₁), or (CH₂)_x—N(R₂₀)—C(O)R₂₁; or halogenated alkyl group;
- wherein x is an integer from 1 to 10; and R₂₀ and R₂₁ are each independently hydrogen or C₁-C₆ alkyl group, preferably R₂₀ is hydrogen and R₂₁ is methyl; and
- Ring E is an optionally substituted aryl or heteroaryl group.

The protein binding moiety can include a residue of a compound that binds to a regulatory protein. In some embodiments, the protein binding moiety can be a residue of a compound shown in Table 2.

Exemplary residues include, but are not limited to, amides, carboxylic acid esters, thioesters, primary amines, and secondary amines of any of the compounds shown in Table 2.

TABLE 2

A list of compounds that bind to regulatory proteins.

Target

protein
Compound

p300/CBP HAT (histone acetyl- transferase)

embedded image

Lys-CoA

p300/CBP HAT (histone acetyl- transferase)

embedded image

CH₃CO—ARTKQTARKSTGGKAPPXQLH3—CoA-20

p300/CBP HAT (histone acetyl- transferase)

embedded image

anacardic acid (AA)

p300/CBP HAT (histone acetyl- transferase)

embedded image

curcumin

p300/CBP HAT (histone acetyl- transferase)

embedded image

MB-3

p300/CBP HAT (histone acetyl- transferase)

embedded image

isothiazolones

X = H, Cl

R = NO₂, Cl, CF₃, OCH₃, COOC₂H₅

p300/CBP HAT (histone acetyl- transferase)

embedded image

garcinol

p300/CBP HAT (histone acetyl- transferase)

embedded image

MC1823 (4)

p300/CBP HAT (histone acetyl- transferase)

embedded image

MC1626 (R = CH₃)

MC1752 (R = H)

p300/CBP HAT (histone acetyl- transferase)

embedded image

1
(R = OC₂H₅; R¹= CH₃)

2
(R = OH; R¹= CH₃)

3
(R = OC₂H₅; R¹= C₅H₁₁)

5
(R = OC₂H₅; R¹= C₁₀H₂₁)

6
(R = OH; R¹= C₁₀H₂₁)

7
(R = OC₂H₅; R¹= C₁₅H₃₁)

8
(R = OH; R¹= C₁₅H₃₁)

p300/CBP HAT (histone acetyl- transferase)

embedded image

CBP30

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

Ph

Me

i-Pr

p300/CBP HAT (histone acetyl- transferase)

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

R

H

3-Me

2-CH₂NH₂

see above

p300/CBP HAT (histone acetyl- transferase)

embedded image

Ph

i-Pr

i-Pr

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

X = Cl, (R,R)-31

X = Br, (R,R)-32

embedded image

X = Cl, (S,S)-31

X = Br, (S,S)-32

p300/CBP HAT (histone acetyl- transferase)

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

Garcinol

C646

p300/CBP HAT (histone acetyl- transferase)

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

3a R = H

3b R = Me

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

*stereochemistry
R1
R2

R,S
H
H

R,S
CN
H

R,S
H
CN

R,S
CONH₂
H

R,S
H
CONH₂

R,S
OMe
H

R,S

text missing or illegible when filed

H

R,S
cyclopropyl
H

R,S

text missing or illegible when filed

H

S

text missing or illegible when filed

H

S
NHCONHMe
H

p300/CBP HAT (histone acetyl- transferase)

embedded image

p300/CBP HAT (histone acetyl- transferase)

embedded image

compd
R1
R2
X

22
Me
cyclopropyl
H

23
CF₃
cyclopropyl
F

24
Me
CF₃
F

p300/CBP HAT

embedded image

R¹
R²

Cl

embedded image

OGT

R₁is H or C_1-6alkyl;

R₂is H or C_1-6alkyl

R₃is H or C_1-6alkyl

embedded image

OGT

LFA-1/ ICAM-1

embedded image

Methyl- lysine binding/ L3MBTL1

embedded image

8: R = H

9: R = Me

embedded image

Methyl- lysine binding/ L3MBTL3

embedded image

UNC1021

UNC928

Methyl- lysine binding/ L3MBTL3

embedded image

UNC1215

UNC1879

UNC2533

Methyl- lysine binding/ L3MBTL3

embedded image

UNC2170

UNC2892

15: R = I

16: R = i-Pr

17: R = CF₃

Methyl- lysine binding/ L3MBTL3

embedded image

A366

YX-11-102

Chromo- domain

embedded image

Ac-FALKme3S-NH2

embedded image

18

Chromo- domain

embedded image

Chromo- domain

embedded image

MS37452 (MS452)

embedded image

MS351

Chromo- domain

embedded image

22: R = Me

23: R = Et

24: R = i-Pr

Chromo- domain

embedded image

25: n = 1

26: n = 2

embedded image

27: n = 1

28: n = 2

embedded image

Chromo- domain

embedded image

IS19

CF1

CF2

CF4

CF16

CF18

MM-401

34

Chromo- domain/ CBX7

embedded image

Chromo- domain

embedded image

EED226

A-395

Chromo- domain

embedded image

36

Chromo- domain

embedded image

UNC5114

UNC5115

UNC3866

Methyl transferase

DOT1L: EPZ004777 (ref. 21), EPZ-5676 (ref. 24), SGC0946 (ref. 86)

EZH2: GSK126 (ref. 37), GSK343 (refs 87, 88), EPZ005687 (ref. 38), EPZ-6438 (ref. 44), EI1 (ref. 39), UNC1999 (ref. 89)

G9A: BIX01294 (ref. 90), UNC0321 (ref. 91), UNC0638 (ref. 92), UNC0642 (ref. 88), BRD4770 (ref. 93)

PRMT3: 14u (ref. 94)

PRMT4 (CARM1): 17b (Bristol-Myers Squibb) (refs 95, 96), MethylGene (ref. 97)

Methyl transferase

BAZ2B: GSK2801 (ref. 88)

Chromodomains

L3MBTL1: UNC669 (ref. 100)

L3MBTL3: UNC1215 (ref. 101)

Histone demethylases

LSD1: Tranylcypromine (ref. 62), ORY-1001 (ref. 63)

Methyl transferase

embedded image

EPZ004777

Br-SAH

Methyl transferase

embedded image

Hybrid

DZNep

Methyl transferase

embedded image

Methyl transferase

embedded image

Tranylcypromine

embedded image

Oryzon LSD1 inhibitor

Chromo- domain
a) embedded image

UNC3866

b)

embedded image

UNC3866-PEGA

embedded image

UNC4990

UNC4991

Chromo- domain
a) embedded image

R1:

R2:

R3:

R4:

R5:

R6:

Chromo- domain

embedded image

5 redundant hits

embedded image

4 redundant hits

embedded image

4 redundant hits

Chromo- domain
a) embedded image

UNC4797

UNC4980

UNC4981

UNC4982

Chromo- domain

embedded image

Chromo- domain

embedded image

R =

Chromo- domain

embedded image

NR₃⁻

Chromo- domain
a) embedded image

UNC3086

b embedded image

UNC3567 (1)

Chromo- domain

embedded image

UNC4219 (3)

Chromo- domain
c embedded image

UNC4195 (4)

Methyl lysine binding domain

embedded image

Methyl lysine binding domain

embedded image

UNC1215

UNC2533 (1)

embedded image

UNC669

UNC1079

UNC1215

Methyl lysine binding domain

embedded image

Methyl lysine binding domain

embedded image

R
R′
R″

embedded image

Methyl lysine binding domain

embedded image

Disulfiram

embedded image

Phenothiazine

embedded image

Amiodarone HCl

embedded image

Tegaserod maleate

Methyl lysine binding domain

embedded image

Benzbromarone

embedded image

Dronedarone

embedded image

Desethylamiodarone

embedded image

-desethylamiodarone

Methyl lysine binding domain

embedded image

WAG-003 (n = 2, trimethyl)

embedded image

WAG-004 (n = 2, dimethyl)

embedded image

WAG-005 (n = 3, trimethyl)

embedded image

WAG-006 (n = 3, dimethyl)

Methyl lysine binding domain

embedded image

IS1

IS2

IS3

IS5

IS12

IS15

Methyl lysine binding domain

embedded image

group 1-3

(b) group 4

Methyl lysine binding domain

embedded image

MM-102

MM-401

OICR-9429

WDR5-47

Methyl lysine binding domain

embedded image

Methyl lysine binding domain

embedded image

33
R = 4-fluoro

34
R = 4-methoxyl

35
R = 3,4-dimethoxyl

36
R = 2,4,6-trimethyl

Methyl lysine binding domain

embedded image

R₁
R₂
R₃

—NH₂
—H
—H

3-COOH—Ph
—H
—H

4-COOH—Ph
—H
—H

4-CN—Ph
—H
—H

—Ph
—H
—H

4-F—Ph
—H
—H

4-Pyridyl
—H
—H

5-Pyrimidyl
—H
—H

4-NO₂—Ph
—H
—H

4-NH₂—Ph
—H
—H

—Ph
—NO₂
—H

—NO₂
—NO₂
—H

—H
—H
4-COOH—Ph

—H
—H
4-Pyridyl

—H
—H
4-NO₂—Ph

—H
—H
4-NH₂—Ph

—NO₂
—H
—H

Methyl lysine binding domain

embedded image

37a
R = 4-fluoro-2-chloro-3-methyl

37
R = 4-fluoro-2-chloro-3-methyl

38a
R = 3-methoxyl

38
R = 3-methoxyl

39a
R = 2,4-difluoro

39
R = 2,4-difluoro

40a
R = 2-chloro

40
R = 2-chloro

Methyl lysine binding domain

embedded image

X
R₄

—NHSO₂—
4-fluoro

—NHSO₂—
4-methoxyl

—NHSO₂—
3,4-dimethoxyl

—NHSO₂—
2,4,6-trimethyl

—CONH—
4-fluoro-2-chloro-3-methyl

—CONH—
3-methoxyl

—CONH—
2,4-difluoro

—CONH—
2-chloro

—NHCO—
4-fluoro-2-chloro-3-methyl

Methyl lysine binding domain

embedded image

R = —CH₃

R = —Ph

R = —CH₂CH₃

R = —CH(CH₃)₂

R = —CH₂CH₂CH₃

R = —CH₂NH-Boc

R = —CH(CH₃)NH-Boc

R = —CH₂CH₂NH-Boc

R = —C(CH₃)₂NH-Boc

embedded image

R = —(CH₂)₃NH-Boc

R = —CH₂CH(CH₃)₂

embedded image

Methyl lysine binding domain

embedded image

R₁
R₂

—Ph
—H

4-Pyridyl
—H

4-NH₂—Ph
—H

—Ph
—NO₂

4-NO₂—Ph
—NHCOCH₃

4-Pyridyl
—NO₂

4-COOCH₃—Ph
—NO₂

—Ph
—NH₂

4-Pyridyl
—NH₂

4-COOCH₃—Ph
—NH₂

4-NH₂—Ph
—NHCOCH₃

4-Pyridyl
—NHCOCH₃

4-NO₂—Ph
—NO₂

4-NH₂—Ph
—NH₂

Methyl lysine binding domain

embedded image

R₁
R₂

4-NO₂—Ph
4-F-3-NO₂

4-NO₂—Ph
3-NO₂

4-NH₂—Ph
4-F-3-NH₂

4-NH₂—Ph
3-NH₂

4-Pyridyl
4-F-3-NO₂

4-Pyridyl
4-F-3-NH₂

Methyl lysine binding domain

embedded image

R

—NHCOCH₂CH₂NH₂

—NHCOCH₂CH₂NHBoc

—NHCOCH(i-Pro)NH₂

—NHCOCH(i-Pro)NHBoc

embedded image

—NHCO(CH₂)₃NH₂

—NHCO(CH₂)₃NHBoc

—NHCOCH₂CH(CH3)₂

embedded image

—NHCOCH₃

—NHCOPh

—NHCOCH₂CH₃

—NHCOCH(CH₃)₂

—NHCOCH₂CH₂CH₃

—NHCOCH₂NH₂

—NHCOCH₂NHBoc

—NHCOCH(CH₃)NH₂

—NHCOCH(CH₃)NHBoc

Methyl lysine binding domain

embedded image

WDR5-0101

WDR5-0102

WDR5-0103

Methyl lysine binding domain

embedded image

Methyl lysine binding domain

embedded image

R

2-CF₃, 5-F

2-CF₃, 4-OH

2-Cl, 4-CF₃

2-Cl, 5-CF₃

2-Cl, 5-Me

2-Cl, 6-F

3-CF₃, 4-OMe

3-Me, 5-Me

3-Me, 5-CF₃

3-F, 5-CF₃

3-Cl, 5-Cl

3-OH, 5-CF₃

2-F, 5-SO₂NH₂

2-F, 3-F, 5-OH

2-F, 3-Cl, 5-CF₃

2-Cl, 3-Me, 6-F

2-F, 3-Me, 4-F

2-Me, 3-F, 5-F

3-Me, 4-F, 5-Me

2-F, 3-Me, 4-F, 5-Me, 6-F

Methyl lysine binding domain

embedded image

Methyl lysine binding domain

embedded image

R

NO₂

embedded image

Methyl lysine binding domain

embedded image

X = N, R¹= Me, R²= H, n = 1

X = N, R¹= Me, R²= Me, n = 1

X = N, R¹= Me, R²= H, n = 2

X = O, R²= H, n = 1

X = CH₂, R²= H, n = 1

X = N, R¹= Et, R²= H, n = 1

text missing or illegible when filed

X = CH, R¹= NMe₂, R²= H, n = 0

text missing or illegible when filed

X = CH, R¹= NMe₂, R²= H, n = 1

text missing or illegible when filed

X = N, R¹= Boc, R²= H, n = 1

text missing or illegible when filed

X = N, R¹= H, R²= H, n = 1

text missing or illegible when filed

X = CH, R¹= NHBoc, R²= H, n = 0

text missing or illegible when filed

X = CH, R¹= NH₂, R²= H, n = 0

text missing or illegible when filed

X = CH, R¹= NHBoc, R²= H, n = 1

text missing or illegible when filed

X = CH, R¹= NH₂, R²= H, n = 1

text missing or illegible when filed

X = NMe, R¹= Me, R²= H, n = 1

Methyl lysine binding domain

embedded image

R¹(2° amine)

1-methylpiperazine

F

1,2-dimethylpiperazine

1-methyl-1,4-diazepane

morpholine

piperidine

1-ethylpiperazine

N^1,1-dimethylpyrrolidin-3-amine

N^1,1-dimethylpiperidin-4-amine

piperazine

pyrrolidin-3-amine

piperidin-4-amine

N^1,1,2-trimethylethan-1,2-diamine

Methyl lysine binding domain

embedded image

R¹= Me

R¹= 3-Cl—Ph

R¹= 3-Me—Ph

R¹= 2-Cl, 3-Me—Ph

R¹= 3-OH—Ph

R¹= 3-OMe—Ph

R¹= 4-F—Ph

R¹= 2-Cl, 4-F—Ph

R¹= 3-Me, 4-F—Ph

46 R¹= 3-OMe, 4-F—Ph

47 R¹= 2-Cl, 3-Me, 4-F—Ph

48 R¹= phenyl

49 R¹=

50 R¹= 1-naphthyl

51 R¹= 5-quinolyl

52 R¹=

53 R¹= 3-pyridyl

54 R¹= 2-furanyl

R¹

2-Cl-phenyl

Me

3-Cl-phenyl

3-Me-phenyl

2-Cl-3-Me-phenyl

3-OH-phenyl

3-OMe-phenyl

4-F-phenyl

2-Cl-4-F-phenyl

3-Me-4-F-phenyl

3-OMe-4-F-phenyl

2-Cl-3-Me-4-F-phenyl

phenyl

cyclohexyl

1-naphthyl

5-quinolyl

benzyl

3-pyridyl

2-furanyl

Methyl lysine binding domain

embedded image

R¹= NO₂

R¹= NH₂

R¹= CO₂Me

R¹= CO₂H

R¹= CF₃

R¹= Br

R¹= cyclopropyl

R¹= 2-furanyl

R¹= 4-pyridyl

R¹

NO₂

CO₂Me

CF₃

Br

NH₂

CO₂H

cyclopropyl

2-furanyl

4-pyridyl

Methyl lysine binding domain

embedded image

CDK2

CDK1, 2, or 4

embedded image

CDK2, CDK1, or CDK5

embedded image

CDK2, CDK4, CDK5, CDK1, CDK7

embedded image

CDK2, CDK1, CDK4

embedded image

CDK2, CDK4, CDK5, or CDK1

embedded image

CDK2, CDK5, or CDK7

embedded image

CDK2 or CDK4

embedded image

CDK2

CDK2 or CDK1

embedded image

CDK1, CDK2, CDK4 or CDK9

embedded image

CDK2

CDK1 or CDK2

embedded image

CDK5 or GSK3beta

embedded image

CDK1, CDK5, or GSK3 alpha/ beta

embedded image

CDK4 or FLT3

embedded image

CDK8

CDK8 or CDK19

embedded image

CDK8

CDK8 or CDK19

embedded image

CDK9

CDK7/9

CDK9

CDK12/13

CDK12

CDK12/2

CDK1/2/ 5/9 (Dina- ciclib)

embedded image

CDK9/4/ 1/2/6 (P276-00)

embedded image

CDK9 (voru- ciclib)

embedded image

CDK1/2/ 4/5/9 (AT- 7519M)

embedded image

CDK9/2/7/ GSK3alpha (SNS-032)

embedded image

CDK2

SCH 727965

CDK1/2/4

embedded image

CDK1/2/ 7/9

embedded image

CDK1/2/ 4/7/9

embedded image

CDK12/13 (THZ531)

embedded image

CDK9/2/7/ GSK3alpha

embedded image

CDK2 (rosco- vitine)

embedded image

CDK2 (NU2058)

embedded image

CDK2 (R457)

embedded image

CDK2 (Flavo- piridol)

embedded image

Flavopiridol

CDK1/2/4/ 5/7/9 (R547)

embedded image

H3K4 lysine methyl- transferase KMT7 (PFI-2)

embedded image

H3K4 lysine methyl- transferase KMT7 (cyproheptadine)

embedded image

KDM1A/B (RN1)

embedded image

KDM1A (GSK- 2879552)

embedded image

KDM5 (CPI-455)

embedded image

KDM5 (KDM- C49)

embedded image

KDM5 (amio- darone)

embedded image

KDM5 (Disulfiram)

embedded image

EHMT2 aka G9a

embedded image

1

BIX-01294

embedded image

2

UNC0638

embedded image

12

(A-366)

EHMT2 aka G9a

embedded image

R¹
R²

embedded image

EHMT or GLP methyl- transferase

embedded image

(UNC0638)

G9a or HDAC

embedded image

R¹
R²

embedded image

SMYD2

LLY-507

SMYD2

DOT1L

embedded image

EPZ-5676

DOT1L

DOT1L

embedded image

(pinometostat)

PRMT5

embedded image

EPZ015666 (GSK3235025)

PRMT5

Pan-jmjC

embedded image

Methylstat

pan-jmjC

JMJD3/ UTX/ JARID

embedded image

GSK-J1

JMJD3/

UTX/

JARID

JARID

embedded image

KDM5-C49

JARID

LSD1

embedded image

ORY-1001

LSD1

LSD1

embedded image

OGT

TET1, TET2

embedded image

TET1

CBP BRD

R

1
2
3

A
CH₃
H

A
H
CH₃

B
CH₃
H

B
H
CH₃

A
H
(R)-CH₃

A
H
(S)-CH₃

B
H
(R)-CH₃

B
H
(S)-CH₃

C
H
(R)-CH₃

C
H
(S)-CH₃

embedded image

C

HDAC

embedded image

HDAC

HDAC1, HDAC2, HDAC3

embedded image

HDAC2, HDAC3

embedded image

HDAC1, HDAC3

embedded image

HDAC

HDAC1, HDAC2, HDAC3

embedded image

HDAC6, HDAC8

embedded image

HDAC6

HDAC

HDAC6

HDAC1, HDAC2, HDAC3, HDAC6

embedded image

HDAC4

HDAC6, HDAC8

embedded image

HDAC6

HDAC

HDAC6

HDAC1, HDAC6

embedded image

HDAC6, HDAC8

embedded image

HDAC1, HDAC6

embedded image

HDAC5, HDAC5, HDAC6, HDAC8

embedded image

HDAC6

HDAC1, HDAC6

embedded image

HDAC

HDAC1, HDAC2, HDAC3, HDAC5, HDAC6

embedded image

HDAC1, HDAC6

embedded image

HDAC8, HDAC11

embedded image

HDAC8

HDAC1, HDAC6

embedded image

HDAC

HDAC1

HDAC1, HDAC2, HDAC3, HDAC6, HDAC8, HDAC10, HDAC11

embedded image

HDAC1, HDAC2, HDAC3, HDAC6, HDAC8, HDAC10, HDAC11

embedded image

HDAC4, HDAC5, HDAC7, HDAC9

embedded image

HDAC4

HDAC5, HDAC8

embedded image

HDAC4, HDAC8

embedded image

HDAC

HDAC4

HDAC1, HDAC6, HDAC9

embedded image

HDAC2, HDAC6

embedded image

P300/CBP

p300, PCAF

embedded image

p300

HAT

Tip60

p300/CBP, PCAF, Tip60

embedded image

p300 activator

embedded image

PCAF

Tip60

PCAF

p300

p300, PCAF

embedded image

p300

p300/CBP

p300

p300/CBP

PCAF

GCN5

p300

Tip60

p300

Tip60

HDAC1, HDAC2, HDAC3, HDAC8

embedded image

HDAC1, HDAC2, HDAC3

embedded image

HDAC1, HDAC2, HDAC3, HDAC8

embedded image

HDAC1, HDAC2, HDAC3

embedded image

HDAC2, HDAC3

embedded image

CDK2

2: R = H

3: R = SO₂NH₂

CDK2

embedded image

CDK2, CDK7, CDK9

embedded image

CDK2

R¹
R²

embedded image

—

SO₂NH₂

H
—

H
H

H
SO₂NH₂

text missing or illegible when filed

SO₂NH₂

OEt
SO₂NH₂

embedded image

SO₂NH₂

H

CDK2

embedded image

R¹
R²

text missing or illegible when filed

SO₂NH₂

H

Et
SO₂NH₂

embedded image

SO₂NH₂

Ph
SO₂NH₂

embedded image

SO₂NH₂

SO₂NH₂

CDK2

embedded image

CDK2
Structure
R

embedded image

H SO₂NH₂

CDK

embedded image

PCAF BRD, L3MBTL3

embedded image

CBP/p300

PRMT5

HDAC

2- oxo- glutarate dependent KDM5 demethyl- ases

embedded image

CDK4, CDK6

embedded image

HDAC

Pan-HDAC

HDAC

HDAC1, HDAC3

embedded image

HDAC

Pan-HDAC

HDAC6

Class I HDAC

embedded image

Class IIa HDAC

embedded image

HDAC3

HDAC6

HDAC8

HDAC1, HDAC2

embedded image

HDAC1

HDAC

HDAC, PI3K

embedded image

HDAC, EGFR, HER2

embedded image

HDAC

HDAC1, HDAC6, ER

embedded image

Class I HDAC, ZEB1

embedded image

HDAC, Akt

HDAC

HDAC1

Class I HDACs

embedded image

HDAC6

HDAC3, HDAC6, HDAC8

embedded image

HDAC6

HDAC2

HDAC4

HDAC1, HDAC2

embedded image

Pan-HDAC

HDAC4

HDAC6

G9a, GLP

SMYD2

EZH2

DOT1L

PRMT5

Pan-jmjC

JARID

JMJD3, UTX, JARID

embedded image

LSD1

L3MBTL1- MBT

embedded image

CBX7

53BP1

JARID1A- PHD3

embedded image

Pygo-PHD

WDR5- MLL

CDK1, CDK2, CDK4, CDK5, CDK6, CDK7, CDK9

embedded image

CDK1, CDK2, CDK4, CDK6, CDK9

embedded image

CDK1, CDK2, CDK5, CDK7

embedded image

CDK1, CDK2, CDK5, CDK9

embedded image

CDK1, CDK2, CDK4, CDK5, CDK6, CDK7

embedded image

CDK1, CDK2, CDK4, CDK5, CDK7, CDK9

embedded image

CDK1, CDK2, CDK5, CDK7, CDK9

embedded image

CDK4, CDK6

embedded image

CDK1, CDK2, CDK4, CDK5

embedded image

CDK4, CDK6

embedded image

CDK1, CDK2, CDK5, CDK6, CDK7, CDK9

embedded image

CDK2, CDK4, CDK5, CDK6, CDK9

embedded image

CDK1, CDK2, CDK4, CDK7, CDK9

embedded image

CDK1, CDK2, CDK4, CDK5, CDK6, CDK9

embedded image

CDK4

CDK1, CDK4

embedded image

CDK4, CDK6

embedded image

CDK4

CDK2, CDK9

embedded image

CDK5

CDK8

CDK1, CDK2, CDK5, CDK7, CDK9

embedded image

CDKs

CDK1, CDK2, CDK5, CDK9

embedded image

CDK7

CDK2

CDK2, HDAC

embedded image

CDK3

CDK5

CDK4

CDK8

CDK4

CDK2, CDK9

embedded image

CDK9

CDK2, HDAC

embedded image

CDK7

CDK2, CDK9

embedded image

CDK1, CDK2, CDK5, CDK9

embedded image

CDK2, HDAC1

embedded image

CDK9

CDK, CDC7

CDK8, CDK19

embedded image

CDK8, CDK19, MAP4K2, YSK4

embedded image

CDK8, CDK19

embedded image

CDK4, CDK6

embedded image

CDK9, CK2, PIM1

embedded image

CDK1, CDK2, CDK5

embedded image

CDK1, CDK2, CDK3, CDK4, CDK6, CDK7, CDK9, HDAC

embedded image

CDK2

indicates data missing or illegible when filed

In some embodiments, the second terminus does not comprise JQ1, JQ-1, OTX015, RVX208 acid, or RVX208 hydroxyl.

In certain embodiments, the protein binding moiety is a residue of a compound having a structure of Formula (C-1):

embedded image

- wherein:
- X^a is —NHC(O)—, —C(O)—NH—, —NHSO₂—, or —SO₂NH—;
- A^a is selected from an optionally substituted —C_1-12alkyl, optionally substituted —C_2-10alkenyl, optionally substituted —C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted —C_1-12haloalkyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 5- to 10-membered heterocycloalkyl;
- X^b is a bond, NH, NH—C_1-10alkylene, C_1-12alkyl, —NHC(O)—, or —C(O)—NH—;
- A^b is selected from an optionally substituted —C_1-12alkyl, optionally substituted —C_2-10alkenyl, optionally substituted —C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted —C_1-12haloalkyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 4- to 10-membered heterocycloalkyl; and
- each of R^1e, R^2e, R^3e, and R^4e is independently selected from the group consisting of H, OH, —NO₂, halogen, amine, COOH, COOC_1-10alkyl, —NHC(O)-optionally substituted —C_1-12alkyl, —NHC(O)(CH₂)_1-4NR^fR^g, —NHC(O)(CH₂)_0-4CHR^f(NR^fR^g), —NHC(O)(CH₂)_0-4CHR^fR^g, —NHC(O)(CH₂)_0-4—C_3-7cycloalkyl, —NHC(O)(CH₂)_0-4-5- to 10-membered heterocycloalkyl, —NHC(O)(CH₂)_0-4C_6-10aryl, —NHC(O)(CH₂)_0-4-5- to 10-membered heteroaryl, —(CH₂)_1-4—C_3-7cycloalkyl, —(CH₂)_1-4-5- to 10-membered heterocycloalkyl, —(CH₂)_1-4C_6-10aryl, —(CH₂)_1-4-5- to 10-membered heteroaryl, optionally substituted —C_2-10alkenyl, optionally substituted —C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted —C_1-12haloalkyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 4- to 10-membered heterocycloalkyl; and
- wherein each R^f and R^g is independently H or C_1-6alkyl.

In certain embodiments, the protein binding moiety is a residue of a compound having a structure of Formula (C-2):

embedded image

wherein R^5e is independently selected from the group consisting of H, COOC_1-10alkyl, —NHC(O)-optionally substituted —C_1-12alkyl, optionally substituted —C_2-10alkenyl, optionally substituted —C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted —C_1-12haloalkyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 5- to 10-membered heterocycloalkyl.

In certain embodiments, A^a is selected from an optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 5- to 10-membered heterocycloalkyl. In certain embodiments, A^a is an optionally substituted C_6-10aryl.

In certain embodiments, the protein binding moiety is a residue of a compound having a structure of Formula (C-3):

embedded image

- wherein:
- M^1c is CR^2h or N; and
- each of R^1h, R^2h, R^3h, R^4h, and R^5h is independently selected from the group consisting of H, OH, —NO₂, halogen, amine, COOH, COOC_1-10alkyl, —NHC(O)-optionally substituted —C_1-12alkyl, —NHC(O)(CH₂)_1-4NR^fR^g, —NHC(O)(CH₂)_0-4CHR^f(NR^fR^g), —NHC(O)(CH₂)_0-4CHR^fR^g, —NHC(O)(CH₂)_0-4—C_3-7cycloalkyl, —NHC(O)(CH₂)_0-4-5- to 10-membered heterocycloalkyl, —NHC(O)(CH₂)_0-4C_6-10aryl, —NHC(O)(CH₂)_0-4-5- to 10-membered heteroaryl, —(CH₂)_1-4—C_3-7cycloalkyl, —(CH₂)_1-4-5- to 10-membered heterocycloalkyl, —(CH₂)_1-4C_6-10aryl, —(CH₂)_1-4-5- to 10-membered heteroaryl, optionally substituted —C_2-10alkenyl, optionally substituted —C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted —C_1-12haloalkyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 5- to 10-membered heterocycloalkyl, wherein each R^f and R^g is independently H or C_1-6alkyl.

In certain embodiments, each R^1h and R^5h is independently hydrogen, halogen, or C_1-6alkyl. In certain embodiments, each R^2h and R^3h is independently H, OH, —NO₂, halogen, C_1-4haloalkyl, amine, COOH, COOC_1-10alkyl, —NHC(O)-optionally substituted —C_1-12alkyl, —NHC(O)(CH₂)_1-4NR^fR^g, —NHC(O)(CH₂)_0-4CHR^f(NR^fR^g), —NHC(O)(CH₂)_0-4CHR^fR^g, —NHC(O)(CH₂)_0-4—C_3-7cycloalkyl, —NHC(O)(CH₂)_0-4-5- to 10-membered heterocycloalkyl, —NHC(O)(CH₂)_0-4C_6-10aryl, —NHC(O)(CH₂)_0-4-5- to 10-membered heteroaryl, —(CH₂)_1-4—C_3-7cycloalkyl, —(CH₂)_1-4-5- to 10-membered heterocycloalkyl, —(CH₂)_1-4C_6-10aryl, —(CH₂)_1-4-5- to 10-membered heteroaryl, optionally substituted C_2-10alkenyl, optionally substituted C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, or optionally substituted 5- to 10-membered heterocycloalkyl. In certain embodiments, R^b′, R^ae, and R^4′ are hydrogen.

In certain embodiments, R^2e is selected from the group consisting of H, OH, —NO₂, halogen, amine, COOH, COOC_1-10alkyl, —NHC(O)-optionally substituted —C_1-12alkyl, —NHC(O)(CH₂)_1-4NR^fR^g, —NHC(O)(CH₂)_0-4CHR^f(NR^fR^g), —NHC(O)(CH₂)_0-4CHR^fR^g, —NHC(O)(CH₂)_0-4—C_3-7cycloalkyl, —NHC(O)(CH₂)_0-4-5- to 10-membered heterocycloalkyl, —NHC(O)(CH₂)_0-4C_6-10aryl, —NHC(O)(CH₂)_0-4-5- to 10-membered heteroaryl, —(CH₂)_1-4—C_3-7cycloalkyl, —(CH₂)_1-4-5- to 10-membered heterocycloalkyl, —(CH₂)_1-4C_6-10aryl, —(CH₂)_1-4-5- to 10-membered heteroaryl, optionally substituted —C_1-12alkyl, optionally substituted —C_2-10alkenyl, optionally substituted —C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted —C_1-12haloalkyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 5- to 10-membered heterocycloalkyl, wherein each R^f and R^g is independently H or C_1-6alkyl.

In certain embodiments, R^2e is a phenyl or pyridinyl optionally substituted with 1-3 substituents, wherein each substituent is independently selected from the group consisting of OH, —NO₂, halogen, amine, COOH, COOC_1-10alkyl, —NHC(O)—C_1-12alkyl, —NHC(O)(CH₂)_1-4NR^fR^g, —NHC(O)(CH₂)_0-4CHR^f(NR^fR^g), —NHC(O)(CH₂)_0-4CHR^fR^g, —NHC(O)(CH₂)_0-4—C_3-7cycloalkyl, —NHC(O)(CH₂)_0-4-5- to 10-membered heterocycloalkyl, —NHC(O)(CH₂)_0-4C_6-10aryl, —NHC(O)(CH₂)_0-4-5- to 10-membered heteroaryl, —(CH₂)_1-4—C_3-7cycloalkyl, —(CH₂)_1-4-5- to 10-membered heterocycloalkyl, —(CH₂)_1-4C_6-10aryl, —(CH₂)_1-4-5- to 10-membered heteroaryl, —C_1-12alkoxyl, C_1-12haloalkyl, C_6-10aryl, C_3-7cycloalkyl, 5- to 10-membered heteroaryl, and 5- to 10-membered heterocycloalkyl, wherein each R^f and R^g is independently H or C_1-6alkyl.

In certain embodiments, A^a is a C_6-10aryl substituted with 1-4 substituents, and each substituent is independently selected from halogen, OH, NO₂, an optionally substituted —C_1-12alkyl, optionally substituted —C_2-10alkenyl, optionally substituted —C_2-10alkynyl, optionally substituted —C_1-12alkoxyl, optionally substituted —C_1-12haloalkyl, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, and optionally substituted 5- to 10-membered heterocycloalkyl.

In certain embodiments, the protein binding moiety is a residue of a compound having the structure of Formula (C-4):

embedded image

- wherein:
- R^1c is an optionally substituted C_6-10aryl or an optionally substituted 5- to 10-membered heteroaryl;
- X^c is —C(O)NH—, —C(O)—, —S(O₂)—, —NH—, or —C_1-4alkyl-NH;
- n is 0-10;
- R^2j is —NR^3jR^4j, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, or optionally substituted 4- to 10-membered heterocycloalkyl; and
- each R^3j and R^4j is independently H or optionally substituted —C_1-12alkyl.

In some embodiments, R^2j is —NHC(CH₃)₃, or a 4- to 10-membered heterocycloalkyl substituted with C_1-12alkyl.

In certain embodiments, the protein binding moiety is a residue of a compound having the structure of Formula (C-5):

embedded image

- wherein:
- X^2c is a bond, C(O), SO₂, or CHR^3c;
- M^2c is CH or
- n is 0-10;
- R^2j is —NR^3jR^4j, optionally substituted C_6-10aryl, optionally substituted C_3-7cycloalkyl, optionally substituted 5- to 10-membered heteroaryl, or optionally substituted 4- to 10-membered heterocycloalkyl;
- each R^5j is independently —NR^3jR^4j, —C(O)R^3j, —COOH, —C(O)NHC_1-6alkyl, an optionally substituted C_6-10aryl, or an optionally substituted 5- to 10-membered heteroaryl;
- R^6j is —NR^3jR^4j, —C(O)R^3j, an optionally substituted C_6-10aryl, or an optionally substituted 5- to 10-membered heteroaryl; and
- each R^3j and R^4j is independently H, an optionally substituted C_6-10aryl, optionally substituted 4- to 10-membered heterocycloalkyl, or optionally substituted —C_1-12alkyl.

In certain embodiments, R^2j is a 4- to 10-membered heterocycloalkyl substituted by a 4- to 10-membered heterocycloalkyl. In certain embodiments, R^6j is —C(O)R^3j, and R^3j is a 4- to 10-membered heterocycloalkyl substituted by a 4- to 10-membered heterocycloalkyl. In certain embodiments, each R^5j is independently H, —COOH, —C(O)NHC_1-6alkyl, NH—C_6-10aryl, or optionally substituted C_6-10aryl.

In certain embodiments, the protein binding moiety is a residue of a compound having the structure of Formula (C-6):

embedded image

- wherein:
- X^3c is a bond, NH, C_1-4alkylene, or NC_1-4alkyl;
- R^7j is an optionally substituted C_1-6alkyl, an optionally substituted cyclic amine, an optionally substituted aryl, an optionally substituted 5- to 10-membered heteroaryl, or optionally substituted 4- to 10-membered heterocycloalkyl,
- R^8j is H, halogen, or C_1-6alkyl; and
- R^9j is H, or C_1-6alkyl.

In certain embodiments, R^7j is an optionally substituted cyclic secondary or tertiary amine. In certain embodiments, R^7j is a tetrahydroisoquinoline optionally substituted with C_1-4alkyl.

In certain embodiments, the protein binding moiety is a residue of a compound having the structure of Formula (C-7):

embedded image

- wherein:
- A^1a is an optionally substituted aryl or heteroaryl;
- X² is a bond, (CH₂)_1-4, or NH; and
- A^2a is an optionally substituted aryl, heterocyclic, or heteroaryl, linked to an amide group.

In certain embodiments, A^1a is an aryl substituted with one or more halogen, C_1-6alkyl, hydroxyl, C_1-6alkoxy, or C_1-6haloalkyl. In certain embodiments, X² is NH. In certain embodiments, A^2a is a heterocyclic group. In certain embodiments, A^2a is a pyrrolidine. In certain embodiments, A^2a is an optionally substituted phenyl. In certain embodiments, A^2a is a phenyl optionally substituted with one or more halogen, C_1-6alkyl, hydroxyl, C_1-6alkoxy, or C_1-6haloalkyl.

In certain embodiments, the protein binding moiety is a residue of a compound having the structure of Formula (C-8):

embedded image

wherein R^1k is H or C_1-25alkyl and R^2k is OH or —OC_1-12alkyl.

In certain embodiments, the protein binding moiety is a residue of a compound having the structure of Formula (C-9):

embedded image

- wherein R_1m is H, OH, —CONH₂, —COOH, —NHC(O)—C_1-6alkyl, —NHC(O)O—C_1-6alkyl, —NHS(O)₂—C_1-6alkyl, —C_1-6alkyl, —C_1-6alkoxyl, or —NHC(O)NH—C_1-6alkyl;
- R_2m is H, CN, or CONH₂; and
- R_3m is an optionally substituted C_6-10aryl.

In certain embodiments, the protein binding moiety is a residue of a compound having the structure of Formula (C-10):

embedded image

- wherein R_1n is an optionally substituted C_6-10aryl or optionally substituted 5- to 10-membered heteroaryl, and
- each R_2n and R_3n are independently H, —C_1-4alkyl-C_6-10aryl, —C_1-4alkyl-5- to 10-membered heteroaryl, C_6-10aryl, or -5- to 10-membered heteroaryl, or
- R_2n and R_3n together with N form an optionally substituted 4- to 10-membered heterocyclic or heteroaryl group.

In certain embodiments, the regulatory molecule is not a bromodomain-containing protein chosen from BRD2, BRD3, BRD4, and BRDT.

In certain embodiments, the regulatory molecule is BRD4. In certain embodiments, the recruiting moiety is a BRD4 activator. In certain embodiments, the BRD4 activator is chosen from JQ-1, OTX015, RVX208 acid, and RVX208 hydroxyl.

embedded image

In certain embodiments, the regulatory molecule is BPTF. In certain embodiments, the recruiting moiety is a BPTF activator. In certain embodiments, the BPTF activator is AU 1.

embedded image

In certain embodiments, the regulatory molecule is histone acetyltransferase (“HAT”). In certain embodiments, the recruiting moiety is a HAT activator. In certain embodiments, the HAT activator is an oxopiperazine helix mimetic (OHM). In certain embodiments, the HAT activator is selected from OHM1, OHM2, OHM3, and OHM4 (BB Lao et al., PNAS USA 2014, 111(21), 7531-7536). In certain embodiments, the HAT activator is OHM4.

embedded image

In certain embodiments, the regulatory molecule is histone deacetylase (“HDAC”). In certain embodiments, the recruiting moiety is an HDAC activator. In certain embodiments, the HDAC activator is chosen from SAHA and 109 (Soragni E, Front. Neurol. 2015, 6, 44, and references therein).

embedded image

In certain embodiments, the regulatory molecule is histone deacetylase (“HDAC”). In certain embodiments, the recruiting moiety is an HDAC inhibitor. In certain embodiments, the HDAC inhibitor is an inositol phosphate.

In certain embodiments, the regulatory molecule is O-linked β-N-acetylglucosamine transferase (“OGT”). In certain embodiments, the recruiting moiety is an OGT activator. In certain embodiments, the OGT activator is chosen from ST045849, ST078925, and ST060266 (Itkonen H M, “Inhibition of O-GlcNAc transferase activity reprograms prostate cancer cell metabolism”, Oncotarget 2016, 7(11), 12464-12476).

embedded image

In certain embodiments, the regulatory molecule is chosen from host cell factor 1 (“HCF1”) and octamer binding transcription factor (“OCT1”). In certain embodiments, the recruiting moiety is chosen from an HCF1 activator and an OCT1 activator. In certain embodiments, the recruiting moiety is chosen from VP16 and VP64.

In certain embodiments, the regulatory molecule is chosen from CBP and P300. In certain embodiments, the recruiting moiety is chosen from a CBP activator and a P300 activator. In certain embodiments, the recruiting moiety is CTPB.

embedded image

In certain embodiments, the regulatory molecule is P300/CBP-associated factor (“PCAF”). In certain embodiments, the recruiting moiety is a PCAF activator. In certain embodiments, the PCAF activator is embelin.

embedded image

In certain embodiments, the regulatory molecule modulates the rearrangement of histones.

In certain embodiments, the regulatory molecule modulates the glycosylation, phosphorylation, alkylation, or acylation of histones.

In certain embodiments, the regulatory molecule is a transcription factor.

In certain embodiments, the regulatory molecule is an RNA polymerase.

In certain embodiments, the regulatory molecule is a moiety that regulates the activity of RNA polymerase.

In certain embodiments, the regulatory molecule interacts with TATA binding protein.

In certain embodiments, the regulatory molecule interacts with transcription factor II D.

In certain embodiments, the regulatory molecule comprises a CDK9 subunit.

In certain embodiments, the regulatory molecule is P-TEFb.

In certain embodiments, X binds to the regulatory molecule but does not inhibit the activity of the regulatory molecule. In certain embodiments, X binds to the regulatory molecule and inhibits the activity of the regulatory molecule. In certain embodiments, X binds to the regulatory molecule and increases the activity of the regulatory molecule.

In certain embodiments, X binds to the active site of the regulatory molecule. In certain embodiments, X binds to a regulatory site of the regulatory molecule.

In certain embodiments, the recruiting moiety is chosen from a CDK-9 inhibitor, a cyclin T1 inhibitor, and a PRC2 inhibitor.

In certain embodiments, the recruiting moiety is a CDK-9 inhibitor. In certain embodiments, the CDK-9 inhibitor is chosen from flavopiridol, CR8, indirubin-3′-monoxime, a 5-fluoro-N2,N4-diphenylpyrimidine-2,4-diamine, a 4-(thiazol-5-yl)-2-(phenylamino)pyrimidine, TG02, CDKI-73, a 2,4,5-trisubstituted pyrimidine derivative, LDC000067, Wogonin, BAY-1000394 (Roniciclib), AZD5438, and DRB (F Morales et al., “Overview of CDK9 as a target in cancer research”, Cell Cycle 2016, 15(4), 519-527, and references therein).

embedded image

In certain embodiments, the regulatory molecule is a histone demethylase. In certain embodiments, the histone demethylase is a lysine demethylase. In certain embodiments, the lysine demethylase is KDM5B. In certain embodiments, the recruiting moiety is a KDM5B inhibitor. In certain embodiments, the KDM5B inhibitor is AS-8351 (N. Cao, Y. Huang, J. Zheng, et al., “Conversion of human fibroblasts into functional cardiomyocytes by small molecules”, Science 2016, 352(6290), 1216-1220, and references therein).

embedded image

In certain embodiments, the regulatory molecule is the complex between the histone lysine methyltransferases (“HKMT”) GLP and G9A (“GLP/G9A”). In certain embodiments, the recruiting moiety is a GLP/G9A inhibitor. In certain embodiments, the GLP/G9A inhibitor is BIX-01294 (Chang Y, “Structural basis for G9a-like protein lysine methyltransferase inhibition by BIX-01294”, Nature Struct. Mol. Biol. 2009, 16, 312-317, and references therein).

embedded image

In certain embodiments, the regulatory molecule is a DNA methyltransferase (“DNMT”). In certain embodiments, the regulatory molecule is DNMT1. In certain embodiments, the recruiting moiety is a DNMT1 inhibitor. In certain embodiments, the DNMT1 inhibitor is chosen from RG108 and the RG108 analogues 1149, T1, and G6 (B Zhu et al., Bioorg. Med. Chem. 2015, 23(12), 2917-2927, and references therein).

embedded image

In certain embodiments, the recruiting moiety is a PRC1 inhibitor. In certain embodiments, the PRC1 inhibitor is chosen from UNC4991, UNC3866, and UNC3567 (J I Stuckey et al. Nature Chem Biol 2016, 12(3), 180-187 and references therein; K D Barnash et al. ACS Chem. Biol. 2016, 11(9), 2475-2483, and references therein).

embedded image

In certain embodiments, the recruiting moiety is a PRC2 inhibitor. In certain embodiments, the PRC2 inhibitor is chosen from A-395, MS37452, MAK683, DZNep, EPZ005687, EI1, GSK126, and UNC1999 (Konze K D ACS Chem Biol 2013, 8(6), 1324-1334, and references therein).

embedded image

In certain embodiments, the recruiting moiety is rohitukine or a derivative of rohitukine.

In certain embodiments, the recruiting moiety is DB08045 or a derivative of DB08045.

embedded image

In certain embodiments, the recruiting moiety is A-395 or a derivative of A-395.

In certain embodiments, the regulatory molecule is chosen from a bromodomain-containing protein, a nucleosome remodeling factor (NURF), a bromodomain PHD finger transcription factor (BPTF), a ten-eleven translocation enzyme (TET), methylcytosine dioxygenase (TET1), a DNA demethylase, a helicase, an acetyltransferase, and a histone deacetylase (“HDAC”).

In certain embodiments, the regulatory molecule is a bromodomain-containing protein chosen from BRD2, BRD3, BRD4, and BRDT.

In certain embodiments, the regulatory molecule is BRD4. In certain embodiments, the recruiting moiety is a BRD4 activator. In certain embodiments, the BRD4 activator is chosen from JQ-1, OTX015, RVX208 acid, and RVX208 hydroxyl.

embedded image

In certain embodiments, the regulatory molecule is BPTF. In certain embodiments, the recruiting moiety is a BPTF activator. In certain embodiments, the BPTF activator is AU 1.

embedded image

In certain embodiments, the regulatory molecule is histone acetyltransferase (“HAT”). In certain embodiments, the recruiting moiety is a HAT activator. In certain embodiments, the HAT activator is an oxopiperazine helix mimetic (OHM). In certain embodiments, the HAT activator is selected from OHM1, OHM2, OHM3, and OHM4 (BB Lao et al., PNAS USA 2014, 111(21), 7531-7536). In certain embodiments, the HAT activator is OHM4.

embedded image

In certain embodiments, the regulatory molecule is O-linked β-N-acetylglucosamine transferase (“OGT”). In certain embodiments, the recruiting moiety is an OGT activator. In certain embodiments, the OGT activator is chosen from ST045849, ST078925, and ST060266 (Itkonen H M, “Inhibition of O-GlcNAc transferase activity reprograms prostate cancer cell metabolism”, Oncotarget 2016, 7(11), 12464-12476).