The present invention is generally directed to a predictive tool for selectivity prediction to enhance target selectivity and, in certain embodiments, a predictive tool for isoform-selective anti-histone deacetylase activity.
Optimization of specificity is a fundamental problem in chemistry that is particularly acute in the development of therapeutics. The complexity of molecular recognition in biological systems severely limits the ability to hit a single therapeutic target, for example. Routinely, one has a potential drug that shows some adverse side effects due to off-target interactions. Alternatively, some drugs attempt to target molecules that undergo rapid mutation, necessitating the design of drugs that retain their efficacy against multiple mutant forms of the target. Thus, there exists an unmet need for methods that allow the researcher to select ligands with enhanced specificity for the target(s) while minimizing the affinity for off-target interactions.
Among the various aspects of the present invention is a predictive system and a methodology whereby available structural and activity information is integrated into joint, predictive three-dimensional-quantitative structure-activity relationship (3D-QSAR) models for target(s) and off-targets to allow iterative optimization of specificity for the target(s) and minimization of interaction with the off-targets.
Briefly, therefore, in one embodiment the present invention is directed to a computational method for selecting an effector having specificity for a target molecule. The method comprises compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set. The computational method further comprises determining spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data. Equivalence of the sequence elements may then be based on the determined spatial orientations of the ligand population members in the ligand-molecule pairs for which the data comprises activity data and the sequence elements of different molecule library members may then be labeled to reflect said equivalence. The computational method further comprises calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation. 
The computational method further comprises generating at least one statistical model that is predictive of those sequence elements of the molecule library members that may contribute to a differential effect of the ligand population members on the molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data. An effector that is predicted, based upon the generated statistical model(s), to have a specificity for the target molecule that differs from the specificity of the effector for other molecule library member(s) may then be selected and activity data quantifying an effect of the selected effector upon the activity of one or more of the molecule library members may then be experimentally determined. Preferably, the sequence of steps is repeated, wherein an effector selected in an earlier iteration of the sequence of steps is considered a member of the population of ligands in a subsequent iteration of the sequence of steps.
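By way of non-limiting illustration, the model-generation step may be sketched as follows. The passage above does not name a particular statistical technique; the sketch assumes a one-component partial least squares (PLS) regression, and the function names, column labels, and numerical values are hypothetical. Each row of `X` stands for one ligand-molecule pair, each column for the interaction energy with one labeled sequence element, and `y` for the corresponding activity data.

```python
from math import sqrt

def fit_pls1(X, y):
    """Fit a one-component PLS regression; return (coefficients, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center the energies and the activities.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    # Weight of each column: its covariance with the activity data.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(value * value for value in w))
    if norm == 0.0:
        raise ValueError("energies carry no covariance with activity")
    w = [value / norm for value in w]
    # Scores of each ligand-molecule pair along the single latent component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(value * value for value in t)
    # With one component the regression coefficients reduce to w * q.
    coef = [w[j] * q for j in range(p)]
    intercept = y_mean - sum(coef[j] * x_mean[j] for j in range(p))
    return coef, intercept

def predict(row, coef, intercept):
    """Predict activity for one row of interaction energies."""
    return intercept + sum(c * x for c, x in zip(coef, row))

# Hypothetical data: three labeled sequence-element energy terms, four pairs.
labels = ["element_1", "element_2", "element_3"]
X = [[-2.0, -0.5, 0.1],
     [-1.0, -0.6, 0.0],
     [-0.2, -0.4, 0.1],
     [-1.5, -0.5, 0.0]]
y = [7.0, 6.0, 5.2, 6.5]  # e.g. pIC50 values

coef, intercept = fit_pls1(X, y)
ranked = sorted(zip(labels, coef), key=lambda item: abs(item[1]), reverse=True)
```

In this sketch the sequence elements with the largest absolute coefficients are those the model associates most strongly with differences in activity; a practical model would typically use more components and cross-validation.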
In another embodiment, the present invention is directed to a computational method for selecting an effector having specificity for a target molecule. The method comprises compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members for a set of ligand-molecule pairs wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members, and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set. In one preferred embodiment, the other member molecules of the library are structurally related to the target molecule. The method further comprises establishing structure-based equivalence of the sequence elements and labeling the sequence elements of different molecule library members to reflect said equivalence and determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data. 
The method further comprises calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation and generating at least one statistical model that is predictive of those sequence elements of the molecule library members that are likely to contribute to the differential effect of ligand population members on molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data. An effector that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that exceeds the specificity of the effector for other molecule library member(s) may then be selected and activity data quantifying an effect of the selected effector upon the activity of one or more molecule library members may then be experimentally determined. In a preferred embodiment, the sequence of steps is repeated at least once, wherein the effector selected in an earlier iteration of the steps is a member of the population of ligands in a later iteration of the steps.
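By way of non-limiting illustration, the selection step may be sketched as follows. The sketch assumes that a statistical model has already produced a predicted activity (here a pIC50-like value) for each candidate against each molecule library member; the selectivity measure (target activity minus the highest off-target activity), the function names, and all numerical values are hypothetical and are not prescribed by the text above.

```python
def selectivity_margin(predicted, target):
    """Predicted activity on the target minus the highest off-target activity."""
    off_target = [value for name, value in predicted.items() if name != target]
    if not off_target:
        raise ValueError("at least one off-target molecule is required")
    return predicted[target] - max(off_target)

def select_effector(candidates, target):
    """Return the candidate with the largest predicted selectivity margin."""
    return max(candidates,
               key=lambda name: selectivity_margin(candidates[name], target))

# Hypothetical predicted activities for two candidate effectors.
candidates = {
    "ligand_1": {"HDAC1": 7.2, "HDAC2": 7.0, "HDAC6": 6.9},
    "ligand_2": {"HDAC1": 6.8, "HDAC2": 5.1, "HDAC6": 5.4},
}

chosen = select_effector(candidates, target="HDAC1")
margin = selectivity_margin(candidates[chosen], "HDAC1")
```

Here the more potent candidate is not selected, because its predicted activity on the off-targets is nearly as high as on the target. The selected effector would then be assayed, and its measured activity data added to the database for the next iteration.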
An additional embodiment of the present invention is a computational method for selecting an effector having specificity for a target molecule. The method comprises:
An additional embodiment of the present invention is a system for selecting an effector having specificity for a target molecule. The system comprises: a processor for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set, determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data, and establishing equivalence of the sequence elements based on determined likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the data comprises activity data and labeling the sequence elements of different molecule library members to reflect said equivalence; a calculator for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; and a classifier for generating at least one statistical model that is predictive of those sequence elements of the molecule library members that are likely to contribute to a differential effect of ligand population members on molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data.
Another embodiment of the present invention is a system for selecting an effector having specificity for a target molecule. The system comprises: means for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules structurally related to the target molecule, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set; means for establishing structure-based equivalence of the sequence elements and labeling the sequence elements of different molecule library members to reflect said equivalence; means for determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data; means for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; means for generating at least one statistical model that is predictive of those sequence elements of the molecule library members that are likely to contribute to a differential 
effect of ligand population members on molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data; means for selecting an effector that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that exceeds the specificity of the effector for other molecule library member(s); means for experimentally determining activity data quantifying an effect of the selected effector upon the activity of one or more molecule library members; and, means for at least once, repeating steps (a) and (c) through (g) wherein in a later iteration of steps (a) and (c) through (g) the effector selected in step (f) of an earlier iteration of steps (c) through (g) is a member of the population of ligands.
An additional embodiment of the present invention is a system for selecting an effector having specificity for a target molecule. The system comprises: a processor for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules structurally related to the target molecule, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set, establishing structure-based equivalence of the sequence elements and labeling the sequence elements of different molecule library members to reflect said equivalence, and determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data; a calculator for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; and, a classifier for generating at least one statistical model that is predictive of those sequence elements of the molecule library members that are likely to contribute to a 
differential effect of ligand population members on molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data.
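By way of non-limiting illustration, the labeling of equivalent sequence elements may be sketched as follows. The sketch assumes that a structure-based alignment of the molecule library members is already available as gapped one-letter sequences of equal length; the function name and the toy sequences are hypothetical. Residues that fall in the same alignment column receive the same label, so that their interaction energies can occupy the same column of a joint model.

```python
def label_equivalent_elements(aligned):
    """Map each residue position of each member to its alignment column.

    `aligned` maps a member name to its gapped sequence ('-' marks a gap).
    The result maps member name -> {residue position: column label}, with
    residue positions and column labels both counted from 1.
    """
    lengths = {len(sequence) for sequence in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    labels = {}
    for name, sequence in aligned.items():
        position = 0
        per_residue = {}
        for column, symbol in enumerate(sequence, start=1):
            if symbol == "-":
                continue  # a gap has no residue to label
            position += 1
            per_residue[position] = column
        labels[name] = per_residue
    return labels

# Hypothetical alignment of two library members.
aligned = {
    "isoform_A": "MK-LH",
    "isoform_B": "MKTL-",
}
labels = label_equivalent_elements(aligned)
```

In this toy alignment, residue 3 of `isoform_A` and residue 4 of `isoform_B` share column 4 and are therefore treated as equivalent sequence elements.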
An additional embodiment of the present invention is a system for selecting an effector having specificity for a target molecule. The system comprises: means for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set; means for determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data; means for establishing equivalence of the sequence elements based on determined likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the data comprises activity data and labeling the sequence elements of different molecule library members to reflect said equivalence; means for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; means for generating at least one statistical model that is predictive of those 
sequence elements of the molecule library members that are likely to contribute to a differential effect of ligand population members on molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data; means for selecting an effector that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that exceeds the specificity of the effector for other molecule library member(s); means for experimentally determining activity data quantifying an effect of the selected effector upon the activity of one or more molecule library members; and, means for at least once, repeating steps (a) through (g) wherein in a later iteration of steps (a) through (g) the effector selected in step (f) of an earlier iteration of steps (a) through (g) is a member of the population of ligands.
Other objects and features will be in part apparent and in part pointed out hereinafter.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The following definitions and methods are provided to better define the present invention and to guide those of ordinary skill in the art in the practice of the present invention. Unless otherwise noted, terms are to be understood according to conventional usage by those of ordinary skill in the relevant art.
When introducing elements of the present invention or the preferred embodiment(s) thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Activator: any chemical composition that increases the stability and/or activity of a target molecule or the expression of a gene or gene product. For example, classes of activators include, but are not limited to, allosteric activators and genetic activators. Allosteric activators bind to an alternative site on an enzyme, separate from the active site, and positively regulate the enzyme's activity. Allosteric activators typically elicit their effects by changing the conformation of the enzymes they bind to. This usually leads to changes in the active site of an enzyme, allowing for more efficient binding between an enzyme and its substrate. Enzyme activity typically increases as a result. Genetic activators interact with nucleic acids, typically deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), to promote expression of a gene or gene product, respectively. A non-limiting example of genetic activators comprises transcription factors. Transcription factors typically bind to DNA sequences upstream of a gene to be expressed, thereafter recruiting various transcription-related proteins and inducing conformational changes in the DNA that promote gene expression. Transcription factors can bind to promoter regions proximal and upstream of the transcription start site of a gene, or to regions farther upstream of a gene, known as enhancer elements. In either case, transcription factors bind to specific DNA sequences, leaving open the possibility of engineering novel transcription factor-DNA sequence interactions by modifying either transcription factors themselves or a DNA sequence of interest.
Activity data: any measurable quantity that describes some effect of a ligand on a target molecule and/or some property of the ligand itself. Examples of activity data include, but are not limited to, pKa, Ki, pKi, IC50, pIC50, free energy, entropy and enthalpy of ligand-target molecule complex formation, log P, and the number of hydrogen bond donors/acceptors.
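By way of non-limiting illustration, the relationship between a concentration-type activity value and its negative-logarithm form (for example IC50 and pIC50, or Ki and pKi) may be sketched as follows. The function names are hypothetical; the conversion assumes the concentration is expressed in mol/L before the logarithm is taken.

```python
from math import log10

def p_value(concentration_molar):
    """Return -log10 of a concentration given in mol/L (e.g. IC50 -> pIC50)."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -log10(concentration_molar)

def pic50_from_nanomolar(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50."""
    return p_value(ic50_nm * 1e-9)

# An IC50 of 100 nM corresponds to a pIC50 of about 7.
example = pic50_from_nanomolar(100.0)
```

Expressing activity data on this logarithmic scale keeps values for different ligand-molecule pairs on a comparable footing when they are combined in a single statistical model.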
Acetylation enzyme/acetyl transferases: any enzyme that catalyzes the transfer of an acetyl group from one compound to another. Examples of acetyltransferases include, but are not limited to, histone acetyltransferases, choline acetyltransferases, chloramphenicol acetyltransferases, serotonin N-acetyltransferase, NatA acetyltransferases, and NatB acetyltransferases.
Amino acid: any naturally occurring or synthetic amino acid, as well as amino acid analogs and amino acid mimetics that function similarly to naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, gamma-carboxyglutamate, and O-phosphoserine. Amino acid analogs are compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., a carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group; examples include homoserine, norleucine, methionine sulfoxide, and methionine methyl sulfonium. Such analogs may have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics are chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function similarly to a naturally occurring amino acid.
Antibody: encompasses naturally occurring immunoglobulins (e.g. IgM, IgG, IgD, IgA, IgE, etc.) as well as non-naturally occurring immunoglobulins, including, for example, single chain antibodies, chimeric antibodies (e.g., humanized murine antibodies) and heteroconjugate antibodies (e.g., bispecific antibodies), as well as antigen-binding fragments thereof, (e.g., Fab′, F(ab′)2, Fab, Fv, and rIgG). See also, e.g., Pierce Catalog and Handbook, 1994-1995 (Pierce Chemical Co., Rockford, Ill.); Kuby, J., Immunology, 3rd Ed., W.H. Freeman & Co., New York (1998). The term antibody also includes bivalent, trivalent, tetravalent, bispecific, and trispecific molecules, including but not limited to diabodies, triabodies, and tetrabodies. Bivalent and bispecific molecules are described in, e.g., Kostelny et al. (1992) J Immunol 148:1547, Pack and Pluckthun (1992) Biochemistry 31:1579, Hollinger et al., 1993, supra, Gruber et al. (1994) J Immunol: 5368, Zhu et al. (1997) Protein Sci 6:781, Hu et al. (1996) Cancer Res. 56:3055, Adams et al. (1993) Cancer Res. 53:4026, and McCartney, et al. (1995) Protein Eng. 8:301. Non-naturally occurring antibodies can be constructed using solid phase peptide synthesis, can be produced recombinantly, or can be obtained, for example, by screening combinatorial libraries consisting of variable heavy chains and variable light chains as described by Huse et al., Science 246:1275-1281 (1989), which is incorporated herein by reference. These and other methods of making, for example, chimeric, humanized, CDR-grafted, single chain, and bifunctional antibodies, are well known to those skilled in the art (Winter and Harris, Immunol. Today 14:243-246 (1993); Ward et al., Nature 341:544-546 (1989); Harlow and Lane, supra, 1988; Hilyard et al., Protein Engineering: A practical approach (IRL Press 1992); Borrabeck, Antibody Engineering, 2d ed. (Oxford University Press 1995); each of which is incorporated herein by reference).
Deacetylation enzyme/deacetylases: any enzyme that catalyzes the removal of an acetyl group from a substrate molecule. Deacetylases include, but are not limited to, zinc-based and nicotinamide adenine dinucleotide (NAD)-based deacetylases.
Effector: any compound that potentially regulates the biological activity of a target molecule. Effectors include, but are not limited to, inhibitors and activators. In a preferred embodiment, effectors are small organic molecules.
Epigenetic modifications: often closely linked and act in a self-reinforcing manner in the regulation of different cellular processes. DNA methylation and histone acetylation are major epigenetic modifications that are dynamically linked in the epigenetic control of gene expression and their deregulation plays an important role in tumorigenesis. See Feinberg, et al., Nat. Rev. Genet. 7:21-33 (2006); Jones & Baylin, Nat. Rev. Genet. 3:415-428 (2002). Recent studies suggested that an intimate communication and mutual dependence exists between histone acetylation and DNA methylation in the process of gene silencing. Communication between histone acetylation and cytosine methylation may proceed in both directions. In one scenario, DNA methylation may be the primary mark for gene silencing that triggers events leading to non-permissive chromatin state. In another scenario, the loss of histone acetylation may serve as the initial event of gene silencing, which is followed by DNA methylase targeting and induction of local DNA hypermethylation. See Vaissiere, et al., Mut. Res. 659:40-48 (2008).
Target molecule: as described herein can be a molecule of any size that binds, complexes, or otherwise associates with ligands to generate a desired effect. In some embodiments, the macromolecules are proteins or nucleic acids.
Inhibitor: any chemical composition that decreases the stability and/or activity of a target molecule. Inhibitors are typically divided into two classes, reversible and irreversible, based on the nature of their interaction with a target molecule. Irreversible inhibitors tend to interact with a target through covalent bonding, thereby fundamentally changing the chemical nature of the target. Reversible inhibitors, on the other hand, interact with a target via non-covalent interactions such as ionic or hydrogen bonds and hydrophobic interactions. Reversible inhibitors are further divided into four classes: competitive, noncompetitive, uncompetitive, and mixed inhibitors. For enzymes, the term “competitive inhibition” is used to refer to competitive inhibition in accord with the Michaelis-Menten model of enzyme kinetics. Competitive inhibition is recognized experimentally because the percent inhibition at a fixed inhibitor concentration is decreased by increasing the substrate concentration. At sufficiently high substrate concentration, Vmax can essentially be restored even in the presence of the inhibitor. Conversely, “non-competitive inhibition” refers to inhibition that is not reversed by increasing the substrate concentration. “Uncompetitive inhibition” refers to inhibition in which an inhibitor only binds to the enzyme-substrate complex whereas “mixed inhibition” refers to inhibition in which the inhibitor can bind to an enzyme whether the enzyme is in complex with its substrate or not, though its affinity will vary depending on the binding state of the enzyme.
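By way of non-limiting illustration, the experimental signature of competitive inhibition described above may be sketched with the standard Michaelis-Menten rate law for a competitive inhibitor, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]). The function names and parameter values are hypothetical and are chosen only to show the trend.

```python
def rate_competitive(substrate, inhibitor, vmax, km, ki):
    """Michaelis-Menten rate in the presence of a competitive inhibitor."""
    apparent_km = km * (1.0 + inhibitor / ki)
    return vmax * substrate / (apparent_km + substrate)

def percent_inhibition(substrate, inhibitor, vmax, km, ki):
    """Percent reduction in rate caused by the inhibitor at a given [S]."""
    uninhibited = rate_competitive(substrate, 0.0, vmax, km, ki)
    inhibited = rate_competitive(substrate, inhibitor, vmax, km, ki)
    return 100.0 * (1.0 - inhibited / uninhibited)

# Hypothetical parameters in arbitrary but consistent units.
params = dict(vmax=1.0, km=1.0, ki=1.0)
low_substrate = percent_inhibition(substrate=1.0, inhibitor=1.0, **params)
high_substrate = percent_inhibition(substrate=100.0, inhibitor=1.0, **params)
```

With these values the inhibition at a fixed inhibitor concentration falls from about one third of the rate at low substrate to about one percent at a hundredfold higher substrate concentration, which is the behavior used to recognize competitive inhibition.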
Histone deacetylases (HDACs): a family of protein-modifying enzymes found in bacteria, fungi, plants and animals. In humans, 18 different isoforms have been identified and divided into 4 classes according to size, cellular localization, number of active sites and homology with yeast deacetylases (Mai, A., et al., 2005). Class I, which includes HDAC-1, -2, -3 and -8, is related to yeast RPD3, shares nuclear localization with the exception of HDAC3, and has ubiquitous expression. Class II, by contrast, shows domains with similarity to yeast Hda1 and can be further divided into class IIa, which includes HDAC-4, -5, -7 and -9, and class IIb (HDAC-6 and -10), whose members contain two catalytic sites. HDAC3 and members of class II have been shown to shuttle between the cytoplasm and nucleus, and have tissue-specific expression. HDAC11 is the only member of class IV. HDAC classes I, II and IV are zinc-dependent deacetylases, unlike those of class III, called sirtuins, which require NAD+ as cofactor. HDACs play a key role in epigenetics, controlling gene expression involved in all aspects of biology, including cell proliferation, chromosome remodeling, gene silencing, and gene transcription (Hu, E., et al., 2003). They regulate the acetylation state of histone proteins by removing the acetyl moiety from the ε-amino group of lysine residues on the N-terminal extension of the core histones; this leads to changes in the structure of histones and therefore modifies the accessibility of gene-promoter regions to transcription enzymes. In addition, HDACs dynamically modify the activity of diverse types of non-histone proteins (Choudhary, C., et al., 2009). These include transcription factors, signal-transduction mediators, microtubules and a molecular chaperone. In particular, distinct class I and class II HDACs are overexpressed in several types of cancer.
HDAC inhibitors (HDACIs): classified according to their chemical structure as, for example, short-chain fatty acids, hydroxamic acids, benzamides, ketones and cyclic peptides with a pendant functional group. Because of the overexpression of some HDACs in cancer, HDACIs have been developed and approved for the treatment of cutaneous T-cell lymphoma: for example, Merck's Zolinza (suberoylanilide hydroxamic acid, SAHA) and Celgene's Istodax (Romidepsin, FK228) (Zain, J., et al., 2010). More recently, HDACIs have emerged as potential therapeutics for the stimulation of viral expression from infected cells in the hope of eradication of HIV infection (Savarino, A., et al., 2009, Choudhary, S. K., et al., 2011, Matalon, S., et al., 2011, Ortiz, A. R., et al., 1997, Ortiz, A. R., et al., 1995, Perez, C., et al., 1998, Lozano, J. J., et al., 2000, Ballante, F., et al., 2012). Many HDACIs show variability in their ability to inhibit particular isoforms. Unfortunately, as for SAHA and trichostatin A (TSA), the majority of HDACIs inhibit many HDAC isoforms nonspecifically. Others, such as MS-275, a benzamide, are more selective for class I, but still not isoform specific.
Interaction energy: the total energy of interaction between two entities. In the context of the present invention, interaction energies may be calculated according to the interaction between a given ligand and a sequence element, for example, an amino acid of a target protein. In a preferred embodiment of the invention, interaction energies are broken down into their component parts for a particular interaction between a ligand and a sequence element, i.e. electrostatic interaction energy, van der Waals interaction energy, desolvation energy, surface complementarity (polar vs. non-polar), volume of cavity occupied, etc.
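By way of non-limiting illustration, the breakdown of an interaction energy into electrostatic and van der Waals components per sequence element may be sketched as follows. The sketch assumes a simple pairwise-additive model: Coulomb's law with a constant dielectric and a 12-6 Lennard-Jones term with Lorentz-Berthelot combining rules. These functional forms, the atom parameters, and all names are assumptions made for illustration; the definition above does not prescribe a particular force field, and desolvation and surface terms are omitted.

```python
from math import dist, sqrt

# Coulomb constant in kcal*Angstrom/(mol*e^2), a commonly used value.
COULOMB_CONSTANT = 332.06

def pair_terms(atom_a, atom_b, dielectric):
    """Return (electrostatic, van der Waals) energy for one atom pair."""
    r = dist(atom_a["xyz"], atom_b["xyz"])
    electrostatic = (COULOMB_CONSTANT * atom_a["charge"] * atom_b["charge"]
                     / (dielectric * r))
    # Lorentz-Berthelot combining rules for the Lennard-Jones parameters.
    sigma = 0.5 * (atom_a["sigma"] + atom_b["sigma"])
    epsilon = sqrt(atom_a["epsilon"] * atom_b["epsilon"])
    ratio6 = (sigma / r) ** 6
    van_der_waals = 4.0 * epsilon * (ratio6 * ratio6 - ratio6)
    return electrostatic, van_der_waals

def per_element_energies(ligand_atoms, elements, dielectric=4.0):
    """Sum pair terms between the ligand and each labeled sequence element."""
    energies = {}
    for label, element_atoms in elements.items():
        electrostatic = van_der_waals = 0.0
        for ligand_atom in ligand_atoms:
            for element_atom in element_atoms:
                ele, vdw = pair_terms(ligand_atom, element_atom, dielectric)
                electrostatic += ele
                van_der_waals += vdw
        energies[label] = {"electrostatic": electrostatic,
                           "van_der_waals": van_der_waals}
    return energies

# Hypothetical one-atom ligand and one-atom sequence element, 3 Angstrom apart.
ligand = [{"xyz": (0.0, 0.0, 0.0), "charge": 1.0, "sigma": 3.0, "epsilon": 0.1}]
elements = {"element_1": [{"xyz": (3.0, 0.0, 0.0), "charge": -1.0,
                           "sigma": 3.0, "epsilon": 0.1}]}
energies = per_element_energies(ligand, elements, dielectric=1.0)
```

Each labeled sequence element then contributes one value per energy component, and these values can serve as the descriptor columns of a statistical model.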
Nucleic acids: “Nucleic acid” or “oligonucleotide” or “polynucleotide” as used herein means at least two nucleotides covalently linked together. Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid. Thus, a nucleic acid also encompasses substantially identical nucleic acids and complements thereof. Nucleic acids may be single stranded or double stranded, or may contain portions of both double stranded and single stranded sequences. The nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. Nucleic acids may be synthesized as a single stranded molecule or expressed in a cell (in vitro or in vivo) using a synthetic gene. Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods. The nucleic acid may also be an RNA such as an mRNA, tRNA, short hairpin RNA (shRNA), short interfering RNA (siRNA), double-stranded RNA (dsRNA), post-transcriptional gene silencing RNA (ptgsRNA), Piwi-interacting RNA, pri-miRNA, pre-miRNA, micro-RNA (miRNA), or anti-miRNA, as described, e.g., in U.S. patent application Ser. Nos. 11/429,720, 11/384,049, 11/418,870, and 11/429,720 and Published International Application Nos. WO 2005/116250 and WO 2006/126040. siRNA gene-targeting may be carried out by transient siRNA transfer into cells, achieved by such classic methods as lipid-mediated transfection (such as encapsulation in liposome, complexing with cationic lipids, cholesterol, and/or condensing polymers, electroporation, or microinjection). 
siRNA gene-targeting may also be carried out by administration of siRNA conjugated with antibodies or siRNA complexed with a fusion protein comprising a cell-penetrating peptide conjugated to a double-stranded (ds) RNA-binding domain (DRBD) that binds to the siRNA (see, e.g., U.S. Patent Application Publication No. 2009/0093026). An shRNA molecule has two sequence regions that are reversely complementary to one another and can form a double strand with one another in an intramolecular manner. shRNA gene-targeting may be carried out by using a vector introduced into cells, such as viral vectors (lentiviral vectors, adenoviral vectors, or adeno-associated viral vectors for example). The design and synthesis of siRNA and shRNA molecules are known in the art, and such molecules may be commercially purchased from, e.g., Gene Link (Hawthorne, N.Y.), Invitrogen Corp. (Carlsbad, Calif.), Thermo Fisher Scientific, and Dharmacon Products (Lafayette, Colo.). The nucleic acid may also be an aptamer, an intramer, or a spiegelmer. The term “aptamer” refers to a nucleic acid or oligonucleotide molecule that binds to a specific molecular target. Aptamers are derived from an in vitro evolutionary process (e.g., SELEX (Systematic Evolution of Ligands by EXponential Enrichment), disclosed in U.S. Pat. No. 5,270,163), which selects for target-specific aptamer sequences from large combinatorial libraries. Aptamer compositions may be double-stranded or single-stranded, and may include deoxyribonucleotides, ribonucleotides, nucleotide derivatives, or other nucleotide-like molecules. The nucleotide components of an aptamer may have modified sugar groups (e.g., the 2′-OH group of a ribonucleotide may be replaced by 2′-F or 2′-NH2), which may improve a desired property, e.g., resistance to nucleases or longer lifetime in blood. Aptamers may be conjugated to other molecules, e.g., a high molecular weight carrier to slow clearance of the aptamer from the circulatory system. 
Aptamers may be specifically cross-linked to their cognate ligands, e.g., by photo-activation of a cross-linker (Brody, E. N. and L. Gold (2000) J. Biotechnol. 74:5-13). The term “intramer” refers to an aptamer which is expressed in vivo. For example, a vaccinia virus-based RNA expression system has been used to express specific RNA aptamers at high levels in the cytoplasm of leukocytes (Blind, M. et al. (1999) Proc. Natl. Acad. Sci. USA 96:3606-3610). The term “spiegelmer” refers to an aptamer which includes L-DNA, L-RNA, or other left-handed nucleotide derivatives or nucleotide-like molecules. Aptamers containing left-handed nucleotides are resistant to degradation by naturally occurring enzymes, which normally act on substrates containing right-handed nucleotides. A nucleic acid will generally contain phosphodiester bonds, although nucleic acid analogs may be included that may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones; non-ionic backbones, and non-ribose backbones, including those disclosed in U.S. Pat. Nos. 5,235,033 and 5,034,506. Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within the definition of nucleic acid. The modified nucleotide analog may be located for example at the 5′-end and/or the 3′-end of the nucleic acid molecule. Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides.
It should be noted, however, that nucleobase-modified ribonucleotides, i.e., ribonucleotides containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase, are also suitable, such as uridines or cytidines modified at the 5-position, e.g., 5-(2-amino)propyl uridine, 5-bromo uridine; adenosines and guanosines modified at the 8-position, e.g., 8-bromo guanosine; deaza nucleotides, e.g., 7-deaza-adenosine; and O- and N-alkylated nucleotides, e.g., N6-methyl adenosine. The 2′-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I. Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as disclosed in Krutzfeldt et al., Nature (Oct. 30, 2005), Soutschek et al., Nature 432:173-178 (2004), and U.S. Patent Application Publication No. 20050107325. Modified nucleotides and nucleic acids may also include locked nucleic acids (LNA), as disclosed in U.S. Patent Application Publication No. 20020115080. Additional modified nucleotides and nucleic acids are disclosed in U.S. Patent Application Publication No. 20050182005. Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip. Mixtures of naturally occurring nucleic acids and analogs may be made; alternatively, mixtures of different nucleic acid analogs may be made.
Protein/peptide/polypeptide: The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein. In the present invention, these terms mean a linked sequence of amino acids, which may be natural, synthetic, or a modification, or combination of natural and synthetic. The term includes antibodies, antibody mimetics, domain antibodies, lipocalins, targeted proteases, and polypeptide mimetics. The term also includes vaccines containing a peptide or peptide fragment intended to raise antibodies against the peptide or peptide fragment.
Proximal sequence elements: includes, but is not limited to, the component parts of a sequence of linked chemical substances. For example, the sequence elements of a nucleotide sequence are nucleic acids, such as, for example, adenine, cytosine, guanine, and thymine in DNA or uracil in RNA. For proteins, the sequence elements are amino acids, including, but not limited to, naturally occurring and synthetic amino acids. The term “proximal” in the context of sequence elements refers to those sequence elements of a target molecule that are within a given distance of a complexed ligand. In some embodiments of the present invention, the distance is a variable usually measured from the ligand-binding site on the target molecule that encompasses those residues of the target with a significant contribution to discriminate relative affinities of ligands.
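As a non-limiting sketch, proximal sequence elements can be selected by a simple distance test against the atoms of the complexed ligand. The function name, the residue labels, and the default cutoff of 6.0 angstroms are assumptions made for illustration only; the distance is described above as a variable and no particular value is prescribed.

```python
import math


def proximal_elements(ligand_coords, residues, cutoff=6.0):
    """Return labels of sequence elements near a complexed ligand.

    ligand_coords: list of (x, y, z) ligand atom positions in angstroms.
    residues: dict mapping a residue label to a list of its atom positions.
    cutoff: distance threshold in angstroms (hypothetical default).
    """
    selected = []
    for label, atom_coords in residues.items():
        # A residue is proximal if any of its atoms lies within the
        # cutoff of any ligand atom.
        if any(
            math.dist(atom, ligand_atom) <= cutoff
            for atom in atom_coords
            for ligand_atom in ligand_coords
        ):
            selected.append(label)
    return selected
```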
Specificity: refers to a binding reaction between molecules that produces activity data at least two times the background and more typically more than 10 to 100 times background molecular associations under physiological conditions. In the context of the present invention, the desired specificity may be for a particular ligand to interact favorably with one library member (sometimes referred to herein as a target molecule) relative to other molecules (sometimes referred to herein as off-target molecules) from a library of molecules containing the molecule (e.g., a single HDAC isoform out of a library of several HDAC isoforms) or for a particular ligand to interact most favorably with two or more library members (e.g., multiple mutant forms of human immunodeficiency virus-1 reverse transcriptase (HIV-1 RT)).
Small molecule: includes any relatively small chemical or other moiety that can act to affect biological processes. Small molecules can include any number of therapeutic agents presently known and used, or can be synthesized in a library of such molecules for the purpose of screening for biological function(s). Small molecules are distinguished from macromolecules by size. The small molecules of this invention usually have a molecular weight less than about 5,000 daltons (Da), preferably less than about 2,500 Da, more preferably less than about 1,000 Da, most preferably less than about 500 Da. “Organic compound” refers to any carbon-based compound other than biologics such as nucleic acids, polypeptides, and polysaccharides. In addition to carbon, organic compounds may contain calcium, chlorine, fluorine, copper, hydrogen, iron, potassium, nitrogen, oxygen, sulfur and other elements. An organic compound may be in an aromatic or aliphatic form. Non-limiting examples of organic compounds include acetones, alcohols, anilines, carbohydrates, mono-saccharides, di-saccharides, amino acids, nucleosides, nucleotides, lipids, retinoids, steroids, proteoglycans, ketones, aldehydes, saturated, unsaturated and polyunsaturated fats, oils and waxes, alkenes, esters, ethers, thiols, sulfides, cyclic compounds, heterocyclic compounds, imidazoles, and phenols. Organic compounds also include nitrated organic compounds and halogenated (e.g., chlorinated) organic compounds. Collections of small molecules, and small molecules identified according to the invention, are characterized by techniques such as accelerator mass spectrometry (AMS; see Turteltaub et al., Curr Pharm Des 2000 6:991-1007, Bioanalytical applications of accelerator mass spectrometry for pharmaceutical research; and Enjalbal et al., Mass Spectrom Rev 2000 19:139-61, Mass spectrometry in combinatorial chemistry). 
Preferred small molecules are relatively easier and less expensively manufactured, formulated or otherwise prepared. Preferred small molecules are stable under a variety of storage conditions. Preferred small molecules may be placed in tight association with macromolecules to form molecules that are biologically active and that have improved pharmaceutical properties. Improved pharmaceutical properties include changes in circulation time, distribution, metabolism, modification, excretion, secretion, elimination, and stability that are favorable to the desired biological activity. Improved pharmaceutical properties include changes in the toxicological and efficacy characteristics of the chemical entity.
Structurally related: refers to the target molecules in the library of molecules used in the methods, models, and systems of the present invention. Structurally related molecules may show some degree of similarity in sequence or three-dimensional structural homology in their respective structures. “Structural homology” refers to the degree of coincidence in space between two or more protein backbones. Protein backbones that adopt the same protein structure or fold and show similarity upon three-dimensional structural superposition in space can be considered structurally homologous. Structural homology is not based on sequence homology, but rather on three-dimensional homology. Two amino acids in two different proteins that are said to be homologous based on structural homology between those proteins do not necessarily need to be in sequence-based homologous regions. For example, protein backbones that have a root mean squared (RMS) deviation of less than 3.5, 3.0, 2.5, 2.0, 1.7 or 1.5 angstroms at a given space position or defined region between each other can be considered to be structurally homologous in that region. It is contemplated herein that substantially equivalent amino acid positions that are located on two or more different protein sequences that share a certain degree of structural homology will have comparable functional tasks. These two amino acids then can be said to have structure-based equivalence with each other, even if their precise primary linear positions on the amino acid sequences, when these sequences are aligned, do not match with each other. Amino acids that exhibit structure-based equivalence can be far away from each other in the primary protein sequences when these sequences are aligned following the rules of classical sequence homology.
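For illustration, the RMS deviation test for structural homology can be sketched as below. The sketch assumes that the two backbone regions have already been superposed in a common frame and that their atoms are listed in matching order; the superposition step itself is not shown. The function names and the default threshold of 2.0 angstroms (one of the values listed above) are hypothetical choices.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two matched coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def structurally_homologous(coords_a, coords_b, threshold=2.0):
    """True if the superposed regions deviate by less than the threshold."""
    return rmsd(coords_a, coords_b) < threshold
```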
The present invention provides methods, models, and systems for selecting an effector having a desired specificity for a target molecule. The methods, models, and systems of the present invention, sometimes arbitrarily referred to herein as the DISCRIMINATE method, model, or system, or merely DISCRIMINATE, are computer-implemented approaches to utilizing the abundance of available data from diverse sources of structure-activity studies to select existing molecules or design new molecules optimized for a desired effect. Drug discovery efforts are greatly enhanced by the inclusion of computer-based, predictive methods due to the practically infinite number of compounds theoretically available for testing. Moreover, determining the various effects of a compound of interest is a rigorous, time-consuming, labor-intensive, and expensive process. Hence, there is a continuing need for improved computational methods used in the development of accurate, predictive models for drug discovery applications.
For clarity of discussion, molecules for which an effector is sought will be referred to as “targets” or “target molecules” whereas those other molecule library members for which an effector is not sought will be referred to as “off-targets” or “off-target molecules.” In some embodiments of the present invention, effectors will be selected for exhibiting specificity for a target or a set of targets that exceeds the specificity for an off-target or a set of off-targets.
The methods, models, and systems of the present invention can be applied to practically any problem in which ligand activity specific for a target or a subset of targets is desired. For example, targets may include, but are not limited to, peptides, nucleic acids, carbohydrates, lipids, and combinations thereof. In some embodiments of the present invention, the peptides are, for example, receptors, enzymes, and ribosomal peptides. Receptors may include G-protein-coupled receptors, for example. Enzymes may include, but are not limited to, proteolytic enzymes, such as, for example, HIV protease, kinases, such as, for example, tyrosine kinases, HIV reverse transcriptase, and enzymes that catalyze epigenetic modifications, such as, for example methyl transferases (methylases), demethylases, acetyl transferases (acetylases), and deacetylases. Enzymes that catalyze epigenetic modifications can act on multiple types of substrates, including, for example, nucleic acid, such as DNA, and peptides, such as histones. In some embodiments of the present invention, the acetyl transferases are lysine acetyl transferases (KATs). In some embodiments of the present invention, the deacetylases are zinc-based lysine deacetylases (KDACs). Zinc-based lysine deacetylases include, but are not limited to, histone deacetylases (HDACs). In some embodiments of the present invention, the deacetylases are NAD-based lysine deacetylases. In additional embodiments of the present invention, ribosomal peptides include any peptide that comprises a ribosome. In some embodiments of the present invention, the nucleic acids are ribonucleic acids, such as, for example, ribozymes, siRNAs, and shRNAs. In additional embodiments of the present invention, the nucleic acids are deoxyribonucleic acids. The deoxyribonucleic acids of the present invention may comprise protein binding sites, such as, for example, promoters, transcription factor binding sites, and enhancer binding sites.
The effectors of the present invention may produce, for example, a measurable change in activity for the target molecules of the present invention. In some embodiments of the present invention, the effectors are inhibitors of the target molecule. In some embodiments of the present invention, the effectors are activators of the target molecule. In some embodiments of the present invention, the effectors may produce no measurable change in the activity of the target molecule. It is to be understood that effectors of the present invention are selected based on predictive models produced by the methods and systems of the present invention. Effectors predicted to, for example, inhibit or activate a target molecule, may prove not to exhibit the predicted effect when tested experimentally. Thus, it is to be understood that effectors of the present invention need not produce the predicted effect in the target molecule. However, these experimental determinations are still useful in generating a new iterative model with improved predictive power.
In some embodiments of the present invention, the effector is selected to have a specificity for a target molecule. In some embodiments of the invention, an effector's specificity for a target molecule may produce a change in activity of the target molecule (compared to an untreated target molecule or control treated target molecule) that is at least 2 to 100 times the change measured in off-targets (compared to untreated or control off-targets). For example, an effector's specificity for a target molecule may produce a change in activity of the target molecule that is at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, or 90 times the change measured in off-targets. In some embodiments of the present invention, one may wish to select an effector having lesser specificity, such as, for example, an effector that produces a change in the activity of the target molecule that is equal to or less than 1.01 to 10 times the change measured in off-targets. In this example, the effector's specificity for a target molecule may produce a change in activity of the target molecule that is equal to or less than 1.02, 1.03, 1.04, 1.05, 1.1, 1.2, 1.3, 1.4, 1.5, 1.75, 2, 3, 4, 5, 6, 7, 8, or 9 times the change measured in off-targets. This type of approach may be useful in designing a drug that would be insensitive to potential mutations in its target. An ideal target for such a drug may be, for example, HIV-1 RT, discussed in greater detail below.
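The fold-change comparison described above can be sketched as follows, by way of example only. The sketch assumes that specificity is judged against the largest change observed in any off-target, which is one possible convention and is not prescribed by the present disclosure. The function names and the default minimum fold are hypothetical.

```python
def fold_selectivity(target_change, off_target_changes):
    """Ratio of the target activity change to the largest off-target change."""
    largest_off_target = max(abs(change) for change in off_target_changes)
    if largest_off_target == 0:
        return float("inf")
    return abs(target_change) / largest_off_target


def meets_specificity(target_change, off_target_changes, min_fold=2.0):
    """True if the effector is at least min_fold selective for the target."""
    return fold_selectivity(target_change, off_target_changes) >= min_fold
```

For an effector intended to be insensitive to mutations in its target, the same ratio could instead be compared against an upper bound, for example requiring a value of at most 1.5 across a set of mutant forms.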
Other approaches exist for the prediction of drug binding affinities, most notably, comparative binding energy analysis (COMBINE) (Ortiz, A., et al., 1995, Ortiz, A., et al., 1997, Perez, C., et al., 1998, Lozano, J. J., et al., 2000, Murcia, M. et al., 2006, Henrich, S. et al., 2009). The present invention improves on these approaches in several substantive ways. First, the models, methods and systems of the present invention comprise an iterative method that improves its predictive ability by the inclusion of experimental data gathered from experimentally testing the effect of a selected effector on the target molecule and off-targets. For example, experimental data can be generated, both from target molecules and off-targets, after experimentally evaluating the activity of a compound predicted by the models, methods and systems of the present invention to have a desired specificity. Additionally, newly published data as well as data from profiling of known compounds against both targets and off-targets can also be used in iterative refinements of the methods, models, and systems of the present invention as such data become available. Other approaches to building predictive binding models are not iterative in nature and, as such, said models cannot be further improved by the addition of new data.
The iterative nature of the models, methods and systems of the present invention provides a user with a greater degree of flexibility when choosing ligand-target molecule and ligand-off-target molecule pairs because activity data for each and every possible permutation of ligands with the targets and off-targets is not required. The models, methods and systems of the present invention can generate predictive models based on any initial database size, regardless of the absence of data for any given ligand-target or ligand-off-target molecule combination, which can then be used to select and experimentally determine the activity of a ligand predicted to have a desired specificity for the target(s). Once obtained, this activity data may be added to the database, effectively improving the predictability of the models, methods and systems of the present invention in subsequent iterations. In one embodiment, for example, the method is repeated at least twice for two selected ligands. By way of further example, in one embodiment, the method is repeated at least three times for at least three different selected ligands. By way of further example, in one embodiment, the method is repeated at least five times for at least five different selected ligands.
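The iterative cycle described above can be sketched, purely as an illustration, as a loop that refits a model, selects the candidate ligand predicted to be most specific, records its measured activities, and repeats. The callables `fit_model` and `measure` are hypothetical placeholders for the model-building step and the experimental assay, respectively. The dictionary layout is likewise an assumption.

```python
def iterate_selection(database, candidates, fit_model, measure, rounds=2):
    """Run the select-test-update cycle and return the ligands tested.

    database: dict mapping (ligand, molecule) to activity; updated in place.
    candidates: ligands not yet tested experimentally.
    fit_model: callable taking the database and returning a function that
        maps a ligand to its predicted specificity (hypothetical).
    measure: callable mapping a ligand to {molecule: measured activity}
        (hypothetical stand-in for the experimental determination).
    """
    tested = []
    remaining = list(candidates)
    for _ in range(rounds):
        if not remaining:
            break
        predicted_specificity = fit_model(database)
        best = max(remaining, key=predicted_specificity)
        remaining.remove(best)
        # New measurements enlarge the database used in the next round,
        # whether or not they confirm the prediction.
        for molecule, activity in measure(best).items():
            database[(best, molecule)] = activity
        tested.append(best)
    return tested
```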
Furthermore, the models, methods, and systems of the present invention improve on a number of other deficiencies inherent to previous methods that are understood by one of skill in the art to introduce noise to the parameters calculated for generation of predictive 3D-QSAR models. Examples of such deficiencies include, but are not limited to, inadequate sampling of alternative ligand-binding poses when computationally determining a likely spatial orientation of a ligand-target molecule or ligand-off-target molecule pair, inaccuracies in scoring functions during docking, and limitations of force fields regarding electrostatics (e.g., monopole force fields lacking polarizability). The models, methods, and systems of the present invention address these limitations by implementing systematic search approaches in docking (SKATE) and atomic multipole optimized energetics for biomolecular applications (AMOEBA) force fields instead of the more primitive monopole force field methods used previously. Additionally, numerous heuristic approaches to generating 3D-QSARs are compatible with the models, methods, and systems of the present invention, including, but not limited to, partial least squares of latent variables (PLS) (reviewed in Haenlein, M, et al., 2004, which is incorporated herein by reference), neural networks (reviewed in Cheng, B., et al., 1994 and Khosravi, A., et al., 2011, which are incorporated herein by reference), and support vector machines (reviewed in Naul, B, 2009, which is incorporated herein by reference). The methodology chosen to generate the heuristic 3D-QSAR models in the methods and systems of the present invention can be varied to optimize the predictability of the models generated depending on the size and quality of the datasets. In the examples given below, PLS is the methodology used.
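As a minimal sketch of how PLS relates interaction-energy descriptors to activity, the following fits a PLS model with a single latent variable and a single response. It is a simplified illustration only: a practical model would typically use several latent variables, cross-validation, and an established statistics package. The function names and the data layout (one row per ligand-molecule pair, one column per energy term) are assumptions.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit a one-latent-variable PLS regression of y on the columns of X."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [value - y_mean for value in y]
    # Weight vector: direction in descriptor space of maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(value * value for value in w))
    if norm == 0:
        raise ValueError("descriptors are uncorrelated with the activity data")
    w = [value / norm for value in w]
    # Latent variable scores and the regression of y on those scores.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(value * value for value in t)
    coefficients = [q * value for value in w]
    return x_mean, y_mean, coefficients


def predict(model, x):
    """Predict the activity for one descriptor vector."""
    x_mean, y_mean, coefficients = model
    return y_mean + sum(c * (xj - mj) for c, xj, mj in zip(coefficients, x, x_mean))
```

The fitted coefficients indicate which energy terms contribute most to the predicted activity, which is the information used to suggest ligand modifications.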
In some embodiments of the present invention, a database is compiled. In the context of the present invention, the database may include, for example, a list of ligand-target and ligand-off-target pairs along with a number of other types of associated data, including, but not limited to, three-dimensional structural data for the targets and off-targets (i.e., members of the library of molecules), structural data for the ligands, and activity data relating the effect of a particular ligand on a molecule (target or off-target) it is in complex with. It is to be understood, as discussed above, that the database need not be complete, meaning, for example, that for a given list of ligand-target and ligand-off-target pairs, activity data for each pair is not required for the methods and systems of the invention to function. Activity data may be determined in a later iteration of the methods of the present invention and subsequently added to the database or additional ligand-target and ligand-off-target pairs may be added to the database as activity data for said pairs becomes available.
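One possible in-memory layout for such a database is sketched below for illustration. The class name, field names, and the use of file paths and structure strings as values are assumptions. The point of the sketch is that activity data is stored sparsely, so that pairs lacking activity data are simply absent and can be listed as candidates for later measurement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Database:
    # Molecule identifier -> location of 3D structural data (hypothetical).
    molecule_structures: Dict[str, str] = field(default_factory=dict)
    # Ligand identifier -> 2D or 3D structural data (hypothetical).
    ligand_structures: Dict[str, str] = field(default_factory=dict)
    # (ligand, molecule) -> activity value; missing pairs are allowed.
    activities: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_activity(self, ligand: str, molecule: str, value: float) -> None:
        """Record activity data for one ligand-molecule pair."""
        if ligand not in self.ligand_structures:
            raise KeyError(f"unknown ligand: {ligand}")
        if molecule not in self.molecule_structures:
            raise KeyError(f"unknown molecule: {molecule}")
        self.activities[(ligand, molecule)] = value

    def training_pairs(self) -> List[Tuple[str, str]]:
        """Pairs for which activity data is available."""
        return sorted(self.activities)

    def missing_pairs(self) -> List[Tuple[str, str]]:
        """Pairs with no activity data yet."""
        return [
            (ligand, molecule)
            for ligand in self.ligand_structures
            for molecule in self.molecule_structures
            if (ligand, molecule) not in self.activities
        ]
```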
In some embodiments of the present invention, the three-dimensional structural data can be gathered from a number of broadly defined sources including, but not limited to, experimentally determined three-dimensional structural data and computationally determined three-dimensional structural data. Experimentally determined three-dimensional structural data is produced as the result of a number of techniques, including, but not limited to, X-ray crystallography (reviewed in Stryer, L., 1968, Matthews, B. W., 1976, and Russo Krauss, I., et al., 2013, each of which is incorporated herein by reference), nuclear magnetic resonance spectroscopy (reviewed in Allerhand, A., et al., 1970, Dyson, H. J., et al., 1996, and Otting, G., et al., 2010, each of which is incorporated herein by reference), and cryo-electron microscopy (reviewed in van Heel, et al., 2000, Frank, J., 2002, Milne, J. L, et al., 2012, each of which is incorporated herein by reference). All of these techniques yield some representation, of varying resolution, of the three-dimensional structure of a protein/nucleic acid or protein/nucleic acid-ligand complex. Computationally determined three-dimensional structural data can be generated using a number of techniques including, but not limited to, homology modeling and protein threading. Homology modeling is discussed in Krieger, E., et al., 2003, which is incorporated herein by reference. Protein threading is discussed in Xu, J., et al., 2008, which is incorporated herein by reference. Additionally, the prediction of lower resolution 3D structures is becoming increasingly feasible, and such structures are also contemplated for use in the present invention.
In some embodiments of the present invention, the library of molecules includes two or more molecules that may exhibit disparate activity data when exposed to various ligands. In some embodiments, the library of molecules includes targets and off-targets. In some embodiments of the present invention, the library of molecules includes three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more molecules. It is to be understood that the present invention has no upward limit on the number of molecules that the library of molecules may comprise. Additionally, in some embodiments of the present invention, the library of molecules constitutes, for example, a set of similar related molecules for which one would like to determine specific effectors for each or a subset of the molecules. Similar molecules include, but are not limited to, homologous molecules, isoforms, structurally related molecules, and mutant molecules. For example, a library of molecules may constitute molecules of high sequence or structural identity for which a ligand of particular specificity is required. In this example, one may wish to decipher the individual roles of a collection of various protein isoforms when suitable isoform-specific inhibitors may not yet exist. Such is the case with HDACIs. Selective HDACIs, which would affect either a single HDAC isoform or only a few isoforms within a single class, would be ideal molecular scalpels to help elucidate the individual functions of each HDAC isoform in the complexity of epigenetics. In some embodiments of the present invention, the library of molecules may constitute, for example, a target molecule and other molecules bearing little to no structural (i.e. are not structurally related) or functional relationship with the target molecule. 
In these embodiments, likely spatial orientations of ligands in targets can be determined before establishing equivalence of residues on targets and off-targets. Equivalence, in this example, may be established by using the docked ligand as the frame of reference. In this example, “equivalent” residues will be those residues in each complex that interact with the docked ligand. This type of approach may be used, for example, if one wishes to enhance specificity of a ligand for the target molecule versus a completely different class of molecule to, for example, eliminate off-target side effects.
In some embodiments of the present invention, the chemical sequences of the targets and off-targets are known. In some embodiments of the present invention, the chemical sequences comprise sequence elements. For example, in the case of DNA or RNA molecules, the sequence elements comprise nucleotides. In another example, the chemical sequences of peptides comprise amino acids. In another example, the chemical sequences of carbohydrates comprise sugars.
In some embodiments of the present invention, the population of ligands includes two or more ligands that, when in complex with individual members of the library of molecules, may produce a measurable change in activity of the library molecules (compared to an uncomplexed library molecule control, for example). In some embodiments of the present invention, the population of ligands includes three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more ligands. It is to be understood that the present invention has no upward limit on the number of ligands that the population of ligands may comprise. In some embodiments, the population of ligands can include, but is not limited to, small molecules, lipids, steroids, peptides, biogenic amines, carbohydrates, nucleic acids, such as, for example, small interfering RNAs (siRNAs), short hairpin RNAs (shRNAs), and DNA aptamers, and proteins, such as, for example, transcription factors and antibodies.
In some embodiments of the present invention, structural data for the population of ligands may include, for example, three-dimensional structural data as discussed above (for proteins, nucleic acids, and carbohydrates). For small molecules, two-dimensional chemical structures are sufficient for the methods and systems of the present invention to function, but require additional preparation to generate 3D conformer libraries.
In certain embodiments of the present invention, activity data includes, but is not limited to, measurements of Ka, pKa, Ki, pKi, IC50, pIC50, free energy, entropy, and enthalpy of ligand-target and ligand-off-target complex formation, log P, and the number of hydrogen bond donors/acceptors of each member in a given complex.
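Because activity data from different sources may be reported in different forms, it is often convenient to place the values on a common logarithmic or free-energy scale before modeling. The following sketch shows two standard conversions; the function names and the default temperature are illustrative assumptions, and treating an inhibition constant as a dissociation constant is an approximation.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def binding_free_energy(k_molar, temperature=298.15):
    """Approximate binding free energy in kcal/mol from a dissociation
    or inhibition constant in molar, using delta G = R*T*ln(K)."""
    if k_molar <= 0:
        raise ValueError("constant must be positive")
    return GAS_CONSTANT * temperature * math.log(k_molar)
```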
In some embodiments of the present invention, structure-based equivalence data is gathered by aligning sequence elements based on their functional roles. For example, in the context of peptides, amino acid sequences are typically aligned based on sequence homology to determine which amino acids can be considered crucial to the respective functions of the molecules. In theory, amino acids conserved over multiple peptides may play some important evolutionary role or be critical for some shared function of the peptides. However, because certain amino acids have redundant functionality with each other, some peptides may share some functionality while exhibiting lower levels of sequence homology. In this situation, experimental or computational methods can be used to align sequence elements based on their function rather than sequence identity. Such experimental methods include, but are not limited to, X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy and such computational methods include, for example, homology modeling. Homology modeling may be performed by programs such as Modeller. An example of how one may establish structure-based equivalence may include two amino acid sequences sharing low levels of homology that, from the experimental or computational methods discussed above, are both predicted to form an alpha helix in a particular region of the protein. These sequences would thus be functionally aligned and be structurally equivalent, which may or may not result in a different amino acid numbering system than that brought about from a simple amino acid sequence alignment. In some embodiments of the present invention, labeling the sequence elements of the targets and off-targets may be performed to reflect the structural and functional equivalence of their respective sequence elements during molecular recognition of the ligand. 
In some embodiments of the present invention, establishing structure-based equivalence of residues on different targets would identify residues that are, for example, within 2 angstroms root mean square deviation (rmsd).
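A simplified sketch of labeling equivalent residues on two molecules is given below. It assumes that the two structures have already been placed in a common frame of reference (for example, by superposition or by using a docked ligand as the frame), and it uses the distance between one representative atom per residue as a stand-in for a per-residue deviation. The greedy one-to-one pairing, the function name, and the residue labels are hypothetical.

```python
import math


def label_equivalent_residues(atoms_a, atoms_b, max_distance=2.0):
    """Pair residues of two molecules that coincide in a common frame.

    atoms_a, atoms_b: dict mapping a residue label (string) to the
        (x, y, z) position of a representative atom, in angstroms.
    Returns a list of (label_a, label_b) pairs; each residue is used once.
    """
    candidates = sorted(
        (math.dist(xyz_a, xyz_b), label_a, label_b)
        for label_a, xyz_a in atoms_a.items()
        for label_b, xyz_b in atoms_b.items()
        if math.dist(xyz_a, xyz_b) <= max_distance
    )
    used_a, used_b, pairs = set(), set(), []
    # Accept the closest pairs first so that each residue is matched
    # to its nearest available counterpart.
    for _, label_a, label_b in candidates:
        if label_a not in used_a and label_b not in used_b:
            pairs.append((label_a, label_b))
            used_a.add(label_a)
            used_b.add(label_b)
    return pairs
```

Each resulting pair can then be assigned a shared label so that the corresponding interaction energies occupy the same descriptor column for both molecules.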
In some embodiments of the present invention, the likely spatial orientations of the ligand population members in the ligand-target and ligand-off-target pairs may be determined experimentally or computationally. X-crystallography experiments, for example, may yield three-dimensional structural data for targets and off-targets in complex with various ligand population members. The experimentally determined spatial orientation of the ligand in, for example, an enzyme active site, is typically an accurate representation of a ligand's native spatial orientation when in complex with the enzyme. Other methods for experimentally determining the likely spatial orientations of the ligands in the ligand-target or ligand-off-target pairs include, but are not limited to, NMR spectroscopy and cryo-electron microscopy. In some embodiments of the invention, molecular docking simulations can be used to computationally determine a likely spatial orientation. However, due to inaccuracies in computational docking or in the experimental determination of the bound conformation of a ligand in complex with a target or off-target, refinement by energy minimization can improve the geometry of the complex. For example, molecular interactions can be quantified by atomic-based force fields. Assuming that the force field chosen is sufficiently accurate, then the minimal energy complex of the ligand-target or ligand-off-target pairs generally is the correct, most likely, spatial orientation.
Computationally derived likely spatial orientations are typically determined using molecular docking software. Generally, molecular docking software can determine the preferred binding orientation (or “pose”) of a ligand when in complex with a molecule such as, for example, a peptide. Suitable molecular docking software includes, but is not limited to, AutoDock (http://autodock.scripps.edu), PatchDock (http://bioinfo3d.cs.tau.ac.il/PatchDock), ClusPro (http://cluspro.bu.edu, http://nrc.bu.edu/cluster), DockingServer (http://www.dockingserver.com), DOCK (http://dock.compbio.ucsf.edu), 3DLigandSite (http://www.sbabio.ic.ac.uk/~3dligandsite), ATOME (http://atome.cbs.cnrs.fr/AT2/meta.html), AutoDock Vina (http://vina.scripps.edu), BSP-SLIM (http://zhanglab.ccmb.med.umich.edu/BSP-SLIM), FiberDock (http://bioinfo3d.cs.tau.ac.il/FiberDock), GEMDOCK (http://gemdock.life.nctu.edu.tw/dock), Hex (http://hex.loria.fr), idTarget (http://idtarget.rcas.sinica.edu.tw), iGEMDOCK (http://gemdock.life.nctu.edu.tw/dock/igemdock.php), iScreen (http://iscreen.cmu.edu.tw), ParDOCK (http://www.scfbio-iitd.res.in/dock/pardock.isp), Quantum.Ligand.Dock (http://87.116.85.141/LigandDock.html), Surflex-Dock (http://www.tripos.com/index.php?family=modules,SimplePage . . . 
&page=Surflex_Dock), ADAM (http://www.immd.co.jp/en/product_2.html), ADDock (http://www.biodelight.com.tw/English/addock_index.html), AuPosSOM (https://www.biomedicale.univ-paris5.fr/aupossom), BetaDock (http://voronoi.hanyang.ac.kr/software.htm), DOCK Blaster (http://blaster.docking.org), DockIt (http://www.metaphorics.com/products/dockit.html), DockVision (http://dockvision.com), eHiTS (http://www.simbiosys.ca/ehits), FITTED (http://fitted.ca/index.php?option=com_content&task=view&id=50&Itemid=40), Fleksy (http://www.cmbisu.nl/software/fleksy), FlexX (http://www.biosolveit.de/flexx), FLIPDock (http://flipdock.scripps.edu/what-is-flipdock), FRED (http://www.eyesopen.com/docs/oedocking/current/html/fred.html), GlamDock (http://www.chil2.de/Glamdock.html), GOLD (http://www.ccdc.cam.ac.uk/products/life_sciences/gold), GPCRautomodel (http://genome.jouy.inra.fr/GPCRautomdl/cgi-bin/welcome.pl), GRAMM-X (http://vakser.bioinformatics.ku.edu/resources/gramm/grammx), HADDOCK (http://www.nmr.chem.uu.nl/haddock), HomDock (http://www.chil2.de/HomDock.html), HYBRID (http://www.eyesopen.com/docs/oedocking/current/html/hybrid.html#hybrid), ICM-Docking (http://www.molsoft.com/docking.html), kinDOCK (http://abcis.cbs.cnrs.fr/LIGBASE_SERV_WEB/PHP/kindock.php), Lead Finder (http://www.moltech.ru), Magnet (http://www.metaphorics.com/products/magnet), MEDock (http://medock.csie.ntu.edu.tw), MVD (http://www.molegro.com/mvd-product.php), ParaDocks (http://www.paradocks.org), PLANTS (http://www.tcd.uni-konstanz.de/research/plants.php), POSIT (http://www.eyesopen.com/docs/posit/current/html/theory.html), Rosetta FlexPepDock (http://flexpepdock.furmanlab.cs.huji.ac.il/index.php), RosettaLigand (http://www.rosettacommons.org/software), SwissDock (http://swissdock.vital-it.ch), SymmDock (http://bioinfo3d.cs.tau.ac.il/SymmDock), TarFisDock (http://www.dddc.ac.cn/tarfisdock), VEGA ZZ (http://www.vepazz.net), VLifeDock (http://www.vlifesciences.com/products/VLifeMDS/VLifeDock.php). 
(Sravanthi Davuluri and Akhilesh Bajpai (Correspondence: Acharya K K, kshitish@ibab.ac.in), A list of resources for molecular docking; In: Startbioinfo; 23 Oct. 2012, http://www.shodhaka.com/cgi-bin/startbioinfo/prelimresources.pl?tn=Molecular docking), and SKATE.
In some embodiments, the interaction energies used by the methods and systems of the present invention are calculated computationally. A number of different programs can be used in this regard, including, for example, AutoGrid. AutoGrid is a program that pre-calculates energies for various atom types, such as aliphatic carbons, aromatic carbons, hydrogen bonding oxygens, and so on, with macromolecules such as, for example, peptides and nucleic acids. Total interaction energies of ligands in complex with targets or off-targets tend to show little correlation with associated activity data; however, when component interaction energies (e.g., interaction energies due to electrostatic, van der Waals, and desolvation interactions) are calculated for each proximal sequence element, higher levels of correlation may be observed. In some embodiments of the present invention, when using, for example, partial least squares (PLS) for statistical analysis, an r² value of 0.6 is considered substantially significant, though higher levels of correlation, such as, for example, r² values of 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.0, and all ranges in between are possible and within the scope of the present disclosure. Component interaction energies are generally calculated using force fields that include parameters for various atomic species in a number of appropriate submolecular environments (e.g., functional groups). Force fields that are applicable to the methods of the present invention include, but are not limited to, MARTINI, VAMM, ReaxFF, EVB, RWFF, COSMOS-NMR, GEM, NEMO, ORIENT, AMOEBA, SIBFA, CHARMM, AMBER, CPE, PFF, PIPF, DRF90, CFF/ind, ENZYMIX, X-Pol, QVBMM, MM2, MM3, MM4, MMFF, CFF, UFF, QCFF/PI, ECEPP/2, OPLS, GROMOS, GROMACS, and CVFF.
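By way of illustration only, the decomposition of an interaction energy into per-element components may be sketched as follows. The sketch uses a generic Coulomb term and a 12-6 Lennard-Jones term with Lorentz-Berthelot combining rules; it is not the AutoGrid implementation, it omits the desolvation component, and the atom parameters, the constant dielectric of 4.0, and the element label are hypothetical values chosen for the example.

```python
import math

COULOMB_K = 332.0636  # kcal*angstrom/(mol*e^2), converts q1*q2/r to kcal/mol

def pair_energies(atom_i, atom_j, dielectric=4.0):
    """Return (electrostatic, van der Waals) energy for one atom pair."""
    r = math.dist(atom_i["xyz"], atom_j["xyz"])
    elec = COULOMB_K * atom_i["q"] * atom_j["q"] / (dielectric * r)
    # Lorentz-Berthelot combining rules for the Lennard-Jones parameters.
    eps = math.sqrt(atom_i["eps"] * atom_j["eps"])
    sigma = 0.5 * (atom_i["sigma"] + atom_j["sigma"])
    sr6 = (sigma / r) ** 6
    vdw = 4.0 * eps * (sr6 * sr6 - sr6)
    return elec, vdw

def residue_components(ligand_atoms, residues):
    """Sum component energies between the ligand and each labeled element."""
    table = {}
    for label, atoms in residues.items():
        elec = vdw = 0.0
        for a in ligand_atoms:
            for b in atoms:
                e, v = pair_energies(a, b)
                elec += e
                vdw += v
        table[label] = {"elec": elec, "vdw": vdw}
    return table

# Hypothetical one-atom ligand and one-atom sequence element "E1".
ligand = [{"xyz": (0.0, 0.0, 0.0), "q": -0.5, "sigma": 3.0, "eps": 0.1}]
residues = {"E1": [{"xyz": (4.0, 0.0, 0.0), "q": 0.5, "sigma": 3.0, "eps": 0.1}]}

components = residue_components(ligand, residues)
print(components)
```

Each entry of the resulting table can serve as one descriptor column (for example, "E1:elec" or "E1:vdw") in the statistical analysis.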
In some embodiments of the present invention, proximal sequence elements are determined computationally. Typically, the distance of a sequence element from a complexed ligand is treated as a variable, usually measured from the ligand-binding site on the target or off-target, and is chosen so that the proximal sequence elements encompass those residues that contribute significantly to discriminating the relative affinities of ligands.
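One simple way to realize such a distance criterion is sketched below: a sequence element is treated as proximal if any of its atoms lies within a cutoff of any ligand atom. The 6 angstrom cutoff, the element labels, and the coordinates are assumptions made for the example; in practice the cutoff would be tuned as described above.

```python
import math

def proximal_elements(ligand_xyz, residue_xyz, cutoff=6.0):
    """Return (label, minimum distance) for elements within `cutoff`
    angstroms of the ligand, sorted from nearest to farthest."""
    proximal = []
    for label, atoms in residue_xyz.items():
        dmin = min(math.dist(a, l) for a in atoms for l in ligand_xyz)
        if dmin <= cutoff:
            proximal.append((label, dmin))
    return sorted(proximal, key=lambda item: item[1])

# Hypothetical coordinates (angstroms) for a two-atom ligand and three elements.
ligand_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residue_xyz = {
    "E1": [(4.0, 0.0, 0.0)],
    "E2": [(0.0, 9.0, 0.0)],
    "E3": [(1.5, 3.0, 0.0), (1.5, 8.0, 0.0)],
}

near = proximal_elements(ligand_xyz, residue_xyz)
print(near)
```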
In some embodiments of the present invention, the statistical models generated by the methods and systems of the present invention are products of heuristic-based multivariate analysis, for example, PLS, neural networks, and support vector machines.
In some embodiments, the statistical models produced by the methods and systems of the present invention may be predictive of those sequence elements of the targets and off-targets most likely to contribute to any differences that exist in the activity data. As discussed above, an r² value of 0.6 is typically considered substantially significant, though higher levels of correlation, such as, for example, r² values of 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.0, and all ranges in between are possible and within the scope of the present disclosure. In some embodiments of the present invention, those ligand-target and ligand-off-target pairs listed in the database may show variability in activity data between them. Then, for example, the predictive methods, models and systems of the present invention may suggest, on a residue-by-residue basis, whether a functionally aligned sequence element is more or less likely to contribute to the variability seen in the activity data.
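The relationship between per-element component energies and activity can be illustrated with a deliberately small partial least squares sketch. The fragment below fits a single-latent-variable PLS1 model in plain Python and reports r² on the training data; the descriptor values and activities are invented for the example, and a practical analysis would typically use more components, cross-validation, and an established statistics package.

```python
import math

def pls1_one_component(X, y):
    """Fit a one-latent-variable PLS1 model; return (coef, x_mean, y_mean)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores, then regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    coef = [q * v for v in w]  # coefficients in the original descriptor space
    return coef, x_mean, y_mean

def predict(row, coef, x_mean, y_mean):
    return y_mean + sum(c * (x - m) for c, x, m in zip(coef, row, x_mean))

def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical descriptors: columns are "E1:elec" and "E2:vdw" (kcal/mol);
# y holds hypothetical activities (e.g., pIC50) for four ligand-molecule pairs.
X = [[-1.0, 0.2], [-2.0, 0.1], [-3.0, 0.3], [-4.0, 0.2]]
y = [5.0, 6.0, 7.0, 8.0]

coef, x_mean, y_mean = pls1_one_component(X, y)
y_hat = [predict(row, coef, x_mean, y_mean) for row in X]
r2 = r_squared(y, y_hat)
print(coef, r2)
```

In this toy data set the coefficient for the first descriptor dominates, which is the kind of signal that would flag the corresponding sequence element as a likely contributor to differences in activity.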
Thus, in accordance with some embodiments, one of skill in the art would be enabled to select or rationally design an effector molecule that would be predicted, by the methods, models, and systems of the present invention, to have a desired specificity for a target molecule. As discussed above, in some embodiments, the desired specificity may be that seen for a highly specific ligand or it may be that seen for a non-specific ligand (i.e., one with substantially equal specificity for multiple targets). In the former example, one may select or design a ligand that would maximize interactions with those sequence elements predicted to be associated with the desired (i.e., high) level of activity in the target molecule(s) and/or the desired (i.e., low) level of activity in the off-target molecules. Likewise, interactions associated with, for example, low activity in the target molecule and high activity in the off-targets would be minimized. Thus, in some embodiments, an effector would be selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that exceeds the specificity of the effector for off-target molecules. In the latter example, one may select or design a ligand that would maximize interactions with those sequence elements predicted to not be associated with significant differences in activity data and/or minimize interactions with those sequence elements predicted to be associated with significant differences in activity data. In some embodiments of the present invention, this type of approach may result in effectors selected or designed to have specificity for multiple target molecules. Thus, in some embodiments, an effector would be selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that does not exceed the specificity of the effector for off-targets.
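The two selection goals described above can be sketched as simple ranking rules over model predictions. In the fragment below, the candidate names, the HDAC isoform labels used as molecule library members, the predicted activity values, and the two scoring rules (margin over the best off-target, and spread across all molecules) are assumptions made for illustration; the present disclosure does not prescribe a particular scoring function.

```python
def selectivity_margin(predicted, target):
    """Predicted activity at the target minus the highest off-target activity."""
    off_targets = [v for name, v in predicted.items() if name != target]
    return predicted[target] - max(off_targets)

def pick_selective(candidates, target):
    """Candidate with the largest predicted margin for the target."""
    return max(candidates, key=lambda c: selectivity_margin(candidates[c], target))

def pick_nonselective(candidates):
    """Candidate with the most uniform predicted activity across molecules."""
    def spread(c):
        values = candidates[c].values()
        return max(values) - min(values)
    return min(candidates, key=spread)

# Hypothetical model predictions (higher value = more active).
candidates = {
    "lig_A": {"HDAC1": 8.1, "HDAC2": 7.9, "HDAC6": 6.0},
    "lig_B": {"HDAC1": 7.5, "HDAC2": 5.8, "HDAC6": 5.5},
    "lig_C": {"HDAC1": 6.9, "HDAC2": 6.8, "HDAC6": 6.7},
}

selective = pick_selective(candidates, "HDAC1")
pan_active = pick_nonselective(candidates)
print(selective, pan_active)
```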
In some embodiments, the methods and systems of the present invention may involve experimentally determining the activity data associated with the selected effector in complex with targets and off-targets. Experimental protocols for determining various forms of activity data are extensive and include, but are not limited to, in vitro binding assays executed by any of a number of techniques (including, but not limited to, enzyme inhibition, isothermal titration calorimetry, fluorescence polarization, and radioisotope-labeled binding), in vitro cell-based assays, isolated tissue bioassays (e.g., electrophysiological assays and tissue contractility assays), and whole animal measurements (e.g., blood pressure, respiration, heart rate, metabolism, behavioral measurements, and nociceptive measurements).
In some embodiments, the methods and systems of the present invention may be used iteratively. Experimentally determined activity data from the selected effector in complex with targets and off-targets may be incorporated into the database and the steps of the method repeated. It is not essential that the step of establishing structure-based equivalence of the sequence elements be repeated unless new targets or off-targets (i.e., ones not in the database in the previous iteration) are added to the database in subsequent iterations of the methods. In the event that new targets or off-targets are added to the database, structure-based equivalence may need to be reestablished. Theoretically, with each iteration of the methods of the present invention, the predictive power of the models of the present invention may improve. Thus, the iterative nature of the invention may allow for higher quality predictions as the database becomes larger (i.e., with the addition of new targets and off-targets) and more complete (i.e., with fewer gaps in the activity data for various complexes). In some embodiments of the present invention, new targets/off-targets and new ligands may be added to the database in subsequent iterations, along with any corresponding activity data. In some embodiments of the present invention, the iterative nature of the methods allows for the use of incomplete databases. For example, if one were attempting to determine a specific inhibitor of HDAC-1 over other HDACs, the database would not need to initially include data for each population ligand in complex with each HDAC. With each iteration of the methods of the present invention, blanks in the ligand-target and ligand-off-target database may be filled in. As previously noted, in one embodiment, the method of the present invention comprises at least two, at least three, at least five, at least ten or even more iterations.
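The bookkeeping implied by an incomplete, iteratively growing database may be sketched as follows. The class name, method names, ligand and molecule identifiers, and activity values are hypothetical; the sketch only tracks which ligand-molecule pairs still lack activity data and signals when a newly added molecule means structure-based equivalence should be re-established.

```python
class ActivityDatabase:
    """Sparse store of activity data keyed by (ligand, molecule) pairs."""

    def __init__(self, molecules, ligands):
        self.molecules = set(molecules)
        self.ligands = set(ligands)
        self.activity = {}

    def record(self, ligand, molecule, value):
        """Store one measurement. Returns True when the molecule is new,
        i.e., when structure-based equivalence should be re-established."""
        new_molecule = molecule not in self.molecules
        self.molecules.add(molecule)
        self.ligands.add(ligand)
        self.activity[(ligand, molecule)] = value
        return new_molecule

    def gaps(self):
        """Ligand-molecule pairs that still have no activity data."""
        return sorted(
            (l, m)
            for l in self.ligands
            for m in self.molecules
            if (l, m) not in self.activity
        )

# Hypothetical starting database with two molecules and two ligands.
db = ActivityDatabase(["HDAC1", "HDAC2"], ["lig_A", "lig_B"])
db.record("lig_A", "HDAC1", 7.2)
db.record("lig_B", "HDAC2", 6.1)
gaps_before = db.gaps()

# A later iteration adds a selected effector measured against a new molecule.
realign = db.record("lig_C", "HDAC8", 5.0)
gaps_after = db.gaps()
print(gaps_before, realign, len(gaps_after))
```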
In some embodiments of the present invention, the target molecules constitute enzymes that are known therapeutic targets. An exemplary enzyme useful in the implementation of the present invention is HIV-1 reverse transcriptase (HIV-1 RT). HIV-1 RT continues to be of therapeutic interest in the ongoing effort to provide HIV/AIDS therapeutics that have improved efficacy against drug-resistant mutants of HIV that continue to evolve post-infection.
In some embodiments of the present invention, the target molecules constitute G-protein coupled receptors (GPCRs). GPCRs are one of the most common means of cellular signal transduction and a historically important class of therapeutic targets (Lundstrom, K., et al., 2009). In particular, multiple subtypes of GPCRs are common targets for therapeutics and selectivity of ligands for a given subtype is a common priority (such as, for example, the multiple members of the opioid GPCR family).
In some embodiments of the present invention, the target molecules constitute tyrosine kinases. The human genome encodes over 500 protein kinases, of which roughly 90 are tyrosine kinases, and kinase signaling is another dominant means of cellular signal transduction associated with disease. In this example, once again, discrimination of a ligand for a particular kind or kinds of tyrosine kinase is an important objective.
In some embodiments of the present invention, the target molecules constitute ribosomes. Many classes of antibiotics target ribosomes of microbial pathogens. Unfortunately, many of the most potent show toxic side effects due to their affinity for the ribosomes of eukaryotes. Enhanced selectivity of structurally modified antibiotics for the ribosomes of microbial pathogens versus human ribosomes may provide novel therapeutics against drug-resistant microbes, such as Methicillin-resistant Staphylococcus aureus (MRSA).
In some embodiments, the methods, models, and systems of the present invention can also be used to design transcription factor sequences for recognition of specific DNA initiation sites. Control of gene expression is an emerging therapeutics area. The ability to selectively target a particular initiation site and either stimulate or eliminate gene expression is a desirable therapeutic objective that may be achieved through the use of the present invention.
In some embodiments of the present invention, the ligands constitute antibodies and the target molecules are antigens. For example, humanized antibodies are currently among the most effective therapeutics in the clinic due to their ability to target diseased cells. Given an antigenic target on a cell such as, for example, human epidermal growth factor receptor 2 (HER2), one would be able to modify the antibody sequence to enhance the affinity and selectivity for HER2, which is overexpressed in many breast cancers.
In some embodiments of the present invention, the ligands constitute DNA aptamers. While random selection of DNA sequences to generate selective aptamers for a given application is effective, the use of the methods, models, and systems of the present invention to further iteratively refine the selectivity for a particular molecular target is envisaged.
It is to be understood that there is no basis for a limitation of the methods, models, and systems of the present invention to a particular class of targets, such as proteins or nucleic acids. This focus only reflects the large amount of structural information available on these therapeutic targets at the time the invention was reduced to practice. Thus,
In some embodiments, the methods of the present invention are performed on the system depicted in
In some embodiments, the methods of the present invention are as described in one or more of the following enumerated embodiments.
A computational method for selecting an effector having specificity for a target molecule, the method comprising:
The method of embodiment 1, wherein the effector is an inhibitor of the target molecule.
The method of embodiment 1, wherein the effector is an activator of the target molecule.
The method of embodiment 1, wherein the target molecule is a peptide.
The method of embodiment 4, wherein the peptide is a ribosomal peptide.
The method of embodiment 4, wherein the peptide is an enzyme.
The method of embodiment 6, wherein the enzyme is a HIV reverse transcriptase.
The method of embodiment 6, wherein the enzyme catalyzes epigenetic modifications.
The method of embodiment 8, wherein the enzyme that catalyzes epigenetic modifications is a DNA methylation enzyme.
The method of embodiment 8, wherein the enzyme that catalyzes epigenetic modifications is a DNA demethylation enzyme.
The method of embodiment 8, wherein the enzyme that catalyzes epigenetic modifications is a protein methylation enzyme.
The method of embodiment 8, wherein the enzyme that catalyzes epigenetic modifications is a protein demethylation enzyme.
The method of embodiment 8, wherein the enzyme that catalyzes epigenetic modifications is an acetyl transferase.
The method of embodiment 13, wherein the acetyl transferase is a lysine acetyl transferase (KAT).
The method of embodiment 8, wherein the enzyme that catalyzes epigenetic modifications is a deacetylase.
The method of embodiment 15, wherein the deacetylase is a zinc-based lysine deacetylase (KDAC).
The method of embodiment 16, wherein the zinc-based lysine deacetylase is a histone deacetylase (HDAC).
The method of embodiment 15, wherein the deacetylase is a NAD-based lysine deacetylase.
The method of embodiment 1, wherein the target molecule is a nucleic acid.
The method of embodiment 19, wherein the nucleic acid is a ribonucleic acid.
The method of embodiment 20, wherein the ribonucleic acid is a ribozyme.
The method of embodiment 19, wherein the nucleic acid is a deoxyribonucleic acid.
The method of embodiment 22, wherein the deoxyribonucleic acid comprises a protein binding site.
The method of embodiment 23, wherein the protein binding site comprises a promoter.
The method of embodiment 23, wherein the protein binding site comprises a transcription factor binding site.
The method of embodiment 23, wherein the protein binding site is an enhancer binding site.
The method of embodiment 22, wherein the deoxyribonucleic acid comprises an aptamer.
The method of embodiment 1, wherein the population of ligands comprises antibodies.
The method of embodiment 4, wherein the peptide is a G-protein coupled receptor.
The method of embodiment 4, wherein the peptide is a tyrosine kinase.
The method of embodiment 1, wherein the database does not contain activity data for all ligand-molecule pairs.
The method of embodiment 1, wherein structure-based equivalence is established using X-ray crystallography data.
The method of embodiment 1, wherein structure-based equivalence is established using nuclear magnetic resonance spectroscopy data.
The method of embodiment 1, wherein structure-based equivalence is established using cryo-electron microscopy data.
The method of embodiment 1, wherein structure-based equivalence is established using homology modeling.
The method of embodiment 1, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined computationally.
The method of embodiment 1, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined experimentally.
The method of embodiment 1, wherein the at least one statistical model is generated from a partial least squares analysis.
The method of embodiment 1, wherein the at least one statistical model is generated from a neural network.
The method of embodiment 1, wherein the at least one statistical model is generated from a support vector machine.
The method of embodiment 1, wherein an effector is selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that does not exceed the specificity of the effector for other molecule library member(s).
A method as in any one of the preceding embodiments, wherein the effector is selected to have specificity for multiple target molecules.
A system for selecting an effector having specificity for a target molecule, comprising: means for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules structurally related to the target molecule, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set; means for establishing structure-based equivalence of the sequence elements and labeling the sequence elements of different molecule library members to reflect said equivalence; means for determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data; means for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; means for generating at least one statistical model that is predictive of those sequence elements of the molecule library members that are likely to contribute to a differential effect of ligand population members on molecule library 
members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data; means for selecting an effector that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that exceeds the specificity of the effector for other molecule library member(s); means for experimentally determining activity data quantifying an effect of the selected effector upon the activity of one or more molecule library members; and, means for at least once, repeating steps (a) and (c) through (g) wherein in a later iteration of steps (a) and (c) through (g) the effector selected in step (f) of an earlier iteration of steps (c) through (g) is a member of the population of ligands.
The system of embodiment 43, wherein the effector is an inhibitor of the target molecule.
The system of embodiment 43, wherein the effector is an activator of the target molecule.
The system of embodiment 43, wherein the target molecule is a peptide.
The system of embodiment 46, wherein the peptide is a ribosomal peptide.
The system of embodiment 46, wherein the peptide is an enzyme.
The system of embodiment 48, wherein the enzyme is a HIV reverse transcriptase.
The system of embodiment 48, wherein the enzyme catalyzes epigenetic modifications.
The system of embodiment 50, wherein the enzyme that catalyzes epigenetic modifications is a DNA methylation enzyme.
The system of embodiment 50, wherein the enzyme that catalyzes epigenetic modifications is a DNA demethylation enzyme.
The system of embodiment 50, wherein the enzyme that catalyzes epigenetic modifications is a protein methylation enzyme.
The system of embodiment 50, wherein the enzyme that catalyzes epigenetic modifications is a protein demethylation enzyme.
The system of embodiment 50, wherein the enzyme that catalyzes epigenetic modifications is an acetyl transferase.
The system of embodiment 55, wherein the acetyl transferase is a lysine acetyl transferase (KAT).
The system of embodiment 50, wherein the enzyme that catalyzes epigenetic modifications is a deacetylase.
The system of embodiment 57, wherein the deacetylase is a zinc-based lysine deacetylase (KDAC).
The system of embodiment 58, wherein the zinc-based lysine deacetylase is a histone deacetylase (HDAC).
The system of embodiment 57, wherein the deacetylase is a NAD-based lysine deacetylase.
The system of embodiment 43, wherein the target molecule is a nucleic acid.
The system of embodiment 61, wherein the nucleic acid is a ribonucleic acid.
The system of embodiment 62, wherein the ribonucleic acid is a ribozyme.
The system of embodiment 61, wherein the nucleic acid is a deoxyribonucleic acid.
The system of embodiment 64, wherein the deoxyribonucleic acid comprises a protein binding site.
The system of embodiment 65, wherein the protein binding site comprises a promoter.
The system of embodiment 65, wherein the protein binding site comprises a transcription factor binding site.
The system of embodiment 65, wherein the protein binding site is an enhancer binding site.
The system of embodiment 64, wherein the deoxyribonucleic acid comprises an aptamer.
The system of embodiment 43, wherein the population of ligands comprises antibodies.
The system of embodiment 46, wherein the peptide is a G-protein coupled receptor.
The system of embodiment 46, wherein the peptide is a tyrosine kinase.
The system of embodiment 43, wherein the database does not contain activity data for all ligand-molecule pairs.
The system of embodiment 43, wherein structure-based equivalence is established using X-ray crystallography data.
The system of embodiment 43, wherein structure-based equivalence is established using nuclear magnetic resonance spectroscopy data.
The system of embodiment 43, wherein structure-based equivalence is established using cryo-electron microscopy data.
The system of embodiment 43, wherein structure-based equivalence is established using homology modeling.
The system of embodiment 43, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined computationally.
The system of embodiment 43, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined experimentally.
The system of embodiment 43, wherein the at least one statistical model is generated from a partial least squares analysis.
The system of embodiment 43, wherein the at least one statistical model is generated from a neural network.
The system of embodiment 43, wherein the at least one statistical model is generated from a support vector machine.
The system of embodiment 43, wherein an effector is selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that does not exceed the specificity of the effector for other molecule library member(s).
The system as in one of embodiments 43-83, wherein the effector is selected to have specificity for multiple target molecules.
A system for selecting an effector having specificity for a target molecule, comprising: a processor for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules structurally related to the target molecule, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set, establishing structure-based equivalence of the sequence elements and labeling the sequence elements of different molecule library members to reflect said equivalence, and determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data; a calculator for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; and, a classifier for generating at least one statistical model that is predictive of those sequence elements of the molecule library members that are likely to contribute to a differential effect of ligand population members on molecule 
library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data.
The system of embodiment 85, wherein the effector is an inhibitor of the target molecule.
The system of embodiment 85, wherein the effector is an activator of the target molecule.
The system of embodiment 85, wherein the target molecule is a peptide.
The system of embodiment 88, wherein the peptide is a ribosomal peptide.
The system of embodiment 88, wherein the peptide is an enzyme.
The system of embodiment 90, wherein the enzyme is a HIV reverse transcriptase.
The system of embodiment 90, wherein the enzyme catalyzes epigenetic modifications.
The system of embodiment 92, wherein the enzyme that catalyzes epigenetic modifications is a DNA methylation enzyme.
The system of embodiment 92, wherein the enzyme that catalyzes epigenetic modifications is a DNA demethylation enzyme.
The system of embodiment 92, wherein the enzyme that catalyzes epigenetic modifications is a protein methylation enzyme.
The system of embodiment 92, wherein the enzyme that catalyzes epigenetic modifications is a protein demethylation enzyme.
The system of embodiment 92, wherein the enzyme that catalyzes epigenetic modifications is an acetyl transferase.
The system of embodiment 97, wherein the acetyl transferase is a lysine acetyl transferase (KAT).
The system of embodiment 92, wherein the enzyme that catalyzes epigenetic modifications is a deacetylase.
The system of embodiment 99, wherein the deacetylase is a zinc-based lysine deacetylase (KDAC).
The system of embodiment 100, wherein the zinc-based lysine deacetylase is a histone deacetylase (HDAC).
The system of embodiment 99, wherein the deacetylase is a NAD-based lysine deacetylase.
The system of embodiment 85, wherein the target molecule is a nucleic acid.
The system of embodiment 103, wherein the nucleic acid is a ribonucleic acid.
The system of embodiment 104, wherein the ribonucleic acid is a ribozyme.
The system of embodiment 103, wherein the nucleic acid is a deoxyribonucleic acid.
The system of embodiment 106, wherein the deoxyribonucleic acid comprises a protein binding site.
The system of embodiment 107, wherein the protein binding site comprises a promoter.
The system of embodiment 107, wherein the protein binding site comprises a transcription factor binding site.
The system of embodiment 107, wherein the protein binding site is an enhancer binding site.
The system of embodiment 106, wherein the deoxyribonucleic acid comprises an aptamer.
The system of embodiment 85, wherein the population of ligands comprises antibodies.
The system of embodiment 88, wherein the peptide is a G-protein coupled receptor.
The system of embodiment 88, wherein the peptide is a tyrosine kinase.
The system of embodiment 85, wherein the database does not contain activity data for all ligand-molecule pairs.
The system of embodiment 85, wherein structure-based equivalence is established using X-ray crystallography data.
The system of embodiment 85, wherein structure-based equivalence is established using nuclear magnetic resonance spectroscopy data.
The system of embodiment 85, wherein structure-based equivalence is established using cryo-electron microscopy data.
The system of embodiment 85, wherein structure-based equivalence is established using homology modeling.
The system of embodiment 85, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined computationally.
The system of embodiment 85, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined experimentally.
The system of embodiment 85, wherein the at least one statistical model is generated from a partial least squares analysis.
The system of embodiment 85, wherein the at least one statistical model is generated from a neural network.
The system of embodiment 85, wherein the at least one statistical model is generated from a support vector machine.
The system of embodiment 85, wherein an effector is selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that does not exceed the specificity of the effector for other molecule library member(s).
The system as in one of embodiments 85-125, wherein the effector is selected to have specificity for multiple target molecules.
A computational method for selecting an effector having specificity for a target molecule, the method comprising:
The method of embodiment 127, wherein the effector is an inhibitor of the target molecule.
The method of embodiment 127, wherein the effector is an activator of the target molecule.
The method of embodiment 127, wherein the target molecule is a peptide.
The method of embodiment 130, wherein the peptide is a ribosomal peptide.
The method of embodiment 130, wherein the peptide is an enzyme.
The method of embodiment 132, wherein the enzyme is a HIV reverse transcriptase.
The method of embodiment 132, wherein the enzyme catalyzes epigenetic modifications.
The method of embodiment 134, wherein the enzyme that catalyzes epigenetic modifications is a DNA methylation enzyme.
The method of embodiment 134, wherein the enzyme that catalyzes epigenetic modifications is a DNA demethylation enzyme.
The method of embodiment 134, wherein the enzyme that catalyzes epigenetic modifications is a protein methylation enzyme.
The method of embodiment 134, wherein the enzyme that catalyzes epigenetic modifications is a protein demethylation enzyme.
The method of embodiment 134, wherein the enzyme that catalyzes epigenetic modifications is an acetyl transferase.
The method of embodiment 139, wherein the acetyl transferase is a lysine acetyl transferase (KAT).
The method of embodiment 134, wherein the enzyme that catalyzes epigenetic modifications is a deacetylase.
The method of embodiment 141, wherein the deacetylase is a zinc-based lysine deacetylase (KDAC).
The method of embodiment 142, wherein the zinc-based lysine deacetylase is a histone deacetylase (HDAC).
The method of embodiment 141, wherein the deacetylase is a NAD-based lysine deacetylase.
The method of embodiment 127, wherein the target molecule is a nucleic acid.
The method of embodiment 145, wherein the nucleic acid is a ribonucleic acid.
The method of embodiment 146, wherein the ribonucleic acid is a ribozyme.
The method of embodiment 145, wherein the nucleic acid is a deoxyribonucleic acid.
The method of embodiment 148, wherein the deoxyribonucleic acid comprises a protein binding site.
The method of embodiment 149, wherein the protein binding site comprises a promoter.
The method of embodiment 149, wherein the protein binding site comprises a transcription factor binding site.
The method of embodiment 149, wherein the protein binding site is an enhancer binding site.
The method of embodiment 148, wherein the deoxyribonucleic acid comprises an aptamer.
The method of embodiment 127, wherein the population of ligands comprises antibodies.
The method of embodiment 130, wherein the peptide is a G-protein coupled receptor.
The method of embodiment 130, wherein the peptide is a tyrosine kinase.
The method of embodiment 127, wherein the database does not contain activity data for all ligand-molecule pairs.
The method of embodiment 127, wherein structure-based equivalence is established using X-ray crystallography data.
The method of embodiment 127, wherein structure-based equivalence is established using nuclear magnetic resonance spectroscopy data.
The method of embodiment 127, wherein structure-based equivalence is established using cryo-electron microscopy data.
The method of embodiment 127, wherein structure-based equivalence is established using homology modeling.
The method of embodiment 127, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined computationally.
The method of embodiment 127, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined experimentally.
The method of embodiment 127, wherein the at least one statistical model is generated from a partial least squares analysis.
The method of embodiment 127, wherein the at least one statistical model is generated from a neural network.
The method of embodiment 127, wherein the at least one statistical model is generated from a support vector machine.
The method of embodiment 127, wherein an effector is selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that does not exceed the specificity of the effector for other molecule library member(s).
A method as in one of embodiments 127-167, wherein the effector is selected to have specificity for multiple target molecules.
A system for selecting an effector having specificity for a target molecule, comprising: means for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set; means for determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data; means for establishing equivalence of the sequence elements based on determined likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the data comprises activity data and labeling the sequence elements of different molecule library members to reflect said equivalence; means for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; means for generating at least one statistical model that is predictive of those sequence elements of the molecule library members that are 
likely to contribute to a differential effect of ligand population members on molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data; means for selecting an effector that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that exceeds the specificity of the effector for other molecule library member(s); means for experimentally determining activity data quantifying an effect of the selected effector upon the activity of one or more molecule library members; and, means for at least once, repeating steps (a) through (g) wherein in a later iteration of steps (a) through (g) the effector selected in step (f) of an earlier iteration of steps (a) through (g) is a member of the population of ligands.
The system of embodiment 169, wherein the effector is an inhibitor of the target molecule.
The system of embodiment 169, wherein the effector is an activator of the target molecule.
The system of embodiment 169, wherein the target molecule is a peptide.
The system of embodiment 172, wherein the peptide is a ribosomal peptide.
The system of embodiment 172, wherein the peptide is an enzyme.
The system of embodiment 174, wherein the enzyme is a HIV reverse transcriptase.
The system of embodiment 174, wherein the enzyme catalyzes epigenetic modifications.
The system of embodiment 176, wherein the enzyme that catalyzes epigenetic modifications is a DNA methylation enzyme.
The system of embodiment 176, wherein the enzyme that catalyzes epigenetic modifications is a DNA demethylation enzyme.
The system of embodiment 176, wherein the enzyme that catalyzes epigenetic modifications is a protein methylation enzyme.
The system of embodiment 176, wherein the enzyme that catalyzes epigenetic modifications is a protein demethylation enzyme.
The system of embodiment 176, wherein the enzyme that catalyzes epigenetic modifications is an acetyl transferase.
The system of embodiment 181, wherein the acetyl transferase is a lysine acetyl transferase (KAT).
The system of embodiment 176, wherein the enzyme that catalyzes epigenetic modifications is a deacetylase.
The system of embodiment 183, wherein the deacetylase is a zinc-based lysine deacetylase (KDAC).
The system of embodiment 184, wherein the zinc-based lysine deacetylase is a histone deacetylase (HDAC).
The system of embodiment 183, wherein the deacetylase is a NAD-based lysine deacetylase.
The system of embodiment 169, wherein the target molecule is a nucleic acid.
The system of embodiment 187, wherein the nucleic acid is a ribonucleic acid.
The system of embodiment 188, wherein the ribonucleic acid is a ribozyme.
The system of embodiment 187, wherein the nucleic acid is a deoxyribonucleic acid.
The system of embodiment 190, wherein the deoxyribonucleic acid comprises a protein binding site.
The system of embodiment 191, wherein the protein binding site comprises a promoter.
The system of embodiment 191, wherein the protein binding site comprises a transcription factor binding site.
The system of embodiment 191, wherein the protein binding site is an enhancer binding site.
The system of embodiment 190, wherein the deoxyribonucleic acid comprises an aptamer.
The system of embodiment 169, wherein the population of ligands comprises antibodies.
The system of embodiment 172, wherein the peptide is a G-protein coupled receptor.
The system of embodiment 172, wherein the peptide is a tyrosine kinase.
The system of embodiment 169, wherein the database does not contain activity data for all ligand-molecule pairs.
The system of embodiment 169, wherein structure-based equivalence is established using X-ray crystallography data.
The system of embodiment 169, wherein structure-based equivalence is established using nuclear magnetic resonance spectroscopy data.
The system of embodiment 169, wherein structure-based equivalence is established using cryo-electron microscopy data.
The system of embodiment 169, wherein structure-based equivalence is established using homology modeling.
The system of embodiment 169, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined computationally.
The system of embodiment 169, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined experimentally.
The system of embodiment 169, wherein the at least one statistical model is generated from a partial least squares analysis.
The system of embodiment 169, wherein the at least one statistical model is generated from a neural network.
The system of embodiment 169, wherein the at least one statistical model is generated from a support vector machine.
The system of embodiment 169, wherein an effector is selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that does not exceed the specificity of the effector for other molecule library member(s).
A system as in one of embodiments 169-209, wherein the effector is selected to have specificity for multiple target molecules.
A system for selecting an effector having specificity for a target molecule, comprising: a processor for compiling a database containing (i) three-dimensional structural data for members of a library of molecules each having a known chemical sequence comprising sequence elements, the library comprising the target molecule and other member molecules, (ii) structural data for members of a population of ligands each having a known chemical structure, and (iii) activity data quantifying an effect of ligand population members upon the activity of molecule library members wherein the ligands of the ligand-molecule pairs are selected from the ligand population members, the molecules of the ligand-molecule pairs are selected from the molecule library members and different ligand-molecule pairs in the set comprise a different ligand, a different molecule, or both a different ligand and a different molecule relative to other ligand-molecule pairs in the set, and wherein the activity data differs for different ligand-molecule pairs in the set, determining likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data, and establishing equivalence of the sequence elements based on determined likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the data comprises activity data and labeling the sequence elements of different molecule library members to reflect said equivalence; a calculator for calculating, for the ligand-molecule pairs for which the database comprises activity data, interaction energies of the ligand population member with proximal sequence elements of the molecule library member of the respective ligand-molecule pairs when the ligand population member is in a determined likely spatial orientation; and a classifier for generating at least one statistical model that is predictive of those sequence elements of the molecule library members that 
are likely to contribute to a differential effect of ligand population members on molecule library members using the calculated interaction energies and the activity data corresponding to the ligand-molecule pairs for which the database contains activity data.
The system of embodiment 211, wherein the effector is an inhibitor of the target molecule.
The system of embodiment 211, wherein the effector is an activator of the target molecule.
The system of embodiment 211, wherein the target molecule is a peptide.
The system of embodiment 214, wherein the peptide is a ribosomal peptide.
The system of embodiment 214, wherein the peptide is an enzyme.
The system of embodiment 216, wherein the enzyme is a HIV reverse transcriptase.
The system of embodiment 216, wherein the enzyme catalyzes epigenetic modifications.
The system of embodiment 218, wherein the enzyme that catalyzes epigenetic modifications is a DNA methylation enzyme.
The system of embodiment 218, wherein the enzyme that catalyzes epigenetic modifications is a DNA demethylation enzyme.
The system of embodiment 218, wherein the enzyme that catalyzes epigenetic modifications is a protein methylation enzyme.
The system of embodiment 218, wherein the enzyme that catalyzes epigenetic modifications is a protein demethylation enzyme.
The system of embodiment 218, wherein the enzyme that catalyzes epigenetic modifications is an acetyl transferase.
The system of embodiment 223, wherein the acetyl transferase is a lysine acetyl transferase (KAT).
The system of embodiment 218, wherein the enzyme that catalyzes epigenetic modifications is a deacetylase.
The system of embodiment 225, wherein the deacetylase is a zinc-based lysine deacetylase (KDAC).
The system of embodiment 226, wherein the zinc-based lysine deacetylase is a histone deacetylase (HDAC).
The system of embodiment 225, wherein the deacetylase is a NAD-based lysine deacetylase.
The system of embodiment 211, wherein the target molecule is a nucleic acid.
The system of embodiment 229, wherein the nucleic acid is a ribonucleic acid.
The system of embodiment 230, wherein the ribonucleic acid is a ribozyme.
The system of embodiment 229, wherein the nucleic acid is a deoxyribonucleic acid.
The system of embodiment 232, wherein the deoxyribonucleic acid comprises a protein binding site.
The system of embodiment 233, wherein the protein binding site comprises a promoter.
The system of embodiment 233, wherein the protein binding site comprises a transcription factor binding site.
The system of embodiment 233, wherein the protein binding site is an enhancer binding site.
The system of embodiment 232, wherein the deoxyribonucleic acid comprises an aptamer.
The system of embodiment 211, wherein the population of ligands comprises antibodies.
The system of embodiment 214, wherein the peptide is a G-protein coupled receptor.
The system of embodiment 214, wherein the peptide is a tyrosine kinase.
The system of embodiment 211, wherein the database does not contain activity data for all ligand-molecule pairs.
The system of embodiment 211, wherein structure-based equivalence is established using X-ray crystallography data.
The system of embodiment 211, wherein structure-based equivalence is established using nuclear magnetic resonance spectroscopy data.
The system of embodiment 211, wherein structure-based equivalence is established using cryo-electron microscopy data.
The system of embodiment 211, wherein structure-based equivalence is established using homology modeling.
The system of embodiment 211, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined computationally.
The system of embodiment 211, wherein likely spatial orientations of the ligand population members in the ligand-molecule pairs for which the database comprises activity data are determined experimentally.
The system of embodiment 211, wherein the at least one statistical model is generated from a partial least squares analysis.
The system of embodiment 211, wherein the at least one statistical model is generated from a neural network.
The system of embodiment 211, wherein the at least one statistical model is generated from a support vector machine.
The system of embodiment 211, wherein an effector is selected that is likely, based upon the generated statistical model(s), to have specificity for the target molecule that does not exceed the specificity of the effector for other molecule library member(s).
A system as in one of embodiments 211-251, wherein the effector is selected to have specificity for multiple target molecules.
The following examples are provided to further illustrate the methods and systems of the present invention. These examples are illustrative only and are not intended to limit the scope of the invention in any way.
All molecular graphics images were produced using the UCSF Chimera package (www.cgl.ucsf.edu/chimera/) from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, on an IBM-compatible workstation equipped with a 3 GHz AMD CPU and running the Debian 5.0 version of the Linux operating system. For all calculations, a Beowulf cluster of 12 quad-core Xeon CPUs was used.
Inhibitor Structures.
All ligands used were generated with Chemaxon Marvin molecular mechanics software (http://www.chemaxon.com/) and used without further optimization. The protonation and tautomer states were assigned considering a physiological pH and the more common tautomer according to basic organic chemistry and structural information reported in the corresponding ligand referenced papers.
HDAC Homology Models.
Those HDAC isoforms whose experimental structures were not available (HDAC-1, -3, -5, -6-1, -6-2, -9, -10 and -11) were built by homology modeling using 4 automated web servers:
Several protein conformations for each HDAC isoform were used to include some target flexibility in the subsequent training set and test set cross-docking runs. For each HDAC isoform, 4 homology models were generated. All inhibitors were modeled into each of the four homology models and the resulting complexes were energy minimized to supply four complexes for each inhibitor, leading to 220 complexes. The servers were used with their default parameters and in a fully automated way to avoid human intervention and to allow maximum reproducibility.
To compile the final training set of 94 complexes (see Training Set section below), one homology complex per inhibitor was chosen using the preliminary DISCRIMINATE models derived with only crystallized HDAC complexes. For each inhibitor, the HDAC/inhibitor complex whose predicted pIC50s had the best fit to the experimental pIC50s for that isoform was selected and utilized in the final training set (Table 1).
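The selection step described above can be illustrated with a minimal sketch. It assumes that "best fit" means the smallest absolute difference between predicted and experimental pIC50; the function name, the server labels and the numerical values are hypothetical and serve only as an illustration.

```python
def pick_best_complex(predicted_pic50_by_model, experimental_pic50):
    """Return the homology-model label whose predicted pIC50 is closest
    to the experimental value (absolute error is an assumed criterion)."""
    return min(
        predicted_pic50_by_model,
        key=lambda label: abs(predicted_pic50_by_model[label] - experimental_pic50),
    )


# Hypothetical predictions for one inhibitor against one isoform,
# one value per homology model (four servers, hence four candidates).
candidates = {"server_A": 6.1, "server_B": 7.4, "server_C": 6.9, "server_D": 5.2}
best = pick_best_complex(candidates, experimental_pic50=7.0)
```

In this illustration the complex labeled "server_C" would be retained for the final training set.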
Complex Minimization.
Training set complexes were submitted to a single-point minimization using a protocol described previously (Musmuca, I., et al., 2010). Briefly, the minimization protocol was applied as follows: (1) ANTECHAMBER with AM1-BCC charges was used to determine missing ligand parameters; (2) the tLeap module was used to solvate the complexes with water molecules in an octahedral box extending 10 Å and to neutralize them with Na+ and Cl− ions; (3) the structures were minimized with the Amber 2003 force field using the SANDER module: 1000 steps of steepest-descent energy minimization followed by 4000 steps of conjugate-gradient energy minimization, with a non-bonded cutoff of 5 Å. Trials with longer non-bonded cutoff values gave no substantial differences; therefore, the 5 Å cutoff was chosen for faster calculations. The Zn ion was treated as non-bonded, as in several other reported applications involving HDACs.
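As an illustration of step (3), the following sketch writes a SANDER minimization input corresponding to the step counts and cutoff stated above. The namelist variables (imin, maxcyc, ncyc, cut, ntb) are standard SANDER control variables, but the exact input file used in the examples is not reproduced here, so the layout, the title line and the choice of ntb=1 for the periodic solvent box are assumptions.

```python
def sander_min_input(sd_steps=1000, cg_steps=4000, cutoff=5.0):
    """Build a SANDER &cntrl namelist for a two-stage minimization.

    maxcyc is the total number of cycles; ncyc is the cycle at which
    SANDER switches from steepest descent to conjugate gradient.
    """
    total_steps = sd_steps + cg_steps
    lines = [
        "Minimization: steepest descent, then conjugate gradient",
        " &cntrl",
        "  imin=1,",                  # run a minimization, not dynamics
        f"  maxcyc={total_steps},",   # 1000 + 4000 = 5000 cycles in total
        f"  ncyc={sd_steps},",        # first 1000 cycles use steepest descent
        f"  cut={cutoff},",           # non-bonded cutoff in angstroms
        "  ntb=1,",                   # assumed: constant-volume periodic box
        " /",
    ]
    return "\n".join(lines) + "\n"


min_input = sander_min_input()
```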
Ligand/Residues Interactions.
The calculation of the ligand/residue interactions was conducted as previously reported (Ballante, F., et al., 2012). The AutoGrid module of AutoDock was used with its default settings to compute the interaction energies between each amino-acid residue of the enzymes and an inhibitor. AutoGrid used the united-atom AMBER force field and returned an energy value combining Lennard-Jones (LJ) and hydrogen-bonding (HB) energies between a target and each atom type (probe). The electrostatic interactions were calculated using a distance-dependent Coulombic function and, finally, a third score for hydrophobic interactions was also estimated. In its original use, AutoGrid calculates the interaction energies of a probe atom placed on a regularly spaced grid in which a molecular target (the protein) or a portion of it is buried. In this way AutoGrid returns what is called the molecular interaction field (MIF) of a given target, where at each grid point it estimates the interaction values for LJ and HB (STE), electrostatic (ELE) and desolvation (DRY), and saves them in three distinct map files. In the DISCRIMINATE approach, the target was the inhibitor in the complex and the STE, ELE and DRY interactions were calculated using a grid box centered, at each step, on each atom of the protein (the probe). The grid was given a step size such that the whole complex was contained within it, and thus only one value (at the center) was returned for each field. The interaction energy for each amino acid of the enzyme was obtained simply by summing the values for all atoms of that residue. The calculations were performed in a box with dimensions of 70×128×74 Å. This procedure allowed the decomposition of the enzyme/inhibitor interaction energies into three main contributions (fields): steric, electrostatic and hydrophobic. The default parameters for Zn in AutoGrid were used and no attempt was made to include intramolecular terms.
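The per-residue summation described above can be sketched as follows. The sketch assumes that the per-atom STE, ELE and DRY values have already been extracted from the AutoGrid output; the tuple layout, the residue labels and the numerical values are hypothetical.

```python
from collections import defaultdict

FIELDS = ("STE", "ELE", "DRY")


def per_residue_energies(atom_records):
    """Sum per-atom field values into per-residue totals.

    atom_records: iterable of (residue_id, field, energy) tuples, one
    per protein atom and field, where field is one of FIELDS.
    Returns {residue_id: {"STE": x, "ELE": y, "DRY": z}}.
    """
    totals = defaultdict(lambda: {field: 0.0 for field in FIELDS})
    for residue_id, field, energy in atom_records:
        totals[residue_id][field] += energy
    return dict(totals)


# Hypothetical per-atom values for two residues.
records = [
    ("HIS142", "STE", -0.50), ("HIS142", "STE", -0.25),
    ("HIS142", "ELE", -1.00), ("ASP176", "ELE", -2.00),
    ("ASP176", "DRY", 0.125),
]
residue_table = per_residue_energies(records)
```

Each residue then contributes one descriptor per field to the X matrix used in the statistical analysis.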
Statistical Analysis.
All statistical calculations were performed with R, a free software environment for statistical computing and graphics. For the final training set, seven different combinations of the fields previously calculated were tried: the single fields (STE, ELE and DRY) and the multi-field ELE+STE, ELE+DRY, STE+DRY and ELE+STE+DRY.
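The seven field combinations can be enumerated programmatically. The statistical work in the examples was carried out in R; the Python sketch below is only an illustration of the enumeration.

```python
from itertools import combinations

FIELDS = ("ELE", "STE", "DRY")


def field_combinations(fields=FIELDS):
    """Return every non-empty combination of fields, joined with '+'."""
    return [
        "+".join(combo)
        for size in range(1, len(fields) + 1)
        for combo in combinations(fields, size)
    ]


combos = field_combinations()
```

Three single fields, three pairs and one triple give the seven models considered.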
Partial Least Squares (PLS).
All the calculations were conducted using the PLS and cross-validation features of the PLS package described by Mevik (Mevik, B.-H., et al., 2007). An in-house R script was written to import the data and carry out all calculations.
BUW.
Furthermore, in the case of multiple probes, a scaling procedure, called Block Unscaled Weights (BUW), was applied as data pretreatment. This procedure enforces the same importance to each interaction type within the model, normalizing the energy distribution of the X-variables as described by Kastenholz et al. (Kastenholz, M. A., et al., 2000). BUW coefficients are reported in Table 2.
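A minimal sketch of block scaling in the spirit of BUW is given below. It assumes one common formulation, in which every variable of a block is multiplied by a single block weight so that each block has the same total mean-centered sum of squares while the relative scale of the variables inside a block is preserved. The coefficients actually used are those reported in Table 2; the function names and data here are hypothetical.

```python
def block_sum_of_squares(block):
    """Total mean-centered sum of squares over all columns of a block.

    block: list of rows, one row per complex, one column per descriptor.
    """
    n_rows = len(block)
    total = 0.0
    for col_index in range(len(block[0])):
        column = [row[col_index] for row in block]
        mean = sum(column) / n_rows
        total += sum((value - mean) ** 2 for value in column)
    return total


def buw_weights(blocks):
    """One weight per block, chosen so each scaled block has unit sum of squares."""
    weights = {}
    for name, block in blocks.items():
        ss = block_sum_of_squares(block)
        weights[name] = 1.0 / ss ** 0.5 if ss > 0 else 1.0
    return weights


def apply_buw(blocks):
    """Multiply every value of each block by that block's weight."""
    weights = buw_weights(blocks)
    return {
        name: [[value * weights[name] for value in row] for row in block]
        for name, block in blocks.items()
    }


# Hypothetical ELE and DRY blocks for three complexes, two descriptors each.
blocks = {
    "ELE": [[-10.0, -4.0], [-6.0, -1.0], [-2.0, -7.0]],
    "DRY": [[0.1, 0.3], [0.2, 0.1], [0.4, 0.2]],
}
scaled = apply_buw(blocks)
```

After scaling, the ELE and DRY blocks carry equal total variation, so neither dominates the PLS model merely because its energies are numerically larger.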
AutoDock Settings.
The AutoDockTools package was employed to generate the docking input files and to analyze the docking results. A grid box size of 57×44×53 with a spacing of 0.375 Å between the grid points was implemented. A total of 100 runs were generated by using the genetic algorithm, while the remaining run parameters were maintained at their default setting. A cluster analysis was carried out using 2 Å as the RMSD tolerance.
AutoDockVina Settings.
The same AutoDock grid box was used for its calculations. The docking simulations were carried out with an energy range of 10 kcal/mol and exhaustiveness of 100. The output comprised 20 different conformations for every receptor considered. Although Vina does not include any clustering of the output poses, the clustering feature of the AutoDock program was used to inspect the conformation families using a clustering tolerance set at 2 Å.
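The settings above can be translated into an AutoDock Vina configuration as sketched below. The sketch assumes that the 57×44×53 box is expressed in grid points, so that each edge length in Å is the number of points multiplied by the 0.375 Å spacing, because Vina takes its box size in Å. The box center is not stated in the text and is a hypothetical placeholder; the keys (center_x, size_x, exhaustiveness, num_modes, energy_range) are standard Vina configuration options.

```python
def vina_config(center, npts=(57, 44, 53), spacing=0.375,
                exhaustiveness=100, num_modes=20, energy_range=10):
    """Return the text of a Vina configuration file for the docking box.

    center: (x, y, z) of the box center in angstroms (placeholder here).
    npts:   grid points along x, y, z as used for the AutoDock grid.
    """
    sizes = [n * spacing for n in npts]  # edge lengths in angstroms
    lines = []
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", sizes):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    lines.append(f"num_modes = {num_modes}")
    lines.append(f"energy_range = {energy_range}")
    return "\n".join(lines) + "\n"


# Placeholder center; the real value depends on the receptor coordinates.
config_text = vina_config(center=(0.0, 0.0, 0.0))
```

Under this assumption the box measures 21.375 × 16.500 × 19.875 Å. The receptor and ligand files would be supplied separately, for example on the command line.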
The comparative binding energy (COMBINE) approach is a structure-based 3-D QSAR method that uses a series of receptor-ligand complexes to quantify interaction energies by molecular mechanics (Ortiz, A. R., et al., 1997, Ortiz, A. R., et al., 1995, Perez, C., et al., 1998, Lozano, J. J., et al., 2000). The fundamental idea of a COMBINE analysis is that a simple expression for the differences in binding affinity of a series of related ligand-receptor complexes can be derived by using multivariate statistics to correlate experimental data on binding affinities with per-residue ligand-receptor interactions computed from 3-D structures. The basis of the COMBINE method is the assumption that the ligand-receptor binding free energy, ΔG, can be approximated by a weighted sum of n terms, ΔU, each describing the change in property u upon binding as described by the following equation:
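Written out from the verbal definition above, and offered only as a reconstruction, the weighted sum may be expressed as follows, where the index i and the weights w_i are notation introduced here for illustration. Published COMBINE formulations may also include an additive constant, which is omitted because the text does not mention one.

```latex
\Delta G \approx \sum_{i=1}^{n} w_i \, \Delta u_i
```

In a per-residue decomposition, each Δu_i corresponds to one ligand/residue interaction energy term, and the weights w_i are the coefficients fitted by the multivariate analysis.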
From this expression, biological activities may be derived by assuming that these quantities are linear functions of ΔG. The expression is derived by analyzing the interaction of a set of ligands with experimentally known binding affinities for a target receptor (Ortiz, A. R., et al., 1995).
In order to apply this approach to predict the selective inhibition of HDAC isozymes, a modified protocol, called DISCRIMINATE (Ballante, F., et al., 2012) (depicted generally in
Training Set:
Nine experimental 3-D structures of HDAC-2, -4, -7 and -8 co-crystallized with different ligands were retrieved from the Protein Data Bank (Bernstein, F. C., et al., 1977) (Table 3). The remaining HDAC isoforms, whose experimental structures were not available (HDAC-1, -3, -5, -6-1, -6-2, -9, -10 and -11), were built by homology modeling. In the case of HDAC-6, both the histone- and tubulin-catalytic domains were built (histones: HDAC-6-1 and tubulin: HDAC-6-2) with the same experimental inhibitory activities assigned to each complex.
In addition to co-crystallized inhibitors, other compounds (Table 4) reported simultaneously from the same laboratory by Blackwell et al. (Blackwell, L., et al., 2008) were selected. The data set composed of 15 different inhibitors and 12 HDAC isoforms was reduced from the theoretical number of 180 complexes to 94 due to a lack of complete isozyme-inhibitory data. Therefore, the final training set summarized in Table 5 comprised 39 complexes derived from crystallized structures, built according to the structural similarity of modeled inhibitors to co-crystallized compounds, and 55 complexes derived from homology models. The latter were generated according to the web servers used for producing the homology models (see “HDAC Homology Models” section, above).
The training set complexes were energy minimized with Amber 10 (Case, D. A., et al., 2005) and multiply aligned using Modeller (Fiser, A., et al., 2003) to establish structure-based residue equivalence. This alignment provided the structural basis for computing the molecular-interaction fields on a corresponding per-residue basis for all enzyme isoforms. Because different HDAC isoforms show structural diversity in their amino-acid sequences and differ in their numbers of amino acids (multi-target study), all HDAC residues were renumbered in an arbitrary way: the same number was assigned to those residues showing spatial superimposition; conversely, a “ghost” residue was attributed to the regions which presented structural diversity (see Supplemental File 5). In this way, 12 fragmented HDAC isoform structures sharing a common numbering of 571 amino-acid residues were obtained. The calculation of the ligand/residue interactions was conducted as previously reported (Ballante, F., et al., 2012). The calculated molecular descriptors were imported into R (Ballante, F. and Ragno, R., 2012) to generate structure-based 3-D QSAR models. The purpose of training-set complex minimization was not only to generate 94 optimized complexes, but also to obtain several conformations for each HDAC for use in the subsequent preparation of test-set complexes by ligand cross-docking (see below).
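The renumbering with ghost residues can be illustrated with a minimal sketch. It assumes that the structural alignment is available as gapped one-letter strings, that a gap marks a ghost position, and that a ghost residue is assigned an interaction energy of zero; the zero value and all names and data are assumptions made for illustration only.

```python
def renumber_with_ghosts(aligned_sequence, residue_energies, gap="-"):
    """Place per-residue energies onto the common alignment numbering.

    aligned_sequence: gapped sequence of one isoform from the alignment.
    residue_energies: energies for the real residues, in sequence order.
    Returns one value per alignment column; ghost (gap) columns get 0.0.
    """
    n_real = sum(1 for symbol in aligned_sequence if symbol != gap)
    if n_real != len(residue_energies):
        raise ValueError("number of energies must match number of real residues")
    energies = iter(residue_energies)
    return [0.0 if symbol == gap else next(energies) for symbol in aligned_sequence]


# Hypothetical four-column alignment of two isoforms.
isoform_a = renumber_with_ghosts("HD-Y", [-1.5, -0.5, -2.0])
isoform_b = renumber_with_ghosts("H-GY", [-1.0, -0.25, -3.0])
```

Both isoforms are thereby described by vectors of equal length, so that column k refers to the same spatial position in every isoform.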
Each derived DISCRIMINATE model was subjected to internal (cross-validation) and external (test-set) assessments. Cross-validation was done using both the leave-one-out (LOO) and random 5-group leave-some-out (R5G-LSO) techniques. For external validation, a series of molecules with known inhibitory activity against HDAC isozymes was selected as an external test set to assess the predictive ability of the model.
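The two cross-validation schemes differ only in how the complexes are grouped for hold-out, as the following sketch illustrates. For brevity it uses a one-descriptor least-squares line as a stand-in for the PLS model, so it demonstrates the q2 bookkeeping (q2 = 1 − PRESS/SS) rather than the actual DISCRIMINATE calculation; all function names and data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line; stand-in for the PLS model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def loo_groups(n):
    """Leave-one-out: every complex is held out once on its own."""
    return [[i] for i in range(n)]


def random_groups(n, k=5, seed=0):
    """Random k groups leave-some-out: shuffle, then deal into k groups."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[g::k] for g in range(k)]


def cross_validated_q2(xs, ys, groups):
    """q2 = 1 - PRESS / SS, predicting each group from the remaining data."""
    press = 0.0
    for held_out in groups:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        press += sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out)
    mean_y = sum(ys) / len(ys)
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss


# Hypothetical descriptor values and activities.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
q2_loo = cross_validated_q2(xs, ys, loo_groups(len(xs)))
q2_r5g = cross_validated_q2(xs, ys, random_groups(len(xs), k=5))
```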
External Test Sets for the DISCRIMINATE Model Validation.
Three different test sets were used for external validation. The first one (modeled test set, MTS) contained a series of molecules, docked with AutoDockVina (Trott, O., et al., 2010), that showed inhibitory activity against several HDAC isoforms (Table 6).
The second test set was comprised of a series of co-crystallized complexes structures (crystal test set, CTS) containing two HDAC8 complexes (not available from the PDB during model development) and four bacterial HDAC homologs (Table 7). The third test set was also modeled, using largazole (a cyclotetrapeptide—containing HDAC inhibitor, largazole test set, LTS) whose crystal structure with HDAC8 was reported, (Cole, K. E., et al., 2011) but whose inhibitory activity was available only for four HDAC isoforms (Table 8). For LTS, largazole was docked with HDAC1, HDAC2, HDAC3 and HDAC6-1. The bacterial HDAC complexes with hydroxamic acids were available from the PDB (Table 7).
DISCRIMINATE Models—
Overall analysis. All final models contained 94 inhibitor/enzyme complexes spanning an activity range, expressed as pIC50, from 2.7 (NABUT against HDAC5) to 8.4 (SCRIPTAID against HDAC6). The statistical results of the final models are summarized in Table 9. Genetic-algorithm variable selection was applied but provided little improvement in either descriptive or predictive performance; hence, the non-GA-optimized models were used.
Structure-activity relationships of the various HDAC inhibitors have previously been described in other studies. (Ragno, R., et al., 2006, Ragno, R., et al., 2008). Crystal structures of receptor-ligand complexes have been analyzed qualitatively or by comparison of bound ligands. (Mai, A., et al., 2002, Mai, A., et al., 2003). DISCRIMINATE analysis permits quantification of structure-activity relationships through the electrostatic (coulombic) and van der Waals interaction energies as well as additional parameters, such as solvation energy. Distinguished from the original COMBINE procedure of Ortiz (Ortiz, A. R., et al., 1995), DISCRIMINATE computes enzyme/ligand interactions using the AutoGrid program based on the AMBER united-atom force field and chosen for its simpler molecular format (PDBQT). The data in Table 9 refer to the mono-probe fields (ELE, STE, DRY) and the multi-probe ones: electrostatic-steric (ELE+STE), electrostatic-desolvation (ELE+DRY) and electrostatic-steric-desolvation (ELE+STE+DRY). The reported statistical coefficients allowed estimates of goodness and robustness of each model. Results indicated the ELE+DRY model as the best. In fact, the overall generated model showed the highest conventional squared correlation coefficient (r2) and lowest standard deviation error of calculation (SDEC) values: 0.80 and 0.73 respectively (
The charts in
Finally, both the robustness and the absence of chance correlation of the DISCRIMINATE models listed in Table 9 were checked by random scrambling (Y-scrambling). In this approach, inhibitory activities were randomly reassigned to the compounds of the data set to generate numerous scrambled datasets; for each scrambled dataset, an R5G-LSO cross-validation was run. One hundred Y-scrambling runs were examined. For the ELE+DRY probe, only 6% of the scrambled Y vectors showed any correlation with the original Y values, with a maximum scrambled q2 of only 0.08. For the ELE and ELE+STE+DRY models, chance correlations of 4% and 5% with maximum q2 values of 0.04 and 0.07, respectively, were observed. The ELE+STE probe showed a chance correlation of 2% with a maximum q2 value of 0.05. These correlations appear random and exclude any meaningful correlation between the original Y vector and the scrambled Y vectors. For the best model (ELE+DRY), only 6 of the 100 randomly scrambled models gave positive q2 values, leading to a probability of chance correlation lower than 1% at a q2 value of 0.1, a quite acceptable result considering the cross-validation coefficient of 0.76 for the model. Cross-validation runs using the most stringent leave-half-out method confirmed the robustness of the models.
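The Y-scrambling check described above may be sketched as follows, for illustration only. Activities are shuffled, a random five-group cross-validation is run on each shuffled set, and the scrambled q2 values are compared with the q2 of the unscrambled data. The one-variable least-squares model and the synthetic data are stand-ins chosen for the example; the described models use PLS on interaction-energy descriptors.

```python
import random

def cv_q2(xs, ys, k, rng):
    """q2 from one random k-group leave-some-out run (least-squares line)."""
    n = len(ys)
    idx = list(range(n))
    rng.shuffle(idx)
    mean_y = sum(ys) / n
    ss = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for g in range(k):
        held = set(idx[g::k])
        train = [i for i in range(n) if i not in held]
        mx = sum(xs[i] for i in train) / len(train)
        my = sum(ys[i] for i in train) / len(train)
        sxx = sum((xs[i] - mx) ** 2 for i in train)
        slope = sum((xs[i] - mx) * (ys[i] - my) for i in train) / sxx
        press += sum((ys[i] - (my + slope * (xs[i] - mx))) ** 2 for i in held)
    return 1.0 - press / ss

def y_scramble(xs, ys, runs, k, seed):
    """Return the cross-validated q2 of `runs` randomly scrambled Y vectors."""
    rng = random.Random(seed)
    q2_values = []
    for _ in range(runs):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the link between X and Y
        q2_values.append(cv_q2(xs, shuffled, k, rng))
    return q2_values

# Synthetic data with a genuine X-Y relationship.
xs = [float(i) for i in range(30)]
ys = [2.0 + 0.3 * x + 0.2 * (-1) ** i for i, x in enumerate(xs)]

real_q2 = cv_q2(xs, ys, 5, random.Random(1))
scrambled = y_scramble(xs, ys, runs=100, k=5, seed=2)
n_positive = sum(q > 0 for q in scrambled)  # count of chance "successes"
max_scrambled = max(scrambled)
```

A model is regarded as free of chance correlation when the scrambled q2 values stay far below the q2 of the original data, which is the comparison made in the paragraph above.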
ELE-DRY Model Interpretation.
Interpretation of DISCRIMINATE models can identify the residues relevant for differences in activity and quantify their relative importance. To this aim, the PLS-coefficients (
To analyze the significance of the fields (ELE and DRY) and the contribution for each ligand/residue interaction, the residues were color-coded in Table 10. The residues located in the rim region are colored red, while the residues forming the central channel are blue, and those in proximity to the catalytic Zn ion are black (Supplemental File 2). In
Regarding the importance of the overall interactions, the sums for either the ELE or DRY activity contributions for each training-set complex are shown in
Field ELE.
All residues selected having PLS Coeff. higher than 0.001, except for 398, showed positive values, indicating that all the electrostatic interaction are attractive (
In the outer part of the enzymes, the five selected residues (
Regarding the channel-forming residues, 294 (at the edge between the channel and the bottom of the HDAC-binding sites) displayed the highest values in all three plots of
Most of the ELE-selected residues (18 out of 27) are in the deep part of the channels around the catalytic Zn. Of particular interest are the following residues, which are involved in the HDAC catalytic process and conserved among the 12 isoforms: 253 (His), 254 (His), 292 (Asp), 392 (Asp) and 571 (Zn). In general, the activity contribution associated with these five residues modulates the activity decrement for carboxylate-based zinc-binding groups. As examples, residues 253 (SAHA in HDAC1) and 254 (SAHA in HDAC3, HDAC4 and HDAC6-2; and SBHA in HDAC4 and HDAC8) are associated with a positive activity contribution of about 0.1.
Field DRY.
The DRY field gives a rough estimation of steric interactions. Between ELE and DRY selected residues about 35% of these are shared (12 out of 34) in significance, nevertheless, for the DRY field a totally different and more complicated scenario can be observed on the relative importance of each residue. In general, the most important modulating interaction relates to 401 Leu, replaced by Met in HDAC8 or by Lys in HDAC6-1 (Table 10). Upon deeper inspection (not considering the small-molecule complexes, NABUT, VA and NHB), only 27 of 94 activities are modulated by residue 401 with activity contributions ranging between 0.7 and 2.13 (Supplemental File 1,
Without considering the contribution of residue 401, it is evident from the plot in
Seven out of 10 residues (204, 253, 254, 262, 263, 294 and 442) are related to negative modulating values, while the other three (205, 206 and 323) are positive modulators. Residue 263 (Tyr for HDAC6-1 and Phe for the others) located in the wall of the channel shows the largest range with larger negative values. No specific pattern is detected for residue 263 in modulating regarding the different enzyme classes or inhibitor structures (Supplemental File 1,
Residue 254 (His in the zinc-binding region) is second with the higher StDev value and from
Among the three DRY positive-modulating residues, 323, an aromatic side-chain-bearing residue missing in HDAC1 and HDAC11, shows the highest maximum-activity contribution and larger variability; maximum-activity contributions occur with APHA8 and TSA binding to either class I or class II enzymes (
Analysis of Interactions Contributing to Isoform Selectivity.
Interaction- and activity-contribution analyses suggest that useful insight into structural determinants exists for both HDAC isoforms and their inhibitors to help optimize isoform-specific inhibitors using the derived DISCRIMINATE model. Derivation of rules to guide the structural basis for isoform selectivity required single analysis for each specific isoform model. For nine of the inhibitors used in the training set (Table 4), at least 9 out of 12 isoform-inhibition profiles were available (Table 12, Supplemental File 1).
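For illustration only, the activity-contribution analysis referred to above may be sketched as follows: each per-residue interaction energy is multiplied by its PLS coefficient, the contributions are summed with the model intercept to give the predicted activity, and the per-residue differences between two complexes of the same inhibitor point to the residues that drive isoform selectivity. All numerical values and residue labels below are hypothetical, and the sketch assumes coefficients expressed on the scale of the raw (unscaled) energies.

```python
def activity_contributions(energies, coefs):
    """Per-residue contribution = PLS coefficient * interaction energy."""
    return [c * e for c, e in zip(coefs, energies)]

def predicted_activity(energies, coefs, intercept):
    """Predicted activity = intercept + sum of per-residue contributions."""
    return intercept + sum(activity_contributions(energies, coefs))

def selectivity_drivers(energies_a, energies_b, coefs, labels, top=2):
    """Rank residues by the size of the contribution difference (A minus B)."""
    contrib_a = activity_contributions(energies_a, coefs)
    contrib_b = activity_contributions(energies_b, coefs)
    diffs = [(lab, a - b) for lab, a, b in zip(labels, contrib_a, contrib_b)]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)[:top]

# Hypothetical model and energies for one inhibitor in two isoforms.
labels = ["r1", "r2", "r3", "r4"]
coefs = [-0.05, -0.10, 0.20, -0.02]
intercept = 5.0
energies_iso_a = [-2.0, -3.0, 1.0, -10.0]
energies_iso_b = [-2.0, -1.0, 3.0, -10.0]

pred_a = predicted_activity(energies_iso_a, coefs, intercept)
pred_b = predicted_activity(energies_iso_b, coefs, intercept)
drivers = selectivity_drivers(energies_iso_a, energies_iso_b, coefs, labels)
```

In this toy case the two residues whose energies differ between the isoforms account for the whole difference in predicted activity, while the residues with identical energies drop out of the comparison.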
Supplemental File 3 reports the recalculated activity profiles for each of the nine inhibitors of Table 4, showing the model's sensitivity to HDAC-isoform inhibition by different compounds. To illustrate the DISCRIMINATE model's potential use, two inhibitors were selected to seek potential structural determinants of isoform selectivity. Within the training set, analysis of the activity range indicated MS-275 and SCRIPTAID as good examples. From Supplemental File 1, Table 12, MS-275 and SCRIPTAID display large variability, and from Table 4 MS-275 is partially selective for class I HDACs (particularly for HDAC3, IC50=0.07 μM, and HDAC2, IC50=0.5 μM), while SCRIPTAID is partially selective for class II, displaying sub-micromolar activities against these enzymes.
MS-275. This inhibitor is specifically selective for class I HDAC3 over class IIa HDAC4 and comparison of data belonging to the relative complexes shows how the model helps rationalize the higher activity of MS-275 for HDAC3 versus HDAC4. As shown in
SCRIPTAID.
SCRIPTAID was chosen as a selective class II inhibitor. Similarly to MS-275, the electrostatic interactions differentiated when comparing the activity contributions of HDAC6 and HDAC8 (
Docking Assessment.
X-ray structures of HDAC-inhibitor complexes were used to evaluate the ability of a docking program to predict the correct geometry of a protein-ligand complex (redocking). To this aim, two different docking programs were tested: AutoDock Ver. 4.2 and AutoDockVina Ver. 1.1. Docking results were assessed by the RMSD (root-mean-square deviation) of the predicted ligand configuration from the crystal structure. Tables 13 and 14 show RMSD values obtained with the two programs for the best docked pose (the lowest-energy docked conformation of the first cluster generated), the best cluster pose (the lowest-energy docked conformation of the most populated cluster) and the best fit pose (the lowest-energy conformation of the cluster showing the lowest RMSD value) (Musmuca, I., et al., 2010). In all cases AutoDockVina was found to be more accurate, displaying a docking accuracy (DA) of 75% for the best cluster poses (Tables 13 and 14). AutoDockVina was able to predict the right binding disposition of all ligands with an RMSD<3 Å. From Tables 13 and 14, the best cluster conformation displayed the lowest RMSD values. For subsequent dockings, therefore, only the AutoDockVina program was used, with the best cluster conformation as the first choice. Considering the best fit pose, AutoDockVina proved able to find the correct binding mode with a DA of 100%. Although the best fit pose cannot be used in a prospective docking application, because identifying it requires the experimental structure, it further showed that AutoDockVina is quite good at searching for the right conformation even when its scoring function is not able to select it. For docking, the side-chain flexibility features of AutoDock and AutoDockVina were not used, as in preliminary docking studies the results were always worse than in fixed-receptor dockings.
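The redocking assessment above may be sketched as follows, for illustration only. The RMSD is computed between a docked pose and the crystal pose with atoms in the same order and without re-superposition, since both share the receptor frame. The formula for the docking accuracy is not given in the text; the two-threshold form used here (full credit at or below 2 Å, half credit between 2 and 3 Å) is one definition found in the docking literature and should be read as an assumption, as should the example values.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))

def docking_accuracy(rmsds, tight=2.0, loose=3.0):
    """DA = f(rmsd <= tight) + 0.5 * (f(rmsd <= loose) - f(rmsd <= tight)).

    The thresholds and the half-credit weighting are assumptions of this
    sketch, not values stated in the description.
    """
    n = len(rmsds)
    f_tight = sum(r <= tight for r in rmsds) / n
    f_loose = sum(r <= loose for r in rmsds) / n
    return f_tight + 0.5 * (f_loose - f_tight)

# Hypothetical two-atom ligand displaced by 1 Å along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(docked, crystal)

# Hypothetical RMSD values for four redocked ligands.
example_da = docking_accuracy([1.0, 1.5, 2.5, 2.8])
```

Under this definition, a set in which every pose is within 3 Å but only half are within 2 Å scores 0.75.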
Model Predictivity.
Once the docking protocols were assessed, a cross-docking approach was applied to the MTS, CTS and LTS test sets of inhibitors to prepare the HDAC-x complexes.
Modeled Test Set.
Regarding the MTS, all minimized HDAC structures were used as templates for docking simulations. Thus, each inhibitor of Table 6 was docked into all receptor binding sites, for a total of 304 individual docking simulations. For each isoform, all poses were collected in a bin and the output poses clustered by means of the AutoDock program. AutoDockVina was found to reproduce the experimental binding modes with modest errors (Table 14); in some cases, however, the best cluster conformation was found in a non-active pose (i.e., the zinc-binding group rotated away from the Zn ion). This clearly indicated the limitations of the docking protocol in selecting the correct poses. In these cases, either the best-docked pose or an arbitrarily chosen conformation, selected on the basis of Zn chelation to mimic the binding mode of the most closely related experimentally bound inhibitor, was used. This approach is consistent with the fact that AutoDockVina proved able to find the right binding mode (see comments on the best fit pose in the Docking Assessment section). For the MTS, a total of 76 HDAC-inhibitor complexes were compiled, and the ELE+DRY DISCRIMINATE model was used to predict the inhibitors' activities.
Comparisons of predictions for single HDAC isoforms reveal that complexes of HDAC2 and HDAC3 were the best predicted with an average absolute error of prediction (AAEP) of 0.53 and 0.65, respectively. Complexes related with HDAC7, HDAC9, HDAC10 and HDAC11 showed the highest AAEP values. For HDAC9, HDAC10 and HDAC11, the worst predictions were associated with a lower number of complexes in the training set. In general, the model was able to reproduce the activity of class I HDACs better than class II. Regarding HDAC10 and HDAC11, the smaller amount of experimental data in the training set was the probable cause for the failed activity-trend predictions (
Crystal Test Set.
The CTS was compiled using only experimental bound inhibitors. The usefulness of this test set was two-fold. Firstly, from Table 16, the training-set model-binding conformations were confirmed to be self-consistent with only 2 PCs (
Largazole Test Set. Finally the third test set comprised a cyclotetrapeptide-like inhibitor (largazole) (Cole, K. E., et al., 2011). In this case the model was tested for its predictive ability against a class of inhibitor (peptide-like) totally different from those included in the training set. To some extent, the DISCRIMINATE model was able to recognize the relative potency of largazole for HDAC1, HDAC2 and HDAC6-1; while for HDAC3, the predicted pIC50 was underestimated, indicating that further modeling of this class of inhibitor is needed (Table 17 and
A structure-based 3-D QSAR model using comparative binding-energy analysis that focused on the selectivity of the 11 human zinc-based histone deacetylase isoforms has been developed through a modified protocol called DISCRIMINATE. The derived DISCRIMINATE model shows good statistical coefficients, was predictive for the compounds in the test sets, and robust to cross-validation while omitting multiple data. The model was able to rationalize the different activity profiles of the HDAC inhibitors studied. This model provides a useful tool for the a priori prediction of activity of compounds yet to be synthesized in order to improve their selectivity profiles. The role of dynamic acetylation in epigenetics and other signaling pathways (Choudhary, C., et al., 2009) provides strong motivation for the development of molecular scalpels, specific inhibitors of histone deacetylases, to dissect the complexities of epigenetic control of gene expression and other signaling pathways. The DISCRIMINATE model would prove useful in this endeavor.
Molecular Modeling, DISCRIMINATE, and Docking Calculations.
All molecular modeling calculations were performed on a six-blade cluster (8 Intel Xeon E5520 2.27 GHz CPUs and 24 GB DDR3 RAM per blade; 48 CPUs in total) running the Debian GNU/Linux 5.03 operating system. The experimental activities of EFV and NVP reported by Rotili et al. (Rotili, D., et al., 2012) were measured according to previous studies (Cancio, R., et al., 2007, Samuele, A., et al., 2008). To build the non-experimental complexes, the cross-docking procedure previously described (Musmuca, I., et al., 2010) was carried out with the AutodockVina program. Docking was assessed for both Autodock 4.2 and AutodockVina 1.1; root-mean-square deviation (RMSD) errors are reported in Table 18.
All complexes were arbitrarily superimposed using 1vrt as the template because of its superior crystallographic resolution (R=2.2 Å). The superimpositions of the RT complexes were made with Chimera (Pettersen, E. F., et al., 2004) using the command-line implementation of MatchMaker (Meng, E. C., et al., 2006). Prior to any minimization, all crystal waters were discarded following a procedure already described (Mai, A., et al., 2001, Quaglia, M., et al., 2001, Ragno, R., et al., 2004) and hydrogen atoms were added using the tleap module of the AMBER suite (Case, D. A., et al., 2005). The protonation states at pH 7.4 were considered, i.e., lysines, arginines, aspartates, and glutamates were assumed to be in the ionized form, and parameters were calculated by means of the Antechamber module of AMBER. The complexes were solvated (SOLVATEOCT command) in a box extending 10 Å with water molecules (TIP3 model) and neutralized with Na+ and Cl− ions. The solvated complexes were then refined by a single-point minimization using the Sander module of AMBER. The minimized complexes were realigned with MatchMaker using the same reference complex, separated into ligands (key) and proteins (lock) while maintaining the coordinates (experimental alignments), and used as obtained for the energy deconvolution to develop the DISCRIMINATE models. Using Autogrid4 (Morris, G. M., et al., 2009), three contributing energy fields were calculated: the electrostatic (ELE), the steric (STE) and the desolvation (DRY). Because RT is composed of 1000 residues, 1000 COMBINE descriptors were calculated for each field. Seven combinations of the fields were set up (ELE, STE, DRY, ELE+STE, ELE+DRY, STE+DRY and ELE+STE+DRY). By means of the PLS algorithm as implemented in R (Mevik, B-H., et al., 2007), an in-house script was adapted to carry out all the statistical calculations and cross-validations (Table 19).
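The seven field combinations listed above, and one way of assembling the corresponding descriptor matrices, may be sketched as follows for illustration only. The sketch assumes that a multi-field model simply places the per-residue descriptor blocks of the chosen fields side by side for each complex; that concatenation, the function names and the toy values (three residues instead of 1000) are assumptions of the example.

```python
from itertools import combinations

FIELDS = ("ELE", "STE", "DRY")

def field_combinations(fields=FIELDS):
    """All non-empty subsets of the fields, single fields first."""
    combos = []
    for size in range(1, len(fields) + 1):
        combos.extend(combinations(fields, size))
    return combos

def build_matrix(per_field, combo):
    """Concatenate the per-residue rows of the chosen fields, per complex.

    `per_field[name][i]` is the descriptor row of complex i for that field.
    """
    n_complexes = len(per_field[combo[0]])
    return [
        [value for name in combo for value in per_field[name][i]]
        for i in range(n_complexes)
    ]

# Toy descriptors: two complexes, three residues per field (hypothetical).
per_field = {
    "ELE": [[-1.0, -0.5, 0.0], [-0.8, -0.2, 0.1]],
    "STE": [[-0.3, -0.1, -0.2], [-0.4, 0.0, -0.1]],
    "DRY": [[0.2, 0.1, 0.0], [0.3, 0.0, 0.1]],
}

names = ["+".join(combo) for combo in field_combinations()]
x_ele_dry = build_matrix(per_field, ("ELE", "DRY"))
```

With 1000 residues per field, a two-field model built this way would carry 2000 descriptors per complex and the three-field model 3000.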
DISCRIMINATE Model.
To build the DISCRIMINATE model, training set selection was driven by both the availability of co-crystal structures and homogeneous inhibition data from the Mai lab. From a literature search, 14 complexes (characterized by 7 different HIV-RT wild-type and mutant enzymes) were selected as a training set using complexes with only two HIV-RT inhibitors, NVP and EFV, for which inhibition constants were available as previously tested by our collaborators. (Musmuca, I., et al., 2010).
As reported in Table 20, the training set was composed of NVP and EFV in complex with seven different HIV-RT enzymes (WT, L100I, K103N, V106A, V179D, Y181I, Y188L). Of the 14 complexes, structural data were experimentally available from the PDB for only five: WT/EFV, 1fk9 (Ren, J., et al., 2000); K103N/EFV, 1fko (Id.); WT/NVP, 1vrt (Ren, J., et al., 1995); L100I/NVP, 1s1u (Ren, J., et al., 2004); and K103N/NVP, 1fkp (Ren, J., et al., 2000). The other nine complexes (L100I/EFV, V106A/NVP, V106A/EFV, V179D/NVP, V179D/EFV, Y181I/NVP, Y181I/EFV, Y188L/NVP and Y188L/EFV) were directly modeled using side-chain structural information retrieved from other complexes present in the PDB and using the BUILD module of UCSF Chimera.
Unlike the original COMBINE protocol, DISCRIMINATE used the Autogrid module of the AutoDock 4 suite (Morris, G. M., et al., 2009) to compute the interaction energies between the inhibitors and each amino-acid residue of the enzyme in a complex. The ligand/residue energy deconvolution matrix was obtained directly by summing the interaction energies between all ligand atoms and the atoms composing each amino-acid residue in HIV-RT. The complexes were optimized by a short energy minimization followed by docking experiments conducted with AutoDockVina (Trott, O., et al., 2010). From the Autogrid application, three kinds of interaction contributions were calculated: the steric (STE), the electrostatic (ELE) and the desolvation (DRY) ones. HIV-1 RT is a heterodimer with one subunit of 560 residues (p66) and a second subunit of 440 residues (p51). Therefore, for each contribution, a total of 1000 interactions were computed and modeled using the PLS algorithm implemented in the R (R-Development-Core-Team. The R Foundation for Statistical Computing. http://www.r-project.org) environment. Considering all possible combinations of contributions, seven different DISCRIMINATE models were independently derived (CM1-CM7, Table 2). From the data reported in Table 19, all seven DISCRIMINATE models were highly robust and endowed with good predictive power. Among the seven models, CM1 and CM4 (
As discussed by Gago et al. (Perez, C., et al., 1998, Rodriguez-Barrios, F., et al., 2004) and common to other 3-D QSAR studies (Ballante, F., et al., 2012, Baroni, M., et al., 1993). COMBINE-like models have to be analyzed by means of PLS coefficients and activity contribution (interaction energies multiplied by the PLS coefficients) plots. While PLS coefficients indicated which residues contributed most to the COMBINE relationships (general indication), the activity contributions provided the real pKi contribution for each inhibitor/residue pair to the enhancement or decrease of the given inhibitor activity starting from a constant threshold value (intercept). Further indications of significance can be inferred from the PLS coefficients weighted by the standard deviation values (PLS*StDev) to give the overall importance of each amino-acid residue in the DISCRIMINATE model. In
Regarding the desolvation energy (DRY), from
DISCRIMINATE Predictions.
The reported DISCRIMINATE model CM4 was used to rationalize the role of mutation on the activity profile of (R)- and (S)-MC1501, and of (R)- and (S)-MC2082 reported by Rotili et al. (Rotili, D., et al., 2012). The binding modes of the four DABO derivatives (
Once the binding modes of the MC compounds were calculated, the DISCRIMINATE model CM4 was readily applied. As reported in Table 21, the DISCRIMINATE model, although developed on different classes of compounds, predicted the experimental MC activities with an acceptable average absolute error of prediction (0.89 pKi). The percentage prediction error of the CM4 model ranged from 0.9% to 61.6%, with an average error of 14.3%; these values are comparable to those experimentally reported by Rotili et al. (Rotili, D., et al., 2012), which were 1.5%, 37.5% and 16.2%, respectively.
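The two error measures quoted above may be sketched as follows, for illustration only. The average absolute error of prediction is taken as the mean of the absolute differences between experimental and predicted activities, and the percentage error is expressed relative to the experimental value; the latter normalization is an assumption of the sketch, and the pKi values are hypothetical rather than taken from Table 21.

```python
def aaep(experimental, predicted):
    """Average absolute error of prediction, in activity (e.g., pKi) units."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(e - p) for e, p in zip(experimental, predicted)]
    return sum(errors) / len(errors)

def percent_errors(experimental, predicted):
    """Absolute error as a percentage of the experimental value (assumed)."""
    return [100.0 * abs(e - p) / abs(e) for e, p in zip(experimental, predicted)]

# Hypothetical experimental and predicted pKi values for three compounds.
exp_pki = [6.0, 7.0, 8.0]
pred_pki = [6.3, 6.5, 8.4]

mean_abs_error = aaep(exp_pki, pred_pki)
pct = percent_errors(exp_pki, pred_pki)
pct_min, pct_max, pct_mean = min(pct), max(pct), sum(pct) / len(pct)
```

Reporting the minimum, maximum and mean percentage error together, as in the paragraph above, shows both the typical error and the worst case.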
Most notably, the model was able to correctly predict the right eudismic ratio for the two R/S pairs of MC derivatives.
The DISCRIMINATE model CM4 application to the external set (MC compounds) gave further information from the interpretation of the calculated activity contributions (
Comparing the activity contributions of the R- and S-enantiomers of MC1501, the hydrophobic effect of residue Lys101 becomes negligible, while that of Trp229 becomes more appreciable, with an average contribution of 0.24 pKi units. In comparison, the Lys101-related steric contribution is more than doubled (see Tables 5 and 6). In the case of the MC2082 R- and S-enantiomers, the Lys101 activity contribution is reduced by only 32% (0.17), that of Trp229 increases to 0.16 and the Lys101 steric contribution rises more than fivefold (1.05).
Single-point mutations from model CM4 residue 188 demonstrated a key role in modulating the interactions of the ligands both in its wild type (Tyr188) and in the Leu188 mutation. Interestingly, for another mutating residue, residue 188 seems to offset any loss of interaction as a result of the residue mutation itself, more remarkably in the case of the more active compounds (R)-MC1501 and (R)-MC2082. Comparing the activity contribution profile of (R)-MC2082 docked into wild type HIV-RT and in the V106A mutated form, the only values changing drastically are those associated with Tyr188. A possible explanation for this could be that the incoming missing interactions for the (R)-MC2082/Val106→(R)-MC2082/Ala106 replacement are readily filled by the augmented (R)-MC2082/Tyr188 interactions (compare Tyr188 positions in
Finally, Tables 22 and 23 clearly demonstrated that most of the mutations contribute to force the ligands to re-adapt their interaction network mainly around the two non-mutating Lys101 and Trp229 residues, supplying in this way hydrogen bond and hydrophobic anchor points with which the ligands interact upon complex formation.
Residual table values (row and column labels not recoverable): 0.20, 0.01, −0.07, 0.33, 0.10, 0.09, −0.08, 0.34, 0.43, 0.20, 0.01, −0.01, −0.08, 0.34, 0.19, 0.09, −0.01, −0.08, 0.33, 1.05
The DISCRIMINATE approach integrates multiple sources of SAR information to build a self-consistent model of the amino acid residues in both wild-type and mutant enzymes responsible for molecular recognition and discrimination. As with all such underdetermined 3-D QSAR models, predictability is the only real means of selecting one model over another. This study on HIV-RT used a minimal set of inhibitor complexes to extract possible models for HIV-RT variants that rationalize the experimentally observed inhibitory activity of a novel set of compounds described by Rotili et al. including the relative activity of two different sets of stereoisomers. Obviously, prediction of novel inhibitors and their activities against HIV-RT is a logical next step to validate the utility of the DISCRIMINATE approach.
All documents cited in this application are hereby incorporated by reference as if recited in full herein.
Although illustrative embodiments of the present invention have been described herein, it should be understood that the invention is not limited to those described, and that various other changes or modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US14/44805 | 6/30/2014 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61842191 | Jul 2013 | US |