Diet can be a key environmental factor that affects human health. While poor diet can significantly increase the risk for coronary heart disease (CHD) and diabetes, a healthy diet can play a protective role, even mitigating genetic risk of CHD. Polyphenols are a class of compounds that can play a protective role for a wide range of diseases, from cancer to diabetes mellitus, as well as for cardiovascular and neurodegenerative diseases. Polyphenols can act as antioxidants and are present in plant-based foods, such as fruits, vegetables, herbs, spices, teas, and wine. Polyphenols are characterized by multiples of phenolic or hydroxy-phenolic structural features, and most contain repeating phenolic moieties of resorcinol, pyrocatechol, pyrogallol, and phloroglucinol linked by ester or carbon-carbon bonds. Recent efforts profiling over 500 polyphenols in more than 400 foods have documented the high diversity of polyphenols humans are exposed to through their diet, ranging from flavonoids to phenolic acids, lignans, and stilbenes.
While polyphenols, as one example of a class of chemical compounds that can affect human health, are generally known to have healthful effects, the underlying molecular mechanisms through which specific polyphenols exert their function, as well as their associations with particular diseases, remain largely unexplored.
Systems and methods are described that can be used as tools in providing for the identification of diseases affected by a given chemical or class of chemicals, such as polyphenols. The systems and methods described can provide for mechanistic insight as to the molecular pathways responsible for the health implications of a chemical.
A method of identifying a disease associated with a therapeutic chemical includes generating a candidate disease list based on proximities of proteins associated with a plurality of diseases and proteins associated with a therapeutic chemical in a protein-protein interaction network. The method further includes applying gene expression information associated with the therapeutic chemical to generate enrichment scores for diseases of the candidate disease list and identifying at least one disease associated with the therapeutic chemical based on the determined enrichment scores.
A method of filtering data in a protein-protein interaction network includes mapping proteins associated with a plurality of diseases and proteins associated with a therapeutic chemical. The method further includes determining proximities of proteins associated with the plurality of diseases and proteins associated with the therapeutic chemical. An enrichment score is generated for each of the plurality of diseases based on gene expression information associated with the therapeutic chemical. A reduced dataset of proteins within the protein-protein interaction network is generated, the reduced dataset of proteins being proteins associated with a subset of the plurality of diseases based on the determined proximities and the determined enrichment scores. The subset of diseases can be a candidate disease list.
Generating a candidate disease list can include generating a proximity value for a disease and the therapeutic chemical. Determining proximities, or determining a proximity value, can be based on shortest path lengths between nodes representing proteins associated with the disease and nodes representing proteins associated with the therapeutic chemical in the protein-protein interaction network. The proximity value can be a distance metric, such as dc(S,T) as given by the following:
$$d_c(S,T) = \frac{1}{|T|} \sum_{t \in T} \min_{s \in S} d(s,t)$$
where S is a set of proteins associated with the disease, T is a set of proteins associated with the therapeutic chemical, s is a node representing a protein in set S, t is a node representing a protein in set T, and d(s,t) is a shortest path length between nodes s and t in the protein network.
Generating an enrichment score can include measuring an extent of gene expression perturbation by the therapeutic chemical for a disease, such as, for example, by performing a Gene Set Enrichment Analysis.
The methods can further include ranking the diseases of the candidate disease list based on the determined proximity and the determined enrichment scores. The protein-protein interaction network can be a human interactome. The proteins associated with a therapeutic chemical can be proteins to which the therapeutic chemical binds. For example, the therapeutic chemical can be a polyphenol and the proteins associated with the therapeutic chemical can be binding targets of the polyphenol.
A method of treating a subject having a disease includes administering a therapeutic chemical, wherein the disease is a disease identified by any of the methods described above as being associated with the therapeutic chemical.
A system for identifying a disease associated with a therapeutic chemical includes a processor configured to generate a candidate disease list based on proximities of proteins associated with a plurality of diseases and proteins associated with a therapeutic chemical in a protein-protein interaction network. The processor is further configured to apply gene expression information associated with the therapeutic chemical to generate enrichment scores for diseases of the candidate disease list and to identify at least one disease associated with the therapeutic chemical based on the determined enrichment scores.
A system for filtering data in a protein-protein interaction network includes a processor configured to map proteins associated with a plurality of diseases and proteins associated with a therapeutic chemical and determine proximities of proteins associated with the plurality of diseases and proteins associated with the therapeutic chemical. The processor is further configured to generate an enrichment score for each of the plurality of diseases based on gene expression information associated with the therapeutic chemical and to generate a reduced dataset of proteins within the protein-protein interaction network, the reduced dataset of proteins being proteins associated with a subset of the plurality of diseases based on the determined proximities and the determined enrichment scores.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The foregoing will be apparent from the following more particular description of example embodiments, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments.
A description of example embodiments follows.
Systems and methods are presented for identifying diseases whose proteins are candidates to show gene expression perturbation under a treatment with a given chemical compound. The systems and methods presented herein can function as a filter in a protein-protein interaction network, such as the human interactome, to reduce proteins present in the network to a subset of proteins associated with a chemical compound and a disease.
An example of a filter 100 that can be applied to a protein-protein interaction network 102 is shown in
An example of a method 200 for identifying a disease associated with a therapeutic chemical is shown in
Example methods and systems for identifying a disease cluster within a protein network are described in WO2015/084461, the entire contents of which are incorporated herein by reference. Disease clusters identified within a network can be used to generate candidate disease lists. Examples of disease clusters within a network are described in the examples that follow and are shown, for example, in
The chemical compound can be any chemical, including, for example, natural and food-borne chemical compounds; therapeutic chemicals, such as polyphenols, synthetic drugs, and nutraceuticals; and nontherapeutic chemicals, such as toxins and general phytochemicals present in food. In the examples that follow, polyphenols are described for illustration purposes only.
The protein-protein interaction network can be, for example, the human interactome, which includes a map of protein interactions in the human cell. Other protein-protein interaction networks can be used, such as, for example, networks from STRINGDB and GeneMania databases.
In the systems and methods shown in
As further described in the examples that follow, generating the candidate disease list can include generating a proximity value for a disease and the therapeutic chemical. Proximity between a disease and a chemical can be evaluated using a distance metric that takes into account path lengths between chemical targets and disease proteins within the network. For example, the proximity value can be determined based on shortest path lengths between nodes representing proteins associated with the disease and nodes representing proteins associated with the therapeutic chemical. The proximity value can be a distance metric dc(S,T) determined according to:
$$d_c(S,T) = \frac{1}{|T|} \sum_{t \in T} \min_{s \in S} d(s,t)$$
where S is a set of proteins associated with the disease, T is a set of proteins associated with the therapeutic chemical, s is a node representing a protein in set S, t is a node representing a protein in set T, and d(s,t) is a shortest path length between nodes s and t in the protein network.
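As an illustration only, the closest-distance form of the proximity value (the average, over chemical targets, of the shortest path length to the nearest disease protein) can be sketched in Python. The toy network, the protein names, and the function names below are hypothetical, and skipping targets that cannot reach any disease protein is an assumption of this sketch rather than part of the described method.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_distance(adj, disease_proteins, chemical_targets):
    """Average over targets t of the minimum over disease proteins s of d(s, t)."""
    total, counted = 0, 0
    for t in chemical_targets:
        dist = shortest_path_lengths(adj, t)
        reachable = [dist[s] for s in disease_proteins if s in dist]
        if reachable:  # assumption: unreachable targets are skipped
            total += min(reachable)
            counted += 1
    return total / counted if counted else float("inf")


# Hypothetical toy network: a simple chain A - B - C - D - E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Target B is itself a disease protein (distance 0); target D is two hops from B.
d_c = closest_distance(adj, disease_proteins={"A", "B"}, chemical_targets={"B", "D"})
print(d_c)
```

A target that is itself a disease protein contributes a distance of zero, which is why a chemical whose only target is a disease protein has a proximity value of zero.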
To assess significance of a distance between a chemical and a disease (S,T), a reference distance distribution corresponding to expected distances between two randomly selected groups of proteins matching size and degrees of the original disease proteins and chemical targets in the network can be used. For example, a reference distance distribution can be generated by calculating a proximity between two randomly selected groups, and this procedure can be repeated several (e.g., 100, 500, 1000, 2000) times. The mean and standard deviation of the reference distribution can be used to convert the absolute distance to a relative distance (Z-score). Due to the scale-free nature of the human interactome, there are few nodes with high degrees. To avoid repeatedly choosing the same (high degree) nodes, a degree-preserving random selection can be performed.
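The conversion from an absolute distance to a Z-score can be sketched as follows. This is a minimal sketch under stated assumptions: the degree-binning scheme (grouping nodes of similar degree so that each bin holds at least `min_bin_size` nodes), the function names, and the toy network are all hypothetical, and the network is assumed to be connected.

```python
import random
import statistics
from collections import deque


def closest_distance(adj, disease, targets):
    """Mean, over targets, of the hop distance to the nearest disease protein."""
    nearest = []
    for t in sorted(targets):
        dist, queue = {t: 0}, deque([t])
        while queue:  # breadth-first search from the target
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        nearest.append(min(dist[s] for s in disease if s in dist))
    return sum(nearest) / len(nearest)


def degree_bins(adj, min_bin_size=2):
    """Group nodes of similar degree; each bin holds at least min_bin_size nodes."""
    by_degree = {}
    for node, nbrs in adj.items():
        by_degree.setdefault(len(nbrs), []).append(node)
    bins, current = [], []
    for degree in sorted(by_degree):
        current.extend(sorted(by_degree[degree]))
        if len(current) >= min_bin_size:
            bins.append(current)
            current = []
    if current:  # leftover high-degree nodes join the last bin
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return bins


def sample_degree_matched(nodes, node_to_bin, rng):
    """Replace each node with a random node drawn from the same degree bin."""
    sample = set()
    for n in sorted(nodes):
        pool = [c for c in node_to_bin[n] if c not in sample] or node_to_bin[n]
        sample.add(rng.choice(pool))
    return sample


def proximity_z_score(adj, disease, targets, n_random=1000, seed=0):
    rng = random.Random(seed)
    node_to_bin = {n: b for b in degree_bins(adj) for n in b}
    observed = closest_distance(adj, disease, targets)
    reference = [
        closest_distance(
            adj,
            sample_degree_matched(disease, node_to_bin, rng),
            sample_degree_matched(targets, node_to_bin, rng),
        )
        for _ in range(n_random)
    ]
    mu = statistics.mean(reference)
    sigma = statistics.pstdev(reference)
    z = (observed - mu) / sigma if sigma > 0 else 0.0
    return observed, z


# Hypothetical toy network with one hub (H) and several low-degree nodes.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "E"), ("B", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

observed, z = proximity_z_score(adj, {"E", "C"}, {"F"}, n_random=200)
print(observed, z)
```

Binning by degree keeps the random groups comparable to the original groups while preventing the few high-degree nodes from being drawn repeatedly.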
As further described in the examples that follow, generating an enrichment score for diseases of a candidate disease list can include measuring an extent of gene expression perturbation by the therapeutic chemical for a given disease. This can include performing a Gene Set Enrichment Analysis. For example, perturbation signatures can be obtained, such as from the ConnectivityMap database (https://clue.io/), for cell lines treated with different chemicals. These signatures reflect the perturbation of the gene expression profile caused by treatment with a chemical under consideration relative to a reference population, which is composed of other treatments in the same experimental plate. For chemicals having more than one experimental instance (e.g., time of exposure, cell line, dose), the one with the highest distil_cc_q75 value (i.e., 75th quantile of pairwise Spearman correlations in landmark genes) can be selected. Gene Set Enrichment Analysis can then be performed to evaluate the enrichment of disease genes among the top deregulated genes in the perturbation profiles. This analysis results in an Enrichment Score (ES) that has small values when genes are randomly distributed among the ordered list of expression values and high values when genes are concentrated at the top or bottom of the list. Methods of performing an Enrichment Analysis are further described in Subramanian, A. et al. “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.” Proc. Natl. Acad. Sci. U. S. A. 102, 15545-50 (2005), the entire contents of which are incorporated herein by reference.
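The running-sum statistic behind an Enrichment Score can be sketched as below. This is a simplified illustration in the spirit of the weighted running sum of Subramanian et al., not a reproduction of any particular GSEA software. The gene names, scores, and the function name `enrichment_score` are hypothetical.

```python
def enrichment_score(gene_scores, gene_set, weight=1.0):
    """Maximum signed deviation from zero of a weighted running sum.

    gene_scores maps gene -> perturbation score (e.g., a z-score).
    Genes are ranked from most up-regulated to most down-regulated.
    """
    ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
    hits = [g in gene_set for g in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(gene_scores[g]) ** weight for g, h in zip(ranked, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(gene_scores[gene]) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical perturbation scores for five genes.
scores = {"g1": 3.0, "g2": 2.0, "g3": 0.5, "g4": -1.0, "g5": -2.5}
es_top = enrichment_score(scores, {"g1", "g2"})     # set concentrated at the top
es_bottom = enrichment_score(scores, {"g4", "g5"})  # set concentrated at the bottom
print(es_top, es_bottom)
```

In this sketch a gene set concentrated at the top of the ranked list gives a large positive score and one concentrated at the bottom gives a large negative score, so the magnitude of the score reflects how strongly the disease genes are perturbed.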
An ES significance can be calculated by creating, for example, 1000 random selections of gene sets with the same size as the original gene set and calculating an empirical p-value by considering a proportion of random sets resulting in ES smaller than the original case. The p-value can be adjusted for multiple testing by using the Benjamini-Hochberg method.
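The multiple-testing adjustment can be sketched as follows. The example covers only the Benjamini-Hochberg step and takes the empirical p-values as given. The p-values and the function name are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical empirical p-values, one per disease.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
print(adjusted)
```

Each adjusted value is the smallest false discovery rate at which the corresponding disease would still be called significant.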
With the proximity values and enrichment scores, the diseases of the candidate disease list can be ranked to provide the CDPR. For example, the ranking can prioritize chemicals by therapeutic potential. The chemicals with greatest therapeutic potential can be defined as those that are proximal to disease proteins and significantly perturb expression of disease genes. The CDPR can advantageously provide for prioritization of a set of chemicals in respect to a disease, or a set of diseases in respect to a chemical, for further evaluation. The CDPR can also provide for a quantitative and molecular-based description of a relationship between chemical compound targets and disease processes, which can in-turn provide for mechanism-of-action information for the chemical compounds.
Conventional methods of evaluating chemical-disease relations involve evaluation of structural properties of chemical compounds. The methods and systems described can advantageously omit such analysis by accounting for how a chemical interacts with various proteins and how those proteins interact with each other and with associated disease processes through the protein-protein interaction network. The methods and systems described do not require knowledge of the specific type of interactions (e.g., activation, inhibition) between a chemical and its protein targets.
In the case of polyphenols, or other food-borne chemicals, the systems and methods described can advantageously provide for the identification of health effects related to chemical compounds present in foods. For example, and as described in the Example sections that follow, from a CDPR, rosmarinic acid (RA) was shown to have an association with vascular diseases and was predicted to have a direct impact on platelet function. With this information, RA was further evaluated, and experimental evidence demonstrated that RA inhibits platelet aggregation and alpha granule secretion, thereby providing valuable information about foods that may benefit individuals with poor cardiovascular health.
The systems and methods described can advantageously provide for identification of chemical compounds that can be potentially used for disease treatment, identification of health effects related to chemical compounds, such as those present in foods, and streamlining of research by prioritizing chemicals demonstrated to show bioactivity. This methodology can be coupled with technologies such as CRISPR-Cas9 to genetically change life forms (e.g., plants and their seeds) for greater production of chemical compounds with beneficial health effects.
In particular, embodiments of the present invention execute processor routines for the filter 100 and method 200 of
In alternative embodiments, the propagated signal is an analog carrier wave or digital signal carried on the propagated medium. For example, the propagated signal may be a digitized signal propagated over a global network (e.g., the Internet), a telecommunications network, or other network. In one embodiment, the propagated signal is a signal that is transmitted over the propagation medium over a period of time, such as the instructions for a software application sent in packets over a network over a period of milliseconds, seconds, minutes, or longer. In another embodiment, the computer readable medium of computer program product 92 is a propagation medium that the computer system 50 may receive and read, such as by receiving the propagation medium and identifying a propagated signal embodied in the propagation medium, as described above for computer program propagated signal product.
Generally speaking, the term “carrier medium” or transient carrier encompasses the foregoing transient signals, propagated signals, propagated medium, other mediums and the like.
In other embodiments, the computer program product 92 provides Software as a Service (SaaS) or similar operating platform.
Alternative embodiments can include or employ clusters of computers, parallel processors, or other forms of parallel processing, effectively leading to improved performance, for example, of generating a computational model. Given the foregoing description, one of ordinary skill in the art understands that different portions of processor routine 100 and different iterations operating on respective sequence reads may be executed in parallel on such computer clusters or parallel processors.
Despite the widespread evidence of the positive role of polyphenols on human health, the underlying molecular mechanisms through which specific polyphenols exert their function remain largely unexplored. From a mechanistic perspective their role is rather special because dietary polyphenols are not processed by the endogenous metabolic processes of anabolism and catabolism. Rather, dietary polyphenols impact human health through their anti- or pro-oxidant activity, by binding to proteins and modulating the activity of key cellular signaling and metabolic pathways, interacting with digestive enzymes, and modulating gut microbiota growth. Yet, the variety of experimental settings used so far to explore the molecular effects of polyphenols (represented by different concentrations, administration routes, model organisms, populations, and evaluated outcomes) has, to date, offered a range of often conflicting evidence for interpretation. For example, different clinical trials resulted in contrasting conclusions about the beneficial effects of resveratrol on glycemic control of type 2 diabetes patients. Therefore, there is a need for a framework to interpret the evidence present in the literature, and to offer in-depth mechanistic predictions on the molecular pathways responsible for the health implications of polyphenols present in diet. These insights can aid in the development of novel diagnostic and therapeutic strategies, and may lead to the synthesis of novel drugs.
A network medicine framework was developed to capture the molecular interactions between polyphenols and their cellular binding targets, unveiling their relationship to complex diseases. The developed framework is based on the human interactome, a comprehensive network of all known physical interactions between human proteins, which has been validated before as a platform for understanding disease mechanisms, rational drug target identification, and drug repurposing.
First, it was found that the proteins to which polyphenols bind form identifiable neighborhoods in the human interactome. It was then demonstrated that the proximity between polyphenol targets and proteins associated with specific diseases is predictive of the known therapeutic effects of polyphenols. Finally, the potential therapeutic effects of rosmarinic acid on vascular diseases were unveiled, with a prediction that the effects are related to modulation of platelet function. This prediction was confirmed by experiments demonstrating that rosmarinic acid modulates platelet function in vitro by inhibiting tyrosine protein phosphorylation. Altogether, the results demonstrate that the network-based relationship between disease proteins and polyphenol targets offers a tool to systematically unveil the health effects of polyphenols.
The methodology described can provide for the foundation of mechanistic interpretation of alternative pathways through which polyphenols can affect health: e.g., the combined effect of different polyphenols and their interaction with drugs. Furthermore, the methodology described can be applied to other food-related chemicals, providing a framework to understand their health effects.
The study started with a list of 759 polyphenols catalogued in the PhenolExplorer database, of which 387 were only detected in foods, 251 were only detected in biofluids, and 121 are present in both foods and biofluids (
To identify the cellular processes potentially affected by specific polyphenol molecules, the polyphenol targets were mapped to the human interactome, consisting of 17,651 proteins and 351,393 interactions (
It was next asked whether the polyphenol targets cluster in specific regions of the human interactome. The focus was on polyphenols with more than two targets (n=46,
Taken together, these results indicate that the targets of polyphenols modulate specific well localized neighborhoods of the interactome (
Polyphenols act like drugs: they bind to specific proteins, affecting their ability to perform their normal functions. The closer the targets of a polyphenol are to disease proteins, the more likely that the polyphenol will affect the disease phenotype, resulting in detectable therapeutic effects on the disease. The network proximity between polyphenol targets and proteins associated with 299 diseases was calculated using the closest measure, dc, representing the average shortest path length between each polyphenol target and the nearest disease protein. Consider for example (−)-epigallocatechin 3-O-gallate (EGCG), a polyphenol abundant in green tea. Epidemiological studies have found a positive relationship between green tea consumption and reduced risk of type 2 diabetes mellitus (T2D), and physiological and biochemical studies have shown that EGCG has glucose-lowering effects in both in vitro and in vivo models. Fifty-four experimentally validated EGCG protein targets were identified and mapped to the interactome, and it was found that the EGCG targets form an LCC of 17 proteins (Z=7.61) (
These methods were expanded to all polyphenol-disease pairs, with the goal of predicting diseases for which specific polyphenols might have therapeutic effects. For this, all possible 19,435 polyphenol-disease associations between 65 polyphenols and 299 diseases were grouped into known (1,525) and unknown (17,910) associations. The known polyphenol-disease set was retrieved from CTD, limiting to manually curated associations for which there is literature-based evidence. For each polyphenol, how well network proximity discriminates between the known and unknown sets was tested by evaluating the area under the Receiver Operating Characteristic (ROC) curve (AUC). For EGCG, network proximity offers a good discriminative power (AUC=0.78, CI: 0.70-0.86) between diseases with known and unknown therapeutic associations (Table 1). It was found that network proximity (dc) offers predictive power with an AUC >0.7 for 31 polyphenols (
Finally, multiple robustness checks were performed to rule out the role of potential biases in the input data. To test if the predictions are biased by the set of known associations retrieved from CTD, 100 papers were randomly selected from PubMed containing MeSH terms that tag EGCG to diseases. The evidence was manually curated for EGCG's therapeutic effects for the diseases discussed in the published papers, excluding reviews and non-English language publications. The dataset was processed to include implicit associations, resulting in a total of 113 diseases associated with EGCG, of which 58 overlap with the associations reported by CTD (
To validate the predicted polyphenol-disease associations, expression perturbation signatures were retrieved from the Connectivity Map database for the treatment of the breast cancer MCF7 cell line with 22 polyphenols. The database assigns each gene a z-score capturing the extent to which its expression is perturbed by a given polyphenol. The relationship between the extent to which polyphenols perturb the expression of disease genes, the network proximity between the polyphenol targets and disease proteins, and their known therapeutic effects was investigated (
Network proximity can also be predictive of the overall gene expression perturbation caused by a polyphenol on the genes of a given disease. To test this, in each experimental combination defined by the polyphenol type and its concentration, the maximum perturbation score among genes for each disease was evaluated. The magnitude of the observed perturbation was compared between diseases that were proximal (dc<25th percentile, Zdc<−0.5) or distal (dc>75th percentile, Zdc>−0.5) to the polyphenol targets.
Taken together, these results indicate that network proximity offers a mechanistic interpretation for the gene expression perturbations induced by polyphenols, being also predictive of whether these perturbations result in therapeutic effects.
How the network-based framework can facilitate the mechanistic interpretation of the therapeutic effects of selected polyphenols was demonstrated, with a focus on Vascular Diseases (VD). Out of 65 polyphenols evaluated in this study, 27 were found to have associations to VD, as their targets were hitting the VD network neighborhood (Table 3). The targets of 15 out of the 27 polyphenols with 10 or fewer targets were inspected, as experimentally validating the mechanism of action among the interactions of more than 10 targets would introduce complexities beyond the scope of this study. The network analysis identified direct links between biological processes related to vascular health and the targets of three polyphenols, gallic acid, rosmarinic acid, and 1,4-naphthoquinone (
Gallic Acid: Gallic acid has a single human protein target, SERPINE1, which is also a VD-associated protein, resulting in d_c=0 and Z_dc=−3.02. SERPINE1 is involved in the regulation of blood clot dissolution and regulation of cell adhesion and spreading by modulating the proteins PLAT and PLAU, respectively. An inspection of the LCC formed by VD proteins also revealed that SERPINE1 directly interacts with the VD proteins PLG, LRP1, and F2 (
1,4-Naphthoquinone: 1,4-naphthoquinone targets four proteins, MAP2K1, MAOA, CDC25B and IDO1, which are proximal to VD-associated proteins (d_c=1.25, Z_dc=−1.51) (
Rosmarinic Acid: Rosmarinic acid (RA) can bind to three human proteins, FYN, MCL1, and AKR1B1, offering a statistically significant proximity to VD genes (d_c=1.00, Z_dc=−1.38). The analysis of the RA target FYN and three of its seven direct neighbors in the VD module (CD36, APP, and PRKCH) suggests a role for this polyphenol in the function of platelets, cells specialized in blood clot formation and involved in abnormal clotting that can lead to heart attacks and stroke. FYN also directly interacts with NFE2L2 (also known as NRF2), a transcription factor that regulates the expression of several genes with anti-oxidant properties. Using RA perturbation profiles from the Connectivity Map database, it was observed that two cell lines (A549, MCF7) showed higher perturbation scores for genes that are directly regulated by NFE2L2 after treatment with RA. Indeed, recent reports show that mice lacking FYN have reduced platelet activity and that RA's protective effects on vascular calcification and on aortic endothelial function after diabetes-induced damage are mediated by anti-oxidant mechanisms. These observations suggest that RA activity might be mediated by FYN, ultimately regulating the processes of platelet activity and expression of anti-oxidant genes. The RA target MCL1 has also been proposed as an essential survival factor for endothelial cells in blood vessel production during angiogenesis, and RA has been found to restore cardiac function in rat models of ischemia/reperfusion injury.
In summary, by integrating literature evidence and by inspecting the polyphenol targets and their neighbors in the interactome, the molecular mechanisms underlying the protective effects of gallic acid, rosmarinic acid, and 1,4-naphthoquinone for VD were identified. The analysis suggests that gallic acid activity involves blood clot dissolution processes, rosmarinic acid acts on platelet activation and anti-oxidant pathways through FYN and its neighbors, and 1,4-naphthoquinone acts on signaling pathways of vascular cells through MAP2K1 activity.
To validate the predictive power of the developed framework, direct experimental evidence of the predicted mechanistic role of rosmarinic acid (RA) in VD was sought. The VD network neighborhood shows that RA targets are in close proximity to proteins related to the function of platelets, cells that control blood clot formation and whose inhibition is the mechanism underlying drugs prescribed to prevent heart attack and stroke.
The molecular mechanisms involved in the functional impact of RA on platelets were clarified. The RA target FYN is a protein-tyrosine kinase, and platelet activation is coordinated by several kinases that phosphorylate adaptors, enzymes, and cytoskeletal proteins downstream of platelet surface receptors. Given this connection, RA may inhibit platelet function by blocking agonist-induced protein tyrosine phosphorylation. It was observed that RA-treated platelets demonstrated a dose-dependent reduction in total tyrosine phosphorylation in response to CRPXL, TRAP-6 and U46619 (
Altogether, these findings support the prediction that RA, by targeting a network neighborhood related to platelet function, modulates platelet activation and function. They also support the observation that its mechanism of action involves the protein-tyrosine kinase FYN (
The human interactome was assembled from 16 databases containing different types of protein-protein interactions (PPIs): 1) binary PPIs tested by high-throughput yeast-two-hybrid (Y2H) experiments; 2) kinase-substrate interactions from literature-derived low-throughput and high-throughput experiments from KinomeNetworkX, Human Protein Reference Database (HPRD), and PhosphositePlus; 3) carefully literature-curated PPIs identified by affinity purification followed by mass spectrometry (AP-MS), and from literature-derived low-throughput experiments from InWeb, BioGRID, PINA, HPRD, MINT, IntAct, and InnateDB; 4) high-quality PPIs from three-dimensional (3D) protein structures reported in Instruct, Interactome3D, and INSIDER; 5) signaling networks from literature-derived low-throughput experiments as annotated in SignaLink2.0; and 6) protein complexes from BioPlex2.0. The genes were mapped to their Entrez ID based on the National Center for Biotechnology Information (NCBI) database as well as their official gene symbols. The resulting interactome includes 351,444 protein-protein interactions (PPIs) connecting 17,706 unique proteins. The largest connected component has 351,393 PPIs and 17,651 proteins.
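Extracting the largest connected component from a merged interaction list can be sketched as follows. The interaction pairs, protein identifiers, and function names are hypothetical, and self-interactions are assumed to be absent.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as node sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


def largest_connected_component(adj):
    """Return the node set of the largest component and its edge count."""
    lcc = max(connected_components(adj), key=len)
    # Each undirected edge inside the component is seen from both endpoints.
    n_edges = sum(1 for u in lcc for v in adj[u] if v in lcc) // 2
    return lcc, n_edges


# Hypothetical interactions merged from several sources (duplicates collapse).
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P2", "P1"), ("P4", "P5")]
adj = {}
for u, v in interactions:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

lcc, n_edges = largest_connected_component(adj)
print(sorted(lcc), n_edges)
```

Storing neighbors in sets means that the same interaction reported by more than one database is counted only once.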
The 759 polyphenols were retrieved from the PhenolExplorer database. The database lists polyphenols with food composition data or profiled in biofluids after interventions with polyphenol-rich diets. For the analysis, only polyphenols that: 1) could be mapped to PubChem IDs, 2) were listed in the Comparative Toxicogenomics Database (CTD) as having therapeutic effects on human diseases, and 3) had protein-binding information present in the STITCH database with experimental evidence were considered (
The polyphenol-disease associations were retrieved from the Comparative Toxicogenomics Database (CTD). Only manually curated associations labeled as therapeutic were considered. By considering the hierarchical structure of diseases along the MeSH tree, the study expanded explicit polyphenol-disease associations to include also implicit associations. This procedure was performed by propagating associations in the lower branches of the MeSH tree to consider also the diseases in the higher levels of the same tree branch. For example, a polyphenol associated with ‘heart diseases’ would also be associated to the more general category of ‘cardiovascular diseases’. By performing this expansion, a final list of 1,525 known associations between the 65 polyphenols and the 299 diseases considered in this study was obtained.
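The expansion from explicit to implicit associations can be sketched using MeSH tree numbers, in which each dot-separated prefix identifies an ancestor category. The tree numbers and associations below are illustrative placeholders, the function names are hypothetical, and the sketch assumes one tree number per disease, whereas a MeSH term can appear at several positions in the tree.

```python
def ancestors(tree_number):
    """Return the tree numbers of all ancestors, e.g. 'C14.280' -> ['C14']."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def expand_associations(explicit):
    """Propagate each (chemical, disease) pair to every ancestor disease."""
    expanded = set(explicit)
    for chemical, tree_number in explicit:
        for parent in ancestors(tree_number):
            expanded.add((chemical, parent))
    return expanded


# Illustrative explicit associations (placeholder chemical and tree numbers).
explicit = {("polyphenol_X", "C14.280.647")}
expanded = expand_associations(explicit)
print(sorted(expanded))
```

An association recorded at a specific disease is thereby also counted for each broader category on the same branch, in the same way that 'heart diseases' propagates to 'cardiovascular diseases'.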
The proximity between a disease and a polyphenol was evaluated using a distance metric that takes into account the shortest path lengths between polyphenol targets and disease proteins. Given S, the set of disease proteins, T, the set of polyphenol targets, and d(s,t), the shortest path length between nodes s and t in the network, the distance is defined as:
$$d_c(S,T) = \frac{1}{\lVert T \rVert} \sum_{t \in T} \min_{s \in S} d(s,t)$$
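A minimal Python sketch of this distance follows. It assumes the closest-distance form suggested by the symbol d_c, in which each polyphenol target contributes the shortest path length to its nearest disease protein and the contributions are averaged over the targets. The network is represented as a plain adjacency dictionary, and the function names and toy network are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closest_distance(adj, disease_proteins, targets):
    """Mean, over targets t, of the shortest path from t to its nearest disease protein.

    Assumes every target can reach at least one disease protein, e.g. because
    all nodes lie in the largest connected component.
    """
    total = 0
    for t in targets:
        dist = shortest_path_lengths(adj, t)
        total += min(dist[s] for s in disease_proteins if s in dist)
    return total / len(targets)


# Toy network: a simple path A - B - C - D - E
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
d_single = closest_distance(adj, {"A"}, {"C", "E"})      # (2 + 4) / 2
d_double = closest_distance(adj, {"A", "D"}, {"C", "E"})  # (1 + 1) / 2
```

A target that is itself a disease protein contributes a distance of zero, so heavily overlapping sets give values close to zero.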
To assess the significance of the distance between a disease and a polyphenol (S, T), a reference distance distribution was created corresponding to the expected distances between two randomly selected groups of proteins matching the size and degrees of the original disease proteins and polyphenol targets in the network. The reference distance distribution was generated by calculating the proximity between these two randomly selected groups, a procedure repeated 1,000 times. The mean μ_(d(S,T)) and s.d. σ_(d(S,T)) of the reference distribution were used to convert the absolute distance d_c to a relative distance Z_dc, defined as:
$$Z_{d_c} = \frac{d_c - \mu_{d(S,T)}}{\sigma_{d(S,T)}}$$
Because of the scale-free nature of the human interactome, there are few nodes with high degrees. To avoid repeatedly choosing the same (high-degree) nodes, a degree-preserving random selection was performed.
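The randomization can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's implementation: nodes of similar degree are grouped into bins holding at least `min_size` nodes, which is one common way to keep the draw from always returning the same high-degree node, and the population standard deviation is used. The names `degree_bins`, `sample_matched`, and `relative_distance` are hypothetical, and `distance_fn` stands for any callable that returns the absolute distance for a network and two protein sets.

```python
import random
import statistics


def degree_bins(adj, min_size=2):
    """Group nodes of similar degree so each bin holds at least min_size nodes."""
    by_degree = {}
    for node, nbrs in adj.items():
        by_degree.setdefault(len(nbrs), []).append(node)
    bins, current = [], []
    for deg in sorted(by_degree):
        current.extend(by_degree[deg])
        if len(current) >= min_size:
            bins.append(current)
            current = []
    if current:  # leftover highest-degree nodes join the last bin
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return bins


def sample_matched(nodes, bins, rng):
    """Draw one distinct random node from the degree bin of each input node."""
    bin_of = {n: i for i, b in enumerate(bins) for n in b}
    chosen = set()
    for n in sorted(nodes):  # sorted for reproducibility with a fixed seed
        candidates = [c for c in bins[bin_of[n]] if c not in chosen]
        chosen.add(rng.choice(candidates))
    return chosen


def relative_distance(adj, disease_proteins, targets, distance_fn,
                      n_random=1000, min_size=2, seed=0):
    """Z-score of the observed distance against degree-matched random sets."""
    rng = random.Random(seed)
    bins = degree_bins(adj, min_size)
    observed = distance_fn(adj, disease_proteins, targets)
    reference = [
        distance_fn(adj,
                    sample_matched(disease_proteins, bins, rng),
                    sample_matched(targets, bins, rng))
        for _ in range(n_random)
    ]
    mu = statistics.mean(reference)
    sigma = statistics.pstdev(reference)  # zero sigma would need special handling
    return (observed - mu) / sigma
```

A negative value indicates that the polyphenol targets lie closer to the disease proteins than expected for random protein sets of matching size and degree.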
For each polyphenol, the area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate how well network proximity distinguishes diseases with known therapeutic associations from all other diseases in the set of 299 diseases. The known therapeutic associations retrieved from CTD were used as positive instances, all unknown associations were defined as negative instances, and the AUC was computed using the implementation in the Scikit-learn Python package. Furthermore, 95% confidence intervals were calculated using the bootstrap technique with 2,000 resamplings with sample sizes of 150 each. Because AUC summarizes overall performance, a second metric was used to evaluate the top-ranking predictions: the precision of the top 10 predictions was calculated, considering only the polyphenol-disease associations with relative distance Z_dc<−0.520.
Perturbation signatures were retrieved from the Connectivity Map database (https://clue.io/) for the MCF7 cell line after treatment with 22 polyphenols. These signatures reflect the perturbation of the gene expression profile caused by treatment with a particular polyphenol relative to a reference population, which comprises all other treatments in the same experimental plate. For polyphenols having more than one experimental instance (time of exposure, cell line, dose), the one with the highest distil_cc_q75 value (75th quantile of pairwise Spearman correlations in landmark genes, https://clue.io/connectopedia/perturbagen_types_and_controls) was selected. Gene Set Enrichment Analysis was performed to evaluate the enrichment of disease genes among the top deregulated genes in the perturbation profiles. This analysis yields an Enrichment Score (ES) that takes small values when genes are randomly distributed along the ordered list of expression values and high values when they are concentrated at the top or bottom of the list. The significance of the ES was calculated by creating 1,000 random gene sets of the same size as the original set and calculating an empirical p-value by considering the proportion of random sets resulting in an ES smaller than the original case. The p-values were adjusted for multiple testing using the Benjamini-Hochberg method. The network proximity d_c of disease proteins and polyphenol targets for diseases with significant ES was compared according to their therapeutic and non-therapeutic associations using the Student's t-test.
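Two of the computations in this step, the enrichment score and the Benjamini-Hochberg adjustment, can be sketched in Python as follows. The text does not state which weighting was used for the running sum, so the sketch assumes the commonly used form in which hits are weighted by the absolute expression score raised to a power `weight` (setting `weight=0` gives the unweighted statistic). The permutation step that produces the empirical p-values is not shown. Function names and the toy gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation from zero of the running-sum statistic.

    ranked_genes: genes ordered by expression score (most up-regulated first)
    scores:       expression scores in the same order
    Assumes gene_set overlaps the list and does not cover it entirely.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at genes in the set, step down at all other genes
        running += abs(s) ** weight / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy profile: disease genes concentrated at the top of the ranked list
genes = ["g1", "g2", "g3", "g4"]
scores = [3.0, 1.0, -1.0, -3.0]
es_top = enrichment_score(genes, scores, {"g1", "g2"})
es_bottom = enrichment_score(genes, scores, {"g3", "g4"})
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Gene sets concentrated at the top of the list give a positive score and sets concentrated at the bottom give a negative one, which matches the description of high-magnitude values at either end of the ordered list.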
Human blood collection was performed as previously described in accordance with the Declaration of Helsinki and ethics regulations with Institutional Review Board approval from Brigham and Women's Hospital (P001526). Healthy volunteers did not ingest known platelet inhibitors for at least 10 days prior to collection. Citrated whole blood underwent centrifugation with a slow brake (177 x g, 20 minutes) and the platelet-rich plasma (PRP) fraction was acquired for subsequent experiments. For washed platelets, PRP was incubated with 1 μM prostaglandin E1 (Sigma, P5515) and immediately underwent centrifugation with a slow brake (1000 x g, 5 minutes). Platelet-poor plasma was aspirated, and pellets were resuspended in platelet resuspension buffer (PRB; 10 mM Hepes, 140 mM NaCl, 3 mM KCl, 0.5 mM MgCl2, 5 mM NaHCO3, 10 mM glucose, pH 7.4).
Platelet aggregation was measured by turbidimetric aggregometry. Briefly, PRP was pretreated with RA for 1 hour before adding 250 μL to siliconized glass cuvettes containing magnetic stir bars. Samples were placed in Chrono-Log® Model 700 Aggregometers before the addition of various platelet agonists. Platelet aggregation was monitored for 6 minutes at 37° C. with a stir speed of 1000 rpm, and the maximum extent of aggregation was recorded using AGGRO/LINK®8 software. In some cases, dense granule release was simultaneously recorded by supplementing samples with Chrono-Lume® (Chrono-Log®, 395) according to the manufacturer's instructions.
Changes in platelet surface expression of P-selectin (CD62P) or binding of Alexa Fluor™ 488-conjugated fibrinogen were used to assess alpha granule secretion and integrin αIIbβ3 activation, respectively. First, PRP was pre-incubated with RA for 1 hour, followed by stimulation with various platelet agonists under static conditions at 37° C. for 20 minutes. Samples were then incubated with APC-conjugated anti-human CD62P antibodies (BioLegend®, 304910) and 100 μg/mL Alexa Fluor™ 488-Fibrinogen (Thermo Scientific™, F13191) for 20 minutes, before fixation in 2% [v/v] paraformaldehyde (Thermo Scientific™, AAJ19945K2). 50,000 platelets were processed per sample using a Cytek™ Aurora spectral flow cytometer. Percent-positive cells were determined by gating on fluorescence intensity compared to unstimulated samples.
Cytotoxicity was tested by measuring lactate dehydrogenase (LDH) release by permeabilized platelets into the supernatant. Briefly, washed platelets were treated with various concentrations of RA for 1 hour, before isolating supernatants via centrifugation (15,000 x g, 5 min). A Pierce LDH Activity Kit (Thermo Scientific™, 88953) was then used to assess supernatant levels of LDH.
Washed platelets were pre-treated with RA for 1 hour, followed by agonist stimulation for 10 minutes. Platelets were lysed on ice with RIPA Lysis Buffer System® (Santa Cruz®, sc-24948) and sample supernatants were clarified via centrifugation (14,000 rpm, 5 min, 4° C.). Supernatants were reduced with Laemmli Sample Buffer (Bio-Rad, 1610737) and proteins were separated by molecular weight in PROTEAN TGX™ precast gels (Bio-Rad, 4561084). Proteins were transferred to PVDF membranes (Bio-Rad, 1620174) and probed with 4G10 (Millipore, 05-321), a primary antibody clone that recognizes phosphorylated tyrosine residues. Membranes were probed with horseradish peroxidase-conjugated secondary antibodies (Cell Signaling Technologies, 7074S) to catalyze an electrochemiluminescent reaction (Thermo Scientific™, PI32109). Membranes were visualized using a Bio-Rad ChemiDoc Imaging System, and densitometric analysis of protein lanes was conducted using ImageJ (NIH, Version 1.52a).
The teachings of all references cited herein and in the attached paper are hereby incorporated in their entirety.
While example embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 62/852,800, filed on May 24, 2019. The entire teachings of the above application are incorporated herein by reference.
This invention was made with government support under 1P01HL132825 from the National Institutes of Health. The government has certain rights in the invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2020/034299 | 5/22/2020 | WO | 00 |

| Number | Date | Country |
|---|---|---|
| 62852800 | May 2019 | US |