This invention relates to molecular evaluation methods, and in particular to methods for predicting the molecular mechanisms through which a perturbation promotes or inhibits a cellular transition from a first cell state to a second cell state or to one or more other cell states.
The concept of cell states has a long tradition in biology. Cell states help describe how biological processes combine to form autonomous units with defined form and function. The cell state concept has proven a very useful lens to view and understand the organization of tissues and organisms, their development and responses to exogenous and endogenous changes in health and disease.
While initially based mainly on descriptions of morphology and phenotypes, progress in global analysis methods now allows phenotypes to be connected with underlying molecular processes. These connections enable the characterization of cell states with fine molecular resolution and open the door to understanding how cell states can evolve and transition into each other.
In 1940 Waddington suggested that cells move through a landscape of mountains and valleys as rolling marbles from one (meta)stable state to another (Organisers and Genes by C. H. Waddington, The University Press, 1940). This now famous model appeals through its intuitive nature but leaves open how and why the marbles roll into certain valleys and also whether they can go back to an initial state.
Recent efforts have applied computational models to understand transitions between states. These include models that describe cell state transitions as static states generated through convergence of molecular processes or dynamic states created by stochastic transitions (Brackston, R. D. et al. (2018). Transition state characteristics during cell differentiation. PLOS Comput Biol 14, e1006405; Wang, J. et al. (2011). Quantifying the Waddington landscape and biological paths for development and differentiation. Proc Natl Acad Sci USA 108, 8257-8262).
Other models use lineage analysis of cell states to characterize and infer cell state transitions (Hormoz, S. et al. (2016). Inferring Cell-State Transition Dynamics from Lineage Trees and Endpoint Single-Cell Measurements. Cell Syst 3, 419-433 e418; Su, Y. et al. (2020). Multi-omic single-cell snapshots reveal multiple independent trajectories to drug tolerance in a melanoma cell line. Nat Commun 11, 2345). These efforts show that cell states are interconvertible and that this involves changes in dynamic molecular processes, such as gene expression and signal transduction networks. However, a critical gap is the lack of a mechanistic understanding of how cellular networks drive cell state transitions that would allow us to purposefully manipulate and control cell states.
US 2008/0195322 A1 discloses a method for profiling the effects of perturbations on biological samples by acquiring images of cells in different cell states and applying statistical multivariate methods that use morphological features derived from the images to separate the cell states and to distinguish morphological changes in response to different perturbations, such as drug treatments. The method provides no predictive value in designing perturbations to achieve a desired cell state, nor does it provide any insight into the cause of the morphological changes observed. It simply screens compounds according to their observed effect on cells. No information is generated regarding the governing molecular networks involved in cell state transitions, rendering the method unsuitable for predicting molecular mechanisms.
There is provided a molecular evaluation method for predicting the molecular mechanisms through which a perturbation promotes or inhibits a cellular transition from a first cell state to a second cell state, comprising the steps of:
In determining the ranked list of molecular features and the top ranked components, the list and/or ranking may be limited to actionable components, i.e. enzymes, transcription factors, transporters, channels, receptors, scaffolds, and the like. In this way the ranked list may exclude high-rank components that do not affect other proteins and cannot be inhibited (e.g. some structural proteins, for instance, caveolin).
The term “perturbation” encompasses (where the context permits) a combination of individual perturbations.
The cell states referred to as “first” or “second” cell states are merely two possible cell states which can be adopted; these terms do not imply that the system has only two such states. The claimed method may encompass any other number of cell states and may generate network graphs for each such cell state quantifying the effects of the core components on the DPD in each such cell state to describe the molecular mechanisms that characterise those cell states.
The term “graph” does not imply any specific graphical representation, but rather denotes a mathematical graph i.e. a structure which models pairwise relations between the core network components and DPD (i.e. the nodes) in terms of connection strengths (i.e. the weighted, directed edges). The method is typically a computer-implemented method embodied in program instructions which when executed on a suitable computing system cause the method to be carried out by that computing system.
Preferably, the step of calculating a respective causal network graph comprises calculating a respective causal network connection matrix specifying the strength of connection between each of the core components and between each core component and the DPD.
Preferably, the calculation of a causal network connection matrix comprises inferring the topology and strengths of causal connections of the core network and the DPD using Modular Response Analysis.
More preferably, the Modular Response Analysis used is Bayesian Modular Response Analysis.
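By way of illustration only, the following minimal sketch shows how a connection matrix can be obtained with classical (non-Bayesian) Modular Response Analysis, using the standard relation r = -[dg(R^-1)]^-1 R^-1 between the measured global response matrix R and the local connection matrix r. It is not the Bayesian variant preferred above, and the function name, the example matrices, and the treatment of the DPD as simply one further node are assumptions made for the example.

```python
import numpy as np

def local_response_matrix(global_response):
    """Classical MRA: r = -[dg(R^-1)]^-1 R^-1.

    global_response[i, j] is the measured steady-state response of node i
    to a perturbation that directly affects node j only.
    """
    r_inv = np.linalg.inv(np.asarray(global_response, dtype=float))
    # Dividing row i by the i-th diagonal element of R^-1 applies [dg(R^-1)]^-1.
    return -r_inv / np.diag(r_inv)[:, None]

# Hypothetical two-node network used to generate synthetic global responses.
r_true = np.array([[-1.0, 0.5],
                   [-0.3, -1.0]])
direct_effects = np.diag([0.2, 0.4])           # assumed perturbation strengths
R = -np.linalg.inv(r_true) @ direct_effects    # simulated global responses

r_est = local_response_matrix(R)               # recovers r_true
```

The diagonal elements of the recovered matrix equal -1 by construction, and each off-diagonal element gives the signed strength of the direct connection from the column node to the row node.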
Preferably, the method further comprises the steps of experimentally perturbing the cell states, observing the effect of the perturbation on the cell states, and inferring from the observed effects the strength of connection between each of the core components, and between each core component and the DPD.
Preferably, observing the effect of the perturbation on the cell states comprises measuring one or more molecular responses to the experimental perturbation.
Preferably, experimentally perturbing the cell states comprises applying a plurality of perturbations and observing the effect of the perturbations on the cell states.
Preferably, a perturbation comprises one or more of exposing cells to a chemical compound, exposing cells to a biological compound, inducing an epigenetic or genetic change in cells, exposing cells to pathogens, exposing cells to an interaction with other cells, and exposing cells to an interaction with a biological or artificial surface.
Further, preferably:
Preferably, step (g) comprises processing data for a population of cells to which the or each perturbation has been applied, wherein said processing comprises (i) mapping said cells in said reduced multi-dimensional space, and (ii) identifying clusters of cells in said mapped cells associated with the cells before and after the perturbation is applied.
Preferably, identifying within said ranked list the components of a core biochemical network comprising the top ranked components above a cut-off in the ranking, comprises determining a cut-off in the ranking which maximises the number of components which can be mapped onto existing biochemical pathways while minimising the total number of ranked components used according to an optimisation function.
Further, preferably, determining the number of components which can be mapped onto existing biochemical pathways comprises determining from one or more databases whether each component can be mapped to a pathway whose characteristics are known from the one or more databases.
The method may be adapted to multi-state cellular transitions by applying the method to evaluate the transitions between different pairs of a multi-state system.
In some embodiments, therefore, said first and second cell states are any two states chosen from a set of three or more cell states, and the step of processing data for a population of cells identifies cells associated with said three or more cell states by identifying clusters of cells in said representation associated with each of said three or more cell states.
Preferably, said hypersurface is a hyperplane.
Preferably, said distinct molecular features of the cells are identified in said processed data as a set of measured analyte levels each of which corresponds to a distinct molecular feature.
There is also provided a molecular evaluation method according to any preceding statement of invention, wherein said molecular features of the cells that define said cell states are selected from RNA expression, protein expression and posttranslational protein modification.
The method preferably further comprises identifying an intervention likely to promote or inhibit a cellular transition between first and second cell states, by one or more of:
Suitably, the intervention is a combination of interventions, and the assessment in step (a) or (b) considers the effect of the interventions simultaneously.
Alternatively, the intervention is a combination of interventions, and the assessment in step (a) or (b) considers the effect of the interventions serially.
In certain embodiments, determining whether an intervention will change one cell state into another cell state comprises determining whether the distance from the first cell state data points to the hypersurface decreases following said intervention.
In other embodiments, determining whether an intervention will move a said cell state along the STV away from, towards or across the separating hypersurface comprises calculating a change in the DPD using a computational model built from the data, as given by:
wherein S is the DPD value being calculated by the model, ƒ(S) is the restoring driving force,
is the signaling driving force, $x_j(t)$ are the outputs of signaling modules, $r_{Sj}$ are the corresponding BMRA-inferred connection coefficients to the STV (see Table S4), and $S^{st.st.}$ and $x_j^{st.st.}$ are the initial steady-state values of $S$ and $x_j$ before perturbations.
The methods of the invention may be implemented as a system, a method, and/or a computer program product at any technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Python, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein as method steps. It will be understood that each step, and combinations of steps, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The invention will now be further illustrated by the following description of embodiments thereof, given by way of example only with reference to the accompanying drawings, in which:
In the following description there is disclosed a cell State Transition Identification and Control Key (cSTAR) that identifies cell states, quantifies their determining elements, reconstructs a mechanistic network that controls the cell state transitions, and unlocks pathway manipulations that allow us to convert one cell state into another.
The overview of the method is shown in
cSTAR sequentially integrates the following elements:
The cSTAR approach was validated experimentally, and the method was used to devise specific drug perturbations to cell signaling networks that successfully converted proliferation into differentiation states in the neuroblastoma SH-SY5Y cell line. While here we used reverse phase protein arrays (RPPA) as the data source, cSTAR is a versatile method that can utilize different types of omics data to design precision interventions for controlling and interconverting cell fate decisions.
To test cSTAR rigorously, we looked for an experimental system that is a meaningful biological model and features robust cell fate decisions based on subtle molecular differences. Such a system poses a stringent challenge of distinguishing cell states while offering a realistic chance of identifying feasible perturbations that can convert cell fates. The SH-SY5Y human neuroblastoma cell line is a well-established cell model for studying neuronal differentiation, neurodegeneration, and therapeutic target discovery in neuroblastoma.
Expression of the TrkA or TrkB receptor tyrosine kinases specifies different cell fate decisions in SH-SY5Y cells. TrkA stimulates terminal differentiation marked by neurite outgrowth, whereas TrkB drives proliferation, as illustrated in
TrkA expression is associated with good prognosis, while TrkB expression correlates with aggressive tumor behavior. TrkA and TrkB activate very similar signaling pathways, and it is unclear what particular changes in signaling and expression patterns cause these distinct cell fate decisions (Schramm, A. et al. (2005). Biological effects of TrkA and TrkB receptor signaling in neuroblastoma, Cancer Lett 228, 143-153). Therefore, SH-SY5Y cells are an ideal system to test cSTAR.
Experiments used a custom made RPPA with 115 validated antibodies listed in Table S1A below to interrogate the activities of signaling pathways involved in TrkA/B signaling (Schramm et al., 2005), including MAPK (RAS-ERK, JNK, p38), PI3 kinase (AKT, mTORC1/2), JAK-STAT, PLCγ-PKC, TGFβ-SMAD, Wnt, cyclic-AMP, cell cycle (CDK1, Cyclins, Rb, p53, p21WAF), apoptosis (BAX, BCL2, BCLx, Caspase 3), transcription factor (MYC, NFκB, JUN), and tyrosine kinase (SRC, EGFR, PDGFR, IGFR) pathways. In addition to the 115 antibodies, three controls are also included, Mouse 1, 2 and 3.
For each antibody, RPPA phosphoproteomics data was collected, measured as raw fluorescent intensity values in TrkA and TrkB cells stimulated with a ligand and treated with different inhibitors. A sample of the data is reproduced in Table S1B below, showing the measured fluorescent intensity values for a small sample of antibodies and a small sample of treatment and stimulation conditions. The full data set, including replicated experiments, contains 118 rows (one per antibody) and 144 columns (each containing 118 measurements and relating to an experimental set of treatment and stimulation conditions, which include replicated experiments).
In addition, TrkA and TrkB activities were measured by Western blotting, as shown in Table S2 below. These antibodies detect phosphorylation sites that change protein activities or protein abundances. We measured pathway activities in untreated cells and cells treated with NGF (TrkA ligand) or BDNF (TrkB ligand) for 10 or 45 minutes. After normalization we calculated the fold changes in protein phosphorylation levels or abundances producing a data point for each protein.
In terms of the computational approach to processing the acquired data, each analyte level was first normalized on the GAPDH level, and then on the value of the same analyte in the absence of inhibitors and ligand stimulation to obtain fold-changes.
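The two normalization steps described above can be sketched as follows, using the pandas library. The function name, the analyte labels, the column labels, and the intensity values are hypothetical and serve only to illustrate the order of operations (loading control first, untreated baseline second).

```python
import pandas as pd

def fold_changes(raw, loading_control, baseline):
    """Rows are analytes, columns are conditions, values are raw intensities."""
    # Step 1: divide every analyte by the loading control measured in the same condition.
    per_loading = raw.div(raw.loc[loading_control], axis="columns")
    # Step 2: divide each analyte by its own value in the untreated, unstimulated condition.
    fold = per_loading.div(per_loading[baseline], axis="index")
    # The loading control itself is not an informative analyte after normalization.
    return fold.drop(index=loading_control)

# Hypothetical raw intensities for two analytes plus the GAPDH loading control.
raw = pd.DataFrame(
    {"untreated": [100.0, 50.0, 200.0], "NGF_10min": [300.0, 40.0, 100.0]},
    index=["pERK", "pAKT", "GAPDH"],
)
fc = fold_changes(raw, loading_control="GAPDH", baseline="untreated")
```

In this toy data set the pERK signal relative to GAPDH rises from 0.5 to 3.0, giving a six-fold change, while every analyte has a fold change of 1 in the baseline column.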
The individual data points for TrkA and TrkB cells can be perceived as points in the molecular data space of 115 dimensions (corresponding to the measurement of 115 protein features) that describe the cell states. However, phenotypically SH-SY5Y cells exhibit only three different states: (1) a common ‘ground’ state of isogenic TrkA and TrkB cells with no growth factor (GF) stimulation, (2) a differentiation state following TrkA cell stimulation with NGF, and (3) a proliferation state following TrkB cell stimulation with BDNF. This suggests that not all data points are equally important in defining a cell state, and that distinct states might be determined by a handful of different patterns that are hidden in the molecular data.
Consequently, transitions between different cell states can be described by a few critical parameters, which for complex systems are termed the order parameters (Haken, H. (2004). Synergetics: Introduction and Advanced Topics (Springer); Landau, L. D., and Lifshitz, E. M. (1980). CHAPTER XIV—PHASE TRANSITIONS OF THE SECOND KIND AND CRITICAL PHENOMENA. In Statistical Physics (Third Edition), L. D. Landau, and E. M. Lifshitz, eds. (Oxford: Butterworth-Heinemann), pp. 446-516). While in physics the order parameters are found using theoretical models of state transitions, at present no mechanistic models can determine the dynamic changes in whole-cell signaling patterns between distinct cell states (Needham, E. J., Parker, B. L., Burykin, T., James, D. E., and Humphrey, S. J. (2019). Illuminating the dark phosphoproteome. Science Signaling 12, eaau8645).
To address this gap we developed the STV, which allows us to determine how signaling data patterns of a given cell state would have to change to enable the transition from one cell state into another.
The first step is to distinguish and separate distinct cell states in protein phosphorylation and/or expression molecular data space, using machine learning (ML) methods to cluster and classify signaling patterns. Two different unsupervised ML methods, Ward's hierarchical clustering and K-means clustering (Duda, R. O., Hart, P. E., and Stork, D. G. (2012). Pattern Classification (Wiley)), generated identical results and determined two distinct sets of data points that correspond to two different cell states: the NGF-stimulated TrkA differentiation state and the BDNF-stimulated TrkB proliferation state.
The pandas Python library (McKinney, W. (2010). Data structures for statistical computing in Python. Paper presented at: Proceedings of the 9th Python in Science Conference (Austin, TX)) was used for RPPA data analysis and manipulation. For PCA compression and K-means data clustering we used the scikit-learn Python library (Pedregosa, F. et al. (2011). Scikit-learn: Machine Learning in Python. J Mach Learn Res 12, 2825-2830).
R base functions (R Core Team (2013). R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing)) and the pheatmap R package (Kolde, R. (2015). pheatmap: Pretty heatmaps [Software]. R package) were used for Ward's hierarchical clustering and for building heatmaps.
Following stimulation with growth factors or drug perturbations, the fold changes in the phosphorylation levels or abundances of each protein are depicted in the molecular dataspace with the Cartesian coordinates. Then we applied a support vector machine (SVM), which is a supervised ML algorithm (Steinwart, I., and Christmann, A. (2008). Support Vector Machines (Springer Publishing Company, Incorporated)), to build a maximum margin hyperplane that maximizes the separation (the margin) between distinct phenotypic states in the multidimensional dataspace of the RPPA data.
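As a minimal sketch of the clustering comparison, the example below runs Ward's hierarchical clustering and K-means on synthetic data and checks that both yield the same partition. For brevity both methods are taken from scikit-learn here, whereas the Ward clustering described above was performed in R; the data, dimensions and helper function are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for fold-change data: two well-separated groups of samples.
group_a = rng.normal(loc=0.0, scale=0.1, size=(10, 5))
group_b = rng.normal(loc=2.0, scale=0.1, size=(10, 5))
X = np.vstack([group_a, group_b])

ward_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

def same_partition(a, b):
    """Cluster labels are arbitrary, so compare the partitions up to a 0/1 swap."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a == b) or np.all(a == 1 - b))

agree = same_partition(ward_labels, kmeans_labels)
```

Agreement between two unrelated clustering methods is a simple consistency check that the separation into states is a property of the data rather than of one algorithm.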
The SVM algorithm with a linear kernel from the scikit-learn Python library was applied to build a maximum margin hyperplane in the molecular dataspace that distinguishes different cell states. The separation hyperplane is defined as,
$$\vec{n} \cdot \vec{x} = h \tag{1}$$
Here $\vec{x}$ is a radius vector from the origin of the coordinates to any point on the separation hyperplane, $\vec{n}$ is the vector of unit length that is orthogonal to the separation hyperplane, and $h$ is a constant.
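The following sketch shows one way to obtain the unit normal and the constant h from a linear-kernel SVM in scikit-learn, assuming the hyperplane is written in the normalized form n·x = h. scikit-learn reports the hyperplane as w·x + b = 0, so under that assumption n = w/|w| and h = -b/|w|. The data points and labels are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic fold-change data: three samples per state in a 3-dimensional dataspace.
X = np.array([[0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 0.2, 0.1],
              [2.0, 2.0, 2.0], [2.1, 1.9, 2.0], [1.9, 2.1, 2.2]])
y = np.array([0, 0, 0, 1, 1, 1])   # 0 = state 1, 1 = state 2

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_[0]          # normal vector of the hyperplane, not yet unit length
b = clf.intercept_[0]     # offset in the form w.x + b = 0

w_norm = np.linalg.norm(w)
n = w / w_norm            # unit normal vector
h = -b / w_norm           # constant in the assumed form n.x = h

# Signed distance of every sample to the hyperplane along the normal.
signed_distance = X @ n - h
```

Samples of the two states fall on opposite sides of the hyperplane, which appears as opposite signs of the signed distance.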
To visualize the data and the state separation hyperplanes, we use principal component analysis (PCA).
The second step is building a vector that connects the centroids of the point clouds that represent the two phenotypic states, i.e. differentiation and proliferation. To determine the components contributing to this centroid-connecting vector, we calculate the difference of fold-changes in the detected phosphorylation levels or abundances between the centroids of the TrkA and TrkB point clouds for each protein. Dividing this centroid-connecting vector by its length, we define a state transition vector (STV); its projection to PCA space is shown as an arrow in
Let A be the centroid of a cloud of points $A_i$ ($i = 1, 2, \ldots$) that corresponds to state 1 and B be the centroid of the point cloud $B_i$, corresponding to state 2. A state transition vector (STV) from state 1 to state 2 is defined as a vector $\vec{s}$ of unit length that has the same direction as the vector $\overrightarrow{AB}$ connecting the centroids A and B,
$$\vec{s} = \frac{\overrightarrow{AB}}{|\overrightarrow{AB}|} \tag{2}$$
Eq. 2 shows that the STV is initially built in the full molecular dataspace of 115 dimensions.
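The construction of the STV can be sketched in a few lines of NumPy. The function name and the two-dimensional example data are illustrative assumptions; in the experiments described here the dataspace has 115 dimensions.

```python
import numpy as np

def state_transition_vector(points_state1, points_state2):
    """Unit vector pointing from the centroid of state 1 to the centroid of state 2."""
    centroid_a = np.mean(np.asarray(points_state1, dtype=float), axis=0)
    centroid_b = np.mean(np.asarray(points_state2, dtype=float), axis=0)
    ab = centroid_b - centroid_a
    length = np.linalg.norm(ab)
    if length == 0.0:
        raise ValueError("the two centroids coincide, so no direction is defined")
    return ab / length

# Toy point clouds (rows are samples, columns are analytes).
state1 = [[0.0, 0.0], [0.0, 2.0]]   # centroid (0, 1)
state2 = [[3.0, 5.0], [3.0, 5.0]]   # centroid (3, 5)
stv = state_transition_vector(state1, state2)   # direction (3, 4) scaled to unit length
```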
The global cell signaling network spans multiple layers, from receptors to the cytoplasmic signaling layer and to the transcription factor layer (Citri, A., and Yarden, Y. (2006). EGF-ERBB signalling: towards the systems level. Nat Rev Mol Cell Biol 7, 505-516). Information transfer and interactions between these layers ultimately determine changes in cell state. Importantly, the STV allows us to capture the relative contributions of individual proteins to the overall change in molecular data that will switch TrkB proliferation state into TrkA cell differentiation state.
The vector $\vec{s}$ determined by Eq. 2 in the Cartesian coordinates has the components $s_k$, $k = 1, \ldots, N$. Each STV component $s_k$ corresponds to an analyte $k$, measured by an antibody to a specific phosphosite on a protein or to the protein abundance. The absolute value $|s_k|$ determines the STV rank of the analyte $k$, telling us about its importance for the switching of cell states. These STV ranks for a subset of the analytes with the highest rankings are presented in Table S3 (it being understood that the full table from which Table S3 is extracted would include all of the 115 analytes). To generate this table, the highest rank proteins and some of their immediate effectors were selected as core signaling network components. The changes in individual protein activities or abundances between the centroids of the data point clouds that characterize two different cell states were projected onto the STV to determine protein ranks. The resulting high rank proteins constitute a core signaling network.
If the protein vector and the STV are parallel or antiparallel, the projection of the STV onto the protein's axis in the multidimensional space equals the full length of the individual protein vector, while the length of the projection decreases as the directions of the two vectors diverge, becoming zero when these vectors are orthogonal. Therefore, these STV projections capture the relative contributions of different individual proteins to the overall direction of change in protein activities or abundances that will convert cell fates.
Consequently, the STV allows us to directly assign ranks to individual proteins according to their importance in switching cell states based on the magnitude of their contributions to the STV. That means we can identify the components of a core signaling network that controls cellular responses, as identified in the rightmost column of Table S3.
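Ranking analytes by the magnitude of their STV components can be sketched as follows. The analyte names and STV values are placeholders rather than measured data, and the function name is an assumption.

```python
import numpy as np

def rank_analytes(stv, names):
    """Return (name, component) pairs sorted by decreasing absolute STV component."""
    stv = np.asarray(stv, dtype=float)
    order = np.argsort(-np.abs(stv), kind="stable")
    return [(names[i], float(stv[i])) for i in order]

# Placeholder STV components for three hypothetical analytes.
names = ["analyte_1", "analyte_2", "analyte_3"]
stv = [0.1, -0.9, 0.4]
ranking = rank_analytes(stv, names)
```

The sign of each component is retained in the output because it indicates whether the analyte increases or decreases along the transition, while only the absolute value determines the rank.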
We observe that proteins belonging to the peripheral layers of cell signaling have the highest ranks, i.e. (i) receptor tyrosine kinases (RTKs), which include TrkA, TrkB, EGFR and ERBB2 (Volinsky, N., and Kholodenko, B. N. (2013). Complexity of receptor tyrosine kinase signal processing. Cold Spring Harb Perspect Biol 5, a009043); and (ii) soluble kinases, AKT, RAF, MEK and ERK. This is not surprising, as receptors control many downstream signaling pathways, and the ERK and AKT pathways are considered main downstream effectors of TrkA/B receptor signaling (Vaishnavi, A., Le, A. T., and Doebele, R. C. (2015). TRKing Down an Old Oncogene in a New Era of Targeted Therapy. Cancer Discovery 5, 25).
Interestingly, other highly ranked effectors are p70S6K (S6K) and p90RSK (RSK) kinases, which are targets where ERK and AKT signaling converge (Abe et al., 2009). This indicates that the differential integration of ERK and AKT activities may be a key factor in determining different cell fates in this cell system.
In summary, the STV allows us to identify the signaling molecules that control cell fate decisions. The highest ranked molecules can be perceived as the components of a core signaling network that controls the larger network in terms of cell fate decisions.
Determining which components belong to the core signaling network can be treated as an optimization: determining a cut-off in the ranking which maximises the number of components which can be mapped onto existing biochemical pathways while minimising the total number of ranked components used.
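The optimisation function itself is not specified in this passage, so the sketch below uses a simple hypothetical score: the number of pathway-mapped components among the top k, minus a fixed penalty per component included. Both the score and the penalty value are assumptions chosen only to illustrate the trade-off, and the mapping flags are invented.

```python
import numpy as np

def choose_cutoff(mapped_flags, penalty=0.6):
    """Pick the cut-off k that maximises a hypothetical score.

    mapped_flags[i] is True if the (i+1)-th ranked component can be mapped
    onto a known pathway. The score (mapped count minus penalty * k) is an
    assumption, not the optimisation function of the source.
    """
    flags = np.asarray(mapped_flags, dtype=float)
    k = np.arange(1, flags.size + 1)
    mapped_in_top_k = np.cumsum(flags)
    score = mapped_in_top_k - penalty * k
    # np.argmax returns the first maximum, i.e. the smallest k among ties.
    return int(k[np.argmax(score)])

# Invented example: components ranked 1, 2, 3 and 5 map to known pathways.
mapped = [True, True, True, False, True, False, False]
cutoff = choose_cutoff(mapped)
```

With a penalty between 0 and 1, each mapped component raises the score and each unmapped component lowers it, so the cut-off settles where further ranked components stop adding mappable members.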
Next, we tested whether this knowledge can be used directly to design interventions that can purposefully change cell fates.
The strategy considered is to experimentally perturb these core components and test whether these perturbations can change the cell states. In order to analyze and predict the effects of such perturbations we can take advantage of the fact that the STV also contains information about the contributions made by all the other components of the signaling network measured by the RPPA. Therefore, removing the core components from the STV slightly reduces the dimensionality of the STV but renders it a representation of the overall signaling network downstream of the core components. It also eliminates potentially confounding effects resulting from the perturbations indirectly affecting the activity of upstream network components through feedback loops.
For instance, inhibition of ERK will abolish negative feedbacks to TrkA/B mediated RAS activation (Lake, D. et al. (2016). Negative feedback regulation of the ERK1/2 MAPK pathway, Cell Mol Life Sci 73, 4397-4413; Lavoie, H. et al. (2020). ERK signalling: a master regulator of cell behaviour, life and fate, Nature Reviews Molecular Cell Biology 21, 607-632), which would register as a change in ERK signaling but is inconsequential for ERK mediated downstream events, as ERK is blocked by the inhibitor.
Thus, we use this dimensionality reduced STV for estimating the network effects and biological outcomes of experimental perturbations. For each perturbation we determine a perturbation vector that connects the centroids of the point clouds that have been obtained before and after the perturbation. This vector changes the cell phenotype when it pushes the centroid of the original point cloud across the plane that separates different cell states, stabilizes a cell state when moving the centroid away from the separation plane, or has no effect when it is (nearly) parallel to the separation plane.
These outcomes can be quantified by determining the dot product (P) of the perturbation vector and the dimensionality reduced STV. As the STV indicates the direction from a proliferation to a differentiation state, a negative P moves the proliferation state towards differentiation, which will be achieved when the value is large enough to cross the separation plane. Conversely, a positive P moves stabilizes the proliferation state, and P=0 does not change cell states.
For a correct interpretation of perturbation data using the STV, we have to exclude the analytes that compose the modules of our core signaling network. Accordingly, the dimensionality of the molecular dataspace where the STV and perturbation vectors are calculated is reduced from 115 to 70: 45 analytes from Table S1A, related to the five proteins flagged in Table S3, are allocated to core network components, leaving the remaining 70 analytes to define the reduced dimensionality dataspace.
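Let A with the radius-vector {right arrow over (x)}A be the centroid of the point cloud Ai, corresponding to the unperturbed state 1. Let Apert with the radius-vector {right arrow over (x)}Apert be the centroid of the point cloud corresponding to the perturbed state. The perturbation vector {right arrow over (P)} connects these two centroids, {right arrow over (P)}={right arrow over (x)}Apert−{right arrow over (x)}A.
The projection P of the perturbation vector {right arrow over (P)} on the STV ({right arrow over (s)}) is obtained as a dot product of these two vectors, P={right arrow over (P)}·{right arrow over (s)}.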
Distance from a Data Point to the Separation Plane Along the STV
Starting from a data point A in the molecular dataspace, we build a vector, which is collinear with the STV ({right arrow over (s)}) and crosses the separation hyperplane at a point, As. Thus, we have,
If the vector {right arrow over (AAS)} has the same direction as the STV, the value of S is positive, and S is negative if the vector {right arrow over (AAS)} has the opposite direction to the STV. In either scenario, the length (|S|) of the vector {right arrow over (AAS)} is the distance from the point A to the separation surface along the STV.
The vectors {right arrow over (x)}A
Using Eqs. 1, 5 and 6, we obtain,
Eq. 7 allows us to calculate the distance |S| from a point A in the molecular dataspace to the separation hyperplane between two different cell states, as follows
If {right arrow over (s)}={right arrow over (n)} then |S| is the shortest distance to the separation hyperplane. If {right arrow over (s)}≠{right arrow over (n)} then |S| is larger than the shortest distance to the hyperplane, because vectors {right arrow over (s)} and {right arrow over (n)} have unit lengths. If point A is a centroid of a point cloud that corresponds to a distinguishable cell state, then Eq. 8 determines the distance of this centroid to the separation hyperplane.
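The geometry of this distance can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the separation hyperplane is written as n·x = b with a unit normal n and an offset b; this parametrization and the function names are assumptions, and the form of the hyperplane equation used elsewhere in this description may differ.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def signed_distance_along(x, s, n, b):
    """Signed step S such that the point x + S*s lies on the hyperplane n.x = b."""
    denom = dot(n, s)
    if abs(denom) < 1e-12:
        raise ValueError("direction s is parallel to the hyperplane")
    return (b - dot(n, x)) / denom

# Toy 2D example: hyperplane x0 = 2, data point at the origin.
n = [1.0, 0.0]
b = 2.0
x = [0.0, 0.0]

S_normal = signed_distance_along(x, [1.0, 0.0], n, b)  # s equals n
S_tilted = signed_distance_along(x, [0.6, 0.8], n, b)  # unit-length s, not equal to n
```

With s equal to n the result is the shortest distance to the hyperplane, whereas the tilted unit vector gives a larger value, in line with the statement above.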
For experimental testing we used small molecule inhibitors to target core components:
Generally, the predicted effects of the inhibitors on the signaling network correlated well with the biological outcomes (Table 1 below). As predicted by their large negative P values, TrkB or p70S6K inhibition strongly increased differentiation. Interestingly, and also as predicted, the RSK inhibitor had differential effects, decreasing differentiation in TrkA cells and weakly increasing differentiation in TrkB cells.
These correlations show that the STV produces good predictions of which perturbations can change cell states. In Waddington's terms, the STV predicts well how we can steer a marble into a valley but does not reveal why that steering works. In order to obtain mechanistic insights into how a cell ‘computes’ its fate decisions after experimental perturbations, we need to reconstruct the computing machinery, i.e., the connections of a core network, and establish how perturbations to its constituents affect the DPD. The only means to precisely predict and explain the outcome of these experimental manipulations, which maneuver a cell through a Waddington landscape, is to explicitly model the nonlinear signaling dynamics that determine cell state transitions.
Explaining STV Predictions and Cell State Transitions by Mechanistic Insights Gained from Non-Linear Dynamic Models
In this context, an informative mechanistic model needs to comprise (i) a faithfully reconstructed network topology of the core network components deduced from the STV with interaction signs and strengths; and (ii) a network node that summarizes the remainder of the global network controlled by the core network and which links signaling changes to phenotypical changes; we call this node the dynamic phenotype descriptor (DPD).
In the molecular dataspace, we consider the STV as a vector {right arrow over (s)} of unit length directed from a centroid of a differentiation TrkA point cloud to a centroid of a proliferation TrkB point cloud. We now define the output of the DPD module as the S value,
In this definition, the direction of the vector {right arrow over (n)}, which is orthogonal to the separation hyperplane, points from the TrkB cloud to the TrkA cloud. Then, the DPD value (S) is positive for proliferation TrkB points and negative for differentiation TrkA points. The DPD values for the ground state of TrkA and TrkB cells, GF stimulations and inhibitor treatments are given in Tables S5A and S5B.
Table S5C below shows the DPD module outputs panel, indicating the analytes which were taken as outputs of core signaling network modules.
Using the STV ({right arrow over (s)}), a perturbation vector ({right arrow over (P)}) and the unit length vector ({right arrow over (n)}) orthogonal to the separation hyperplane, we can calculate how the DPD value changes following each inhibitor perturbation. The DPD values, determined for the centroids of unperturbed (A) and perturbed (Apert) states, S and Spert, respectively, are the following (see Eq. 9),
Using Eqs. 3, 10 and 11, the change in the DPD upon a perturbation is expressed as follows,
From Eq. 11 it follows that if {right arrow over (s)}={right arrow over (n)}, then ΔSc=−P.
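The special case stated above can be checked with a short derivation. It is a sketch under two assumptions that are not taken from the equations of this description: the separation hyperplane is written as n·x = b, and S is the signed step for which the point x_A + S·s lies on that hyperplane.

```latex
\begin{aligned}
\vec{n}\cdot\left(\vec{x}_A + S\,\vec{s}\right) &= b
  \quad\Longrightarrow\quad
  S = \frac{b - \vec{n}\cdot\vec{x}_A}{\vec{n}\cdot\vec{s}}, \\
\Delta S = S_{pert} - S
  &= -\frac{\vec{n}\cdot\left(\vec{x}_{A_{pert}} - \vec{x}_A\right)}{\vec{n}\cdot\vec{s}}
   = -\frac{\vec{n}\cdot\vec{P}}{\vec{n}\cdot\vec{s}}, \\
\vec{s} = \vec{n},\ |\vec{n}| = 1:\qquad
\Delta S &= -\,\vec{s}\cdot\vec{P} = -P .
\end{aligned}
```

Under these assumptions a perturbation with positive projection P shortens the step to the hyperplane by exactly P when the STV coincides with the hyperplane normal.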
We build upon a physics-based method, termed Modular Response Analysis (MRA), that can exactly reconstruct and quantify causal, local connections between network nodes, including feedback loops (Bastiaens, P. et al. (2015). Silence on the relevant literature and errors in implementation. Nat Biotechnol 33, 336-339; de la Fuente A. et al. (2002). Linking the genes: inferring quantitative gene networks from microarray data. Trends Genet 18, 395-398; Kholodenko et al., (2002). Untangling the wires: A strategy to trace functional interactions in signaling and gene networks. Proceedings of the National Academy of Sciences 99, 12841; Yalamanchili et al., (2006) Quantifying gene network connectivity in silico: scalability and accuracy of a modular approach. Syst Biol (Stevenage) 153, 236-246).
In the MRA framework, each node is a reaction module, which can be a single protein or gene, a signaling pathway, or any functional object that can be defined in terms of input-output relations. For instance, in our core network the ERK module is a three-tier pathway that includes all isoforms of RAF, MEK and ERK. The network topology is quantified in terms of connection coefficients, aka local responses or connection strengths (Kholodenko et al., (1997) Quantification of information transfer via cellular signal transduction pathways [published erratum appears in FEBS Lett 1997 Dec. 8; 419(1):150]. FEBS Lett 414, 430-434).
These cannot be directly measured, because responses propagate through a network masking direct connections. Only systems-level responses to perturbations are captured in experimental data, and MRA infers a network from these responses. The responses are measured when a network approaches a steady state, or at the time instances when a signaling response is near its maximum or minimum, because in both cases the time derivative is about zero (Kholodenko, B. N., and Kholodov, L. E. (1980). Individualization and optimization of dosings of pharmacological preparations; principle of maximum in the analysis of pharmacological response. Pharmaceutical Chemistry Journal 14, 287-291; Santos et al., (2007). Growth factor-induced MAPK network topology shapes Erk response determining PC-12 cell fate. Nat Cell Biol 9, 324-330; Sontag et al., (2004). Inferring dynamic architecture of cellular networks using time series of gene expression, protein and metabolite data. Bioinformatics 20, 1877-1886). Whereas the overall topology usually does not markedly change between early peak and steady state responses, the connection strengths are highly dynamic and necessarily change at different time moments after perturbation.
The original MRA method requires as many perturbations as there are nodes in a network, and it is sensitive to measurement noise in the data (Thomaseth et al., (2018). Impact of measurement noise, experimental design, and estimation methods on Modular Response Analysis based network reconstruction. Sci Rep 8, 16217).
To increase its robustness, statistical reformulations of MRA have been developed based on maximum likelihood and Bayesian algorithms (Klinger et al., (2013). Network quantification of EGFR signaling unveils potential for targeted combination therapy. Mol Syst Biol 9, 673; Santra et al., (2013). Integrating Bayesian variable selection with Modular Response Analysis to infer biochemical network topology. BMC Syst Biol 7, 57).
The Bayesian MRA formulation (BMRA) requires fewer perturbations than MRA, is tolerant to noise, and allows existing pathway knowledge to be incorporated as a prior network to improve inference precision (Santra et al., (2018). Reconstructing static and dynamic models of signaling pathways using Modular Response Analysis. Current Opinion in Systems Biology 9, 11-21). Even when this knowledge is inaccurate for half of the network edges, BMRA recovers a nearly perfect network topology as validated in independent experiments (Halasz et al., (2016). Integrating network reconstruction with mechanistic modeling to predict cancer therapies. Sci Signal 9, ra114).
Mapping the core components specified by the STV onto known signaling pathways we obtained a prior topology of a core network, which contains the Trk and ERBB receptors, and downstream signaling pathways, i.e. the ERK, AKT, p70S6K, p90RSK and JNK pathways. This prior network was identical for the TrkA and TrkB expressing cells, as seen in
In order to reconstruct the posterior network, we used the drug perturbations described in Table 1, measuring 10 and 45 minute timepoints in TrkA and TrkB cells stimulated with NGF or BDNF, respectively. The time courses after growth factor stimulation indicated that the TrkA, TrkB, EGFR, ERBB2, AKT and ERK peaked around 10 minutes and attained steady-state levels at about 45 minutes, as seen in
Network reconstruction was performed for both time points using BMRA as shown in Table S4 below, and as described in the mathematical treatment that follows.
Although, not surprisingly, connection strengths differed between the peak and steady-state levels, a consensus network can readily be derived for each cell line.
The BMRA-reconstructed TrkA and TrkB signaling networks feature numerous differences in their topologies. Major differences include a strong negative feedback from JNK to AKT in the TrkA network and a strong positive feedback loop from RSK to ERBB in the TrkB network that may act as an autocatalytic amplifier of the ERBB->ERK->RSK->ERBB module. The strong activation of p70S6K by ERK in TrkB cells is subverted into a strong inhibition of ERK by p70S6K in TrkA cells. Overall, the TrkA network has more inhibitory connections, while the TrkB network comprises more stimulatory interactions.
To reconstruct the topology and strengths of causal connections of the core network, including the influence of each pathway module on the Dynamic Phenotype Descriptor (DPD), we used a modified version of BMRA (Halasz et al., 2016). A family of Modular Response Analysis (MRA) methods, including BMRA, allows both (i) predicting systems-level network responses to different perturbations and (ii) reconstructing the topology and strengths of causal network connections based on experimentally measured responses to perturbations (Bastiaens et al., 2015; Santra et al., 2018).
Each core network module has a single quantitative output (xi), termed communicating species in the MRA family framework. The temporal dynamics of the module outputs is given by a system of ordinary differential equations (ODE),
Here the functions ƒi describe how the rate of change of independent variables xi depends on the activities of other network modules. The parameters, pi∈P, represent kinetic constants and any external or internal conditions, such as the conserved moieties and external concentrations that are maintained constant.
For each network module xi, the connection coefficient (rij) quantifies the fractional change (Δxi/xi) in its output brought about by a change in the output of another module (Δxj/xj), while keeping the remaining nodes (xk, k≠i,j) unchanged to prevent the spread of this perturbation over the network (Kholodenko et al., 1997; Kholodenko et al., 2002).
Positive and negative rij quantify direct activation and inhibition, respectively, whereas zero values show that there are no direct connections. The coefficients rij are expressed in terms of the elements of the Jacobian matrix (∂ƒi/∂xj) of the ODE system at the steady state (st. st.), as follows (Kholodenko et al., 2002),
The connection coefficients cannot be directly measured and are inferred using the systems-level, global network responses to perturbations. Following a change (Δpj) in a parameter (pj) that affects node j, the global response (Rij) to this perturbation is determined as,
To infer the connection coefficients rij based on the experimentally measured, global responses Rij, the entire network is initially divided into n subnetworks, each containing only edges directed to a particular node (i). To determine the connection coefficients {rik} for all xk (k≠i), n−1 independent parameters pj (j=1, . . . , n−1) must be perturbed, none of which can directly influence node i, whereas any other node k (k≠i) is affected by at least one of these parameters pj. Formally, for each xi (i=1, . . . , n), we choose a subset Pi of n−1 parameters pj known to have the property that the function ƒi for node i in Eq. 2 does not explicitly depend upon pj, whereas each of the remaining nodes k (k≠i) is perturbed by at least one pj∈Pi. This condition is described as follows,
Taking into consideration that rii=−1 (Eq. 15), all connections to the node i can be found by solving the following system of linear equations,
Repeating this procedure for all n subnetworks, the entire network is reconstructed.
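The reconstruction procedure can be illustrated on a toy three-node network. The sketch below follows the standard MRA relations from the cited literature (Kholodenko et al., 2002): connection coefficients are normalized Jacobian elements, r_ij = −(J_ij/J_ii)·(x_j/x_i), and for every perturbation j that does not act directly on node i the global responses satisfy Σ_{k≠i} r_ik·R_kj = R_ij. The Jacobian, the steady-state values, and the assumption that each perturbation acts directly on exactly one node are invented for illustration; this is not the BMRA code used for Table S4.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def connection_coefficients(jac, x):
    """r_ij = -(J_ij / J_ii) * (x_j / x_i); the diagonal equals -1 by construction."""
    n = len(x)
    return [[-(jac[i][j] / jac[i][i]) * (x[j] / x[i]) for j in range(n)]
            for i in range(n)]

def global_responses(r, direct):
    """Global responses R from r*R = -direct (direct = immediate effect of each p_j)."""
    n = len(r)
    cols = [solve(r, [-direct[i][j] for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def reconstruct(R):
    """Recover r row by row, using only perturbations that do not hit node i directly."""
    n = len(R)
    r = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        others = [k for k in range(n) if k != i]
        a = [[R[k][j] for k in others] for j in others]
        sol = solve(a, [R[i][j] for j in others])
        for k, v in zip(others, sol):
            r[i][k] = v
    return r

# Hypothetical cascade 0 -> 1 -> 2 with negative feedback from node 2 to node 0.
jac = [[-1.0, 0.0, -0.5], [0.8, -1.0, 0.0], [0.0, 0.6, -1.0]]
x_ss = [1.0, 2.0, 0.5]
direct = [[0.1 if i == j else 0.0 for j in range(3)] for i in range(3)]

r_true = connection_coefficients(jac, x_ss)
R = global_responses(r_true, direct)
r_hat = reconstruct(R)
```

With noise-free responses, r_hat reproduces r_true, including the negative feedback entry and the zero entries for absent connections.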
This standard MRA procedure can fail when the data are too noisy or some module responses were not detected (Thomaseth et al., 2018). BMRA overcomes these limitations by explicitly incorporating noise in Eq. 18 (Halasz et al., 2016),
Here, Aik are the elements of the adjacency matrix, which are equal to 1 if the connection coefficient rik is non-zero, or equal to 0 otherwise; ϵij are the error variables, assumed to be independently and identically distributed Gaussian random variables with zero mean and variance σ2, i.e. ϵij˜N(0, σ2). The error variance (σ2) is assumed to be a random variable with the inverse Gamma distribution, i.e. σ2˜IG(a, b), where a and b are the shape and scale parameters. Following common practice, we chose a=1, b=1. Further, for brevity we refer to this distribution as P(σ2).
BMRA uses prior knowledge that is formulated in the form of the prior probability distributions. Based on the existing knowledge (Kanehisa et al., (2010). KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Research 38, D355-D360; Szklarczyk D. et al., (2019). STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research 47, D607-D613; Vaishnavi et al., 2015), we derived the reference network Ai0={Aik0}. The prior distribution P(Ai), Ai={Aik}, has the maximum at the reference network Ai0 and penalizes for the deviation from this network as follows, P(Ai)∝exp(−ψ·dH(Ai, Ai0)), where dH(Ai, Ai0) is the Hamming distance between the network Ai and the reference network Ai0, ψ is a constant. The prior distribution of ri is dependent on Ai and σ2 and is denoted by P(ri|Ai, σ2). If there is no direct connection from xj to xi, i.e. Aij=0, the corresponding connection strength (rij) is assumed to have 0 value with probability 1, whereas the connection strengths representing direct interactions (ri={rij: Aij=1, j≠i}) were assumed to have a Gaussian prior P(ri|Ai)˜(0, Vi) where Vi=cσ2(RiR*iT+λI). Here, R*i is the global response matrix of the nodes (nj, j≠i) which directly regulate ni (i.e. nj, j≠i: Aij=1), c is the proportionality constant which is also known as the Zellner's constant. As previously, we chose c=Npi, where Npi is the number of perturbations other than those directly affecting node xi, and λ=0.2 (Halasz et al., 2016; Santra et al., 2013).
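The structure prior can be made concrete for a small node. The following sketch enumerates all binary adjacency vectors of a hypothetical node with three candidate regulators and normalizes exp(−ψ·dH) over them; the reference vector, the value ψ=1 and the function names are illustrative assumptions, and exhaustive enumeration is only practical for small networks.

```python
import math
from itertools import product

def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(1 for u, v in zip(a, b) if u != v)

def structure_prior(reference, psi):
    """Normalized prior P(A_i) proportional to exp(-psi * d_H(A_i, A_i0))."""
    weights = {a: math.exp(-psi * hamming(a, reference))
               for a in product((0, 1), repeat=len(reference))}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

reference = (1, 0, 1)  # hypothetical prior-knowledge edges into one node
prior = structure_prior(reference, psi=1.0)
```

The prior peaks at the reference vector and decays with every edge that is added or removed relative to it; a larger ψ expresses stronger trust in the prior network.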
Bayesian statistics is applied to update prior estimates of the binary vector Ai={Aik}, k=1, . . . , n and the vector of connection coefficients ri={rik}, k=1, . . . , n, to obtain posterior estimates of these variables using the experimental data, i.e., the global response matrix R=Rik (Eq. 16),
Here, P(R|ri, Ai, σ2) is the likelihood function of the global response matrix R, given a connection coefficient vector ri and a binary vector Ai. P(ri|Ai, σ2) and P(Ai) are the prior distributions of ri and Ai, respectively. The denominator P(R) is defined as follows,
A key to BMRA is that the likelihood function for the observed global response matrix R is derived from the MRA equations (Eq. 18-20),
Here, Ri={Rik, k≠i} is the global response of node xi to perturbations that do not directly affect xi, R*iT is the global response matrix of the nodes (xj, j≠i) which directly regulate node xi (i.e. xj, j≠i: Aij=1), and r={rij: Aij=1, j≠i}. N(Ri|R*iTr,σ2I) designates the normal distribution for Ri with mean R*iTr and covariance σ2I.
The denominator in Eq. 20 that normalizes the probability P(ri, Ai, σ2|R) cannot be obtained analytically. The posterior distributions were estimated using a Markov Chain Monte Carlo (MCMC) sampling algorithm. The posterior probability of Ai provides a quantitative measure of how well a certain configuration of Ai is supported by both prior knowledge and experimental data.
The values and confidence intervals for the corresponding connection coefficients are obtained from the posterior probability of ri. To increase the accuracy of the method, we have modified the previously published algorithm (Halasz et al., 2016) and applied an Occam's razor approach, calculating the mean and standard deviation (STD) of ri not from the entire posterior distribution of ri, but only from the part that has the highest posterior likelihood.
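One way to picture this restriction to the best-supported part of the posterior is the sketch below. It ranks MCMC samples of a single connection coefficient by their log-posterior and summarizes only the top fraction. The cut-off `keep_fraction`, the function name and the sample values are hypothetical; the criterion actually used to delimit the highest-likelihood part is not specified here.

```python
import statistics

def summarize_top(samples, log_post, keep_fraction=0.5):
    """Mean and standard deviation of the samples with the highest log-posterior."""
    ranked = sorted(zip(log_post, samples), reverse=True)
    k = max(2, int(len(ranked) * keep_fraction))  # at least two samples for a std
    top = [value for _, value in ranked[:k]]
    return statistics.mean(top), statistics.stdev(top)

# Invented samples of one r_ij and their log-posterior values.
samples = [0.9, 1.0, 1.1, 3.0]
log_post = [-1.0, -0.5, -1.2, -9.0]

mean_r, std_r = summarize_top(samples, log_post, keep_fraction=0.75)
```

In this toy case the poorly supported sample at 3.0 is excluded, so the summary reflects only the dominant mode.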
Table S5C presents the list of analytes that are outputs of signaling modules (TRK, ERBB, ERK, AKT, JNK, S6K and RSK) of our core network. The output of the DPD module is determined using Eqs. 9-12 in the 70-dimensional molecular dataspace. To calculate the global response coefficients for the signaling modules, xi, we used central fractional differences to approximate the logarithmic derivatives,
Here xi0 and xi1 are the i-th module outputs before and after a perturbation to the parameter pj. Because the sign of the DPD value (S) could change for large perturbations, we used either left or right fractional differences,
A feature of the BMRA formalism is that some modules might not be perturbed, but still the network topology will be inferred. In a core network we have not perturbed a module consisting of the ERBB family of RTKs, which can crosstalk with Trk receptors either directly or through downstream signaling pathways and feedback loops. The output of this additional RTK module (termed ERBB) is determined as the sum of EGFR and ERBB2 phosphorylation, measured with the corresponding antibody that does not distinguish between these two ERBB receptors. Having determined the global response coefficients of all other modules using Eqs. 23 and 24, BMRA inferred the connection strengths and confidence intervals that are given in Table S4.
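The finite-difference approximations of the global responses can be sketched as follows. The central form 2(x1−x0)/(x1+x0) is the usual symmetric estimate of a change in ln x; the one-sided forms divide the same difference by the value before or after the perturbation. The function names are assumptions, and which one-sided form is applied to a given DPD value is not asserted here.

```python
def central_fractional_difference(x0, x1):
    """2*(x1 - x0)/(x1 + x0): symmetric estimate of the change in ln(x)."""
    return 2.0 * (x1 - x0) / (x1 + x0)

def one_sided_fractional_difference(x0, x1, reference="before"):
    """(x1 - x0) divided by the value before or after the perturbation."""
    base = x0 if reference == "before" else x1
    return (x1 - x0) / base

# A module output rising from 1.0 to 1.5 (invented values).
R_central = central_fractional_difference(1.0, 1.5)

# A DPD value changing sign, from 0.2 to -0.3: x0 + x1 is close to zero,
# so the central form becomes unreliable and a one-sided form is used instead.
R_dpd = one_sided_fractional_difference(0.2, -0.3, reference="before")
```

The second case shows why a sign change in S motivates one-sided differences: the denominator of the central form can approach zero.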
In order to understand how the core network exerts control over the global signaling network and specifies cell states and their transitions, we need to include it into the model. The STV/DPD formalism allows us to summarize all individual contributions to the global network responses in a dedicated network module termed the DPD.
This DPD module comprises all measured protein activities and abundances, except for the components of the core network. The absolute value of the DPD output (S) is the distance between the hyperplane that separates phenotypic states and the centroid of a point cloud of a given cell state (minus the core network components), but the sign of S can be negative or positive. This sign depends on whether the distance is determined in the parallel or antiparallel direction to the STV. For the selected STV direction, the sign of S is positive if a centroid of a point cloud is on the same side of the separation plane as a proliferation cloud and the sign of S is negative at the differentiation side. Therefore, any perturbation that drives the cellular response from differentiation to proliferation changes the S sign to positive, whereas moving from proliferation to differentiation makes S negative.
Introduction of the DPD allows us to systematically examine the influence of all core network pathways onto cell state transitions alone and in combination. We again used BMRA to determine connections to the DPD, as a network node, for each core pathway. The BMRA inferred influence of the core network on the DPD in TrkA and TrkB cells is shown respectively in
As the DPD also links the network to cell fate decisions, the connection coefficient indicates not only a change in signaling but also a change in phenotype. A positive connection coefficient means that the cell is pushed towards proliferation, whereas a negative coefficient indicates a push to differentiation.
Because the changes in the DPD are downstream of the core network and therefore plausibly require more time, we assessed the DPD responses after 45 minutes of GF stimulation. By measuring the fold-changes in the outputs of signaling pathways and the STV module (ΔS/S), we obtained the global, systems-level responses to perturbations. These data enabled BMRA to infer the influence of each signaling pathway on the proximity of a point cloud to the separation hyperplane between proliferation and differentiation states, i.e., on the DPD and cell phenotype.
The distances of centroids of TrkB cells to the separation surface (light grey quadrilateral) before and after perturbation are shown by black lines. The DPD module output is the distance of a centroid from the separation hyperplane determined along or opposite the STV direction taken with the plus sign if the centroid is located at the right side from the separation hyperplane (proliferation), and with the minus sign if the centroid is at the left side (differentiation).
As expected, the ERK and the S6K modules strongly promote cell proliferation in both TrkA and TrkB networks. The facilitation of proliferation by RTKs, including Trk receptors, and AKT is mediated by their downstream effectors, i.e., by the ERK and S6K modules for RTKs and by the S6K module for AKT. At the same time, the influence of the RSK and JNK modules on cell phenotype is drastically different between these networks. In the TrkA network, RSK and JNK are two main signals that suppress proliferation and induce differentiation phenotype, whereas in the TrkB network these pathway modules do not influence the STV module and, therefore, the phenotype. Thus, ERK-induced activation of JNK and RSK modules in TrkB-expressing cells does not lead to the suppression of proliferation of these cells.
Quantitative comprehension of signaling dynamics can only be achieved using a nonlinear mechanistic model. It is a necessary prerequisite to explain and predict how diverse experimental manipulations alter signaling patterns, resulting in changes in the DPD and, consequently, cell states. The BMRA-quantified core network topologies and the inferred influences of its signaling pathways on the DPD allow us to directly derive mechanistic dynamical models for TrkA and TrkB cells. These models predict both the dynamics of core pathway outputs and the changes in cellular phenotype.
The TrkA and TrkB core networks showed several differences in connections and their strengths (
The model simulations, as seen in
For example, it can be seen that inhibition of S6K increases phosphorylation levels of ERK and AKT due to downregulation of S6K-induced negative feedback loops, which are stronger in TrkA cells than in TrkB cells (
Using the quantified core network topologies and the inferred pathway influences on the DPD, a nonlinear ODE model is built using a rule-based approach (Blinov, M. L. et al., (2004). BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics; Borisov, N. M., et al. (2008). Domain-oriented reduction of rule-based network models. IET Syst Biol 2, 342-351; Chylek L. A. et al. (2014). Rule-based modeling: a computational approach for studying biomolecular site dynamics in cell signaling systems. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 6, 13-36) for TrkA and TrkB cells. Here we describe the fundamentals of the model.
The activation of TRK and ERBB receptors by ligand binding and dimerization is modeled mechanistically. Briefly, NGF/BDNF binding to TrkA/TrkB is followed by receptor dimerization and phosphorylation, whereas the basal rate of ERBB dimerization is maintained by diverse GFs present in serum. The homo-dimerization of TrkA, TrkB and ERBB and hetero-dimerization of TrkB and ERBB is modeled using the thermodynamic approach developed previously (Kholodenko, B. N. (2015). Drug resistance resulting from kinase dimerization is rationalized by thermodynamic factors describing allosteric inhibitor effects. Cell Rep 12, 1939-1949). The binding of the first and second molecules of the ligand and the subsequent homo and hetero-dimerization of RTKs satisfy so-called “detailed balance” constraints (see, e.g., Ederer and Gilles, (2007). Thermodynamically feasible kinetic models of reaction networks. Biophys J 92, 1846-1857; Hearon, J. Z. (1953). The kinetics of linear systems with special reference to periodic reactions. Bull Math Biophys 15, 121-141; Kholodenko et al., (1999). Quantification of short term signaling by the epidermal growth factor receptor. J Biol Chem 274, 30169-30181).
These thermodynamic restrictions require the product of the equilibrium dissociation constants (Kd's) along a cycle to be equal to 1, as at equilibrium the net flux through any cycle vanishes, since the overall free energy change is zero. Because ligand binding facilitates the RTK dimerization, following the thermodynamic approach (Kholodenko, 2015), we introduce three thermodynamic factors, describing how the Kd's of homo- and hetero-dimerization of RTKs change upon ligand binding. When a Trk receptor inhibitor is added, an inhibitor-free protomer can still cross-phosphorylate the other protomer in a dimer.
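The detailed balance constraint can be illustrated on a single thermodynamic square, in which ligand binding and dimerization can occur in either order. In the sketch below a thermodynamic factor scales the dimerization Kd upon ligand binding, and closing the cycle then fixes the Kd of ligand binding to the dimer. The reaction scheme is deliberately simplified, and all names and numerical values are illustrative assumptions rather than parameters of the model described here.

```python
def close_cycle(kd_ligand, kd_dimer, thermo_factor):
    """Return the two ligand-dependent Kd's that satisfy detailed balance.

    kd_ligand     : Kd of ligand binding to a receptor monomer
    kd_dimer      : Kd of dimerization without ligand
    thermo_factor : factor by which ligand binding changes the dimerization Kd
    """
    kd_dimer_bound = thermo_factor * kd_dimer
    # Both paths around the square must give the same overall equilibrium:
    # kd_ligand * kd_dimer_bound == kd_dimer * kd_ligand_to_dimer
    kd_ligand_to_dimer = kd_ligand * kd_dimer_bound / kd_dimer
    return kd_dimer_bound, kd_ligand_to_dimer

def cycle_ratio(kd_ligand, kd_dimer, kd_dimer_bound, kd_ligand_to_dimer):
    """Ratio of the Kd products along the two paths; equals 1 under detailed balance."""
    return (kd_ligand * kd_dimer_bound) / (kd_dimer * kd_ligand_to_dimer)

kd_dimer_bound, kd_ligand_to_dimer = close_cycle(2.0, 50.0, 0.1)
ratio = cycle_ratio(2.0, 50.0, kd_dimer_bound, kd_ligand_to_dimer)
```

A thermodynamic factor below 1 lowers the dimerization Kd, which corresponds to ligand binding facilitating dimerization, and the same factor necessarily tightens ligand binding to the dimer.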
The core network dynamics is modeled up to 45 minutes, and therefore the total moieties of ERK, AKT, JNK, S6K and RSK are assumed to be conserved. However, internalization of RTKs that is occurring on this timescale is included in the model (Cosker, K. E., and Segal, R. A. (2014). Neuronal signaling through endocytosis. Cold Spring Harb Perspect Biol 6). Following internalization, RTKs are subsequently degraded, whereas there is also an influx of receptors from the cell interior to the membrane. The disappearance of RTKs from the plasma membrane depends on the dimer composition. In the model the rate of internalization of TrkB-ERBB heterodimers is assumed to be slower than the internalization rate of TrkA and TrkB homodimers, based on the literature. The BMRA-inferred connections show that there are multiple feedback loops to the ERBB module from downstream kinase modules (Table S4). The influence of these feedbacks on the ERBB module activity is modeled as hyperbolic multipliers that modify the rate of activating ERBB phosphorylation (Tsyganov, M. A. et al. (2012). The topology design principles that determine the spatiotemporal dynamics of G-protein cascades. Mol Biosyst 8, 730-743). The RTK dephosphorylation is catalyzed by phosphatases. The activation and deactivation dynamics of the downstream signaling modules is modeled using Michaelis-Menten kinetics and hyperbolic multipliers that account for signaling crosstalk between the pathways. The developed model of the core signaling network consisted of 81 species and 404 reactions.
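A rate law of this kind can be sketched as below. The multiplier (1 + γ·m/K)/(1 + m/K) is one commonly used hyperbolic form, assumed here for illustration: it equals 1 when the modifier m is absent and tends to γ at saturation, so γ>1 describes activation and γ<1 inhibition. The parameter names and values are hypothetical and are not the fitted parameters of the model.

```python
def hyperbolic_multiplier(modifier, k_half, gamma):
    """(1 + gamma*m/K)/(1 + m/K): 1 without modifier, gamma at saturation."""
    ratio = modifier / k_half
    return (1.0 + gamma * ratio) / (1.0 + ratio)

def activation_rate(inactive, kinase, kcat, km, modifier, k_half, gamma):
    """Michaelis-Menten activation rate scaled by one crosstalk multiplier."""
    base = kcat * kinase * inactive / (km + inactive)
    return base * hyperbolic_multiplier(modifier, k_half, gamma)

# Normalized, invented values: an inhibitory crosstalk input with gamma = 0.2.
rate_free = activation_rate(1.0, 0.5, 1.0, 1.0, modifier=0.0, k_half=1.0, gamma=0.2)
rate_inhibited = activation_rate(1.0, 0.5, 1.0, 1.0, modifier=1.0, k_half=1.0, gamma=0.2)
```

Several crosstalk inputs to the same module would enter as a product of such multipliers, each with its own K and γ.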
The BMRA network reconstruction constrains parameters of the dynamical model by maximum likelihood values of the inferred connection strengths (Table S4). In particular, only interactions between modules where the connection coefficients have statistically significant non-zero values are included in the model. Additional constraints on the parameter values occur because the inferred connection coefficients are normalized Jacobian elements (Kholodenko et al., 2002), which are functions of the model parameters (Eq. 15 and as described further below).
The model includes the DPD module whose output summarizes the contributions of all individual proteins (minus core network constituents) to the global network responses. This module describes cell-wide signaling and the DPD output (S) is defined by Eq. 9. The DPD maps the network-wide changes, which occur in the multidimensional molecular dataspace upon perturbations, into a 1D (S) space. If the data point clouds before and after a particular perturbation are measured in the experiments, ΔS can be calculated using Eq. 12. Our model allows us to determine the dynamics of S following any drug perturbation to core network pathways. The calculated DPD trajectory is a 1D projection of cell maneuvering in Waddington's landscape, determined as follows,
Here ƒ(S) is the restoring driving force guided by Waddington's landscape. The sum in Eq. 25 is the signaling driving force, xj(t) are the outputs of signaling modules, rSj are the corresponding, BMRA-inferred connection coefficients to the STV (see Table S4), and Sst.st. and xjst.st are the initial steady-state values of S and xj before perturbations.
The restoring driving force ƒ(S) is given by the derivative of the potential (U), as follows
The potential (U) that models Waddington's landscape has three minima. These minima correspond to three stable steady states of neuroblastoma cells: the ground state (Sg), differentiation (Sd) and proliferation (Sp). There are two unstable steady states at the borders between the basins of attraction of two neighboring steady states.
Assuming a quadratic potential U in the vicinity of each stable state, which is widely used in physics (Landau and Lifshitz, 1980), the restoring driving force ƒ(S) is modeled using a piece-wise linear approximation. This force ƒ(S) is set to zero at the borders between the basins of attraction, and ƒ(S) reaches its maximum at the half distance to the border from the stable steady state (Eq. 27).
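A piece-wise linear force with these properties can be sketched as follows. The force is zero at each stable state and at each border, grows linearly away from the stable state up to the half distance, and then decays linearly to zero at the border. The ordering of the stable states along S, the single slope α shared by all basins, and the numerical values are assumptions made for illustration only.

```python
def restoring_force(s, stable, borders, alpha):
    """Piece-wise linear restoring force directed towards the basin's stable state.

    stable  : stable states in ascending order, e.g. (S_d, S_g, S_p)
    borders : basin borders in ascending order, one between each pair of states
    alpha   : slope of the force near a stable state
    """
    basin = sum(1 for b in borders if s > b)  # index of the basin containing s
    s0 = stable[basin]
    if s == s0:
        return 0.0
    if s > s0:
        border = borders[basin] if basin < len(borders) else None
    else:
        border = borders[basin - 1] if basin > 0 else None
    # Outermost sides have no border: the force stays linear there.
    if border is None or abs(s - s0) <= abs(border - s0) / 2.0:
        return -alpha * (s - s0)
    return -alpha * (border - s)  # decays to zero at the border

stable = (-2.0, 0.0, 3.0)   # hypothetical S_d, S_g, S_p
borders = (-1.0, 1.0)       # hypothetical borders between the basins
alpha = 0.5

f_half = restoring_force(0.5, stable, borders, alpha)    # half way to the border
f_border = restoring_force(1.0, stable, borders, alpha)  # at the border
```

The magnitude of the force is largest at the half distance (f_half) and vanishes at the border (f_border), so a signaling driving force must exceed this maximum long enough to carry the cell into the neighboring basin.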
Eq. 25 allows for an interpretation of a cell progressing through the molecular dataspace as a particle that moves in the potential force field (Waddington's landscape) and the field of external forces exerted by responses of core signaling pathways,
In the vicinity of the steady state Si∈{Sg, Sd, Sp}, the solution of Eqs. 26 and 27 is expressed analytically as follows,
Eq. 29 illustrates that the system has a characteristic memory time, tm˜1/α. At times much shorter than the memory time, t<<tm, the entire change in S is determined by the time integral over the signaling driving force.
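The role of the memory time can be illustrated with a linear relaxation model. The sketch assumes, near a stable state, the equation dS/dt = −α·(S − Si) + F(t), with all signaling contributions lumped into a single driving force F(t); this lumping, the constant force and the parameter values are illustrative assumptions. For a constant force F0 and S(0)=Si the solution is S(t) = Si + (F0/α)·(1 − exp(−α·t)).

```python
import math

def simulate_dpd(s_start, s_stable, alpha, force, t_end, dt=0.01):
    """Forward-Euler integration of dS/dt = -alpha*(S - s_stable) + force(t)."""
    s, t = s_start, 0.0
    for _ in range(int(round(t_end / dt))):
        s += dt * (-alpha * (s - s_stable) + force(t))
        t += dt
    return s

def analytic_constant_force(s_stable, alpha, f0, t):
    """Closed-form solution for a constant force f0 starting at the stable state."""
    return s_stable + (f0 / alpha) * (1.0 - math.exp(-alpha * t))

alpha = 0.1   # memory time 1/alpha = 10 time units
f0 = 0.5      # constant signaling driving force

s_short = simulate_dpd(0.0, 0.0, alpha, lambda t: f0, t_end=0.1)
s_long = simulate_dpd(0.0, 0.0, alpha, lambda t: f0, t_end=100.0)
```

At t=0.1, far below the memory time, the change in S is close to the time integral of the force (f0·t=0.05). At t=100 the restoring force balances the driving force and S approaches f0/α.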
To decrease the number of parameters to fit, the concentrations of different protein forms and the parameters with the concentration dimensionality, such as, the Michaelis' constants, were normalized on the conserved total protein concentrations. Only the time was left as the dimensional variable (measured in seconds) to readily interpret model simulations.
To refine the parameters of the pathway interactions of our core network inferred by BMRA, the data were split into a training set and a validation set. The training set included the time courses of TrkA and TrkB phosphorylation measured by Western blot and the 10 min RPPA data for the remaining signaling modules. The model-generated time courses were fitted to these training data with the objective function defined as the sum of squared deviations. A feature of our parameter refinement is that, in addition to the training dataset, we constrained the parameters using the BMRA-inferred connection coefficients and their confidence intervals. These constraints on the parameter values are implicit because the connection coefficients defined in Eq. 15 have to lie within the confidence intervals of the BMRA-inferred connections. To impose them, we used a unique feature of the pyBioNetFit software, which allows parameter constraints in the form of inequalities to be added to the fitting process (Mitra et al., 2019). A combination of scatter search and simplex methods and the pyBioNetFit software was used to fit the model simulations to the training dataset, as shown in
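The idea of minimizing a sum-of-squares objective while keeping the fit inside prescribed intervals can be sketched as follows. This sketch does not use pyBioNetFit and does not reproduce its scatter search or simplex methods: a plain bounded random search is swapped in as a stand-in, the constraints are simplified to intervals on the parameters themselves rather than on the connection coefficients of Eq. 15, and `toy_model`, `fit_within_intervals`, the time points and the intervals are hypothetical.

```python
import math
import random

def toy_model(t, params):
    """Hypothetical saturating time course, standing in for a model output."""
    amplitude, rate = params
    return amplitude * (1.0 - math.exp(-rate * t))

def sum_of_squares(params, times, measured):
    """Objective function: sum of squared deviations between model and data."""
    return sum((toy_model(t, params) - y) ** 2 for t, y in zip(times, measured))

def fit_within_intervals(times, measured, intervals, n_trials=5000, seed=0):
    """Bounded random search; every candidate stays inside the given intervals."""
    rng = random.Random(seed)
    best = [(lo + hi) / 2.0 for lo, hi in intervals]   # start from the midpoints
    best_cost = sum_of_squares(best, times, measured)
    for _ in range(n_trials):
        trial = [rng.uniform(lo, hi) for lo, hi in intervals]
        cost = sum_of_squares(trial, times, measured)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost

# Synthetic "training data" generated from assumed parameter values (time in seconds).
times = [0.0, 60.0, 120.0, 300.0, 600.0]
measured = [toy_model(t, (1.0, 0.01)) for t in times]
intervals = [(0.5, 1.5), (0.001, 0.05)]   # assumed admissible intervals
params, cost = fit_within_intervals(times, measured, intervals)
```

Because candidates are only ever drawn from within the intervals, the constraints hold by construction; dedicated fitting software would typically handle general inequality constraints and search far more efficiently.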
In total, the rule-based nonlinear model of the core signaling network and cell state transitions consists of 82 species and 405 reactions. The model simulations were run using the BioNetGen software (Blinov et al., 2004), which uses the CVODE routine from the SUNDIALS software package (Hindmarsh, A. C. et al., (2005). SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Trans Math Softw 31, 363-396) for solving ordinary differential equations (ODEs). The Matplotlib Python package was used for plotting experimental and modeling results.
Eq. 25 determines the DPD dynamics when the cell progression through the molecular dataspace is directed by the signaling driving force and the restoring force. For the signaling driving force, we fit the coefficients,
in the ranges constrained by the confidence intervals of the BMRA-inferred connection coefficients (rSj), whereas the signaling module outputs (xj) are calculated by the model. For the restoring driving force, Eq. 27, only the slope parameters (α) are fitted.
A key feature of the cSTAR approach is that we integrate cell state transitions into a mechanistic kinetic model. Thus, we follow both the kinetics of pathway outputs of the core network (
In the molecular dataspace, we present cell states as centroids of the data point clouds that describe a particular phenotype. Before ligand stimulation or drug perturbations, cells reside in (meta)stable states. Following a perturbation, the movement of a centroid is governed by two driving forces. One is a signaling driving force that emerges from the changes in core network activities, and the other is a restoring force that pushes the centroid back to its original (meta)stable state, provided the deviation from this original state has not been too large (
The mechanistic nature of the BMRA-based network reconstruction, combined with the inclusion of the DPD as a quantitative indicator of cell states, allows us to calculate the DPD values following GF stimulation.
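How the two driving forces described above can produce either a return to the original state or a state transition can be illustrated with a toy simulation. It is a sketch under strong assumptions, not the rule-based model of the source: the dynamics are reduced to dS/dt = ƒ(S) + F(t), the signaling driving force F is a hypothetical constant pulse rather than a sum over signaling module outputs, the landscape positions and slope are invented, and simple Euler integration replaces the ODE solver.

```python
import math
from bisect import bisect_right

STABLE = [-1.0, 0.0, 1.0]   # hypothetical stable states (ground state at 0.0)
BORDERS = [-0.5, 0.5]       # hypothetical unstable states between the basins
ALPHA = 1.0                 # hypothetical slope; memory time ~ 1/ALPHA

def restoring_force(S):
    """Piecewise linear restoring force toward the stable state of the current basin."""
    i = bisect_right(BORDERS, S)
    x = S - STABLE[i]
    if x >= 0:
        edge = BORDERS[i] if i < len(BORDERS) else math.inf
    else:
        edge = BORDERS[i - 1] if i > 0 else -math.inf
    d, m = abs(edge - STABLE[i]), abs(x)
    magnitude = ALPHA * m if m <= d / 2 else ALPHA * (d - m)
    return -math.copysign(magnitude, x)

def simulate(force, pulse_end, t_end=30.0, dt=0.01, S0=0.0):
    """Euler integration of dS/dt = f(S) + F(t), with F constant until pulse_end."""
    S = S0
    for k in range(int(round(t_end / dt))):
        F = force if k * dt < pulse_end else 0.0
        S += dt * (restoring_force(S) + F)
    return S

# A weak pulse never exceeds the maximal restoring force (0.25 here), so the
# centroid relaxes back; a strong pulse pushes it over the border at 0.5.
weak_final = simulate(force=0.1, pulse_end=5.0)
strong_final = simulate(force=0.4, pulse_end=5.0)
```

In this toy setting the weak pulse leaves the centroid in its original basin, whereas the strong pulse carries it across the border so that it settles in the neighboring stable state after the force is removed.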
Our simulations show that, starting from the ground state, TrkA and TrkB cells progress differentially through Waddington's landscape and assume two different states, differentiation and proliferation, as shown in
These predictions are further supported by the experimental data illustrated in
We also calculated the DPD trajectories after inhibitor perturbations.
The model shows that inhibition of TrkB and inhibition of S6K are both pro-differentiation interventions (
These simulations are corroborated by live cell images taken at 72 hours in these cells (
Interestingly, the model shows that, whereas RSK does not directly affect the STV module output in TrkB cells, RSK inhibition decreases the ERBB module activity, which in turn reduces the stimulation of proliferation by the ERK and S6K modules (Table S5 and
Thus, the developed model captures both direct and network-mediated effects of drugs on cell phenotype. The model predicts that a marked increase in AKT activity will abolish differentiation and increase proliferation of TrkA cells. This is illustrated in
Indeed, transfection of TrkA cells with myristoylated AKT, which is constitutively active, stops differentiation and leads to proliferation of TrkA cells.
Although the sensitivity of the STV module output to diverse signaling inhibitors (Trk, S6K, ERK, AKT, RSK and ERBB) is different, simulations show that sufficiently high doses of these inhibitors facilitate, at least partially, TrkB cell differentiation (
Using the model, we can calculate not only time-dependent DPD responses to a given drug but also DPD dose responses. Importantly, the model predicts signaling patterns and cell state responses for different doses of drugs applied not only separately but also in combination.
The model suggests that this combination synergistically induces differentiation of TrkB cells (
Experiments corroborate model predictions, showing that a combination of the ERBB inhibitor Gefitinib and the MEK inhibitor Trametinib synergistically inhibits ERK signaling.
cSTAR Flexibility and Scalability
Next, we tested cSTAR's performance with data of different types and scales. Using the same conditions as in the RPPA dataset, we acquired quantitative phosphoproteomics MS datasets for TrkA and TrkB cells. Calculating the STV and DPD changes for a cell-wide signaling pattern of ca. 5000 phosphosites resulted in similar core network components and a key prediction of synergy between ERBB and MEK inhibitors in inducing TrkB cell differentiation without affecting the TrkA cell phenotype (
Thus, cSTAR produces robust and reproducible results even when the input data differ vastly in scale and bias.
To map drug resistance mechanisms, we applied cSTAR to an extensive RPPA dataset of 238 proteins measured under 89 perturbations of RAF inhibitor-resistant SKMEL-133 cells. As the distinct phenotypic states, we selected proliferation (untreated cells) and apoptosis induced by combination treatment with MEK and PI3K/AKT/mTOR inhibitors.
The STV ranked the MEK/ERK, AKT, mTOR/S6K, SRC, CDK4/6, PKC, and IRS modules as the components of a core network that controlled these states. Next, we applied BMRA to single-drug perturbation data, inferring the core network circuitry and its connections to the DPD modules. The reconstructed network included known signaling routes, including the IRS-mediated activation of the ERK and AKT modules, AKT activation of mTOR, CDK4/6 activation by ERK and mTOR, and negative feedback from mTOR to IRS.
However, BMRA also uncovered activating connections from PKC to AKT, mTOR, SRC and CDK4/6, a negative connection from PKC to IRS, and CDK4/6-induced positive and negative feedback loops to the AKT and SRC modules.
Based on their direct connections to the DPD, mTOR and PKC drive proliferation, while the phenotypical effect of other nodes is indirect. For instance, ERK activates mTOR through SRC and CDK4/6 to stimulate proliferation, partially counteracted by CDK4/6-mediated feedback inhibition of ERK. Although SRC directly inhibits the DPD, it stimulates proliferation on the systems level by activating mTOR.
The original publication by Korkut (referenced above in relation to
Informed by the BMRA network reconstruction, we built a nonlinear dynamical model of SKMEL-133 cell signaling and phenotypic behavior. Because cSTAR enables building models of different granularities, we tested the effects of including or omitting MYC. Adding MYC only changed parameters of modules directly interacting with MYC without changing any model predictions. Thus, the ODE description of each network module can be extended to include additional mechanistic knowledge.
The model predicted that an mTOR inhibitor was the most efficient single drug to induce apoptosis in SKMEL-133 cells, whereas PI3K/AKT inhibition was less effective.
This differential sensitivity is explained by the double-positive feedback between CDK4/6 and mTOR (
The cSTAR model recapitulated the results of Korkut et al., including the synergy between MEK and MYC inhibitors. Furthermore, the model predicted that combining Insulin/IGF1 receptor and PI3K/AKT inhibition enhances synergy. This result is supported by calculating the Talalay-Chou combination index and simulating SKMEL-133 cell maneuvering in Waddington's landscape following inhibitor treatments.
Thus, PI3K/AKT or Insulin/IGF1 receptor inhibitors given separately do not switch the DPD into the negative, apoptotic region (
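For readers unfamiliar with the combination index mentioned above, the standard calculation based on the median-effect equation can be sketched as follows. The dose values and median-effect parameters below are invented for illustration, the function names are hypothetical, and how a fractional effect would be derived from the DPD is not specified here and is left as an assumption.

```python
def dose_for_effect(fa, Dm, m):
    """Median-effect equation solved for dose: the dose of a single drug that
    produces the affected fraction fa, given its median-effect dose Dm and slope m."""
    if not 0.0 < fa < 1.0:
        raise ValueError("fa must lie strictly between 0 and 1")
    return Dm * (fa / (1.0 - fa)) ** (1.0 / m)

def combination_index(d1, d2, fa, Dm1, m1, Dm2, m2):
    """Combination index for two drugs given together at doses d1 and d2 that
    jointly produce effect fa. CI < 1 indicates synergy, CI = 1 additivity,
    and CI > 1 antagonism."""
    return d1 / dose_for_effect(fa, Dm1, m1) + d2 / dose_for_effect(fa, Dm2, m2)

# Hypothetical example: each drug alone needs dose 1.0 or 2.0 for a 50% effect,
# but together a quarter of each dose suffices.
ci = combination_index(d1=0.25, d2=0.5, fa=0.5, Dm1=1.0, m1=1.0, Dm2=2.0, m2=1.0)
```

In this invented example the index is below 1, which is the signature of synergy in this framework.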
cSTAR quantifies phenotypic changes via the DPD, opening the possibility to integrate different omics datasets by comparing the normalized DPD changes following perturbations. Testing this, we applied cSTAR to two datasets that analyzed EMT suppression by kinase inhibitors. One study (Cook, D. P. & Vanderhyden, B. C., “Context specificity of the EMT transcriptional response”. Nature Communications 11, 2142, doi:10.1038/s41467-020-16066-2 (2020)) used single-cell RNA sequencing (scRNA-seq) of four cancer cell lines stimulated with three different ligands, TGFβ, EGF and TNFα. The other (Chen, W. S. et al. “Uncovering axes of variation among single-cell cancer specimens”. Nature Methods 17, 302-310, doi:10.1038/s41592-019-0689-z (2020)) used single-cell resolution mass cytometry of phosphoproteomic responses in Py2T breast cancer cells stimulated with TGFβ.
The results of the cSTAR analysis correspond well to the original phenomenological observations and conclusions drawn in these papers. They show that cSTAR correctly captures the relationships between phenotypical and underlying molecular states. Moreover, cSTAR adds new insights. Interestingly, the DPD analysis of scRNAseq data demonstrated that at single-cell resolution the observed partial EMT states comprise a continuum of intermediate states between fully epithelial and fully mesenchymal states. To underpin these states with mechanistic interpretations, which was previously not possible, we applied BMRA to reconstruct the twelve signaling networks (four cell lines, three ligands) underlying these phenotypes in each cell type under each condition. These networks show how differential network topologies and connection strengths cause cell type-specific and stimulation-specific responses. These reconstructions of different network topologies will help to design the most informative experiments to disentangle the relationships between these multiple EMT states.
MS proteomics data used in the experimental work are uploaded to the PRIDE database (accession number PXD028943). The RPPA data for the SKMEL-133 cell line are available at http://projects.sanderlab.org/pertbio/. The CYTOF data for EMT in the Py2T cell line are available at https://community.cytobank.org/cytobank/experiments#project-id=1296. Software code for the data analysis, network reconstruction and modeling is available at https://github.com/OleksiiR/cSTAR_Nature.
Cells employ signaling networks to process input signals and generate specific biological outputs. Signaling networks function via posttranslational modifications (PTMs) and are controlled by external cues and feedback loops mediated by PTMs and expression changes. Therefore, protein phosphorylation and expression datasets of cell responses to external cues contain rich information about cell states and fate decisions. There are several distinctive states, including differentiation, proliferation, senescence and apoptosis, which exhibit different phenotypes that can be well-detected by current experimental methods.
Omics data allow us to correlate cell-wide expression activity values with each phenotype, but how cell fate decisions are governed by signaling networks remains obscure.
Here, we have developed and experimentally validated the cSTAR approach, which uses omics data to distinguish cell states and to infer and quantify a core signaling network that determines transitions between these states.
This approach separates different cell states in the omics dataspace by machine learning methods and introduces the State Transition Vector (STV). Using the STV, the contributions of different protein abundances or activities to a cell state can be directly ranked. The components with high rank populate a core network, which drives global signaling patterns.
Subsequently, the causal core network connections and their strengths are inferred using the Bayesian formulation of Modular Response Analysis. cSTAR then builds mechanistic models to predict experimental perturbations that convert cellular states, e.g., proliferation into differentiation.
A key feature is that the process of cell fate decision making is included in a mechanistic model. We connected the signaling dynamics to the intuitively attractive picture of Waddington's landscape. Although many attempts have been made to quantify this landscape, the cell progression from one state to another was never connected to the responses of cell surface receptors and downstream signaling networks to the external cues that drive this progression, and therefore experimental data on signaling activities have not been used. Integrating biochemistry and physics, the quantitative cSTAR approach determines how the activities of multiple signaling pathways dynamically control cell progression through Waddington's landscape, resulting in state transitions and fate decisions.
The cSTAR approach introduces a signaling driving force, which is coming from the responses of receptors and kinases to perturbations, such as external cues and pharmacological interventions, and is imposed on the initial potential, shaping Waddington's landscape. This force drives downstream signaling and transcription factor activities that ultimately determine cell fate decisions.
Because only omics data are available for this cell-wide signaling and transcription factor network, its mechanistic modeling is currently impractical. Previously, Waddington's landscape and transitions of stem cells to differentiation were interpreted by calculating multiple (quasi)steady states of small transcription factor networks (Lu, M. et al. (2014). Construction of an Effective Landscape for Multistate Genetic Switches. Physical Review Letters 113, 078102).
A distinction of the cSTAR approach is the use of omics data obtained in response to experimental perturbations of the core signaling network specified by the STV. Informed by these data, the cSTAR approach builds a mechanistic model of the core network, which includes the global cell network as a dedicated module. The output of this module, the DPD, is a quantitative descriptor of cell phenotype; together with the signaling pathway outputs, it is a biochemically interpretable variable of the model. The model examines cell maneuvering in Waddington's landscape by monitoring the coordinated regulation of the components of the global cell network described by the DPD. This model predicts how external and internal cues will change cell states.
The cSTAR approach can be flexibly extended to other omics datasets. If an omics dataset, for instance RNAseq, contains data for only two different phenotypic states, a standard approach is to determine differentially expressed genes. Likewise, differentially phosphorylated phosphosites are determined for phosphoproteomics datasets. In the absence of perturbations, ranking analytes by their contribution to the STV can provide information similar to that of the above approaches. However, if a dataset contains at least a handful of perturbations, calculating the dot product of the STV and the perturbation vector helps us determine where each perturbation moves a cell state with respect to the state separation hyperplane, and thereby the change in the DPD brought about by this perturbation. Moreover, if an omics dataset contains a sufficient number of perturbations, the cSTAR approach determines (i) causal connections between signaling nodes of a core network driving cell fate decisions, (ii) connections to the DPD node linking signaling to cell state changes, and (iii) a nonlinear mechanistic model that predicts signaling and cell state responses to inhibitor perturbations.
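The ranking and dot-product calculations described above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the cSTAR implementation: the separating hyperplane (normal vector `w`, offset `b`) is taken as given instead of being trained, the STV is taken to be the unit normal of that hyperplane, the sign convention is arbitrary, and the analyte names and data values are invented.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def signed_distance(x, w, b):
    """Signed distance of data point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(c * c for c in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def dpd_change(control, perturbed, w):
    """Dot product of the STV (unit normal) with the perturbation vector."""
    stv = unit(w)
    return sum(s * (p - c) for s, p, c in zip(stv, perturbed, control))

def rank_analytes(names, w):
    """Rank analytes by the absolute size of their STV component."""
    stv = unit(w)
    return sorted(zip(names, stv), key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical three-analyte example.
names = ["pERK", "pAKT", "pS6K"]
w, b = [3.0, -4.0, 0.0], 0.0
control = [1.0, 1.0, 1.0]
perturbed = [2.0, 0.0, 1.0]

shift = dpd_change(control, perturbed, w)
ranking = [name for name, _ in rank_analytes(names, w)]
```

In this toy example the perturbation moves the data point along the STV direction, and the analyte with the largest absolute STV component is ranked first.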
Our application examples show that cSTAR can utilize and integrate diverse omics data, including targeted and unbiased data of different scales as well as single-cell data. This universality and scalability distinguish cSTAR from other approaches that are more specialized in terms of input data, e.g. approaches relying on mRNA velocity input.
In summary, cSTAR offers a cell-specific mechanistic approach to describe, understand and purposefully manipulate cell fate decisions. As such, it has numerous applications across biology that go beyond the interconversion of proliferation and differentiation shown here as an example.
Number | Date | Country | Kind |
---|---|---|---|
2107576.7 | May 2021 | GB | national

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2022/064502 | 5/27/2022 | WO |