Patients with chronic kidney and/or end-stage renal diseases (CKD/ESRD) are subject to increased health and morbidity risks than the general population. For example, cardiovascular disease (CVD), accelerated by cardiometabolic risks, is a leading cause of mortality in patients with CKD/ESRD. CKD/ESRD patients are at a higher risk of mortality about two to ten times that of the general population. This risk is not entirely accounted for by traditional risk factors such as hypertension, diabetes, and hyperlipidemia alone; non-traditional risk factors related to CKD metabolic bone disorder (CKD-MBD) and vascular calcification have provided additional missing links. Despite all of these risk factors, cardiovascular disease in CKD/ESRD patients remains underdiagnosed and undertreated. A primary reason is that standard clinical interventions used in the general population for managing cardiovascular disease are ineffective in reducing mortality risk in CKD/ESRD patients, due to interactions between interlinked pathophysiological mechanisms with nonlinear cascading effects. A similar situation exists for other disorders and health conditions, such as anemia, bone mineral metabolism, acid-base disorders, and/or the like.
It is with respect to these and other considerations that the present improvements may be useful.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to necessarily identify key features or essential features of the claimed subject matter, nor is it intended as an aid in determining the scope of the claimed subject matter.
In one embodiment, an apparatus may include at least one processor and a memory coupled to the at least one processor. The memory may include instructions that, when executed by the at least one processor, cause the at least one processor to perform a chronic kidney and/or end-stage renal diseases (CKD/ESRD) condition analysis process to determine a CKD/ESRD condition model configured to model a CKD/ESRD condition, the vascular calcification analysis process to: receive input data associated with at least one patient, perform a dynamical system learner process to build a collection of dynamical system models, determine a model rank for at least a portion of the collection of dynamical system models, and determine an optimal dynamical system model for modeling the CKD/ESRD condition for the at least one patient.
In some embodiments of the apparatus, the instructions, when executed by the at least one processor, may cause the at least one processor to pre-process the input data to impute missing values. In various embodiments of the apparatus, the instructions, when executed by the at least one processor, may cause the at least one processor to perform a causal analysis of the input data to generate causal information. In some embodiments of the apparatus, the causal information may include a causal diagram. In various embodiments of the apparatus, the model rank may be configured to indicate model performance for dynamical relationships between variables in the input data.
In exemplary embodiments of the apparatus, the collection of dynamical system models may model one or more of the following variables: pre-treatment pulse pressure (P), Neutrophils-Lymphocytes ratio (ρNL), Serum calcium concentration (CCa), Intact Parathyroid Hormone (CPTH), Serum albumin concentration (g/dL) (CAb), Serum phosphorus concentration (CP), or Alkaline Phosphatase (CAP).
In various embodiments of the apparatus, the instructions, when executed by the at least one processor, may cause the at least one processor to: receive patient information for a patient; analyze the patient information using one of the collections of dynamical models to predict a CKD/ESRD condition process for the patient based on modeled variables.
In various embodiments of the apparatus, the input data may include a time series of system observables and a library of functions configured as an operator on the input data. In some embodiments of the apparatus, the dynamical system models may include differential equations that describe a time evolution of at least one variable of the input data.
In one embodiment, a computer-implemented method may be configured to perform a chronic kidney and/or end-stage renal diseases (CKD/ESRD) condition analysis process to determine a CKD/ESRD condition model configured to model a CKD/ESRD condition. The method may include receiving input data associated with at least one patient, performing a dynamical system learner process to build a collection of dynamical system models, determining a model rank for at least a portion of the collection of dynamical system models, and determining an optimal dynamical system model for modeling the CKD/ESRD condition for the at least one patient.
In some embodiments of the method, the method may include pre-processing the input data to impute missing values. In various embodiments of the method, the method may include performing a causal analysis of the input data to generate causal information. In some embodiments of the method, the causal information may include a causal diagram. In various embodiments of the method, the model rank may be configured to indicate model performance for dynamical relationships between variables in the input data.
In exemplary embodiments of the method, the collection of dynamical system models may model one or more of the following variables: pre-treatment pulse pressure (P), Neutrophils-Lymphocytes ratio (ρNL), Serum calcium concentration (CCa), Intact Parathyroid Hormone (CPTH), Serum albumin concentration (g/dL) (CAb), Serum phosphorus concentration (CP), or Alkaline Phosphatase (CAP).
In one embodiment, a computer-implemented method of vascular calcification analysis may include, via a processor of a computing device: determining a vascular calcification model configured to model vascular calcification of a virtual patient to determine a causal relationship between at least one patient characteristic and a vascular calcification indicator; and generate a causal relationship structure configured to visualize a causal relationship between the at least one patient characteristic and the vascular calcification indicator.
In some embodiments of the method, the vascular calcification indicator may include one of pulse pressure (PP) or pulse wave velocity. In some embodiments of the method, the at least one patient characteristic may include at least one of parathyroid hormones (PTH), calcium (Ca), phosphate (PO4), calcium-phosphate product (CaPO4), neutrophil-lymphocyte ratio (NLR), and albumin (Alb). In some embodiments of the method, the causal relationship structure may include at least one of a causality fingerprint or a causality pathway map. In some embodiments of the method, the method may include administering a treatment regimen based on the causal relationship structure.
By way of example, specific embodiments of the disclosed machine will now be described, with reference to the accompanying drawings, in which:
The present embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which several exemplary embodiments are shown. The subject matter of the present disclosure, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the subject matter to those skilled in the art. In the drawings, like numbers refer to like elements throughout.
The described technology is generally directed to processes, systems, and methods for the analysis of conditions of chronic kidney and/or end-stage renal diseases (CKD/ESRD) patients. A non-limiting example of CKD/ESRD conditions may include vascular calcification. In accordance with various aspects of the described embodiments, a method of CKD/ESRD conditions analysis may include a model generation process operative to provide a physiology-based model configured to model the CKD/ESRD conditions of a patient. The models may be generated, determined, or otherwise processed using a dynamical system modeler process or a dynamical system learner process.
In various embodiments, the model generation process may operate using an application that infers a model representation of a dynamical system from noisy time series data with missing values. Such a dynamical system may be represented by a set of ordinary or partial differential equations with or without time delays. From this model representation, important properties about the dynamical evolution of the system may be extracted or otherwise determined, such as causal relationships between the variables. Also, the processes, algorithms, equations, graphs, and/or the like obtained via the model generation process may be used to make predictions about a (real or virtual) biological system based on the present values of the state variables.
In one embodiment, the model generation process may operate to, given a set of noisy time-dependent data on multiple relevant variables, sampled from different realizations and at different times, construct a dynamical system model that best represents the observed data and that allows the determination of causal relations between the variables.
Conventional techniques have proven inadequate for the diagnosis and treatment of CKD/ESRD conditions. Non-limiting examples of CKD/ESRD conditions may include vascular calcification, anemia, bone mineral metabolism disorders, acid-base disorders, and/or the like. For instance, with respect to vascular calcification, CKD/ESRD patients experience up to 30-fold higher cardiovascular disease mortality than the general population. The risks include traditional factors such as age, diabetes mellitus, dyslipidemia, and inflammation, as well as CKD-specific risk factors such as bone mineral metabolism (BMM) disturbances and associated therapies. Increased cardiovascular risk is multifactorial and is due partly to pathophysiological processes specific to CKD, making prevention of cardiovascular disease (CVD) by standard interventions directed at single traditional risks difficult. Currently, there is no broad applicable clinical strategy available to ameliorate the development or progression of vascular calcification and many of the therapeutic clinical studies have shown poor results.
In some embodiments, for example, a vascular calcification model may be provided that is operative to disentangle the prevalent underlying physiological mechanisms that accelerate the risk of cardiovascular events associated with CKD-metabolic bone disorder (CKD-MBD) using mathematical models. A vascular calcification model may operate to, inter alia, validate patient-specific optimized therapy in order to achieve desired cardiometabolic homeostasis, provide a cost-effective process for identification of vascular calcification and cardiovascular disease progression, and develop individualized interventions to slow the progression of vascular calcification in CKD/ESRD patients. Additionally, causality-based analysis modeling is developed to identify individual patient-specific drivers of vascular calcification. The causality analysis may include a physiology-based model of vascular calcification, the creation of a virtual patient population, and a generic virtual clinical trial environment for this and/or other renal pathophysiology.
In some embodiments, a vascular calcification model may be or may include a causal pathway-based physiological model that utilizes clinical data to identify patients and/or risk levels (e.g., high-risk patients) of progression of vascular calcification (VC) and cardiometabolic diseases to provide multifactorial intervention strategies targeting the risk factors. The vascular calcification models may use information including, without limitation, pulse pressure (PP, a proxy for pulse wave velocity) to parathyroid hormones (PTH), calcium (Ca), phosphate (PO4), calcium-phosphate product (CaPO4), neutrophil-lymphocyte ratio (NLR), and/or albumin (Alb). Pulse pressure may account for both cardiac and vascular conditions (e.g., atrial fibrillation, aortic insufficiency, arterial stiffness or arteriovenous malformation, aortic valve stenosis, cardiac insufficiency, or cardiac tamponade).
CKD/ESRD models, such as vascular calcification models, according to some embodiments may demonstrate the causal pathway of PTH, Ca, PO4, NLR, and/or Alb on PP, and may demonstrate that there are likely paths from PTH, Ca, PO4, CaPO4, NLR to PP, where the strength of the relationships varies from patient to patient. Using these pathways, a dynamic model describing these interactions may be used to determine the impact of the dynamics on the progression of vascular calcification, and/or other conditions, and provide treatment recommendations and strategies to patients.
CKD/ESRD models and methods of using CKD/ESRD models may provide multiple technological advantages over conventional techniques, including advances in computing technology. In one non-limiting example of technological advantage, CKD/ESRD models may operate to accurately and efficiently model CKD/ESRD conditions in a manner not available using existing processes. In another non-limiting technological advantage, CKD/ESRD models may provide a causality analysis to identify patient-specific causes of CKD/ESRD conditions in patients and/or patient populations. CKD/ESRD condition risk, such as CVD risk, is multifactorial and interventions directed to a single risk factor are not effective. Accordingly, in one non-limiting technological advantage, CKD/ESRD condition analysis processes may determine multifactorial causes and interventions to CKD/ESRD conditions, such as CVD. A further non-limiting technological advantage may include providing a CKD/ESRD model configured to demonstrate a causal pathway of various conditions including, without limitation, PTH, Ca, PO4, NLR, and/or Alb on PP, and determine paths from PTH, Ca, PO4, CaPO4, NLR, and/or Alb to PP.
Current computing systems, including machine learning approaches, are not able to provide causal and pathophysiological insights that are required for prescriptive analysis. In addition, the process of many CKD/ESRD conditions, such as vascular calcification, is not well understood, making the development of a mechanistic model (for instance, an “avatar” approach) impractical, if not impossible. Furthermore, differential temporal resolutions of clinical data may result in poor performance. Accordingly, another non-limiting technological advantage may include providing improved computing technology capable of computer-based determinations of CKD/ESRD conditions, such as vascular calcification, causation of CKD/ESRD conditions, and/or treatment recommendations for CKD/ESRD conditions (for instance, based on determined causation) that are efficient, accurate, and not currently available using conventional computer systems.
A person of skill in the art would recognize additional technological advantages based on the teachings of the present disclosure. Embodiments are not limited in this context.
Processes, techniques, methods, systems, and/or the like described in the present disclosure may be integrated into various practical applications. For example, CKD/ESRD condition analysis using processes and computational models according to some embodiments may determine one or more causal factors for CKD/ESRD conditions, such as vascular calcification, of a patient and/or patient population. In an additional example, CKD/ESRD condition analysis processes according to some embodiments may be integrated into the practical application of diagnosing a patient. In a further example, CKD/ESRD condition analysis processes according to some embodiments may be integrated into the practical application of administering treatment to a patient, such as providing treatment options, recommendations, prescriptions, and/or the like based on the patient information and a CKD/ESRD condition diagnosis and causation determination. For example, administration of a treatment may include determining a dosage of a drug, administering the dosage of a drug, determining a testing regimen, administering the testing regimen, determining a treatment regimen (such as a dialysis treatment regimen, parameters (for instance, ultrafiltration rate), or prescription), administering the treatment regimen, and/or the like. Embodiments are not limited in this context.
Additional technological advantages and integrations of embodiments into practical applications are described in, and would be known to those of skill in the art in view of, the present disclosure.
Computing device 110 may be configured to manage, among other things, operational aspects of a CKD/ESRD condition analysis process, such as a vascular calcification analysis process, according to some embodiments. Although only one computing device 110 is depicted in
Computing device 110 may include a processor circuitry that may include and/or may access various logics for performing processes according to some embodiments. For instance, processor circuitry 120 may include and/or may access a CKD/ESRD analysis logic 122. In some embodiments, CKD/ESRD condition analysis logic 122 may operate to perform vascular calcification analysis processes according to some embodiments.
Processing circuitry 120, CKD/ESRD analysis logic 122, and/or portions thereof may be implemented in hardware, software, or a combination thereof. In some embodiments, CKD/ESRD analysis logic 122 may include and/or implement a dynamical system modeler process and/or a dynamical system learner process according to various embodiments. As used in this application, the terms “logic,” “component,” “layer,” “system,” “circuitry,” “decoder,” “encoder,” “control loop,” and/or “module” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution, examples of which are provided by the exemplary computing architecture 1500. For example, a logic, circuitry, or a module may be and/or may include but are not limited to, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, a computer, hardware circuitry, integrated circuits, application-specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field-programmable gate array (FPGA), a system-on-a-chip (SoC), memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, software components, programs, applications, firmware, software modules, computer code, a control loop, a computational model or application, an AI model or application, an ML model or application, a proportional-integral-derivative (PID) controller, variations thereof, combinations of any of the foregoing, and/or the like. In one non-limiting example, CKD/ESRD condition analysis logic 122 may include instructions loaded into processor circuitry (for instance, instructions or code associated with CKD/ESRD condition analysis application 150).
Although CKD/ESRD condition analysis logic 122 is depicted in
Memory unit 130 may include various types of computer-readable storage media and/or systems in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid-state memory devices (e.g., USB memory, solid-state drives (SSD) and any other type of storage media suitable for storing information. In addition, memory unit 130 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD), a magnetic floppy disk drive (FDD), and an optical disk drive to read from or write to a removable optical disk (e.g., a CD-ROM or DVD), a solid-state drive (SSD), and/or the like.
Memory unit 130 may store various types of information and/or applications for a vascular calcification analysis process according to some embodiments. For example, memory unit 130 may store patient information 132, computational models 134, CKD/ESRD information 136, treatment information 138, and/or a CKD/ESRD condition analysis application 150. In some embodiments, some or all of patient information 132, computational models 134, CKD/ESRD information 136, treatment information 138, and/or a CKD/ESRD condition analysis application 150 may be stored in one or more data stores 172a-n accessible to computing device 110 via network 170. For example, one or more of data stores 172a-n may be or may include a HIS, an EMR system, a dialysis information system (DIS), an image archiving and communication system (PACS), a Centers for Medicare and Medicaid Services (CMS) database, U.S. Renal Data System (USRDS), a proprietary database, and/or the like.
Patient information 132 may include characteristics of a patient and/or a patient population that may be relevant to determining a CKD/ESRD condition, such as vascular calcification. In some embodiments, computational models 134 may be or may include one or more artificial intelligence (AI) models, machine learning (ML) models, deep learning (DL) models, neural networks (NN), Dynamical Systems (DS), combinations thereof, and/or the like.
In some embodiments, CKD/ESRD condition analysis application 150 may use one or more computational models 134 to analyze patient information 132 to determine vascular calcification information 140 and/or treatment information 142. In general, calcification information 140 is information generated by a computational model 134 according to some embodiments to derive a vascular calcification diagnosis (for a particular patient and/or patient population, including causation, as described in the present disclosure. In some embodiments, CKD/ESRD condition analysis application 150 may include and/or implement a dynamical system modeler process and/or a dynamical system learner process according to various embodiments. In some embodiments, computational models 134 may include a dynamical system model generated via at least one of a dynamical system learner process (see, for example,
As demonstrated in
Accordingly, some embodiments may provide a physiology-based model operative to, among other things, validate patient-specific optimized therapy to achieve desired cardiometabolic homeostasis, develop and validate a cost-effective algorithm for identification of vascular calcification and cardiovascular disease progression, and/or develop individualized interventions to slow the progression of vascular calcification in CKD/ESRD patients.
Although vascular calcification is used in examples, embodiments are not so limited. In particular, the analysis solutions according to some embodiments may be applied to CKD/ESRD conditions and/or pathophysiologies, for example, to develop a causality-based analysis model to identify individual patient-specific drivers of a condition. Non-limiting examples of CKD/ESRD conditions that may be modeled according to some embodiments may include anemia, bone mineral metabolism disorders, acid-base disorders, and/or the like.
CKD/ESRD condition analysis processes according to some embodiments may provide a causality analysis including, without limitation: a physiology-based model of a CKD/ESRD condition, such as vascular calcification; creation of a virtual patient population; generic virtual clinical (VC) trial environment (for instance, via software development and implementation); validation of CKD/ESRD condition models, such as vascular calcification models, for example, using data from ongoing clinical studies and/or using data from other collaborators.
There are various possible ways of extracting causal information from the collection a models determined according to some embodiment. Non-limiting examples of extraction methods may include probability-based causal analysis and ensemble causal analysis. Both methods rely on the same analysis on an individual level: y is said to cause x if the time derivative of x is an explicit function of y, i.e., {dot over (x)}=ƒ(y).
For the probability-baseds approach, to determine a single causal diagram for a population of patients or a collection of models consists of determining the probability that each individual term of the function library (Λ) is present in any arbitrary model in the collection. In other words, a probability model is constructed where for each element of the function library there is assigned the probability that the associated term is found in any model in the collection. A final model may be obtained by eliminating the terms in Λ with probabilities smaller than a given threshold. The causal diagram may then constructed by analyzing the surviving terms in this thresholded model.
For the ensemble causal approach, the ensemble causal diagram is built from the causal diagrams of each individual model added to the collection, i.e., for each new model added to the collection of models, a causal diagram is constructed. From the set of all causal diagrams in the collection a probability of occurrence may be assigned to each one of all the possible causal links between the different variables in the model (or for each element in the adjacency matrix). Finally, each link may be thresholded given its probability of occurrence in the whole collection of models.
A probability-based causal diagram may take into consideration each different term separately, when thresholding the terms in the final step it may select the highest probable model according to this approach, and extracts a causal diagram from this selected model. It is more sensitive to the heterogeneity of the models included in the collection and, as a consequence, the probability thresholds that need to be used are usually smaller. The thresholds may be configured according to various value systems, such as a range of 0 to 1 (or any other range). In some embodiments, the thresholds may include probability thresholds (for example, ranging between 0 to 1, 0% to 100%, and/or the like). Embodiments are not limited in this context.
An ensemble causal diagram extracts the causal diagram from each model in the collection and thresholds over the probability of each causal link. In that sense it is not sensitive to which particular terms were selected in each model, but only whether a given functional relationship between two variables existed. Therefore, it is expected that the causal relations obtained may be more robust, which is confirmed by the larger thresholds used when selecting the final causal diagram.
A vascular analysis process according to some embodiments may use the causation output to determine patients that are likely to have vascular calcification and/or causes of vascular calcification. Accordingly, vascular analysis processes may determine treatment recommendations based on the causation output that may be administered to target specific causes or potential causes of vascular calcification.
Some embodiments may use dynamical system representation of biological systems. A dynamical system may be understood as a system that changes in time and can be prescribed deterministically, i.e., there is a well-determined rule for its temporal evolution, in such a way that by knowing only this rule and the system's initial condition, it is possible to predict the system future states.
For example, some embodiments may operate to, given a set of noisy time-de data on multiple relevant variables, sampled from different realizations and on different times, construct a dynamical system model that represents the observed data and that allows the extraction of causal relations between the variables.
A large class of dynamical systems can be represented by a set of N differential equations in N variables represented by the state vector x=(x1, . . . , xN), with temporal evolution given by:
{dot over (x)}=ƒ(x;ξ),
where ξ is a vector of constant parameters and x is a vector of the state variables. Once a set of initial conditions is specified, i.e., x(t=0)=x0 then the state of the system is known at any given future time t. For any given application, the state vector is composed of all the relevant variables to the description of the system, e.g., in the vascular calcification context its components can be pulse pressure, the concentration of calcium and phosphate in circulation, etc. The function ƒ encodes all the information of the relationship between the different variables in the system, and its mathematical structure reveals how the different system elements interact and influence each other. In some embodiments, the dynamical system model may be represented by a set of ordinary differential equations with or without time delay, and/or partial differential equations in 1, 2 or 3 spatial dimensions, or dependent of any other independent variable of relevance.
In some embodiments, a model may be defined by the vector function ƒ(x, ξ). An objective may be to find a function that represents all the multiple realizations of the system, differing only by the value of the parameters ξ between different realization; or to find different functions for different realizations but that allow forecasting the future value of the system state variables x based on its present value.
For many complicated and realistic applications, it is difficult, if not impossible, to determine the mathematical structure of ƒ from first principles. On the other hand, there has been an increase in the availability of data in many scientifical fields and major developments in machine learning tools designed to handle and analyze relevant data. For instance, there are data-oriented approaches to solve the problem of finding a mathematical representation of ƒ for complicated systems to which there are large data sets available.
In general, computational models according to some embodiments may operate to, given a set of noisy time-dependent data on multiple relevant variables, for example, sampled from different realizations and at different times, construct a dynamical system model that best represents the observed data and that allows the extraction of causal relations between the variables.
For noisy time-dependent data, the data may be represented by a set of observed variables yi, where i=1, . . . , M and M is the number of state variables, the relationship to the observables y and the actual state variables x is given by:
y
i(t)=g(xi(t))+(0,σi),
where the second term represents a normally distributed noise with zero mean and standard deviation σi; and g(⋅) can be any function leading x(t) to y(t) (even the identity function, in which case the observable and the state variable are the same, except for the observational noise). In some embodiments, each variable xi has a different degree of observational noise, represented by a different standard deviation σi. A schematic representation of the data is provided in
In some embodiments, realization may provide that the data set available is supposed to represent the system in different realizations, i.e., there are data available of the system in different experimental or observational conditions. For observational studies that can mean data on different patients with similar conditions; for experimental studies different repetitions of the same experiments that generates the data. The underlying assumption is that every realization of the system can be described by the same model, represented by the function ƒ that provides the dynamical evolution of the system. Different realizations are not expected to share the same set of parameters ξ.
Some embodiments may perform time sampling, e.g., in which the observables of the system are not expected to be sampled at the same times, i.e., for each observable and each state variable there is a different time grid (t1, t2, . . . , tl, . . . , tN), where N is the total number of observations for a given observable and it might be different in different realizations of the system. Also, the time step for a given realization of the system does not need to be uniform, i.e., ti+1−ti≠tj+1−tj for i≠j.
Accordingly, in some embodiments, from the extracted model, it is possible to establish causal relationships between the state variables of the system, for instance, by looking at the arguments of a given component of the function ƒ. In this context causality should be interpreted as: if X is said to cause Y, then it is expected that a change in the value of X leads to a change in the value of Y, in the dynamical system representation of the system that reads {dot over (Y)}=ƒ(X, ξ).
In various embodiments, causal analysis 802 may correspond to extracting causal information from input data 811, which may be outputted in various forms, such as the format of a causal diagram 812.
Important insight may be obtained from data by analyzing how much information is exchanged between different variables, such analysis can help establish cause-effect relationships for the state variables of the system. A traditional approach is based on the Granger causality, a linear method that states that Y “Granger causes” X if the inclusion of Y in a predictive model for X increases the accuracy of such a model (see, for example, Granger, C. W. J., ‘Investigating Causal Relations by Econometric Models and Cross-Spectral Methods’, Essays in Econometrics vol II: Collected Papers of Clive W J. Granger, 37(3), pp. 31-47 (2008)). Although useful in many contexts, Granger causality assumes linearity and stationarity, both assumptions cannot be assured in medical and biological applications. For that reason, several generalizations of the Granger procedure have been proposed, in particular, a kernel-based version of it was developed specifically to deal with nonlinear systems (see, for example, Marinazzo et al., ‘Kernel-Granger causality and the analysis of dynamical networks’, Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 77(5), pp. 1-9 (2008)).
An advantage of employing such causality methods based on time series forecasting lies on their relative simplicity and low computational cost, allowing fast results that can help understand complex relationships between variables. Such methods may require the time series to be sampled in time with enough precision so that the underlying models have sufficient predictive power. If that condition is not satisfied, imputation and modeling of the time series may be used to provide enough precision. The result, in the form of a causal diagram, is dependent on the imputation used; however, it may still provide meaningful information about the topological structure of the causal diagram of the system.
In some embodiments, a dynamical system learner step 803 may operate to build a symbolic dynamical system model. In various embodiments, dynamical system learner step 803 may receive causal information 817 as input. From running one or more of previous steps 801-803 in a set of data from multiple patients, it may be possible to extract a collection of models in a model collection 804 step. By analyzing the frequency of occurrence of specific terms in such collection, embodiments may build a probabilistic representation for the system of equations 813, from the dependencies of the different models extracted it is possible to build a causal diagram 814 and/or probabilistic model 815, for example, for a patient or an entire population of patients.
In some embodiments, a model ranking 805 step may operate to calculate a model rank, for example, determining one or more optimal models 816, for instance, indicating which model(s) performs better when explaining the dynamical relationships between the variables in the input data.
In various embodiments, input data to a CKD/ESRD condition analysis processes, such as dynamical system learner process 800, may be or may include a set of noisy time series with missing values. In general, a time series may correspond to measurements of a variable of interest from patient physiology made at different moments in time. Assume y is such variable (e.g., pulse pressure or concentration of calcium in the patient blood stream), a time series for y corresponds to a set of N measurements of y made at the times t1, t2, . . . , tN, and is denoted: {y}={y1, y2, . . . yN}.
Each observable measured at time t is described by its value y(t); however, to measure the true value of the underlying state variable x directly and precisely is impossible. Consequently, the measurement provides a value y(t) called an observation of x(t), which relates to the actual value of x through the following equation:
y(t)=g(x(t))+(0,σ).
For example, it may be assumed that each value of y is a function of the corresponding value of x(g(⋅)) can be the identity function, in which case y corresponds to the state variable x, except for the noise), and an additive Gaussian noise with zero mean and standard deviation σ. The added noise may have a different distribution other than Gaussian, the distribution can be for example, but not limited to, student-t distributed noise, or lognormal noise, or exponentially distributed noise.
The presence of missing values, which may be represented by NA, means that at any particular time t the corresponding value y(t) was not measured and is therefore missing. For a set of noisy time series with missing values, since more than one observable are measured simultaneously, there is not one but M different variables. Each time series may be denoted {y}j where j=1, . . . , M.
Representing all this information together, the input can be written as a matrix where each column corresponds to the time series of a different observable, and each row corresponds to a different measurement in time. Therefore, the full input Y can be represented as:
where the missing values are placed randomly for illustrative purposes. Not all the variables have the same missing value rate, i.e., some variables have more missing values than others and at different times. The same is valid for the amplitude of the noise, some variables have higher noise to signal ratio than others, in other words, the standard deviation σj of the additive noise may not be the same for all {y}j.
In some embodiments, input causal information (such as causal information 817) may be in a form based on one of the following definitions of causality:
Some embodiments may perform time series pre-processing. For example, a first approach to handle the missing values may be to impute them, following some established criteria. In most applications, standard imputation schemes usually involve replacing the missing values by some measure of the variable distribution, usually by the mean or median, be it from the entire time series or from the first neighbors of the missing point.
However, if the missing values outnumber the observations for a given variable, these standard imputation schemes will lead to a biased result, in which case more robust techniques would need to be employed. In such a scenario, the imputation acquires a modeling characteristic, in which the missing values are replaced by a function that satisfies few conditions imposed by the researcher (such as continuity) and that are consistent with the observed data. The advantage of such approach is the possibility of applying time-series analysis to the imputed data, such as Granger-based causality measures and their various potential modifications.
As shown in
In some embodiments, pre-processing may operate to perform one or more of: transform input data into a uniform representation for all variables; smoothen the data and reduce the importance of noise, trying to extract the relevant dynamical evolution of the variables; impute missing values with significant values that approximate the underlying true value of the variable; and increase the number of points available to feed the code, in order to improve performance.
Various processes, methods, algorithms, and/or the like may be used for pre-processing according to some embodiments. Non-limiting examples of methods may include spline, smoothing spline, and/or Gaussian processes.
In general, the Gaussian Process (GP) may operate to generalize the concept of a Gaussian probability distribution for random numbers for the case of stochastic processes. Each value ƒ(x) of the process is taken from a Gaussian distribution with statistical properties given by a covariance matrix K, that determines how the point x covaries with previous observations xi. The covariance matrix is calculated on a function of the distance between the points x and xi, called the kernel. The obtained result is dependent of the choice of the kernel function. A general kernel function (which is appropriate when not much is known about the underlying process generating the observed data) is the Radial Basis Function kernel (or RBF) which, given two observations x and x′, may be defined as:
This kernel is infinitely differentiable which leads to a smooth Gaussian process over the observed data, making it suitable to impute missing data in observations from natural processes. On the downside, for extremely under sampled observations, in which case the distance between successive points is too large, so that K(x, x′)≈0, Gaussian processes may fail to appropriately estimate the underlying dynamics of the system. Gaussian processes also effectively handle noisy data by adding a constant value α to the diagonal of the covariance matrix.
Another suitable kernel is the Matérn covariance function Kv({right arrow over (t)}, {right arrow over (t)}′), given by:
where is a typical length scale, Γ(⋅) is the Gamma function, Jv(⋅) is the modified Bessel function of the second kind, r=|t−t′| is the distance between the two arguments of the kernel and v is a positive parameter, such that the process η(x) is k-times mean-square differentiable if and only if v>k. A common value is v=3/2, so only once differentiability is required for the noisy data. Any other kernel function of the distance between two points in the time series can be used in Gaussian Process pre-processing, and it should not be limited to the previous two examples.
To assume a dynamical system representation implies, among other things, that the state variables of the system are connected through a mathematical representation of their temporal evolution. Because of that, similar states are expected to generate similar outcomes. It is possible to use this reasoning in order to impute the missing values of the data using the nearest neighbor algorithm.
In one example, the state of the system may be characterized as a vector {right arrow over (y)}(t) with missing values which is sampled at times t1, t2, . . . , tL. If the value of the i-th component of the state vector at time tj is missing, i.e., yi(tj)=NA, then its value is replaced by:
y
i(tj)=k nearest neighbors i-th component,
where the angled bracket represents the average, and the “k nearest neighbors” (k-NN or KNN) are the k different points {right arrow over (y)}(tk) with tk≠tj closest to {right arrow over (y)}(tj), where distances are calculated only with the components of {right arrow over (y)} that form the orthogonal complement of the missing value.
Because only the missing values are imputed, the obtained time series is discontinuous. A Savitzky-Golay filter may be applied to smoothen the result, or any other smoothing filter technique. In graph C of
The two imputation methods presented so far can capture different features of the underlying dynamical system: Gaussian processes favor precision at the data points; and KNN favors similarity of recurrent states. In order to leverage both advantages, it is possible to calculate an average of them, if each point and method is carefully weighted, the resulting signal can be smooth and representative of the dynamics.
The resulting averaged time series ximp(t) may be calculated as:
where wi(t) is the contribution at time t of each of the NP points in the data set, and σ(t)=Σi=1N
where 0≤ρ≤1 is a regularization factor and τ is a typical time scale. The contribution at time t of each data point to the average may be given by:
where the Greek index iterates over the different imputation methods; ωα(t) are the weights of each method and may be given by:
ωα,i(t)=Πβ≠αdi(β)(t),
with the following:
where Δt=max(ti)−min(ti) and Δx=max(xi)−min(xi) for all i. And σi(t)=Σαωα,i(t) is the normalization factor. In other words, the weights were carefully chosen to favor the imputation that is closer to the data points.
In
In some embodiments, imputation methods may use machine learning (ML), artificial intelligence (AI), and/or deep learning (DL) techniques. Non-limiting examples of ML may include recursive neural networks (RNN), cognitive neural network (CNN), and generative adversarial networks (GAN), or a combination and/or adaptation of both. The RNN architecture is suitable for time series analysis as it includes information about previous values when training the network; as for the GAN's, this architecture is very efficient in generating fake data due to its use of two competing networks: one attempts to model a mapping from noise to the true data, and the other one reads the data and assigns to it a probability of it being true. The combined iteration of these two adversarial networks eventually converge to a good quality of the generated fake data.
In some embodiments, a causal analysis may be or may correspond with causal analysis 802 of process 800 of
Causal analysis may operate to provide insight about the possible mechanistic relationship between the observables of the system and the underlying state variables. In addition, causal analysis may operate to provide a causal diagram that may be used as input to a dynamical system learner (e.g., dynamical system learner 803 of
The causality diagram may have other forms, such as a matrix form, called an adjacency matrix. This is a binary matrix where if the element in the i-th rom and j-th column is 1 (or True) it means that yi causes yj.
{dot over (y)}
i=ƒi(y,t) i=1, . . . ,M.
A dynamic system learner, such as dynamic system learner 1301, may perform various technical functions, including, without limitation: from time series data, infer a set of ordinary differential equations, or partial differential equations, or time delay differential equations, that better explain the time evolution of the observed variables; obtain a different model, with either different equations or only different parameters, for each one of the different realizations of the data; an obtained set of equations may be able to be used to make forecasts about the system state; an obtained set of equations may be used to model the effect of treatments for the patient.
In various embodiments, a dynamic system learner may include and/or use various processes, functions, algorithms, and/or the like. For example, an algorithm may be configured to extract dynamical system models from data. The algorithm may be configured to find the set of M functions ƒi, such as:
which can be rewritten as:
D(y,t){dot over (y)}i−N(y,t)=0.
A method to solve this problem may include writing this expression as a product between a matrix (Θ) built from a library of functions (Λi), and the input data (Y and {dot over (Y)}), and a vector of coefficients {right arrow over (ξ)}i, such as:
D(y,t){dot over (y)}i−N(y,t)=Θ(Λ,Y,{dot over (Y)}i){right arrow over (ξ)}i=0,
where the preprocessed time series Y is given by:
and
where M is the number of dimensions of the input data, and N is the number of times in which it was sampled.
In some embodiments, a library of functions, Λi={h1({right arrow over (y)}), . . . , hL({right arrow over (y)})} may include a set of functions of the input {right arrow over (y)}. By applying the library of functions to the input data:
In which the algorithm finds M sparse sets of coefficients {right arrow over (ξ)}i={ξ1, . . . , ξ}, such as:
Θ(Λ,Y,{dot over (Y)}i){right arrow over (ξ)}i=(Λ(Y)∘{right arrow over ({dot over (Y)})}i−Λ(Y)){right arrow over (ξ)}i=0 i=1, . . . ,M,
where ∘ is a row-wise multiplication:
In some embodiments, a causality adjacency matrix () may be a M×M binary matrix, if ij is 1, it means that yi causes yj; zero otherwise. Its effect on the code is to filter out columns of the library of functions Λi (the index refers to the M different libraries used to each variable yi). Then, if the -th function of the library is an explicit function of yj, i.e., if =(yj); and Cij=0, then is removed from Λi.
To solve the equation Θξi=0, the sequentially thresholding least squares (STLSq) algorithm may be used. The method relies on transforming the problem to a convex problem, by removing one of the columns of the matrix Θ and rearranging the problem:
θj=Θj(Λi,Y,{dot over (Y)}i){right arrow over (ξ)}i(j),
where θj is the j-th column of the matrix Θ(Λi, Y, {dot over (Y)}i), and Θj corresponds to Θ with the j-th column removed. {right arrow over (ξ)}i(j) is the sparsest vector of coefficients that solves the equation, found using the sequentially thresholding least squares algorithm, which performs linear regressions sequentially, removing the coefficients with values smaller than a given pre-established threshold β.
In conventional methods, a problem arises of having to iterate over all elements of the library of functions and apply the STLSq to all of them. However, depending on the size of the library of functions, the number of different columns can be so large as to increase the computation time to prohibitive limits. In order to overcome this problem, embodiments may operate to associate to each column a probability pj that the corresponding term belongs to the true model. This probability can be initialized with an arbitrary prior, and updated as that given term is selected on successive iterations over different θj. With this information, instead of navigating linearly through the columns of Θ, one can then go through them in an order of decreasing probability. By implementing a stopping criterion, one can early stop the execution once a convergence metric is satisfied.
In various embodiments, the output of a dynamic system learner may be or may include a set of coefficients given a library of functions. It is possible to generalize these coefficients and build an object that represents a dynamical system model, for example, via a dynamical system model process or algorithm. For example, in some embodiments, given multiple input data due to different realization/observations of the system, where NP stands for the number of different realizations of the input. A dynamical system learner may output different models, given different inputs. Assuming Nm is the number of different outputted models, the outputted models may be added to a collection of models, which in some embodiments, may be a structure in the code designed to organize the models. Mathematically, the collection of models may be or may include a set of models, where, with μi representing one of the Nm different models added to the collection, then ={μ1, μ2, . . . , μN
In some embodiments, process 1400 may include accessing or building a dynamical system learner 1401. In various embodiments, dynamical system learner 1401 may operate to generate a general dynamical system model, given a library of functions, and which can then be fitted to any new data. Dynamical system learner 1401 may receive a set of different realizations 1405, for example, of time series data. A collection of models 1410 may be built from the set of different realizations 1405. In various embodiments, dynamical system learner 1401 may operate to estimate each model parameter for any new data set presented to the model. From model collection 1410, the probability that a given term from the input library of functions is present in the collection may be estimated. For instance, given multiple different models being added to the collection, obtained from different realizations of the system, process 1400 may estimate the probability of a particular term being in any model included in the collection. In various embodiments, process 1400 may also operate to infer a causal diagram from the causal relationships between the variables extracted from the models in collection 1410. In exemplary embodiments, model collection 1410 may be used as a database of models, for example, for complex systems.
Various model or model discovery processes may be used according to various embodiments. A non-limiting example may include a Michaelis-Menten system. In various embodiments, a Michaelis-Menten system may be represented as follows:
where jx=0.6, Vmax=1.0, and km=0.3.
In its rational form, the Michaelis-Menten equation can be rewritten as:
which, given a library of functions made of second order polynomials for both the numerator and denominator terms, the data-frame representation outputted by a dynamical system learner may be:
In some embodiments, for example, in a code or instructions form, a dynamical system model object may be a tuple μ=(Λ, {right arrow over (β)}), where Λ is a library of functions, and {right arrow over (β)} is a vector of Boolean values representing which terms in the provided library are present in the current model μ. Using the Michaelis-Menten system as an example, its dynamical system model representation may be:
Where the terms of the library of functions A that multiply {dot over (x)} (the first three rows in the example above) correspond to the denominator terms in the actual model represented mathematically in the previous paragraph.
In various embodiments, when instantiating a new dynamical system model, arguments may be passed, for instance, as a data-frame with actual coefficient values. In some embodiments, any coefficient with an absolute value smaller than the machine epsilon (e.g., ˜10−16) may be considered zero; the input data-frame may be simplified. For example, the input data-frame may be simplified using a (Python) sympy.cancel function, which may operate to cancel common factors in a rational function ƒ; and/or for M-dimensional systems, there are M different vectors βi, each one representing the selected models for each variable i. Because the dynamical system model does not store any information regarding the value of the coefficients, two models may be considered equal as long as they have the same terms from the library of functions, i.e., if their {right arrow over (β)} and their function library Λ are the same.
Parameter estimation may be performed according to various embodiments. For example, given a data set composed of simultaneous observations of the variable {right arrow over (y)} and its derivatives {right arrow over ({dot over (y)})}, the vector of coefficients ξ=ξ({right arrow over (y)}, {right arrow over ({dot over (y)})}; Λ) may be obtained given the model μ(Λ, {right arrow over (β)}). The regression operator is such that:
ξ({right arrow over (y)},{right arrow over ({dot over (y)})};Λ)=(μ(Λ,{right arrow over (β)})|{right arrow over (y)},{right arrow over ({dot over (y)})}).
The actual regression, for instance, a coefficient estimation, may be performed through the following steps:
Λβ({right arrow over (x)}){right arrow over (ξ)}=0,
which can be rewritten as:
[Λβ(y,{dot over (y)})]1=[Λβ(y,{dot over (y)})]2, . . . ,Nξpartial,
where N is the number of True elements in the vector {right arrow over (β)}, the notation [⋅]i corresponds to the i-th column of the matrix in between brackets; and the full coefficients from the selected model are given by: {right arrow over (ξ)}={right arrow over ((1,))}(−{right arrow over (ξ)}partial), where {right arrow over ((1,))} is the one-dimensional unity vector, and is the concatenation operator;
Parameter estimation for non-linear dynamical systems is a complicated problem to which different approaches have been attempted. For instance, as an optimization problem, in which the parameters (and sometimes initial condition for the variables) are optimized to minimize some objective function, an error measure of some kind, such as an information criterion or likelihood. However, due to its usual high dimensionality, non-linearity and data quality, this problem is usually ill-posed and does not have a simple method that solves it. For that reason, different methods can be used to estimate the parameters in the vector of coefficients ξ. For example, more generally, the problem of parameter estimation can be stated as:
where ε is the objective function. It could be the Mean Square Error, or Bayesian Information Criterion, among others.
In various embodiments, the collection of models may not keep repeated models, i.e., if two distinct sets of input data generate the same model (have the same vector {right arrow over (β)}), the respective dynamical system model will be added only once to the model collection. However, the information that this particular model was identified twice is not lost by not adding the model two times to the collection. Instead, this information may be stored on the vector of probabilities {right arrow over (π)}. The vector {right arrow over (π)} has the same dimension as {right arrow over (β)}, and its i-th element πi is the probability that the corresponding i-th term from the library of functions was present in one model selected at the many realizations from the input data used to generate the model collection.
In some embodiments, the calculation of the probability vector {right arrow over (π)} may be carried out as follows:
In one example, using a Michaelis-Menten process, if the following models were added to the collection (only representing the binary vector {right arrow over (β)} of each model): μ1=(1,0,0,1,1,1)T; μ2=(1,1,0,1,1,0)T; and μ3=(1,1,0,1,1,0)T. Then the probabilistic model in this case would be a vector given by: {right arrow over (π)}=(1,⅔,0,1,1,⅓)T.
For each model in the collection , it is possible to determine a causal diagram (for instance, a confidence matrix). Given a model for a M-dimensional problem μ={Λ,βi}, where i=1, . . . , M, yj causes yi if any non-zero element of βi is associated to a term of the library of functions Λ that contains yj. For example, it is possible to write an equation in which {dot over (y)}i=ƒ(yj). If that is true, then the corresponding term of the confidence matrix ji is 1, otherwise it is zero.
The causal diagram for the collection of models may be extracted simply by averaging the confidence matrices from all the models that belong to the collection. Consequently, the collection confidence matrix is no longer a binary matrix but a matrix of probabilities, in which the value of the ij element corresponds to the probability that the causal link from yi to yj exists in the collection of models.
In some embodiments, a CKD/ESRD condition analysis process, such as a vascular calcification analysis processes (for instance, process 800 of
In various embodiments, the models in a model collection may be ranked according to the goodness of fit to any new input data. This model ranking may allow for determining best models for subpopulations of the input data set. For example, given the model collection, the models could be ranked separately for patients with and without diabetes. Use the model collection to build a causal diagram based on the occurrence of specific terms in the best ranked model.
The model score may be determined using various processes. In one non-limiting example, a model score may be determined using a process or based on a process described in Pfister et al., “Learning stable and predictive structures in kinetic systems,” Proceedings of the National Academy of Sciences, 116 (51), pp. 25405-25411(2019) (“Pfister process”), which is incorporated by reference as if fully set forth in the present disclosure. For example, let RSS(ŷ, {right arrow over (y)}) be the residual sum of squares between the approximate function ŷ and the noisy data set {right arrow over (y)}; and let {right arrow over ({dot over (y)})}μ=μ({right arrow over (y)}), be the value of the derivative calculated by the model μ with parameters defined from the data. Then the model score TM is given by:
where S[⋅] corresponds to an smoothing spline approximation, and S[⋅|⋅] corresponds to a derivative constrained smoothing spline.
The model scores may be implemented according to various embodiments. For example, non-limiting implementations may include one or more of: the current score does not measure how well the current model fits to the data, but how well the current model fits the smoothing spline approximation of the data; the constrained smoothing spline is a numerically expensive calculation, since it requires a constrained optimization for the regularization parameter λ, which is significantly slower than a unconstrained optimization required for a regular smoothing spline; the score requires knowledge both of the data and of its derivative, the derivative is necessary to do the regression of the models coefficients; or the current score does not favor sparse models over non-sparse ones, which could lead to overfitting.
In addition, various alternative scores may be used according to exemplary embodiments. For instance, it is possible to use classical model selection criteria, as Bayesian Information Criterion (BIC) or Akaike information criterion (AICc) as long, for example, there is available a good approximation for the data derivative. For example, let Γ(y, ypredicted, k) be any arbitrary model selection criterion, where k is the number of parameters selected in the model . Then the proposed new score may be determined as follows:
σ(μ;{right arrow over (x)},{right arrow over ({dot over (x)})}):=Γ(S′[{right arrow over (x)}],{right arrow over ({dot over (x)})}μ,∥{right arrow over (β)}∥0),
where S′[{right arrow over (x)}] corresponds to the first derivative of the spline approximation of the noisy data {right arrow over (x)}; is the derivative calculated by the model; and ∥{right arrow over (β)}∥0 is the 0 norm of the binary vector {right arrow over (β)}, which is simply the number of selected elements from .
Using these standard criteria for model selection is advantageous for not relying on the calculation of a constrained smoothing spline. Also, they favor sparse models, and consequently reduces the chance of overfitting. To improve the ability of the method to identify the correct model, it is possible to limit the value of the sparsity degree (SD) such that only models with its SD value smaller than a pre-defined threshold will be included in the list of models to rank. This threshold must be imposed by the user and has the only objective of excluding from the ranking any model with large sparsity degrees, which tend to overfit to the training data and are less representative of the system dynamics and less generalizable.
Dynamical system models and associated methods may be used in various applications according to some embodiments. Non-limiting example applications may include: determining a model that is representative of the system dynamics and can be interpreted physiologically, providing insight to the researcher as to the possible mechanisms underlying the observed interactions between the system observables; building a model that allows forecasts of future values of the system observables, which can then be used to virtually test the effect of different interventions; incorporating the algorithm in a model predictive control loop, in which case the model is determined from data, specifically for each patient, allowing a more precise prediction and accurate intervention; and/or extracting causal information from time series data.
Dynamical system modeling algorithms may use a library of test functions onto which they perform a sparsity promoting regression with the data. In some embodiments, a function library 1605 may be an input of process 1600. In various embodiments, function library 1605 may be or may include a collection of NF functions ƒi:, represented as Λ=(ƒ1, ƒ2, . . . , θN
A function library Λ may be an operator on the data, given a matrix of data: X=(x1 x2 . . . xN) where each row represents an observation of the state variables at a given time. Then the operator Λ=(ƒ1, ƒ2, . . . , ƒN
Patient data 1606 for process 1600 may be heterogeneous and noisy, for example, such that some variables have missing values (represented by NA) and the values that are missing are not in the same times for all variables. The data matrix Y may have each row of the matrix referring to a different time, and each column referring to a different variable:
Model selection 1610 may receive as input function library 1605 (e.g., the library of functions Λ) and patient data 1606 from all the patients in each population (each patient data denoted Y) and builds a model collection 1607 (e.g., collection of models =μ1, μ2, . . . ). Each model may include or may be a Boolean matrix where each column refers to different state variables and each row informs whether the corresponding element from function library 1605 was selected.
In one non-limiting example, function library 1605, with second-order polynomial for two variables, may be selected:
where ξij are coefficients. Then, the model to be incorporated into model collection would be:
For any given data matrix Y with N state variables, and its derivatives given by {dot over (Y)} and a given function library Λ, the output model μ is given by:
μ=Γ(Y,{dot over (Y)};Λ),
where Γ is the model selection operator.
The operator Γ requires its input data Y and its derivatives {dot over (Y)} to be complete, i.e., there cannot be missing values in any of the times passed to the operator. In order to overcome this difficulty, an iterative use of smoothing splines may be used, defined as: Smoothing splines: for each noisy variable yi(t) with i=1, . . . , N, the smoothing spline is given by:
where λ is a regularization parameter, C is the space of all smooth functions for which values and the first two derivatives are bounded in absolute value by C (for instance, according to the Pfister process). For Smoothing splines with derivative constraints: a similar procedure can be done, however with a constraint imposed in the derivatives {dot over (y)}ι:
S[yi|{dot over (y)}ι]=S[yi] s.t. {dot over (y)}()={dot over (y)}ι().
With these two operators, the model selection algorithm may operate as follows (the indexes represent the algorithm iteration):
A model selection operator Γ may follow a procedure in which, given the data X, its derivative {dot over (X)}, and the appropriate function library Λ, to find the best model is to solve the following constrained optimization problem for each column {dot over (x)}i of {dot over (X)}:
where the selected model coefficients correspond to the columns of Ξ with a sparse representation that allows accurate prediction of {dot over (x)}i given {dot over (X)}.
Solving this minimization problem is very difficult due to its non-convexity. However, it can be rewritten as a set of 2×NF convex problems:
λj(X,{dot over (x)}i)=Λ(X,{dot over (x)}i|λj)ξj,
where λj is the j-th column of Λ; and Λ(⋅|λj) is the library of functions without its j-th column. Then, we solve this problem by applying the sequential threshold least square algorithm to find a sparse solution ξj.
The choice of which set of coefficients ξj is the actual solution to the system may be made by one of many different model selection criteria, including, without limitation, AIC, BIC, or derivates thereof. In principle, this optimization problem must be solved 2×NF times, one for each element of the function library. However, it is possible to build a probability that a given element j of the function library participates in the system dynamics. This probability is updated at every new ξ calculated, allowing the determination of some early stop criteria.
For performance gains, some embodiments may operate to reduce the number of times model selection 1610 is called for a given population. It does not need to be calculated for all the patients in the population and it also does not need to run every time the algorithm is executed. Model selection 1610 may be run as many times as necessary to build a comprehensive model collection 1607 for the given problem. With model collection 1607 (or ), model ranking 1615 may operate to rank the models according to their prediction power in a particular population of patients.
For a given model μi in model collection 1607, the actual coefficient values of the model for each patient (or patient(s) of interest) of the population may be calculated. For example, with Λμ
In some embodiments, the process or algorithm to find the model coefficients may include:
Each model in model collection 1607 may be scored according to the following equation, where the sum is through all the Np patients in the population:
where RSS is the residual sum of squares, ƒμ(Y, ξi) is the derivative calculated by the model μ with the coefficients ξi calculated for patient i, and X0 is the unconstrained smoothing spline of the data. In some embodiments, the model with the lowest score is the model that best represents the population, from all the models in the model collection, as it shows a more stable fit across different patients.
The model selection process or algorithm may operate to score the variables as they might be present in some models but not in others. A process the same or similar to the Pfister process may be used to rank the variables according to their relevance in the best scoring models. For instance, given the K top-ranked models, the following score may be assigned to the j-th variable:
s
j=Fraction of the top K models that depend on xj.
In various embodiments, the relevance of a given variable may be interpreted as a robust measure of causality, i.e., if a model for xi depends on xj, then we can say that xj causes xi, in a sense that changes in xj lead to changes in xi. The score sj measures the fraction of the top-scoring models that captured this causal relation.
A series of tests made with data generated by known models was performed for the dynamical system learner to show the performance of the dynamical system learner under different conditions. In order to assess the quality of the system output, the following series of benchmark metrics were calculated under different conditions of the data:
The data used in this benchmark was generated according to the following steps: (1) Generate data X and {dot over (X)} from ni initial conditions; (2) Add Gaussian Noise: {tilde over (X)}=X+(μ=0, σ=ρΔX), where ΔX is the amplitude of the input time series, and ρ is a factor representing the value of the added noise standard deviation in terms of the percentage of the time series amplitude. (μ, σ) is a Gaussian noise with mean μ and variance σ2; (3) Define a library of functions: Λ; (4) Calculate the model μ with dynamical system learner (or DynSysLearner) μ=Γ(X, {dot over (X)}, Λ; β), where β is the regularization_parameter, and Γ is the model selection operator, in other words, it is the mathematical representation of the dynamical system learner method; (5) Calculate benchmark metrics as a function of the regularization parameter =(β).
Every model discovered by the DynSysLearner is saved to a collection of models. These saved models can then be ranked according to the value of a model score (e.g., via the Pfister process), and described by the following equation:
where n corresponds to the number of different observations of a given system, i.e., different time series that represent the same system (e.g., data from different patients, or different observations of the same experimental conditions for a given experiment). RSS is the residual sum of squares ŷ is a smoothing spline approximation of the time series calculated from the noisy data, and ŷconstrained is the smoothing spline approximation of the data constrained to the values of derivatives calculated by the discovered model in the model collection.
This score is then calculated for all the models in the model collection, and the model with the smallest score is considered the one with the most stable fit. Considering that each model is saved in the model collection only with the information about which terms of the library of functions are present in the model, a linear regression may be calculated on the data to estimate the values of the coefficients, for each different input data set.
A set of benchmarks were made using a different score to rank the models: the Bayesian Information Criterion: BIC(; x, {dot over (x)})=BIC(S′[x], , k), where S′[x] is the derivative of the smoothing spline approximation to the data, {dot over (x)}μ is the derivative calculated from the model; and k is the number of non zero coefficients from the selected model. The appropriate value of the regularization parameter for the smoothing spline is calculated using a k-fold cross-validation.
In this Experiment I, benchmark results were determined for four different models: (i) The Michaelis-Menten model; (ii) the Selkov system; (iii) the Lorenz system; and (iv) a model describing Osteoblast-Osteoclast activity (or bone remodeling system).
The equation defining the Michaelis-Menten model is:
where jx=0.6, Vmax=1.0, km=0.3. An appropriate integration time may be Δt=5 a. u. t., where a.u.t. stands for arbitrary units of time. The determination of the integration time was made as to ensure that the time series passed to the algorithm has not yet reached a steady state.
In order to determine the most appropriate value of the regularization parameter, the system was benchmarked without noise and without any pre-processing for different values of the STLSQ threshold β, and without optimizing the value of β for each run. The following simulation parameters were used:
As expected, if β is too small the method fails to determine the correct model. However, as β increases, all the appropriate metrics increase accordingly suggesting that the DynSysLearner successfully found the correct equation. However, if the value of β is increased beyond a given threshold, then real terms are not selected, which deteriorates the quality of the benchmark metrics. The chosen value for β was 0.75.
For systems that fall too rapidly to a steady state (fixed point), only the short lived transient dynamics is relevant for model discovery. In the case that the underlying system is too complex, it may be necessary to simulate more than one random initial condition to increase the information on the system dynamics. Even for a system as complex as the Michaelis-Menten model, only one initial condition (with 100 data points) may be enough to obtain the correct model via DynSysLearner.
Benchmarks were determined for the model discovery method when the input time series is more realistic and has noise. In order to mimic such behavior, Gaussian noise was added to the time series and its derivative:
X′=X+(0,ρamp(X))
{dot over (X)}′={dot over (X)}+(0,ρamp({dot over (X)}))′
where (μ, σ) is a Gaussian distribution with mean μ and standard deviation σ, and amp(⋅) is the amplitude operator. In this way ρ represents the ratio between the added noise standard deviation and the time series amplitude, and it is referred to as the percentage_of_amplitude. In the benchmarks, only noise levels as large as 10−3 start to disrupt the quality of the benchmark, suggesting that the method is robust against noise.
Regarding the presence of noise for models with rational form and the value of the regularization parameter, because both numerator and denominator can be re-scaled by a common factor, the value of the coefficients are not fixed. That means that the value of the regularization parameter may greatly affect the performance of the method.
The determination of the regularization parameter may be useful for the success of the dynamical system learner process, in which case it would be desirable to have a non-arbitrary way of determining it. For that reason, the value of the regularization parameter can be optimized to minimize one of three implemented metrics for model selection: Cross Validation; Bayesian Information criterion (BIC); or Akaike Information criterion (AIC)
For no-preprocessing benchmarks, a total of 2 distinct models were selected from all the realizations with different parameters, and the true model is in the collection of models. When ranked over a data set with 5 realizations of the system (5 initial conditions), with each realization with 100 points, without added noise, the true model is the model with the lowest rank in the collection, with rank value equal to 1.78518147×1001.
Using a collection of models with 7 different models (generated from noisy data set), one of them being the true model, using the following parameters when generating the noisy data to rank the models:
Under these conditions and with 25% of missing values, using Bayesian Information Criterion to rank the models, the method is capable of correctly identifying the true model.
The equation defining the Selkov model is:
{dot over (x)}=−x+ay+x
2
y
{dot over (y)}=b−ay−x
2
y,
where: a=0.1 and b=0.41. An appropriate integration time may be Δt=25 a. u. t., where a.u.t. stands for arbitrary units of time.
Benchmark simulations were determined where no noise is added to the time series. In order to determine the most appropriate value of the regularization parameter, the system was benchmarked without noise and without any pre-processing for different values of the STLSQ threshold β. The following simulation parameters were used:
The results are very robust or small values of the regularization parameter, only when β increases to values with magnitude comparable to that of the real coefficients of the system (β˜0.4) is that the method begins to fail. The value of β chosen for the subsequent benchmarks was β=0.05. In addition, the values of the benchmark metrics are insensitive to the number of initial conditions.
Gaussian noise was added to the time series for the Selkov model using the same prescription already explained in the respective section for the Michaelis-Menten model. The noiseless case are robust to noise levels as large as ˜10−4 of the input time series amplitude.
The optimization conditions used for the Selkov system are similar to the ones used for the Michaelis-Menten system, the difference here is that having two equations, when the optimize parameter from the fit method from the DynSysLearnerND object is set to ‘all’, it means that all the regularization parameters ({right arrow over (β)}=(β0, β1)T) are going to be optimized separately, consequently the value of the regularization parameter for each equation might be different.
For results with BIC, a total of 21 distinct models were identified and added to the collection of models. The true model belongs to the final collection of models. The following parameters were used when generating the data to rank the model:
In this condition, the BIC was able to correctly rank the true model. The true model was also correctly ranked when the noise standard deviation was 10% of the time series amplitude and there was 50% missing values, in which case a total of 10 different initial conditions were used.
The Lorenz system is a system used in the study of non-linear dynamics. The equations defining it are:
{dot over (x)}=σ(y−x)
{dot over (y)}=x(ρ−z)−y,
ż=xy−βz
where the values of the coefficients are given by: ρ=28, σ=10, β=8/3.
The Lorenz system is very stable when subjected to the DynSysLearner, as the true positive rate is 1 and the false discovery rate is zero for a very large range of values of the regularization parameter. As the performance of the method seems to be indifferent to the value of β, within the interval of values calculated in this study. The value β=0.05 was kept fixed in all the subsequent benchmarks. Regarding the number of initial conditions, the results for the Lorenz system is consistent to the results observed previously for the other systems: the performance of the DynSysLearner seems to be indifferent to the number of initial conditions used, as long as the number of points sampled to the algorithm is large enough.
The algorithm is robust against the addition of noise for noise standard deviations as large as 10−3 of the input time series amplitude. By increasing the amplitude of the added noise, first the number of falsely discovered terms increases and it is only at ρ˜10−2 that the true positive rate starts to decrease.
When optimizing the regularization parameter there are two different strategies implemented: (i) if the value of the optimize parameter is set to ‘all’, then the regularization parameter is going to be optimized separately for each equation in the model; (ii) if the value of optimize is set to ‘first’, then the regularization parameter of the first equation is optimized and its value is used for the subsequent equations. Collection of models
There were 27 different models saved in the model collection for the Lorenz system. The true model is present in this model collection. When ranked with a noiseless data set, the model with the lowest score is not the true model. Under these conditions the model with the lowest rank corresponds to the following set of equations:
{dot over (x)}=α
1+α2x+α3y+α4yz
{dot over (y)}=β
1+β2x+β3y+β4yz,
ż=γ
1
where α, β and γ are constants. Increasing the amplitude of the noise in the input data to 1% of the input time series amplitude, the model with the lowest rank corresponds to the following set of equations:
{dot over (x)}=α
1+α2x+α3y+α4xz+α5yz
{dot over (y)}=β
1+β2x+β3y+β4xz
ż=γ
1+γ2z+γ3xy+γ4y2
where α, β and γ are constants.
For BIC, the following are the parameters used when ranking the models for the Lorenz system. There were 38 models in the model collection.
In this noiseless case, the true model is correctly identified as the highest ranked model. When PERCENTAGE_OF_AMPLITUDE=0.01, then the highest ranked model is no longer the correct model. However, reducing the time of integration to 10 allows the system to correctly rank the true model. In this case, due to the higher frequencies present in the Lorenz system time series, the smoothing spline step may have difficulties converging to a good approximation of the system, jeopardizing the quality of the scoring of the models. Using the smaller time spam leads to less oscillations in the observed time series, allowing for better convergence of the smoothing spline. In this condition, even setting the rate of missing values in the input data to 25% allows the correct identification of the true model.
Given the following parameters:
the true model was correctly identified with the highest rank. Note that it was necessary to increase the number of initial conditions, in other words, it was necessary to increase the amount of data for the true equation to be correctly ranked, given the high noise amplitude and missing value rate.
A model describing Osteoblast-Osteoclast activity, the same or similar to the model described in Lemaire et al., “Modeling the interactions between osteoblast and osteoclast activities in bone remodeling,” Journal of Theoretical Biology, 229(3), pp. 293-309 (2004), was performed. The following is a description of the model:
The following are the equations for the model:
The values used for the parameters are:
When not explicitly mentioned, the following parameters were used in the simulations:
Only a few additional initial conditions is sufficient to help the algorithm to find the right model.
Given an added noise with standard deviation corresponding to 10−8 and 10−4 of the time series amplitude, a term by term comparison of the differences between the calculated and the true model is provided. To make the comparison easier, both the calculated and true model were normalized in their data frame representation, i.e., the following (Python) operation was performed:
where the sympyfy_model uses the sympy.cancel to simplify the model in its rational function representation.
The results presented in the following sections are organized in tables in which each row refers to a different term from the library of functions. The first column represents the values of the coefficients for of the true equation; the second column are the coefficients of the terms that are missing in the model selected; and the in the third column are the coefficients of the terms that were selected but that are not present in the true equation of the system.
Equation for R, ρ=10−8:
For this small noise amplitude, the result for {dot over (R)} is satisfactory, since there are no extra terms and the only missing term has a very small amplitude.
Equation for B, ρ=10−8:
In this case, the extra terms all have the C2 term in common, meaning that they could be simplified and lead only to terms proportional to R and B in the function numerator, which are terms present in the original equation.
Equation for C, ρ=10−8
The solution for C presents a large number of extra terms, suggesting that the method had difficulties finding a sparse solution. This is probably explained by the fact that the dynamics for C rapidly converges to a steady state, conveying not enough information on the system dynamics.
Equation for R, ρ=10−4
The increase in the noise amplitude leads the algorithm to misidentify a significant number of extra terms, producing a non-sparse solution.
Equation for B, ρ=10−4
For the {dot over (B)} equation, the same consideration is valid, the obtained solution has too many extra terms and is non-sparse.
Equation for C, ρ=10−4
As for the Ċ equation, the problem is opposite, i.e., it is the number of missing terms that is too large, leading to non-representative solution.
For results with BIC, the true model is in the collection of models. The collection of models has a total of 21 distinct models. The following are the parameters used when generating data for the model ranking:
Note the small value for the integration time (TIME parameter), larger values make it harder for the model ranking to correctly rank the true model. That is mostly due to the short lived dynamics of the C variable, which very rapidly goes to a steady state given the set of parameters used. Under these parameters the best ranked model is not the true model, it is instead the following model:
The selected model is not sparse, in fact it has a sparsity degree (ratio selected terms) of 0.522. For that reason, it might be interesting to limit the search for the best model only including models that have a sparsity degree smaller than a pre-stablished threshold. If we set this threshold to 30%, in which case the ranking should be called:
Setting max_sparsity to 0.3 limits the search and allows the system to correctly rank the true model. Even if the noise amplitude and the missing value rate are increased to PERCENTAGE_OF_AMPLITUDE=0.01 and MISSING_VALUE_RATE=0.25, the system still correctly ranks the true model.
The dynamical system learner (or DynSysLearner) was applied to a series of real data from patients undergoing hemodialysis. The results include data obtained from the application of the method to individual patients, a goal was to look at specific models that were obtained and see their performance when reproducing the input time series through numerical integration. Results were also obtained for the application of the method to a population of patients, aiming to obtain more robust causal information from the statistical analysis of the models obtained from different patients.
The following variables were available for analysis:
The population included a total of 2558 patients, with average age (62±15) years, average weight (84±24) kg, 57.6% are male, 42.2% are declared white, 40.3% are black, 68 are asian, 25 are native hawaiian/other pacific islander, and 2 are american indian or alaska native.
When applying the method to individual patient data, the following conditions were used:
Results were determined for one single randomly selected patient with AP included and three different patients without AP.
For the results with AP, the following table summarizes the values of the regularization parameter and the corresponding sparsity degree for each variable, obtained from the Bayesian optimization:
The sparsity degree corresponds to the ratio of the number of selected terms and the length of the input library of functions. In this case, the optimizer was able to successfully find sparse models, with the least sparse model having only 15 terms. The following table depicts the confidence matrix extracted from the obtained model. Each column represents the causal relations present in the model for that particular variable, e.g., if in the column for prePP the row for NLR is True, that means that in the model prePP=ƒ(NLR).
For illustrative purposes, the following are the obtained models for each one of the variables:
For the results without AP, The exclusion of AP could improve the quality of the obtained models for two reasons: (i) by reducing the number of equations in the system, (ii) and because AP has a more indirect influence in the physiology of vascular calcification.
For Patient 1, In the following tables there are the results for the sparsity degree and regularization parameter obtained from the Bayesian optimizer with 10 iterations. Originally the value of the regularization parameter obtained for the NLR equation was not sufficiently low to enforce the desired degree of sparsity. For that reason, it was artificially decreased.
The next table shows the confidence matrix obtained from the model.
The following are the associated equations:
In order to validate the obtained models, it is possible to numerically integrate the equations and compare the solution with the original data. As initial condition, it was used the value of the variables pre-processing at t=0.
An important remark is that the entire time series was used as input data of the dynamical system learner (DynSysLearner or DynSysLearnerND), in order to more accurately test the obtained equations, it is necessary to test their efficiency against a section of the time series that was not used as input.
The following table provides the results of the regularization parameter and sparsity degree for patient 2, the Bayesian optimizer was used with only 10 iterations:
For patient 3, the results are presented in the following table:
In this case, the optimizer was unable to find an appropriate value for the regularization parameter for the equation for Albumin. consequently, the number of terms in the obtained equation was 136 (with a sparsity degree of 0.92).
In order to increase sparsity in the obtained model, the DynSysLearner can also run with regularization parameters informed by the user (in which case the Bayesian optimizer is not used). Using pre-determined regularization parameters (informed in the following table), the method is capable of finding more sparse equations (0.095 sparsity degree for the albumin equation), which leads to more stable solutions of the obtained equations, as shown in the graphs of
In order to analyze the performance of the system over a population of patients, a smaller subpopulation was selected from the original one with 2558 patients. The inclusion criteria was patients that had at least 1.5 years of data and did not present any of the following comorbidities: Ischemic heart disease, Peripheral vascular or arterial disease, Congestive heart failure, Cardiovascular disease, Chronic obstructive pulmonary disease, Myocardial infarction including cardiac arrest comorbidity, Hypertension, Cardiac dysrhythmia, and Diabetes.
A total of 129 patients were selected that satisfied all of these conditions, with average age (57±17) years, average weight (80±25) kg, of which 40.3% are white, 36.4% are black, and 9 are Asian. The regularization parameters were fixed with the same value for every patient in the population:
The validation of the value of the regularization parameter can be made indirectly by evaluating the sparsity degree of the obtained models.
From the selected models, it is possible to perform a causal analysis that is averaged over the entire population and extracts more robust causal information between the variables.
According to some embodiments, the causal diagrams were obtained in two different ways:
From the two approaches there is a slight difference in interpretation. The first approach represents the probability that a given term was selected, therefore it is more restrictive in the sense that it counts not only the existence of a causal link, but that the causal relationships have a similar functional form for different patients. On the other hand, the second approach is less restrictive as it only takes in consideration the existence of a causal dependency between two variables, regardless of its functional form.
The same analysis from the previous section was performed on a subpopulation with only 25 randomly selected patients, in this case the regularization parameters were optimized with a Bayesian optimizer with 100 iteration.
A dynamical system modeler process may be implemented using various systems, platforms, and/or the like. For example, this Example Implementation is written in Python 3 and is structured around a Patients class, which organizes and handles patients data. The raw data may be in various forms, including, without limitation, a *.csv or other table file, for instance, where each column represents a different variable and each row is a different measure of those variables, at different times. There may be other data columns, rows, fields, and/or the like. For example, the patient data may include a column providing patient identification.
The Patients class may be structured around, for example, the Pandas multi-index data frame as described in McKinney, W., “Data Structures for Statistical Computing in Python,” Proceedings of the 9th Python in Science Conference, pp. 56-61 (2010). In some embodiments, the Patients class may include the following attributes: data: data frame with the raw data passed to the class; IDS: list of numbers with the patients' identification; npatients: an integer variable storing the total number of patients; variables: list with the column names of the variables in the study; metadata: data frame with each patient metadata: duration of the study, date of death, and/or the like; nobs: data frame with the number of observations for each variable and every patient; and parameters: values for the criteria used to select valid patients.
In some embodiments, the Patients class may include the following methods: read_raw_data: read the data from a data frame without any pre-processing (raw format)l; select_valid: select only the valid patients (invalid ones are dropped), where validity is determined by the total and a yearly number of observations; drop_borders: drop patient data from the borders of the time series; addpatient: add new patient information to the dataset; delpatients: remove patients from the dataset based on the list of IDS; to_file: write patients' information to h5 file; from_file: read patients' information from the h5 file; sort_IDS: sort the patients' IDs; and get_from_IDS: get the patient information directly from ID.
The input data is noisy and non-uniform, where non-uniformity in this context means that the time step between successive points is not constant throughout the time series. Accordingly, the time series is pre-processed. The pre-processing is managed by the class preProcessPopulation which applies one of the different pre-processing methods implemented to all the patients in a population stored in a Patients class object.
The preProcessPopulation class has three methods: setup: reads an object derived from the preProcessing base class. This object implements a particular pre-processing method; read_patients: reads an instance of the Patients class; and run: a method that performs the actual pre-processing, it must be called after the two previous methods.
The preProcessing class is used as the base for all the objects that implement a particular pre-processing, it provides the interface for two methods: process and verify. The process method reads the data from a particular patient and performs the actual pre-processing. Every object must have its own version of it. The verify method is called by the preProcessPopulation before it starts the pre-processing, its intent is simply verifying if all the necessary configuration was called prior to the calculation. If the child class does not implement it, the preProcessing class provides it.
Non-limiting examples of classes implemented for pre-processing may include: preProcessingDoNothing: which simply smooths the variables marked as noisy, for all the other variables it does nothing; preProcessingSpline: applies a simple spline to the data; preProcessingGP: uses Gaussian process as pre-processing; and preProcessingKNN: applies the nearest neighbors algorithm to impute the missing data.
The following code excerpt exemplifies the general workflow for pre-processing using splines.
It is possible to average the result of two different pre-processing steps, which is done through the average_population function from the preprocessing module. Two different methods to calculate the weights of the averages are implemented: linear and exponential. The usage of the average is exemplified as follows:
where the method parameter should be one of linear or exp.
The implementation of statistical analysis follows a similar structure to that of the pre-processing. It is structured around two classes: AnalysisManagerBase, which is a base class for different statistical or causal analysis one might want to implement; and PopulationAnalyzer, which applies any method to a Patients class instance.
The AnalysisManagerBase provides an interface for any child class that implements any statistical test to be applied to the Patients class. It enforces two methods: process, which must be provided by the child class, and that reads a data frame with the patient data; and verify, which is supposed to verify any configuration required by the method implemented. The base class provides a verify method that does nothing, in case the implemented method has no requisites.
There are two different correlation tests implemented: Kendall and Spearman correlation (see, for example, Kendal et al., “A New Measure of Rank Correlation,” Biometrika, 30(1-2), pp. 81-93 (1938) and Kokoska et al., CRC Standard Probability and Statistics Tables and Formulae, Student Edition. Taylor & Francis (2000)), which are available through the classes KendallCorrelationManager and SpearmanCorrelationManager, respectively. The following shows how to calculate Kendall correlation for a particular population of patients:
The results are stored in a multi-index data frame at analyzer.results. Both classes provide an average_over_population static method, which averages the correlations calculated for each patient. The correlation_heatmap function is a wrapper that plots the heatmap from matplotlib for the calculated correlations.
The causality analyses are also implemented as classes derived from AnalysisManagerBase, namely: GrangerCausalityManager and KernelGrangerCausalityManager, for the Granger causality and kernel Granger causality, respectively. Both classes share a similar structure (as they are both derived from the same parent class), only the kernel Granger causality has a method get_causality_metadata implemented that is responsible for calculating the values of hyperparameters of the method, and it should be called right after the class in instantiated. The following code shows how to use the KernelGrangerCausalityManager class:
The code is implemented for separate data only, i.e., given a set of data for X and {dot over (X)} (where X is a matrix where each column corresponds to the time-series of a different variable), the algorithm returns a set of coefficients for each element of a given pre-established function library.
The library of functions is defined through the class LibraryConstructor, this object should be instantiated before the call of select_best_model. The following provides exemplary usage:
Where order is the maximum order of the polynomial terms in the library of functions, X and Xdot are the time series and their derivatives, respectively. Model is a data frame where each column corresponds to a different variable and each row to a different term in the library of functions, the value of each element corresponds to the coefficient of that term in the selected model. The function sympyfy_model transforms the data frame representation of the selected model into a list of sympy expressions.
A potential source of error from the causality analysis is its dependency on the imputation method used to handle the missing data values. A possible approach to overcome this difficulty is to use a probabilistic approach derived from Gaussian processes. Given a data matrix X where each column corresponds to the time series of a different variable of your system, and Σ is the matrix of all their standard deviations, and a Kernel function K; let (X, Σ, K) be the Gaussian process derived from X and K in a way that ˜(X, Σ, K) is a random time series sampled from .
Each different sample drawn from the GP, when subjected to the Kernel Granger causality method, results in a different confidence matrix. Consequently, a causal link, let us say from xi→xj, might not be present in the confidence matrix from all the samples. Based on the frequency of selection of a given causal link over multiple samples, it is possible to build the probability Pi→j that variable xi causes xj.
Given an appropriate choice of the kernel function K, and the correct estimation of the variables' standard deviation Σ, the samples from the Gaussian process are all valid models for the time series, given the information available. In this regard, the replacement of a deterministic view on causality by a probabilistic one directly handles the problem of the validity of the chosen imputation/modeling method used.
To use Granger-based methods to identify causal relations in deterministic non-linear systems might be problematic because these methods rely on the dynamics of the variables of the system being separable, which is not always true for non-linear systems. For this reason, special care must be taken when looking for causal relations in such systems.
One interpretation for causality is that if changes in xi lead to changes in xj, then xi is said to cause xj. Possibly, this relationship could be estimated by measuring how much information is shared by xi and the derivative of xj. Assuming mutual information as a valid measure for this goal, we could define the Boolean variable (i→j) as being true if xi causes xj, under the condition:
(i→j)=μ(xi,{dot over (x)}j)>ϵ
where μ(x, y) is the mutual information between x and y, and ϵ is a pre-established threshold. The use of mutual information (or any other metric of shared information between the variables and the derivatives) can help reduce the number of terms in the library of functions in the model selection algorithm, which could greatly reduce its time of execution.
The method includes calculation of the sparse regression on the terms of the function library. To that end, several different methods can be used, including, without limitation: STLSQ: sequential thresholding least square, which performs repeated linear regression on the data, removing from the following iterations the terms with the absolute value of the coefficients smaller than a given threshold; EFS: evolutionary feature synthesis, a regression technique on features based on evolutionary computation; FFX: fast function extraction, a symbolic regression technique; STRidge: sequential threshold ridge regression, which is similar to STLSQ, but it uses a ridge regression instead of a linear regression.
The method of model selection may include the discovery of systems with some spatial dependency, in which case a variable of interest (ψ) will depend not only on time t, but also on some spatial variable x, therefore ψ=ψ(t, x). The mathematical formulation of this problem is not fundamentally different from the model selection algorithm presented for the case in which ψ is only time dependent. However partial differential equation recovery from noisy data seems to be more sensitive to the noise magnitude, making the denoising step of the process of paramount importance.
Let ψ(t, {right arrow over (r)}) be the real value of the variable of interest, where {right arrow over (r)}∈n is a vector of the spatial coordinates, where n can be any value, but will generally be limited to 1, 2, or 3. It is assumed that the spatiotemporal evolution of ψ(t, {right arrow over (r)}) is given by:
(ψ,ψt,ψ{right arrow over (r)},ψ{right arrow over (r)}{right arrow over (r)}, . . . ,{right arrow over (r)},t)=0
Where is an operator on ψ and its derivatives. is supposed to be linear on a small number of terms, e.g., the Korteweg-De Vries equation is a linear combination of the terms ψt, ψxxx and ψψx, so that the KdV equation is given by: ψt+ψxxx−6ψψx=0. If noisy data ϕ is measured at different locations of t and {right arrow over (r)}, and:
ϕ(t,{right arrow over (r)})=(t,{right arrow over (r)})+σ(t,{right arrow over (r)})
where σ is the noise amplitude, which might depend on the spatial location of the measured point. Let (⋅, nt, nx) be a denoising operator, which could be a Gaussian process (Rasmussen and Williams, 2005), or some machine learning-based algorithm, and nt and nx are the orders of desired time and space derivatives, respectively. In a way that the application of on ϕ allows the recovery of an approximation of ψ and its derivatives, denoted {tilde over (ψ)}, i.e., ψ≈{tilde over (ψ)}=(ϕ, nt=0, nx=0).
With the denoising operator at hand, it is possible to build a function library Λ, which may be or may include a matrix where each column is a different term of the library, and each row corresponds to a different time and space position from the data:
In this way, the model selection algorithm is reduced to finding the sparse solution {right arrow over (ξ)} to the following equation:
Λ{right arrow over (ξ)}=0
where {right arrow over (ξ)} is a vector with the coefficients of each term in the function library.
Delay differential equations are ordinary differential equations in which the current value of the derivative depends on past as well as current values of the state variables, i.e.:
{right arrow over (x)}={right arrow over (ƒ)}(t,{right arrow over (x)},{right arrow over (x)}τ)
where {right arrow over (x)}τ(t)={right arrow over (x)}(t−τ). Its formulation is entirely like the formulation of the original model selection algorithm, with the difference that the library of functions should be extended with terms from τ times before, i.e., let Λt be the original library of functions at time t, the new extended library {tilde over (Λ)}t is given by:
{tilde over (Λ)}t=(ΛtΛt-τ)
The determination of the appropriate value of τ can be made from different methods: (i) it can be any typical time scale derived from the underlying problem; (ii) it can be derived from the first minimum of the autocorrelation function or from the delayed mutual information.
The applications of the CKD/ESRD condition analysis process, such as a vascular calcification analysis process, described in the present disclosure are not limited to the study of vascular calcification on patients with CKD, but can be extended to the study of any system in which there is access to time series data. One non-limiting application may include the development of virtual clinics. Different from the usual machine learning models, in which a prediction from a target variable is made based on input. The method proposed allows the identification of a model that explains the relationship between the studied variables and allows for simulation and numerical experimentation. With such a model at hand, it is possible to simulate the effect of different interventions, which can provide insight into different courses of action for treatment.
More importantly, because the model is derived directly from the data set, it is possible not only to generate a general model with scientific interest, which can be used to better understand a given phenomenon. But it is also possible to determine a model-specific for a particular data set (from one patient, for example), which could provide information tailored for that particular patient, instead of a model derived from average behavior from a given population of patients.
As used in this application, the terms “system” and “component” and “module” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution, examples of which are provided by the exemplary computing architecture 3800. For example, a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.
The computing architecture 3800 includes various common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components, power supplies, and so forth. The embodiments, however, are not limited to implementation by the computing architecture 3800.
As shown in
The system bus 3808 provides an interface for system components including, but not limited to, the system memory 3806 to the processing unit 3804. The system bus 3808 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. Interface adapters may connect to the system bus 3808 via a slot architecture. Example slot architectures may include without limitation Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and the like.
The system memory 3806 may include various types of computer-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid-state memory devices (e.g., USB memory, solid-state drives (SSD) and any other type of storage media suitable for storing information. In the illustrated embodiment shown in
The computer 3802 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD) 3814, a magnetic floppy disk drive (FDD) 3816 to read from or write to a removable magnetic disk 3811, and an optical disk drive 3820 to read from or write to a removable optical disk 3822 (e.g., a CD-ROM or DVD). The HDD 3814, FDD 3816 and optical disk drive 3820 can be connected to the system bus 3808 by an HDD interface 3824, an FDD interface 3826 and an optical drive interface 3828, respectively. The HDD interface 3824 for external drive implementations can include at least one or both of Universal Serial Bus (USB) and IEEE 1114 interface technologies.
The drives and associated computer-readable media provide volatile and/or nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For example, a number of program modules can be stored in the drives and memory units 3810, 3812, including an operating system 3830, one or more application programs 3832, other program modules 3834, and program data 3836. In one embodiment, the one or more application programs 3832, other program modules 3834, and program data 3836 can include, for example, the various applications and/or components of computing device 110.
A user can enter commands and information into the computer 3802 through one or more wired/wireless input devices, for example, a keyboard 3838 and a pointing device, such as a mouse 3840. These and other input devices are often connected to the processing unit 3804 through an input device interface 3842 that is coupled to the system bus 3808, but can be connected by other interfaces.
A monitor 3844 or other types of display device is also connected to the system bus 3808 via an interface, such as a video adaptor 3846. The monitor 3844 may be internal or external to the computer 3802. In addition to the monitor 3844, a computer typically includes other peripheral output devices, such as speakers, printers, and so forth.
The computer 3802 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers, such as a remote computer 3848. The remote computer 3848 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 3802, although, for purposes of brevity, only a memory/storage device 3850 is illustrated. The logical connections depicted include wired/wireless connectivity to a local area network (LAN) 3852 and/or larger networks, for example, a wide area network (WAN) 3854. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.
The computer 3802 is operable to communicate with wired and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.115 over-the-air modulation techniques). This includes at least Wi-Fi (or Wireless Fidelity), WiMax, and Bluetooth™ wireless technologies, among others. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices. Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, n, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions).
Numerous specific details have been set forth herein to provide a thorough understanding of the embodiments. It will be understood by those skilled in the art, however, that the embodiments may be practiced without these specific details. In other instances, well-known operations, components, and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the embodiments.
Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.
It should be noted that the methods described herein do not have to be executed in the order described, or in any particular order. Moreover, various activities described with respect to the methods identified herein can be executed in a serial or parallel fashion.
Although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. Thus, the scope of various embodiments includes any other applications in which the above compositions, structures, and methods are used.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
As used herein, an element or operation recited in the singular and proceeded with the word “a” or “an” should be understood as not excluding plural elements or operations, unless such exclusion is explicitly recited. Furthermore, references to “one embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features.
The present disclosure is not to be limited in scope by the specific embodiments described herein. Indeed, other various embodiments of and modifications to the present disclosure, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such other embodiments and modifications are intended to fall within the scope of the present disclosure. Furthermore, although the present disclosure has been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the present disclosure may be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the present disclosure as described herein.
This application claims the benefit to U.S. Provisional Application No. 63/195,818, filed Jun. 2, 2021, the entire contents of which are incorporated herein by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
63195818 | Jun 2021 | US |