The disclosed implementations relate generally to healthcare applications and more specifically to a method, system, and device for machine learning derived personalized disease treatment recommendations.
Healthcare providers encounter situations in which there are multiple guideline-endorsed treatment options available but no clear best choice for an individual patient. In such cases, it would be helpful to have an up-to-date efficacy comparison across all treatment options to guide decisions. Given the nature of bias and confounding in medicine, these comparisons must be conducted in a causal framework to understand the effect of the treatment choice itself. However, massively multi-comparator randomized controlled trials (RCTs) of already approved treatments are arduous to conduct due to scale, cost, and time. To date, observational causal framework methodologies have been practically limited to working with only a few simultaneous trial arms.
Accordingly, there is a need for an automated causal recommender system (e.g., for chronic-disease management) that is trained on real-world evidence from electronic medical records and health insurance claims. This causal recommender system, built from a set of machine-learning-derived measures, can be used to suggest personalized treatment regimens. The automated causal recommendation engine described herein is capable of assessing an arbitrarily large number of simultaneous treatment options across numerous patient sub-groups. Moreover, the causal recommender can generate a ranked list of treatments based on real-world observed efficacy for each sub-population while controlling for an arbitrarily large number of confounders.
In one aspect, some implementations include a computer-implemented method for implementing a causal recommender for personalized disease treatment selection using machine learning. The method includes obtaining health trajectories for patients. Each health trajectory corresponds to a respective patient and represents a time-ordered series of health events for the respective patient. Each health trajectory may include at least one health condition, at least one treatment event and at least one index event. Each health trajectory includes a respective plurality of sub-trajectories. Each sub-trajectory includes a treatment event and ends at a respective index event.
The method also includes stratifying the sub-trajectories for each patient to form a plurality of stratified patient segments. In some implementations, each stratified patient segment corresponds to a separate and distinct health condition and includes the sub-trajectories for patients that have the health condition. In some implementations, stratification is not based only on separate and distinct health conditions; any set of patient covariates (e.g., age, gender) can be used to segment the population. In some implementations, a series of if-then-else rules can be used for stratification. For example, these rules may be obtained from clinicians or other health experts.
The method also includes performing, for each segment of the plurality of stratified patient segments, pairwise causal inference analysis on one or more treatments corresponding to the sub-trajectories of the respective segment, to estimate average treatment effect (ATE) values. Network meta-analysis is performed on the ATE values, thereby ranking the one or more treatments for each patient in the respective segment. The method also includes, in accordance with a determination that a respective patient has a health condition that could make a set of treatments unsafe, based on one or more clinical rules and the health trajectories, reranking the one or more treatments after excluding the set of treatments. The method also includes outputting treatment options for personalized disease treatment selection for a patient based on ranked treatments for the plurality of stratified patient segments. For example, a patient may have comorbidities that make a recommendation contraindicated, or insurance claims data may show that a prescription caused health issues in a given population.
In some implementations, performing the network meta-analysis includes constructing a densely connected network graph based on the ATE values, and performing a Network Meta-Analysis (NMA) (e.g., a Bayesian or Frequentist Network Meta-Analysis) on the densely connected network graph. For example, for a Bayesian Network Meta-Analysis, a hierarchical random-effects model may be used, according to some implementations. For example, each node of the densely connected network graph may correspond to a treatment, and may be connected to every other node via the measured ATE as an edge to obtain a densely connected network.
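By way of illustration, the graph construction described above might be sketched in Python as follows; the `ate_results` mapping and its "balanced" flag are illustrative assumptions about the data layout, not an actual implementation's schema:

```python
# Illustrative sketch: build a densely connected treatment graph from
# pairwise ATE estimates, dropping comparisons whose confounders did not
# balance (edge trimming is described later in this document).
import networkx as nx

def build_ate_network(ate_results):
    """ate_results: dict mapping (treatment_i, treatment_j) -> dict with
    'ate', 'se', and 'balanced' keys (an assumed, simplified structure)."""
    graph = nx.DiGraph()
    for (t_i, t_j), res in ate_results.items():
        graph.add_nodes_from([t_i, t_j])
        if res["balanced"]:
            graph.add_edge(t_i, t_j, ate=res["ate"], se=res["se"])
    return graph
```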
In some implementations, performing the Network Meta-Analysis (NMA) includes computing synthesized ATEs from the ATE values, and comparing the synthesized ATEs against a baseline treatment. For example, in a causal recommender for diabetes, the baseline treatment may be set to Metformin because it is the first-line therapy for Type 2 Diabetes Mellitus (T2DM).
In some implementations, performing the Network Meta-Analysis (NMA) includes computing a Surface Under the Cumulative RAnking curve (SUCRA) score for each treatment. For example, samples may be drawn from the posterior predictive distributions of the trained model to compute SUCRA scores for all treatments. Performing the Network Meta-Analysis (NMA) further includes ranking the one or more treatments according to the SUCRA scores. For example, ranking the one or more treatments according to the SUCRA scores may lead to the most effective treatment being assigned the highest rank. If a Frequentist approach is used for NMA, ranking can be done on the basis of p-values.
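As a sketch of this ranking step, the snippet below computes SUCRA-style scores from an array of posterior ATE draws; the array layout and the convention that more negative ATEs are better (a larger HbA1c reduction) are assumptions carried over from the diabetes example used later in this document:

```python
import numpy as np

def sucra_scores(ate_draws):
    """ate_draws: (n_draws, n_treatments) array of ATEs vs. the baseline,
    where more negative values indicate a larger HbA1c reduction."""
    n_draws, n_treatments = ate_draws.shape
    order = np.argsort(ate_draws, axis=1)           # ascending ATE per draw
    ranks = np.empty_like(order)
    rows = np.arange(n_draws)[:, None]
    # best treatment (most negative ATE) receives the highest rank value
    ranks[rows, order] = np.arange(n_treatments - 1, -1, -1)
    mean_rank = ranks.mean(axis=0)                  # in [0, n_treatments - 1]
    return mean_rank / (n_treatments - 1)           # normalized to [0, 1]
```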
In some implementations, the pairwise causal inference analysis includes neural-network causal analysis to determine causal inference between each pair of treatments of the one or more treatments. For example, this step uses a neural-network-based propensity-score model for causal inference. In some implementations, the pairwise causal inference analysis estimates on the order of N² unique ATE values per segment, where N is the number of treatments in the respective segment (e.g., yielding approximately 15,000 pairwise comparisons across segments).
In some implementations, the pairwise causal inference analysis uses the inverse probability of treatment weighting (IPTW) method, in which patients in the control and treatment arms are assigned weights equal to the inverse of the probability of receiving the treatment they received.
In some implementations, stratifying the sub-trajectories includes grouping patients on the basis of clinical covariates in the health trajectories.
In some implementations, stratifying the sub-trajectories is performed by applying a machine learning algorithm on the health trajectories. In some implementations, the machine learning algorithm is an unsupervised k-means clustering that clusters similar patients based on treatments.
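A minimal sketch of such clustering-based stratification is shown below, assuming a numeric snapshot-level feature matrix; the feature choice and segment count are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def stratify_by_kmeans(X, n_segments=10, seed=0):
    """X: (n_snapshots, n_features) array of covariates such as age,
    comorbidity flags, and prior-treatment indicators (assumed encoding)."""
    X_scaled = StandardScaler().fit_transform(X)
    model = KMeans(n_clusters=n_segments, n_init=10, random_state=seed)
    labels = model.fit_predict(X_scaled)  # one segment label per snapshot
    return labels, model
```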
In some implementations, stratifying the sub-trajectories is performed by generating a bespoke recommender for each patient trained on a cohort of their k-nearest-neighbors.
In some implementations, stratifying the sub-trajectories includes splitting (sometimes called segmenting) the health trajectories into segments based on age, prior treatment, and comorbidity index values.
In some implementations, the method further includes selecting treatments that have at least a minimum cohort size to include in the one or more treatments. For example, a minimum cohort size may be a predetermined value, such as 30.
In some implementations, the method further includes: for each patient of the plurality of patients: identifying a respective treatment event and a respective index event in the health trajectory for the respective patient, wherein a respective index event is any clinical or health data point; and segmenting the health trajectory into a respective plurality of sub-trajectories such that each sub-trajectory includes a treatment event and ends at a respective index event. In some implementations, each sub-trajectory terminates in a pair of lab measurements (e.g., HbA1c lab measurements, two blood pressure events for Diabetes), and the method further includes: computing the age of the respective patient, any comorbidities, and prior medication as of the date of the first lab of the pair of lab measurements in the sub-trajectory; and using current medication as the treatment for the patient corresponding to the sub-trajectory, for the period between the two labs of the pair of lab measurements. In some implementations, while segmenting the health trajectory, the method also includes excluding sub-trajectories where the duration between the lab pairs is not within a predetermined time period (e.g., excluding durations less than 90 days or greater than 365 days). Two or more lab measurements occurring in a very short period of time may not be very different, and causally attributing the small difference to the treatment may be problematic. Instead of excluding the sub-trajectories, labs that occur very close together (e.g., same day or same week) can be averaged. On the other hand, labs that occur too far apart are also problematic, because many of the confounders are measured as of the time of the first lab of the pair, and these confounders may have changed substantially when the period between the labs is too long. This can again lead to problems in causal attribution to the treatment. In some implementations, the method includes removing patients with a single lab measurement from the plurality of patients, prior to splitting the health trajectory. In some implementations, when the respective patient has multiple medications, the method further includes using a combination of the medications as the treatment for the respective patient for the period between the two labs of the pair of lab measurements.
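The sub-trajectory construction just described can be sketched as follows; the DataFrame schema and the `meds_as_of`/`covariates_as_of` callables are hypothetical stand-ins for lookups into the medication and diagnosis histories, and the 90/365-day bounds follow the example above:

```python
import pandas as pd

def build_snapshots(labs, meds_as_of, covariates_as_of,
                    min_days=90, max_days=365):
    """labs: DataFrame with ['patient_id', 'date', 'hba1c'] (assumed schema).
    meds_as_of(pid, date) and covariates_as_of(pid, date) are caller-supplied
    lookups into the medication and diagnosis histories."""
    snapshots = []
    labs = labs.sort_values(["patient_id", "date"])
    for pid, grp in labs.groupby("patient_id"):
        if len(grp) < 2:
            continue  # drop patients with a single lab measurement
        rows = grp.reset_index(drop=True)
        for k in range(len(rows) - 1):
            first, second = rows.iloc[k], rows.iloc[k + 1]
            gap = (second["date"] - first["date"]).days
            if gap < min_days or gap > max_days:
                continue  # exclude lab pairs that are too close or too far apart
            snapshots.append({
                "patient_id": pid,
                "treatment": meds_as_of(pid, first["date"]),      # current meds
                "covariates": covariates_as_of(pid, first["date"]),
                "outcome": second["hba1c"] - first["hba1c"],      # change in lab
            })
    return pd.DataFrame(snapshots)
```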
In some implementations, the method further includes, for a new patient, recommending a personalized treatment option by identifying one or more sub-trajectories for a particular stratified patient segment that are most similar to the new patient's health trajectory. In some implementations, for new patient(s) (e.g., a patient whose previous medical history is unknown), recommendations are based on the strata to which they belong.
In another aspect, a system configured to perform any of the above methods is provided, according to some implementations.
For a better understanding of the various described implementations, reference should be made to the Description of Implementations below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
Reference will now be made in detail to implementations, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the various described implementations. However, it will be apparent to one of ordinary skill in the art that the various described implementations may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the implementations.
It will also be understood that, although the terms first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first electronic device could be termed a second electronic device, and, similarly, a second electronic device could be termed a first electronic device, without departing from the scope of the various described implementations. The first electronic device and the second electronic device are both electronic devices, but they are not necessarily the same electronic device.
The terminology used in the description of the various described implementations herein is for the purpose of describing particular implementations only and is not intended to be limiting. As used in the description of the various described implementations and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting” or “in accordance with a determination that,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.
As described above in the Summary section, there is a need for a causal recommendation engine capable of assessing an arbitrarily large number of simultaneous treatment options across numerous patient sub-groups, to generate a ranked list of treatments based on real-world observed efficacy, for each sub-population, while controlling for an arbitrarily large number of confounders.
Traditionally, providers have relied on medical guidelines when prescribing drugs for chronic diseases (e.g., Type 2 Diabetes Mellitus). Such guidelines are prescriptive for first-line therapies, but if first-line therapies fail, the guidelines leave each physician to use their best judgement to select one option from sometimes several thousand options. For example, when the disease has been inadequately managed for a prolonged period despite following guidelines for first or subsequent lines of treatment, physicians oftentimes resort to informed guesswork to formulate treatments that will bring their patient's disease under control. This problem is exacerbated for combination therapies where many drug choices are available, and the number of treatment options becomes combinatorially large.
To supplement guideline-based practice, a causal recommender system for healthcare management is described herein. The system is trained on real-world evidence from electronic medical records and claims of a large number of patients (e.g., more than 100,000 patients with an A1C greater than 9%). In stratified sub-populations, the recommender is trained on more than ten thousand (e.g., 15,000) confounder-adjusted, case-control, observational studies between drug combinations (e.g., anti-hyperglycemic drug combinations), ranks treatment options based on their comparative efficacy using network meta-analysis, and returns sets of most effective medications for those sub-populations. In a retrospective study, after controlling for confounding, the causal recommender system found that individuals that followed recommendations lowered their levels of glycated hemoglobin by approximately 1% versus non-compliers. Causal recommender systems like the one described here are an important step towards achieving population-level glycemic control.
In some implementations, the automated causal recommender engine utilizes Balancing Covariates Automatically Using Supervision (BCAUS) to create covariate balance guarantees along with Inverse Probability of Treatment Weighting (IPTW) as the counter-factual generating causal framework, followed by a Network Meta-Analysis (NMA) to create rankings. Some implementations use modern deep learning to optimize these well-established methods. In some implementations, as a final step, a recommendation engine is employed on held-out data to retrospectively estimate the improvements to patient outcomes that would be seen if physicians used this causal framework to inform treatment decisions. Concordant prescriptions may be defined herein as those where a top-3 treatment for a patient (sub-group specific) is given, and non-concordant as any other option.
In some implementations, the management of glycemic control (HbA1c) is used for advanced, prior treatment-exposed, adult patients with type 2 diabetes as an illustrative use case, utilizing real-world evidence (e.g., claims, laboratory results, and socio-demographics) from 1.2 million patients. Patients are divided into multiple clinical sub-populations (e.g., 10 groups) based on age, insulin dependence, and disease burden, resulting in a large number of simultaneous trial arms (e.g., approximately 15,000 simultaneous trial arms). For this example, a treatment may be defined as the combination of one or more classes of antihyperglycemics, giving 364 unique treatments. After filtering down to treatments seen at least 35 times in a particular sub-group, there are approximately 15,000 pairwise comparisons.
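For illustration, the treatments-as-combinations encoding and the minimum-cohort filter might be expressed as follows; the `drug_classes` field name is an assumption, while the threshold of 35 follows the example above:

```python
from collections import Counter

def eligible_treatments(snapshots, min_count=35):
    """snapshots: iterable of dicts, each with a 'drug_classes' set of
    antihyperglycemic classes (assumed representation)."""
    counts = Counter(frozenset(s["drug_classes"]) for s in snapshots)
    return {treatment for treatment, n in counts.items() if n >= min_count}
```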
Recommender systems have become ubiquitous in our online lives. We rely on them to select what movies to stream, what products to buy on websites, or which of our social media friends to follow. Such systems create personalized experiences and tailor their offerings to be most suitable for the particular characteristics of their target consumers. Additionally, when the number of available options may be potentially unlimited, they are effective at solving the “long-tail” problem by identifying choices that are less popular in the general population but may lead to increased consumer satisfaction in niche segments.
In healthcare, RCTs are conducted in small, well-controlled cohorts to evaluate safety and efficacy of medical treatments. In such trials it may be most common to compare a single treatment against a placebo, though some studies may have multiple arms where a few drugs are tested simultaneously. The physician in the clinic, outside of the idealized settings of an RCT, is faced with a different challenge: instead of deciding between placebo and treatment, the physician has to decide between several choices of approved drugs and, where combination therapy is indicated, how best to combine available drugs. Medical guidelines provide valuable support to ease this decision-making process. However, when options specified by guidelines have been exhausted, as may be the case for patients with long-standing chronic conditions, the physician may need to iterate between different treatment choices before finding the optimal one. The twin problems identified here, i.e., seemingly endless choice and the lack of personalization, are well suited to being tackled by a recommender system.
Observational studies on retrospective healthcare data may provide additional evidence to support RCTs by determining the real-world efficacy of drugs. In such studies, covariate distributions between treatment arms can vary, and it is essential to disambiguate between associative and causal effects of treatment. Variables that influence the treatment choice as well as the effect (referred to as confounders), if not properly accounted for, can blur the distinction between the two. Using causal inference studies, confounders are identified based on clinical inputs and/or user input, and used for model fitting and, subsequently, for inference, according to some implementations. Treatment options are systematically identified and individuals between treatment arms are matched to simulate the controlled settings of an RCT and tease out the causal treatment effect from the observed effect. Observational studies also offer a unique opportunity to measure causal effects between treatments that haven't been compared directly in an RCT. By measuring relative effects of approved drugs, some implementations rank treatment options based on their comparative efficacy and find treatments that are optimal.
Meta-analytic studies combine results across different RCTs conducted for a particular control-treatment pair to compute a more robust estimate of the treatment effect. When multiple treatments exist, in some implementations, experimental evidence is represented as a connected graph where the nodes are the available treatments and the edges connect treatments which have been compared directly via (one or more) RCTs. In some implementations, a network meta-analysis (NMA) consolidates evidence across such a graph to indirectly compare treatments that have not been studied in an RCT and generate rankings based on the comparative efficacy of all available treatments relative to a baseline treatment.
According to some implementations, recommender systems for personalized medicine build on recent advances in genomics, the large-scale digitization of patient medical records, and concurrent advancements in machine learning and artificial intelligence. Unlike other application areas of recommender systems, the system described herein incorporates elements of causal modeling and uses evidence from multiple lines of investigation. In some implementations, the method described herein uses real-world data from the health records of 100,000 patients to develop a recommender system for anti-hyperglycemic drugs that meets these requirements.
In some implementations, the health records 102 pertain to data spanning several years, representing the health status of patients over a specific period of time. For example, the extracted health records 102 of 56.4 million members may change over the several years during which the system 100 collected data. Some implementations automatically and/or continuously collect healthcare information. Some implementations use manually input data or augmented data. The health records 102 of patients may have new diagnoses, medications may be discontinued, and/or new medications may be administered.
To properly account for this evolution, each patient's health history may be segmented into a series of temporal snapshots (also referred to herein as sub-trajectories, e.g., snapshots 122, 124, 126).
Table 1 shown below provides example mean and standard deviation for Continuous Confounding Variables (CCV) of the health records 102 at the patient and snapshot levels, according to some implementations. For treatment personalization, individuals may be stratified based on their age and Charlson Comorbidity Index (CCI) into five segments. Each segment may be further subdivided on the basis of the presence or absence of insulin in the individual's prior treatment regimen.
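A sketch of this rule-based stratification is shown below; the age and CCI cut points are placeholders chosen for illustration, not the actual thresholds used:

```python
def assign_segment(age, cci, prior_insulin):
    """Five age/CCI bands, each split by prior insulin use (10 segments).
    Thresholds below are illustrative assumptions."""
    if age < 50:
        band = "A" if cci <= 2 else "B"
    elif age < 65:
        band = "C" if cci <= 2 else "D"
    else:
        band = "E"
    suffix = "insulin" if prior_insulin else "no-insulin"
    return f"{band}-{suffix}"
```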
Table 2 shown below provides example summary statistics of counts for the binary confounding variables, according to some implementations.
Table 3 shown below provides an example for training datasets split into 10 segments based on age, prior use of insulin, and comorbidity index values, according to some implementations. The number of available treatments, based on NMA rankings, varies for each group and may typically be higher for larger cohort sizes. Concordant percent values indicate the percent of each group that received a drug recommended in either the top 3 rankings (top-3), the top 4-10 rankings (top 4-10), or at position 11 or below, including drugs not in the NMA rankings list (bottom). The number of patients on a top-3 most-effective treatment is extremely small, ranging from less than 1% in the most common subgroups to approximately 10% in the subgroups containing the most severely ill patients.
The example described above stratified patients into 10 segments. A higher degree of personalization is possible by using other strategies. Unsupervised learning via k-means clustering can be used to discover subsets of related individuals. A higher degree of personalization can also be achieved by defining cohorts consisting of the k-nearest neighbors of a seed member. Some implementations implement personalization by ranking according to Individual Treatment Effects (ITEs) instead of Average Treatment Effects. Such methods may improve treatment selections considerably when tested on retrospective data. Some implementations trade off clinical transparency for treatment selection options.
In some implementations, patient snapshots are split into training (80%) and evaluation (20%) datasets, such that in each dataset the relative sizes of the segments are the same. To gauge the efficacy of the causal recommender system, steps shown in an evaluation phase 224 may be performed on the trained model (the model trained using the steps in training phase 222, described above). Recommendations 228 may be generated for each patient using the method described above and their change in lab measurements (e.g., HbA1c) may be recorded. The change in the lab measurements (e.g., HbA1c) between the concordant cohort (where the treatment matched one of the recommendations) and the non-concordant cohort may be compared in a causal inference setting to estimate the ATE of the causal recommender system. This metric is used to validate the model and anticipate, using a retrospective analysis, the additional improvement in HbA1c the recommender system would have over the standard-of-care if such a system were deployed in a prospective study.
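As a sketch of this evaluation setup, the snippet below performs the stratified split and labels concordance; the column names and the `top_k_by_segment` mapping are assumptions made for illustration:

```python
from sklearn.model_selection import train_test_split

def split_and_label(snapshots, top_k_by_segment, test_size=0.2, seed=0):
    """snapshots: DataFrame with 'segment' and 'treatment' columns (assumed);
    top_k_by_segment: dict mapping segment -> set of recommended treatments."""
    train, test = train_test_split(
        snapshots, test_size=test_size,
        stratify=snapshots["segment"], random_state=seed)
    test = test.copy()
    test["concordant"] = [
        row["treatment"] in top_k_by_segment[row["segment"]]
        for _, row in test.iterrows()
    ]
    # The concordant vs. non-concordant change in HbA1c is then compared in a
    # causal inference setting to estimate the ATE of the recommender itself.
    return train, test
```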
Among causal inference techniques, the most popular by far are ones based on propensity-score modelling. A propensity-score model is a binary classifier that is trained to predict the probability or propensity for receiving a particular intervention (either control or the treatment under study), using covariates of the cohort as input. Several variants of propensity score modelling exist. The inverse probability of treatment weighting (IPTW) method may be used, where individuals in control and treatment arms are assigned weights equal to the inverse of the probability of receiving the treatment they received. If the propensity model is correctly specified, this weighting creates pseudo-populations in the two arms that are matched in every covariate except the intervention under study. The estimated treatment effect can then be causally attributed to the intervention. In some implementations, the typical IPTW workflow includes three steps: i) the propensity model is trained using assigned treatments as targets, ii) standardized mean differences are computed between the inverse propensity weighted covariates of the two arms to test for removal of confounder imbalance, and iii) the ATE is computed as a weighted average of the outcomes. If covariate imbalance has not been sufficiently removed in step ii, the propensity model is deemed to be incorrectly specified and the classifier has to be retrained by varying the model definition or via data transformations until the two arms are sufficiently balanced. Most often, observational studies are performed to compare a single treatment against a single control and this iterative procedure suffices. However, when thousands of studies need to be performed, as is the case for the causal recommender, this approach becomes infeasible.
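The three-step IPTW workflow can be sketched as below; a plain logistic-regression propensity model is used here only for brevity, whereas the implementations described later rely on the BCAUS neural model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_smd(X, t, w):
    """Standardized mean differences of inverse-probability-weighted covariates."""
    def stats(mask):
        mu = np.average(X[mask], axis=0, weights=w[mask])
        var = np.average((X[mask] - mu) ** 2, axis=0, weights=w[mask])
        return mu, var
    mu1, v1 = stats(t == 1)
    mu0, v0 = stats(t == 0)
    return np.abs(mu1 - mu0) / np.sqrt((v1 + v0) / 2)

def iptw_ate(X, t, y, clip=(0.01, 0.99)):
    # i) train the propensity model with assigned treatments as targets
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    ps = np.clip(ps, *clip)                    # trim extreme propensity scores
    w = t / ps + (1 - t) / (1 - ps)            # inverse probability weights
    # ii) check that confounder imbalance has been removed (rule of thumb: < 0.1)
    smd = weighted_smd(X, t, w)
    # iii) ATE as the difference of weighted-average outcomes
    ate = (np.average(y[t == 1], weights=w[t == 1]) -
           np.average(y[t == 0], weights=w[t == 0]))
    return ate, smd
```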
Some implementations use a technique called BCAUS (Balancing Covariates Automatically Using Supervision) to perform causal analysis in massive multi-arm studies that is well suited for the causal recommender. The BCAUS propensity model is trained with a joint loss ℒTOTAL = ℒBCE + νμℒBIAS, according to some implementations. The first term, ℒBCE, is a binary cross-entropy loss which penalizes incorrect treatment assignment, while the second, ℒBIAS, is a loss term which explicitly tries to minimize imbalance between inverse probability weighted covariates. Details of extraction and transformation processes, including causal analysis with BCAUS, are described below, according to some implementations. For each pairwise comparison between treatments in the causal recommender, a separate BCAUS model may be trained. The outputs of trained models may be used to compute inverse probability scores and estimate ATEs. A bootstrapping procedure may be used to compute standard errors and confidence intervals. The input data for NMA consists of the estimated ATEs and standard errors.
To ascertain if all 10,000 propensity models are correctly specified, for each observational study, the standardized mean difference (SMD) between control and treatment groups for every confounder was computed prior to and after inverse-propensity-based adjustment. A commonly accepted rule-of-thumb is to consider a confounding covariate as sufficiently balanced if the SMD is below a threshold value of 0.1.
As indicated above, in some implementations, the BCAUS model is trained using the joint loss ℒTOTAL = ℒBCE + νμℒBIAS. Here, μ is the scalar ratio of ℒBCE to ℒBIAS that is detached from the computation graph. The relative contribution of each loss component is tuned using the hyperparameter ν. The cross-entropy loss is calculated as:

ℒBCE = −(1/N) Σi [t(i)·log p(i) + (1 − t(i))·log(1 − p(i))] (S1)

In the above equation, the sum runs over the N individuals in the study, t(i)∈{0, 1} is the treatment given to individual i, and p(i) is the propensity score predicted by the model. To compute the bias loss, the propensity score p(i) is used to compute the inverse probability weight (IPW) using the following equation:

w(i) = t(i)/p(i) + (1 − t(i))/(1 − p(i)) (S2)

The mean squared error of the M covariates weighted according to the equation (S2) is used to calculate the bias loss, according to the following equation:

ℒBIAS = (1/M) Σj [(Σi w(i)·t(i)·xj(i))/(Σi w(i)·t(i)) − (Σi w(i)·(1 − t(i))·xj(i))/(Σi w(i)·(1 − t(i)))]² (S3)

The two terms in the equation above represent the weighted means of the covariates for the treatment and control groups respectively, where xj(i) is the value of covariate j for individual i. To assess balance, the standardized mean difference Δj for each covariate j is computed according to the following equation:

Δj = |x̄j,t − x̄j,c| / √((s²j,t + s²j,c)/2) (S4)

In the equation (S4) shown above, x̄j,t and x̄j,c are the weighted means of covariate j in the treatment and control groups, and s²j,t and s²j,c are the corresponding variances.
In some implementations, the BCAUS model is implemented in Python using the PyTorch neural networks library. Each BCAUS model consists of two hidden layers with the number of neurons in each layer set to twice the number of input covariates. Rectified Linear Unit (ReLU) activation is used for all layers except the last layer, which consists of a single neuron and uses sigmoid activation. The learning rate is set to 0.001, the hyperparameter ν is set to 4, and the networks are trained for 1000 epochs. An early-stopping procedure is implemented in which training terminates if all covariates remain balanced (i.e., standardized mean difference < 0.1) for more than 10 epochs.
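A condensed PyTorch sketch of this architecture and joint loss is shown below; batching, optimizer scheduling, and the early-stopping bookkeeping are simplified, and the covariate count is a placeholder:

```python
import torch
import torch.nn as nn

class BCAUS(nn.Module):
    """Two hidden layers of width 2x the covariate count, sigmoid output."""
    def __init__(self, n_covariates):
        super().__init__()
        hidden = 2 * n_covariates
        self.net = nn.Sequential(
            nn.Linear(n_covariates, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x).squeeze(-1)  # propensity score per individual

def joint_loss(p, t, x, nu=4.0, eps=1e-8):
    """L_TOTAL = L_BCE + nu * mu * L_BIAS (see equations (S1)-(S3) above)."""
    bce = nn.functional.binary_cross_entropy(p, t)
    w = t / p + (1 - t) / (1 - p)                       # IPW weights (S2)
    wt, wc = w * t, w * (1 - t)
    mu_treated = (wt[:, None] * x).sum(0) / wt.sum()
    mu_control = (wc[:, None] * x).sum(0) / wc.sum()
    bias = ((mu_treated - mu_control) ** 2).mean()      # bias loss (S3)
    mu = (bce / (bias + eps)).detach()                  # scalar ratio, detached
    return bce + nu * mu * bias

model = BCAUS(n_covariates=20)                          # 20 is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Train for up to 1000 epochs, stopping once every covariate stays balanced
# (standardized mean difference < 0.1) for more than 10 epochs.
```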
For each clinical subgroup, all treatments with more than 35 treated individuals are chosen and BCAUS models are trained comparing every treatment with every other treatment. For a treatment pair i and j, the estimated ATE values should be antisymmetric, i.e., ATEij=−ATEji, and for n treatments, n(n−1)/2 pairwise comparisons should suffice. However, since the propensity scores output by BCAUS are not calibrated probabilities, a small deviation from this antisymmetry (with differences much smaller than the standard error) is observed in practice. Therefore, ATEij and ATEji are computed separately and a total of n(n−1) BCAUS models are trained. Prior to training, all continuous covariates in each clinical subgroup are Z-scored to have zero mean and unit standard deviation. Propensity score trimming is applied at the 0.01 level (e.g., propensity scores below 0.01 are set to 0.01 and those above 0.99 are set to 0.99). This ensures that no individual receives an inverse propensity weight >10. A bootstrapping procedure is used to estimate the standard error for the ATE values. Inverse propensity weighted outcomes are picked at random and with replacement from the dataset and ATEs are computed between control and treatment individuals in each draw. The standard deviation of ATE values across 100 draws is reported as the standard error. ATE values and their standard errors for all pairwise treatment combinations are computed for each clinical subgroup and Network Meta-Analysis (NMA) is performed using example techniques described below, according to some implementations.
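The bootstrap step might look like the following sketch (100 draws, as above; the array names are assumptions):

```python
import numpy as np

def bootstrap_ate_se(y, t, w, n_draws=100, seed=0):
    """y: outcomes, t: treatment indicators, w: inverse propensity weights."""
    rng = np.random.default_rng(seed)
    ates = []
    for _ in range(n_draws):
        idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
        yi, ti, wi = y[idx], t[idx], w[idx]
        ates.append(np.average(yi[ti == 1], weights=wi[ti == 1]) -
                    np.average(yi[ti == 0], weights=wi[ti == 0]))
    return float(np.std(ates))  # std. deviation across draws = standard error
```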
An ATE value measured via a direct causal comparison between two treatments has to be consistent with values that are indirectly estimated (under the transitivity assumption) by comparing each treatment of the pair with intermediary treatments and then computing differences, according to some implementations. To build a consolidated and self-consistent view of the evidence, a densely connected network graph is constructed for each stratified segment in which every treatment node is connected with every other treatment node. Edges representing observational studies where all confounding covariates are not balanced are trimmed and NMA is performed over the resultant graph. Heterogeneity in the treated populations is accounted for by using a random-effects hierarchical model, uninformative priors are set, and a Markov Chain Monte Carlo (MCMC) sampling procedure is used to construct posterior distributions of ATE values for all treatment pairs. To determine relative ranks, samples are drawn from the posterior predictive distributions of ATEs of all treatments compared against Metformin, which is treated as the baseline treatment. For each draw, treatments are ranked in ascending order of ATE values (i.e., higher ranks for more negative values), and a mean rank is computed for each treatment across all draws. This mean rank is normalized to compute the SUCRA score. Treatments are ranked in descending order of SUCRA scores such that the treatment that reduced HbA1c by the largest amount relative to Metformin has the highest rank. This ranked list of treatments is returned to all members of the segment.
To illustrate Network Meta-Analysis, an example implementation is described herein, according to some implementations. In some implementations, Network Meta-Analysis is performed with Python code developed using the PyMC3 probabilistic programming library. The network graph is encoded as a hierarchical, mixed-effects model:
ATEij ~ Normal(δij, seij²) (S5)

δij = dij + τ·Normal(0, 1) (S6)

dij = di − dj (S7)

τ ~ HalfCauchy(5) (S8)

di ~ Normal(1, 15·max(|ATE|)) (S9)
In the equations shown above, ATEij is the ATE value measured by comparing treatment i against treatment j and seij is the corresponding standard error, di is the ATE of treatment i relative to the baseline treatment with dbaseline=0. Uninformative priors are set for τ (the hierarchical standard deviation) and di with the standard deviation for the sampling distribution of the latter set to 15 times the maximum absolute value of measured ATEs. A non-centered parameterization is chosen for the model, because Markov Chain Monte Carlo (MCMC) samplers have difficulties sampling from the “Neal's funnel” that can lead to divergent trajectories and biased results. A No U-Turns Sampler (NUTS) is tuned with 10,000 warm-up steps and 100,000 samples are drawn from 4 chains that are run simultaneously. The tuning samples and the first 50,000 samples in each chain are discarded. To compute SUCRA scores, 200,000 samples are drawn from the posterior distribution of di and treatments are ranked for each draw. The SUCRA score for treatment i is calculated as:
SUCRAi = Ri / (n − 1) (S10)

In the equation (S10), Ri is the mean rank for treatment i across all draws and n is the number of treatments (Ri∈[0, n−1]). Posterior samples are used to compute the mean and 94% credible intervals for ATE values di of all treatments relative to Metformin, the baseline treatment.
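A compact PyMC3 sketch of equations (S5)-(S9) with a non-centered parameterization is shown below; the prior on the relative effects is centered at zero here, the sampler settings are reduced for illustration, and the data layout (index arrays per comparison) is an assumption:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

def fit_nma(ate, se, idx_i, idx_j, n_treatments):
    """ate, se: arrays of pairwise ATEs and standard errors;
    idx_i, idx_j: integer treatment indices per comparison, with index 0
    reserved for the baseline treatment (e.g., Metformin), whose d is 0."""
    scale = 15 * np.max(np.abs(ate))
    with pm.Model():
        d_free = pm.Normal("d_free", mu=0, sigma=scale, shape=n_treatments - 1)
        d = tt.concatenate([tt.zeros(1), d_free])      # d_baseline = 0
        tau = pm.HalfCauchy("tau", beta=5)             # (S8)
        z = pm.Normal("z", mu=0, sigma=1, shape=len(ate))
        delta = d[idx_i] - d[idx_j] + tau * z          # non-centered (S6)-(S7)
        pm.Normal("obs", mu=delta, sigma=se, observed=ate)  # likelihood (S5)
        trace = pm.sample(draws=5000, tune=2000, chains=4, target_accept=0.9)
    return trace
```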
The efficacy of a recommender engine like the one described here may be measured by deploying it in an RCT, where members of a "treatment" cohort get optimized recommendations from the engine while the "control" cohort gets the standard-of-care treatment as decided by a physician without access to the recommendations. Any differences in measured outcomes (e.g., reductions in HbA1c) can then be causally attributed to the recommender engine. In the absence of such a randomized trial, it is still possible to approximately estimate the causal effect of the recommender by performing an observational study with retrospective data. To do this, the held-out test dataset may be used as a study cohort. For each patient snapshot in this set, ranked treatment recommendations may be generated depending on the stratified segment to which the snapshot belonged. For individuals with certain health conditions, one or more of the drug classes in a particular treatment regimen may be contraindicated. A set of safety filters may be used to check if such contraindicated drugs are present in the returned list of combination treatments, and when present, the entire treatment is removed from the list. An example list of the safety filters applied is shown below, according to some implementations.
In some implementations, the health history of each individual is analyzed and any treatments which contained contraindicated drugs are removed from the set of recommendations. An example list of safety filters used to identify contraindicated drugs is shown in Table 4, according to some implementations.
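For illustration, the censoring of contraindicated treatments might be sketched as follows; treatments are represented here as sets of drug classes, and the contraindication lookup is a placeholder for the Table 4 rules:

```python
def apply_safety_filter(ranked_treatments, contraindicated_classes):
    """Drop any ranked treatment containing a contraindicated drug class."""
    return [treatment for treatment in ranked_treatments
            if not set(treatment) & set(contraindicated_classes)]

def recommend_top_k(ranked_treatments, contraindicated_classes, k=3):
    """Top-K recommendations after censoring (K = 3 in the study described below)."""
    return apply_safety_filter(ranked_treatments, contraindicated_classes)[:k]
```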
After censoring contraindicated treatments, the top K=3 of the remaining treatments are returned for each patient snapshot. If the current treatment of a patient matched one of the recommendations, the patient may be considered to be concordant with the recommendation. Roughly 5% of the patients were found concordant, which implied that a large majority of patients were being treated by regimens that were less than optimal. To determine the causal effect of the recommender, a BCAUS propensity model was trained using the same confounding covariates as before but considering the concordant cohort as the treatment cohort that is treated with the recommender, and the non-concordant cohort as the control group. Inverse propensity weights were used to adjust the outcomes (difference in lab measurements, e.g., ΔHbA1c) and the ATE of the causal recommender was estimated. Results are summarized in Table 2.
The true measure of the impact of a recommender system for personalized medicine cannot be determined without conducting a well-controlled prospective study. However, the results of the retrospective study reported here may be indicative of what one might expect from a randomized experiment. A change in HbA1c in a diabetic population under managed care could lead to improvements in patient health outcomes that are substantive. By forestalling adverse events that arise from uncontrolled diabetes, it could reduce patient suffering and lead to significant reductions in healthcare costs.
The example causal recommender described here is optimized to reduce HbA1c. In some instances, absolute reductions in HbA1c may not be desirable in certain sub-populations. For older patients, or for those with multiple comorbidities, for example, it may be more beneficial to reach appropriate targets set by the care manager. Some implementations generalize the causal recommender to optimize for meeting targets instead of absolute HbA1c reduction.
Treatment personalization in the causal recommender may be derived from two sources: (i) the stratification based on age, comorbidities, and prior insulin use and (ii) the censoring of contraindicated medications by looking through an individual's health history. The decision to use the stratification scheme described here is driven by clinical inputs. Some implementations use other stratification approaches that utilize machine learning algorithms, such as unsupervised k-means clustering to discover natural clusters of similar patients, or generate a bespoke recommender for each patient trained on the cohort of their k-nearest-neighbors. This latter approach provides a very high level of personalization but may come at the cost of increased computational overhead. Another layer of personalization is achieved by ranking treatments on Individual Treatment Effects (ITEs) instead of ATEs, according to some implementations.
The systems and methods described herein are readily extensible for making treatment recommendations for a variety of chronic diseases. In some implementations, training pipelines for causal recommenders are incorporated within the claims processing infrastructure of healthcare systems so that they can learn constantly and improve with time as more and newer data becomes available. For example, the causal recommender system described herein could be incorporated within integrated EHR or claims processing systems so that online learning becomes possible (e.g., the model continues to learn and improve daily as more data comes in). In this way, causal recommenders can play an important role in personalizing medicine at the population level.
The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the scope of the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations are chosen in order to best explain the principles underlying the claims and their practical applications, to thereby enable others skilled in the art to best use the implementations with various modifications as are suited to the particular uses contemplated.
Related provisional application: No. 63144357, filed February 2021 (US).