The subject matter described herein relates generally to machine learning and more specifically to deep learning enabled techniques for patient stratification.
In many domains, early recognition of diseases and timely initiation of medical interventions have been shown to be an effective approach to improve clinical outcomes. As one example, early recognition of life-threatening conditions, such as sepsis, as well as timely initiation of life-saving treatments in hospitalized patients, such as antibiotics, have increased patient survival.
Systems, methods, and articles of manufacture, including computer program products, are provided for machine learning enabled patient stratification. In one aspect, there is provided a system for patient stratification. The system may include at least one processor and at least one memory. The at least one memory may include program code that provides operations when executed by the at least one processor. The operations may include: applying a first machine learning model to determine, based at least on a clinical data of a patient, a risk score for the patient; in response to the risk score for the patient exceeding a first threshold, applying a second machine learning model to determine a first probability of the risk score being a false positive; in response to the risk score for the patient failing to exceed the first threshold, applying a third machine learning model to determine a second probability of the risk score being a false negative; and determining, based at least on the risk score, the first probability of the risk score being the false positive, and the second probability of the risk score being the false negative, one or more clinical recommendations for the patient.
In another aspect, there is provided a method for machine learning enabled patient stratification. The method may include: applying a first machine learning model to determine, based at least on a clinical data of a patient, a risk score for the patient; in response to the risk score for the patient exceeding a first threshold, applying a second machine learning model to determine a first probability of the risk score being a false positive; in response to the risk score for the patient failing to exceed the first threshold, applying a third machine learning model to determine a second probability of the risk score being a false negative; and determining, based at least on the risk score, the first probability of the risk score being the false positive, and the second probability of the risk score being the false negative, one or more clinical recommendations for the patient.
In another aspect, there is provided a non-transitory computer readable medium storing instructions. When executed by at least one data processor, the instructions may cause operations that include: applying a first machine learning model to determine, based at least on a clinical data of a patient, a risk score for the patient; in response to the risk score for the patient exceeding a first threshold, applying a second machine learning model to determine a first probability of the risk score being a false positive; in response to the risk score for the patient failing to exceed the first threshold, applying a third machine learning model to determine a second probability of the risk score being a false negative; and determining, based at least on the risk score, the first probability of the risk score being the false positive, and the second probability of the risk score being the false negative, one or more clinical recommendations for the patient.
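By way of illustration, the branching logic recited above can be sketched in a few lines of Python. The stub models, the feature name, and the 0.5 threshold are illustrative assumptions for the sketch, not part of the disclosed implementation; each "model" is simply a callable mapping clinical data to a probability.

```python
# Sketch of the three-model stratification flow: a risk model always runs,
# while the false-positive and false-negative models run on opposite sides
# of the decision threshold.

def stratify(clinical_data, risk_model, fp_model, fn_model, threshold=0.5):
    """Return (risk_score, fp_prob, fn_prob). The false-positive model runs
    only when the risk score exceeds the threshold; the false-negative
    model runs only when it does not."""
    risk = risk_model(clinical_data)
    fp_prob = fp_model(clinical_data) if risk > threshold else None
    fn_prob = fn_model(clinical_data) if risk <= threshold else None
    return risk, fp_prob, fn_prob

# Toy stand-ins for the three neural networks (illustrative only).
risk_model = lambda x: x["lactate"] / 10.0   # crude risk proxy from one lab
fp_model = lambda x: 0.2                     # P(risk score is a false positive)
fn_model = lambda x: 0.7                     # P(risk score is a false negative)

print(stratify({"lactate": 8.0}, risk_model, fp_model, fn_model))
# → (0.8, 0.2, None): high risk, so only the false-positive check runs
```

A downstream decision analyzer would then combine these three quantities with contextual information to select a recommendation.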
In some variations of the methods, systems, and non-transitory computer readable media, one or more of the following features can optionally be included in any feasible combination. A conformity metric indicative of a similarity between the clinical data of the patient and one or more conformal sets may be determined. In response to the conformity metric satisfying a second threshold, the first machine learning model may be applied to determine the risk score for the patient. In response to the conformity metric failing to satisfy the second threshold, the clinical data of the patient may be rejected as indeterminate.
In some variations, the clinical data of the patient may be encoded to generate a reduced dimension representation of the clinical data. The conformity metric indicative of the similarity between the clinical data of the patient and one or more conformal sets may be determined based at least on the reduced dimension representation of the clinical data.
In some variations, the conformity metric may include a Euclidean distance or a cosine distance.
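The conformity check can be sketched with either distance metric. Representing each conformal set by a single centroid and a fixed distance threshold is a simplifying assumption for the sketch; a deployed system could compare against full conformal sets rather than centroids.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two encodings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def conforms(encoding, centroids, max_distance, metric=euclidean):
    """Accept the encoded clinical data if it lies within max_distance of
    at least one conformal-set centroid; otherwise reject as indeterminate."""
    return any(metric(encoding, c) <= max_distance for c in centroids)
```

For example, an encoding near a control-set centroid passes the check, while a distant outlier is rejected before any risk score is computed.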
In some variations, the one or more conformal sets may include a control conformal set of clinical data associated with patients without a disease and a case conformal set of clinical data associated with patients with the disease.
In some variations, the one or more conformal sets may be generated by at least clustering training data including true cases of patients with a disease, true controls of patients without the disease, false positives of patients without the disease but diagnosed as having the disease, and false negatives of patients with the disease but diagnosed as without the disease.
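A minimal sketch of conformal-set construction follows. Reducing each of the four subgroups to a single centroid stands in for a richer clustering (e.g., k-means within each subgroup); the record format of (features, actual label, predicted label) is an assumption of the sketch.

```python
def build_conformal_sets(records):
    """Partition training records into the four subgroups named above
    (true case, true control, false positive, false negative) and reduce
    each subgroup to its centroid. records: (features, actual, predicted)."""
    groups = {}
    for features, actual, predicted in records:
        if actual and predicted:
            key = "true_case"
        elif not actual and not predicted:
            key = "true_control"
        elif not actual and predicted:
            key = "false_positive"
        else:
            key = "false_negative"
        groups.setdefault(key, []).append(features)
    # One centroid per subgroup: the per-dimension mean of its members.
    return {key: [sum(dim) / len(members) for dim in zip(*members)]
            for key, members in groups.items()}
```

The resulting centroids can then serve as the conformal sets against which incoming clinical data is compared.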
In some variations, the first probability of the risk score being the false positive and the second probability of the risk score being the false negative may be determined based on one or more of a quantity of missing clinical variables in the clinical data, an uncertainty associated with the risk score, an extent of conformity between the clinical data and the one or more conformal sets, and a number of nearest neighbors with discordant labels or a spread in the risk score.
In some variations, an uncertainty associated with the risk score of the patient may be determined. The one or more clinical recommendations for the patient may be determined based at least on the uncertainty associated with the risk score.
In some variations, the uncertainty associated with the risk score of the patient may include an uncertainty associated with the first machine learning model. The uncertainty associated with the first machine learning model may be determined by at least applying a Monte Carlo dropout to assess a change in the risk score caused by ignoring an output of one or more layers of the first machine learning model.
In some variations, the uncertainty associated with the risk score of the patient may include an uncertainty associated with the clinical data. The uncertainty associated with the clinical data may be determined by at least assessing a change in the risk score caused by excluding random portions of the clinical data.
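The two uncertainty estimates described in the preceding paragraphs can be sketched as repeated perturb-and-score loops. The toy model and the spread-as-standard-deviation summary are assumptions of the sketch; a real deployment would use the trained network with dropout active at inference time.

```python
import random
import statistics

def model_uncertainty(stochastic_model, features, n_samples=100):
    """Monte Carlo dropout style estimate: with dropout left active, the
    spread of risk scores across repeated stochastic forward passes."""
    scores = [stochastic_model(features) for _ in range(n_samples)]
    return statistics.pstdev(scores)

def data_uncertainty(model, features, n_samples=100, drop_frac=0.3, seed=0):
    """Data uncertainty estimate: the spread of risk scores when random
    subsets of the clinical observations are withheld from the model."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        kept = {k: v for k, v in features.items() if rng.random() > drop_frac}
        scores.append(model(kept))
    return statistics.pstdev(scores)

def toy_model(obs):
    # Toy risk score: mean of whatever observations remain, scaled down.
    return sum(obs.values()) / (10.0 * max(len(obs), 1))
```

A large spread from either loop suggests the score rests on a few spurious observations, flagging a potential false positive or false negative.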
In some variations, the uncertainty associated with the risk score of the patient may be assessed based on a quantity of similar patients with discordant labels and/or a spread in the risk score determined at least by repeated substitution of at least one missing feature of the patient with a value of a corresponding feature or a most relevant feature from similar patients.
In some variations, the first machine learning model, the second machine learning model, and the third machine learning model may be feed forward neural networks.
In some variations, the one or more clinical recommendations for the patient may be determined by applying a decision tree to the risk score, an uncertainty associated with the risk score, a contextual information for the patient, the first probability of the risk score being the false positive, and the second probability of the risk score being the false negative.
In some variations, the one or more clinical recommendations may be generated based at least on a context of the patient being one of emergency, general wards, or intensive care.
In some variations, the one or more clinical recommendations may include notifying a clinician and enrolling in a clinical trial.
In some variations, the one or more clinical recommendations may include ordering one or more additional labs.
In some variations, the one or more additional labs may provide one or more clinical observations determined by at least identifying one or more similar patients and identifying the one or more clinical observations as a set of most important features included in a clinical data of the one or more similar patients but missing from the clinical data of the patient.
In some variations, the set of most important features may be determined by altering one or more input features provided to the first machine learning model to identify a set of input features that cause the risk score of the patient to exceed the first threshold, and ranking the set of input features based on a magnitude of change relative to a baseline value.
In some variations, a measured clinical outcome of the patient as a result of implementing the one or more clinical recommendations may be determined. An expected clinical outcome of the patient may be determined. An adjustment to one or more hyper-parameters associated with the determining of the one or more clinical recommendations may be determined based at least on a difference between the measured clinical outcome and the expected clinical outcome.
In some variations, the difference between the measured clinical outcome and the expected clinical outcome may be decomposed into a first fraction attributable to a change in clinical practice directly engendered by the one or more clinical recommendations and a second fraction attributable to other unmeasured confounders. The adjustment to the one or more hyper-parameters associated with the determining of the one or more clinical recommendations may be determined based at least on the first fraction attributable to the change in clinical practice directly engendered by the one or more clinical recommendations.
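A deliberately crude sketch of the outcome-gap decomposition follows. Scaling the gap by an adherence rate is an illustrative assumption standing in for the causal-inference machinery implied by the text, which would adjust for confounders rather than apply a fixed fraction.

```python
def decompose_outcome_gap(measured, expected, adherence_rate):
    """Split the measured-vs-expected outcome gap into a fraction attributed
    to recommendation-driven practice change and a remainder attributed to
    unmeasured confounders. adherence_rate in [0, 1] is a stand-in for a
    proper causal estimate of recommendation uptake."""
    gap = measured - expected
    attributable = gap * adherence_rate
    return attributable, gap - attributable
```

Only the first component would then feed the hyper-parameter adjustment, so that tuning responds to changes the recommendations actually caused.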
In some variations, the adjustment to the one or more hyper-parameters may be determined by at least performing a Bayesian optimization.
In some variations, the measured clinical outcome and the expected clinical outcome may include one or more of a mortality, a length of stay, and a cost of care.
In some variations, the expected outcome of the patient may be determined by applying a fourth machine learning model trained to predict clinical outcomes.
In some variations, the one or more hyper-parameters may include an alert threshold for various patient groups, a maximum number of allowable alarms per time period, and a frequency of ordering of additional labs.
In some variations, the adjustment to the one or more hyper-parameters may be determined for a specific subset of patients as defined based on one or more of phenotypes, care settings, and diagnostic related grouping (DRG).
Implementations of the current subject matter can include, but are not limited to, methods consistent with the descriptions provided herein as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to result in operations implementing one or more of the described features. Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors. A memory, which can include a non-transitory computer-readable or machine-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein. Computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processors residing in a single computing system or multiple computing systems. Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including, for example, a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes in relation to the stratification of sepsis patients, it should be readily understood that such features are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.
The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,
When practical, similar reference numbers denote similar structures, features, or elements.
Early recognition of diseases and timely initiation of medical interventions have been shown to be an effective approach to improve clinical outcomes. The increased adoption of Electronic Health Records (EHRs) in hospitals has motivated the development of machine learning based models for the early prediction of physiological decompensation. However, a major barrier to implementation of such predictive systems is the high false alarm rate, which can lead to high cognitive burden on the end-user (e.g., alarm fatigue) and unnecessary and potentially harmful interventions. This high false alarm rate may be due to a number of factors including, for example, uncertainty in the coefficients of the prediction model (e.g., machine learning model), data quality and characteristics (e.g., a shift in demographic characteristics), healthcare-specific variations in data generating process (e.g., frequency of ordering of laboratory tests), and/or the like. False negative diagnosis (e.g., a risk score below the detection threshold for a true case) may occur due to missing data. Meanwhile, false positive diagnosis (e.g., risk score above the detection threshold for a control case) may occur due to erroneous data or an outlier case previously unseen by the prediction model.
In some example embodiments, a patient stratification system may be configured to support a multi-pronged machine learning based patient stratification workflow. For example, the patient stratification system may include a data analyzer trained to detect non-conformal clinical data and a stratification controller configured to generate, based on conformal clinical data, one or more clinical recommendations based on the probability of false positives and false negatives associated with the conformal clinical data. The stratification controller may include a predictive model trained to determine a risk score based on features extracted from clinical data identified as conformal by the data analyzer. Furthermore, the stratification controller may include one or more additional models (e.g., a false positive network, a false negative network, an uncertainty network, and/or the like) trained to assess the validity of the risk score determined by the predictive model (e.g., probabilities of the risk score being a true positive or a true negative, uncertainty associated with the risk score, and/or the like) before a decision analyzer generates one or more actionable clinical recommendations. For instance, the decision analyzer may combine information about data missingness and quality, metrics of target data distribution shift and/or conformity, model uncertainty, and/or other contextual information (e.g., patient, provider, and/or care facility-related information) to predict potential true or false episodes of an impending critical event and stratify patients into actionable sub-groups.
As one example deployment, the patient stratification system may be configured to diagnose and generate clinical recommendations for sepsis. Accordingly, the various models included in the patient stratification system may be trained using a development dataset that includes electronic health record (EHR) data of one million patients across multiple academic medical centers before the patient stratification system is deployed in a community hospital and exposed to a target dataset in which at least some patients exhibit novel genetic makeup and comorbidities not present in the development dataset. In this deployment environment, the patient stratification system is able to use features derived from the characteristics of data at the community hospital and metrics of model uncertainty to significantly reduce the incidence of false positives and false negatives in its predictions. The resulting actionable recommendations provided by the patient stratification system (e.g., order additional labs to reduce model uncertainty, consult with a clinician to start the patient on antibiotics, and/or the like) are therefore associated with better overall clinical outcomes as well.
In some example embodiments, the patient stratification system may implement a variety of techniques for reducing false negatives and false positives in the output of its constituent machine learning models. These techniques include combining information embedded in model-uncertainty, correlations among observations, and a shift in characteristics of target data distribution to significantly reduce false alarms and provide actionable recommendations. For example, the predictive model may rely on the observation that physiological systems operate under closed feedback loops and often multiple abnormal laboratory measurements occur in concert prior to patient decompensation. In some cases, data uncertainty may be captured by randomly excluding certain observations and assessing the change in predictive risk scores. Meanwhile, model uncertainty may be captured by systematically blocking certain nodes in the model (e.g., randomly ignoring or “dropping out” the output of some layers of a neural network) and assessing the change in predictive risk scores (e.g., Monte Carlo dropout). In such settings, a large variation in risk may indicate that the system is overly relying on a few spurious observations to produce a risk score. Accordingly, the stratification controller may use this information to identify potential false positives or false negatives.
In some example embodiments, the data analyzer of the patient stratification system may use a distance metric (e.g., Euclidean distance, cosine distance, and/or the like) to compare observations (or a representation of the data such as an encoding) from a target dataset to that of the development dataset to detect outliers and to characterize the degree of similarity or conformity of various datasets at deployment time. The patient stratification system may use this information, in association with other features of data missingness and uncertainty, to identify potential false positives or false negatives, and provide actionable recommendations to the end-user (e.g., order additional labs to reduce model uncertainty, consult with a clinician to start the patient on antibiotics, and/or the like).
In some example embodiments, the patient stratification system 100 may be configured to determine, based at least on a patient's clinical data 135 (e.g., electronic health record (EHR) data) from the data store 130, one or more clinical recommendations for the patient. For example, the patient stratification system 100 may apply one or more machine learning models to determine the one or more clinical recommendations. Moreover, the one or more clinical recommendations may be determined based on the clinical data 135 as well as the uncertainties associated with the clinical data 135 and the machine learning models operating on the clinical data 135. Examples of clinical recommendations include notifying a clinician, enrolling in a clinical trial, and ordering additional labs. The one or more clinical recommendations may be displayed, for instance, in a user interface 125 at the client device 120.
Another example deployment of the patient stratification system 100 is shown in
Referring again to
In some example embodiments, the causal inference and meta learning (CaMeL) sub-system (25) may track measured clinical outcomes (26), thus tracking improvements in clinical, quality, and financial indices (e.g., hourly organ failure scores, mortality, length of stay, cost of care, compliance with recommended treatment protocols, and/or the like). As shown in
The real-time quality improvement assessment and decomposition (RQADe) analyzer (28) shown in
A meta learning engine (29) in the causal inference and meta learning sub-system (25) may perform Bayesian optimization to identify one or more changes in the hyper-parameters of the predictive sub-system (24) (e.g., alert thresholds, maximum number of allowable alarms per day, frequency of ordering of additional labs and/or the like) to maximize the impact of the clinical recommendations (23) output by the predictive sub-system (24) on the clinical outcome of interest. In some example embodiments, the meta learning engine (29) may fine-tune the hyper-parameters of the predictive sub-system (24) for a specific subset of patients as defined, for example, by their phenotypes, care settings, diagnostic related grouping (DRG), and/or the like.
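The hyper-parameter tuning loop can be sketched as follows. Random search stands in here for the Bayesian optimization named above (a real meta learning engine would fit a surrogate model and maximize an acquisition function), and the objective function, search space, and peak location are toy assumptions.

```python
import random

def tune_hyperparameters(objective, space, n_trials=50, seed=0):
    """Search the hyper-parameter space for the setting that maximizes the
    measured impact on the clinical outcome of interest. Random search is
    a stand-in for Bayesian optimization; the loop structure is the same."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        value = objective(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value

# Toy objective: assume outcome impact peaks at an alert threshold of 0.6.
space = {"alert_threshold": (0.0, 1.0), "max_alarms_per_day": (1.0, 20.0)}
objective = lambda p: -abs(p["alert_threshold"] - 0.6)
best, _ = tune_hyperparameters(objective, space)
```

In a deployment, the objective would be the recommendation-attributable fraction of the outcome gap rather than a closed-form function.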
Referring back to
In some example embodiments, to determine the clinical recommendation (23) including the ordering of additional labs (22), the decision analyzer (9) may perform a cluster analysis, such as a k-nearest neighbor (KNN) search, using the representation of a patient's clinical data generated by the encoder (4) to identify one or more similar patients (e.g., a top k number of similar patients). Among these similar patients, the decision analyzer (9) may identify those with the lowest prediction uncertainty and determine the most important features (e.g., using a “feature importance” ranking technique) that are not missing from the clinical data of these patients but might be missing from the clinical data of the original patient. In this context, an important feature may be a clinical observation having a large effect on the risk score (15) output of the predictor (5) as well as the output of the decision analyzer (9). The decision analyzer (9) may determine to order additional labs (22) providing these features in order to reduce the prediction uncertainty (16) and, by corollary, reduce the likelihood of a false positive (18) or false negative (19) associated with the risk score (15).
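A minimal sketch of this KNN-based lab suggestion follows. The cohort record layout, the precomputed importance scores, and the use of raw Euclidean distance in the encoder's latent space are assumptions of the sketch.

```python
import math

def k_nearest(encoding, cohort, k=3):
    """Return the k cohort patients whose encoded clinical data is closest
    to the index patient (Euclidean distance in the latent space)."""
    def dist(other):
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(encoding, other["encoding"])))
    return sorted(cohort, key=dist)[:k]

def suggest_labs(patient_features, neighbors, importance):
    """Rank clinical observations present in similar patients' records but
    missing from the index patient, by a precomputed importance score."""
    candidates = set()
    for nb in neighbors:
        candidates |= set(nb["features"]) - set(patient_features)
    return sorted(candidates, key=lambda f: importance.get(f, 0.0), reverse=True)
```

The top-ranked missing observations become candidate additional labs whose results would most reduce prediction uncertainty.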
In some example embodiments, the decision analyzer (9) may identify the aforementioned most important features by systematically and iteratively altering the input features (e.g., using gradient descent) to change the output of the predictor (5). For example, if the risk score (15) of the patient is 0.4 and the decision threshold applied by the stratification controller (3) is 0.5, a gradient descent approach may be applied in which the input features are altered to increase the risk score (15) above the 0.5 decision threshold. The features are then sorted according to the magnitude of their respective changes from the corresponding baseline values. The decision analyzer (9) (or some other logic unit) may use this information, in addition to information about the age of each feature, to determine which clinical observations are needed to improve model confidence. A set of features that are most likely to change the output of the decision analyzer (9) may be identified based on a ranking of the altered input features. Additional labs (22) providing the corresponding clinical observations may be ordered to reduce the prediction uncertainty of the decision analyzer (9) and, by corollary, reduce the likelihood of a false positive (18) or false negative (19) associated with the risk score (15).
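The feature-ranking step can be sketched with one-at-a-time finite differences as a simple proxy for the gradient-based search described above; the perturbation step size and the linear toy model used in the example are assumptions of the sketch.

```python
def rank_features_by_sensitivity(model, features, step=0.1):
    """Nudge each input feature by a small step, record how far the risk
    score moves from its baseline, and rank features by that magnitude.
    This approximates ranking by gradient magnitude without autodiff."""
    baseline = model(features)
    changes = {}
    for name, value in features.items():
        perturbed = dict(features)
        perturbed[name] = value + step
        changes[name] = abs(model(perturbed) - baseline)
    return sorted(changes, key=changes.get, reverse=True)
```

Features at the head of the ranking are those whose measurement (or re-measurement) would most move the risk score across the decision threshold.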
At 702, the patient stratification system 100 may determine a conformity of a clinical data of a patient. In some example embodiments, the patient stratification system 100, for example, the data analyzer (2) of the predictor sub-system (24), may determine whether the clinical data of a patient from the clinical data source (1) exhibits sufficient conformity to one or more conformal sets to support the generation of the clinical recommendation (23). For example, as shown in
At 704, the patient stratification system 100 may respond to the clinical data of the patient exhibiting sufficient conformity by at least determining, based at least on the clinical data of the patient, a risk score for the patient. In some example embodiments, where the clinical data of the patient is determined to exhibit sufficient conformity to the control conformal set or the case conformal set, the predictor (5) may determine, based at least on the clinical data of the patient, a risk score for the patient. In some cases, the risk score may be a probability (e.g., between 0 and 1) indicative of a diagnosis for the patient such as the patient's risk of physiological decompensation. Contrastingly, where the clinical data of the patient is an outlier with respect to the control conformal set and the case conformal set, the data analyzer (2) may reject the clinical data as indeterminate, at least because the clinical data cannot support an accurate determination of the clinical recommendations (23).
At 706, the patient stratification system 100 may respond to the risk score exceeding a threshold by at least determining a first probability of the risk score being a false positive. In some example embodiments, the risk score of the patient output by the predictor (5) may be passed to the stratification controller (3) for further analysis. For example, as shown in
At 708, the patient stratification system 100 may respond to the risk score failing to exceed the threshold by at least determining a second probability of the risk score being a false negative. Alternatively, where the risk score of the patient does not exceed the decision threshold, the stratification controller (3) may utilize the false negative network (8) to determine if the patient's risk score indicates a potential false negative or true negative. As shown in
At 710, the patient stratification system 100 may determine, based at least on the risk score, the first probability of the risk score being a false positive, and the second probability of the risk score being a false negative, one or more clinical recommendations for the patient. In some example embodiments, the decision analyzer (9) of the stratification controller (3) may be a context aware (e.g., care level such as emergency, general wards, or intensive care) decision tree that combines information from the predictor (5), the false positive network (7), and the false negative network (8) to generate the clinical recommendation (23). Examples of the clinical recommendation (23) include notifying clinician (20), enrolling in a clinical trial (21), and ordering additional labs (22).
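The context-aware combination at operation 710 can be sketched as a shallow decision tree. All thresholds, the branch ordering, and the mapping from context to recommendation are illustrative assumptions; the disclosed decision analyzer would learn or configure these rather than hard-code them.

```python
def recommend(risk, fp_prob, fn_prob, uncertainty, context):
    """Shallow decision tree combining the risk score, the false-positive /
    false-negative probabilities, the uncertainty, and the care context
    (e.g., 'emergency', 'general_wards', 'intensive_care')."""
    if uncertainty > 0.3:
        # Score too unstable to act on: gather more data first.
        return "order_additional_labs"
    if risk > 0.5:
        if fp_prob is not None and fp_prob > 0.6:
            # Likely false alarm: confirm before escalating.
            return "order_additional_labs"
        return "notify_clinician" if context == "intensive_care" else "enroll_in_trial"
    if fn_prob is not None and fn_prob > 0.6:
        # Low score but likely false negative: confirm before dismissing.
        return "order_additional_labs"
    return "routine_monitoring"
```

The three non-trivial leaves correspond to the example recommendations (23) named above: notifying a clinician (20), enrolling in a clinical trial (21), and ordering additional labs (22).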
At 712, the patient stratification system 100 may adjust one or more hyper-parameters associated with determining the one or more clinical recommendations based at least on a difference in a measured clinical outcome and an expected clinical outcome for the patient. In some example embodiments, the causal inference and meta learning (CaMeL) sub-system (25) of the patient stratification system 100 may be configured to provide feedback on the performance of the predictive sub-system (24). Doing so may enable an end-to-end optimization of the hyper-parameters of the predictive sub-system (24) including, for example, alert thresholds for various patient groups, maximum number of allowable alarms per time period, the frequency of ordering of additional labs, and/or the like. For example, as shown in
As shown in
One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs, field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example, as would a processor cache or other random access memory associated with one or more physical processor cores.
To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. Other possible input devices include touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive track pads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” Use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.
The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. For example, the logic flows may include different and/or additional operations than shown without departing from the scope of the present disclosure. One or more operations of the logic flows may be repeated and/or omitted without departing from the scope of the present disclosure. Other implementations may be within the scope of the following claims.
This application claims priority to U.S. Provisional Application No. 63/227,885, entitled “METHODS FOR ACCURATE PATIENT STRATIFICATION USING DEEP LEARNING PREDICTIVE MODELS” and filed on Jul. 30, 2021, the disclosure of which is incorporated herein by reference in its entirety.
This invention was made with government support under ES025445 and LM013517, awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/038926 | 7/29/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63227885 | Jul 2021 | US |