Acute respiratory distress syndrome (ARDS) is a form of respiratory failure marked by rapid onset of widespread inflammation in the lungs. ARDS is often not triggered by a single pathology; it can be caused by sepsis, pneumonia, trauma, aspiration, pancreatitis, and/or other insults. Because the underlying pathologies differ, ARDS patients often do not respond to a given therapy. Prior attempts to distinguish ARDS patients have used machine learning classifier models that are complex (e.g., they use up to 40 predictor variables). For example, in Calfee C.S. et al. (2014) Subphenotypes in acute respiratory distress syndrome: latent class analysis of data from two randomised controlled trials. The Lancet Respiratory Medicine 2:611-620, the authors describe models that use biomarkers and other variables that are not readily available at the bedside, which severely limits the generalizability of these models.
Disclosed herein are methods, non-transitory computer readable media, and systems for subphenotyping acute respiratory distress syndrome (ARDS) patients by analyzing corresponding electronic health record (EHR) data using a patient subphenotype classifier. For example, using a patient subphenotype classifier, the ARDS subjects can be classified into one of two or more ARDS subphenotypes, examples of which include an ARDS subphenotype characterized by hyperinflammation and an ARDS subphenotype characterized by hypoinflammation. Depending on the particular ARDS subphenotype determined for a subject, a treatment recommendation can be selected and provided to the subject. Here, the patient subphenotype classifiers analyze EHR data without necessarily analyzing other variables (e.g., biomarker values) that would problematically increase the complexity of the model. Thus, such patient subphenotype classifiers can be rapidly deployed on readily obtainable EHR data, thereby enabling their implementation in settings where time is of the essence (e.g., in hospital intensive care units and/or emergency rooms).
Disclosed herein is a method comprising: obtaining or having obtained electronic health record (EHR) data for a subject exhibiting acute respiratory distress syndrome (ARDS); and determining a classification of the subject selected from two or more subphenotypes by analyzing, using a patient subphenotype classifier, the EHR data for the subject without analyzing biomarker levels of the subject. In various embodiments, the patient subphenotype classifier receives one or more input variables comprising heart rate, mean arterial pressure, and respiratory rate. In various embodiments, the patient subphenotype classifier receives each of the input variables of heart rate, mean arterial pressure, and respiratory rate. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising arterial pH, partial pressure of oxygen, and bicarbonate. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising arterial pH, partial pressure of oxygen, and bicarbonate. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising fraction of inspired oxygen, creatinine, and bilirubin. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising fraction of inspired oxygen, creatinine, and bilirubin. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising partial pressure of carbon dioxide, PaO2/FiO2, platelet count, age, gender, positive end-expiratory pressure, and tidal volume. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising partial pressure of carbon dioxide, PaO2/FiO2, platelet count, age, gender, positive end-expiratory pressure, and tidal volume.
In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising body mass index, plateau pressure, minute ventilation, and vasopressor use in prior 24 hours. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising body mass index, plateau pressure, minute ventilation, and vasopressor use in prior 24 hours.
In various embodiments, the patient subphenotype classifier comprises a subphenotyping submodel that outputs a prediction for an ARDS subphenotype. In various embodiments, the patient subphenotype classifier comprises a mortality submodel that outputs a prediction of an ARDS mortality rate. In various embodiments, the patient subphenotype classifier comprises: (A) a subphenotyping submodel that outputs a prediction for an ARDS subphenotype; and (B) a mortality submodel that outputs a prediction of an ARDS mortality rate. In various embodiments, the prediction for the ARDS subphenotype outputted by the subphenotyping submodel serves as an input to the mortality submodel. In various embodiments, the subphenotyping submodel receives one or more input variables comprising the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the subphenotyping submodel receives each of the input variables of the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2).
In various embodiments, implementation of the subphenotyping submodel comprises implementing an unsupervised clustering algorithm. In various embodiments, the mortality submodel receives input variables comprising the subject’s gender and age. In various embodiments, the mortality submodel receives input variables comprising the subject’s bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the mortality submodel receives input variables comprising the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the mortality submodel receives 10 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, tidal volume, and BMI. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.689 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.650.
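As a rough illustration of how a subphenotyping submodel could feed a mortality submodel, the following Python sketch assigns a subject to the nearest of two cluster centroids and passes that label, together with age and gender, into a logistic score. Everything numeric here is a placeholder: the centroids, the coefficients, the z-score standardization, the `is_male` encoding, and the function names are assumptions made for the example and are not taken from the disclosure, which does not specify a particular clustering or supervised algorithm.

```python
import math

# Input variables named in the text for the subphenotyping submodel.
SUBPHENOTYPE_FEATURES = [
    "arterial_ph", "bicarbonate", "creatinine", "fio2",
    "heart_rate", "arterial_pressure", "respiration_rate", "pao2",
]

# Hypothetical centroids in standardized (z-score) units; placeholders, not fitted values.
CENTROIDS = {
    "A": [0.3, 0.3, -0.3, -0.2, -0.3, 0.2, -0.2, 0.1],
    "B": [-0.5, -0.5, 0.5, 0.3, 0.5, -0.4, 0.4, -0.1],
}

# Hypothetical logistic-regression coefficients for the mortality submodel.
MORTALITY_INTERCEPT = -2.0
MORTALITY_WEIGHTS = {"is_subphenotype_b": 0.9, "age_decades": 0.2, "is_male": 0.1}


def assign_subphenotype(z_scores):
    """Return the label of the nearest centroid (Euclidean distance)."""
    point = [z_scores[name] for name in SUBPHENOTYPE_FEATURES]
    return min(CENTROIDS, key=lambda label: math.dist(point, CENTROIDS[label]))


def mortality_score(subphenotype, age_years, is_male):
    """Logistic score that takes the subphenotype prediction as one of its inputs."""
    x = {
        "is_subphenotype_b": 1.0 if subphenotype == "B" else 0.0,
        "age_decades": age_years / 10.0,
        "is_male": 1.0 if is_male else 0.0,
    }
    logit = MORTALITY_INTERCEPT + sum(MORTALITY_WEIGHTS[k] * x[k] for k in x)
    return 1.0 / (1.0 + math.exp(-logit))


# Usage: a subject whose standardized values sit exactly on centroid B.
subject = dict(zip(SUBPHENOTYPE_FEATURES, CENTROIDS["B"]))
label = assign_subphenotype(subject)
score = mortality_score(label, age_years=60, is_male=True)
```

In a fitted system the centroids would come from the clustering step and the coefficients from supervised training; the sketch only shows the data flow from one submodel to the other.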
In various embodiments, the mortality submodel receives 9 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.673 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.668. In various embodiments, the mortality submodel receives 12 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.658 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.597. In various embodiments, the mortality submodel receives 11 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.643 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.532.
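For readers unfamiliar with the two performance metrics recited above, the following Python sketch shows one conventional way to compute them from binary outcome labels and model scores. AUROC is computed as the fraction of positive/negative pairs ranked correctly (ties count one half), and AUPRC is estimated by average precision. Which estimator was actually used to obtain the recited values is not stated in the text, so the choice here is an assumption, as are the function names and the toy data.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, one common estimator of AUPRC (tied scores keep input order)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive's rank
    return total / n_pos


def meets_either_floor(auroc_value, auprc_value, auroc_floor, auprc_floor):
    """True when at least one of the two metrics reaches its floor."""
    return auroc_value >= auroc_floor or auprc_value >= auprc_floor


# Toy data for illustration only (1 = died, 0 = survived).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = auroc(labels, scores)              # 0.75
ap = average_precision(labels, scores)   # 5/6
ok = meets_either_floor(roc, ap, auroc_floor=0.673, auprc_floor=0.668)
```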
In various embodiments, implementation of the mortality submodel comprises implementing a supervised machine learning algorithm. In various embodiments, determining the classification of the subject based on the EHR data using the patient subphenotype classifier comprises: determining that data elements of a higher rank mortality submodel are unavailable in the EHR data; and determining that data elements of the mortality submodel are available in the EHR data. In various embodiments, determining the classification of the subject based on the EHR data using the patient subphenotype classifier comprises implementing the mortality submodel responsive to determining that data elements of the mortality submodel are available in the EHR data.
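The availability-based selection described above can be sketched as a walk down a ranked list of mortality submodels, choosing the first one whose required data elements are all present in the EHR data. The ranking order, the submodel names, and the field names below are assumptions made for illustration; only the input-variable lists are drawn from the embodiments above, and the subphenotype prediction is omitted from each list because it is computed rather than read from the record.

```python
# Ranked from highest to lowest; this ordering is hypothetical.
RANKED_MORTALITY_SUBMODELS = [
    ("ten_input", ["gender", "age", "bilirubin", "paco2", "pf_ratio",
                   "peep", "platelet_count", "tidal_volume", "bmi"]),
    ("nine_input", ["gender", "age", "bilirubin", "paco2", "pf_ratio",
                    "peep", "platelet_count", "tidal_volume"]),
    ("eleven_input", ["gender", "age", "arterial_ph", "bicarbonate", "creatinine",
                      "fio2", "heart_rate", "arterial_pressure",
                      "respiration_rate", "pao2"]),
]


def is_available(ehr, required):
    """A data element counts as available when it is present and not None."""
    return all(ehr.get(name) is not None for name in required)


def select_mortality_submodel(ehr):
    """Return the name of the highest-ranked submodel whose inputs are all available."""
    for name, required in RANKED_MORTALITY_SUBMODELS:
        if is_available(ehr, required):
            return name
    return None  # no submodel can be run on this record


# Usage: BMI is missing, so the higher-ranked submodel is skipped.
ehr = {"gender": "F", "age": 57, "bilirubin": 1.1, "paco2": 44.0, "pf_ratio": 150.0,
       "peep": 10.0, "platelet_count": 180.0, "tidal_volume": 420.0, "bmi": None}
chosen = select_mortality_submodel(ehr)  # "nine_input"
```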
In various embodiments, the mortality submodel comprises two or more sub-models that each output a prediction informative for determining an ARDS mortality rate. In various embodiments, a first sub-model of the two or more sub-models receives input variables comprising a first prediction for the ARDS subphenotype outputted by the subphenotyping submodel, and a second sub-model of the two or more sub-models receives input variables comprising a second prediction for the ARDS subphenotype outputted by the subphenotyping submodel. In various embodiments, the first sub-model receives input variables further comprising the subject’s bilirubin. In various embodiments, the second sub-model receives input variables further comprising the subject’s bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the subphenotyping submodel comprises two or more sub-models that each output a prediction of an ARDS subphenotype.
In various embodiments, implementation of the two or more sub-models comprises implementing unsupervised clustering algorithms. In various embodiments, the patient subphenotype classifier further comprises a pre-mortality model that outputs a prediction that serves as input to the mortality submodel. In various embodiments, implementation of the pre-mortality model comprises implementing a supervised machine learning algorithm.
In various embodiments, the mortality submodel receives, as input, 8 or more input variables. In various embodiments, the 8 or more input variables comprise at least the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), and heart rate. In various embodiments, the 8 or more input variables further comprise at least the subject’s airway pressure, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier comprises one of a first model, a second model, a third model, and a fourth model, wherein the first model receives, as input, 13 input variables, wherein the second model receives, as input, 8 input variables, wherein the third model receives, as input, 17 input variables, and wherein the fourth model receives, as input, 13 input variables. In various embodiments, the 13 input variables of the first model comprise the subject’s arterial pH, bicarbonate, creatinine, diastolic blood pressure (BP), FiO2, heart rate, highest mean arterial pressure, lowest mean arterial pressure, potassium, highest respiratory rate, lowest respiratory rate, SPO2, and systolic BP. In various embodiments, the 13 input variables of the first model comprise the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent diastolic blood pressure (BP), most recent FiO2, most recent heart rate, highest mean arterial pressure, lowest mean arterial pressure, most recent potassium, highest respiratory rate, lowest respiratory rate, most recent SPO2, and most recent systolic BP. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.67 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.40.
In various embodiments, the 8 input variables of the second model comprise the subject’s arterial pH, bicarbonate, creatinine, FiO2, heart rate, PaO2, mean arterial pressure, and respiratory rate. In various embodiments, the 8 input variables of the second model comprise the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent FiO2, most recent heart rate, most recent PaO2, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.69 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.42.
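The "most recent" and "lowest" qualifiers above imply that each input variable is derived by summarizing a series of timestamped EHR observations. A minimal Python sketch of that summarization for the 8 input variables of the second model follows; the field names, the `(timestamp, value)` representation, and the use of `None` for missing series are assumptions made for the example.

```python
# Aggregation rule per variable, following the second model's description.
SECOND_MODEL_AGGREGATION = {
    "arterial_ph": "most_recent",
    "bicarbonate": "lowest",
    "creatinine": "most_recent",
    "fio2": "most_recent",
    "heart_rate": "most_recent",
    "pao2": "most_recent",
    "mean_arterial_pressure": "most_recent",
    "respiratory_rate": "most_recent",
}


def build_features(observations):
    """Summarize {name: [(timestamp, value), ...]} into one value per variable."""
    features = {}
    for name, rule in SECOND_MODEL_AGGREGATION.items():
        series = observations.get(name, [])
        if not series:
            features[name] = None  # variable absent from the record
        elif rule == "most_recent":
            features[name] = max(series, key=lambda tv: tv[0])[1]
        elif rule == "lowest":
            features[name] = min(value for _, value in series)
    return features


# Usage with hypothetical observations (timestamps in hours since diagnosis).
obs = {
    "bicarbonate": [(1, 22.0), (2, 18.0), (3, 20.0)],
    "heart_rate": [(1, 90.0), (3, 110.0), (2, 100.0)],
}
features = build_features(obs)
# features["bicarbonate"] is 18.0 (lowest); features["heart_rate"] is 110.0 (latest)
```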
In various embodiments, the 17 input variables of the third model comprise the subject’s age, arterial pH, bicarbonate, bilirubin, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PaO2, positive end-expiratory pressure (PEEP), platelet count, tidal volume, mean arterial pressure, and respiratory rate. In various embodiments, the 17 input variables of the third model comprise the subject’s age, most recent arterial pH, lowest bicarbonate, highest bilirubin, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PaO2, most recent positive end-expiratory pressure (PEEP), lowest platelet count, lowest tidal volume, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.71 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.62. In various embodiments, the 13 input variables of the fourth model comprise the subject’s arterial pH, bicarbonate, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PEEP, platelet count, mean arterial pressure, and respiratory rate. In various embodiments, the 13 input variables of the fourth model comprise the subject’s most recent arterial pH, most recent bicarbonate, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PEEP, lowest platelet count, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.67 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.46.
In various embodiments, the classification of the subject is selected from three or more subphenotypes. In various embodiments, the three or more subphenotypes comprise a lower risk subphenotype, a medium risk subphenotype, and a high risk subphenotype. In various embodiments, the classification of the subject is selected from three subphenotypes by comparing a score to two threshold values. In various embodiments, the patient subphenotype classifier has at least an area under receiver-operator curve (AUROC) greater than or equal to 0.691.
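Comparing a score to two threshold values partitions subjects into three groups. The short Python sketch below illustrates the idea; the cutoff values 0.3 and 0.6, the function name, and the convention that a score equal to a cutoff falls into the higher-risk group are all assumptions, since the text does not specify them.

```python
def risk_subphenotype(score, low_cutoff, high_cutoff):
    """Map a score to one of three risk subphenotypes using two thresholds."""
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be smaller than high_cutoff")
    if score < low_cutoff:
        return "lower risk"
    if score < high_cutoff:
        return "medium risk"
    return "high risk"


# Usage with hypothetical cutoffs.
groups = [risk_subphenotype(s, low_cutoff=0.3, high_cutoff=0.6)
          for s in (0.10, 0.45, 0.80)]
# groups == ["lower risk", "medium risk", "high risk"]
```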
In various embodiments, the patient subphenotype classifier is trained using a training dataset comprising patient data from one or more clinical trial datasets. In various embodiments, the one or more clinical trial datasets are any of ARMA dataset, KARMA dataset, LARMA dataset, ALVEOLI dataset, EDEN dataset, FACTT dataset, SAILS dataset, ROSE dataset, eICU-CRD dataset, and the Brazilian ART dataset. In various embodiments, the patient data is derived from a sub-cohort of patients of the one or more clinical trial datasets, wherein the sub-cohort of patients are characterized by having a ratio of partial pressure of arterial oxygen to the fraction of inspired oxygen (P/F ratio) of less than or equal to 200. In various embodiments, the patient data is derived from a sub-cohort of patients of the one or more clinical trial datasets, wherein the sub-cohort of patients are characterized by having a ratio of partial pressure of arterial oxygen to the fraction of inspired oxygen (P/F ratio) of less than or equal to 300.
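The sub-cohort selection by P/F ratio amounts to a simple filter over patient records. The following Python sketch assumes PaO2 is recorded in mmHg and FiO2 as a fraction between 0.21 and 1.0; the record layout and function names are hypothetical.

```python
def pf_ratio(pao2_mmhg, fio2_fraction):
    """PaO2/FiO2, with PaO2 in mmHg and FiO2 as a fraction (0.21 to 1.0)."""
    if not 0.0 < fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction in (0, 1]")
    return pao2_mmhg / fio2_fraction


def select_subcohort(patients, max_pf=200.0):
    """Keep patients whose P/F ratio is less than or equal to max_pf."""
    return [p for p in patients if pf_ratio(p["pao2"], p["fio2"]) <= max_pf]


# Usage with hypothetical records.
patients = [
    {"id": "p1", "pao2": 80.0, "fio2": 0.5},   # P/F = 160
    {"id": "p2", "pao2": 100.0, "fio2": 0.4},  # P/F = 250
    {"id": "p3", "pao2": 95.0, "fio2": 0.25},  # P/F = 380
]
cohort_200 = [p["id"] for p in select_subcohort(patients, max_pf=200.0)]
cohort_300 = [p["id"] for p in select_subcohort(patients, max_pf=300.0)]
```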
In various embodiments, the two or more subphenotypes comprise subphenotype A and subphenotype B that are characterized by differences in expression levels in one or more biomarkers. In various embodiments, the one or more biomarkers comprise one or more of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, or von Willebrand factor. In various embodiments, the one or more biomarkers comprise each of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, and von Willebrand factor.
Additionally disclosed herein is a method for identifying a mortality prognosis for a subject, the method comprising: obtaining a classification of the subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using methods disclosed herein; and identifying a mortality prognosis for the subject based at least in part on the classification, wherein responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes, the mortality prognosis identified for the subject comprises high mortality risk, and wherein responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes, the mortality prognosis identified for the subject comprises low mortality risk. In various embodiments, low mortality risk comprises at least one of reduced risk of hospital mortality, reduced risk of ICU mortality, reduced risk of 28-day mortality, reduced risk of 90-day mortality, reduced risk of 180-day mortality, and reduced risk of 6-month mortality relative to high mortality risk. In various embodiments, low mortality risk further comprises positive patient outcome, wherein high mortality risk further comprises negative patient outcome, and wherein positive patient outcome comprises at least one of shorter hospital length of stay, shorter ICU length of stay and more ventilator-free days relative to negative patient outcome.
Additionally disclosed herein is a method for identifying a therapy recommendation for a subject, the method comprising: obtaining a classification of a subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using methods disclosed herein; and identifying a therapy recommendation for the subject based at least in part on the classification, wherein responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes, the therapy recommendation identified for the subject comprises one or more of neuromuscular blockade (NMB) therapy or no NMB therapy, high PEEP or low PEEP, no treatment or methylprednisolone, dexamethasone, no lisofylline, ketoconazole, catheter and fluid treatment, recruitment maneuver, statins, or full or trophic enteral feeding and wherein responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes, the therapy recommendation identified for the subject comprises one or more of NMB therapy, low PEEP therapy, no methylprednisolone, no treatment or dexamethasone, no treatment or lisofylline, no treatment or ketoconazole, no combination of catheter and fluid treatment, no recruitment maneuver, statins as a preemptive therapy, or full enteral feeding.
Additionally disclosed herein is a method for identifying candidate subjects to be provided a therapy, the method comprising: for one or more subjects, obtaining a classification of the subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using methods disclosed herein; and determining whether the subject is a candidate subject based at least in part on the classification. In various embodiments, the therapy is a neuromuscular blockade (NMB) therapy, and wherein determining whether the subject is a candidate subject comprises determining that the subject is a likely responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a neuromuscular blockade (NMB) therapy, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a low positive end-expiratory pressure (PEEP) treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a high positive end-expiratory pressure (PEEP) treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. 
In various embodiments, the therapy is a corticosteroid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a corticosteroid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the corticosteroid treatment is methylprednisolone or dexamethasone. In various embodiments, the therapy is a lisofylline treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a lisofylline treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a ketoconazole treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a pulmonary artery catheter and liberal fluid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes.
In various embodiments, the therapy is a pulmonary artery catheter and liberal fluid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the catheter and fluid treatment comprises a central venous catheter line treatment or a pulmonary artery catheter line treatment. In various embodiments, the therapy is a recruitment maneuver, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a recruitment maneuver, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a statin treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a preemptive statin treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. 
In various embodiments, the therapy is full enteral feeding, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is trophic enteral feeding, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes.
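The candidate-subject embodiments above pair each therapy with a subphenotype and a likely or unlikely responder determination, which can be represented as a lookup table. The Python sketch below only transcribes those pairings for illustration; the key strings and function name are hypothetical, and pairings the text does not recite return `None`.

```python
LIKELY = "likely responder"
UNLIKELY = "unlikely responder"

# (therapy, subphenotype) -> determination, transcribed from the embodiments above.
RESPONSE_TABLE = {
    ("neuromuscular blockade", "A"): LIKELY,
    ("neuromuscular blockade", "B"): UNLIKELY,
    ("low PEEP", "A"): LIKELY,
    ("high PEEP", "B"): LIKELY,
    ("corticosteroid", "B"): LIKELY,
    ("corticosteroid", "A"): UNLIKELY,
    ("lisofylline", "B"): UNLIKELY,
    ("lisofylline", "A"): LIKELY,
    ("ketoconazole", "B"): LIKELY,
    ("pulmonary artery catheter and liberal fluid", "B"): LIKELY,
    ("pulmonary artery catheter and liberal fluid", "A"): UNLIKELY,
    ("recruitment maneuver", "B"): LIKELY,
    ("recruitment maneuver", "A"): UNLIKELY,
    ("statin", "B"): LIKELY,
    ("preemptive statin", "A"): LIKELY,
    ("full enteral feeding", "A"): LIKELY,
    ("trophic enteral feeding", "B"): LIKELY,
}


def candidate_determination(therapy, subphenotype):
    """Return the recited determination, or None when the pairing is not recited."""
    return RESPONSE_TABLE.get((therapy, subphenotype))


def is_candidate(therapy, subphenotype):
    """Treat a subject as a candidate only when recited as a likely responder."""
    return candidate_determination(therapy, subphenotype) == LIKELY


# Usage
nmb_a = candidate_determination("neuromuscular blockade", "A")  # "likely responder"
keto_a = candidate_determination("ketoconazole", "A")           # None (not recited)
```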
Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained electronic health record (EHR) data for a subject exhibiting acute respiratory distress syndrome (ARDS); and determine a classification of the subject selected from two or more subphenotypes by analyzing, using a patient subphenotype classifier, the EHR data for the subject without analyzing biomarker levels of the subject. In various embodiments, the patient subphenotype classifier receives one or more input variables comprising heart rate, mean arterial pressure, and respiratory rate. In various embodiments, the patient subphenotype classifier receives each of the input variables of heart rate, mean arterial pressure, and respiratory rate. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising arterial pH, partial pressure of oxygen, and bicarbonate. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising arterial pH, partial pressure of oxygen, and bicarbonate. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising fraction of inspired oxygen, creatinine, and bilirubin. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising fraction of inspired oxygen, creatinine, and bilirubin. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising partial pressure of carbon dioxide, PaO2/FiO2, platelet count, age, gender, positive end-expiratory pressure, and tidal volume.
In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising partial pressure of carbon dioxide, PaO2/FiO2, platelet count, age, gender, positive end-expiratory pressure, and tidal volume. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising body mass index, plateau pressure, minute ventilation, and vasopressor use in prior 24 hours. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising body mass index, plateau pressure, minute ventilation, and vasopressor use in prior 24 hours.
In various embodiments, the patient subphenotype classifier comprises a subphenotyping submodel that outputs a prediction for an ARDS subphenotype. In various embodiments, the patient subphenotype classifier comprises a mortality submodel that outputs a prediction of an ARDS mortality rate. In various embodiments, the patient subphenotype classifier comprises: (A) a subphenotyping submodel that outputs a prediction for an ARDS subphenotype; and (B) a mortality submodel that outputs a prediction of an ARDS mortality rate. In various embodiments, the prediction for the ARDS subphenotype outputted by the subphenotyping submodel serves as an input to the mortality submodel. In various embodiments, the subphenotyping submodel receives one or more input variables comprising the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the subphenotyping submodel receives each of the input variables of the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, implementation of the subphenotyping submodel comprises implementing an unsupervised clustering algorithm. In various embodiments, the mortality submodel receives input variables comprising the subject’s gender and age. In various embodiments, the mortality submodel receives input variables comprising the subject’s bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the mortality submodel receives input variables comprising the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2).
In various embodiments, the mortality submodel receives 10 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, tidal volume, and BMI. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.689 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.650.
In various embodiments, the mortality submodel receives 9 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.673 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.668.
In various embodiments, the mortality submodel receives 12 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.658 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.597. In various embodiments, the mortality submodel receives 11 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.643 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.532.
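As a non-limiting illustration of the performance criteria recited above, the following sketch computes an AUROC and an AUPRC from predicted scores and observed outcomes, and then checks whether at least one of the two meets a stated floor. It assumes the pairwise (Mann-Whitney) formulation for AUROC and average precision as the AUPRC estimate; the function names and the example scores are hypothetical.

```python
def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auprc(y_true, scores):
    """Average precision: mean of the precision values observed at the rank
    of each positive, with subjects sorted by descending score."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("at least one positive is required")
    return sum(precisions) / len(precisions)

def meets_at_least_one(auroc_value, auprc_value, auroc_floor, auprc_floor):
    """True when either metric reaches its floor."""
    return auroc_value >= auroc_floor or auprc_value >= auprc_floor

# Hypothetical outcomes (1 = died) and predicted mortality scores.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]

roc = auroc(outcomes, predicted)
prc = auprc(outcomes, predicted)
passes = meets_at_least_one(roc, prc, auroc_floor=0.689, auprc_floor=0.650)
```

With these example values the AUROC falls just short of the 0.689 floor while the AUPRC exceeds 0.650, so the "at least one of" condition is still satisfied.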
In various embodiments, implementation of the mortality submodel comprises implementing a supervised machine learning algorithm. In various embodiments, the instructions that cause the processor to determine the classification of the subject based on the EHR data using the patient subphenotype classifier further comprises instructions that, when executed by the processor, cause the processor to: determine that data elements of a higher rank mortality submodel are unavailable in the EHR data; and determine that data elements of the mortality submodel are available in the EHR data. In various embodiments, the instructions that cause the processor to determine the classification of the subject based on the EHR data using the patient subphenotype classifier further comprises instructions that, when executed by the processor, cause the processor to implement the mortality submodel responsive to determining that data elements of the mortality submodel are available in the EHR data. In various embodiments, the mortality submodel comprises two or more sub-models that each outputs a prediction informative for determining an ARDS mortality rate. In various embodiments, the first sub-model receives input variables comprising a first prediction for the ARDS subphenotype outputted by the subphenotyping submodel and the second sub-model receives input variables comprising a second prediction for the ARDS subphenotype outputted by the subphenotyping submodel. In various embodiments, the first sub-model receives input variables further comprising the subject’s bilirubin. In various embodiments, the second sub-model receives input variables further comprising the subject’s bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the subphenotyping submodel comprises two or more sub-models that each outputs a prediction of an ARDS subphenotype.
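As a non-limiting illustration, the selection of a mortality submodel based on which data elements are available in the EHR data can be sketched as follows. The ranking order, the submodel names, and the field names are assumptions made for the example; only the variable groupings are drawn from the embodiments above.

```python
# Hypothetical ranking, highest rank first. Each entry pairs a submodel name
# with the data elements that submodel requires.
COMMON = {"subphenotype", "gender", "age"}
RANKED_MORTALITY_SUBMODELS = [
    ("ten_variable", COMMON | {"bilirubin", "paco2", "pf_ratio", "peep",
                               "platelet_count", "tidal_volume", "bmi"}),
    ("nine_variable", COMMON | {"bilirubin", "paco2", "pf_ratio", "peep",
                                "platelet_count", "tidal_volume"}),
    ("eleven_variable", COMMON | {"arterial_ph", "bicarbonate", "creatinine",
                                  "fio2", "heart_rate", "arterial_pressure",
                                  "respiration_rate", "pao2"}),
]

def select_mortality_submodel(ehr):
    """Return the highest-ranked submodel whose required data elements are
    all present (non-missing) in the EHR record, or None if none qualifies."""
    available = {name for name, value in ehr.items() if value is not None}
    for model_name, required in RANKED_MORTALITY_SUBMODELS:
        if required <= available:   # subset test: everything required is there
            return model_name
    return None

# Example record in which BMI was never charted.
all_fields = set().union(*(req for _, req in RANKED_MORTALITY_SUBMODELS))
record = {field: 1.0 for field in all_fields}
record["bmi"] = None

chosen = select_mortality_submodel(record)
```

Here the higher-ranked ten-variable submodel is skipped because BMI is unavailable, and the next submodel whose data elements are all present is implemented instead.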
In various embodiments, implementation of the two or more sub-models comprises implementing unsupervised clustering algorithms. In various embodiments, the patient subphenotype classifier further comprises a pre-mortality model that outputs a prediction that serves as input to the mortality submodel. In various embodiments, implementation of the pre-mortality model comprises implementing a supervised machine learning algorithm.
In various embodiments, the mortality submodel receives, as input, 8 or more input variables. In various embodiments, the 8 or more input variables comprise at least the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), and heart rate. In various embodiments, the 8 or more input variables further comprise at least the subject’s airway pressure, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier comprises one of a first model, a second model, a third model, and a fourth model, wherein the first model receives, as input, 13 input variables, wherein the second model receives, as input, 8 input variables, wherein the third model receives, as input, 17 input variables, and wherein the fourth model receives, as input, 13 input variables. In various embodiments, the 13 input variables of the first model comprise the subject’s arterial pH, bicarbonate, creatinine, diastolic blood pressure (BP), FiO2, heart rate, highest mean arterial pressure, lowest mean arterial pressure, potassium, highest respiratory rate, lowest respiratory rate, SPO2, and systolic BP. In various embodiments, the 13 input variables of the first model comprise the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent diastolic blood pressure (BP), most recent FiO2, most recent heart rate, highest mean arterial pressure, lowest mean arterial pressure, most recent potassium, highest respiratory rate, lowest respiratory rate, most recent SPO2, and most recent systolic BP. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.67 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.40.
In various embodiments, the 8 input variables of the second model comprise the subject’s arterial pH, bicarbonate, creatinine, FiO2, heart rate, PaO2, mean arterial pressure, and respiratory rate. In various embodiments, the 8 input variables of the second model comprise the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent FiO2, most recent heart rate, most recent PaO2, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.69 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.42. In various embodiments, the 17 input variables of the third model comprise the subject’s age, arterial pH, bicarbonate, bilirubin, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PaO2, positive end-expiratory pressure (PEEP), platelet count, tidal volume, mean arterial pressure, and respiratory rate. In various embodiments, the 17 input variables of the third model comprise the subject’s age, most recent arterial pH, lowest bicarbonate, highest bilirubin, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PaO2, most recent positive end-expiratory pressure (PEEP), lowest platelet count, lowest tidal volume, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.71 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.62. 
In various embodiments, the 13 input variables of the fourth model comprise the subject’s arterial pH, bicarbonate, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PEEP, platelet count, mean arterial pressure, and respiratory rate. In various embodiments, the 13 input variables of the fourth model comprise the subject’s most recent arterial pH, most recent bicarbonate, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PEEP, lowest platelet count, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.67 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.46.
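As a non-limiting illustration, the "most recent", "lowest", and "highest" qualifiers recited for the input variables can be applied to time-stamped EHR observations as sketched below. The sketch uses the 8 input variables of the second model; the field names, the (hours, value) tuple layout, and the example measurements are assumptions.

```python
def most_recent(observations):
    """Value with the latest timestamp."""
    return max(observations, key=lambda obs: obs[0])[1]

def lowest(observations):
    return min(value for _, value in observations)

def highest(observations):
    return max(value for _, value in observations)

# Aggregation rule per input variable of the second model.
SECOND_MODEL_RULES = {
    "arterial_ph": most_recent,
    "bicarbonate": lowest,
    "creatinine": most_recent,
    "fio2": most_recent,
    "heart_rate": most_recent,
    "pao2": most_recent,
    "mean_arterial_pressure": most_recent,
    "respiratory_rate": most_recent,
}

def build_inputs(timeseries, rules):
    """Collapse each variable's (hours_since_diagnosis, value) observations
    into the single value the model expects."""
    missing = [name for name in rules if not timeseries.get(name)]
    if missing:
        raise KeyError(f"no observations for: {missing}")
    return {name: rule(timeseries[name]) for name, rule in rules.items()}

# Hypothetical observations, not necessarily in chronological order.
ehr_timeseries = {
    "arterial_ph": [(1, 7.31), (6, 7.36)],
    "bicarbonate": [(1, 19.0), (6, 22.0), (3, 17.5)],
    "creatinine": [(2, 1.4)],
    "fio2": [(1, 0.8), (6, 0.6)],
    "heart_rate": [(6, 104), (1, 118)],
    "pao2": [(1, 62), (6, 75)],
    "mean_arterial_pressure": [(1, 64), (6, 71)],
    "respiratory_rate": [(1, 28), (6, 24)],
}

model_inputs = build_inputs(ehr_timeseries, SECOND_MODEL_RULES)
```

Keeping the aggregation rule next to each variable name makes it straightforward to define the first, third, and fourth models by swapping in a different rule table.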
In various embodiments, the classification of the subject is selected from three or more subphenotypes. In various embodiments, the three or more subphenotypes comprise a lower risk subphenotype, a medium risk subphenotype, and a high risk subphenotype. In various embodiments, the classification of the subject is selected from three subphenotypes by comparing a score to two threshold values. In various embodiments, the patient subphenotype classifier has an area under receiver-operator curve (AUROC) greater than or equal to 0.691.
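As a non-limiting illustration, selecting one of three subphenotypes by comparing a score to two threshold values can be sketched as follows. The threshold values and the handling of scores that fall exactly on a threshold are assumptions; this disclosure does not fix either.

```python
def risk_subphenotype(score, low_cut=0.3, high_cut=0.6):
    """Map a score to one of three subphenotypes using two thresholds.
    The cut points are placeholders; a score equal to a cut point is
    assigned to the higher-risk side."""
    if not low_cut < high_cut:
        raise ValueError("low_cut must be smaller than high_cut")
    if score < low_cut:
        return "lower risk"
    if score < high_cut:
        return "medium risk"
    return "high risk"

tiers = [risk_subphenotype(s) for s in (0.10, 0.30, 0.59, 0.60, 0.95)]
```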
In various embodiments, the patient subphenotype classifier is trained using a training dataset comprising patient data from one or more clinical trial datasets. In various embodiments, the one or more clinical trial datasets are any of ARMA dataset, KARMA dataset, LARMA dataset, ALVEOLI dataset, EDEN dataset, FACTT dataset, SAILS dataset, eICU-CRD dataset, and the Brazilian ART dataset. In various embodiments, the patient data is derived from a sub-cohort of patients of the one or more clinical trial datasets, wherein the sub-cohort of patients is characterized by having a ratio of arterial oxygen concentration to the fraction of inspired oxygen (P/F ratio) of less than or equal to 200. In various embodiments, the patient data is derived from a sub-cohort of patients of the one or more clinical trial datasets, wherein the sub-cohort of patients is characterized by having a ratio of arterial oxygen concentration to the fraction of inspired oxygen (P/F ratio) of less than or equal to 300.
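As a non-limiting illustration, deriving a training sub-cohort by P/F ratio can be sketched as follows. The sketch assumes PaO2 is recorded in mmHg and FiO2 as a fraction between 0.21 and 1.0; the record layout and the patient values are hypothetical.

```python
def pf_ratio(pao2_mmhg, fio2_fraction):
    """P/F ratio: PaO2 (assumed mmHg) divided by FiO2 (assumed fraction)."""
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 is expected as a fraction between 0.21 and 1.0")
    return pao2_mmhg / fio2_fraction

def select_sub_cohort(patients, max_pf):
    """Keep patients whose P/F ratio is less than or equal to max_pf."""
    return [p for p in patients
            if pf_ratio(p["pao2"], p["fio2"]) <= max_pf]

# Hypothetical trial records.
trial_patients = [
    {"id": "p1", "pao2": 60.0, "fio2": 0.5},    # P/F near 120
    {"id": "p2", "pao2": 100.0, "fio2": 0.4},   # P/F near 250
    {"id": "p3", "pao2": 95.0, "fio2": 0.3},    # P/F above 300
]

cohort_200 = [p["id"] for p in select_sub_cohort(trial_patients, max_pf=200)]
cohort_300 = [p["id"] for p in select_sub_cohort(trial_patients, max_pf=300)]
```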
In various embodiments, the two or more subphenotypes comprise subphenotype A and subphenotype B that are characterized by differences in expression levels in one or more biomarkers. In various embodiments, the one or more biomarkers comprise one or more of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, or von Willebrand factor. In various embodiments, the one or more biomarkers comprise each of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, and von Willebrand factor.
Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain a classification of the subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using a non-transitory computer readable medium disclosed herein; and identify a mortality prognosis for the subject based at least in part on the classification, wherein responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes, the mortality prognosis identified for the subject comprises high mortality risk, and wherein responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes, the mortality prognosis identified for the subject comprises low mortality risk. In various embodiments, low mortality risk comprises at least one of reduced risk of hospital mortality, reduced risk of ICU mortality, reduced risk of 28-day mortality, reduced risk of 90-day mortality, reduced risk of 180-day mortality, and reduced risk of 6-month mortality relative to high mortality risk. In various embodiments, low mortality risk further comprises positive patient outcome, wherein high mortality risk further comprises negative patient outcome, and wherein positive patient outcome comprises at least one of shorter hospital length of stay, shorter ICU length of stay and more ventilator-free days relative to negative patient outcome.
Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain a classification of a subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using a non-transitory computer readable medium disclosed herein; and identify a therapy recommendation for the subject based at least in part on the classification, wherein responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes, the therapy recommendation identified for the subject comprises one or more of neuromuscular blockade (NMB) therapy or no NMB therapy, high PEEP or low PEEP, no treatment or methylprednisolone, dexamethasone, no lisofylline, ketoconazole, catheter and fluid treatment, recruitment maneuver, statins, or full or trophic enteral feeding and wherein responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes, the therapy recommendation identified for the subject comprises one or more of NMB therapy, low PEEP therapy, no methylprednisolone, no treatment or dexamethasone, no treatment or lisofylline, no treatment or ketoconazole, no combination of catheter and fluid treatment, no recruitment maneuver, statins as a preemptive therapy, or full enteral feeding.
Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: for one or more subjects, obtain a classification of the subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using a non-transitory computer readable medium disclosed herein; and determine whether the subject is a candidate subject based at least in part on the classification. In various embodiments, the therapy is a neuromuscular blockade (NMB) therapy, and wherein determining whether the subject is a candidate subject comprises determining that the subject is a likely responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a neuromuscular blockade (NMB) therapy, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a low positive end-expiratory pressure (PEEP) treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a high positive end-expiratory pressure (PEEP) treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. 
In various embodiments, the therapy is a corticosteroid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a corticosteroid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the corticosteroid treatment is methylprednisolone or dexamethasone. In various embodiments, the therapy is a lisofylline treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a lisofylline treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a ketoconazole treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a pulmonary artery catheter and liberal fluid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes.
In various embodiments, the therapy is a pulmonary artery catheter and liberal fluid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the catheter and fluid treatment comprises a central venous catheter line treatment or a pulmonary artery catheter line treatment. In various embodiments, the therapy is a recruitment maneuver, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a recruitment maneuver, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a statin treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a preemptive statin treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. 
In various embodiments, the therapy is full enteral feeding, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is trophic enteral feeding, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes.
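As a non-limiting illustration, the candidate-subject determinations recited above can be encoded as a lookup keyed by therapy and subphenotype. The table transcribes only the pairings stated in the foregoing embodiments; the key strings and the function name are assumptions, and pairings that are not recited return None rather than an inferred value.

```python
LIKELY, UNLIKELY = "likely responder", "unlikely responder"

# (therapy, subphenotype) -> determination, as recited in the embodiments.
RESPONDER_TABLE = {
    ("neuromuscular blockade", "A"): LIKELY,
    ("neuromuscular blockade", "B"): UNLIKELY,
    ("low PEEP", "A"): LIKELY,
    ("high PEEP", "B"): LIKELY,
    ("corticosteroid", "B"): LIKELY,
    ("corticosteroid", "A"): UNLIKELY,
    ("lisofylline", "B"): UNLIKELY,
    ("lisofylline", "A"): LIKELY,
    ("ketoconazole", "B"): LIKELY,
    ("pulmonary artery catheter and liberal fluid", "B"): LIKELY,
    ("pulmonary artery catheter and liberal fluid", "A"): UNLIKELY,
    ("recruitment maneuver", "B"): LIKELY,
    ("recruitment maneuver", "A"): UNLIKELY,
    ("statin", "B"): LIKELY,
    ("preemptive statin", "A"): LIKELY,
    ("full enteral feeding", "A"): LIKELY,
    ("trophic enteral feeding", "B"): LIKELY,
}

def candidate_determination(therapy, subphenotype):
    """Return the recited determination, or None when the pairing is not
    addressed by the embodiments."""
    return RESPONDER_TABLE.get((therapy, subphenotype))

def likely_therapies(subphenotype):
    """List the therapies for which the subphenotype is a likely responder."""
    return sorted(therapy for (therapy, sub), result in RESPONDER_TABLE.items()
                  if sub == subphenotype and result == LIKELY)

nmb_for_a = candidate_determination("neuromuscular blockade", "A")
keto_for_a = candidate_determination("ketoconazole", "A")
```

Returning None for unrecited pairings, such as ketoconazole with subphenotype A, keeps the lookup from implying a determination that the embodiments do not make.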
Additionally, disclosed herein is a system comprising: a storage memory configured to store electronic health record (EHR) data for a subject exhibiting acute respiratory distress syndrome (ARDS); and a processor communicatively coupled to the storage memory to determine a classification of the subject selected from two or more subphenotypes by analyzing, using a patient subphenotype classifier, the EHR data for the subject without analyzing biomarker levels of the subject. In various embodiments, the patient subphenotype classifier receives one or more input variables comprising heart rate, mean arterial pressure, and respiratory rate. In various embodiments, the patient subphenotype classifier receives each of the input variables of heart rate, mean arterial pressure, and respiratory rate. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising arterial pH, partial pressure of oxygen, and bicarbonate. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising arterial pH, partial pressure of oxygen, and bicarbonate. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising inspired fraction of oxygen, creatinine, and bilirubin. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising inspired fraction of oxygen, creatinine, and bilirubin. In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising partial pressure of carbon dioxide, PaO2/FiO2, platelet count, age, gender, positive end-expiratory pressure, and tidal volume. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising partial pressure of carbon dioxide, PaO2/FiO2, platelet count, age, gender, positive end-expiratory pressure, and tidal volume.
In various embodiments, the patient subphenotype classifier further receives one or more input variables comprising body mass index, plateau pressure, minute ventilation, and vasopressor use in prior 24 hours. In various embodiments, the patient subphenotype classifier further receives each of the input variables comprising body mass index, plateau pressure, minute ventilation, and vasopressor use in prior 24 hours. In various embodiments, the patient subphenotype classifier comprises a subphenotyping submodel that outputs a prediction for an ARDS subphenotype. In various embodiments, the patient subphenotype classifier comprises a mortality submodel that outputs a prediction of an ARDS mortality rate.
In various embodiments, the patient subphenotype classifier comprises: (A) a subphenotyping submodel that outputs a prediction for an ARDS subphenotype; and (B) a mortality submodel that outputs a prediction of an ARDS mortality rate. In various embodiments, the prediction for the ARDS subphenotype outputted by the subphenotyping submodel serves as an input to the mortality submodel. In various embodiments, the subphenotyping submodel receives one or more input variables comprising the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the subphenotyping submodel receives each of the input variables of the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FIO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, implementation of the subphenotyping submodel comprises implementing an unsupervised clustering algorithm. In various embodiments, the mortality submodel receives input variables comprising the subject’s gender and age. In various embodiments, the mortality submodel receives input variables comprising the subject’s bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the mortality submodel receives input variables comprising the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2).
In various embodiments, the mortality submodel receives 10 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, tidal volume, and BMI. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.689 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.650. In various embodiments, the mortality submodel receives 9 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.673 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.668.
In various embodiments, the mortality submodel receives 12 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, bilirubin, arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FIO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.658 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.597. In various embodiments, the mortality submodel receives 11 or more input variables comprising the prediction for the ARDS subphenotype outputted by the subphenotyping submodel, the subject’s gender, age, arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), heart rate, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.643 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.532. In various embodiments, implementation of the mortality submodel comprises implementing a supervised machine learning algorithm. In various embodiments, the instructions that cause the processor to determine the classification of the subject based on the EHR data using the patient subphenotype classifier further comprises instructions that, when executed by the processor, cause the processor to: determine that data elements of a higher rank mortality submodel are unavailable in the EHR data; and determine that data elements of the mortality submodel are available in the EHR data. 
In various embodiments, the instructions that cause the processor to determine the classification of the subject based on the EHR data using the patient subphenotype classifier further comprises instructions that, when executed by the processor, cause the processor to implement the mortality submodel responsive to determining that data elements of the mortality submodel are available in the EHR data. In various embodiments, the mortality submodel comprises two or more sub-models that each outputs a prediction informative for determining an ARDS mortality rate. In various embodiments, the first sub-model receives input variables comprising a first prediction for the ARDS subphenotype outputted by the subphenotyping submodel and the second sub-model receives input variables comprising a second prediction for the ARDS subphenotype outputted by the subphenotyping submodel. In various embodiments, the first sub-model receives input variables further comprising the subject’s bilirubin. In various embodiments, the second sub-model receives input variables further comprising the subject’s bilirubin, partial pressure of carbon dioxide (PaCO2), PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, and tidal volume. In various embodiments, the subphenotyping submodel comprises two or more sub-models that each outputs a prediction of an ARDS subphenotype. In various embodiments, implementation of the two or more sub-models comprises implementing unsupervised clustering algorithms. In various embodiments, the patient subphenotype classifier further comprises a pre-mortality model that outputs a prediction that serves as input to the mortality submodel. In various embodiments, implementation of the pre-mortality model comprises implementing a supervised machine learning algorithm.
In various embodiments, the mortality submodel receives, as input, 8 or more input variables. In various embodiments, the 8 or more input variables comprise at least the subject’s arterial pH, bicarbonate, creatinine, fraction of inspired oxygen (FiO2), and heart rate. In various embodiments, the 8 or more input variables further comprise at least the subject’s airway pressure, arterial pressure, respiration rate, and partial pressure of oxygen (PaO2). In various embodiments, the patient subphenotype classifier comprises one of a first model, a second model, a third model, and a fourth model, wherein the first model receives, as input, 13 input variables, wherein the second model receives, as input, 8 input variables, wherein the third model receives, as input, 17 input variables, and wherein the fourth model receives, as input, 13 input variables. In various embodiments, the 13 input variables of the first model comprise the subject’s arterial pH, bicarbonate, creatinine, diastolic blood pressure (BP), FiO2, heart rate, highest mean arterial pressure, lowest mean arterial pressure, potassium, highest respiratory rate, lowest respiratory rate, SPO2, and systolic BP. In various embodiments, the 13 input variables of the first model comprise the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent diastolic blood pressure (BP), most recent FiO2, most recent heart rate, highest mean arterial pressure, lowest mean arterial pressure, most recent potassium, highest respiratory rate, lowest respiratory rate, most recent SPO2, and most recent systolic BP. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.67 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.40. 
In various embodiments, the 8 input variables of the second model comprise the subject’s arterial pH, bicarbonate, creatinine, FiO2, heart rate, PaO2, mean arterial pressure, and respiratory rate. In various embodiments, the 8 input variables of the second model comprise the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent FiO2, most recent heart rate, most recent PaO2, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.69 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.42. In various embodiments, the 17 input variables of the third model comprise the subject’s age, arterial pH, bicarbonate, bilirubin, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PaO2, positive end-expiratory pressure (PEEP), platelet count, tidal volume, mean arterial pressure, and respiratory rate. In various embodiments, the 17 input variables of the third model comprise the subject’s age, most recent arterial pH, lowest bicarbonate, highest bilirubin, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PaO2, most recent positive end-expiratory pressure (PEEP), lowest platelet count, lowest tidal volume, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.71 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.62. 
In various embodiments, the 13 input variables of the fourth model comprise the subject’s arterial pH, bicarbonate, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PEEP, platelet count, mean arterial pressure, and respiratory rate. In various embodiments, the 13 input variables of the fourth model comprise the subject’s most recent arterial pH, most recent bicarbonate, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PEEP, lowest platelet count, most recent mean arterial pressure, and most recent respiratory rate. In various embodiments, the patient subphenotype classifier has at least one of an area under receiver-operator curve (AUROC) greater than or equal to 0.67 and an area under the precision-recall curve (AUPRC) greater than or equal to 0.46.
In various embodiments, the classification of the subject is selected from three or more subphenotypes. In various embodiments, the three or more subphenotypes comprise a lower risk subphenotype, a medium risk subphenotype, and a high risk subphenotype. In various embodiments, the classification of the subject is selected from the three subphenotypes by comparing a score to two threshold values. In various embodiments, the patient subphenotype classifier has at least an area under receiver-operator curve (AUROC) greater than or equal to 0.691.
In various embodiments, the patient subphenotype classifier is trained using a training dataset comprising patient data from one or more clinical trial datasets. In various embodiments, the one or more clinical trial datasets are any of the ARMA dataset, KARMA dataset, LARMA dataset, ALVEOLI dataset, EDEN dataset, FACTT dataset, SAILS dataset, ROSE dataset, eICU-CRD dataset, and the Brazilian ART dataset. In various embodiments, the patient data is derived from a sub-cohort of patients of the one or more clinical trial datasets, wherein the sub-cohort of patients is characterized by having a ratio of arterial oxygen partial pressure to the fraction of inspired oxygen (P/F ratio) of less than or equal to 200. In various embodiments, the patient data is derived from a sub-cohort of patients of the one or more clinical trial datasets, wherein the sub-cohort of patients is characterized by having a ratio of arterial oxygen partial pressure to the fraction of inspired oxygen (P/F ratio) of less than or equal to 300.
In various embodiments, the two or more subphenotypes comprise subphenotype A and subphenotype B that are characterized by differences in expression levels of one or more biomarkers. In various embodiments, the one or more biomarkers comprise one or more of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, or von Willebrand factor. In various embodiments, the one or more biomarkers comprise each of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, and von Willebrand factor.
Additionally disclosed herein is a system comprising: a storage memory configured to store electronic health record (EHR) data for a subject exhibiting acute respiratory distress syndrome (ARDS); and a processor communicatively coupled to the storage memory to: obtain a classification of the subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using the system of any one of claims 183-249; and identify a mortality prognosis for the subject based at least in part on the classification, wherein responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes, the mortality prognosis identified for the subject comprises high mortality risk, and wherein responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes, the mortality prognosis identified for the subject comprises low mortality risk.
In various embodiments, low mortality risk comprises at least one of reduced risk of hospital mortality, reduced risk of ICU mortality, reduced risk of 28-day mortality, reduced risk of 90-day mortality, reduced risk of 180-day mortality, and reduced risk of 6-month mortality relative to high mortality risk. In various embodiments, low mortality risk further comprises positive patient outcome, wherein high mortality risk further comprises negative patient outcome, and wherein positive patient outcome comprises at least one of shorter hospital length of stay, shorter ICU length of stay and more ventilator-free days relative to negative patient outcome.
Additionally disclosed herein is a system comprising: a storage memory configured to store electronic health record (EHR) data for a subject exhibiting acute respiratory distress syndrome (ARDS); and a processor communicatively coupled to the storage memory to: obtain a classification of a subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using the system of any one of claims 183-249; and identify a therapy recommendation for the subject based at least in part on the classification, wherein responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes, the therapy recommendation identified for the subject comprises one or more of neuromuscular blockade (NMB) therapy or no NMB therapy, high PEEP or low PEEP, no treatment or methylprednisolone, dexamethasone, no lisofylline, ketoconazole, catheter and fluid treatment, recruitment maneuver, statins, or full or trophic enteral feeding and wherein responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes, the therapy recommendation identified for the subject comprises one or more of NMB therapy, low PEEP therapy, no methylprednisolone, no treatment or dexamethasone, no treatment or lisofylline, no treatment or ketoconazole, no combination of catheter and fluid treatment, no recruitment maneuver, statins as a preemptive therapy, or full enteral feeding.
Additionally disclosed herein is a system comprising: a storage memory configured to store electronic health record (EHR) data for a subject exhibiting acute respiratory distress syndrome (ARDS); and a processor communicatively coupled to the storage memory to: for one or more subjects, obtain a classification of the subject exhibiting acute respiratory distress syndrome (ARDS), the classification of the subject selected from two or more subphenotypes and determined using the system of any one of claims 183-249; and determine whether the subject is a candidate subject based at least in part on the classification.
In various embodiments, the therapy is a neuromuscular blockade (NMB) therapy, and wherein determining whether the subject is a candidate subject comprises determining that the subject is a likely responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a neuromuscular blockade (NMB) therapy, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a low positive end-expiratory pressure (PEEP) treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a high positive end-expiratory pressure (PEEP) treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a corticosteroid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a corticosteroid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. 
In various embodiments, the corticosteroid treatment is methylprednisolone or dexamethasone. In various embodiments, the therapy is a lisofylline treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a lisofylline treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a ketoconazole treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a pulmonary artery catheter and liberal fluid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a pulmonary artery catheter and liberal fluid treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the catheter and fluid treatment comprises a central venous catheter line treatment or a pulmonary artery catheter line treatment.
In various embodiments, the therapy is a recruitment maneuver, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a recruitment maneuver, and wherein determining whether the subject is a candidate subject comprises determining that the subject is unlikely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a statin treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes. In various embodiments, the therapy is a preemptive statin treatment, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a full enteral feeding, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype A from the two or more subphenotypes. In various embodiments, the therapy is a trophic enteral feeding, and wherein determining whether the subject is a candidate subject comprises determining that the subject is likely to be a responder responsive to the classification of the subject comprising subphenotype B from the two or more subphenotypes.
These and other features, aspects, and advantages of the present disclosure will become better understood with regard to the following description, and accompanying drawings, where:
The figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein can be employed without departing from the principles of the disclosure described herein.
In general, terms used in the claims and the specification are intended to be construed as having the plain meaning understood by a person of ordinary skill in the art. Certain terms are defined below to provide additional clarity. In case of conflict between the plain meaning and the provided definitions, the provided definitions are to be used.
The terms “patient” and “subject” are used interchangeably and encompass an organism, including mammals such as humans or non-humans (e.g., non-human primates, canines, felines, murines, bovines, equines, and porcines), whether in vivo, ex vivo, or in vitro, male or female.
The term “sample” can include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, such as a blood sample, taken from a subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, or intervention or other means known in the art. Examples of an aliquot of body fluid include amniotic fluid, aqueous humor, bile, lymph, breast milk, interstitial fluid, blood, blood plasma, cerumen (earwax), Cowper’s fluid (pre-ejaculatory fluid), chyle, chyme, female ejaculate, menses, mucus, saliva, urine, vomit, tears, vaginal lubrication, sweat, serum, semen, sebum, pus, pleural fluid, cerebrospinal fluid, synovial fluid, intracellular fluid, and vitreous humour.
The phrase “obtaining or having obtained EHR data” encompasses obtaining a set of data determined from at least one sample. Obtaining a dataset encompasses obtaining a sample and processing the sample to experimentally determine the data. The phrase also encompasses receiving a set of data, e.g., from a third party that has processed the sample to experimentally determine the dataset. Additionally, the phrase encompasses mining data from at least one database or at least one publication or a combination of databases and publications. A dataset can be obtained by one of skill in the art via a variety of known ways, including retrieval from a storage memory.
Any terms not directly defined herein shall be understood to have the meanings commonly associated with them as understood within the art of the disclosure. Certain terms are discussed herein to provide additional guidance to the practitioner in describing the compositions, devices, methods and the like of aspects of the disclosure, and how to make or use them. It will be appreciated that the same thing can be said in more than one way. Consequently, alternative language and synonyms can be used for any one or more of the terms discussed herein. No significance is to be placed upon whether or not a term is elaborated or discussed herein. Some synonyms or substitutable methods, materials and the like are provided. Recital of one or a few synonyms or equivalents does not exclude use of other synonyms or equivalents, unless it is explicitly stated. Use of examples, including examples of terms, is for illustrative purposes only and does not limit the scope and meaning of the aspects of the disclosure herein.
Additionally, as used in the specification, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.
In various embodiments, the subject 110 is an individual that was diagnosed with acute respiratory distress syndrome (ARDS). For example, the subject 110 may have been clinically diagnosed as having mild ARDS, moderate ARDS, or severe ARDS based on the Berlin definition. For example, a patient may have been clinically diagnosed with mild ARDS for exhibiting a decreased PaO2/FiO2 ratio between 201 and 300 mmHg. As another example, a patient may have been clinically diagnosed with moderate ARDS for exhibiting a decreased PaO2/FiO2 ratio between 101 and 200 mmHg. As another example, a patient may have been clinically diagnosed with severe ARDS for exhibiting a decreased PaO2/FiO2 ratio of less than or equal to 100 mmHg. In various embodiments, the individual may have been diagnosed with ARDS based on radiologic imaging (e.g., X-ray imaging) or other types of imaging (e.g., CT imaging or ultrasound imaging) that reveals pulmonary accumulation that results in symptoms of ARDS.
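The oxygenation bands described above can be illustrated with a minimal Python sketch. The function name `berlin_severity` is hypothetical and not part of the disclosure, and the sketch covers only the PaO2/FiO2 criterion, not the imaging or other clinical criteria of a diagnosis. Treating a ratio of exactly 100 mmHg as severe, and returning `None` above 300 mmHg, are assumptions of the sketch.

```python
from typing import Optional


def berlin_severity(pf_ratio_mmhg: float) -> Optional[str]:
    """Map a PaO2/FiO2 ratio (mmHg) to an ARDS severity label.

    Returns None when the ratio is above 300 mmHg, i.e. outside the
    oxygenation bands described in the text.
    """
    if pf_ratio_mmhg <= 0:
        raise ValueError("PaO2/FiO2 ratio must be positive")
    if pf_ratio_mmhg <= 100:  # assumption: exactly 100 counts as severe
        return "severe"
    if pf_ratio_mmhg <= 200:
        return "moderate"
    if pf_ratio_mmhg <= 300:
        return "mild"
    return None
```

For example, a subject with a PaO2 of 75 mmHg on an FiO2 of 0.5 has a ratio of 150 mmHg, which the sketch labels as moderate.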
Generally, the electronic health record system 120 stores electronic health record (EHR) data for one or more subjects (e.g., subject 110). For example, the electronic health record system 120 may be associated with a physician’s office, the emergency department of a hospital, the intensive care unit of a hospital, the ward of a hospital, a clinical laboratory, a research laboratory, a consumer medical device, a therapeutic device (e.g., an infusion pump), a monitoring device such as a wearable device (e.g., a heart rate monitor), or any other site. Different examples of EHR data are described further herein.
In particular embodiments, the electronic health record system 120 is operated by a party that interacts with the subject 110 (e.g., interacts with subject 110 by diagnosing the subject 110 with ARDS). For example, the electronic health record system 120 can be operated within a healthcare provider’s office and therefore, the electronic health record system 120 stores EHR data of a subject 110 that visits the healthcare provider. In various embodiments, the electronic health record system 120 is operated in a critical care setting. For example, the electronic health record system 120 can be operated within a hospital department (e.g., emergency department or intensive care unit in a hospital). Thus, the EHR data of the subject 110 can be obtained and stored by the electronic health record system 120 for subsequent analysis (e.g., by the patient classifier system 130) to identify a possible treatment for the subject 110. In various embodiments, the electronic health record system 120 serves as a repository that electronically records EHR data. Here, the electronic health record system 120 can serve as a third-party system that is remote from a location in which the subject 110 is observed and/or interacted with. In such embodiments, the electronic health record system 120 can receive the EHR data obtained from a subject 110.
In various embodiments, the electronic health record system 120 can be any of a private, public, and/or commercial source of EHR data. For example, the electronic health record system 120 can be a private medical and/or health record and/or middleware system including a patient care center record system, a clinical laboratory record system, a research laboratory record system, such as EPIC®, Cerner®, Allscripts®, MedMined™, Beaker®, and Data Innovations®, and any alternative private medical and/or health record and/or middleware system. In various embodiments, the electronic health record system 120 stores publicly- and/or commercially-available source of EHR data, including published medical record databases and scientific publications such as PhysioNet datasets including the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) datasets, Philips eICU datasets, and National Heart, Lung, and Blood Institute Biospecimen and Data Repository Information Coordinating Center (BioLINCC) datasets.
The patient classifier system 130 analyzes EHR data stored by the one or more electronic health record systems 120 and determines a treatment prediction 140 (e.g., a treatment prediction for the subject 110). In various embodiments, the patient classifier system 130 applies a patient subphenotype classifier to predict a classification for subject 110. According to the classification, the patient classifier system 130 can determine a treatment prediction 140 for the subject 110 that is likely to be efficacious. In various embodiments, a patient subphenotype classifier can be a machine-learned model. In such embodiments, the patient classifier system 130 may train the patient subphenotype classifier using training data and/or deploy the patient subphenotype classifier to analyze the EHR data of the subject 110.
In various embodiments, the patient classifier system 130 and the electronic health record system 120 are operated by different entities. For example, the electronic health record system 120 can be operated by a hospital or healthcare provider, and the patient classifier system 130 can be operated by a third party system that receives and analyzes EHR data stored by the electronic health record system 120. In such embodiments, the electronic health record system 120 transmits EHR data to the patient classifier system 130. The patient classifier system 130 deploys a patient subphenotype classifier and generates a prediction (e.g., treatment prediction 140). The patient classifier system 130 can provide the treatment prediction 140 to the electronic health record system 120 (e.g., to guide patient treatment using the treatment prediction 140).
In various embodiments, the electronic health record system 120 and patient classifier system 130 are implemented in a critical care setting such that a therapy prediction is to be generated for a subject 110 within a maximum amount of time. In various embodiments, the maximum amount of time is 30 minutes. In various embodiments, the maximum amount of time is 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, or 12 hours. Thus, within the maximum amount of time, a therapy prediction is generated and a therapy can be selected for possible administration to the subject 110.
In various embodiments, the patient classifier system 130 and/or the electronic health record system 120 can be distributed computing systems implemented in a cloud computing environment. For example, steps performed by the patient classifier system 130 can be performed using systems in geographically different locations. In particular embodiments, the patient classifier system 130 receives EHR data from the electronic health record system 120 at a first location. The patient classifier system 130 transmits the EHR data to a second location (e.g., a cloud computing environment), where the EHR data is analyzed using a patient subphenotype classifier to predict a classification. The patient classifier system 130 can further transmit the classification back to the first location for subsequent use.
Cloud computing can be employed to offer on-demand access to the shared set of configurable computing resources. The shared set of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly. A cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model can also expose various service models, such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”). A cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a “cloud-computing environment” is an environment in which cloud computing is employed.
Turning next to
Generally, the model training module 150 constructs a patient subphenotype classifier that is useful for deployment (e.g., by the model deployment module 155) for analyzing EHR data from a subject. In various embodiments, the model training module 150 can construct various patient subphenotype classifiers, each of which is useful for deployment (e.g., by the model deployment module 155) for analyzing EHR data from a subject. In various embodiments, different patient subphenotype classifiers can be structured to receive different input variables (e.g., different EHR data). Therefore, different patient subphenotype classifiers can analyze different EHR data to determine a classification.
In some embodiments, the training data store 170 stores the training dataset that is used to train the patient subphenotype classifier. In various embodiments, the contents of the training dataset depend on the type of the patient subphenotype classifier being trained. In general, the training dataset comprises a plurality of training samples. Each training sample i from the training dataset is associated with a retrospective subject. Each training sample i that is associated with a retrospective subject comprises EHR data for the retrospective subject. Depending on the type of the patient subphenotype classifier, each training sample i of the training dataset may further comprise additional components. For example, in embodiments in which the patient subphenotype classifier is learned via supervised learning, each training sample i from the training dataset can further include a retrospective classification for the retrospective subject associated with the training sample (e.g., a reference ground truth value).
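The structure of a training sample described above can be sketched as a small Python record. The class name `TrainingSample`, its field names, and the helper `supervised_subset` are hypothetical and introduced only for illustration; the disclosure does not prescribe a storage layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrainingSample:
    """One training sample i, tied to a single retrospective subject."""
    subject_id: str                # hypothetical identifier for the retrospective subject
    ehr: Dict[str, float]          # EHR variables for that subject
    label: Optional[str] = None    # retrospective classification (ground truth), if any


def supervised_subset(samples: List[TrainingSample]) -> List[TrainingSample]:
    """Keep only the samples that carry a reference ground truth label."""
    return [s for s in samples if s.label is not None]
```

Leaving `label` optional lets the same record serve an unsupervised setting, where no retrospective classification is attached, and a supervised setting, where it is.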
The model deployment module 155 selects one or more patient subphenotype classifiers to be deployed for analyzing EHR data for a subject. In various embodiments, the model deployment module 155 selects and deploys one patient subphenotype classifier to predict a classification for the subject. In various embodiments, the model deployment module 155 selects and deploys multiple patient subphenotype classifiers to predict a classification for the subject. For example, the model deployment module 155 can select and deploy X different patient subphenotype classifiers, each of which determines a classification for the subject. Thus, the model deployment module 155 can compare the classifications for the subject across the different patient subphenotype classifiers and assign a single classification for the subject. For example, the model deployment module 155 can assign a single classification for the subject that appears across a majority of the outputs of the different patient subphenotype classifiers.
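The majority-based assignment described above can be sketched in Python as follows. The function name `majority_classification` is hypothetical. Returning `None` when no classification reaches a strict majority is an assumption of the sketch, since the text does not state how ties are resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_classification(predictions: Sequence[str]) -> Optional[str]:
    """Return the classification output by more than half of the classifiers.

    Returns None when the input is empty or no classification has a
    strict majority (assumed behavior; not specified in the text).
    """
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else None
```

With three deployed classifiers that output A, B, and B, the sketch assigns subphenotype B to the subject.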
In various embodiments, the model deployment module 155 selects a patient subphenotype classifier to be deployed based on the EHR data that is available. For example, assume that a patient subphenotype classifier receives Y different EHR data variables as input. If fewer than the Y different EHR data variables are available, the model deployment module 155 can determine whether the EHR data contains Z different EHR data variables such that a different patient subphenotype classifier that receives the Z different EHR data variables (e.g., where Z is less than Y) can be deployed. If the EHR data does not include the Z different EHR data variables, the model deployment module 155 can repeat the process and continue to search for a patient subphenotype classifier that receives fewer EHR data variables as input for which the data variables are available in the EHR data.
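The fallback search described above can be sketched in Python. The variable lists follow the 13-variable first model and the 8-variable second model described herein, but the dictionary keys, the name `MODEL_INPUTS`, and the function `select_model` are hypothetical names chosen for illustration. Trying the candidates from most to fewest inputs is one reading of the text, not the only possible ordering.

```python
from typing import Dict, FrozenSet, Mapping, Optional

# Required inputs per candidate classifier (key names are hypothetical).
MODEL_INPUTS: Dict[str, FrozenSet[str]] = {
    "first_model": frozenset({
        "arterial_ph", "bicarbonate", "creatinine", "diastolic_bp", "fio2",
        "heart_rate", "map_highest", "map_lowest", "potassium",
        "resp_rate_highest", "resp_rate_lowest", "spo2", "systolic_bp",
    }),
    "second_model": frozenset({
        "arterial_ph", "bicarbonate", "creatinine", "fio2", "heart_rate",
        "pao2", "mean_arterial_pressure", "respiratory_rate",
    }),
}


def select_model(ehr: Mapping[str, Optional[float]]) -> Optional[str]:
    """Pick the classifier with the most inputs whose variables are all present."""
    available = {name for name, value in ehr.items() if value is not None}
    candidates = sorted(MODEL_INPUTS.items(), key=lambda kv: len(kv[1]), reverse=True)
    for model_name, required in candidates:
        if required <= available:  # every required variable is available
            return model_name
    return None  # no candidate classifier can be deployed on this EHR data
```

A record that lacks, for instance, a potassium value but contains all eight variables of the second model would fall through to `"second_model"`.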
In various embodiments, a patient subtype classifier outputs a prediction such as a score. Here, the score can be indicative of the classification for the subject. In various embodiments, the model deployment module 155 compares the score outputted by a patient subtype classifier to one or more threshold scores to determine the classification for the subject. As an example, the patient subtype classifier may output a score between 0 and 1. The model deployment module 155 compares the score outputted by the patient subtype classifier to one or more threshold values. In various embodiments, a threshold value can be a score of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, or 0.9. In particular embodiments, the threshold value can be a score of 0.5. Therefore, the model deployment module 155 can compare the score outputted by the patient subtype classifier to the threshold value and classify the subject based on whether the score is lower or higher than the threshold value.
In various embodiments, the model deployment module 155 compares the score outputted by the patient subtype classifier to two threshold values and classifies the subject based on the two comparisons. In various embodiments, the first threshold value can be a score of 0.1, 0.2, 0.3, 0.4, or 0.5. In various embodiments, the second threshold value can be a score of 0.5, 0.6, 0.7, 0.8, or 0.9. In particular embodiments, the first threshold value is a score of 0.3 and the second threshold value is a score of 0.6. In particular embodiments, the first threshold value is a score of 0.4 and the second threshold value is a score of 0.7. Therefore, the model deployment module 155 compares the score outputted by the patient subtype classifier to both the first threshold value and the second threshold value. Based on the comparisons, the model deployment module 155 classifies the subject into one of three different classifications (e.g., first classification = score is less than first threshold value, second classification = score is greater than first threshold value but less than second threshold value, and third classification = score is greater than second threshold value).
In various embodiments, the model deployment module 155 compares the score outputted by the patient subtype classifier to X different threshold values and classifies the subject based on the X comparisons. For example, the X different threshold values delineate X+1 different score ranges and therefore, based on the X comparisons, the model deployment module 155 determines that the score outputted by the patient subtype classifier is within one of the X+1 score ranges. Therefore, the model deployment module 155 classifies the subject into a classification corresponding to the one of the X+1 score ranges.
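Threshold-based classification of a score can be sketched in Python with a binary search over sorted cut points, where n sorted thresholds partition the score range into n + 1 intervals. The function name `classify_score` is hypothetical. Assigning a score that equals a threshold to the higher interval is an assumption of the sketch, since the text only addresses scores strictly below or above a threshold.

```python
from bisect import bisect_right
from typing import Sequence


def classify_score(score: float, thresholds: Sequence[float]) -> int:
    """Return the 0-based index of the score interval that contains `score`.

    Index 0 is the interval below the first threshold; index
    len(thresholds) is the interval above the last threshold.
    """
    cuts = list(thresholds)
    if cuts != sorted(cuts):
        raise ValueError("thresholds must be sorted in ascending order")
    # Assumption: a score equal to a threshold falls in the higher interval.
    return bisect_right(cuts, score)
```

With the thresholds 0.3 and 0.6 mentioned above, a score of 0.2 maps to index 0, a score of 0.45 to index 1, and a score of 0.7 to index 2; with the single threshold 0.5, the same function yields a two-way classification.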
The treatment selection module 160 selects one or more treatments for a subject according to the classification of the subject determined by the model deployment module 155. For example, the treatment selection module 160 may access a lookup table that includes previously determined correspondences between one or more treatments and the classification of the subject. Further examples of specific guided therapies according to patient subphenotypes are described herein.
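A lookup table of this kind can be sketched in Python as a dictionary keyed by classification. The names `TREATMENT_LOOKUP` and `select_treatments` are hypothetical, and the entries are a small illustrative subset of the subphenotype-to-therapy correspondences described herein rather than a complete or validated table.

```python
from typing import Dict, List

# Illustrative subset of correspondences; a deployed table would be
# populated from previously determined correspondences.
TREATMENT_LOOKUP: Dict[str, List[str]] = {
    "A": ["neuromuscular blockade", "low PEEP", "full enteral feeding"],
    "B": ["high PEEP", "corticosteroid", "trophic enteral feeding"],
}


def select_treatments(classification: str, max_treatments: int = 5) -> List[str]:
    """Return up to `max_treatments` treatments for the given classification."""
    if max_treatments < 1:
        raise ValueError("max_treatments must be at least 1")
    # An unknown classification yields an empty list rather than an error.
    return TREATMENT_LOOKUP.get(classification, [])[:max_treatments]
```

Keeping the correspondences in data rather than in branching logic lets the table be updated without changing the selection code.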
In various embodiments, the treatment selection module 160 selects one treatment for the subject according to the classification of the subject. In various embodiments, the treatment selection module 160 selects two treatments for the subject according to the classification of the subject. In various embodiments, the treatment selection module 160 selects three treatments for the subject according to the classification of the subject. In various embodiments, the treatment selection module 160 selects four treatments for the subject according to the classification of the subject. In various embodiments, the treatment selection module 160 selects five treatments for the subject according to the classification of the subject.
In various embodiments, the treatment selection module 160 generates a list of the selected one or more treatments and transmits the list. For example, in some embodiments, the treatment selection module 160 transmits the list of selected one or more treatments to a third party such that the list can guide the treatment of the subject under the care of the third party. For example, the third party system can be a hospital department (e.g., intensive care unit or emergency department) at which the subject is located. Therefore, the third party system can provide one or more of the selected treatments identified and provided by the treatment selection module 160.
Generally, the patient subtype classifier is a predictive model that classifies a subject into one out of a plurality of possible classifications based on the EHR data of the subject. In particular embodiments, the patient subtype classifier classifies the subject in a subphenotype out of two possible subphenotypes based on the EHR data of the subject. In particular embodiments, the patient subtype classifier classifies the subject in a subphenotype out of three possible subphenotypes based on the EHR data of the subject. In particular embodiments, the patient subtype classifier classifies the subject in a subphenotype out of four, five, six, seven, eight, nine, or ten possible subphenotypes based on the EHR data of the subject. Additional examples of patient subphenotypes are described herein.
Generally, the patient subtype classifier analyzes EHR data of a subject. In particular embodiments, the patient subtype classifier does not analyze biomarker data for the subject. By analyzing EHR data and not biomarker data, such a patient subtype classifier can be rapidly implemented, which is useful in settings where time is of the essence, such as in critical care settings. Analyzing a sample to obtain biomarker data for a subject can require more resources (e.g., time, reagents, and assays) than obtaining EHR data for the subject.
In various embodiments, the patient subphenotype classifier is a machine learned model. In various embodiments, the predictive model is any one of a regression model (e.g., linear regression, logistic regression, or polynomial regression), decision tree, random forest, support vector machine, Naive Bayes model, k-means clustering model, or neural network (e.g., feed-forward networks, convolutional neural networks (CNN), deep neural networks (DNN), autoencoder neural networks, generative adversarial networks, or recurrent networks (e.g., long short-term memory networks (LSTM), bi-directional recurrent networks, deep bi-directional recurrent networks)), or any combination thereof. In particular embodiments, the patient subphenotype classifier is a k-means clustering model that performs unsupervised clustering of subjects according to their EHR data. In particular embodiments, the patient subphenotype classifier is a logistic regression model, such as a Bayesian logistic regression model. In various embodiments, the patient subphenotype classifier is a mixed-effect Bayesian logistic regression model. In various embodiments, the patient subphenotype classifier is a Bayesian hierarchical logistic model that is modelled as a simple regression and shrinkage model.
In various embodiments, the patient subphenotype classifier can be trained using a machine learning implemented method, such as any one of a linear regression algorithm, logistic regression algorithm, decision tree algorithm, support vector machine classification, Naive Bayes classification, K-Nearest Neighbor classification, random forest algorithm, deep learning algorithm, gradient boosting algorithm, and dimensionality reduction techniques such as manifold learning, principal component analysis, factor analysis, autoencoder regularization, and independent component analysis, or combinations thereof. In various embodiments, the predictive model is trained using supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms (e.g., partial supervision), weak supervision, transfer learning, multi-task learning, or any combination thereof. In particular embodiments, the predictive model is trained using supervised learning algorithms.
In various embodiments, the predictive model has one or more parameters, such as hyperparameters or model parameters. Hyperparameters are generally established prior to training. Examples of hyperparameters include the learning rate, depth or leaves of a decision tree, number of hidden layers in a deep neural network, number of clusters in a k-means clustering model, penalty in a regression model, and a regularization parameter associated with a cost function. Model parameters are generally adjusted during training. Examples of model parameters include weights associated with nodes in layers of a neural network, support vectors in a support vector machine, and coefficients in a regression model. The model parameters of the predictive model are trained (e.g., adjusted) using the training data to improve the predictive capacity of the predictive model.
In various embodiments, the patient subphenotype classifier comprises a parametric model. Thus, such a patient subphenotype classifier can be represented as:
$$y = f(x_k; \theta) \qquad (1)$$
where y denotes the prediction determined by the patient subphenotype classifier, xk denotes the independent variables (e.g., x1 = EHR data), θ denotes the set of parameters, and ƒ(·) is the function.
In some embodiments, the patient subphenotype classifier comprises two or more functions. In such embodiments, the model can be represented as:
$$y = f_1(x_k; \theta) \ast f_2(x_k; \theta)$$
where the indicator “ * ” represents any mathematical operation (e.g., summation, multiplication, etc.) such that the two functions, ƒ1 and ƒ2, are combined to determine y, the prediction.
In some embodiments, the patient subphenotype classifier comprises two or more functions where the output of a first function serves as input to a second function. In such embodiments, the model can be represented as:
$$y = g(f(x_k; \theta))$$
where ƒ is the first function and the output of ƒ serves as input to the second function g.
In some embodiments, the patient subphenotype classifier comprises a plurality of functions whose outputs serve as input to one or more functions. In such embodiments, the model can be represented as:
$$y = g(f_1(x_k; \theta), f_2(x_k; \theta))$$
where ƒ1 and ƒ2 are the plurality of functions whose outputs serve as input to an additional function g, which outputs y, the prediction.
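The arrangement in which the outputs of several functions feed an additional function g can be sketched as a small higher-order helper. The name `combine` and the toy component functions are hypothetical and carry no clinical meaning; they only show the wiring.

```python
def combine(g, *fs):
    """Build a model y = g(f1(x), f2(x), ...) from component functions."""
    def model(x):
        # Evaluate every component on the same input, then merge with g.
        return g(*(f(x) for f in fs))
    return model


# Toy components, purely illustrative.
def f1(x):
    return sum(x) / len(x)  # mean of the inputs


def f2(x):
    return max(x)  # largest input


def g(a, b):
    return a + b  # merge the two component outputs


model = combine(g, f1, f2)
```

Passing a single component function gives the simpler chained form, in which the output of one function is the input of the next.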
In certain embodiments in which xk denotes multiple different independent variables (e.g., x1 and x2), the multiple independent variables can be combined prior to being input into the function ƒ(·). For example, independent variables of different EHR data can be combined to create a new independent variable prior to being input into the function ƒ(·). For example, EHR data in the form of PaO2 can be combined with the subject’s EHR data in the form of FiO2 to create a new independent variable describing the ratio of the two values (e.g., PaO2/FiO2). In some embodiments in which xk denotes multiple different independent variables (e.g., x1 and x2), the different independent variables remain separate and distinct from one another when input into the function ƒ(·).
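Combining two EHR values into a derived variable, as in the PaO2/FiO2 example, might look like the sketch below. The record keys (`pao2`, `fio2`, `pao2_fio2_ratio`) and the function name are hypothetical, and the code assumes FiO2 is recorded as a fraction between 0 and 1 rather than as a percentage.

```python
def add_pf_ratio(record):
    """Return a copy of the record with a derived PaO2/FiO2 entry.

    Assumes 'pao2' is in mmHg and 'fio2' is a fraction in (0, 1].
    """
    pao2 = record["pao2"]
    fio2 = record["fio2"]
    if not 0 < fio2 <= 1:
        raise ValueError("fio2 must be a fraction in (0, 1]")
    out = dict(record)  # leave the caller's record unmodified
    out["pao2_fio2_ratio"] = pao2 / fio2
    return out
```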
The function f(·) can be any function, and can comprise any combination of hyperparameters. For example, in some embodiments, the function f(·) can be an affine function given by:
$$f(x_k; \theta) = \theta_0 + \sum_{k} \theta_k x_k \qquad (2)$$
that linearly combines each independent variable xk with a corresponding parameter θk in the set of parameters, together with an intercept term θ0.
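An affine function of this kind, and its use inside a logistic regression score, can be sketched as follows. The intercept argument `theta0`, the function names, and the logistic link are illustrative assumptions; the parameter values used in any real classifier would come from training.

```python
import math


def affine(x, theta0, theta):
    """Intercept plus the weighted sum of the independent variables."""
    if len(x) != len(theta):
        raise ValueError("x and theta must have the same length")
    return theta0 + sum(t * xi for t, xi in zip(theta, x))


def logistic_score(x, theta0, theta):
    """Squash the affine output into (0, 1), as in logistic regression."""
    return 1.0 / (1.0 + math.exp(-affine(x, theta0, theta)))
```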
As another example, in some embodiments, the function ƒ(·) can be a network function given by:
$$f(x_k; \theta) = \mathrm{NN}(x_k; \theta) \qquad (3)$$
where NN(·) is a network model. Generally, network models NN(·) can be feed-forward networks, such as artificial neural networks (ANN), convolutional neural networks (CNN), deep neural networks (DNN), and/or recurrent networks, such as long short-term memory networks (LSTM), bi-directional recurrent networks, deep bi-directional recurrent networks, and the like. A network model NN(·) can be defined by any combination of hyperparameters. For example, in a recurrent network, the network can comprise any number of hidden layers, with any number of nodes per layer, and each layer can comprise any layer type, including, but not limited to, a Masking Layer, a Long-Short Term Memory (LSTM) Layer, a Gated Recurrent Units (GRU) Layer, and a Densification Layer. Furthermore, the learning rate of the model can comprise any rate.
In even further embodiments, the function f(·) can be an ensemble of decision trees, such as a random forest or a gradient boosting classifier. In such embodiments, any number of decision trees may be incorporated into the model, and each decision tree may have any maximum depth. Furthermore, the learning rate of the model can comprise any rate.
As discussed above with regard to Equation 1, the function f(·) can be any function. For example, in some embodiments the function f(·) can be an affine function depicted in Equation 2, where xk becomes x1 or x2. Alternatively, the function ƒ(·) can be a network function depicted in Equation 3, where xk becomes x1 or x2. In even further embodiments, the function ƒ(·) can be an ensemble of decision trees, such as a random forest or a gradient boosting classifier.
Reference is made to
In various embodiments, the classifier 230 receives, as input, values of one or more different types of EHR data. Different types of EHR data for a subject include any of: arterial pH, bicarbonate levels, creatinine levels, potassium levels, fraction of inspired oxygen (FiO2), heart rate, mean arterial pressure, respiration rate, partial pressure of oxygen (PaO2), gender, age, bilirubin levels, partial pressure of carbon dioxide (PaCO2), ratio of PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, mean airway pressure, tidal volume, diastolic blood pressure, systolic blood pressure, plateau pressure, minute ventilation, vasopressor use, and body mass index (BMI). In various embodiments, EHR data can refer to a most recent measurement of any of: arterial pH, bicarbonate levels, creatinine levels, potassium levels, fraction of inspired oxygen (FiO2), heart rate, mean arterial pressure, respiration rate, partial pressure of oxygen (PaO2), gender, age, bilirubin levels, partial pressure of carbon dioxide (PaCO2), ratio of PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, mean airway pressure, tidal volume, diastolic blood pressure, systolic blood pressure, plateau pressure, minute ventilation, vasopressor use (e.g., use in the last 24 hours), and body mass index (BMI). As described herein, a most recent measurement of EHR data is denoted using “R” that is appended after the type of EHR data. For example, a most recent measure of heart rate is denoted as “heart rate-R” or “HRATER” where the “R” notation is underlined and bolded.
In various embodiments, an alternative to a most recent measurement of EHR data can be used. In various embodiments, EHR data can be aggregated according to a standard midpoint for an EHR data input. For example, for the highest and lowest values of an EHR data input, the distance of each from the mean is calculated. Whichever value (highest or lowest) is furthest from the mean can be selected as a feature for input.
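This aggregation can be sketched as follows. The source does not spell out how the reference mean is obtained, so the sketch takes it as an explicit `midpoint` argument; that argument, the function name, and the rule that a tie goes to the highest value are all assumptions.

```python
def most_extreme(values, midpoint):
    """Pick the highest or lowest value, whichever lies further from midpoint.

    'midpoint' is an assumed reference value (e.g., a standard mean for this
    EHR input); ties are resolved in favour of the highest value.
    """
    if not values:
        raise ValueError("at least one measurement is required")
    lowest, highest = min(values), max(values)
    if abs(highest - midpoint) >= abs(lowest - midpoint):
        return highest
    return lowest
```

For example, with a midpoint of 80 the measurements 60, 80, and 130 yield 130, whereas the measurements 40, 80, and 90 yield 40.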
In various embodiments, EHR data can refer to the lowest measurement of any of: arterial pH, bicarbonate levels, creatinine levels, potassium levels, fraction of inspired oxygen (FiO2), heart rate, mean arterial pressure, respiration rate, partial pressure of oxygen (PaO2), bilirubin levels, partial pressure of carbon dioxide (PaCO2), ratio of PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, mean airway pressure, tidal volume, diastolic blood pressure, systolic blood pressure, plateau pressure, minute ventilation, and body mass index (BMI). As described herein, a lowest measurement of EHR data is denoted using “L” that is appended after the type of EHR data. For example, a lowest measure of bicarbonate is denoted as “bicarbonate-L” or “BICARL” where the “L” notation is underlined and bolded.
In various embodiments, EHR data can refer to the highest measurement of any of: arterial pH, bicarbonate levels, creatinine levels, potassium levels, fraction of inspired oxygen (FiO2), heart rate, mean arterial pressure, respiration rate, partial pressure of oxygen (PaO2), bilirubin levels, partial pressure of carbon dioxide (PaCO2), ratio of PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, mean airway pressure, tidal volume, diastolic blood pressure, systolic blood pressure, plateau pressure, minute ventilation, and body mass index (BMI). As described herein, a highest measurement of EHR data is denoted using “H” that is appended after the type of EHR data. For example, a highest measure of bilirubin is denoted as “bilirubin-H” or “BILIH” where the “H” notation is underlined and bolded.
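The “R”, “L”, and “H” conventions can be illustrated with a short sketch that derives all three from a series of timestamped measurements. The function names and the (timestamp, value) tuple layout are hypothetical; only the suffix convention comes from the description above.

```python
def summarize(measurements):
    """Reduce (timestamp, value) pairs to most recent, lowest, and highest."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    # The most recent measurement is the one with the largest timestamp.
    _, latest_value = max(measurements, key=lambda m: m[0])
    values = [value for _, value in measurements]
    return {"R": latest_value, "L": min(values), "H": max(values)}


def named_features(name, measurements):
    """Label each summary with the suffix convention, e.g. 'heart rate-R'."""
    summary = summarize(measurements)
    return {f"{name}-{suffix}": value for suffix, value in summary.items()}
```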
In various embodiments, EHR data can refer to measurements obtained at a clinically relevant time. In various embodiments, a clinically relevant time refers to a time the subject was admitted (e.g., admitted to the hospital). In various embodiments, a clinically relevant time refers to a time the subject was admitted into the emergency department or the intensive care unit (ICU). In various embodiments, a clinically relevant time refers to a time the subject was enrolled into a clinical trial. In various embodiments, a clinically relevant time refers to a time the subject was diagnosed (e.g., diagnosed with ARDS). In various embodiments, a clinically relevant time refers to a time a clinician ordered a test for the subject. Thus, in such embodiments, the EHR data can refer to the measurement at the clinically relevant time for any of: arterial pH, bicarbonate levels, creatinine levels, potassium levels, fraction of inspired oxygen (FiO2), heart rate, mean arterial pressure, respiration rate, partial pressure of oxygen (PaO2), bilirubin levels, partial pressure of carbon dioxide (PaCO2), ratio of PaO2/FiO2, positive end expiratory pressure (PEEP), platelet count, mean airway pressure, tidal volume, diastolic blood pressure, systolic blood pressure, plateau pressure, minute ventilation, vasopressor use, and body mass index (BMI).
In various embodiments, a patient subphenotype classifier receives, as input, values of at least two different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least three different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least four different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least five different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least six different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least seven different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least eight different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least nine different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least ten different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least eleven different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least twelve different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least thirteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least fourteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least fifteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least sixteen different types of EHR data. 
In various embodiments, a patient subphenotype classifier receives, as input, values of at least seventeen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least eighteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least nineteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of at least twenty different types of EHR data.
In various embodiments, a patient subphenotype classifier receives, as input, values of two different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of three different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of four different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of five different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of six different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of seven different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of eight different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of nine different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of ten different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of eleven different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of twelve different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of thirteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of fourteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of fifteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of sixteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of seventeen different types of EHR data. 
In various embodiments, a patient subphenotype classifier receives, as input, values of eighteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of nineteen different types of EHR data. In various embodiments, a patient subphenotype classifier receives, as input, values of twenty different types of EHR data.
In various embodiments, a patient subphenotype classifier receives, as input, the following thirteen input variables: arterial pH-R, bicarbonate-L, creatinine-R, diastolic BP-R, FiO2-R, heart rate-R, mean arterial pressure-H, mean arterial pressure-L, potassium-R, respiratory rate-H, respiratory rate-L, most recent oxygen saturation (SpO2-R), and systolic BP-R.
In various embodiments, a patient subphenotype classifier receives, as input, the following eight input variables: arterial pH-R, bicarbonate-L, creatinine-R, FiO2-R, heart rate-R, PaO2-R, mean arterial pressure-R, and respiratory rate-R.
In various embodiments, a patient subphenotype classifier receives, as input, the following seventeen input variables: age, arterial pH-R, bicarbonate-L, bilirubin-H, BMI, creatinine-R, FiO2-R, gender, heart rate-R, PaCO2-R, PaO2/FiO2-LP, PaO2-R, PEEP-R, platelet-L, tidal volume-R, mean arterial pressure-R, and respiratory rate-R.
In various embodiments, a patient subphenotype classifier receives, as input, the following thirteen input variables: arterial pH-R, bicarbonate-R, BMI, creatinine-R, FiO2-R, gender, heart rate-R, PaCO2-R, PaO2/FiO2-LP, PEEP-R, platelets-L, mean arterial pressure-R, and respiratory rate-R.
In various embodiments, a patient subphenotype classifier receives, as input, the following nine input variables: arterial pH-R, bicarbonate-L, creatinine-R, FiO2-R, heart rate-R, PaO2-R, mean airway pressure-R, respiratory rate-R, and bilirubin-H.
In various embodiments, a patient subphenotype classifier receives, as input, the following sixteen input variables: age, arterial pH-R, bicarbonate-L, bilirubin-H, creatinine-R, FiO2-R, gender, heart rate-R, PaCO2-R, PaO2/FiO2-LP, PaO2-R, PEEP-R, platelet-L, tidal volume-R, mean arterial pressure-R, and respiratory rate-R.
In various embodiments, a patient subphenotype classifier receives, as input, the following eight input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, and creatinine-R. Such an example patient subphenotype classifier is described in Example 5 as Model B.1.
In various embodiments, a patient subphenotype classifier receives, as input, the following nine input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, and bilirubin-H. Such an example patient subphenotype classifier is described in Example 5 as Model B.2.
In various embodiments, a patient subphenotype classifier receives, as input, the following eleven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, age, and gender. Such an example patient subphenotype classifier is described in Example 5 as Model B.3.
In various embodiments, a patient subphenotype classifier receives, as input, the following ten input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, age, and gender. Such an example patient subphenotype classifier is described in Example 5 as Model B.4.
In various embodiments, a patient subphenotype classifier receives, as input, the following fifteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, PaO2/FiO2, bicarbonate-L, creatinine-R, platelet-L, age, gender, positive end-expiratory pressure-R, and tidal volume-R. Such an example patient subphenotype classifier is described in Example 5 as Model B.5.
In various embodiments, a patient subphenotype classifier receives, as input, the following sixteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, PaO2/FiO2, bicarbonate-L, creatinine-R, bilirubin-H, platelet-L, age, gender, positive end-expiratory pressure-R, and tidal volume-R. Such an example patient subphenotype classifier is described in Example 5 as Model B.6.
In various embodiments, a patient subphenotype classifier receives, as input, the following ten input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, bicarbonate-L, creatinine-R, and bilirubin-H. Such an example patient subphenotype classifier is described in Example 5 as Model B.7.
In various embodiments, a patient subphenotype classifier receives, as input, the following eleven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, bicarbonate-L, creatinine-R, bilirubin-H, and platelet-L. Such an example patient subphenotype classifier is described in Example 5 as Model B.8.
In various embodiments, a patient subphenotype classifier receives, as input, the following nine input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, bicarbonate-L, and creatinine-R. Such an example patient subphenotype classifier is described in Example 5 as Model B.9.
In various embodiments, a patient subphenotype classifier receives, as input, the following five input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, age, and gender. Such an example patient subphenotype classifier is described in Example 5 as Model B.10.
In various embodiments, a patient subphenotype classifier receives, as input, the following twelve input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, bicarbonate-L, creatinine-R, bilirubin-H, age, and gender. Such an example patient subphenotype classifier is described in Example 5 as Model B.11.
In various embodiments, a patient subphenotype classifier receives, as input, the following fourteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, PaO2/FiO2, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, positive end-expiratory pressure-R, and tidal volume-R. Such an example patient subphenotype classifier is described in Example 5 as Model B.12.
In various embodiments, a patient subphenotype classifier receives, as input, the following twenty input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, PaCO2-R, PaO2/FiO2, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, age, gender, body mass index, positive end-expiratory pressure-R, tidal volume-R, plateau pressure-R, minute ventilation-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 5 as Model B.13.
In various embodiments, a patient subphenotype classifier receives, as input, the following seven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, bicarbonate-L, and creatinine-R. Such an example patient subphenotype classifier is described in Example 5 as Model B.14.
In various embodiments, a patient subphenotype classifier receives, as input, the following six input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, and bicarbonate-L. Such an example patient subphenotype classifier is described in Example 5 as Model B.15.
In various embodiments, a patient subphenotype classifier receives, as input, the following seven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, PaCO2-R, and bicarbonate-L. Such an example patient subphenotype classifier is described in Example 5 as Model B.16.
In various embodiments, a patient subphenotype classifier receives, as input, the following eight input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, and creatinine-R. Such an example patient subphenotype classifier is described in Example 7 as Model C.1.
In various embodiments, a patient subphenotype classifier receives, as input, the following nine input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.2.
In various embodiments, a patient subphenotype classifier receives, as input, the following eleven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, age, gender, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.3.
In various embodiments, a patient subphenotype classifier receives, as input, the following nine input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, and bilirubin-H. Such an example patient subphenotype classifier is described in Example 7 as Model C.4.
In various embodiments, a patient subphenotype classifier receives, as input, the following eleven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, age, and gender. Such an example patient subphenotype classifier is described in Example 7 as Model C.5.
In various embodiments, a patient subphenotype classifier receives, as input, the following fourteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, positive end-expiratory pressure-R, tidal volume-R, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.6.
In various embodiments, a patient subphenotype classifier receives, as input, the following thirteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, platelets-L, positive end-expiratory pressure-R, tidal volume-R, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.7.
In various embodiments, a patient subphenotype classifier receives, as input, the following fifteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, platelets-L, age, gender, positive end-expiratory pressure-R, tidal volume-R, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.8.
In various embodiments, a patient subphenotype classifier receives, as input, the following sixteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, age, gender, positive end-expiratory pressure-R, tidal volume-R, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.9.
In various embodiments, a patient subphenotype classifier receives, as input, the following fifteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, age, gender, tidal volume-R, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.10.
In various embodiments, a patient subphenotype classifier receives, as input, the following fourteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, age, tidal volume-R, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.11.
In various embodiments, a patient subphenotype classifier receives, as input, the following thirteen input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, arterial pH-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, age, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.12.
In various embodiments, a patient subphenotype classifier receives, as input, the following twelve input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, age, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.13.
In various embodiments, a patient subphenotype classifier receives, as input, the following eleven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, PaO2-R, FiO2-R, creatinine-R, bilirubin-H, platelets-L, age, plateau pressure-R, and vasopressor use in the prior 24 hours. Such an example patient subphenotype classifier is described in Example 7 as Model C.14.
In various embodiments, a patient subphenotype classifier receives, as input, the following eleven input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, PaO2-R, FiO2-R, bicarbonate-L, creatinine-R, bilirubin-H, platelets-L, age, and plateau pressure-R. Such an example patient subphenotype classifier is described in Example 7 as Model C.15.
In various embodiments, a patient subphenotype classifier receives, as input, the following ten input variables: heart rate-R, mean arterial pressure-R, respiratory rate-R, PaO2-R, FiO2-R, creatinine-R, bilirubin-H, platelets-L, age, and plateau pressure-R. Such an example patient subphenotype classifier is described in Example 7 as Model C.16.
In various embodiments, the patient subphenotype classifier is composed of two or more submodels that enable the patient subphenotype classifier to generate a prediction. Here, each of the two or more submodels of the patient subphenotype classifier can analyze EHR data of the subject. In various embodiments, the two or more submodels of the patient subphenotype classifier each analyze different EHR data of the subject. In various embodiments, the two or more submodels of the patient subphenotype classifier each analyze the same EHR data of the subject. In various embodiments, the patient subphenotype classifier is composed of two submodels. In various embodiments, the patient subphenotype classifier is composed of three submodels. In various embodiments, the patient subphenotype classifier is composed of four submodels. In various embodiments, the patient subphenotype classifier is composed of five submodels. In various embodiments, the patient subphenotype classifier is composed of six submodels. In various embodiments, the patient subphenotype classifier is composed of seven submodels. In various embodiments, the patient subphenotype classifier is composed of eight submodels. In various embodiments, the patient subphenotype classifier is composed of nine submodels. In various embodiments, the patient subphenotype classifier is composed of ten submodels.
In particular embodiments, the patient subphenotype classifier is composed of at least a first model that generates a preliminary prediction as to a subphenotype of the subject and a second model that generates a prediction as to the likely mortality of the subject. As used herein, such a first model that generates a preliminary prediction of the subphenotype of the subject is referred to as a subphenotyping submodel. For example, the preliminary prediction of the subphenotype can be an indication that identifies whether the subject is preliminarily determined to be in one of a plurality of classifications. As a specific example, the subphenotyping submodel may perform an unsupervised clustering analysis (e.g., K-means clustering) and thereby cluster the subject according to EHR data of the subject. The classification corresponding to the cluster of the subject can then serve as the preliminary prediction of the subphenotype of the subject.
Here, a second model that generates a prediction of the likely mortality of the subject is referred to as a mortality submodel. The mortality submodel can output a prediction of a mortality score. A mortality score can be indicative of a level of mortality risk for the subject. In various embodiments, the mortality score is between 0 and 1. For example, a mortality score closer to 1 indicates a higher risk of mortality for the subject, whereas a mortality score closer to 0 indicates a lower risk of mortality for the subject. In various embodiments, the mortality score can be the prediction outputted by the patient subphenotype classifier. Thus, the mortality score can be compared to one or more threshold values to determine a classification for the subject.
In various embodiments, the subphenotyping submodel is constructed via unsupervised learning methods. For example, the subphenotyping submodel can be constructed using unsupervised K-means clustering methods. In various embodiments, the mortality submodel is constructed via supervised learning methods.
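As a non-limiting illustration of comparing a mortality score to one or more threshold values, the following Python sketch maps a score between 0 and 1 to a class label. The function name, the default threshold of 0.5, and the labels "A" and "B" are assumptions introduced for illustration only; they are not values taken from this disclosure.

```python
def classify_by_mortality_score(score, thresholds=(0.5,), labels=("A", "B")):
    """Map a mortality score in [0, 1] to a label using ascending thresholds.

    The default threshold (0.5) and labels are hypothetical placeholders.
    A score below the first threshold receives the first label, and so on;
    a score at or above every threshold receives the last label.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("mortality score must lie in [0, 1]")
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    for threshold, label in zip(sorted(thresholds), labels):
        if score < threshold:
            return label
    return labels[-1]
```

Supplying two thresholds and three labels (for example, lower, medium, and higher risk groups) yields a three-way classification with the same function.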
In various embodiments, the output of one of the submodels is provided as input to another one of the submodels. For example, the output of a subphenotyping submodel can be provided as input to a mortality submodel. As another example, the output of a mortality submodel can be provided as input to a subphenotyping submodel. In various embodiments, the patient subphenotype classifier includes multiple subphenotyping submodels and one mortality submodel. For example, the patient subphenotype classifier can include two subphenotyping submodels whose outputs serve as two inputs into a single mortality submodel. For example, the patient subphenotype classifier can include three subphenotyping submodels whose outputs serve as three inputs into a single mortality submodel. In various embodiments, the patient subphenotype classifier includes one subphenotyping submodel and multiple mortality submodels. For example, the patient subphenotype classifier can include one subphenotyping submodel whose output serves as an input into each of two mortality submodels.
Reference is made to
As shown in
In various embodiments, the subphenotyping submodel 240 can receive, as input, any of the combinations of EHR data described above in relation to the patient subphenotyping classifier. In particular embodiments, the subphenotyping submodel 240 receives the following eight EHR data as input: arterial pH-R, bicarbonate-L, creatinine-R, FiO2-R, heart rate-R, mean arterial pressure-R, respiratory rate-R, and PaO2-R. The subphenotyping submodel 240 analyzes the EHR data and outputs a preliminary prediction of the subphenotype of the subject. For example, the subphenotyping submodel 240 performs a clustering analysis (e.g., K-means clustering) and determines a preliminary prediction of the subphenotype of the subject according to the cluster in which the subject is located.
In various embodiments, the mortality submodel 250 can receive, as input, any of the combinations of EHR data described above in relation to the patient subphenotyping classifier as well as the preliminary prediction of the subphenotype of the subject determined by the subphenotyping submodel 240. In particular embodiments, the mortality submodel 250 receives, as input, the following nine EHR data inputs: bilirubin-H, age, gender, PaCO2-R, ratio of PaO2-R/FiO2-R, positive end-expiratory pressure-R, plateau pressure-R, tidal volume-R, and body mass index (BMI). In addition to these nine EHR data inputs, the mortality submodel 250 receives the preliminary prediction of the subphenotype of the subject determined by the subphenotyping submodel 240.
In various embodiments, the classifier 230 may include one subphenotyping submodel 240 and two mortality submodels 250. Here, the output of the subphenotyping submodel 240 can serve as an input to each of the two mortality submodels 250. Such an example of a classifier 230 including a subphenotyping submodel 240 and two mortality submodels 250 is described below in relation to
In various embodiments, the subphenotyping submodel can receive, as input, any of the combinations of EHR data described above in relation to the patient subphenotyping classifier. In particular embodiments, the subphenotyping submodel receives the following eight EHR data as input: arterial pH-R, bicarbonate-L, creatinine-R, FiO2-R, heart rate-R, mean arterial pressure-R, respiratory rate-R, and PaO2-R. The subphenotyping submodel analyzes the EHR data and outputs a preliminary prediction of the subphenotype of the subject. For example, the subphenotyping submodel performs a clustering analysis (e.g., K-means clustering) and determines a preliminary prediction of the subphenotype of the subject according to the cluster in which the subject is located.
In various embodiments, each of the first and second mortality submodels 250 can receive, as input, any of the combinations of EHR data described above in relation to the patient subphenotyping classifier as well as the preliminary prediction of the subphenotype of the subject determined by the subphenotyping submodel. In particular embodiments, the first mortality submodel receives, as input, bilirubin-H and the preliminary prediction of the subphenotype of the subject determined by the subphenotyping submodel. In particular embodiments, the second mortality submodel receives, as input, the following six EHR data inputs: bilirubin-H, PaCO2-R, ratio of PaO2-R/FiO2-R, positive end-expiratory pressure-R, tidal volume-R, and plateau pressure-R. The second mortality submodel further receives the preliminary prediction of the subphenotype of the subject determined by the subphenotyping submodel. Here, the outputs of the first mortality submodel and the second mortality submodel can be combined to produce a combined mortality score that is informative for classifying the subject.
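The chaining of a subphenotyping submodel into a mortality submodel can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the preliminary subphenotype is taken to be the index of the nearest K-means centroid, and the mortality submodel is assumed, purely for illustration, to be a logistic model. The centroids, weights, bias, and all function names are hypothetical and are not parameters reported in this disclosure.

```python
import math


def nearest_centroid(features, centroids):
    """Return the index of the centroid closest to `features` (squared Euclidean)."""
    distances = [
        sum((x - c) ** 2 for x, c in zip(features, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


def mortality_score(ehr_features, cluster_index, weights, cluster_weight, bias):
    """Hypothetical logistic mortality submodel.

    Takes EHR features plus the preliminary cluster index produced by the
    subphenotyping submodel and returns a score strictly between 0 and 1.
    """
    z = bias + cluster_weight * cluster_index
    z += sum(w * x for w, x in zip(weights, ehr_features))
    return 1.0 / (1.0 + math.exp(-z))


def classify_subject(sub_inputs, mort_inputs, centroids, weights, cluster_weight, bias):
    """Run the subphenotyping submodel, then feed its output to the mortality submodel."""
    cluster = nearest_centroid(sub_inputs, centroids)
    score = mortality_score(mort_inputs, cluster, weights, cluster_weight, bias)
    return cluster, score
```

In such a sketch, `sub_inputs` would hold the EHR inputs of the subphenotyping submodel and `mort_inputs` the EHR inputs of the mortality submodel, each scaled consistently with the data used to fit the corresponding parameters.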
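The manner of combining the outputs of the two mortality submodels is not limited to any particular operation. As one hedged illustration, the Python sketch below combines scores by a weighted average; the averaging rule, the equal default weights, and the function name are assumptions made for this example only.

```python
def combine_mortality_scores(scores, weights=None):
    """Combine submodel mortality scores by weighted average (an assumed rule).

    With no weights supplied, every submodel contributes equally.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total
```

Because each input score lies between 0 and 1 and the weights are normalized, the combined score also lies between 0 and 1 when the weights are non-negative, so it can be compared against the same kind of threshold values as a single mortality score.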
Reference is made to
In various embodiments, a classifier 230 can include multiple subphenotyping submodels 240. For example, the classifier can include two subphenotyping submodels 240 as well as a mortality submodel 260 and mortality submodel 250. Such an example of a classifier 230 including two subphenotyping submodels 240, a mortality submodel 260, and a mortality submodel 250 is described below in relation to
In various embodiments, the first subphenotyping submodel and the second subphenotyping submodel receive the same EHR data as input. For example, the first subphenotyping submodel and the second subphenotyping submodel receive, as input, the following eight EHR data inputs: arterial pH-R, bicarbonate-L, creatinine-R, FiO2-R, heart rate-R, mean arterial pressure-R, respiratory rate-R, and PaO2-R. In various embodiments, the mortality submodel 260 receives as input the same eight EHR data inputs (e.g., arterial pH-R, bicarbonate-L, creatinine-R, FiO2-R, heart rate-R, mean arterial pressure-R, respiratory rate-R, and PaO2-R). Each of the outputs from the two subphenotyping submodels and the first mortality submodel (e.g., mortality submodel 260) is provided as input to a second mortality submodel (e.g., mortality submodel 250). In various embodiments, the mortality submodel 250 additionally receives as input the following nine EHR data inputs: bilirubin-H, age, gender, PaCO2-R, ratio of PaO2-R/FiO2-R, positive end-expiratory pressure-R, plateau pressure-R, tidal volume-R, and body mass index (BMI). Thus, the mortality submodel 250 receives a total of twelve inputs (i.e., 9 EHR data inputs and 3 inputs determined from other submodels). The mortality submodel 250 outputs a prediction, such as a mortality score that is informative for determining a classification of the subject.
As described herein, the model training module 150 as shown in
In various embodiments, patient subphenotype classifiers comprise a function and/or a plurality of parameters. The function captures the relationship between independent variables (e.g., EHR data) and dependent variables (e.g., a score or prediction) in the training dataset. The parameters modify the function, and are identified during training of the patient subphenotype classifier based on the training dataset. Generally, parameters of the patient subphenotype classifier are learned by a computer because it would be too difficult or too inefficient for the parameters to be identified by a human based on the training dataset due to the size and/or complexity of the training dataset. For example, if the patient subphenotype classifier is a K-means clustering model, the parameters of the patient subphenotype classifier can be the positions of cluster centroids and the observations assigned to each cluster.
The training dataset used to construct the patient subphenotype classifier can depend on the type of the patient subphenotype classifier. Generally, the training dataset comprises a plurality of training samples. Each training sample i from the training dataset is associated with a retrospective subject, and comprises EHR data for the retrospective subject. A retrospective subject is a subject for whom at least EHR data is known.
To train the patient subphenotype classifier, each training sample i from the training dataset is input into the patient subphenotype classifier. The patient subphenotype classifier processes these inputs as if the model were being routinely used to generate a prediction (e.g., a score). However, depending on the type of the patient subphenotype classifier, each training sample i of the training dataset may comprise additional components.
In embodiments in which the patient subphenotype classifier is learned via unsupervised learning, the patient subphenotype classifier is trained based on the basic training dataset described above. For example, in embodiments in which the patient subphenotype classifier is constructed via K-means clustering, an optimal number and configuration of clusters that both minimize differences between the training samples within each cluster, and maximize differences between the training samples between clusters, are determined. Specifically, in training the patient subphenotype classifier using K-means clustering, parameters θ that define the centroid of each cluster in the variable space of the patient subphenotype classifier are learned. Collectively, these parameters θ can mathematically modify the function to specify the dependence between independent variables (e.g., EHR data) and dependent variables (e.g., a prediction or score). The clinical significance of each cluster can be determined by examining the inputs to the patient subphenotype classifier that affect assignment of the inputs to clusters.
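Learning of the centroid parameters θ can be illustrated with a minimal Python sketch of Lloyd's algorithm, one common way of performing K-means clustering. The sketch assumes a fixed number of clusters k, random initialization from the training samples, and unscaled numeric features; selection of an optimal number of clusters is not shown. All names are illustrative and not part of this disclosure.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assign every point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, assignments) learned by Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct training samples.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        new_assignments = assign(points, centroids)
        if new_assignments == assignments:
            break  # assignments are stable, so the centroids have converged
        assignments = new_assignments
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments
```

After training, a new subject can be assigned to a cluster by calling `assign` with the learned centroids, which corresponds to the preliminary prediction described above.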
In embodiments in which the patient subphenotype classifier is learned via supervised learning, each training sample i from the training dataset further includes a retrospective classification (e.g., ARDS subphenotype classification) for the retrospective subject associated with the training sample. In other words, in embodiments in which the patient subphenotype classifier is learned via supervised learning, the patient subphenotype classifier is trained based in part on the known ARDS subphenotype classification of retrospective subjects associated with the training dataset.
In addition to training the patient subphenotype classifier to optimize a prediction of an ARDS subphenotype, in some embodiments, the patient subphenotype classifier can be trained to optimize other performance metrics. For example, the patient subphenotype classifier can also be trained to optimize fundamental predictive metrics, such as, for example, sensitivity and specificity of the prediction. Furthermore, the patient subphenotype classifier can be trained to optimize for any weighted combination of performance metrics.
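For illustration, sensitivity, specificity, and a weighted combination of the two can be computed as in the following Python sketch. The binary encoding of the classifications (1 for the positive class, 0 otherwise), the equal default weights, and the function names are assumptions made for this example.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary classifications."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def weighted_metric(sensitivity, specificity, w_sens=0.5, w_spec=0.5):
    """Weighted combination of the two metrics; the weights are hypothetical."""
    return (w_sens * sensitivity + w_spec * specificity) / (w_sens + w_spec)
```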
Turning back to training of the patient subphenotype classifier using retrospective medical outcomes, after each iteration of the patient subphenotype classifier using a training sample i in the training dataset, the difference between the prediction output by the model and the retrospective classification of the retrospective subject is determined. Specifically, in embodiments in which the patient subphenotype classifier is configured to determine an ARDS classification for a subject, the patient subphenotype classifier determines the difference between the classification output by the model and the known retrospective classification for the retrospective subject.
The patient subphenotype classifier seeks to improve its performance by reducing the difference between the classification predicted by the patient subphenotype classifier and the known retrospective classification. To reduce this difference, the patient subphenotype classifier can minimize or maximize a loss function for the patient subphenotype classifier. The loss function ℓ(u_{i∈S}, θ) represents discrepancies between values of dependent variables u_{i∈S} for one or more training samples i in the training data S (e.g., known, retrospective classification) and the corresponding predictions generated by the patient subphenotype classifier under the parameters θ. In simple terms, the loss function represents the difference between the classification predicted by the patient subphenotype classifier and the known, retrospective classification in the training dataset. There are a plurality of loss functions known to those skilled in the art, and any one of these loss functions can be utilized in generating the patient subphenotype classifier.
By minimizing or maximizing the loss function with respect to θ, values for a set of parameters θ can be determined. In some embodiments, the patient subphenotype classifier can be a parametric model in which the set of parameters θ mathematically modify the function to specify the dependence between independent variables (e.g., EHR data) and dependent variable (e.g., predicted classification). In other words, the set of parameters θ determined by minimizing or maximizing the loss function can be used to modify the function of the patient subphenotype classifier such that the outputted predicted classification is optimized. Typically, the parameters of parametric-type models that minimize or maximize the loss function are determined through gradient-based numerical optimization algorithms, such as batch gradient algorithms, stochastic gradient algorithms, and the like. Alternatively, the patient subphenotype classifier may be a non-parametric model in which the model structure is determined from the training dataset and is not strictly based on a fixed set of parameters.
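As a non-limiting sketch of gradient-based minimization of a loss function with respect to θ, the Python example below fits a logistic model by batch gradient descent on the average logistic (cross-entropy) loss. The choice of a logistic model, the particular loss, the learning rate, and the number of epochs are assumptions for illustration; any suitable parametric model and loss function could be substituted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Predicted probability of the positive class for feature vector x."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def log_loss(theta, samples, labels):
    """Average logistic loss between predictions and known 0/1 labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(samples, labels):
        p = predict(theta, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)


def fit(samples, labels, learning_rate=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns learned theta."""
    theta = [0.0] * len(samples[0])
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(samples, labels):
            error = predict(theta, x) - y  # derivative of the loss w.r.t. z
            for j, xi in enumerate(x):
                grad[j] += error * xi / n
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta
```

In this sketch, each sample carries a leading constant 1.0 so that the first element of θ acts as an intercept.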
In some embodiments, during training of the patient subphenotype classifier, one or more training samples i are automatically received at specified time intervals and the plurality of parameters of the patient subphenotype classifier are automatically identified using the received training samples i at specified time intervals, such that the patient subphenotype classifier is automatically updated at specified time intervals. In alternative embodiments, during training of the patient subphenotype classifier, one or more training samples i are automatically received in real time, in near real time, in delayed batches, or on demand, and the plurality of parameters are automatically identified in real time using the received training samples i, such that the patient subphenotype classifier is automatically updated in real time.
When the patient subphenotype classifier achieves a threshold level of prediction accuracy (e.g., when the predicted classifications determined by the model are sufficiently optimized), the patient subphenotype classifier is ready for use. To determine when the patient subphenotype classifier has achieved the threshold level of prediction accuracy sufficient for use, validation of the patient subphenotype classifier can be performed. Once the patient subphenotype classifier has been validated as having achieved the threshold level of prediction accuracy sufficient for use, in some embodiments, this does not preclude the model from continued training. In fact, in a preferred embodiment, despite validation, the patient subphenotype classifier continues to be automatically trained such that the set of parameters of the patient subphenotype classifier are automatically and continuously updated, such that the accuracy of the patient subphenotype classifier continues to improve.
Disclosed herein is the analysis of EHR data using patient subphenotype classifiers for predicting classifications for subjects. In various embodiments, EHR data can be collected and electronically recorded at any site prior to being provided as input into the patient subphenotype classifiers. In particular embodiments, the EHR data can be obtained from any private, public, and/or commercial source of EHR data. For example, the EHR data can be obtained from a private medical and/or health record and/or middleware system including a patient care center record system, a clinical laboratory record system, a research laboratory record system, such as EPIC®, Cerner®, Allscripts®, MedMined™, Beaker®, and Data Innovations®, and any alternative private medical and/or health record and/or middleware system. The EHR data can also be obtained from any publicly- and/or commercially-available source of EHR data, including published medical record databases and scientific publications such as PhysioNet datasets including the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) datasets, Philips eICU datasets, and National Heart, Lung, and Blood Institute Biospecimen and Data Repository Information Coordinating Center (BioLINCC) datasets. In various embodiments, the EHR data can include any of the ALVEOLI dataset, ARMA dataset, ARDSnet dataset, ARMA-KARMA-LARMA datasets, FACTT dataset, EDEN dataset, SAILS dataset, and ART dataset.
In certain embodiments, the EHR data received by the patient classifier system (e.g., patient classifier system 130 shown in
In various embodiments, the EHR data can be received from multiple, distinct third-party sources and therefore, the EHR data may be represented in multiple, distinct data formats in accordance with the different third-party sources. For instance, EHR data for different subjects can be organized within different structures. As an example, in some embodiments, EHR data can be organized in delimited flat files, structured documents (e.g., JSON formatted documents), or relational databases. Furthermore, the labeling of EHR data within these different structures can differ as well. For example, in a first structure, heart rate data may be labeled as “HR,” while in a second, different structure, heart rate data may be labeled as “heart rate,” while in yet a third, different structure, heart rate data may be labeled in code. Even further, EHR data can be stored in different units. For example, a first set of EHR data describing temperature may be recorded in Fahrenheit units, while a second set of EHR data describing temperature may be recorded in Celsius units. To render all of these distinct data formats compatible with one another such that the data can be merged to form a single dataset and can be input into the patient subphenotype classifier, the distinct data formats can be transformed into a common data format. In some embodiments, the distinct data formats can be transformed into a common data format using a publicly-available data transformation model such as, for example, the OMOP Common Data Model.
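A simple illustration of transforming differently labeled and differently scaled EHR data into a common format is sketched below in Python. The alias table, the record layout (a dictionary with "label", "value", and "unit" keys), and the function name are hypothetical; the sketch is not an implementation of the OMOP Common Data Model.

```python
# Hypothetical mapping from source-specific labels to one canonical label.
LABEL_ALIASES = {
    "hr": "heart_rate",
    "heart rate": "heart_rate",
    "heart_rate": "heart_rate",
    "temp": "temperature",
    "temperature": "temperature",
}


def harmonize(record):
    """Return a copy of `record` with a canonical label and Celsius temperatures."""
    label = LABEL_ALIASES.get(record["label"].strip().lower())
    if label is None:
        raise KeyError("unknown label: %r" % record["label"])
    value = float(record["value"])
    unit = record["unit"].strip().lower()
    if label == "temperature":
        if unit in ("f", "fahrenheit"):
            value = (value - 32.0) * 5.0 / 9.0  # Fahrenheit to Celsius
        elif unit not in ("c", "celsius"):
            raise ValueError("unsupported temperature unit: %r" % record["unit"])
        unit = "celsius"
    return {"label": label, "value": value, "unit": unit}
```

Records harmonized in this way share one label vocabulary and one unit per variable, so they can be merged into a single dataset before being input into the patient subphenotype classifier.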
In certain embodiments, prior to inputting the EHR data into the patient subphenotype classifier, the EHR data can be combined to create new EHR data. For example, the EHR data can be used to create new EHR data describing data trends over time. As another example, the EHR data can be used to create new EHR data comprising ratios or differences between different EHR data variables. In such embodiments, this new, combined EHR data can be input into the model.
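As an example of creating new EHR data from existing EHR data, the Python sketch below derives a ratio of two variables and a simple trend over time. The field names ("pao2", "fio2", "heart_rate_series"), the use of FiO2 expressed as a fraction, and the definition of trend as average change per measurement step are assumptions made for illustration.

```python
def ratio(numerator, denominator):
    """Ratio of two EHR variables, e.g. PaO2 over FiO2."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


def trend(values):
    """Average change per step between the first and last of a time-ordered series."""
    if len(values) < 2:
        raise ValueError("at least two measurements are required")
    return (values[-1] - values[0]) / (len(values) - 1)


def derive_features(record):
    """Return a copy of `record` extended with derived variables."""
    derived = dict(record)
    derived["pf_ratio"] = ratio(record["pao2"], record["fio2"])
    derived["heart_rate_trend"] = trend(record["heart_rate_series"])
    return derived
```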
In various embodiments, prior to inputting the EHR data into the patient subphenotype classifier, certain patients can be removed from analysis according to their EHR data. For example, in certain embodiments, the patient subphenotype classifier is only deployed to analyze a subset of ARDS patients. In various embodiments, a subset of ARDS patients are patients with any of mild, moderate, or severe ARDS. Patients with mild ARDS can be characterized by a P/F ratio between 200 and 300, where “P” refers to the partial pressure of oxygen (PaO2) and “F” refers to the fraction of inspired oxygen (FiO2). Patients with moderate ARDS can be characterized by a P/F ratio between 100 and 200. Patients with severe ARDS can be characterized by a P/F ratio less than 100. In various embodiments, patients with moderate to severe ARDS can be characterized by a P/F ratio ≤ 200. In various embodiments, patients with mild, moderate, or severe ARDS can be characterized by a P/F ratio ≤ 300. Thus, ARDS patients that are not included in the subset of ARDS patients are not analyzed.
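The P/F ratio ranges above can be expressed as a short Python sketch that labels severity and selects the subset of subjects to be analyzed. The handling of values falling exactly on a boundary (100 and 200 treated as moderate, 300 as mild) is an assumption made here to reconcile the stated ranges, and the function names are illustrative.

```python
def ards_severity(pf_ratio):
    """Return 'severe', 'moderate', 'mild', or None (P/F ratio above 300)."""
    if pf_ratio <= 0:
        raise ValueError("P/F ratio must be positive")
    if pf_ratio < 100:
        return "severe"
    if pf_ratio <= 200:
        return "moderate"
    if pf_ratio <= 300:
        return "mild"
    return None


def select_subjects(pf_by_subject, allowed=("moderate", "severe")):
    """Keep only subjects whose severity is in `allowed`; others are not analyzed."""
    return [
        subject_id
        for subject_id, pf in pf_by_subject.items()
        if ards_severity(pf) in allowed
    ]
```

With the default `allowed` values, the selection corresponds to the moderate to severe subset (P/F ratio ≤ 200); adding "mild" to `allowed` widens it to P/F ratio ≤ 300.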
In further embodiments, prior to inputting the EHR data into the patient subphenotype classifier, the EHR data is encoded. As one example, EHR data describing a heart rate of 60 beats/minute can be encoded in an array of bits as [111100]. As another example, EHR data can be encoded via K-means clustering. K-means clustering can serve to both de-identify subject EHR data, as well as to prevent effects of data-drift. For example, in a case in which EHR data describing mean and median subject body weight steadily increases, the EHR data can continuously undergo K-means clustering, and each identified cluster can be assigned a numeric index. Then, the actual subject body weight values are associated with the numeric indices, and can fluctuate over time and geography.
Step 330 involves selecting a treatment for the subject based on the ARDS classification. For example, one or more treatments can be selected for administration to the subject based on the ARDS classification. As another example, one or more treatments can be selected to be withheld from the subject based on the ARDS classification. Example treatments include neuromuscular blockade (NMB) treatments, Positive End-Expiratory Pressure (PEEP), corticosteroids (e.g., methylprednisolone or dexamethasone), lisofylline, ketoconazole, catheter and fluid treatment, recruitment maneuver, or statins. Guided therapy based on the ARDS classification is described in further detail herein.
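The bit-array encoding mentioned above can be sketched in Python as follows. The function name and the optional fixed width (left-padding with zeros so that all values share one array length) are assumptions added for illustration.

```python
def encode_bits(value, width=None):
    """Encode a non-negative integer as a list of bits, most significant first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bits = [int(b) for b in format(value, "b")]
    if width is not None:
        if len(bits) > width:
            raise ValueError("value does not fit in the requested width")
        bits = [0] * (width - len(bits)) + bits
    return bits
```

For a heart rate of 60 beats/minute, `encode_bits(60)` yields the six bits 1, 1, 1, 1, 0, 0, matching the example above.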
Disclosed herein are methods, non-transitory computer readable media, and systems for classifying subjects into different ARDS patient subphenotypes by implementing a patient subphenotype classifier. In various embodiments, the patient subphenotype classifier classifies a subject into one out of two possible ARDS subphenotypes. In various embodiments, the patient subphenotype classifier classifies a subject into one out of three possible ARDS subphenotypes. In various embodiments, the patient subphenotype classifier classifies a subject into one out of four possible ARDS subphenotypes. In various embodiments, the patient subphenotype classifier classifies a subject into one out of five possible ARDS subphenotypes. In various embodiments, the patient subphenotype classifier classifies a subject into one out of more than five possible ARDS subphenotypes.
In various embodiments, ARDS subphenotypes are associated with certain biological processes of ARDS. For example, an ARDS subphenotype can be associated with a particular inflammatory response. As another example, an ARDS subphenotype can be associated with a particular immune response.
In particular embodiments, an ARDS subphenotype for a subject, herein referred to as subphenotype A, corresponds to a hypoinflammatory state. In some scenarios, a hypoinflammatory ARDS subphenotype can be correlated with better outcomes (e.g., lower mortality). In particular embodiments, an ARDS subphenotype for a subject, herein referred to as subphenotype B, corresponds to a hyperinflammatory state. In some scenarios, a hyperinflammatory ARDS subphenotype can be correlated with worse outcomes (e.g., higher mortality).
In various embodiments, ARDS subphenotypes are associated with different patient outcomes. For example, an ARDS subphenotype can be associated with better outcomes and therefore, can be referred to as a lower risk group subphenotype. As another example, an ARDS subphenotype can be associated with intermediate outcomes and therefore, can be referred to as a medium risk group. As another example, an ARDS subphenotype can be associated with worse outcomes and therefore, can be referred to as a higher risk group.
In various embodiments, different ARDS subphenotypes can be characterized by differences in expression levels of one or more biomarkers. For example, if ARDS subphenotypes are associated with certain underlying biological processes of ARDS (e.g., inflammation or immune response), the ARDS subphenotypes can be further characterized by different expression levels in biomarkers associated with those biological processes. In various embodiments, the biomarkers can include one or more of intercellular adhesion molecule-1 (ICAM-1), interleukin-6 (IL-6), plasminogen activator inhibitor-1 (PAI-1), interleukin-8 (IL-8), interleukin-10 (IL-10), tumor necrosis factor receptor I (TNFR-I), tumor necrosis factor receptor II (TNFR-II), or von Willebrand factor (VW). In particular embodiments, an ARDS subphenotype associated with a hyperinflammatory state (e.g., subphenotype B) can be characterized by increased expression levels of inflammatory markers such as one or more of ICAM-1, IL-6, PAI-1, IL-8, IL-10, TNFR-I, TNFR-II, and VW. In particular embodiments, an ARDS subphenotype associated with a hypoinflammatory state (e.g., subphenotype A) can be characterized by decreased expression levels of inflammatory markers such as one or more of ICAM-1, IL-6, PAI-1, IL-8, IL-10, TNFR-I, TNFR-II, and VW.
Methods disclosed herein involve classifying a subject into one of two or more ARDS subphenotypes using a patient subphenotype classifier that analyzes EHR data of the subject. In various embodiments, the ARDS classification of the subject is useful for guiding a treatment selection for the subject. For example, the ARDS classification can be useful for selecting a treatment for providing to the subject. As another example, the ARDS classification can be useful for determining whether a treatment is to be withheld from a subject.
In various embodiments, the ARDS classification of the subject is useful for guiding an ARDS treatment for the subject, including any one of a neuromuscular blockade (NMB) therapy, positive end-expiratory pressure (PEEP) therapy, corticosteroid therapy (e.g., methylprednisolone or dexamethasone), lisofylline, ketoconazole, catheter and fluid treatment, recruitment maneuver, statins, and feeding/nutrition.
In particular embodiments, depending on the ARDS classification, the selected treatment is to administer NMB therapy. In particular embodiments, the selected treatment is to withhold NMB therapy. In particular embodiments, the selected treatment is to administer either high PEEP or low PEEP. In particular embodiments, the selected treatment is to only administer low PEEP. In particular embodiments, the selected treatment is to administer methylprednisolone. In particular embodiments, the selected treatment is to withhold methylprednisolone. In particular embodiments, the selected treatment is to administer dexamethasone. In particular embodiments, the selected treatment is to withhold dexamethasone. In particular embodiments, the selected treatment is to withhold lisofylline. In particular embodiments, the selected treatment is to administer lisofylline. In particular embodiments, the selected treatment is to administer ketoconazole. In particular embodiments, the selected treatment is to withhold ketoconazole. In particular embodiments, the selected treatment is to provide liberal or conservative fluid management. The liberal or conservative fluid management can be provided through either a pulmonary artery catheter (PAC) or central venous catheter (CVC) line. In particular embodiments, the selected treatment is to withhold a combination of PAC line and liberal fluid. In particular embodiments, the selected treatment is to provide recruitment maneuver. In particular embodiments, the selected treatment is to withhold recruitment maneuver. In particular embodiments, the selected treatment is to administer statins. In particular embodiments, the selected treatment is to administer statins at any time. In particular embodiments, the selected treatment is to administer statins as early as possible, even prior to ARDS diagnosis (if no contraindications). In particular embodiments, the selected treatment is to administer full feeding. 
In particular embodiments, the selected treatment is to administer full or enteral feeding.
Table 1 below shows particular guided therapies according to the patient subphenotypes of subphenotype A and subphenotype B in accordance with an embodiment.
The methods disclosed herein are, in some embodiments, performed on one or more computers or computer systems. For example, the training and implementation of a patient subphenotype classifier can be implemented in hardware or software, or a combination of both. In one embodiment of the invention, a machine-readable storage medium is provided, the medium comprising a data storage material encoded with machine readable data which, when using a machine programmed with instructions for using said data, is capable of displaying any of the datasets and execution and results of the models described herein. The invention can be implemented in computer programs executing on programmable computers, comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), a graphics adapter, a pointing device, a network adapter, at least one input device, and at least one output device. A display is coupled to the graphics adapter. Program code is applied to input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion. The computer can be, for example, a personal computer, microcomputer, or workstation of conventional design.
Each program can be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language can be a compiled or interpreted language. Each such computer program is preferably stored on a storage media or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The system can also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
The signature patterns and databases thereof can be provided in a variety of media to facilitate their use. “Media” refers to a manufacture that contains the signature pattern information of the present invention. The databases of the present invention can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising a recording of the present database information. “Recorded” refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure can be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc.
The storage device 408 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 406 holds instructions and data used by the processor 402. The input interface 414 is a touch-screen interface, a mouse, track ball, or other type of pointing device, a keyboard, or some combination thereof, and is used to input data into the computer 400. In some embodiments, the computer 400 may be configured to receive input (e.g., commands) from the input interface 414 via gestures from the user. The graphics adapter 412 displays images and other information on the display 418. The network adapter 416 couples the computer 400 to one or more computer networks.
The computer 400 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 408, loaded into the memory 406, and executed by the processor 402.
The types of computers 400 used by the entities of
In one aspect, the disclosure provides a method for determining a subphenotype classification of a subject exhibiting acute respiratory distress syndrome (ARDS). ARDS is respiratory failure with rapid onset of widespread inflammation in the lungs. ARDS is not triggered by a single pathology; it can be caused by sepsis, pneumonia, trauma, aspiration, pancreatitis, and/or other insults. A subject can be classified as subphenotype A or subphenotype B.
To classify a subject exhibiting ARDS as subphenotype A or subphenotype B, electronic health record (EHR) data is obtained for the subject. EHR data for a subject comprises an electronically-recorded set of medical and/or health information for the subject. EHR data can comprise any type of medical and/or health data for a subject, and can be collected by any means. For example, EHR data can be collected and electronically recorded at a patient care center (e.g., a physician’s office, the emergency department of a hospital, the intensive care unit of a hospital, the ward of a hospital), a clinical laboratory, a research laboratory, a remote consumer medical device, a therapeutic device (e.g., an infusion pump), a monitoring device such as a wearable device (e.g., a heart rate monitor), and any other site. EHR data can also be obtained from any private, public, and/or commercial source. In a preferred embodiment, the EHR data obtained for the subject comprises data that is routinely collected as standard-of-care for ARDS treatment. For instance, in a preferred embodiment, the EHR data obtained for the subject does not include data which must be measured outside of lab work and clinical data typically involved in standard-of-care for ARDS (e.g., with a dedicated blood test).
The EHR data for the subject is used by a patient subphenotype classifier to determine a subphenotype classification of the subject. In other words, based on the subject’s EHR data, a patient subphenotype classifier classifies the subject as subphenotype A or subphenotype B.
In alternative embodiments, rather than determining a classification of the subject exhibiting ARDS, the classification of the subject can be simply obtained. For example, in some embodiments, the classification of the subject can be pre-determined (e.g., already known).
In some embodiments, a mortality prognosis can be determined for the subject based at least in part on the classification of the subject as subphenotype A or subphenotype B. Specifically, in some embodiments, a subject classified as subphenotype B can be determined to have a mortality prognosis of high mortality risk, while a subject classified as subphenotype A can be determined to have a mortality prognosis of low mortality risk. In certain embodiments, low mortality risk can comprise at least one of reduced risk of hospital mortality, reduced risk of ICU mortality, reduced risk of 28-day mortality, reduced risk of 90-day mortality, reduced risk of 180-day mortality, and reduced risk of 6-month mortality relative to high mortality risk. In some further embodiments, low mortality risk can further comprise positive patient outcome, high mortality risk can further comprise negative patient outcome, and positive patient outcome can comprise at least one of shorter hospital length of stay, shorter ICU length of stay, and more ventilator-free days relative to negative patient outcome.
In some embodiments, a treatment recommendation can be determined for the subject based at least in part on the classification of the subject as subphenotype A or subphenotype B. Specifically, in some embodiments, the treatment recommendation for a subject classified as subphenotype B can be at least neuromuscular blockade (NMB) therapy, while the treatment recommendation for a subject classified as subphenotype A can be at least no NMB therapy. In certain embodiments, identifying the treatment recommendation for the subject can further include administering or having administered therapy to the subject based on the treatment recommendation.
In some embodiments, the patient subphenotype classifier can comprise one of a Model 1, a Model 2, a Model 3, a Model 4, a Model 5, or a Model 6. In embodiments in which the patient subphenotype classifier comprises the Model 1, the EHR data for the subject can include 13 input variables. In embodiments in which the patient subphenotype classifier comprises the Model 2, the EHR data for the subject can include 8 input variables. In embodiments in which the patient subphenotype classifier comprises the Model 3, the EHR data for the subject can include 17 input variables. In embodiments in which the patient subphenotype classifier comprises the Model 4, the EHR data for the subject can include 13 input variables. In embodiments in which the patient subphenotype classifier comprises the Model 5, the EHR data for the subject can include 9 input variables. In embodiments in which the patient subphenotype classifier comprises the Model 6, the EHR data for the subject can include 16 input variables.
In embodiments in which the patient subphenotype classifier comprises the Model 1, the EHR data for the subject can include the subject’s arterial pH, bicarbonate, creatinine, diastolic blood pressure (BP), FiO2, heart rate, highest mean arterial pressure, lowest mean arterial pressure, potassium, highest respiratory rate, lowest respiratory rate, oxygen saturation (SPO2), and systolic BP. More specifically, in some embodiments in which the patient subphenotype classifier comprises the Model 1, the EHR data for the subject can include the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent diastolic blood pressure (BP), most recent FiO2, most recent heart rate, highest mean arterial pressure, lowest mean arterial pressure, most recent potassium, highest respiratory rate, lowest respiratory rate, most recent SPO2, and most recent systolic BP.
In embodiments in which the patient subphenotype classifier comprises the Model 2, the EHR data for the subject can include the subject’s arterial pH, bicarbonate, creatinine, FiO2, heart rate, PaO2, mean arterial pressure, and respiratory rate. More specifically, in some embodiments in which the patient subphenotype classifier comprises the Model 2, the EHR data for the subject can include the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent FiO2, most recent heart rate, most recent PaO2, most recent mean arterial pressure, and most recent respiratory rate.
In embodiments in which the patient subphenotype classifier comprises the Model 3, the EHR data for the subject can include the subject’s age, arterial pH, bicarbonate, bilirubin, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PaO2, positive end-expiratory pressure (PEEP), platelet count, tidal volume, mean arterial pressure, and respiratory rate. More specifically, in some embodiments in which the patient subphenotype classifier comprises the Model 3, the EHR data for the subject can include the subject’s age, most recent arterial pH, lowest bicarbonate, highest bilirubin, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PaO2, most recent positive end-expiratory pressure (PEEP), lowest platelet count, lowest tidal volume, most recent mean arterial pressure, and most recent respiratory rate.
In embodiments in which the patient subphenotype classifier comprises the Model 4, the EHR data for the subject can include the subject’s arterial pH, bicarbonate, BMI, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PEEP, platelet count, mean arterial pressure, and respiratory rate. More specifically, in some embodiments in which the patient subphenotype classifier comprises the Model 4, the EHR data for the subject can include the subject’s most recent arterial pH, most recent bicarbonate, BMI, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PEEP, lowest platelet count, most recent mean arterial pressure, and most recent respiratory rate.
In embodiments in which the patient subphenotype classifier comprises the Model 5, the EHR data for the subject can include the subject’s arterial pH, bicarbonate, creatinine, FiO2, heart rate, PaO2, mean arterial pressure, bilirubin, and respiratory rate. More specifically, in some embodiments in which the patient subphenotype classifier comprises the Model 5, the EHR data for the subject can include the subject’s most recent arterial pH, lowest bicarbonate, most recent creatinine, most recent FiO2, most recent heart rate, most recent PaO2, most recent mean arterial pressure, highest bilirubin, and most recent respiratory rate.
In embodiments in which the patient subphenotype classifier comprises the Model 6, the EHR data for the subject can include the subject’s age, arterial pH, bicarbonate, bilirubin, creatinine, FiO2, gender, heart rate, PaCO2, PaO2/FiO2, PaO2, positive end-expiratory pressure (PEEP), platelet count, tidal volume, mean arterial pressure, and respiratory rate. More specifically, in some embodiments in which the patient subphenotype classifier comprises the Model 6, the EHR data for the subject can include the subject’s age, most recent arterial pH, lowest bicarbonate, highest bilirubin, most recent creatinine, most recent FiO2, gender, most recent heart rate, most recent PaCO2, lowest PaO2/FiO2 within 24 hours following ARDS diagnosis, most recent PaO2, most recent positive end-expiratory pressure (PEEP), lowest platelet count, lowest tidal volume, most recent mean arterial pressure, and most recent respiratory rate.
In embodiments in which the patient subphenotype classifier comprises the Model 1, the patient subphenotype classifier can have at least one of an area under receiver-operator curve (AUROC) of greater than or equal to 0.67 and an area under the precision-recall curve (AUPRC) of greater than or equal to 0.40.
In embodiments in which the patient subphenotype classifier comprises the Model 2, the patient subphenotype classifier can have at least one of an AUROC greater than or equal to 0.69 and an AUPRC greater than or equal to 0.42.
In embodiments in which the patient subphenotype classifier comprises the Model 3, the patient subphenotype classifier can have at least one of an AUROC greater than or equal to 0.71 and an AUPRC greater than or equal to 0.62.
In embodiments in which the patient subphenotype classifier comprises the Model 4, the patient subphenotype classifier can have at least one of an AUROC greater than or equal to 0.67 and an AUPRC greater than or equal to 0.46.
In some embodiments, the patient subphenotype classifier can comprise a machine-learned model. For example, in certain embodiments, the patient subphenotype classifier can comprise at least one of a k-means clustering classifier, a logistic regression classifier, a decision tree classifier, a random forest classifier, a gradient boosting classifier, a neural network, and any other machine-learned classifier trained to determine the classification of the subject based on the EHR data.
In various embodiments, the patient subphenotype classifier is an ensemble-based model comprising two or more machine learning models. In various embodiments, an output of a first of the two or more machine learning models is used as input to a second of the two or more machine learning models. In various embodiments, a first of the two or more machine learning models of the ensemble-based model is implemented responsive to determining that data elements of the first of the two or more machine learning models are available in the EHR data. In various embodiments, a second of the two or more machine learning models of the ensemble-based model is implemented responsive to: determining that data elements of a first of the two or more machine learning models is unavailable in the EHR data; and determining that data elements of the second of the two or more machine learning models are available in the EHR data. In various embodiments, the first of the two or more machine learning models comprises more features than the second of the two or more machine learning models.
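The availability-based selection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the ensemble pairs Model 5 (nine inputs) with Model 2 (eight inputs), and the dictionary keys and the function name `select_model` are hypothetical. The input lists themselves follow the Model 2 and Model 5 variables given above.

```python
# Input variables for Model 2 and Model 5 as listed in the text.
# The key names are hypothetical; a real EHR feed would use its own identifiers.
MODEL_2_INPUTS = [
    "arterial_ph", "bicarbonate", "creatinine", "fio2",
    "heart_rate", "pao2", "mean_arterial_pressure", "respiratory_rate",
]
MODEL_5_INPUTS = MODEL_2_INPUTS + ["bilirubin"]

# Assumption: the model with more features is tried first,
# and the smaller model serves as the fallback.
FALLBACK_ORDER = [("Model 5", MODEL_5_INPUTS), ("Model 2", MODEL_2_INPUTS)]


def select_model(ehr):
    """Return the name of the first model whose inputs are all present in `ehr`.

    `ehr` maps variable names to values; a missing key or a value of None
    counts as unavailable. Returns None if no model can be run.
    """
    for name, inputs in FALLBACK_ORDER:
        if all(ehr.get(variable) is not None for variable in inputs):
            return name
    return None
```

With a record that lacks bilirubin, `select_model` falls back to "Model 2"; with a record that lacks arterial pH, neither model can be run and the function returns None.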
In various embodiments, subphenotype A and subphenotype B are characterized by differences in expression levels in one or more biomarkers. In various embodiments, the one or more biomarkers comprise one or more of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, or von Willebrand factor. In various embodiments, the one or more biomarkers comprise each of PAI-1, IL-6, IL-8, IL-10, TNFR-I, TNFR-II, ICAM-1, and von Willebrand factor.
Any of the steps of the method described above may be performed by any party and/or at the direction of any party. For instance, in certain embodiments, the steps of the method described above can be performed at the direction of any third-party, such as a provider of the patient subphenotype classifier. In certain further embodiments, the steps of the method described above can have been previously performed at the direction of any third-party, such as a provider of the patient subphenotype classifier.
In another aspect, the disclosure provides a computer-implemented method, including any combination of the steps mentioned above.
In another aspect, the disclosure provides a non-transitory computer-readable storage medium storing computer program instructions that when executed by a computer processor, cause the computer processor to perform any combination of the steps mentioned above.
In another aspect, the disclosure provides a system that includes a storage memory and a processor communicatively coupled to the storage memory. The storage memory is configured to store the EHR data of the subject. The processor is configured to determine the classification of the subject based on the subject’s EHR data stored in the storage memory, as discussed above. In some embodiments, the processor can be further configured to identify the treatment recommendation for the subject based at least in part on the determined classification, as discussed above. In some additional embodiments, the processor can be further configured to identify the mortality prognosis for the subject based at least in part on the determined classification, as discussed above.
Any terms not directly defined herein shall be understood to have the meanings commonly associated with them as understood within the art of the invention. Certain terms are discussed herein to provide additional guidance to the practitioner in describing the compositions, devices, methods and the like of aspects of the invention, and how to make or use them. It will be appreciated that the same thing may be said in more than one way. Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein. No significance is to be placed upon whether or not a term is elaborated or discussed herein. Some synonyms or substitutable methods, materials and the like are provided. Recital of one or a few synonyms or equivalents does not exclude use of other synonyms or equivalents, unless it is explicitly stated. Use of examples, including examples of terms, is for illustrative purposes only and does not limit the scope and meaning of the aspects of the invention herein.
It must be noted that, as used in the specification, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.
All references, issued patents and patent applications cited within the body of the specification are hereby incorporated by reference in their entirety, for all purposes.
The foregoing description of the embodiments of the disclosure has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
Some portions of this description describe the embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like.
Any of the steps, operations, or processes described herein can be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product including a computer-readable non-transitory medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
Embodiments may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Embodiments of the disclosure may also relate to a product that is produced by a computing process described herein. Such a product may include information resulting from a computing process, where the information is stored on a non-transitory, tangible computer-readable storage medium and may include any embodiment of a computer program product or other data combination described herein.
Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the disclosure be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the disclosure is intended to be illustrative, but not limiting, of the scope of the disclosure.
Acute Respiratory Distress Syndrome (ARDS) is respiratory failure with rapid onset of widespread inflammation in the lungs. ARDS is not triggered by a single pathology; it can be caused by sepsis, pneumonia, trauma, aspiration, pancreatitis, and/or other insults. Based on the hypothesis that the evaluation of ARDS subphenotypes may allow for identifying subgroups that are more homogeneous with respect to pathogenesis, and that this could potentially provide insights into patient outcomes, multiple machine learning-derived electronic health record (EHR)-based classifiers (i.e., “Models”) were developed that are capable of classifying patients into ARDS subphenotypes.
Via post-hoc analysis of the ARDSnet ALVEOLI (available at the URL: https://biolincc.nhlbi.nih.gov/studies/alveoli/), ARMA-KARMA-LARMA (available at the URL https://biolincc.nhlbi.nih.gov/studies/ardsnet/), and FACTT (available at the URL https://biolincc.nhlbi.nih.gov/studies/factt/) datasets, the eICU dataset (available at the URL: eicu-crd.mit.edu/about/eicu/), the Brazilian ART dataset (available at the URL: www.ncbi.nlm.nih.gov/pubmed/28973363), and privately-provided data from the Cleveland Clinic, these Models are able to elucidate differential mortality rates in ARDS patients. Models were created using K-means clustering, with each model resulting in 2 clusters. One cluster showed a group of patients with worse sickness and worse outcomes, including higher mortality (i.e., “subphenotype B”) while the second cluster showed a distinctly separate pattern of less severe sickness and generally better outcomes, including lower mortality (i.e., “subphenotype A”). In the Model utilizing the minimal amount of EHR data (Model 2), mortality rates were significantly different, at 20.75% and 35.57% in subphenotype A and subphenotype B, respectively (binomial p-value: 1.0e-08), in a mixed training set from the three ARDSnet datasets. In the holdout dataset from the same three ARDSnet datasets, mortality rates were 23.43% and 38.57% in subphenotype A and subphenotype B, respectively (binomial p-value: 3.6e-03). Similar significant differences in mortality were seen in eICU and ART datasets.
Current standard practice dictates that a patient should receive neuromuscular blockade (NMB) therapy if they have a P/F ratio < 150 and FiO2 > 0.6. Across three datasets with NMB information available, mortality rates were 31% for patients whose treatment followed that protocol, and 29% in patients where the protocol was not followed. Patient classification is proposed herein as a new treatment guidance, wherein patients assigned to subphenotype B should receive NMB and patients assigned to subphenotype A should not. Using those guidelines, mortality was significantly reduced when the protocol was followed (28% and 36% in subphenotype B and subphenotype A, respectively (p = 0.002957)).
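The two decision rules contrasted above can be written side by side. This is a sketch for illustration only: the function names are hypothetical, the thresholds are the ones stated in the text, and the subphenotype label is assumed to come from one of the Models.

```python
def standard_practice_nmb(pf_ratio, fio2):
    """Current-practice rule from the text: NMB if P/F ratio < 150 and FiO2 > 0.6."""
    return pf_ratio < 150 and fio2 > 0.6


def subphenotype_guided_nmb(subphenotype):
    """Proposed rule from the text: NMB for subphenotype B, no NMB for subphenotype A."""
    if subphenotype not in ("A", "B"):
        raise ValueError("subphenotype must be 'A' or 'B'")
    return subphenotype == "B"
```

A patient with a P/F ratio of 120 on FiO2 0.7 would receive NMB under the first rule regardless of subphenotype, whereas the second rule depends only on the classifier output.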
Overall, this work demonstrates the potential of employing an EHR-based subphenotyping classifier to identify subgroups of patients with varying mortality using readily available data. Patient subphenotype information can be combined with treatment and outcome information to identify populations of patients who have differential responses to therapy and ultimately improve treatment guidance and patient outcomes.
Briefly, patients are flagged for ARDS classification by one or more of Models 1-6 (e.g., patients eligible for ARDS classification by one or more of Models 1-6 are identified), and then a call of the one or more Models is made for that patient at a specific time for subphenotyping. This can be accomplished via batch integration or real-time integration. Batch integration includes collecting a batch of patients for which to run the one or more Models. Real-time integration includes continuously identifying patients for which to run the one or more Models. Batch integration can be done manually or can be automated.
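The difference between the two integration modes can be sketched in a few lines. The names `run_batch`, `run_realtime`, `ards_flag`, and `patient_id` are hypothetical, and `classify` stands in for a call to one of the Models; none of these come from the disclosure.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Patient = Dict[str, object]
Classifier = Callable[[Patient], str]


def run_batch(patients: Iterable[Patient], classify: Classifier) -> List[Tuple[object, str]]:
    """Batch integration: collect the flagged patients first, then classify them together."""
    batch = [p for p in patients if p.get("ards_flag")]
    return [(p["patient_id"], classify(p)) for p in batch]


def run_realtime(stream: Iterable[Patient], classify: Classifier) -> Iterator[Tuple[object, str]]:
    """Real-time integration: classify each flagged patient as soon as it arrives."""
    for p in stream:
        if p.get("ards_flag"):
            yield p["patient_id"], classify(p)
```

Both functions produce the same (patient, subphenotype) pairs for the same input; the batch version returns them all at once, while the real-time version yields each result as its patient is seen.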
Furthermore, the following describes one embodiment of an example of classification of a patient via real-time integration of one or more of the Models 1-6:
The following describes an example of classification of patients via batch integration of one or more of the Models 1-6:
The following describes an example of prognostic classification of a patient by one or more of the Models 1-6:
The following describes an example of predictive (therapy guidance) classification of a patient by one or more of the Models 1-6:
This Example describes the science and techniques behind the construction of Models that are derived using machine learning and used to assign ARDS patients into subphenotypes for various purposes such as predicting mortality and guiding clinical therapy. Multiple cohort datasets with different survival rates were analyzed to evaluate the effectiveness of the methodology on different patient cohorts.
Preliminary models were developed with publicly available data from the NHLBI ARDS Network (available at the URL: www.ardsnet.org/). Specifically, the ARMA-KARMA-LARMA, ALVEOLI, and FACTT datasets were used. Potential Model inputs were collated into a single file with 2,023 subjects. A randomization algorithm was used to split the combined dataset into 64% train, 16% test, and 20% hold-out validation samples.
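A random 64/16/20 split of this kind can be sketched as follows. The randomization algorithm actually used is not specified in the text, so the shuffle-and-slice approach, the fixed seed, and the rounding of the partition sizes are all assumptions made for illustration.

```python
import random


def split_subjects(subject_ids, seed=0):
    """Shuffle subject IDs and split them into 64% train, 16% test, 20% hold-out.

    Partition sizes are rounded to whole subjects; the hold-out set receives
    whatever remains so that every subject is assigned exactly once.
    """
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(0.64 * len(ids))
    n_test = round(0.16 * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    holdout = ids[n_train + n_test:]
    return train, test, holdout
```

Calling `split_subjects(range(2023))` returns three disjoint lists that together cover all 2,023 subjects.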
After models were developed on the ARDSnet data, the eICU-CRD dataset (available at the URL: eicu-crd.mit.edu/about/eicu/) was queried to provide an independent dataset for validation. Patients included were those who had a diagnosis of ARDS during their ICU stay, regardless of admitting diagnoses, with non-APACHE labs and vitals sourced from the 24 hours prior to the time their ARDS diagnosis was charted in the ICU (n = 2094 patients with full data).
Additional validation data was sourced from the Brazilian ART dataset (available at the URL: www.ncbi.nlm.nih.gov/pubmed/28973363). Finally, validation data was sourced from internal Cleveland Clinic data.
Commonly recorded EHR vitals, laboratory results, and ventilator information were collated into a dataset with common variable names across all datasets. Variables of interest included Arterial pH, bicarbonate, bilirubin, creatinine, systolic, diastolic, and mean arterial pressure, FiO2, heart rate, mean airway pressure, PaCO2, PaO2, PaO2/FiO2, PEEP, platelets, potassium, respiratory rate, SpO2, and tidal volume. If continuous data were available, the lowest and highest values prior to study enrollment (or diagnosis time in the eICU dataset) were recorded, using L as a postscript for lowest and H as a postscript for highest, as well as the most recent value (postscript of R). For PaO2/FiO2, the lowest value in the 24 hours following enrollment or diagnosis was also recorded (postscript of LP). Age, gender, and BMI were also recorded.
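The L, H, R, and LP postscripts described above can be derived from a time series of measurements roughly as follows. This is a sketch under stated assumptions: times are expressed in hours relative to an arbitrary origin, "prior to" is taken to mean strictly before the enrollment or diagnosis time, the 24-hour window includes both of its endpoints, and the function names are hypothetical.

```python
def summarize_prior(measurements, cutoff):
    """Return lowest (L), highest (H), and most recent (R) values before `cutoff`.

    `measurements` is a list of (time_in_hours, value) pairs.
    Returns None when no measurement precedes the cutoff.
    """
    prior = sorted(
        ((t, v) for t, v in measurements if t < cutoff),
        key=lambda pair: pair[0],
    )
    if not prior:
        return None
    values = [v for _, v in prior]
    return {"L": min(values), "H": max(values), "R": prior[-1][1]}


def lowest_post(measurements, cutoff, window_hours=24):
    """Return the lowest value (LP) in the window following `cutoff`, or None."""
    post = [v for t, v in measurements if cutoff <= t <= cutoff + window_hours]
    return min(post) if post else None
```

For heart-rate readings of 90, 110, and 100 taken 5, 2, and 1 hours before enrollment, `summarize_prior` gives L = 90, H = 110, and R = 100.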
As proof of concept, an initial K-means clustering Model was developed in Alteryx (Irvine, CA). Additionally, a Python version was created to enable clinical utilization across numerous operating systems without need for specialized software. ARDSnet flat files prepared as described above were read into Python for Model development. Patients were excluded from the dataset if they did not have measurements for all of the input variables, which reduced the total data available based on the model implemented.
Scikit-learn’s (Pedregosa, et al., 2011) StandardScaler (available at the URL: scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) was used to develop a z-score transform for each input variable based on the training data, and that scaler was then applied to both training and validation data. The scikit-learn KMeans algorithm was next used (available at the URL: scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) to train 2 clusters with 20 initial seeds. After experimentation and examination of contributions to principal components of the data, six Models were developed. The six Models were optimized based on different clinical needs as described in Table 2 below. Each resultant cluster was assigned to an ARDS subphenotype (subphenotype A and subphenotype B).
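The scaling and clustering steps can be sketched with scikit-learn as follows. The data here are synthetic and serve only to make the example runnable; reading "20 initial seeds" as `n_init=20` is an interpretation, and the `random_state` value is an arbitrary choice for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for EHR input variables (rows = patients, columns = variables).
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(6, 1, (100, 3))])
valid = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(6, 1, (20, 3))])

# Exclude patients lacking a measurement for any input variable.
train = train[~np.isnan(train).any(axis=1)]
valid = valid[~np.isnan(valid).any(axis=1)]

# Fit the z-score transform on training data only, then reuse it for validation data.
scaler = StandardScaler().fit(train)
train_z = scaler.transform(train)
valid_z = scaler.transform(valid)

# Train 2 clusters; n_init=20 runs K-means from 20 different centroid seeds
# and keeps the solution with the lowest inertia.
kmeans = KMeans(n_clusters=2, n_init=20, random_state=0).fit(train_z)

# Assign validation patients to the clusters learned from the training data.
valid_labels = kmeans.predict(valid_z)
```

The cluster indices returned by K-means are arbitrary, so a separate step is still needed to decide which cluster corresponds to subphenotype A and which to subphenotype B.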
While Models 1-6 were developed based on the number of input variables and the specific list of input variables provided above in Table 2, in further embodiments, additional Models are developed to include alternative numbers of input variables and alternative combinations of input variables. Specifically, additional Models are developed to include any alternative combination of the input variables listed in Table 2 above. Even further, additional Models are developed to include any alternative combination of variables, not limited to the input variables listed in Table 2 above.
Following assignment of each cluster as a subphenotype, post-hoc analysis was performed to identify differential response to therapy in various datasets. Mortality rates were compared using Chi-Square for large sample size groups, while Fisher exact test was used to compare rates in small sample-size groups. T-tests were used to compare means of numeric values.
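By way of non-limiting illustration, the comparison of two mortality rates by either a Chi-Square test or a Fisher exact test may be sketched as follows. The rule used here to choose between the tests (any expected cell count below 5 selects the Fisher exact test) is a common convention and an assumption, since the text does not state the cutoff that was applied; the Chi-Square statistic is computed without continuity correction, which is likewise an assumption.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def compare_mortality(deaths_1, n_1, deaths_2, n_2, min_expected=5.0):
    """Pick the test by smallest expected cell count (assumed convention)."""
    a, b = deaths_1, n_1 - deaths_1
    c, d = deaths_2, n_2 - deaths_2
    n = n_1 + n_2
    expected = [row * col / n for row in (n_1, n_2) for col in (a + c, b + d)]
    if min(expected) < min_expected:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)[1]

# Hypothetical counts: a large comparison and a small one.
large = compare_mortality(116, 393, 223, 2165)
small = compare_mortality(3, 4, 1, 4)
```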
Following Model development, the 28 day and 90 day mortality rates were calculated for each subphenotype, dataset, and Model combination. Mortality rates for subphenotype A and subphenotype B for each of Models 1-4 are shown below in Table 3. The ARDSnet datasets are split to show separate results for training versus validation. Model 1 only shows results for the ARDSnet and eICU datasets because some of the input variables were not available in the ART and Cleveland Clinic datasets. Models 2-4 were developed specifically to include input variables which were available in each validation dataset.
ST A mortality    ST B mortality    p-value
19.5              36.0              0.000
24.4              38.1              0.009
11.4              30.7              0.000
20.8              35.6              0.000
29.8              47.8              0.000
23.4              38.6              0.004
50.0              54.3
37.0              46.0
15.3              37.4              0.000
23.2              33.8              0.001
29.4              46.3
21.1              37.2              0.012
50.7              53.7
39.4              45.6
19.5              53.7              0.000
22.8              33.9              0.000
24.4              43.0              0.031
21.8              38.8              0.002
55.0              53.0
47.7              43.5
16.7              44.9              0.000
As shown in Table 3, the ARDSnet training and validation datasets and eICU dataset have a significant mortality difference across subphenotypes for each Model created. The ART dataset shows significant difference in patient prognosis for Models 2 and 4, and a p value nearing significance (p = 0.06) for Model 3.
For Models 2, 3, and 4, the Cleveland Clinic dataset did not show a significant difference in mortality (p = 0.43, 0.54, and 0.70 respectively). Upon further consultation with their clinical staff, it was determined that their data included a patient cohort which was significantly sicker than patients in the other datasets. To align Cleveland Clinic data to be more similar to the other data sources, a subset of data “Cleveland - w/o Comorbidities” was created with the following exclusion criteria:
The resultant Cleveland Clinic subset resulted in an improved difference in mortality between subphenotypes A and B.
Based on the availability of data for future studies, Model 2 was selected for future work. Model 2 provides significant differential mortality between subphenotype A and subphenotype B, and a minimal number of input variables which are likely to be collected and stored in the EHR for nearly all patients undergoing ARDS therapy. Likewise the input variables collected are likely to be included in any clinical trials being analyzed. A detailed comparison of patient characteristics by subphenotype for each of the eight input variables of Model 2 is shown below in Table 4A and 4B and Tables 5-8. Generally, subphenotype B patients tend to be sicker than subphenotype A patients. Table 9 below summarizes additional outcomes across each dataset beyond the single mortality rate shown above using Model 2.
Note: Subphenotypes were assigned to 3,259 patient stays in eICU. Of the 3,259 patients, 2,623 (80.48%) had a ‘Full therapy’ care directive during their stay, 305 (9.36%) had a ‘Do not resuscitate’ directive, 87 had no recorded care directive, and the remaining 244 had a care directive less than full therapy, or a combination of directives over their stay. Of the patients with ‘Full therapy’ as the only directive during their stay, mortality was 29.5% in Subphenotype B (116/393) and 10.3% in Subphenotype A (223/2165) (p < 0.0000).
In almost every mortality metric (ICU, hospital, 28 day, 90 day, and 6 month mortality), subphenotype B had a significantly higher mortality rate. Similarly, in the eICU dataset, subphenotype B patients also had a significantly higher predicted mortality risk. In addition to a lower mortality rate, patients in subphenotype A had significantly more ventilator free days in all datasets except eICU, which had a lower-acuity patient demographic, and ART. The ART analysis does not take the recruitment maneuvers of the study intervention into account. Patients in the Cleveland Clinic dataset did not have a significant difference in ICU or hospital LOS. However, eICU subphenotype A patients had significantly longer LOS for both metrics, even though patients in subphenotype B had significantly higher predicted ICU and hospital LOS.
Table 10 below compares subphenotype A and subphenotype B mortalities from Model 2 with the mortality of the APACHE III and SOFA cutoffs using the metrics of true positives (TP), false positives (FP), false negatives (FN), true negatives (TN), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1, which provides a balanced metric of sensitivity and PPV. The F1 values of Model 2 did not achieve the F1 of APACHE and SOFA. However, the number of input variables of Model 2 is lower and, in the case of APACHE, does not rely upon prior knowledge of a patient’s existing comorbidities.
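By way of non-limiting illustration, the metrics named above may be computed from the four confusion-matrix counts as follows. The counts in the example are hypothetical and are not the values of Table 10; subphenotype B is treated as the predicted-positive (predicted death) class for purposes of the sketch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # deaths correctly flagged
    specificity = tn / (tn + fp)   # survivors correctly cleared
    ppv = tp / (tp + fp)           # flagged patients who died
    npv = tn / (tn + fn)           # cleared patients who survived
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=116, fp=277, fn=223, tn=1942)
```

Because F1 depends only on TP, FP, and FN, a classifier can have high specificity and NPV yet a modest F1 when deaths are the minority class.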
Furthermore, Model 2 appears to provide information which supplements the APACHE and SOFA scores. A new variable was created which concatenates each of the Model 2 subphenotype A and subphenotype B scores with each of the APACHE scores and SOFA scores. Table 11 below shows differential mortality when each of the subphenotype A and subphenotype B scores from Model 2 were combined with the APACHE cutoff scores. This technique adds an additional level of separation in identifying patient risk. Of note, the lowest mortality is typically seen when subphenotype A scores are combined with the low-risk mortality APACHE scores (i.e., “ST A AP0”).
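By way of non-limiting illustration, the concatenated variable and the resulting per-group mortality may be sketched as follows. The label "ST A AP0" follows the text; the use of "AP1" for scores at or above the cutoff, the treatment of the cutoff as inclusive, and the patient records are assumptions made only for this example.

```python
from collections import defaultdict

def composite_label(subphenotype, apache_at_or_above_cutoff):
    """Concatenate subphenotype and APACHE cutoff flag, e.g. 'ST A AP0'."""
    return f"ST {subphenotype} AP{int(apache_at_or_above_cutoff)}"

def mortality_by_group(patients):
    """Mortality rate for each composite label.

    `patients` is an iterable of (subphenotype, above_cutoff, died) tuples.
    """
    tally = defaultdict(lambda: [0, 0])  # label -> [deaths, total]
    for subphenotype, above_cutoff, died in patients:
        label = composite_label(subphenotype, above_cutoff)
        tally[label][0] += int(died)
        tally[label][1] += 1
    return {label: deaths / total for label, (deaths, total) in tally.items()}

# Hypothetical patients: (subphenotype, APACHE >= cutoff, died).
patients = [
    ("A", False, False), ("A", False, False), ("A", False, True), ("A", False, False),
    ("A", True, True), ("A", True, False),
    ("B", False, False), ("B", False, True),
    ("B", True, True), ("B", True, True), ("B", True, False),
]
rates = mortality_by_group(patients)
```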
Similar results in differential mortality when each of the subphenotype A and subphenotype B scores from Model 2 were combined with the SOFA cutoff scores are shown in Table 12 below for the Cleveland Clinic full dataset (i.e., “CC-All”) and for the Cleveland Clinic with comorbidities removed dataset (i.e., “CC- w/o comorbid”). In this case, subphenotype A cases above the SOFA cutoff score have the highest mortality rate.
Data provided by the Cleveland Clinic identified six potential adjuvant interventions for ARDS patients. Current guidance from the Cleveland Clinic dictates that an ARDS patient is eligible for the first two adjunctive ARDS therapies of proning and NMB within 48 hours of diagnosis if their P/F ratio < 150 and FiO2 > 0.6. Based on the availability of data (228 patients receiving NMB and 76 patients receiving proning), NMB was identified as a first target for differential analysis within subphenotypes A and B of Model 2.
Previous studies have shown conflicting results about the benefits of NMB early in ARDS therapy (ROSE study, PETAL clinical trials network, 2019; ACURASYS study, Papazian, L., available at URL: www.nejm.org/doi/full/10.1056/NEJMoa1005372, 2010). The ROSE study was a US-based study of NMB with sedation. Raw 90-day in-hospital mortality in the NMB intervention group was 42.5% compared with 42.8% in the control group. There were no differences in the additional endpoints measured, and the study was concluded early due to futility. The ACURASYS study showed that patients who received NMB early in their ARDS treatment had significantly lower mortality after adjusting for baseline PaO2/FiO2 and Simplified Acute Physiology II score. Raw mortality rates were 31.6% in the group receiving NMB and 40.7% in the placebo group. Because of the conflicting results and varying methodologies of the studies, there is no international consensus on use of NMB in ARDS.
Confusion matrices were created to understand the impact of giving NMB versus not giving NMB when a patient either qualified or did not qualify for NMB using the Cleveland Clinic Protocol. Sample sizes in the Cleveland Clinic dataset alone were small, so the additional datasets were queried. ARMA-KARMA-LARMA and ALVEOLI provided relatively large sample sizes with a good mix of treatment and non-treatment. FACTT did not include data on NMB utilization. eICU had a large sample size, but the total number of patients receiving NMB was small. The ART dataset was excluded from this analysis for several reasons. First, in the ART arm of the ART dataset, almost every patient received NMB as part of their recruitment maneuver. Second, within the ARDSnet control arm, there was still a very high mortality rate, with outcomes not aligned with the other studies.
The data in Tables 13 and 14 suggest that patients in subphenotype B may benefit from (or at least not be harmed by) NMB regardless of whether they meet the eligibility criteria defined by PaO2/FiO2 and FiO2. Conversely, it appears that patients in subphenotype A are harmed by NMB, regardless of their PaO2/FiO2 and FiO2.
49%
54%
74%
56%
45%
53%
67%
58%
42%
56%
100%
42%
46%
50%
37%
46%
64%
45%
30%
47%
56%
51%
9%
47%
100%
17%
38%
47%
Overall Mortality               15%   37%
Regardless of Eligibility       44%   78%   17%   38%
Eligible for prone/NMB          43%   78%   26%   48%
Not Eligible for Prone/NMB      50%   11%   26%
Overall Mortality               24%   36%
Regardless of Eligibility       36%   41%   20%   32%
                                33%   43%   25%   32%
                                35%   25%   19%   30%
Overall Mortality               17%   39%
Regardless of Eligibility       25%   52%   15%   31%
Eligible for prone/NMB          21%   56%   16%   30%
Not Eligible for Prone/NMB      28%   25%   16%   33%
Based on those observations, the hypothesis is that a protocol for NMB administration in which NMB is administered if a patient is in subphenotype B and is not administered if a patient is in subphenotype A (i.e., “Protocol 1”) will outperform an NMB protocol in which a patient receives NMB if their PaO2/FiO2 < 150 and FiO2 > 0.6 (i.e., “Protocol 2”).
Table 15 below depicts the hypothetical NMB Protocol 2, in which an ARDS patient receives NMB therapy if the patient’s PaO2/FiO2 < 150 and FiO2 > 0.6, according to the Cleveland Clinic protocol. A patient was classified as “Protocol Followed” if they met the Cleveland Clinic protocol and received NMB, or if they did not meet the Cleveland Clinic protocol and did not receive NMB. Patients classified as “Protocol Not Followed” were those who met the Cleveland Clinic protocol and did not receive NMB, or did not meet the Cleveland Clinic protocol but received NMB anyway.
Table 16 below depicts the hypothetical NMB Protocol 1, in which an ARDS patient classified as subphenotype B by Model 2 receives NMB therapy and in which an ARDS patient classified as subphenotype A by Model 2 does not receive NMB therapy. A patient was classified as ‘Protocol Followed’ if they were classified as subphenotype B by Model 2 and received NMB, or if they were classified as subphenotype A by Model 2 and did not receive NMB. Patients classified as “Protocol Not Followed” were those who were classified as subphenotype B by Model 2 and did not receive NMB, or were classified as subphenotype A by Model 2 but received NMB anyway.
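By way of non-limiting illustration, the "Protocol Followed" and "Protocol Not Followed" classifications for the two protocols may be sketched as follows. Protocol 2 is coded using the Cleveland Clinic eligibility rule stated earlier (P/F ratio < 150 and FiO2 > 0.6); the function names and the example patient are hypothetical.

```python
def protocol_1_indicates_nmb(subphenotype):
    """Protocol 1: give NMB to subphenotype B, withhold from subphenotype A."""
    return subphenotype == "B"

def protocol_2_indicates_nmb(pf_ratio, fio2):
    """Protocol 2: give NMB when PaO2/FiO2 < 150 and FiO2 > 0.6."""
    return pf_ratio < 150 and fio2 > 0.6

def adherence(indicated, received_nmb):
    """A protocol is followed when treatment given matches treatment indicated."""
    if indicated == received_nmb:
        return "Protocol Followed"
    return "Protocol Not Followed"

# Hypothetical patient: subphenotype A, severe hypoxemia, received NMB.
patient = {"subphenotype": "A", "pf_ratio": 120.0, "fio2": 0.8, "received_nmb": True}

p1 = adherence(protocol_1_indicates_nmb(patient["subphenotype"]),
               patient["received_nmb"])
p2 = adherence(protocol_2_indicates_nmb(patient["pf_ratio"], patient["fio2"]),
               patient["received_nmb"])
```

In this example the same treatment decision counts as following Protocol 2 but not Protocol 1, which is the kind of disagreement the comparison of Tables 15 and 16 depends on.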
Table 15 shows that the overall mortality rate across the Cleveland, ARMA, and ALVEOLI datasets was higher among patients whose care followed Protocol 2 (i.e., the Cleveland Clinic protocol) than it was for patients who were not treated according to Protocol 2 (i.e., the Cleveland Clinic protocol). Following Protocol 2 did not result in a significant difference in mortality (p = 0.3474). In contrast, Table 16 shows that using Protocol 1 (i.e., subphenotyping using Model 2), each dataset showed reduced mortality. While a significant mortality reduction was not identified for any individual dataset, the combination of data from each of the three datasets did show a significant reduction in mortality using Protocol 1 (p = 0.002957).
Additional outcomes are shown in Tables 17 and 18 below for both Protocols 1 and 2. Subphenotype A patients who did not receive NMB had more ventilator free days across all datasets. While subphenotype B patients who received NMB benefited from lower mortality rates, they did not see a reduction in ventilator free days. In the 90 day survival rates, patients in subphenotype A who received NMB had significantly lower survival than the other treatment groups, followed by patients in subphenotype B who did not receive NMB. Similar relationships are seen for Protocol 2. However, the relationships for Protocol 2 are not as strong.
Unlike supervised learning, which requires data to be labeled with patient outcomes, unsupervised learning draws inferences from the data without awareness of associated patient outcomes. By using K-means clustering analysis as an unsupervised learning approach, this methodology elucidated hidden patterns in ARDS patients. Two ARDS subphenotypes, subphenotype B (high-mortality) and subphenotype A (low-mortality), were consistently observed by applying K-means clustering to clinical trial and clinical practice data. Comparison of the physiological characteristics of the two subphenotypes shows distinct characteristics between subphenotypes, indicating potential for guided treatment.
The identified subphenotypes were analyzed to identify differential responses to treatment. A potential explanation for the differences in patient outcomes between subphenotypes is that patients in one group are more likely to experience micro-asynchrony. Another potential explanation for the differences in patient outcomes between subphenotypes is that subphenotype B patients are inflamed whereas subphenotype A patients are not inflamed. NMBs have an anti-inflammatory effect. Reducing inflammation in subphenotype B patients may block an immune over-response, whereas patients in subphenotype A may experience normal immune response and the anti-inflammatory effect of the NMBs stops their functioning immune system from doing its job. Another potential explanation for the differences in patient outcomes between subphenotypes is that patients in subphenotype B have additional underlying comorbidities that make it harder to wean them from NMB and ventilator use.
The methods disclosed herein are intended to be used by healthcare professionals to determine a prognostic mortality risk associated with ARDS. They are intended for use on patients having or suspected of having ARDS. The result of the ARDS prognostic tool is intended to be used in conjunction with other clinical assessments by healthcare professionals to assist with triage and/or prioritization of critically ill patients. The ARDS therapy guidance tool is machine learning software that analyzes data from the EHR and is intended to be used by healthcare professionals as an aid in assessing patients for whom treatment with NMB agents is being considered.
Using the same datasets and Model input variables outlined above in Example 1, rather than using a K-means clustering Model, binary classifiers were trained to predict patient mortality by assigning each patient to a high mortality risk group or to a low mortality risk group. While in some embodiments the binary classifiers may be trained using a variety of machine learning methods (e.g., logistic regression classifier, decision tree classifier, random forest classifier, gradient boosting classifier, neural net, and others), in this particular embodiment the scikit-learn (Pedregosa, et al., 2011) tool kit was used to train a standard scaler (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) for each input variable and then fit a logistic regression (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) to the resulting scaled input variables.
Table 19 below presents the input variables of the logistic regression Models 1-4.
Table 20 below depicts key logistic regression Model performance metrics including the training and validation area under the receiver-operator curve (AUROC) and the training and validation area under the precision-recall curve (AUPRC).
To further evaluate the clinical utility of logistic regression Models 1-4, the impact of tuning the threshold used to turn a decimal score between 0 and 1 output by the logistic regression Model into a 1 (dead) or 0 (alive) prediction, was examined.
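By way of non-limiting illustration, the effect of tuning the prediction threshold may be sketched as follows. The scores, outcomes, and candidate thresholds are hypothetical, and treating a score equal to the threshold as a positive (dead) prediction is an assumption, since the text does not state how ties are handled.

```python
def confusion_at_threshold(scores, died, threshold):
    """Counts (tp, fp, fn, tn) when score >= threshold predicts 1 (dead)."""
    tp = fp = fn = tn = 0
    for score, outcome in zip(scores, died):
        predicted = 1 if score >= threshold else 0
        if predicted and outcome:
            tp += 1
        elif predicted and not outcome:
            fp += 1
        elif not predicted and outcome:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def sweep_thresholds(scores, died, thresholds):
    """(threshold, sensitivity, specificity) for each candidate threshold."""
    rows = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_at_threshold(scores, died, t)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        rows.append((t, sensitivity, specificity))
    return rows

# Hypothetical model scores and observed outcomes (1 = dead, 0 = alive).
scores = [0.10, 0.25, 0.40, 0.55, 0.70, 0.90]
died = [0, 0, 1, 0, 1, 1]
table = sweep_thresholds(scores, died, [0.3, 0.5, 0.7])
```

Raising the threshold trades sensitivity for specificity, which is why a single Model can be tuned toward different clinical uses.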
Table 21 below depicts logistic regression Model 2 performance metrics with scores tuned to various prediction thresholds. Specifically, Table 21 below shows that there are one or more prediction thresholds for which the performance metrics of logistic regression Model 2 meet or exceed those of procalcitonin (PCT) as a mortality predictor (Schuetz et al., 2017). Underlined values in Table 21 indicate where logistic regression Model 2 matches or exceeds PCT performance on the subset of their patients who were in the ICU on Day 4. In contrast to PCT, which requires multiple blood tests on Day 0 or 1 of ARDS diagnosis and then again on Day 4 to provide a prognosis, the Models presented herein provide a prognosis immediately following ARDS diagnosis if the Model input variables have been measured in the previous 24 hours.
Table 22 below confirms that logistic regression Model 2 produces similar mortality risk stratification to the k-means clustering Models discussed above, as well as to PCT.
There are a number of ensemble techniques which can be used to improve algorithm performance. The general concept of ensembling models involves taking the output from one or more models and using that output as input feature(s) for another model, potentially along with additional new data features.
Using the same data sources and model features as outlined in the EHR-based ARDS Subphenotyper for Mortality Prediction and Treatment Guidance Technical Note, an additional set of ARDS mortality classifiers was developed by ensembling output from the K-means clustering-derived ARDS subphenotype with additional features.
In this specific case, the Sub-8 K-means clustering model was used as input to the various classifier models. Classifier models were evaluated both with and without the 8 features of Sub8. Table 23 below shows an example of the variables input to the ensemble models.
Alternatively, an ensemble model may be built which creates a different model (in this case a logistic regression model) for each subphenotype from the input K-means cluster (
Alternatively, a combination of model outputs (K-means clustering, logistic or linear regression, GMM clustering, etc with the same or different input variables), could be used in combination as inputs to an ensembling algorithm, whose output could then be used to predict an ARDS prognosis or other outcome (
An ensemble of models could also include a series of models which would be applied based on the amount of data available. For the example below, if all data elements are available, the top performing model could be used. If some data elements are unavailable for a given patient or EHR system, a second line model (the gold model shown here) using fewer data elements could be used. If not all of those elements are available, a third line model could be used, and so on. Specifically,
A number of ensembled models were created.
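By way of non-limiting illustration, the selection of a first line, second line, or third line model according to which data elements are available may be sketched as follows. The tier names and the feature sets are hypothetical placeholders and do not reproduce the input variables of any Model or ensemble described herein.

```python
def select_model(available, tiers):
    """Return the first tier whose required inputs are all available.

    `tiers` is ordered from best-performing to most permissive; each entry
    is a (name, required_feature_set) pair. Returns None if no tier fits.
    """
    for name, required in tiers:
        if required <= available:  # subset test: every required input present
            return name
    return None

# Hypothetical tiers, ordered by preference.
tiers = [
    ("first_line", frozenset({"pH", "PaO2", "FiO2", "bilirubin", "BMI"})),
    ("second_line", frozenset({"pH", "PaO2", "FiO2", "bilirubin"})),
    ("third_line", frozenset({"pH", "PaO2", "FiO2"})),
]

# A patient record with no BMI falls through to the second line model.
chosen = select_model({"pH", "PaO2", "FiO2", "bilirubin", "heart_rate"}, tiers)
```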
In critical care settings where patients are often treated according to their height-based ideal weight rather than their actual admission weight, patient weight is not always recorded in the EHR, and thus the patient BMI may not be available. In that case, a second line model (marked gold below) using 16 inputs can be ensembled in the algorithm suite. In this example, the model follows the flow of
Using the same data sources and model features outlined in the EHR-based ARDS subphenotyper for Mortality Prediction and Treatment Guidance Technical note (K-means Cluster model 2, trained on ARMA-ALVEOLI-FACTT), the patient’s subphenotype was used to evaluate levels of circulating plasma biomarkers measured on the day of study randomization in the ARMA and ALVEOLI studies. Two sample t-tests or Kruskal-Wallis tests were used to identify differences in biomarker levels, depending on whether the biomarker level had a normal distribution. Based on the difference of biomarker levels between subphenotypes, an EHR-only based algorithm could be used to predict specific levels of biomarkers, or ratios of biomarkers.
As shown in Table 25, in both datasets, Subphenotype B (higher mortality subphenotype) exhibited increased levels of ICAM-1 and IL-6. In the ARMA dataset, subphenotype B was further indicative of increased circulating levels of IL-8, sTNFR1, PAI-1, VWF, IL-10 and sTNFR2.
Four biomarkers were correlated with Ensemble 14 (17 features) and Ensemble 4 (8 features, K-means Cluster 8 plus bilirubin subphenotype) to see if there was a correlation between biomarker level and predictor score. Pearson correlation identifies linear correlation, whereas Spearman correlation nonparametrically quantifies rank correlation (the largest values in X correlate with the largest values in Y and the smallest values in X correlate with the smallest values in Y, but not necessarily in a linear manner). Table 26 shows that correlation with biomarkers varies by algorithm. IL-6 exhibited a moderate Spearman correlation with Ensemble 14 score.
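By way of non-limiting illustration, the distinction between Pearson and Spearman correlation may be sketched as follows using only the Python standard library. The data are synthetic (an exponential relationship chosen to be monotonic but non-linear) and are not biomarker measurements.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic monotonic but non-linear relationship.
score = [1.0, 2.0, 3.0, 4.0, 5.0]
level = [math.exp(s) for s in score]

r_pearson = pearson(score, level)
r_spearman = spearman(score, level)
```

Here the rank ordering is preserved exactly, so the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1, which is the situation in which a biomarker can track a predictor score without a linear relationship.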
Scatter plots of Ensemble 14 score versus level of IL-6 (
In addition to the binary high risk / low risk mortality predictions discussed in the above examples, the results from the ARDS mortality prediction algorithms can be used with more than one score threshold to produce more than two risk groups. In one embodiment, the ARDS Prognostic Digital version 1 (APDv1), the Gold ensemble model described in Table 23 is used with two prediction score thresholds to produce three categories of mortality risk: lower, medium, and higher.
Mortality prediction score thresholds of 0.3 and 0.6 are used to categorize patients into lower risk, medium risk, and higher risk categories. The mortality separation for the three APDv1 risk groups is shown in Table 27 for the validation cohort. The 95% confidence intervals for the three groups do not overlap, and the chi-squared p-value for mortality rate separation between the three groups is 8.40e-22. The lower risk and higher risk groups are likely to be most useful in informing clinical decisions; they cover 11.0% and 31.4% of the validation population, respectively, with 42.4% of the population falling into one of those two groups.
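By way of non-limiting illustration, the mapping of a mortality prediction score to one of three risk categories using the thresholds of 0.3 and 0.6 may be sketched as follows. Which category receives a score exactly equal to a threshold is not stated in the text; assigning it to the upper category is an assumption of this sketch.

```python
def risk_category(score, low_threshold=0.3, high_threshold=0.6):
    """Map a mortality prediction score in [0, 1] to a risk category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < low_threshold:
        return "lower"
    if score < high_threshold:
        return "medium"
    return "higher"

# Hypothetical scores spanning the three categories.
categories = [risk_category(s) for s in (0.12, 0.45, 0.81)]
```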
To visualize the separation of the APDv1 risk groups, Kaplan-Meier survival curves were implemented. Specifically,
There are two useful baselines in comparing APDv1 performance to other commonly accepted approaches for predicting the mortality of critically ill patients such as those with COVID-19 pneumonia: procalcitonin (PCT) and the APACHE and SAPS severity scores. While neither procalcitonin nor APACHE and SAPS are directly used for the in-hospital mortality prognosis of ARDS patients, they are used here as surrogate market indicators for performance to guide product development.
In comparing the results of APDv1 to procalcitonin, the FDA-approved procalcitonin assay is intended to be used as a mortality prognostic for sepsis patients. This is a relevant benchmark as most COVID-19 patients with ARDS would also meet Sepsis-3 criteria (infection with dysregulated immune response causing life-threatening organ dysfunction). However, the PCT mortality prognostic requires measuring procalcitonin levels in the patient’s blood on Day 0 or Day 1 and again on Day 4 in order to find whether the level has dropped by 80% or more over that time. This means the PCT prognostic result is not available to the clinical team until four days into treating the patient; in contrast, APDv1 uses clinical variables measured in the 24 hours prior to the patient’s ARDS diagnosis and is available without waiting to collect further data.
The MOSES study that validated the usefulness of PCT as a mortality prognostic found that their low risk group had an average 28-day mortality of 10.7% (6.6 - 14.9%) compared with 20.4% (16.3 - 24.4%) for their high risk group. Given that the overall mortality rate for their intent to diagnose (ITD) population was 16.9% compared to 48.4% for the APDv1 validation cohort, these rates cannot be directly compared to the APDv1 lower and higher risk group mortality rates. However, the relative risk ratio of their high to low mortality groups is 1.9, while the relative risk ratio of the APDv1 high to low mortality groups is 3.0.
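By way of non-limiting illustration, the two relative risk ratios can be reproduced from the group mortality rates. The PCT rates are those quoted above; the APDv1 rates (22.1% for the lower risk group and 66.8% for the higher risk group) are the validation-cohort values reported for the three risk categories elsewhere in this description.

```python
def relative_risk(high_group_rate, low_group_rate):
    """Ratio of the mortality rate in the high risk group to the low risk group."""
    return high_group_rate / low_group_rate

# PCT (MOSES study): 20.4% high risk versus 10.7% low risk.
pct_ratio = relative_risk(20.4, 10.7)

# APDv1 validation cohort: 66.8% higher risk versus 22.1% lower risk.
apd_ratio = relative_risk(66.8, 22.1)
```

Because a ratio of rates is unaffected by a common scaling of baseline mortality, it offers a rough way to compare separation between cohorts whose overall mortality differs, although it does not replace a head-to-head evaluation.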
Severity scores (e.g., APACHE and SAPS scores) have been developed to compare the severity of illness for critically ill patients. In the validation data sets, the Cleveland Clinic ARDS data set and the eICU observational data sets provided APACHE III scores for each patient and the ART data set provided SAPS III scores for each patient.
The Berlin criteria, which are diagnostic criteria based on timing, chest imaging, origin of edema, and hypoxemia for the assessment of ARDS severity, can be used to determine the patient mortality risk. However, they have several weaknesses:
The ARDS Prognostic Digital described herein in Example 4 provides a strong separation between lower and higher risk groups of ARDS patients with performance comparable to or better than currently available prognostic tools for ARDS patients, with faster and easier data collection than those comparable tools. Systems and methods described herein evaluate patient mortality risk in three categories for a validation population with an overall mortality rate of 48.4%: the lower risk group has an average mortality rate of 22.1% (95% confidence interval of 15.6 - 30.1%), the medium risk group has an average mortality rate of 43.5% (39.8 - 47.2%), and the higher risk group has an average mortality rate of 66.8% (61.8 - 71.4%). For the validation population of 1235 patients, 11% fall in the lower risk group and 31% fall in the higher risk group, with a combined 42% of patients receiving an actionable recommendation.
This performance is comparable to or better than currently-available FDA-approved mortality risk assessment tools such as procalcitonin and often used severity indicators such as SAPS and APACHE scores. Additionally, it is faster than PCT (the mortality risk is estimated on Day 1, not Day 4 of the ICU stay) and requires less information and fewer lab tests than the APACHE score.
The objectives of the present study are: 1) to describe how clinically and biologically meaningful ARDS subphenotypes can be created using a minimum set of collectable clinical variables from ARDS patients with PaO2/FiO2 < 300, without the use of biomarkers; 2) to assess the heterogeneity of treatment effect (HTE) of different levels of PEEP (higher or lower) on mortality at the latest follow-up according to subphenotypes determined by K-means clusters derived from clinical characteristics of patients with ARDS; and lastly 3) to assess the heterogeneity in the treatment effect of different levels of PEEP if only ARDS patients with PaO2/FiO2 < 200 are used to develop the subphenotypes.
The Berlin definition of acute respiratory distress syndrome (ARDS) encompasses acute hypoxemic respiratory failure due to a wide variety of etiologies. ARDS consensus definitions to date, including the Berlin definition, have solely relied on clinical variables, which help with early identification of patients and ensure implementation of standardized management and appropriate inclusion of patients in clinical trials. Clinical risk stratification currently depends on the PaO2/FiO2 ratio only. However, due to the inclusion of heterogeneous conditions exhibited within the syndrome, there are significant clinical and biological differences making ARDS challenging to treat.
These differences amongst ARDS patients are associated with variation in risk of disease development and progression, potentially generating differential responses to treatments and interventions. Therefore, identifying groups of patients who have similar clinical, physiologic, or biomarker traits becomes relevant as it can help with stratification of patients based on disease severity or risk of death, enrichment in clinical trials, and better targeting of therapies and interventions. These different groups can be defined as ARDS subphenotypes.
Two ARDS subphenotypes (hypoinflammatory and hyperinflammatory) have been consistently identified based on previous studies using Latent Class Analysis (LCA) and machine learning classifier models, showing that mortality and other clinical outcomes are worse in the hyperinflammatory subphenotype. However, these models are complex, and significant barriers exist in their implementation and use in clinical practice. Existing models use up to 40 predictor variables, including biomarkers and other variables that are not easily and readily available at the bedside which makes generalizability of some models very limited.
Recent publications have provided models with a parsimonious set of variables, but these models were mostly developed using biomarker profiles, which again limits their clinical utility. Furthermore, most previously reported studies have used data from randomized controlled trials conducted by a single network, raising questions about the generalizability of these results to different ARDS populations. Therefore, the aim of this study was to develop and validate a model using a small number of easily available clinical variables and evaluate whether it can identify ARDS subphenotypes in different populations.
A retrospective study was performed in a de-identified dataset pooling data from six randomized clinical trials in patients with ARDS, namely: ARMA, ALVEOLI, FACTT, EDEN, SAILS, and ART. The patients in the ARMA, ALVEOLI, FACTT, EDEN and SAILS trials were eligible if they met the American-European consensus for ARDS, including patients with a PaO2 / FiO2 ratio < 300 up to 48 hours before enrollment. From 1996 to 2013, these trials respectively enrolled 902, 549, 1000, 1000 and 745 patients and tested a variety of interventions. The multinational ART trial enrolled 1010 patients diagnosed with moderate to severe ARDS according to the Berlin criteria (PaO2 / FiO2 ratio < 200) for less than 72 hours of duration and assessed two different ventilatory strategies, between 2011 and 2017.
To avoid biases due to high mortality among patients in the high tidal volume group of the ARMA study, a strategy that has not been standard of care since the beginning of 2000, only patients receiving low tidal volume in that study were included (n = 473). All patients from each of the remaining trials were eligible for inclusion in this analysis, with an expected final sample size of 4,777 adult ARDS patients.
Data from the ARDSnet studies is publicly available from the NHLBI ARDS Network and data from the ART trial can be requested from study authors.
Baseline characteristics of the patients in the training and validation sets are presented in Table 28. Pneumonia was the prevailing etiology followed by sepsis and aspiration in all trials. Between 29.3% and 72.7% of the patients were receiving vasopressors at the time of randomization. At randomization, PaO2 / FiO2 ratio ranged from 112 (75 - 158) to 134 (96 - 185) mmHg, and PEEP from 8 (5 - 10) to 12 (10 - 14) cmH2O across trials. Mortality at 60 days for the ARDSnet trials ranged from 22.7% to 30.1%, while in the ART trial mortality at 28 days was 58.8%.
Datasets from the six trials were evaluated to identify a set of clinical variables which were most available across all datasets closest to time of randomization. The list of potential elements was then further refined to include only the ones that are frequently observed in the routine care of ARDS patients at the time of its diagnosis. To make a K-means clustering algorithm of potential rapid clinical use, elements which would not be commonly found in the electronic health records (EHR) at the time of ARDS diagnosis, such as biomarker levels, ARDS risk factors, therapeutics for organ support apart from mechanical ventilation settings, treatment assignment, severity scores, and clinical outcomes were excluded from model development.
After this assessment, 16 variables that are routinely collected as part of usual care and which were uniformly present in all the trials were considered, including: age, gender, arterial pH, PaO2, PaCO2, bicarbonate, creatinine, bilirubin, platelets, heart rate, respiratory rate, mean arterial pressure, positive end-expiratory pressure (PEEP), plateau pressure, FiO2, and tidal volume adjusted for predicted body weight (mL/kg PBW). The PBW in kilograms was calculated as 50 + 0.91 × (height in centimeters - 152.4) in males, and 45.5 + 0.91 × (height in centimeters - 152.4) in females. These variables were grouped into five domains named demographics, arterial blood gases, laboratory values, vital signs, and ventilatory variables. Plateau pressure was excluded due to a high rate of missingness across the trials included in the training set.
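By way of non-limiting illustration, the predicted body weight formula and the resulting tidal volume in mL/kg PBW may be computed as follows. The function names, the string encoding of sex, and the example patient are hypothetical.

```python
def predicted_body_weight(height_cm, sex):
    """Predicted body weight (kg) from height in centimeters and sex."""
    if sex == "male":
        base = 50.0
    elif sex == "female":
        base = 45.5
    else:
        raise ValueError("sex must be 'male' or 'female'")
    return base + 0.91 * (height_cm - 152.4)

def tidal_volume_per_pbw(tidal_volume_ml, height_cm, sex):
    """Tidal volume normalized to predicted body weight (mL/kg PBW)."""
    return tidal_volume_ml / predicted_body_weight(height_cm, sex)

# Hypothetical patient: 175 cm male ventilated with a 420 mL tidal volume.
pbw = predicted_body_weight(175.0, "male")
vt_per_kg = tidal_volume_per_pbw(420.0, 175.0, "male")
```

Because the formula depends only on height and sex, the adjusted tidal volume can be derived even when the actual body weight is missing from the record.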
Data preprocessing was performed before modeling, and the pooled dataset was assessed for completeness and consistency. Patients with values out of the plausible physiological range for a specific variable were excluded from the final analysis. The training dataset was constructed using data from the two largest ARDSnet trials, EDEN and FACTT. The validation dataset was sourced from the four remaining trials: ALVEOLI, ARMA, SAILS, and ART. Means and standard deviations for z-scoring variables were calculated from the training dataset and subsequently applied to the validation data.
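The z-scoring step described above can be sketched as follows: the scaling parameters are estimated on the training rows only and then reused unchanged on the validation rows. This is an illustrative sketch; the helper names are hypothetical, and the use of the population standard deviation is an assumption (the text does not state which estimator was used).

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Return one (mean, sd) pair per column, computed from training rows only."""
    params = []
    for col in zip(*train_rows):
        sd = pstdev(col)  # population SD: an assumption of this sketch
        if sd == 0:
            raise ValueError("a constant column cannot be z-scored")
        params.append((mean(col), sd))
    return params


def apply_scaler(rows, params):
    """Z-score rows using previously fitted (mean, sd) pairs."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]
```

Fitting on the training data alone keeps information from the validation trials out of model development.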
Baseline and outcome data were presented according to the assigned subphenotype. Continuous variables were presented as medians with their interquartile ranges and categorical variables as total number and percentage. Proportions were compared using Fisher exact tests and continuous variables were compared using the Wilcoxon rank-sum test. Study outcomes were further compared using the median and mean absolute differences for continuous and categorical values, respectively.
For the model development, the K-means clustering algorithm was used. K-means is one of the simplest and most commonly used classes of clustering algorithms. In critical care research, unsupervised machine learning techniques have already been used in several studies, attempting to find homogeneous subgroups within a broad heterogeneous population. This specific algorithm identifies a K number of clusters in a dataset by finding K centroids within the n-dimensional space of clinical features.
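For illustration only, the centroid-finding procedure described above can be sketched in plain Python as a basic Lloyd-style iteration. This is not the scikit-learn implementation used in the study; the function names and the explicit `init_centroids` argument are assumptions of this sketch.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest (Euclidean distance) to the point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and centroid-update steps until labels stop changing."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no patient changed cluster
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Once the centroids are fixed, a new patient is assigned to a cluster by calling `nearest_centroid` on that patient's z-scored feature vector, which is what allows a trained model to be applied to other datasets.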
For feature selection, different sets of candidate variables were tested to assess their ability to produce significantly different mortality probabilities in each cluster using the minimum amount of readily available clinical data. For each set of candidate variables, the optimal number of clusters was determined by comparing models with between 2 and 5 clusters, using the Elbow method and the Calinski-Harabasz index. Information about the methods for selecting the number of clusters is provided in the supplemental material.
Subsequently, the biological meaningfulness of each cluster was evaluated using their clinical, laboratory, and (when available) biomarker data. Then, each cluster was assigned a subphenotype label (Subphenotype A or Subphenotype B). All iterations in model development were conducted on the training set and the generalizability of the final model was assessed using the validation dataset.
K-means clustering analysis is structured to ignore cases with missing data. No assumption was made about missingness, and therefore a complete case analysis was conducted. Model development and evaluation were performed using Python version 3.8 and scikit-learn 0.23.1.
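The two criteria named above can be sketched as follows. The within-cluster sum of squares is the quantity plotted against the number of clusters in the Elbow method, and the Calinski-Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The function names are illustrative; the study itself used scikit-learn.

```python
def _centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _members(points, labels, lab):
    return [p for p, l in zip(points, labels) if l == lab]


def within_cluster_ss(points, labels):
    """Sum of squared distances to each cluster's centroid (Elbow method quantity)."""
    total = 0.0
    for lab in set(labels):
        rows = _members(points, labels, lab)
        c = _centroid(rows)
        total += sum(_sq_dist(p, c) for p in rows)
    return total


def calinski_harabasz(points, labels):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)); higher is better."""
    n, k = len(points), len(set(labels))
    overall = _centroid(points)
    between = sum(
        len(rows) * _sq_dist(_centroid(rows), overall)
        for rows in (_members(points, labels, lab) for lab in set(labels))
    )
    return (between / (k - 1)) / (within_cluster_ss(points, labels) / (n - k))
```

In practice, both quantities would be computed for each candidate number of clusters (here 2 through 5) and compared.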
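A complete case analysis of this kind can be sketched as a simple filter that keeps only patients with a recorded value for every model feature. The record layout (one dictionary per patient) and the feature keys are assumptions made for illustration.

```python
import math


def complete_cases(records, features):
    """Keep only records with a non-missing value for every listed feature."""

    def present(value):
        # Treat both None and NaN as missing (an assumption of this sketch).
        return value is not None and not (isinstance(value, float) and math.isnan(value))

    return [r for r in records if all(present(r.get(f)) for f in features)]
```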
The primary outcome was 60-day mortality for ARDSnet trials and 28-day mortality for the ART trial. Secondary outcomes were 90-day mortality, number of ventilator free days at day 28, and the duration of mechanical ventilation in survivors within the first 28 days post enrollment.
In total, 16 models were tested on ALVEOLI and ART for the differential effect of treatment on PEEP strategy according to subphenotype assignment. Variables in each of the 16 models (denoted as Model B.1, Model B.2...) are shown in Table 29. The testing involved employing a logistic regression model incorporating an interaction term for the product of subphenotype and mortality (28, 60, 90 and 180 day). For the ART trial, the hospital of inclusion was also included in the logistic regression model as a random effect.
Quantile models were used to assess ventilator-free days. Quantile models considered τ = 0.50 (the median) and an asymmetric Laplace distribution. P-values were extracted after 1,000 bootstrap samplings, and the effect estimate is the median difference. P-values < 0.05 were considered statistically significant.
Among all trials and clinical measurements available closest to randomization, there were 20 variables that were considered not only routinely collected during care but also uniformly present in all trials. Sixteen different combinations of features were investigated in model development (Table 29). These combinations were defined based on the perceived clinical importance of each variable and their combinations, aiming for a minimum set of variables. According to the Elbow method and the Calinski-Harabasz index, two was the optimal number of K-means clusters among all sixteen models. The cluster of patients assigned to subphenotype B clearly had clinical and laboratory signs compatible with higher inflammation and worse outcomes (e.g., higher mortality). On the other hand, the cluster of patients assigned to subphenotype A exhibited signs of less inflammation and better outcomes (e.g., lower mortality).
The correlation between the 15 variables selected for K-means clustering is shown in Table 30. The strongest correlation was between PEEP and FiO2 (r = 0.49). Both the Elbow method and the Calinski-Harabasz index indicated that two clusters were a better fit than a higher number of clusters.
Further analysis was conducted across a subset of the 16 models. Specifically, across ten of the models (e.g., Models B.2, B.3, B.4, B.6, B.7, B.8, B.10, B.11, B.12, and B.16), absolute mortality difference between subphenotype A and subphenotype B ranged from 3.9% to 13.1% for the FACTT study and from 0.1% to 8.1% for EDEN. The models with the highest 60-day absolute mortality separation between subphenotypes for each of the two trials in the training set were then further evaluated. Models B.2, B.4, and B.8 were consistently amongst the models with highest separation. Of the 3 models with the highest mortality separation, Model B.2 was selected for further investigation, as it required the fewest variables (Table 29).
Based on model B.2, only nine clinical and laboratory variables were included to identify the two distinct subphenotypes in ARDS patients, namely: heart rate, mean arterial pressure, respiratory rate, bilirubin, bicarbonate, creatinine, PaO2, arterial pH, and FiO2. For each variable in the model, opposing measurements could be observed for each subphenotype. Specifically,
Reference is now made to
After comparing the clinical characteristics of the K-means clusters based on model B.2, each K-means cluster was assigned to represent a distinct subphenotype of ARDS, with patients in K-means cluster 1 assigned to subphenotype A, and patients in K-means cluster 2 assigned to subphenotype B. Using blood biomarker information available for a subset of patients from both ARMA and ALVEOLI, subphenotype B showed increased levels of pro-inflammatory markers when compared to subphenotype A (
Furthermore, the other 15 models (e.g., models other than model B.2) were also used to generate two clusters of patients that represent two distinct subphenotypes of ARDS, with patients in K-means cluster 1 assigned to subphenotype A, and patients in K-means cluster 2 assigned to subphenotype B. Table 35B shows the levels of IL-6 in patients of each subphenotype generated by any of the 16 different K-means clustering models. Generally, IL-6 is elevated in subphenotype B patients in comparison to subphenotype A patients.
Additionally, Tables 36-51 show the implementation of the 16 different models for guiding PEEP differential treatment response according to subphenotype assignments based on ARDS severity (e.g., P/F < 200 or P/F < 300 patients) from the ALVEOLI study, and Tables 52-67 show the same implementation for the ART study. Generally, the subphenotype assignments of patients across both the ALVEOLI study and the ART study show that within Subphenotype A, patients receiving low PEEP had lower mortality with more ventilator free days, while results were less consistent in Subphenotype B. This suggests that patients in Subphenotype A benefit from lower PEEP, but contrary to current treatment guidelines for ARDS, patients within Subphenotype B may or may not benefit from lower PEEP.
This study has several strengths. First, it is the largest cohort of patients that has been studied to develop distinct phenotypes of ARDS patients. Moreover, the validation cohort included patients from the ART trial, enabling the validation of the model in the contemporaneous population of a large international randomized clinical trial in addition to the ARDSnet studies used in other subphenotyping studies. Second, the subphenotyping classifier was developed exclusively on the training set and then validated across multiple separate datasets and nevertheless similar separation in mortality was seen between the two subphenotypes across all trials. Third, the K-means algorithm was used to identify the subphenotypes, and the results obtained with this technique can be easily interpreted by clinicians and implemented in clinical practice. Lastly, this is the first phenotyping study that has used easily available clinical variables to identify ARDS phenotypes, which allows for early identification of these patients in the clinical care at the bedside. Using this algorithm with a small number of routinely collected variables could enable the model to be applied in trials that either retrospectively or prospectively assess interventions targeted to each subphenotype.
This is a retrospective study in a de-identified dataset pooling data from two randomized clinical trials in patients with ARDS, namely: the ALVEOLI and the ART trial. Patients in the ALVEOLI trial were eligible if they met the American-European Consensus Criteria for ARDS, including patients with a PaO2 / FiO2 ratio < 300 up to 48 hours before enrollment, and assessed a strategy using the high vs. low PEEP table. The ART trial enrolled patients with moderate to severe ARDS according to Berlin criteria (PaO2 / FiO2 ratio < 200) for less than 72 hours’ duration, and assessed two different ventilatory strategies, titrated PEEP with recruitment maneuvers vs. low PEEP according to ARDSNet PEEP FiO2 table. Although the datasets come from rigorous well controlled trials, the pooled dataset was assessed for completeness and consistency.
Subphenotypes were determined by clusters derived from clinical characteristics of patients with ARDS. Briefly, a K-means clustering algorithm was used to develop a model including only variables that are routinely collected and inputted in electronic health records during the care of ARDS patients and were highly available closest to time of randomization. Data used to develop the model were acquired from the clinical trials ARMA, ALVEOLI, EDEN, FACTT, SAILS and ART. EDEN and FACTT were used for the training set. The trials ARMA, ALVEOLI, SAILS and ART were used for validation. The final model segregated patients into two subphenotypes (A and B) using nine of their clinical characteristics: pH, PaO2, mean arterial pressure, bicarbonate, bilirubin, creatinine, FiO2, heart rate, and respiratory rate. Subphenotype B exhibits clinical and laboratory signals compatible with higher inflammation while subphenotype A shows the opposite. Lastly, subphenotype B has higher mortality than subphenotype A.
Heterogeneity of treatment effect of different levels of PEEP was assessed following a Bayesian hierarchical logistic model for the primary outcome. All hierarchical models were modelled as a simple regression and shrinkage model. The hierarchical models partially pool the data and shrink the estimates in each subphenotype towards the overall estimate, with shrinkage proportional to the size of the subphenotype. While traditional subgroup analyses are at higher risk of increased type 1 error due to exaggeration of the subgroup effects, the proposed hierarchical model limits this risk through shrinkage. For all analyses, weakly informative priors were used, aiming to encompass all plausible effect sizes. Because the sample size of the pooled dataset is large, the likelihood is expected to dominate the posteriors.
The priors were used to reflect varying degrees of beliefs for benefit or harm of higher levels of PEEP. The treatment prior’s distributions are shown in
The prior was a normally distributed prior with mean 0 and variance 2.25 (prior risk with a 95% probability between 5% and 95%). This prior was used for all analyses, including the sensitivity analyses with optimistic and pessimistic priors. For a shrinking parameter, the prior was a normally distributed prior with mean of 0 and variance of Ω, where Ω is the shrinkage factor having a half-normally distributed prior with variance of 1. This prior was used for all analyses, including the sensitivity analyses with optimistic and pessimistic priors.
For treatment effect, a weakly informative prior was used to produce results essentially dependent on data from the analysis. This was a normally distributed prior with mean of 0 and standard deviation of 0.421 (variance of 0.177). Under this prior, there is a 90% probability that 0.50 < OR < 2.00. Additionally, an optimistic prior was defined to represent archetypes of prior belief that higher PEEP effectively lowers mortality. This was a normally distributed prior with mean of -0.287 and standard deviation of 0.174 (variance of 0.030). This prior distribution was centered at an OR of 0.75 based on the assumed relative risk of death used to power the ART trial (OR ≤ 0.75) with a probability of an OR > 1.00 of 5%. Furthermore, a pessimistic prior was defined to represent archetypes of prior belief that higher PEEP increases mortality. This was a normally distributed prior with mean of 0.183 and standard deviation of 0.113 (variance of 0.012). This prior distribution was centered at an OR of 1.20 based on the relative risk of death found in the ART trial with a probability of OR < 1.00 of 5%.
For the interaction term between treatment group and PaO2 / FiO2 (sub-analysis 1), the prior was a normally distributed prior with mean 0 and standard deviation of 0.100 (variance of 0.010) for both terms. This prior distribution corresponds to an OR with mean of 1.00 with 95% prior probability of an OR between 0.82 and 1.22 for a 1-point increase in PaO2 / FiO2. For subphenotype and PaO2 / FiO2 (sub-analysis 2), the prior was a normally distributed prior with mean 0 and standard deviation of 0.100 (variance of 0.010) for both terms. This prior distribution corresponds to an OR with mean of 1.00 with 95% prior probability of an OR between 0.82 and 1.22.
All described Bayesian models were fitted using Markov Chain Monte Carlo simulation with four chains. All models used a burn-in of 1,000 iterations, with sampling from a further 10,000 iterations for each chain. All chains were required to be free of divergent transitions, and additional sampler settings (adapt_delta) were tuned accordingly until this was achieved. To monitor convergence, trace plots and the Gelman-Rubin convergence diagnostic (Rhat < 1.01) were used for all parameters.
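The probability statements attached to the priors above can be checked numerically with a short sketch. It assumes that each normal prior is placed on the log-odds or log-odds-ratio scale (an assumption consistent with the stated means, e.g. ln 0.75 ≈ -0.287); the function names are illustrative.

```python
from math import exp, log
from statistics import NormalDist


def prob_or_between(mean, sd, lo, hi):
    """Prior probability that lo < OR < hi for a Normal(mean, sd) prior on log(OR)."""
    prior = NormalDist(mean, sd)
    return prior.cdf(log(hi)) - prior.cdf(log(lo))


def prob_or_above(mean, sd, threshold=1.0):
    """Prior probability that OR > threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(log(threshold))


def central_interval(mean, sd, mass=0.95):
    """Equal-tailed interval on the prior's own (log) scale."""
    prior = NormalDist(mean, sd)
    tail = (1.0 - mass) / 2.0
    return prior.inv_cdf(tail), prior.inv_cdf(1.0 - tail)


def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))


weak = prob_or_between(0.0, 0.421, 0.50, 2.00)               # roughly 0.90
optimistic_harm = prob_or_above(-0.287, 0.174)               # roughly 0.05
pessimistic_benefit = 1.0 - prob_or_above(0.183, 0.113)      # roughly 0.05
interaction_or = tuple(exp(v) for v in central_interval(0.0, 0.100))        # roughly (0.82, 1.22)
baseline_risk = tuple(inv_logit(v) for v in central_interval(0.0, 2.25 ** 0.5))  # roughly (0.05, 0.95)
```

Under these assumptions the computed values agree with the probabilities and intervals stated in the text to within rounding.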
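For illustration, the classic Gelman-Rubin statistic compares the variance between chains with the variance within chains; values close to 1 indicate that the chains agree. The sketch below implements that classic form for a single parameter. It is an assumption that this simplified form conveys the idea adequately: current Stan releases report a refined split, rank-normalized variant, which is not reproduced here.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n    # pooled posterior variance estimate
    return (pooled / within) ** 0.5
```

Chains that explore the same distribution yield a value near or below the 1.01 threshold mentioned above, whereas chains stuck in different regions yield a much larger value.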
Subphenotype A is characterized by less inflammation, lower severity of illness, improved ventilator-free days and mortality compared with subphenotype B. The subphenotypes were validated as described in Example 5. All analyses are presented in the pooled population combining the ALVEOLI and ART populations and stratified by the study. The primary outcome was 28-day mortality. No secondary outcome was assessed. Continuous data were presented as median (interquartile range) and compared with the Wilcoxon rank-sum test, and categorical data were presented as number and percentage and compared with Fisher exact tests.
For the primary outcome, in addition to the odds ratio (OR) with 95% credible interval (CrI), the probabilities of the following ORs were considered as possible thresholds for the minimum clinically important treatment effect: 1) OR < 1.00; 2) OR < 0.97; and 3) OR < 0.90. To assess the possibility of harm, the probability of harm, defined as an OR > 1.00 (null), is also reported.
To further understand the interaction according to subphenotypes and baseline hypoxemia on HTE for PEEP strategy, the within-phenotype association between higher levels of PEEP and mortality in a mixed-effect Bayesian logistic regression model according to PaO2:FiO2 was used. In this model, interactions between PaO2:FiO2 groups (stratified into six groups) and allocation groups, subphenotypes and allocation groups, and subphenotypes and PaO2:FiO2 groups were included. Also, to assess the interaction according to subphenotypes and baseline driving pressure on HTE for PEEP strategy, the within-phenotype association between higher levels of PEEP and mortality in a mixed-effect Bayesian logistic regression model according to baseline driving pressure was used. In this model, interactions between baseline driving pressure groups (stratified into six groups) and allocation groups, subphenotypes and allocation groups, and subphenotypes and baseline driving pressure groups were included. The model considered a Bernoulli distribution, with studies as random effect and with starting values randomly generated. All priors were drawn from normal distributions and were weakly informative.
All effect estimates were drawn from the median of the posterior distribution and the 95% CrI from the 95th percentile of the distribution. Additional analyses considering pessimistic and optimistic priors were conducted as sensitivity analyses for the primary HTE analysis. All analyses were performed using the R software (R, version 4.0.2, Core Team, Vienna, Austria, 2016) with the beanz package and Stan through brms.
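As an illustrative sketch, posterior draws of a log odds ratio can be summarized into the reported quantities as follows: the posterior median OR, a 95% CrI, the probabilities of OR below each threshold, and the probability of harm (OR > 1.00). The equal-tailed 2.5th to 97.5th percentile interval is an assumption of this sketch, and the five example draws are made-up values, not study results.

```python
from math import exp, log
from statistics import median, quantiles


def summarize_log_or_draws(draws, thresholds=(1.00, 0.97, 0.90)):
    """Summarize posterior draws of log(OR) on the OR scale."""
    ors = sorted(exp(d) for d in draws)
    cuts = quantiles(ors, n=40, method="inclusive")  # cut points every 2.5%
    n = len(ors)
    return {
        "median_or": median(ors),
        "cri_95": (cuts[0], cuts[-1]),  # 2.5th and 97.5th percentiles (assumed)
        "p_or_below": {t: sum(o < t for o in ors) / n for t in thresholds},
        "p_harm": sum(o > 1.00 for o in ors) / n,
    }


# Hypothetical draws, for demonstration only.
example_draws = [log(x) for x in (0.5, 0.8, 0.95, 1.1, 1.3)]
summary = summarize_log_or_draws(example_draws)
```

With real MCMC output, `draws` would hold the pooled post-burn-in samples of the treatment coefficient for one subphenotype.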
A total of 1559 ARDS patients from both ALVEOLI and ART trials were considered for this analysis. The majority of the patients were male, and pneumonia was the prevailing etiology followed by sepsis and aspiration in all trials (Table 68). There was no difference in any outcome according to randomization group in the ALVEOLI trial, and in the ART trial ventilator-free days at day 28 were lower in the ART group.
Baseline characteristics of the patients according to the subphenotype in the pooled cohort are described in Table 68. Overall, patients in subphenotype B had statistically significantly higher severity of illness, rate of vasopressor use, heart rate, creatinine, and bilirubin, as well as lower platelets, pH, BUN and bicarbonate compared to patients in subphenotype A (Table 68). 28-day mortality was higher and ventilator-free days at day 28 were lower in patients in subphenotype B. 28-day mortality was lower in patients in the low PEEP group in subphenotype A, and it was higher in the high PEEP group in subphenotype B. This can be seen in Table 68 as well as
High PEEP resulted in higher risk for 28-day mortality compared to low PEEP in patients in subphenotype A (OR, 1.66 [95% CrI, 1.13 to 2.47]), with a probability of benefit in this subphenotype of only 0.6% (Table 70 and
On the other hand, high PEEP did not affect the mortality of patients in subphenotype B (OR, 0.94 [95% CrI, 0.65 to 1.34]; probability of benefit of 63.9%). The probability that assignment to the high PEEP group results in lower OR for 28-day mortality in patients in subphenotype B (more beneficial), compared to subphenotype A, was 98.3%. The signal of the findings was similar in the individual cohorts and the use of different priors did not materially change these findings (Table 69).
The results of the model assessing interactions between subphenotypes, PaO2 / FiO2 and use of high PEEP are shown in
The probability of benefit of high PEEP was always higher in patients in subphenotype B compared to subphenotype A, especially with more severe hypoxemia. The probability of benefit of high PEEP was always higher in patients in subphenotype B compared to subphenotype A, but this probability decreased with increase in baseline driving pressure.
Using subphenotypes previously derived from routine clinical variables, this study demonstrates heterogeneity of treatment effect with regard to PEEP strategies. Subphenotype A, characterized by lower severity of illness and inflammation, had a 99.4% probability of harm when assigned to a high PEEP strategy. The overall sicker subphenotype B was more likely to benefit from a high PEEP strategy compared to A, but overall the mortality in subphenotype B between strategies did not meaningfully differ. These mortality differences between subphenotypes were maintained even when stratified by PaO2:FiO2 ratio or driving pressure. They were also stable across all priors in the Bayesian analyses.
Different training data sets than those used in Examples 1-4 are described here for generating additional models. For example, models were trained on the ARDSnet EDEN and FACTT datasets, and then the results were assessed for differential treatment response. In another alternate training, a specific subset of patients was selected for training from a greater patient population. For example, among the FACTT and EDEN datasets, a population of only patients with moderate to severe ARDS (as characterized by a P/F ratio <= 200 or as characterized by a P/F ratio <= 300) was selected from the entire dataset.
A number of potential feature sets were originally examined for use in the ARDS subphenotyper and mortality predictor. After a detailed data audit, a number of additional potential models were examined as shown below (Table 70). The goal of examining the alternate feature sets was to identify the combination of features which provided the maximum biologic meaningfulness (by mortality, biomarker levels, and clinical values) with the smallest possible combination of variables, while covering at least 75% of patients in the training data.
After a candidate feature set was identified, the optimal number of K-means clusters was determined by comparing a number of factors, including the elbow criterion method, the Calinski-Harabasz method, and the Silhouette score ("2.3. Clustering", scikit-learn 0.23.2 documentation, n.d.), across K-means models of 2, 3, 4, and 5 clusters. Feature selection and the number of clusters were selected based on the evaluation on the test set. The validation set was then used to assess the generalizability of the model.
Data sources, or subsets of data sources, were combined as training data to create an ARDS subphenotyper or mortality predictor using a machine learning algorithm (such as K-means, logistic regression, XGBoost, neural networks, or another machine learning algorithm). The algorithm was applied to another retrospective or prospective data set of ARDS patients. Below, embodiments of differential treatment analysis are described with respect to various clinical interventions based on group assignment made by any machine learning algorithm. Example clinical interventions include NMB Therapy (as described above in Example 1), low or high positive end expiratory pressure (PEEP) which represents a ventilator setting, corticosteroids (e.g., methylprednisolone or dexamethasone), lisofylline (anti-inflammatory), ketoconazole (anti-fungal), catheter and fluid management, recruitment maneuver (ventilator strategy), and statins.
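The Silhouette score mentioned above can be sketched as follows: for each patient it compares the mean distance to members of the patient's own cluster (a) with the mean distance to the nearest other cluster (b), and averages (b - a) / max(a, b) over all patients. This plain-Python version is for illustration only and is not the scikit-learn implementation; scoring singleton clusters as 0 follows the common convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; ranges from -1 (poor) to 1 (well separated)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Distance from p to itself is 0, so dividing by len(own) - 1 is correct.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

As with the other criteria, the score would be computed for each candidate number of clusters and the values compared.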
The different clinical interventions were considered for differential treatment response using various combinations of training data, model feature sets, validation data, and recorded interventions. Differential response was examined using numerous outcomes, including mortality, ventilator free days, or ventilator days.
Positive End-Expiratory Pressure (PEEP) is the amount of pressure above atmospheric pressure remaining in the airway at the end of the respiratory cycle (exhalation) in mechanically ventilated patients. Current guidelines recommend high PEEP in patients with moderate or severe ARDS (Papazian et al. 2019; Fan et al. 2017). However, the ideal level of PEEP may also be correlated with a patient’s phenotype.
High PEEP and low PEEP treatments are provided to patients based on the patient’s fraction of inspired oxygen (FiO2) level. Further details of high and low PEEP in relation to patient FiO2 levels are described in Brower RG et al. “Higher versus lower positive end-expiratory pressures in patients with the acute respiratory distress syndrome.” N Engl J Med. 2004 Jul 22;351(4):327-36, which is incorporated by reference in its entirety. In particular, the allowable combinations of PEEP and FiO2 are shown below in Tables 71A-71C. Therefore, a low PEEP treatment for a patient would refer to a particular PEEP (cm H2O) based on the corresponding FiO2 level of the patient shown in Table 71A. Similarly, a high PEEP treatment for a patient would refer to a particular PEEP (cm H2O) based on the corresponding FiO2 level of the patient shown in Table 71B or 71C.
Recruitment maneuvers in ARDS are periods of sustained increased transpulmonary pressure (through increased PEEP) designed to help re-open (recruit) collapsed alveoli. Recommendations about recruitment maneuvers in ARDS are mixed, with some saying “recruitment maneuvers should probably not be used routinely in ARDS patients” (Papazian et al. 2019) and others recommending for recruitment maneuvers with moderate or severe ARDS (Fan et al. 2017). Again, some patients may benefit from increased PEEP via recruitment maneuvers whereas others may benefit from lower levels of PEEP.
To evaluate these hypotheses, K-means clustering was applied using Model C.4 described above in Table 70. In particular, Model C.4 includes the following features: recent arterial pH (Arterial pH-R), lowest bicarbonate (bicarbonate-L), recent creatinine (creatinine-R), recent FiO2 (FiO2-R), recent heart rate (heart rate-R), recent PaO2 (PaO2-R), recent mean arterial pressure (mean arterial pressure-R), recent respiratory rate (respiratory rate-R), and highest bilirubin (bilirubin-H).
In the first iteration, the training data consisted of all patients enrolled in the FACTT and EDEN ARDSnet studies. Patients who did not have measurements for each of the 9 data elements used were excluded from the training dataset. The resulting K-means algorithm was then applied to the ALVEOLI and ART studies (described previously). Key outcomes, including 60 and 90-day mortality (ALVEOLI), 28 and 180-day mortality (ART), ventilator free days, and number of days on ventilator were calculated for each treatment arm of each phenotype, as shown in Tables 72A and 72B below. Mortality was assessed by a logistic regression model incorporating the subphenotype (based on K-means cluster assignment) and an interaction term. Due to overdispersion and excessive zeros, the ventilator and ventilator-free days were compared among the subphenotypes considering a mixed-effect generalized linear model with zero-inflated negative binomial distribution. Models were unadjusted and included the hospital of inclusion as a random effect if hospital information was available. A two-sided p-value < 0.05 was considered evidence of statistical significance. Statistical analysis was performed in R, version 4.0.2.
In both ALVEOLI and ART there was a trend toward significance in mortality, and a significant difference in ventilator free days between subphenotype and study arms. Within subphenotype B (the high mortality subphenotype), patients receiving high PEEP had slightly lower mortality in both studies; however, within subphenotype A, the group receiving low PEEP had lower mortality with more ventilator free days. This suggests that contrary to current treatment guidelines for ARDS, patients within subphenotype A may benefit from lower PEEP.
Findings for the ALVEOLI study aligned with the findings of Calfee et al (Calfee et al. 2014). Within Calfee’s Phenotype 2 (similar to Endpoint Health subphenotype B), mortality was reduced and ventilator-free and organ failure-free days were increased among patients receiving high PEEP. Conversely, Phenotype 1 patients (similar to Endpoint Health subphenotype A) experienced lower mortality when they received low PEEP, though there was little change in ventilator-free and organ failure-free days.
While the findings here show similar results to Calfee et al, they are distinguishable because they are based on a generalizable K-means clustering model which can be applied across numerous data sets, whereas Calfee’s work was trained and evaluated on the same data set. This suggests that the results here could be applied prospectively to data outside of the ALVEOLI data set. The similar findings in ART support this claim.
Characteristics of Subphenotype A show that these patients tend to not be as sick as Subphenotype B patients. They have lower mortality and more ventilator free days. At the time of enrollment, the mean PaO2/FiO2 (P/F ratio) for ALVEOLI was 117.4 (SD = 58.2) for Subphenotype B and 156.2 (SD = 63.3) for Subphenotype A. It was hypothesized that the differential mortality seen due to high and low PEEP may have been due to the proportion of patients with moderate or severe ARDS in each subphenotype compared to patients with mild ARDS. To test this hypothesis, a secondary set of models was created which was only trained and tested on patients with moderate to severe ARDS, removing the possibility of patients with mild ARDS contributing to a false differential response.
In this iteration, the training set still consisted of patients from FACTT and EDEN; however, only patients with moderate or severe ARDS (P/F ratio <= 200) were included in the training data set. A new K-means model was created using the same readily-available data features defined previously. The model was then applied to the ALVEOLI and ART data sets, but again excluding patients with a P/F ratio > 200. Table 73 shows the results. (NOTE: the ART trial originally only included patients with a P/F ratio <= 200, so no additional patients were excluded from that study.) The same post-hoc analysis was performed to identify statistically significant differences in outcomes.
While mortality was not statistically significant in the ALVEOLI data, there was a decrease in 60-day mortality among subphenotype A patients who received low PEEP therapy. In ART, the difference in mortality across all subphenotypes and treatment arms neared significance, with subphenotype A patients with low PEEP showing reduced mortality, and subphenotype B patients who received high PEEP showing reduced mortality. Subphenotype A patients with low PEEP also had significantly more ventilator free days.
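The severity filter described above can be sketched as follows. The record layout, the key names, and the expression of FiO2 as a fraction between 0.21 and 1.0 are assumptions made for illustration.

```python
def pf_ratio(pao2_mmhg: float, fio2_fraction: float) -> float:
    """PaO2 / FiO2 ratio, with FiO2 given as a fraction (assumed, e.g. 0.6 for 60%)."""
    return pao2_mmhg / fio2_fraction


def moderate_to_severe(records, threshold=200.0):
    """Keep patients whose P/F ratio is at or below the threshold."""
    return [r for r in records if pf_ratio(r["pao2"], r["fio2"]) <= threshold]
```

Passing `threshold=300.0` would give the alternative P/F <= 300 subset mentioned elsewhere in this example.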
The dataset from the LASRS study was used for analysis. The LASRS study involved administration of corticosteroids, specifically methylprednisolone. K-means clustering was applied using Model C.4 described above in Table 70 and patients were separated into two subphenotypes based on the K-means cluster. Tables 74A-74C show the characteristics of the different subphenotypes. Overall mortality was 40% in Subphenotype B and 28.57% in Subphenotype A (p = 0.3287). Within Subphenotype B, mortality rates were 40% regardless of whether the patient received methylprednisolone or a placebo; however, in Subphenotype A, mortality was 50% in the cohort receiving methylprednisolone, compared with 9.09% in the placebo cohort (p = 0.0382).
Observation: Patients that meet the LASRS inclusion criteria that are identified by the test to be in Subphenotype A exhibit higher mortality (50%) when treated with methylprednisolone vs. placebo (9.1%). Hypothesis: Methylprednisolone harms ARDS patients in Subphenotype A. Therefore, when considering methylprednisolone treatment for ARDS patients, the subphenotyping test should be run and methylprednisolone should be avoided for patients identified by the test to be in Subphenotype A.
The dataset from the CoDEX study was used for analysis. The CoDEX study involved treating COVID-19 patients with dexamethasone. K-means clustering was applied using Model C.4 described above in Table 70 and patients were separated into two clusters assigned to Subphenotype A and Subphenotype B. Tables 75A and 75B show the corresponding results. The number of ventilator free days increased by 101% in Subphenotype B patients who received dexamethasone versus placebo; however, the number of ventilator free days increased by only 45% in patients in Subphenotype A (p = 0.03309).
Observation: Patients that meet the CoDEX inclusion criteria and are treated with dexamethasone that are identified by the test to be in Subphenotype A do not see as strong of an improvement in ventilator free days as patients in Subphenotype B who are treated with dexamethasone.
Hypothesis: The greatest improvement in outcomes from dexamethasone therapy for ARDS patients is achieved in patients identified by the test to be in Subphenotype B.
Product use, if hypothesis is confirmed: When considering dexamethasone treatment for ARDS patients, the subphenotyping test should be run and dexamethasone should be administered to patients identified by the test to be in Subphenotype B. The subphenotyping test can be used as a prognostic to better understand the expected ventilator use in individual patients or in a pandemic situation.
The dataset from the ARMA-KARMA-LARMA study was used for analysis. Interventions in the study included lisofylline and ketoconazole. Subphenotype B had a strong signal to not use lisofylline. Overall mortality in the ARMA study was 34% in Subphenotype B and 25.9% in Subphenotype A (Table 76A).
Within the subset of patients identified as lisofylline: active and lisofylline: placebo, the difference in mortality between subphenotypes was negligible, with Subphenotype A having a 27.1% mortality and Subphenotype B having a 28% mortality (Table 76B).
When just Subphenotype B was examined, mortality was 40% for patients who got lisofylline, and 16% for patients who received placebo (p = 0.0588) (Table 76C).
There was no significant difference in mortality for patients in Subphenotype A who received lisofylline versus placebo (31.4% vs 22.9%, p = 0.4201) (Table 76D).
Observation: Patients that meet the ARMA-KARMA-LARMA inclusion criteria that are identified by the test to be in Subphenotype B exhibit higher mortality when treated with lisofylline vs. placebo.
Hypothesis: Lisofylline harms ARDS patients in Subphenotype B.
Product use, if hypothesis is confirmed: When considering lisofylline treatment for ARDS patients, the subphenotyping test should be run and lisofylline should be avoided for patients identified by the test to be in Subphenotype B.
The dataset from the FACTT study was used for analysis. The FACTT study involved the use of a pulmonary artery catheter (PAC) in comparison to a less invasive alternative, a central venous catheter (CVC). K-means clustering was applied using Model C.4 described above in Table 70 and patients were separated into two clusters, assigned to subphenotype A and subphenotype B. Findings: Preliminary logistic regression analysis showed that subphenotype and the interaction term of subphenotype and type of line were each significant or nearing significance in predicting 90-day mortality.
Further analysis showed the overall dataset had a high mortality phenotype (Subphenotype B) (34.2%) and a low mortality phenotype (Subphenotype A) (26.0%) (Table 77A).
Among patients who received the CVC line, mortality rates were similar to the overall population (38.1% and 23.7% in the Subphenotype B and Subphenotype A, respectively) (Table 77B).
However, there was no difference in mortality among patients who received the PAC line; mortality was slightly lower in Subphenotype B (30.8%) and slightly higher in Subphenotype A (28.2%) (Table 77C).
There was not a significant interaction between fluid management strategy and a patient’s subphenotype. However, based on the finding of a significant interaction between PAC lines and subphenotype, the fluid management strategy was combined with the PAC line to identify interactions. In Subphenotype B, there was no significant difference (p = 0.9346) in 90-day mortality between PAC line and liberal fluid (34.6% mortality) and the other combinations of line and fluid management (34.1% mortality).
However, in Subphenotype A, mortality increased to 30.3% if a patient was treated with a PAC line and liberal fluid, whereas mortality in the remaining population was 24.6% (p = 0.2601).
A Welch’s two-sample t-test also showed a difference in ventilator free days which neared significance for patients in Subphenotype A who got a PAC line and liberal fluid (13.1 ventilator free days on average) vs. all other patients within Subphenotype A (14.9 ventilator free days on average). Specifically, for a t-statistic of 1.62 and 168.81 degrees of freedom, the comparison yielded a p-value of 0.10716.
Observation 1: Patients who get a CVC line show the same pattern as the overall population, with a high mortality and a low mortality subphenotype; however, this pattern does not hold when patients receive a PAC line.
Observation 2: Patients that meet the FACTT inclusion criteria that are identified by the test to be in Subphenotype A exhibit higher mortality when treated with PAC+ liberal fluids vs. PAC + conservative fluid, CVC + conservative fluid, or CVC + liberal fluid.
Hypothesis: PAC + liberal fluids harm ARDS patients in Subphenotype A.
Product use, if hypothesis is confirmed: When considering PAC+liberal fluids treatment for ARDS patients, the subphenotyping test should be run and PAC+liberal fluids should be avoided for patients identified by the test to be in Subphenotype A.
The dataset from the ART study was used for analysis. The ART study involved administering recruitment maneuvers to patients. K-means clustering was applied using Model C.4 described above in Table 70 and patients were separated into two clusters assigned to subphenotype A and subphenotype B. Logistic regression analysis showed that subphenotype, recruitment maneuver vs standard ARDSnet guidance care, and the interaction term of subphenotype and recruitment maneuver were each significant or nearing significance in predicting 90 day mortality based on Pr(>|z|) scores.
Further chi-square analysis showed the following: Similar to previous findings, a low mortality subphenotype (31.1%) - Subphenotype A, and a high mortality subphenotype (49.6%) - Subphenotype B, were identified (Table 78A).
Among Subphenotype A patients, there was no difference in mortality for those who received the standard ARDSnet care (30.6%) versus those who received additional recruitment maneuvers via the ART protocol (31.7%, p = 0.8477) (Table 78B).
Among Subphenotype B patients, those who received recruitment maneuvers according to the ART protocol had significantly lower mortality (42.5%) than those who received the standard ARDSnet care protocol (56.8%, p = 0.0234) (Table 78C).
Observation: Patients that meet the ART inclusion criteria and that are identified by the test to be in Subphenotype B exhibit lower mortality when treated with a more aggressive recruitment maneuver protocol.
Hypothesis: recruitment maneuvers support ARDS patients in Subphenotype B.
Product use, if hypothesis is confirmed: When considering recruitment maneuver treatment for ARDS patients, the subphenotyping test should be run and recruitment maneuvers should be considered as treatment for Subphenotype B.
The eICU (v1) dataset was used for analysis. The intervention of interest was statins. K-means clustering was applied using Model C.4 described above in Table 70 and patients were separated into two clusters, assigned to subphenotype A and subphenotype B. Patients in Subphenotype A who were charted as on any statin at the time of ICU admission (6.81% mortality) may have increased survival as compared with those who had no statin during their stay (13.28% mortality) (Chi-square = 6.2409, p = 0.012). Patients who initiated a statin during their ICU stay did not see the same mortality benefit as patients on a statin at admission (Chi-square = 0.0802, p = 0.777051); in fact, their mortality rate was closer to that of patients who received no statin therapy (12.56%).
Observation: ARDS patients in the eICU dataset that are identified by the test to be in Subphenotype A and who were taking statins at the time of ICU admission exhibit lower mortality vs. those who were not taking statins at the time of ICU admission.
Hypothesis: ARDS Subphenotype A patients on statins prior to ICU admission exhibit lower mortality.
Product use, if hypothesis is confirmed: ARDS Subphenotype A patients on statins prior to ICU admission exhibit better prognosis. Patients presenting to the emergency department with pneumonia, sepsis or other ARDS risk factors should be tested for their subphenotype. If found to be in Subphenotype A with no contraindications, pre-emptive statins may be considered.
Conversely, in Subphenotype B, statin therapy seemed to benefit patient outcomes regardless of the timing of therapy initiation. Patients who received a statin at any time in their stay had a mortality rate of 26.44%, whereas patients who did not receive a statin had a mortality rate of 35.46% (Chi-square = 4.8126, p = 0.028253). Mortality rates were similar whether the statin was already initiated at the time of ICU admission (27%) or initiated during the ICU stay (26%); however, for each of these subgroups the chi-square test against patients not receiving statins was nonsignificant, due to the smaller sample size of the subgroups.
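The chi-square statistics quoted for the eICU statin comparisons can be converted to p-values with a short calculation. The following is a minimal sketch, assuming each comparison is a 2x2 table with one degree of freedom (the text does not state the degrees of freedom); the function name `chi2_pvalue_df1` is illustrative only.

```python
import math


def chi2_pvalue_df1(chi2_stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 degree of freedom the chi-square variable is the square of a
    standard normal variable, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if chi2_stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(chi2_stat / 2.0))


# Statistics quoted in the text; 1 degree of freedom is an assumption.
reported_statistics = {
    "Subphenotype A, statin at ICU admission": 6.2409,
    "Subphenotype A, statin initiated during ICU stay": 0.0802,
    "Subphenotype B, statin at any time": 4.8126,
}

for label, stat in reported_statistics.items():
    print(f"{label}: chi-square = {stat}, p = {chi2_pvalue_df1(stat):.4f}")
```

Under the one-degree-of-freedom assumption, the computed p-values agree closely with the values reported in the text.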
Observation: ARDS patients in the eICU dataset that are identified by the test to be in the Subphenotype B exhibit lower mortality when receiving statins during their ICU stay vs. when not receiving statins during their ICU stay. Tables 79A-79C show characteristics of patients that were administered any of simvastatin, atorvastatin, or any statin.
Hypothesis: Subphenotype B ARDS patients exhibit lower mortality when treated with statins.
Product use, if hypothesis is confirmed: ARDS patients identified to be in Subphenotype B using the subphenotyping test should be treated with statins.
The analysis was repeated on the eICU data, removing patients who had medical history codes which would indicate a patient had an indication for statin use prior to ICU admission. This included patients with history of angina, congestive heart failure, coronary artery bypass grafting, multiple coronary artery bypass, hypertension requiring treatment, previous acute myocardial infarction, peripheral vascular disease, previous coronary intervention procedure, stroke, and/or transient ischemic attack. Tables 80A-80C summarize the results of the analysis.
The individual statins were then examined with no consideration to number of doses and minimum dose size. Using this methodology, there were several differential responses identified (bolded and underlined cells as shown below in Table 81).
This was a retrospective study in a de-identified dataset from one randomized clinical trial in patients with ARDS, entitled ‘Early Versus Delayed Enteral Feeding to Treat People with Acute Lung Injury or Acute Respiratory Distress Syndrome (EDEN)’. Patients were included in the trial if they met the American-European consensus criteria for ARDS, including patients with a PaO2 / FiO2 ratio < 300 up to 48 hours before enrollment. The trial compared the use of full enteral feeding to trophic feeding.
Data was assessed for completeness and consistency. Of 1,000 patients enrolled, 777 had complete data to train and apply model B.2 as described in Example 5. The majority of the patients were male, and pneumonia was the prevailing etiology followed by sepsis and aspiration.
The primary outcome of the study was 60-day mortality. No secondary outcome was assessed.
The statistical analysis plan was pre-planned. Continuous data were presented as median (quartile 25% - quartile 75%) and compared with the Wilcoxon rank-sum test, and categorical data were presented as number and percentage and compared with Fisher exact tests.
Heterogeneity of Treatment Effect (HTE) of full enteral feeding was assessed following a Bayesian hierarchical logistic model for the primary outcome. All hierarchical models were modelled as a simple regression and shrinkage model. The hierarchical models partially pool the data and shrink the estimates in each subphenotype towards the overall estimate, with shrinkage proportional to the size of the subphenotype. While traditional subgroup analyses are at higher risk of increased type 1 error due to exaggeration of the subgroup effects, the proposed hierarchical model limits this risk through shrinkage.
For all analyses, weakly informative priors were used, aiming to encompass all plausible effect sizes. Since the sample size of the pooled dataset was expected to be large, the likelihood was expected to dominate the posteriors.
All described Bayesian models were fitted using Markov Chain Monte Carlo simulation with four chains. All models used a burn-in of 1,000 iterations, with sampling from a further 10,000 iterations for each chain. All chains were required to be free of divergent transitions, and additional sampler settings (adapt delta) were tuned until this was achieved. To monitor convergence, trace plots and the Gelman-Rubin convergence diagnostic (Rhat < 1.01) were used for all parameters.
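The convergence criterion described above (Rhat < 1.01 across four chains) can be illustrated with a short calculation. The sketch below computes the classic, non-split Gelman-Rubin statistic for a single parameter; it is an assumption that this simple form is adequate for illustration, since Stan-based tools typically report a rank-normalized split variant. The simulated draws and the function name `gelman_rubin_rhat` are hypothetical and are not data from the study.

```python
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor for one parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Hypothetical post-burn-in draws: four chains of 10,000 iterations each.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(10_000)] for _ in range(4)]
# Same draws, but with one chain shifted to mimic a chain that has not converged.
unmixed = mixed[:3] + [[x + 1.0 for x in mixed[3]]]

print("well-mixed chains pass:", gelman_rubin_rhat(mixed) < 1.01)
print("shifted chain passes:  ", gelman_rubin_rhat(unmixed) < 1.01)
```

Chains that sample the same distribution give a value very close to 1, while a chain centered elsewhere inflates the between-chain variance and pushes the statistic above the 1.01 threshold.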
The probability of the following odds ratios (OR) was considered as possible thresholds for the minimum clinically important treatment effect: 1) OR < 1.00; 2) OR < 0.97; and 3) OR < 0.90. These thresholds seem reasonable in view of several considerations. First, the null hypothesis in the frequentist approach is no benefit (OR = 1.00), thus the probability of any benefit (OR < 1.00) will be estimated to evaluate the equivalent hypothesis under Bayesian terms. Second, since the use of statins is a highly feasible intervention, even small effects on mortality would be sufficient to justify its use. Indeed, an OR of 0.97 would be equivalent to an estimated 440 lives saved per year in the United States of America (assuming 104,000 cases of ARDS annually [7], 40% of these cases meeting criteria for moderate-to-severe ARDS [8], and a baseline mortality rate of 35% [8]). To expand the possible detectable effects, we also computed the posterior probabilities at an OR of 0.90, equivalent to 1,456 lives saved annually in the USA.
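The lives-saved figures above follow from simple arithmetic on the stated assumptions. The sketch below reproduces them under the additional assumption that the odds ratio is treated as an approximate relative risk, which is the reading that matches the quoted numbers; the function and parameter names are illustrative.

```python
def lives_saved_per_year(
    odds_ratio: float,
    annual_cases: int = 104_000,          # ARDS cases per year, as stated in the text
    frac_moderate_severe: float = 0.40,   # share meeting moderate-to-severe criteria
    baseline_mortality: float = 0.35,     # baseline mortality rate
) -> float:
    """Approximate annual lives saved, treating the OR as a relative risk (assumption)."""
    baseline_deaths = annual_cases * frac_moderate_severe * baseline_mortality
    return baseline_deaths * (1.0 - odds_ratio)


for threshold in (0.97, 0.90):
    print(f"OR {threshold:.2f}: about {lives_saved_per_year(threshold):.0f} lives per year")
```

With these inputs there are 14,560 expected deaths per year at baseline, so an OR of 0.97 corresponds to roughly 437 lives (reported above as 440) and an OR of 0.90 to 1,456 lives.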
The priors were used to reflect varying degrees of beliefs for benefit or harm of use of statins. Specifically,
Intercept: The prior was a normally distributed prior with mean 0 and variance 2.25 (prior risk with a 95% probability between 5% and 95%). This prior was used for all analysis including the sensitivity analysis with optimistic and pessimistic priors.
Shrinkage parameter: The prior was a normally distributed prior with mean of 0 and variance of Ω, where Ω is the shrinkage factor having a half-normally distributed prior with variance of 1. This prior was used for all analysis including the sensitivity analysis with optimistic and pessimistic priors.
Treatment Effect - Weakly informative prior: A weakly informative prior was used to produce results essentially dependent on data from the analysis. This was a normally distributed prior with mean of 0 and standard deviation of 0.421 (variance of 0.177). Under this prior, there is a 90% probability of 0.50 < OR < 2.00.
Treatment Effect - Optimistic prior: An optimistic prior was defined to represent archetypes of prior belief that the use of statins effectively lowers mortality. This was a normally distributed prior with mean of -0.287 and standard deviation of 0.174 (variance of 0.030). This prior distribution was centered at an OR of 0.75 with a probability of an OR > 1.00 of 5%. This was chosen because an OR ≤ 0.75 was used to power several studies in the field of ARDS, like the ART, EXPRESS, ALVEOLI, SAILS and ROSE trials. Specifically, the SAILS trial was powered to detect an OR ≤ 0.66; however, we judged this an implausible effect size and chose a more conservative one.
Treatment Effect - Pessimistic Prior: A pessimistic prior was defined to represent archetypes of prior belief that the use of statins increases mortality. This was a normally distributed prior with mean of 0.183 and standard deviation of 0.113 (variance of 0.012). This prior distribution was centered at an OR of 1.20, based on the relative risk of death found in the ART trial, with a probability of OR < 1.00 of 5%. This was chosen because the ART trial reports an intervention that ultimately increased mortality in ARDS patients.
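The prior specifications listed above can be checked numerically. The following sketch, which assumes the priors are placed on the log-odds scale (consistent with a logistic model), recomputes the probabilities quoted for each prior using only the Python standard library; variable names are illustrative.

```python
import math
from statistics import NormalDist


def expit(x: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Intercept prior: mean 0, variance 2.25 (standard deviation 1.5) on the log-odds scale.
intercept = NormalDist(0.0, math.sqrt(2.25))
risk_low = expit(intercept.inv_cdf(0.025))
risk_high = expit(intercept.inv_cdf(0.975))

# Weakly informative treatment-effect prior on the log odds ratio.
weak = NormalDist(0.0, 0.421)
p_weak_between = weak.cdf(math.log(2.0)) - weak.cdf(math.log(0.5))

# Optimistic prior: centered at log(0.75); probability of harm (OR > 1).
optimistic = NormalDist(-0.287, 0.174)
p_optimistic_harm = 1.0 - optimistic.cdf(0.0)

# Pessimistic prior: centered at log(1.20); probability of benefit (OR < 1).
pessimistic = NormalDist(0.183, 0.113)
p_pessimistic_benefit = pessimistic.cdf(0.0)

print(f"intercept 95% prior risk interval: {risk_low:.3f} to {risk_high:.3f}")
print(f"weak prior P(0.50 < OR < 2.00):    {p_weak_between:.3f}")
print(f"optimistic prior center OR:        {math.exp(-0.287):.3f}, P(OR > 1) = {p_optimistic_harm:.3f}")
print(f"pessimistic prior center OR:       {math.exp(0.183):.3f}, P(OR < 1) = {p_pessimistic_benefit:.3f}")
```

The recomputed values are close to the stated ones: a prior risk interval of roughly 5% to 95%, about 90% prior mass between OR 0.50 and 2.00, and approximately 5% prior probability on the opposite side of the null for the optimistic and pessimistic priors.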
For the primary outcome, in addition to the odds ratio (OR) with 95% credible interval (CrI), the probability of the following OR was considered as possible thresholds for the minimum clinically important treatment effect: 1) OR < 1.00; 2) OR < 0.97; and 3) OR < 0.90. To understand the possible harm, the probability of harm, defined as an OR > 1.00 (null), is also reported.
All effect estimates were drawn from the median of the posterior distribution and the 95% CrI from the 95% percentiles of the distribution. Additional analyses considering pessimistic and optimistic priors were conducted as sensitivity analyses for the primary HTE analysis. All analyses were performed using the R software (R, version 4.0.2, Core Team, Vienna, Austria, 2016) with the beanz package and Stan through brms.
Baseline characteristics of the patients according to subphenotype are described in Table 82. Overall, patients in subphenotype B had statistically significantly higher severity of illness, rate of vasopressor use, heart rate, creatinine, and bilirubin, as well as lower platelets, pH, BUN and bicarbonate compared to patients in subphenotype A.
Table 83 summarizes EDEN outcomes by subphenotype and feeding intervention. 60-day mortality was higher and the number of ventilator-free days at day 28 was lower in patients in subphenotype B. 60-day mortality was lower in patients in the full enteral feeding group in subphenotype A, and it was higher in this group in subphenotype B (Table 83). Additionally,
There was no difference in mortality with the use of full enteral feeding in either subphenotype A (OR, 0.78 [95% CrI, 0.49 to 1.22], probability of benefit of 86.3%) or subphenotype B (OR, 1.05 [95% CrI, 0.66 to 1.67], probability of benefit of 42.1%) (Table 84). However, the probability that assignment to a full enteral feeding group results in lower OR for 60-day mortality in patients in subphenotype B (more beneficial), compared to subphenotype A, was only 18.3%. The use of different priors did not materially change these findings (Table 84). These results are further observed in
Product use, if hypothesis confirmed: ARDS patients identified as Subphenotype A should be treated with full feeding; ARDS patients identified as Subphenotype B should be treated with full or trophic feeding.
The preliminary analysis of ARDS subphenotypes to drive neuromuscular block treatment guidance described above in Example 1 represents preliminary findings in observational data and randomized clinical trials studying interventions other than neuromuscular block. Findings in these trials may be driven by patient severity of illness, hospital and/or study protocol, or other unknown factors.
These findings suggest the presence of a differential response, but a clinical trial of neuromuscular block would be required to show a differential response. In May 2021, data from the Reevaluation of Systemic Early Neuromuscular Blockade (ROSE) trial became publicly available. Because the trial was a controlled study of neuromuscular blockade, it allows for more accurate analysis of differential response in ARDS subphenotypes to neuromuscular blockade.
The ROSE trial enrolled 1006 ARDS patients with a PaO2/FiO2 ratio < 150 and a PEEP > 8 between January 2016 and April 2018. Data was cleaned and prepared in Python. Data elements of interest were identified across the various data tables provided by the ROSE authors and collated into a single dataframe/CSV. Data columns that used text to mark missing values were converted to numeric, with NaN replacing the text strings.
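The cleaning step described above, in which text placeholders for missing values are replaced by NaN, can be sketched as follows. This is a minimal standard-library illustration, not the code used in the analysis; the helper name `to_numeric_or_nan`, the column names, and the sample rows are hypothetical.

```python
import math


def to_numeric_or_nan(value) -> float:
    """Convert a raw field to float; non-numeric text or empty input becomes NaN."""
    if value is None:
        return math.nan
    try:
        return float(str(value).strip())
    except ValueError:
        return math.nan


# Hypothetical raw rows in which missing values are recorded as text.
raw_rows = [
    {"creatinine_high": "1.02", "bilirubin": "Not done"},
    {"creatinine_high": " 0.98 ", "bilirubin": "0.7"},
    {"creatinine_high": "", "bilirubin": "missing"},
]

clean_rows = [
    {column: to_numeric_or_nan(value) for column, value in row.items()}
    for row in raw_rows
]

for row in clean_rows:
    print(row)
```

When the data are held in a pandas dataframe, `pandas.to_numeric` with `errors="coerce"` performs the same coercion column by column.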
In previous work, the MAP, creatinine, heart rate, and respiratory rate used in the subphenotyper were aggregated based on the value measured closest to randomization. The ROSE trial did not provide that aggregation measure; instead, the highest and lowest values in the 24 hours prior to randomization were provided for those variables, which is consistent with calculation of the APACHE score. Because the most recent aggregation method was not available, the APACHE aggregation method was used to determine values to input to the subphenotyping algorithm. The APACHE method provides a standard midpoint for each clinical variable. For the highest and lowest value, the distance from the midpoint is calculated. Whichever value (highest or lowest) was furthest from the midpoint was used for input to the subphenotyper.
If the high MAP was further from the APACHE midpoint, it was used. If the low MAP was furthest from the APACHE midpoint, it was used. If the high and low value were equidistant to the midpoint, the value which would receive more APACHE points was used. In the event that high and low value were equidistant to the APACHE midpoint and had the same APACHE points, the lower MAP value was used.
All high and low heart rate values which were equidistant to the APACHE midpoint were in the zero APACHE points range (low value >= 50 bpm and high value <= 99 bpm). In all cases, the higher heart rate was used.
Based on study inclusion criteria, all patients were assumed to be mechanically ventilated. This was confirmed in the SCREENING.csv data form in the field scr_intubdttm (hours from randomization to current intubation). 1005/1006 patients had a negative value, signifying intubation prior to study enrollment (one patient had a null value). Because all patients were ventilated, respiratory rates 6 - 12 and 14 - 24 were both considered 0 APACHE points. APACHE documentation is unclear on how to handle a respiratory rate of 13 in ventilated patients. In one patient with a low respiratory rate of 13 and high respiratory rate of 25, we made the assumption that 13 bpm would be scored as a 0 and used the higher respiratory rate as the most recent respiratory rate. Eleven patients had a high and low respiratory rate between 14 - 24. For those patients, the higher respiratory rate was used.
One patient had high and low creatinine values that were equidistant from the APACHE midpoint. This patient was found to not have acute renal failure (high creatinine = 1.02, low creatinine = 0.98, urine output = 1885 mL, no history of chronic dialysis). Both the high and low value fell in the 0 point range for APACHE. For that patient, the higher creatinine value was used, because higher creatinine values are typically associated with higher APACHE scores. 398 patients had equal high and low creatinine values, in which case the value from the higher creatinine field was used.
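The selection rules described above (furthest from the midpoint, then more APACHE points, then a variable-specific default) can be sketched as a single function. This is an illustrative sketch only: the midpoint of 90 and the `toy_points` scoring bands are placeholders chosen for the example and are not the APACHE scoring tables, and the function name `select_input_value` is hypothetical.

```python
from typing import Callable


def select_input_value(
    low: float,
    high: float,
    midpoint: float,
    points: Callable[[float], int],
    prefer_on_full_tie: str,
) -> float:
    """Pick the 24-hour extreme to feed to the subphenotyper.

    1. Use the value furthest from the midpoint.
    2. If equidistant, use the value that would score more points.
    3. If still tied, fall back to the variable-specific default
       ("low" for MAP; "high" for heart rate, respiratory rate, creatinine).
    """
    if prefer_on_full_tie not in ("low", "high"):
        raise ValueError("prefer_on_full_tie must be 'low' or 'high'")
    dist_low, dist_high = abs(low - midpoint), abs(high - midpoint)
    if dist_high != dist_low:
        return high if dist_high > dist_low else low
    pts_low, pts_high = points(low), points(high)
    if pts_high != pts_low:
        return high if pts_high > pts_low else low
    return low if prefer_on_full_tie == "low" else high


def toy_points(value: float) -> int:
    """Placeholder scoring bands for illustration; NOT the APACHE table."""
    if value < 70:
        return 3
    if value > 109:
        return 2
    return 0


PLACEHOLDER_MIDPOINT = 90  # hypothetical midpoint, not taken from APACHE documentation

print(select_input_value(80, 130, PLACEHOLDER_MIDPOINT, toy_points, "low"))   # high is further
print(select_input_value(65, 115, PLACEHOLDER_MIDPOINT, toy_points, "low"))   # tie broken by points
print(select_input_value(75, 105, PLACEHOLDER_MIDPOINT, toy_points, "low"))   # full tie, MAP default
```

Passing the fallback as a parameter keeps one function usable for every variable, since the text applies a different default for MAP than for heart rate, respiratory rate, and creatinine.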
The physiologic limits identified in previous work were applied to the 1006 patients in the ROSE trial (Table 85). 3 patients had values outside of the previously identified physiologic limits. Those values were replaced with null values, which exclude the patient from being assigned a subphenotype.
Table 86 shows the percentage of missing data for each of the 9 data elements used in the ARDS phenotyper. Rates of missingness were less than 7% for all elements except bilirubin, which had 27.8% missing.
Outcome data derived from study data was calculated and provided by the study authors without need for further processing. Derived outcomes included all-cause mortality prior to discharge home before day 90 (the primary study outcome), study hospital mortality prior to discharge alive to day 28, ventilator free days (to day 28), hospital free days (to day 28), and ICU free days (to day 28). The date of hospital discharge alive through 90 days and the last date of assisted breathing to day 28 were also provided.
A patient subphenotype classifier (referred to as Model B.2 in Example 5) was applied to the 657 ROSE trial patients that did not have missing data. Of those, 127 (19.3%) were identified as subphenotype A and 525 (80.7%) were assigned to subphenotype B.
The previous hypothesis of lower inflammation in subphenotype A was supported in this data by subphenotype A exhibiting a lower SOFA and APACHE score at study enrollment, lower use of vasopressors and corticosteroids at enrollment, and, in general, less severe clinical manifestation, including lower temperature, heart rate, respiratory rate, creatinine, BUN, FiO2, and plateau pressure, and higher mean arterial pressure, urine output, albumin, bicarbonate, arterial pH, and PaO2/FiO2. Similarly, Subphenotype A had better outcomes, with lower mortality at 28 and 90 days, and more ventilator, ICU, and hospital free days at day 28.
Clinical characteristics of the ROSE population and subphenotypes A and B are shown in Table 87.
Next, the outcomes were compared across intervention and subphenotype (Table 88).
Patients in subphenotype A who received no treatment (the control group) had higher mortality and fewer ventilator, ICU, and hospital free days than subphenotype A patients in the cohort who received NMB. Thus, NMB therapy can benefit patients in subphenotype A. Conversely, patients in subphenotype B did not have dramatic differences in mortality or ventilator, ICU, or hospital free days.
Further analysis of differential response was carried out using binomial regression for binary outcomes and quantile regression for continuous variables. Of note, Model B.2, trained on all EDEN and FACTT data and applied to ROSE, showed a p-value of 0.077 for the interaction between subphenotype and NMB treatment on 90-day mortality (the primary study outcome) (Table 89).
Day of hospital discharge through 90 days and final day of assisted breathing through day 28 were available.
Overall, the findings of the re-analysis of the randomized controlled ROSE trial suggest that patients in Subphenotype A benefit from neuromuscular blockade, while patients in Subphenotype B may or may not benefit from neuromuscular blockade.
Table 90 summarizes the guided differential treatments for ARDS patients K-means clustered in either Subphenotype A or Subphenotype B using a model (e.g., model C.4) disclosed herein.
This application claims the benefit of and priority to U.S. Provisional Pat. Application No. 63/034,368 filed on Jun. 3, 2020, U.S. Provisional Pat. Application No. 63/064,054 filed on Aug. 11, 2020, and U.S. Provisional Pat. Application No. 63/180,880 filed on Apr. 28, 2021, the entire disclosure of each of which is hereby incorporated by reference in its entirety for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/035638 | 6/3/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63034368 | Jun 2020 | US | |
63064054 | Aug 2020 | US | |
63180880 | Apr 2021 | US |